Monday, January 11, 2010

Defining a Distributed Operating System

In this note, we present a definition of a distributed operating system (distributed OS). The definition is based on the concept of an autonomous process. Accordingly, we first review the notion of a process and then define an autonomous process.

Consider a ubiquitous operating system such as UNIX. What do we expect from an operating system for a particular computing device, such as a desktop or a handheld? One category of functions expected from an OS is to manage the device's resources (physical or virtual) and to provide a means of interacting with them, for instance the keyboard, screen, mouse, sound card, video card and disk drives. We also expect an OS to manage virtual memory and to provide memory protection among several simultaneously running applications. The OS abstractions in this category are concerned with controlling and managing the physical components that make up a computing device.

The most visible, and perhaps the most useful, function of an OS from a user's perspective is to turn an application into an executing process, which happens when the user runs the application. The OS abstractions in this second category are concerned with managing processes and threads and with providing means of communication among them. Processes depend on functions from the first category for I/O, memory and file access. Nevertheless, process and thread management is distinct from the device-management functions of the first category. In fact, an OS may not support the notion of a process at all.

The X server for UNIX illustrates the distinction between the first and second categories of OS functions: an X application can run on a remote machine while the user interacts with it through the keyboard, mouse and monitor attached to a local machine. A variation of this idea is the file server, which for some time seemed to be an example of a distributed OS.

Now consider a multi-processor computing device. The OS for such a device truly supports multiple processors if, whenever a process gains its time slice, it can run on a processor other than the one that placed it on the wait queue. It follows that a process in such an OS may execute on several processors during its lifetime; in other words, a process is independent of the processor on which it runs.

Preliminary Definition. In an OS for a multi-processor computing device, a process is autonomous if it can gain its time slice independently of the processor that ran it during a previous time slice.
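To make the preliminary definition concrete, here is a minimal sketch, assuming a simple round-robin policy over a single ready queue shared by all processors (the note does not prescribe any scheduling policy). The function name `simulate_shared_queue` and the process names are hypothetical. Because a process that uses up its slice goes back on the shared queue, whichever processor is free next picks it up, so the processor running a given slice does not depend on the one that ran the previous slice.

```python
from collections import deque


def simulate_shared_queue(num_processors, work):
    """Round-robin over one ready queue shared by all processors.

    `work` maps a (hypothetical) process name to the number of time slices
    it needs. Returns, for each process, the id of the processor that ran
    each of its slices, in order.
    """
    ready = deque(work)          # a single queue, visible to every processor
    remaining = dict(work)
    history = {name: [] for name in work}
    while ready:
        # In each round, every processor takes the next ready process (if any).
        running = []
        for cpu in range(num_processors):
            if not ready:
                break
            name = ready.popleft()
            history[name].append(cpu)
            remaining[name] -= 1
            running.append(name)
        # Processes that still need time go back on the shared queue, where
        # any processor may pick them up in the next round.
        for name in running:
            if remaining[name] > 0:
                ready.append(name)
    return history
```

For example, `simulate_shared_queue(2, {"A": 3, "B": 3, "C": 3})` runs process A on processors 0, 1 and 0 in its three slices. A scheduler with one queue per processor, by contrast, would keep each process on the processor that queued it, and no process would be autonomous in the sense above.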

Having defined this restricted notion of an autonomous process, we observe that a distributed OS must span several computing devices (nodes), each equipped with its own OS. Thus, a distributed OS for a set of nodes is distinct from the operating systems managing each of its nodes. A distributed OS is expected to manage a class of processes as if the nodes were the CPUs of a multi-processor OS. That is, a process of the distributed OS may gain its time slice on a node other than the one on which it was placed on the wait queue.

Definition.
Autonomous process. A process is autonomous with respect to a set of nodes if, for each time slice, it is capable of executing on any one of those nodes.
Distributed OS. An operating system for a set of nodes is distributed if and only if it supports autonomous processes among those nodes.
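The two definitions can be read as predicates. The sketch below is one possible formalization under stated assumptions: a process is modeled (hypothetically) as a function giving, for each time-slice index, the set of nodes on which it is able to run, and "supports autonomous processes" is taken to mean that the OS can create at least one process that is autonomous over the nodes. Since time slices are unbounded, the check only examines a finite window of slices. The names `is_autonomous` and `is_distributed` are illustrative, not part of any existing system.

```python
from typing import Callable, Iterable, Sequence, Set

# Hypothetical model: for time slice t, the set of nodes the process could run on.
Eligibility = Callable[[int], Set[str]]


def is_autonomous(eligible: Eligibility, nodes: Set[str], slices: Sequence[int]) -> bool:
    """True if, in every examined time slice, the process could run on any of `nodes`."""
    return all(nodes <= eligible(t) for t in slices)


def is_distributed(process_kinds: Iterable[Eligibility], nodes: Set[str],
                   slices: Sequence[int]) -> bool:
    """True if at least one kind of process the OS can create is autonomous over `nodes`."""
    return any(is_autonomous(p, nodes, slices) for p in process_kinds)
```

With `nodes = {"n1", "n2", "n3"}`, a process whose eligibility is always `{"n1"}` (a node-local process) is not autonomous, while one eligible for all three nodes in every slice is; an OS offering only the former is not distributed under this reading.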

An autonomous process is a representation of the form of application known as an autonomous agent. Thus, our final definition of a distributed OS takes the following form.

Definition. An operating system is distributed if and only if it supports autonomous agents.

An autonomous process of a distributed OS cannot be the same as a process of any OS controlling the nodes in a network, because a process created by the OS of one computing device cannot move to other computing devices. In other words, autonomous processes are distinct from system processes and cannot be created via the system calls of any of the nodes. This implies that a distributed OS must be self-contained with regard to the creation of processes and threads and to their intercommunication.
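The self-containment argument can be illustrated with a toy model. In this sketch, which is an assumption-laden illustration and not a description of Z47 or any real system, `NodeOS.spawn` stands in for a node's process-creation system call and yields a process permanently bound to that node. The hypothetical `DistributedOS` layer never calls `NodeOS.spawn`; it keeps its own process table and picks a node for each time slice in rotation, independently of where the previous slice ran. Actual execution on a node is abstracted away.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class NodeProcess:
    pid: int
    node: str  # fixed at creation: a node OS cannot hand this process to another node


@dataclass
class AutonomousProcess:
    pid: int
    placements: List[str] = field(default_factory=list)  # node used in each slice


class NodeOS:
    """Hypothetical per-node OS; its spawn() models a local system call."""

    def __init__(self, name: str):
        self.name = name
        self._pids = count(1)

    def spawn(self) -> NodeProcess:
        # A process created here is bound to this node for its whole lifetime.
        return NodeProcess(next(self._pids), self.name)


class DistributedOS:
    """Hypothetical layer with its own process table, independent of node OSes."""

    def __init__(self, nodes: List[NodeOS]):
        self.nodes = nodes
        self.table: Dict[int, AutonomousProcess] = {}
        self._pids = count(1)
        self._next_node = 0

    def spawn(self) -> AutonomousProcess:
        # Creation does not go through any node's spawn(): the layer is self-contained.
        p = AutonomousProcess(next(self._pids))
        self.table[p.pid] = p
        return p

    def dispatch(self, pid: int) -> str:
        # Grant one time slice on the next node in rotation; the choice does
        # not depend on which node ran the process's previous slice.
        node = self.nodes[self._next_node % len(self.nodes)].name
        self._next_node += 1
        self.table[pid].placements.append(node)
        return node
```

For instance, with `dos = DistributedOS([NodeOS("n1"), NodeOS("n2")])` and `p = dos.spawn()`, three calls to `dos.dispatch(p.pid)` place `p` on `n1`, `n2` and `n1`, whereas `NodeOS("n1").spawn().node` is always `"n1"`. Round-robin is used only to keep the example deterministic; any placement policy that may choose any node per slice would fit the definition.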

Having defined the notion of a distributed OS, the natural question is whether it is possible to construct such an operating system. The answer is yes. Indeed, Z47 is a self-contained distributed operating system (SC-DOS) according to the definition presented in this note.
