The evolution of Abstract Distributed Software Development
In this article the term software refers to applications
such as a word processor, a spreadsheet, email client/server, or software that
automates the operations of a corporation. In contrast, programs such as an
operating system are not considered applications, and are not within the scope
of this article.
There are at least three distinct categories of software
developers. The first group engages directly with programming a computing
device, for instance those who write operating systems and device drivers. The
second group uses the work of the first group to create tools such as compilers
and Integrated Development Environments (IDE). The third group, the target
of the second group's tools, develops application software. The main topic of
this article is the medium of development for this third group.
The fundamental means for developing application software is
a programming language. This article is concerned with linguistic abstractions
required for Abstract Distributed Software Development. The language Z++ is a
materialization of this research.
An abstract language for developing distributed applications
plays the same role for software engineering as mathematics does for other
areas of engineering. However, the nature of an abstract language is not the
same as mathematics. Furthermore, an abstract language requires the existence
of a universal engine available on each and every computing device.
The point to keep in mind is that an application can be
specified independently of the characteristics of any computing device. The
identifying criterion for an abstract language is, then, whether it permits the
construction of a perceived (distributed) application oblivious of the
computing devices on which it will run.
In order to identify an abstract language, we need to
provide definitions for distributed applications and the notion of distributed
operating system. Furthermore, we need to review scattered attempts towards
achieving an abstract language. This will allow us to realize the most general
forms of successful linguistic abstractions.
History is an extension of instinctive evolution. A person with
more historical knowledge in a domain gains better intuition in solving
problems, without walking the dead-end paths of past efforts.
The rest of this article is divided into the following parts.
Part I. Introduction
Part II. Software engineering principles
Part III. Computing abstractions
Part IV. Interoperability
Part V. Types
Part VI. Autonomous agents
Part VII. Conclusion
Part I. Introduction
Very early it was recognized that a language for developing
application software should provide sufficient abstractions so software could
be entirely developed by expressing it in that language (without help from
the underlying operating system via system calls). The path to this goal has been
very confusing, with many dead-end forks. We begin with a brief review of a thin
slice of the recent history of computing as related to programming languages.
Mainly for commercial reasons, the distinction between those
who program computing devices, and those who develop software for use on
computing devices was carried too far. Software developers were broken into
categories such as FORTRAN for scientists, COBOL for business developers and LISP
for artificial intelligence software.
Scientists, however, were engaged with finding ways to
overcome the issues of software correctness, or at least its reliability. Some
focused on desirable abstractions for a language with the goal of reducing
programming errors. The notions of structured programming, information hiding,
object-orientation, Eiffel’s invariants and generic programming of ADA are
among the successful ideas in this effort.
Paradigms
Initially the notion of paradigms was proposed, along with confining a
programming language to a single paradigm such as functional or logic
programming. This idea lasted for a while and at times made headlines until it
was given up. Confining a programming language to any particular paradigm
became impractical as the size of software continued to grow. New user
interfaces, such as graphical ones, and new communication protocols among
computing devices necessitated the use of all paradigms. In particular,
assignment could not be disposed of the same way that structured programming
eliminated the need for goto.
Specification languages
Some proposed the notion of a specification language. This
did not materialize because, to be of any use, a specification language
needs a compiler to generate code in some programming language, which
implies that the specification language is itself just a programming language.
Specification languages later gave rise to the notion of pattern languages,
which were not understood by those who wrote about them or debated the notions
of such languages. This was a dead-end.
Correctness proof
The idea of correctness proof resulted in many publications.
The argument was that there must be a particular form of logic with which one
can automatically establish the correctness of software. However, by the
undecidability of the halting problem, this is unachievable in general. Later,
some logical formulations of programming statements were offered for manual
proof of correctness. Disregarding the lack of scientific value of the proposed
logical formulations, the enormous size of software made it clear that testing
in accordance with software engineering procedures is more practical, even
though testing cannot prove the absence of defects.
Indeed, software development is the act of putting something
together for a practical purpose, like a car or an airplane. Clearly, an idea like
the jet engine must be established scientifically before attempting to build one.
However, mistakes made by a software engineer are not different in nature from
the inadequacy of strength of the screws attaching a jet engine to the wing.
Thus, main algorithms must be established mathematically and the overall
architecture of software must follow software engineering principles. However,
correctness proof of code implies somewhat the absurd idea of mapping an
implementation to an abstract algorithm. Is it possible to map the operations
of an assembly line to some mathematical model so one can be sure that all its
products are actually correct? What does correctness of a car mean?
Platforms
Platforms and toolkits are commercial attempts to encourage
developers to work on computing systems of a particular vendor. Familiarity
with the toolkit for a platform and all its macros and library calls is not
transferable to other platforms. Basically, platforms and certification reduce
a software engineer to a programmer for a particular technology vendor, and
should not be confused with research in software engineering. However, the term
platform, as used in SMALLTALK, was proposed by scientists.
Interoperability
The idea of making several languages interoperate gave rise
to some mechanisms like COM and CORBA. The confusion of this period and
commercial attempts also introduced XML as a medium for interoperation among
programming languages, as well as standards like SOAP. We will discuss
interoperability in Part IV.
This long and confusing trek in the uncharted territory of
software engineering finally resulted in the view that we should focus on
linguistic abstractions, with the acceptance of all paradigms. We should be
able to express our solutions entirely within an abstract language that can be
properly studied and learned by those who we refer to as software engineers. In
other words, scientists knew that software engineering needs an abstract
language just as other areas of engineering need mathematics for expressing
their solutions. Now, we have the experience of avoiding most dead-ends and
focusing on improving the successful attempts of the past.
At the highest level we can divide linguistic abstractions into
three categories: software engineering principles (Part II), computing
notions (Part III), and interoperation with established technologies (such as
databases) and libraries written in other major languages (Part IV). In Part V
we take a brief look at the notion of type, and in Part VI we discuss
Autonomous Agents.
Part II. Software engineering principles
Abstractions for software engineering principles are general
enough that it should be possible to implement them in any language intended for
developing application software. A few nontrivial examples follow.
Principles of information hiding and localization
At the micro level, object orientation is the mechanism for the
materialization of the principles of information hiding and localization. The
set of public methods (interface) of a class localizes interactions with instances
of that class, thereby making it easier to trace defects. Furthermore, a class
hides the internal implementation of the services it provides through its
interface. Thus, unusual behavior from an instance of a class is more likely
to be the result of a defect in the implementation of the class than in its
usage.
At the macro level the principles of information hiding and
localization are mechanized via the notion of component orientation. The view
of a component in this article is that a component is a complete standalone
program. Thus, every program is also a component. A component can be loaded by
any other component during execution. The program that loads a component can
only make calls to the set of entry points of the component. That is, the role
of the set of entry points of a component is analogous to the interface of a
class.
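The role of entry points can be illustrated with a minimal sketch. Python is used here only as a neutral notation, not as Z++ syntax, and the names Component, load_billing, charge and total are hypothetical. The loader can reach the component only through its table of entry points, while the component's internal state stays hidden.

```python
class Component:
    """A loaded program that exposes only its declared entry points."""

    def __init__(self, name, entry_points):
        self.name = name
        # Copy the table so the loader cannot alter the component's own copy.
        self._entry_points = dict(entry_points)

    def call(self, entry, *args):
        if entry not in self._entry_points:
            raise LookupError(f"{self.name} has no entry point {entry!r}")
        return self._entry_points[entry](*args)


def load_billing():
    """Hypothetical 'billing' component; stands in for loading a program."""
    ledger = []  # internal state, invisible to whoever loads the component

    def charge(customer, amount):
        ledger.append((customer, amount))
        return len(ledger)

    def total(customer):
        return sum(a for c, a in ledger if c == customer)

    return Component("billing", {"charge": charge, "total": total})


billing = load_billing()
billing.call("charge", "alice", 30)
billing.call("charge", "alice", 12)
alice_total = billing.call("total", "alice")
```

Asking for anything other than a declared entry point, for example billing.call("ledger"), fails with a LookupError, which is the analogue of a class refusing access to its private members.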
Object orientation is an effective mechanism for reliable
implementation of a single program. However, large and complex software must be
broken down into components, each covering one aspect of the overall
software. A trivial example is a department store with several departments.
Each department needs a complete program for its operations, while all these
departments may also need to interact with one another, as well as synchronize
their operations with other parts of the software. Loose implementation of
components with languages that do not support component orientation is as
disastrous as trying to use C in an object-oriented fashion instead of using
C++. Of course, C++ is not a component-oriented language.
Components are a fundamental means of developing distributed
software. A program can load a component on a remote node and interact with it
just as it would had the component been loaded on the local node. Furthermore,
the component as a linguistic abstraction is the mechanism for the software
engineering notion of layering. A layer is a component that covers a collection
of services while hiding the details of implementation of such services.
Layering is an architectural means for breaking down and managing complexity,
as well as facilitating reusability.
Components also provide the means for (geographically)
scattered implementation of large software that requires various areas of
expertise. Each group specializing in the implementation of certain aspects of
software could develop their components in a location better equipped for their
testing needs.
Invariants and constraints
Invariants of a class ensure that its instances remain in a
desirable state during execution. Invariants are conditions tested at the end of
public methods, and their violation can be made to either raise an exception or
trigger (invoke) a private method of the class.
Constraints for a public method are conditions that are
tested before the execution of the method begins. Their violation, like that of
invariants, can be made to raise an exception or trigger a private method.
Among other things, constraints can be used to ensure that values passed as
arguments to the call are within an acceptable range.
The abstractions of invariants and constraints are just as
valuable as the class itself. Their consistent use greatly reduces the
occurrence of the so-called glitches, and other hard-to-fix unexpected events.
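As a rough illustration of the two abstractions, consider the following sketch. It is written in Python rather than Z++, and the names checked, ContractViolation and Account are hypothetical. A constraint is tested before a public method runs and raises an exception on violation; the invariant is tested after the method returns and, in this sketch, triggers a private method instead.

```python
import functools


class ContractViolation(Exception):
    """Raised when a method constraint is violated."""


def checked(constraint=None):
    """Wrap a public method: test `constraint` before it, the invariant after."""
    def wrap(method):
        @functools.wraps(method)
        def inner(self, *args, **kwargs):
            if constraint is not None and not constraint(self, *args, **kwargs):
                raise ContractViolation(f"constraint failed in {method.__name__}")
            result = method(self, *args, **kwargs)
            if not self._invariant():
                # Policy chosen here: trigger a private method, do not raise.
                self._on_invariant_violation(method.__name__)
            return result
        return inner
    return wrap


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.repairs = 0

    def _invariant(self):
        return self.balance >= 0

    def _on_invariant_violation(self, where):
        # Private handler: bring the instance back to a desirable state.
        self.repairs += 1
        self.balance = 0

    @checked(constraint=lambda self, amount: amount > 0)
    def deposit(self, amount):
        self.balance += amount

    @checked(constraint=lambda self, amount: amount > 0)
    def withdraw(self, amount):
        self.balance -= amount


account = Account(10)
account.deposit(5)      # constraint holds, invariant holds
account.withdraw(20)    # invariant fails afterwards, private handler runs
```

Which of the two reactions (exception or private method) fits a given condition is a design decision; the sketch uses one of each only to show both.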
Exception and resumption
The purpose of raising an exception is to inform upper
layers (for instance, in a nested sequence of calls) of a failure in responding
to a request. While some exceptions are terminal, many exceptions in a program
are not. Otherwise the exception mechanism would be of no value at all. There
is no purpose in raising exceptions when at every occurrence of an exception
all we can do is terminate the program. Therefore, an exception mechanism
without resumption is utterly absurd.
Resumption can take one of two forms. In the first form, we can
ignore the failure at that point and simply continue with the rest of the
program. That means we should be able to continue the execution of the program
from the line right after the one that caused the exception.
In the second form, we repair the problem and try again. There are many
situations where it is possible to anticipate possible causes of exceptions and
to repair the problem. The exception mechanism is most useful for handling
these situations. After repairing the problem, we must be able to resume from
the line that caused the exception and repeat its execution.
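The two forms of resumption can be emulated, under stated assumptions, in a language that lacks them. The sketch below is Python, not Z++; the helper run_with_resumption and the constants SKIP and RETRY are hypothetical, and each "line" of the program is modelled as a callable step so that a handler can choose between continuing after the failing step and repeating it once the cause has been repaired.

```python
SKIP = "skip"    # resume at the step after the failing one
RETRY = "retry"  # repair the problem, then repeat the failing step


def run_with_resumption(steps, handler, max_retries=3):
    """Run callables in order; `handler(exc, index)` decides how to resume."""
    results = []
    for index, step in enumerate(steps):
        attempts = 0
        while True:
            try:
                results.append(step())
                break
            except Exception as exc:
                decision = handler(exc, index)
                if decision == SKIP:
                    results.append(None)
                    break
                if decision == RETRY and attempts < max_retries:
                    attempts += 1
                    continue
                raise  # terminal: propagate to the upper layer
    return results


config = {}


def read_port():
    return config["port"]  # raises KeyError until the handler repairs it


def handler(exc, index):
    if isinstance(exc, KeyError):
        config["port"] = 8080  # repair the cause, then repeat the step
        return RETRY
    if isinstance(exc, ZeroDivisionError):
        return SKIP            # ignore the failure and move on
    return None                # anything else is terminal


results = run_with_resumption(
    [read_port, lambda: 1 // 0, lambda: "done"], handler
)
```

A linguistic abstraction would make the same choice available at the statement that failed, without restructuring the program into steps as this emulation has to.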
Software packaging
The packaging mechanism of software facilitates the use of
libraries. The idea goes back to the design of ADA, though the term namespace
is now more popular than package. The most primitive design of a namespace is to
provide a mechanism for opening it, which is all that C++ supports. ADA
packages include an export section, and hide the internals of a package from
its users. The information hiding mechanism of a C++ class is unrelated to the
internals of a namespace.
A namespace should be designed similarly to a class, with
private and public sections. Users of a namespace can only access its public
section. Given the purpose of a namespace, it should be rather
clear that a namespace should support multiple inheritance. The inheritance
mechanism is indispensable in building more specialized libraries from existing
ones. Without the ability to derive privately from other namespaces, the
specialized namespaces will include all the namespaces used in their packaging.
Part III. Computing abstractions
The goal of linguistic abstractions of computing notions is
to eliminate the use of system calls and low-level libraries associated with
computing platforms (i.e. the computing environment of a particular technology vendor).
Among these notions are threads, processes, inter-process communication and
signaling.
The goal we are seeking is a language for developing
applications so the entire solution can be expressed within the language
without resorting to services (system calls) of the underlying operating system or
specialized libraries that use such services. The domain of applications is not
confined in any way, which implies that the abstract language we are seeking
will be monotonic. That is, the language will grow in an orthogonal manner so
that more sophisticated applications can be developed, while previously
developed applications will continue to run without any change.
The abstract language will monotonically grow in all three
aspects. New software engineering principles may give rise to new abstractions
like object-orientation and component-orientation. Operating systems will
continue to provide new kinds of services and computing notions, like threading
and signaling. In the future, there may be more established technologies like
databases that the abstract language will have to interact with. This indicates
that we should not confine the abstract language to any particular paradigm
such as declarative or functional.
Definition of distributed computing
The term distributed computing is too general to define for
all applicable domains. Our goal is an abstract language for developing
applications. So, our definition of distributed computing will be in terms of
the notion of distributed applications. We consider the distributed application
a primitive notion and list its identifying characteristics rather than formally
defining it.
An application is distributed when its execution involves
simultaneous use of a set of nodes. The nodes could be homogeneous, or more often
than not, heterogeneous. Each component of the application running on a node is
a process of the operating system controlling that node. The components execute
as one single application via remote communication and synchronization. During
execution, the number of components of a distributed application, and the
number of nodes involved are variable. As things are, each component is
developed for its target node using system calls and standard libraries like
the socket library.
The distributed operating system Z47
For clarity, we shall refer to an operating system in
control of a computing device as the native operating system. A native
operating system is responsible for managing (hardware) resources, virtual
memory, paging, etc. Thus, aside from the idea being absurd, a native operating
system cannot be turned into a distributed operating system by any stretch of
the imagination.
An application (program) becomes a (native) process of the
native operating system and acquires the resources it needs before its
execution begins. If an application written in the abstract language behaves
like a native process during its execution, one cannot tell the difference
between a native process and a process corresponding to an abstract
application. However, to sustain this illusion we need an operating system,
because a process must be created and managed by an operating system.
We shall refer to a distributed operating system as Z47.
Technically, Z47 is a distributed (system) application as we defined earlier.
On each node it begins its execution as a native process.
Z47 creates its own processes, which are not native
processes. Z47 manages its processes, threads and their inter-communications
and signaling. The execution of a Z47 process is independent of the node on
which it is initiated. For instance, a Z47 process can go to the waiting queue
on one node, but resume its execution on another.
Z47 also manages the resources allocated to its processes.
However, all such resources are obtained from the native operating system. Z47 as a native process obtains the resources needed for the execution of its processes. Z47 also dispatches the hardware processor among its processes and
threads. Thus, Z47 on each node provides the illusion of a true operating
system for abstract applications.
Definition. Z47 is a distributed system application whose
components on each node behave like an operating system.
The abstract language Z++
With Z47 at our disposal, we can create linguistic
abstractions for threading, process creation, communication and signaling. We
shall refer to the abstract language we are seeking as Z++.
The components of a Z++ application may reside on any number
of physical nodes as processes of Z47. However, the linguistic abstractions for
communication in Z++ remain the same regardless of the physical locations of
execution of components. In other words, a Z++ program is developed for
execution on Z47 as a single application. But the components could execute as
processes of Z47 on any number of heterogeneous nodes, which is how we defined
a distributed application.
The conclusion from the above paragraph is that we develop a
distributed application entirely within the language Z++. Therefore, Z++ is the
abstract language we were seeking for developing distributed applications. Z47 manages the communication and synchronization among the components of a Z++ application, which are processes of Z47.
It should also be clear that Z++ plain method invocation
covers the notion of Remote Procedure Call (RPC), or Remote Method Invocation
(RMI). Actually, Z++ extends this notion for solving some difficult categories
of problems such as Web Services.
The tell/hear distributed signaling of Z++ yields an
effective model of computation, namely asynchronous (remote) function call. A
tell signal carries along data the same way a function call instantiates its
formal parameters. However, the teller does not wait after the hearer of the
signal informs it of the acceptance of the call. Instead, the hearer becomes
teller at a later time and sends a tell signal along with return data.
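The shape of this exchange can be sketched with threads and mailboxes. The code below is Python, not Z++, and the classes Process, Squarer and Client are hypothetical; it also simplifies the protocol by treating delivery into the hearer's mailbox as the acceptance of the call. The teller sends a signal with data, continues with its own work, and later hears a separate signal carrying the return data.

```python
import queue
import threading


class Process(threading.Thread):
    """A process with a mailbox; `tell` never blocks the sender."""

    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.mailbox = queue.Queue()

    def tell(self, signal, data, sender=None):
        self.mailbox.put((signal, data, sender))


class Squarer(Process):
    def run(self):
        while True:
            signal, data, sender = self.mailbox.get()  # hear
            if signal == "stop":
                break
            if signal == "square" and sender is not None:
                # The hearer becomes the teller and sends the return data.
                sender.tell("squared", data * data, sender=self)


class Client(Process):
    def __init__(self, name, worker):
        super().__init__(name)
        self.worker = worker
        self.result = None
        self.progress = 0

    def run(self):
        self.worker.tell("square", 7, sender=self)  # returns immediately
        self.progress += 1                           # unrelated work goes on
        signal, data, _ = self.mailbox.get()         # hear the reply later
        if signal == "squared":
            self.result = data


worker = Squarer("worker")
client = Client("client", worker)
worker.start()
client.start()
client.join(timeout=5)
worker.tell("stop", None)
worker.join(timeout=5)
```

In this sketch both processes live in one program on one machine; the point of the linguistic abstraction is that the same two calls would be written identically if the hearer were on another node.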
Communicating Concurrent Processes
The abstract language Z++ exploits Z47 capabilities via its
linguistic abstractions for solving complex problems. In particular, the
tell/hear distributed signaling of Z++, among other things, yields a simple
pattern of communication for processes scattered on heterogeneous nodes.
The general perception of the model known as Communicating
Sequential Processes (CSP) is that process X makes a synchronous function call
and blocks until the point of rendezvous when process Y returns the result of
the call. The concurrent model (CCP) uses asynchronous function call without
blocking the caller. First, process X makes a call to Y using tell signaling,
and continues with whatever it is doing. At a later time, at the point of
rendezvous, process Y returns the result of the call to X.
We use CCP in referring to the model of Z++ in order to
emphasize the lack of blocking and waiting for the point of rendezvous, as
is generally assumed for the CSP model.
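The difference between the two perceptions can be shown side by side in a small sketch. It uses Python's standard concurrent.futures module rather than Z++ signaling, and process_y is a hypothetical stand-in for the work done by process Y.

```python
from concurrent.futures import ThreadPoolExecutor


def process_y(n):
    """Stand-in for the work process Y performs on behalf of X."""
    return sum(range(n + 1))


with ThreadPoolExecutor(max_workers=1) as pool:
    # Blocking style: X waits at the call until Y returns the result.
    blocking_result = pool.submit(process_y, 100).result()

    # Concurrent style: X issues the call and keeps working.
    pending = pool.submit(process_y, 100)
    local_work = [i * i for i in range(5)]  # X continues meanwhile
    concurrent_result = pending.result()    # the point of rendezvous
```

Both calls produce the same value; what changes is whether X is idle between issuing the call and the rendezvous.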
Part IV. Interoperability
Interoperability with major system languages like C, C++ and
ADA is necessary, especially for real-time situations. At times, direct
interaction with the computing device is unavoidable and this calls for a
language like C.
Interoperability with lasting technologies like relational
databases and browsers is an integral part of software development. While
browsers are the medium for presentations, databases are the means of
well-organized persistent data storage. Many categories of distributed
applications present themselves via a browser while using a database server for
managing their data.
Interoperability by means of startup scripts and programming
tricks is a costly nuisance. Instead, a well-defined interface under the
control of the compiler can eliminate most of the obscure errors. Furthermore,
a properly designed interface can automate much of the tedious work while
eliminating the need for error-prone programming tricks.
Z++ interoperates with other languages through linkage with
their dynamic libraries. A dynamic library provides a set of entry points
usually called exported functions. The exported functions of a dynamic library
become methods of a Z++ class. The compiler generates the bodies of these
methods in the background.
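A comparable arrangement, offered only as an analogy and not as a description of how the Z++ compiler works, can be sketched with Python's standard ctypes module. The class CLibrary and its table of exports are hypothetical, and the sketch assumes a POSIX system where the C library can be located. Each listed exported function is given its signature and attached to the class instance, so callers see ordinary methods.

```python
import ctypes
import ctypes.util


class CLibrary:
    """Expose selected exported functions of the C library as methods."""

    # name -> (argument types, return type); an illustrative table only.
    _EXPORTS = {
        "abs": ([ctypes.c_int], ctypes.c_int),
        "strlen": ([ctypes.c_char_p], ctypes.c_size_t),
    }

    def __init__(self):
        # Assumption: POSIX. If the library name cannot be found, a None
        # path falls back to the symbols already loaded in this process.
        self._lib = ctypes.CDLL(ctypes.util.find_library("c"))
        for name, (argtypes, restype) in self._EXPORTS.items():
            func = getattr(self._lib, name)
            func.argtypes = argtypes
            func.restype = restype
            setattr(self, name, func)  # plays the role of a generated body


libc = CLibrary()
distance = libc.abs(-5)
length = libc.strlen(b"Z47")
```

Here the signatures are declared by hand and checked only at run time; an interface under the control of a compiler would check them before the program runs.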
Z++ SQL statements are extended object-oriented forms of
their equivalent SQL Data Manipulation Language. Z++ SQL statements allow
intermixing programming objects and database entities. Errors are reported by
raising exceptions.
As for the browser, the PHP interface to Z++ is intuitive.
Basically, PHP loads a Z++ component and exchanges data with it by invoking its
entry points. So, while the browser provides the graphical user interface, Z++ does all the computations, including communication with database servers.
Part V. Types
Type is a (linguistic) mechanism for specifying the
characteristics of objects. In other words, type provides a definition for
interacting entities of a process (a program in execution). An object is,
therefore, an instance of a type during execution.
When developing software, we attempt to describe the
entities of the target domain via types. In doing so, we abstract out the
relevant characteristics of domain entities so we can map those entities to
types. We also use types to describe objects that are purely of computational
nature, without any representation in the domain we are attempting to automate.
The latter is what separates an algorithm from its implementation as related to
correctness proof.
The complexity of the entities we need to map to types compels
us to find new mechanisms for defining types. Class is only one such
mechanism. The Z++ abstract language introduces a few more, such as task,
collection and component, as well as extending some known types like
enumeration.
Abstract data types and templates
The notion of abstract data type goes back to Knuth’s
treatment of stack and queue. A generalization of this idea is Parnas’s
information hiding and localization, as we discussed earlier. The type
definition mechanisms class, task and component are linguistic abstractions for
the realization of these ideas.
Information hiding prevents the definition of other types
from reliance on the implementation of an abstract data type. Localization confines
interactions of other objects with an instance of an abstract data type so the
instance can maintain its intended state via invariants and method constraints.
A language with loopholes allowing violations of these principles contradicts
its own compiler, the enforcer of these principles.
A container is an abstract data type for managing objects of
some type in accordance with the interface (set of methods) of the container.
Examples of containers are stack, queue and the symbol table of a compiler.
A template is a linguistic mechanism for defining a container
without specifying the type of objects it is going to manage. After the
definition of a container is submitted to the compiler, at a later time one can
request the construction of the container (instantiation) by specifying the
type of objects that will be handled by the container.
The template mechanism can be used for defining reusable
abstract data types other than containers. However, instantiating a template
with a literal such as 5 or a reference to another object makes little sense, if
any. That turns the template mechanism into a macro expansion during
preprocessing, giving rise to bizarre programming techniques such as recursive
template expansions. Like many other cases in C++, the development of the template
mechanism was left incomplete by not enforcing types as the only means of
instantiation. In Z++ templates can only be instantiated with types, which
further allows the specification of the category of types that may be used for
instantiation.
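The two restrictions, instantiation with types only and with types of a stated category, can be sketched as follows. This is Python, not Z++, and the checks happen at run time rather than in a compiler; the names stack_of, Printable and Label are hypothetical.

```python
from abc import ABC, abstractmethod


class Printable(ABC):
    """A hypothetical category of types accepted by the template."""

    @abstractmethod
    def render(self) -> str: ...


def stack_of(item_type, category=object):
    """Instantiate a stack 'template' with a type from the given category."""
    if not isinstance(item_type, type):
        raise TypeError("a template can only be instantiated with a type")
    if not issubclass(item_type, category):
        raise TypeError(f"{item_type.__name__} is not in the required category")

    class Stack:
        def __init__(self):
            self._items = []

        def push(self, item):
            if not isinstance(item, item_type):
                raise TypeError(f"expected {item_type.__name__}")
            self._items.append(item)

        def pop(self):
            return self._items.pop()

        def __len__(self):
            return len(self._items)

    Stack.__name__ = f"Stack[{item_type.__name__}]"
    return Stack


class Label(Printable):
    def __init__(self, text):
        self.text = text

    def render(self):
        return self.text


LabelStack = stack_of(Label, category=Printable)
labels = LabelStack()
labels.push(Label("first"))
top = labels.pop().render()
```

Both stack_of(5) and stack_of(int, category=Printable) are rejected with a TypeError: the first because 5 is a literal rather than a type, the second because int does not belong to the required category.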
Class and task templates can be derived from one another, as
usual. Invariants and constraints work for templates as well. Proper use of
templates is quite effective in reducing errors. A well-tested template
definition can be reused over and over again without any modification.
Part VI. Autonomous agents
An autonomous agent is a process (a program in execution)
with the ability to transport itself from one node to another while retaining
its state as a process.
Definition. A distributed operating system is autonomous
only when it supports autonomous agents.
Z47 is an autonomous distributed operating system. The
travel statement of Z++ is a linguistic abstraction for transporting a process
from its home node to a remote node.
Earlier we introduced the notion of Communicating Concurrent
Processes. Z47 provides the means for solutions using the notion of Traveling
Communicating Processes. Consider a group of autonomous agents cooperating in
accomplishing a task. Each agent can inform all others that it is about to take
off. Once it reaches the intended destination, the agent will transmit its
coordinates to all other agents. That way, every agent will be aware of
the whereabouts of all others.
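The cooperation pattern just described can be simulated in a few lines. The sketch below is Python, not Z++, and the classes Agent and Node and the function travel are hypothetical. It is also much weaker than a true autonomous agent: only the agent's data is carried across, serialized as JSON, whereas a process that travels retains its point of execution as well.

```python
import json


class Agent:
    def __init__(self, name, location, counter=0, known=None):
        self.name = name
        self.location = location
        self.counter = counter          # state that must survive travel
        self.known = dict(known or {})  # last reported status of peers

    def hear(self, peer, status):
        self.known[peer] = status

    def pack(self):
        return json.dumps(
            {"name": self.name, "counter": self.counter, "known": self.known}
        )

    @classmethod
    def unpack(cls, payload, location):
        state = json.loads(payload)
        return cls(state["name"], location, state["counter"], state["known"])


class Node:
    """A simulated node that hosts agents."""

    def __init__(self, name):
        self.name = name
        self.agents = {}

    def host(self, agent):
        self.agents[agent.name] = agent


def travel(agent, source, target, peers):
    """Move `agent` from `source` to `target`, keeping `peers` informed."""
    for peer in peers:
        peer.hear(agent.name, "in transit")  # about to take off
    payload = agent.pack()
    del source.agents[agent.name]
    moved = Agent.unpack(payload, target.name)
    target.host(moved)
    for peer in peers:
        peer.hear(moved.name, target.name)   # transmit new coordinates
    return moved


home, remote = Node("home"), Node("remote")
scout = Agent("scout", "home", counter=3)
watcher = Agent("watcher", "home")
home.host(scout)
home.host(watcher)

scout = travel(scout, home, remote, peers=[watcher])
```

After the call, the scout is hosted by the remote node with its counter intact, and the watcher's record of the scout has gone from "in transit" to "remote".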
Presently, it is not clear whether autonomous agents can
solve problems that cannot be implemented without their use. For instance, we
know that without the ability of nodes to exchange data (communicate) we cannot
solve the category of problems that require the use of client-server model.
Nonetheless, the availability of the autonomous agent technology could inspire
solutions to problems that have yet to be conceived.
Part VII. Conclusion
We have introduced the distributed operating system Z47. Z47 manages its own threads, processes, inter-process communications, signaling and
exceptions without making any system calls to the underlying (native) operating
system. System calls are only used for input/output operations (including
graphics and other devices). Thus, Z47 is essentially self-contained and easily
available on any platform.
We also introduced Z++, the abstract language for developing
distributed applications. Z++ is component-oriented, and offers a simple
linguistic abstraction for the development of true Autonomous Agents. The
signaling mechanisms of Z++ provide effective means of communication among
distributed components.
Z++ is a coherent medium, comprising all successful
linguistic computing mechanisms along with enforcement of software engineering
principles, for developing applications. Since Z++ rests on the distributed operating system Z47, the language is monotonic, capable of absorbing future
advancements in computing technology and software engineering. That is, the
absorption of new advancements will be orthogonal to the rest of the language.
Z++ is the ideal medium for developing large software,
especially when teams need to work at geographically separate locations.
Furthermore, the Z++ compiler can link with dynamic libraries of other languages as components. Thus, should it be necessary to use another language like C or C++
for a specific purpose, simply turning the program into a dynamic library makes
it available as a Z++ component.
Z++ contains object-oriented SQL statements for direct
simultaneous interaction with multiple relational databases, under the
enforcement of the compiler (extensive compile time checks). The Z++ interface for interoperation with PHP enables engineers to use a browser for presentation,
and databases for data storage. The actual computations can all be performed by Z++ components instead of the code embedded in browsers. This hides all
essential computations, and facilitates modifications to the
computations without any change to the browser.
To the memory of Debo. Rest in peace in my memories my
friend.
Fall of 2013.
Dr. Z.