Monday, January 27, 2014

The evolution of Abstract Distributed Software Development

In this article the term software refers to applications such as a word processor, a spreadsheet, email client/server, or software that automates the operations of a corporation. In contrast, programs such as an operating system are not considered applications, and are not within the scope of this article.

There are at least three distinct categories of software developers. The first group engages directly with programming a computing device, for instance those who write operating systems and device drivers. The second group uses the work of the first group to create tools such as compilers and Integrated Development Environments (IDE). The main topic of this article is the medium of development for the target of the second group, i.e. those who finally develop application software.

The fundamental means for developing application software is a programming language. This article is concerned with linguistic abstractions required for Abstract Distributed Software Development. The language Z++ is a materialization of this research.

An abstract language for developing distributed applications plays the same role for software engineering as mathematics does for other areas of engineering. However, the nature of an abstract language is not the same as mathematics. Furthermore, an abstract language requires the existence of a universal engine available on each and every computing device.

The point to keep in mind is that an application can be specified independently of the characteristics of any computing device. The identifying point for an abstract language is, then, whether it permits the construction of a perceived (distributed) application oblivious of the computing devices on which it will run.

In order to identify an abstract language, we need to provide definitions for distributed applications and the notion of distributed operating system. Furthermore, we need to review scattered attempts towards achieving an abstract language. This will allow us to realize the most general forms of successful linguistic abstractions.

History is an extension of instinctive evolution. A person with more historical knowledge in a domain gains better intuition in solving problems, without walking the dead-end paths of past efforts.

The rest of this article is divided into the following parts.

Part I. Introduction
Part II. Software engineering principles
Part III. Computing abstractions
Part IV. Interoperability
Part V. Types
Part VI. Autonomous agents
Part VII. Conclusion

Part I. Introduction

Very early it was recognized that a language for developing application software should provide sufficient abstractions so that software could be developed entirely by expressing it in that language (without help from the underlying operating system via system calls). The path to this goal has been very confusing, with many dead-end forks. We begin with a brief review of a thin slice of the recent history of computing as related to programming languages.

Mainly for commercial reasons, the distinction between those who program computing devices and those who develop software for use on computing devices was carried too far. Software developers were broken into categories such as FORTRAN for scientists, COBOL for business developers and LISP for artificial intelligence software.

Scientists, however, were engaged with finding ways to overcome the issues of software correctness, or at least its reliability. Some focused on desirable abstractions for a language with the goal of reducing programming errors. The notions of structured programming, information hiding, object-orientation, Eiffel’s invariants and generic programming of ADA are among the successful ideas in this effort.

Paradigms

Initially the notion of paradigms was proposed, along with confining a programming language to a single one such as functional or logic programming. This idea lasted for a while and at times made headlines until it was given up. Confining a programming language to any particular paradigm became impractical as the size of software continued to grow. New user interfaces, such as graphical ones, and new communication protocols among computing devices necessitated the use of all paradigms. In particular, the notion of assignment could not be disposed of in the same way that structured programming eliminates the use of goto.

Specification languages

Some proposed the notion of a specification language. This did not materialize because, to be of any use, a specification language needs a compiler to generate code in some programming language, which implies that the specification language is itself just a programming language. Specification languages later gave rise to the notion of pattern languages, which were not understood by those who wrote about them or debated the notions of such languages. This was a dead-end.

Correctness proof

The idea of correctness proof resulted in many publications. The argument was that there must be a particular form of logic by which one can automatically establish the correctness of software. According to the halting problem, this is unachievable in general. Later, some logical formulations of programming statements were offered for manual proof of correctness. Disregarding the lack of scientific value of the proposed logical formulations, the enormous size of software made it clear that testing in accordance with software engineering procedures is more practical, even though testing cannot prove the absence of defects.

Indeed, software development is the act of putting something together for a practical purpose, like a car or an airplane. Clearly, an idea like the jet engine must be established scientifically before attempting to build one. However, mistakes made by a software engineer are not different in nature from inadequate strength in the screws attaching a jet engine to the wing. Thus, main algorithms must be established mathematically and the overall architecture of software must follow software engineering principles. However, correctness proof of code implies the somewhat absurd idea of mapping an implementation to an abstract algorithm. Is it possible to map the operations of an assembly line to some mathematical model so one can be sure that all its products are actually correct? What does correctness of a car mean?

Platforms

Platforms and toolkits are commercial attempts to encourage developers to work on computing systems of a particular vendor. Familiarity with the toolkit for a platform, and with all its macros and library calls, is not transferable to other platforms. Basically, platforms and certification reduce a software engineer to a programmer for a particular technology vendor, and should not be confused with research in software engineering. However, the term platform, as used in SMALLTALK, was proposed by scientists.

Interoperability

The idea of making several languages interoperate gave rise to mechanisms like COM and CORBA. The confusion of this period and commercial attempts also introduced XML as a medium for interoperation among programming languages, as well as standards like SOAP. We will discuss interoperability in Part IV.

This long and confusing trek in the uncharted territory of software engineering finally resulted in the view that we should focus on linguistic abstractions, with the acceptance of all paradigms. We should be able to express our solutions entirely within an abstract language that can be properly studied and learned by those who we refer to as software engineers. In other words, scientists knew that software engineering needs an abstract language just as other areas of engineering need mathematics for expressing their solutions. Now, we have the experience of avoiding most dead-ends and focusing on improving the successful attempts of the past.

At the highest level we can divide linguistic abstractions into three categories: software engineering principles (Part II), computing notions (Part III), and interoperation with established technologies (databases) and libraries written in other major languages (Part IV). In Part V we take a brief look at the notion of type, and in Part VI we discuss Autonomous Agents.

Part II. Software engineering principles

Abstractions for software engineering principles are general enough that it should be possible to implement them in any language intended for developing application software. A few non-trivial examples follow.

Principles of information hiding and localization

At the micro level, object orientation is the mechanism for the materialization of the principles of information hiding and localization. The set of public methods (interface) of a class localizes interactions with instances of that class, thereby making it easier to trace defects. Furthermore, a class hides its internal implementation of the services it provides through its interface. Thus, an unusual behavior from an instance of a class is more likely to be the result of a defect in the implementation of the class rather than its usage.

At the macro level the principles of information hiding and localization are mechanized via the notion of component orientation. The view of a component in this article is that a component is a complete standalone program. Thus, every program is also a component. A component can be loaded by any other component during execution. The program that loads a component can only make calls to the set of entry points of the component. That is, the role of the set of entry points of a component is analogous to the interface of a class.

Object orientation is an effective mechanism for reliable implementation of a single program. However, large and complex software must be broken down into components, each covering one aspect of the overall software. A trivial example is a department store with several departments. Each department needs a complete program for its operations, while all these departments may also need to interact with one another, as well as synchronize their operations with other parts of the software. Loose implementation of components with languages that do not support component orientation is as disastrous as trying to use C in an object-oriented fashion instead of using C++. Of course, C++ is not a component-oriented language.

Components are a fundamental means of developing distributed software. A program can load a component on a remote node and interact with it just as it would, had the component been loaded on the local node. Furthermore, the component as a linguistic abstraction is the mechanism for the software engineering notion of layering. A layer is a component that covers a collection of services while hiding the details of implementation of such services. Layering is an architectural means for breaking down and managing complexity, as well as facilitating reusability.

Components also provide the means for (geographically) scattered implementation of large software that requires various areas of expertise. Each group specializing in the implementation of certain aspects of software could develop their components in a location better equipped for their testing needs.

Invariants and constraints

Invariants of a class ensure that its instances remain in a desirable state during execution. Invariants are conditions tested at the end of public methods, and their violation can be made to either raise an exception or trigger (invoke) a private method of the class.

Constraints for a public method are conditions that are tested before the execution of the method begins. Their violation, like invariants, can be made to raise an exception or trigger a private method. Among other things, constraints can be used to ensure that values passed as arguments to the call are within acceptable range.

The abstractions of invariants and constraints are just as valuable as the class itself. Their consistent use greatly reduces the occurrence of the so-called glitches, and other hard-to-fix unexpected events.
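As an illustration only, the two abstractions can be imitated in Python with a decorator. This is a minimal sketch and not Z++ syntax: the names `public`, `ContractViolation` and `Account` are hypothetical, and the checks run at execution time rather than under the control of a compiler. The constraint is tested before the method body, the invariant at the end of every public method.

```python
import functools


class ContractViolation(Exception):
    """Raised when a constraint or an invariant does not hold."""


def public(constraint=None):
    """Mark a public method: test its constraint first, the class invariant last."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # Constraint: a condition tested before the method begins.
            if constraint is not None and not constraint(self, *args, **kwargs):
                raise ContractViolation(f"constraint of {method.__name__} violated")
            result = method(self, *args, **kwargs)
            # Invariant: a condition tested at the end of the public method.
            if not self.invariant():
                raise ContractViolation(f"invariant violated after {method.__name__}")
            return result
        return wrapper
    return decorate


class Account:
    def __init__(self, balance=0):
        self._balance = balance

    def invariant(self):
        # Desirable state: the balance never becomes negative.
        return self._balance >= 0

    @public(constraint=lambda self, amount: amount > 0)
    def deposit(self, amount):
        self._balance += amount

    @public(constraint=lambda self, amount: amount > 0)
    def withdraw(self, amount):
        self._balance -= amount

    @public()
    def balance(self):
        return self._balance
```

Calling `Account(10).deposit(-1)` fails the constraint before any state changes, while `Account(10).withdraw(100)` passes the constraint but fails the invariant afterwards. The sketch only detects the violation; repairing the state, for instance by triggering a private method, is left out.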

Exception and resumption

The purpose of raising an exception is to inform upper layers (for instance, in a nested sequence of calls) of a failure in responding to a request. While some exceptions are terminal, many exceptions in a program are not. Otherwise the exception mechanism would be of no value at all. There is no purpose in raising exceptions when at every occurrence of an exception all we can do is to terminate the program. Therefore, an exception mechanism without resumption is utterly absurd.

Resumption can take one of two forms. Sometimes, we can ignore the failure at that point and simply continue with the rest of the program. That means, we should be able to continue the execution of the program from the line right after the one that caused the exception.

There are many situations where it is possible to anticipate possible causes of exceptions and to repair the problem. The exception mechanism is most useful for handling these situations. After repairing the problem, we must be able to resume from the line that caused the exception and repeat its execution.
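Python has no built-in resumption, so the two forms can only be approximated. The sketch below treats each step of a program as a callable and lets a handler choose between continuing after the failing step and repeating it once the cause is repaired. The names `run_with_resumption`, `SKIP` and `RETRY` are hypothetical, and the retry limit is an assumption added to keep the loop bounded.

```python
SKIP = "skip"     # continue with the step after the one that failed
RETRY = "retry"   # repeat the failing step after the handler repaired the cause


def run_with_resumption(steps, handler, max_retries=3):
    """Run callables in order; on an exception ask `handler` how to resume."""
    results = []
    for step in steps:
        attempts = 0
        while True:
            try:
                results.append(step())
                break
            except Exception as exc:
                action = handler(exc)
                if action == SKIP:
                    break                      # first form: ignore and move on
                if action == RETRY and attempts < max_retries:
                    attempts += 1
                    continue                   # second form: repeat the step
                raise                          # terminal: inform the upper layer
    return results


settings = {}


def read_port():
    return int(settings["port"])               # KeyError until the setting exists


def optional_banner():
    raise RuntimeError("banner unavailable")


def handler(exc):
    if isinstance(exc, KeyError):
        settings["port"] = "8080"              # repair the anticipated problem
        return RETRY
    if isinstance(exc, RuntimeError):
        return SKIP
    return None                                # anything else is terminal


results = run_with_resumption([read_port, optional_banner, lambda: "done"], handler)
```

Here the first step is repeated after the repair, the second is skipped, and any exception the handler does not recognize still propagates to the caller.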

Software packaging

The packaging mechanism of software facilitates the use of libraries. The idea goes back to the design of ADA, though the term namespace is now more popular than package. The most primitive design of a namespace is to provide a mechanism for opening it, which is all that C++ supports. ADA packages include an export section, and hide the internals of a package from its users. The information hiding mechanism of a C++ class is unrelated to the internals of a namespace.

A namespace should be designed similarly to a class, with private and public sections. Users of a namespace can only access its public section. Given the purpose of a namespace, it should be rather clear that a namespace should support multiple inheritance. The inheritance mechanism is indispensable in building more specialized libraries from existing ones. Without the ability to derive privately from other namespaces, the specialized namespaces will include all the namespaces used in their packaging.

Part III. Computing abstractions

The goal of linguistic abstractions of computing notions is to eliminate the use of system calls and low-level libraries associated with computing platforms (i.e. the computing environment of a particular technology vendor). Among these notions are threads, processes, inter-process communications and signaling.

The goal we are seeking is a language for developing applications so the entire solution can be expressed within the language without resort to services (system calls) of the underlying operating system or specialized libraries that use such services. The domain of applications is not confined in any way, which implies that the abstract language we are seeking will be monotonic. That is, the language will grow in an orthogonal manner so that more sophisticated applications can be developed, while previously developed applications will continue to run without any change.

The abstract language will monotonically grow in all three aspects. New software engineering principles may give rise to new abstractions like object-orientation and component-orientation. Operating systems will continue to provide new kinds of services and computing notions, like threading and signaling. In the future, there may be more established technologies like database that the abstract language will have to interact with. This indicates that we should not confine the abstract language to any particular paradigm such as declarative or functional.

Definition of distributed computing

The term distributed computing is too general to define for all applicable domains. Our goal is an abstract language for developing applications. So, our definition of distributed computing will be in terms of the notion of distributed applications. We consider distributed application a primitive notion and list its identifying characteristics rather than formally defining it.

An application is distributed when its execution involves simultaneous use of a set of nodes. The nodes could be homogeneous, or more often than not, heterogeneous. Each component of the application running on a node is a process of the operating system controlling that node. The components execute as one single application via remote communication and synchronization. During execution, the number of components of a distributed application, and the number of nodes involved are variable. As things are, each component is developed for its target node using system calls and standard libraries like the socket library.

The distributed operating system Z47

For clarity, we shall refer to an operating system in control of a computing device as the native operating system. A native operating system is responsible for managing (hardware) resources, virtual memory, paging etc. Thus, aside from the idea being absurd, a native operating system cannot be turned into a distributed operating system by any stretch of imagination.

An application (program) becomes a (native) process of the native operating system and acquires the resources it needs before its execution begins. If an application written in the abstract language behaves like a native process during its execution, one cannot tell the difference between a native process and a process corresponding to an abstract application. However, for this illusion we need an operating system because a process must be created and managed by an operating system.

We shall refer to a distributed operating system as Z47. Technically, Z47 is a distributed (system) application as we defined earlier. On each node it begins its execution as a native process.

Z47 creates its own processes, which are not native processes. Z47 manages its processes, threads and their inter-communications and signaling. The execution of a Z47 process is independent of the node on which it is initiated. For instance, a Z47 process can go to the waiting queue on one node, but resume its execution on another.

Z47 also manages the resources allocated to its processes. However, all such resources are obtained from the native operating system. Z47 as a native process obtains the resources needed for the execution of its processes. Z47 also dispatches the hardware processor among its processes and threads. Thus, Z47 on each node provides the illusion of a true operating system for abstract applications.

Definition. Z47 is a distributed system application whose components on each node behave like an operating system.

The abstract language Z++

With Z47 at our disposal, we can create linguistic abstractions for threading, process creation, communication and signaling. We shall refer to the abstract language we are seeking as Z++.

The components of a Z++ application may reside on any number of physical nodes as processes of Z47. However, the linguistic abstractions for communication in Z++ remain the same regardless of the physical locations of execution of components. In other words, a Z++ program is developed for execution on Z47 as a single application. But the components could execute as processes of Z47 on any number of heterogeneous nodes, which is how we defined a distributed application.

The conclusion from the above paragraph is that we develop a distributed application entirely within the language Z++. Therefore, Z++ is the abstract language we were seeking for developing distributed applications. Z47 manages the communication and synchronization among the components of a Z++ application, which are processes of Z47.

It should also be clear that Z++ plain method invocation covers the notion of Remote Procedure Call (RPC), or Remote Method Invocation (RMI). Actually, Z++ extends this notion for solving some difficult categories of problems such as Web Services.

The tell/hear distributed signaling of Z++ yields an effective model of computation, namely asynchronous (remote) function call. A tell signal carries along data the same way a function call instantiates its formal parameters. However, the teller does not wait after the hearer of the signal informs it of the acceptance of the call. Instead, the hearer becomes teller at a later time and sends a tell signal along with return data.
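The pattern can be imitated, loosely, with threads and queues in Python. This is only a sketch under stated assumptions: the class `Process` and its `tell` and `hear` methods are hypothetical stand-ins, the two "processes" share one machine, and nothing here reflects how Z47 actually delivers signals. A tell carries data like the arguments of a call and returns immediately; the return data arrives later as another tell in the opposite direction.

```python
import queue
import threading


class Process:
    """A stand-in for a process with an inbox of signals (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()

    def tell(self, target, signal, data):
        """Deliver a signal with its data and return without waiting for a result."""
        target.inbox.put((signal, data, self))

    def hear(self, timeout=5.0):
        """Take the next signal from the inbox, waiting only if none has arrived."""
        return self.inbox.get(timeout=timeout)


def squarer(me):
    # The hearer accepts a "square" signal, computes, then becomes the teller
    # and sends the return data back to whoever told it.
    signal, data, sender = me.hear()
    if signal == "square":
        me.tell(sender, "squared", data * data)


x = Process("X")
y = Process("Y")
worker = threading.Thread(target=squarer, args=(y,))
worker.start()

x.tell(y, "square", 12)              # asynchronous call: X does not block here
local_work = sum(range(10))          # X carries on with its own work
signal, result, sender = x.hear()    # the return data arrives as a tell signal
worker.join()
```

The caller decides when, or whether, to listen for the answer, which is what separates this model from a blocking remote procedure call.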

Communicating Concurrent Processes

The abstract language Z++ exploits Z47 capabilities via its linguistic abstractions for solving complex problems. In particular, the tell/hear distributed signaling of Z++, among other things, yields a simple pattern of communication for processes scattered on heterogeneous nodes.

The general perception of the model known as Communicating Sequential Processes (CSP) is that process X makes a synchronous function call and blocks until the point of rendezvous when process Y returns the result of the call. The concurrent model (CCP) uses an asynchronous function call without blocking the caller. First, process X makes a call to Y using tell signaling, and continues with whatever it is doing. At a later time, at the point of rendezvous, process Y returns the result of the call to X.

We use CCP in referring to the model of Z++ in order to emphasize the lack of blocking and waiting for the point of rendezvous, as it is generally assumed for the CSP model.

Part IV. Interoperability

Interoperability with major system languages like C, C++ and ADA is necessary, especially for real-time situations. At times, direct interaction with the computing device is unavoidable and this calls for a language like C.

Interoperability with lasting technologies like relational databases and browsers is an integral part of software development. While browsers are the medium for presentations, databases are the means of well-organized persistent data storage. Many categories of distributed applications present themselves via a browser while using a database server for managing their data.

Interoperability by means of startup scripts and programming tricks is a costly nuisance. Instead, a well-defined interface under the control of the compiler can eliminate most of the obscure errors. Furthermore, a properly designed interface can automate much of the tedious work while eliminating the need for error-prone programming tricks.

Z++ interoperates with other languages through linkage with their dynamic libraries. A dynamic library provides a set of entry points usually called exported functions. The exported functions of a dynamic library become methods of a Z++ class. The compiler generates the bodies of these methods in the background.
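The same idea, exported functions surfacing as methods of a class, can be sketched with Python's standard ctypes module. The wrapper below is written by hand, whereas the text describes a compiler generating the method bodies; the class name `CLibrary` is hypothetical, and the lookup of the C runtime assumes a POSIX system such as Linux or macOS.

```python
import ctypes
import ctypes.util


class CLibrary:
    """Exposes two exported functions of the C runtime as methods."""

    def __init__(self):
        # Assumption: a POSIX system. If find_library returns None on Linux,
        # CDLL(None) falls back to the symbols of the running process.
        self._lib = ctypes.CDLL(ctypes.util.find_library("c"))
        # Declare the signatures so arguments and results are converted safely.
        self._lib.strlen.argtypes = [ctypes.c_char_p]
        self._lib.strlen.restype = ctypes.c_size_t
        self._lib.abs.argtypes = [ctypes.c_int]
        self._lib.abs.restype = ctypes.c_int

    def strlen(self, text):
        """Length in bytes of the UTF-8 encoding of `text`."""
        return self._lib.strlen(text.encode("utf-8"))

    def abs(self, value):
        """Absolute value of a C int."""
        return self._lib.abs(value)


libc = CLibrary()
length = libc.strlen("hello")
magnitude = libc.abs(-7)
```

Declaring `argtypes` and `restype` by hand is exactly the kind of tedious, error-prone step that a compiler-controlled interface would take over.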

Z++ SQL statements are extended object-oriented forms of their equivalent SQL Data Manipulation Language. Z++ SQL statements allow intermixing programming objects and database entities. Errors are reported by raising exceptions.
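For comparison, here is how programming objects and database entities can be intermixed in Python with the standard sqlite3 module. It is a sketch, not Z++ SQL: the `Employee` type, the table layout and the helper names are hypothetical, and the checks happen when the statement executes rather than at compile time. Errors surface as exceptions, as the text describes.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Employee:
    emp_id: int
    name: str
    salary: float


def save(conn, employee):
    """Store one object as a row; the object's fields bind to the placeholders."""
    conn.execute(
        "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
        astuple(employee),
    )


def load_paid_above(conn, threshold):
    """Turn every matching row back into an Employee object."""
    rows = conn.execute(
        "SELECT emp_id, name, salary FROM employee WHERE salary > ? ORDER BY emp_id",
        (threshold,),
    )
    return [Employee(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL NOT NULL)"
)
save(conn, Employee(1, "Ada", 5200.0))
save(conn, Employee(2, "Grace", 4800.0))

well_paid = load_paid_above(conn, 5000.0)

try:
    save(conn, Employee(1, "Duplicate", 1.0))   # violates the primary key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True                   # the error arrives as an exception
```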

As for the browser, the PHP interface to Z++ is intuitive. Basically, PHP loads a Z++ component and exchanges data with it by invoking its entry points. So, while the browser provides the graphical user interface, Z++ does all the computations, including communication with database servers.

Part V. Types

Type is a (linguistic) mechanism for specifying the characteristics of objects. In other words, type provides a definition for interacting entities of a process (a program in execution). An object is, therefore, an instance of a type during execution.

When developing software, we attempt to describe the entities of the target domain via types. In doing so, we abstract away the relevant characteristics of domain entities so we can map those entities to types. We also use types to describe objects that are purely of computational nature, without any representation in the domain we are attempting to automate. The latter is what separates an algorithm from its implementation as related to correctness proof.

The complexity of entities we need to map to types compels us to find new mechanisms for defining types. Class is only one such mechanism. The Z++ abstract language introduces a few more, such as task, collection and component, as well as extending some known types like enumeration.

Abstract data types and templates

The notion of abstract data type goes back to Knuth’s treatment of stack and queue. A generalization of this idea is Parnas’s information hiding, and localization as we discussed earlier. The type definition mechanisms class, task and component are linguistic abstractions for realization of these ideas.

Information hiding prevents the definition of other types from reliance on the implementation of an abstract data type. Localization confines interactions of other objects with an instance of an abstract data type so the instance can maintain its intended state via invariants and method constraints. A language with loopholes allowing violations of these principles contradicts its own compiler, the enforcer of these principles.

A container is an abstract data type for managing objects of some type in accordance with the interface (set of methods) of the container. Examples of containers are stack, queue and the symbol table of a compiler.

Template is a linguistic mechanism for defining a container without specifying the type of objects it is going to manage. After the definition of a container is submitted to the compiler, at a later time one can request the construction of the container (instantiation) by specifying the type of objects that will be handled by the container.

The template mechanism can be used for defining reusable abstract data types other than containers. However, instantiating a template with a literal such as 5 or a reference to another object makes little sense, if any. That turns the template mechanism into a macro expansion during preprocessing, giving rise to bizarre programming techniques such as recursive template expansions. Like many other cases in C++, the development of the template mechanism was left incomplete by not enforcing types as the only means of instantiation. In Z++ templates can only be instantiated with types, which further allows the specification of the category of types that may be used for instantiation.

Class and task templates can be derived from one another, as usual. Invariants and constraints work for templates as well. Proper use of templates is quite effective in reducing errors. A well-tested template definition can be reused over and over again without any modification.
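The restriction that a template accepts only types, optionally drawn from a stated category, can be imitated in Python with a class factory. This is a run-time sketch with hypothetical names (`stack_of`, `category`); in the language described above the compiler would perform these checks.

```python
import numbers


def stack_of(item_type, category=object):
    """Instantiate a stack 'template' with a type drawn from `category`."""
    if not isinstance(item_type, type):
        # A literal such as 5 is rejected: only types may instantiate a template.
        raise TypeError("a template can only be instantiated with a type")
    if not issubclass(item_type, category):
        raise TypeError(f"{item_type.__name__} is not a {category.__name__} type")

    class Stack:
        def __init__(self):
            self._items = []

        def push(self, item):
            if not isinstance(item, item_type):
                raise TypeError(f"expected {item_type.__name__}")
            self._items.append(item)

        def pop(self):
            return self._items.pop()

        def __len__(self):
            return len(self._items)

    Stack.__name__ = f"Stack_of_{item_type.__name__}"
    return Stack


IntStack = stack_of(int, category=numbers.Number)   # instantiation with a type
numbers_stack = IntStack()
numbers_stack.push(3)
numbers_stack.push(4)
top = numbers_stack.pop()
```

Both `stack_of(5)` and `stack_of(str, category=numbers.Number)` raise `TypeError`: the first because 5 is not a type, the second because `str` falls outside the requested category.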

Part VI. Autonomous agents

An autonomous agent is a process (a program in execution) with the ability to transport itself from one node to another while retaining its state as a process.

Definition. A distributed operating system is autonomous only when it supports autonomous agents.

Z47 is an autonomous distributed operating system. The travel statement of Z++ is a linguistic abstraction for transporting a process from its home node to a remote node.

Earlier we introduced the notion of Communicating Concurrent Processes. Z47 provides the means for solutions using the notion of Traveling Communicating Processes. Consider a group of autonomous agents cooperating in accomplishing a task. Each agent can inform all others that it is about to take off. Once it reaches the intended destination, the agent will transmit its coordinates to all other agents. That way, every agent will be aware of whereabouts of all others.
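A very rough simulation of travel can be written in Python by serializing an agent's state and rebuilding it elsewhere. Everything here is an assumption made for illustration: `Node` and `Agent` are hypothetical, both "nodes" live in one program, and only explicit data fields survive the trip, whereas a true autonomous agent retains its full state as a process, including its point of execution.

```python
import json


class Node:
    """A simulated node; a real one would be a separate machine."""

    def __init__(self, name):
        self.name = name
        self.arrivals = []               # serialized agents waiting to resume here

    def receive(self, payload):
        self.arrivals.append(payload)

    def resume_next(self):
        """Rebuild the next arrived agent from its serialized state."""
        state = json.loads(self.arrivals.pop(0))
        state["location"] = self.name    # the agent now runs on this node
        return Agent(**state)


class Agent:
    def __init__(self, name, location, visited=None):
        self.name = name
        self.location = location
        self.visited = visited if visited is not None else [location]

    def travel(self, destination):
        """Serialize the agent's state and hand it to the destination node."""
        destination.receive(json.dumps(vars(self)))

    def work(self):
        if self.location not in self.visited:
            self.visited.append(self.location)
        return f"{self.name} working on {self.location}"


home, remote = Node("home"), Node("remote")
agent = Agent("scout", home.name)
agent.work()
agent.travel(remote)                     # leave the home node
moved = remote.resume_next()             # continue on the remote node
report = moved.work()
```

After the trip, `moved.visited` still remembers the home node, which is the part of the state that had to be carried along.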

Presently, it is not clear whether autonomous agents can solve problems that cannot be implemented without their use. For instance, we know that without the ability of nodes to exchange data (communicate) we cannot solve the category of problems that require the use of the client-server model. Nonetheless, the availability of the autonomous agent technology could inspire solutions to problems that have yet to be conceived.

Part VII. Conclusion

We have introduced the distributed operating system Z47. Z47 manages its own threads, processes, inter-process communications, signaling and exceptions without making any system calls to the underlying (native) operating system. System calls are only used for input/output operations (including graphics and other devices). Thus, Z47 is essentially self-contained and easily available on any platform.

We also introduced Z++, the abstract language for developing distributed applications. Z++ is component-oriented, and offers a simple linguistic abstraction for the development of true Autonomous Agents. The signaling mechanisms of Z++ provide effective means of communication among distributed components.

Z++ is a coherent medium, comprising all successful linguistic computing mechanisms along with enforcement of software engineering principles, for developing applications. Since Z++ rests on the distributed operating system Z47, the language is monotonic, capable of absorbing future advancements in computing technology and software engineering. That is, the absorption of new advancements will be orthogonal to the rest of the language.

Z++ is the ideal medium for developing large software, especially when teams need to work at geographically separate locations. Furthermore, the Z++ compiler can link with dynamic libraries of other languages as components. Thus, should it be necessary to use another language like C or C++ for a specific purpose, simply turning the program into a dynamic library makes it available as a Z++ component.

Z++ contains object-oriented SQL statements for direct simultaneous interaction with multiple relational databases, under the enforcement of the compiler (extensive compile-time checks). The Z++ interface for interoperation with PHP enables engineers to use a browser for presentation, and databases for data storage. The actual computations can all be performed by Z++ components instead of code embedded in browsers. This hides all essential computations and facilitates modifications to the computations without any change to the browser.

To the memory of Debo. Rest in peace in my memories my friend.
Fall of 2013.
Dr. Z.