ECS Distributed Object Middleware and Alternative Technology Assessment

Progress Report - April 1999

by David McNab, Rod Fatoohi & Tom Lasinski

POC: mcnab@nas.nasa.gov

I. Lessons Learned

Summary observations on ECS:

1/ The current ECS middleware is problematic for two reasons: firstly, it is built on DCE, and secondly, there are too many layers of software with too much crosstalk.

2/ Conceptually, ECS is an archetypical object-based distributed system. It closely fits a message bus system architecture, with system components "plugged into" the message bus.

3/ There are a number of COTS products that provide functionality similar to ECS components currently implemented with custom developed and supported code.

4/ ECS does not appear to adhere to a rigorous abstraction model. There are syntactic abstraction violations in code and muddling of detail levels in documentation. This is a concern because it represents a departure from the conceptual model of an object-based distributed system, the primary consequence of which will be severe hindrance to system evolution.

These summary points are amplified in the following discussions.

Observation 1: Problems with the Existing ECS Middleware.

Repeating the summary point, there are two main problems in the current ECS middleware. The first is the use of DCE, a huge software package employed because a small subset of its functionality addresses ECS needs. Unfortunately even the applicable subset of functionality does not map particularly well to ECS needs. The second is the proliferation of software layers with too much crosstalk between them.

A. The DCE Problem

It is interesting to note that the problems associated with DCE’s use in ECS are strikingly similar to those that are generally accepted to have led to its commercial downfall:

A secondary effect of these characteristics is that relatively few DCE-based applications appeared in the marketplace. As mentioned above, all three of these characteristics are directly relevant to ECS. Distributed programming is a difficult task made easier by careful adherence to principles of object oriented design, in particular the definition of interfaces and strict separation of their implementation. The lack of direct DCE support for distributed object programming led to the well-intentioned introduction of OODCE and additional infrastructural layers, including at least indirectly the SRF and PF. Unfortunately the result is a system with too many layers and too much extraneous functionality.

The very sparse installed base of DCE systems, which is a direct result of the expense and difficulty of installing DCE, has led to a requirement for the MOJO gateway. Since most external organizations will not be running DCE, nor can they be reasonably expected to install it, a bridge into the DCE system is required for them to interact with ECS. In a related way the same problem makes the support of a federated architecture substantially more difficult.

The third consequence of the use of DCE is poor reliability, which leads to a requirement for watchdog software and other workarounds. Furthermore the authors’ direct experience suggests that DCE itself requires substantial maintenance expenditure that could otherwise have been spent elsewhere. Although this expenditure is extremely difficult to quantify, it is likely to have had a substantial effect on the project overall.

B. The Layering Problem

As described in Doug Dotson’s white paper "Future Directions for ECS: An Infrastructure Perspective" (draft dated 16 September 1998), a major problem for ECS has been the proliferation of complicated and imperfectly encapsulated software layers. In every case the addition of a new layer was well justified by technological needs. However the overall result is an encumbering collection of layers with too much crosstalk and, in all too many cases, exposure of the details of low level implementation to higher level code. Even without the presumably accidental or artifactual occurrences, the layering design exposes the Process Framework to at least three lower level interfaces. In practice, as Dotson’s paper shows, many applications end up directly accessing five of the six major software layers, a clear violation of desirable encapsulation rules. Examination of ECS code shows violations of even the most basic encapsulation. For example, DCE header files are included in higher level application code. This particular type of problem can possibly be remedied easily, but it suggests that less obvious semantic dependencies may also be lurking.

The "layering problem" results in reliability and maintenance issues, but perhaps more importantly it is a major impediment to migrating ECS to a new communications infrastructure.
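The kind of layer skipping described above can be checked mechanically once the dependencies between layers are written down. The following is a minimal sketch only: the layer names and the dependency data are invented placeholders, not the actual six ECS layers, and the rule applied (each layer may use only the layer directly beneath it) is an assumed strict-layering policy.

```python
# Hypothetical layer stack, highest level first. These names are placeholders,
# not the real ECS layer names.
LAYERS = ["application", "framework", "messaging", "object_rpc", "rpc", "transport"]


def layering_violations(dependencies):
    """Return (user, target) pairs that break strict layering.

    `dependencies` maps a layer name to the set of layers its code uses directly.
    """
    rank = {name: i for i, name in enumerate(LAYERS)}
    violations = []
    for user, targets in dependencies.items():
        for target in sorted(targets, key=rank.get):
            # Assumed policy: a layer may use only the layer immediately below it.
            if rank[target] != rank[user] + 1:
                violations.append((user, target))
    return violations


# Invented example data: the application reaches past the framework twice.
observed = {
    "application": {"framework", "object_rpc", "rpc"},
    "framework": {"messaging"},
    "messaging": {"object_rpc"},
}

print(layering_violations(observed))
```

A report of this kind would only be as good as the dependency data fed into it, which in practice would have to be extracted from the source tree.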

Observation 2: ECS is an Archetypical Object-based Distributed System

In his paper, Doug Dotson describes the general ECS communication infrastructure architecture: "During the initial design phases of the project, the ECS system was viewed as a large client-server network [...] ECS is more accurately described as a Peer-to-Peer topology." [p. 7] He then proceeds to describe a system in which elements interact with each other by passing messages along a common logical bus. This is exactly the application model supported by the distributed object architecture. Components encapsulate their detailed functionality and present standardized interfaces to a general purpose message bus. In this case the components map to ECS servers and the message bus is the communications infrastructure.

Note that this architecture is very well suited for evolution, since the strict separation of implementation from interface allows implementation to be changed without affecting the rest of the system. Components can be "swapped out" when they become obsolete, they can be tested by "stubbing" their interfaces, and they can be replaced with COTS products if appropriate ones become available.
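The "plugged into the bus" idea, and the way it allows a component to be swapped or stubbed, can be illustrated with a few lines of Python. This is a toy, in-process sketch; the class name MessageBus, the interface name "archive.lookup" and both handlers are hypothetical and do not correspond to any ECS server or CORBA API.

```python
class MessageBus:
    """Routes a request to whichever component is registered under an interface name."""

    def __init__(self):
        self._components = {}

    def plug_in(self, interface, handler):
        # Registering again under the same name swaps the implementation.
        self._components[interface] = handler

    def send(self, interface, message):
        return self._components[interface](message)


def archive_lookup(message):
    # Stand-in for a real server implementation.
    return {"granule": message["granule"], "status": "found"}


def archive_stub(message):
    # Test stub: same interface, canned answer.
    return {"granule": message["granule"], "status": "stubbed"}


bus = MessageBus()
bus.plug_in("archive.lookup", archive_lookup)
first = bus.send("archive.lookup", {"granule": "G1"})

bus.plug_in("archive.lookup", archive_stub)  # the calling code is unchanged
second = bus.send("archive.lookup", {"granule": "G1"})
```

The caller addresses the interface name only, so replacing the implementation behind that name requires no change on the calling side.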

The catch is that although ECS is conceptually an archetypical example of this architecture, a number of practical obstacles are preventing it from operating in this way. One of these is the lack of a well defined abstraction model, for example a weakly enforced separation of interface from implementation. Another is the reliance on a communications infrastructure—DCE—that is poorly suited to supporting distributed objects.

Observation 3: COTS Products provide functionality very similar to that currently provided by custom code.

This is true of both the middleware infrastructure and portions of application-level code. As an example, consider CORBA. CORBA supports a distributed object model, with the ORB acting as a communications bus. Low level layers such as the Event Service or Messaging Service support the delivery of messages, synchronously and asynchronously. Thus CORBA services can provide functionality very similar to several of the ECS middleware layers. There are also CORBA services that provide functionality similar to that of ECS servers, e.g. the CORBA Trader Service is analogous to the ECS Advertiser, the CORBA Event Service provides features similar to the ECS Subscription Server, and so forth.
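To make the Subscription Server comparison concrete, the sketch below shows the general publish/subscribe pattern that an event service provides. It is plain Python for illustration only; it is not the CORBA Event Service API, and the names EventChannel and "granule.inserted" are assumptions introduced here.

```python
class EventChannel:
    """Decouples publishers from subscribers: neither side knows the other."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, payload):
        # Push the event to every subscriber registered for this event type.
        callbacks = list(self._subscribers.get(event_type, []))
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


received = []
channel = EventChannel()
channel.subscribe("granule.inserted", received.append)

delivered = channel.publish("granule.inserted", {"id": "G42"})
ignored = channel.publish("granule.deleted", {"id": "G7"})  # no subscribers
```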

The advantages of replacing custom code with COTS products are clear, as long as the COTS product can subsequently be easily replaced. One could argue that DCE is a COTS product but that it is responsible for many ECS problems, hence that it demonstrates a disadvantage to using COTS. This argument is invalid. The problem is not the use of COTS DCE, but the failure to ensure that it was treated as an independent component that could be extracted and replaced when necessary.
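One common way to keep a COTS product extractable is to confine all knowledge of its API to a single adapter behind a project-owned interface. The following is a minimal sketch under that assumption; NameService, VendorDirectoryAdapter, FakeVendorClient and the lookup_entry call are all hypothetical names invented for the example, not the interface of DCE or of any real product.

```python
from abc import ABC, abstractmethod


class NameService(ABC):
    """Project-owned interface; application code depends only on this."""

    @abstractmethod
    def resolve(self, name):
        ...


class VendorDirectoryAdapter(NameService):
    """Only this class knows the (hypothetical) vendor client API."""

    def __init__(self, vendor_client):
        self._client = vendor_client

    def resolve(self, name):
        return self._client.lookup_entry("/servers/" + name)["address"]


class InMemoryNameService(NameService):
    """Drop-in replacement with no vendor dependency."""

    def __init__(self, table):
        self._table = dict(table)

    def resolve(self, name):
        return self._table[name]


class FakeVendorClient:
    # Stand-in for a vendor library, so the sketch runs on its own.
    def lookup_entry(self, path):
        return {"address": "host-a:4000"} if path == "/servers/archive" else {}


def connect_string(names, server):
    # Application-level code: unaware of which implementation is behind `names`.
    return "connect " + names.resolve(server)


via_vendor = connect_string(VendorDirectoryAdapter(FakeVendorClient()), "archive")
via_table = connect_string(InMemoryNameService({"archive": "host-a:4000"}), "archive")
```

Replacing the vendor product then means rewriting one adapter class rather than every caller.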

One of the primary advantages of a component-based distributed object model is that it explicitly and formally supports replaceability, and hence reuse.

Observation 4: ECS Suffers from the Lack of a Strong Abstraction Model

The primary ECS design document, 305-CD-100-005 (dated February 1999), hereafter referred to as "the 305 document", is idiosyncratic and obfuscatory. Overuse of acronyms and a paucity of attention to the more abstract representations of the system make the document difficult to read for those new to the ECS system. This is a concern for two reasons. First, in order for a system as complicated as ECS to be comprehensible, it should be defined within a hierarchy of abstractions.

At each level of the hierarchy one should be able to abstractly describe the components and their interactions. It is crucial that a model like this be defined and accessible to those working with the system, so that they are able to understand system-wide interactions at a variety of detail levels. The component descriptions in the 305 document often vary widely in abstraction level, sometimes referring to high level concepts and specific code interfaces in the same diagrams.

Secondly, but perhaps more importantly, the lack of this type of abstraction model in the prime design document of such a large system may very well reflect a failure to adhere to abstraction models in the actual system implementation. Direct interaction between components at different abstraction levels is a serious violation of modular and object oriented design principles, and one of the primary consequences of such violations is difficulty in evolution and maintenance. Some evidence of this has appeared in the Ames team’s examination of ECS source code, although the most noticeable problems (which are typically syntactic, such as the inclusion of DCE header files in application-level code) are also in many ways the least serious. However they raise the concern that more subtle semantic dependencies exist and will hinder attempts to modularize the system.
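Syntactic violations of this kind are the easiest to find automatically. As a rough sketch, the script below scans C or C++ source text for #include lines that name headers under a forbidden prefix. The "dce/" prefix and the sample source are illustrative assumptions (the file name SubscriptionClient.h is invented), and a scan like this cannot detect the semantic dependencies mentioned above.

```python
import re

# Matches lines such as:  #include <dce/rpc.h>   or   #include "Foo.h"
INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]([^>"\n]+)[>"]', re.MULTILINE)


def forbidden_includes(source_text, forbidden_prefixes=("dce/",)):
    """Return the included header names that start with a forbidden prefix."""
    prefixes = tuple(forbidden_prefixes)
    return [name for name in INCLUDE.findall(source_text) if name.startswith(prefixes)]


# Invented application-level source used only to exercise the scanner.
sample = """
#include <string>
#include <dce/rpc.h>
#include "SubscriptionClient.h"
"""

hits = forbidden_includes(sample)
```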

II. Future Directions

Summary Recommendations

1/ The primary cost in DCE replacement is due to the effort of extracting DCE. Thus the major consideration in choosing a replacement should be maximizing the benefit to offset the extraction cost. A CORBA-based system offers substantially superior overall benefits and so is preferable to one involving a custom developed DCE replacement.

2/ The ECS side of the collaboration should strongly consider undertaking an effort to document ECS using standard object oriented software engineering techniques, most importantly UML but making mention of design patterns where useful, and begin iteratively reorganizing ECS code to make the system more modular and to reduce abstraction violations.

These points are discussed in detail below.

1. Replacement of DCE

The primary task identified at the collaboration kickoff workshop was a study of the possible replacement of DCE with a simple non-COTS software layer, for example one based on Sun RPC with an LDAP naming service. It was thought that this might be a relatively low cost, high payoff activity.

A. Replacement with a Simple non-COTS Component

The Ames investigation of this option suggests that the cost is higher than expected, primarily due to the aforementioned problems with middleware layer crosstalk and abstraction violation. The key observation is that the most difficult and cost-variable portion of this approach is in removing DCE. This is an unknown cost but one that is independent of the choice of replacement for DCE.

After review of Doug Dotson’s 16 Sept. 1998 "Future Directions for ECS" draft, it appears that replacing DCE with a comparable non-COTS layer would fail to address several serious outstanding technical problems within the ECS infrastructure. In particular:

1/ it would not address the fundamental mismatch between the client-server based DCE-like model and the actual peer-to-peer ECS architecture;

2/ it would not reduce the number of middleware layers; it would merely replace DCE with a DCE work-alike;

3/ it would not support asynchronous messaging any better than the current system (i.e. it would not obviate the need for a messaging layer built on top of the DCE/OODCE layer);

4/ it would not address the requirement for a transaction processing infrastructure;

5/ it would retain OODCE (obsolete, non-standard, and orphaned code) as a primary component of the ECS middleware.
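Point 3 turns on the difference between a blocking remote procedure call and asynchronous delivery. The sketch below illustrates the latter in plain Python, using a queue and a worker thread; it is an assumption-laden illustration of the general technique (the Mailbox class is hypothetical), not a description of the existing ECS messaging layer or of any RPC package.

```python
import queue
import threading


class Mailbox:
    """Asynchronous delivery: send() enqueues and returns; a worker delivers later."""

    def __init__(self, handler):
        self._queue = queue.Queue()
        self._handler = handler
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, message):
        # Returns immediately; the sender does not wait for the handler.
        self._queue.put(message)

    def _run(self):
        while True:
            message = self._queue.get()
            try:
                self._handler(message)
            finally:
                self._queue.task_done()

    def drain(self):
        # Block until every queued message has been handled.
        self._queue.join()


processed = []
box = Mailbox(processed.append)
for n in range(3):
    box.send({"order": n})
box.drain()
```

With a purely synchronous call model, queuing and deferred delivery of this kind must be supplied by an additional layer, which is the situation item 3 describes.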

There is a further disadvantage: introducing a new non-COTS component, particularly one that fulfills a ubiquitous infrastructural role, will require additional maintenance expenditure. Admittedly, there is still a benefit in reducing the total code footprint by replacing the bloated DCE component with a potentially much simpler non-COTS layer.

The Ames team’s position is that overall, the benefits are not sufficient to justify the cost. In summary the new layer would not address enough of the current problems in the ECS infrastructure, and furthermore it would not introduce substantially different opportunities for evolution of the ECS system.

B. Replacement with CORBA

An intriguing alternative is the introduction of a completely new COTS middleware layer that subsumes several of the current ECS infrastructure layers. In particular, CORBA is COTS middleware that explicitly supports a peer-to-peer distributed object model. It provides a standardized language for defining object interfaces in a form suitable for use in a distributed system. Furthermore there are COTS CORBA products that implement much of the current ECS middleware functionality, for example asynchronous messaging, as well as desired functionality such as transaction processing. In fact a CORBA middleware replacement offers the potential of satisfying each of the needs identified in Doug Dotson’s analysis of ECS infrastructure.

CORBA provides other advantages. It obviates the need for a gateway from external systems to ECS, since there are freely available mechanisms for interacting with CORBA. (For example, Sun provides tools that allow Java programs to interact with CORBA services, and Netscape’s Navigator browser incorporates a CORBA ORB.) Because CORBA uses a standard interface description language, the modifications it requires are likely to support further evolution. CORBA is already much more widely accepted and used in industry than DCE ever was, and it is likely that any successor to CORBA will be required to provide mechanisms that support migration from a CORBA system.

There is also the possibility that COTS CORBA services can replace some of the code currently developed and maintained by ECS. For example the CORBA Event Service appears very similar to the ECS Subscription Server, and the CORBA Trader Service is similar to the ECS Advertiser. Combined with a more rigorous abstraction model, which will help to isolate components and better specify their interfaces, CORBA offers the possibility of replacing substantial custom developed and maintained pieces of ECS with COTS code.

Replacing DCE with CORBA is certainly a substantial task. However the primary cost is not the insertion of CORBA but the extraction of DCE. Thus the important consideration in choosing a replacement is the potential payoff it provides. If the benefit is great enough it will completely offset the cost. The Ames team believes that a CORBA solution has the potential to do so.

2. Improving the ECS Abstraction Model

Two recent developments have had a substantial impact on the object-oriented software community. The first is the identification of design patterns. These are high level patterns of object architecture and object interaction that have been found to appear repeatedly in object oriented systems. Identifying and documenting design patterns is a way of increasing the level of abstraction at which the system can be viewed, hence of making the system more familiar and comprehensible.

The second but equally important innovation is the Unified Modeling Language (UML). This "language" is actually a family of modeling tools that have been developed to document and characterize object oriented systems. UML is essentially a formalized set of tools for depicting object oriented abstraction models.

Both of these tools could be used to substantially improve both ECS documentation and the ECS abstraction model, which will indirectly make it easier to maintain and improve the system. UML can be used in a cyclical fashion as a core tool in an effort to reduce ECS abstraction violations and to bring the system more closely into accord with its conceptual structure as a component-based distributed object system built around a software message bus. The first step is to describe the current system using UML, perhaps identifying design patterns where they appear. This will allow software engineers—including outside experts if necessary—to understand the system more readily, and it will tend to make abstraction problems more visible. ECS code can then be incrementally modified and re-documented, gradually bringing the system closer to a clean model. The final result will be a much better documented system that is much more amenable to evolutionary modification.