In my earlier post, I expressed some reservations about Agile methodology (or possibly about misapplications thereof; I'm open to persuasion). I introduced the Conversational Meta-Methodology as an antidote to the problems cited in that post. My rant continues here with a different problem I'm seeing very clearly in my current project.
I'll briefly repeat an important clarification from the earlier post: my current project doesn't appear to be failing at all; in fact, I think it will result in a kick-ass product, and the part of the product that will be of greatest ultimate import is the component where the Agile approach (and the Conversational Meta-Methodology) has been followed most religiously. The problem I'm describing here is one that, if overcome, would "kick it up another notch," as Emeril would say.
In twenty-five years of commercial and enterprise product development, I have worked on many systems that can be viewed as distributed data repositories, or more generally, distributed resource frameworks that may encapsulate both data and behavior. Application of best-practice software engineering principles leads to a more-or-less standard architecture for such systems, in effect a high-level architectural design pattern. It consists of the following components:
- A query front-end, in which the end user (human or machine) expresses queries in some abstract query language, and receives results rendered in some desired output data structure.
- A controller, responsible for partitioning the work to be done, dispatching subqueries to remote data sources (workers) in parallel, monitoring their progress and reporting progress to the front-end, aggregating results received from the remote nodes in some standard format, performing aggregate transformations on the composite result set, rendering the aggregate results into the desired output form, and returning the results to the front-end. The controller also provides graceful degradation in the event of exceptional occurrences, such as a remote node being unavailable or receipt of a malformed query.
Another common but by no means universal function of the controller is to provide a master entity index, to identify real-world entities represented in multiple remote data stores and provide a common unique identifier for such entities.
- Remote end-point adapters that map the abstract subqueries they receive onto the local data store(s), retrieve data, and translate it as needed into the standard format employed by the controller. The adapters can provide a wide variety of additional functions, such as deidentification, performing local aggregate functions on their result set, or running procedural code that processes or even generates local data. (A minimal code sketch of these three components follows this list.)
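To make the division of labor concrete, here is a minimal Python sketch of the three roles. Everything in it is illustrative and hypothetical (the class names, the toy "query" format, the thread-pool dispatch); a real controller would add progress reporting, a master entity index, and far richer aggregation, but the shape is the same.

```python
# Minimal sketch of the pattern's three roles. All names are illustrative,
# not taken from any real system described in this post.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Dict, List


class EndpointAdapter:
    """Maps an abstract subquery onto a local data store and returns rows
    translated into the controller's standard format."""

    def __init__(self, name: str, local_data: List[Dict[str, Any]]):
        self.name = name
        self.local_data = local_data

    def execute(self, subquery: Dict[str, Any]) -> List[Dict[str, Any]]:
        # A trivial "local query": filter rows on a single field/value pair.
        field, value = subquery["field"], subquery["value"]
        return [dict(row, source=self.name)
                for row in self.local_data if row.get(field) == value]


class Controller:
    """Partitions work, dispatches subqueries to the adapters in parallel,
    aggregates results, and degrades gracefully when a remote node fails."""

    def __init__(self, adapters: List[EndpointAdapter]):
        self.adapters = adapters

    def run(self, query: Dict[str, Any]) -> Dict[str, Any]:
        results, errors = [], []
        with ThreadPoolExecutor(max_workers=len(self.adapters)) as pool:
            futures = {pool.submit(a.execute, query): a for a in self.adapters}
            for future in as_completed(futures):
                adapter = futures[future]
                try:
                    results.extend(future.result(timeout=5))
                except Exception as exc:          # graceful degradation
                    errors.append({"node": adapter.name, "error": str(exc)})
        # Aggregate transformation: a simple sort over the merged result set.
        results.sort(key=lambda row: row.get("id", 0))
        return {"rows": results, "unavailable_nodes": errors}


# Front end: accepts an abstract query and renders the controller's answer.
if __name__ == "__main__":
    adapters = [
        EndpointAdapter("site_a", [{"id": 2, "dx": "asthma"}]),
        EndpointAdapter("site_b", [{"id": 1, "dx": "asthma"}, {"id": 3, "dx": "copd"}]),
    ]
    controller = Controller(adapters)
    print(controller.run({"field": "dx", "value": "asthma"}))
```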
DBMS developers, database administrators (DBAs), and others will recognize how this pattern describes the internal structure of an enterprise DBMS. The remote data stores in the above description may be different schemas in the same database, different database instances, or simply ad hoc algorithms that can be delegated to worker threads in order to parallelize query execution on a single local data store. The abstract query language is SQL if the database is relational, and the aggregate operations are statistical functions, grouping, sorting, and so on. The adapters may do sorting and the controller does a merge of sorted results. It's all pretty familiar stuff.
Here's a diagram that shows how this applies to the architecture of a DBMS. There are a lot of variations on this theme, but the picture holds for the DBMSs I have worked with over the years. Click on the picture to open full-size in a new tab or window.
In a DBMS, the subplans are queries against various data storage resources -- tables, indices, and so on -- as specified in the retrieval aspect of the overall execution plan. "Develop execution plans" covers a lot of ground, since this is where rocket-science algorithms like cost-based optimization are applied. Of course this diagram only looks at the query side of a DBMS (the "R" in "CRUD"), but the picture is not overly different for updates and deletes, or even for DDL operations that affect the metadata (aka the data dictionary tables in a DBMS). The "Group and order" and "Perform aggregate transformations" boxes are often interdependent and may occur in a different order than shown here, depending on the transformation plan. Nonetheless, I would maintain this is a reasonable high-level view of DBMS architecture, and it conforms well to the pattern described above.
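As a tiny illustration of that division of labor (adapters sort locally, the controller merges), here is a hedged Python sketch. The rows are made up; the only point is the k-way merge of pre-sorted partial results, which is cheaper than re-sorting the combined set.

```python
# Sketch of the "adapters sort, controller merges" step described above:
# each worker returns its partial result already ordered, and the controller
# produces the final ordering with a single k-way merge instead of a re-sort.
import heapq

worker_results = [                      # hypothetical pre-sorted subresults
    [(101, "Adams"), (204, "Baker")],   # e.g. from one index range scan
    [(150, "Chen"), (322, "Diaz")],     # e.g. from another partition
]

merged = list(heapq.merge(*worker_results, key=lambda row: row[0]))
print(merged)   # [(101, 'Adams'), (150, 'Chen'), (204, 'Baker'), (322, 'Diaz')]
```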
To make it more specific to this blog's readership, the diagram below shows a healthcare-specific instance of the pattern (click to open full-size in a new window). What makes it healthcare-specific is the Honest Broker with its Master Patient Index for HIPAA compliance. The ontology mentioned in the diagram represents the problem domain for the purpose of formulating abstract queries and structuring results, and therefore is a lot like a cross-enterprise data dictionary, a syntactic data model. If it is a true ontology, though, it overlays semantics on the syntactic structure and can be used for reasoning activities.
There's no reason to limit the pattern to familiar database implementations. For example...
- The pattern meets the requirements for a comprehensive Intensive Care Unit monitoring system very nicely, with the remote data sources being EEGs, EKGs, BP monitors, video streams, etc. and the front end being a dashboard on nursing station and hallway monitors.
- In much the same way, it works well for Computer Integrated Manufacturing (CIM) systems like those used in the automotive, aerospace, semiconductor, and other industries. The front end is found in the factory control room and the remote data sources include a plethora of A/D sensors and actuators. In such situations, "queries" sent from the front end can include commands to remote devices or even entire distributed control programs.
- Large-scale HVAC control systems like those that manage university and corporate campuses also fit the pattern in a manner similar to CIM. As with CIM systems, remote nodes may be computer systems that operate autonomously much of the time, periodically pushing data to the control module or responding to ad hoc queries from control.
- It works nicely for search engines like Google, whose MapReduce algorithm and its more recent variants meet performance requirements on huge data sets that far exceed the capabilities of the most powerful RDBMS.
- The pattern maps well onto industrial-strength content management systems like ProQuest's proprietary system, and the Documentum implementations at Big Pharma sites that manage IDE and NDA submission preparation. In these systems, the remote data stores may be highly heterogeneous, including RDBMS, legacy network-hierarchical DBMS, XML document stores, OODBMS, and vast distributed file systems.
You can also turn the pattern's algorithm on its head, figuratively speaking, and produce a similar pattern in which remote data sources push data to the control module. I'll give just one example so I can move on to my main point. Document aggregation systems accumulate news stories in real time from many heterogeneous sources by receiving news feeds, translating them into a common intermediate format (e.g. XML or SGML), then redistributing to one or many subscribing client systems, filtering and rendering according to the subscriber system's specific requirements. The pattern is virtually the same, but the data flows are pushed to rather than pulled by the customer at the front end. I call this architectural pattern 'Information Refinery', because of its similarity in structure to the process view of a petroleum refinery.
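For the curious, here is a minimal sketch of the push variant, with hypothetical names throughout: sources push raw items to a central "refinery," which translates them into a common intermediate form and redistributes them to subscribers according to each subscriber's filter and rendering preferences. A production Information Refinery would of course handle real feed formats (XML, SGML), persistence, and delivery guarantees; this only shows the flow.

```python
# Minimal sketch of the push ("Information Refinery") variant: sources push
# items to a central refinery, which normalizes them into a common format
# and fans them out to subscribers per each subscriber's filter and renderer.
from typing import Callable, Dict, List


class Refinery:
    def __init__(self):
        self.subscribers: List[Dict] = []

    def subscribe(self, name: str, wants: Callable[[dict], bool],
                  render: Callable[[dict], str]) -> None:
        self.subscribers.append({"name": name, "wants": wants, "render": render})

    def push(self, source: str, raw_item: dict) -> None:
        # Translate the source-specific item into a common intermediate form.
        story = {"source": source,
                 "headline": raw_item.get("title") or raw_item.get("headline", ""),
                 "topic": raw_item.get("topic", "general")}
        # Redistribute: filter and render per subscriber.
        for sub in self.subscribers:
            if sub["wants"](story):
                print(f'{sub["name"]} <- {sub["render"](story)}')


refinery = Refinery()
refinery.subscribe("health_desk",
                   wants=lambda s: s["topic"] == "health",
                   render=lambda s: f'{s["headline"]} [{s["source"]}]')
refinery.push("wire_a", {"title": "New vaccine approved", "topic": "health"})
refinery.push("wire_b", {"headline": "Stocks rally", "topic": "finance"})
```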
With these examples the reader may already be envisioning situations where these two high-level patterns are combined in a centrally-mediated peer-to-peer bidirectional communications architecture, but let's not go there. Enough examples already!
Implications of high-level architectural patterns
These patterns did not emerge from thin air, but from lessons learned on numerous prior instances of similar projects. Nor did I create this pattern; instead, like most of us, I stood on the shoulders of giants like Ed Yourdon and Tom DeMarco, whose work on the New York Times article archive is a cardinal example of the above-described pattern. Such high-level patterns are abstractions, and will be "wrong" in some sense in every instantiation. Nonetheless they represent knowledge derived from hard-won experience on the front lines of development, and you ignore them at your peril (or really, in most cases, your customer's peril).
A key benefit of a high-level architectural pattern is that it helps you avoid major design changes as your iterative development process builds out the system. In the pattern above, there are some design heuristics that work well, based on the application of proven software engineering principles like information hiding, functional decomposition, increasing cohesion and reducing coupling.
- Algorithms should stick close to the data structures they maintain.
- The front end interface is a virtual machine that embodies a metaphor (mental model) based on abstract representations of domain-specific entities and relationships (ontologies).
- Affordances through which the user (human or machine) accesses the distributed resource framework operate on domain-specific knowledge rendered in forms tailored to the end user's needs and proclivities.
- Complexities involved in mediating access to arbitrary numbers of heterogeneous remote data stores are handled by the controller middleware, hidden from the end user and the remote end-point adapters.
- Mapping between domain-specific ontologies and local data store schemata is kept as close to the remote data store as possible, ideally in the end-point adapter itself.
- Any remote-node-specific operations on the data should happen as close to the data source as possible, especially where security considerations are involved: e.g., deidentification and enforcement of the terms of informed consent.
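As an example of the last heuristic, here is a sketch of deidentification living inside the end-point adapter, so that protected identifiers never leave the remote site. The field names and salting scheme are illustrative only, not a recipe for HIPAA compliance.

```python
# Sketch of the last heuristic above: node-specific operations such as
# deidentification happen inside the end-point adapter, so protected fields
# never reach the controller. Field names are illustrative only.
import hashlib

PROTECTED_FIELDS = {"name", "ssn", "mrn"}   # hypothetical direct identifiers


def deidentify(row: dict, site_salt: str) -> dict:
    """Replace direct identifiers with a salted one-way hash before the row
    is handed to the controller."""
    clean = {}
    for key, value in row.items():
        if key in PROTECTED_FIELDS:
            digest = hashlib.sha256((site_salt + str(value)).encode()).hexdigest()
            clean[key] = digest[:12]
        else:
            clean[key] = value
    return clean


# In the adapter, deidentification is the last step before returning results:
local_rows = [{"mrn": "12345", "name": "Jane Doe", "dx": "asthma"}]
print([deidentify(row, site_salt="site_a_salt") for row in local_rows])
```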
The problem with our Agile project
I've been working with this high-level architectural design pattern for many years, on numerous projects of different types. In fact, virtually every project I've worked on since 1985 ended up implementing this pattern (or its upside-down variant, or the combination of the two).
Virtually all of these projects had aspects that used something very close to what is now called the Agile approach, in particular the iterative development of use case scenarios and user experience design. However, the design and implementation phase of most successful projects began with a detailed review of this architectural pattern, presented by the system architect (often me). It was then discussed and debated until the team was clear on the architecture, the plan of attack, and the roles, responsibilities, and relationships of all individuals and subteams involved in the project. Agile methods were involved upstream in use case scenario development, and downstream in user experience and interface design, but even these efforts were informed by the architectural pattern.
Presented in different ways for different audiences, the architectural design pattern provided the framework for a shared mental model of the product and the iterative development plan. The process of reaching this shared mental model is my Conversational Meta-Methodology™ (wink), in which the framework provides an anchor-post that prevents going in circles or settling for a local optimum when the global optimum is just as easily achieved.
The shared mental model is created through intensive conversation employing both informal and formal components. The high-level architectural design pattern, if well enough articulated, is a formal component. Other formal components are lower-level design patterns, to the degree such patterns exist for the problem domain, such as those I described in my book, Design Conversations: Envisioning Content-Rich Enterprise Systems. In addition, the set of formal components may include artifacts such as UML model diagrams, UI wireframes, etc. Oral and written informal conversational work is the catalyst that helps create compatible mental models of the project in the minds of participants and stakeholders. Both formal and informal components are essential to success in this endeavor.
Now that I am finally working on a project that is using the Agile methodology as promulgated by its real or virtual trademark-holders, I see a problem occurring that I think of as the "X-Files" problem. In every X-Files episode, there was a scene I called the "flashlights in the dark" scene, with Scully and/or Mulder in some dark basement or windowless warehouse, searching for someone or something by shining flashlights in the dark. It's a very effective way to build suspense in a drama. On a software project, though, suspense is not an ideal emotion to arouse.
We have deliberately and consciously avoided (over my objections) any overall architectural approach based on the pattern described above, even though the project's grand design is essentially the one shown in the diagrams above. It will either ultimately end up conforming to the high-level pattern, or it will suffer from design defects that will impede one or more of the normative attributes of software requirements, the "ilities": usability, supportability, reliability, extensibility, maintainability, long-term viability, and so on.
In fact, I already see this happening. There are problems with the product architecture that could have been avoided had we developed a shared mental model of the architecture up front. They are costing us effort that could have been better spent on new features or robustification of existing features.
I have to apologize for not revealing more details, but I have good reasons for being reticent at this point. First and foremost, I work with talented and highly motivated people, and they deserve some privacy as we work things out. Moreover, as I said up front, the project is meeting its goals and is nowhere near complete.
So I'm not going to air dirty laundry. The problems I'm seeing may rectify themselves as time goes by. I'm not yet confident that we will achieve the optimal result, but I am confident that a) the project will meet its aims and produce something in which its stakeholders can take pride, and b) we will learn a great deal -- from our missteps as much as, or more than, from our right choices. If we do rectify the problems, and there's a good chance we will, it will be through application of my Conversational Meta-Methodology. In other words, by communicating with each other.