In my earlier post, I expressed some reservations about Agile methodology (or possibly about misapplications thereof; I'm open to persuasion). I introduced the Conversational Meta-Methodology as an antidote to the problems cited in that post. My rant continues here with a different problem I'm seeing very clearly in my current project.
I'll briefly repeat an important clarification from the earlier post: my current project doesn't appear to be failing at all. In fact, I think it will result in a kick-ass product, and the part of the product that will be of greatest ultimate import is the component where the Agile approach (and the Conversational Meta-Methodology) has been followed most religiously. The problem I'm describing here is one that, if it were overcome, would result in "kicking it up another notch", as Emeril would say.
In twenty-five years of commercial and enterprise product development, I have worked on many systems that can be viewed as distributed data repositories, or more generally, distributed resource frameworks that may encapsulate both data and behavior. Application of best-practices software engineering principles leads to a more-or-less standard architecture for such systems, in effect a high-level architectural design pattern. It consists of the following components:
- A query front-end, in which the end user (human or machine) expresses queries in some abstract query language, and receives results rendered in some desired output data structure.
- A controller, responsible for partitioning the work to be done, dispatching subqueries to remote data sources (workers) in parallel, monitoring their progress and reporting progress to the front-end, aggregating results received from the remote nodes in some standard format, performing aggregate transformations on the composite result set, rendering the aggregate results into the desired output form, and returning the results to the front-end. The controller also provides graceful degradation in the event of exceptional occurrences, such as a remote node being unavailable or receipt of a malformed query.
Another common but by no means universal function of the controller is to provide a master entity index, which identifies real-world entities represented in multiple remote data stores and assigns a common unique identifier to each such entity.
- Remote end-point adapters that map the abstract subqueries they receive onto the local data store(s), retrieve data, and translate it as needed into the standard format employed by the controller. The adapters can provide a wide variety of additional functions, such as deidentification, performing local aggregate functions on their result set, or running procedural code that processes or even generates local data.
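To make the pattern concrete, here is a minimal sketch in Python of the controller's scatter-gather loop. It is only an illustration under stated assumptions: the names `run_query`, `make_adapter`, `unavailable_node`, and `QueryResult`, the dict-based query, and the `min_id` filter are all hypothetical, and a thread pool stands in for whatever remote transport a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical "standard format": each row is a plain dict.
Row = Dict[str, object]
# Hypothetical adapter signature: abstract subquery in, standard rows out.
Adapter = Callable[[Dict[str, int]], List[Row]]


@dataclass
class QueryResult:
    rows: List[Row] = field(default_factory=list)
    failed_nodes: List[str] = field(default_factory=list)


def run_query(query: Dict[str, int], adapters: Dict[str, Adapter]) -> QueryResult:
    """Dispatch the query to every adapter in parallel and aggregate the results."""
    result = QueryResult()
    with ThreadPoolExecutor(max_workers=max(1, len(adapters))) as pool:
        futures = {pool.submit(adapter, query): name
                   for name, adapter in adapters.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                result.rows.extend(future.result())
            except Exception:
                # Graceful degradation: record the failed node and keep going.
                result.failed_nodes.append(name)
    # Aggregate transformation on the composite result set: a global sort.
    result.rows.sort(key=lambda row: row["id"])
    result.failed_nodes.sort()
    return result


def make_adapter(site: str, local_ids: List[int]) -> Adapter:
    """Build a toy adapter that maps the abstract query onto a local list of ids."""
    def adapter(query: Dict[str, int]) -> List[Row]:
        return [{"id": i, "site": site} for i in local_ids if i >= query["min_id"]]
    return adapter


def unavailable_node(query: Dict[str, int]) -> List[Row]:
    """Simulate a remote node that cannot be reached."""
    raise ConnectionError("node unavailable")


adapters = {
    "site_a": make_adapter("a", [2, 5, 9]),
    "site_b": make_adapter("b", [1, 7]),
    "site_c": unavailable_node,
}
result = run_query({"min_id": 2}, adapters)
print([row["id"] for row in result.rows])
print(result.failed_nodes)
```

The front-end would receive both the merged rows and the list of nodes that failed, so it can report a partial result rather than an outright error.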
DBMS developers, database administrators (DBAs), and others will recognize that this pattern describes the internal structure of an enterprise DBMS. The remote data stores in the above description may be different schemas in the same database, different database instances, or simply ad hoc algorithms that can be delegated to worker threads in order to parallelize query execution on a single local data store. The abstract query language is SQL if the database is relational, and the aggregate operations are statistical functions, grouping, sorting, and so on. The adapters may do sorting, with the controller then merging the sorted results. It's all pretty familiar stuff.
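The division of labor in which adapters sort locally and the controller merges can be sketched in a few lines of Python. This is an illustration only: `merge_sorted_partials` and the sample rows are hypothetical, and the standard library's `heapq.merge` stands in for whatever merge step a real DBMS performs.

```python
import heapq
from operator import itemgetter


def merge_sorted_partials(partials, key):
    """Merge result sets that each adapter has already sorted by the same key."""
    # heapq.merge does a lazy k-way merge and never re-sorts the inputs.
    return list(heapq.merge(*partials, key=key))


# Each adapter returns its rows already sorted by name (hypothetical data).
partial_a = [{"name": "Adams", "site": "a"}, {"name": "Lopez", "site": "a"}]
partial_b = [{"name": "Baker", "site": "b"}, {"name": "Zhou", "site": "b"}]

merged = merge_sorted_partials([partial_a, partial_b], key=itemgetter("name"))
print([row["name"] for row in merged])
```

Because every input is already ordered, the controller only ever compares the current head of each partial result, which is why pushing the sort down to the adapters pays off.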
Here's a diagram that shows how this applies to the architecture of a DBMS. There are a lot of variations on this theme, but the picture holds for the DBMSs I have worked with over the years.
In a DBMS, the subplans are queries against various data storage resources -- tables, indices, and so on -- as specified in the retrieval aspect of the overall execution plan. "Develop execution plans" covers a lot of ground, since this is where rocket-science algorithms like cost-based optimization are applied. Of course this diagram only looks at the query side of a DBMS (the "R" in "CRUD"), but the picture is not overly different for updates and deletes, or even for DDL operations that affect the metadata (aka data dictionary tables in a DBMS). The "Group and order" and "Perform aggregate transformations" boxes are often interdependent and may occur in a different order than shown here, depending on the transformation plan. Nonetheless, I would maintain this is a reasonable high-level view of DBMS architecture, and it conforms well to the pattern described above.
To make it more specific to this blog's readership, the diagram below shows a healthcare-specific instance of the pattern. What makes it healthcare-specific is the Honest Broker with its Master Patient Index for HIPAA compliance. The ontology mentioned in the diagram represents the problem domain for the purpose of formulating abstract queries and structuring results, and is therefore a lot like a cross-enterprise data dictionary, a syntactic data model. If it is a true ontology, though, it overlays semantics on the syntactic structure and can be used for reasoning activities.
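As a rough illustration of what a Master Patient Index does, the Python sketch below assigns one common identifier to records from different sites that appear to describe the same person. Everything in it is an assumption made for the example: the `MasterPatientIndex` class, its `register` and `lookup` methods, the exact-match rule on normalized name plus birth date, and the `P000001`-style identifiers. Real record linkage is typically far more involved than an exact match.

```python
import itertools


class MasterPatientIndex:
    """Toy index mapping (site, local id) pairs to a common patient identifier."""

    def __init__(self):
        self._by_match_key = {}   # (normalized name, birth date) -> common id
        self._by_local = {}       # (site, local id) -> common id
        self._counter = itertools.count(1)

    def register(self, site, local_id, name, birth_date):
        # Hypothetical matching rule: same normalized name and same birth date.
        match_key = (name.strip().lower(), birth_date)
        if match_key not in self._by_match_key:
            self._by_match_key[match_key] = f"P{next(self._counter):06d}"
        common_id = self._by_match_key[match_key]
        self._by_local[(site, local_id)] = common_id
        return common_id

    def lookup(self, site, local_id):
        """Return the common id for a local record, or None if unregistered."""
        return self._by_local.get((site, local_id))


mpi = MasterPatientIndex()
first = mpi.register("hospital_a", "A-17", "Jane Doe", "1970-01-01")
second = mpi.register("clinic_b", "9921", " jane doe ", "1970-01-01")
third = mpi.register("clinic_b", "9922", "John Roe", "1982-06-30")
print(first, second, third)
```

Downstream results would carry only the common identifier, so records for the same patient can be joined across sites without exposing the local identifiers or demographics.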











