What Is A Polymorphic Database

From VistApedia
Revision as of 15:16, 20 September 2005 by 152.131.9.69 (talk)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

The difference between a relational database and a hierarchical one (or, for that matter, a polymorphic one like FileMan) has to do with how it models the world.

All databases exist to record an abstract model of pieces of the world. Databases are usually structured as files (or tables or classes), each of which lists entities of a similar kind, such as patients, or drugs, or visits. Just as the file represents a category of entities, so each record (or entry or row or object or instance) in that file represents a specific entity, such as a specific patient, a specific drug, or a specific visit. Databases, files, and records are not the real things they represent, only abstract representations of them. An entry in the patient file is not a real patient, but an abstraction of a patient, a metaphor for that patient. Very much as with poetry, the more closely that metaphor matches the important parts of the real thing it represents, the more powerful the metaphor, the more meaningful, and from the perspective of medical informatics, the more likely it is to assist in improving patient health. Whether you have the right information and whether you have organized it into the right metaphor is largely dictated by how that information will be used--that tells you which operations can be inefficient and which need to be efficient, which tells you how to balance the tradeoffs that are always involved.

A good database designer chooses apt metaphors that match well the kinds of information the clients need to record. The strategic part of that choice involves selecting the right database paradigm; the tactical part is using that paradigm effectively. WHICH data a file records is up to the file designer, but HOW that data is stored is up to the database paradigm you choose (relational, hierarchical, network, polymorphic, object-oriented, etc.). As with successful adaptation in nature, the secret to success lies not with rigid orthodoxy but with responsive flexibility, varying your approach to let each problem dictate its own best solution.

Relational databases represent files as tables, records as rows, fields as columns. This is the spreadsheet's view of the world. In truth, spreadsheets are excellent for certain categories of problems (e.g., inventories of parts), and terrible for others (e.g., Beethoven's Pastorale piano sonata). The relational database is no more the perfect solution to every problem than is the hammer. Neither is it the wrong solution for every problem. It must be used appropriately, to solve those problems for which it is well-suited.

Relational purists, those who insist every database problem must be solved relationally, love the reductionist simplicity of having only a single metaphor for all problems, and argue that by reducing all problems to this common form you increase interoperability. This assertion is false. When used universally and rigidly, the relational metaphor becomes a Procrustean bed, stretching short people and cutting off the legs of tall people so they all fit "perfectly" in the same size bed. A relational solution is most apt for atomic information bound by simple keys, or for groups of such entities likely to be sorted different ways at different times, in which there is no clear way they are usually organized. It is weak at representing asymmetrical relationships, and does not represent behavior at all (making it incompatible with the object-oriented approach to modeling the world has increasingly favored over the last quarter century). The things left out of the relational model are just as important to creating a good model as are the things it includes, and often turn what may appear to be the trivial problem of making two relational systems share data into a nightmare of hidden assumptions and missing relationships.

Monomaniacal devotion to the relational model (or any one technology) is based on the common but erroneous belief that most software problems are technological problems, that using a common technology removes the most important barriers. In actuality, software is a human communication and relationship problem far more than a technical one, consisting of struggling to understand and communicate our needs, first to each other, then to the computer and whoever maintains the code over the decades that follow. Arguing that common use of a relational model solves interoperability is like arguing that world peace has been solved by having all human beings share the same DNA; the proposed solution does not match the context of the problem.

FileMan is a polymorphic database management system. You can use a relational approach when that is appropriate, hierarchical when that is, and so on. It does not fully implement any of these paradigms, but it does enough of most of them to give us a reasonable solution to most the database problems we have faced over the last twenty-eight years.

In VistA, one place we use a relational model is for information relating to visits, because the priority of the relationships among the many visit-related entities is roughly symmetrical--you are just as likely to look up visits by date as you are by patient or by doctor or by any of a number of other things. Since no one of these entities is the clear majority choice for how to organize the rest, we flatten them all into a collection of separate V files that all point to the shared Visit file. As is often the case with relational solutions, this is as flexible as possible, but using the resulting files is then labor-intensive, since the software or users must correctly combine the files they need to get the right results. This was a good use of the relational model by Indian Health Service in RPMS, which the VA had the good sense to adopt into VistA.

One place we use a hierarchical model is for information relating to the menus a user can choose from. To simplify a bit from reality, in VistA all options are collected into menus, which may themselves be collected into other menus and so on; the resulting structure looks something like a tree. To simplify a bit more, each user is assigned a primary menu that contains all the options she needs to do her job. The software that uses this option information is called MenuMan, and it uses this file to solve several different problems that actually require it to use more than one database paradigm for the same data.

When a user is signed into VistA, MenuMan has to make sure she can reach all the options in her primary menu tree, but only those options. For example, if a user tries to jump to an option, MenuMan has to read the option she wants to jump to and then decide whether that option is indeed somewhere among the options she can reach from her primary menu (and its children, and their children, and so on). This is very much a tree operation, not a table operation, and storing this in relational tables would make MenuMan slow, since it would have to recompute all the interrelationships among the options every time it tried to solve this problem. Instead, MenuMan builds a tree for each primary menu that contains all the interrelationships, and builds an index on that tree to quickly identify 1) which options are contained within each tree, and 2) if so, how do you navigate through the tree to reach those options. A partially hierarchical model is exactly right for this problem, since it is about traversing trees, so that is what VistA uses in its compiled menu trees in the ^XUTL global.

On the other hand, options are also managed individually--locked or unlocked, changes to the menu text, sending out new versions if they are patched, and so on. For this kind of management, an option's location within the compiled menu trees is unimportant. All that matters is being able to quickly and independently find and manipulate each option. For this a relational approach is appropriate, so VistA stores options partially relationally in the Option file. Indeed, although all of the information about each option is stored in the Option file, some of it is stored a second time in the menu trees that VistA copies out of the Option file and reorganizes--the same information is stored both relationally and hierarchically because MenuMan needs it one way for some operations but not others. This is more or less the right solution to MenuMan's problems, but the software industry's warring camps would instead choose to twist the solution one way or the other to make it fit their ideological loyalties. The VistA way is not to take one side or the other nor even to remain neutral, but to advocate an end to the war so the two sides can get on with intermarrying to produce healthier, prettier, and hopefully wiser children.

The software industry has a huge failure rate. It is in the midst of an ongoing, growing, forty-year problem called The Software Crisis, in which as the costs of hardware plummet and the power of that hardware increases geometrically, the costs of software are rising, as is the failure rate. No one has solved or shows any immediate signs of solving this industry-wide dilemma, but everyone who admits to the existence of this crisis claims to have a solution, and all of the solutions are technological. Use SQL and all your troubles will go away. No, use C#. No, use brand X instead. These approaches cannot solve the crisis because none of them deal with the fundamental issues. They all propose that The Software Crisis is a werewolf, so all you need is a magic bullet, which by coincidence they just happen to have with them and are willing to part with in exchange for your savings account. An expensive hammer can indeed solve every problem; it just solves most of them badly, and so do these many technologies-du-jour. Each has a small number of proper uses and a large number of inappropriate uses, and each is proposed to be the only solution you need for entire classes of problems.

When an entire industry is this deluded, you can expect crazy behavior. For example, if the American College of Physician Executives says the state of medical informatics is a disaster, except for VistA, which is great, you can expect the industry to complain as one that no, that cannot be so, for look: VistA is not written in brand X. No matter how bad most medical software is, VistA's success cannot be as interesting as which technologies it uses. Failure according to the prescribed technology is the only permitted option. Succeeding with a technology that everyone agrees cannot work must be a mirage. Look away, before the VistA heathens corrupt you with their non-brand-X ways.

So let me play Ariadne to your Theseus and offer you the golden clew with which to escape the labyrinth: ignore technology; focus on features. The short answer to whether you should choose a relational or hierarchical model is: don't. Forget about technology. Leave that to the programmers. Focus instead on the problems you are trying to solve, the features you want, the kinds of information you need. That is by far the harder problem, and something the technologist cannot do for you. If instead you focus on the technology, not only are you artificially constraining the solution (thus increasing the chances of failure) but also neglecting the harder problem that only you can do (thus ensuring failure).

Yours truly, Rick