VistA Version Control is Hard

From VistApedia
Revision as of 23:31, 8 July 2009 by JohnLeoZ

People have suggested countless times that we use a version control system for VistA. However, VistA cannot just use git or Subversion or whatever. Those systems cannot handle the way VistA's code is designed. This wiki article was copied en masse from a post on the subject by Jonathan Tai, written in response to a Hardhats thread on the subject.

For VistA development to really take off, we need distributed revision control and automated merging.

Take Linux for example. Linux uses a distributed revision control system called git. Linus has a branch of Linux that is the "official" branch. This is called the "mainline". Everyone can make a copy of this branch and do their own development. So far, it sounds just like centralized revision control, but the difference between centralized and distributed is that you can commit to your own private branch. You don't need permission from anyone. With a centralized system like Subversion, you have to have commit access to the repository, so you start by submitting one-off patches here and there until you gain enough karma to be entrusted with commit access. This doesn't scale. More on this later...

Getting back to the example, when you have done some development and are satisfied with it, you can publish your branch in a public location. You can then bring the branch to the official maintainer's attention, and request that he "merge" or "pull" your branch. This is another critical difference - with distributed version control, changes are *pulled*, not *pushed*. So the first time you request a merge, the maintainer can inspect your code closely. He may suggest changes. You can do those in your own branch (and each change is tracked because you can commit to your own branch) and re-submit your request. With a centralized system, you'd be sending a patch. If you're a frequent contributor, the maintainer may review your code less closely because he trusts you more, but the review process is built in.

After a little re-working, the maintainer finally decides your code is worthy of merging. With automated merging, even if the maintainer has made changes to other parts of the codebase since you cloned his branch, your changes are brought in automatically when he merges your branch. If other people have branched off *your* branch and those changes have been merged already, most distributed systems are smart enough to figure that out and only merge the unmerged revisions. Subversion has gotten a lot better at this recently, but in earlier versions, it was *terrible* at this.
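As a rough illustration of "only merge the unmerged revisions", the sketch below models history as a graph in which every revision records its parents. The revisions a merge still has to bring in are those reachable from the contributor's branch but not from the maintainer's. The revision names and the `parents` table are made up for illustration, and this is a simplified model of the idea, not how git or any other tool is actually implemented.

```python
def ancestors(parents, tip):
    """Return every revision reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents.get(rev, ()))
    return seen


def unmerged(parents, target_tip, source_tip):
    """Revisions in the source branch that the target branch does not have yet."""
    return ancestors(parents, source_tip) - ancestors(parents, target_tip)


# Hypothetical history: each revision maps to the list of its parents.
parents = {
    "A": [],          # common starting point
    "B": ["A"],       # maintainer's own work
    "C": ["A"],       # your first revision
    "D": ["C"],       # someone else branched off your branch
    "M": ["B", "D"],  # maintainer already merged that branch (which includes C)
    "E": ["C"],       # your newer work, not yet merged
}

# Merging your branch (tip E) into the maintainer's branch (tip M)
# only needs to bring in E, because C arrived earlier by way of D.
print(sorted(unmerged(parents, "M", "E")))
```

Because the comparison is made on the history graph rather than on file contents, it does not matter by which route a revision reached the maintainer's branch.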

Another important point is that each branch is equal (at least on a technical level) - the maintainer's branch is only special because it's the maintainer's branch and is recognized by the community as the official branch. If the maintainer were to lose interest in the software or otherwise become unable to maintain it, or should you want to take the software in a different direction, your branch could just as easily become *the* branch. Getting back to the Linux example, each subsystem maintainer has his own tree - there are trees dedicated to filesystems, to memory managers, drivers, etc. There are also trees dedicated to stability, which branch off an official Linux release and then only merge critical bugfixes from that point on. If you wanted a more stable version of Linux, you might follow that branch instead of the official mainline branch.

So in the VistA world - imagine several distributions like WorldVistA, OpenVista, FOIA VistA, vxVistA, etc. Each would have its own mainline. We could have package branches - so the FileMan group might have their own branch that's distro-independent. All the distros would merge changes from the FileMan group to maintain a "shared core", while each distribution does its own application work. Provided that the code/data structures are compatible and the licenses are compatible, the distributions could merge changes across - so WorldVistA could merge in the server-side OpenVista CIS patches to allow CIS to run against WorldVistA. And most importantly, when OpenVista updates those patches, WorldVistA could just merge the new changes. If automated merging worked properly, this would be seamless and easy - no re-porting required.

In fact, you could take it a step further. Each implementing site could choose to have (and maintain) its own branch. So a site might branch off of WorldVistA's mainline, make a few local mods, then merge new changes from WorldVistA as time goes on. And if WorldVistA liked those local changes, they could merge in the other direction - the local changes are accepted back into the main distribution. For an all-volunteer distribution like WorldVistA, this may be the main source of their changes.

So why don't we do all of this now? First of all, some of us are still using centralized revision control. That's what Ignacio was trying to address by starting this thread. Fixing this is the easy part - switching from a centralized model to a distributed model is fairly simple because the centralized model is just a subset of the distributed model. It's like having just one branch that everyone commits to.

The second thing we need is automated merging. What we need to do is come up with a way to express VistA structures in a form that existing revision control tools can understand. Right now, a menu is stored in FileMan, which is stored as globals in a binary database file. A tool designed to handle source code can't handle that. It doesn't know how to merge two multi-gigabyte binary blobs to make a meaningful new version. We could export the database as plain text (e.g., %GO or ZWRITE format), but even that is not enough to allow automated merging, because a menu might have one IEN at one site and a different IEN at another site. The existing tools have no idea what is important and needs to be merged and what is site-specific and can be discarded. What we need is an abstract representation of a menu, similar to what KIDS has. If we can present these VistA structures to an existing tool in that form, we can leverage the power of these existing tools to do development as a community more efficiently.
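To make the IEN problem concrete, here is a minimal sketch of what an "abstract representation" could look like. It assumes a heavily simplified, hypothetical menu record (a name, a display text, and a list of child IENs); the field names, the `ZZ` option names, and the output format are invented for illustration and are not the real FileMan layout or the KIDS format. The idea is to replace site-specific IENs with names and emit sorted plain text, so that the same menu exported from two sites produces the same lines.

```python
def to_portable(records):
    """Map IEN-keyed records to a name-keyed form with pointers resolved to names."""
    portable = {}
    for rec in records.values():
        portable[rec["name"]] = {
            "text": rec["text"],
            # Resolve each child IEN to the child's name; sorting is a
            # simplification (a real menu would also carry a display order).
            "items": sorted(records[child]["name"] for child in rec["items"]),
        }
    return portable


def to_text(portable):
    """Serialize in a stable order, one fact per line, so line-based tools can diff it."""
    lines = []
    for name in sorted(portable):
        entry = portable[name]
        lines.append(f"OPTION {name}")
        lines.append(f"  TEXT {entry['text']}")
        for item in entry["items"]:
            lines.append(f"  ITEM {item}")
    return "\n".join(lines) + "\n"


# The same hypothetical menu as it might be stored at two sites, with different IENs.
site_a = {
    101: {"name": "ZZ MAIN MENU", "text": "Main Menu", "items": [102, 103]},
    102: {"name": "ZZ REPORTS", "text": "Reports", "items": []},
    103: {"name": "ZZ SETUP", "text": "Setup", "items": []},
}
site_b = {
    5501: {"name": "ZZ SETUP", "text": "Setup", "items": []},
    5502: {"name": "ZZ MAIN MENU", "text": "Main Menu", "items": [5503, 5501]},
    5503: {"name": "ZZ REPORTS", "text": "Reports", "items": []},
}

text_a = to_text(to_portable(site_a))
text_b = to_text(to_portable(site_b))
print(text_a == text_b)  # the IEN differences no longer show up in the export
```

With one fact per line and no IENs in the output, a real change (say, a new item on the main menu) shows up as a single added line, which is the kind of difference existing merge tools already handle well.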

From talking to various folks at the VistA Community Meetings, some thought has gone into how to do #2 (automated merging), but I haven't seen anything workable yet. For now, I agree with Ignacio that we should do what we can, i.e., #1: standardize on a distributed revision control tool so we can build the tools for #2 around it.

Editor's note: because this was copied from Medsphere.org, its Creative Commons license applies.




See also: VISTA Expertise Network