Exposition

A smart recruiter told me that I should start brushing up on leetcode for interviews, but instead of doing something practical like that, I’ve spent the past few days doing a deep dive on MUMPS (the database, not the disease). It would actually probably be more useful to do a deep dive on the disease.

It started when I was listening to the Acquired podcast episode on Epic Systems. I’m late to the game on Acquired, but it’s an incredible podcast that brings Hardcore History vibes to business analysis. I don’t have warm fuzzy feelings towards Epic, but their founder Judy is a baller, and the episode made me appreciate the software that they’ve managed to develop. And as a database person I was super intrigued when they mentioned how Epic uses a single database for all of their data. This is wild. Not only that, the database they use is built on MUMPS, which I had never heard of.

Now I’m not the most well-versed in all of tech, but after spending 5 years getting a Ph.D. in database systems I feel like I should have at least heard about MUMPS given that it backs electronic health records (at Epic and the VA system in the US), financial institutions like American Express, and the European Space Agency, among others. Do only people who go to the University of Northern Iowa learn about it?

The podcast episode started me down a literature review binge that surfaced gems like:

People often ask … why MUMPS does not support the features of the language and operating systems to which they are accustomed. These questions are of the variety: “When did you stop beating your wife?”

This is in a published paper! I don’t often do a double take when reading a CS paper, but clearly people feel passionately about this topic.

Tangentially, it was also entertaining reading takes that aged poorly with the benefit of hindsight, such as this guy claiming that PostgreSQL is dying:

It doesn’t matter that PostgreSQL or Perl were the right answer then. They’re the wrong answer now, and more wronger tomorrow.

Tell that to Neon.

The rest of this post is a TIL on MUMPS, with links to the articles and documentation I found interesting.

Database History

What is MUMPS

Briefly, MUMPS (Massachusetts General Hospital Utility Multi-Programming System) is a combined language and database that was developed in 1966 at Massachusetts General Hospital to manage medical records. It uses a schema-less, hierarchical data model based on sparse arrays. It’s basically every NoSQL system decades before NoSQL was a buzzword.

I’ll talk more about the technological merits and detractions of the language/database later on, but to help establish a mental model, the following code snippet shows how to set a value that gets saved to disk:

SET ^Car("Door","Color")="BLUE"

In Redis, a roughly equivalent command would look like:

SET car:door:color "BLUE"

MUMPS refers to both the language and underlying database system; they are tightly integrated.
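
To make the data model a bit more concrete, here is a minimal Python sketch of a sparse hierarchical array. It only illustrates the idea and says nothing about how a real MUMPS implementation stores data. The Global class and its method names are made up for this example, and it assumes string subscripts sorted in Python's string order rather than MUMPS collation.

```python
class Global:
    """Toy model of a MUMPS global: a sparse array keyed by subscript tuples.

    Hypothetical illustration only. Nothing is persisted to disk, and
    subscripts are assumed to be strings sorted in Python's string order.
    """

    def __init__(self):
        self._nodes = {}  # subscript tuple -> value; only nodes that were set exist

    def set(self, subscripts, value):
        self._nodes[tuple(subscripts)] = value

    def get(self, subscripts, default=""):
        # Reading a node that was never set returns a default instead of failing.
        return self._nodes.get(tuple(subscripts), default)

    def children(self, prefix=()):
        """Return the distinct next-level subscripts under prefix, sorted."""
        prefix = tuple(prefix)
        n = len(prefix)
        return sorted({k[n] for k in self._nodes if len(k) > n and k[:n] == prefix})


car = Global()
car.set(("Door", "Color"), "BLUE")      # SET ^Car("Door","Color")="BLUE"
car.set(("Door", "Count"), "4")
car.set(("Engine", "Cylinders"), "6")

print(car.get(("Door", "Color")))       # BLUE
print(repr(car.get(("Door",))))         # '' (the intermediate node holds no value)
print(car.children())                   # ['Door', 'Engine']
print(car.children(("Door",)))          # ['Color', 'Count']
```

There is no schema anywhere: a node exists only once something is assigned to it, and walking the tree means asking for the next subscript at a given level.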

Database History Around MUMPS

While MUMPS was being created outside of the database sphere in the late 60s, similar data models were being proposed inside it, as described in What Goes Around Comes Around:

  • In 1968 a hierarchical model similar to MUMPS called IMS was released. It requires defining a schema for each node, whereas MUMPS is schema-less. IBM still ships an IMS database for high-frequency OLTP systems.
  • In 1969 the CODASYL model was first proposed. Compared to MUMPS, it is optimized for representing non-hierarchical data.
  • In 1970 Ted Codd proposed the relational data model that is still used by relational databases today. Compared to MUMPS, IMS, and CODASYL, it operates set-at-a-time instead of record-at-a-time.

A Great Debate ensued between the Codd-camp and CODASYL advocates before IBM essentially institutionalized the relational model in 1984 when it announced the release of DB/2. IMS and MUMPS seem to have soldiered on in the background while the relational and CODASYL acolytes were fighting it out.

Despite IMS still being in use at IBM, I haven’t encountered it outside of this paper. And I never came across MUMPS in any literature that I read, likely because it was developed outside of the database community. The one semi-recent reference to MUMPS I could find in a database conference was in a variant of “What Goes Around” for NoSQL in EDBT/ICDT ’13.

MUMPS Merits: A Technology Ahead of its Time

I don’t have any stake in the MUMPS debate, but from my reading, MUMPS was an unbelievably impressive piece of technology at the time it was developed, and it is still incredibly efficient at the tasks it is used for.

Dunking on MUMPS

Despite its technological prowess, people love to dunk on MUMPS. And not for nothing; it makes itself easy to pick on.

Readability

The syntax can be diabolical. Here’s a fun code snippet if you want to melt your brain:

GMRCAAC ;SLC/DLT - Administrative Complete action consult logic ;7/16/98  01:47
 ;;3.0;CONSULT/REQUEST TRACKING;**4,12,53,46**;DEC 27, 1997;Build 23
COMP(GMRCO) ;Clerk action to Complete an order
 ;GMRCO is the selected consult
 K GMRCQUT,GMRCQIT
 I '+$G(GMRCO) D SELECT^GMRCA2(.GMRCO) I $D(GMRCQUT) D END Q
 I '+$G(GMRCO) D END S GMRCQUT=1 Q
 ;
 N GMRC,GMRCSTS,GMRCQUT
 S GMRC(0)=$G(^GMR(123,+GMRCO,0)) Q:GMRC(0)=""
 ;
 ;Completion action restricted if status is 1,2,or 13
 S GMRCSTS=$P(GMRC(0),"^",12)
 I $S(GMRCSTS<3:1,GMRCSTS=13:1,1:0) D  Q
 . N GMRCMSG
 . S GMRCMSG="This order has already been "_$S(GMRCSTS=1:"discontinued",GMRCSTS=2:"completed",1:"cancelled")_"!"
 . D EXAC^GMRCADC(GMRCMSG)
 . S GMRCQUT=1
 . D END
 ;

I don’t know enough to check the ChatGPT explanation, but supposedly “this code is the guard clause that prevents someone from trying to ‘Complete’ a consult request that’s already closed in any way”. Obviously.

As Rob fairly points out in many places, MUMPS doesn’t have to be this incomprehensible, but it seems that a lot of the legacy code is written this way.

Semantics

Note: I’m using YottaDB to run the examples.

MUMPS has no operator precedence. Binary operators are evaluated strictly left to right, so this is (2+3)*10:

YDB>WRITE 2+3*10
50
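
For comparison, here is a small Python sketch of strict left-to-right evaluation. The eval_left_to_right helper is a made-up name for this post, and it only handles flat expressions of non-negative integers with + - * /, which is far less than a real MUMPS expression parser does.

```python
import operator
import re

# Binary operators supported by this toy evaluator.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_left_to_right(expr):
    """Evaluate a flat arithmetic expression with no operator precedence."""
    tokens = re.findall(r"\d+|[-+*/]", expr)
    result = int(tokens[0])
    # Apply each operator to the running result as soon as it is seen.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(operand))
    return result


print(eval_left_to_right("2+3*10"))  # 50, the MUMPS answer
print(2 + 3 * 10)                    # 32, Python applies precedence
```

Parentheses still group as you would expect, so writing 2+(3*10) in MUMPS gives 32.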

MUMPS hates spaces (actually, it is very explicit about the meaning of zero vs. one vs. two or more spaces):

YDB>WRITE 2 + 3 * 10
%YDB-E-CMD, Command expected but not found
	WRITE 2 + 3 * 10
	        ^-----

MUMPS infers data types, sometimes with confusing outcomes:

YDB>WRITE 5+"40 ducks"
45

(to be fair, birds aren’t real anyways)
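
The rule behind this is that a string used as a number is read as its leading numeric prefix, or 0 if there is none. Here is a rough Python sketch of that rule. The to_number function is a hypothetical name, and it deliberately ignores details such as exponent notation and repeated leading signs.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction).
_LEADING_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def to_number(value):
    """Simplified numeric interpretation: leading numeric prefix, else 0."""
    match = _LEADING_NUMBER.match(str(value))
    if not match:
        return 0
    number = float(match.group())
    return int(number) if number.is_integer() else number


print(5 + to_number("40 ducks"))  # 45
print(5 + to_number("ducks"))     # 5
print(to_number("3.5kg"))         # 3.5
```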

Variable scoping is weird. Variables stay in scope for all sub-routine calls, so a callee can read and overwrite its caller’s variables. Routines thus must preemptively NEW their variables to avoid unintentionally clobbering them. Good practice looks like:

f ;
  new x
  set x=$$g()
  write x
  quit

g()
  new x
  set x=5 ; no spaces around "=", since spaces separate commands from arguments
  quit x

do f
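
If it helps, the scoping behavior can be mimicked in Python with one shared symbol table. This is only an analogy under my own assumptions, not how any MUMPS runtime is implemented. The symbols dict, the new context manager, and the g_careless and g_careful functions are all names invented for the sketch.

```python
from contextlib import contextmanager

symbols = {}          # one shared symbol table, visible to every routine
_MISSING = object()   # marks "this variable did not exist before NEW"


@contextmanager
def new(*names):
    """Mimic NEW: hide the caller's variables, then restore them on exit."""
    saved = {name: symbols.pop(name, _MISSING) for name in names}
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is _MISSING:
                symbols.pop(name, None)
            else:
                symbols[name] = old


def g_careless():
    symbols["x"] = 5      # silently overwrites the caller's x
    return symbols["x"]


def g_careful():
    with new("x"):        # the caller's x is stashed and restored afterwards
        symbols["x"] = 5
        return symbols["x"]


symbols["x"] = "caller's value"
g_careless()
after_careless = symbols["x"]
print(after_careless)     # 5

symbols["x"] = "caller's value"
g_careful()
after_careful = symbols["x"]
print(after_careful)      # caller's value
```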

Its support for indirection and dynamically modifying the running program makes it almost impossible to understand what a given program will do at runtime (though maybe now we’re all more comfortable with this given the popularity of other interpreted languages like Python).

It can be impossible to understand control flow when the MUMPS code is stored in the MUMPS database and executed dynamically with XECUTE: X ^ROUTINES("XSTARTGB").
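
The closest Python analogy I can think of is keeping source code in a key-value store and running it with exec. The sketch below is only that analogy. The routines dict, its contents, and the xecute helper are all made up for illustration.

```python
# Stand-in for code stored in the database; the contents are hypothetical.
routines = {"XSTARTGB": "greeting = 'hello from stored code'"}


def xecute(key):
    """Run whatever source text is currently stored under key."""
    namespace = {}
    exec(routines[key], namespace)
    return namespace


first = xecute("XSTARTGB")["greeting"]
print(first)

# Anyone who can write to the store can change what the program does next time.
routines["XSTARTGB"] = "greeting = 'someone edited me'"
second = xecute("XSTARTGB")["greeting"]
print(second)
```

Reading the call site tells you nothing about what runs; you have to know what is in the store at that moment.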

Final Thoughts

I don’t have much to say in conclusion, but I had a good time learning about the MUMPS ecosystem, and if you read this far hopefully you learned something interesting!

Some random final thoughts:

For posts about the merits of MUMPS and thoughts about how to modernize it, I recommend checking out Rob Tweed’s blog.

Modern implementations of MUMPS include:

The VA system in the US is attempting and struggling to migrate off of VistA (MUMPS-backed) to Cerner. This has to be one of the most complex software migrations out there.