FHIR (in)Consistency? Data, please

I can’t count the number of gripe sessions about how a certain FHIR resource does this while another does that, or how one resource calls something foo and another calls the exact same thing bar. I guess that’s because I hang out with modelers, a bunch of Felix Ungars with an obsession to make everything as neat-and-tidy as Marie Kondo’s closet. If that’s a character flaw, I’ll embrace it. I like information models to be consistent, because in real life, it’s much easier to work with predictable patterns.

Chris Moesel, my esteemed MITRE colleague, produced for me a list of every FHIR resource and its attributes in R4, STU3 and DSTU2. Here’s a tiny sample:

[Image: LMF-Patient-clip1]

It was quick work to produce a couple of informative pivot tables that give a unique perspective on FHIR. The caveat on this analysis is that it is only able to look superficially at how things are named, not at the underlying semantics. Also, I’m looking primarily at horizontal consistency across resources, which is NOT a stated goal of FHIR.

You can download the dataset here.

A few immediate observations (R4 unless noted):

FHIR has grown. DSTU2 had 95 resources with 3454 attributes, STU3 had 118 resources with 5155 attributes and R4 has 148 resources and 7004 attributes.
FHIR has an extremely long tail. More than 90% of attributes in FHIR are completely unique, appearing in only one resource. Fewer than 1% of attributes recur in more than 10 resources.
Many resources might be missing important attributes. 30 resources don’t have an identifier attribute, only 101 of 148 have status, and only 23 have author. I’m not claiming this is right or wrong, but it does make you wonder…is it possible or appropriate that 125 resources don’t have an author?
Some attributes are used consistently, but others aren’t. Almost every time an identifier attribute is used, the cardinality is 0..* and the data type is Identifier. Great! But other attributes, like author and performer, are implemented inconsistently. Again, I’m not saying whether this is right or wrong, but I have to wonder why purpose is consistently markdown, while comment is mostly string, and description is a bit of both worlds.
Some resources are enormous. 10 resources have over 100 attributes, and 45 have more than 50 attributes. The champion is ExplanationOfBenefit, weighing in at 255 attributes. I find that bitterly ironic: it’s supposed to be an explanation of benefits, i.e., something a consumer can understand.
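
For anyone who wants to reproduce counts like these from the dataset, here is a minimal Python sketch. It assumes the data can be reduced to (resource, attribute) pairs. The toy rows, the generic resource names and the function names are all hypothetical, and "unique" is read here as "the attribute name occurs in exactly one resource", which is only one possible interpretation.

```python
from collections import defaultdict

# Toy rows standing in for the real dataset: (resource, attribute).
# These values are made up for illustration, not taken from the spreadsheet.
ROWS = [
    ("ResourceA", "identifier"),
    ("ResourceA", "birthDate"),
    ("ResourceB", "identifier"),
    ("ResourceB", "status"),
    ("ResourceC", "identifier"),
    ("ResourceC", "status"),
    ("ResourceC", "hospitalization"),
]


def resources_per_attribute(rows):
    """Map each attribute name to the number of distinct resources using it."""
    seen = defaultdict(set)
    for resource, attribute in rows:
        seen[attribute].add(resource)
    return {name: len(resources) for name, resources in seen.items()}


def tail_summary(rows, threshold=10):
    """Share of attribute names used by one resource, and by more than `threshold`."""
    counts = resources_per_attribute(rows)
    total = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    common = sum(1 for n in counts.values() if n > threshold)
    return {
        "distinct_names": total,
        "unique_share": unique / total,
        "common_share": common / total,
    }


def resources_missing(rows, attribute):
    """List the resources that never declare the given attribute name."""
    all_resources = {resource for resource, _ in rows}
    having = {resource for resource, name in rows if name == attribute}
    return sorted(all_resources - having)


if __name__ == "__main__":
    print(tail_summary(ROWS, threshold=2))
    print(resources_missing(ROWS, "status"))
```

Counting by name only is what keeps this superficial: two attributes with the same name are treated as the same thing regardless of what they mean.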

I also looked at the top 50 commonly-occurring attributes in terms of the consistency of data types and cardinality. The full table is in the spreadsheet. Here’s a sample:

[Image: LMF-Consistency-clip2]
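
A consistency check of this kind can be sketched in a few lines of Python. This is not how the spreadsheet was built (that was done with pivot tables); it is an illustration that assumes each row carries a resource, an attribute name, a cardinality and a data type. The rows below are invented, and the "agreement" score (the share of uses that match the most common cardinality and type pairing) is a hypothetical measure chosen for the example.

```python
from collections import Counter, defaultdict

# Invented rows: (resource, attribute, cardinality, data type).
ROWS = [
    ("ResourceA", "identifier", "0..*", "Identifier"),
    ("ResourceB", "identifier", "0..*", "Identifier"),
    ("ResourceC", "identifier", "0..1", "Identifier"),
    ("ResourceA", "comment", "0..1", "string"),
    ("ResourceB", "comment", "0..1", "markdown"),
]


def consistency(rows):
    """For each attribute name, report its dominant (cardinality, type) pairing."""
    by_name = defaultdict(Counter)
    for _resource, attribute, cardinality, data_type in rows:
        by_name[attribute][(cardinality, data_type)] += 1

    report = {}
    for name, combos in by_name.items():
        uses = sum(combos.values())
        # most_common(1) returns a one-element list of (pairing, count).
        top_combo, top_count = combos.most_common(1)[0]
        report[name] = {
            "uses": uses,
            "most_common": top_combo,
            "agreement": top_count / uses,
        }
    return report


if __name__ == "__main__":
    for name, row in consistency(ROWS).items():
        print(name, row)
```

An agreement of 1.0 means every resource that uses the name does so with the same cardinality and type; lower values flag names worth a closer look.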

The Bottom Line

I encourage you to download and play with the data yourself and reach your own conclusions. I have done a very shallow analysis.

From my perspective, FHIR’s extremely long tail emphasizes the degree to which each resource is a domain unto itself. The consequence is that, for the most part, implementers cannot reuse code written for one resource on another resource. There are potentially useful abstractions that could be created around FHIR. FHIR patterns is an attempt in this direction; so is this model. Claude Nanjo has also undertaken a horizontal consistency analysis of FHIR. (I don’t know if Claude is ready to share those results.)

I also noticed several cases of the “many-names-for-the-same-thing” phenomenon. One that stuck out for me is the date associated with authoring a resource, which goes by several names in FHIR, including authoredOn, recordedDate, date, created, issued, dateRecorded, dateAsserted, authored. But that’s a story for another day.
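
To make the cost concrete, here is a rough Python sketch of the kind of workaround the naming variety invites. It assumes resources arrive as plain dictionaries keyed by attribute name, and it treats the names listed above as interchangeable, which is an assumption based on naming alone and not on the underlying semantics. The function name and the sample resource are hypothetical.

```python
# Attribute names listed above for the date a resource was authored.
# Treating them as equivalent is an assumption, not a statement about FHIR.
AUTHORING_DATE_NAMES = (
    "authoredOn",
    "recordedDate",
    "date",
    "created",
    "issued",
    "dateRecorded",
    "dateAsserted",
    "authored",
)


def authoring_date(resource):
    """Return (attribute name, value) for the first matching name, else None."""
    for name in AUTHORING_DATE_NAMES:
        if name in resource:
            return name, resource[name]
    return None


if __name__ == "__main__":
    sample = {"resourceType": "SomeResource", "issued": "2019-05-03"}
    print(authoring_date(sample))
```

The order of the tuple decides which name wins when a resource carries more than one of them, so a real implementation would need a per-resource rule instead of a single global list.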

fhir-summary-analysis-kramer-05032019-2 Download

3 Responses to FHIR (in)Consistency? Data, please

Grahame Grieve says:

May 3, 2019 at 10:57 am

Also, see this page: http://hl7.org/fhir/fivews.html#mappings

That page is hard to consume but contains a great deal of useful information. I’m not sure how to improve it.


- Mark Kramer says:
  
  May 3, 2019 at 11:37 am
  
  Thanks for that link. In that table, what are the numbers? The colors? N, NT, NTC? I don’t see a key.
  
  
  - Vassil Peytchev says:
    
    May 6, 2019 at 3:23 pm
    
    N = Name changed
    T = Type Changed
    C = Cardinality violation
    
