Last time, I explained why the Standard Health Record (SHR) is based on data elements. Primarily, clinicians think in terms of data elements, and there are many preexisting libraries of data elements that we can use. Data elements are reusable building blocks that help SHR scale up most effectively.
In this post, I’m going to explore conversions between a data element-style model and more typical class-attribute models. Here’s the bottom line up front: not only is the data element model more parsimonious (smaller) than a class-attribute model, but the larger the system, the greater the advantage.
FHIR is a class-attribute model, and so are the V3 Reference Information Model (RIM), CIMI, and FHIM. In a class-attribute model, you might see something like the following definition of FHIR’s Location Resource (for brevity, several attributes are removed):
Class: Location
Parent class: DomainResource
Description: "Details and position information for a physical place"
Attributes:
name: string [0..1] // Name of the location as used by humans
type: CodeableConcept [0..1] // Type of function performed
telecom: ContactPoint [0..*] // Contact details of the location
address: Address [0..1] // Physical location
position: [0..1] // The absolute geographic location
  longitude: decimal [1..1] // Longitude with WGS84 datum
  latitude: decimal [1..1] // Latitude with WGS84 datum
  altitude: decimal [0..1] // Altitude with WGS84 datum
To re-express this in terms of reusable data elements, each top-level attribute is converted to a separate data element. To distinguish an attribute from a data element, we use a leading capital letter (e.g., name [attribute] versus Name [data element]). In SHR, it looks like this:
Element: Location
Based on: DomainResource
Concept: MTH#C0450429
Description: "Details and position information for a physical place"
0..1 Name
0..1 Type
0..* ContactPoint
0..1 Address
0..1 Geoposition // Re-named to distinguish from body position
Note that I’ve added a model meaning binding (the Concept keyword), so computationally, “location” is not just a word but a formal concept that can be related to other model concepts to support reasoning. (In SHR, codes are indicated using the syntax codesystem#code. In this case, MTH is an alias for the UMLS Metathesaurus code system, and C0450429 is the code from that system that means “location”.)
Each data element in the composite object Location now must be defined. Here’s how it is expressed in SHR:
Element: Name
Concept: MTH#C0027365
Description: "The words or language units by which a thing is known."
Value: string
Element: Type
Concept: MTH#C0332307
Description: "Something distinguishable as an identifiable class based on common qualities."
Value: CodeableConcept
Element: ContactPoint
Concept: MTH#C2986441
Description: "An electronic means of contacting an organization or individual."
// data elements composing ContactPoint...
Element: Address
Concept: MTH#C1442065
Description: "A standardized representation of the location of a person, business, building, or organization."
// data elements composing Address...
Element: Geoposition
Concept: TBD
Description: "The location on the surface of the Earth, described by a latitude and longitude (and optional altitude)."
1..1 Latitude
1..1 Longitude
0..1 Altitude
Element: Latitude
Concept: MTH#C1627936
Description: "The angular distance north or south between an imaginary line around a heavenly body parallel to its equator and the equator itself. Measured with WGS84 datum."
Value: decimal
Element: Longitude
Concept: MTH#C1657623
Description: "An imaginary great circle on the surface of a heavenly body passing through the poles at right angles to the equator. Measured with WGS84 datum."
Value: decimal
Element: Altitude
Concept: MTH#C0002349
Description: "Height above sea level or above the earth's surface. Measured with WGS84 datum."
Value: decimal
The data element representation is more verbose (partly because I included model meaning bindings and full definitions), but actually, there are fewer entities defined in the data element approach:
- Class-attribute representation (11 definitions): Location, name, type, telecom, ContactPoint, address, Address, position, longitude, latitude, altitude
- Data element representation (9 definitions): Location, Name, Type, ContactPoint, Address, Geoposition, Longitude, Latitude, Altitude
The difference? In the class-attribute representation, address and telecom are tautological attributes, expressing the same thing as their datatype, in slightly different words. The definition of Location.address is the same as Address; the definition of Location.telecom is the same as ContactPoint. The data element approach allows you (almost forces you) to eliminate tautological attributes.
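The tally above can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the function name are my own assumptions for illustration, not an SHR or FHIR data format.

```python
# Toy inventory of the Location example. The structure is an illustrative
# assumption, not an SHR or FHIR format.
class_attribute_model = {
    "classes": ["Location"],
    "datatypes": ["ContactPoint", "Address"],
    "attributes": ["name", "type", "telecom", "address",
                   "position", "longitude", "latitude", "altitude"],
}

data_element_model = ["Location", "Name", "Type", "ContactPoint", "Address",
                      "Geoposition", "Longitude", "Latitude", "Altitude"]

# Attributes that only restate their datatype in slightly different words.
tautological = {"telecom": "ContactPoint", "address": "Address"}


def count_class_attribute_definitions(model):
    """Count every class, datatype, and attribute as one definition."""
    return sum(len(names) for names in model.values())


n_class = count_class_attribute_definitions(class_attribute_model)
n_element = len(data_element_model)
print(n_class, n_element, n_class - n_element)  # 11 9 2
```

The gap of two is exactly the number of tautological attributes.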
When a class-attribute model is converted into a data element model, someone has to interpret the attributes across different objects and determine which attributes mean the same thing and can therefore be represented with a single data element. We might find multiple “effectiveTime” attributes, but identical naming doesn’t assure identical semantics. Conversely, there could be attributes with different names (e.g., “validityInterval”) that do mean the same thing. Unless there are model meaning bindings on each attribute (usually there are not), the conversion to an accurate, non-redundant data element model requires manual work. I did this for the entire CIMI model in the course of about 2 weeks for the September 2017 HL7 ballot cycle (what I learned during that process is another discussion that I might take up later).
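To make the role of model meaning bindings concrete, here is a minimal sketch of how attributes could be grouped if every attribute did carry a concept code. The class names, the concept codes (the EX# prefix), and the function name are all hypothetical placeholders, not taken from CIMI, FHIR, or SHR.

```python
from collections import defaultdict

# (class, attribute, concept) triples. All names and codes are made up
# for illustration; None marks an attribute with no meaning binding.
attributes = [
    ("Observation", "effectiveTime", "EX#clinically-relevant-time"),
    ("Coverage", "effectiveTime", "EX#validity-period"),
    ("Policy", "validityInterval", "EX#validity-period"),
    ("Note", "effectiveTime", None),
]


def group_by_concept(attrs):
    """Group attributes by bound concept; unbound ones need manual review."""
    merged = defaultdict(list)
    unbound = []
    for cls, name, concept in attrs:
        if concept is None:
            unbound.append(f"{cls}.{name}")
        else:
            merged[concept].append(f"{cls}.{name}")
    return dict(merged), unbound


merged, unbound = group_by_concept(attributes)
for concept, members in merged.items():
    print(concept, "->", members)
print("manual review:", unbound)
```

In this toy data the two same-named effectiveTime attributes land in different groups, while validityInterval joins one of them. The unbound attribute is the case that forces the manual work described above.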
What happens to the semantics of an attribute when it is liberated from its class context and converted to an independent data element? Location.type seems much more defined than the free-floating data element Type. That’s true. But when the data element Type is added to Location, as above, it literally means “Type of Location”, same as the attribute. Even that meaning is vague: if you do city zoning, then “Type of Location” could mean something entirely different from what is meant in the FHIR context. What helps define “type”, even within the location context, are the possible answers. In FHIR, the answer set for “type of location” includes codes from http://hl7.org/fhir/v3/RoleCode, which has answers like hospital, chronic care facility, addiction treatment center, and coronary care unit. We really aren’t done defining semantics until we associate a value set. Value set binding goes a long way to refine what we actually mean by a data element in a given context.
In SHR, we use the following syntax to apply a value set binding:
Element: Location
Based on: DomainResource
Concept: MTH#C0450429
Description: "Details and position information for a physical place"
0..1 Name
0..1 Type from http://hl7.org/fhir/v3/RoleCode if covered
0..* ContactPoint
0..1 Address
0..1 Geoposition
The “from … if covered” syntax is the SHR’s equivalent of FHIR’s “extensible” binding. To say a value set is required, “if covered” is omitted.
The real payoff from a data element model is reuse. Data elements such as Name, Type, and Address can be used over and over. For example, we can use Type in other contexts, bound to different value sets:
Element: Claim
...
0..1 Type from http://hl7.org/fhir/ValueSet/claim-type // required binding
...
Element: Encounter
...
0..* Type could be from http://hl7.org/fhir/ValueSet/encounter-type // example binding
...
Element: OralDiet // from NutritionOrder
...
0..1 Type should be from http://hl7.org/fhir/ValueSet/diet-type // preferred binding
...
Regardless of whether it is “type of claim,” “type of encounter,” or “type of oral diet,” the same data element, Type, works perfectly well. You don’t have to define Type more than once.
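The four binding phrasings used in the examples above map onto FHIR's four binding strengths. As a rough sketch, a field line could be classified as follows. The regular expressions and the function name are my own assumptions for illustration; this is not the actual SHR parser.

```python
import re

# More specific phrasings are tried first, so a bare "from" is the fallback.
_BINDING_PATTERNS = [
    (re.compile(r"\bshould be from\s+(\S+)"), "preferred"),
    (re.compile(r"\bcould be from\s+(\S+)"), "example"),
    (re.compile(r"\bfrom\s+(\S+)\s+if covered"), "extensible"),
    (re.compile(r"\bfrom\s+(\S+)"), "required"),
]


def binding_strength(field_line):
    """Return (value_set, strength), or None if the line has no binding."""
    # Drop a trailing comment. Requiring whitespace before "//" keeps the
    # "//" inside "http://" intact.
    text = re.split(r"\s//", field_line, maxsplit=1)[0]
    for pattern, strength in _BINDING_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1), strength
    return None


print(binding_strength("0..1 Type from http://hl7.org/fhir/v3/RoleCode if covered"))
print(binding_strength("0..1 Type from http://hl7.org/fhir/ValueSet/claim-type // required binding"))
print(binding_strength("0..* Type could be from http://hl7.org/fhir/ValueSet/encounter-type"))
print(binding_strength("0..1 Type should be from http://hl7.org/fhir/ValueSet/diet-type"))
```

The same Type element appears in every line; only the value set and the binding strength change with the context.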
What about reversing the transformation, and going from a data element model to a class-attribute model? Because we have eliminated tautological attributes, some information necessary for the reversal is missing. We can, however, generate tautological names by taking the data element names, and writing them in leading lower case, e.g.:
Class: Location
Parent class: DomainResource
Description: "Details and position information for a physical place"
Attributes:
name: string [0..1] // Name of the location as used by humans
type: CodeableConcept [0..1] // Type of function performed
contactPoint: ContactPoint [0..*] // Contact details of the location
address: Address [0..1] // Physical location
position: [0..1] // The absolute geographic location
  longitude: decimal [1..1] // Longitude with WGS84 datum
  latitude: decimal [1..1] // Latitude with WGS84 datum
  altitude: decimal [0..1] // Altitude with WGS84 datum
That’s almost what we started with. Note that the position attribute is inlined as a BackboneElement because FHIR does not let us create new complex data types. We don’t have to do that for ContactPoint and Address, because they refer to existing complex types.
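The name-generation step for the simple top-level fields can be sketched in a few lines. The dictionary of element value types and the function name are assumptions for illustration, and the sketch deliberately leaves out the inlined position structure.

```python
# Value type of each element. None marks a composite element that serves
# as its own type. The layout is an illustrative assumption.
elements = {
    "Name": "string",
    "Type": "CodeableConcept",
    "ContactPoint": None,
    "Address": None,
}

location_fields = [
    ("0..1", "Name"),
    ("0..1", "Type"),
    ("0..*", "ContactPoint"),
    ("0..1", "Address"),
]


def to_attribute(cardinality, element_name):
    """Regenerate an attribute line by lower-casing the first letter."""
    attr = element_name[0].lower() + element_name[1:]
    datatype = elements[element_name] or element_name
    return f"{attr}: {datatype} [{cardinality}]"


for cardinality, element_name in location_fields:
    print(to_attribute(cardinality, element_name))
# name: string [0..1]
# type: CodeableConcept [0..1]
# contactPoint: ContactPoint [0..*]
# address: Address [0..1]
```

The regenerated contactPoint differs from the original telecom, which is the information that was dropped along with the tautological attribute.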
So, we can go back and forth between the two forms, but when we use common data elements, redundant attribute definitions are eliminated, and they need to be regenerated in the other direction. Elimination of those redundancies is fundamentally why the data element approach is more parsimonious than the class-attribute approach, and why the advantage grows as the size of the model increases.
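A toy count illustrates how the gap widens. It assumes that a class-attribute model needs one attribute definition per use, while a data element model defines each element once. Datatype definitions are ignored, which only understates the class-attribute total. The function name is my own, and Claim, Encounter, and OralDiet list only Type because their other fields are elided in the examples above.

```python
def definition_counts(classes):
    """classes maps a class name to the element names it uses.

    Returns (class_attribute_count, data_element_count) under the
    simplifying assumptions stated in the text above.
    """
    uses = sum(len(fields) for fields in classes.values())
    distinct = len({name for fields in classes.values() for name in fields})
    return len(classes) + uses, len(classes) + distinct


additions = [
    ("Location", ["Name", "Type", "ContactPoint", "Address", "Geoposition"]),
    ("Claim", ["Type"]),
    ("Encounter", ["Type"]),
    ("OralDiet", ["Type"]),
]

model = {}
gaps = []
for class_name, fields in additions:
    model[class_name] = fields
    class_attribute, data_element = definition_counts(model)
    gaps.append(class_attribute - data_element)

print(gaps)  # [0, 1, 2, 3]
```

Every reuse of an existing element adds a definition on the class-attribute side and nothing on the data element side, so the difference grows with each class added.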