From Class-Attribute Models to Data Element Models (and Back)

Last time, I explained why the Standard Health Record (SHR) is based on data elements. Primarily, clinicians think in terms of data elements, and there are many preexisting libraries of data elements that we can use. Data elements are reusable building blocks that help SHR scale up most effectively.

In this post, I’m going to explore conversions between a data element-style model and more typical class-attribute models. Here’s the bottom line up front: not only is the data element model more parsimonious (smaller) than the class-attribute model, but the larger the system, the greater the advantage.

FHIR is a class-attribute model, and so are the V3 Reference Information Model (RIM), CIMI, and FHIM. In a class-attribute model, you might see something like the following definition of FHIR’s Location Resource (for brevity, several attributes are removed):

Class: Location 
Parent class: DomainResource
Description: "Details and position information for a physical place"
Attributes:
  name: string [0..1] // Name of the location as used by humans
  type: CodeableConcept [0..1]  // Type of function performed
  telecom: ContactPoint [0..*]  // Contact details of the location
  address: Address [0..1] // Physical location
  position: BackboneElement [0..1]  // The absolute geographic location
    longitude: decimal [1..1] // Longitude with WGS84 datum
    latitude: decimal [1..1]  // Latitude with WGS84 datum
    altitude: decimal [0..1]  // Altitude with WGS84 datum

To re-express this in terms of reusable data elements, each top-level attribute is converted to a separate data element. To distinguish an attribute from a data element, we use a leading capital letter (e.g., name [attribute] versus Name [data element]). In SHR, it looks like this:

Element:  Location
Based on: DomainResource
Concept:  MTH#C0450429
Description: "Details and position information for a physical place"
0..1  Name
0..1  Type
0..*  ContactPoint
0..1  Address
0..1  Geoposition // Re-named to distinguish from body position

Note that I’ve added a model meaning binding (the Concept keyword), so computationally, “location” is not just a word but a formal concept that can be related to other model concepts to support reasoning. (In SHR, codes are indicated using the syntax codesystem#code. In this case, MTH is an alias for the UMLS Metathesaurus code system, and C0450429 is the code from that system that means “location”.)
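The mechanical part of this conversion can be sketched in Python. This is a minimal illustration under stated assumptions, not SHR tooling. The `Attribute` class, the `TAUTOLOGICAL` set, and the `RENAMES` table are hypothetical. The last two stand in for the human judgment that the conversion actually requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """A top-level attribute of a class-attribute model (hypothetical structure)."""
    name: str          # e.g. "telecom"
    datatype: str      # e.g. "ContactPoint"
    cardinality: str   # e.g. "0..*"


# Hypothetical hand-curated judgment: attributes that mean the same thing as
# their datatype, so the datatype itself serves as the data element.
TAUTOLOGICAL = {"telecom", "address"}

# Hypothetical hand-curated renames, e.g. to avoid a clash with body position.
RENAMES = {"position": "Geoposition"}


def to_data_element(attr):
    """Return the data element name that replaces one attribute."""
    if attr.name in TAUTOLOGICAL:
        return attr.datatype                       # reuse the datatype as the element
    if attr.name in RENAMES:
        return RENAMES[attr.name]
    return attr.name[0].upper() + attr.name[1:]    # name -> Name


def convert(attrs):
    """Produce SHR-style component lines of the form '<cardinality>  <Element>'."""
    return [f"{a.cardinality}  {to_data_element(a)}" for a in attrs]


location = [
    Attribute("name", "string", "0..1"),
    Attribute("type", "CodeableConcept", "0..1"),
    Attribute("telecom", "ContactPoint", "0..*"),
    Attribute("address", "Address", "0..1"),
    Attribute("position", "BackboneElement", "0..1"),
]

for line in convert(location):
    print(line)
```

The curated judgments are checked before the mechanical capitalization. That ordering is why `telecom` becomes `ContactPoint` rather than `Telecom`.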

Each data element in the composite element Location must now be defined. Here’s how that is expressed in SHR:

Element:  Name
Concept:  MTH#C0027365
Description: "The words or language units by which a thing is known."
Value:  string

Element: Type
Concept: MTH#C0332307
Description: "Something distinguishable as an identifiable class based on common qualities."
Value:  CodeableConcept

Element: ContactPoint
Concept: MTH#C2986441
Description: "An electronic means of contacting an organization or individual."
// data elements composing ContactPoint...

Element: Address
Concept: MTH#C1442065
Description: "A standardized representation of the location of a person, business, building, or organization."
// data elements composing Address...

Element: Geoposition 
Concept: TBD
Description: "The location on the surface of the Earth, described by a latitude and longitude (and optional altitude)."
1..1 Latitude
1..1 Longitude
0..1 Altitude

Element: Latitude
Concept: MTH#C1627936
Description: "The angular distance north or south between an imaginary line around a heavenly body parallel to its equator and the equator itself. Measured with WGS84 datum."
Value: decimal

Element: Longitude
Concept: MTH#C1657623 
Description: "An imaginary great circle on the surface of a heavenly body passing through the poles at right angles to the equator. Measured with WGS84 datum."
Value: decimal

Element: Altitude
Concept: MTH#C0002349
Description: "Height above sea level or above the earth's surface. Measured with WGS84 datum."
Value: decimal
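The requirement that every component of a composite element be defined is easy to check mechanically. In this sketch, `Element` is a hypothetical, simplified stand-in for an SHR definition, with descriptions omitted. It is populated with the Geoposition elements above. The check reports components that lack a definition and elements whose concept is still TBD.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Simplified, hypothetical stand-in for an SHR data element definition."""
    name: str
    concept: Optional[str]       # e.g. "MTH#C1627936"; None where the concept is TBD
    value: Optional[str] = None  # value type of a simple element, e.g. "decimal"
    components: list = field(default_factory=list)  # (cardinality, element name) pairs


def undefined_references(elements):
    """Return component names that are referenced but never defined."""
    defined = {e.name for e in elements}
    referenced = {name for e in elements for _, name in e.components}
    return referenced - defined


def unbound_concepts(elements):
    """Return names of elements that still lack a model meaning binding."""
    return [e.name for e in elements if e.concept is None]


geo = [
    Element("Geoposition", None,
            components=[("1..1", "Latitude"), ("1..1", "Longitude"), ("0..1", "Altitude")]),
    Element("Latitude", "MTH#C1627936", value="decimal"),
    Element("Longitude", "MTH#C1657623", value="decimal"),
    Element("Altitude", "MTH#C0002349", value="decimal"),
]

print(undefined_references(geo))  # set()
print(unbound_concepts(geo))      # ['Geoposition']
```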

The data element representation is more verbose (partly because I included model meaning bindings and full definitions), but actually, there are fewer entities defined in the data element approach:

  • Class-attribute representation (11 definitions): Location, name, type, telecom, ContactPoint, address, Address, position, longitude, latitude, altitude
  • Data element representation (9 definitions): Location, Name, Type, ContactPoint, Address, Geoposition, Longitude, Latitude, Altitude

The difference? In the class-attribute representation, address and telecom are tautological attributes, expressing the same thing as their datatype, in slightly different words. The definition of Location.address is the same as Address; the definition of Location.telecom is the same as ContactPoint. The data element approach allows you (almost forces you) to eliminate tautological attributes.
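The count can be checked mechanically. In this sketch, the `covered_by` mapping is made by hand, mirroring the judgment described above. It relies on the naming convention that attributes start lowercase while types and data elements start uppercase. The two definitions that disappear are exactly the tautological attributes, whose data element is an existing datatype.

```python
# Every definition in the class-attribute version of Location (from the list above).
class_attribute_defs = [
    "Location", "name", "type", "telecom", "ContactPoint",
    "address", "Address", "position", "longitude", "latitude", "altitude",
]

# Hand-made mapping from each definition to the data element that covers it.
covered_by = {
    "Location": "Location", "name": "Name", "type": "Type",
    "telecom": "ContactPoint", "ContactPoint": "ContactPoint",
    "address": "Address", "Address": "Address",
    "position": "Geoposition", "longitude": "Longitude",
    "latitude": "Latitude", "altitude": "Altitude",
}

# Distinct data elements needed once the mapping is applied.
data_element_defs = sorted(set(covered_by[d] for d in class_attribute_defs))

# Attributes (lowercase) whose data element is a type already defined on the
# class-attribute side: these are the tautological attributes.
collapsed = [d for d in class_attribute_defs
             if d[0].islower() and covered_by[d] in class_attribute_defs]

print(len(class_attribute_defs), len(data_element_defs))  # 11 9
print(collapsed)  # ['telecom', 'address']
```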

When a class-attribute model is converted into a data element model, someone has to interpret the attributes across different objects and determine which attributes mean the same thing and can therefore be represented with a single data element. We might find multiple “effectiveTime” attributes, but identical naming doesn’t guarantee identical semantics. Conversely, there could be attributes with different names (e.g., “validityInterval”) that do mean the same thing. Unless each attribute has a model meaning binding (usually not the case), the conversion to an accurate, non-redundant data element model requires hand-work. I did this for the entire CIMI model in about two weeks for the September 2017 HL7 ballot cycle (what I learned during that process is another discussion that I might take up later).
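The following sketch shows why bindings matter during conversion. Grouping by name merges attributes that may mean different things. Grouping by concept binding merges only attributes bound to the same concept and leaves unbound ones for hand review. The class names and concept codes are placeholders, not real model content or terminology codes.

```python
from collections import defaultdict

# Hypothetical attributes from different classes: (class, attribute, concept).
# Concept codes are placeholders; None means no model meaning binding.
attributes = [
    ("ClassA", "effectiveTime", "CONCEPT-1"),
    ("ClassB", "effectiveTime", "CONCEPT-2"),     # same name, different meaning
    ("ClassC", "validityInterval", "CONCEPT-1"),  # different name, same meaning
    ("ClassD", "effectiveTime", None),            # unbound: needs hand review
]


def group_by_name(attrs):
    """Naive grouping: attributes with identical names become one element."""
    groups = defaultdict(list)
    for cls, name, _ in attrs:
        groups[name].append(f"{cls}.{name}")
    return dict(groups)


def group_by_concept(attrs):
    """Candidate shared data elements: one group per bound concept."""
    groups, unbound = defaultdict(list), []
    for cls, name, concept in attrs:
        if concept is None:
            unbound.append(f"{cls}.{name}")
        else:
            groups[concept].append(f"{cls}.{name}")
    return dict(groups), unbound


print(group_by_name(attributes))
print(group_by_concept(attributes))
```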

What happens to the semantics of an attribute when it is liberated from its class context and converted to an independent data element? Location.type seems much more clearly defined than the free-floating data element Type. That’s true. But when the data element Type is added to Location, as above, it literally means “Type of Location”, the same as the attribute. Even that meaning is vague: in city zoning, for example, “Type of Location” could mean something entirely different from what is meant in the FHIR context. What helps define “type”, even within the location context, is the set of possible answers. In FHIR, the answer set for “type of location” includes codes from http://hl7.org/fhir/v3/RoleCode, which has answers like hospital, chronic care facility, addiction treatment center, and coronary care unit. We really aren’t done defining semantics until we associate a value set. Value set binding goes a long way toward refining what we actually mean by a data element in a given context.

In SHR, we use the following syntax to apply a value set binding:

Element:  Location
Based on: DomainResource
Concept:  MTH#C0450429
Description: "Details and position information for a physical place"
0..1  Name
0..1  Type from http://hl7.org/fhir/v3/RoleCode if covered
0..*  ContactPoint
0..1  Address
0..1  Geoposition

The “from … if covered” syntax is SHR’s equivalent of FHIR’s “extensible” binding. To make the binding required, omit “if covered”.
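Here is a minimal, hypothetical parser for the component lines shown so far. It maps the two phrasings onto FHIR binding strengths. It covers only these line shapes and is not the SHR grammar, which handles much more.

```python
import re

# Hypothetical pattern for lines like "0..1  Type from <url> if covered".
COMPONENT = re.compile(
    r"^(?P<card>\d+\.\.(?:\d+|\*))\s+(?P<element>\w+)"
    r"(?:\s+from\s+(?P<valueset>\S+)(?P<covered>\s+if\s+covered)?)?$"
)


def parse_component(line):
    """Return (cardinality, element, value set or None, FHIR strength or None)."""
    m = COMPONENT.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized component line: {line!r}")
    strength = None
    if m["valueset"]:
        # "if covered" -> extensible; a bare "from" -> required
        strength = "extensible" if m["covered"] else "required"
    return m["card"], m["element"], m["valueset"], strength


print(parse_component("0..1  Type from http://hl7.org/fhir/v3/RoleCode if covered"))
print(parse_component("0..*  ContactPoint"))
```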

The real payoff from a data element model is reuse. Data elements such as Name, Type, and Address can be used over and over. For example, we can use Type in other contexts, bound to different value sets:

Element:  Claim
...
0..1  Type from http://hl7.org/fhir/ValueSet/claim-type  // required binding
...

Element:  Encounter
...
0..*  Type could be from http://hl7.org/fhir/ValueSet/encounter-type  // example binding
...

Element: OralDiet  // from NutritionOrder
...
0..1  Type should be from http://hl7.org/fhir/ValueSet/diet-type  // preferred binding
...

Regardless of whether it is “type of claim,” “type of encounter,” or “type of oral diet,” the same data element, Type, works perfectly well. You don’t have to define Type more than once.
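To make the reuse concrete, the following sketch records each use of Type as a context, a value set, and a binding strength, all pointing at one shared definition. The `DataElement` and `Usage` classes are hypothetical. The phrase-to-strength mapping is taken from the comments in the examples above. NutritionOrder's oral diet is simplified to an `OralDiet` context.

```python
from dataclasses import dataclass

# Binding phrasings from the examples above and the FHIR strength of each.
STRENGTHS = {
    "from": "required",
    "from ... if covered": "extensible",
    "should be from": "preferred",
    "could be from": "example",
}


@dataclass(frozen=True)
class DataElement:
    name: str
    value: str


@dataclass(frozen=True)
class Usage:
    context: str
    element: DataElement
    value_set: str
    strength: str


TYPE = DataElement("Type", "CodeableConcept")  # defined exactly once

usages = [
    Usage("Location", TYPE, "http://hl7.org/fhir/v3/RoleCode", STRENGTHS["from ... if covered"]),
    Usage("Claim", TYPE, "http://hl7.org/fhir/ValueSet/claim-type", STRENGTHS["from"]),
    Usage("Encounter", TYPE, "http://hl7.org/fhir/ValueSet/encounter-type", STRENGTHS["could be from"]),
    Usage("OralDiet", TYPE, "http://hl7.org/fhir/ValueSet/diet-type", STRENGTHS["should be from"]),
]

# One data element definition serves every context. A class-attribute model
# would instead define a separate "type" attribute in each class.
element_definitions = {u.element for u in usages}
attribute_definitions = {f"{u.context}.type" for u in usages}
print(len(element_definitions), len(attribute_definitions))  # 1 4
```

Each additional context that reuses Type adds a binding but no new definition. This is where the gap between the two approaches widens as a model grows.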

What about reversing the transformation, going from a data element model to a class-attribute model? Because we have eliminated tautological attributes, some information necessary for the reversal is missing. We can, however, generate tautological attribute names by writing the data element names with a leading lowercase letter, e.g.:

Class: Location 
Parent class: DomainResource
Description: "Details and position information for a physical place"
Attributes:
  name: string [0..1] // Name of the location as used by humans
  type: CodeableConcept [0..1]  // Type of function performed
  contactPoint: ContactPoint [0..*]  // Contact details of the location
  address: Address [0..1] // Physical location
  position: BackboneElement [0..1]  // The absolute geographic location
    longitude: decimal [1..1] // Longitude with WGS84 datum
    latitude: decimal [1..1]  // Latitude with WGS84 datum
    altitude: decimal [0..1]  // Altitude with WGS84 datum

That’s almost what we started with. Note that the position attribute is in-lined as a BackboneElement because in FHIR, we can’t create new complex data types. We don’t have to do that for ContactPoint and Address, because they refer to existing complex types.
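The reverse direction can also be sketched. The `ELEMENTS` table and `FHIR_COMPLEX_TYPES` set are hypothetical, simplified inputs; the components of ContactPoint and Address are elided, as in the post. Elements that match an existing FHIR complex type are referenced directly. Other composites are in-lined as BackboneElements. Applied mechanically, the lowercase rule yields `geoposition` where FHIR uses `position`. Restoring the original name would need a rename table, which is exactly the kind of information lost in the forward conversion.

```python
# Existing FHIR complex types that a generated attribute can refer to
# (an illustrative subset, not the full list).
FHIR_COMPLEX_TYPES = {"ContactPoint", "Address"}

# Simplified data element definitions: a value type, or a list of components.
ELEMENTS = {
    "Name": "string",
    "Type": "CodeableConcept",
    "Geoposition": [("1..1", "Longitude"), ("1..1", "Latitude"), ("0..1", "Altitude")],
    "Longitude": "decimal",
    "Latitude": "decimal",
    "Altitude": "decimal",
}


def attribute_name(element):
    """Generate a tautological attribute name: ContactPoint -> contactPoint."""
    return element[0].lower() + element[1:]


def to_attributes(components, indent=""):
    """Turn (cardinality, element) pairs into class-attribute lines."""
    lines = []
    for card, element in components:
        attr = attribute_name(element)
        if element in FHIR_COMPLEX_TYPES:
            # Matches an existing FHIR complex type: refer to it directly.
            lines.append(f"{indent}{attr}: {element} [{card}]")
        elif isinstance(ELEMENTS[element], list):
            # No matching FHIR type (and we can't create one), so in-line it.
            lines.append(f"{indent}{attr}: BackboneElement [{card}]")
            lines.extend(to_attributes(ELEMENTS[element], indent + "  "))
        else:
            lines.append(f"{indent}{attr}: {ELEMENTS[element]} [{card}]")
    return lines


location = [("0..1", "Name"), ("0..1", "Type"), ("0..*", "ContactPoint"),
            ("0..1", "Address"), ("0..1", "Geoposition")]
print("\n".join(to_attributes(location)))
```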

So, we can go back and forth between the two forms, but when we use common data elements, redundant attribute definitions are eliminated, and they need to be regenerated in the other direction. Elimination of those redundancies is fundamentally why the data element approach is more parsimonious than the class-attribute approach, and why the advantage grows as the size of the model increases.


About Mark Kramer

Mark Kramer, Ph.D., is Group Leader for Healthcare Standards and Interoperability at MITRE Corporation. Mark led the standardization of hData, the first lightweight REST methodology for healthcare data exchange, now a normative standard of HL7. With HL7, ONC, and CMS, Mark focused on developing FHIR profiles unifying standards for clinical quality measures and decision support. Mark has also worked with the US Department of Veterans Affairs on interoperable healthcare information exchange, and helped the DOD transfer radiological images of wounded warriors from Afghanistan and Iraq to regional hospitals. Prior to joining MITRE, Mark was Chief Technology Officer at Gensym Corporation, a leading provider of AI software; VP of Engineering at InterOPS, a network management company; CTO of Light Pharma, provider of capability improvement software and services to the pharmaceutical industry; and Associate Professor of Chemical Engineering at Massachusetts Institute of Technology.