Posted on June 29, 2011 by Simon Cockell

Records and situations. Integrating contextual aspects in clinical ontologies

Abstract

In order to achieve interoperability between different flavors of information model / ontology combinations to represent medical record entries we propose a comprehensive framework based on expressive description logics. Focusing on the context of clinical findings we demonstrate how the variability of clinical discourse can be logically represented. We emphasize the need for a clear categorial distinction between information entities and clinical objects, based on principles of Applied Ontology. An example OWL file can be downloaded from http://purl.org/steschu/BO2011.

Authors

Stefan Schulz1,2* and Daniel Karlsson3

1Institute for Medical Informatics, Statistics and Documentation, Medical University, Graz, Austria
2Institute of Medical Biometry and Medical Informatics, University Medical Center, Freiburg, Germany
3Department of Biomedical Engineering, Medical Informatics, Linköping University, Sweden

Introduction

 

SNOMED CT [IHTSDO 2011] claims to cover the entirety of the electronic health record with roughly 300,000 concepts. Although named and promoted as a terminology, SNOMED CT’s content development process is, inherently, also a process of ontology engineering, as its development is based on a logic-based framework which enforces precise definitions (using description logics [Baader 2007]). The dependability of the entailments computed from these definitions is crucial for any use case that requires more than just the provision of controlled reference terms.

A considerable number of SNOMED CT concepts do not simply denote domain entities but rather represent complex clinical assertions [Schulz 2010]. Expressions like Family history unknown, Injury of head without lack of consciousness, Planned cholecystectomy are not clinical terms but propositions about complex situations. Thus they facilitate single-code representations for commonplace utterances which place one or more domain terms in (i) a physical or social context (the clinical situation to which the utterance refers) as well as (ii) an epistemic context (referring to what is known about this situation) [Bodenreider 2004]. SNOMED CT, which has inherited many such expressions from one of its sources, CTV3, has reserved a separate branch for them, named Situation in specific context.

Computer representations of health record content have motivated the development of information models for messages and documents in frameworks such as HL7 Version 3 and openEHR archetypes, in order to express information about the entities involved in the diagnostic and treatment process. Such information, by and large, goes beyond the simple instantiation of concepts from a terminology or ontology, and usually includes a spatio-temporal specification of the patient and the time of the assertion. Further, this information specifies its sources and includes statements about plans, hypotheses, beliefs, and certainties. For instance, Planned cholecystectomy denotes a plan [Schulz 2011a], but not an operation, which may or may not ensue. A diagnostic statement Pneumonia made by a general practitioner may be speculative and does not imply the existence of a real instance of pneumonia, just as a patient’s mention of pneumonia in childhood cannot be taken at face value. Nevertheless such information needs to be documented.

It has been postulated that a clear boundary exists between ontologies of information and ontologies of reality. Whereas the latter represent the context-independent properties of the types of entities health professionals refer to, the former describe the composition of information entities as found in the electronic patient record.

In current information models and ontologies the distinction between the ontology of clinical entities and the ontology of the observation of those clinical entities is blurred. Users of both types of systems tend to be unaware of the very nature of the things they represent. The resulting overlaps give rise to conflicting representations, which require sophisticated mitigation strategies. A mixed representation of the invariant properties of entities as they are (ontology), the implicit setting to which they are related, and the way they are seen / known / recorded is prevalent in most biomedical terminology systems. Unless these issues are dealt with, the deployment of informatics applications like decision-support systems will be hampered [Rector 2001].

We are neither very optimistic that the postulated boundary between ontologies and information models will be accounted for in future representational artifacts, nor that a final consensus can be reached on where this line is to be drawn. It is realistic to expect that the very same complex information (e.g. a clinician’s hypothesis of a stenosis of the left carotid artery) is represented in different proportions in clinical ontologies and clinical information models, therefore hampering semantic interoperability [Garde 2007]. We therefore propose a different strategy. Instead of defending a “canonic” division between ontologies and information models we recommend a common ontological framework which helps us to reach interoperability between different representational flavors.

Methods

Representational language

We use the Web Ontology Language OWL DL [OWL2 2009], which is based on description logics (SNOMED CT uses an inexpressive variant known as EL) and in which classes are arranged in taxonomic hierarchies. This means that all members of a class Gallbladder (i.e. all individual gallbladders) are also members of the parent class DigestiveOrgan, expressed by Gallbladder subclassOf DigestiveOrgan. The meaning of OWL classes can be further described by the properties all their members have in common. In the following example, we employ ‘and’, together with the existential quantifier (‘some’). For example, the expression InflammatoryDisease and hasLocation some Gallbladder extends to all instances that both instantiate InflammatoryDisease and are further related through the relation hasLocation to some instance of Gallbladder. This example actually gives us both the necessary and the sufficient conditions needed in order to fully define a class, e.g.:

Cholecystitis equivalentTo InflammatoryDisease and hasLocation some Gallbladder

SNOMED CT is so far limited to the simple constructors summarized in Table 1.

Table 1. SNOMED CT’s logical constructors, corresponding to the description logics EL

Constructor    Meaning                                          Example
and            Intersection between E and F                     Acid and OrganicMolecule
some           Existential restriction of the relation r by G   partOf some Liver
subclassOf     B subsumes A                                     Liver subclassOf Organ
equivalentTo   C and D are equivalent                           OrganicAcid equivalentTo Acid and OrganicMolecule

Note that it is not possible to express (i) value constraints (e.g. hasLaterality can only have the values Right and Left), or (ii) negations, such as Injury without infection. Such restrictions allow the definition of simple terms, but they prevent more complex terms or statements from being represented compositionally. Table 2 provides additional constructors required for representing more complex assertions.

Table 2. Additional description logics (DL) constructors

DL Constructor         Meaning                                               Example
not                    Negation of A                                         Base and not Acid
only                   Value restriction of the relation r by the filler G   Hand subclassOf hasLaterality only (Left or Right)
or                     Union of A with B
max/min/exactly INT    Cardinality restriction                               Object and bearerOf exactly 1 Color
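To make the meaning of the constructors in Tables 1 and 2 more tangible, the following minimal sketch evaluates class expressions over a small, fully known set of individuals. It is only an illustration under a closed-world assumption and not an OWL reasoner; the tuple encoding, the function name extension and all individuals (hand1, lobe1, etc.) are hypothetical and not part of the example OWL file.

```python
# Toy closed-world evaluator for the constructors in Tables 1 and 2.
# Class expressions are nested tuples; every name and data item is illustrative.

def extension(expr, domain, classes, relations):
    """Return the individuals in `domain` that satisfy the class expression."""
    def ev(e):
        return extension(e, domain, classes, relations)

    if isinstance(expr, str):                       # named class
        return set(classes.get(expr, set()))
    op = expr[0]
    if op == "and":                                 # intersection
        return ev(expr[1]) & ev(expr[2])
    if op == "or":                                  # union
        return ev(expr[1]) | ev(expr[2])
    if op == "not":                                 # complement within the domain
        return set(domain) - ev(expr[1])
    if op in ("some", "only"):
        _, relation, filler = expr
        fillers = ev(filler)
        pairs = relations.get(relation, set())
        result = set()
        for x in domain:
            successors = {y for (s, y) in pairs if s == x}
            if op == "some" and successors & fillers:
                result.add(x)                       # at least one successor in the filler
            elif op == "only" and successors <= fillers:
                result.add(x)                       # no successor outside the filler
        return result
    raise ValueError(f"unknown constructor: {op!r}")


domain = {"hand1", "left", "right", "liver1", "lobe1"}
classes = {"Hand": {"hand1"}, "Left": {"left"}, "Right": {"right"}, "Liver": {"liver1"}}
relations = {"hasLaterality": {("hand1", "left")}, "partOf": {("lobe1", "liver1")}}

part_of_liver = extension(("some", "partOf", "Liver"), domain, classes, relations)
lateral_only = extension(("only", "hasLaterality", ("or", "Left", "Right")),
                         domain, classes, relations)
hand_not_liver = extension(("and", "Hand", ("not", "Liver")), domain, classes, relations)
```

In this toy interpretation only lobe1 satisfies partOf some Liver, whereas every individual satisfies the value restriction, including those without any hasLaterality link. An actual description logic reasoner works under the open-world assumption and reasons over all possible interpretations, so this sketch conveys only the set-theoretic reading of the constructors.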

Ontological foundations

We subscribe to the tenet of realist ontologies [Klein 2010], which, though not uncontroversial, have gained ground in the fields of biology and medicine, and which we defend primarily for practical reasons. One guiding principle is the use of well-defined categorial divisions such as those provided by upper level ontologies. Another principle is to consistently interpret terms and codes as denoting classes of individual objects, grouped together according to the properties they have in common. Our upper-level distinction discriminates (among others) between the categories:

  1. Living organism, normally the subject of care, i.e. the patient, a human (or an animal in veterinary medicine).
  2. Clinical condition: (mostly abnormal) processes, states, dispositions, qualities and material entities, which are reportable in the context of the medical records. They are mainly related to (parts of) the subject of care, but also to specimens, derived materials, and to other persons. The most generic relation we use is hasLocus, which encompasses parthood, location, and inherence.
  3. Clinical situation: the sum of all processes that make up a treatment episode, as suggested by [Rector 2008].
  4. Information artifact: an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity [Ruttenberg 2010]. Electronic patient records and their components are typical instances of information artifacts. We further single out record entries, the atomic parts of the electronic patient record, as pieces of structured clinical discourse that are not further divisible.

In the following we will demonstrate how typical representations of clinical statements, for which different information model / ontology combinations have been proposed, can be expressed by a common framework.

Results

Representation of finding contexts

We will concentrate on a generic representation of an atomic clinical finding, as illustrated by the following template:

Attribute Value
Finding Context
Disorder *Disorder_D*
Location *BodyPart_B*
Laterality *Laterality_L*

We propose the following DL formalization:


RecordEntryAboutDisorder_D equivalentTo
  RecordEntry and
  (isAbout only (Situation and
    (includes some (LivingHuman and
      (bearerOf some SubjectOfRecordRole) and
      (locusOf some (*Disorder_D* and
        (hasLocus some
          (*BodyPart_B* and bearerOf some *Laterality_L*)))))))) (1)

The formalized pattern exposes several entities which are not explicit in the attribute-value schema, such as a clinical situation, a record entry, a human and the role he/she plays, as well as the relations between them. Note that this pattern states the existence of a record entry, but not of a situation it refers to. As argued above, this is an important aspect, as medical records may express beliefs or hypotheses which do not necessarily correspond to the reality of the patient.

In order to refer to OWL classes for which the existence of members cannot be asserted we use a modeling pattern recently proposed by several authors [Hastings 2011, Schulz 2011b], using the universal quantifier “only”, in contrast to the practice of the Information Artifact Ontology [Ruttenberg 2010]. The pattern can therefore be refined in terms of epistemic contexts such as “known present” or “known absent”.
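The effect of the universal quantifier can be illustrated with a small sketch. It assumes a closed set of hypothetical links and shows that a record entry without any isAbout link still satisfies a value restriction (vacuously), whereas an existential restriction would demand that a matching situation exists. The individuals entry_hypothesis, entry_confirmed and situation1, as well as the helper functions, are invented for illustration.

```python
# Hypothetical toy data: two record entries, only one of which points to a situation.
links = {("entry_confirmed", "isAbout", "situation1")}

# Situations assumed to fit the filler of the pattern (Situation and includes some ...).
matching_situations = {"situation1"}


def successors(individual, relation):
    """All targets reachable from `individual` via `relation`."""
    return {t for (s, r, t) in links if s == individual and r == relation}


def satisfies_only(individual, relation, filler):
    """Value restriction: every successor lies in the filler (true if there is none)."""
    return successors(individual, relation) <= filler


def satisfies_some(individual, relation, filler):
    """Existential restriction: at least one successor lies in the filler."""
    return bool(successors(individual, relation) & filler)


hypothesis_only = satisfies_only("entry_hypothesis", "isAbout", matching_situations)
hypothesis_some = satisfies_some("entry_hypothesis", "isAbout", matching_situations)
confirmed_only = satisfies_only("entry_confirmed", "isAbout", matching_situations)
confirmed_some = satisfies_some("entry_confirmed", "isAbout", matching_situations)
```

Here the hypothetical entry passes the “only” check although no situation exists for it, which is the behaviour the pattern relies on: the record entry is asserted, the situation is not.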

Let us instantiate this pattern with a record entry about a stenosis of the left carotid. In the first example post-coordination is done at the information model level in an attribute-value structure:

Attribute Value
Finding Context (undefined)
Disorder Stenosis
Location Carotid artery
Laterality Left

RecordEntry and
(isAbout only (Situation and
  (includes some (LivingHuman and
    (bearerOf some SubjectOfRecordRole) and
    (locusOf some (Stenosis and
      (hasLocus some (CarotidArtery and
        bearerOf some LeftLaterality)))))))) (2)

Alternatively, the same scenario is described with pre-coordination at the ontology level:

Attribute Value
Finding Context (undefined)
Disorder Stenosis of the left carotid artery
Location
Laterality

 

RecordEntry and
(isAbout only (Situation and
  (includes some (LivingHuman and
    (bearerOf some SubjectOfRecordRole) and
    (locusOf some StenosisOfLeftCarotidArtery))))) (3)

Given the definition

StenosisOfLeftCarotidArtery equivalentTo
  Stenosis and (hasLocus some (CarotidArtery and
    bearerOf some LeftLaterality)) (4)

a description logic reasoner can infer the equivalence of expressions (2) and (3).
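Why the two encodings coincide can be sketched by unfolding definition (4) inside expression (3). The snippet below is a simplified illustration, not what a reasoner actually does: it encodes class expressions as nested tuples (a hypothetical encoding), assumes the role restriction reads bearerOf some SubjectOfRecordRole, replaces the defined class name by its definition, and compares the results structurally. Structural identity after unfolding is sufficient for equivalence when definitions are acyclic, but it is not necessary; a real reasoner also detects equivalences that differ syntactically.

```python
# Definition (4), encoded as nested tuples (hypothetical encoding for illustration).
DEFINITIONS = {
    "StenosisOfLeftCarotidArtery": (
        "and", "Stenosis",
        ("some", "hasLocus",
         ("and", "CarotidArtery", ("some", "bearerOf", "LeftLaterality"))),
    ),
}


def unfold(expr, definitions):
    """Replace every defined class name by its definition (acyclic definitions only)."""
    if isinstance(expr, str):
        return unfold(definitions[expr], definitions) if expr in definitions else expr
    op = expr[0]
    if op in ("and", "or"):
        return (op, unfold(expr[1], definitions), unfold(expr[2], definitions))
    if op == "not":
        return (op, unfold(expr[1], definitions))
    if op in ("some", "only"):
        return (op, expr[1], unfold(expr[2], definitions))
    raise ValueError(f"unknown constructor: {op!r}")


def record_entry_about(disorder):
    """Build the record entry pattern around a given disorder expression."""
    subject = ("and",
               ("and", "LivingHuman", ("some", "bearerOf", "SubjectOfRecordRole")),
               ("some", "locusOf", disorder))
    return ("and", "RecordEntry",
            ("only", "isAbout", ("and", "Situation", ("some", "includes", subject))))


# Expression (2): post-coordinated at the information model level.
expression_2 = record_entry_about(
    ("and", "Stenosis",
     ("some", "hasLocus",
      ("and", "CarotidArtery", ("some", "bearerOf", "LeftLaterality")))))

# Expression (3): pre-coordinated at the ontology level.
expression_3 = record_entry_about("StenosisOfLeftCarotidArtery")

same_before_unfolding = expression_2 == expression_3
same_after_unfolding = unfold(expression_3, DEFINITIONS) == unfold(expression_2, DEFINITIONS)
```

Before unfolding the two tuples differ; after unfolding they are identical, which mirrors the equivalence a reasoner derives from definition (4).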

If the reported disorder is known to be present, the template is refined as follows:

Attribute Value
Finding Context Known present
(…) (…)

In OWL this is encoded in the following two equivalence statements:


ConfirmedRecordEntryAboutDisorder equivalentTo
  RecordEntryAboutDisorder and isAbout some Situation

If the mentioned disorder is known to be absent, the template is modified as follows:

Attribute Value
Finding Context Negated
(…) (…)

We propose the following OWL encodings for this:

RecordEntryAboutAbsenceOfDisorder_D equivalentTo
  RecordEntry and
  (isAbout some (Situation and
    (includes some (LivingHuman and
      (bearerOf some SubjectOfRecordRole) and
      not (locusOf some (*Disorder_D* and
        (hasLocus some
          (*BodyPart_B* and bearerOf some *Laterality_L*)))))))) (5)

Representation of other contexts and typical clinical statements

We briefly sketch how to account for other contexts. The Subject Relationship Context, according to SNOMED CT, is asserted if the referred situation does not apply to the patient but to a family member. Here we substitute SubjectOfRecordRole by other roles, e.g. ParentRole. A Temporal Context can be specified, at the instance level, by additional references to timestamps. The default temporal context is the situation the record entry is about. For abstract DL representations, amenable to DL queries, we can introduce qualitative modifiers, such as substituting Situation by PastSituation. If we want to include a reference to a future situation we must avoid the ‘known present’ context, as this is, by definition, disjoint from a future context.

More detail would be required for an analysis of Procedure contexts. In [Beale 2010] an extensive value list for procedure modifiers is given, including heterogeneous values such as {action status unknown, stopped before completion, rejected, under consideration}, to name just a few out of 46 items. Practically all of them modify not procedures but procedure plans. Just as with findings, an ontologically precise representation would require a clear distinction between record entries and real procedures. Again, value restrictions are used to avoid false existential statements such as those we find in the current version of SNOMED CT. The procedure context is epistemic in relation to the situation the record entry is about, but not to the record entry itself. The record entry, being an information entity, is the result of some observation or evaluation procedure. Still, there are several open questions, e.g. on how to represent partially completed or aborted procedures.

Other types of record entries, such as lab results or statements about signs and symptoms, may also be represented using this schema. Lab results are about some quality inherent in the patient, and they are the result of some observation procedure. However, the relation between the resulting value and the quality the result is about is mostly not straightforward, due to the inherent uncertainty of the observation procedures. Still, in most contexts, it is safe to infer an inherent quality from an observation result.

Record entries making statements about relations between signs and symptoms and disorders, e.g. statements about causality, are yet another area for consideration. We find it is still an open question whether, e.g., causality is inherent in the situation or in the human assessment of that situation.

Evaluation based on competency questions

For a preliminary evaluation using competency questions expressed as DL queries we refer to the example OWL file at http://purl.org/steschu/BO2011. One query retrieves all records for which a disease of a certain type was referred to but not confirmed. Another query shows how a confirmed assertion of a ‘situation without stenosis’ rules out that the situation contains a stenosis of the left carotid artery. Equally important is an assessment of the computational properties of the approach. Rich description logics with the constructors in Table 2 are known for their computational complexity and lack of scalability. Benchmark simulations are required to ascertain to what extent an acceptable performance can be reached when scaled up towards clinically interesting dimensions.
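The second competency question rests on a simple consequence of subsumption: if every stenosis of the left carotid artery is a stenosis, then anything that is not the locus of a stenosis cannot be the locus of a stenosis of the left carotid artery. The following sketch checks this by brute force over all interpretations of a two-element domain. It is a bounded illustration with invented names (check_bounded, the individuals a and b), not a proof and not the query used in the example OWL file; a description logic reasoner establishes the same consequence for arbitrary models.

```python
from itertools import chain, combinations, product


def subsets(items):
    """All subsets of `items`, each returned as a set."""
    items = list(items)
    return [set(c) for c in
            chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))]


def some(relation, filler, domain):
    """Individuals with at least one `relation` successor in `filler`."""
    return {x for x in domain if any((x, y) in relation for y in filler)}


def check_bounded(domain):
    """Search for an interpretation violating the claim; return it, or None."""
    pairs = list(product(domain, repeat=2))
    for stenosis in subsets(domain):
        # Axiom assumed here: StenosisOfLeftCarotidArtery subclassOf Stenosis.
        for stenosis_lca in subsets(stenosis):
            for locus_of in subsets(pairs):
                without_stenosis = set(domain) - some(locus_of, stenosis, domain)
                with_lca_stenosis = some(locus_of, stenosis_lca, domain)
                if without_stenosis & with_lca_stenosis:
                    return stenosis, stenosis_lca, locus_of
    return None


counterexample = check_bounded(["a", "b"])
```

No counterexample is found on this small domain, in line with the claim that a confirmed ‘situation without stenosis’ excludes the more specific disorder.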

Conclusion

Formally representing statements which refer to units of unclear reference is a common problem in both scientific and clinical discourse. As proposed for the representation of, e.g., chemical entities of unclear existence, we here present how to express the reference to a certain disease in a patient record in hypothetical, affirmative and negative contexts. We demonstrate how semantic equivalence between more ontology-oriented and more information-model-oriented encodings can be proven. Our approach constitutes a moderate first step towards the ambitious goal of interoperable representations of health records using a common logical framework grounded in expressive ontologies. This expressiveness is a major challenge and a potential obstacle to implementation due to the known complexity of rich description logics with negation and value restrictions. Additionally, new ground needs to be broken by leveraging the use of description logics as a query language for clinical queries.

Acknowledgements

This work was supported by the EC project “DebugIT” (FP7-217139).

References

IHTSDO (Intern. Health Terminology Standards Development Organisation). Systematized Nomenclature of Medicine – Clinical Terms. http://www.ihtsdo.org/snomed-ct .

Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF, editors. The Description Logic Handbook. Theory, Implementation, and Applications (2nd Edition). Cambridge: Cambridge University Press, 2007.

Bodenreider O, Smith B, Burgun A (2004). The Ontology-Epistemology Divide: A Case Study in Medical Terminology. Int. Conf. on Formal Ontology and Information Systems (FOIS 2004). Amsterdam: IOS-Press, 185-195.

Rector A, Johnson P, Tu S, Wroe C, Rogers J. Interface of inference models with concept and medical record models. In: S Quaglini, P Barahona and S Andreassen (eds) Proc Artificial Intelligence in Medicine Europe. 2001: 314-323.

Garde S, Knaup P, Hovenga E, Heard S. Towards semantic interoperability for electronic health records. Methods of Information in Medicine 2007; 46(3): 332-343.

Hastings J, Batchelor C, Neuhaus F, Steinbeck C. What’s in an ‘is about’ link? Chemical diagrams and the Information Artifact Ontology. International Conference on Biomedical Ontologies, 2011, Accepted for Publication.

Klein GO, Smith B. Concept Systems and Ontologies: Recommendations for Basic Terminology. Transactions of the Japanese Society for Artificial Intelligence. 2010;25(3):433-441.

OWL2 Web Ontology Language. W3C. (2009) http://www.w3.org/TR/owl2-overview/

Rector AL, Brandt, S. Why Do It the Hard Way? The Case for an Expressive Description Logic for SNOMED. Journal of the American Medical Informatics Association 2008; 15: 744–751.

Ruttenberg A, Courtot M, The IAO Community. The Information Artifact Ontology (2010). http://code.google.com/p/information-artifact-ontology/

Schulz S, Schober D, Daniel C, Jaulent MC. Bridging the semantics gap between terminologies, ontologies, and information models. Studies of Health Technology and Informatics 2010;160 (Pt 2):1000-1004.

Schulz S, Cornet R, Spackman K. Consolidating SNOMED CT’s ontological commitment. Applied Ontology 6 (2011a) 1-11 DOI 10.3233/AO-2011-0084

Schulz S, Brochhausen M, Hoehndorf R. Higgs bosons, mars missions, and unicorn delusions: How to deal with terms of dubious reference in scientific ontologies. International Conference on Biomedical Ontologies (2011b), accepted for Publication.
