“Of Mice and Men” Revisited: Basic Quality Checks for Reference Alignments Applied to the Human-Mouse Anatomy Alignment

Abstract

Identifying relationships between hitherto unrelated entities in different ontologies is the key task of ontology alignment. An alignment is created either manually by domain experts or automatically by an alignment system. In recent years, several alignment systems have been made available, each using its own set of methods for relation detection. To evaluate and compare these systems, a manually created alignment is typically used as a so-called reference alignment. Based on our experience with several of these reference alignments, we derived requirements and translated them into simple quality checks to ensure the alignments’ reliability and reusability. In this paper, we apply these quality checks to a standard reference alignment in the biomedical domain, the OAEI Anatomy Track reference alignment.

Authors

Elena Beisswanger* and Udo Hahn

Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany

Introduction

In knowledge-intensive domains such as the life sciences, there is an ever-increasing need for concept systems and ontologies to organize and classify large amounts of clinical and lab data and to describe them with value-adding metadata. For this purpose, numerous ontologies of different coverage, expressivity and formal rigor have evolved that, from a content point of view, complement each other and sometimes even partially overlap. To facilitate interoperability between information systems using different ontologies and to detect overlaps between them, ontology alignment has become a crucial task.

Since the manual alignment of ontologies is labor-intensive and time-consuming, alignment tools have been developed that automatically detect correspondences between entities in different ontologies, such as equivalentClass and subClassOf relations between ontology classes. Many different approaches to and techniques for ontology alignment have been proposed to date, and dedicated scientific workshops have been organized to accelerate progress in this field. In 2005, the Ontology Alignment Evaluation Initiative (OAEI) initiated a series of annual evaluation events to monitor and compare the quality of different alignment systems. A somewhat broader view on the evaluation of semantic technologies is promoted by the Semantic Evaluation At Large Scale (SEALS) project, which started in 2009. Within this project, an open source platform is being developed to facilitate the remote evaluation of ontology alignment systems and other semantic technologies, both in large-scale evaluation campaigns and in ad hoc evaluations of single systems. Among other things, the platform provides a test data repository, a tools repository, and a results repository for the evaluation and comparison of systems.

The most valuable content of the SEALS platform’s test data repository, and also the core of the OAEI campaigns, consists of manually created or at least manually curated reference alignments, which constitute the ground truth against which alignment systems are evaluated. Clearly, the quality of these reference alignments is of paramount importance for the validity and reliability of the evaluation results.

For the evaluation of our own ontology alignment system, we were also looking for trustworthy test data (ontologies and reference alignments). Some of the data sets we inspected have been used for several years in the OAEI campaigns or have already been integrated into the SEALS test data repository. Others have only recently been published and have not yet been used in any public challenge. Notwithstanding the enormous efforts that have gone into the development of such resources, our inspection of these data sets revealed a number of content-specific shortcomings and technical deficiencies. Hence, we decided to formulate a list of basic quality checks summarizing our observations. We propose to apply these checks to any given alignment as a minimal reliability test before it is used as a reference standard in an evaluation.

In the remainder of this paper, we first introduce the basic requirements we defined, expressed as quality checks, and then apply them to one of the standard data sets used in the yearly OAEI campaigns, the anatomy reference alignment. Finally, we discuss how applying the checks to this data set leads to improved versions of both the reference alignment itself and the input ontologies.

Basic Quality Checks for Reference Alignments

An alignment consists of a set of correspondences between entities from two different ontologies. In this paper, we focus on correspondences between ontology classes only. In this case, a correspondence consists of a pair of classes (one class from the first, the other from the second input ontology) and the relation that, according to the creator of the alignment, holds between these classes. Most alignments that have been proposed so far are only concerned with equivalentClass and subClassOf relations.
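To make this terminology concrete, a correspondence can be modelled as a small immutable record and an alignment as a set of such records. The following Python sketch is our own illustration, not part of any alignment tool: the class name `Correspondence`, its field names, and the use of bare local identifiers instead of full class URIs are assumptions made for readability.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Relation names used in this sketch (assumption: only the two relation
# types discussed in the text are modelled).
EQUIVALENT_CLASS = "equivalentClass"
SUB_CLASS_OF = "subClassOf"


@dataclass(frozen=True)
class Correspondence:
    """A pair of classes, one from each input ontology, plus the relation
    the alignment creator asserts between them."""
    source: str              # class identifier in the first ontology
    target: str              # class identifier in the second ontology
    relation: Optional[str]  # None = relation not stated explicitly


# An alignment is simply a set of correspondences (toy data; real
# alignments would refer to classes by full URIs, not local names).
alignment: Set[Correspondence] = {
    Correspondence("NCI_C22510", "MA_0000323", EQUIVALENT_CLASS),
    Correspondence("NCI_C33460", "MA_0002730", EQUIVALENT_CLASS),
}
```

Making the record frozen renders it hashable, so duplicate correspondences collapse automatically when the alignment is stored as a set.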

The usefulness of a manually created or curated alignment as reference data for the evaluation of ontology alignment systems depends on various parameters. The following quality checks address fundamental reliability and reusability aspects:

Check 1: Is the alignment provided together with the input ontologies on which it is based and are the input ontologies provided in the correct release versions?

Check 2: Are the classes to which correspondences in the alignment refer still available in the provided versions of the input ontologies?

Check 3: If classes are referred to by URI-label pairs in the alignment, do the URI-label pairs still persist in the available versions of the input ontologies?

Check 4: Is the alignment made available in a machine-readable format?

Check 5: Are ontology classes in the alignment referred to in terms of unique identifiers (e.g., URIs)?

Check 6: Are the relations holding between classes specified explicitly for all correspondences in the alignment?

Check 7: If there are cases in which a class from ontology O1 is linked to several (target) classes in ontology O2 by equivalentClass relations, are the target classes in O2 linked by equivalentClass relations as well?

Check 8: Are pairs of classes with identical labels linked by an equivalentClass relationship in the alignment?

Check 9: How many non-trivial correspondences (ones that cannot be detected via the identity of class labels after applying a simple term normalization step) occur in the alignment?

The first six quality checks focus on the (re)usability of an alignment as a reference for the evaluation of alignment systems. Checks 1 and 2 test whether the correspondences contained in the alignment can be found at all by alignment systems based on the available release versions of the input ontologies (imagine, e.g., that classes are deleted from an ontology; correspondences in the reference alignment that refer to these classes can then no longer be reproduced). Check 3, which tests for label changes, targets the tacit evolution of a class's meaning. Particularly in light-weight ontologies lacking thorough formal class definitions, verbal labels carry virtually the entire meaning of a class; hence, a new label might indicate a subtle change in the meaning of an ontology class that requires further scrutiny. Of course, if check 1 is positive, checks 2 and 3 can be skipped.
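Checks 2 and 3 reduce to simple set and dictionary comparisons once the class identifiers and labels of both ontology versions have been extracted. A minimal sketch, assuming the labels recorded with the alignment and those of the provided ontology version are available as plain dictionaries from class identifier (standing in for the URI) to label; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def missing_classes(referenced: Set[str], available: Set[str]) -> Set[str]:
    """Check 2: class IDs used in the alignment but absent from the
    provided ontology version."""
    return referenced - available


def changed_labels(old: Dict[str, str],
                   new: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Check 3: (class ID, old label, new label) for every URI-label pair
    recorded with the alignment that no longer holds. Classes missing from
    the new version are left to check 2."""
    return [(cid, label, new[cid])
            for cid, label in sorted(old.items())
            if cid in new and new[cid] != label]


# Toy input built from label changes reported for the anatomy alignment.
old_labels = {"NCI_C12443": "Cortex", "NCI_C33178": "Nostril"}
new_labels = {"NCI_C12443": "Cerebral Cortex", "NCI_C33178": "External Nare"}

print(missing_classes({"NCI_C12443", "NCI_C33178"}, set(new_labels)))
# -> set()
print(changed_labels(old_labels, new_labels))
# -> [('NCI_C12443', 'Cortex', 'Cerebral Cortex'), ('NCI_C33178', 'Nostril', 'External Nare')]
```

The label comparison is deliberately exact: every change, including a mere spelling variant, is surfaced for manual inspection, because the purpose of check 3 is to flag possible shifts in meaning rather than to decide them automatically.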

Check 4 is concerned with the accessibility of an alignment, while check 5 determines whether the references to classes are unique (if local names or labels were given as class references, these references might be ambiguous). Check 6 is meant to ensure that the relationships the alignment creator asserted between classes are made explicit (in our experience, some alignments are published without a clear distinction between different types of semantic relations).
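Checks 5 and 6 can likewise be automated. The sketch below assumes that a class reference may be matched against either class identifiers or labels, and that only equivalentClass and subClassOf count as explicit relations; both are our assumptions, and all names as well as the toy ontology are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Assumption: the only relation types accepted as "explicit".
ALLOWED_RELATIONS = {"equivalentClass", "subClassOf"}


def ambiguous_references(refs: Set[str],
                         labels: Dict[str, str]) -> Dict[str, Set[str]]:
    """Check 5: references that do not resolve to exactly one class.

    `labels` maps class IDs to labels. A reference may match a class ID
    or a label; the result maps each problematic reference to the IDs
    it matches (empty set = dangling, several = ambiguous)."""
    ids_by_label: Dict[str, Set[str]] = defaultdict(set)
    for cid, label in labels.items():
        ids_by_label[label].add(cid)
    problems = {}
    for ref in refs:
        matches = set(ids_by_label.get(ref, set()))
        if ref in labels:
            matches.add(ref)
        if len(matches) != 1:
            problems[ref] = matches
    return problems


def unspecified_relations(
        cells: List[Tuple[str, str, Optional[str]]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Check 6: correspondences whose relation is missing or unknown."""
    return [cell for cell in cells if cell[2] not in ALLOWED_RELATIONS]


# Hypothetical toy ontology in which two classes share the label "cortex".
labels = {"X1": "cortex", "X2": "cortex", "X3": "renal papilla"}
print(ambiguous_references({"cortex", "X3"}, labels))
print(unspecified_relations([("X3", "Y1", "equivalentClass"), ("X1", "Y2", None)]))
```

The label "cortex" is reported as ambiguous because it matches two classes, whereas the identifier X3 resolves uniquely.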

Since, in an alignment, a class from one ontology should be mapped by an equivalentClass relation to at most one class in the other ontology (or, if it is linked to several classes, these should themselves be marked as equivalent), check 7 may provide valuable hints to implicit class equivalences in the input ontologies, but also to redundant or even mistaken correspondences in an alignment.
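Check 7 can be implemented by grouping the equivalentClass correspondences by source class and testing every pair of targets against the equivalences asserted in the target ontology. A minimal sketch, assuming these asserted equivalences have already been extracted as unordered pairs; only directly asserted equivalences are consulted (no reasoning), and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def unmarked_multi_targets(
        equiv_pairs: Iterable[Tuple[str, str]],
        target_equivalences: Set[FrozenSet[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Check 7 for one direction (O1 -> O2).

    `equiv_pairs` holds (source, target) pairs linked by equivalentClass
    in the alignment; `target_equivalences` holds unordered pairs of O2
    classes asserted equivalent in O2. Returns, per source class with
    several targets, the target pairs that O2 does NOT mark as equivalent."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, target in equiv_pairs:
        targets[source].add(target)
    report = {}
    for source, ts in targets.items():
        # combinations() yields nothing for sources with a single target.
        unmarked = [pair for pair in combinations(sorted(ts), 2)
                    if frozenset(pair) not in target_equivalences]
        if unmarked:
            report[source] = unmarked
    return report


# Hypothetical source class "MA_example" mapped to two NCI classes.
pairs = [("MA_example", "NCI_C33708"), ("MA_example", "NCI_C52844")]
print(unmarked_multi_targets(pairs, set()))
# -> {'MA_example': [('NCI_C33708', 'NCI_C52844')]}
```

The other direction (O2 -> O1) is covered by calling the same function on the swapped pairs, e.g. `[(t, s) for s, t in pairs]`, together with the equivalences asserted in O1.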

Check 8 builds on the observation that when two ontologies are aligned, especially when they show a strong conceptual overlap, label identity between classes is a strong hint for class equivalence. Checking for label identity may thus help detect missing correspondences in an existing alignment.
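A sketch of check 8 indexes the labels of the second ontology after normalization and reports every label match that the alignment does not already contain. As normalization we assume lowercasing plus treating underscores as blanks, which is our reading of the step used in the anatomy use case; the label spellings in the toy input and all function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize(label: str) -> str:
    """Lowercase and treat underscores as blanks (assumed normalization)."""
    return " ".join(label.lower().replace("_", " ").split())


def label_identity_candidates(
        labels1: Dict[str, str],
        labels2: Dict[str, str],
        equiv_pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Check 8: class pairs with identical normalized labels that the
    alignment does not link by equivalentClass. Candidates still need
    manual review; identical labels are only a strong hint."""
    by_label: Dict[str, Set[str]] = defaultdict(set)
    for cid, label in labels2.items():
        by_label[normalize(label)].add(cid)
    linked = set(equiv_pairs)
    return sorted((c1, c2)
                  for c1, label in labels1.items()
                  for c2 in by_label.get(normalize(label), set())
                  if (c1, c2) not in linked)


nci = {"NCI_C33460": "Renal_Papilla", "NCI_C12378": "Gastrointestinal_System"}
ma = {"MA_0002730": "renal papilla", "MA_0000323": "gastrointestinal system"}
print(label_identity_candidates(nci, ma, []))
# -> [('NCI_C12378', 'MA_0000323'), ('NCI_C33460', 'MA_0002730')]
```

Both pairs are reported, although, as the use case shows, only one of them turns out to be a genuine equivalence; this is why the output is treated as a list of candidates for manual inspection.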

Check 9 reflects our finding that it often makes sense to evaluate alignment systems also against the non-trivial subset of a reference alignment, to see how much better a system does than a simple exact string matcher. A large proportion of trivial correspondences certainly decreases the value of an alignment as a reference, although trivial correspondences do play a role as anchors for advanced alignment strategies.
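Splitting an alignment into its trivial and non-trivial parts only requires comparing the normalized labels of the two classes in each correspondence. The sketch below assumes the same simple normalization as above (lowercasing, underscores treated as blanks) and compares one label per class; ontologies with synonym annotations would need a richer notion of "trivial". Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def normalize(label: str) -> str:
    """Assumed 'simple term normalization': lowercase, underscores as blanks."""
    return " ".join(label.lower().replace("_", " ").split())


def split_trivial(
        pairs: Iterable[Tuple[str, str]],
        labels1: Dict[str, str],
        labels2: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Check 9: separate correspondences an exact string matcher would find
    (identical normalized labels) from the non-trivial rest."""
    trivial, non_trivial = [], []
    for c1, c2 in pairs:
        same = normalize(labels1[c1]) == normalize(labels2[c2])
        (trivial if same else non_trivial).append((c1, c2))
    return trivial, non_trivial


nci = {"NCI_C33460": "renal papilla", "NCI_C22510": "gastrointestinal tract"}
ma = {"MA_0002730": "renal papilla", "MA_0000323": "gastrointestinal system"}
trivial, rest = split_trivial(
    [("NCI_C33460", "MA_0002730"), ("NCI_C22510", "MA_0000323")], nci, ma)
print(len(trivial), len(rest))  # -> 1 1
```

The second list is the non-trivial subset against which systems can additionally be evaluated.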

Anatomy Use Case

To illustrate the potential of the proposed quality checks we now apply them to one of the standard reference alignments in the biomedical domain, viz. the one used in the anatomy track of the OAEI campaign since 2007.

OAEI Anatomy Track Reference Alignment

The reference alignment used in recent years in the OAEI anatomy track links classes from the anatomy branch of the NCI Thesaurus (describing human anatomy) to classes of the mouse adult gross anatomy ontology (MA), which is based on the Anatomical Dictionary for the Adult Mouse [Hayamizu et al., 2005]. This alignment was created in a combined manual and automatic effort (the automatic part exploited lexical and structural techniques), followed by an extensive manual curation step [Bodenreider et al., 2005].

The version of the alignment used in the OAEI 2010 anatomy track comprises 1,520 correspondences linking pairs of classes. The vast majority denote equivalentClass relations (a few subClassOf relations were added by the anatomy track organizers after the original alignment had been published).

Applying the Quality Checks

We obtained the following results when applying the nine basic quality checks described in the previous section to the anatomy reference alignment.

Check 1. In the OAEI anatomy track, the reference alignment is used together with a version of the NCI Thesaurus anatomy branch dated 2006-02-13 and a version of the MA dated 2007-01-18 (both in OWL format), whereas the alignment itself was created based on NCI Thesaurus release 04.09a (from 2004-09-10) and the MA version of 2004-11-22 [Bodenreider et al., 2005]. Hence, the release versions of the input ontologies used to create the reference alignment differ from those used to run the anatomy track.
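Check 1 boils down to comparing release identifiers. A minimal sketch, assuming the versions the alignment was built from and the versions distributed with it are recorded by hand as plain strings; the function name and dictionary keys are illustrative.

```python
from typing import Dict, Optional, Tuple


def version_mismatches(
        created_with: Dict[str, str],
        distributed_with: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Check 1: ontologies whose distributed release differs from (or is
    missing compared to) the release the alignment was created from."""
    return {name: (version, distributed_with.get(name))
            for name, version in created_with.items()
            if distributed_with.get(name) != version}


# Release information as reported above for the anatomy alignment.
created = {"NCI anatomy": "04.09a (2004-09-10)", "MA": "2004-11-22"}
distributed = {"NCI anatomy": "2006-02-13", "MA": "2007-01-18"}
for name, (original, used) in version_mismatches(created, distributed).items():
    print(f"{name}: created with {original}, distributed with {used}")
```

Any non-empty result means that checks 2 and 3 have to be run against the distributed versions.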

Check 2. All classes involved in the alignment are still contained in the newer versions of the input ontologies used in the anatomy track. Hence, class consistency is preserved.

Check 3. Although, in the version of the reference alignment used in the OAEI anatomy track, classes involved in correspondences are specified by URIs only (without class labels), we obtained from the curator of the alignment the original mapping table on which the alignment was based. This mapping table lists both the URIs and the labels of class pairs. We tested whether the URI-label combinations are still valid in the newer versions of the input ontologies and found 85 NCI classes and 34 MA classes whose labels had changed. A manual inspection revealed that in most cases labels had been made more precise in the new ontology versions (e.g., the label of class NCI_C12443 was changed from Cortex to Cerebral Cortex), had been replaced by synonyms (e.g., the label of class NCI_C33178 was changed from Nostril to External Nare), or had undergone minor spelling or syntactic modifications (e.g., the label of class MA_0000475 was changed from aortic arch to arch of aorta), while the meaning of the classes remained stable and the mappings were still valid. However, the check also pointed us to six mistakes in the alignment that seem to have been caused by shifts in the mapping table. For example, the class NCI_C49334 brain white matter was mapped to MA_0000810 brain grey matter, and NCI_C49333 brain gray matter to MA_0000820 brain white matter.

Check 4. The reference alignment is distributed in the Alignment API format [Euzenat, 2004] and can thus easily be accessed and used via the Java-based Alignment API.
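Because the format is RDF/XML, the correspondences can also be read without the Java API. The Python sketch below uses only the standard library's ElementTree and matches element names while ignoring their namespace, so that it does not depend on a particular alignment namespace URI. The sample document and its example.org URIs are hypothetical and simplified (real files carry more metadata, e.g., a typed confidence measure per cell), and the function names are our own.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def read_cells(xml_text: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Return (entity1, entity2, relation) for every Cell element."""
    cells = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "Cell":
            continue
        entity1 = entity2 = relation = None
        for child in elem:
            name = local_name(child.tag)
            if name == "entity1":
                entity1 = child.get(RDF_RESOURCE)
            elif name == "entity2":
                entity2 = child.get(RDF_RESOURCE)
            elif name == "relation":
                relation = (child.text or "").strip() or None
        cells.append((entity1, entity2, relation))
    return cells


# Simplified, hypothetical sample (example.org URIs are placeholders).
SAMPLE = """<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/human#NCI_C33460"/>
        <entity2 rdf:resource="http://example.org/mouse#MA_0002730"/>
        <measure>1.0</measure>
        <relation>=</relation>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""

print(read_cells(SAMPLE))
```

A `None` in the relation position of the result is exactly the situation that check 6 is meant to flag.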

Check 5. Classes are referred to by class URIs.

Check 6. For each correspondence, the relation holding between the two classes involved is explicitly specified.

Check 7. Looking at the equivalentClass relations expressed in the anatomy alignment, we found 17 NCI classes linked to more than one MA class (to three MA classes in one case and to two in all other cases) and 22 MA classes linked to more than one NCI class (to two in each case). We checked for equivalentClass relations between the respective target classes in the ontologies but found none. We therefore manually inspected all cases of multiple mapping targets. In 20 cases, the target classes in fact seem to be equivalent classes that are just not yet marked as such in the given versions of the respective ontologies. Cross-checking with the most recent versions of the input ontologies revealed that 12 of these target class pairs from the NCI have meanwhile been merged. For another three cases, we proposed mergers to the NCI team (for example, for the classes NCI_C33708 suprarenal artery and NCI_C52844 adrenal artery); these have already been accepted and will be considered for the next release. Furthermore, we found 18 cases in which the target classes were linked by relations other than equivalentClass in the respective ontologies: in 12 cases the target classes were linked by partOf relations, in four cases by subClassOf relations, and in two cases they were treated as sibling classes. We inspected these relations and judged the majority of them to be correct. From this we concluded that, for the classes concerned, only the mapping to one target class is correct, while the others should be removed from the alignment. In a few cases, we considered the relation found between the target classes to be incorrect.
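Sorting the non-equivalent cases into partOf, subClassOf and sibling relations amounts to looking up direct edges between the two target classes. A hedged sketch, assuming each ontology has already been loaded into plain dictionaries of direct superclasses and direct partOf parents (loading OWL is out of scope here); the function name and the class identifiers T0 to T4 are hypothetical.

```python
from typing import Dict, Set


def relation_between_targets(a: str, b: str,
                             parents: Dict[str, Set[str]],
                             part_of: Dict[str, Set[str]]) -> str:
    """Classify how two mapping targets are related within their ontology.

    `parents` maps a class to its direct superclasses, `part_of` to the
    classes it is directly part of. Only asserted, direct edges are used."""
    if b in part_of.get(a, set()) or a in part_of.get(b, set()):
        return "partOf"
    if b in parents.get(a, set()) or a in parents.get(b, set()):
        return "subClassOf"
    if parents.get(a, set()) & parents.get(b, set()):
        return "sibling"  # the two classes share a direct superclass
    return "unrelated"


# Hypothetical fragment: T2 is part of T1; T1, T3 and T4 share superclass T0.
parents = {"T1": {"T0"}, "T3": {"T0"}, "T4": {"T0"}}
part_of = {"T2": {"T1"}}
for pair in [("T1", "T2"), ("T3", "T4"), ("T0", "T3"), ("T2", "T4")]:
    print(pair, relation_between_targets(*pair, parents, part_of))
```

The order of the tests means that a pair linked by both a partOf and a subClassOf edge is reported as partOf; in practice each such pair would still be judged manually, as in the inspection described above.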

Check 8. After lowercasing all labels and removing underscores, we found 14 class pairs between the NCI Thesaurus anatomy branch and the MA ontology with identical labels that were not linked by an equivalentClass relationship in the reference alignment. A manual inspection revealed that in two cases the respective classes in fact referred to slightly differently defined concepts. For example, the classes MA_0000323 and NCI_C12378 share the label gastrointestinal system. However, the MA class fits the usual understanding of the gastrointestinal system as comprising the stomach, the intestine and the structures from mouth to anus, whereas the NCI class additionally includes accessory organs of digestion, such as the pancreas and the liver. (The NCI anatomy branch contains another class, NCI_C22510 gastrointestinal tract, which corresponds to the class MA_0000323.) In the remaining 12 cases, however, the equivalentClass relationships between the classes do seem to be missing from the alignment. An example is the class pair (NCI_C33460, MA_0002730) sharing the label renal papilla.

Check 9. We found that 937 out of 1,520 correspondences (62%) in the anatomy alignment are trivial ones.
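For reference, the trivial and non-trivial shares follow directly from these counts:

$$\frac{937}{1520} \approx 0.616 \approx 62\%, \qquad \frac{1520 - 937}{1520} = \frac{583}{1520} \approx 0.384 \approx 38\%$$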

Results and Discussion

The result of check 2 guarantees that, at least from a formal point of view, all correspondences in the anatomy reference alignment can be found by an automatic alignment system. This check was necessary since, as check 1 showed, the anatomy track uses more recent versions of the input ontologies than those on which the original alignment is based. It could well have been the case that classes had been removed or made obsolete in the newer versions.

The results of checks 4, 5 and 6 reflect the fact that the anatomy alignment serves as a reference data set in a public evaluation campaign. Unlike some more recent alignments that we have also reviewed, it is published in a community-accepted standard format, and classes and relations are referred to in a well-defined way.

Check 9 revealed that only 583 of the 1,520 correspondences in the alignment (about 38%) are non-trivial, i.e., cannot be detected by simple string matching tools. Since the alignment is quite large in terms of the number of correspondences, it nevertheless remains a valuable evaluation data set. However, the large percentage of trivial correspondences must be taken into account when interpreting the results that alignment systems achieve on this data set, or when comparing these results to those achieved by the same systems on other data sets.

By far the most interesting results came from checks 3, 7 and 8. In total, these checks helped us detect 30 erroneous correspondences that need to be removed from the reference alignment (2% of the complete alignment and 5% of the non-trivial subset) and 14 new ones that we propose to add to the alignment. The list of invalid and newly proposed correspondences has already been communicated to the curators of the anatomy alignment. In agreement with the organizers of the OAEI anatomy track, the confirmed changes will be considered in the 2011 OAEI campaign.
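The two error rates follow from the counts reported for the alignment (30 erroneous correspondences, 1,520 correspondences in total, of which 1520 - 937 = 583 are non-trivial):

$$\frac{30}{1520} \approx 0.020 \approx 2\%, \qquad \frac{30}{583} \approx 0.051 \approx 5\%$$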

An issue that we did not focus on in this paper is checking for the logical consistency of an alignment. With regard to this issue, we refer the reader to related work by Meilicke et al. [2009], who propose a Web-based tool that supports the human alignment curator in detecting and solving conflicts in an alignment by capitalizing on logical reasoning.

Conclusions

We presented nine basic quality requirements and associated checks intended to assist developers and curators of ontology alignments in creating and maintaining reference alignments for the evaluation of alignment systems that are both reliable and easy to (re)use. As we could show using the example of the anatomy reference alignment, very basic checks can already help detect both incorrect correspondences that should be removed from an alignment and missing correspondences that should be added. We also observed that the checks can reveal shortcomings in the input ontologies themselves, such as missing or invalid relations between classes.

The set of basic checks presented in this paper should be seen as a first, rather simple, yet effective step in a multi-stage procedure for extensively checking the quality of an alignment before it is used as a reference in an evaluation setting. Our work thus targets the soundness of comparison standards, an issue of prime importance for any conclusion drawn from the outcome of an evaluation campaign. We plan to complement the basic checks with more advanced logical consistency checks and more elaborate considerations of alignment quality, such as those proposed by Joslyn et al. [2009] for checking the structural preservation of semantic hierarchy alignments.

Acknowledgements

We would like to thank Terry Hayamizu (curator of the anatomy alignment) and Christian Meilicke (OAEI anatomy track organizing committee) for their active collaboration, and Stefan Schulz (Graz University of Medicine, Austria) for assisting us with anatomy questions. This work was funded under BMBF grant 0315581D as part of the JenAge project.

References

Bodenreider, O., Hayamizu, T. F., Ringwald, M., de Coronado, S., and Zhang, S. (2005). Of Mice and Men: Aligning mouse and human anatomies. Proceedings of the 2005 AMIA Annual Symposium, pp. 61–65.

Euzenat, J. (2004). An API for ontology alignment. Proceedings of the 3rd International Semantic Web Conference, pp. 698–712.

Euzenat, J. and Shvaiko, P. (2007). Ontology matching. Springer-Verlag.

Hayamizu, T. F., Mangan, M., Corradi, J., Kadin, J., and Ringwald, M. (2005). The Adult Mouse Anatomical Dictionary: A tool for annotating and integrating data. Genome Biology, 6(3): 1-8.

Joslyn, C., Paulson, P., and White, A.M. (2009). Measuring the structural preservation of semantic hierarchy alignment. Proceedings of the 4th International Workshop on Ontology Matching at the 8th International Semantic Web Conference.

Meilicke, C., Stuckenschmidt, H., and Svab-Zamazal, O. (2009). A reasoning-based support tool for ontology mapping evaluation. Proceedings of the 6th European Semantic Web Conference (Demo-Paper).