Posted on October 4, 2012 by Phillip Lord

Query Enhancement through the Practical Application of Ontology: The IEDB and OBI

Abstract

Ontologies categorize entities, provide standardized definitions, and enforce specific relationships between the entities that the specific ontology describes. Previously, we described how the Immune Epitope Database and Analysis Resource (IEDB, www.iedb.org) used the Ontology for Biomedical Investigations (OBI) to represent immune epitope mapping experiments. Here, we present how the IEDB utilized this OBI representation in order to provide enhanced database search functionality. We have applied a simple method to incorporate the benefits of a formal ontology directly into the user web interface, resulting in an improved user experience with minimal changes to the database itself. The integration is easy to maintain, provides standardized terms and definitions, and presents the existing database content in a manner more accessible to the end users.

Authors

Randi Vita*, Jason A. Greenbaum, Alessandro Sette, OBI consortium, Bjoern Peters

Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA

Article

The IEDB is a free resource that makes published experimental data on the recognition of epitopes by adaptive immune receptors available in a searchable manner. This is accomplished by a team of curator scientists who read relevant manuscripts and extract the data using a consistent approach following established guidelines [1]. In order to represent the data in a systematic and interoperable manner, the IEDB actively contributes to OBI, an integrated ontology for the description of biological and clinical investigations [2]. This ontology represents the experimental design, protocols, materials, and instruments used in biomedical investigations, as well as the data generated and the types of analyses performed. Previously, we utilized OBI to model the immunological experiments described in the IEDB and export them into OBI, allowing reasoning and driving schema redesign [3, 4]. Here we describe our experience integrating OBI into the IEDB’s search interface. We believe this practical use of an ontology enhances search capabilities, increases consistency in data curation, avoids duplicates, and improves documentation.

To represent the experimental data described within a publication curated by the IEDB, the type of assay that was performed must be consistently and intuitively described. The IEDB began by surveying relevant epitope publications and itemizing the types of experiments that were encountered. Some assay types were straightforward and easily described, e.g. tetramer staining, while others were more complex, such as those in which the effect of epitope interventions on disease development in mice is assessed. A list of assay types was produced and used as a controlled vocabulary for curators to select from when adding new experiments to the database (Figure 1a). However, this list approach was quite limited and did not capture the inherent relationships between different assay types. For example, the same method may be used to measure different aspects of the same antibody:antigen binding event, as when different binding constants, such as the equilibrium association and dissociation constants, are both determined by surface plasmon resonance. Similarly, different methods such as ELISA and ELISPOT assays can be used to measure the same kind of biological effect, such as IFN-γ production by T cells. Thus, the different assay types utilized by the IEDB have relationships to each other that were not being adequately described.

In order to model the IEDB’s data using OBI, each assay type described in the database had to be incorporated into the ontology. This was an iterative and collaborative process between the IEDB and OBI, requiring careful consideration of term names, definitions, and relationships between terms. In OBI, an ‘assay’ is defined as ‘a planned process with the objective to produce information about an evaluant’, with the following logical definition:

Class: assay
    EquivalentTo:
        has_specified_input some (material_entity and (has_role some ‘evaluant role’))
        and has_specified_output some (‘information content entity’ and
            (‘is about’ some (continuant and (has_role some ‘evaluant role’))))

Thus, for each assay type, we identified the input materials, evaluants, outputs, and the information that was being detected (Figure 1b). Metadata such as a definition and an example are also required. In addition, these immunological assays measure a biological process or some readout that is a proxy for that process having occurred. Therefore, each assay was also linked to the Gene Ontology (GO) [5] ‘biological process’ that the ‘information content entity’ (the data generated) ‘is about’. Other external ontologies were also referenced as appropriate, such as the Unit Ontology (UO) [6] when an assay has a specified readout of known units, such as “nM”. Term requests to outside ontologies were sometimes required, for example when the biological process the assay describes was not already present within GO. The production of certain cytokines by T cells, including CCL4, CCL5, and CCL9, and antibody activities such as immune complex formation and neutralization of antigen are examples of terms that were added to GO based on our requests.

To generate the term names, logical definitions, and metadata for the hundreds of assay types used in the IEDB, the Quick Term Template (QTT) ontology tool was utilized [7]. This tool allows a large number of new terms to be uploaded into an ontology via an Excel spreadsheet. The QTT method allowed for efficient construction of standardized term names, logical restrictions, and definitions.
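
The template idea can be illustrated with a minimal Python sketch. It is not the QTT tool itself: the column names, the `uses_technique` relation, and the example rows are assumptions made for illustration, and the output is only a Manchester-syntax-like stub rather than a validated OWL axiom.

```python
import csv
import io

# Hypothetical template: one row per assay. The columns are illustrative only
# and do not reproduce the actual QTT spreadsheet layout.
TEMPLATE_CSV = """label,technique,go_process
IFN-gamma ELISPOT assay,ELISPOT,interferon-gamma production
IFN-gamma ELISA,ELISA,interferon-gamma production
"""


def row_to_term(row):
    """Expand one spreadsheet row into a Manchester-syntax-like class stub."""
    return (
        f"Class: '{row['label']}'\n"
        f"    SubClassOf:\n"
        f"        assay,\n"
        # 'uses_technique' is a made-up relation name for this sketch.
        f"        uses_technique some '{row['technique']}',\n"
        f"        has_specified_output some ('information content entity' and\n"
        f"            ('is about' some '{row['go_process']}'))"
    )


def expand_template(csv_text):
    """Apply the same pattern to every row of the template."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_term(row) for row in reader]


terms = expand_template(TEMPLATE_CSV)
print(terms[0])
```

Because every term is produced from the same pattern, names and logical restrictions stay uniform across the whole batch, which is the property the text attributes to the QTT method.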

The process of converting each IEDB assay type into an OBI term was followed by reasoning, which in some cases identified redundant assay types. For example, because new assay types were added to the previous assay list as they were encountered in the literature, one assay measuring ‘chemokine (C-X-C motif) ligand 9 release’ and one measuring ‘MIG release’ were separately added to the list. The process of creating logical definitions for these assays based on GO biological processes followed by reasoning identified that the two assays were logically equivalent as the two terms are in fact referring to the same cytokine.
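
A simplified sketch of this redundancy check is given below. It is a stand-in for OWL reasoning, not a reasoner: two assays are treated as equivalent when their (technique, measured process) pairs coincide after the measured process is resolved to a single identifier. The synonym table, the `GO:EXAMPLE_*` identifiers, and the assay labels are placeholders invented for the example.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for GO: both names of the same
# chemokine resolve to one placeholder process identifier.
GO_SYNONYMS = {
    "chemokine (c-x-c motif) ligand 9 release": "GO:EXAMPLE_1",
    "mig release": "GO:EXAMPLE_1",
    "ifn-g release": "GO:EXAMPLE_2",
}


def find_equivalent_assays(assays):
    """Group assay labels whose (technique, measured process) definitions coincide."""
    groups = defaultdict(list)
    for label, technique, measured in assays:
        key = (technique, GO_SYNONYMS[measured.lower()])
        groups[key].append(label)
    # Only groups with more than one label indicate redundant list entries.
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]


assays = [
    ("CXCL9 release ELISA", "ELISA", "chemokine (C-X-C motif) ligand 9 release"),
    ("MIG release ELISA", "ELISA", "MIG release"),
    ("IFN-g release ELISA", "ELISA", "IFN-g release"),
]
duplicates = find_equivalent_assays(assays)
print(duplicates)
```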

Having to specify exactly what distinguishes two assays, whether the biological process they measure or the technique applied, clarified our curation rules. An exact definition allows a meaningful discussion of which type of assay was actually used in an investigation, instead of an argument over labels for assays that lack definitions. In addition, we found that the hierarchical structure of an ontology is better suited to curation than a flat list of assays. If a manuscript is not specific enough for a curator to decide between two assays, the curator can now select the parent class of those assays instead. This improves curation consistency: the previous list format had no parent terms, so curators would arbitrarily select one of the two available choices.

Reasoning over the ontology produces an inferred version of the hierarchy in which an assay can appear in multiple locations within the tree. For example, all assays that use surface plasmon resonance will appear under the term ‘surface plasmon resonance assay’, regardless of what they measure (KA, KD, kon, etc.), while any surface plasmon resonance assay that also measures a KA will additionally appear under the term ‘equilibrium association constant (KA)’. Migrating the ontological tree format into the IEDB’s search interface was straightforward. First, each of the existing IEDB assay types was mapped to an OBI identifier. The assay finder web application in the IEDB reads the OBI definitions in Web Ontology Language (OWL) format using the Jena OWL application programming interface (API). The assay tree displayed in the IEDB finder shows only the subset of the ontology whose assay types are children of the parent term ‘immune epitope assay’, which defines the scope of the IEDB. The IEDB finder application is similar to tools provided by NCBO BioPortal, which allow user interfaces to be populated using ontologies housed in BioPortal.
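
How a multi-parent hierarchy turns into a display tree restricted to one root can be sketched as follows. This is plain Python rather than the Jena-based code used by the IEDB, and the small `PARENTS` table (including the term ‘SPR assay measuring KA’ and the out-of-scope ‘sequencing assay’) is a hypothetical fragment made up for the example.

```python
# Hypothetical fragment of an inferred hierarchy: child -> set of parents.
PARENTS = {
    "immune epitope assay": {"assay"},
    "surface plasmon resonance assay": {"immune epitope assay"},
    "equilibrium association constant (KA)": {"immune epitope assay"},
    "SPR assay measuring KA": {
        "surface plasmon resonance assay",
        "equilibrium association constant (KA)",
    },
    "sequencing assay": {"assay"},  # outside the IEDB scope in this sketch
}


def child_index(parents):
    """Invert the child -> parents mapping into parent -> children."""
    children = {}
    for child, parent_set in parents.items():
        for parent in parent_set:
            children.setdefault(parent, set()).add(child)
    return children


def tree_paths(root, parents):
    """List every root-to-term path; a term with two parents appears twice."""
    children = child_index(parents)
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        paths.append(" > ".join(path))
        for child in sorted(children.get(node, ())):
            walk(child, path)

    walk(root, [])
    return paths


paths = tree_paths("immune epitope assay", PARENTS)
for line in paths:
    print(line)
```

Starting the walk at ‘immune epitope assay’ is what restricts the tree to the IEDB scope: terms that hang only from the more general ‘assay’ class are never visited.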

This new ontological presentation of the assay types provides a hierarchy that we believe enhances search capabilities. End users are now able to view all of the previously curated data in a more meaningfully organized and intuitive manner (Figure 1c). Formerly, end users were not able to select all assays that shared a parent, such as all assays that measure KA. Users may now select all assays of a higher-level type, such as ELISA, or refine their criteria to a subset (ELISA with binding constant) or a single assay type (ELISA with KD). Additionally, search options now include both what is measured (GO biological process) and how it is measured (OBI assay type). Furthermore, new content is available: each assay type now links, via its OBI ID, to the metadata provided by OBI, giving end users the option of viewing formal definitions and examples and learning more about the term’s relationships.

We endeavored to further improve the end user experience by creating ‘iedb alternative terms’, which provide the names commonly used by immunologists to describe assay types. The formal OBI term names are often long, because they must clearly identify an assay within the vast scope covered by OBI. The IEDB alternative terms can be much shorter, as IEDB users already expect to be presented only with epitope-related assays. Thus the benefits of a formal ontology (i.e., the standardized definitions, hierarchical tree, and term relationships) are provided while we avoid confusing end users with ontological jargon.
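
The label choice described above amounts to a simple fallback rule, sketched here in Python. The `AssayTerm` structure, the placeholder identifiers, and the example labels are assumptions for illustration and are not taken from OBI or the IEDB code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssayTerm:
    obi_id: str  # placeholder, not a real OBI identifier
    label: str  # formal ontology term name (illustrative wording)
    iedb_alternative_term: Optional[str] = None  # short name for IEDB users


def display_label(term: AssayTerm) -> str:
    """Prefer the short IEDB name; fall back to the formal label."""
    return term.iedb_alternative_term or term.label


terms = [
    AssayTerm(
        "OBI:EXAMPLE_1",
        "enzyme-linked immunosorbent assay measuring epitope specific "
        "interferon-gamma production by T cells",
        "IFN-g release ELISA",
    ),
    AssayTerm("OBI:EXAMPLE_2", "surface plasmon resonance assay"),
]
menu = [display_label(t) for t in terms]
print(menu)
```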

Formal representation of all of the IEDB’s assay types within OBI permits export of the curated data in a semantically expressive format. A process has been implemented to export the content of the IEDB into an OWL file using the Perl Template Toolkit. The file is then imported into the Virtuoso triple store so that the data can be queried using the SPARQL query language. The SPARQL endpoint is currently under active development, and its URL will be made available on the IEDB website in the near future. The benefit of this format is that, by importing the ontologies used in the OBI term definitions, queries across resources can be performed. For example, any assay method used to measure a certain GO biological process can now be queried using the complete GO hierarchy. For instance, it becomes possible to query for assays that measure ‘chemokine responses’ and distinguish them from other ‘cytokine responses’, even though the IEDB itself does not distinguish which cytokines are chemokines. We envision that the SPARQL endpoint will initially be used as a proof of concept by the relatively few users familiar with this technology. It will allow us and others to test whether biological queries requested by end users can indeed be better answered by this approach. Queries that are deemed useful will be integrated into the standard IEDB web interface, which does not require our end users to have any knowledge of the ontology.
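
The kind of hierarchy-aware query described above might look like the sketch below, which only builds a SPARQL query string in Python. The `ex:` prefix, the `ex:measures` predicate, and the example IRI are hypothetical; the actual vocabulary of the exported IEDB graph is not given in this article. The `rdfs:subClassOf*` property path is standard SPARQL 1.1 and matches a class together with all of its subclasses.

```python
from string import Template

# The ex: prefix and ex:measures predicate are placeholders; the exported
# IEDB graph may use a different vocabulary.
QUERY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/iedb-sketch#>

SELECT DISTINCT ?assay ?process
WHERE {
  ?assay ex:measures ?process .
  ?process rdfs:subClassOf* <$process_iri> .
}
""")


def build_assay_query(process_iri):
    """Return a query for assays measuring a process or any of its subclasses."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in process_iri for ch in '<>" {}|\\^`'):
        raise ValueError("not a safe IRI")
    return QUERY_TEMPLATE.substitute(process_iri=process_iri)


query = build_assay_query("http://example.org/go-sketch#chemokine_production")
print(query)
```

Passing the IRI of a general process class would return assays linked to any of its subclasses, which is how a query could separate chemokine assays from other cytokine assays without the IEDB storing that distinction.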

Another significant future benefit of integrating a formal ontology into the IEDB is the creation of rule-based validation. The logical restrictions and definitions of terms in OBI can be used to formulate curation rules that can be enforced via the ontology. For instance, if an assay type is defined in OBI as having a specified input of some virus, then the curator must enter an input variable, the antigen, that is a virus. These rules can be extended to the external ontologies, such as GO. For example, if GO defines a certain cytokine as being produced only by CD4+ T cells, then an assay measuring that cytokine should not have CD8+ T cells curated as the effector cell.
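
A minimal sketch of such a validation rule follows. The class hierarchy, the assay name ‘virus neutralization assay’, and its input restriction are hypothetical stand-ins for what would be read from OBI; the check itself is a plain walk up a single-parent hierarchy, not OWL reasoning.

```python
# Hypothetical class hierarchy (child -> parent) and per-assay input restriction.
IS_A = {
    "influenza A virus": "virus",
    "virus": "organism",
    "Mycobacterium tuberculosis": "bacterium",
    "bacterium": "organism",
}
REQUIRED_INPUT = {"virus neutralization assay": "virus"}


def is_subclass_of(term, ancestor):
    """Walk up the hierarchy until the ancestor is found or the root is passed."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def validate(assay_type, antigen):
    """Return a list of rule violations for one curated record."""
    required = REQUIRED_INPUT.get(assay_type)
    if required is None or is_subclass_of(antigen, required):
        return []
    return [f"{assay_type}: antigen '{antigen}' is not a kind of '{required}'"]


ok = validate("virus neutralization assay", "influenza A virus")
bad = validate("virus neutralization assay", "Mycobacterium tuberculosis")
print(ok, bad)
```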

As the IEDB encounters new assay types in the literature, each is added to OBI using the same QTT approach as before. Thus, new assays can be added to OBI quickly and easily. Once a new OBI.owl file is generated, this file simply replaces the existing one in use by the IEDB’s search interface.

The overall effort involved in the implementation described here breaks down as follows. Replacing the 244 assays in the IEDB with corresponding terms in OBI took a total of 4 man-months. The majority of this work went into analyzing how to model the assays in our list formally and precisely, and how they relate to each other, which involved reading papers in which they were used and talking to experts in the field. The technical steps of integration were largely a one-off effort to enable our developers to read OWL files, which took 2 man-months. Going forward, updates are integrated into the build process and require no human intervention.

The IEDB plans to integrate formal ontologies throughout its search interface. Projects include modeling of MHC molecules, disease states, geographic locations, mouse phenotypes, proteins, tissues, and cells. Many of these projects overlap with existing ontologies. Wherever possible, we plan to collaborate with the existing projects and to link to other resources through ontological identifiers. Once accomplished, we hope these ontological presentations of existing data within the IEDB will enhance search capabilities, increase consistency, and make the end user experience simpler and more intuitive. We believe that the problems we have encountered and the benefits we have found in utilizing ontologies to drive the user interface are of general relevance. By comparing our experience with that of others working on similar problems, such as the eagle-i discovery system [8], it should be possible to develop best practices for how databases can utilize ontologies to reap direct benefits for their user interfaces.

Acknowledgements

We gratefully acknowledge the IEDB team and funding by the National Institutes of Health contract HHSN2272201200010C.

References

1. Vita R, Peters B, Sette A. The curation guidelines of the immune epitope database and analysis resource. Cytometry A. 2008 Nov;73(11):1066-70.

2. The OBI Consortium http://purl.obolibrary.org/obo/obi

3. Greenbaum JA, Vita R, Zarebski L, Emami H, Sette A, Ruttenberg A, Peters B. ONTology of Immune Epitopes (ONTIE) Representing the Immune Epitope Database in OWL. The 12th Annual Bio-Ontologies Meeting, ISMB 2009, pages 45–48.

4. Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B. The immune epitope database 2.0. Nucleic Acids Res. 2010 Jan;38(Database issue):D854-62. Epub 2009 Nov 11.

5. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25-9.

6. http://code.google.com/p/unit-ontology/

7. Peters B, Ruttenberg A, Greenbaum J, Courtot M, Brinkman R, Whetzel P, Schober D, Sansone SA, Scheuermann R, Rocca-Serra P. Overcoming the Ontology Enrichment Bottleneck with Quick Term Templates. Nature Precedings. 12 Nov 2009. doi:10.1038/npre.2009.3970.1.

8. Vasilevsky N, Johnson T, Corday K, Torniai C, Brush M, Segerdell E, Wilson M, Shaffer C, Robinson D, Haendel M. Research resources: curating the new eagle-i discovery system. Database (Oxford). 2012 Mar 20;2012:bar067.

Footnotes

* rvita@liai.org

Appendix

Fig. 1. Conversion of IEDB assay types from a static list to OBI terms, which are used to generate a new search interface based on a hierarchical tree.
