Posted on July 12, 2011 by Simon Cockell

Using Multiple Ontologies to Annotate and Integrate Phenotype Records from Multiple Sources


Motivation: The completion of finished and draft sequences for model organisms such as rat has been followed by multiple SNP and knockout projects, as well as the complete genome sequencing of a variety of strains exhibiting a vast array of phenotypes. While there have been several large-scale phenotyping projects, in general the data has not been integrated, and the majority of phenotype measurement data remains scattered, with only a small proportion of it available in the published literature. Because laboratories use various strains, methods and experimental protocols, phenotype data has been difficult to integrate. Described here is a multiple-ontology approach to standardizing and integrating data from multiple laboratories using various protocols.



Mary Shimoyama*, Rajni Nigam, Melinda Dwinell

Medical College of Wisconsin, Milwaukee, Wisconsin


The potential value of integrating phenotype data from multiple sources (different laboratories, varying techniques to measure similar phenotypes, multiple strains) is enormous. The power to identify novel genes associated with human disease is greatly increased by including phenome data, since environment, experimental conditions and background genome can all have a significant impact on phenotype. The inclusion of environmental and experimental context increases the success of generating phenome-genome relationships for understanding the role of genes in disease [1]. However, most phenotype data is gathered or generated without thought to integrating the results with those from other studies, even within the same laboratory, which creates a barrier to integrating and comparing results reported in publications. Experimental conditions, strain (genetic background), age and time course (multiple measurements made across time or under different experimental conditions) all contribute to the difficulty in comparing phenotype data from multiple sources. For example, the comparison of blood pressure measured in different laboratories or programs can be affected by the way in which blood pressure is measured (e.g., direct measurement via catheter in artery, telemetry, blood pressure cuff), the conditions under which the animals have been housed (e.g., low salt/high salt diet, chemicals in water), surgical manipulations (e.g., removal of a kidney), gender and age.


Two approaches are currently used to house rat or mouse phenotype data. One approach is to ensure that all data for a phenotype measurement is collected using a standard operating procedure with baseline conditions [2, 3]. The procedures, the assay method used, sample information and other details are contained within Standard Operating Procedure documents and are not part of the phenotype data records. Users have access to all the data in the database for cross-strain comparisons, but only for these limited sets of assays and experimental conditions. A second approach is to accept data from multiple projects but to keep them as separate datasets, giving the user access to a single project's data at a time [4, 5]. This allows access to data from multiple projects, including those with varying experimental conditions and assays, but it does not allow the data to be truly integrated because of the lack of standardized formats and labeling, nor does it allow the user to compare data from multiple projects easily.


Fig 1. Components for standardizing and integrating phenotype data.

Presented here are four ontologies that provide the backbone of a standardized format for integrating phenotype measurement data from multiple sources. This system is designed to accommodate mammalian model organisms such as rat and mouse.



Multiple ontologies were developed to address standardization of the four major elements of phenotype measurement records: 1) who was measured, 2) what was measured, 3) how it was measured, and 4) under what conditions it was measured (Fig 1). All ontologies are available through the National Center for Biomedical Ontology BioPortal (http://bioportal.bioontology.org/) and the Rat Genome Database FTP site (http://rgd.mcw.edu/pub/ontology/).
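The four elements above can be pictured as four annotation slots on each measurement record. The following is only a minimal sketch under stated assumptions: the class names, field names and the "0000000" accessions are hypothetical placeholders, not the schema or identifiers actually used by the authors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OntologyTerm:
    """A reference to one ontology term (placeholder accessions only)."""
    term_id: str  # e.g. "CMO:0000000" is a made-up ID, not a real accession
    label: str


@dataclass(frozen=True)
class PhenotypeRecord:
    """One measurement annotated along the four axes described in the text."""
    strain: OntologyTerm                   # 1) who was measured (RS)
    measurement: OntologyTerm              # 2) what was measured (CMO)
    method: OntologyTerm                   # 3) how it was measured (MMO)
    conditions: Tuple[OntologyTerm, ...]   # 4) under what conditions (XCO)
    value: float
    units: str


def annotation_ids(record: PhenotypeRecord) -> List[str]:
    """Return every ontology ID attached to a record, in axis order."""
    ids = [record.strain.term_id, record.measurement.term_id, record.method.term_id]
    ids.extend(term.term_id for term in record.conditions)
    return ids


# Illustrative record; the value and all IDs are invented for this sketch.
record = PhenotypeRecord(
    strain=OntologyTerm("RS:0000000", "BN/NHsdMcwi"),
    measurement=OntologyTerm("CMO:0000000", "systolic blood pressure"),
    method=OntologyTerm("MMO:0000000", "telemetry"),
    conditions=(OntologyTerm("XCO:0000000", "high salt diet"),),
    value=120.0,
    units="mmHg",
)
```

Keeping the conditions as a sequence reflects the point made in the text that a single measurement may be taken under several imposed conditions at once.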

Rat Strain Ontology

The Rat Strain Ontology (RS) (Fig 2) was created to standardize nomenclature and organize strains according to type of strain: inbred, outbred, mutant, congenic, and consomic, and also according to breeding history. It presents a hierarchy for parental strains, substrains and those with portions of the parental genetic background to allow users to retrieve and compare annotations and phenotype records for groups of related strains and also provides them with the ability to distinguish between substrains which may exhibit subtle genetic differences.
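Retrieving records for a group of related strains amounts to collecting a term together with everything beneath it in the hierarchy. The sketch below assumes a toy child-to-parent table; apart from BN/NHsdMcwi, which appears later in this paper, the strain names, the links and the values are hypothetical and do not reproduce the real RS structure.

```python
# Toy child -> parent links (assumed for illustration, not taken from RS).
PARENT_OF = {
    "BN": "inbred strain",
    "BN/NHsdMcwi": "BN",
    "BN/Hypothetical2": "BN",
    "StrainX": "inbred strain",
}


def with_descendants(term, parent_of):
    """Return the term plus all terms below it in the hierarchy."""
    # Invert the table so each parent lists its direct children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


# (strain, value) pairs with invented values.
records = [
    ("BN/NHsdMcwi", 118.0),
    ("BN/Hypothetical2", 121.0),
    ("StrainX", 140.0),
]

# Select every record annotated to BN or to any of its substrains.
bn_group = with_descendants("BN", PARENT_OF)
bn_records = [rec for rec in records if rec[0] in bn_group]
```

Querying on "BN" returns both substrain records, while querying on a single substrain name keeps them apart, which is the distinction the ontology is meant to preserve.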

Fig 2. Rat Strain Ontology

Fig 3. Clinical Measurement Ontology

Clinical Measurement Ontology

The Clinical Measurement Ontology (CMO) (Fig 3) provides the standardized vocabulary necessary to indicate the type of measurement made to assess a particular trait. Each term in CMO describes a distinct type of measurement used to assess one or more traits. The terms are arranged in a hierarchy of classes organized, at the higher levels, according to the body system in which the measurement is made. The ontology is designed to address phenotype measurements commonly made in both clinical and research settings, for humans and model organisms alike. This affords the opportunity to integrate data from the medical record with data gathered as part of clinical trials or research, and also facilitates comparisons across organisms.

Measurement Method Ontology

An important component of a phenotype measurement record is identification of the method used to make the measurement, since results can vary based on method. The Measurement Method Ontology (MMO) was created to standardize method classification using the mechanism of the method as the underlying principle for organization (Fig 4). It is organized around two major branches: “ex vivo method” and “in vivo method”. This ontology was developed in parallel with the Clinical Measurement Ontology as phenotype measurement areas were addressed. Methods were identified from publications, experimental protocols, laboratory manuals and vendors’ catalogues.


Fig 4. Measurement Method Ontology

Experimental Condition Ontology

Changing experimental conditions to identify the effect on particular clinical measurements is a common part of research design, so it is important to capture this information in a standardized way to allow data to be compared across studies. The Experimental Condition Ontology (XCO) (Fig 5) was created to provide structure and standardization for the variety of experimental conditions which are typically imposed in model organism and clinical research projects. These include such factors as diet, oxygen and carbon dioxide levels, drugs and chemicals, activity and body position, as well as surgical interventions. In cases such as chemicals and drugs, the organization and terminology were borrowed from existing sources such as ChEBI [6], with the inclusion of appropriate identifiers for reference and integration.

Fig 5. Experimental Condition Ontology



The four ontologies created here are currently used in two projects involving rat and human data. PhenoMiner is a project to integrate phenotype measurement data for the laboratory rat from multiple sources and is housed at the Rat Genome Database (http://rgd.mcw.edu/phenotypes/). Over 14,000 records have been mapped to the four ontologies and integrated into a single database from two large-scale phenotype sources, PhysGen (http://pga.mcw.edu/) and the National BioResource Project for the Rat in Japan (http://www.anim.med.kyoto-u.ac.jp/nbr/), and from published literature. The innovative query and data display tools leverage the power of the ontologies so that researchers can create and filter queries and manage data returns and displays easily. Figure 6 illustrates the strength of ontology-driven data integration. Although the data represented is present in the PhysGen resource, users can only access it one protocol at a time and would have to download the data and devise their own system for examining sex differences across experimental conditions. Through the PhenoMiner resource, a single query allows users to identify these differences at a glance.

Fig 6. Sex differences for BN/NHsdMcwi across multiple conditions
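Once records from different protocols share the same condition terms, a comparison of sex differences across conditions reduces to a grouping step. This is a rough sketch only: the condition labels and every number are invented for illustration and are not PhenoMiner data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (condition, sex, value) rows for a single strain; all values are made up.
rows = [
    ("control diet", "female", 100.0),
    ("control diet", "female", 104.0),
    ("control diet", "male", 110.0),
    ("high salt diet", "female", 112.0),
    ("high salt diet", "male", 126.0),
    ("high salt diet", "male", 130.0),
]


def mean_by_condition_and_sex(rows):
    """Average the values within each (condition, sex) group."""
    groups = defaultdict(list)
    for condition, sex, value in rows:
        groups[(condition, sex)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def sex_difference(means):
    """Male mean minus female mean, for conditions where both sexes occur."""
    conditions = sorted({condition for condition, _ in means})
    return {
        condition: means[(condition, "male")] - means[(condition, "female")]
        for condition in conditions
        if (condition, "male") in means and (condition, "female") in means
    }


differences = sex_difference(mean_by_condition_and_sex(rows))
```

The grouping only works because every row uses the same vocabulary for conditions, which is what the shared ontology annotation provides.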


The Clinical Measurement Ontology, Measurement Method Ontology and Experimental Condition Ontology are also being used by the Cardiovascular Ontologies and Vocabularies in Epidemiological Research (COVER) project, which integrates demographic and phenotype measurement data from three large-scale family blood pressure studies. By using the ontologies to map data elements from each of the studies to a common format, the project has to date integrated records for 8,778 subjects spanning over 100 phenotype measurement types (http://cover.wustl.edu/Cover/). The ontologies have proven successful in standardizing phenotype measurement data regardless of technology platform.
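Mapping data elements from each study to a common format can be thought of as a per-study lookup table from local field names to shared terms. The sketch below is an assumption-laden illustration: the study names, local field names and target labels are hypothetical and are not drawn from the COVER project.

```python
# Per-study mapping from local field names to shared labels (all hypothetical).
FIELD_MAP = {
    "study_a": {"sbp_mmhg": "systolic blood pressure", "subj": "subject_id"},
    "study_b": {"SYSBP": "systolic blood pressure", "ID": "subject_id"},
}


def to_common_format(study, raw):
    """Rename a study's local fields to the shared labels; drop unmapped ones."""
    mapping = FIELD_MAP[study]
    common = {"source_study": study}
    for field, value in raw.items():
        if field in mapping:
            common[mapping[field]] = value
    return common


# Two differently shaped source rows end up with the same keys.
row_a = to_common_format("study_a", {"sbp_mmhg": 120, "subj": "a1", "extra": 1})
row_b = to_common_format("study_b", {"SYSBP": 135, "ID": "b7"})
```

In practice the shared labels would be ontology term identifiers rather than free-text names, so that the integrated records can be queried through the ontology hierarchy.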

Creating structures to integrate phenotype measurement data from multiple sources is an important task as investigators draw on the strength of the genomic and sequence variation resources to identify underlying genotype factors related to phenotypes and diseases. In order to make these connections, researchers need to easily access and analyze phenotype measurement data related to individuals and various model strains, along with information on experimental conditions and methodologies that may affect the measurement values. Employing multiple ontologies to standardize data formats facilitates the integration of these vital datasets and provides the structure on which innovative data mining, analysis and presentation tools can be built. These types of resources can provide researchers with a more accurate picture of phenotype variations among populations, as well as of the impact that measurement methods may have on measurement results. The influence of experimental and environmental conditions on phenotypes and disease will also be easier to elucidate when researchers have access to large numbers of measurements from a wide variety of studies. This is an important step in helping investigators link genotypes to phenotypes.


The authors would like to acknowledge the efforts of the Rat Genome Database curators and bioinformatics staff.


Butte AJ, Kohane IS. (2006) Creation and implication of a phenome-genome network. Nat Biotechnol. 24:55-62.


Mashimo T, Voigt B, Kuramoto T, Serikawa T. (2005) Rat phenome project: the untapped potential of existing rat strains. J Appl Physiol. 98(1):371-9.


Mallon AM, Blake A, Hancock JM. (2008) EuroPhenome and EMPReSS: online mouse phenotyping resource. Nucleic Acids Res. 36(Database issue):D715-8.


Kwitek AE, Jacob HJ, Baker JE, Dwinell MR, Forster HV, et al. (2006) BN phenome: detailed characterization of the cardiovascular, renal, and pulmonary systems of the sequenced rat. Physiol Genomics. 25(2):303-13.


Bogue MA, Grubb SC, Maddatu TP, Bult CJ. (2007) Mouse Phenome Database (MPD). Nucleic Acids Res. 35(Database issue):D643-9.


Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcantara R, Darsow M, Guedj M, Ashburner M. (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 36(Database issue):D344-D350.
