July 5, 2011

CALOHA: A new human anatomical ontology as a support for complex queries and tissue expression display in neXtProt.

neXtProt (http://www.nextprot.org) is a new bioinformatics resource that aims to be a comprehensive, human-centric discovery platform, offering its users seamless integration of, and navigation through, protein-related data. neXtProt integrates all high-quality human sequences and annotations from UniProtKB/Swiss-Prot (UniProt Consortium, 2010) and also contains a large amount of information mined from many external data resources under very stringent quality criteria. Every integrated data item carries a Gold or Silver quality tag.

Regarding human protein expression, neXtProt integrates data obtained on healthy tissues from two resources: microarray and EST data from BGee (http://bgee.unil.ch/), and immunohistochemistry data from the Human Protein Atlas (HPA) (http://www.proteinatlas.org/). These resources capture expression data with different experimental methods, at different levels of granularity, in partially overlapping anatomical structures ranging from cell types to organs, and at different developmental stages. In addition, different resources use different synonyms to describe the same entity. As a discovery platform, neXtProt aims (1) to describe all imported expression data at their original granularity (e.g. intestinal epithelium rather than intestine), (2) to compare datasets with different levels of granularity, and (3) to support complex queries about protein expression. A prerequisite for these objectives was a suitable ontology of human anatomy describing organs, tissues and cell types. This ontology had to contain all terms provided by the different expression resources, and to be complete enough to support users' queries yet simple enough to also serve as an expression viewer. This led us to develop CALOHA, the “CALipho Ontology for Human Anatomy”.

The current version of CALOHA contains 688 terms, including terms describing currently available data and additional terms that link these anatomical entities.
New terms will be added as new data are imported and as the requirements of neXtProt users' queries are identified. Each term has cross-references to eVoc, BRENDA and MeSH, and is associated with a wealth of synonyms collected from different resources. Definitions are imported from MeSH, the NCI Thesaurus, Wikipedia or the literature.
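
Because each term carries synonyms collected from several resources, labels arriving from BGee or HPA can be mapped onto a single CALOHA term through a synonym lookup. The Python sketch below is only an illustration under stated assumptions: the term IDs (EX:0001, EX:0002), the synonym lists, the case-and-whitespace normalization and the function names `build_synonym_index` and `resolve` are all hypothetical and are not part of CALOHA or neXtProt.

```python
# Hypothetical mini-vocabulary: IDs, names and synonyms are illustrative only.
TERMS = {
    "EX:0001": {"name": "intestinal epithelium",
                "synonyms": ["epithelium of intestine", "gut epithelium"]},
    "EX:0002": {"name": "retina", "synonyms": ["retinal tissue"]},
}


def normalize(label: str) -> str:
    """Case-fold and collapse runs of whitespace so trivial variants match."""
    return " ".join(label.casefold().split())


def build_synonym_index(terms: dict) -> dict:
    """Map every normalized name and synonym to the ID of its term."""
    index = {}
    for term_id, entry in terms.items():
        for label in [entry["name"], *entry["synonyms"]]:
            key = normalize(label)
            # A label shared by two different terms cannot be resolved unambiguously.
            if index.get(key, term_id) != term_id:
                raise ValueError(f"ambiguous label: {label!r}")
            index[key] = term_id
    return index


def resolve(label: str, index: dict):
    """Return the term ID for a resource-specific label, or None if unknown."""
    return index.get(normalize(label))


index = build_synonym_index(TERMS)
print(resolve("Gut  Epithelium", index))  # EX:0001
```

Rejecting labels shared by two terms is a deliberate (assumed) design choice: silently picking one would hide exactly the kind of synonym conflict the ontology is meant to resolve.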

The ontology is structured on two relationships, is_a and part_of, and is implemented in OBO format. It is organized into interconnected categories: anatomical systems (alimentary, circulatory, dermal, endocrine, exocrine, etc.), tissues (epithelium, mucosa, connective, lymphoid, etc.), cell types, fluids and secretions, and gestational structures (embryo, fetus, extraembryonic tissues and fluids). The ontology can therefore be browsed smoothly from a system down to its constituent organs, tissues and cell types.
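
To make the is_a/part_of structure concrete, here is a minimal Python sketch that reads `[Term]` stanzas from an OBO-format string and walks upward from a term through both relationships. It handles only the `id`, `name`, `is_a` and `relationship: part_of` tags; a real parser would need to cover much more of the OBO format. The fragment, its `EX:` IDs and the function names `parse_obo` and `ancestors` are hypothetical and do not reproduce actual CALOHA content.

```python
from collections import deque

# Hypothetical OBO fragment; IDs and hierarchy are illustrative, not real CALOHA content.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: alimentary system

[Term]
id: EX:0002
name: intestine
relationship: part_of EX:0001 ! alimentary system

[Term]
id: EX:0003
name: epithelium

[Term]
id: EX:0004
name: intestinal epithelium
is_a: EX:0003 ! epithelium
relationship: part_of EX:0002 ! intestine
"""


def parse_obo(text: str) -> dict:
    """Return {term_id: {"name": str, "parents": [(relation, parent_id), ...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(" ! ", 1)[0].strip()  # drop the trailing "! comment"
        if line.startswith("["):
            # Only [Term] stanzas are collected; other stanza types are skipped.
            current = {"name": None, "parents": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or tags outside a [Term] stanza
        key, value = line.split(": ", 1)
        if key == "id":
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            current["parents"].append(("is_a", value))
        elif key == "relationship":
            relation, target = value.split()[:2]
            if relation == "part_of":
                current["parents"].append(("part_of", target))
    return terms


def ancestors(term_id: str, terms: dict) -> set:
    """All terms reachable upward through is_a and part_of edges (breadth-first)."""
    seen = set()
    queue = deque([term_id])
    while queue:
        node = queue.popleft()
        for _, parent in terms.get(node, {}).get("parents", []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


terms = parse_obo(SAMPLE_OBO)
print(sorted(ancestors("EX:0004", terms)))  # ['EX:0001', 'EX:0002', 'EX:0003']
```

Following is_a and part_of together is a common choice when browsing or propagating annotations upward; whether a given application should mix the two relationships is an assumption of this sketch.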

This ontology has been implemented in neXtProt to support the capture of experimental information from HPA and BGee. It is used to reconcile data obtained at different granularity levels: in neXtProt expression tables, information captured at each level is also integrated into the upper levels, which facilitates visualization and allows comparison of experiments performed by different resources on different sub-structures of the same entity. To describe experimental results accurately, we complement this ontology with a controlled vocabulary for human developmental stages, from embryo to adulthood, maintained by F. Bastian and collaborators (Bastian et al., 2008) and downloadable from http://bgee.unil.ch/.
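
The upward integration of expression data can be sketched as a roll-up: every observation recorded at its original granularity also counts for all of its ancestor terms. In this hypothetical Python sketch, the parent map, the protein identifiers `P1`/`P2`, the ranking of Gold above Silver and the rule of keeping the best quality per protein and term are assumptions made for illustration, not neXtProt's actual procedure.

```python
from collections import defaultdict

# Hypothetical child -> parents map (is_a and part_of merged); not real CALOHA IDs.
PARENTS = {
    "intestinal epithelium": ["intestine", "epithelium"],
    "intestine": ["alimentary system"],
    "epithelium": [],
    "alimentary system": [],
}

# Assumed ordering of quality tags: a higher number means higher confidence.
RANK = {"silver": 1, "gold": 2}


def ancestors(term, parents):
    """Transitive closure of the parent map, excluding the term itself."""
    result, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(parents.get(t, []))
    return result


def roll_up(observations, parents):
    """Turn (protein, term, quality) observations into {term: {protein: best_quality}}.

    Each observation counts for its own term and for every ancestor term;
    when a protein reaches a term by several routes, the highest quality wins.
    """
    table = defaultdict(dict)
    for protein, term, quality in observations:
        for t in {term} | ancestors(term, parents):
            best = table[t].get(protein)
            if best is None or RANK[quality] > RANK[best]:
                table[t][protein] = quality
    return dict(table)


observations = [  # hypothetical records at their original granularity
    ("P1", "intestinal epithelium", "gold"),
    ("P2", "intestine", "silver"),
    ("P1", "intestine", "silver"),
]
table = roll_up(observations, PARENTS)
print(table["alimentary system"])  # {'P1': 'gold', 'P2': 'silver'}
```

The original, fine-grained record is never overwritten: "intestinal epithelium" keeps only its own observation, while "intestine" and "alimentary system" accumulate everything below them, which is what allows datasets of different granularity to be compared at a common level.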

Together with our stringent selection of integrated data and our quality grading system (Gold/Silver quality tags), the full integration of expression data from various sources across the complete human anatomy allows neXtProt users to obtain high-quality sets of proteins expressed in a given location. Such a set can be obtained by searching for a term under the topic ‘expression’ and selecting the desired Gold/Silver stringency. For example, the set of proteins known with high confidence to be expressed in the retina can be obtained with the following URL: http://www.nextprot.org/db/search#{f:expression,t:retina}
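
A query on a term can be answered by collecting observations made on that term or on any of its descendants, then filtering them by quality. The Python sketch below assumes, hypothetically, that selecting Gold returns only Gold observations while selecting Silver returns both Gold and Silver ones. The hierarchy, the protein identifiers and the `proteins_expressed_in` function are illustrative and do not describe the neXtProt search implementation.

```python
from collections import defaultdict

# Hypothetical child -> parents map; not real CALOHA content.
PARENTS = {
    "eye": [],
    "retina": ["eye"],
    "lens": ["eye"],
    "retinal ganglion cell": ["retina"],
    "photoreceptor cell": ["retina"],
}

# Assumed ordering of quality tags: higher means more confident.
RANK = {"silver": 1, "gold": 2}


def descendants(term, parents):
    """All terms below `term`, found by inverting the child -> parents map."""
    children = defaultdict(list)
    for child, plist in parents.items():
        for parent in plist:
            children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def proteins_expressed_in(term, observations, parents, min_quality="gold"):
    """Proteins observed in `term` or any descendant at or above `min_quality`."""
    scope = {term} | descendants(term, parents)
    threshold = RANK[min_quality]
    return {protein for protein, t, quality in observations
            if t in scope and RANK[quality] >= threshold}


observations = [  # hypothetical (protein, term, quality) records
    ("P1", "retinal ganglion cell", "gold"),
    ("P2", "photoreceptor cell", "silver"),
    ("P3", "lens", "gold"),
    ("P4", "retina", "gold"),
]
print(sorted(proteins_expressed_in("retina", observations, PARENTS)))  # ['P1', 'P4']
```

Relaxing the stringency to Silver in this sketch adds `P2`, while querying the broader term "eye" also picks up `P3` from the lens.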

In conclusion, CALOHA, a new human anatomical ontology, has been successfully used to reconcile expression data from heterogeneous resources. It has been implemented in neXtProt to support complex queries and tissue expression display. We intend to keep it up to date, if possible in collaboration with groups interested in using and/or developing such an ontology. It can be downloaded from ftp.nextprot.org.


Paula D. Duek, Anne Gleizes, Catherine Zwahlen, Anaïs Mottaz, Amos Bairoch and Lydie Lane*

CALIPHO Group, SIB – Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, 1211 Geneva 4
