Increased Expressivity of Gene Ontology Annotations

1 Introduction

Gene Ontology (GO) annotations can be modeled as pairwise associations between a gene or gene product G and a pre-coordinated class C from the ontology, augmented with additional metadata (evidence, provenance, etc.). The primary way to provide more fine-grained annotations is to request more specific terms, e.g. “B-cell apoptosis” as a subtype of “apoptosis”. The GO uses Web Ontology Language (OWL) reasoners and TermGenie (footnote 1) templates to create compositional terms instantly, but anonymous class expressions are still needed to avoid ontology inflation and to represent cases such as a kinase and its phosphorylated target. The GO Consortium has published version 2.0 of the Gene Association Format (GAF), which allows dynamic post-composition of classes at the time of annotation.

Authors

Christopher J. Mungall, Amelia Ireland, Valerie Wood, Midori Harris, David Hill, Paul D. Thomas, Emily Dimmer

Gene Ontology Consortium http://geneontology.org

2 Syntax and Semantics

In a standard GO annotation, G is associated with a single class C (col. 5 of the tabular GAF file), which must be a GO class. With GAF 2.0, the annotation provider can additionally supply zero or more relation label-class ID pairs of the form R(Y), separated by comma and pipe characters (col. 16; footnote 2). This has a standard OWL translation: each pipe-separated block becomes its own annotation, and within a block the comma-separated pairs are translated to an anonymous intersection of the main class and all relation-class pairs:

R1(Y1),R2(Y2),…,Rn(Yn) → C’ and R1’ some Y1’ and … and Rn’ some Yn’ (here X’ denotes the translation of an OBO identifier or a relation label to an OWL IRI).
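The translation rule above can be sketched in Python as follows. This is a minimal illustration, not the Consortium's implementation: the function name `translate_extension` and the Manchester-like output strings are assumptions, identifiers are left untranslated (no X’ step), and class IDs are assumed to contain no commas, pipes or parentheses.

```python
import re

# One relation-class pair, e.g. occurs_in(UBERON:0005270).
PAIR = re.compile(r"^\s*([A-Za-z_][\w\-]*)\(([^()]+)\)\s*$")


def translate_extension(main_class, extension):
    """Return one class-expression string per pipe-separated block.

    main_class: the col. 5 class ID (C).
    extension:  the raw col. 16 value; may be empty (zero pairs).
    """
    if not extension.strip():
        # No extension: the annotation is to C alone.
        return [main_class]

    expressions = []
    for block in extension.split("|"):      # each block is its own annotation
        parts = [main_class]
        for pair in block.split(","):       # pairs within a block are intersected
            match = PAIR.match(pair)
            if match is None:
                raise ValueError(f"not a relation(class) pair: {pair!r}")
            relation, filler = match.groups()
            parts.append(f"({relation} some {filler})")
        expressions.append(" and ".join(parts))
    return expressions
```

For instance, calling `translate_extension("C", "R1(Y1),R2(Y2)|R3(Y3)")` yields two expressions, one per pipe-separated block, the first intersecting C with both R1 and R2 restrictions.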

3 Example

The rat gene Has2 has an active role in hyaluronan biosynthesis in the renal cortex interstitium. Here G = (ID for) Has2 and C = (ID for) hyaluronan biosynthesis; the annotation extension column has the value “occurs_in(UBERON:0005270)”. This is equivalent to annotating G to a class defined as ‘hyaluronan biosynthesis’ and ‘occurs_in’ some ‘renal cortex interstitium’.
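As a rough illustration of the identifier translation for this example, the sketch below maps OBO IDs to IRIs and assembles the resulting class expression. It assumes the OBO Foundry PURL pattern (`http://purl.obolibrary.org/obo/PREFIX_LOCALID`), which the text above does not specify. The function names are hypothetical, the relation is left as its label, and `GO:0000000` is a placeholder standing in for the ID of hyaluronan biosynthesis, not a real term.

```python
# Assumed base IRI (OBO Foundry PURL pattern); not stated in the text above.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(obo_id):
    """Translate an OBO ID such as UBERON:0005270 to an IRI string."""
    prefix, sep, local = obo_id.partition(":")
    if not (sep and prefix and local):
        raise ValueError(f"not an OBO identifier: {obo_id!r}")
    return f"{OBO_BASE}{prefix}_{local}"


def extended_class(main_class_id, relation, filler_id):
    """Build 'C and (R some Y)' with class IDs rendered as IRIs."""
    return (
        f"<{obo_id_to_iri(main_class_id)}> and "
        f"({relation} some <{obo_id_to_iri(filler_id)}>)"
    )


# GO:0000000 is a placeholder for the hyaluronan biosynthesis class ID.
print(extended_class("GO:0000000", "occurs_in", "UBERON:0005270"))
```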

4 Availability

Annotations are available from the main GO downloads page (footnote 3). Currently, only the goa, pombase and mgi GAFs contain this information.

5 Applications

Currently, few applications make use of extended annotations. The GO Consortium has implemented code for translating annotations to OWL (footnote 4). The resulting class expressions can optionally be added to a dynamically generated ontology and pre-reasoned, which allows the extended annotations to be used in standard enrichment tools. In addition, standard techniques for computing the role-bounded Least Common Subsumer (Baader, 2003) can be applied to enhance this ontology.

6 Future work and Discussion

The current syntax may be extended to allow recursive nesting of OWL expressions. GO annotations are also being extended to allow for the specification of non-default relationships between genes/gene products and GO classes.

References

Christopher J. Mungall, Michael Bada, Tanya Z. Berardini, Jennifer Deegan, Amelia Ireland, Midori A. Harris, David P. Hill, and Jane Lomax. Cross-Product Extensions of the Gene Ontology. Journal of Biomedical Informatics, 44(1):80 – 86, 2011.

F. Baader. Least common subsumers and most specific concepts in a description logic with existential restrictions and terminological cycles. In Proc. of the 18th Int. Joint Conf. on Artificial Intelligence (IJCAI-03). Morgan Kaufmann, 2003.

Footnotes

1 http://go.termgenie.org

2 http://www.geneontology.org/GO.annotation.col_16.shtml

3 http://geneontology.org/GO.downloads.annotations.shtml

4 http://owltools.googlecode.com