Enzymes, as biological catalysts, are important and extremely intricate systems without which life as we know it could not exist. Enzyme complexity is a function of the protein sequence, the 3D structure and the mechanism of the reaction catalyzed. Whilst enzymes range in size from tens to thousands of amino acid residues, only a few residues are catalytically vital (the catalytic residues). These are found in the active site, a cleft that is often deeply buried in the protein. Information relating to these residues, identified using the enzyme’s atomic structure, is held in the Catalytic Site Atlas (CSA) [Porter,2004], while chemical reaction and mechanistic details are held in a sister database, MACiE [Holliday,2007]. Both databases use a controlled vocabulary; MACiE’s is the more detailed because it covers enzymes in much greater depth, including thorough descriptions of the chemical reaction steps performed. The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) [UniProt,2011] also captures enzyme-related data, at the broader protein sequence level, including information on catalytic residues. Its annotations are made both as free text and using an independently developed controlled vocabulary.
Whilst the CSA and MACiE resources have been developed somewhat in tandem and thus share a common data model, it is not currently simple to link them to enzyme annotations in other resources, primarily UniProtKB, because the definitions of enzyme properties and the vocabularies used to describe them differ. Although some of the information held in all three databases is described and defined in existing ontologies such as GO and the ChEBI ontology, marrying these and applying them uniformly to all three databases proved far from trivial.
In this paper we present the Enzyme Mechanism Ontology, EMO, which builds upon the controlled vocabulary developed for MACiE and the CSA and will be submitted to the OBO Foundry. That vocabulary was created to describe the active components of an enzyme’s reactions (cofactors, amino acid residues and cognate ligands) and their roles in the reaction. EMO extends it by formalizing the key concepts, and the relationships between them, that are necessary to define enzymes and their functions. The ontology describes not only the general features of an enzyme, including the EC number (catalytic activity), 3D structure and cellular locations, but also allows for detailed annotation of the mechanism. This mechanistic detail can be given either at a gross level (overall reaction only) or at the finer granularity of the steps and components required to effect the overall chemical transformation.
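As a rough illustration of the two levels of mechanistic detail described above, the following Python sketch models an enzyme with optional stepwise annotation. It is not the EMO schema: the class names, field names, role strings and the example enzyme are all hypothetical and chosen only to mirror the concepts named in the text (EC number, cellular locations, overall reaction, steps, components and their roles).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """An active component of a reaction (hypothetical structure)."""
    name: str
    kind: str  # e.g. "cofactor", "amino acid residue" or "cognate ligand"
    roles: List[str] = field(default_factory=list)


@dataclass
class Step:
    """One step of a mechanism, with the components that take part in it."""
    number: int
    description: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Enzyme:
    """General enzyme features plus optional mechanistic detail."""
    name: str
    ec_number: str
    cellular_locations: List[str] = field(default_factory=list)
    overall_reaction: str = ""
    steps: List[Step] = field(default_factory=list)

    def annotation_level(self) -> str:
        # "stepwise" if individual steps are annotated, otherwise only the
        # gross (overall reaction) level is available.
        return "stepwise" if self.steps else "overall"


def component_roles(enzyme: Enzyme) -> Dict[str, List[str]]:
    """Collect the roles played by each component across all steps."""
    roles: Dict[str, List[str]] = {}
    for step in enzyme.steps:
        for comp in step.components:
            seen = roles.setdefault(comp.name, [])
            for role in comp.roles:
                if role not in seen:
                    seen.append(role)
    return roles


# Purely illustrative data; not taken from MACiE, the CSA or UniProtKB.
example = Enzyme(
    name="hypothetical hydrolase",
    ec_number="3.-.-.-",
    cellular_locations=["cytoplasm"],
    overall_reaction="substrate + H2O = product A + product B",
    steps=[
        Step(1, "residue X attacks the substrate",
             [Component("residue X", "amino acid residue", ["nucleophile"])]),
        Step(2, "water hydrolyses the intermediate",
             [Component("residue X", "amino acid residue", ["leaving group"]),
              Component("metal ion", "cofactor", ["electrostatic stabiliser"])]),
    ],
)
```

An entry with no `steps` still carries its general features and overall reaction, which reflects the idea that an enzyme can be annotated at the gross level alone.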
EMO allows many different resources to be drawn together for a more complete description of an enzyme and its function and mechanism, even where data are only partially annotated in some resources. Communication between databases can be facilitated through the use of such a universal resource, which maps disparate terms to a common data model. To this end, EMO is being applied to the Enzyme Portal, a project in development within the EBI, which aims to provide a unified portal to all EBI enzyme-related resources. The ontology will also allow us to ask more general enzyme-related questions that are currently not trivial to address or that require queries to be run across many different databases. Questions such as which enzymes can be found in specific cellular compartments, or the exact nature and combination of cofactors an enzyme uses, can then be addressed in a coherent manner. It will also be possible to identify disease and drug associations relating to specific enzymes and to link this information back to more specific mechanistic details. Furthermore, it should enable automated classification of enzymes, and detection of their misclassification, based on mechanism.
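The idea of mapping disparate terms to a common data model, and then asking a question across resources, can be sketched as follows. This is a minimal illustration under stated assumptions: the resource names, the local terms, the shared terms and the records are all hypothetical, and the sketch does not reflect how EMO or the Enzyme Portal is implemented.

```python
from collections import defaultdict

# Hypothetical mapping: (resource, local term) -> shared term.
TERM_MAP = {
    ("resource_a", "active site"): "catalytic residue",
    ("resource_b", "catalytic residue"): "catalytic residue",
    ("resource_a", "subcellular location"): "cellular location",
    ("resource_b", "compartment"): "cellular location",
}


def normalise(resource, records):
    """Translate (enzyme, local term, value) records into shared terms.

    Records whose term has no mapping are skipped rather than guessed.
    """
    out = []
    for enzyme, term, value in records:
        shared = TERM_MAP.get((resource, term))
        if shared is not None:
            out.append((enzyme, shared, value))
    return out


def merge(*normalised):
    """Combine normalised records into enzyme -> shared term -> values."""
    merged = defaultdict(lambda: defaultdict(set))
    for records in normalised:
        for enzyme, term, value in records:
            merged[enzyme][term].add(value)
    return merged


def enzymes_in_compartment(merged, compartment):
    """Return, in sorted order, the enzymes annotated with a compartment."""
    return sorted(
        enzyme
        for enzyme, annotations in merged.items()
        if compartment in annotations.get("cellular location", set())
    )


# Purely illustrative records; each resource is only partially annotated.
records_a = [
    ("enzyme_1", "active site", "residue X"),
    ("enzyme_1", "subcellular location", "cytoplasm"),
]
records_b = [
    ("enzyme_1", "catalytic residue", "residue Y"),
    ("enzyme_2", "compartment", "mitochondrion"),
]

merged = merge(normalise("resource_a", records_a),
               normalise("resource_b", records_b))
```

Here `enzymes_in_compartment(merged, "cytoplasm")` draws on a location recorded in only one resource, while the catalytic residues of `enzyme_1` are pooled from both.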
This ontology, a collaboration between the UniProt Consortium, the CSA and MACiE, was created in an effort to standardize our vocabulary. It has permitted us to unify the various levels of detail held about similar information in each of our databases, and its implementation will also permit the many other users of enzyme data to cross-reference and share data with each other.
Julius O. B. Jacobsen, Nicholas Furnham and Gemma L. Holliday
EMBL-EBI, The Wellcome Trust Genome Campus, Hinxton, Cambs, CB10 1SD, UK
Holliday,G.L., Almonacid,D.E., Bartlett,G.J., O’Boyle,N.M., Torrance,J.W., Murray-Rust,P., Mitchell,J.B.O. and Thornton,J.M. (2007) MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms. Nucl. Acids Res., 35, D515-D520.
Porter,C.T., Bartlett,G.J. and Thornton,J.M. (2004) The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucl. Acids Res., 32, D129-D133.
The UniProt Consortium (2011) Ongoing and future developments at the Universal Protein Resource. Nucl. Acids Res., 39, D214-D219.