The EDAM ontology for bioinformatics tools and data

Motivation

Researchers demand simple and powerful means to organise, find, compare, use and connect an increasingly large and complex set of tool and data resources. These tasks depend on consistent, machine-understandable resource descriptions. There is an urgent need for an ontology that unifies semantically common bioinformatics concepts and provides a controlled vocabulary for the annotator.

Authors

Jon Ison*, Matus Kalas**, Steve Pettifer*** and Peter Rice*

* EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK

** Computational Biology Unit, Uni Computing, 5008 Bergen, Norway

*** School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK

EDAM ONTOLOGY

EDAM (Figure 1) includes 5 sub-ontologies (Table 1) within the scope of bioinformatics resource (tool and data) description. There are 5 types of EDAM-specific relationships (Table 2) which relate concepts from different branches.

EDAM provides a starting point for nomenclature and is ready for use in pilot annotations.

Table 1. EDAM sub-ontologies

Branch: Description

topic: A general field of bioinformatics study, analysis or technique, e.g. “Sequence analysis”, “Phylogenetics”

data: A type of data commonly used in bioinformatics, e.g. “Sequence alignment”, “Sequence record”

format: A commonly used data format, e.g. “FASTA”, “SAM”

identifier: A label that identifies (typically uniquely) biological or computational entities, e.g. “Ensembl ID”, “EC number”

operation: A specific, singular function performed by a tool. What is done, but typically not how or in what context, e.g. “Sequence alignment”, “Sequence database search”

EDAM includes 5 sub-ontologies which collectively define the scope.


Fig. 1. Sub-ontologies are shown in boxes and relations as arrows.

EDAM terms reflect well-established concepts and correspond to categories of things. The current version includes over 2000 concepts with names (terms) and definitions, 1000 EDAM-specific relations and 3000 is_a (subclass) relations.

Table 2. EDAM relations

Relation: Description

is_a: A (child) concept is a specialisation of its parent, e.g. “Pairwise sequence alignment is_a Sequence alignment”

in_topic: A concept (‘data’ or ‘operation’) is within scope of a ‘topic’, e.g. “Sequence alignment in_topic Sequence analysis”

has_input: An ‘operation’ consumes a certain type of ‘data’, e.g. “Sequence alignment has_input Sequence”

has_output: An ‘operation’ produces a certain type of ‘data’, e.g. “Sequence alignment has_output Sequence alignment”

is_format_of: A data ‘format’ is a format of a certain type of ‘data’, e.g. “FASTA is_format_of Sequence record”

is_identifier_of: A data ‘identifier’ is an identifier of a certain type of ‘data’, e.g. “EMBL accession is_identifier_of Sequence record”

EDAM includes 5 custom relations (in addition to is_a) which relate concepts from one branch (in quotes, e.g. ‘data’) to another.
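The branch constraints on the custom relations can be illustrated with a small Python sketch. It is not part of EDAM or of any EDAM tooling: the names `Concept`, `Triple`, `RELATION_SIGNATURES`, `is_valid` and `subjects_of` are hypothetical, the concept names are taken from the examples in Tables 1 and 2, and placing “Sequence alignment” in the ‘operation’ branch for the in_topic example is an assumption, since the name occurs in both ‘data’ and ‘operation’. The is_a relation is left out because it is not one of the custom cross-branch relations.

```python
from typing import NamedTuple

# Allowed (subject branches, object branch) for each custom relation,
# transcribed from the descriptions in Table 2.
RELATION_SIGNATURES = {
    "in_topic": ({"data", "operation"}, "topic"),
    "has_input": ({"operation"}, "data"),
    "has_output": ({"operation"}, "data"),
    "is_format_of": ({"format"}, "data"),
    "is_identifier_of": ({"identifier"}, "data"),
}


class Concept(NamedTuple):
    """A concept keyed by branch and name, since a name such as
    'Sequence alignment' can occur in more than one branch."""
    branch: str
    name: str


class Triple(NamedTuple):
    subject: Concept
    relation: str
    obj: Concept


def is_valid(triple: Triple) -> bool:
    """Return True if the triple's branches match the relation's signature."""
    signature = RELATION_SIGNATURES.get(triple.relation)
    if signature is None:
        return False  # unknown relation (is_a is deliberately not modelled)
    subject_branches, object_branch = signature
    return (triple.subject.branch in subject_branches
            and triple.obj.branch == object_branch)


# Example triples built from the examples given in Table 2.
TRIPLES = [
    # Assumption: the in_topic example refers to the 'operation' concept.
    Triple(Concept("operation", "Sequence alignment"), "in_topic",
           Concept("topic", "Sequence analysis")),
    Triple(Concept("operation", "Sequence alignment"), "has_input",
           Concept("data", "Sequence")),
    Triple(Concept("format", "FASTA"), "is_format_of",
           Concept("data", "Sequence record")),
    Triple(Concept("identifier", "EMBL accession"), "is_identifier_of",
           Concept("data", "Sequence record")),
]


def subjects_of(relation: str, obj: Concept) -> list[str]:
    """Names of all concepts linked to `obj` by `relation`."""
    return [t.subject.name for t in TRIPLES
            if t.relation == relation and t.obj == obj]
```

With these example triples, `subjects_of("is_format_of", Concept("data", "Sequence record"))` lists the formats recorded for sequence records, and `is_valid` rejects a triple such as a ‘format’ used as the subject of has_input.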

http://edamontology.sourceforge.net/

https://sourceforge.net/projects/edamontology/files/

http://bioportal.bioontology.org/ontologies/44600