Background to Semantic Annotations

Biological models are often constructed with more attention to the mathematical detail of their encoding than to the biological concepts that they represent.

Consider models in the online BioModels repository. These models are encoded in the Systems Biology Markup Language (SBML), which describes the states (“species” in SBML) and processes (“reactions” in SBML) that make up a dynamical system. From a purely mathematical perspective, this state/process representation can be translated directly into a set of differential equations (for deterministic simulations) or master equations (for stochastic simulations). Thus, the following would be a valid SBML model (using Antimony syntax):

# a reaction J0 converts A to B
var species A, B, C
J0: A -> B; k*A*B*C
# variable initializations
A = 10
B = 0
C = 1
k = 1
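
As a rough illustration of the translation into differential equations mentioned above, the reaction J0 with rate law k*A*B*C implies dA/dt = -k*A*B*C and dB/dt = +k*A*B*C, while C stays constant because it appears only in the rate law. The Python sketch below writes these equations out by hand. The function name rhs is hypothetical, and this is not meant to reflect how SBML tools derive equations internally.

```python
def rhs(A, B, C, k):
    """Right-hand side of the ODEs implied by  J0: A -> B; k*A*B*C."""
    rate = k * A * B * C   # rate law of reaction J0
    dA_dt = -rate          # J0 consumes A
    dB_dt = rate           # J0 produces B
    dC_dt = 0.0            # C is neither consumed nor produced
    return dA_dt, dB_dt, dC_dt


# Evaluate the derivatives at the initial values listed in the model.
# Because B starts at 0, the rate law k*A*B*C evaluates to zero here.
print(rhs(A=10, B=0, C=1, k=1))
```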

However, this example is completely nonsensical from a biological perspective. What are the quantities A, B, and C? What is the process J0? To encode this information, SBML uses controlled-vocabulary (CV) terms to connect model elements to resources, which are URIs that point to ontology terms describing what the elements are intended to represent in the physical world. In the preceding example, assume that the reaction in question is the conversion of phosphoenolpyruvate (A) to pyruvate (B) by the enzyme pyruvate kinase (C). Chemical entities such as metabolites can be described via the ChEBI database: the term CHEBI:18021 describes phosphoenolpyruvate and CHEBI:15361 describes pyruvate. The pyruvate kinase enzyme can be described by the Protein Ontology (PR) as PR_000023655 (if the amino acid sequence or organism of origin is not important) or by UniProt (if the amino acid sequence or organism is important). Using these ontology terms, we can encode the chemical identity of the variables in the model, but several key pieces of biological information are still missing. Where does this reaction take place? What type(s) of cell does it occur in? Some of this information can also be encoded in or extracted from SBML, with some difficulty, but not in a form suitable for automated semantic logic (such as would be possible using OWL).
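
To make the idea of connecting model elements to resources more concrete, the following Python sketch represents each annotation as a (subject, predicate, object) triple using only the ontology terms named above. Several details are assumptions made for illustration: the model URI is a placeholder, the helper name annotate is hypothetical, the object URIs are assumed to follow the identifiers.org pattern, and the predicate is assumed to be the bqbiol:is qualifier. It is a sketch of the general shape of such annotations, not the encoding produced by any particular tool.

```python
# Assumed predicate: the "is" qualifier from the BioModels.net biology qualifiers.
BQBIOL_IS = "http://biomodels.net/biology-qualifiers/is"

# Ontology terms named in the text, keyed by model variable.
TERMS = {
    "A": "CHEBI:18021",   # phosphoenolpyruvate
    "B": "CHEBI:15361",   # pyruvate
    "C": "PR:000023655",  # pyruvate kinase (PR_000023655 in the text)
}


def annotate(model_uri, terms):
    """Return one (subject, predicate, object) triple per annotated variable."""
    triples = []
    for variable, term in sorted(terms.items()):
        subject = f"{model_uri}#{variable}"             # hypothetical element URI
        resource = f"https://identifiers.org/{term}"    # assumed URI pattern
        triples.append((subject, BQBIOL_IS, resource))
    return triples


# Placeholder model location, used only for this illustration.
for triple in annotate("http://example.org/model.xml", TERMS):
    print(triple)
```

Triples of this kind capture what each variable is, but they say nothing about where the reaction occurs or in which cell type, which is the gap described above.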

Semantic annotations play an even more important role in CellML models. Unlike SBML, CellML does not have a structured way of specifying what model elements represent (abstractions such as species, reactions, and compartments are lost). Thus, semantic annotations are the only way to establish biological meaning in CellML models.

To address these drawbacks, we previously developed SemSim/SemGen. SemSim is a library for working with semantic annotations in SBML and CellML models, and SemGen is a GUI application for annotating models [2]. Both SemSim and SemGen were written in Java. This project (libOmexMeta) aims to provide a C++ implementation, with Python support via an extension module, and a leaner, reduced feature set.

In both projects (Java and C++), the main goal is to provide a tool for working with composite annotations, which are “super-structures” composed of multiple RDF triples. Composite annotations are designed to address the limitations of current annotation systems in CellML and SBML. We have previously described the benefits and use cases of composite annotations [1, 3].

References

1: John H Gennari, Maxwell L Neal, Michal Galdzicki, and Daniel L Cook. Multiple ontologies in action: composite annotations for biosimulation models. Journal of Biomedical Informatics, 44(1):146–154, 2011.
2: Maxwell L Neal, Christopher T Thompson, Karam G Kim, Ryan C James, Daniel L Cook, Brian E Carlson, and John H Gennari. SemGen: a tool for semantics-based annotation and composition of biosimulation models. Bioinformatics, 35(9):1600–1602, 09 2018. doi:10.1093/bioinformatics/bty829.
3: Maxwell Lewis Neal, Matthias König, David Nickerson, Göksel Mısırlı, Reza Kalbasi, Andreas Dräger, Koray Atalag, Vijayalakshmi Chelliah, Michael T Cooling, Daniel L Cook, Sharon Crook, Miguel de Alba, Samuel H Friedman, Alan Garny, John H Gennari, Padraig Gleeson, Martin Golebiewski, Michael Hucka, Nick Juty, Chris Myers, Brett G Olivier, Herbert M Sauro, Martin Scharm, Jacky L Snoep, Vasundra Touré, Anil Wipat, Olaf Wolkenhauer, and Dagmar Waltemath. Harmonizing semantic annotations for computational models in biology. Briefings in Bioinformatics, 20(2):540–550, 11 2018. doi:10.1093/bib/bby087.