1.4. Using the automatic cross-reference expansion

CobraMod contains a function add_crossreferences that automatically adds missing cross-references to the metabolites and reactions of a model, provided each object contains at least one valid identifier that MetaNetX can recognise [2, 3, 4, 5]. This identifier must either be the object ID or be included in the object’s annotations. By default, add_pathway calls add_crossreferences, which gathers the available cross-references and adds them to the respective objects before these are added to the model. Additionally, add_crossreferences can be used with metabolites, reactions, and groups that are already in the model. This way, reactions, metabolites, groups, and even entire models can be annotated with cross-references, as demonstrated below.
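The requirement that a usable identifier is either the object ID or one of the annotation values can be pictured with a small sketch. The helper candidate_identifiers is hypothetical and is not part of CobraMod; it only assumes an object with an id attribute and an annotation dictionary, which is imitated here with SimpleNamespace so that the snippet runs without a model.

```python
from types import SimpleNamespace


def candidate_identifiers(obj):
    """Hypothetical helper: list the ID and all annotation values of an object."""
    candidates = [obj.id]
    for value in obj.annotation.values():
        # Annotation values are either a single string or a list of strings.
        if isinstance(value, str):
            candidates.append(value)
        else:
            candidates.extend(value)
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))


# Stand-in for a metabolite; the values mirror part of the textbook example.
metabolite = SimpleNamespace(
    id="13dpg_c",
    annotation={
        "bigg.metabolite": "13dpg",
        "chebi": ["CHEBI:16001", "CHEBI:1658"],
    },
)
print(candidate_identifiers(metabolite))
```

Any of the listed strings could serve as the starting point for a lookup; how CobraMod selects among them internally is not shown here.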

1.4.1. The internal cross-referencing procedure

Cross-references are retrieved using MetaNetX. The EC numbers (Enzyme Commission numbers) are obtained from the ‘reac_prop’ file provided by MetaNetX. This file contains all reactions listed in MetaNetX, the corresponding MetaNetX IDs, and their EC numbers. It is downloaded and cached once per Python instance. MetaNetX.org contains cross-references for the following databases [2, 3, 4, 5]:

Database       Metabolites   Reactions
BIGG           x             x
ChEBI          x
enviPath       x
HMDB           x
KEGG           x             x
LipidMaps      x
MetaCyc        x             x
Reactome       x
Rhea                         x
SABIO-RK       x             x
SwissLipids    x
The SEED       x             x
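The EC lookup from the ‘reac_prop’ file and the once-per-instance caching can be sketched as follows. This is a minimal illustration and not CobraMod's implementation: the column layout (ID, mnx_equation, reference, classifs, is_balanced, is_transport, with EC numbers separated by ‘;’ in classifs) is an assumption about the file format, the sample rows are hand-written placeholders rather than real MetaNetX records, and functools.lru_cache stands in for whatever caching CobraMod uses.

```python
import csv
import io
from functools import lru_cache

# Hand-written sample in the ASSUMED reac_prop layout. The IDs, equations and
# references are placeholders, not real MetaNetX records.
SAMPLE_REAC_PROP = (
    "#ID\tmnx_equation\treference\tclassifs\tis_balanced\tis_transport\n"
    "MNXR_A\t(equation omitted)\tsourceA:rxn1\t1.2.1.10\tB\t\n"
    "MNXR_B\t(equation omitted)\tsourceB:rxn2\t2.7.2.3;1.1.1.1\tB\t\n"
    "MNXR_C\t(equation omitted)\tsourceC:rxn3\t\tB\t\n"
)


@lru_cache(maxsize=1)
def load_ec_table():
    """Parse the table once; later calls reuse the cached dictionary."""
    table = {}
    reader = csv.reader(io.StringIO(SAMPLE_REAC_PROP), delimiter="\t")
    for row in reader:
        # Skip empty lines and the commented header line.
        if not row or row[0].startswith("#"):
            continue
        mnx_id, classifs = row[0], row[3]
        table[mnx_id] = [ec for ec in classifs.split(";") if ec]
    return table


def ec_numbers(mnx_id):
    """Return the EC numbers for a MetaNetX reaction ID (empty if unknown)."""
    return load_ec_table().get(mnx_id, [])


print(ec_numbers("MNXR_B"))
```

In a real setting the in-memory sample would be replaced by the downloaded file, while the parsing and caching pattern would stay the same.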

In addition to the references listed in the table, the InChI and the InChIKey are added for metabolites, and the EC numbers are added for reactions. This information is also taken from MetaNetX. If a metabolite contains an InChIKey, ‘pubchem.compound’ references are retrieved directly from PubChem [6]. Additionally, BRENDA [7] identifiers are added for reactions that include EC numbers.
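As an illustration of an InChIKey-based PubChem lookup, the following sketch uses the public PubChem PUG REST service. Whether CobraMod uses this exact route is an assumption; the function names are hypothetical, and the sample payload is hand-written to imitate the response shape rather than recorded from the server. The network call itself is defined but not executed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_url(inchikey):
    """Build the PUG REST URL that lists compound IDs (CIDs) for an InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def parse_cids(payload):
    """Extract the CIDs from a decoded JSON response as strings."""
    return [str(cid) for cid in payload.get("IdentifierList", {}).get("CID", [])]


def fetch_cids(inchikey, timeout=10):
    """Query PubChem over the network (defined for completeness, not run here)."""
    with urlopen(cid_url(inchikey), timeout=timeout) as response:
        return parse_cids(json.load(response))


# Hand-written payload imitating the response shape; not a recorded response.
sample_payload = {"IdentifierList": {"CID": [683]}}
print(cid_url("LJQLQCAXBUHEAZ-UHFFFAOYSA-N"))
print(parse_cids(sample_payload))
```

Storing the CIDs as strings matches the way the ‘pubchem.compound’ values appear in the annotation dictionaries.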

1.4.2. Caching

To ensure reproducibility of the models and to speed up the repetition of actions already performed, the cross-reference component of CobraMod has a built-in cache. During the search for missing cross-references, all retrieved results are stored in the XRef folder of the specified data directory. For performance and compatibility reasons, these files use the Apache Arrow Feather format [8]. The cache avoids retrieving the same references twice and makes it possible to share all files needed to reproduce the original result.

The disadvantage of such a cache is that the locally stored references are not checked against the servers, where they may change over time. To obtain the latest cross-references, delete the XRef folder; CobraMod will then recreate all necessary files and retrieve the newest cross-references.
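Deleting the cache can be done by hand or with a few lines of Python. The sketch below assumes only what is stated above, namely that the cache lives in a folder called XRef inside the data directory; clear_xref_cache is a hypothetical helper and not part of CobraMod. The demonstration runs on a temporary directory so that no real cache is touched.

```python
import shutil
import tempfile
from pathlib import Path


def clear_xref_cache(data_directory):
    """Delete <data_directory>/XRef if it exists; return True if it was removed."""
    xref = Path(data_directory) / "XRef"
    if xref.is_dir():
        shutil.rmtree(xref)
        return True
    return False


# Demonstration on a throwaway directory with an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "XRef"
    cache.mkdir()
    (cache / "example.feather").write_bytes(b"")
    print(clear_xref_cache(tmp))  # the folder existed and is removed
    print(clear_xref_cache(tmp))  # nothing left to remove
```

Keeping a copy of the XRef folder before deleting it preserves the state needed to reproduce earlier results.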

1.4.2.1. Extending the annotations of different CobraPy objects.

In the following, we first extend a metabolite and then a reaction with ‘add_crossreferences’. Then the annotations of the default model provided by Memote are expanded.

1.4.3. Metabolite

First, we generate a metabolite that we want to annotate and look at its existing cross-references.

[12]:
from cobramod.test import textbook
from cobramod.core.crossreferences import add_crossreferences

directory = "/home/jan/arbeit/memote_test/memote-model-repository_forXRef"
%cd $directory

model = textbook.copy()
metabolite = model.metabolites[0]
metabolite
/home/jan/arbeit/memote_test/memote-model-repository_forXRef
[12]:
Metabolite identifier: 13dpg_c
Name: 3-Phospho-D-glyceroyl phosphate
Memory address: 0x07f1b40fe57d0
Formula: C3H4O10P2
Compartment: c
In 2 reaction(s): PGK, GAPD
[2]:
metabolite.annotation
[2]:
{'bigg.metabolite': '13dpg',
 'biocyc': 'DPG',
 'chebi': ['CHEBI:16001',
  'CHEBI:1658',
  'CHEBI:20189',
  'CHEBI:57604',
  'CHEBI:11881'],
 'hmdb': 'HMDB01270',
 'kegg.compound': 'C00236',
 'pubchem.substance': '3535',
 'reactome': 'REACT_29800',
 'seed.compound': 'cpd00203',
 'unipathway.compound': 'UPC00236'}

Now we execute the function ‘add_crossreferences’ and display the cross-references again. The argument ‘consider_sub_elements’ has no influence at this point because, unlike reactions and whole models, metabolites do not contain further reactions or metabolites. The ‘include_metanetx_specific_ec’ argument specifies whether MetaNetX-specific EC numbers should be included. Finally, the ‘directory’ argument defines the location of the cache.

[3]:
add_crossreferences(metabolite,
                    directory=directory + "/data",
                    consider_sub_elements=True,
                    include_metanetx_specific_ec=False)
[4]:
metabolite.annotation
[4]:
{'bigg.metabolite': '13dpg',
 'biocyc': 'DPG',
 'chebi': ['CHEBI:89363',
  'CHEBI:57604',
  'CHEBI:1658',
  'CHEBI:11881',
  'CHEBI:16001',
  'CHEBI:20189'],
 'hmdb': ['HMDB0062758', 'HMDB01270', 'HMDB62758', 'HMDB0001270'],
 'kegg.compound': 'C00236',
 'pubchem.substance': '3535',
 'reactome': ['REACT_29800', 'R-ALL-29800'],
 'seed.compound': 'cpd00203',
 'unipathway.compound': 'UPC00236',
 'reactomem': 'R-ALL-29800',
 'sabiork.compound': ['29', '21215'],
 'biggm': ['M_13dpg', '13dpg'],
 'sabiorkm': ['29', '21215'],
 'keggc': ['M_C00236', 'C00236'],
 'metacyc.compound': 'DPG',
 'seedm': ['M_cpd00203', 'cpd00203'],
 'inchikey': ['LJQLQCAXBUHEAZ-UHFFFAOYSA-N', 'LJQLQCAXBUHEAZ-UWTATZPHSA-J'],
 'inchi': ['InChI=1S/C3H8O10P2/c4-2(1-12-14(6,7)8)3(5)13-15(9,10)11/h2,4H,1H2,(H2,6,7,8)(H2,9,10,11)',
  'InChI=1S/C3H8O10P2/c4-2(1-12-14(6,7)8)3(5)13-15(9,10)11/h2,4H,1H2,(H2,6,7,8)(H2,9,10,11)/p-4/t2-/m1/s1'],
 'metacycm': 'DPG',
 'pubchem.compound': ['683', '46878409']}
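To see at a glance what the call added, the annotation dictionaries from before and after can be compared. The helper added_annotations below is a hypothetical convenience function and not part of CobraMod or COBRApy; the two dictionaries are a shortened excerpt of the outputs shown above.

```python
def as_set(value):
    """Normalise an annotation value (string or list of strings) to a set."""
    return {value} if isinstance(value, str) else set(value)


def added_annotations(before, after):
    """Return, per annotation key, the identifiers present only in `after`."""
    added = {}
    for key, value in after.items():
        new = as_set(value) - as_set(before.get(key, []))
        if new:
            added[key] = sorted(new)
    return added


# Shortened excerpt of the annotations shown before and after the call.
before = {
    "hmdb": "HMDB01270",
    "reactome": "REACT_29800",
    "kegg.compound": "C00236",
}
after = {
    "hmdb": ["HMDB0062758", "HMDB01270", "HMDB62758", "HMDB0001270"],
    "reactome": ["REACT_29800", "R-ALL-29800"],
    "kegg.compound": "C00236",
    "metacyc.compound": "DPG",
}
print(added_annotations(before, after))
```

With real objects, a copy of the annotation taken before the call (for example with dict(metabolite.annotation)) can serve as the first argument.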

1.4.4. Reaction

Here we repeat the procedure from before but this time we use a reaction instead of a metabolite.

[5]:
reaction = model.reactions[0]
reaction
[5]:
Reaction identifier: ACALD
Name: acetaldehyde dehydrogenase (acetylating)
Memory address: 0x07f1b40d0ffd0
Stoichiometry:

acald_c + coa_c + nad_c <=> accoa_c + h_c + nadh_c

Acetaldehyde + Coenzyme A + Nicotinamide adenine dinucleotide <=> Acetyl-CoA + H+ + Nicotinamide adenine dinucleotide - reduced

GPR: b0351 or b1241
Lower bound: -1000.0
Upper bound: 1000.0
[6]:
reaction.annotation
[6]:
{'bigg.reaction': 'ACALD'}

This time the argument ‘consider_sub_elements’ does influence the function, since it determines whether the annotations of the metabolites of this reaction are also expanded.

[7]:
add_crossreferences(reaction,
                    directory=directory + "/data",
                    consider_sub_elements=True,
                    include_metanetx_specific_ec=False)
reaction.annotation
[7]:
{'bigg.reaction': ['ACALDh', 'R_ACALDh', 'R_ACALD', 'ACALD'],
 'rhea': ['23288', '23289', '23290', '23291'],
 'biggr': ['ACALDh', 'R_ACALD', 'R_ACALDh', 'ACALD'],
 'seedr': ['rxn32711', 'rxn27656', 'rxn00171', 'rxn32710'],
 'sabiorkr': '163',
 'sabiork.reaction': '163',
 'metacyc.reaction': 'ACETALD-DEHYDROG-RXN',
 'kegg.reaction': 'R00228',
 'metacycr': 'ACETALD-DEHYDROG-RXN',
 'rhear': ['23288', '23289', '23290', '23291'],
 'seed.reaction': ['rxn32711', 'rxn27656', 'rxn00171', 'rxn32710'],
 'keggr': 'R00228'}

1.4.5. Model

[21]:
directory = "/home/jan/arbeit/memote_test/memote-model-repository_forXRef"
%cd $directory

from cobra.io import write_sbml_model, validate_sbml_model
from cobramod.core.crossreferences import add_crossreferences

model, errors = validate_sbml_model("model.xml")
/home/jan/arbeit/memote_test/memote-model-repository_forXRef

Again we run ‘add_crossreferences’. This time we get a progress bar because the cross-reference extension of a whole model generally takes some time.

[22]:
add_crossreferences(model,directory + "/data")
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 172/172 [02:12<00:00,  1.30it/s]
[23]:
write_sbml_model(model,"model_with_Xref.xml")

Now we use Memote to create a report that compares the model before and after expanding the annotations. We will then display this report.

[37]:
!memote report diff --filename _static/xref.html model.xml model_with_Xref.xml
Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled
The current solver interface glpk doesn't support setting the optimality tolerance.
The current solver interface glpk doesn't support setting the optimality tolerance.
warning: https://identifiers.org/inchi/ does not conform to 'http(s)://identifiers.org/collection/id' or'http(s)://identifiers.org/COLLECTION:id
warning: https://identifiers.org/inchikey/ does not conform to 'http(s)://identifiers.org/collection/id' or'http(s)://identifiers.org/COLLECTION:id
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
rootdir: /home/jan
plugins: anyio-3.3.0
collected 146 items / 1 skipped / 145 selected
[...]
=============== 63 failed, 64 passed, 20 skipped in 2.19 seconds ===============
=============== 65 failed, 62 passed, 20 skipped in 2.22 seconds ===============
Writing diff report to '_static/xref.html'.

We move the file index.html to prevent problems with files that have the same name.

[1]:
from IPython.display import IFrame
IFrame(src='./_static/xref.html', width="100%", height=800)
[1]:

It should be noted that in the “Metabolite Annotation Conformity Per Database” section of the report, both InChI and HMDB can perform poorly. For InChI, this is because CobraMod adds all available InChIs based on the existing IDs, which can include InChIs that are shared by several compounds. For HMDB, the poor performance is due to Memote classifying HMDB identifiers with more than five digits as incorrect. However, the current definition on identifiers.org does not limit the number of digits in HMDB identifiers, and HMDB contains valid identifiers with more than five digits. For this reason, CobraMod adds the HMDB IDs even if this lowers the Memote score.
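The effect of a digit limit on HMDB identifiers can be reproduced with two regular expressions. Both patterns are illustrative assumptions written for this sketch: the strict one accepts exactly five digits and the permissive one accepts any number of digits; neither is copied from Memote or identifiers.org. The identifiers are the HMDB values from the metabolite example above.

```python
import re

# ASSUMED patterns for illustration only.
FIVE_DIGITS = re.compile(r"^HMDB\d{5}$")  # strict: exactly five digits
ANY_DIGITS = re.compile(r"^HMDB\d+$")     # permissive: one or more digits

hmdb_ids = ["HMDB0062758", "HMDB01270", "HMDB62758", "HMDB0001270"]

strict_ok = [i for i in hmdb_ids if FIVE_DIGITS.match(i)]
permissive_ok = [i for i in hmdb_ids if ANY_DIGITS.match(i)]

print("accepted by the strict pattern:    ", strict_ok)
print("accepted by the permissive pattern:", permissive_ok)
```

Under the strict pattern only the two five-digit identifiers pass, so the seven-digit forms count as non-conforming even though they appear among the retrieved cross-references.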

[1]

Christian Lieven et al., MEMOTE for standardized genome-scale metabolic model testing. Nature Biotechnology, 38(3):272–276, March 2020. doi:10.1038/s41587-020-0446-y.

[2]

Sébastien Moretti et al., MetaNetX/MNXref: unified namespace for metabolites and biochemical reactions in the context of metabolic models. Nucleic Acids Research, 49(D1):D570–D574, November 2020. doi:10.1093/nar/gkaa992.

[3]

Sébastien Moretti et al., MetaNetX/MNXref – reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Research, 44(D1):D523–D526, November 2015. doi:10.1093/nar/gkv1117.

[4]

M. Ganter et al., MetaNetX.org: a website and repository for accessing, analysing and manipulating metabolic networks. Bioinformatics, 29(6):815–816, January 2013. doi:10.1093/bioinformatics/btt036.

[5]

T. Bernard et al., Reconciliation of metabolites and biochemical reactions for metabolic networks. Briefings in Bioinformatics, 15(1):123–135, November 2012. doi:10.1093/bib/bbs058.

[6]

Sunghwan Kim et al., PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Research, 49(D1):D1388–D1395, November 2020. doi:10.1093/nar/gkaa971.

[7]

Antje Chang et al., BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Research, 49(D1):D498–D508, November 2020. doi:10.1093/nar/gkaa1025.

[8]

Apache Software Foundation, Apache Arrow Version 6.0.1, November 2021. URL: https://arrow.apache.org/.