{ "cells": [ { "cell_type": "markdown", "id": "2d50d540", "metadata": {}, "source": [ "# Using the automatic cross-reference expansion" ] }, { "cell_type": "markdown", "id": "413a4b2a", "metadata": {}, "source": [ "CobraMod provides the function `add_crossreferences`, which automatically adds missing cross-references to the metabolites and reactions of a model, provided they contain at least one valid identifier that MetaNetX [MetaNetX Ref] recognises. This identifier must either be the object ID or be included in the object's annotations. By default, the function `add_pathway` calls `add_crossreferences`, gathers the available cross-references, and adds them to each object before the object is added to the model. Additionally, `add_crossreferences` can be used with metabolites, reactions, and groups that are already in the model. This way reactions, metabolites, groups, and even entire models can be annotated with cross-references, as demonstrated below." ] }, { "cell_type": "markdown", "id": "a7b8574a", "metadata": {}, "source": [ "### The internal cross-referencing procedure\n", "\n", "Cross-references are retrieved using MetaNetX. The EC numbers (Enzyme Commission numbers) are taken from the 'reac_prop' file provided by MetaNetX. This file lists all reactions in MetaNetX together with their MetaNetX IDs and EC numbers. It is downloaded and cached once per Python instance. 
MetaNetX.org contains cross-references for the following databases [MetaNetX Ref]:" ] }, { "cell_type": "markdown", "id": "e62a28c5", "metadata": {}, "source": [ "| Database | Metabolites | Reactions |\n", "| --- | --- | --- |\n", "| BiGG | x | x |\n", "| ChEBI | x | |\n", "| enviPath | x | |\n", "| HMDB | x | |\n", "| KEGG | x | x |\n", "| LipidMaps | x | |\n", "| MetaCyc | x | x |\n", "| Reactome | x | |\n", "| Rhea | | x |\n", "| SABIO-RK | x | x |\n", "| SwissLipids | x | |\n", "| The SEED | x | x |" ] }, { "cell_type": "markdown", "id": "fd543dbe", "metadata": {}, "source": [ "In addition to the references listed in the table, the InChI and the InChIKey are added for metabolites, and the EC numbers are added for reactions. This information is also taken from MetaNetX. If a metabolite has an InChIKey, 'pubchem.compound' references are retrieved directly from PubChem [Kim2020 Ref]. Additionally, BRENDA [Brenda Ref] identifiers are added for reactions that have EC numbers." ] }, { "cell_type": "markdown", "id": "80ed79d6", "metadata": {}, "source": [ "### Caching\n", "\n", "To ensure that models are reproducible and to speed up repeated operations, the cross-reference component of CobraMod has a built-in cache. During the search for missing cross-references, all retrieved results are stored in the XRef folder of the specified data directory. For performance and compatibility reasons, these files use the Apache Arrow Feather format [arrow Ref]. The cache avoids retrieving the same references twice and makes it possible to share all files needed to reproduce the original result.\n", "\n", "The disadvantage of such a cache is that the locally stored references are not checked against the servers, where they may have changed in the meantime. To obtain the latest cross-references, delete the XRef folder; CobraMod will then recreate all necessary files and retrieve the newest cross-references." 
] }, { "cell_type": "markdown", "id": "8e3609c3", "metadata": {}, "source": [ "## Extending the annotations of different COBRApy objects\n", "\n", "In the following, we first extend a metabolite and then a reaction with `add_crossreferences`. Finally, we expand the annotations of the default model provided by Memote." ] }, { "cell_type": "markdown", "id": "d6aac3ae", "metadata": {}, "source": [ "### Metabolite" ] }, { "cell_type": "markdown", "id": "81ef7169", "metadata": {}, "source": [ "First, we generate a metabolite that we want to annotate and look at its existing cross-references." ] }, { "cell_type": "code", "execution_count": 12, "id": "de82f58d", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/jan/arbeit/memote_test/memote-model-repository_forXRef\n" ] }, { "data": { "text/html": [ "\n", "
| Metabolite identifier | 13dpg_c | \n", "
| Name | 3-Phospho-D-glyceroyl phosphate | \n", "
| Memory address | \n", "0x07f1b40fe57d0 | \n", "
| Formula | C3H4O10P2 | \n", "
| Compartment | c | \n", "
| In 2 reaction(s) | \n", " PGK, GAPD | \n", "
| Reaction identifier | ACALD | \n", "
| Name | acetaldehyde dehydrogenase (acetylating) | \n", "
| Memory address | \n", "0x07f1b40d0ffd0 | \n", "
| Stoichiometry | \n", "\n",
" acald_c + coa_c + nad_c <=> accoa_c + h_c + nadh_c \n", "Acetaldehyde + Coenzyme A + Nicotinamide adenine dinucleotide <=> Acetyl-CoA + H+ + Nicotinamide adenine dinucleotide - reduced \n", " | \n",
"
| GPR | b0351 or b1241 | \n", "
| Lower bound | -1000.0 | \n", "
| Upper bound | 1000.0 | \n", "