{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "XV2pICiM740l" }, "source": [ "# Functions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jObtNBXPw30t", "outputId": "00ae3942-af4a-4e90-f0f7-6d3626269813" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CobraMod version: 1.2.0\n", "COBRApy version: 0.29.0\n" ] } ], "source": [ "from IPython.display import display\n", "from cobramod import __version__\n", "from cobra import __version__ as cobra_version\n", "print(f'CobraMod version: {__version__}')\n", "print(f'COBRApy version: {cobra_version}')\n", "# From Escher:\n", "# This option turns off the warning message if you leave or refresh this page\n", "import escher\n", "escher.rc['never_ask_before_quit'] = True" ] }, { "cell_type": "markdown", "metadata": { "id": "tjphNJOQF7B3" }, "source": [ "## Retrieving metabolic pathway information\n", "\n", "CobraMod can retrieve metabolic pathway information (metabolites, reactions or pathways) from various databases by using database-specific identifiers. It supports all databases from the [BioCyc collection](\n", "https://biocyc.org/\n", "), [Plant Metabolic Network (PMN)](\n", "https://pmn.plantcyc.org/\n", "), \n", "the [KEGG database](\n", "https://www.genome.jp/kegg/\n", ") and the [BiGG Models repository](\n", "http://bigg.ucsd.edu/\n", "). Call `cobramod.retrieval.Databases` to see all supported databases." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 339 }, "id": "FKwfLSf07408", "outputId": "855b34ca-28f5-4a34-bf20-67af248b3155" }, "outputs": [ { "data": { "text/html": [ "\n", "
CobraMod supports BioCyc, the Plant Metabolic Network (PMN), KEGG and BiGG Models repository. BioCyc includes around 18.000 sub-databases. A complete list for BioCyc can be found at: 'https://biocyc.org/biocyc-pgdb-list.shtml'.
The database-specific identifiers can be found in the URL of the respective data.CobraMod uses abbreviations to represent the databases or sub-databases:\n",
"
| Abbreviation | \n", "Database name | \n", "
| META | \n", "Biocyc, subdatabase MetaCyc | \n", "
| ARA | \n", "Biocyc, subdatabase AraCyc\n", " | \n", "
| KEGG | \n", "Kyoto encyclopedia of Genes and Genomes\n", " | \n", "
| BIGG | \n", "BiGG Models | \n", "
| PMN:META | \n", "PlantCyc, subdatabase META | \n", "
| PMN:ARA | \n", "PlantCyc, subdatabase ARA | \n", "
This applies for all subdatabases from BioCyc and Plantcyc\n", "
\n", " " ], "text/plain": [ "CobraMod supports BioCyc, the Plant Metabolic Network (PMN), KEGG and BiGG Models repository. BioCyc includes around 18.000 sub-databases. A complete list for BioCyc can be found at: 'https://biocyc.org/biocyc-pgdb-list.shtml'.\n", "The database-specific identifiers can be found in the URL of the respective data.CobraMod uses abbreviations to represent the databases or sub-databases:\n", "Abbreviation Database Name\n", "META Biocyc, subdatabase MetaCyc\n", "ARA Biocyc, subdatabase AraCyc\n", "\n", "KEGG Kyoto encyclopedia of Genes and Genomes\n", "BIGG BiGG Models\n", "\n", "PMN:META Plantcyc, subdatabase META\n", "PMN:ARA Plantcyc, subdatabase ARA\n", "This applies for all subdatabases from SolCyc, BioCyc and Plantcyc" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cobramod.retrieval import Databases\n", " \n", "Databases()" ] }, { "cell_type": "markdown", "metadata": { "id": "t3ADoeWMGqCV" }, "source": [ "\n", "\n", "The user can download the metabolic pathway information using the\n", "`cobramod.get_data` function. In this example we download information from the BioCyc sub-database YEAST.\n", "\n", "\n", "**NOTE**: An account is required to retrieve data from MetaCyc. 
Create a file called `credentials.txt` that contains the username on the first line and the password on the second line:\n", "\n", "    my_username\n", "    secret_password\n", " \n", "CobraMod can download data from BioCyc only after the credentials are set." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "RFawmPqO740o" }, "outputs": [], "source": [ "from cobramod import get_data\n", "from pathlib import Path\n", "\n", "dir_data = Path.cwd().resolve().joinpath(\"data\")\n", "identifiers = [\n", "    \"CPD-12575\",\n", "    \"BETA-D-FRUCTOSE\",\n", "    \"SUCROSE\",\n", "    \"UDP\",\n", "]\n", "\n", "for metabolite in identifiers:\n", "    get_data(\n", "        directory=dir_data,\n", "        identifier=metabolite,\n", "        database=\"YEAST\"\n", "    )" ] }, { "cell_type": "markdown", "metadata": { "id": "GL14MNJG740q" }, "source": [ "The first argument of [cobramod.get_data()](\n", "module/cobramod/index.html#cobramod.get_data) is the system path where CobraMod stores the metabolic pathway information. CobraMod uses [pathlib](\n", "https://docs.python.org/3/library/pathlib.html#pathlib.Path) for path representation. The second argument is the data identifier used in the respective database. The last argument is the abbreviation of the database; in this example, it is the BioCyc sub-database `YEAST`. 
\n", "\n", "CobraMod creates a directory with the name of the database and stores the metabolic pathway information in it:\n", "\n", "```\n", "data\n", "`-- YEAST\n", "    |-- CPD-12575.xml\n", "    |-- BETA-D-FRUCTOSE.xml\n", "    |-- SUCROSE.xml\n", "    |-- UDP.xml\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "CjC8UWKk740t" }, "source": [ "## Converting stored data into COBRApy objects\n", "\n", "CobraMod can convert metabolic pathway information (metabolites and reactions) into COBRApy objects ([cobra.Reaction](\n", " https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Reaction\n", ") and [cobra.Metabolite](\n", " https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Metabolite\n", ")). It can thus be seamlessly integrated with a COBRApy workflow.\n", "\n", "The function [cobramod.create_object()](\n", "module/cobramod/index.html#cobramod.create_object) creates COBRApy objects from metabolic pathway data retrieved by [cobramod.get_data()](\n", "module/cobramod/index.html#cobramod.get_data). If no pathway information has been downloaded yet, [cobramod.create_object()]( module/cobramod/index.html#cobramod.create_object) retrieves it automatically.\n", "\n", "In this example, we convert the metabolite *2-Oxoglutarate* with the\n", "KEGG identifier [C00026](\n", "https://www.genome.jp/dbget-bin/www_bget?C00026) into a COBRApy object.\n", "CobraMod automatically identifies the KEGG entry as a metabolite and converts it into the corresponding COBRApy metabolite.\n", "\n", "The first argument is the database-specific identifier (`C00026`), followed by\n", "the database abbreviation (`KEGG`). The third argument is the path \n", "representation for the directory of the metabolic pathway information. CobraMod\n", "downloads the data into this directory and reuses it instead of downloading it again. The last argument is the compartment of the metabolite (`c` for cytosol)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 168 }, "id": "s7osopQo740t", "outputId": "ee0d87e1-6fca-4cc5-d9fb-709851702f45" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "| Metabolite identifier | C00026_c | \n", "
| Name | alpha-Ketoglutaric acid | \n", "
| Memory address | \n", "0x7afc8d2703e0 | \n", "
| Formula | C5H6O5 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Reaction identifier | RXN_11501_c | \n", "
| Name | alkaline α- galactosidase | \n", "
| Memory address | \n", "0x7afcc0510c80 | \n", "
| Stoichiometry | \n", "\n",
" CPD_170_c + WATER_c --> ALPHA_D_GALACTOSE_c + CPD_1099_c \n", "stachyose + H2O --> alpha-D-galactopyranose + raffinose \n", " | \n",
"
| GPR | ZM00001EB033890 or ZM00001EB033880 or ZM00001EB033870 | \n", "
| Lower bound | 0 | \n", "
| Upper bound | 1000 | \n", "
| Metabolite identifier | MET_c | \n", "
| Name | L-methionine | \n", "
| Memory address | \n", "0x7afc8ca6a8a0 | \n", "
| Formula | C5H11N1O2S1 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Metabolite identifier | accoa_c | \n", "
| Name | Acetyl-CoA | \n", "
| Memory address | \n", "0x7afc8c95c5c0 | \n", "
| Formula | C23H34N7O17P3S | \n", "
| Compartment | c | \n", "
| In 7 reaction(s) | \n", " ACALD, MALS, PDH, CS, Biomass_Ecoli_core, PFL, PTAr\n", " | \n", "
| Metabolite identifier | SUCROSE_c | \n", "
| Name | sucrose | \n", "
| Memory address | \n", "0x7afc8c9a53d0 | \n", "
| Formula | C12H22O11 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Metabolite identifier | MET_c | \n", "
| Name | L-methionine | \n", "
| Memory address | \n", "0x7afc8c9a7410 | \n", "
| Formula | C5H11N1O2S1 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Metabolite identifier | SUCROSE_c | \n", "
| Name | sucrose | \n", "
| Memory address | \n", "0x7afc875b01a0 | \n", "
| Formula | C12H22O11 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Metabolite identifier | MALTOSE_c | \n", "
| Name | MALTOSE[c] | \n", "
| Memory address | \n", "0x7afc8c7399a0 | \n", "
| Formula | C12H22O11 | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Metabolite identifier | xu5p__D_c | \n", "
| Name | D-Xylulose 5-phosphate | \n", "
| Memory address | \n", "0x7afc8cb758e0 | \n", "
| Formula | C5H9O8P | \n", "
| Compartment | c | \n", "
| In 3 reaction(s) | \n", " TKT2, TKT1, RPE\n", " | \n", "
| Metabolite identifier | Red_NADPH_Hemoprotein_Reductases_c | \n", "
| Name | Red-NADPH-Hemoprotein-Reductases | \n", "
| Memory address | \n", "0x7afc86120710 | \n", "
| Formula | X | \n", "
| Compartment | c | \n", "
| In 0 reaction(s) | \n", " \n", " | \n", "
| Reaction identifier | R08549_c | \n", "
| Name | 2-Oxogluterate dehydrogenase | \n", "
| Memory address | \n", "0x7afc86120fb0 | \n", "
| Stoichiometry | \n", "\n",
" C00003_c + C00010_c + C00026_c --> C00004_c + C00011_c + C00091_c \n", "Nicotinamide adenine dinucleotide + Coenzyme A + 2-Oxoglutarate --> Nicotinamide adenine dinucleotide - reduced + CO2 + Succinyl-CoA \n", " | \n",
"
| GPR | b0116 and b0726 and b0727 | \n", "
| Lower bound | 0.0 | \n", "
| Upper bound | 1000.0 | \n", "
| Reaction identifier | R00315_c | \n", "
| Name | acetate kinase | \n", "
| Memory address | \n", "0x7afc86121160 | \n", "
| Stoichiometry | \n", "\n",
" C00002_c + C00033_c <=> C00227_c + G11113_c \n", "ATP + Acetate <=> Acetyl phosphate + ADP \n", " | \n",
"
| GPR | c2838 | \n", "
| Lower bound | -1000.0 | \n", "
| Upper bound | 1000.0 | \n", "
| Reaction identifier | R02736_c | \n", "
| Name | beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase | \n", "
| Memory address | \n", "0x7afc8cafdc40 | \n", "
| Stoichiometry | \n", "\n",
" C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c \n", "Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone \n", " | \n",
"
| GPR | c2265 | \n", "
| Lower bound | -1000 | \n", "
| Upper bound | 1000 | \n", "
| Reaction identifier | R04382_c | \n", "
| Name | 4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase | \n", "
| Memory address | \n", "0x7afc6fcd64b0 | \n", "
| Stoichiometry | \n", "\n",
" C06118_c <=> 2.0 C04053_c \n", "Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate \n", " | \n",
"
| GPR | c0319 | \n", "
| Lower bound | -1000 | \n", "
| Upper bound | 1000 | \n", "
| Reaction identifier | R02736_c | \n", "
| Name | beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase | \n", "
| Memory address | \n", "0x7afc6fcef290 | \n", "
| Stoichiometry | \n", "\n",
" C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c \n", "Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone \n", " | \n",
"
| GPR | c2265 | \n", "
| Lower bound | -1000 | \n", "
| Upper bound | 1000 | \n", "
| Reaction identifier | C06118_ce | \n", "
| Name | digalacturonate transport | \n", "
| Memory address | \n", "0x7afc6fceffe0 | \n", "
| Stoichiometry | \n", "\n",
" C06118_c <=> C06118_e \n", "Unsaturated digalacturonate <=> Unsaturated digalacturonate \n", " | \n",
"
| GPR | \n", " |
| Lower bound | -1000 | \n", "
| Upper bound | 1000 | \n", "
| Reaction identifier | ACALDt | \n", "
| Name | R acetaldehyde reversible - transport | \n", "
| Memory address | \n", "0x7afc6fcec560 | \n", "
| Stoichiometry | \n", "\n",
" C00084_e <=> C00084_c \n", "Acetaldehyde <=> Acetaldehyde \n", " | \n",
"
| GPR | s0001 | \n", "
| Lower bound | -1000.0 | \n", "
| Upper bound | 1000.0 | \n", "
| Reaction identifier | TRANS_RXN_455_c | \n", "
| Name | acetic acid uptake | \n", "
| Memory address | \n", "0x7afc86123ec0 | \n", "
| Stoichiometry | \n", "\n",
" CPD_24335_e --> CPD_24335_c \n", "acetic+acid --> acetic+acid \n", " | \n",
"
| GPR | G3O-32144 | \n", "
| Lower bound | 0 | \n", "
| Upper bound | 1000 | \n", "
| Reaction identifier | R04382_c | \n", "
| Name | 4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase | \n", "
| Memory address | \n", "0x7afc6fcb5c40 | \n", "
| Stoichiometry | \n", "\n",
" C06118_c <=> 2.0 C04053_c \n", "Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate \n", " | \n",
"
| GPR | \n", " |
| Lower bound | -1000 | \n", "
| Upper bound | 1000 | \n", "
| Pathway identifier | \n", "ACETOACETATE-DEG-PWY |
| Name | \n", "|
| Memory address | \n", "0x0135224490952272 |
| Reactions involved | \n", " ACETOACETYL_COA_TRANSFER_RXN_c, ACETYL_COA_ACETYLTRANSFER_RXN_c |
| Genes involved | EG12432, EG11670, EG11669, EG11672 |
| Visualization attributes |
|
" ], "text/plain": [ "
| Pathway identifier | \n", "curated_pathway |
| Name | \n", "|
| Memory address | \n", "0x0135224490393088 |
| Reactions involved | \n", " PYRUVFORMLY_RXN_c, PEPDEPHOS_RXN_c, FHLMULTI_RXN_c |
| Genes involved | G7627, EG11784, EG10701, EG10804, EG10803, EG10477, EG10480, EG10475, EG10285, EG10476, EG10479, EG10478 |
| Visualization attributes |
|
" ], "text/plain": [ "
| Pathway identifier | \n", "curated_pathway |
| Name | \n", "|
| Memory address | \n", "0x0135225782733520 |
| Reactions involved | \n", " GLCpts, G6PDH2r, PGL, GND |
| Genes involved | b1819, b1621, b1817, b1101, b2415, b2417, b1818, b2416, b1852, b0767, b2029 |
| Visualization attributes |
|
" ], "text/plain": [ "