1.1. Functions¶
[1]:
from IPython.display import display
from cobramod import __version__
from cobra import __version__ as cobra_version
print(f'CobraMod version: {__version__}')
print(f'COBRApy version: {cobra_version}')
# From Escher:
# This option turns off the warning message if you leave or refresh this page
import escher
escher.rc['never_ask_before_quit'] = True
CobraMod version: 1.2.0
COBRApy version: 0.29.0
1.1.1. Retrieving metabolic pathway information¶
CobraMod can retrieve metabolic pathway information (metabolites, reactions or pathways) from various databases by using database-specific identifiers. It supports all databases from the BioCyc collection, Plant Metabolic Network (PMN), the KEGG database and the BiGG Models repository. Call cobramod.retrieval.Databases to see all supported databases.
[2]:
from cobramod.retrieval import Databases
Databases()
[2]:
CobraMod supports BioCyc, the Plant Metabolic Network (PMN), KEGG and BiGG Models repository. BioCyc includes around 18.000 sub-databases. A complete list for BioCyc can be found at: 'https://biocyc.org/biocyc-pgdb-list.shtml'.
The database-specific identifiers can be found in the URL of the respective data.CobraMod uses abbreviations to represent the databases or sub-databases:
| Abbreviation | Database name |
| META | Biocyc, subdatabase MetaCyc |
| ARA | Biocyc, subdatabase AraCyc |
| KEGG | Kyoto encyclopedia of Genes and Genomes |
| BIGG | BiGG Models |
| PMN:META | PlantCyc, subdatabase META |
| PMN:ARA | PlantCyc, subdatabase ARA |
This applies for all subdatabases from BioCyc and Plantcyc
The user can download the metabolic pathway information using the cobramod.get_data function. In this example we download information from the BioCyc sub-database YEAST.
NOTE: In order to retrieved data from Metacyc. An account is required. Create a file called credentials.txt and add in the first line the username and the password in the second line
my_username
secret_password
Only after setting the credentials, CobraMod is able to download data from BioCyc
[3]:
from cobramod import get_data
from pathlib import Path
dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
"CPD-12575",
"BETA-D-FRUCTOSE",
"SUCROSE",
"UDP",
]
for metabolite in identifiers:
get_data(
directory=dir_data,
identifier=metabolite,
database="YEAST"
)
The first argument in cobramod.get_data() is the system path where CobraMod stores the metabolic pathway information. CobraMod uses pathlib for path representation. The second argument is the data identifier used in the respective database. In this example we retrieve data from the BioCyc sub-database YEAST. The last argument is the abbreviation of the database.
CobraMod creates a directory with the name of the database and stores the metabolic pathway information in it:
data
`-- YEAST
|-- CPD-12575.xml
|-- BETA-D-FRUCTOSE.xml
|-- SUCROSE.xml
|-- UDP.xml
1.1.2. Converting stored-data into COBRApy objects¶
CobraMod can convert metabolic pathway information (metabolites and reactions) into COBRApy objects (cobra.Reaction and cobra.Metabolite). It can thus be seamlessly integrated with a COBRApy workflow.
The function cobramod.create_object() creates COBRApy objects from metabolic pathway data retrieved by cobramod.get_data(). If no pathway information was downloaded cobramod.create_object() retrieves it automatically.
In this example, we convert the metabolite 2-Oxoglutarate with the KEGG identifier C00026 into a COBRApy object. CobraMod automatically identifies the KEGG entry as a metabolite and converts it into the corresponding COBRApy metabolite.
The first argument is the database-specific identifier (C00026) followed by the database abbreviation (KEGG). The third argument is the path representation for the directory of the metabolic pathway information. CobraMod downloads the data into this directory and utilize it instead of downloading it again. The last argument is the compartment of the reaction (c for cytosol).
[4]:
from cobramod import create_object
from pathlib import Path
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
new_object = create_object(
identifier="C00026",
database="KEGG",
directory=dir_data,
compartment="c"
)
print(type(new_object))
new_object
<class 'cobra.core.metabolite.Metabolite'>
[4]:
| Metabolite identifier | C00026_c |
| Name | alpha-Ketoglutaric acid |
| Memory address | 0x7afc8d2703e0 |
| Formula | C5H6O5 |
| Compartment | c |
| In 0 reaction(s) |
In the second example, we convert the reaction RXN-11502 from the PMN sub-database CORN into a COBRApy reaction. The first argument is the database-specific identifier (RXN-11502) followed by the database identifier (pmn:CORN). CobraMod automatically takes reversibility and gene information from the database entry and adds it to the reaction object.
[5]:
new_object = create_object(
identifier="RXN-11501",
database="pmn:CORN",
directory=dir_data,
compartment="c"
)
print(type(new_object))
display(new_object)
new_object.genes
Gene-reaction rule for reaction "RXN-11501" set to "OR". Please modify it if necessary.
<class 'cobra.core.reaction.Reaction'>
| Reaction identifier | RXN_11501_c |
| Name | alkaline α- galactosidase |
| Memory address | 0x7afcc0510c80 |
| Stoichiometry |
CPD_170_c + WATER_c --> ALPHA_D_GALACTOSE_c + CPD_1099_c stachyose + H2O --> alpha-D-galactopyranose + raffinose |
| GPR | ZM00001EB033890 or ZM00001EB033880 or ZM00001EB033870 |
| Lower bound | 0 |
| Upper bound | 1000 |
[5]:
frozenset({<Gene ZM00001EB033870 at 0x7afc8ce13350>,
<Gene ZM00001EB033880 at 0x7afcc0511670>,
<Gene ZM00001EB033890 at 0x7afca7531e50>})
1.1.3. Adding metabolites¶
The function cobramod.add_metabolites() extends the COBRApy function model.add_metabolites() and can be used with a simple syntax. It can utilize a single string, a list of strings, a file path or a COBRAPy metabolite object. In the next examples we showcase these options. We use the E. coli core model from COBRApy
as test model. This core model can be found under cobramod.test.textbook. If the argument obj is used as a string, it can be the database-specific identifier of the respective metabolite and its compartment or a user-defined metabolite. It uses the following syntax:
SYNTAX
To add metabolite information from a database:
database-specific_identifier, compartment
To add user-curated metabolites:
user-curated_identifier, name, compartment, chemical_formula, molecular_charge
In the first example, we add the metabolite L-methionine with the MetaCyc identifier MET to the test model. The first argument is the model name. The argument obj contains the identifier MET and the compartment c. The second argument is the database identifier (META) and the third argument is the directory where CobraMod stores the metabolite information.
[6]:
from cobramod import add_metabolites
from cobramod.test import textbook
from pathlib import Path
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook.copy()
add_metabolites(
model=test_model,
obj=["MET, c"],
database="META",
directory=dir_data,
)
print(type(test_model.metabolites.get_by_id("MET_c")))
test_model.metabolites.get_by_id("MET_c")
Metabolite "MET_c" was added to model.
<class 'cobra.core.metabolite.Metabolite'>
[6]:
| Metabolite identifier | MET_c |
| Name | L-methionine |
| Memory address | 0x7afc8ca6a8a0 |
| Formula | C5H11N1O2S1 |
| Compartment | c |
| In 0 reaction(s) |
In the second example, we add two metabolites (acetyl-coa and sucrose) from MetaCyc. In the argument obj we define a list with the database-specific metabolite identifiers and the respecitive compartments. The rest of the arguments remains the same as in the previous example. Before adding a metabolite, CobraMod tests if the given metabolite is already present in the model
(potentially with a different identifier) by checking the cross-reference database entries in the metabolite’s metadata against the model’s metabolite entries. If CobraMod identifies a metabolite as already present in the model it skips adding the metabolite, used the already present one, and prints a warning.
[7]:
add_metabolites(
model=test_model,
obj=["ACETYL-COA, c", "SUCROSE, c"],
database="META",
directory=dir_data,
)
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("accoa_c"))
test_model.metabolites.get_by_id("SUCROSE_c")
Metabolite "ACETYL_COA_c" was added to model.
Metabolite "SUCROSE_c" was added to model.
| Metabolite identifier | accoa_c |
| Name | Acetyl-CoA |
| Memory address | 0x7afc8c95c5c0 |
| Formula | C23H34N7O17P3S |
| Compartment | c |
| In 7 reaction(s) | ACALD, MALS, PDH, CS, Biomass_Ecoli_core, PFL, PTAr |
[7]:
| Metabolite identifier | SUCROSE_c |
| Name | sucrose |
| Memory address | 0x7afc8c9a53d0 |
| Formula | C12H22O11 |
| Compartment | c |
| In 0 reaction(s) |
In the third example, we use a text file to add metabolites to the test model. We use the file metabolites.txt in the current working directory with the following content:
SUCROSE, c
MET, c
MALTOSE_c, MALTOSE[c], c, C12H22O11, 1
In this example, CobraMod downloads the first two metabolites from MetaCyc, while MALTOSE_c is a user-defined metabolite. The user can specify the path to this file in the obj argument. The remaining arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the metabolites to the model.
[8]:
from cobramod.test import textbook_biocyc
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# This is our file
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()
print(f'Number of metabolites prior addition: {len(test_model.metabolites)}')
# Using CobraMod
add_metabolites(
model=test_model,
obj=file,
directory=dir_data,
database="META",
)
print(f'Number of metabolites after addition: {len(test_model.metabolites)}')
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("MET_c"))
display(test_model.metabolites.get_by_id("SUCROSE_c"))
test_model.metabolites.get_by_id("MALTOSE_c")
Metabolite "SUCROSE_c" was added to model.
Metabolite "MET_c" was added to model.
Metabolite "MALTOSE_c" was added to model.
Number of metabolites prior addition: 72
Number of metabolites after addition: 75
| Metabolite identifier | MET_c |
| Name | L-methionine |
| Memory address | 0x7afc8c9a7410 |
| Formula | C5H11N1O2S1 |
| Compartment | c |
| In 0 reaction(s) |
| Metabolite identifier | SUCROSE_c |
| Name | sucrose |
| Memory address | 0x7afc875b01a0 |
| Formula | C12H22O11 |
| Compartment | c |
| In 0 reaction(s) |
[8]:
| Metabolite identifier | MALTOSE_c |
| Name | MALTOSE[c] |
| Memory address | 0x7afc8c7399a0 |
| Formula | C12H22O11 |
| Compartment | c |
| In 0 reaction(s) |
Since cobramod.add_metabolites() is an extension of the COBRApy function model.add_metabolites() the user can also utilize COBRApy metabolites. In the following example, we use a variation of the test model (textbook_biocyc) which uses BioCyc metabolite identifiers. We copy a COBRApy metabolite from the test model
and add it to the BioCyc test model.
[9]:
from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc
# Copying Metabolite from original model
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")
# Using a copy
test_model = textbook_biocyc.copy()
add_metabolites(
model=test_model,
obj=[metabolite]
)
test_model.metabolites.get_by_id("xu5p__D_c")
Metabolite "xu5p__D_c" was added to model.
[9]:
| Metabolite identifier | xu5p__D_c |
| Name | D-Xylulose 5-phosphate |
| Memory address | 0x7afc8cb758e0 |
| Formula | C5H9O8P |
| Compartment | c |
| In 3 reaction(s) | TKT2, TKT1, RPE |
If CobraMod detects large molecules (e.g. enzymes) or if the metabolite information does not include a chemical formula the user receives a warning. In this example, we use the enzyme with the MetaCyc identifier Red-NADPH-Hemoprotein-Reductases and add it to the test model. CobraMod raises a warning due to the missing chemical formula.
[10]:
# Using a copy
test_model = textbook.copy()
add_metabolites(
model=test_model,
obj=["Red-NADPH-Hemoprotein-Reductases, c"],
directory=dir_data,
database="META",
)
test_model.metabolites.get_by_id("Red_NADPH_Hemoprotein_Reductases_c")
Curated metabolite "Red_NADPH_Hemoprotein_Reductases_c does not include a chemical formula
Metabolite "Red_NADPH_Hemoprotein_Reductases_c" was added to model.
[10]:
| Metabolite identifier | Red_NADPH_Hemoprotein_Reductases_c |
| Name | Red-NADPH-Hemoprotein-Reductases |
| Memory address | 0x7afc86120710 |
| Formula | X |
| Compartment | c |
| In 0 reaction(s) |
NOTES
CobraMod replaces hyphens (
-) with underscores (_) in the identifiers when creating COBRApy metabolites.When adding several metabolites the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.
CobraMod saves the curation process in a directory ‘logs’
1.1.4. Adding reactions¶
The function cobramod.add_reactions() extends the COBRApy function model.add_reactions() and can be used with a simple syntax. It can utilize a single string, a list of string, a file path or a COBRApy reaction object. In the next examples we showcase these options. Again, we use the E. coli core model from COBRApy as test model.
If the argument obj is used as a string, it can be the database-specific identifier of the respective reaction and its compartment or a user-defined reaction. It uses the following syntax:
SYNTAX
To add reaction information from a database:
database-specific_identifier, compartment
To add user-defined reactions (curated-reactions), the user can specify the identifier and name of the reaction following the COBRApy reaction string syntax:
user-curated_identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2
Metabolites must contain a suffix which specifies the compartment. This is given by an underscore (_) followed by the compartment-abbreviation. In this example, we create a transport reaction for oxygen between the external comparment (e) and the cytosol (c):
TRANS_H2O_ec, Oxygen Transport | 2 OXYGEN-MOLECULE_e <-> 2 OXYGEN_MOLECULE_c
In the first example, we add the BIGG reaction AKGDH to a test model. This time we use a text model which is a variaton of the original core model and uses KEGG identifiers (textbook_kegg). The first argument is the model name. The obj argument takes the identifier AKGDH and the compartment c. The next argument is the database identifier (BIGG) and the last argument is the directory where CobraMod stores and retrieves the
metabolic pathway information from. The argument model_id is a BIGG-specific argument. Please read the notes below for more information.
Because the test model uses KEGG identifiers, CobraMod finds the reaction AKGDH with the KEGG identifier R08549_c.
[11]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_kegg.copy()
add_reactions(
model=test_model,
obj=["AKGDH, c"],
database="BIGG",
directory=dir_data,
model_id="e_coli_core"
)
display(test_model.reactions.get_by_id("R08549_c"))
Reaction "R08549_c" is already present in the model. Skipping additions.
| Reaction identifier | R08549_c |
| Name | 2-Oxogluterate dehydrogenase |
| Memory address | 0x7afc86120fb0 |
| Stoichiometry |
C00003_c + C00010_c + C00026_c --> C00004_c + C00011_c + C00091_c Nicotinamide adenine dinucleotide + Coenzyme A + 2-Oxoglutarate --> Nicotinamide adenine dinucleotide - reduced + CO2 + Succinyl-CoA |
| GPR | b0116 and b0726 and b0727 |
| Lower bound | 0.0 |
| Upper bound | 1000.0 |
In the second example, we add two reactions (R00315 and R02736) from KEGG. The argument obj is a list with the database-specific identifiers and the respective compartments of the reactions. The majority of the arguments remains the same as in the previous example. The argument genome is a KEGG-specific argument. Please read the notes below for more information.
CobraMod parses the reaction information for gene identifiers and automatically adds them to the COBRApy reaction. In this example, CobraMod creates the genes 2296, b3115 and b1849, and adds it to the first reaction R00315.
[12]:
add_reactions(
model=test_model,
obj=["R00315, c", "R02736 ,c"],
directory=dir_data,
database="KEGG",
genome="ecc"
)
display(test_model.reactions.get_by_id("R00315_c"))
print(test_model.reactions.get_by_id("R00315_c").genes)
test_model.reactions.get_by_id("R02736_c")
Gene-reaction rule for reaction "R00315" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R00315_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.
| Reaction identifier | R00315_c |
| Name | acetate kinase |
| Memory address | 0x7afc86121160 |
| Stoichiometry |
C00002_c + C00033_c <=> C00227_c + G11113_c ATP + Acetate <=> Acetyl phosphate + ADP |
| GPR | c2838 |
| Lower bound | -1000.0 |
| Upper bound | 1000.0 |
frozenset({<Gene c2838 at 0x7afc7eb8c080>})
[12]:
| Reaction identifier | R02736_c |
| Name | beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase |
| Memory address | 0x7afc8cafdc40 |
| Stoichiometry |
C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone |
| GPR | c2265 |
| Lower bound | -1000 |
| Upper bound | 1000 |
In the following example, we use a text file to add reactions to the test model. To this end, we use the file reactions.txt in the current working directory. This file has the following content:
R04382, c
R02736, c
C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e
CobraMod downloads the first two reactions from KEGG. C06118_ce is a user-defined reaction. The user can specify the file path in the obj argument. The next arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the reaction to the model.
[13]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
# This is the file with text
file = dir_data.joinpath("reactions.txt")
print(f'Number of reactions prior addition: {len(test_model.reactions)}')
add_reactions(
model=test_model,
obj=file,
directory=dir_data,
database="KEGG",
genome="ecc"
)
print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Show in jupyter
display(test_model.reactions.get_by_id("R04382_c"))
display(test_model.reactions.get_by_id("R02736_c"))
test_model.reactions.get_by_id("C06118_ce")
Gene-reaction rule for reaction "R04382" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Number of reactions prior addition: 95
Reaction "R04382_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.
Reaction "C06118_ce" was added to model.
Number of reactions after addition: 98
| Reaction identifier | R04382_c |
| Name | 4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase |
| Memory address | 0x7afc6fcd64b0 |
| Stoichiometry |
C06118_c <=> 2.0 C04053_c Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate |
| GPR | c0319 |
| Lower bound | -1000 |
| Upper bound | 1000 |
| Reaction identifier | R02736_c |
| Name | beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase |
| Memory address | 0x7afc6fcef290 |
| Stoichiometry |
C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone |
| GPR | c2265 |
| Lower bound | -1000 |
| Upper bound | 1000 |
[13]:
| Reaction identifier | C06118_ce |
| Name | digalacturonate transport |
| Memory address | 0x7afc6fceffe0 |
| Stoichiometry |
C06118_c <=> C06118_e Unsaturated digalacturonate <=> Unsaturated digalacturonate |
| GPR | |
| Lower bound | -1000 |
| Upper bound | 1000 |
Since cobramod.add_reactions() is an extension of the original COBRApy function model.add_reactions() the user can also utilize COBRApy reactions. In this example, we use the test model (textbook_kegg). We copy a COBRApy reaction from the test model and add it to the KEGG test model.
[14]:
from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path
# Using copy of test model
test_model = textbook_kegg.copy()
# Obtaining a reaction
reaction = textbook.reactions.get_by_id("ACALDt")
add_reactions(model=test_model, obj=[reaction], directory=dir_data)
test_model.reactions.get_by_id("ACALDt")
Reaction "ACALDt" is already present in the model. Skipping additions.
[14]:
| Reaction identifier | ACALDt |
| Name | R acetaldehyde reversible - transport |
| Memory address | 0x7afc6fcec560 |
| Stoichiometry |
C00084_e <=> C00084_c Acetaldehyde <=> Acetaldehyde |
| GPR | s0001 |
| Lower bound | -1000.0 |
| Upper bound | 1000.0 |
By default, COBRApy ignores metabolites that appear on both sides of a reaction equation. CobraMod identifies such reactions and assigns one of these metabolites to the extracellular compartment and raises a warning suggesting manual curation. In the following example, we add a transport reaction for acetic acid from BioCyc sub-database YEAST to the test model.
[15]:
test_model = textbook_kegg.copy()
add_reactions(
model=test_model,
obj=["TRANS-RXN-455, c"],
database="YEAST",
directory=dir_data,
)
# Show in jupyter
test_model.reactions.get_by_id("TRANS_RXN_455_c")
Gene-reaction rule for reaction "TRANS-RXN-455" set to "OR". Please modify it if necessary.
Reaction "TRANS_RXN_455_c" has the same metabolite on both sides of the equation (e.g transport reaction). COBRApy ignores these metabolites. To avoid this, by default, CobraMod will assign the reactant side to the extracellular compartment. Please curate the reaction if necessary.
Reaction "TRANS_RXN_455_c" is already present in the model. Skipping additions.
[15]:
| Reaction identifier | TRANS_RXN_455_c |
| Name | acetic acid uptake |
| Memory address | 0x7afc86123ec0 |
| Stoichiometry |
CPD_24335_e --> CPD_24335_c acetic+acid --> acetic+acid |
| GPR | G3O-32144 |
| Lower bound | 0 |
| Upper bound | 1000 |
NOTES
CobraMod replaces hyphens (
-) to underscores (_) in the identifiers when creating COBRApy reactions.When adding several reactions the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.
CobraMod tries to identify if a given metabolites or reactions are already present in the model and used those instead of adding redundant entries.
CobraMod saves the curation process to the file
debug.log.The argument
model_idis specific for the BiGG Models repository. CobraMod can download metabolite and reaction information directly from the BIGG models. A list of all models can be found here.The argument
genomecan be used with the databaseKEGGand specifies the genome for which gene information will be retrieved. The complete list of all available genomes can be found here. If no genome is specified, no gene information will be retrieved and a warning is printed as shown below:
[16]:
test_model = textbook_kegg.copy()
add_reactions(
model=test_model,
obj=["R04382, c"],
database="KEGG",
directory=dir_data,
)
test_model.reactions.get_by_id("R04382_c")
Nothing was specified in argument "genome". Reaction "R04382" will not include genes. Please modify if necessary.
Reaction "R04382_c" is already present in the model. Skipping additions.
[16]:
| Reaction identifier | R04382_c |
| Name | 4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase |
| Memory address | 0x7afc6fcb5c40 |
| Stoichiometry |
C06118_c <=> 2.0 C04053_c Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate |
| GPR | |
| Lower bound | -1000 |
| Upper bound | 1000 |
1.1.5. Adding pathways¶
CobraMod can add metabolic pathways to a given model. The function cobramod.add_pathway() takes as an argument either a sequence of database-specific reaction identifiers or a single pathway identifier. CobraMod download the pathway information from the databases, creates the COBRApy objects and prints a short summary about the curation process when adding the respective pathway. Additionally, to ensure reproducibility and for tracking the curation process, the user can write changes to file. CobraMod creates the object cobramod.Pathway which contains the pathway information.
When adding serveral pathways, we recommend adding pathways one-by-one to perform neccessary curation steps inbetween. In the examples below we showcase two options. Again, we use the E. coli core model from COBRApy as test model.
In the first example, we add the acetoacetate degradation pathway from the BioCyc sub-database ECOLI to the test model. This pathway contains two reactions and six metabolites.
In this example, the first argument is the model to extend. The pathway argument uses the database-specific identifier ACETOACETATE-DEG-PWY and the database identifier ECOLI. We define the compartment as c (cytosol), i.e. all metabolites and reactions will be assigned to this compartment.
The argument filename specifies a file to which a summary of the changes is written. Additionally, add_pathway() prints a summary of the changes to the model. Calling the respective cobramod.Pathway outputs a table with the main attributes of the object.
All COBRApy reactions of the pathway are tested for their capacity to cary a non-zero flux. Read more about it in the non-zero flux test section.
[17]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy of test model
test_model = textbook.copy()
add_pathway(
model=test_model,
pathway="ACETOACETATE-DEG-PWY",
database="ECOLI",
compartment="c",
filename="summary.txt",
directory=dir_data,
)
# Display in jupyter
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")
Gene-reaction rule for reaction "ACETOACETYL-COA-TRANSFER-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "ACETYL-COA-ACETYLTRANSFER-RXN" set to "OR". Please modify it if necessary.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" unbalanced. Following atoms are affected. Please verify: {'charge': -4.0, 'C': 23.0, 'H': 34.0, 'N': 7.0, 'O': 17.0, 'P': 3.0, 'S': 1.0}
The model cannot turnover the following metabolites ['3_KETOBUTYRATE_c', 'accoa_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'ACETOACETYL_COA_TRANSFER_RXN_c' passed.
Reaction "ACETOACETYL_COA_TRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Non-zero flux test for reaction 'ACETYL_COA_ACETYLTRANSFER_RXN_c' passed.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Auxiliary sink reaction for "SK_accoa_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_3_KETOBUTYRATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Number of new | removed entities in
*=====================|===================*
Reactions 2 | 0
Metabolites 2 | 0
Exchange 0 | 0
Demand 0 | 0
Sinks 2 | 0
Genes 4 | 0
Groups 1 | 0
[17]:
| Pathway identifier | ACETOACETATE-DEG-PWY |
| Name | |
| Memory address | 0x0135224490952272 |
| Reactions involved | ACETOACETYL_COA_TRANSFER_RXN_c, ACETYL_COA_ACETYLTRANSFER_RXN_c |
| Genes involved | EG12432, EG11670, EG11669, EG11672 |
| Visualization attributes |
|
Below is an example of the summary in form of a text file. The first part lists names of all reactions, metabolites, exchange reactions, auxiliary demand and sink reactions, genes, and groups in the model. The second part of the summary lists all elements that were added or removed by the function call add_pathway().
[18]:
%cat summary.txt
Summary:
Model identifier: e_coli_core
Model name:
None
Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'TPI', 'ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['13dpg_c', '2pg_c', '3pg_c', '6pgc_c', '6pgl_c', 'ac_c', 'ac_e', 'acald_c', 'acald_e', 'accoa_c', 'acon_C_c', 'actp_c', 'adp_c', 'akg_c', 'akg_e', 'amp_c', 'atp_c', 'cit_c', 'co2_c', 'co2_e', 'coa_c', 'dhap_c', 'e4p_c', 'etoh_c', 'etoh_e', 'f6p_c', 'fdp_c', 'for_c', 'for_e', 'fru_e', 'fum_c', 'fum_e', 'g3p_c', 'g6p_c', 'glc__D_e', 'gln__L_c', 'gln__L_e', 'glu__L_c', 'glu__L_e', 'glx_c', 'h2o_c', 'h2o_e', 'h_c', 'h_e', 'icit_c', 'lac__D_c', 'lac__D_e', 'mal__L_c', 'mal__L_e', 'nad_c', 'nadh_c', 'nadp_c', 'nadph_c', 'nh4_c', 'nh4_e', 'o2_c', 'o2_e', 'oaa_c', 'pep_c', 'pi_c', 'pi_e', 'pyr_c', 'pyr_e', 'q8_c', 'q8h2_c', 'r5p_c', 'ru5p__D_c', 's7p_c', 'succ_c', 'succ_e', 'succoa_c', 'xu5p__D_c', '3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
['EX_ac_e', 'EX_acald_e', 'EX_akg_e', 'EX_co2_e', 'EX_etoh_e', 'EX_for_e', 'EX_fru_e', 'EX_fum_e', 'EX_glc__D_e', 'EX_gln__L_e', 'EX_glu__L_e', 'EX_h_e', 'EX_h2o_e', 'EX_lac__D_e', 'EX_mal__L_e', 'EX_nh4_e', 'EX_o2_e', 'EX_pi_e', 'EX_pyr_e', 'EX_succ_e']
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['b1241', 'b0351', 's0001', 'b3115', 'b1849', 'b2296', 'b1276', 'b0118', 'b0474', 'b0116', 'b0726', 'b0727', 'b2587', 'b0356', 'b1478', 'b3735', 'b3733', 'b3734', 'b3732', 'b3736', 'b3738', 'b3731', 'b3737', 'b3739', 'b0720', 'b0733', 'b0979', 'b0978', 'b0734', 'b2975', 'b3603', 'b2779', 'b1773', 'b2925', 'b2097', 'b4232', 'b3925', 'b0904', 'b2492', 'b4154', 'b4152', 'b4153', 'b4151', 'b1819', 'b1817', 'b2415', 'b1818', 'b2416', 'b1611', 'b4122', 'b1612', 'b3528', 'b1852', 'b1779', 'b1621', 'b1101', 'b2417', 'b3870', 'b1297', 'b0809', 'b0810', 'b0811', 'b1761', 'b1524', 'b1812', 'b0485', 'b3213', 'b3212', 'b4077', 'b2029', 'b0875', 'b1136', 'b4015', 'b2133', 'b1380', 'b4014', 'b2976', 'b3236', 'b1479', 'b2463', 'b2286', 'b2279', 'b2276', 'b2280', 'b2288', 'b2284', 'b2287', 'b2281', 'b2277', 'b2285', 'b2282', 'b2278', 'b2283', 'b3962', 'b1602', 'b1603', 'b0451', 'b0114', 'b0115', 'b3916', 'b1723', 'b0902', 'b0903', 'b2579', 'b3114', 'b3952', 'b3951', 'b4025', 'b2926', 'b0767', 'b0755', 'b3612', 'b4395', 'b3493', 'b2987', 'b3956', 'b3403', 'b1702', 'b2297', 'b2458', 'b1676', 'b1854', 'b4301', 'b3386', 'b2914', 'b4090', 'b0722', 'b0724', 'b0721', 'b0723', 'b0728', 'b0729', 'b2464', 'b0008', 'b2935', 'b2465', 'b3919', 'EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']
New:
Reactions:
['ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
[]
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']
Removed:
Reactions:
[]
Metabolites:
[]
Exchange:
[]
Demand:
[]
Sinks:
[]
Genes:
[]
Groups:
[]
In the next example, we use a list of database-specific reaction identifiers as pathway argument. We use the database identifier ECOLI and the compartment c (cytosol). Additionally, we define a pathway name by using the argument group. The user can also use this argument to merge pathways by using the same group names.
[19]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_biocyc.copy()
# Defining database-specific identifiers
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]
print(f'Number of reaction prior addition: {len(test_model.reactions)}')
add_pathway(
model=test_model,
pathway=sequence,
directory=dir_data,
database="ECOLI",
compartment="c",
group="curated_pathway"
)
print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Display in jupyter
test_model.groups.get_by_id("curated_pathway")
Gene-reaction rule for reaction "PEPDEPHOS-RXN" set to "OR". Please modify it if necessary.
Number of reaction prior addition: 95
Gene-reaction rule for reaction "PYRUVFORMLY-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "FHLMULTI-RXN" set to "OR". Please modify it if necessary.
Non-zero flux test for reaction 'PEPDEPHOS_RXN_c' passed.
Reaction "PEPDEPHOS_RXN_c" added to group "curated_pathway".
The model cannot turnover the following metabolites ['FORMATE_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'PYRUVFORMLY_RXN_c' passed.
Reaction "PYRUVFORMLY_RXN_c" added to group "curated_pathway".
Non-zero flux test for reaction 'FHLMULTI_RXN_c' passed.
Reaction "FHLMULTI_RXN_c" added to group "curated_pathway".
Auxiliary sink reaction for "SK_FORMATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_HYDROGEN_MOLECULE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Number of new | removed entities in
*=====================|===================*
Reactions 3 | 0
Metabolites 2 | 0
Exchange 0 | 0
Demand 0 | 0
Sinks 2 | 0
Genes 12 | 0
Groups 1 | 0
Number of reactions after addition: 100
[19]:
| Pathway identifier | curated_pathway |
| Name | |
| Memory address | 0x0135224490393088 |
| Reactions involved | PYRUVFORMLY_RXN_c, PEPDEPHOS_RXN_c, FHLMULTI_RXN_c |
| Genes involved | G7627, EG11784, EG10701, EG10804, EG10803, EG10477, EG10480, EG10475, EG10285, EG10476, EG10479, EG10478 |
| Visualization attributes |
|
NOTES
A pathway is a set of COBRApy reactions. All the notes listed for
add_metabolites()andadd_reactions()also apply to pathways, i. e., handling of duplicate elements, transport reactions and the argumentgenomefor KEGG.CobraMod saves the curation process in the directory
logs. The logs are saved by date.
1.1.6. Non-zero flux test¶
When calling the function add_pathway(), CobraMod tests each reaction of the Pathway object for its capability to carry a non-zero flux, i.e., if the involved metabolites can be turned over. Additionally, the user can test individual COBRApy reactions for their capability to cary a non-zero flux by using the function cobramod.test_non_zero_flux(). When running add_pathway() CobraMod tries to maximize (minimize if operating
backwards) the flux through the tested reaction. If the flux is below solver tolerance CobraMod adds auxiliary sink reactions to all metabolites participating in the reaction and subsequently remove them one by one if the respective metabolites can be turned over by reactions in
the model. CobraMod then raises a warning about the remaining auxiliary sink reactions.
Based on this information the user can then manually curate the model and if necessary add the missing synthesis reactions for the respective metabolites. If a reaction passes the test CobraMod prints a short message that the test was sucessfully passed.
The user can also use the argument ignore_list when adding a pathway to skip the non-zero-flux test for the specified reactions.
In the following artifical example (E. coli is a prokaryote without compartmentation), we introduce the glutathione synthase reaction (GLUTATHIONE-SYN-RXN) into a new comparment (x) and test the reaction for its capability to carry a non-zero flux. This reaction has the following equation:
ATP_x + GLY_x + L_GAMMA_GLUTAMYLCYSTEINE_x --> ADP_x + GLUTATHIONE_x + PROTON_x + Pi_x
Because we are testing a reaction in a new, empty compartment, CobraMod creates auxiliary sink reactions for the involved metabolites and prints a warning.
[20]:
from cobramod import test_non_zero_flux, add_reactions
from cobramod.test import textbook_biocyc
test_model = textbook_biocyc.copy()
# Adding the reaction into the test model
add_reactions(
model=test_model,
# The list has 4 metabolites
obj=["GLUTATHIONE-SYN-RXN, x"],
directory=dir_data,
database="ECOLI",
)
# Running test for GLUTATHIONE_SYN_RXN_x
test_non_zero_flux(
model=test_model,
reaction="GLUTATHIONE_SYN_RXN_x",
)
Gene-reaction rule for reaction "GLUTATHIONE-SYN-RXN" set to "OR". Please modify it if necessary.
Reaction "GLUTATHIONE_SYN_RXN_x" is already present in the model. Skipping additions.
Non-zero flux test for reaction 'GLUTATHIONE_SYN_RXN_x' passed.
Auxiliary sink reaction for "SK_GLUTATHIONE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ADP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_PROTON_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_L_GAMMA_GLUTAMYLCYSTEINE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_GLY_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ATP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_Pi_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Reaction GLUTATHIONE_SYN_RXN_x passed the non-zero-flux test.
1.1.7. Curation process¶
CobraMod automatically performs the following curation steps during the creation of COBRApy reaction and metabolite objects and CobraMod pathway objects:
If CobraMod encounters biochemical information (e.g. reactions or metabolites) with missing formulas or entries, a warning is printed and CobraMod informs the user about the missing data. For instance, CobraMod assigns an “X” to metabolites with missing formulas.
When adding a metabolite, reaction or pathway to the model, CobraMod tries to identify if these elements are already present in the model (potentially with a different name) and use the exiting elements instead of creating redundant ones (by checking the cross-reference database entries in the elements’s metadata against the model entries).
In the case of user-defined, curated reactions CobraMod tries to find the respective metabolites in the model. If a metabolites is not found, a warning is printed and the metabolite is created with “X” as chemical formula.
When adding metabolites, reactions or pathways, CobraMod can query MetaNetX and PubChem to find cross-references and adds the information to the corresponding Reactions and Metabolites (See Using the automatic cross-reference expansion in the documentation).
CobraMod utilizes the COBRApy method cobra.Reaction.check_mass_balance() and returns a warning if mass imbalances are found.
CobraMod uses the reaction reversibility information provided with the obtained reaction data. If the reversibility information is missing, CobraMod raises a warning.
When CobraMod adds a pathway, each pathway reaction undergos a non-zero flux test. If a reaction cannot carry a non-zero flux, CobraMod adds auxiliary sink reactions to unblock the reaction and suggests manual curation steps based on these auxiliary modifications. (See Non-zero flux test in the documentation).
CobraMod offers the function
add_crossreferenceswhich queries the genome annotation database MetaNetX to adds the missing meta-data about the cross-references.
1.1.8. Converting COBRApy groups back to CobraMod pathways¶
The COBRApy function cobra.io.write_sbml_model() writes cobra models to sbml files. If the user calls this function with a model that contains cobramod.Pathway elements these will be saved as COBRApy Group elements and the CobraMod pathway
objects are lost. To convert COBRApy groups back to CobraMod pathways (and use the associated functionalities), the user can call the function cobramod.model_convert(). In the following example, we create a Group with four reactions and add it to the model. We then use the cobramod.model_convert() function with the argument
model and can then call the newly converted respective CobraMod pathway objects.
[21]:
from cobramod import model_convert
from cobramod.test import textbook_biocyc
from cobra.core.group import Group
test_model = textbook_biocyc.copy()
# Creation of group
test_group = Group(id="curated_pathway")
for reaction in ("GLCpts", "G6PDH2r", "PGL", "GND"):
test_group.add_members([test_model.reactions.get_by_id(reaction)])
test_model.add_groups([test_group])
# Conversion to a Pathway
model_convert(model=test_model)
# Display to Jupyter
test_model.groups.get_by_id("curated_pathway")
Group-object "curated_pathway" was transformed to a Pathway-object.
[21]:
| Pathway identifier | curated_pathway |
| Name | |
| Memory address | 0x0135225782733520 |
| Reactions involved | GLCpts, G6PDH2r, PGL, GND |
| Genes involved | b1819, b1621, b1817, b1101, b2415, b2417, b1818, b2416, b1852, b0767, b2029 |
| Visualization attributes |
|