1.1. Functions¶

[1]:

from IPython.display import display
from cobramod import __version__
from cobra import __version__ as cobra_version
print(f'CobraMod version: {__version__}')
print(f'COBRApy version: {cobra_version}')
# From Escher:
# This option turns off the warning message if you leave or refresh this page
import escher
escher.rc['never_ask_before_quit'] = True

CobraMod version: 1.2.0
COBRApy version: 0.29.0

1.1.1. Retrieving metabolic pathway information¶

CobraMod can retrieve metabolic pathway information (metabolites, reactions or pathways) from various databases by using database-specific identifiers. It supports all databases from the BioCyc collection, Plant Metabolic Network (PMN), the KEGG database and the BiGG Models repository. Call cobramod.retrieval.Databases to see all supported databases.

[2]:

from cobramod.retrieval import Databases

Databases()

[2]:

CobraMod supports BioCyc, the Plant Metabolic Network (PMN), KEGG and BiGG Models repository. BioCyc includes around 18.000 sub-databases. A complete list for BioCyc can be found at: 'https://biocyc.org/biocyc-pgdb-list.shtml'.
The database-specific identifiers can be found in the URL of the respective data.CobraMod uses abbreviations to represent the databases or sub-databases:

Abbreviation	Database name
META	Biocyc, subdatabase MetaCyc
ARA	Biocyc, subdatabase AraCyc
KEGG	Kyoto encyclopedia of Genes and Genomes
BIGG	BiGG Models
PMN:META	PlantCyc, subdatabase META
PMN:ARA	PlantCyc, subdatabase ARA

This applies for all subdatabases from BioCyc and Plantcyc

The user can download the metabolic pathway information using the cobramod.get_data function. In this example we download information from the BioCyc sub-database YEAST.

NOTE: In order to retrieved data from Metacyc. An account is required. Create a file called credentials.txt and add in the first line the username and the password in the second line

my_username
secret_password

Only after setting the credentials, CobraMod is able to download data from BioCyc

[3]:

from cobramod import get_data
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
    "CPD-12575",
    "BETA-D-FRUCTOSE",
    "SUCROSE",
    "UDP",
]

for metabolite in identifiers:
    get_data(
        directory=dir_data,
        identifier=metabolite,
        database="YEAST"
    )

The first argument in cobramod.get_data() is the system path where CobraMod stores the metabolic pathway information. CobraMod uses pathlib for path representation. The second argument is the data identifier used in the respective database. In this example we retrieve data from the BioCyc sub-database YEAST. The last argument is the abbreviation of the database.

CobraMod creates a directory with the name of the database and stores the metabolic pathway information in it:

data
`-- YEAST
    |-- CPD-12575.xml
    |-- BETA-D-FRUCTOSE.xml
    |-- SUCROSE.xml
    |-- UDP.xml

1.1.2. Converting stored-data into COBRApy objects¶

CobraMod can convert metabolic pathway information (metabolites and reactions) into COBRApy objects (cobra.Reaction and cobra.Metabolite). It can thus be seamlessly integrated with a COBRApy workflow.

The function cobramod.create_object() creates COBRApy objects from metabolic pathway data retrieved by cobramod.get_data(). If no pathway information was downloaded cobramod.create_object() retrieves it automatically.

In this example, we convert the metabolite 2-Oxoglutarate with the KEGG identifier C00026 into a COBRApy object. CobraMod automatically identifies the KEGG entry as a metabolite and converts it into the corresponding COBRApy metabolite.

The first argument is the database-specific identifier (C00026) followed by the database abbreviation (KEGG). The third argument is the path representation for the directory of the metabolic pathway information. CobraMod downloads the data into this directory and utilize it instead of downloading it again. The last argument is the compartment of the reaction (c for cytosol).

[4]:

from cobramod import create_object
from pathlib import Path

# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")

new_object = create_object(
    identifier="C00026",
    database="KEGG",
    directory=dir_data,
    compartment="c"
)

print(type(new_object))
new_object

<class 'cobra.core.metabolite.Metabolite'>

[4]:

Metabolite identifier	C00026_c
Name	alpha-Ketoglutaric acid
Memory address	0x7afc8d2703e0
Formula	C5H6O5
Compartment	c
In 0 reaction(s)

In the second example, we convert the reaction RXN-11502 from the PMN sub-database CORN into a COBRApy reaction. The first argument is the database-specific identifier (RXN-11502) followed by the database identifier (pmn:CORN). CobraMod automatically takes reversibility and gene information from the database entry and adds it to the reaction object.

[5]:

new_object = create_object(
    identifier="RXN-11501",
    database="pmn:CORN",
    directory=dir_data,
    compartment="c"
)

print(type(new_object))
display(new_object)
new_object.genes

Gene-reaction rule for reaction "RXN-11501" set to "OR". Please modify it if necessary.

<class 'cobra.core.reaction.Reaction'>

Reaction identifier	RXN_11501_c
Name	alkaline α- galactosidase
Memory address	0x7afcc0510c80
Stoichiometry	CPD_170_c + WATER_c --> ALPHA_D_GALACTOSE_c + CPD_1099_c stachyose + H2O --> alpha-D-galactopyranose + raffinose
GPR	ZM00001EB033890 or ZM00001EB033880 or ZM00001EB033870
Lower bound	0
Upper bound	1000

[5]:

frozenset({<Gene ZM00001EB033870 at 0x7afc8ce13350>,
           <Gene ZM00001EB033880 at 0x7afcc0511670>,
           <Gene ZM00001EB033890 at 0x7afca7531e50>})

1.1.3. Adding metabolites¶

The function cobramod.add_metabolites() extends the COBRApy function model.add_metabolites() and can be used with a simple syntax. It can utilize a single string, a list of strings, a file path or a COBRAPy metabolite object. In the next examples we showcase these options. We use the E. coli core model from COBRApy as test model. This core model can be found under cobramod.test.textbook. If the argument obj is used as a string, it can be the database-specific identifier of the respective metabolite and its compartment or a user-defined metabolite. It uses the following syntax:

SYNTAX

To add metabolite information from a database:

database-specific_identifier, compartment

To add user-curated metabolites:

user-curated_identifier, name, compartment, chemical_formula, molecular_charge

In the first example, we add the metabolite L-methionine with the MetaCyc identifier MET to the test model. The first argument is the model name. The argument obj contains the identifier MET and the compartment c. The second argument is the database identifier (META) and the third argument is the directory where CobraMod stores the metabolite information.

[6]:

from cobramod import add_metabolites
from cobramod.test import textbook
from pathlib import Path

# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook.copy()

add_metabolites(
    model=test_model,
    obj=["MET, c"],
    database="META",
    directory=dir_data,
)
print(type(test_model.metabolites.get_by_id("MET_c")))
test_model.metabolites.get_by_id("MET_c")

Metabolite "MET_c" was added to model.

<class 'cobra.core.metabolite.Metabolite'>

[6]:

Metabolite identifier	MET_c
Name	L-methionine
Memory address	0x7afc8ca6a8a0
Formula	C5H11N1O2S1
Compartment	c
In 0 reaction(s)

In the second example, we add two metabolites (acetyl-coa and sucrose) from MetaCyc. In the argument obj we define a list with the database-specific metabolite identifiers and the respecitive compartments. The rest of the arguments remains the same as in the previous example. Before adding a metabolite, CobraMod tests if the given metabolite is already present in the model (potentially with a different identifier) by checking the cross-reference database entries in the metabolite’s metadata against the model’s metabolite entries. If CobraMod identifies a metabolite as already present in the model it skips adding the metabolite, used the already present one, and prints a warning.

[7]:

add_metabolites(
    model=test_model,
    obj=["ACETYL-COA, c", "SUCROSE, c"],
    database="META",
    directory=dir_data,
)
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("accoa_c"))
test_model.metabolites.get_by_id("SUCROSE_c")

Metabolite "ACETYL_COA_c" was added to model.
Metabolite "SUCROSE_c" was added to model.

Metabolite identifier	accoa_c
Name	Acetyl-CoA
Memory address	0x7afc8c95c5c0
Formula	C23H34N7O17P3S
Compartment	c
In 7 reaction(s)	ACALD, MALS, PDH, CS, Biomass_Ecoli_core, PFL, PTAr

[7]:

Metabolite identifier	SUCROSE_c
Name	sucrose
Memory address	0x7afc8c9a53d0
Formula	C12H22O11
Compartment	c
In 0 reaction(s)

In the third example, we use a text file to add metabolites to the test model. We use the file metabolites.txt in the current working directory with the following content:

SUCROSE, c
MET, c
MALTOSE_c, MALTOSE[c], c, C12H22O11, 1

In this example, CobraMod downloads the first two metabolites from MetaCyc, while MALTOSE_c is a user-defined metabolite. The user can specify the path to this file in the obj argument. The remaining arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the metabolites to the model.

[8]:

from cobramod.test import textbook_biocyc
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# This is our file
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()

print(f'Number of metabolites prior addition: {len(test_model.metabolites)}')
# Using CobraMod
add_metabolites(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="META",
)
print(f'Number of metabolites after addition: {len(test_model.metabolites)}')
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("MET_c"))
display(test_model.metabolites.get_by_id("SUCROSE_c"))
test_model.metabolites.get_by_id("MALTOSE_c")

Metabolite "SUCROSE_c" was added to model.
Metabolite "MET_c" was added to model.
Metabolite "MALTOSE_c" was added to model.

Number of metabolites prior addition: 72
Number of metabolites after addition: 75

Metabolite identifier	MET_c
Name	L-methionine
Memory address	0x7afc8c9a7410
Formula	C5H11N1O2S1
Compartment	c
In 0 reaction(s)

Metabolite identifier	SUCROSE_c
Name	sucrose
Memory address	0x7afc875b01a0
Formula	C12H22O11
Compartment	c
In 0 reaction(s)

[8]:

Metabolite identifier	MALTOSE_c
Name	MALTOSE[c]
Memory address	0x7afc8c7399a0
Formula	C12H22O11
Compartment	c
In 0 reaction(s)

Since cobramod.add_metabolites() is an extension of the COBRApy function model.add_metabolites() the user can also utilize COBRApy metabolites. In the following example, we use a variation of the test model (textbook_biocyc) which uses BioCyc metabolite identifiers. We copy a COBRApy metabolite from the test model and add it to the BioCyc test model.

[9]:

from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc

# Copying Metabolite from original model
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")

# Using a copy
test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj=[metabolite]
)

test_model.metabolites.get_by_id("xu5p__D_c")

Metabolite "xu5p__D_c" was added to model.

[9]:

Metabolite identifier	xu5p__D_c
Name	D-Xylulose 5-phosphate
Memory address	0x7afc8cb758e0
Formula	C5H9O8P
Compartment	c
In 3 reaction(s)	TKT2, TKT1, RPE

If CobraMod detects large molecules (e.g. enzymes) or if the metabolite information does not include a chemical formula the user receives a warning. In this example, we use the enzyme with the MetaCyc identifier Red-NADPH-Hemoprotein-Reductases and add it to the test model. CobraMod raises a warning due to the missing chemical formula.

[10]:

# Using a copy
test_model = textbook.copy()

add_metabolites(
    model=test_model,
    obj=["Red-NADPH-Hemoprotein-Reductases, c"],
    directory=dir_data,
    database="META",
)
test_model.metabolites.get_by_id("Red_NADPH_Hemoprotein_Reductases_c")

Curated metabolite "Red_NADPH_Hemoprotein_Reductases_c does not include a chemical formula
Metabolite "Red_NADPH_Hemoprotein_Reductases_c" was added to model.

[10]:

Metabolite identifier	Red_NADPH_Hemoprotein_Reductases_c
Name	Red-NADPH-Hemoprotein-Reductases
Memory address	0x7afc86120710
Formula	X
Compartment	c
In 0 reaction(s)

NOTES

CobraMod replaces hyphens (-) with underscores (_) in the identifiers when creating COBRApy metabolites.
When adding several metabolites the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.
CobraMod saves the curation process in a directory ‘logs’

1.1.4. Adding reactions¶

The function cobramod.add_reactions() extends the COBRApy function model.add_reactions() and can be used with a simple syntax. It can utilize a single string, a list of string, a file path or a COBRApy reaction object. In the next examples we showcase these options. Again, we use the E. coli core model from COBRApy as test model.

If the argument obj is used as a string, it can be the database-specific identifier of the respective reaction and its compartment or a user-defined reaction. It uses the following syntax:

SYNTAX

To add reaction information from a database:

database-specific_identifier, compartment

To add user-defined reactions (curated-reactions), the user can specify the identifier and name of the reaction following the COBRApy reaction string syntax:

user-curated_identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2

Metabolites must contain a suffix which specifies the compartment. This is given by an underscore (_) followed by the compartment-abbreviation. In this example, we create a transport reaction for oxygen between the external comparment (e) and the cytosol (c):

TRANS_H2O_ec, Oxygen Transport | 2 OXYGEN-MOLECULE_e <-> 2 OXYGEN_MOLECULE_c

In the first example, we add the BIGG reaction AKGDH to a test model. This time we use a text model which is a variaton of the original core model and uses KEGG identifiers (textbook_kegg). The first argument is the model name. The obj argument takes the identifier AKGDH and the compartment c. The next argument is the database identifier (BIGG) and the last argument is the directory where CobraMod stores and retrieves the metabolic pathway information from. The argument model_id is a BIGG-specific argument. Please read the notes below for more information.

Because the test model uses KEGG identifiers, CobraMod finds the reaction AKGDH with the KEGG identifier R08549_c.

[11]:

from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["AKGDH, c"],
    database="BIGG",
    directory=dir_data,
    model_id="e_coli_core"
)

display(test_model.reactions.get_by_id("R08549_c"))

Reaction "R08549_c" is already present in the model. Skipping additions.

Reaction identifier	R08549_c
Name	2-Oxogluterate dehydrogenase
Memory address	0x7afc86120fb0
Stoichiometry	C00003_c + C00010_c + C00026_c --> C00004_c + C00011_c + C00091_c Nicotinamide adenine dinucleotide + Coenzyme A + 2-Oxoglutarate --> Nicotinamide adenine dinucleotide - reduced + CO2 + Succinyl-CoA
GPR	b0116 and b0726 and b0727
Lower bound	0.0
Upper bound	1000.0

In the second example, we add two reactions (R00315 and R02736) from KEGG. The argument obj is a list with the database-specific identifiers and the respective compartments of the reactions. The majority of the arguments remains the same as in the previous example. The argument genome is a KEGG-specific argument. Please read the notes below for more information.

CobraMod parses the reaction information for gene identifiers and automatically adds them to the COBRApy reaction. In this example, CobraMod creates the genes 2296, b3115 and b1849, and adds it to the first reaction R00315.

[12]:

add_reactions(
    model=test_model,
    obj=["R00315, c", "R02736 ,c"],
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

display(test_model.reactions.get_by_id("R00315_c"))
print(test_model.reactions.get_by_id("R00315_c").genes)
test_model.reactions.get_by_id("R02736_c")

Gene-reaction rule for reaction "R00315" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R00315_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.

Reaction identifier	R00315_c
Name	acetate kinase
Memory address	0x7afc86121160
Stoichiometry	C00002_c + C00033_c <=> C00227_c + G11113_c ATP + Acetate <=> Acetyl phosphate + ADP
GPR	c2838
Lower bound	-1000.0
Upper bound	1000.0

frozenset({<Gene c2838 at 0x7afc7eb8c080>})

[12]:

Reaction identifier	R02736_c
Name	beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address	0x7afc8cafdc40
Stoichiometry	C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone
GPR	c2265
Lower bound	-1000
Upper bound	1000

In the following example, we use a text file to add reactions to the test model. To this end, we use the file reactions.txt in the current working directory. This file has the following content:

R04382, c
R02736, c
C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e

CobraMod downloads the first two reactions from KEGG. C06118_ce is a user-defined reaction. The user can specify the file path in the obj argument. The next arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the reaction to the model.

[13]:

from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
# This is the file with text
file = dir_data.joinpath("reactions.txt")

print(f'Number of reactions prior addition: {len(test_model.reactions)}')

add_reactions(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Show in jupyter
display(test_model.reactions.get_by_id("R04382_c"))
display(test_model.reactions.get_by_id("R02736_c"))
test_model.reactions.get_by_id("C06118_ce")

Gene-reaction rule for reaction "R04382" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}

Number of reactions prior addition: 95

Reaction "R04382_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.
Reaction "C06118_ce" was added to model.

Number of reactions after addition: 98

Reaction identifier	R04382_c
Name	4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address	0x7afc6fcd64b0
Stoichiometry	C06118_c <=> 2.0 C04053_c Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate
GPR	c0319
Lower bound	-1000
Upper bound	1000

Reaction identifier	R02736_c
Name	beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address	0x7afc6fcef290
Stoichiometry	C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone
GPR	c2265
Lower bound	-1000
Upper bound	1000

[13]:

Reaction identifier	C06118_ce
Name	digalacturonate transport
Memory address	0x7afc6fceffe0
Stoichiometry	C06118_c <=> C06118_e Unsaturated digalacturonate <=> Unsaturated digalacturonate
GPR
Lower bound	-1000
Upper bound	1000

Since cobramod.add_reactions() is an extension of the original COBRApy function model.add_reactions() the user can also utilize COBRApy reactions. In this example, we use the test model (textbook_kegg). We copy a COBRApy reaction from the test model and add it to the KEGG test model.

[14]:

from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path

# Using copy of test model
test_model = textbook_kegg.copy()
# Obtaining a reaction
reaction = textbook.reactions.get_by_id("ACALDt")

add_reactions(model=test_model, obj=[reaction], directory=dir_data)

test_model.reactions.get_by_id("ACALDt")

Reaction "ACALDt" is already present in the model. Skipping additions.

[14]:

Reaction identifier	ACALDt
Name	R acetaldehyde reversible - transport
Memory address	0x7afc6fcec560
Stoichiometry	C00084_e <=> C00084_c Acetaldehyde <=> Acetaldehyde
GPR	s0001
Lower bound	-1000.0
Upper bound	1000.0

By default, COBRApy ignores metabolites that appear on both sides of a reaction equation. CobraMod identifies such reactions and assigns one of these metabolites to the extracellular compartment and raises a warning suggesting manual curation. In the following example, we add a transport reaction for acetic acid from BioCyc sub-database YEAST to the test model.

[15]:

test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["TRANS-RXN-455, c"],
    database="YEAST",
    directory=dir_data,
)
# Show in jupyter
test_model.reactions.get_by_id("TRANS_RXN_455_c")

Gene-reaction rule for reaction "TRANS-RXN-455" set to "OR". Please modify it if necessary.
Reaction "TRANS_RXN_455_c" has the same metabolite on both sides of the equation (e.g transport reaction). COBRApy ignores these metabolites. To avoid this, by default, CobraMod will assign the reactant side to the extracellular compartment. Please curate the reaction if necessary.
Reaction "TRANS_RXN_455_c" is already present in the model. Skipping additions.

[15]:

Reaction identifier	TRANS_RXN_455_c
Name	acetic acid uptake
Memory address	0x7afc86123ec0
Stoichiometry	CPD_24335_e --> CPD_24335_c acetic+acid --> acetic+acid
GPR	G3O-32144
Lower bound	0
Upper bound	1000

NOTES

CobraMod replaces hyphens (-) to underscores (_) in the identifiers when creating COBRApy reactions.
When adding several reactions the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.
CobraMod tries to identify if a given metabolites or reactions are already present in the model and used those instead of adding redundant entries.
CobraMod saves the curation process to the file debug.log.
The argument model_id is specific for the BiGG Models repository. CobraMod can download metabolite and reaction information directly from the BIGG models. A list of all models can be found here.
The argument genome can be used with the database KEGG and specifies the genome for which gene information will be retrieved. The complete list of all available genomes can be found here. If no genome is specified, no gene information will be retrieved and a warning is printed as shown below:

[16]:

test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["R04382, c"],
    database="KEGG",
    directory=dir_data,
)
test_model.reactions.get_by_id("R04382_c")

Nothing was specified in argument "genome". Reaction "R04382" will not include genes. Please modify if necessary.
Reaction "R04382_c" is already present in the model. Skipping additions.

[16]:

Reaction identifier	R04382_c
Name	4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address	0x7afc6fcb5c40
Stoichiometry	C06118_c <=> 2.0 C04053_c Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate
GPR
Lower bound	-1000
Upper bound	1000

1.1.5. Adding pathways¶

CobraMod can add metabolic pathways to a given model. The function cobramod.add_pathway() takes as an argument either a sequence of database-specific reaction identifiers or a single pathway identifier. CobraMod download the pathway information from the databases, creates the COBRApy objects and prints a short summary about the curation process when adding the respective pathway. Additionally, to ensure reproducibility and for tracking the curation process, the user can write changes to file. CobraMod creates the object cobramod.Pathway which contains the pathway information.

When adding serveral pathways, we recommend adding pathways one-by-one to perform neccessary curation steps inbetween. In the examples below we showcase two options. Again, we use the E. coli core model from COBRApy as test model.

In the first example, we add the acetoacetate degradation pathway from the BioCyc sub-database ECOLI to the test model. This pathway contains two reactions and six metabolites.

f367eacf496b493f810fbceb66e6f301

In this example, the first argument is the model to extend. The pathway argument uses the database-specific identifier ACETOACETATE-DEG-PWY and the database identifier ECOLI. We define the compartment as c (cytosol), i.e. all metabolites and reactions will be assigned to this compartment.

The argument filename specifies a file to which a summary of the changes is written. Additionally, add_pathway() prints a summary of the changes to the model. Calling the respective cobramod.Pathway outputs a table with the main attributes of the object.

All COBRApy reactions of the pathway are tested for their capacity to cary a non-zero flux. Read more about it in the non-zero flux test section.

[17]:

from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")

# Using copy of test model
test_model = textbook.copy()

add_pathway(
    model=test_model,
    pathway="ACETOACETATE-DEG-PWY",
    database="ECOLI",
    compartment="c",
    filename="summary.txt",
    directory=dir_data,
)

# Display in jupyter
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")

Gene-reaction rule for reaction "ACETOACETYL-COA-TRANSFER-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "ACETYL-COA-ACETYLTRANSFER-RXN" set to "OR". Please modify it if necessary.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" unbalanced. Following atoms are affected. Please verify: {'charge': -4.0, 'C': 23.0, 'H': 34.0, 'N': 7.0, 'O': 17.0, 'P': 3.0, 'S': 1.0}
The model cannot turnover the following metabolites ['3_KETOBUTYRATE_c', 'accoa_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'ACETOACETYL_COA_TRANSFER_RXN_c' passed.
Reaction "ACETOACETYL_COA_TRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Non-zero flux test for reaction 'ACETYL_COA_ACETYLTRANSFER_RXN_c' passed.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Auxiliary sink reaction for "SK_accoa_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_3_KETOBUTYRATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.

Number of       new   | removed entities in
*=====================|===================*
Reactions        2    |    0
Metabolites      2    |    0
Exchange         0    |    0
Demand           0    |    0
Sinks            2    |    0
Genes            4    |    0
Groups           1    |    0

[17]:

Pathway identifier	ACETOACETATE-DEG-PWY
Name
Memory address	0x0135224490952272
Reactions involved	ACETOACETYL_COA_TRANSFER_RXN_c, ACETYL_COA_ACETYLTRANSFER_RXN_c
Genes involved	EG12432, EG11670, EG11669, EG11672
Visualization attributes	vertical = False color_negative = None color_positive = None color_quantile = False

Below is an example of the summary in form of a text file. The first part lists names of all reactions, metabolites, exchange reactions, auxiliary demand and sink reactions, genes, and groups in the model. The second part of the summary lists all elements that were added or removed by the function call add_pathway().

[18]:

%cat summary.txt

Summary:
Model identifier: e_coli_core
Model name:
None
Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'TPI', 'ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['13dpg_c', '2pg_c', '3pg_c', '6pgc_c', '6pgl_c', 'ac_c', 'ac_e', 'acald_c', 'acald_e', 'accoa_c', 'acon_C_c', 'actp_c', 'adp_c', 'akg_c', 'akg_e', 'amp_c', 'atp_c', 'cit_c', 'co2_c', 'co2_e', 'coa_c', 'dhap_c', 'e4p_c', 'etoh_c', 'etoh_e', 'f6p_c', 'fdp_c', 'for_c', 'for_e', 'fru_e', 'fum_c', 'fum_e', 'g3p_c', 'g6p_c', 'glc__D_e', 'gln__L_c', 'gln__L_e', 'glu__L_c', 'glu__L_e', 'glx_c', 'h2o_c', 'h2o_e', 'h_c', 'h_e', 'icit_c', 'lac__D_c', 'lac__D_e', 'mal__L_c', 'mal__L_e', 'nad_c', 'nadh_c', 'nadp_c', 'nadph_c', 'nh4_c', 'nh4_e', 'o2_c', 'o2_e', 'oaa_c', 'pep_c', 'pi_c', 'pi_e', 'pyr_c', 'pyr_e', 'q8_c', 'q8h2_c', 'r5p_c', 'ru5p__D_c', 's7p_c', 'succ_c', 'succ_e', 'succoa_c', 'xu5p__D_c', '3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
['EX_ac_e', 'EX_acald_e', 'EX_akg_e', 'EX_co2_e', 'EX_etoh_e', 'EX_for_e', 'EX_fru_e', 'EX_fum_e', 'EX_glc__D_e', 'EX_gln__L_e', 'EX_glu__L_e', 'EX_h_e', 'EX_h2o_e', 'EX_lac__D_e', 'EX_mal__L_e', 'EX_nh4_e', 'EX_o2_e', 'EX_pi_e', 'EX_pyr_e', 'EX_succ_e']
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['b1241', 'b0351', 's0001', 'b3115', 'b1849', 'b2296', 'b1276', 'b0118', 'b0474', 'b0116', 'b0726', 'b0727', 'b2587', 'b0356', 'b1478', 'b3735', 'b3733', 'b3734', 'b3732', 'b3736', 'b3738', 'b3731', 'b3737', 'b3739', 'b0720', 'b0733', 'b0979', 'b0978', 'b0734', 'b2975', 'b3603', 'b2779', 'b1773', 'b2925', 'b2097', 'b4232', 'b3925', 'b0904', 'b2492', 'b4154', 'b4152', 'b4153', 'b4151', 'b1819', 'b1817', 'b2415', 'b1818', 'b2416', 'b1611', 'b4122', 'b1612', 'b3528', 'b1852', 'b1779', 'b1621', 'b1101', 'b2417', 'b3870', 'b1297', 'b0809', 'b0810', 'b0811', 'b1761', 'b1524', 'b1812', 'b0485', 'b3213', 'b3212', 'b4077', 'b2029', 'b0875', 'b1136', 'b4015', 'b2133', 'b1380', 'b4014', 'b2976', 'b3236', 'b1479', 'b2463', 'b2286', 'b2279', 'b2276', 'b2280', 'b2288', 'b2284', 'b2287', 'b2281', 'b2277', 'b2285', 'b2282', 'b2278', 'b2283', 'b3962', 'b1602', 'b1603', 'b0451', 'b0114', 'b0115', 'b3916', 'b1723', 'b0902', 'b0903', 'b2579', 'b3114', 'b3952', 'b3951', 'b4025', 'b2926', 'b0767', 'b0755', 'b3612', 'b4395', 'b3493', 'b2987', 'b3956', 'b3403', 'b1702', 'b2297', 'b2458', 'b1676', 'b1854', 'b4301', 'b3386', 'b2914', 'b4090', 'b0722', 'b0724', 'b0721', 'b0723', 'b0728', 'b0729', 'b2464', 'b0008', 'b2935', 'b2465', 'b3919', 'EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']

New:
Reactions:
['ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
[]
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']

Removed:
Reactions:
[]
Metabolites:
[]
Exchange:
[]
Demand:
[]
Sinks:
[]
Genes:
[]
Groups:
[]

In the next example, we use a list of database-specific reaction identifiers as pathway argument. We use the database identifier ECOLI and the compartment c (cytosol). Additionally, we define a pathway name by using the argument group. The user can also use this argument to merge pathways by using the same group names.

[19]:

from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")

test_model = textbook_biocyc.copy()
# Defining database-specific identifiers
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]

print(f'Number of reaction prior addition: {len(test_model.reactions)}')

add_pathway(
    model=test_model,
    pathway=sequence,
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    group="curated_pathway"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Display in jupyter
test_model.groups.get_by_id("curated_pathway")

Gene-reaction rule for reaction "PEPDEPHOS-RXN" set to "OR". Please modify it if necessary.

Number of reaction prior addition: 95

Gene-reaction rule for reaction "PYRUVFORMLY-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "FHLMULTI-RXN" set to "OR". Please modify it if necessary.
Non-zero flux test for reaction 'PEPDEPHOS_RXN_c' passed.
Reaction "PEPDEPHOS_RXN_c" added to group "curated_pathway".
The model cannot turnover the following metabolites ['FORMATE_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'PYRUVFORMLY_RXN_c' passed.
Reaction "PYRUVFORMLY_RXN_c" added to group "curated_pathway".
Non-zero flux test for reaction 'FHLMULTI_RXN_c' passed.
Reaction "FHLMULTI_RXN_c" added to group "curated_pathway".
Auxiliary sink reaction for "SK_FORMATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_HYDROGEN_MOLECULE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.

Number of       new   | removed entities in
*=====================|===================*
Reactions        3    |    0
Metabolites      2    |    0
Exchange         0    |    0
Demand           0    |    0
Sinks            2    |    0
Genes           12    |    0
Groups           1    |    0

Number of reactions after addition: 100

[19]:

Pathway identifier	curated_pathway
Name
Memory address	0x0135224490393088
Reactions involved	PYRUVFORMLY_RXN_c, PEPDEPHOS_RXN_c, FHLMULTI_RXN_c
Genes involved	G7627, EG11784, EG10701, EG10804, EG10803, EG10477, EG10480, EG10475, EG10285, EG10476, EG10479, EG10478
Visualization attributes	vertical = False color_negative = None color_positive = None color_quantile = False

NOTES

A pathway is a set of COBRApy reactions. All the notes listed for add_metabolites() and add_reactions() also apply to pathways, i. e., handling of duplicate elements, transport reactions and the argument genome for KEGG.
CobraMod saves the curation process in the directory logs. The logs are saved by date.

1.1.6. Non-zero flux test¶

When calling the function add_pathway(), CobraMod tests each reaction of the Pathway object for its capability to carry a non-zero flux, i.e., if the involved metabolites can be turned over. Additionally, the user can test individual COBRApy reactions for their capability to cary a non-zero flux by using the function cobramod.test_non_zero_flux(). When running add_pathway() CobraMod tries to maximize (minimize if operating backwards) the flux through the tested reaction. If the flux is below solver tolerance CobraMod adds auxiliary sink reactions to all metabolites participating in the reaction and subsequently remove them one by one if the respective metabolites can be turned over by reactions in the model. CobraMod then raises a warning about the remaining auxiliary sink reactions.

Based on this information the user can then manually curate the model and if necessary add the missing synthesis reactions for the respective metabolites. If a reaction passes the test CobraMod prints a short message that the test was sucessfully passed.

The user can also use the argument ignore_list when adding a pathway to skip the non-zero-flux test for the specified reactions.

In the following artifical example (E. coli is a prokaryote without compartmentation), we introduce the glutathione synthase reaction (GLUTATHIONE-SYN-RXN) into a new comparment (x) and test the reaction for its capability to carry a non-zero flux. This reaction has the following equation:

ATP_x + GLY_x + L_GAMMA_GLUTAMYLCYSTEINE_x --> ADP_x + GLUTATHIONE_x + PROTON_x + Pi_x

Because we are testing a reaction in a new, empty compartment, CobraMod creates auxiliary sink reactions for the involved metabolites and prints a warning.

[20]:

from cobramod import test_non_zero_flux, add_reactions
from cobramod.test import textbook_biocyc

test_model = textbook_biocyc.copy()

# Adding the reaction into the test model
add_reactions(
    model=test_model,
    # The list has 4 metabolites
    obj=["GLUTATHIONE-SYN-RXN, x"],
    directory=dir_data,
    database="ECOLI",
)
# Running test for GLUTATHIONE_SYN_RXN_x
test_non_zero_flux(
    model=test_model,
    reaction="GLUTATHIONE_SYN_RXN_x",
)

Gene-reaction rule for reaction "GLUTATHIONE-SYN-RXN" set to "OR". Please modify it if necessary.
Reaction "GLUTATHIONE_SYN_RXN_x" is already present in the model. Skipping additions.
Non-zero flux test for reaction 'GLUTATHIONE_SYN_RXN_x' passed.
Auxiliary sink reaction for "SK_GLUTATHIONE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ADP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_PROTON_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_L_GAMMA_GLUTAMYLCYSTEINE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_GLY_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ATP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_Pi_x" created. Consider removing it and adding the synthesis reactions for the metabolite.

Reaction GLUTATHIONE_SYN_RXN_x passed the non-zero-flux test.

1.1.7. Curation process¶

CobraMod automatically performs the following curation steps during the creation of COBRApy reaction and metabolite objects and CobraMod pathway objects:

If CobraMod encounters biochemical information (e.g. reactions or metabolites) with missing formulas or entries, a warning is printed and CobraMod informs the user about the missing data. For instance, CobraMod assigns an “X” to metabolites with missing formulas.
When adding a metabolite, reaction or pathway to the model, CobraMod tries to identify if these elements are already present in the model (potentially with a different name) and use the exiting elements instead of creating redundant ones (by checking the cross-reference database entries in the elements’s metadata against the model entries).

In the case of user-defined, curated reactions CobraMod tries to find the respective metabolites in the model. If a metabolites is not found, a warning is printed and the metabolite is created with “X” as chemical formula.

When adding metabolites, reactions or pathways, CobraMod can query MetaNetX and PubChem to find cross-references and adds the information to the corresponding Reactions and Metabolites (See Using the automatic cross-reference expansion in the documentation).
CobraMod utilizes the COBRApy method cobra.Reaction.check_mass_balance() and returns a warning if mass imbalances are found.
CobraMod uses the reaction reversibility information provided with the obtained reaction data. If the reversibility information is missing, CobraMod raises a warning.
When CobraMod adds a pathway, each pathway reaction undergos a non-zero flux test. If a reaction cannot carry a non-zero flux, CobraMod adds auxiliary sink reactions to unblock the reaction and suggests manual curation steps based on these auxiliary modifications. (See Non-zero flux test in the documentation).
CobraMod offers the function add_crossreferences which queries the genome annotation database MetaNetX to adds the missing meta-data about the cross-references.

1.1.8. Converting COBRApy groups back to CobraMod pathways¶

The COBRApy function cobra.io.write_sbml_model() writes cobra models to sbml files. If the user calls this function with a model that contains cobramod.Pathway elements these will be saved as COBRApy Group elements and the CobraMod pathway objects are lost. To convert COBRApy groups back to CobraMod pathways (and use the associated functionalities), the user can call the function cobramod.model_convert(). In the following example, we create a Group with four reactions and add it to the model. We then use the cobramod.model_convert() function with the argument model and can then call the newly converted respective CobraMod pathway objects.

[21]:

from cobramod import model_convert
from cobramod.test import textbook_biocyc
from cobra.core.group import Group

test_model = textbook_biocyc.copy()
# Creation of group
test_group = Group(id="curated_pathway")
for reaction in ("GLCpts", "G6PDH2r", "PGL", "GND"):
    test_group.add_members([test_model.reactions.get_by_id(reaction)])
test_model.add_groups([test_group])

# Conversion to a Pathway
model_convert(model=test_model)
# Display to Jupyter
test_model.groups.get_by_id("curated_pathway")

Group-object "curated_pathway" was transformed to a Pathway-object.

[21]:

Pathway identifier	curated_pathway
Name
Memory address	0x0135225782733520
Reactions involved	GLCpts, G6PDH2r, PGL, GND
Genes involved	b1819, b1621, b1817, b1101, b2415, b2417, b1818, b2416, b1852, b0767, b2029
Visualization attributes	vertical = False color_negative = None color_positive = None color_quantile = False