1.1. Functions

[1]:
from IPython.display import display
from cobramod import __version__
from cobra import __version__ as cobra_version
print(f'CobraMod version: {__version__}')
print(f'COBRApy version: {cobra_version}')
# From Escher:
# This option turns off the warning message if you leave or refresh this page
import escher
escher.rc['never_ask_before_quit'] = True
CobraMod version: 1.2.0
COBRApy version: 0.29.0

1.1.1. Retrieving metabolic pathway information

CobraMod can retrieve metabolic pathway information (metabolites, reactions or pathways) from various databases by using database-specific identifiers. It supports all databases from the BioCyc collection, Plant Metabolic Network (PMN), the KEGG database and the BiGG Models repository. Call cobramod.retrieval.Databases to see all supported databases.

[2]:
from cobramod.retrieval import Databases

Databases()
[2]:

CobraMod supports BioCyc, the Plant Metabolic Network (PMN), KEGG and BiGG Models repository. BioCyc includes around 18.000 sub-databases. A complete list for BioCyc can be found at: 'https://biocyc.org/biocyc-pgdb-list.shtml'.
The database-specific identifiers can be found in the URL of the respective data.CobraMod uses abbreviations to represent the databases or sub-databases:

Abbreviation Database name
META Biocyc, subdatabase MetaCyc
ARA Biocyc, subdatabase AraCyc
KEGG Kyoto encyclopedia of Genes and Genomes
BIGG BiGG Models
PMN:META PlantCyc, subdatabase META
PMN:ARA PlantCyc, subdatabase ARA

This applies for all subdatabases from BioCyc and Plantcyc

The user can download the metabolic pathway information using the cobramod.get_data function. In this example we download information from the BioCyc sub-database YEAST.

NOTE: In order to retrieved data from Metacyc. An account is required. Create a file called credentials.txt and add in the first line the username and the password in the second line

my_username
secret_password

Only after setting the credentials, CobraMod is able to download data from BioCyc

[3]:
from cobramod import get_data
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
    "CPD-12575",
    "BETA-D-FRUCTOSE",
    "SUCROSE",
    "UDP",
]

for metabolite in identifiers:
    get_data(
        directory=dir_data,
        identifier=metabolite,
        database="YEAST"
    )

The first argument in cobramod.get_data() is the system path where CobraMod stores the metabolic pathway information. CobraMod uses pathlib for path representation. The second argument is the data identifier used in the respective database. In this example we retrieve data from the BioCyc sub-database YEAST. The last argument is the abbreviation of the database.

CobraMod creates a directory with the name of the database and stores the metabolic pathway information in it:

data
`-- YEAST
    |-- CPD-12575.xml
    |-- BETA-D-FRUCTOSE.xml
    |-- SUCROSE.xml
    |-- UDP.xml

1.1.2. Converting stored-data into COBRApy objects

CobraMod can convert metabolic pathway information (metabolites and reactions) into COBRApy objects (cobra.Reaction and cobra.Metabolite). It can thus be seamlessly integrated with a COBRApy workflow.

The function cobramod.create_object() creates COBRApy objects from metabolic pathway data retrieved by cobramod.get_data(). If no pathway information was downloaded cobramod.create_object() retrieves it automatically.

In this example, we convert the metabolite 2-Oxoglutarate with the KEGG identifier C00026 into a COBRApy object. CobraMod automatically identifies the KEGG entry as a metabolite and converts it into the corresponding COBRApy metabolite.

The first argument is the database-specific identifier (C00026) followed by the database abbreviation (KEGG). The third argument is the path representation for the directory of the metabolic pathway information. CobraMod downloads the data into this directory and utilize it instead of downloading it again. The last argument is the compartment of the reaction (c for cytosol).

[4]:
from cobramod import create_object
from pathlib import Path

# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")

new_object = create_object(
    identifier="C00026",
    database="KEGG",
    directory=dir_data,
    compartment="c"
)

print(type(new_object))
new_object
<class 'cobra.core.metabolite.Metabolite'>
[4]:
Metabolite identifierC00026_c
Namealpha-Ketoglutaric acid
Memory address 0x7afc8d2703e0
FormulaC5H6O5
Compartmentc
In 0 reaction(s)

In the second example, we convert the reaction RXN-11502 from the PMN sub-database CORN into a COBRApy reaction. The first argument is the database-specific identifier (RXN-11502) followed by the database identifier (pmn:CORN). CobraMod automatically takes reversibility and gene information from the database entry and adds it to the reaction object.

[5]:
new_object = create_object(
    identifier="RXN-11501",
    database="pmn:CORN",
    directory=dir_data,
    compartment="c"
)

print(type(new_object))
display(new_object)
new_object.genes
Gene-reaction rule for reaction "RXN-11501" set to "OR". Please modify it if necessary.
<class 'cobra.core.reaction.Reaction'>
Reaction identifierRXN_11501_c
Namealkaline α- galactosidase
Memory address 0x7afcc0510c80
Stoichiometry

CPD_170_c + WATER_c --> ALPHA_D_GALACTOSE_c + CPD_1099_c

stachyose + H2O --> alpha-D-galactopyranose + raffinose

GPRZM00001EB033890 or ZM00001EB033880 or ZM00001EB033870
Lower bound0
Upper bound1000
[5]:
frozenset({<Gene ZM00001EB033870 at 0x7afc8ce13350>,
           <Gene ZM00001EB033880 at 0x7afcc0511670>,
           <Gene ZM00001EB033890 at 0x7afca7531e50>})

1.1.3. Adding metabolites

The function cobramod.add_metabolites() extends the COBRApy function model.add_metabolites() and can be used with a simple syntax. It can utilize a single string, a list of strings, a file path or a COBRAPy metabolite object. In the next examples we showcase these options. We use the E. coli core model from COBRApy as test model. This core model can be found under cobramod.test.textbook. If the argument obj is used as a string, it can be the database-specific identifier of the respective metabolite and its compartment or a user-defined metabolite. It uses the following syntax:


SYNTAX

To add metabolite information from a database:

database-specific_identifier, compartment

To add user-curated metabolites:

user-curated_identifier, name, compartment, chemical_formula, molecular_charge

In the first example, we add the metabolite L-methionine with the MetaCyc identifier MET to the test model. The first argument is the model name. The argument obj contains the identifier MET and the compartment c. The second argument is the database identifier (META) and the third argument is the directory where CobraMod stores the metabolite information.

[6]:
from cobramod import add_metabolites
from cobramod.test import textbook
from pathlib import Path

# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook.copy()

add_metabolites(
    model=test_model,
    obj=["MET, c"],
    database="META",
    directory=dir_data,
)
print(type(test_model.metabolites.get_by_id("MET_c")))
test_model.metabolites.get_by_id("MET_c")
Metabolite "MET_c" was added to model.
<class 'cobra.core.metabolite.Metabolite'>
[6]:
Metabolite identifierMET_c
NameL-methionine
Memory address 0x7afc8ca6a8a0
FormulaC5H11N1O2S1
Compartmentc
In 0 reaction(s)

In the second example, we add two metabolites (acetyl-coa and sucrose) from MetaCyc. In the argument obj we define a list with the database-specific metabolite identifiers and the respecitive compartments. The rest of the arguments remains the same as in the previous example. Before adding a metabolite, CobraMod tests if the given metabolite is already present in the model (potentially with a different identifier) by checking the cross-reference database entries in the metabolite’s metadata against the model’s metabolite entries. If CobraMod identifies a metabolite as already present in the model it skips adding the metabolite, used the already present one, and prints a warning.

[7]:
add_metabolites(
    model=test_model,
    obj=["ACETYL-COA, c", "SUCROSE, c"],
    database="META",
    directory=dir_data,
)
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("accoa_c"))
test_model.metabolites.get_by_id("SUCROSE_c")
Metabolite "ACETYL_COA_c" was added to model.
Metabolite "SUCROSE_c" was added to model.
Metabolite identifieraccoa_c
NameAcetyl-CoA
Memory address 0x7afc8c95c5c0
FormulaC23H34N7O17P3S
Compartmentc
In 7 reaction(s) ACALD, MALS, PDH, CS, Biomass_Ecoli_core, PFL, PTAr
[7]:
Metabolite identifierSUCROSE_c
Namesucrose
Memory address 0x7afc8c9a53d0
FormulaC12H22O11
Compartmentc
In 0 reaction(s)

In the third example, we use a text file to add metabolites to the test model. We use the file metabolites.txt in the current working directory with the following content:

SUCROSE, c
MET, c
MALTOSE_c, MALTOSE[c], c, C12H22O11, 1

In this example, CobraMod downloads the first two metabolites from MetaCyc, while MALTOSE_c is a user-defined metabolite. The user can specify the path to this file in the obj argument. The remaining arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the metabolites to the model.

[8]:
from cobramod.test import textbook_biocyc
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# This is our file
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()

print(f'Number of metabolites prior addition: {len(test_model.metabolites)}')
# Using CobraMod
add_metabolites(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="META",
)
print(f'Number of metabolites after addition: {len(test_model.metabolites)}')
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("MET_c"))
display(test_model.metabolites.get_by_id("SUCROSE_c"))
test_model.metabolites.get_by_id("MALTOSE_c")
Metabolite "SUCROSE_c" was added to model.
Metabolite "MET_c" was added to model.
Metabolite "MALTOSE_c" was added to model.
Number of metabolites prior addition: 72
Number of metabolites after addition: 75
Metabolite identifierMET_c
NameL-methionine
Memory address 0x7afc8c9a7410
FormulaC5H11N1O2S1
Compartmentc
In 0 reaction(s)
Metabolite identifierSUCROSE_c
Namesucrose
Memory address 0x7afc875b01a0
FormulaC12H22O11
Compartmentc
In 0 reaction(s)
[8]:
Metabolite identifierMALTOSE_c
NameMALTOSE[c]
Memory address 0x7afc8c7399a0
FormulaC12H22O11
Compartmentc
In 0 reaction(s)

Since cobramod.add_metabolites() is an extension of the COBRApy function model.add_metabolites() the user can also utilize COBRApy metabolites. In the following example, we use a variation of the test model (textbook_biocyc) which uses BioCyc metabolite identifiers. We copy a COBRApy metabolite from the test model and add it to the BioCyc test model.

[9]:
from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc

# Copying Metabolite from original model
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")

# Using a copy
test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj=[metabolite]
)

test_model.metabolites.get_by_id("xu5p__D_c")
Metabolite "xu5p__D_c" was added to model.
[9]:
Metabolite identifierxu5p__D_c
NameD-Xylulose 5-phosphate
Memory address 0x7afc8cb758e0
FormulaC5H9O8P
Compartmentc
In 3 reaction(s) TKT2, TKT1, RPE

If CobraMod detects large molecules (e.g. enzymes) or if the metabolite information does not include a chemical formula the user receives a warning. In this example, we use the enzyme with the MetaCyc identifier Red-NADPH-Hemoprotein-Reductases and add it to the test model. CobraMod raises a warning due to the missing chemical formula.

[10]:
# Using a copy
test_model = textbook.copy()

add_metabolites(
    model=test_model,
    obj=["Red-NADPH-Hemoprotein-Reductases, c"],
    directory=dir_data,
    database="META",
)
test_model.metabolites.get_by_id("Red_NADPH_Hemoprotein_Reductases_c")
Curated metabolite "Red_NADPH_Hemoprotein_Reductases_c does not include a chemical formula
Metabolite "Red_NADPH_Hemoprotein_Reductases_c" was added to model.
[10]:
Metabolite identifierRed_NADPH_Hemoprotein_Reductases_c
NameRed-NADPH-Hemoprotein-Reductases
Memory address 0x7afc86120710
FormulaX
Compartmentc
In 0 reaction(s)

NOTES

  • CobraMod replaces hyphens (-) with underscores (_) in the identifiers when creating COBRApy metabolites.

  • When adding several metabolites the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.

  • CobraMod saves the curation process in a directory ‘logs’


1.1.4. Adding reactions

The function cobramod.add_reactions() extends the COBRApy function model.add_reactions() and can be used with a simple syntax. It can utilize a single string, a list of string, a file path or a COBRApy reaction object. In the next examples we showcase these options. Again, we use the E. coli core model from COBRApy as test model.

If the argument obj is used as a string, it can be the database-specific identifier of the respective reaction and its compartment or a user-defined reaction. It uses the following syntax:


SYNTAX

To add reaction information from a database:

database-specific_identifier, compartment

To add user-defined reactions (curated-reactions), the user can specify the identifier and name of the reaction following the COBRApy reaction string syntax:

user-curated_identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2

Metabolites must contain a suffix which specifies the compartment. This is given by an underscore (_) followed by the compartment-abbreviation. In this example, we create a transport reaction for oxygen between the external comparment (e) and the cytosol (c):

TRANS_H2O_ec, Oxygen Transport | 2 OXYGEN-MOLECULE_e <-> 2 OXYGEN_MOLECULE_c

In the first example, we add the BIGG reaction AKGDH to a test model. This time we use a text model which is a variaton of the original core model and uses KEGG identifiers (textbook_kegg). The first argument is the model name. The obj argument takes the identifier AKGDH and the compartment c. The next argument is the database identifier (BIGG) and the last argument is the directory where CobraMod stores and retrieves the metabolic pathway information from. The argument model_id is a BIGG-specific argument. Please read the notes below for more information.

Because the test model uses KEGG identifiers, CobraMod finds the reaction AKGDH with the KEGG identifier R08549_c.

[11]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["AKGDH, c"],
    database="BIGG",
    directory=dir_data,
    model_id="e_coli_core"
)

display(test_model.reactions.get_by_id("R08549_c"))
Reaction "R08549_c" is already present in the model. Skipping additions.
Reaction identifierR08549_c
Name2-Oxogluterate dehydrogenase
Memory address 0x7afc86120fb0
Stoichiometry

C00003_c + C00010_c + C00026_c --> C00004_c + C00011_c + C00091_c

Nicotinamide adenine dinucleotide + Coenzyme A + 2-Oxoglutarate --> Nicotinamide adenine dinucleotide - reduced + CO2 + Succinyl-CoA

GPRb0116 and b0726 and b0727
Lower bound0.0
Upper bound1000.0

In the second example, we add two reactions (R00315 and R02736) from KEGG. The argument obj is a list with the database-specific identifiers and the respective compartments of the reactions. The majority of the arguments remains the same as in the previous example. The argument genome is a KEGG-specific argument. Please read the notes below for more information.

CobraMod parses the reaction information for gene identifiers and automatically adds them to the COBRApy reaction. In this example, CobraMod creates the genes 2296, b3115 and b1849, and adds it to the first reaction R00315.

[12]:
add_reactions(
    model=test_model,
    obj=["R00315, c", "R02736 ,c"],
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

display(test_model.reactions.get_by_id("R00315_c"))
print(test_model.reactions.get_by_id("R00315_c").genes)
test_model.reactions.get_by_id("R02736_c")
Gene-reaction rule for reaction "R00315" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R00315_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.
Reaction identifierR00315_c
Nameacetate kinase
Memory address 0x7afc86121160
Stoichiometry

C00002_c + C00033_c <=> C00227_c + G11113_c

ATP + Acetate <=> Acetyl phosphate + ADP

GPRc2838
Lower bound-1000.0
Upper bound1000.0
frozenset({<Gene c2838 at 0x7afc7eb8c080>})
[12]:
Reaction identifierR02736_c
Namebeta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address 0x7afc8cafdc40
Stoichiometry

C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c

Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone

GPRc2265
Lower bound-1000
Upper bound1000

In the following example, we use a text file to add reactions to the test model. To this end, we use the file reactions.txt in the current working directory. This file has the following content:

R04382, c
R02736, c
C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e

CobraMod downloads the first two reactions from KEGG. C06118_ce is a user-defined reaction. The user can specify the file path in the obj argument. The next arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the reaction to the model.

[13]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
# This is the file with text
file = dir_data.joinpath("reactions.txt")

print(f'Number of reactions prior addition: {len(test_model.reactions)}')

add_reactions(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Show in jupyter
display(test_model.reactions.get_by_id("R04382_c"))
display(test_model.reactions.get_by_id("R02736_c"))
test_model.reactions.get_by_id("C06118_ce")
Gene-reaction rule for reaction "R04382" from KEGG set to "OR".
Gene-reaction rule for reaction "R02736" from KEGG set to "OR".
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Reaction "R02736_c" unbalanced. Following atoms are affected. Please verify: {'charge': -2.0, 'H': -2.0}
Number of reactions prior addition: 95
Reaction "R04382_c" is already present in the model. Skipping additions.
Reaction "R02736_c" is already present in the model. Skipping additions.
Reaction "C06118_ce" was added to model.
Number of reactions after addition: 98
Reaction identifierR04382_c
Name4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address 0x7afc6fcd64b0
Stoichiometry

C06118_c <=> 2.0 C04053_c

Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate

GPRc0319
Lower bound-1000
Upper bound1000
Reaction identifierR02736_c
Namebeta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address 0x7afc6fcef290
Stoichiometry

C00006_c + C01172_c <=> C00005_c + C00080_c + C01236_c

Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate <=> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone

GPRc2265
Lower bound-1000
Upper bound1000
[13]:
Reaction identifierC06118_ce
Namedigalacturonate transport
Memory address 0x7afc6fceffe0
Stoichiometry

C06118_c <=> C06118_e

Unsaturated digalacturonate <=> Unsaturated digalacturonate

GPR
Lower bound-1000
Upper bound1000

Since cobramod.add_reactions() is an extension of the original COBRApy function model.add_reactions() the user can also utilize COBRApy reactions. In this example, we use the test model (textbook_kegg). We copy a COBRApy reaction from the test model and add it to the KEGG test model.

[14]:
from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path

# Using copy of test model
test_model = textbook_kegg.copy()
# Obtaining a reaction
reaction = textbook.reactions.get_by_id("ACALDt")

add_reactions(model=test_model, obj=[reaction], directory=dir_data)

test_model.reactions.get_by_id("ACALDt")
Reaction "ACALDt" is already present in the model. Skipping additions.
[14]:
Reaction identifierACALDt
NameR acetaldehyde reversible - transport
Memory address 0x7afc6fcec560
Stoichiometry

C00084_e <=> C00084_c

Acetaldehyde <=> Acetaldehyde

GPRs0001
Lower bound-1000.0
Upper bound1000.0

By default, COBRApy ignores metabolites that appear on both sides of a reaction equation. CobraMod identifies such reactions and assigns one of these metabolites to the extracellular compartment and raises a warning suggesting manual curation. In the following example, we add a transport reaction for acetic acid from BioCyc sub-database YEAST to the test model.

[15]:
test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["TRANS-RXN-455, c"],
    database="YEAST",
    directory=dir_data,
)
# Show in jupyter
test_model.reactions.get_by_id("TRANS_RXN_455_c")
Gene-reaction rule for reaction "TRANS-RXN-455" set to "OR". Please modify it if necessary.
Reaction "TRANS_RXN_455_c" has the same metabolite on both sides of the equation (e.g transport reaction). COBRApy ignores these metabolites. To avoid this, by default, CobraMod will assign the reactant side to the extracellular compartment. Please curate the reaction if necessary.
Reaction "TRANS_RXN_455_c" is already present in the model. Skipping additions.
[15]:
Reaction identifierTRANS_RXN_455_c
Nameacetic acid uptake
Memory address 0x7afc86123ec0
Stoichiometry

CPD_24335_e --> CPD_24335_c

acetic+acid --> acetic+acid

GPRG3O-32144
Lower bound0
Upper bound1000

NOTES

  • CobraMod replaces hyphens (-) to underscores (_) in the identifiers when creating COBRApy reactions.

  • When adding several reactions the user can only specify one database identifier (It is not possible to use two databases within the same function call) or alternatively can call the function twice.

  • CobraMod tries to identify if a given metabolites or reactions are already present in the model and used those instead of adding redundant entries.

  • CobraMod saves the curation process to the file debug.log.

  • The argument model_id is specific for the BiGG Models repository. CobraMod can download metabolite and reaction information directly from the BIGG models. A list of all models can be found here.

  • The argument genome can be used with the database KEGG and specifies the genome for which gene information will be retrieved. The complete list of all available genomes can be found here. If no genome is specified, no gene information will be retrieved and a warning is printed as shown below:

[16]:
test_model = textbook_kegg.copy()

add_reactions(
    model=test_model,
    obj=["R04382, c"],
    database="KEGG",
    directory=dir_data,
)
test_model.reactions.get_by_id("R04382_c")
Nothing was specified in argument "genome". Reaction "R04382" will not include genes. Please modify if necessary.
Reaction "R04382_c" is already present in the model. Skipping additions.
[16]:
Reaction identifierR04382_c
Name4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address 0x7afc6fcb5c40
Stoichiometry

C06118_c <=> 2.0 C04053_c

Unsaturated digalacturonate <=> 2.0 (4S,5R)-4,5-Dihydroxy-2,6-dioxohexanoate

GPR
Lower bound-1000
Upper bound1000

1.1.5. Adding pathways

CobraMod can add metabolic pathways to a given model. The function cobramod.add_pathway() takes as an argument either a sequence of database-specific reaction identifiers or a single pathway identifier. CobraMod download the pathway information from the databases, creates the COBRApy objects and prints a short summary about the curation process when adding the respective pathway. Additionally, to ensure reproducibility and for tracking the curation process, the user can write changes to file. CobraMod creates the object cobramod.Pathway which contains the pathway information.

When adding serveral pathways, we recommend adding pathways one-by-one to perform neccessary curation steps inbetween. In the examples below we showcase two options. Again, we use the E. coli core model from COBRApy as test model.

In the first example, we add the acetoacetate degradation pathway from the BioCyc sub-database ECOLI to the test model. This pathway contains two reactions and six metabolites.

f367eacf496b493f810fbceb66e6f301

In this example, the first argument is the model to extend. The pathway argument uses the database-specific identifier ACETOACETATE-DEG-PWY and the database identifier ECOLI. We define the compartment as c (cytosol), i.e. all metabolites and reactions will be assigned to this compartment.

The argument filename specifies a file to which a summary of the changes is written. Additionally, add_pathway() prints a summary of the changes to the model. Calling the respective cobramod.Pathway outputs a table with the main attributes of the object.

All COBRApy reactions of the pathway are tested for their capacity to cary a non-zero flux. Read more about it in the non-zero flux test section.

[17]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")

# Using copy of test model
test_model = textbook.copy()

add_pathway(
    model=test_model,
    pathway="ACETOACETATE-DEG-PWY",
    database="ECOLI",
    compartment="c",
    filename="summary.txt",
    directory=dir_data,
)

# Display in jupyter
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")
Gene-reaction rule for reaction "ACETOACETYL-COA-TRANSFER-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "ACETYL-COA-ACETYLTRANSFER-RXN" set to "OR". Please modify it if necessary.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" unbalanced. Following atoms are affected. Please verify: {'charge': -4.0, 'C': 23.0, 'H': 34.0, 'N': 7.0, 'O': 17.0, 'P': 3.0, 'S': 1.0}
The model cannot turnover the following metabolites ['3_KETOBUTYRATE_c', 'accoa_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'ACETOACETYL_COA_TRANSFER_RXN_c' passed.
Reaction "ACETOACETYL_COA_TRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Non-zero flux test for reaction 'ACETYL_COA_ACETYLTRANSFER_RXN_c' passed.
Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" added to group "ACETOACETATE-DEG-PWY".
Auxiliary sink reaction for "SK_accoa_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_3_KETOBUTYRATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Number of       new   | removed entities in
*=====================|===================*
Reactions        2    |    0
Metabolites      2    |    0
Exchange         0    |    0
Demand           0    |    0
Sinks            2    |    0
Genes            4    |    0
Groups           1    |    0

[17]:
Pathway identifier ACETOACETATE-DEG-PWY
Name
Memory address 0x0135224490952272
Reactions involved

ACETOACETYL_COA_TRANSFER_RXN_c, ACETYL_COA_ACETYLTRANSFER_RXN_c

Genes involved

EG12432, EG11670, EG11669, EG11672

Visualization attributes
  • vertical = False
  • color_negative = None
  • color_positive = None
  • color_quantile = False

 

Below is an example of the summary in form of a text file. The first part lists names of all reactions, metabolites, exchange reactions, auxiliary demand and sink reactions, genes, and groups in the model. The second part of the summary lists all elements that were added or removed by the function call add_pathway().

[18]:
%cat summary.txt
Summary:
Model identifier: e_coli_core
Model name:
None
Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'TPI', 'ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['13dpg_c', '2pg_c', '3pg_c', '6pgc_c', '6pgl_c', 'ac_c', 'ac_e', 'acald_c', 'acald_e', 'accoa_c', 'acon_C_c', 'actp_c', 'adp_c', 'akg_c', 'akg_e', 'amp_c', 'atp_c', 'cit_c', 'co2_c', 'co2_e', 'coa_c', 'dhap_c', 'e4p_c', 'etoh_c', 'etoh_e', 'f6p_c', 'fdp_c', 'for_c', 'for_e', 'fru_e', 'fum_c', 'fum_e', 'g3p_c', 'g6p_c', 'glc__D_e', 'gln__L_c', 'gln__L_e', 'glu__L_c', 'glu__L_e', 'glx_c', 'h2o_c', 'h2o_e', 'h_c', 'h_e', 'icit_c', 'lac__D_c', 'lac__D_e', 'mal__L_c', 'mal__L_e', 'nad_c', 'nadh_c', 'nadp_c', 'nadph_c', 'nh4_c', 'nh4_e', 'o2_c', 'o2_e', 'oaa_c', 'pep_c', 'pi_c', 'pi_e', 'pyr_c', 'pyr_e', 'q8_c', 'q8h2_c', 'r5p_c', 'ru5p__D_c', 's7p_c', 'succ_c', 'succ_e', 'succoa_c', 'xu5p__D_c', '3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
['EX_ac_e', 'EX_acald_e', 'EX_akg_e', 'EX_co2_e', 'EX_etoh_e', 'EX_for_e', 'EX_fru_e', 'EX_fum_e', 'EX_glc__D_e', 'EX_gln__L_e', 'EX_glu__L_e', 'EX_h_e', 'EX_h2o_e', 'EX_lac__D_e', 'EX_mal__L_e', 'EX_nh4_e', 'EX_o2_e', 'EX_pi_e', 'EX_pyr_e', 'EX_succ_e']
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['b1241', 'b0351', 's0001', 'b3115', 'b1849', 'b2296', 'b1276', 'b0118', 'b0474', 'b0116', 'b0726', 'b0727', 'b2587', 'b0356', 'b1478', 'b3735', 'b3733', 'b3734', 'b3732', 'b3736', 'b3738', 'b3731', 'b3737', 'b3739', 'b0720', 'b0733', 'b0979', 'b0978', 'b0734', 'b2975', 'b3603', 'b2779', 'b1773', 'b2925', 'b2097', 'b4232', 'b3925', 'b0904', 'b2492', 'b4154', 'b4152', 'b4153', 'b4151', 'b1819', 'b1817', 'b2415', 'b1818', 'b2416', 'b1611', 'b4122', 'b1612', 'b3528', 'b1852', 'b1779', 'b1621', 'b1101', 'b2417', 'b3870', 'b1297', 'b0809', 'b0810', 'b0811', 'b1761', 'b1524', 'b1812', 'b0485', 'b3213', 'b3212', 'b4077', 'b2029', 'b0875', 'b1136', 'b4015', 'b2133', 'b1380', 'b4014', 'b2976', 'b3236', 'b1479', 'b2463', 'b2286', 'b2279', 'b2276', 'b2280', 'b2288', 'b2284', 'b2287', 'b2281', 'b2277', 'b2285', 'b2282', 'b2278', 'b2283', 'b3962', 'b1602', 'b1603', 'b0451', 'b0114', 'b0115', 'b3916', 'b1723', 'b0902', 'b0903', 'b2579', 'b3114', 'b3952', 'b3951', 'b4025', 'b2926', 'b0767', 'b0755', 'b3612', 'b4395', 'b3493', 'b2987', 'b3956', 'b3403', 'b1702', 'b2297', 'b2458', 'b1676', 'b1854', 'b4301', 'b3386', 'b2914', 'b4090', 'b0722', 'b0724', 'b0721', 'b0723', 'b0728', 'b0729', 'b2464', 'b0008', 'b2935', 'b2465', 'b3919', 'EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']

New:
Reactions:
['ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
[]
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c', 'SK_accoa_c']
Genes:
['EG11670', 'EG11669', 'EG12432', 'EG11672']
Groups:
['ACETOACETATE-DEG-PWY']

Removed:
Reactions:
[]
Metabolites:
[]
Exchange:
[]
Demand:
[]
Sinks:
[]
Genes:
[]
Groups:
[]

In the next example, we use a list of database-specific reaction identifiers as pathway argument. We use the database identifier ECOLI and the compartment c (cytosol). Additionally, we define a pathway name by using the argument group. The user can also use this argument to merge pathways by using the same group names.

[19]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")

test_model = textbook_biocyc.copy()
# Defining database-specific identifiers
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]

print(f'Number of reaction prior addition: {len(test_model.reactions)}')

add_pathway(
    model=test_model,
    pathway=sequence,
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    group="curated_pathway"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Display in jupyter
test_model.groups.get_by_id("curated_pathway")
Gene-reaction rule for reaction "PEPDEPHOS-RXN" set to "OR". Please modify it if necessary.
Number of reaction prior addition: 95
Gene-reaction rule for reaction "PYRUVFORMLY-RXN" set to "OR". Please modify it if necessary.
Gene-reaction rule for reaction "FHLMULTI-RXN" set to "OR". Please modify it if necessary.
Non-zero flux test for reaction 'PEPDEPHOS_RXN_c' passed.
Reaction "PEPDEPHOS_RXN_c" added to group "curated_pathway".
The model cannot turnover the following metabolites ['FORMATE_c']. To overcome this, sink reactions were created to simulate their synthesis.
Non-zero flux test for reaction 'PYRUVFORMLY_RXN_c' passed.
Reaction "PYRUVFORMLY_RXN_c" added to group "curated_pathway".
Non-zero flux test for reaction 'FHLMULTI_RXN_c' passed.
Reaction "FHLMULTI_RXN_c" added to group "curated_pathway".
Auxiliary sink reaction for "SK_FORMATE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_HYDROGEN_MOLECULE_c" created. Consider removing it and adding the synthesis reactions for the metabolite.
Number of       new   | removed entities in
*=====================|===================*
Reactions        3    |    0
Metabolites      2    |    0
Exchange         0    |    0
Demand           0    |    0
Sinks            2    |    0
Genes           12    |    0
Groups           1    |    0

Number of reactions after addition: 100
[19]:
Pathway identifier curated_pathway
Name
Memory address 0x0135224490393088
Reactions involved

PYRUVFORMLY_RXN_c, PEPDEPHOS_RXN_c, FHLMULTI_RXN_c

Genes involved

G7627, EG11784, EG10701, EG10804, EG10803, EG10477, EG10480, EG10475, EG10285, EG10476, EG10479, EG10478

Visualization attributes
  • vertical = False
  • color_negative = None
  • color_positive = None
  • color_quantile = False

 


NOTES

  • A pathway is a set of COBRApy reactions. All the notes listed for add_metabolites() and add_reactions() also apply to pathways, i. e., handling of duplicate elements, transport reactions and the argument genome for KEGG.

  • CobraMod saves the curation process in the directory logs. The logs are saved by date.


1.1.6. Non-zero flux test

When calling the function add_pathway(), CobraMod tests each reaction of the Pathway object for its capability to carry a non-zero flux, i.e., if the involved metabolites can be turned over. Additionally, the user can test individual COBRApy reactions for their capability to cary a non-zero flux by using the function cobramod.test_non_zero_flux(). When running add_pathway() CobraMod tries to maximize (minimize if operating backwards) the flux through the tested reaction. If the flux is below solver tolerance CobraMod adds auxiliary sink reactions to all metabolites participating in the reaction and subsequently remove them one by one if the respective metabolites can be turned over by reactions in the model. CobraMod then raises a warning about the remaining auxiliary sink reactions.

Based on this information the user can then manually curate the model and if necessary add the missing synthesis reactions for the respective metabolites. If a reaction passes the test CobraMod prints a short message that the test was sucessfully passed.

The user can also use the argument ignore_list when adding a pathway to skip the non-zero-flux test for the specified reactions.

In the following artifical example (E. coli is a prokaryote without compartmentation), we introduce the glutathione synthase reaction (GLUTATHIONE-SYN-RXN) into a new comparment (x) and test the reaction for its capability to carry a non-zero flux. This reaction has the following equation:

ATP_x + GLY_x + L_GAMMA_GLUTAMYLCYSTEINE_x --> ADP_x + GLUTATHIONE_x + PROTON_x + Pi_x

Because we are testing a reaction in a new, empty compartment, CobraMod creates auxiliary sink reactions for the involved metabolites and prints a warning.

[20]:
from cobramod import test_non_zero_flux, add_reactions
from cobramod.test import textbook_biocyc

test_model = textbook_biocyc.copy()

# Adding the reaction into the test model
add_reactions(
    model=test_model,
    # The list has 4 metabolites
    obj=["GLUTATHIONE-SYN-RXN, x"],
    directory=dir_data,
    database="ECOLI",
)
# Running test for GLUTATHIONE_SYN_RXN_x
test_non_zero_flux(
    model=test_model,
    reaction="GLUTATHIONE_SYN_RXN_x",
)

Gene-reaction rule for reaction "GLUTATHIONE-SYN-RXN" set to "OR". Please modify it if necessary.
Reaction "GLUTATHIONE_SYN_RXN_x" is already present in the model. Skipping additions.
Non-zero flux test for reaction 'GLUTATHIONE_SYN_RXN_x' passed.
Auxiliary sink reaction for "SK_GLUTATHIONE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ADP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_PROTON_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_L_GAMMA_GLUTAMYLCYSTEINE_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_GLY_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_ATP_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Auxiliary sink reaction for "SK_Pi_x" created. Consider removing it and adding the synthesis reactions for the metabolite.
Reaction GLUTATHIONE_SYN_RXN_x passed the non-zero-flux test.

1.1.7. Curation process

CobraMod automatically performs the following curation steps during the creation of COBRApy reaction and metabolite objects and CobraMod pathway objects:

  1. If CobraMod encounters biochemical information (e.g. reactions or metabolites) with missing formulas or entries, a warning is printed and CobraMod informs the user about the missing data. For instance, CobraMod assigns an “X” to metabolites with missing formulas.

  2. When adding a metabolite, reaction or pathway to the model, CobraMod tries to identify if these elements are already present in the model (potentially with a different name) and use the exiting elements instead of creating redundant ones (by checking the cross-reference database entries in the elements’s metadata against the model entries).

  • In the case of user-defined, curated reactions CobraMod tries to find the respective metabolites in the model. If a metabolites is not found, a warning is printed and the metabolite is created with “X” as chemical formula.

  1. When adding metabolites, reactions or pathways, CobraMod can query MetaNetX and PubChem to find cross-references and adds the information to the corresponding Reactions and Metabolites (See Using the automatic cross-reference expansion in the documentation).

  2. CobraMod utilizes the COBRApy method cobra.Reaction.check_mass_balance() and returns a warning if mass imbalances are found.

  3. CobraMod uses the reaction reversibility information provided with the obtained reaction data. If the reversibility information is missing, CobraMod raises a warning.

  4. When CobraMod adds a pathway, each pathway reaction undergos a non-zero flux test. If a reaction cannot carry a non-zero flux, CobraMod adds auxiliary sink reactions to unblock the reaction and suggests manual curation steps based on these auxiliary modifications. (See Non-zero flux test in the documentation).

  5. CobraMod offers the function add_crossreferences which queries the genome annotation database MetaNetX to adds the missing meta-data about the cross-references.

1.1.8. Converting COBRApy groups back to CobraMod pathways

The COBRApy function cobra.io.write_sbml_model() writes cobra models to sbml files. If the user calls this function with a model that contains cobramod.Pathway elements these will be saved as COBRApy Group elements and the CobraMod pathway objects are lost. To convert COBRApy groups back to CobraMod pathways (and use the associated functionalities), the user can call the function cobramod.model_convert(). In the following example, we create a Group with four reactions and add it to the model. We then use the cobramod.model_convert() function with the argument model and can then call the newly converted respective CobraMod pathway objects.

[21]:
from cobramod import model_convert
from cobramod.test import textbook_biocyc
from cobra.core.group import Group

test_model = textbook_biocyc.copy()
# Creation of group
test_group = Group(id="curated_pathway")
for reaction in ("GLCpts", "G6PDH2r", "PGL", "GND"):
    test_group.add_members([test_model.reactions.get_by_id(reaction)])
test_model.add_groups([test_group])

# Conversion to a Pathway
model_convert(model=test_model)
# Display to Jupyter
test_model.groups.get_by_id("curated_pathway")
Group-object "curated_pathway" was transformed to a Pathway-object.
[21]:
Pathway identifier curated_pathway
Name
Memory address 0x0135225782733520
Reactions involved

GLCpts, G6PDH2r, PGL, GND

Genes involved

b1819, b1621, b1817, b1101, b2415, b2417, b1818, b2416, b1852, b0767, b2029

Visualization attributes
  • vertical = False
  • color_negative = None
  • color_positive = None
  • color_quantile = False