iPython Notebook How-To: Gap-fill a model in PyFBA

Gap-fill a model in PyFBA

by Daniel Cuevas


In this notebook, we will present the steps to generate a genome-scale metabolic model from RAST annotations, gap-fill the model on rich LB type media, and save the model to hard disk.

The required files and information for this notebook:

  • List of functional roles from RAST (normally labeled ‘assigned_functions’ from the Genome Directory download).
  • Organism name
  • Organism ID
  • Media file
  • Close genomes functional roles file
  • Directory on hard disk to store model
In [1]:
import sys
import os
import PyFBA

Generate model

The first step shows how to build the model from RAST functional roles.

In [2]:
model_functions_file = "data/citrobacter.assigned_functions"
close_genomes_functions_file = "data/close_genomes_functions"
org_name = "Citrobacter sedlakii"
org_id = "Citrobacter sedlakii"
In [3]:
model = PyFBA.model.roles_to_model(model_functions_file, org_id, org_name)

The model has been generated and is now ready to use for flux-balance analysis simulations. Running flux-balance analysis will show the model does not contain all required metabolism to grow in the LB media.

Here are the LB media contents. For PyFBA media files are stored in directory indicated by environmental variable ‘PYFBA_MEDIA_DIR’. This step is only to show file contents but is not required for gap-filling.

In [4]:
lb_media_file = os.path.join(os.environ["PYFBA_MEDIA_DIR"], "ArgonneLB.txt")
with open(lb_media_file) as f:
    for l in f:
        print(l, end="")
Compound   Name    Formula Charge
cpd03424    Vitamin B12 C61H86CoN13O14PR    6
cpd00215    Pyridoxal   C8H9NO3 0
cpd00028    Heme    C34H30FeN4O4    -2
cpd10515    Fe2+    Fe  2
cpd00030    Mn2+    Mn  2
cpd00149    Co2+    Co  2
cpd00058    Cu2+    Cu  2
cpd00099    Cl- Cl  -1
cpd00007    O2  O2  0
cpd00034    Zn2+    Zn  2
cpd00156    L-Valine    C5H11NO2    0
cpd00249    Uridine C9H12N2O6   0
cpd00092    Uracil  C4H4N2O2    0
cpd00069    L-Tyrosine  C9H11NO3    0
cpd00065    L-Tryptophan    C11H12N2O2  0
cpd00184    Thymidine   C10H14N2O5  0
cpd00161    L-Threonine C4H9NO3 0
cpd00048    Sulfate O4S -2
cpd00054    L-Serine    C3H7NO3 0
cpd00220    Riboflavin  C17H20N4O6  0
cpd00129    L-Proline   C5H8NO2 -1
cpd00644    PAN C9H16NO5    -1
cpd00009    Phosphate   HO4P    -2
cpd00066    L-Phenylalanine C9H11NO2    0
cpd00218    Niacin  C6H4NO2 -1
cpd00971    Na+ Na  1
cpd00254    Mg  Mg  2
cpd00060    L-Methionine    C5H11NO2S   0
cpd00039    L-Lysine    C6H15N2O2   1
cpd00107    L-Leucine   C6H13NO2    0
cpd00205    K+  K   1
cpd00246    Inosine C10H12N4O5  0
cpd00322    L-Isoleucine    C6H13NO2    0
cpd00226    HYXN    C5H4N4O 0
cpd00119    L-Histidine C6H9N3O2    0
cpd00531    Hg2+    Hg  2
cpd00001    H2O H2O 0
cpd00067    H+  H   1
cpd00033    Glycine C2H5NO2 0
cpd00023    L-Glutamate C5H8NO4 -1
cpd00027    D-Glucose   C6H12O6 0
cpd00393    Folate  C19H17N7O6  -2
cpd10516    fe3 Fe  3
cpd00654    Deoxycytidine   C9H13N3O4   0
cpd00438    Deoxyadenosine  C10H13N5O3  0
cpd00381    L-Cystine   C6H12N2O4S2 0
cpd11595    chromate    H2O4Cr  -2
cpd01012    Cd2+    Cd  2
cpd00063    Ca2+    Ca  2
cpd00041    L-Aspartate C4H6NO4 -1
cpd01048    Arsenate    HO4As   -2
cpd00051    L-Arginine  C6H15N4O2   1
cpd00035    L-Alanine   C3H7NO2 0
cpd00182    Adenosine   C10H13N5O4  0
cpd00311    Guanosine   C10H13N5O5  0
cpd00126    GMP C10H13N5O8P -1
cpd00018    AMP C10H13N5O7P -1
cpd00091    UMP C9H12N2O9P  -1
cpd00046    CMP C9H13N3O8P  -1
cpd00793    Thiamine phosphate  C12H17N4O4PS    0
cpd00541    Lipoate C8H13O2S2   -1
cpd00239    H2S H2S 0
cpd00013    NH3 H4N 1
cpd00244    Ni2+    Ni  2
cpd11574    Molybdate   H2MoO4  0
In [5]:
# status := optimization status of FBA simplex solver
# flux_value := biomass flux value (objective function)
# growth := boolean whether the model was able to grow or not
status, flux_value, growth = model.run_fba("ArgonneLB.txt")
print("Growth:", growth)
Growth: False

Gap-fill model on LB media

Each model object in PyFBA contains a gapfill() function that requires two arguments:

  1. Media file
  2. Close genomes functional roles file

The other two arguments here, use_flux and verbose, are optional.

  • use_flux is a boolean flag that will identify which reactions that were added during the first phase of gap-filling have a non-active or zero flux. These reactions are then removed before the second phase of gap-filling occurs. This lowers the number of reactions that must be tested during second phase, thus speeding up the gap-filling process.
  • verbose is an integer flag that will output status update to stderr.
In [6]:
success = model.gapfill("ArgonneLB.txt", close_genomes_functions_file, use_flux=True, verbose=1)
if not success:
    print("Model was unable to gap-fill!")
Current model contains 1445 reactions
Finding media import reactions
Found 139 reactions
New total: 1584 reactions
Finding essential reactions
Found 109 reactions
New total: 1630 reactions
Finding close organism reactions
Found 1875 reactions
New total: 2047 reactions
Finding subsystem reactions
Found 237 reactions
New total: 2284 reactions
Finding EC reactions
Found 0 reactions
New total: 2284 reactions
Finding compound-probability reactions
Found 2686 reactions
New total: 4970 reactions
Gap-fill was successful, now trimming model
Removed 2654 reactions based on flux value
Trimming probable group of reactions
At the beginning the base list has 1817  and the optional list has 1273 reactions
Trimming ec group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming subsystems group of reactions
The set of 'base' reactions results in growth so we don't need to bisect the optional set
Trimming close genomes group of reactions
At the beginning the base list has 1520  and the optional list has 165 reactions
Trimming essential group of reactions
At the beginning the base list has 1514  and the optional list has 6 reactions
Trimming media group of reactions
At the beginning the base list has 1451  and the optional list has 54 reactions
Trimming complete.
Total gap-filled reactions: 8
The biomass reaction has a flux of 323.36325286093853

We can view the reactions that were gap-filled into the model.

In [7]:
for n, rid in enumerate(model.gf_reactions, start=1):
    print("({}) {}: {}".format(n, rid, model.reactions[rid].equation))
(1) rxn29117: (1) Pyridoxal[e] <=> (1) Pyridoxal
(2) rxn13681: (1) H+[e] + (1) Co2+ <=> (1) H+ + (1) Co2+[e]
(3) rxn01513: (1) ATP + (1) H+ + (1) dTMP <=> (1) ADP + (1) dTDP
(4) rxn05645: (1) H+[e] + (1) Riboflavin[e] <=> (1) H+ + (1) Riboflavin
(5) rxn10571: (1) H2O + (1) ATP + (1) Mg[e] <=> (1) ADP + (1) Phosphate + (1) H+ + (1) Mg
(6) rxn00119: (1) ATP + (1) H+ + (1) UMP <=> (1) ADP + (1) UDP
(7) rxn03164: (1) ATP + (1) Ala-Ala + (1) UDP-N-acetylmuramoyl-L-alanyl-D-gamma-glutamyl-meso-2-6-diaminopimelate <=> (1) ADP + (1) Phosphate + (1) H+ + (1) UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-6-carboxy-L-lysyl-D-alanyl- D-alanine
(8) rxn02285: (1) NADP + (1) UDP-MurNAc <=> (1) NADPH + (1) H+ + (1) UDP-N-acetylglucosamine enolpyruvate

Save model

The second step shows how to save the model to hard disk.

In [8]:
model_directory = "save_citrobacter_sedlakii"
PyFBA.model.save_model(model, model_directory)

Model has been stored. Here is a directory listing of the files that were created.

In [9]:
for f in os.listdir(model_directory):
    fp = os.path.join(model_directory, f)
    print(f, ": ", os.path.getsize(fp), "B", sep="")
Citrobacter sedlakii.compounds: 27883B
Citrobacter sedlakii.gfmedia: 14B
Citrobacter sedlakii.gfreactions: 72B
Citrobacter sedlakii.info: 114B
Citrobacter sedlakii.reactions: 13077B
Citrobacter sedlakii.roles: 70146B