How to create a genome scale metabolic model

If you have a genome annotated in RAST and you want to create a genome scale metabolic model, here is one way to do it using PyFBA and the SEED.

Before you start, you’ll need to install a couple of libraries:

  1. The PyFBA library has detailed installation instructions. Don’t be scared, its mostly git clone and pip install.
  2. Get the SEED Servers as you can get a lot of information from them. You can install the git python repo from github

We need to get the assigned_functions file from RAST. The easiest way to find that is in the RAST directory by choosing `Genome Directory` from the Downloads menu on the job details page.

For my example, I am going to build an Acidobacteria model. Using the genome 1121860.3 that I downloaded from RAST.

Creating the basic model

The first step is to create a basic model, which probably won’t grow, but which we can use as the basis of our models.

Converting assigned_functions to reactions

We need to create an initial list of reactions for our model. There is example code that will convert roles to reactions and make a list of reactions you can use in the model:

python example_code/ -a assigned_functions > reactions.txt

This list of reactions is essentially the basic model but is not gap filled, and so probably won’t grow. You can test that using fba from reactions.

 python example_code/ -r reactions.txt -m media/VL55.txt

Gives the output:

In parsing the bounds we found 29 media uptake and secretion reactions and 66 other u/s reactions
Length of the media: 29
Number of reactions to run: 1070
Number of compounds in SM: 1095
Number of reactions in SM: 1166
Revised number of total reactions: 34791
Number of total compounds: 45617
SMat dimensions: 1095 x 1166
Initial run has 0.0 --> Growth: False

Note that the last line says Growth: False. As expected, our model did not grow.

Gap filling the model

The next step is to gap fill the model There are many different approaches to gap filling (summarized in Cuevas and Edwards), and PyFBA has a several different methods to gap fill. One that appears to be effective is to use roles from closely related genomes to complete your model. We will generate a list of reactions that were added to the model, and a list of functional roles that complete those reactions, at the end of this example.

Getting reactions in closely related genomes

Before we can start with building models and gap-filling, we want to get reactions from closely related genomes. We use the SEED servers and the code to get a list of reactions that should also be in the genome.

[Note: If this code was not installed when you installed PyFBA. If not, you can use `git clone` to clone the repository including the example code].

python example_code/ Acidobacteria \ 
> Acidobacteria.txt

It turns out that the taxonomy of Acidobacteria is a little confused (its a new genus, according to Wikipedia … I’m not sure if that makes it better or worse that it is confused). Therefore, I’m also going to pull the list of roles associated  with the genus Acidobacterium

python example_code/ Acidobacterium \ 
> Acidobacterium.txt

Then I used a one line perl code to join the two files together:

perl -ne '@a=split /\t/; $v{$a[0]}=$a[1]; END {map {print "$_\t$v{$_}"} keys \
%v}' Acidobacteri*txt > Acido.txt

This is the file that I will use for the rest of the work.

Gap filling the basic model

python example_code/ -r reactions.txt \
-m media/VL55.txt -c Acido.txt gapfilled_reactions.txt \
2> gapfilled_reactions.err

This creates two files, the err file tells you which reactions were added and why, and the .txt file has a list of all the reactions, including the ones in the basic model and the ones that were added.