Author Archives: Katelyn

ReadMapper: a program that maps sequencing reads to a nucleotide backbone, and creates a publication quality figure.

Introduction

ReadMapper is based on R code that was written to analyze our inhouse data.  The code has been modified to make it user friendly and allow other people, who are not as versed in the R language, to be able to map sequencing reads to a nucleotide backbone, and create publication quality figures.  In our project we were mapping translatome data to the genome of the phage Lambda (this is what the included sample data is), as well as the currently annotated gene open reading frames.  Which looks just like the figure below.

Continue reading

Calculating Chi-squared with perl

There are two Perl repositories available on CPAN that deal with Chi-squared analysis(Statistics::ChiSquare and Statistics::Distributions).  However neither one outputs the Chi-squared value for the analysis of two binary populations.

We can use the formula below to calculate the Chi-squared value with one degree of freedom.

χ2 = [n(ad – bc)2] / [(a + b) (c + d) (a + c) (b + d)]

n = a + b + c + d

Where:

variable population 1 population 2
+ a b
c d

Example:
Suppose we wish to determine the relationship between disease in two species. Both disease and the species are binary variables, so the Chi-squared test is applied:

Diseased species 1 species 2
No 57 36
Yes 63 88

n = (57 + 36 + 63 + 88) = 244

χ2 = [244*(57*88 – 36*63)2] / [(57 + 36) (63 + 88) (57 + 63) (36 + 88)]

χ2 = 8.81

The critical Chi-squared distribution P-values at 1 degree of freedom are:

D.F. 0.1 0.05 0.025 0.01 0.005
1 2.71 3.84 5.02 6.63 7.88

The χ2 value (8.82) is below the P-value 0.005.

Since the corresponding P-value is less than 0.05 (P<0.05), the data suggest that the prevalence of disease is significantly higher in species 2. Therefore we reject the null hypothesis.

Below is a Perl subroutine to automatically calculate Chi-squared.

sub chi_squared {
     my ($a,$b,$c,$d) = @_;
     return 0 if($b+$d == 0);
     my $n= $a + $b + $c + $d;
     return (($n*($a*$d - $b*$c)**2) / (($a + $b)*($c + $d)*($a + $c)*($b + $d)));
}
print &chi_squared(57,36,63,88); 

Output:

8.81780430153469

Is there a difference between a ‘query to db’ BLAST and a ‘db to query’ BLAST

Often times the question comes up whether there is a difference between using your query file as the query in a BLAST when compared to using the query file as a database.  There is, in fact a difference between the two, and that difference is in the e-value.   Consider the following db file:

>one
acgtacgtagCtagctagctagctgactagc
>two
acgtacgtagAtagctagctagctgactagc
>three
acgtacgtagTtagctagctagctgactagc
>four
acgtacgtagGtagctagctagctgactagc

And the following query file:

>qs1
acgtacgtagCtagctagctagctgactagc

When you have a blast database, the e-value is calculated from the bit score, and takes into account the size of the database, to give an estimate of the probability of finding a certain sequence in a database. In the example above, you see the same four sequences which only differ by one base.  When you BLAST the query sequence qs1 against the database of four sequences, and since you see the sequence qs1 quite often in the database, your e-values will be higher(worse).

$ blastn -db db.fna -query query.fna -outfmt '6'
qs1 one 100.00  31  0   0   1   31  1   31  7e-15   58.4
qs1 four    96.77   31  1   0   1   31  1   31  3e-13   52.8
qs1 three   96.77   31  1   0   1   31  1   31  3e-13   52.8
qs1 two 96.77   31  1   0   1   31  1   31  3e-13   52.8

However if you reverse the database and query, because there is only a single unique sequence qs1 in your database, your e-values will be lower(better).

$ blastn -db query.fna -query db.fna -outfmt '6'
one qs1 100.00  31  0   0   1   31  1   31  2e-15   58.4
two qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8
three   qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8
four    qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8

The two red highlighted e-values should be identical, but they are not.

While the skew is only marginal here, consider if you had a database consisting of the millions of bases.

Perl Smith Waterman script

The following is a small script to perform a Smith Waterman alignment, and to calculate the percent identity between two sequences.

#!/usr/bin/perl  use strict; #use warnings;   
@Constants::nucs = split(//, "ACGT"); 
@Constants::aas  = split(//, "ARNDCQEGHILKMFPSTWYV");             
%Constants::gencode =
      (TTT => "F", TTC => "F", TTA => "L", TTG => "L",
      TCT => "S", TCC => "S", TCA => "S", TCG => "S",
      TAT => "Y", TAC => "Y", TAA => "*", TAG => "*",
      TGT => "C", TGC => "C", TGA => "*", TGG => "W",
      CTT => "L", CTC => "L", CTA => "L", CTG => "L",
      CCT => "P", CCC => "P", CCA => "P", CCG => "P",
      CAT => "H", CAC => "H", CAA => "Q", CAG => "Q",
      CGT => "R", CGC => "R", CGA => "R", CGG => "R",
      ATT => "I", ATC => "I", ATA => "I", ATG => "M",
      ACT => "T", ACC => "T", ACA => "T", ACG => "T",
      AAT => "N", AAC => "N", AAA => "K", AAG => "K",
      AGT => "S", AGC => "S", AGA => "R", AGG => "R",
      GTT => "V", GTC => "V", GTA => "V", GTG => "V",
      GCT => "A", GCC => "A", GCA => "A", GCG => "A",
      GAT => "D", GAC => "D", GAA => "E", GAG => "E",
      GGT => "G", GGC => "G", GGA => "G", GGG => "G" );
# The BLOSUM45 amino acid substitution matrix 
@Constants::blosum45 =
     #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
   ( [  5,-2,-1,-2,-1,-1,-1, 0,-2,-1,-1,-1,-1,-2,-1, 1, 0,-2,-2, 0],  # A
     [ -2, 7, 0,-1,-3, 1, 0,-2, 0,-3,-2, 3,-1,-2,-2,-1,-1,-2,-1,-2],  # R
     [ -1, 0, 6, 2,-2, 0, 0, 0, 1,-2,-3, 0,-2,-2,-2, 1, 0,-4,-2,-3],  # N 
     [ -2,-1, 2, 7,-3, 0, 2,-1, 0,-4,-3, 0,-3,-4,-1, 0,-1,-4,-2,-3],  # D
     [ -1,-3,-2,-3,12,-3,-3,-3,-3,-3,-2,-3,-2,-2,-4,-1,-1,-5,-3,-1],  # C
     [ -1, 1, 0, 0,-3, 6, 2,-2, 1,-2,-2, 1, 0,-4,-1, 0,-1,-2,-1,-3],  # Q
     [ -1, 0, 0, 2,-3, 2, 6,-2, 0,-3,-2, 1,-2,-3, 0, 0,-1,-3,-2,-3],  # E
     [  0,-2, 0,-1,-3,-2,-2, 7,-2,-4,-3,-2,-2,-3,-2, 0,-2,-2,-3,-3],  # G
     [ -2, 0, 1, 0,-3, 1, 0,-2,10,-3,-2,-1, 0,-2,-2,-1,-2,-3, 2,-3],  # H
     [ -1,-3,-2,-4,-3,-2,-3,-4,-3, 5, 2,-3, 2, 0,-2,-2,-1,-2, 0, 3],  # I
     [ -1,-2,-3,-3,-2,-2,-2,-3,-2, 2, 5,-3, 2, 1,-3,-3,-1,-2, 0, 1],  # L
     [ -1, 3, 0, 0,-3, 1, 1,-2,-1,-3,-3, 5,-1,-3,-1,-1,-1,-2,-1,-2],  # K
     [ -1,-1,-2,-3,-2, 0,-2,-2, 0, 2, 2,-1, 6, 0,-2,-2,-1,-2, 0, 1],  # M
     [ -2,-2,-2,-4,-2,-4,-3,-3,-2, 0, 1,-3, 0, 8,-3,-2,-1, 1, 3, 0],  # F
     [ -1,-2,-2,-1,-4,-1, 0,-2,-2,-2,-3,-1,-2,-3, 9,-1,-1,-3,-3,-3],  # P
     [  1,-1, 1, 0,-1, 0, 0, 0,-1,-2,-3,-1,-2,-2,-1, 4, 2,-4,-2,-1],  # S
     [  0,-1, 0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-1,-1, 2, 5,-3,-1, 0],  # T
     [ -2,-2,-4,-4,-5,-2,-3,-2,-3,-2,-2,-2,-2, 1,-3,-4,-3,15, 3,-3],  # W
     [ -2,-1,-2,-2,-3,-1,-2,-3, 2, 0, 0,-1, 0, 3,-3,-2,-1, 3, 8,-1],  # Y
     [  0,-2,-3,-3,-1,-3,-3,-3,-3, 3, 1,-2, 1, 0,-3,-1, 0,-3,-1, 5]   # V
     #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V      );
# The BLOSUM50 amino acid substitution matrix 
@Constants::blosum50 =
      #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
   ( [  5,-2,-1,-2,-1,-1,-1, 0,-2,-1,-2,-1,-1,-3,-1, 1, 0,-3,-2, 0],  # A
     [ -2, 7,-1,-2,-4, 1, 0,-3, 0,-4,-3, 3,-2,-3,-3,-1,-1,-3,-1,-3],  # R
     [ -1,-1, 7, 2,-2, 0, 0, 0, 1,-3,-4, 0,-2,-4,-2, 1, 0,-4,-2,-3],  # N
     [ -2,-2, 2, 8,-4, 0, 2,-1,-1,-4,-4,-1,-4,-5,-1, 0,-1,-5,-3,-4],  # D
     [ -1,-4,-2,-4,13,-3,-3,-3,-3,-2,-2,-3,-2,-2,-4,-1,-1,-5,-3,-1],  # C
     [ -1, 1, 0, 0,-3, 7, 2,-2, 1,-3,-2, 2, 0,-4,-1, 0,-1,-1,-1,-3],  # Q
     [ -1, 0, 0, 2,-3, 2, 6,-3, 0,-4,-3, 1,-2,-3,-1,-1,-1,-3,-2,-3],  # E
     [  0,-3, 0,-1,-3,-2,-3, 8,-2,-4,-4,-2,-3,-4,-2, 0,-2,-3,-3,-4],  # G
     [ -2, 0, 1,-1,-3, 1, 0,-2,10,-4,-3, 0,-1,-1,-2,-1,-2,-3, 2,-4],  # H 
     [ -1,-4,-3,-4,-2,-3,-4,-4,-4, 5, 2,-3, 2, 0,-3,-3,-1,-3,-1, 4],  # I
     [ -2,-3,-4,-4,-2,-2,-3,-4,-3, 2, 5,-3, 3, 1,-4,-3,-1,-2,-1, 1],  # L
     [ -1, 3, 0,-1,-3, 2, 1,-2, 0,-3,-3, 6,-2,-4,-1, 0,-1,-3,-2,-3],  # K
     [ -1,-2,-2,-4,-2, 0,-2,-3,-1, 2, 3,-2, 7, 0,-3,-2,-1,-1, 0, 1],  # M
     [ -3,-3,-4,-5,-2,-4,-3,-4,-1, 0, 1,-4, 0, 8,-4,-3,-2, 1, 4,-1],  # F
     [ -1,-3,-2,-1,-4,-1,-1,-2,-2,-3,-4,-1,-3,-4,10,-1,-1,-4,-3,-3],  # P
     [  1,-1, 1, 0,-1, 0,-1, 0,-1,-3,-3, 0,-2,-3,-1, 5, 2,-4,-2,-2],  # S
     [  0,-1, 0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-2,-1, 2, 5,-3,-2, 0],  # T
     [ -3,-3,-4,-5,-5,-1,-3,-3,-3,-3,-2,-3,-1, 1,-4,-4,-3,15, 2,-3],  # W
     [ -2,-1,-2,-3,-3,-1,-2,-3, 2,-1,-1,-2, 0, 4,-3,-2,-2, 2, 8,-1],  # Y
     [  0,-3,-3,-4,-1,-3,-3,-4,-4, 4, 1,-3, 1,-1,-3,-2, 0,-3,-1, 5]   # V
     #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    );
# The BLOSUM62 amino acid substitution matrix  
@Constants::blosum62 =
     #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
   ( [  4,-1,-2,-2, 0,-1,-1, 0,-2,-1,-1,-1,-1,-2,-1, 1, 0,-3,-2, 0],  # A
     [ -1, 5, 0,-2,-3, 1, 0,-2, 0,-3,-2, 2,-1,-3,-2,-1,-1,-3,-2,-3],  # R
     [ -2, 0, 6, 1,-3, 0, 0, 0, 1,-3,-3, 0,-2,-3,-2, 1, 0,-4,-2,-3],  # N
     [ -2,-2, 1, 6,-3, 0, 2,-1,-1,-3,-4,-1,-3,-3,-1, 0,-1,-4,-3,-3],  # D
     [  0,-3,-3,-3, 9,-3,-4,-3,-3,-1,-1,-3,-1,-2,-3,-1,-1,-2,-2,-1],  # C
     [ -1, 1, 0, 0,-3, 5, 2,-2, 0,-3,-2, 1, 0,-3,-1, 0,-1,-2,-1,-2],  # Q
     [ -1, 0, 0, 2,-4, 2, 5,-2, 0,-3,-3, 1,-2,-3,-1, 0,-1,-3,-2,-2],  # E
     [  0,-2, 0,-1,-3,-2,-2, 6,-2,-4,-4,-2,-3,-3,-2, 0,-2,-2,-3,-3],  # G
     [ -2, 0, 1,-1,-3, 0, 0,-2, 8,-3,-3,-1,-2,-1,-2,-1,-2,-2, 2,-3],  # H
     [ -1,-3,-3,-3,-1,-3,-3,-4,-3, 4, 2,-3, 1, 0,-3,-2,-1,-3,-1, 3],  # I
     [ -1,-2,-3,-4,-1,-2,-3,-4,-3, 2, 4,-2, 2, 0,-3,-2,-1,-2,-1, 1],  # L
     [ -1, 2, 0,-1,-3, 1, 1,-2,-1,-3,-2, 5,-1,-3,-1, 0,-1,-3,-2,-2],  # K
     [ -1,-1,-2,-3,-1, 0,-2,-3,-2, 1, 2,-1, 5, 0,-2,-1,-1,-1,-1, 1],  # M 
     [ -2,-3,-3,-3,-2,-3,-3,-3,-1, 0, 0,-3, 0, 6,-4,-2,-2, 1, 3,-1],  # F
     [ -1,-2,-2,-1,-3,-1,-1,-2,-2,-3,-3,-1,-2,-4, 7,-1,-1,-4,-3,-2],  # P
     [  1,-1, 1, 0,-1, 0, 0, 0,-1,-2,-2, 0,-1,-2,-1, 4, 1,-3,-2,-2],  # S
     [  0,-1, 0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-2,-1, 1, 5,-2,-2, 0],  # T
     [ -3,-3,-4,-4,-2,-2,-3,-2,-2,-3,-2,-3,-1, 1,-4,-3,-2,11, 2,-3],  # W
     [ -2,-2,-2,-3,-2,-1,-2,-3, 2,-1,-1,-2,-1, 3,-3,-2,-2, 2, 7,-1],  # Y
     [  0,-3,-3,-3,-1,-2,-2,-3,-3, 3, 1,-2, 1,-1,-2,-2, 0,-3,-1, 4]   # V
     #  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V      );

# The amino acids, and a hashmap from amino acid to its index 0-19  
my %aaIndex;

for (my $i=0; $i<20; $i++) { $aaIndex{$Constants::aas[$i]} = $aaIndex{lc($Constants::aas[$i])} = $i; } # The score of a pair of amino acids in a given matrix sub score { my ($matrix, $aa1, $aa2) = @_; # By ref, val, val return $$matrix[$aaIndex{$aa1}][$aaIndex{$aa2}]; } # The maximum of a list of numbers sub max { my $res = -1E300; foreach (@_) { if ($_ > $res) {
      $res = $_;
    }
  }
  return $res;
} 

# Make random DNA sequence of specified length  
sub randomdna {
   my ($len) = @_;
   # length of sequence to generate
   my $dna = "";
   foreach (1..$len) {
     my $p = rand(4);
     $dna = $dna . substr("ACGT", $p, 1);
   }
   return $dna;
}   
use Data::Dumper; 
#use Constants;  
# Smith-Waterman  Algorithm  

my $seq1 = "MSEFRILSDREHVLKRAGMYIGSTTYEEHQRFLFGKWTKISYVPGLVKIIDEIIDNSVDEAIRTNFEFANVISVDIQNNIVTVTDNGRGIPQHMVTTPEGTQIPQPVAAWTRTKAGSNFDDTNRLTQGMNGVGSSLSNFFSDWFTGVTCDGETEMTVQCTNGAENITWSSKPSKLKGTNVTFCPDFDHFEGYMMDNSVLDIVHDRLQSLSVIFPKITFKFNGKRIVSKFKDYAKLYNDDPFVIEDKNFSLALVPAVEGEGFKSVSFANGLFTKNGGTHVDYVTDDLCEEIVKRIKKEYKVDVTKAAVKSGLTCILIVRELPNLRFDSQTKERLTNPTGDIKRHIDLDFKKLAKVIAKKENIIMPIIAVVLARKEAADKAAATKAAKAAKRAKVAKHVKANLIGTDAETTLFLTEGDSAINYLISVRDQDLHGGFPLRGKTLTTWEQPEAKIVKNAEIFNIMAITGLQFGVDALDVMQYKNIAIMTDADTDGIGSIKPSLISFFARWPELFEDGRIRFIKTPIIIAEPKKGDDVRWYYDLEDFENDRDNIkGYNIRYIKGLASLTESDYHRVINDPVMEIITLPENYKELFDLLYSEDADQRKIWMQS"; 
my $seq2 = "MSIRLLVHLNQIYTTYNYTNMSDLLLNVDNDYLDLAAALNIDINTITDNIDINTSKSSLNTYFTSLPKDVVVRRLNSASYQDSQEMRWPYPGQDNQKLFEQFEGVNGVIVQLQGVILQHQAQLDHSYWDEANSKYVRFCNSVGYKRTYPDGSIKLIQGLPENVCLKGVTEYGDPPNRPLPVIDKLGLVGKKGMTCSECIRAGLHSQEVEGKDRPVTCSPTGQLIFYVTGFTTRVLSNKGGKVTSTFNDYTVKELMDDTGFILIIPLKAKSTRRGIWDASTKQWTSVGYEAMVNNLIYKHSKDFANAPVGKRDTIAMKMSPYFQTIIISIVPPNPEDKNPKASLNFAVKEIPDLGAIKAARKYWQQINPAGEINVLNEDDFSNTKSVGLCAAEIVEEEPIEINKNPWAE";  
# scoring scheme 
my $GAP = -8;  
# initialization 
my @matrix; 
$matrix[0][0]{score}   = 0; 
$matrix[0][0]{pointer} = "none"; 
for(my $j = 1; $j <= length($seq1); $j++) {
     $matrix[0][$j]{score}   = 0;
     $matrix[0][$j]{pointer} = "none";
 } 
for (my $i = 1; $i <= length($seq2); $i++) {
     $matrix[$i][0]{score}   = 0;
     $matrix[$i][0]{pointer} = "none"; 
}  
# fill 
my $max_i     = 0; 
my $max_j     = 0; 
my $max_score = 0; 
my $scoring_matrix = @Constants::blosum50; 
for(my $i = 1; $i <= length($seq2); $i++) {
     for(my $j = 1; $j  $max_score) {
             $max_i     = $i;
             $max_j     = $j;
             $max_score = $matrix[$i][$j]{score};
         }
     }
}
# trace-back  
my $align1 = ""; 
my $align2 = "";  
my $j = $max_j; 
my $i = $max_i;  
while (1) {
     last if $matrix[$i][$j]{pointer} eq "none";
     if ($matrix[$i][$j]{pointer} eq "diagonal") {
         $align1 .= substr($seq1, $j-1, 1);
         $align2 .= substr($seq2, $i-1, 1);
         $i--; $j--;
     }elsif ($matrix[$i][$j]{pointer} eq "left") {
         $align1 .= substr($seq1, $j-1, 1);
         $align2 .= "-";
         $j--;
     }elsif($matrix[$i][$j]{pointer} eq "up"){
         $align1 .= "-";
         $align2 .= substr($seq2, $i-1, 1);
         $i--;
     }
}  
$align1 = reverse $align1; 
$align2 = reverse $align2; 
print "$align1n"; 
print "$align2n"; 
print &pid($align1,$align2);  

sub pid {
     my ($seq1, $seq2) = @_;
     my $matches = 0;
     my @seq1 = split(//, $seq1);
     my @seq2 = split(//, $seq2);
     for(my $i = 0; $i <= $#seq1 ; $i++) {
          if($seq1[$i] eq $seq2[$i]){
               $matches++;
          }
     }
     return ($matches/($#seq1+1));
}

PHACTS: Phage Classification Tool Set

There are two distinct phage lifestyles: lytic and lysogenic. The lysogenic lifestyle has many implications for phage therapy, genomics, and microbiology, however it is often very difficult to determine whether a newly sequenced phage isolate grows lytically or lysogenically just from the genome. Using the ~200 known phage genomes, a supervised random forest classifier was built to determine which proteins of phage are important for determining lytic and lysogenic traits. A similarity vector is created for each phage by comparing each protein from a random sampling of all known phage proteins to each phage genome. Each value in the similarity vector represents the protein with the highest similarity score for that phage genome. This vector is used to train a random forest to classify phage according to their lifestyle. To test the classifier each phage is removed from the data set one at a time and treated as a single unknown. The classifier was able to successfully group 188 of the 196 phages for whom the lifestyle is known, giving my algorithm an estimated 4% error rate. The classifier also identifies the most important genes for determining lifestyle; in addition to integrases, expected to be important, the composition of the phage (capsid and tail) also determines the lifestyle. A large number of hypothetical proteins are also involved in determining whether a phage is lytic or lysogenic.

Dealing with Variant counts

Here is a sample of the data I collated for Actinobacillus_pleuropneumoniae_L20_(416269.5). This data is for the compound C00025, which is L-Glutamate.  Only the first 6 reactions are shown; there are about 150.

C00025:R00021:
Glutamine,_Glutamate,_Aspartate_and_Asparagine_Biosynthesis     Ferredoxin-dependent glutamate synthase (EC 1.4.7.1)            1.24
cAMP_signaling_in_bacteria      Ferredoxin-dependent glutamate synthase (EC 1.4.7.1)            0
Gamma-aminobutyrate_(GABA)_shunt        Ferredoxin-dependent glutamate synthase (EC 1.4.7.1)            0
Photorespiration_(oxidative_C2_cycle)   Ferredoxin-dependent glutamate synthase (EC 1.4.7.1)            -1
Ammonia_assimilation    Ferredoxin-dependent glutamate synthase (EC 1.4.7.1)            0
C00025:R00093:

C00025:R00114:
Glutamine,_Glutamate,_Aspartate_and_Asparagine_Biosynthesis     Glutamate synthase [NADPH] small chain (EC 1.4.1.13)            1.24
Glutamine,_Glutamate,_Aspartate_and_Asparagine_Biosynthesis     Glutamate synthase [NADPH] large chain (EC 1.4.1.13)            1.24
Iron-sulfur_experimental        Glutamate synthase [NADPH] small chain (EC 1.4.1.13)            0
Iron-sulfur_experimental        Glutamate synthase [NADPH] large chain (EC 1.4.1.13)            0
Ammonium_metabolism_H._pylori   Glutamate synthase [NADPH] large chain (EC 1.4.1.13)
YgfZ    Glutamate synthase [NADPH] small chain (EC 1.4.1.13)            0
Ammonia_assimilation    Glutamate synthase [NADPH] small chain (EC 1.4.1.13)            0
Ammonia_assimilation    Glutamate synthase [NADPH] large chain (EC 1.4.1.13)            0
CBSS-155864.1.peg.3753  Glutamate synthase [NADPH] small chain (EC 1.4.1.13)
C00025:R00239:
Proline_Synthesis       Glutamate 5-kinase (EC 2.7.2.11)        fig|416269.5.peg.849    1
Iojap   Glutamate 5-kinase (EC 2.7.2.11)        fig|416269.5.peg.849    0
Proline,_4-hydroxyproline_uptake_and_utilization        Glutamate 5-kinase (EC 2.7.2.11)        fig|416269.5.peg.849    1
CBSS-316058.9.peg.23    Glutamate 5-kinase (EC 2.7.2.11)
YggW    Glutamate 5-kinase (EC 2.7.2.11)        fig|416269.5.peg.849    0
PROSC   Glutamate 5-kinase (EC 2.7.2.11)        fig|416269.5.peg.849    0
C00025:R00240:

C00025:R00241:


In the data you can see examples of positive variant counts, negative variant counts, zero variant counts, and reactions that don’t exist in the genome.

The data eventually ‘simplifies down’ into:

C00025: 1.02    0.0001  2.03    2.01    0.0001  0.0001  …

Looking at the first reaction(1.02) the total gets +1 for the first subsystem ‘Glutamine,_Glutamate..’. It gets a +0.01 for the next subsystem ‘cAMP_signaling’.  Nothing is added for the next subsystem since it comes back as experimental.  Then nothing is added for ‘Phosorespiration’ since it is a negative variant count.  And a +0.01 is added for the last ammonium assimilation. Giving a 1.02, which is eventually simplified into a 1.

 

The second reaction returns a NULL and is empty, thus it gets a +0.0001.  Which ends up being 0.0001, which ends up as a 0.