
2015 SDSU Metagenomics Workshop

The 2015 SDSU Metagenomics Workshop is designed to be a combination of lectures, discussions, and practical, hands-on experience, to bring people up to date on data analysis for metagenomics.

The workshop is being held in Adams Humanities Room 2108 from 10 am to 6 pm every day, June 22nd to 26th, 2015.

Registration is closed.

The agenda is online here, and will be updated as we progress.

We will use a VirtualBox virtual machine during the class. More information about the image and how to download it is here. (Please note that the image is still subject to change, so don't download it yet!)

fastq to fasta

We often have people ask us how to convert fastq files to fasta format. We have a variety of code on this website, but sometimes that is not easy enough.

Here are a few ways to do it: using a Perl script written by Bas, using the command line, or using prinseq-lite. Here is a C++ version that you can compile (e.g. with c++ -o fastq2fasta fastq2fasta.cpp) and run on your machine.

We also have a simple form that converts fastq files to fasta files (DNA sequences only; the quality scores are not included in the output).
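All of these converters do essentially the same thing: keep the header and sequence of each record, swap the leading '@' for '>', and drop the quality line. Below is a minimal Perl sketch of that idea. It is not one of the scripts linked above, and it assumes the common four-line FASTQ layout (header, sequence, '+' line, quality) with no wrapped sequence lines. The subroutine name fastq_to_fasta is hypothetical.

```perl
use strict;
use warnings;

# Convert FASTQ records read from $in into FASTA records written to $out.
# Assumes the four-line layout: @header, sequence, +[header], quality.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;          # tolerate blank lines between records
        die "Expected a FASTQ header starting with '\@', got: $header"
            unless $header =~ /^@/;
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated FASTQ record at: $header" unless defined $qual;
        die "Missing '+' separator line after: $header" unless $plus =~ /^\+/;
        $header =~ s/^@/>/;                  # '@' header becomes '>' header
        print {$out} $header, $seq;          # the quality line is discarded
    }
}
```

Because records are read four lines at a time, a quality line that happens to start with '@' is not mistaken for a header. Adding the line fastq_to_fasta(\*STDIN, \*STDOUT); at the end of a script turns it into a filter, e.g. perl fastq2fasta.pl < reads.fastq > reads.fasta (the script name here is only an example).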

Calculating Chi-squared with Perl

There are two Perl modules available on CPAN that deal with Chi-squared analysis (Statistics::ChiSquare and Statistics::Distributions). However, neither one outputs the Chi-squared value for a comparison of two binary populations (a 2×2 contingency table).

We can use the formula below to calculate the Chi-squared value with one degree of freedom.

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$

$$n = a + b + c + d$$


variable   population 1   population 2
+          a              b
-          c              d
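The four cells are plain counts of individuals. As a minimal sketch of how they might be tallied from raw observations, the Perl subroutine below assumes each observation is a [population, status] pair, with population 1 or 2 and status '+' or '-'. Both the name tally_2x2 and this encoding are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Tally per-individual observations into the 2x2 cells a, b, c, d.
# Each observation is [population, status]: population is 1 or 2 and
# status is '+' or '-' (hypothetical encoding chosen for this sketch).
sub tally_2x2 {
    my @obs  = @_;
    my %cell = (a => 0, b => 0, c => 0, d => 0);
    # Map status + population to the cell name used in the table.
    my %which = ('+1' => 'a', '+2' => 'b', '-1' => 'c', '-2' => 'd');
    for my $o (@obs) {
        my ($pop, $status) = @$o;
        my $key = $status . $pop;
        die "Unexpected observation: $status, $pop" unless exists $which{$key};
        $cell{ $which{$key} }++;
    }
    return @cell{qw(a b c d)};    # counts in the order a, b, c, d
}
```

The returned list can be passed straight to a Chi-squared routine that takes a, b, c and d in that order.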

Suppose we wish to determine whether the prevalence of disease differs between two species. Both disease status and species are binary variables, so the Chi-squared test can be applied:

Diseased   species 1   species 2
No         57          36
Yes        63          88

$$n = 57 + 36 + 63 + 88 = 244$$

$$\chi^2 = \frac{244\,(57 \cdot 88 - 36 \cdot 63)^2}{(57 + 36)(63 + 88)(57 + 63)(36 + 88)} = \frac{244 \cdot 2748^2}{93 \cdot 151 \cdot 120 \cdot 124}$$

$$\chi^2 \approx 8.82$$

The critical values of the Chi-squared distribution at 1 degree of freedom, for several P-values, are:

D.F.   P = 0.1   P = 0.05   P = 0.025   P = 0.01   P = 0.005
1      2.71      3.84       5.02        6.63       7.88

The χ² value (8.82) exceeds the critical value for P = 0.005 (7.88), so P < 0.005.

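Since the corresponding P-value is less than 0.05 (indeed P < 0.005), we reject the null hypothesis that disease status is independent of species. The data suggest that the prevalence of disease is significantly higher in species 2 (88 of 124, about 71%) than in species 1 (63 of 120, 52.5%).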
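The table only brackets the P-value. As a hedged sketch, the exact upper-tail P-value at 1 degree of freedom can be computed directly: a Chi-squared variable with one degree of freedom is the square of a standard normal variable Z, so P(χ² > x) = P(|Z| > √x) = erfc(√(x/2)). This assumes Perl 5.22 or later, where the core POSIX module provides erfc; the subroutine name chi_squared_p_1df is hypothetical. For other degrees of freedom, the chisqrprob function in Statistics::Distributions is an alternative.

```perl
use strict;
use warnings;
use POSIX qw(erfc);   # erfc is provided by POSIX from Perl 5.22 onward

# Upper-tail P-value of a chi-squared statistic with 1 degree of freedom.
# With 1 d.f., chi-squared is the square of a standard normal Z, so
# P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
sub chi_squared_p_1df {
    my ($chi2) = @_;
    return 1 if $chi2 <= 0;
    return erfc(sqrt($chi2 / 2));
}

printf "P = %.4f\n", chi_squared_p_1df(8.82);
```

For the species example this gives a P-value of roughly 0.003, consistent with the table lookup (P < 0.005).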

Below is a Perl subroutine to automatically calculate Chi-squared.

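use strict;
use warnings;
# Chi-squared (1 d.f.) for a 2x2 table [a b; c d]; $a and $b are reserved for sort, so the cells are $ca..$cd
sub chi_squared {
     my ($ca, $cb, $cc, $cd) = @_;
     my $n     = $ca + $cb + $cc + $cd;
     my $denom = ($ca + $cb) * ($cc + $cd) * ($ca + $cc) * ($cb + $cd);
     return 0 if $denom == 0;    # an empty row or column leaves chi-squared undefined
     return $n * ($ca * $cd - $cb * $cc)**2 / $denom;
}

printf "%.2f\n", chi_squared(57, 36, 63, 88);    # prints 8.82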




Rob Edwards’ bioinformatics lab at San Diego State University is all about decoding life’s best-kept secrets. These secrets are encoded, as you must have already guessed, in the genomes of bacteria, archaea, eukaryotes and the viruses that infect them.

We use all kinds of computers, from clusters to cell phones, to solve seemingly unsolvable computational problems that help us better understand biology.

We are funded by the National Science Foundation, Lawrence Livermore National Laboratory, and the National Institutes of Health to explore phage genomes and phage metagenomes (and the unknown genes in them).

Rob has collaborations all over the world, and has taught in Europe, Asia, and Latin America. We have been funded by the Department of Education through the Fund for the Improvement of Postsecondary Education and the Brazilian Ministry of Education (FIPSE-CAPES) to develop a marine sciences course in Brazil.

Rob has published over 160 peer-reviewed papers, and given an equal number of talks. A short biography about Rob describes his background, and his CV has more information. You can contact Rob for more information.


Please read our diversity and inclusion statement and provide feedback to Rob.