
fastq to fasta

People often ask us how to convert fastq files to fasta format. There is a variety of code for this on this website, but sometimes that is not easy enough.

Here are a few ways to do it on the command line: with a Perl script written by Bas, directly on the command line, or with prinseq-lite. There is also a C++ version that you can compile (e.g. with c++ -o fastq2fasta fastq2fasta.cpp) and run on your machine.

We also have a simple form that converts fastq files to fasta files (DNA sequence only; the quality scores are discarded).
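To show what the conversion actually involves, here is a minimal Perl sketch. It is not the script by Bas or the C++ version mentioned above. It assumes standard four-line FASTQ records (an '@' header, the sequence, a '+' separator, and the quality string) and does not handle wrapped, multi-line FASTQ. Reading each record as a fixed block of four lines matters because a quality string can itself begin with '@', so scanning for '@' alone could mis-split records. The script name fastq2fasta.pl and the sub name fastq_to_fasta are hypothetical.

```perl
use strict;
use warnings;

# Convert FASTQ records read from $in into FASTA records written to $out.
# Assumes 4-line records (header, sequence, '+' line, qualities); wrapped
# multi-line FASTQ is NOT handled by this sketch. Returns the record count.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;        # tolerate blank lines between records
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;                  # read but never written out
        die "Truncated FASTQ record after: $header" unless defined $qual;
        die "Expected '\@' header, got: $header" unless $header =~ /^\@/;
        die "Expected '+' separator, got: $plus" unless $plus =~ /^\+/;
        chomp($header, $seq);
        $header =~ s/^\@//;                # the FASTQ '@' becomes a FASTA '>'
        print {$out} ">$header\n$seq\n";
        $count++;
    }
    return $count;
}

# Hypothetical usage: perl fastq2fasta.pl reads.fastq > reads.fasta
# With no file arguments, nothing is converted.
if (@ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        fastq_to_fasta($fh, \*STDOUT);
        close $fh;
    }
}
```

Each output record keeps the full header text (including any description after the read ID) and the sequence on a single line.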

Calculating Chi-squared with Perl

There are two Perl modules available on CPAN that deal with Chi-squared analysis (Statistics::ChiSquare and Statistics::Distributions). However, neither one outputs the Chi-squared value for comparing two populations on a binary variable (a 2×2 contingency table).

We can use the formula below to calculate the Chi-squared value with one degree of freedom.

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}, \qquad n = a + b + c + d$$


Variable   Population 1   Population 2
+          a              b
–          c              d
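The shortcut formula is an algebraic rearrangement of Pearson's general statistic: the sum over all four cells of (observed − expected)² / expected, where each expected count is (row total × column total) / n. As a sanity check, the Perl sketch below computes both forms for the species example; the sub names chi2_shortcut and chi2_pearson are illustrative, and the sketch assumes every row and column total is nonzero.

```perl
use strict;
use warnings;

# Shortcut formula for a 2x2 table laid out as
#   a  b
#   c  d
# Assumes no row or column total is zero.
sub chi2_shortcut {
    my ($A, $B, $C, $D) = @_;
    my $n = $A + $B + $C + $D;
    return $n * ($A * $D - $B * $C)**2
         / (($A + $B) * ($C + $D) * ($A + $C) * ($B + $D));
}

# Pearson's general form: sum of (observed - expected)^2 / expected over the
# four cells, with expected = row total * column total / n.
sub chi2_pearson {
    my ($A, $B, $C, $D) = @_;
    my @obs  = ([$A, $B], [$C, $D]);
    my @rows = ($A + $B, $C + $D);
    my @cols = ($A + $C, $B + $D);
    my $n    = $A + $B + $C + $D;
    my $chi2 = 0;
    for my $i (0, 1) {
        for my $j (0, 1) {
            my $expected = $rows[$i] * $cols[$j] / $n;
            $chi2 += ($obs[$i][$j] - $expected)**2 / $expected;
        }
    }
    return $chi2;
}

printf "shortcut: %.4f  pearson: %.4f\n",
    chi2_shortcut(57, 36, 63, 88), chi2_pearson(57, 36, 63, 88);
```

Both forms should print 8.8178 for the species table; the shortcut is simply cheaper because it never builds the expected counts.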

Suppose we wish to test whether disease status is associated with species, using samples from two species. Both disease status (yes/no) and species are binary variables, so the Chi-squared test applies:

Diseased   Species 1   Species 2
No         57          36
Yes        63          88

n = (57 + 36 + 63 + 88) = 244

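$$\chi^2 = \frac{244\,(57 \times 88 - 36 \times 63)^2}{(57 + 36)(63 + 88)(57 + 63)(36 + 88)} = \frac{244 \times 2748^2}{93 \times 151 \times 120 \times 124} = \frac{1\,842\,566\,976}{208\,959\,840}$$

χ² ≈ 8.82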

Critical values of the Chi-squared distribution with 1 degree of freedom, for each upper-tail P-value, are:

D.F.   P = 0.1   0.05   0.025   0.01   0.005
1      2.71      3.84   5.02    6.63   7.88

Our χ² value (8.82) exceeds the critical value for P = 0.005 (7.88), so P < 0.005.

Because P is well below the conventional 0.05 threshold, we reject the null hypothesis that disease status is independent of species. The data suggest that disease is more prevalent in species 2 (88 of 124 diseased, about 71%) than in species 1 (63 of 120, 52.5%).
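Instead of bracketing the result between table entries, the exact P-value can be computed. With one degree of freedom, a χ² variable is the square of a standard normal variable, so the upper-tail probability is P = erfc(√(χ²/2)). The Perl sketch below evaluates erfc with the Abramowitz and Stegun rational approximation (formula 7.1.26, absolute error below 1.5 × 10⁻⁷), so it needs no CPAN modules; the sub names erfc_approx and chi2_pvalue_1df are hypothetical. The Statistics::Distributions module also offers a chisqrprob function for this job.

```perl
use strict;
use warnings;

# Complementary error function for x >= 0 via the Abramowitz & Stegun
# rational approximation 7.1.26 (absolute error below 1.5e-7).
sub erfc_approx {
    my ($x) = @_;
    my $t = 1 / (1 + 0.3275911 * $x);
    my $poly = $t * (0.254829592
             + $t * (-0.284496736
             + $t * (1.421413741
             + $t * (-1.453152027
             + $t * 1.061405429))));
    return $poly * exp(-$x * $x);
}

# Upper-tail P-value of a chi-squared statistic with 1 degree of freedom.
# X = Z^2 for standard normal Z, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
sub chi2_pvalue_1df {
    my ($chi2) = @_;
    return 1 if $chi2 <= 0;
    return erfc_approx(sqrt($chi2 / 2));
}

printf "chi2 = %.2f  P = %.4f\n", 8.82, chi2_pvalue_1df(8.82);
```

For χ² = 8.82, √(8.82/2) = 2.1 and the sketch should report P ≈ 0.0030, consistent with the table lookup (P < 0.005).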

Below is a Perl subroutine to automatically calculate Chi-squared.

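use strict;
use warnings;

# Chi-squared (1 degree of freedom) for the 2x2 table  a b / c d
sub chi_squared {
    my ($A, $B, $C, $D) = @_;    # avoid $a/$b, which Perl reserves for sort
    my $denom = ($A + $B) * ($C + $D) * ($A + $C) * ($B + $D);
    return 0 if $denom == 0;     # an empty row or column leaves the statistic undefined
    my $n = $A + $B + $C + $D;
    return $n * ($A * $D - $B * $C)**2 / $denom;
}
print chi_squared(57, 36, 63, 88), "\n";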




Rob Edwards’ bioinformatics lab at San Diego State University is all about decoding life’s best-kept secrets. These secrets are encoded, as you must have already guessed, in genomes of bacteria, archaea, eukaryotes and the viruses that infect them.

We use all kinds of computers, from clusters to cell phones, to tackle hard computational problems that help us better understand biology.

We are funded by the National Science Foundation to explore phage genomes, through our PhAnToMe project, and to explore phage metagenomes (and the unknown genes in them) through our new Viral Dark Matter Project.

Rob has collaborations all over the world, and has taught in Europe, Asia, and Latin America. We are currently funded by the Department of Education through the Fund for the Improvement of Postsecondary Education and the Brazilian Ministry of Education (FIPSE-CAPES) to develop a marine sciences course in Brazil.

Rob has published over 60 peer-reviewed papers, and given an equal number of talks. A short biography about Rob describes his background, and his CV has more information. You can contact Rob for more information.