GenomePeek allows quick and simple analysis of both single-genome and metagenome sequencing files. GenomePeek uses a sequence assembly approach in which reads that match a set of conserved genes are extracted, assembled, and then aligned against a highly specific reference database. GenomePeek was found to be much faster than traditional approaches while still keeping error rates low, and it offers unique data visualization options.

The four genes currently analyzed by GenomePeek are 16S, recA, rpoB, and groEL. Reads from the input file(s) that hit [1] these four genes are extracted [2]. Each set of reads is assembled [3], and the contigs (and singlets) are aligned to a representative subject sequence (red). The contigs are color coded by taxon, which is determined by a frame-shift correcting protein alignment [4], with ambiguous hits refined by a nucleotide alignment [5].
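
The five steps above can be sketched as a list of command lines. This is only an illustration: the file names, the `prefix` argument, and the helpers `hit_read_ids` and `build_commands` are hypothetical, the options shown are common defaults rather than the settings GenomePeek actually uses, and MEGABLAST is invoked here through BLAST+ (`blastn -task megablast`), which is an assumption about the installed toolchain. Nothing is executed; the function only returns the commands.

```python
from typing import List, Optional, Tuple

# (argv, file that stdout should be redirected to, or None)
Command = Tuple[List[str], Optional[str]]


def hit_read_ids(blast8_text: str) -> List[str]:
    """Return unique read IDs (first column) from tabular hit output,
    in first-seen order. The result would be written to the ID list
    that seqtk reads in step 2."""
    seen = {}
    for line in blast8_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        seen.setdefault(line.split("\t")[0], None)
    return list(seen)


def build_commands(reads: str, gene_ref: str, protein_db: str,
                   nucleotide_db: str, prefix: str) -> List[Command]:
    """Build one command per pipeline step for a single marker gene.
    All paths are placeholders chosen for this sketch."""
    hits = f"{prefix}.blast8"
    ids = f"{prefix}.ids"
    hit_reads = f"{prefix}.hits.fa"
    # CAP3 writes contigs next to its input as <input>.cap.contigs
    # (singlets go to <input>.cap.singlets and would be handled the same way).
    contigs = f"{hit_reads}.cap.contigs"
    return [
        # 1. find reads that hit the reference gene
        (["blat", gene_ref, reads, "-out=blast8", hits], None),
        # 2. extract those reads by ID (seqtk prints to stdout)
        (["seqtk", "subseq", reads, ids], hit_reads),
        # 3. assemble the extracted reads
        (["cap3", hit_reads], f"{prefix}.cap3.log"),
        # 4. protein-level alignment of the contigs
        (["blastx", "-query", contigs, "-db", protein_db,
          "-outfmt", "6", "-out", f"{prefix}.blastx.tsv"], None),
        # 5. nucleotide-level alignment to refine ambiguous hits
        (["blastn", "-task", "megablast", "-query", contigs,
          "-db", nucleotide_db, "-outfmt", "6",
          "-out", f"{prefix}.megablast.tsv"], None),
    ]
```

Each entry could be passed to `subprocess.run`, with the second element opened as the `stdout` target when it is not `None`.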

An abundance distribution is calculated by normalizing the contigs by the number of reads and then summing for each taxon at the species level. If a contig hits unambiguously to more than one taxon, the plot goes up one taxonomic level; the actual species that the contig hit is still available in the raw data/download link. A pie chart of the distribution is then created and displayed on the results page, with the taxa color coded to match the previous alignments.
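
A minimal sketch of this calculation follows. It is not GenomePeek's code, and it rests on two assumptions: each contig is weighted by its read count divided by the total read count, and a contig that hits several taxa is labeled with the most specific rank that all of its hits share (for two species in the same genus this is exactly one level up). The names `shared_lineage` and `abundance` and the example lineages are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Ranks ordered from broad to specific, e.g. (family, genus, species).
Lineage = Tuple[str, ...]


def shared_lineage(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest rank prefix common to all lineages."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def abundance(contigs: List[Tuple[int, List[Lineage]]]) -> Dict[str, float]:
    """Map taxon name -> fraction of reads.

    Each contig is (number of reads, lineages of the taxa it hit)."""
    total = sum(n_reads for n_reads, _ in contigs)
    if total == 0:
        return {}
    result: Dict[str, float] = defaultdict(float)
    for n_reads, hits in contigs:
        common = shared_lineage(hits)
        # A single hit keeps its species; multiple hits fall back
        # to the lowest rank they agree on.
        label = common[-1] if common else "unclassified"
        result[label] += n_reads / total
    return dict(result)


example = [
    (6, [("Enterobacteriaceae", "Escherichia", "Escherichia coli")]),
    (3, [("Enterobacteriaceae", "Escherichia", "Escherichia coli"),
         ("Enterobacteriaceae", "Escherichia", "Escherichia fergusonii")]),
    (1, [("Bacillaceae", "Bacillus", "Bacillus subtilis")]),
]
fractions = abundance(example)
```

In this example the second contig hits two Escherichia species, so its reads are counted under the genus Escherichia rather than under either species.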

The paper for GenomePeek is available at PeerJ.

[1] BLAT  [2] seqtk  [3] CAP3  [4] BLASTX  [5] MEGABLAST