Research

Source Forge

FOCUS2: Ecological Implications of Metagenomics Data Analysis (under review)

FOCUS2 first uses the FOCUS algorithm to generate a taxonomic profile of the entire metagenomic query dataset based on k-mer composition, then creates a reduced database of reference genomes using resampling, and finally aligns the individual query reads to the reduced database. We previously showed that database reduction can speed up metagenomic sequence classification. We have also added an optional output, FOCUS2(R), which individually aligns to a reference database those reads that fail to group into a k-mer profile.
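As a rough illustration of what a profile "based on k-mer composition" involves, the sketch below pools k-mer counts over all reads and converts them to relative frequencies. The function name `kmer_profile`, the choice of `k`, and the rule of skipping windows with ambiguous bases are assumptions made for this example, not details taken from FOCUS2.

```python
from collections import Counter
from itertools import product


def kmer_profile(reads, k=2):
    """Return relative k-mer frequencies pooled over all reads.

    Hypothetical helper: the real tool's k and counting rules may differ.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows that contain ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Use a fixed ordering over all 4**k k-mers so profiles are comparable.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


reads = ["ACGTAC", "GGNAC"]  # toy reads, not real data
profile = kmer_profile(reads, k=2)
```

The resulting dictionary has one entry per possible k-mer, so two datasets profiled with the same `k` can be compared position by position.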

SUPER-FOCUS: An innovative tool for agile functional analysis of metagenomic data (published)

SUPER-FOCUS (SUbsystems Profile by databasE Reduction using FOCUS) is an agile homology-based approach that uses a reduced SEED database to report the subsystems present in metagenomic samples and profile their abundances. SUPER-FOCUS uses FOCUS (Silva et al., 2014) to predict the organisms present in the metagenomic sample and creates a reduced database containing only the subsystems found in the organisms present in the microbial community. The tool is tested with both real and synthetic metagenomes, and the results are used to assess whether our approach accurately predicts the subsystems present in microbial communities.
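The database reduction step can be pictured as a simple filter: keep only the reference entries that belong to organisms predicted to be present. The sketch below assumes a hypothetical in-memory layout (organism name mapped to a set of subsystem names) and a hypothetical `min_abundance` cutoff; the actual SEED database format and selection rule used by SUPER-FOCUS are not described here.

```python
def reduce_database(reference_db, predicted_abundance, min_abundance=0.0):
    """Keep only organisms whose predicted abundance exceeds the cutoff.

    reference_db: dict mapping organism -> set of subsystems (assumed layout).
    predicted_abundance: dict mapping organism -> relative abundance.
    """
    present = {org for org, share in predicted_abundance.items()
               if share > min_abundance}
    return {org: subsystems for org, subsystems in reference_db.items()
            if org in present}


# Toy data with placeholder names, not real SEED content.
reference_db = {
    "organism_1": {"subsystem_a", "subsystem_b"},
    "organism_2": {"subsystem_b", "subsystem_c"},
    "organism_3": {"subsystem_d"},
}
predicted = {"organism_1": 0.8, "organism_2": 0.0, "organism_3": 0.2}

reduced = reduce_database(reference_db, predicted)
# Union of the subsystems that remain searchable after reduction.
subsystems_to_search = set().union(*reduced.values())
```

Only the reduced set then needs to be searched by the slower homology-based step, which is where the speed-up would come from.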

FOCUS: An Alignment-Free Model To Identify Organisms In Metagenomes Using Non-Negative Least Squares (published)

FOCUS (Find Organisms by Composition USage) is an agile approach that reconstructs a taxonomic profile using an ensemble k-mer composition of the entire metagenome. We compute the optimal set of organism abundances using non-negative least squares (NNLS) to match the metagenome k-mer composition to organisms in a reference database, then report the focal organisms present in metagenomic samples and profile their abundances. FOCUS was tested with simulated metagenomes and over 250 GB of real metagenomes, and the results show that our approach accurately predicts the organisms present in microbial communities in seconds.
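To make the NNLS idea concrete: if each reference organism has a k-mer frequency vector, the metagenome's k-mer vector is modelled as a non-negative mixture of them, and the mixture weights are read as abundances. The sketch below solves that problem with plain projected gradient descent so that it needs no external libraries. This solver is a stand-in chosen for the example, not the solver FOCUS uses, and the organism names and frequency values are invented toy data.

```python
def nnls_projected_gradient(A, b, iterations=2000):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    A is a list of rows (one per k-mer), each with one column per organism.
    """
    n_rows, n_cols = len(A), len(A[0])
    # ||A||_F^2 upper-bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x


# Invented k-mer frequency profiles over four k-mers (toy data).
profiles = {
    "org_A": [0.4, 0.3, 0.2, 0.1],
    "org_B": [0.1, 0.2, 0.3, 0.4],
    "org_C": [0.1, 0.4, 0.4, 0.1],
}
names = list(profiles)
A = [[profiles[name][i] for name in names] for i in range(4)]

# Simulated metagenome: 70% org_A and 30% org_C, with org_B absent.
b = [0.7 * a + 0.3 * c for a, c in zip(profiles["org_A"], profiles["org_C"])]

abundances = dict(zip(names, nnls_projected_gradient(A, b)))
```

In this noise-free toy case the recovered weights approach 0.7, 0.0 and 0.3. With SciPy available, `scipy.optimize.nnls(A, b)` solves the same problem with an active-set method.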

Combining de novo and reference-guided assembly with scaffold_builder (published)

Scaffold_builder is software that orders contigs generated by draft sequencing along a reference sequence. Gaps are filled with Ns, small overlaps are aligned with MUSCLE, and the consensus is created with IUPAC codes. Scaffold_builder can help in the assembly and annotation of genomes by revealing what is missing and allowing targeted sequencing to close those gaps.
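The ordering, gap-filling and consensus steps can be sketched in a few lines. This example assumes each contig already comes with a 0-based start position on the reference (a hypothetical input format) and that overlapping bases line up position by position, so no MUSCLE alignment is performed. It illustrates the idea and is not scaffold_builder's actual code.

```python
# Two-base IUPAC ambiguity codes; identical bases map to themselves.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def build_scaffold(placed_contigs):
    """Merge (reference_start, sequence) pairs into one scaffold string."""
    scaffold = []
    for start, seq in sorted(placed_contigs):
        if start > len(scaffold):
            # Uncovered reference positions become a run of Ns.
            scaffold.extend("N" * (start - len(scaffold)))
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(scaffold):
                # Overlap: merge both calls into one IUPAC code,
                # falling back to N for combinations not in the table.
                pair = frozenset((scaffold[pos], base))
                scaffold[pos] = IUPAC.get(pair, "N")
            else:
                scaffold.append(base)
    return "".join(scaffold)


# Toy contigs: the second overlaps the first by one base (T vs C),
# and the third starts after a one-base gap.
contigs = [(0, "ACGT"), (6, "GGA"), (3, "CC")]
scaffold = build_scaffold(contigs)  # "ACGYCNGGA"
```

The `Y` marks the disagreement in the overlap and the `N` marks the position that no contig covers, which is the kind of gap targeted sequencing would aim to close.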