Cross-Assembly of Metagenomes
Welcome to crAss, the web tool for comparative metagenomics using cross-assembly. Before running crAss, you must combine your metagenomic reads into a single assembly using your favorite de novo assembly tool. Then, upload the assembly ACE file and the original reads files (one per data set) below. For each data set you can specify a unique name to be displayed in the output figure.
For an explanation of how to use this web server, click the help logo in the top right-hand corner of the page. For the 3D example, please enter Job ID: 1329506771. For the 15D example, please enter Job ID: 1329505996. Alternatively, you can download example data from: http://sf.net/projects/crass/files/example_data/.
Please use the SEQanswers forum (http://seqanswers.com/forums/showthread.php?p=65129) to share your experiences with crAss.
Please add your files, click "Start upload", specify the labels, and click "Run crAss".
Note that input files can be compressed with the GZIP or ZIP algorithm. Please ensure that the file name includes the file type (e.g. file.ace.zip or file.fasta.gz).
If you want to reduce the size of the files to upload or want to submit the data without providing any sequences, please read the help page.
Compressed inputs: There will be a delay for the status check due to extracting the files, which may take a few minutes. Please be patient.
Metagenomes are often characterized by high levels of unknown reads with no similarity to any sequences in GenBank. Although these reads are often discarded from analysis, they contain a wealth of information for comparative metagenomics.
crAss is a tool that enables fast and intuitive analysis of complete metagenomic data sets by counting the number of shared contigs between samples in a cross-assembly of all reads.
Prior to running crAss, you will need to combine all your metagenomic data sets into a single cross-assembly using your favorite de novo assembly tool. Note that the read identifiers need to be unique across all the data set files, otherwise crAss will not be able to recognize to which data set a read belongs when reading the cross-assembly ACE file.
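The uniqueness requirement above can be checked before assembling. The following is a minimal sketch, not part of crAss: it assumes the read files are in FASTA format and that the identifier is the first word after ">". The function names and the data set names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the identifier (first word after '>') of every FASTA record."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def find_shared_ids(datasets: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each read identifier seen in more than one data set to those data sets."""
    seen = defaultdict(set)
    for name, lines in datasets.items():
        for read_id in fasta_ids(lines):
            seen[read_id].add(name)
    return {rid: names for rid, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical data sets; in practice, pass open file handles instead of lists.
    example = {
        "sampleA": [">read1 first read", "ACGT", ">read2", "GGCC"],
        "sampleB": [">read2", "TTAA", ">read3", "CCGG"],
    }
    for read_id, names in find_shared_ids(example).items():
        print(read_id, "occurs in", sorted(names))
```

An empty result means no identifier is shared between data sets. If identifiers do collide, prefixing each identifier with its data set name before assembly is one way to make them unique.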
The input to the web server includes the cross-assembly ACE file and the individual read files in FASTA or FASTQ format.
Upload the files by clicking "Add ACE / FASTA file" and selecting the appropriate files on your computer.
Note that the input files can be compressed (using GZIP or ZIP), which may be useful if you have a slow internet connection, but the original file extension (for example .fasta or .ace) must remain part of the file name.
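As an illustration of the naming rule, the sketch below compresses a file with GZIP and appends ".gz" to the full original name, so that reads.fasta becomes reads.fasta.gz. It uses only the Python standard library; the function name and the file name are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_keep_extension(path: str) -> Path:
    """Compress `path` to `path` + '.gz', keeping the original extension."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    # Stream the data so that large read files are not loaded into memory at once.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    # Hypothetical input file; replace with the path to your own reads or ACE file.
    print(gzip_keep_extension("reads.fasta"))
```

The original file is left in place, so it can be deleted manually once the upload has succeeded.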
To start the upload, click "Start upload" or the arrow button.
The progress bar indicates that the files are uploading...
On the next screen, you have the option of changing the names of the files for each data set.
It is important that every data set has a unique name!
Note that the name of the ACE file does not need to be changed, as it is not one of the primary data sets.
Deleting a file is easy with the trash button, or by clicking the checkboxes and clicking "Delete selected" at the top of the frame.
Start the process by clicking "Run crAss" at the bottom.
You may need to wait until your zipped files have been extracted.
While crAss is running, the page will refresh automatically until the process is done.
Usually, this will take less than a minute.
During this time, crAss extracts all contigs, calculates pair-wise distances between the metagenomes and builds an output image to visualize your data.
The first thing to notice after the run is done is your Job ID.
With this Job ID, you can always recover your results at a later time just by entering it on the crAss home page.
Just enter it in the "Job ID" field and click "Show results".
You can download the names you gave to each of the data sets for future reference.
The first output of crAss is a file that contains a single line for each contig and shows how many reads from each data set it contains. This file also lists the unassembled reads.
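To make the idea of per-contig read counts concrete, here is a minimal sketch of how such a table could be derived from an ACE file. It is not the crAss implementation, and it does not reproduce the layout of the crAss output file. It assumes the standard ACE layout, in which a "CO" line starts a contig and each "AF" line names one read in that contig, and it assumes a read-to-data-set lookup has already been built from the read files. The function name and the example identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_reads_per_contig(
    ace_lines: Iterable[str], read_to_dataset: Dict[str, str]
) -> Dict[str, Counter]:
    """Count, for every contig, how many of its reads come from each data set."""
    counts: Dict[str, Counter] = {}
    current = None
    for line in ace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "CO":
            # "CO <contig name> ..." opens a new contig.
            current = counts.setdefault(fields[1], Counter())
        elif fields[0] == "AF" and current is not None:
            # "AF <read name> ..." assigns a read to the current contig.
            dataset = read_to_dataset.get(fields[1])
            if dataset is not None:
                current[dataset] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated ACE content.
    ace = [
        "AS 2 4",
        "CO Contig1 8 3 1 U",
        "ACGTACGT",
        "AF r1 U 1",
        "AF r2 U 1",
        "AF r3 C 2",
        "CO Contig2 4 1 1 U",
        "ACGT",
        "AF r4 U 1",
    ]
    lookup = {"r1": "sampleA", "r2": "sampleB", "r3": "sampleA", "r4": "sampleB"}
    for contig, per_dataset in count_reads_per_contig(ace, lookup).items():
        print(contig, dict(per_dataset))
```

Reads that are absent from the lookup are skipped silently in this sketch; a stricter version might report them, since they usually indicate identifiers that do not match between the ACE file and the read files.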
The second thing crAss delivers is a set of symmetrical distance matrices. As explained in our paper, there are three different distance formulas: "SHOT", "minimum" and "Wootters". The corresponding distance matrices are visualized on the output page and can be downloaded as a .txt file.
The third output file is an image that displays the similarities between metagenomes.
If you included 4 or more data sets in the cross-assembly, the output image will be a cladogram built from the distance matrix using BioNJ (Gascuel, 1997).
You can choose to display or ignore the branch lengths in the cladogram.
You can also download the tree in Phylip bracket notation for use in other programs.
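If you want to process the downloaded tree with your own scripts, the sketch below shows one simple way to list the data set names it contains. It assumes the download is a standard parenthesized (Newick-style) tree string with unquoted labels; the function name and the example tree are hypothetical and are not crAss output.

```python
import re
from typing import List


def newick_leaf_names(newick: str) -> List[str]:
    """Return the leaf labels of a bracket-notation tree in order of appearance."""
    # A leaf label directly follows "(" or ","; labels that follow ")" belong
    # to internal nodes and are therefore not matched.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


if __name__ == "__main__":
    # Hypothetical tree with four data sets and branch lengths.
    tree = "((gut:0.10,soil:0.20):0.05,ocean:0.30,lake:0.40);"
    print(newick_leaf_names(tree))  # ['gut', 'soil', 'ocean', 'lake']
```

For anything beyond listing labels, such as rerooting or redrawing the tree, a dedicated phylogenetics library or tree viewer is the safer choice, because this sketch does not handle quoted labels or comments.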
If you included 2 or 3 data sets in the cross-assembly, a cladogram is not meaningful, so the output image will be a 2D or 3D graph instead.
Each dot represents a contig, and the X/Y/Z coordinates show how many reads from each data set were included in that contig.
If you use crAss in your research, please cite:
Bas E. Dutilh, Robert Schmieder, Jim Nulton, Ben Felts, Peter Salamon, Robert A. Edwards and John L. Mokili,
"Reference-independent comparative metagenomics using cross-assembly: crAss", Bioinformatics 2012.
Download: http://sf.net/p/crass | PubMed: 23074261 | Contact: http://www.cmbi.ru.nl/~dutilh/