Our paper on the global phylogeography of crAssphage is published in Nature Microbiology. You can read the paper at the Nature Microbiology website or on ReadCube. The paper garnered international press attention, and here we have summarized the press coverage.
Please let Rob know if you are aware of any other reports that are not included here.
Which phage gene callers are people using?
We are curious how people annotate their phage genomes, and so we looked at both the GenBank records and the literature.
I was recently looking at the relative orientation of phage terminase genes along the genome. Here is a little summary.
There is lots of crAssphage in the world, and there are lots of metagenomes in the sequence read archive. Can we find those metagenomes in the SRA that do, or do not, have crAssphage in them? Let's try…
We recently released a new version of our qudaich software, designed to compare short read sequence data sets to each other. Qudaich is built around a suffix trie and provides a rapid way to compare short read data sets at the DNA or protein level. Here is how to use qudaich to compare a set of metagenomes to find out how similar they are.
Yet again, analysis of a metagenomic sample shows that crAssphage is the most abundant phage anywhere. It also shows what a disservice NCBI did to science by deleting the crAssphage record. We used metaSPAdes to reconstruct the entire crAssphage genome from someone else’s data set, but in their paper, the largest contig was ~3 kb. This analysis suggests that crAssphage is present in ulcerative colitis samples but the abundance goes down after treatment!
Getting data from the NCBI Sequence Read Archive is not easy. Here we combine a few of our posts to go step by step through getting the data.
It has been a while since the original phage proteomic tree paper came out (twelve years!), and we still don’t have a web-based method for doing it.
However, here are the steps and code that we use to make the phage proteomic tree.
We are interested in phages — viruses that infect bacteria. For years the Edwards’ lab has been looking at new, undiscovered phages.
Recently, we identified the crAssphage, a new type of virus that has never been seen before. By looking at the sequences in metagenomes we were able to identify a set of contigs that were common among many different metagenomes. When we assembled them, they looked like a phage. We could compare them to other known phages in our database of sequences.
Working with folks in the biology department we proved that this is a circular virus by using PCR. However, we have so far been unable to culture the virus in vivo. We’re working on it, and hopefully others are too, but until that point we don’t have an image of the virus or an idea of what it does.
Following up from the crAssphage press and comments, Dan asked me the following question:
It was interesting to hear that there are 10 times as many viruses as bacteria in the body. If you have time to answer a question, I’ve always wondered about the relative biomass of bacteria compared to human cells, and now the relative biomass of viruses compared to human cells.
Inspired by XKCD’s what-if, we can use some Fermi estimation to answer this. A typical virus is about 10⁻¹⁹ kg (e.g. Adenovirus, which is about 50 kb, is 2.5 x 10⁻¹⁹ kg). A typical bacterium, like E. coli, is about 10⁻¹⁵ kg, and a typical human cell is about 10⁻¹² kg.
Scientists like to say that we have ~10x more bacteria than human cells and ~10x more viruses than bacteria. In the human body there are about 37 trillion cells (37 x 10¹², but since we are estimating we’ll round that to 10¹⁴). Based on these estimates, the average human carries about 100 kg of human cells (10¹⁴ cells x 10⁻¹² kg), 1 kg of bacteria (10¹⁵ cells x 10⁻¹⁵ kg), and 0.001 kg of viruses (10¹⁶ viruses x 10⁻¹⁹ kg).
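For anyone who wants to play with the numbers, here is a minimal Python sketch of the same Fermi estimate. It works in powers of ten (adding exponents instead of multiplying), which is how the estimate above is done by hand. The dictionary and function names are just illustrative, and the inputs are the rounded order-of-magnitude values from the text, not measurements.

```python
# Fermi estimate of biomass in the human body, working in powers of ten.
# All inputs are the rounded order-of-magnitude values quoted in the text.

# log10 of the mass of a single particle, in kg
MASS_EXPONENT = {"human cells": -12, "bacteria": -15, "viruses": -19}

# log10 of how many of each we carry (each step is ~10x the previous one)
COUNT_EXPONENT = {"human cells": 14, "bacteria": 15, "viruses": 16}


def biomass_exponent(kind):
    """Return log10 of the total mass in kg: count x mass means adding exponents."""
    return COUNT_EXPONENT[kind] + MASS_EXPONENT[kind]


for kind in COUNT_EXPONENT:
    print(f"{kind}: 10^{biomass_exponent(kind)} kg")
```

Changing any exponent by one shifts the answer by a factor of ten, which gives a feel for how sensitive the estimate is to the rounding of 37 trillion cells up to 10¹⁴.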