Making maps is hard. Even though we’ve been making maps for hundreds of years, it is still hard. Making good looking maps is really hard. We published a map that is both beautiful and tells a story, and this is the story of how we made that map.
But a figure like this does not appear immediately, it takes work to get something to look this good, and needless to say it wasn’t me that made it look so great!
Our paper on the global phylogeography of crAssphage is published in Nature Microbiology. You can read the paper at the Nature Microbiology website or on ReadCube. The paper garnered international press attention, and here we have summarized the press coverage.
Please let Rob know if you are aware of any other reports that are not included here.
There is lots of crAssphage in the world, and there are lots of metagenomes in the sequence read archive. Can we find those metagenomes that do, or do not, have crAssphage in them in the SRA? Lets try…
We recently released a new version of our qudaich software, designed to compare short read sequence data sets to each other. Qudaich is built around a suffix trie and provides a rapid way to compare short read data sets at the DNA or protein level. Here is how to use qudaich to compare a set of metagenomes to find out how similar they are.
Yet again, analysis of a metagenomic sample shows that crAssphage is the most abundant phage anywhere. It also shows what a dis-service NCBI did to science by deleting the crAssphage record. We used meta-spades to reconstruct the entire crAssphage genome from someone else’s data set, but in their paper, the largest contig was ~3 kb. This analysis suggests that crAssphage is present in ulcerative colitis samples but the abundance goes down after treatment!
Getting data from the NCBI Sequence Read Archive is not easy. Here we combine a few of our posts to go step by step through getting the data.
Following up from the crAssphage press and comments Dan asked me the following question:
It was interesting to hear that there are 10 times as many viruses as bacteria in the body. If you have time to answer a question, I’ve always wondered about the relative biomass of bacteria compared to human cells, and now the relative biomass of viruses compared to human cells.
Inspired by XKCD’s what-if we can use some Fermi estimation to answer this. A typical virus is about 10-19 kg. (e.g. Adenovirus which is about 50kb is 2.5 x 10-19 kg ). A typical bacterium, like E. coli is about 10-15 kg, and a typical human cell is about 10-12 kg.
Scientists like to say that we have ~10x more bacteria than human cells and ~10x more viruses than bacteria. In the human body there are about 37 trillion cells  (37 x 1012, but since we are estimating we’ll round that to 1014) . Based on these estimates we have the average human weighs about 100 kg (1014 cells x 10-12 kg) in human cells, 1 kg in bacteria (1015 cells x 10-15 kg), and 0.001 kg in viruses (1016 viruses x 10-19 kg)