Category Archives: Lab blog

Download a genome and remove the ribosomal RNA operon

For our SRA search engine, we want to remove the ribosomal RNA operon (not just the 16S gene, the whole operon) before we run the search, otherwise all our hits are to the rRNA genes!

Here’s how you can use PATRIC to download a genome and remove the rRNA operon. For the example, we’re going to use a Faecalibacterium prausnitzii genome, because, well, why not!

First, we download the genome and convert the GTO to fasta:

p3-gto 657322.3
rast-export-genome -i 657322.3.gto contig_fasta > 657322.3.fna

Next, we use a couple of helper scripts from the EdwardsLab Git Repo. We start by converting the GTO to a tab-separated file with features and their locations:

python3.7 ~/EdwardsLab/patric/parse_gto.py -f 657322.3.gto -p > 657322.3.tab

Then we can grep through that file for the ribosomal genes:

grep rna 657322.3.tab | grep Subunit

We only find two of the genes:

fig|657322.3.rna.5      Large Subunit Ribosomal RNA; lsuRNA; LSU rRNA   FP929046 586941 - 589785 (-)

fig|657322.3.rna.6      Small Subunit Ribosomal RNA; ssuRNA; SSU rRNA   FP929046 590567 - 591540 (-)

Now we can trim out the rRNA sequences and keep only the non-rRNA regions. Note that here I trim an extra 10 kb off each side of the operon (keeping everything up to 576941 and everything from 601540 onwards), but you may not wish to do that:

python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -e 576941 -c FP929046 > FP929046.fna
python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -b 601540 -c FP929046 >> FP929046.fna

We run this twice, which is suboptimal, but this is definitely not the most computationally challenging thing we will do with those sequences!
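The -e and -b values used above can be worked out from the grep output rather than by hand. Here is a minimal sketch in Python, assuming each feature line ends with a location written as "CONTIG START - STOP (STRAND)", as in the two lines shown earlier. The trim_coordinates function and its padding argument are hypothetical names for this illustration and are not part of the EdwardsLab scripts. It also assumes all the rRNA features on a contig belong to a single operon; a genome with several operons on one contig would need each region handled separately.

```python
import re

# Assumed layout of the location at the end of each line of the .tab file,
# e.g. "FP929046 586941 - 589785 (-)". Adjust the pattern if yours differs.
LOCATION = re.compile(r"(\S+)\s+(\d+)\s+-\s+(\d+)\s+\([+-]\)\s*$")


def trim_coordinates(lines, padding=10000):
    """Return {contig: (end_of_left_piece, start_of_right_piece)}.

    The region between the two values covers every matched feature on
    that contig plus `padding` bases on each side.
    """
    spans = {}
    for line in lines:
        match = LOCATION.search(line)
        if not match:
            continue  # skip lines without a recognisable location
        contig = match.group(1)
        start, stop = sorted((int(match.group(2)), int(match.group(3))))
        low, high = spans.get(contig, (start, stop))
        spans[contig] = (min(low, start), max(high, stop))
    return {
        contig: (max(low - padding, 0), high + padding)
        for contig, (low, high) in spans.items()
    }


# The two lines found by grep in the example above.
rrna_lines = [
    "fig|657322.3.rna.5\tLarge Subunit Ribosomal RNA; lsuRNA; LSU rRNA\tFP929046 586941 - 589785 (-)",
    "fig|657322.3.rna.6\tSmall Subunit Ribosomal RNA; ssuRNA; SSU rRNA\tFP929046 590567 - 591540 (-)",
]

coords = trim_coordinates(rrna_lines)
for contig, (left_end, right_start) in coords.items():
    print(f"{contig}: -e {left_end} and -b {right_start}")
```

For the two features in this example it prints the same values that are passed to trim_fasta.py above: -e 576941 and -b 601540.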

Connecting to an anvi’o server on tatabox

We use anvi’o for all sorts of ‘omics analyses, but it is a pain to run on your laptop as you can’t watch Netflix and YouTube, check Facebook, and post to Twitter at the same time (well, you can, but why would you?).

Instead, we have the latest version of anvi’o installed on tatabox, one of the machines in our HPC environment. After you have run all the anvi-commands, very often you want to launch anvi-interactive, but tatabox is safely behind a firewall. 

We can make a two-step connection to tatabox using port tunneling. Depending on how you do this, you may need up to three terminals open.

First, start anvi-interactive on tatabox, and keep that window open (or use screen or tmux which are much better alternatives).

Next, open a terminal on your computer, and use this command (changing USERNAME to your own username):

ssh -L 5555:localhost:7008 -N -p 7010 USERNAME@edwards-data.sdsu.edu

Next, open another terminal (or if you are using screen or tmux, open a new window), and log in to edwards-data.sdsu.edu using your normal account (the USERNAME from above).

On edwards-data, run this command:

ssh -L 7008:localhost:8080 -N USERNAME@tatabox

Finally, on your laptop, you should open a new browser window and paste this URL:

http://localhost:5555/

You should see the anvi-interactive interface appear, and you can get to work.
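If the page does not load, it can help to check whether anything is listening on your end of the tunnel before you start debugging anvi’o itself. This is a minimal sketch using only the Python standard library; port_is_open is a hypothetical helper name, and 5555 is the local port from the first ssh command above.

```python
import socket


def port_is_open(host="localhost", port=5555, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    state = "listening" if port_is_open() else "not listening"
    print(f"localhost:5555 is {state}")
```

A True result only tells you that the first tunnel (laptop to edwards-data) is up. If the browser still shows nothing, check that the second ssh command is still running on edwards-data and that anvi-interactive is still running on tatabox.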

Press about Global Phylogeography of crAssphage

Our paper on the global phylogeography of crAssphage is published in Nature Microbiology. You can read the paper at the Nature Microbiology website or on ReadCube. The paper garnered international press attention, and here we have summarized the press coverage.

Please let Rob know if you are aware of any other reports that are not included here.


NSF Collaborators and Other Affiliations Information

For several years NSF ran a trial where they would ask for a conflicts form in an Excel-type format. Recently, that has been codified into the Collaborators and Other Affiliations Information form. You can find more information about that form at the NSF Website and the NSF GPG.

We developed a simple script to help complete this form for you. It does not do all the work, but it gets you a long way there, and you can do the rest a lot more easily. After all, you have a lot of other things to worry about when you are writing that grant. Read more to see how to use it.


Edwards Lab @ SRS

We had a great turnout for the 2019 Student Research Symposium as usual, with everyone in the lab either presenting their work or judging the work of others. Congratulations to Dean’s award for Science winners Holly Norman and Ashelyn Lutrick for their presentation on “Analyzing the Presence, in Humans, of crAssphage: A Highly Abundant Bacteriophage Found Around the Globe”.

Here are Jillian, Melissa, Shane, and Rob in front of Jillian and Shane’s posters.

command line deconseq

AKA: how to remove contamination from your metagenome! We use shark genomes, but it works with humans, corals, and other things too!

A while ago we wrote deconseq to allow you to remove contamination from your sequence libraries. We used an HTS-mapper to map the reads in your sequences to your reference genome, and then filtered the sequences after mapping.

This is trivial to do with modern sequence analysis tools, and so we provide recipes here for filtering your reads based on matches to a reference genome. Read more to find out how!
