Author Archives: Rob Edwards

snakemake tutorial

There are a lot of snakemake tutorials out there to get you started:

This tutorial is not one of those! It is quick, hands-on, and we’ll jump right in with little explanation (OK, there will be some). But I encourage you to read those tutorials, especially Titus’s tutorials as you will learn a lot from them.


New features in CentOS8

We have just made the transition of most of the servers from CentOS6 or CentOS7 to CentOS8. Most everything should be unified on CentOS8 (unless you know what you are doing). 

This brings several new changes (as always) and some added benefits. This is a summary and does not reflect all the changes.

To check your server’s operating system version, use this command:

cat /etc/redhat-release

Software Installs

The biggest change is that you can now install software yourself! There are two easy ways to do this, provided that whatever you are trying to install supports at least one of them.

Please note that if you do not want to do either of these, that is fine. Just let me know and I am happy to install software for you (and everyone else) to use.

Conda

A lot of bioinformatics software is now available via conda. Conda itself is installed globally, but you cannot install packages globally. Instead, create your own environment and install packages there.

The first time you use conda, you will need to create a local environment. Start with:

source /usr/local/anaconda3/bin/activate
conda create --name <username>

But use your username instead of <username>!

After this has run, any time you need to use conda, you can use the command

conda activate <username>

And you will get into your environment. 
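If you are not sure whether you have already created your environment, the steps above can be wrapped in a small check. This is only a sketch: ensure_conda_env is a hypothetical helper name, not something installed on the servers, and it assumes conda is already on your PATH (for example after the source command above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a personal conda environment only if it is missing.
# Assumes conda is already on the PATH, e.g. after
#   source /usr/local/anaconda3/bin/activate
ensure_conda_env() {
    local name="${1:-$USER}"   # default to your login name
    # "conda env list" prints one environment per line with the name in the
    # first column; lines starting with "#" are headers and are skipped.
    if conda env list | awk '!/^#/ && NF { print $1 }' | grep -Fqx -- "$name"; then
        echo "conda environment '$name' already exists"
    else
        conda create --yes --name "$name"
    fi
}

# Usage (uncomment to run):
#   ensure_conda_env "$USER" && conda activate "$USER"
```

Running it a second time should just report that the environment exists rather than trying to create it again.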

A simple test is to install my fastq-pair package and see if it works:

conda install -c bioconda fastq-pair

Once it has installed, this command should give some output:

fastq-pair

Docker

Another popular way of sharing software is by using docker. We don’t support docker, but we support a drop-in replacement called podman.

Anywhere you see docker, you can use podman instead. For example, we created a focus docker image for the cami challenge described here: https://hub.docker.com/r/linsalrob/cami-focus and you can install that with

podman pull linsalrob/cami-focus

pip

If you are trying to run some python code and don’t have the appropriate library, you should be able to use pip install as a user to add it. For example:

pip3 install --user xmlschema

This will install the appropriate libraries into your account. Of course, if you want them globally installed, just let me know.
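A quick way to avoid reinstalling something that is already available is to test the import first. The sketch below assumes python3 and pip3 are on your PATH; ensure_py_module is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a Python package into your own account
# only if the module cannot already be imported.
#   $1 = module name used in "import", $2 = package name for pip (defaults to $1)
ensure_py_module() {
    local module="$1"
    local package="${2:-$1}"
    if python3 -c "import ${module}" 2>/dev/null; then
        echo "${module} is already importable"
    else
        pip3 install --user "${package}"
    fi
}

# Usage (uncomment to run):
#   ensure_py_module xmlschema
```

The second argument is there because the name you import and the name you install are not always the same.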

Deprecated software and alternatives

Deprecated | Alternate | Used For | Notes
screen | tmux | Virtual terminals. You should use this! | tmux has similar keys to screen but uses ctrl-b instead of ctrl-a to access them, e.g. create a new window: “ctrl-b c”
cd-hit | mmseqs | Clustering sequences | cd-hit is still an option if you want, but mmseqs2 appears to be much better

Download a genome and remove the ribosomal RNA operon

For our SRA search engine, we want to remove the ribosomal RNA operon (not just the 16S gene, the whole operon) before we run the search, otherwise all our hits are to the rRNA genes!

Here’s how you can use PATRIC to download a genome and remove the rRNA operon. For the example, we’re going to use a Faecalibacterium prausnitzii genome, because, well, why not!

First, we download the genome and convert the GTO to fasta

p3-gto 657322.3
rast-export-genome -i 657322.3.gto contig_fasta > 657322.3.fna

Next, we use a couple of helper scripts from the EdwardsLab Git Repo. We start by converting the GTO to a tab-separated file with features and their locations:

python3.7 ~/EdwardsLab/patric/parse_gto.py -f 657322.3.gto -p > 657322.3.tab

Then we can grep through that file for the ribosomal genes:

grep rna 657322.3.tab | grep Subunit

We only find two of the genes:

fig|657322.3.rna.5      Large Subunit Ribosomal RNA; lsuRNA; LSU rRNA   FP929046 586941 - 589785 (-)

fig|657322.3.rna.6      Small Subunit Ribosomal RNA; ssuRNA; SSU rRNA   FP929046 590567 - 591540 (-)

Now we can trim out the sequences and keep only the non-rRNA regions. Note that here I trim an extra 10,000 bp off each side of the rRNA genes (everything from 576941 to 601540 is dropped), but you may not wish to do that:

python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -e 576941 -c FP929046 > FP929046.fna
python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -b 601540 -c FP929046 >> FP929046.fna

We run this twice, which is suboptimal, but this is definitely not the most computationally challenging thing we will do with those sequences!
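The two cut points used above can be worked out from the grep output rather than by hand. The following is a minimal sketch, not part of the EdwardsLab repo: rrna_trim_points is a hypothetical helper, it assumes all the rRNA features are on one contig, and it assumes each line ends with the location in the form "contig start - end (strand)" as in the output shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rRNA feature lines on stdin and print
#   <contig> <left cut point> <right cut point>   (tab separated)
# The left value is what you would pass to -e, the right value to -b.
# Assumes the last five whitespace-separated fields of each line are:
#   <contig> <start> - <end> (<strand>)
rrna_trim_points() {
    local pad="${1:-10000}"   # extra bases to trim on each side (assumed default)
    awk -v pad="$pad" '
        {
            contig = $(NF-4); start = $(NF-3) + 0; stop = $(NF-1) + 0
            # make sure start <= stop, whichever way the feature is written
            if (start > stop) { tmp = start; start = stop; stop = tmp }
            if (NR == 1 || start < min) min = start
            if (NR == 1 || stop > max) max = stop
        }
        END {
            if (NR == 0) exit 1          # no rRNA lines were given
            left = min - pad
            if (left < 1) left = 1       # do not run off the start of the contig
            printf "%s\t%d\t%d\n", contig, left, max + pad
        }'
}

# Example with the two lines found above
printf '%s\n' \
  'fig|657322.3.rna.5      Large Subunit Ribosomal RNA; lsuRNA; LSU rRNA   FP929046 586941 - 589785 (-)' \
  'fig|657322.3.rna.6      Small Subunit Ribosomal RNA; ssuRNA; SSU rRNA   FP929046 590567 - 591540 (-)' \
  | rrna_trim_points 10000
# prints: FP929046	576941	601540
```

With the real file you could pipe in the result of the grep command instead of the two printf lines. Pass 0 as the argument if you do not want any extra trimming.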