Author Archives: Geni

FOCUS: analyzing metagenomic (big) data in seconds using k-mers and an optimization method

Hi, I’m Genivaldo, but you can call me Geni. I’m a PhD student in the Computational Science Research Center (CSRC) in the joint program between San Diego State University (SDSU) and Claremont Graduate University (CGU), working with Dr. Rob Edwards. My research focus is developing agile methods to analyze metagenomics data. This blog post describes my experience creating models/tools to profile metagenomes, and in particular, how and why I created FOCUS (Silva et al., 2014).
Ever since I started college I have not liked taking classes, mainly because they never applied to my research or because my professors would rather give the students a test than a project. As part of the CSRC program, I have to take an extensive list of math classes, which were tough for me because I come from a computer science undergraduate background. Lately I have tried to apply what I have learned in these math classes to computational biology to speed up the analysis of metagenomics data. It makes the classes more fun and lets me apply my new ideas and skills.
In fall 2013, I was taking a mathematical modeling class where I learned a modeling method called Monte Carlo simulation. That led me to spend part of my research time in 2013 trying to develop a model, based on what I learned in the class, to work out which organisms were present in a given metagenomic sample. PHACCS (Angly et al., 2005) showed that metagenomes normally follow a power law distribution, so basically I was trying to re-sample the k-mer compositions of a random number of organisms, apply the power law distribution to them, and compute the distance between the random k-mer frequencies and those of the sample. Rob had talked a lot about this with Chris Quince, and they thought that it would be a good way to analyze metagenomes. If we repeat this thousands of times, the program predicts the organisms that are most probably in the metagenomic sample, and their abundances. The program was tested with simulated data, and it worked. However, it was slow and did not really work for real data; it would only predict the organisms that were most abundant (which is nice for some applications, but we knew we would be criticized by the reviewers), and it ended up as a pre-print (Silva, Dutilh & Edwards, 2014). It is always interesting to create a new method, but it is better to develop a method that is fast; you know, we already have enough slow tools to analyze metagenomes. Rob and Chris’ idea wasn’t so great.
OK! I got an A in the math modeling class and I applied it to my research, but I felt like I had developed something which worked but was not really useful in the real world. In the same semester, I was taking a class which focused on computational optimization methods, and I became interested in the topic, which led me to learn about non-negative least squares (NNLS). Everybody knows about least squares (LS), right? It is a method which minimizes the function f(x) = ||Ax - b||; however, LS is an unconstrained method, so the vector x may have negative values. The NNLS approach fits metagenomic analysis because organisms can only have non-negative abundances; I haven’t heard anyone say “My sample is represented by -3% of Salmonella”. The idea is simple: in the function f(x) = ||Ax - b||, A holds the k-mer frequencies from all known complete genomes, b holds the k-mer frequencies from the user input data (their metagenome), and x is the optimal set of abundances for each organism in the database present in the sample, constrained only to values >= 0.
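To make the NNLS idea concrete, here is a minimal sketch in Python using scipy.optimize.nnls. The tiny matrix A (three k-mers, two reference genomes) and the 75%/25% mixture are made-up values for illustration only; this is not the FOCUS code, just the same kind of constrained fit on toy numbers.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy database: each column of A is the k-mer frequency
# profile of one reference genome (3 k-mers x 2 genomes, invented values).
A = np.array([
    [0.5, 0.1],
    [0.3, 0.2],
    [0.2, 0.7],
])

# Build a synthetic "metagenome" profile b as a 75% / 25% mixture.
true_abundance = np.array([0.75, 0.25])
b = A @ true_abundance

# Solve min ||Ax - b|| subject to x >= 0.
x, residual = nnls(A, b)

# Rescale so that the estimated abundances sum to 1.
abundance = x / x.sum()
print(abundance.round(3), residual)
```

Because every entry of x is forced to be >= 0, a result like "-3% of Salmonella" cannot come out of the fit.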
We named the tool FOCUS, which stands for Find Organisms by Composition USage, and it was my project for the optimization class, which guaranteed me an A again. FOCUS is fast! It takes about 40 seconds to analyze a huge metagenomics dataset, and the good news is that the results are similar to those of great tools such as MetaPhlAn (Segata et al., 2012). In the discussion part of the paper, we compared almost half (256 GB of data) of the Human Microbiome Project (HMP) (Consortium, 2012), and the tool took ~2 hours to profile all the data. FOCUS was submitted to PeerJ, and after a few rounds of revisions, it was accepted and published. You might be asking yourself “why is FOCUS so fast?” Well… counting 7-mers (k-mers of length 7) can be done super fast by Jellyfish (Marçais & Kingsford, 2011), Turtle (Roy, Bhattacharya & Schliep, 2014) or Khmer (Zhang et al., 2014). Moreover, SciPy has a Fortran wrapper for the NNLS algorithm.
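For readers who have not met k-mers before, the sketch below shows what a k-mer frequency profile is, using only the Python standard library. It is a plain illustration and makes some simplifying assumptions: the function name kmer_frequencies is hypothetical, reverse complements are ignored, and dedicated counters such as Jellyfish are far faster than this.

```python
from collections import Counter

def kmer_frequencies(sequence, k=7):
    """Return the relative frequency of every k-mer in a DNA sequence."""
    sequence = sequence.upper()
    # Slide a window of width k along the sequence and count each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

# Toy example with k=4 so the output stays readable.
profile = kmer_frequencies("ACGTACGT", k=4)
print(profile)
```

A vector like this, computed once per reference genome and once for the metagenome, is the kind of input that a least-squares fit works on.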
Now I am taking two other math classes at CGU (Advanced Numerical Analysis and Statistical Linear Models), and I am definitely gaining some new ideas that I will take from these classes and apply in my research. Furthermore, I am working on a novel method to profile “what are they doing?”, where I am using the SEED database to show the subsystems present in a given input. The new tool is named SUPER-FOCUS, and that is all I can tell you for now, but I promise that it is going to be way faster than MG-RAST (Meyer et al., 2008) and MEGAN (Huson et al., 2007) (even when MEGAN uses PAUDA (Huson & Xie, 2013)).

For more information about FOCUS read the paper.

References

Angly F, Rodriguez-Brito B, Bangor D, McNairnie P, Breitbart M, Salamon P, Felts B, Nulton J, Mahaffy J, Rohwer F. 2005. PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinformatics 6:41.
Consortium THMP. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214.
Huson DH, Auch AF, Qi J, Schuster SC. 2007. MEGAN analysis of metagenomic data. Genome Research 17:377–386.
Huson DH, Xie C. 2013. A poor man’s BLASTX – high-throughput metagenomic protein database search using PAUDA. Bioinformatics:btt254.
Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764–770.
Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A et al. 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386.
Roy RS, Bhattacharya D, Schliep A. 2014. Turtle: Identifying frequent k-mers with cache-efficient algorithms. Bioinformatics:btu132.
Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. 2012. Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods 9:811–814.
Silva GGZ, Cuevas DA, Dutilh BE, Edwards RA. 2014. FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares. PeerJ 2:e425.
Silva GGZ, Dutilh BE, Edwards RA. 2014. FORMAL: A model to identify organisms present in metagenomes using Monte Carlo Simulation. bioRxiv:010801.
Zhang Q, Pell J, Canino-Koning R, Howe AC, Brown CT. 2014. These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure. PLoS ONE 9:e101271.

 

SGE Array Jobs

How to create an array job for the cluster using SGE

 

Often you have a lot of jobs that are all the same, for example if you want to blast a series of files against the same database. Here is how to make an array job.

First, you need to know about environment variables. In an array job, the environment variable $SGE_TASK_ID is set to a unique number in a range that you define, and is incremented as you define it.

To submit an array job, we use the -t flag in our qsub command:

This will submit an array job where $SGE_TASK_ID is set to every number from one to one hundred, incremented by one: qsub -t 1-100:1

This will submit an array job where $SGE_TASK_ID starts at one and is incremented by ten up to one thousand (1, 11, 21, and so on): qsub -t 1-1000:10

The range can be any set of numbers you define. There is an upper limit of 75000 jobs in a single array job, but you can submit a second array job with numbers 75001 onwards.
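If you are unsure which task IDs a given -t range produces, the following small Python sketch enumerates them. The helper name sge_task_ids is hypothetical; it simply mirrors the first-last:step notation described above, with the last value treated as inclusive.

```python
def sge_task_ids(first, last, step=1):
    """Task IDs produced by `qsub -t first-last:step` (last is inclusive)."""
    return list(range(first, last + 1, step))

# The second example above: qsub -t 1-1000:10
ids = sge_task_ids(1, 1000, 10)
print(len(ids), ids[:3], ids[-1])
```

Note that with a step of ten the last task ID is 991, not 1000, so number your input files accordingly.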

Now all you need is a script that processes your files and runs them. There are several ways to do this. One approach is to number all of your input files, then in your script you can replace the number with $SGE_TASK_ID:

#!/bin/bash
blastn -query "$SGE_TASK_ID.fasta" -db nr -out "$SGE_TASK_ID.blast"

You can also list all the files that you want to process and use head and tail to pick out the one for the current task:

#!/bin/bash
input=$(head -n "$SGE_TASK_ID" file_of_files | tail -n 1)
blastn -query "$input" -db nr -out "$input.blast"
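The head and tail trick simply selects line number $SGE_TASK_ID from the list. If your job script is written in Python instead of bash, the same selection could be sketched as below. The function names and the sample file names are hypothetical, and the list stands in for the contents of file_of_files.

```python
import os

def task_id_from_env(env=os.environ):
    """Read the array task number that SGE exports for each task."""
    return int(env["SGE_TASK_ID"])

def nth_line(lines, task_id):
    """Return line number task_id (1-based), like `head -n N | tail -n 1`."""
    return lines[task_id - 1].rstrip("\n")

# Stand-in for the lines of file_of_files (invented file names).
file_of_files = ["sample_a.fasta\n", "sample_b.fasta\n", "sample_c.fasta\n"]

# Simulate the environment of task 2 rather than relying on a real SGE job.
task_id = task_id_from_env({"SGE_TASK_ID": "2"})
print(nth_line(file_of_files, task_id))
```

In a real job you would call task_id_from_env() with no argument and read the lines from the file itself.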

Another way to do it is to have a file with all the commands and use head and tail to get a specific command:

#!/bin/bash
cmd=$(head -n $SGE_TASK_ID file_of_commands | tail -n 1)
./$cmd

NOTE: All of these examples use bash. You should be sure to include -S /bin/bash in your qsub command to make sure that they run with the bash shell.

Atlas Scientific Raspberry PIs

We use the Atlas Scientific probes to measure all kinds of things. One of the setups uses a Raspberry Pi and Plotly and is based on this Instructables post. Below I have distilled the essential steps!

 

Boot up and log into your Pi.

Start by disabling the getty on the serial line, by commenting out the T0:23 line at the end of inittab:

vi /etc/inittab
#Spawn a getty on Raspberry Pi serial line
#T0:23:respawn:/sbin/getty -L ttyAMA0 115200 vt100

Install some core modules:

sudo apt-get install python-serial git-core python-pip
sudo pip install rpi.gpio plotly

Then reboot the machine.

Grab the Atlas Scientific Python library (git clone https://github.com/plotly/atlas-scientific.git) and edit atlas-pi.py to include your API key, streaming token, and username from Plotly.

Wire up the Pi to the BNC connector:

  1. Connect Ground on the Atlas stamp to Ground on the Pi Cobbler
  2. Connect VCC on the Atlas stamp to 5V on the Pi Cobbler
  3. Connect RX on the Atlas circuit to TX on the Pi Cobbler
  4. Connect TX on the Atlas circuit to RX on the Pi Cobbler

You should be good to go now.

Note that Rob’s image: RaspberryPiAtlasScientific20140531.img has all of this already done!

argparse: Python’s command line parsing module

Although GUIs and web pages are great ways for users to interact with our tools and software, the command line interface is still a prevalent medium for executing scripts in the bioinformatics field. One of the ways that we can make command line scripts more interactive is to support options, flags, and arguments in our code. These allow users to change the behavior of the script, e.g., input values and input format, output file format and naming, algorithm values and thresholds, status updates, and more. Before really diving into Python, C-style argument parsing was the implementation I was most familiar with, such as the getopt Python module or Getopt Perl module, but it does not follow the object-oriented style that languages like Python are known for. I usually spent two or three dozen lines of code implementing the parsing function and writing out a usage help message. I recently came across the argparse module and felt that it was exactly what I was looking for. It takes away much of the manual programming and simplifies the process. Here I’ll present a short tutorial with a few simple cases showing how to use argparse and the benefits I found from using it.

USING REQUIRED ARGUMENTS

A simple example to show is a program that takes in a file and prints out its name.

import argparse

# Initiate argument parser object
parser = argparse.ArgumentParser()
# Add input file argument with help message
parser.add_argument('infile', help='Input file to print out')
# Call command line parser method
args = parser.parse_args()
print('The filename is {}'.format(args.infile))

When we run the command without giving a filename, the following usage and error message appears:

$ python sample.py
usage: sample.py [-h] infile
sample.py: error: too few arguments

We can then run the script with the -h flag to get a full help message:

$ python sample.py -h
usage: sample.py [-h] infile

positional arguments:
  infile      Input file to print out

optional arguments:
  -h, --help  show this help message and exit

Here, we can see the positional (required) arguments listed along with the help message that we wrote. What is great is that we did not need to manually code the help message ourselves. The argparse object contains methods to format and print out the help message whenever there is a problem with the script during the argument parsing.

We can successfully run the code like so:

$ python sample.py test_file.fasta
The filename is test_file.fasta

 

USING OPTIONAL ARGUMENTS

Optional arguments, like flags, are also essential in many programs and the argparse module supports these.

Here, we’ll add the option to print out the number of lines in the file:

import argparse
parser = argparse.ArgumentParser()  # Initiate argument parser object

# Add input file argument with help message
parser.add_argument('infile', help='Input file to print out')

# Add line count optional argument
parser.add_argument('--linecount', help='Printout number of lines in file',
                    action='store_true')

args = parser.parse_args()  # Call command line parser method
print('The filename is {}'.format(args.infile))

# Check if the linecount flag was raised
if args.linecount:
    with open(args.infile) as f:
        numLines = len(f.readlines())
    print('Number of lines: {}'.format(numLines))

To explain what I’ve added: the new argument includes a double hyphen '--' before the name. This lets the parser know that this is an optional argument rather than a required, positional one. I also added the action='store_true' option to this line. This tells the parser to store True in args.linecount when the user includes the flag and False when they do not. The default behavior for action is to accept an argument value after the flag.
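The difference between action='store_true' and the default action can be seen in a short sketch. The --label option is a hypothetical addition for illustration only, and the argument lists are passed to parse_args() directly so the snippet runs without a real command line.

```python
import argparse

parser = argparse.ArgumentParser()
# Flag: present means True, absent means False.
parser.add_argument('--linecount', action='store_true')
# Default action ('store'): expects a value after the option.
parser.add_argument('--label')  # hypothetical option for illustration

with_flags = parser.parse_args(['--linecount', '--label', 'run1'])
without_flags = parser.parse_args([])

print(with_flags.linecount, with_flags.label)
print(without_flags.linecount, without_flags.label)
```

When an option with the default action is left out, its value is None unless a default is given.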

We can run the script with the help flag to get new information:

$ python sample.py -h
usage: sample.py [-h] [--linecount] infile

positional arguments:
  infile       Input file to print out

optional arguments:
  -h, --help   show this help message and exit
  --linecount  Printout number of lines in file
$ python sample.py test_file.fasta
The filename is test_file.fasta
$
$ python sample.py test_file.fasta --linecount
The filename is test_file.fasta
Number of lines: 4
$
$ python sample.py --linecount test_file.fasta
The filename is test_file.fasta
Number of lines: 4

We can see here that the new help message includes the --linecount flag and its help message. I then run the script without the flag and it completes successfully. Finally, I include the flag in the command, one case where I include it after the filename and one case before the filename. I did this to show that the order of the optional arguments does not matter.

We can add short arguments to the code because some users prefer them over long arguments. Changing that one line of code will give us:

import argparse
parser = argparse.ArgumentParser()  # Initiate argument parser object

# Add input file argument with help message
parser.add_argument('infile', help='Input file to print out')

# Add line count optional argument
parser.add_argument('-c', '--linecount',
                    help='Printout number of lines in file',
                    action='store_true')

args = parser.parse_args()  # Call command line parser method
print('The filename is {}'.format(args.infile))

# Check if the linecount flag was raised
if args.linecount:
    with open(args.infile) as f:
        numLines = len(f.readlines())
    print('Number of lines: {}'.format(numLines))

The new short argument is prepended with a single hyphen. Running the script gives us the output:

$ python sample.py -h
usage: sample.py [-h] [-c] infile

positional arguments:
  infile           Input file to print out

optional arguments:
  -h, --help       show this help message and exit
  -c, --linecount  Printout number of lines in file
$ python sample.py test_file.fasta -c
The filename is test_file.fasta
Number of lines: 4

One thing to notice: -c is shown in the usage line at the top of the help message because we put it as the first argument in the parser.add_argument() call. If we had put the -c option after --linecount, the long argument would have shown up in the usage line. The order would also have been flipped under the optional arguments section.
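You can check the effect of the option order without running the script from a shell, because the parser can return its usage line as a string. In this sketch the helper usage_for is a hypothetical name, and prog is set by hand so the output matches the sample.py examples.

```python
import argparse

def usage_for(*flags):
    """Build the sample parser with the given flag order and return its usage line."""
    parser = argparse.ArgumentParser(prog='sample.py')
    parser.add_argument('infile')
    parser.add_argument(*flags, action='store_true')
    return parser.format_usage()

short_first = usage_for('-c', '--linecount')
long_first = usage_for('--linecount', '-c')
print(short_first)
print(long_first)
```

In both cases the value is still stored in args.linecount, since argparse derives the attribute name from the long option.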

 

To conclude, the argparse module handles much of the work of parsing command line arguments and formatting help and usage messages. argparse offers other features that I did not go over here, such as type checking, limited choices for arguments, and argument counting. These are explained in the tutorial linked below, which covers everything I presented here and more.
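As a brief taste of those extra features, the sketch below combines type checking (type=int), limited choices (choices=...), and argument counting (nargs='+'). The option names --kmer and --format are hypothetical and chosen only for illustration.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog='sample.py')
    # nargs='+' collects one or more input files into a list.
    parser.add_argument('infiles', nargs='+', help='One or more input files')
    # type=int converts the value and rejects non-integers with an error.
    parser.add_argument('-k', '--kmer', type=int, default=7,
                        help='k-mer length (hypothetical option)')
    # choices restricts the accepted values.
    parser.add_argument('-f', '--format', choices=['fasta', 'fastq'],
                        default='fasta', help='Input format (hypothetical option)')
    return parser

args = build_parser().parse_args(
    ['a.fasta', 'b.fasta', '-k', '5', '--format', 'fastq'])
print(args.infiles, args.kmer, args.format)
```

Passing something like -k five or --format bam makes argparse print the usage line and an error message and then exit, with no extra code on our side.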

More in depth tutorial @ http://goo.gl/Y4CsIH

PeerJ CSL for Zotero

I have begun to use Zotero as my publication reference manager and I’ve created a new style sheet that matches what PeerJ requires for citations and references in their journal. I took the APA citation style file and modified it to fit the PeerJ citation format. Please note that the style sheet only accounts for publication references and has not been modified for other types of references (webpages, books, book excerpts, etc.). The XML code to put into your PeerJ CSL file is given below.


Copy the text below into your favorite text editor and save it as peerj.csl. You can then open Zotero, go to Options and view the Citation tab. There you will see a Styles tab where you can add a new style. Here you can upload your peerj.csl file.

If you want to download the CSL file, you can find it here: http://edwards.sdsu.edu/~dcuevas/peerj.csl.txt
Other styles can be found on Zotero’s website: http://www.zotero.org/styles/

As I use the PeerJ style more often and find bugs in my modified citation format, I’ll continue to update it here. Enjoy!

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0" demote-non-dropping-particle="never">
<info>
<title>PeerJ</title>
<id>http://www.zotero.org/styles/apa-dc_modified</id>
<author>
<name>Daniel Cuevas</name>
</author>
<category citation-format="author-date"/>
<category field="psychology"/>
<category field="generic-base"/>
<updated>2013-06-11T16:45:00+00:00</updated>
<rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
</info>
<locale xml:lang="en">
<terms>
<term name="editortranslator" form="short">
<single>ed. &amp; trans.</single>
<multiple>eds. &amp; trans.</multiple>
</term>
<term name="translator" form="short">
<single>trans.</single>
<multiple>trans.</multiple>
</term>
</terms>
</locale>
<macro name="container-contributors">
<choose>
<if type="chapter paper-conference" match="any">
<names variable="editor translator" delimiter=", " suffix=", ">
<name and="symbol" initialize-with=". " delimiter=", "/>
<label form="short" prefix=" (" text-case="title" suffix=")"/>
</names>
</if>
</choose>
</macro>
<macro name="secondary-contributors">
<choose>
<if type="chapter paper-conference" match="none">
<names variable="translator editor" delimiter=", " prefix=" (" suffix=")">
<name and="symbol" initialize-with=". " delimiter=", "/>
<label form="short" prefix=", " text-case="title" suffix=""/>
</names>
</if>
</choose>
</macro>
<macro name="author">
<names variable="author">
<name name-as-sort-order="all" and="symbol" sort-separator=", " initialize-with=". " delimiter=", " delimiter-precedes-last="always"/>
<label form="short" prefix=" (" suffix=")" text-case="capitalize-first"/>
<substitute>
<names variable="editor"/>
<names variable="translator"/>
<choose>
<if type="report">
<text variable="publisher"/>
<text macro="title"/>
</if>
<else>
<text macro="title"/>
</else>
</choose>
</substitute>
</names>
</macro>
<macro name="author-short">
<names variable="author">
<name form="short" and="symbol" delimiter=", " initialize-with=". "/>
<substitute>
<names variable="editor"/>
<names variable="translator"/>
<choose>
<if type="report">
<text variable="publisher"/>
<text variable="title" form="short" font-style="italic"/>
</if>
<else-if type="bill book graphic legal_case legislation motion_picture song" match="any">
<text variable="title" form="short" font-style="italic"/>
</else-if>
<else>
<text variable="title" form="short" quotes="true"/>
</else>
</choose>
</substitute>
</names>
</macro>
<macro name="access">
<choose>
<if type="thesis">
<choose>
<if variable="archive" match="any">
<group>
<text term="retrieved" text-case="capitalize-first" suffix=" "/>
<text term="from" suffix=" "/>
<text variable="archive" suffix="."/>
<text variable="archive_location" prefix=" (" suffix=")"/>
</group>
</if>
<else>
<group>
<text term="retrieved" text-case="capitalize-first" suffix=" "/>
<text term="from" suffix=" "/>
<text variable="URL"/>
</group>
</else>
</choose>
</if>
<else>
<choose>
<if variable="DOI">
<text variable="DOI" prefix="doi:"/>
</if>
<else>
<choose>
<if type="webpage">
<group delimiter=" ">
<text term="retrieved" text-case="capitalize-first" suffix=" "/>
<group>
<date variable="accessed" form="text" suffix=", "/>
</group>
<text term="from"/>
<text variable="URL"/>
</group>
</if>
<else>
<group>
<text term="retrieved" text-case="capitalize-first" suffix=" "/>
<text term="from" suffix=" "/>
<text variable="URL"/>
</group>
</else>
</choose>
</else>
</choose>
</else>
</choose>
</macro>
<macro name="title">
<choose>
<if type="report thesis" match="any">
<text variable="title" font-style="italic"/>
<group prefix=" (" suffix=")" delimiter=" ">
<text variable="genre"/>
<text variable="number" prefix="No. "/>
</group>
</if>
<else>
<text variable="title" font-style="italic"/>
</else>
</choose>
</macro>
<macro name="publisher">
<choose>
<if type="report" match="any">
<group delimiter=": ">
<text variable="publisher-place"/>
<text variable="publisher"/>
</group>
</if>
<else-if type="thesis" match="any">
<group delimiter=", ">
<text variable="publisher"/>
<text variable="publisher-place"/>
</group>
</else-if>
<else>
<group delimiter=", ">
<choose>
<if variable="event" match="none">
<text variable="genre"/>
</if>
</choose>
<choose>
<if type="article-journal article-magazine" match="none">
<group delimiter=": ">
<text variable="publisher-place"/>
<text variable="publisher"/>
</group>
</if>
</choose>
</group>
</else>
</choose>
</macro>
<macro name="event">
<choose>
<if variable="event">
<choose>
<if variable="genre" match="none">
<text term="presented at" text-case="capitalize-first" suffix=" "/>
<text variable="event"/>
</if>
<else>
<group delimiter=" ">
<text variable="genre" text-case="capitalize-first"/>
<text term="presented at"/>
<text variable="event"/>
</group>
</else>
</choose>
</if>
</choose>
</macro>
<macro name="issued">
<choose>
<if type="bill legal_case legislation" match="none">
<choose>
<if variable="issued">
<group prefix=" (" suffix=")">
<date variable="issued">
<date-part name="year"/>
</date>
<text variable="year-suffix"/>
<choose>
<if type="article-journal bill book chapter graphic legal_case legislation motion_picture paper-conference report song" match="none">
<date variable="issued">
<date-part prefix=", " name="month"/>
<date-part prefix=" " name="day"/>
</date>
</if>
</choose>
</group>
</if>
<else>
<group prefix=" (" suffix=")">
<text term="no date" form="short"/>
<text variable="year-suffix" prefix="-"/>
</group>
</else>
</choose>
</if>
</choose>
</macro>
<macro name="issued-sort">
<choose>
<if type="article-journal bill book chapter graphic legal_case legislation motion_picture paper-conference report song" match="none">
<date variable="issued">
<date-part name="year"/>
<date-part name="month"/>
<date-part name="day"/>
</date>
</if>
<else>
<date variable="issued">
<date-part name="year"/>
</date>
</else>
</choose>
</macro>
<macro name="issued-year">
<choose>
<if variable="issued">
<date variable="issued">
<date-part name="year"/>
</date>
<text variable="year-suffix"/>
</if>
<else>
<text term="no date" form="short"/>
<text variable="year-suffix" prefix="-"/>
</else>
</choose>
</macro>
<macro name="edition">
<choose>
<if is-numeric="edition">
<group delimiter=" ">
<number variable="edition" form="ordinal"/>
<text term="edition" form="short"/>
</group>
</if>
<else>
<text variable="edition" suffix="."/>
</else>
</choose>
</macro>
<macro name="locators">
<choose>
<if type="article-journal article-magazine" match="any">
<group prefix=" " delimiter=":">
<group>
<text variable="volume"/>
<text variable="issue" prefix="(" suffix=")"/>
</group>
<text variable="page"/>
</group>
</if>
<else-if type="article-newspaper">
<group delimiter=" " prefix=", ">
<label variable="page" form="short"/>
<text variable="page"/>
</group>
</else-if>
<else-if type="book graphic motion_picture report song chapter paper-conference" match="any">
<group prefix=" (" suffix=")" delimiter=", ">
<text macro="edition"/>
<group>
<text term="volume" form="short" plural="true" text-case="capitalize-first" suffix=" "/>
<number variable="number-of-volumes" form="numeric" prefix="1-"/>
</group>
<group>
<text term="volume" form="short" text-case="capitalize-first" suffix=" "/>
<number variable="volume" form="numeric"/>
</group>
<group>
<label variable="page" form="short" suffix=" "/>
<text variable="page"/>
</group>
</group>
</else-if>
<else-if type="legal_case">
<group prefix=" (" suffix=")" delimiter=" ">
<text variable="authority"/>
<date variable="issued" form="text"/>
</group>
</else-if>
<else-if type="bill legislation" match="any">
<date variable="issued" prefix=" (" suffix=")">
<date-part name="year"/>
</date>
</else-if>
</choose>
</macro>
<macro name="citation-locator">
<group>
<choose>
<if locator="chapter">
<label variable="locator" form="long" text-case="capitalize-first"/>
</if>
<else>
<label variable="locator" form="short"/>
</else>
</choose>
<text variable="locator" prefix=" "/>
</group>
</macro>
<macro name="container">
<group>
<choose>
<if type="chapter paper-conference entry-encyclopedia" match="any">
<text term="in" text-case="capitalize-first" suffix=" "/>
</if>
</choose>
<text macro="container-contributors"/>
<text macro="secondary-contributors"/>
<text macro="container-title"/>
</group>
</macro>
<macro name="container-title">
<choose>
<if type="bill legal_case legislation" match="none">
<text variable="container-title"/>
</if>
<else>
<group delimiter=" " prefix=", ">
<choose>
<if variable="container-title">
<text variable="volume"/>
<text variable="container-title"/>
<group delimiter=" ">
<!--change to label variable="section" as that becomes available -->
<text term="section" form="symbol"/>
<text variable="section"/>
</group>
<text variable="page"/>
</if>
<else>
<choose>
<if type="legal_case">
<text variable="number" prefix="No. "/>
</if>
<else>
<text variable="number" prefix="Pub. L. No. "/>
<group delimiter=" ">
<!--change to label variable="section" as that becomes available -->
<text term="section" form="symbol"/>
<text variable="section"/>
</group>
</else>
</choose>
</else>
</choose>
</group>
</else>
</choose>
</macro>
<citation et-al-min="4" et-al-use-first="1" disambiguate-add-year-suffix="true" disambiguate-add-names="true" disambiguate-add-givenname="true" collapse="year" givenname-disambiguation-rule="primary-name">
<sort>
<key macro="issued-sort"/>
<key macro="author"/>
</sort>
<layout prefix="(" suffix=")" delimiter="; ">
<group delimiter=", ">
<text macro="author-short"/>
<text macro="issued-year"/>
<text macro="citation-locator"/>
</group>
</layout>
</citation>
<bibliography hanging-indent="false" et-al-min="11" et-al-use-first="10">
<sort>
<key macro="author"/>
<key macro="issued-sort" sort="ascending"/>
</sort>
<layout>
<group suffix=".">
<group delimiter=". ">
<text macro="author"/>
<text macro="issued-year"/>
<text macro="title" prefix=" "/>
<text macro="container"/>
</group>
<text macro="locators"/>
<group delimiter=", " prefix=". ">
<text macro="event"/>
<text macro="publisher"/>
</group>
</group>
</layout>
</bibliography>
</style>
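A CSL file is XML, so typographic (curly) quotes or long dashes picked up while copying from a web page will stop it from loading. As a quick sanity check before adding the style to Zotero, something like the following Python sketch can tell you whether the text is well-formed XML. The function name is_well_formed and the two short test strings are made up for illustration; this only checks XML syntax, not whether the style is valid CSL.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Straight quotes parse fine; curly quotes around attribute values do not.
good = '<style version="1.0"><info><title>PeerJ</title></info></style>'
bad = '<style version=\u201d1.0\u201d><info><title>PeerJ</title></info></style>'

print(is_well_formed(good), is_well_formed(bad))
```

For a saved file, ET.parse('peerj.csl') performs the same check on the file contents.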

Tips For Scientific Presentations

In the world of science, the communication of knowledge and discoveries drives progress. This communication happens through literature and publications, and even more so through lectures and presentations. In my experience, presenting research, whether in small settings like lab meetings or in larger environments like conferences, is one of the most effective ways to spread ideas and gain useful perspectives on your work. Communicating that work can sometimes be difficult, and it is important to master this for future academic and industry-related purposes. Here are links to the PLOS Collections: Ten Simple Rules… series that could help those looking to improve their presentation skills.

Ten Simple Rules for Making Good Oral Presentations

http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0030077

 

Ten Simple Rules for a Good Poster Presentation

http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0030102

Installing MySQLdb module for Python 2.7 for Ubuntu 12.04

After spending a good 30 minutes trying to find out why I couldn’t install the MySQLdb Python module, I finally found the answer. It comes down to 4 simple steps.

I already had MySQL installed on my machine. When I tried running:

sudo pip install MySQL-python

I was getting this error message:

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  ...
  File "setup_posix.py", line 25, in mysql_config
    raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found

 

I found the solution in this blog post, but I’ll list the steps here anyway.

  1. Be sure you have pip installed on your machine using this command:
    • sudo easy_install pip
  2. If you already have pip installed, it’d be a good idea to upgrade it now:
    • sudo pip install pip --upgrade
  3. Build the dependencies for python-mysqldb libraries:
    • sudo apt-get build-dep python-mysqldb
  4. Install the Python MySQL libraries:
    • sudo pip install MySQL-python

To make sure it is installed, run python:

$ python
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

A successful install does not produce any error message after the import statement.
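If you would rather check from a script than from the interactive prompt, the same test can be wrapped in a small function. This is a minimal sketch; the helper name module_available is hypothetical, and it works for any module name, not just MySQLdb.

```python
import importlib

def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True

# 'json' ships with Python, so this should report True on any install.
print(module_available('json'))
# The result for MySQLdb depends on whether the steps above succeeded.
print(module_available('MySQLdb'))
```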

Similarities between rRNAs and coding regions

After reading “Tripp H.J., et al. Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies.”, I had some additional questions that were not addressed:

How many unique genes in protein databases are similar to known rRNA sequence on the nucleotide level?
What genes will most likely be removed from metatranscriptomes when removing rRNA-like sequences?

If you had similar questions, then this site will give you some answers: http://edwards.sdsu.edu/rrnavsprot/

The detailed steps are described in the readme file linked on the site above. Using the default parameters, the most common genes with similarities to rRNAs (or misannotations) in both the RefSeq and SEED databases are transposases.

The comparison was done between RefSeq or SEED and the rrnadb database from riboPicker (http://ribopicker.sourceforge.net/). Alternatively, SINA (http://www.arb-silva.de/aligner/sina-download/) could have been used in a similar approach.

Programming reference sheets

I want to share a programming reference website that does a great job at listing programming constructs and commands for the most popular programming languages: http://hyperpolyglot.org/

What I really like about this site is that it displays how to do something in similar languages in a side-by-side format. For example, it groups together the family of interpreted languages (Perl, Python, PHP, Ruby: http://hyperpolyglot.org/scripting). If you want to see how to perform something like a square root function, it displays the function names to use and which modules need to be imported. If you want to see how the syntax of a for loop differs between languages, it will show that as well.

Some of the groups they have listed are:

  • Interpreted languages (Perl, PHP, Python, Ruby)
  • C++ style languages (C++, Objective C, Java, C#)
  • Relational data languages (SQL, Awk, Pig)
  • Numerical analysis software (MATLAB, R, NumPy)

They also have pages for programming tools:

  • Unix shells (Bash, Dash, Ksh, Tcsh, Zsh)
  • Text mode editors (Vim, Emacs, Nano)
  • Version control (Git, SVN, CVS)
  • Multiplexers (Screen, Tmux)