edwardslab

Lab blog

Parsing Newick Trees

We often need to parse Newick-format phylogenetic trees to pull some information out of them. Writing a parser is good for the soul, because the best way to do it is through recursion.

After the readmore, I provide some Perl code for parsing Newick phylogenetic trees into a lightweight data structure. Each internal node is an array of three things: [left child, right child, distance]. A leaf is instead ["node", the node name, distance]. This allows for very easy analysis of the tree, and simple ways to get data back. I also provide some example code for printing out the root-to-tip distance of every leaf in the tree.
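To give a flavour of the recursion, here is a minimal sketch of such a parser. It is not the code from the full post: the subroutine names `parse_newick`, `parse_node` and `root_to_tip` are made up for this illustration, and it assumes a strictly binary tree with unquoted names and no comments. Missing branch lengths are treated as 0.

```perl
use strict;
use warnings;

# Parse a Newick string into nested arrays:
#   internal node: [left child, right child, distance]
#   leaf:          ["node", name, distance]
# Assumption: strictly binary tree, unquoted names, no comments.
sub parse_newick {
    my ($newick) = @_;
    $newick =~ s/\s+//g;              # whitespace carries no meaning here
    pos($newick) = 0;                 # start scanning at the first character
    my $tree = parse_node(\$newick);
    my $rest = substr($newick, pos($newick));
    die "Unexpected text after tree: $rest\n" unless $rest eq '' || $rest eq ';';
    return $tree;
}

# Recursive step: read one node starting at the current match position.
sub parse_node {
    my ($s) = @_;
    my @node;
    if ($$s =~ /\G\(/gc) {
        my $left = parse_node($s);
        $$s =~ /\G,/gc  or die "Expected ',' at position " . pos($$s) . "\n";
        my $right = parse_node($s);
        $$s =~ /\G\)/gc or die "Expected ')' at position " . pos($$s) . "\n";
        $$s =~ /\G[^(),:;]+/gc;       # skip an optional internal node label
        @node = ($left, $right);
    } else {
        $$s =~ /\G([^(),:;]+)/gc
            or die "Expected a leaf name at position " . pos($$s) . "\n";
        @node = ('node', $1);
    }
    my $dist = $$s =~ /\G:([-+0-9.eE]+)/gc ? $1 : 0;   # no length given: use 0
    return [ @node, $dist ];
}

# Sum branch lengths from the root down to every leaf.
sub root_to_tip {
    my ($node, $so_far, $tips) = @_;
    my $total = $so_far + $node->[2];
    if (ref $node->[0]) {             # internal node: recurse into both children
        root_to_tip($node->[0], $total, $tips);
        root_to_tip($node->[1], $total, $tips);
    } else {                          # leaf: record its name and total distance
        $tips->{ $node->[1] } = $total;
    }
    return $tips;
}

my $tree = parse_newick('((A:1,B:2):0.5,C:3);');
my $tips = root_to_tip($tree, 0, {});
printf "%s\t%g\n", $_, $tips->{$_} for sort keys %$tips;
# prints A 1.5, B 2.5 and C 3, one leaf per line
```

Testing `ref $node->[0]` is enough to tell a leaf from an internal node, because only internal nodes hold an array reference in the first slot.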

 

Read more...
 

Server admin day a success, again!

More space, and a drive that is not warning of impending doom! Here are some tips and tricks for increasing the disk capacity of a server with minimal downtime, and even letting users know about it!

 

Read more...
 

Reverse complement function in Perl

This reverse complement subroutine also handles the IUPAC ambiguity codes:

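# Reverse complement a DNA sequence, including IUPAC ambiguity codes.
# S, W and N are their own complements, so tr/// leaves them unchanged.
sub reverse_complement {
    my ($seq) = @_;
    $seq =~ tr/acgtrymkbdhvACGTRYMKBDHV/tgcayrkmvhdbTGCAYRKMVHDB/;
    return scalar reverse $seq;
}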
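A quick way to check the behaviour is to run it on a sequence that mixes plain bases with ambiguity codes. The subroutine is repeated in the block so that it runs on its own, and the test sequences are made up for illustration.

```perl
use strict;
use warnings;

# Same translation table as above; S, W and N map to themselves.
sub reverse_complement {
    my ($seq) = @_;
    $seq =~ tr/acgtrymkbdhvACGTRYMKBDHV/tgcayrkmvhdbTGCAYRKMVHDB/;
    return scalar reverse $seq;
}

print reverse_complement('ATGCRYN'), "\n";   # prints NRYGCAT
print reverse_complement('aacgtt'), "\n";    # prints aacgtt (its own reverse complement)
```

Case is preserved, so lower-case (for example soft-masked) regions stay lower-case in the output.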

 

 

Edwards Lab On TV!

Our recent expedition to the Abrolhos Islands off the coast of Brazil was featured on Good News on RedeTV! You can watch the full video on the RedeTV! website, or below. This show also includes an Ion Torrent, if you are watching carefully. The show is in two parts because you can't get all that corally goodness in just one segment.

Here are the shows on the RedeTV! website: Part 1 and Part 2. A local version is below.

 

Read more...
 

Mapping UniRef100 to PhAnToMe

UniRef100 is another non-redundant database. In this post, I describe how to map the UniRef100 proteins to the proteins in the PhAnToMe database and get the subsystems for each.

This is similar to the description of how to map things to the SEED using the SEED servers, but this time we'll download everything and do it locally.

 

Read more...
 

Real time metagenomics

A while ago, we developed the Real Time Metagenomics web site (aka metagenomics using k-mers) and related applications to allow rapid annotation of metagenomic sequences using the SEED subsystems. In this post we discuss how this works, and how you can use real time metagenomics, either through the web site or directly on your own computer to analyze your data.

 

Read more...
 

SEED to GO Mapping using the SEED servers

The SEED contains most complete microbial and phage genomes, and includes an ontology built by annotators for annotators. The SEED systems contain the most complete microbial annotations anywhere.

The Gene Ontology project (GO) aims to unify annotations, but has long had a focus on eukaryotes and has repeatedly ignored prokaryotes. If you are tired of building tables that map SEED functions to GO functions by hand, this post will show you how to build them using the SEED servers, so that you can update the comparison any time you like.

 

Read more...
 

Splitting Paired End Sequence Reads

In genome sequencing projects one of the things we often need to do is split paired end sequence reads into the two ends. Like everything, there is the simple way, and the correct way to do this. I'm not saying which one this is.

After the break, I provide a really brief introduction to the problem, and describe a simple software solution that allows you to parse a sequence library and identify the paired ends.
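As a rough idea of what identifying the paired ends can look like, here is a minimal sketch. It is not the software described in the full post. It assumes that the two ends of a read share a name and differ only in a trailing /1 or /2, which is just one common naming convention, and the subroutine name `pair_reads` and the read names are made up for this example.

```perl
use strict;
use warnings;

# Group read identifiers into pairs.
# Assumption: mates are named NAME/1 and NAME/2; anything else is a singleton.
sub pair_reads {
    my @ids = @_;
    my %ends;
    for my $id (@ids) {
        if ($id =~ m{^(.+)/([12])$}) {
            $ends{$1}{$2} = $id;          # remember which end we saw
        } else {
            $ends{$id}{single} = $id;     # no recognisable /1 or /2 suffix
        }
    }
    my (@pairs, @singles);
    for my $name (sort keys %ends) {
        my $e = $ends{$name};
        if ($e->{1} && $e->{2}) {
            push @pairs, [ $e->{1}, $e->{2} ];
        } else {
            push @singles, grep { defined } @{$e}{qw(1 2)};
        }
        push @singles, $e->{single} if defined $e->{single};
    }
    return (\@pairs, \@singles);
}

my ($pairs, $singles) = pair_reads(qw(read1/1 read2/1 read1/2 read3/2 read4));
print "pair: @$_\n" for @$pairs;          # prints "pair: read1/1 read1/2"
print "single: $_\n" for @$singles;       # prints read2/1, read3/2 and read4
```

Real libraries use a variety of naming schemes, so the regular expression is the part most likely to need changing.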

 

Read more...
 

Perl Smith Waterman script

The following is a small script to perform a Smith-Waterman alignment, and to calculate the percent identity between two sequences.
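For readers who have not met the algorithm, here is a minimal sketch of the idea. It is not the script from the full post. The scores (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions, as is the choice to report identity over the aligned region with gap columns counted as mismatches. The subroutine name `smith_waterman` is made up for this example, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Local alignment by Smith-Waterman with a linear gap penalty.
# Returns (best score, aligned seq 1, aligned seq 2, percent identity).
sub smith_waterman {
    my ($s1, $s2, %opt) = @_;
    my $match    = $opt{match}    // 1;     # assumed default scores
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -2;
    my @a = split //, $s1;
    my @b = split //, $s2;

    # Fill the score matrix; row 0 and column 0 stay at zero.
    my @H = map { [ (0) x (@b + 1) ] } 0 .. @a;
    my ($best, $bi, $bj) = (0, 0, 0);
    for my $i (1 .. @a) {
        for my $j (1 .. @b) {
            my $diag = $H[$i-1][$j-1] + ($a[$i-1] eq $b[$j-1] ? $match : $mismatch);
            my $up   = $H[$i-1][$j] + $gap;
            my $left = $H[$i][$j-1] + $gap;
            my $score = 0;                  # local alignment never drops below 0
            for ($diag, $up, $left) { $score = $_ if $_ > $score }
            $H[$i][$j] = $score;
            ($best, $bi, $bj) = ($score, $i, $j) if $score > $best;
        }
    }

    # Trace back from the best cell until a zero score is reached.
    my ($al1, $al2) = ('', '');
    my ($i, $j) = ($bi, $bj);
    while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
        my $sub = $a[$i-1] eq $b[$j-1] ? $match : $mismatch;
        if ($H[$i][$j] == $H[$i-1][$j-1] + $sub) {
            $al1 = $a[$i-1] . $al1; $al2 = $b[$j-1] . $al2; $i--; $j--;
        } elsif ($H[$i][$j] == $H[$i-1][$j] + $gap) {
            $al1 = $a[$i-1] . $al1; $al2 = '-' . $al2; $i--;
        } else {
            $al1 = '-' . $al1; $al2 = $b[$j-1] . $al2; $j--;
        }
    }

    # Percent identity over the aligned columns (gap columns never match).
    my $same = grep { substr($al1, $_, 1) eq substr($al2, $_, 1) } 0 .. length($al1) - 1;
    my $pid  = length($al1) ? 100 * $same / length($al1) : 0;
    return ($best, $al1, $al2, $pid);
}

my ($score, $x, $y, $pid) = smith_waterman('TTACGTAA', 'GGACGTCC');
printf "score=%d identity=%.1f%%\n%s\n%s\n", $score, $pid, $x, $y;
# prints score=4 identity=100.0%, then ACGT twice
```

The matrix needs memory proportional to the product of the two sequence lengths, so this form is only practical for short sequences.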

 

Read more...
 

How To Fix File Permissions for Joomla

Yesterday Geni came to me with an interesting problem: he needed to update user pictures, but was unable to place them inside the member_pics folder. This was because the member_pics folder had been created by a user via ssh, not by using Joomla. The permissions on the folder were 775, so a user who was neither the owner nor in the group could read and execute in the folder, but could not write to it.
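To see why 775 locks out everyone except the owner and the group, it helps to decode the mode one octal digit at a time. The sketch below only illustrates that arithmetic and is not the fix from the full post. The subroutine name `describe_mode` is made up for this example.

```perl
use strict;
use warnings;

# Decode an octal permission mode into rwx strings for owner, group and other.
sub describe_mode {
    my ($mode) = @_;
    my @who = qw(owner group other);
    my %desc;
    for my $k (0 .. 2) {
        my $bits = ($mode >> (3 * (2 - $k))) & 7;   # one octal digit per class
        $desc{ $who[$k] } = join '',
            ($bits & 4 ? 'r' : '-'),
            ($bits & 2 ? 'w' : '-'),
            ($bits & 1 ? 'x' : '-');
    }
    return \%desc;
}

my $perm = describe_mode(0775);
print "$_: $perm->{$_}\n" for qw(owner group other);
# prints owner: rwx, group: rwx, other: r-x
```

The final digit, 5, is 4 (read) plus 1 (execute). The write bit, 2, is missing, which is why a process running as any other user cannot add files to the folder.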

 

Read more...
 

Next-Generation Sequencing Methods: A Summary

As a computer scientist entering the bioinformatics field, I was always interested in learning more about DNA sequencing and how biologists actually perform it. I learned about a few earlier methods, like Sanger sequencing, in my molecular biology course, but the term "next-generation" was being thrown around for a set of newer methods. So I did a bit of research and found a couple of papers that summarize the more prominent next-generation methods. If you are in the position I was in a while back, then maybe reading these will give you a little more insight into these new sequencing methods. Note: you might need some biology knowledge to understand the terms and things described here.

Mardis, Elaine R. "Next-Generation DNA Sequencing Methods". Annual Review of Genomics and Human Genetics. 2008.

Ansorge, Wilhelm J. "Next-generation DNA sequencing techniques". New Biotechnology. April 2009.

Both papers give a little bit of the history of DNA sequencing, and then jump into the three primary next-generation platforms:

  1. Roche/454 FLX Pyrosequencer
  2. Illumina (Solexa) Genome Analyzer
  3. Applied Biosystems SOLiD Sequencer

The Helicos HeliScope platform is mentioned and described a bit in both as well.

I feel the Mardis paper has much nicer graphics and figures which complement the description of the different platforms. However, both have really great descriptions overall. Both also describe some applications for these next-generation techniques and future applications to come.

 