Edwards Lab

Delivering the best in bioinformatics!

Will NCBI ever update taxonomy data?

An annoying thing that keeps coming up while I update phage metadata is how heavily I rely on NCBI Taxonomy. Obviously I know I can't blame them, since they post a not-so-funny disclaimer at the end of every record:

Disclaimer: The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information.

However, it is really annoying, and it reflects everything else in NCBI: so static, unlike everything else on the web in the past 5 years... Now, to update the record of each of about 55 phages described as "unclassified" in NCBI, I have to spend anywhere between 10 and 100 minutes, and I may still end up without an answer. Take, for example, Gifsy-1 and Gifsy-2, two of the most famous prophages of Salmonella. They are lambdoid phages, i.e. tailed siphoviruses. Yet their NCBI records say: unclassified!

ICTV doesn't seem to be doing any better with individual viruses (see their latest list).

This is not why I started this post anyway. I'm trying to document the "evidence" behind my taxonomy updates to the metadata table, because it would be too messy to include these data in each cell of the Google Doc. However, maybe later we will come up with a 'taxonomy evidence' record, as we have an annotation evidence one in the SEED database (called 'Feature evidence').

Metadata evidence:

  • Gifsy: from Salmonella: Methods and Protocols @ Google Books
  • Enterobacteria phage YYZ-2008: This one is tricky. BLASTN shows that its best matches are Enterobacteria phage 2851 (classified as Podoviridae) and Stx2-converting phage 1717 (classified as Siphoviridae)
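One way such a 'taxonomy evidence' record might look is sketched below. This is only an illustration: the TaxonomyEvidence class, its field names, and the family_from_hits helper are all made up here and are not part of the SEED or the metadata table. The sketch simply records the YYZ-2008 case above, where the two best BLASTN matches belong to different families.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonomyEvidence:
    """One hypothetical 'taxonomy evidence' entry for a phage (illustrative only)."""
    phage: str
    assigned_family: str          # "unresolved" when the evidence conflicts
    method: str                   # e.g. "literature" or "BLASTN"
    notes: List[str] = field(default_factory=list)


def family_from_hits(hit_families: List[str]) -> str:
    """Return the family shared by all best hits, or 'unresolved' if they disagree."""
    unique = set(hit_families)
    return unique.pop() if len(unique) == 1 else "unresolved"


# The tricky case from the list above: the best matches disagree on family.
yyz = TaxonomyEvidence(
    phage="Enterobacteria phage YYZ-2008",
    assigned_family=family_from_hits(["Podoviridae", "Siphoviridae"]),
    method="BLASTN",
    notes=[
        "best match: Enterobacteria phage 2851 (Podoviridae)",
        "best match: Stx2-converting phage 1717 (Siphoviridae)",
    ],
)
print(yyz.phage, "->", yyz.assigned_family)
```

Keeping the conflicting hits in the notes means the reasoning stays visible even when no family can be assigned.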

The reference to number of phages on Earth

I have always taken for granted (and used) the number of 10^31 phages on the planet. Normally, this is calculated from the estimate that there are 10 phages per prokaryotic cell, and that prokaryotic cells number about 10^30. The usual references for these numbers are: Jiang & Paul 1998, PMID 9687430 and Whitman 1998, PMID 9618454
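The arithmetic behind the estimate is just ten phages per cell multiplied by ten to the thirtieth cells. A quick sketch, with variable names of my own choosing:

```python
# Back-of-the-envelope estimate: phages per cell times number of prokaryotic cells.
PHAGES_PER_CELL = 10
PROKARYOTIC_CELLS = 10 ** 30

total_phages = PHAGES_PER_CELL * PROKARYOTIC_CELLS

# Count the zeros after the leading 1 to recover the exponent.
exponent = len(str(total_phages)) - 1
print("Estimated phages on Earth: 10^%d" % exponent)
```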

Today I found what might be an older reference: Bergh et al. 1989, PMID 2755508, High abundance of viruses found in aquatic environments

Once I get access to the full-text paper ("thanks to" Nature's unwillingness to open even older articles), I can confirm the exact phage number as claimed in 1989.

If you know of a better (aka older) reference, feel free to share it.

This number (10^31), by the way, can be read as ten nonillion (in the US numbering system).

Using the SEED Servers

 

The SEED servers are a newer way of accessing the SEED, but at the moment they are limited to the Argonne SEED. Rather than the SOAP-based approach, which is designed around a single call, the SEED servers are designed around sending larger chunks of data. If you are interested in using the SEED servers, I have included a short demo in the read more.

Read more: Using the SEED Servers

Accessing the SDSU Seed

The SDSU Seed (aka Phantome Seed, phage seed) is a complete local SEED install. I mainly update the phages on this (because it is the phage seed), but I can update microbial genomes if you need. If you want a more up-to-date site with microbial genomes, check out the SEED servers (and my separate blog post about using those).

In the read more I detail how to access the local SEED if you are interested.

 

Read more: Accessing the SDSU Seed

Annotate Metagenomes In An Instant

RTMg has been available for a while, and we have done some pretty cool stuff with it [e.g. web, mobile, open social, and all publicly available metagenomes], but we need to enable others to play with it too. Now everyone can enjoy metagenome annotation in an instant. (Not the flavorless instant coffee type instant, but the rich and bountiful instant gratification type instant!) Don't believe me? Here is a video I made showing how to annotate a metagenome and create a pie chart of the data.

After the read more I'll show you how to do it too.

 

 

Read more: Annotate Metagenomes In An Instant

Dropdown Menus in Joomla

So, I did this maybe a week or two weeks ago, but it was a great victory for science and metagenomics, and I figured I'd post it here. I finally got dropdown menus working in Joomla, guys! Yayyyyyyy. If you go to phantome.org and mouse over the tools menu bar, you'll get the condensed effort of my AWESOME work. Really, all it boils down to was I needed to click a single button.

Building dropdown menus, or "submenus" in Joomla is not terribly advanced. You make an item in whatever menu you're working on, in this case the main menu. But instead of its parent being "Top" you make the parent an already existing item in the menu, in this case "Tools".

However, unless you use my SECRET TRICK, the menu will only ever appear if you're on the "Tools" page already. Which is dumb. People don't want a menu for what they're already looking at. So after a few hours of googling, downloading extensions and modules I didn't need, and then a lot of clicking, I found the answer!

Edit the module for your menu, in my specific case I go to Extensions->Module Manager-> Main Menu in the administrator backend. Then you click the simple little box labeled "Always Show Sub-menu Items", and save your changes. Voila! Success, and awesome. You now have infinite dropdown power. You can have dropdown menus, you can have dropdown menus inside those dropdown menus. Those menus can then shack up and raise little menus of their own. You can have nothing but menus, if you wanted. Infinite menus, dropping down.

Joomla is pretty cool, provided you step into a ring and box with it for like, nine hours.

man fsck this

As a few of you probably noticed, pipeline1, also known as seed.sdsu.edu, has been down at least since Wednesday or Thursday of last week. One of the drives was having "problems", we'll say, that left pipe1 unable to boot. The actual boot drive wasn't the issue. The second drive was the issue. It was set up with lvm2, which isn't really a filesystem itself but a layer that carves the drive into logical volumes, or "logical partitions". I guess it's one way to get multiple partitions onto one drive. The catch is that it makes it really difficult to fsck it if there are problems. After spending quite a while on it, here's what I've done to get it fixed.

1: Boot off of an ubuntu install disc. It gives me graphical windows, a terminal, and easy root access. Tools used: Applications->Accessories->Terminal

2: I had to install lvdisplay, part of the lvm2 package, to get a look at our logical partitions, but I'll tell you what they are so you don't have to do that. Our two logical partitions are /dev/VolGroup00/LogVol00 and /dev/VolGroup00/LogVol01.

3: Here is the SECRET TRICK. If you try to fsck either of those, it'll give you an error. First it will tell you that no such file exists, and second it will tell you that it tried anyways, and there's a bad super block. The second statement is misleading, concentrate on the first. So what I did was go to /dev/VolGroup00, and take a look at the two partitions. Both of them were symlinks! *gasp*

4: Those links pointed to the REAL partitions. /dev/mapper/VolGroup00-LogVol00 and /dev/mapper/VolGroup00-LogVol01.

5: I ran fsck on the problem volume (volume 00) and lo and behold, success! All the orphaned inodes and corrupt linked lists are finding their parents and...being trimmed? I don't know. Suffice to say fsck is currently running and hopefully pipe1 will be back up soon.
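The symlink trick in steps 3 to 5 can be sketched in a few lines of Python. This is a rough illustration, not what I actually ran: the function names are made up, and the script only builds the fsck command instead of executing it, since running fsck needs root and an unmounted volume.

```python
import os


def resolve_volume(path: str) -> str:
    """Follow any symlinks and return the real device path."""
    return os.path.realpath(path)


def fsck_command(path: str) -> list:
    """Build (but do not run) an fsck command for the resolved device."""
    return ["fsck", resolve_volume(path)]


if __name__ == "__main__":
    # The two logical partitions from step 2. On a machine without them,
    # realpath simply returns the path unchanged.
    for vol in ("/dev/VolGroup00/LogVol00", "/dev/VolGroup00/LogVol01"):
        print(vol, "->", resolve_volume(vol))
        print("would run:", " ".join(fsck_command(vol)))
```

On pipe1, the resolved paths would be the /dev/mapper entries from step 4.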

Screen in Unix/Linux

Ever since I started using the servers more often to write and run code, I've been keeping my eyes and ears open for new, cool, and better tools to make programming and working in the command-line environment easier. Recently, the screen command has been thrown around in conversation, so I thought it'd be a good idea to check it out. It turns out to be a very helpful and powerful tool when doing multiple things at once, as it lets you open multiple windows or "screens" in a single session. You can dedicate each screen to a specific task, e.g. running a Java application in one screen while editing a Perl script in another. Here's a tutorial link I found that helped me learn a little bit more about screen.

http://www.ibm.com/developerworks/aix/library/au-gnu_screen/index.html


You gotta lyse that lysin before the lysin lyses you!

Phages kill bacteria. That's their ultimate goal. Yet, they have to maintain the bacterial cell integrity until they're done with making new phage particles. So, they carefully control the bacterial genome till they replicate their DNA and package it in nascent phage particles. Once these are formed and are ready to leave, they need to leave. They engage in a highly timed and orchestrated procedure of poking holes in the bacterial membranes (using phage holins), degrading the bacterial peptidoglycan-based cell wall, then—if the bacterial host happens to be a gram-negative cell—breaking the outer membrane too!

In the event a phage decides to remain "dormant" inside a bacterium, things get a bit more complicated. A so-called "arms race" is generated. For bacteria, phages are time bombs that can be induced at any time to kill the bacteria. How would bacteria avoid this fatal vampirish ending? They have to "tolerate mutations" in the phage's most dangerous protein-encoding genes. If the gene that controls phage induction is damaged, this may salvage the bacteria. Other tempting targets are the lysis modules! If lysins or holins are disabled, the dormant prophages may remain captive forever (or rather until prince "helper phage" comes and frees them from that peptidoglycan-walled prison).

So, if you're a bacterium, it's smart to disable the lysin genes, one way or another. If you're a scientist studying bacterial and phage genomes, there is no better way to find this out than using the subsystems-based SEED server. Using subsystems allows you to find out how closely related phages and prophages may have very different lysin genes. In the diagram below, a bunch of staphylococcal phage and prophage genomes are compared. You will notice immediately how some of their lysins (in red, labeled #1) are truncated. A truncated lysin is bad news for a phage. It means the phage is on its way to being enslaved by the bacterium for long years to come!

Truncated and intact lysins in staphylococcal phages

 

PHACTS flowchart

I have been working on getting an understandable flowchart constructed for PHACTS. Because of PHACTS's odd complexity, it has been difficult to get a meaningful and understandable chart. Here is the most current version.