We are pleased to announce the second installment of the SoCal Bioinformatics Hackathon.
From 9-11 January, 2019, the NCBI will help run a bioinformatics hackathon in Southern California hosted by the Computational Sciences Research Center at San Diego State University! We are going to put a few hundred thousand metagenomic datasets on cloud infrastructure and identify known, taxonomically definable and novel viruses! We’re specifically looking for folks who have experience in Computational Virus Hunting or Adjacent Fields! If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below).
PATRIC families are organized sets of related proteins. Sometimes we want to get all the protein sequences in a family. Here’s how to do that using the PATRIC command line.
If you are using utf-8 documents in Python, you may occasionally run into this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 124106: ordinal not in range(128)
The fix is trivial!
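As a rough illustration of the problem (not necessarily the fix described in the full post), here is a minimal Python 3 sketch. It assumes the error comes from decoding UTF-8 bytes with the ASCII codec; the sample string and the helper names `decode_text` and `read_utf8` are made up for this example. The idea is simply to name the encoding explicitly instead of relying on a default.

```python
# Sample bytes containing a non-ASCII character (the micro sign, U+00B5).
# In UTF-8 it is stored as the two bytes 0xc2 0xb5, and 0xc2 is exactly the
# kind of byte the ASCII codec refuses to decode.
raw = "5 \u00b5m filter".encode("utf-8")


def decode_text(data: bytes) -> str:
    """Decode bytes as UTF-8 instead of relying on an ASCII default."""
    return data.decode("utf-8")


def read_utf8(path: str) -> str:
    """Read a whole text file, stating the encoding explicitly."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Decoding as ASCII reproduces the error class from the message above.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as UTF-8 succeeds and yields one character for the two bytes.
text = decode_text(raw)
print(ascii_failed, len(raw), len(text))
```

If the bytes are not actually UTF-8, this will still fail, so it is worth checking how the document was saved first.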
We recently released a new version of our qudaich software, designed to compare short read sequence data sets to each other. Qudaich is built around a suffix trie and provides a rapid way to compare short read data sets at the DNA or protein level. Here is how to use qudaich to compare a set of metagenomes to find out how similar they are.
How can we generate a list of the lengths of all the proteins [in a specific group] in GenBank? It’s easy with ftp!
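The full post uses ftp to get the data; the sketch below only covers the counting step and is not the method from that post. It assumes you already have the proteins in FASTA format, and the function name `protein_lengths` and the toy sequences are invented for illustration.

```python
from typing import Dict, Iterable


def protein_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence id: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The id is the first word after '>'; the rest is description.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            lengths[current] += len(line)
    return lengths


# Toy input (made-up sequences, not real GenBank records).
example = [
    ">prot1 hypothetical protein",
    "MKTAYIAKQR",
    "QISFVK",
    ">prot2",
    "MSDNE",
]
for seq_id, length in protein_lengths(example).items():
    print(seq_id, length, sep="\t")
```

For a real file, pass an open file handle instead of the list, for example `protein_lengths(open("proteins.faa"))`, where `proteins.faa` is a placeholder file name.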
Getting data from the NCBI Sequence Read Archive is not easy. Here we combine a few of our posts to go step by step through getting the data.
CentOS is great because it is secure, but not great because it doesn’t have the latest software. Here is how to install C++11 on CentOS6 or CentOS7, and temporarily activate it in a shell. This does not change the default compiler and should cause fewer problems with your system (but that is not a money back guarantee … you are on your own if it does!)
When writing scientific names: italicize family, genus, species, and variety or subspecies. Begin family and genus with a capital letter. Kingdom, phylum, class, order, and suborder begin with a capital letter but are not italicized.
Here is the complete taxonomy:
If you have a genome annotated in RAST and you want to create a genome-scale metabolic model, here is one way to do it using PyFBA and the SEED.
The 2015 SDSU Metagenomics Workshop is designed to be a combination of lectures, discussions, and practical hands-on experience to bring people up to date on data analysis for metagenomics.
The workshop is being held in Adams Humanities Room 2108 from 10 am – 6 pm every day from June 22nd – 26th, 2015.
Registration is closed.
The agenda is online here, and will be updated as we progress.
We will use a VirtualBox virtual machine during the class. More information about the image and how to download it is here. (Please note, the image is still subject to change, so don’t download it yet!)