We are pleased to announce the second installment of the SoCal Bioinformatics Hackathon.
From 9 to 11 January 2019, the NCBI will help run a bioinformatics hackathon in Southern California, hosted by the Computational Sciences Research Center at San Diego State University! We are going to put a few hundred thousand metagenomic datasets on cloud infrastructure and identify known, taxonomically definable, and novel viruses! We’re specifically looking for folks who have experience in computational virus hunting or adjacent fields. If this describes you, please apply! This event is for researchers, including students and postdocs, who already use bioinformatics data or develop pipelines for virological analyses of high-throughput experiments. The event is open to anyone who is selected for the hackathon and is willing to travel to SDSU (see below).
PATRIC families are sets of related proteins. Sometimes we want to get all the protein sequences in a family. Here’s how to do that using the PATRIC command line.
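If you end up with a tab-separated table (with a header row) that lists proteins, their family IDs, and their amino acid sequences, a short filter can turn one family into a FASTA file. This is a minimal sketch under that assumption, not the PATRIC command itself: the script name and all column names are hypothetical and are passed in as arguments, because they depend on how the table was exported.

```python
import csv
import sys


def family_fasta(rows, family_id, family_col, id_col, seq_col):
    """Yield one FASTA record per row whose family column equals family_id.

    The column names are parameters because they depend on how the table was
    exported; none of them are assumed to match PATRIC's actual field names.
    Rows with an empty sequence are skipped.
    """
    for row in rows:
        if row.get(family_col) == family_id and row.get(seq_col):
            yield f">{row[id_col]}\n{row[seq_col]}\n"


if __name__ == "__main__":
    # Hypothetical usage:
    #   python family_fasta.py FAMILY_ID FAMILY_COL ID_COL SEQ_COL < table.tsv > family.faa
    family_id, family_col, id_col, seq_col = sys.argv[1:5]
    reader = csv.DictReader(sys.stdin, delimiter="\t")
    sys.stdout.writelines(family_fasta(reader, family_id, family_col, id_col, seq_col))
```

Passing the column names explicitly keeps the sketch usable whatever headers your export actually has.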
If you are using utf-8 documents in Python, you may occasionally run into this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 124106: ordinal not in range(128)
The fix is trivial!
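One common fix (not necessarily the exact one used in the original post) is to name the encoding when you open the file, instead of relying on the platform or locale default, which may be ASCII. The minimal Python 3 sketch below assumes the file really is UTF-8; the function name read_utf8_text is hypothetical. It reproduces the error by decoding UTF-8 bytes as ASCII, then reads the same file correctly.

```python
import os
import tempfile


def read_utf8_text(path):
    """Read a whole text file, decoding it explicitly as UTF-8."""
    # Naming the encoding means Python never falls back to a locale default
    # such as ASCII, which cannot decode bytes >= 0x80 (e.g. 0xc2).
    with open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # "\u00b5" (micro sign) is encoded in UTF-8 as the two bytes 0xc2 0xb5.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("5 \u00b5l of sample\n".encode("utf-8"))
        path = tmp.name
    try:
        with open(path, "rb") as raw_handle:
            raw = raw_handle.read()
        try:
            raw.decode("ascii")  # reproduces the kind of error shown above
        except UnicodeDecodeError as err:
            print("ASCII decoding fails:", err)
        # ascii() escapes non-ASCII characters so printing works in any terminal.
        print(ascii(read_utf8_text(path)))
    finally:
        os.remove(path)
```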
How can we generate a list of the lengths of all the proteins [in a specific group] in GenBank? It’s easy with FTP!
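Once you have downloaded a protein FASTA file for your group of interest from the NCBI FTP site (the exact file depends on the group, so no path is assumed here), a short script can tabulate the lengths. This is a minimal sketch; the script name protein_lengths.py is hypothetical, and it accepts either plain or gzip-compressed FASTA.

```python
import gzip
import sys


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def protein_lengths(lines):
    """Yield (protein_id, length) for each record in an iterable of FASTA lines.

    Every character on a sequence line is counted, including a trailing '*'.
    """
    current_id = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                yield current_id, length
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            length = 0
        else:
            length += len(line)
    if current_id is not None:
        yield current_id, length


if __name__ == "__main__":
    # Usage: python protein_lengths.py proteins.faa.gz > lengths.tsv
    with open_fasta(sys.argv[1]) as handle:
        for protein_id, length in protein_lengths(handle):
            print(f"{protein_id}\t{length}")
```

Because protein_lengths works on any iterable of lines, it streams through large files without loading them into memory.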
CentOS is great because it is secure, but not so great because it doesn’t ship the latest software. Here is how to install a C++11-capable compiler on CentOS 6 or CentOS 7 and temporarily activate it in a shell. This does not change the default compiler, so it should cause fewer problems with your system (but that is not a money-back guarantee … you are on your own if it does!)
When writing scientific names: italicize family, genus, species, and variety or subspecies. Begin family and genus with a capital letter. Kingdom, phylum, class, order, and suborder begin with a capital letter but are not italicized.
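The rules above are mechanical enough to encode. The minimal sketch below assumes HTML <i> tags as the italic markup and covers only the ranks named in the rules; the function name format_taxon is hypothetical. It capitalizes the first letter for family, genus, and the higher ranks, and leaves species names as given.

```python
# Ranks whose names are italicized under the convention described above.
ITALIC_RANKS = {"family", "genus", "species", "subspecies", "variety"}
# Ranks whose names begin with a capital letter.
CAPITALIZED_RANKS = {"kingdom", "phylum", "class", "order", "suborder", "family", "genus"}
# Every rank the convention mentions.
KNOWN_RANKS = ITALIC_RANKS | CAPITALIZED_RANKS


def format_taxon(rank, name):
    """Return name formatted for its rank, using HTML <i> tags for italics."""
    rank = rank.lower()
    if rank not in KNOWN_RANKS:
        raise ValueError(f"no formatting rule for rank: {rank}")
    if rank in CAPITALIZED_RANKS:
        name = name[:1].upper() + name[1:]
    if rank in ITALIC_RANKS:
        return f"<i>{name}</i>"
    return name
```

For example, format_taxon("genus", "escherichia") returns "<i>Escherichia</i>", while format_taxon("order", "enterobacterales") returns "Enterobacterales" in roman type.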
Here is the complete taxonomy:
If you have a genome annotated in RAST and you want to create a genome-scale metabolic model, here is one way to do it using PyFBA and the SEED.
The 2015 SDSU Metagenomics Workshop is designed to be a combination of lectures, discussions, and practical hands-on experience to bring people up to date on data analysis for metagenomics.
The workshop is being held in Adams Humanities Room 2108 from 10 am – 6 pm every day from June 22nd – 26th, 2015.
Registration is closed.
The agenda is online here, and will be updated as we progress.
We will use a VirtualBox virtual machine during the class. More information about the image and how to download it is here. (Please note: the image is still subject to change, so don’t download it yet!)
We often have people ask us how to convert fastq files to fasta format. We have a variety of code on this website, but sometimes that is not easy enough.
Here are a few ways to do it: using a Perl script written by Bas, using standard command-line tools, or using prinseq-lite. There is also a C++ version that you can compile (e.g. with c++ -o fastq2fasta fastq2fasta.cpp) and run on your machine.
We also have a simple form that converts fastq files to fasta files (DNA only … it does not give you the quality scores).
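For comparison with the Perl and C++ versions mentioned above, here is a minimal Python sketch that does the same conversion (it is not one of the linked tools; the script name fastq2fasta.py is hypothetical). It assumes the common four-line FASTQ layout and, like the form, drops the quality scores.

```python
import sys


def fastq_to_fasta(lines):
    """Yield FASTA records from FASTQ lines, discarding the quality scores.

    Assumes the common four-line layout (header, sequence, '+' line, qualities);
    multi-line FASTQ records are not handled.
    """
    lines = iter(lines)
    for header in lines:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got {header!r}")
        try:
            sequence = next(lines).rstrip("\r\n")
            separator = next(lines)
            next(lines)  # quality line, not needed for FASTA
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not separator.startswith("+"):
            raise ValueError(f"expected a '+' line after {header!r}")
        # The FASTA header keeps everything after the '@'.
        yield f">{header[1:]}\n{sequence}\n"


if __name__ == "__main__":
    # Usage: python fastq2fasta.py < reads.fastq > reads.fasta
    sys.stdout.writelines(fastq_to_fasta(sys.stdin))
```

Reading record by record from standard input keeps memory use constant, so the sketch also works on large sequencing runs.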
We successfully completed a one-day training course for ~40 people on how to use anthill, and everyone is now an expert, right?
The latest version of the anthill training notes is now available at this link: AnthillTrainingNotes