Primes and Missing

From EdwardsLab

About this page

I am using this page to keep notes on a collaboration with Dawn. It is just a bunch of random notes!

Download tarball

The Nullomers.tgz tarball has the files described below.

Data and Code

File                                     Contents
Text files
dna_primes_len15.txt                     The raw DNA sequences downloaded from Boise State
aa_primes_len5.txt                       The amino acid sequences downloaded from Boise State
dna_primes_len15.fa                      The same DNA sequences, converted to fasta format and numbered starting at 1
dna_primes_len15.translations            Translations of the above in all 6 frames. The fasta identifiers are appended with the frame number.
dna_primes_len15.translations.nostops    The 6-frame translations, screened to keep only 5-mers without stops
Code
translate_454_seqs.pl                    Translate DNA to protein (note: requires a FIG installation!)
check_for_aa_primes.pl                   Compare the 6-frame translations with the downloaded amino acid sequences
nostops.pl                               Remove sequences containing stops from the 6-frame translations, and limit the output to sequences that are 5 amino acids long
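
For reference, the translate-then-filter steps can be sketched in a few lines of Python. This is only an illustration and not the Perl scripts in the tarball: the function names, the frame numbering (1, 2, 3 forward and -1, -2, -3 on the reverse complement), and the choice to drop trailing partial codons are all assumptions, and the FIG-based translation may behave differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is dropped.

    Codons with ambiguous bases become "X" (an assumption of this sketch).
    """
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return {frame: peptide} for frames 1, 2, 3 and -1, -2, -3 (assumed numbering)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def keep_stop_free_5mers(frames):
    """Keep only peptides that are exactly 5 residues long and contain no stop."""
    return {f: p for f, p in frames.items() if len(p) == 5 and "*" not in p}


if __name__ == "__main__":
    # Made-up 15-mer, not taken from the data files.
    print(keep_stop_free_5mers(six_frame_translations("ATGGCCAAATTTGGG")))
```

Under these conventions a 15-nucleotide sequence has five complete codons only in the two frames that start at the first base (forward and reverse); the other four frames give four codons each. The scripts in the tarball may treat partial codons differently.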

Results

File                                     Number of Sequences    Unique Sequences
dna_primes_len15.fa                      60,370                 60,370
dna_primes_len15.translations            362,220                107,796
dna_primes_len15.translations.nostops    167,414                57,642
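
The two count columns above are the kind of numbers a short script can reproduce. The following is a minimal sketch, assuming the files are ordinary fasta with ">" header lines; the helper names are made up for illustration and are not part of the tarball.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of fasta lines."""
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def count_total_and_unique(lines):
    """Return (number of records, number of distinct sequences)."""
    sequences = [seq for _, seq in read_fasta(lines)]
    return len(sequences), len(set(sequences))


if __name__ == "__main__":
    # Made-up records; with the real data one would pass an open file handle.
    example = [">1_1", "MAKFG", ">2_1", "MAKFG", ">3_1", "PKFGH"]
    print(count_total_and_unique(example))
    # Sanity check on the table: 6 frames per DNA sequence.
    print(60370 * 6 == 362220)
```

The translations count is consistent with six frames per input sequence, since 60,370 x 6 = 362,220.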

So there are 57,642 unique amino acid 5-mers translated from DNA sequences that are never found. However, NONE of these are in the amino acid list downloaded from Boise State. I just don't grok where the differences are coming from.
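
One way to double-check the "none in common" result is a plain set comparison after normalising case and whitespace, since a stray lower-case letter or trailing space would hide a real match. This is a sketch under assumptions: it treats both inputs as one sequence per line (skipping any fasta ">" headers), which may not match the actual layout of aa_primes_len5.txt, and the function names are hypothetical.

```python
def load_sequences(lines):
    """Return the set of upper-cased, whitespace-stripped sequences, skipping headers."""
    return {
        line.strip().upper()
        for line in lines
        if line.strip() and not line.lstrip().startswith(">")
    }


def compare_sets(translated, downloaded):
    """Return (shared, only_in_translated, only_in_downloaded)."""
    return translated & downloaded, translated - downloaded, downloaded - translated


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    translated = load_sequences([">1_1", "MAKFG", ">2_-1", "pkfgh "])
    downloaded = load_sequences(["PKFGH", "WWWCC"])
    shared, only_translated, only_downloaded = compare_sets(translated, downloaded)
    print(sorted(shared), sorted(only_translated), sorted(only_downloaded))
```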
