Primes and Missing
From EdwardsLab
About this page
I am using this page to keep notes on a collaboration with Dawn. These are just rough working notes!
Download tarball
The Nullomers.tgz tarball has the files described below.
Data and Code
| File | Contents |
|---|---|
| Text files | |
| dna_primes_len15.txt | The raw DNA sequences downloaded from Boise State |
| aa_primes_len5.txt | The amino acid sequences downloaded from Boise State |
| dna_primes_len15.fa | The same DNA sequences converted to FASTA format, with identifiers that are simply numbered starting at 1 |
| dna_primes_len15.translations | Translations of the above in all 6 frames. The frame number is appended to each FASTA identifier. |
| dna_primes_len15.translations.nostops | The 6-frame translations, filtered to keep only 5-mers that contain no stops |
| Code | |
| translate_454_seqs.pl | Translate DNA to protein (note: requires a FIG installation!) |
| check_for_aa_primes.pl | Compare the 6-frame translation with the downloaded sequences |
| nostops.pl | Remove sequences with stops from the 6-frame translation, and limit the output to sequences that are 5 amino acids long |
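
For readers without a FIG installation, the translate-then-filter steps can be sketched in plain Python. This is not the code from translate_454_seqs.pl or nostops.pl: the function names are hypothetical, the standard genetic code is assumed, and the frame labels (1 to 3 forward, -1 to -3 reverse) are an assumption about the numbering. The sketch translates complete codons only, so for a 15-mer the two offset frames on each strand give 4 residues rather than 5; the original scripts may treat partial codons differently.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return {frame: protein} for frames 1..3 (forward) and -1..-3 (reverse)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def stop_free_kmers(frames, k=5):
    """Keep only translations that are exactly k residues long with no stop."""
    return {f: p for f, p in frames.items() if len(p) == k and "*" not in p}
```

Calling `stop_free_kmers(six_frames(seq))` on each 15-mer gives the candidate 5-mers for that sequence under these assumptions.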
Results
| File | Number of Sequences | Unique Sequences |
|---|---|---|
| dna_primes_len15.fa | 60,370 | 60,370 |
| dna_primes_len15.translations | 362,220 | 107,796 |
| dna_primes_len15.translations.nostops | 167,414 | 57,642 |
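
The two count columns can be reproduced with a small sketch like the one below. It assumes each file is in FASTA format (this is stated for the .fa and .translations files and only assumed for the .nostops file), and the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def count_sequences(lines):
    """Return (total, unique) sequence counts for a FASTA stream."""
    seqs = [seq for _, seq in read_fasta(lines)]
    return len(seqs), len(set(seqs))
```

For example, `with open("dna_primes_len15.translations") as fh: print(count_sequences(fh))` prints the total and unique counts for that file. As a sanity check on the table, 60,370 DNA sequences in 6 frames gives 60,370 × 6 = 362,220 translations.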
So there are 57,642 unique stop-free amino acid 5-mers that are encoded by DNA sequences which are never found. However, NONE of these are in the amino acid list downloaded from Boise State. I just don't grok where the differences are coming from.
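
One way to double-check the comparison is to load both lists as sets and look at the overlap directly. This is only a sketch, not check_for_aa_primes.pl: the function names are hypothetical, and it assumes one sequence per line, skips FASTA header lines, and upper-cases everything so that a difference in letter case cannot hide a match.

```python
def load_sequences(lines):
    """Return the set of upper-cased sequences, ignoring blanks and '>' headers."""
    seqs = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            seqs.add(line.upper())
    return seqs


def compare(translated, downloaded):
    """Split two sequence sets into shared and set-specific members."""
    return {
        "shared": translated & downloaded,
        "only_translated": translated - downloaded,
        "only_downloaded": downloaded - translated,
    }
```

Running `compare` on the sequences from dna_primes_len15.translations.nostops and aa_primes_len5.txt and printing the size of each set would confirm whether the overlap really is empty.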
