Welcome to the Halophile genome site

 

The color scheme is inspired by this awesome aerial photo of the salterns outside San Jose by Jerry Ting on Flickr.

The site was built by Rob Edwards, of San Diego State University and Argonne National Laboratory, with help from many others.

Download the fasta files

You can download fasta files for each of the sequenced halophile genomes here.
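Once you have a fasta file, a quick sanity check is to count the sequences and total bases and compare them with the contig count and genome size reported for that genome. The following is a minimal sketch using only the Python standard library; the function name `fasta_stats` is ours and is not part of the site.

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sequences, total bases) for FASTA-formatted lines."""
    n_seqs = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new sequence
        else:
            total_bases += len(line)
    return n_seqs, total_bases


# Small in-memory example: two contigs, 11 + 4 bases.
example = [">contig_1", "ACGTACGT", "ACG", ">contig_2", "GGCC"]
print(fasta_stats(example))  # (2, 15)
```

To run it on a downloaded file, pass an open file handle instead of the list, for example `fasta_stats(open("genome.fasta"))`, where `genome.fasta` is a placeholder for whatever name you saved the file under.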

Compare the halophiles to metagenomes

We have compared the halophiles to all publicly available saltern metagenomes using BLASTN or TBLASTX.

Where did this come from?

In 2008 at the ASM meeting in Boston, a couple of savvy scientists noticed that Roche Diagnostics had some FLX machines sitting idly on the meeting floor. Not wanting the machines to feel lonely or left out, they proposed a challenge to Roche: try to sequence a genome or two at an ASM meeting. Not content with that challenge alone, Roche decided to up the ante, and agreed only if there would be onsite annotation, analysis, and presentations of the data. The gauntlet had been laid down, faces had been slapped, and the game was afoot.

Nothing like this had been tried before. Could a machine be run at a National Meeting? Could genomes be sequenced and analyzed so fast you would miss it if you blinked? What would we sequence if we could sequence anything quickly?

Photo: Salt Ponds (Don Edwards San Francisco Bay National Wildlife Refuge)

Answering the last question first, we decided that if we were going to sequence anything at all, it would be something that microbiologists would find interesting, something that could be used for educational purposes, and something that might be more of a challenge than just sequencing E. coli again. We decided to sequence some halophiles, salt-loving microbes that grow in salterns like those shown in this photo of the solar salterns outside of San Jose (courtesy Jerry Ting, Flickr). In this photo, most of the colors come from Bacteria and Archaea that are growing in the salterns. The Archaea, in particular, are easy to grow, do not require extreme conditions, and grow in very high salt concentrations (about 2.5 M). Not much else will grow in those conditions, which reduces the chance of contamination.

Having chosen a bug, or eight, to sequence, we needed somewhere to get them from. Luckily for us (and everyone else), the American Type Culture Collection (ATCC) has lots of halophiles to choose from. We chose several from their catalog, grew them, extracted their DNA, and sequenced the good ones!

Information about the genomes is available in the table, and can be downloaded as tab-separated text.

ATCC Number | Name | Other names | Strain | NCBI TaxID | GenBank Genome Project ID | locus_tag | RAST job # | Submitted to GenBank | Number of Contigs | Genome Size | Isolated from
33799 | Haloarcula sp 33799 | Haloarcula californiae | BJGN-2 | 244363 | 39637 | HaCal | 6815 | No | 180 | 4420514 | salt brine, Baja California, Mexico
33800 | Haloarcula sp 33800 | Haloarcula sinaiiensis | BJSG-2 | 35742 | 39639 | HaSin | 6816 | No | 10 | 4405164 | salt brine, Israel
29715 | Haloarcula vallismortis |  | J.F. 54 | 28442 | 39641 | HaVal | 4977 | No | 88 | 3930055 | salt pools, Bad Water Point, Death Valley, CA
35960 | Haloferax denitrificans |  | S1 | 35745 | 39643 | HaDen | 4975 | No | 38 | 3830000 | Saltern, California
33500 | Haloferax mediterranei |  | R-4 | 523841 | 39645 | HaMed | 6817 | No | 5 | 3894540 | salt ponds, Alicante, Spain
BAA-1512 | Haloferax mucosum |  | PA12 | 403181 | 39647 | HaMuc | 4972 | No | 29 | 3371699 | Pustular mat from Hamelin Pool, Shark Bay, Australia
BAA-897 | Haloferax sulfurifontis |  | M6 | 255616 | 39649 | HaSul | 4971 | No | 36 | 3813939 | microbial mats and mineral crusts near the sulfide- and sulfur-rich Zodletone spring, southwestern Oklahoma, USA
29605 | Haloferax volcanii |  | DS2 | 309800 | 39651 | HaVol | 5025 | No | 5 | 4012900 | shore mud, Dead Sea
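
If you download the genome information as tab-separated text, it can be read with Python's built-in `csv` module. The sketch below assumes the file has a header row whose column names match the table headers (we have not checked the actual download), and it uses two rows from the table as in-memory sample data rather than a real file.

```python
import csv
import io

# Sample data standing in for the downloaded file; the header names are an
# assumption based on the table's column headers.
sample = (
    "ATCC Number\tName\tNumber of Contigs\tGenome Size\n"
    "29605\tHaloferax volcanii\t5\t4012900\n"
    "33500\tHaloferax mediterranei\t5\t3894540\n"
)


def read_genome_table(handle):
    """Read tab-separated rows into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Number of Contigs"] = int(row["Number of Contigs"])
        row["Genome Size"] = int(row["Genome Size"])
        rows.append(row)
    return rows


rows = read_genome_table(io.StringIO(sample))
largest = max(rows, key=lambda r: r["Genome Size"])
print(largest["Name"], largest["Genome Size"])
```

For a real file, replace `io.StringIO(sample)` with an open file handle.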

Raw Data

The raw sequence data was generated by Roche using pyrosequencing. Most of the genomes were sequenced by shotgun sequencing alone, but for two, Haloarcula sinaiiensis ATCC 33800 and Haloferax mediterranei ATCC 33500, a combination of shotgun sequencing and paired-end sequencing was used. For those two genomes, the shotgun sequencing data was combined with the paired-end data to create a scaffold. We have provided the raw data if you would like to browse or download it!

Genome Annotations

To annotate the genomes we used the Rapid Annotation Using Subsystems Technology (RAST) server. This is an automated annotation server from Argonne National Laboratory. To use this server, you upload the fasta files, wait a few hours, and retrieve a complete annotation. The server also provides the annotation in a variety of different formats, which are available for download from this table.

We have the data in a variety of different formats. For most purposes you will probably either want to grab the GenBank or EMBL files. In the RAST, EC numbers are included in functional roles (aka product names or protein functions). However, some software croaks on this, and so we have also removed the EC numbers and provide those files too. GFF and GTF files are more for computer-computer transactions. The SEED organism directory is a tar-ball of the raw data that the RAST uses during the annotation.
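
To illustrate what removing EC numbers from a functional role can look like, here is a minimal sketch. It assumes the EC number appears as a parenthesized suffix such as `(EC 1.1.1.1)`; the regular expression and function names are ours, and this is not the script that produced the files on this site.

```python
import re

# Matches a parenthesized EC number such as "(EC 1.1.1.1)" or "(EC 2.7.-.-)",
# together with any whitespace directly in front of it.
EC_PATTERN = re.compile(r"\s*\(EC\s+([0-9n.\-]+)\)")


def strip_ec(role: str) -> str:
    """Return the functional role with all EC number annotations removed."""
    return EC_PATTERN.sub("", role).strip()


def extract_ec(role: str) -> list:
    """Return the EC numbers found in a functional role, in order."""
    return EC_PATTERN.findall(role)


role = "Alcohol dehydrogenase (EC 1.1.1.1)"
print(strip_ec(role))    # Alcohol dehydrogenase
print(extract_ec(role))  # ['1.1.1.1']
```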

The SEED organism directory contains all the files that the SEED uses in the annotation and analysis.

Per the UNIX convention, files that end in ~ are backup files of the file with the same name.

The files in the directory are:
  • Features/: The features directory contains information about protein encoding genes and RNAs.
  • Features/peg/: The pegs directory contains information about protein encoding genes.
  • Features/peg/fasta: The fasta sequences of all the proteins.
  • Features/peg/tbl: A table of the locations of all the proteins in the genome.
  • Features/peg/tbl.recno:
  • Features/rna/: The rna directory contains information about the RNAs.
  • Features/rna/fasta: The fasta sequences of all the RNAs.
  • Features/rna/tbl: A table of the locations of all the RNAs in the genome.
  • Features/rna/tbl.recno:
  • GENETIC_CODE: The genetic code the genome uses.
  • GENOME: The genome name.
  • PROJECT: The project that the genome belongs to, if any.
  • Scenarios/: The Scenarios directory contains information about the automatic metabolic reconstruction of the organism.
  • Scenarios/Analysis/: The analysis directory contains information about the analysis of the metabolic reconstruction.
  • Scenarios/Analysis/inputs_to_scenarios: The inputs to the metabolic reconstruction.
  • Scenarios/Analysis/outputs_to_scenarios: The outputs from the metabolic reconstruction.
  • Scenarios/PathInfo/*: The pathways available in each of the subsystems.
  • Subsystems/: The Subsystems directory contains information about the subsystems found in the organism.
  • Subsystems/bindings: The bindings are the proteins that are in the organism and in different subsystems.
  • Subsystems/subsystems: The subsystems table is a representation of the subsystems in the organism.
  • TAXONOMY: The taxonomy of the organism.
  • annotations: The primary annotations of the organism's proteins and RNAs.
  • attr_id.btree: A btree of the attribute IDs.
  • attr_key.btree: A btree of the attribute Keys.
  • bbhs: The bidirectional best hits.
  • bbhs.index: An index of the BBHs.
  • called_by: How the ORFs were called.
  • contig_len.btree: A btree of the contig lengths.
  • contigs: The DNA sequences (contigs) in fasta format.
  • contigs.btree: A btree of the contigs.
  • evidence.codes: Evidence codes for the annotations of the proteins.
  • expanded_similarities: The similarities (BLAST hits) for the organism against the non-redundant database. The IDs have been expanded to include all known proteins.
  • expanded_similarities.flips: The reversed similarities for the organism against the non-redundant database.
  • expanded_similarities.flips.index: An index of the reversed similarities.
  • expanded_similarities.index: An index of the similarities.
  • found:
  • log: A log of what was done during the annotation.
  • neighbors: Which are the neighboring genomes.
  • overlap.report: A report on overlapping PEGs in the genome.
  • overlap.summary: A summary of the overlapping PEGs in the genome.
  • pchs: Pairs of close homologs. Proteins that are near neighbors.
  • pchs.btree: A btree of the pairs of close homologs.
  • pchs.evidence.btree: A btree of the evidence for the pairs of close homologs.
  • pchs.raw: The raw data for the pairs of close homologs.
  • pchs.scored: The scores for the pairs of close homologs.
  • proposed_functions: The functions proposed for each PEG based on similarities to a FigFam.
  • proposed_non_ff_functions: The functions proposed for each PEG that are not similar to FigFams.
  • quality.report:
  • scaffold.map:
  • similarities: The similarities (BLAST hits) to our internal database.
  • similarities.flips: The reversed similarities to our internal database.
  • similarities.flips.index: An index of the reversed similarities to our internal database.
  • similarities.index: An index of the similarities to our internal database.
  • split_contigs:
  • unformatted_contigs: The contigs submitted to the RAST annotation system.
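
Because the SEED organism directory is distributed as a tar-ball and contains ~ backup copies, it can be handy to list just the primary files. The sketch below uses Python's standard `tarfile` module. It builds a tiny in-memory archive with placeholder contents so that it runs on its own; the helper names are ours and the demo archive is not real site data.

```python
import io
import tarfile


def list_primary_files(tar: tarfile.TarFile) -> list:
    """Names of regular files in the archive, skipping '~' backup copies."""
    return sorted(
        member.name
        for member in tar.getmembers()
        if member.isfile() and not member.name.endswith("~")
    )


def demo_archive() -> bytes:
    """Build a small in-memory tar archive with placeholder file contents."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("GENOME", "GENOME~", "Features/peg/tbl"):
            data = b"placeholder\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


with tarfile.open(fileobj=io.BytesIO(demo_archive()), mode="r") as tar:
    print(list_primary_files(tar))  # ['Features/peg/tbl', 'GENOME']
```

For a downloaded tar-ball, open it with `tarfile.open(path)` instead, where `path` is wherever you saved the file.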

 

Organism | EMBL | EMBL without EC numbers | GenBank | GenBank without EC numbers | GFF | GFF without EC numbers | GTF | GTF without EC numbers | SEED Organism Directory
Haloarcula californiae ATCC 33799 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloarcula sinaiiensis ATCC 33800 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloarcula vallismortis ATCC 29715 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloferax denitrificans ATCC 35960 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloferax mediterranei ATCC 33500 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloferax mucosum ATCC BAA-1512 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloferax sulfurifontis ATCC BAA-897 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory
Haloferax volcanii ATCC 29605 | EMBL | EMBL (No ECs) | GenBank | GenBank (No ECs) | GFF | GFF (No ECs) | GTF | GTF (No ECs) | SEED Organism Directory

Physical and Chemical Growth Data

We are collecting sources of the physical and chemical data for these strains. As we characterize them further we will update the data here.

 

Data from Allen MA et al. 2008. Haloferax elongans sp. nov. and Haloferax mucosum sp. nov., isolated from microbial mats from Hamelin Pool, Shark Bay, Australia. Int J Syst Evol Microbiol 58: 798-802.

You can download this data as a text file.

 

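Strain or species | Pigmentation | Motility | NaCl range (M) | NaCl optimum (M) | Minimum Mg2+ | Temp range (°C) | Temp optimum (°C) | pH range | Generation time (hr) | Oxidase test | H2S formation from thiosulfate | Hydrolysis of Gelatin | Hydrolysis of Casein | Hydrolysis of Starch | Hydrolysis of Tween 80 | Acid production on Mannose | Acid production on Galactose | Acid production on Xylose | Acid production on Sucrose | Rifampicin Resistant | Bacitracin Resistant | DNA G+C content (mol%) | DNA-DNA reassociation with SA5 | DNA-DNA reassociation with PA12
SA5 | red | rotating | 1.7-5.1 | 2.6-3.4 | 0.2 | 30-55 | 53 | 7.0-9.0 | 0.53 (at 53°C) | ± | - | + | + | + | + | - | - | - | + | + | + | 61.4 | 100 | 30
PA12 | pink-red | - | 1.7-5.1 | 2.6-3.4 | 0.2 | 23-55 | 42-53 | 6.0-10 | 0.96 (at 48°C) | - | - | + | + | - | - | - | - | - | + | + | + | 60.8 | 18 | 100
Hfx. mediterranei | pink | + | 1.3-4.7 | 2.9 | 0.02 | 25-45 | 35-37 | ND | 1.2 | + | - | + | + | + | + | + | ND | + | + | - | - | 60 | 20 (19) | 14 (15)
Hfx. volcanii | red-orange | rotating | 1.0-4.5 | 1.7-2.5 | 0.02 | ND | 45 | ND | 1.83 | + | + | - | - | - | - | - | + | + | + | ND | ND | 63.4 | 29 (18) | 29 (17)
Hfx. denitrificans | pink | + | 1.8-5.1 | 4.3 | ND | 10.0-40.0 | 37 | 5.0-9.0 | ND | + | + | - | - | - | + | ND | - | + | - | ND | ND | 64.5 | 20 (27) | 22 (16)
Hfx. gibbonsii | orange-red | + | 1.5-5.2 | 2.5-4.3 | 0.2 | 25-55 | 35-40 | 5.0-8.0 | ND | + | + | + | + | - | + | + | + | + | + | - | - | 61.8 | 25 (23) | 30 (20)
Hfx. sulfurifontis | salmon pink | - | 1.0-5.2 | 2.1-2.6 | 0.001 | 18-50 | 32-37 | 4.5-9.0 | ND | + | + | + | - | - | + | - | + | + | + | - | + | 60.5 | 25 (33) | 19 (21)
Hfx. lucentense | Pink | + | 1.8-5.1 | 4.3 | ND | 10.0-40.0 | 37 | 5.0-9 | ND | + | + | - | - | - | + | ND | - | + | - | ND | ND | 64.5 | 20 (27) | 22 (16)
Hfx. alexandrinus | Red | - | 1.7-5.2 | 4.3 | 0.33 | 20-55 | 37 | 5.5-7.5 | ND | + | + | + | - | - | + | - | - | + | + | + | - | 59.5 | 24 (19) | 24 (32)
Hfx. prahovense | Beige-orange | - | 2.5-5.2 | 3.5 | ND | 23-51 | 38-48 | 6.0-8.5 | ND | + | + | - | - | + | + | ND | - | - | - | - | + | 63.7 | 20 (25) | 21 (15)
Hfx. larsenii | Orange-red | + | 1.0-4.8 | 2.2-3.4 | 0.005 | 25-55 | 42-45 | 6.0-8.5 | ND | + | + | + | - | + | + | - | - | - | weakly + | - | - | 62.2 | 22 (19) | 24 (22)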

 

Data from Elshahed MS et al. 2004. Haloferax sulfurifontis sp. nov., a halophilic archaeon isolated from a sulfide- and sulfur-rich spring. Int J Syst Evol Microbiol 54: 2275-2279.

 

You can download all this data as tab-separated text.

Species | Motility | NaCl range (M) | NaCl optimum (M) | Cell stability (M NaCl) | Temp range (°C) | Temp optimum (°C) | Optimum pH | Anaerobic nitrate reduction | Indole production | H2S formation from thiosulfate | Hydrolysis of Gelatin | Hydrolysis of Casein | Hydrolysis of Starch | Hydrolysis of Tween 80 | Rifampicin Resistant | DNA G+C content (mol%) | DNA-DNA hybridization values
H. mediterranei | + | 1.3-4.7 | 2.9 | 0.5 | 20-55 | 40 | 6.5 | + | + | - | + | + | + | + | - | 59.1-62.2 | 4
H. volcanii | - | 1.0-4.5 | 2.5-4.3 | 0.5 | ND | 40 | 7 | - | + | + | - | - | - | - | - | 63.4 | 21
H. denitrificans | - | 1.5-4.5 | 2.0-3.0 | 1.5 | 30-55 | 50 | 6.0-7.0 | + | - | + | + | - | - | - | - | 64.2 | 1
H. gibbonsii | - | 1.5-5.2 | 2.5-4.3 | 0.5-0.7 | 25-55 | 35-40 | 6.5-7.0 | - | + | + | + | + | - | + | - | 61.8 | 24
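
Many of the values in these tables are ranges written as "low-high", single numbers, or ND (not determined). If you want to work with them numerically, for example to ask whether a strain is reported to grow at a given NaCl concentration, a small parser helps. This is a minimal sketch; the function names are ours, and it assumes values are non-negative and use a plain hyphen as the range separator.

```python
from typing import Optional, Tuple


def parse_range(text: str) -> Optional[Tuple[float, float]]:
    """Parse '1.0-4.5' into (1.0, 4.5), '2.9' into (2.9, 2.9), 'ND' into None."""
    text = text.strip()
    if text == "ND":
        return None
    low, sep, high = text.partition("-")
    if not sep:
        value = float(low)
        return (value, value)  # a single value is treated as a point range
    return (float(low), float(high))


def within(range_text: str, value: float) -> bool:
    """True if value lies inside the reported range; False if outside or ND."""
    parsed = parse_range(range_text)
    return parsed is not None and parsed[0] <= value <= parsed[1]


# NaCl range reported for H. volcanii in the table: 1.0-4.5 M.
print(within("1.0-4.5", 2.5))  # True
print(within("1.0-4.5", 5.0))  # False
```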

References

  1. Allen MA et al. 2008. Haloferax elongans sp. nov. and Haloferax mucosum sp. nov., isolated from microbial mats from Hamelin Pool, Shark Bay, Australia. Int J Syst Evol Microbiol 58: 798-802.
  2. Elshahed MS et al. 2004. Haloferax sulfurifontis sp. nov., a halophilic archaeon isolated from a sulfide- and sulfur-rich spring. Int J Syst Evol Microbiol 54: 2275-2279.
  3. Rodriguez-Valera F, Ruiz-Berraquero F, Ramos-Cormenzana A. 1980. Isolation of Extremely Halophilic Bacteria Able to Grow in Defined Inorganic Media with Single Carbon Sources. J Gen Microbiol 119: 535-538.
  4. Torreblanca M, Rodriguez-Valera F, Juez G, Ventosa A, Kamekura M, Kates M. 1986. Classification of non-alkaliphilic halobacteria based on numerical taxonomy and polar lipid composition, and description of Haloarcula gen. nov. and Haloferax gen. nov. Syst Appl Microbiol 8: 89-99.