List of interesting questions
Overview
These are things that people in the lab are thinking about or working on. Some of them are open to new students, while others are already in progress. This should give you an idea of the things we do in the Edwards lab.
New
iPhone versus Android
Build an app to display data from RTMg.web on the iPhone and on Android, and compare and contrast the development process on the two platforms.
Google Map App
Build an app to display RTMg.web data on Google Maps, then export it to Android and iPhone. In particular, build a "metagenomes near me" app.
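As a starting point for the "metagenomes near me" idea, here is a minimal sketch of the distance filter such an app might use. It assumes each metagenome record carries a latitude and longitude. The sample names, coordinates, and function names are hypothetical and are not taken from RTMg.web.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def metagenomes_near(samples, lat, lon, radius_km):
    """Return (distance_km, name) pairs within radius_km, nearest first."""
    hits = []
    for name, s_lat, s_lon in samples:
        d = haversine_km(lat, lon, s_lat, s_lon)
        if d <= radius_km:
            hits.append((d, name))
    return sorted(hits)

# Hypothetical sample records: (name, latitude, longitude)
SAMPLES = [
    ("coastal_water", 32.87, -117.25),
    ("lake_sediment", 34.05, -118.24),
    ("coral_reef", 1.87, -157.40),
]

if __name__ == "__main__":
    # The user's position would come from the phone; this one is made up.
    for dist, name in metagenomes_near(SAMPLES, 32.77, -117.07, 50):
        print(f"{name}: {dist:.1f} km")
```

On a phone, the same filter would probably run on the server side so that only nearby samples are sent to the map.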
SEED in GWT
Implement the <a href="http://theseed.uchicago.edu/FIG/index.cgi">SEED</a> in <a href="http://code.google.com/webtoolkit/">Google Web Toolkit</a> using our alpha version <a href="http://edwards.sdsu.edu/org.theseed.servers/">Java Servers</a>.
BLAST as Map/Reduce
Implement the BLAST algorithm in Hadoop.
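To give a feel for how the problem might decompose, here is a rough sketch of only the word-seeding step written as map, shuffle, and reduce functions. It is plain Python that imitates the data flow, not the Hadoop API, and it leaves out hit extension, scoring, and statistics, which a real BLAST needs. The function names and the word size are illustrative assumptions.

```python
from collections import defaultdict

K = 4  # word size; illustrative only (BLAST's nucleotide default is larger)

def map_kmers(source, seq_id, seq):
    """Map step: emit (kmer, (source, seq_id, offset)) for every k-mer."""
    for i in range(len(seq) - K + 1):
        yield seq[i:i + K], (source, seq_id, i)

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_seeds(kmer, values):
    """Reduce step: pair each query hit with each database hit for one k-mer."""
    queries = [v for v in values if v[0] == "query"]
    subjects = [v for v in values if v[0] == "db"]
    for _, q_id, q_off in queries:
        for _, s_id, s_off in subjects:
            yield (q_id, q_off, s_id, s_off, kmer)

def find_seeds(queries, database):
    """Run map, shuffle, and reduce in-process and return sorted seed hits."""
    pairs = []
    for seq_id, seq in queries.items():
        pairs.extend(map_kmers("query", seq_id, seq))
    for seq_id, seq in database.items():
        pairs.extend(map_kmers("db", seq_id, seq))
    seeds = []
    for kmer, values in shuffle(pairs).items():
        seeds.extend(reduce_seeds(kmer, values))
    return sorted(seeds)

if __name__ == "__main__":
    # Toy sequences, made up for the example.
    print(find_seeds({"q1": "ACGTA"}, {"s1": "TTACGT"}))
```

Each seed is a tuple of query id, query offset, subject id, subject offset, and the shared word. A second map/reduce pass could then extend seeds that fall on the same diagonal.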
Old
Biorobots
This is a new project that is only open to undergrad students at the moment. If you are interested in tackling this project, please come talk to me.
More details are available on the Biobots description page.
RPM and BitTorrent for the SEED
The SEED is about 250 GB of data at the moment, and we are interested in ways of exploring that data via peer-to-peer networks. We used to have a mechanism in place, but it fell by the wayside as the volume of data became overwhelming. The idea is to generate an RPM of either the whole thing or each of the individual organism directories (there are currently 2,749 organism directories, so one-by-one may be daunting), and then use BitTorrent and Rocks to maintain an up-to-date remote system.
Globus, Condor, and OGCE
Building on the work to deploy grid computing environments, we will install Globus and the Open Grid Computing Environment (OGCE) on the local clusters. OGCE is a portlet framework that allows rapid prototyping and development of interfaces to parallel computing. The Globus/OGCE framework will be used to develop the next generation Life Sciences Gateway infrastructure, for high performance bioinformatics. This framework will be deployed across the TeraGrid.
The project will follow from the successful deployment of Globus on one or more machines, but will also lead to the deployment of web services to interact with the OGCE interfaces.
Local Grid Computing
The project is to create and maintain a mixed-use, mixed-source grid computing environment for prototyping, development, and testing. The environment will contain a mixture of architectures including PowerPC, Intel x86, Sun SPARC, and others. Each machine will run Linux (probably the latest stable version of Debian or Fedora Core) and appropriate scheduling and distributed resource management software such as SGE, Globus, or Condor. These are industry standards in high performance computing, and each has its benefits and drawbacks. Initial stages of the project involve securing machines, bringing all machines up to specification, and installing operating systems and scheduling software as appropriate. Appropriate bioinformatics software will also need to be installed on each node.
As appropriate, and as the project develops, we will craft new software for job submission, control, and maintenance, such as the Perl Schedule::SGE objects. We often encounter problems with the I/O limitations of the machines and will have to overcome those by appropriate staging and mounting of data.
Building on this work to deploy prototype grid computing environments, we will install the Open Grid Computing Environment (OGCE) on the local clusters. OGCE is a portlet framework that allows rapid prototyping and development of interfaces to parallel computing. The Globus/OGCE framework will be used to develop the next generation Life Sciences Gateway infrastructure, for high performance bioinformatics. This framework will be deployed across the TeraGrid. For more details about this project, please read the statement of work.
This project will involve systems administration and installation, running parallel computing environments, and sharing data between resources. The project will lead to the deployment of web services to interact with the OGCE interfaces.
The HIV/HCV Project
Human immunodeficiency virus (HIV) is a leading health problem worldwide and a major health crisis that scientists are trying to solve. Working with Roland Wolkowicz, we are using bioinformatics to aid with the development of vaccines and drugs. Initially, we have a very simple hypothesis to test: are there any DNA sequences that are common between the HIV genome and the human genome?
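One simple way to start on that hypothesis is to look for exact shared words of a fixed length. The sketch below assumes both sequences fit in memory as strings, which will not hold for a whole human genome without more work. The function names and toy sequences are made up for illustration.

```python
def kmer_positions(seq, k):
    """Index every k-mer in seq by the offsets at which it occurs."""
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index

def shared_kmers(seq_a, seq_b, k):
    """Return {kmer: (offsets in seq_a, offsets in seq_b)} for shared k-mers."""
    idx_a = kmer_positions(seq_a, k)
    idx_b = kmer_positions(seq_b, k)
    return {kmer: (idx_a[kmer], idx_b[kmer])
            for kmer in idx_a.keys() & idx_b.keys()}

if __name__ == "__main__":
    # Toy strings, not real HIV or human sequence.
    print(shared_kmers("GATTACA", "cattag", 4))
```

Short words will match by chance in any large genome, so the interesting choice is a word length long enough that a shared match is unlikely to be random.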
The SEED
The SEED database is a comprehensive database for bioinformatics analysis. It is supplied on several DVDs. This project is to take an unused computer and install Fedora Core, the SEED software, data, and databases, and to ensure all-user access to the data. The SEED standard API provides many commonly used tools and capabilities that can be accessed programmatically. This installation will be the basis of many other projects, and is central to the work in the lab.
Phage SEED
A long-term project in the lab is to understand the sequences of phages, the viruses that infect bacteria. In this work, we will develop and deploy new tools into the SEED database specifically to handle phage genomes. This project will involve significant CGI programming, primarily using Perl, Java, and Python. The aim will be to develop interactive web sites that can be used by the SEED community to add and annotate phage genomes within the complete microbial genomes.
Unified GIS, Photo data
We have data from three commonly used devices: a digital camera, a GPS receiver, and a dive computer. The project is to write an application that takes the three data types as input, merges the data, and writes the result in a suitable output format.
The output should be compatible with file sharing applications (e.g. Picasa), with simple GIS applications (e.g. Google Earth), and with more complex GIS applications like ArcGIS and Geofusion.
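A minimal sketch of the merge step follows, assuming the three devices have synchronized clocks and that each input has already been parsed into time-sorted (timestamp, value) pairs. Parsing the real file formats and writing the output formats are left out. All names and the 60-second tolerance are illustrative assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest(samples, when, max_gap=timedelta(seconds=60)):
    """Return the value from time-sorted (time, value) samples closest
    to `when`, or None if nothing lies within max_gap."""
    if not samples:
        return None
    times = [t for t, _ in samples]
    i = bisect_left(times, when)
    # Only the samples just before and just after `when` can be nearest.
    candidates = samples[max(i - 1, 0):i + 1]
    t, value = min(candidates, key=lambda s: abs(s[0] - when))
    return value if abs(t - when) <= max_gap else None

def merge(photos, gps_track, dive_profile):
    """Attach the nearest GPS position and dive depth to each photo."""
    merged = []
    for taken_at, filename in photos:
        merged.append({
            "file": filename,
            "time": taken_at,
            "position": nearest(gps_track, taken_at),
            "depth_m": nearest(dive_profile, taken_at),
        })
    return merged

if __name__ == "__main__":
    # Made-up readings for the example.
    t0 = datetime(2010, 1, 1, 12, 0, 0)
    gps = [(t0, (32.85, -117.27)),
           (t0 + timedelta(seconds=30), (32.86, -117.28))]
    dive = [(t0 + timedelta(seconds=10), 5.0),
            (t0 + timedelta(seconds=20), 8.5)]
    photos = [(t0 + timedelta(seconds=18), "IMG_0001.jpg")]
    print(merge(photos, gps, dive))
```

In practice the clocks will drift, so a real tool would probably need a per-device time offset before matching.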
Image autoalignment
A collaboration with some telascience people: image autoalignment
Clustering and Metagenomics
Look through metagenomes for genes that lie next to each other.
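As a first pass, one could count how often two genes sit side by side on the same contig. The sketch below assumes each contig has already been annotated and reduced to an ordered list of gene labels. The pair is stored without regard to order because contig orientation is arbitrary. The labels and function name are hypothetical.

```python
from collections import Counter

def adjacent_pairs(contigs):
    """Count unordered pairs of different genes that are direct neighbors."""
    counts = Counter()
    for genes in contigs:
        for left, right in zip(genes, genes[1:]):
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

if __name__ == "__main__":
    # Made-up contigs, each an ordered list of gene labels.
    contigs = [
        ["recA", "lexA", "dnaK"],
        ["lexA", "recA"],
        ["dnaK", "grpE"],
    ]
    for pair, n in adjacent_pairs(contigs).most_common():
        print(pair, n)
```

Pairs that recur across many unrelated metagenomes are the candidates worth clustering further.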
Software Installation
Several major suites of software need installing, editing, updating, and maintaining for the lab. These are perfect one-off projects that you could do to meet people in the lab:
MediaWiki upgrades, updates, maintenance, and development
Version control and support: a Subversion repository for all to use
Lab web sites, email lists ...
