Author Archives: Daniel Cuevas

An Introduction

Hello Lab! My name is Daniel Cuevas, and I wanted to introduce myself here since I am joining the group this week. Hopefully this gives everyone a quick insight into my background.
I am joining the lab as I begin my studies toward a Master’s degree in the BMI program here at SDSU. This is not my first time working with Rob Edwards; I spent two years working in Rob’s lab while studying for my undergraduate degree in Computer Science. My knowledge of general biology truly began when my academic career converged on the field of bioinformatics. My work in the lab was largely introductory: Android cell phone software development, web application development with JavaScript, algorithm development in Java, and some exposure to bioinformatics tools, like BLAST and assembly tools.
After graduating, I began an internship at Life Technologies, where I received a great deal of training in wet-lab experiments as a molecular biologist. I spent a month working with the SOLiD sequencing platform. This covered the entire DNA library preparation protocol: library fragmentation and preparation, primer ligation, ePCR emulsions, library enrichment, and finally setting up the sequencing instruments to process my DNA samples. Afterwards, I also ran their genome-mapping software to perform various analyses on the sequenced samples, such as coverage analysis.
In a month’s time, I was absorbed into the newly acquired Ion Torrent company and continued my mol-bio internship there. I kept performing benchwork duties and running the PGM sequencers, but I also began to explore the analysis processes that were run on the sequence data. I gradually became involved in the software and bioinformatics side of the pipeline and worked alongside the development group there. After my internship ended, I took a position with the software and data analysis team, eventually spending a total of one year with Life Tech/Ion Torrent.
Working in a fast-moving environment forced me to learn new scripting languages like Perl and Python very quickly. I also made the effort to expand my web programming knowledge so I could develop a variety of web tools for our software and mol-bio teams. My experience with JavaScript and jQuery grew tremendously, and I picked up the back-end language PHP along the way. I worked alongside mol-bio scientists daily and frequently collaborated with our sites across the United States. Data analysis made up the other half of my workload at Ion, where we focused on analyses showing how the mol-bio groups could improve sequencing quality through biology and chemistry.
While working with Ion Torrent, I quickly saw areas where my bioinformatics and analytical skills fell a bit short, areas where more knowledge and a stronger educational foundation could increase my value as a scientist and my contributions to the R&D teams. This became one of many reasons why I chose to return to academia to pursue a higher degree in a biological/computational area. I needed to reinforce my analysis skills and get training in how to look at data and how to ask the questions that pinpoint problems and move this field forward, whether that means improving sequencing, optimizing algorithms, enhancing cancer-detection software, or even developing novel methods that could change the processes in place now. This brings me here, on the path to a higher degree and stronger abilities. I hope to become a strong contributor to this group of up-and-coming scientists and a strong scientist myself.


Next-Generation Sequencing Methods: A Summary

As a computer scientist entering the bioinformatics field, I have always been interested in learning more about DNA sequencing and how biologists actually perform it. I learned about a few earlier methods, like Sanger sequencing, in my molecular biology course, but these newer “next-generation” methods kept coming up in conversation. So I did a bit of research and found a couple of papers that summarize the more prominent next-generation methods. If you’re in the position I was in a while back, reading these may give you more insight into these new sequencing methods. Note: you might need some biology knowledge to understand some of the terms and concepts described here.

Mardis, Elaine R. “Next-Generation DNA Sequencing Methods.” Annual Review of Genomics and Human Genetics. 2008.

Ansorge, Wilhelm J. “Next-generation DNA sequencing techniques.” New Biotechnology. April 2009.

Both papers begin with a brief history of DNA sequencing and then cover the three primary next-generation platforms:

  1. Roche/454 FLX Pyrosequencer
  2. Illumina (Solexa) Genome Analyzer
  3. Applied Biosystems SOLiD Sequencer

The Helicos HeliScope platform is briefly described in both as well.

I feel the Mardis paper has much nicer figures that complement its descriptions of the different platforms, but both give really great descriptions overall. Both also describe current applications of these next-generation techniques and applications still to come.

Project Update: Multi-threading or Cluster Computing?

Recently, I’ve been facing a problem: my metagenome comparator program runs too slowly. The main reason is that it performs the same operations many times in a loop. These operations include reading lines from text, creating objects, inserting those objects into a data structure, retrieving those objects from the data structure, and writing the data structures to disk (to name a few). The natural suggestion for someone in my position is to parallelize it all, and that’s exactly what I want to do. However, I’ve never written a parallel application, so I need to do some learning and research into parallel programming. (More of my ramblings after the Read More break)
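As a starting point for that learning, here is a minimal sketch of one common Java approach to the multi-threading option: hand independent pieces of the loop to a fixed-size thread pool (java.util.concurrent.ExecutorService) and collect results in a thread-safe map. It assumes each sequence can be processed independently (here, counting length-k substrings); the class and method names are hypothetical and not taken from the comparator itself. Cluster computing would instead split the input across machines, which this does not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelKmerCount {

    // Counts every length-k substring across all sequences using a fixed-size thread pool.
    // Each task processes one sequence and merges its counts into a shared ConcurrentHashMap.
    static Map<String, Integer> countKmers(List<String> sequences, int k, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String seq : sequences) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i + k <= seq.length(); i++) {
                        // ConcurrentHashMap.merge updates a single key atomically
                        counts.merge(seq.substring(i, i + k), 1, Integer::sum);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // block until each task finishes; rethrows any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> seqs = List.of("ACGTACGT", "CGTACG", "TTTT");
        Map<String, Integer> counts = countKmers(seqs, 4, 4);
        System.out.println(counts.get("ACGT")); // prints 2
    }
}
```

One design note: every thread writing to a single shared map can itself become a bottleneck. Giving each task its own local map and combining them at the end is a common alternative.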


Screen in Unix/Linux

Ever since I started using the servers more often to write and run code, I’ve been keeping my eyes and ears open for new, cool, and better tools that make programming and working on the command line easier. Recently, the screen command has come up in conversation, so I thought it’d be a good idea to check it out. It turns out to be a very helpful and powerful tool for doing multiple things at once, because it lets you open multiple windows, or “screens,” in a single session. You can dedicate each screen to a specific task, e.g. running a Java application in one screen while editing a Perl script in another. Here’s a tutorial link I found that helped me learn more about screen.


Metagenome Sequence Matcher

Metagenome analysis spans a wide range of methods and tools in the bioinformatics community. These tools give scientists biological information about a sequenced environmental sample, specifically the genetic functions encoded in the DNA of the sampled metagenome. Most of these tools were developed to compare a single metagenome file against databases of sequences and annotation data.

This project performs a comparative analysis across multiple metagenomic FASTA files. By loading the n-length pieces of the sequences from one file into a hash table, sequences from the other metagenome files can be compared against it quickly and precisely. Finding similar sequences and structures among numerous metagenomes can give insight into which biological functions are shared between related and unrelated organisms.
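Here is a minimal sketch of that idea in Java, assuming the “n-length pieces” are overlapping substrings of length n and that only exact matches count. A HashSet stands in for the hash table, FASTA is parsed from a string to keep the example self-contained, and all class and method names are hypothetical rather than the project’s actual code.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class NmerMatcher {

    // Parses FASTA text into header -> sequence (header without the leading '>').
    static Map<String, String> parseFasta(String fasta) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : fasta.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (header != null) records.put(header, seq.toString());
                header = line.substring(1).trim();
                seq.setLength(0);
            } else {
                seq.append(line.toUpperCase());
            }
        }
        if (header != null) records.put(header, seq.toString());
        return records;
    }

    // Indexes every length-n substring of the reference sequences in a hash set.
    static Set<String> buildIndex(Map<String, String> reference, int n) {
        Set<String> index = new HashSet<>();
        for (String s : reference.values()) {
            for (int i = 0; i + n <= s.length(); i++) {
                index.add(s.substring(i, i + n));
            }
        }
        return index;
    }

    // For each query sequence, counts how many of its n-mers also occur in the index.
    static Map<String, Integer> countShared(Map<String, String> query, Set<String> index, int n) {
        Map<String, Integer> shared = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : query.entrySet()) {
            String s = e.getValue();
            int hits = 0;
            for (int i = 0; i + n <= s.length(); i++) {
                if (index.contains(s.substring(i, i + n))) hits++;
            }
            shared.put(e.getKey(), hits);
        }
        return shared;
    }

    public static void main(String[] args) {
        String refFasta = ">r1\nACGTACGTTT\n>r2\nGGGCCC\n";
        String qryFasta = ">q1\nTACGTT\n>q2\nAAAAAA\n";
        int n = 4;
        Set<String> index = buildIndex(parseFasta(refFasta), n);
        System.out.println(countShared(parseFasta(qryFasta), index, n)); // prints {q1=3, q2=0}
    }
}
```

Each lookup is a constant-time hash probe on average, which is what makes comparing many files against one index fast. The trade-off is memory: for real metagenomes the index of every n-mer can grow very large.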

Increasing Heap Size in Eclipse

The other day I ran out of heap space while running my program in Eclipse. It turns out that Eclipse on my Mac at home allocates less heap space to my programs than it does on my work laptop, so the program crashed at home but not on the laptop. When you are dealing with a huge amount of data and objects in a large hash array of trees, heap space can run out pretty quickly. After digging around a bit on Google, I found two simple solutions that kept coming up. Click here or the Read More link for the solutions…
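A quick way to see how much heap a particular machine or Eclipse run configuration actually gives a program is to ask the running JVM. This minimal sketch prints the values reported by java.lang.Runtime; the class name is hypothetical. For reference, the standard -Xmx JVM option (for example -Xmx2g) sets the maximum heap that maxMemory() reports.

```java
public class HeapCheck {

    // Converts a byte count to mebibytes for readability.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use (bounded by -Xmx if set)
        System.out.println("Max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of that reservation
        System.out.println("Total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it once at home and once on the laptop shows the difference directly. Note that maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.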


New Features for OS Metagenomics

Here are some of the new features that are now included in OS Metagenomics:

Saving/Loading Your Data

This new feature is more of an infrastructure change under the hood: it lets the app make better use of persistent data and expands what we can do with it. Previously, saving your data in orkut meant saving a JSON-formatted object of this form:

{ userId :
    { "savedResults" :
        [ "title 1", "title 2", "title 3", ... ]
    }
}

This only let the application retrieve the titles of the saved data and nothing else. Later, we came up with the idea of letting the application retrieve the actual data from the list of titles shown in the Your Data section. To do that, I changed the JSON-formatted object to store each phone number with its respective titles, like this:

{ userId :
    { "savedResults" :
        { phoneNumber 1 :
              [ "title 1", "title 2", "title 3", ..., "title 9" ],
          phoneNumber 2 :
              [ "title 10", "title 11", ... ],
          phoneNumber 3 :
              ...
        }
    }
}

This lets users retrieve their data much more quickly. Instead of filling in the input fields, users who saved their data into orkut can now pick their previously saved data from a drop-down list in the Your Data section. After the user chooses a title, the app looks up the “savedResults” object in its internal application data, parses the JSON string, finds the chosen title, and remembers the phone number associated with it. The app then sends a request to our server to retrieve that data, which amounts to filling in the form and loading the data for you.
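The lookup step can be sketched in Java roughly as follows. This assumes the JSON string has already been parsed into a map from phone number to titles; the placeholder keys simply mirror the structure above, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SavedResultsLookup {

    // Scans phoneNumber -> titles and returns the phone number whose list contains the title.
    static Optional<String> findPhoneNumber(Map<String, List<String>> savedResults, String title) {
        for (Map.Entry<String, List<String>> entry : savedResults.entrySet()) {
            if (entry.getValue().contains(title)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // Alternative: invert once into title -> phoneNumber for constant-time lookups afterwards.
    // If a title appears under several phone numbers, the first one encountered is kept.
    static Map<String, String> invert(Map<String, List<String>> savedResults) {
        Map<String, String> byTitle = new HashMap<>();
        savedResults.forEach((phone, titles) -> titles.forEach(t -> byTitle.putIfAbsent(t, phone)));
        return byTitle;
    }

    public static void main(String[] args) {
        Map<String, List<String>> savedResults = new LinkedHashMap<>();
        savedResults.put("phoneNumber 1", List.of("title 1", "title 2", "title 3"));
        savedResults.put("phoneNumber 2", List.of("title 10", "title 11"));

        System.out.println(findPhoneNumber(savedResults, "title 10").orElse("not found")); // prints phoneNumber 2
        System.out.println(invert(savedResults).get("title 2")); // prints phoneNumber 1
    }
}
```

A linear scan is fine for a handful of saved titles; inverting the map once pays off if the list of saved results grows.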


The Friends section has been successfully added to the OS Metagenomics application. With this new section, a user can click the View Friends Data button to retrieve a set of drop-down lists, one per friend, labeled with that friend’s name. Each list contains the data the friend saved into orkut, not the data they saved onto the server. So, if you want your friends or colleagues to see which metagenomes you now have data for, just save the data into orkut while viewing it, and your friends will see it in their Friends section. There is also a Request Data button next to the drop-down list; clicking it is meant to send the friend a message asking for the data chosen in the drop-down list. For some reason, orkut is not actually letting me send this message. Every time I try, a message appears at the top of the application screen saying “You have temporarily been disallowed from performing this action. Please try again after some time.” This may be disabled during an application’s development phase, but it seems that orkut knows I’m trying to send an email, which is a good thing :)

Saving your data on PMDS

The holidays are over, and now it’s time to make progress! With that in mind, I dedicated my work yesterday to understanding how to use persistent data in OpenSocial apps. Before that, I had spent some time researching how to store data in the reserved space orkut sets aside for individual apps, so that users can store and retrieve data. I found a few methods that I thought I should try implementing in my app, PMDS (Personal Metagenome Data Storage). In doing so I ran into a few very confusing problems, but I found solutions, and PMDS can now save data into the app itself. This opens up the possibility of easily displaying, sharing, storing, and updating information and data. The implementation isn’t exactly how I want it to be yet: right now it only saves the titles of the objects stored on our server and displays them in a list, but what’s important is that we can now save whatever data we want. If you’re interested in the JavaScript code I have for the app, click here or the Read More link.



“Blame genetics for bad driving, study finds” article

A funny and interesting short article from CNN that I stumbled upon:

It makes me think of what Rob has been saying about insurance companies denying people whose DNA sequence shows a higher probability of developing medical problems. Looks like it might not be limited to medical insurance companies!

BioJava using SEED Web Services

For a few days I’ve been testing the waters with the SEED Web Services methods, trying to get a feel for the different ways they can be used, and doing the same with BioJava. After acquiring enough knowledge of sequences, pegs, and other concepts, I was finally able to get the two to use each other’s methods to display and store information for anyone to use. Here is a really simple example in which the Web Services take a genome ID, “83333.1”, and a region on the chromosome, “NC_000913_1303788_1304792”, and return a sequence. Then, BioJava uses its createDNA() method to convert the sequence into an object it can manipulate. The code then prints out different properties of the sequence. Click Read More for the code.
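The actual code is behind the Read More link and uses BioJava; the sketch below does not reproduce it and calls neither BioJava nor the SEED Web Services. It only illustrates, with the Java standard library, two small pieces of the workflow: reading the region string, assuming it follows a contig_start_stop convention (contig NC_000913, start 1303788, stop 1304792), and printing simple properties of a sequence. The sequence here is a hypothetical placeholder, not the one the web service returns, and all names are hypothetical.

```java
import java.util.Locale;

public class RegionInfo {

    // Hypothetical holder for a region written as contig_start_stop.
    static final class Region {
        final String contig;
        final long start;
        final long stop;

        Region(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }

        // Inclusive length; abs() allows stop < start (assumed to mean the reverse strand).
        long length() {
            return Math.abs(stop - start) + 1;
        }
    }

    // Splits on the last two underscores, because the contig name may itself contain one.
    static Region parseRegion(String location) {
        int stopSep = location.lastIndexOf('_');
        int startSep = location.lastIndexOf('_', stopSep - 1);
        if (startSep <= 0) {
            throw new IllegalArgumentException("Unexpected region format: " + location);
        }
        return new Region(
                location.substring(0, startSep),
                Long.parseLong(location.substring(startSep + 1, stopSep)),
                Long.parseLong(location.substring(stopSep + 1)));
    }

    // Fraction of G and C bases in a DNA string, ignoring case.
    static double gcContent(String dna) {
        if (dna.isEmpty()) {
            return 0.0;
        }
        long gc = dna.toUpperCase(Locale.ROOT).chars().filter(c -> c == 'G' || c == 'C').count();
        return (double) gc / dna.length();
    }

    public static void main(String[] args) {
        Region r = parseRegion("NC_000913_1303788_1304792");
        // prints: NC_000913 1303788-1304792 (1005 bp)
        System.out.println(r.contig + " " + r.start + "-" + r.stop + " (" + r.length() + " bp)");

        String placeholder = "ATGCGCAT"; // stand-in for the sequence the web service would return
        System.out.println("Length: " + placeholder.length() + ", GC content: " + gcContent(placeholder));
    }
}
```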


