A Second Step: What to do with the Oncoming Data Onslaught?

by Bradley Miller on August 31, 2009

In my previous post, I talked about the oncoming DNA sequencing movement.  With rapid, cost-effective DNA sequencing beginning to come of age, there will be an additional “problem” to deal with: the massive amount of data that sequencing will generate, and how to store, manipulate, analyze, compare and understand it.  Most importantly, there is the question of how solutions to that problem will enable research and help create biomedical breakthroughs.

As technologies like those from Pacific Biosciences and Complete Genomics come on the scene, they will produce prodigious amounts of data.  And while Complete Genomics has a business model in which it acts as a service and compiles the genome sequence data for you, once you have that sequence, what do you do with it?

Today there are tools like BLAST and other open source solutions that researchers can use, but BLAST is limited in its capabilities, and other tools require extensive computer science (CS) collaboration and costly servers and set-up.  At anywhere from 250GB to 4TB, complete genome sequences will be massive data files, not just to store, but to compare and contrast as well.  The upcoming computational needs will be massive: to perform complete genome analysis and comparison, a lab would need a large computer array and in-depth algorithmic knowledge.  What good will it be to create such a mass of data without adequate computational and informatics resources?

There are a couple of solutions to this problem, one of which is on-demand/cloud computing for this type of computation.  One company, DNAnexus, has developed such a solution, and a group in Seattle named Sage Bionetworks appears to be aiming to create complex and chaotic models of human disease that include genetic information.  These efforts and more will be essential as DNA sequencing becomes cheaper and cheaper, because such companies and technologies will provide an economical platform for deep and insightful bioinformatic research.  This continued market segmentation (between sequencing companies and the informatics ‘back end’) will be essential as costs are driven lower while the expertise required and the needs of researchers increase.  Not only that, but these companies will enable more and more scientists to include deep genome analysis in their research.  For example, my sister-in-law will be pursuing a PhD in evolutionary biology, most likely comparing genetic variation in regional habitats.  The cost-effectiveness of this technology will enable graduate researchers like her to do quality, in-depth research and truly add to their fields.  These platforms will enable the research that drives tomorrow’s breakthroughs.
