Feature: In Search of the $1000 Genome

MONDAY, 3 MAY 2010

In 2001, before the first human genome sequence was published, hundreds of scientists were already meeting to discuss the future of genomic research. Included in their vision was the dream of sequencing a human genome for $1,000 or less. In less than a decade, this idea, which seemed almost fictional at the time, has become a question of when, not if, it will be achieved.

In 2007, Nature asked dozens of leading scientists what they would do if it became possible to sequence a genome for $1,000. Their answers reveal the diversity of research that inexpensive genome sequencing will revolutionise, and the benefits are wide-ranging: from tracing human origins and understanding dog behaviour to generating better strains of rice. Beyond research, a low-cost genome sequence would bring personal genome sequencing into the price range of medical testing. This could pave the way for personalised medicine, in which cancer treatments are targeted to specific gene changes and drugs are prescribed according to an individual’s genotype. As Francis Collins, a leader of the original Human Genome Project, puts it: “The real question is, what wouldn’t we do?”

The human genome contains all the hereditary information for an individual and is stored on 23 pairs of chromosomes. Sequencing the genome means determining the exact order of the three billion chemical building blocks (called bases) that make up the DNA carrying this information. Because only a short stretch of DNA can be read at a time, an entire genome cannot be sequenced in one go: it must first be split into fragments, which are sequenced individually and then computationally reassembled into the full sequence.
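To make the fragment-and-reassemble idea concrete, here is a minimal greedy sketch in Python that repeatedly merges the pair of fragments with the longest overlap. It assumes error-free reads, no repeated regions and no read contained inside another; the function names and toy reads are hypothetical, and real assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Only overlaps of at least `min_len` bases count; otherwise return 0.
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that everything from that position to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of reads until no overlaps remain.

    Returns the resulting contigs (a single string if assembly succeeds).
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping fragments of the toy sequence ATGCGTACGTTAGC, in shuffled order.
print(greedy_assemble(["CGTACGTT", "ACGTTAGC", "ATGCGTAC"]))  # ['ATGCGTACGTTAGC']
```

Repeated stretches of DNA are what break this simple approach in practice, since a fragment from a repeat overlaps equally well with several others.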

One of the earliest methods of DNA sequencing was the chain-termination method, developed by Fred Sanger and colleagues in 1977. It was costly, at around $10 per base in 1985, but automated sequencing systems and other technical advances reduced the price to $1 per base by 1995 and allowed up to 100,000 bases to be sequenced per day. The cost fell further, to $0.10 per base, in 1998 with the introduction of the ABI Prism sequencer, which made larger-scale sequencing projects possible. This automated form of Sanger sequencing was used to produce the first human genome sequences. Although these technologies continued to be refined, sequencing a whole genome still cost millions of dollars, took months and relied on techniques developed in the 1970s. It was clear that new technologies would be needed if the $1,000 genome was to be achieved.
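The per-base prices above translate into whole-genome costs by simple multiplication. The sketch below assumes a single pass over the roughly three billion bases of one copy of the genome; real projects read each base several times over, so these figures are lower bounds. The names are illustrative.

```python
GENOME_BASES = 3_000_000_000  # approx. bases in one copy of the human genome

# Per-base prices quoted in the article, in US dollars.
PRICE_PER_BASE = {1985: 10.00, 1995: 1.00, 1998: 0.10}

# Throughput of a 1995-era automated sequencer, as quoted in the article.
BASES_PER_DAY_1995 = 100_000


def single_pass_cost(price_per_base: float, bases: int = GENOME_BASES) -> float:
    """Cost of reading every base exactly once (ignores repeated coverage)."""
    return price_per_base * bases


for year, price in PRICE_PER_BASE.items():
    print(f"{year}: ${single_pass_cost(price):,.0f}")

# How long one 1995-era machine would take to read one genome once.
days = GENOME_BASES // BASES_PER_DAY_1995
print(f"{days:,} days (about {days / 365.25:.0f} years) on a single machine")
```

The last figure, around 82 years for one machine, shows why running many sequencing reactions at once became so important.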

The current generation of sequencing technologies (known as next-generation or second-generation sequencing) achieves its gains in speed and cost by performing many sequencing reactions in parallel. The method used is known as sequencing-by-synthesis, in which the DNA sequence is read off as a complementary copy of the molecule is built up. These systems must combine complicated biochemistry with high-resolution imaging, and need enough computing capacity to cope with the huge quantities of data they produce. The cost and speed of sequence generation must also be balanced against the accuracy and length of the sequences obtained. A variety of approaches have been developed to meet these technical challenges.

The first next-generation sequencing technology to become commercially available was the Roche/454 FLX sequencer, which relies on a form of sequencing-by-synthesis known as pyrosequencing. In this system, the DNA sample is fragmented and the fragments are attached to tiny beads, which are captured inside water droplets in an oil-and-water emulsion. Each droplet acts as a minute, self-contained reactor in which the single DNA fragment on its bead is copied up to a million times across the bead's surface. Hundreds of thousands of beads are then placed in picolitre-scale wells on a glass slide (one picolitre is one trillionth of a litre), and each addition of a base to the growing DNA strand is detected as a flash of light. The whole genome sequence of James Watson was produced in two months using an FLX sequencer, at a cost of less than $1 million. In contrast, the whole genome sequence of J. Craig Venter, completed only a few months earlier using traditional Sanger sequencing, was estimated to have cost $100 million. The FLX platform has also been used to sequence ancient DNA, including the genomes of the extinct woolly mammoth and of Neanderthals.
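In pyrosequencing, one type of nucleotide is washed over the wells at a time in a repeating cycle, and the brightness of each flash is roughly proportional to how many bases were added in that step. A minimal decoding sketch follows; the flow order `TACG` and the signal values are assumptions for illustration, and real base-calling software also has to correct for noise.

```python
FLOW_ORDER = "TACG"  # assumed repeating order in which nucleotides are flowed


def decode_flowgram(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Turn per-flow light intensities into the sequence of added bases.

    Each signal is rounded to the nearest whole number of bases, because the
    light emitted is roughly proportional to the number incorporated.
    """
    read = []
    for flow, signal in enumerate(signals):
        nucleotide = flow_order[flow % len(flow_order)]
        read.append(nucleotide * round(signal))
    return "".join(read)


# Hypothetical intensities for eight flows (two cycles of T, A, C, G).
print(decode_flowgram([1.02, 0.05, 2.10, 0.90, 0.00, 0.97, 0.03, 1.88]))  # TCCGAGG
```

Runs of the same base are the weak point of this scheme: telling a signal of 6 from one of 7 is much harder than telling 1 from 0.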

Sequencing technology first developed here in Cambridge, at the Department of Chemistry, forms the basis of the Illumina Genome Analyzer, a leading competitor of the Roche/454 sequencer. Instead of attaching the DNA to beads, the Illumina technology attaches millions of DNA fragments directly to the surface of a glass slide. Fluorescently labelled bases are added to the fragments, and a CCD (charge-coupled device) camera images the whole slide, allowing millions of sequences to be read at once. The Illumina technology was recently used to sequence the giant panda genome and the first cancer genome. Illumina have also launched a service that offers personal genome sequencing to consumers for $48,000 each: an improvement on the $1 million Watson genome, but still a long way from the $1,000 target.
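Each imaging cycle records fluorescence intensities for every DNA cluster on the slide, and the simplest possible base call picks the brightest colour. The sketch below assumes one colour channel per base and uses made-up intensities; it ignores real-world complications such as overlapping dye spectra and clusters drifting out of step, which production software must correct.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel per base, in this order


def call_bases(cycles: list[list[tuple[float, float, float, float]]]) -> list[str]:
    """Build one read per cluster from per-cycle channel intensities.

    cycles[c][k] holds the (A, C, G, T) intensities of cluster k in cycle c.
    """
    n_clusters = len(cycles[0])
    reads = [[] for _ in range(n_clusters)]
    for image in cycles:
        for k, intensities in enumerate(image):
            # Pick the channel with the strongest signal for this cluster.
            brightest = max(range(len(CHANNELS)), key=lambda ch: intensities[ch])
            reads[k].append(CHANNELS[brightest])
    return ["".join(r) for r in reads]


# Hypothetical intensities: three cycles, two clusters.
cycles = [
    [(0.90, 0.10, 0.05, 0.10), (0.10, 0.10, 0.80, 0.20)],
    [(0.10, 0.85, 0.10, 0.10), (0.10, 0.10, 0.10, 0.70)],
    [(0.20, 0.10, 0.10, 0.90), (0.75, 0.20, 0.10, 0.10)],
]
print(call_bases(cycles))  # ['ACT', 'GTA']
```

Because every cluster is read from the same image, the number of reads grows with the number of clusters on the slide rather than with the number of machines.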

It is the upcoming third-generation sequencing technologies that seem most likely to reach the $1,000 genome target. The genome sequencing company Complete Genomics published three human genome sequences in early 2010 for less than $4,400 per individual, but at the expense of accuracy. Its technology has an error rate of one wrong base in every 100,000, which adds up to nearly 60,000 errors across a complete genome (roughly six billion bases, since each person carries two copies of the genome). This could make it difficult to distinguish important disease-causing mutations from sequencing errors.
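The error figure follows from multiplying the error rate by the number of bases read. A back-of-the-envelope sketch, assuming about three billion bases per copy of the genome and two copies per person; the function name is illustrative.

```python
BASES_PER_COPY = 3_000_000_000   # approx. bases in one copy of the human genome
COPIES = 2                       # one copy inherited from each parent
ONE_ERROR_EVERY = 100_000        # error rate quoted for the technology


def expected_errors(bases: int, one_error_every: int = ONE_ERROR_EVERY) -> int:
    """Expected number of miscalled bases at a fixed per-base error rate."""
    return bases // one_error_every


print(expected_errors(BASES_PER_COPY))           # 30000 for a single copy
print(expected_errors(BASES_PER_COPY * COPIES))  # 60000 for both copies
```

Taken on its own, each of these errors could look just like a genuine single-base mutation, which is why the error rate matters so much for medical use.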

Another player in the third-generation sequencing market is Pacific Biosciences, whose SMRT (Single Molecule Real Time) technology monitors DNA synthesis as it happens. This real-time monitoring could increase sequencing speed by up to four orders of magnitude (a factor of 10,000) compared with second-generation technologies. It also allows much longer DNA fragments to be sequenced than second-generation systems, which are limited by the need to stop the reaction each time a base is read.

The most promising of the third-generation technologies is nanopore sequencing. Here the sequence is read as the DNA strand passes through a protein pore: each base partially blocks the pore to a different degree, so the change in the ionic current flowing through the pore reveals which base is passing. This is a potentially revolutionary technology, as it requires none of the expensive enzymes and reagents, or the precision cameras, needed for sequencing-by-synthesis. While the potential improvements in cost and speed are huge, a nanopore sequencer has yet to be brought to market.
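If each base produced its own characteristic current level, reading the sequence would amount to matching each measurement to the nearest known level. The sketch below does exactly that; the current values are invented for illustration, and it assumes the trace has already been split into one measurement per base. Real nanopore signals are noisier, and in real pores the current is influenced by several neighbouring bases at once.

```python
# Hypothetical mean current (in picoamps) while each base sits in the pore.
# Real values depend on the pore protein and experimental conditions.
CURRENT_LEVELS = {"A": 50.0, "C": 40.0, "G": 30.0, "T": 20.0}


def call_nanopore(currents: list[float], levels: dict[str, float] = CURRENT_LEVELS) -> str:
    """Assign each current measurement to the base with the closest level."""
    return "".join(
        min(levels, key=lambda base: abs(levels[base] - current))
        for current in currents
    )


print(call_nanopore([49.1, 21.3, 31.0, 38.7]))  # ATGC
```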

Whoever wins the race to the $1,000 genome, the rewards are great. The Archon X-PRIZE for genomics is offering $10 million to the first team to sequence 100 human genomes in under ten days for under $10,000 each, and eight teams have already signed up. Beyond the financial rewards, whoever produces the $1,000 genome will have achieved a goal that seemed like science fiction only a few years ago, and which could bring about profound changes in all areas of the biological sciences, affecting the lives of people around the world.

Elizabeth Batty is a PhD student in the Department of Pathology