30 June 2000

The long and winding code

It would take decades and cost billions to map the human genome, they said. Tim Radford tells how it was done

Scientists have just finished deciphering the genome – the book of human life. Their DNA blueprint will change everything. The once undreamed-of knowledge has already begun to alter agriculture, forensic science, archaeology, biology and medicine. In the next decades it will fuel new multi-billion-pound industries based on the software of life. It will begin to improve human health and prolong human life. It could even begin to dictate the future, and shape the lives of the as-yet-unborn. It will begin to alter the way humans think about themselves, and all life on the planet.

By being able to “read” the software of inheritance, the argument goes, doctors will for the first time be able to treat the disease rather than the patient. They would know exactly what biochemistry allowed the disorder to happen: they would be able to devise exactly the treatment to correct it, without harmful side effects.

But the completion of the first draft of the book of humanity – the scientists will now go over and over the script for years, filling in missing fragments and correcting misreadings – is just a beginning, not an end. Scientists have before them a book of letters, but they have to learn to read it: they have to begin to identify and decode the secrets of the genes, locked in that enigmatic code of four letters, strung over 23 chromosomes.

The work will take decades. Thousands of genes are already known, but tens of thousands still have to be deciphered. In the course of numbering all the genes that affect obesity, or longevity, or the way limbs form, or the way the brain develops, scientists will change expectations about human life in profound ways, and they will raise huge questions as well as huge hopes.

The knowledge will also power a new industrial revolution, as biotech companies spring up to exploit this once-unimaginable knowledge. Britain has been a major player in the revolution, but when the project began more than a decade ago, the United Kingdom government would not support it. Britain’s contribution was underwritten by the Wellcome Trust, one of the world’s largest medical charities.

When scientists proposed the idea in the 1980s, it took hours of painstaking effort using exquisitely tricky chemistry to “read” even a few thousand bits of DNA. The cost was put at $5 to $10 for each one of the four “letters” in the three-billion-letter human alphabet. When, in the teeth of grumbling from fellow biologists, foot-dragging by politicians in Britain and America and explicit public concern in Germany, the project gathered momentum, the international cost was set at $3-billion in total – $1 for each base pair. Right now the cost has fallen to 10 or 20c.
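The scale of those figures is easier to see when they are multiplied out. The following is a back-of-envelope sketch only: it assumes a round three billion letters and a single pass over each one, and the function name total_cost is invented for illustration.

```python
# Back-of-envelope check of the per-letter cost figures quoted in the article.
# Assumption: a round three billion letters, each read exactly once.
GENOME_LETTERS = 3_000_000_000


def total_cost(dollars_per_letter: float, letters: int = GENOME_LETTERS) -> float:
    """Return the total cost in dollars of reading every letter once."""
    return dollars_per_letter * letters


# 1980s estimate: $5 to $10 per letter.
early_low, early_high = total_cost(5.0), total_cost(10.0)
# Project budget: $1 per base pair.
budgeted = total_cost(1.0)
# Cost at the time of writing: 10 to 20 cents per letter.
current_low, current_high = total_cost(0.10), total_cost(0.20)

for label, low, high in [
    ("1980s estimate", early_low, early_high),
    ("project budget", budgeted, budgeted),
    ("current cost", current_low, current_high),
]:
    print(f"{label}: ${low / 1e9:.1f}bn to ${high / 1e9:.1f}bn")
```

On these assumptions the early estimate comes to tens of billions of dollars, against hundreds of millions at today's per-letter price.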

The first “rough draft” is complete, thanks to astonishing advances in robotics and computing and subtle tricks with electrochemistry.

The challenge was to decipher the entire DNA of one representative human, selected from a group of anonymous donors. But the DNA of one human is an invisible double-stranded molecule arbitrarily chopped up by nature into 46 chromosomes – 23 inherited from the mother, 23 from the father – and folded into impenetrable tangles. It was and still is impossible to read in that form: impossible even to see, let alone manipulate.

So researchers took lengths of human DNA and inserted them into bacteria, and then grew the bacteria, much in the way dairymen “grow” yoghurt, allowing each bacterial specimen to “clone” the same bit of DNA indefinitely, so that it could be preserved in dishes in a human DNA “library”.

But that was the equivalent of simply tearing random paragraphs out of a book, hoping to read them later.

So the first step was to try to make a “map” of the human genome: to establish recognisable genes, or telltale lengths of DNA, at points along all the chromosomes so that those given the challenge of “reading” the sequence would start with a rough idea of where, in a particular chapter, their paragraph might fall. That effort continued throughout the 1990s, as individual groups in the worldwide community of biologists began to identify then locate genes associated with diseases such as muscular dystrophy and cystic fibrosis.

By this time, too, teams within the Human Genome Project, in industry, in the medical charities and in government research laboratories were getting experience with the genetic codes of complex creatures such as yeast, a little member of the mustard family called Arabidopsis, the nematode worm or the fruit fly.

Since DNA is the machinery of evolution, the genes of all creatures would show similarities and make recognition easier.

All the researchers used a technique developed by Frederick Sanger, the Cambridge scientist who deciphered the protein structure of insulin, won a Nobel Prize for it, and then went back to his laboratory to decipher the code of DNA, a feat which earned him another Nobel Prize.

The Sanger method, then done by hand (but now by specially commissioned robots), was simple. You shredded the DNA sample into a large number of lengths, ending up with very short bits, medium lengths and long bits. If you did it often enough, you would end up with bits of every possible length. Then you would tag the end of each with a fluorescent dye, dribble them into a tiny, jelly-filled capillary tube the thickness of a human hair, using electricity to help them settle. The shortest would settle first, then the next shortest and so on until the fragments were in order.
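The logic of that ordering step can be sketched in a few lines of code. This is a toy model, not laboratory software: it assumes each fragment is known only by its length and by the colour of the dye on its final letter, and the colour-to-letter table DYE_TO_BASE and the function read_sequence are invented for illustration.

```python
# Toy model: read a sequence from dye-tagged fragments ordered by length.
# Assumptions: one fragment per length, and a made-up colour-to-letter table.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Rebuild the sequence from (length, dye) pairs given in any order."""
    ordered = sorted(fragments, key=lambda frag: frag[0])  # shortest first
    lengths = [length for length, _ in ordered]
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("need exactly one fragment of every length from 1 upward")
    # The dye on the end of the n-th shortest fragment names the n-th letter.
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Fragments arrive jumbled; sorting by length recovers the order of the letters.
fragments = [(3, "yellow"), (1, "green"), (4, "red"), (2, "blue")]
sequence = read_sequence(fragments)
print(sequence)
```

The check on lengths reflects the point made above: the readout works only if there is a fragment of every possible length.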

When you had enough fragments, all overlapping each other, you could get a computer to match them, rather like reassembling the Bible from fragments that said “Saul” or “Psalm” or “Solomon”.
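A minimal sketch of that matching step, assuming error-free fragments and a simple greedy strategy (always join the two pieces with the longest overlap), might look like the following. The function names and the example fragments are invented for illustration; real assembly software must also cope with misreadings and with the long repeated stretches found in human DNA.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments, min_len: int = 3):
    """Greedily merge the best-overlapping pair until no overlaps remain.

    Returns a list of assembled pieces; more than one piece means gaps remain.
    """
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Three overlapping scraps of a headline, supplied out of order.
scraps = ["WINDINGCODE", "THELONGAND", "NGANDWINDI"]
print(assemble(scraps))
```

If two scraps share no overlap of at least min_len letters they are left as separate pieces, which is the computational equivalent of a gap in the draft.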

But the process also has to be fast. Reading the base-pair sequence of a human being out loud – at, say, five letters a second – would take about 20 sleepless years. The original Sanger method could deliver thousands of letters a day, but the project demanded millions.
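The arithmetic behind those figures is simple enough to check. The sketch below assumes a round three billion letters; the two daily rates (10,000 and 1,000,000 letters) are illustrative stand-ins for "thousands" and "millions", not figures from the project.

```python
# Rough check of the reading-time and throughput figures.
GENOME_LETTERS = 3_000_000_000
LETTERS_PER_SECOND = 5
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Reading aloud without a break at five letters a second.
years_reading_aloud = GENOME_LETTERS / LETTERS_PER_SECOND / SECONDS_PER_YEAR


def years_to_sequence(letters_per_day: float, letters: int = GENOME_LETTERS) -> float:
    """Years needed to read every letter once at a steady daily rate."""
    return letters / letters_per_day / 365.25


# Illustrative rates only: stand-ins for "thousands" and "millions" a day.
by_hand = years_to_sequence(10_000)
by_robot = years_to_sequence(1_000_000)

print(f"reading aloud: {years_reading_aloud:.1f} years")
print(f"at 10,000 letters a day: {by_hand:.0f} years")
print(f"at 1,000,000 letters a day: {by_robot:.1f} years")
```

At the slower assumed rate a single pass would take centuries; at the faster one it shrinks to a matter of years, which is why the move from hand work to automation mattered.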

So what began as a meticulous matter of test tubes and syringes and white-coated laboratory staff became rooms full of robots: St Louis, Missouri; Cambridge, England; Cambridge, Massachusetts or Rockville, Maryland. Biology, invented by gentlemen-amateurs, became big science, demanding engineering skills and computing power far greater than needed for the Apollo programme. Although the researchers have blown a whistle, posted their latest results and declared a milestone passed, the work isn’t complete.

“There is roughly 10% we haven’t sequenced,” says Don Powell, a molecular biologist at the Sanger Centre in Cambridge. “The attitude in the Human Genome Project is that we are putting this data out, it is only going to be done once and it absolutely has to be done properly. Everybody is agreed you need to do everything tenfold in order to get yourself down to 99,99% accuracy. We will achieve better than that. Because the genome is only going to be sequenced once to this level of completion – no other organisation is going to do all this finishing and all the work to plug all the holes we can – it has to be done properly.”