Besides potentially revealing the underlying structure of the universe — including, perhaps, the elusive Higgs boson, the nature of dark energy and whether superstrings exist — the Large Hadron Collider (LHC) project will show us how to manipulate vast amounts of data in precisely the way that we will have to in the coming decades as we try to do more complex, distributed computing.
After all, lots of people know that the world wide web came out of Cern, which is home once again to a particle collider project. But not many people know why the web was invented there. It was, lest we forget, because in 1989 Tim Berners-Lee, while working there, needed a manageable way to support the documentation required for Cern operations.
Big servers with long directories didn’t do the job, nor did Gopher or Archie, early tools for finding files and documents on the internet long before Google was a glint in anyone’s eye. So Berners-Lee implemented a hypertext system that would let people dig as deep as they needed to in the web of documents for the collaborative project.
And today we have a world wide web of documents and pages; arguably, every penny invested by our governments in Cern has paid off — just not in particle physics. Instead it’s been repaid through the utility of the web and the productivity enhancements and economic growth it has enabled.
Now the LHC experiments are taking the next step. Once the collider is tuned and working, they will generate two gigabytes of data every 10 seconds and will need to store 2.2 petabytes (a petabyte is a million gigabytes) of fresh data every year. Google processes 20 petabytes of data a day, mostly through its MapReduce system, according to the High Scalability blog (bit.ly/lhc01). But processing data is rather different from permanently adding such vast amounts of new data, as Cern is doing.
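The figures quoted in the paragraph above are easier to compare side by side. The sketch below is only a back-of-envelope calculation under stated assumptions: decimal units (1 PB = 1,000,000 GB), a 365-day year, and the yearly storage figure read as 2.2 PB. The "non-stop" number is a hypothetical upper bound, not a claim about how the LHC actually runs.

```python
"""Back-of-envelope comparison of the data volumes quoted for the LHC and Google.

Assumptions (not Cern's): decimal SI units, a 365-day year, and the
yearly storage figure read as 2.2 PB.
"""

GB_PER_TB = 1_000
GB_PER_PB = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def data_scales(burst_gb=2.0, burst_seconds=10.0,
                stored_pb_per_year=2.2, google_pb_per_day=20.0):
    """Return derived rates for the quoted LHC and Google figures."""
    rate_gb_per_s = burst_gb / burst_seconds
    return {
        # Output rate while the experiments are producing data.
        "lhc_rate_mb_per_s": rate_gb_per_s * 1_000,
        # Hypothetical upper bound: that rate, non-stop, for a whole year.
        "lhc_nonstop_pb_per_year": rate_gb_per_s * SECONDS_PER_YEAR / GB_PER_PB,
        # Average new data stored per day, given the yearly storage figure.
        "lhc_stored_tb_per_day": stored_pb_per_year * GB_PER_PB / GB_PER_TB / 365,
        # How many years of LHC storage Google processes in a single day.
        "google_day_vs_lhc_years": google_pb_per_day / stored_pb_per_year,
    }


if __name__ == "__main__":
    for name, value in data_scales().items():
        print(f"{name}: {value:,.1f}")
```

Run as a script, it reports about 200 MB/s while data is flowing, a non-stop ceiling of roughly 6.3 PB a year, an average of about 6 TB stored per day, and Google processing roughly nine years' worth of LHC storage every day. The gap between the 6.3 PB ceiling and the 2.2 PB stored is consistent with the detectors not producing data flat out all year, though the figures alone do not say why.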
What’s more important is that, just as with the web, Cern is making its methods and programs freely available under the open-source LGPL licence, along with its computing model, so that in principle anyone who wants to build the next Google could do so with ease.
That is unlikely, but what the LHC will probably bequeath to organisations that want to learn from it is an object lesson in safely storing, and then processing, huge amounts of data spread all over the world.
If you want to find out more about what’s going on at the LHC project — in terms of output, rather than the gee-whizz of simply getting the particle beams to collide — then have a look at the Cern LHC WebHome page (bit.ly/lhc02) and even browse the live webcams dotted around the system.
All was quiet at the Alice detector (bit.ly/lhc03) when we looked on Tuesday. True to its open-source roots, the LHC project will be one where you can watch the results evolve on the wiki pages. Open-source science, indeed.