6 September 2013

SKA takes the long view on big data

The SKA will collect more data in one week than humankind has developed in its entire history. (Gallo)

As governments worry about how they will legislate the access to and analysis of "big data", scientists are still trying to figure out how they're going to cope with the deluge of information.

"Big data is a very abused term that is bandied about a lot," says Simon Ratcliffe, technical lead for scientific computing at Square Kilometre Array (SKA) South Africa.

"Essentially, it's saying that you're in an environment where you have so much data that if you take a traditional approach to processing that data, you'll be overwhelmed."

Traditionally, "you would write the data to file, put it in a spreadsheet, analyse it in non-real time. You just can't do that with big data. You have to look at it quickly in real time."

This is why the SKA project is important. The SKA – a collection of thousands of antennas collecting information about the relatively weak signals coming from space – will collect more data in one week than humankind has developed in its entire history.

The giant radio telescope will be hosted in Australia, South Africa and eight other African countries.

"The SKA will drive infrastructure, broadband capacity and high-powered computer systems," Ratcliffe says, likening this to the way road and sewerage infrastructure facilitates economic development in an area.

"Big data infrastructure will be a legacy of the SKA … a springboard for entrepreneurial activity and we will have populated the market with good skills."

SKA South Africa's director, Bernie Fanaroff, has repeatedly said that an important offshoot of hosting part of the SKA will be to position South Africa as a hub for big data analytics.

A white paper on privacy and big data from the international Information Systems Audit and Control Association highlights a big data skills shortage: very few people globally have the skills needed for big-data number crunching.

IBM is one of the SKA's industry partners, and the company's university relations manager for South Africa, Sean McLean, says: "There are major gaps in tertiary institutions' ability to meet current and future IT skills in South Africa, Africa and across the world."

He says his role is to "find out how we can assist tertiary institutions … generate the skills necessary for big data and the SKA".

SKA has previously partnered with IBM and Dutch radio astronomy institution Astron on the Dome project, which will create an IT road map for the SKA. The €34-million project would "research extremely fast, but low-power exascale computer systems aimed at developing advanced technologies for handling massive amounts of data", Ton Engbersen, the Dome project leader at IBM Research, said at the time.

He said that the full SKA – which is expected to be completed in about 2024 – would generate about 14 exabytes of data a day (about 14-billion one-gigabyte iPods). "We need to focus on building an efficient computer system behind this so we can deal with this amount of data, without using power that no one can afford."
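
As a rough check on the scale of those figures, the conversion can be sketched in a few lines of Python. This assumes decimal (SI) units, where one exabyte is 10^18 bytes and one gigabyte is 10^9 bytes; the article does not say which convention Engbersen used, and the function names are purely illustrative.

```python
# Assumption: decimal (SI) units; the article does not state the convention.
BYTES_PER_EXABYTE = 10**18
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def gigabytes_per_day(exabytes_per_day):
    """Convert a daily volume in exabytes to whole gigabytes."""
    return exabytes_per_day * BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE


def terabytes_per_second(exabytes_per_day):
    """Average sustained rate, in terabytes per second, for a daily volume."""
    bytes_per_second = exabytes_per_day * BYTES_PER_EXABYTE / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_TERABYTE


if __name__ == "__main__":
    # The 14 exabytes a day quoted for the full SKA.
    print(gigabytes_per_day(14))
    print(round(terabytes_per_second(14)))
```

Under these assumptions, 14 exabytes a day works out to 14 billion gigabytes, which matches the iPod comparison, and to an average sustained rate of roughly 162 terabytes per second.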

South Africa's contribution to the Dome project would involve data visualisation, software analytics and desert-proof technologies.