16 March 2018

Harvesting the benefits of data

The DST is working hard to ensure that South Africa has the human capacity and cyberinfrastructure for extracting knowledge from data assets

Deadly hurricanes accompanied by devastating floods have wreaked havoc across the world, leaving in their wake a trail of destruction, forcing residents to pick up the pieces and rebuild their lives.

In recent times, we’ve seen how meteorological data has given people more time to prepare for and mitigate the effects of hurricanes and floods, probably saving hundreds, if not thousands, of lives.

Data has become key to research. Over the past few hundred years, research has involved theory, the collection of information, experimentation and computation, but today it has become intensively data-driven, even in the human and social sciences. With the unprecedented proliferation of data, the so-called big data phenomenon, data is becoming ever more valuable.

Business and industry have rushed to use it for financial and competitive benefit, while governments and nongovernmental organisations are using it in domains such as urban planning, environmental management, agriculture, transport and health.

Data on its own is, of course, not enough. But its uses are legion, and multiple stakeholders are mining it for research that delivers socioeconomic benefit.

Government, through the Department of Science and Technology (DST) and entities such as the Council for Scientific and Industrial Research (CSIR), is working to ensure that South Africa has the human capacity and cyberinfrastructure it needs to extract knowledge, sometimes deeply buried, from data assets.

The country’s National Integrated Cyberinfrastructure System (Nicis) is managed by the CSIR on behalf of the DST. The system has four components: a human capacity development component, the South African National Research Network (a high-speed network dedicated to science, research, education and innovation traffic), the Centre for High Performance Computing (which enables cutting-edge research with a high impact on the economy), and the Data Intensive Research Initiative of South Africa (Dirisa).

Dirisa provides data storage and management services that enable its users to upload, discover and reuse data sets. It is currently developing the national (Tier 1) data node and co-ordinating the establishment of regional (Tier 2) data nodes. The regional data node being established in the Western Cape has an astronomy and bioinformatics research focus and is shared among all the higher education institutions in that region.

Nicis also focuses on building the expertise needed to extract value from data. It is co-ordinating the implementation of an e-science master’s programme by a consortium of higher education and research institutions, at least three of which will offer the MSc in 2018.

Another human capital initiative, the Data Science for Impact and Decision Enhancement programme, usefully complements the MSc. It is a multidisciplinary, project-based vacation programme in which students, guided by mentors, learn to understand and use data sets to address real needs in domains such as education, transportation, logistics, energy and smart urban development.

Data must also be well maintained. A significant amount of taxpayers’ money has been invested in acquiring and generating data, and given South Africa’s leading role in international research projects such as the Square Kilometre Array radio telescope, these investments are set to increase substantially. Much work still needs to be done. Adopting proper research data management practices will be an essential enabler of open data and open science, and will allow South Africa to reap the maximum benefit from its investment in data.

Dr Anwar Vahed is the principal data scientist at the CSIR and the caretaker manager of the Data Intensive Research Initiative of South Africa. His interests are high-performance computing infrastructure, big data and data preservation.