16 March 2018

Building a new generation of local data scientists

The CSIR’s Data Science for Impact and Decision Enhancement programme places emphasis upon problem solving and creativity

South Africa’s data science capacity continues to expand, with more students pursuing this discipline and using data science to help solve the country’s problems.

The department of science and technology (DST) has made huge investments in the Data Science for Impact and Decision Enhancement (DSIDE) programme, which has trained 141 candidates since its inception in 2014.

The programme is hosted at the Council for Scientific and Industrial Research (CSIR), an entity of the department. It aims to build capacity in the growing field of data science by placing recruits in mentor-guided, learn-by-doing projects that address real-world needs presented by different stakeholders.

The projects have a common theme that adapts a visual analytics framework, with goals that include understanding the dataset through interactive visual exploration and model development. Extracted insights are intended to trigger actions towards better decision-making for various users.

The CSIR DSIDE programme puts emphasis on problem solving and creativity, and encourages students to be curious. Experienced mentors from the CSIR data science community will introduce machine learning topics, tools and theories, and guide students in this project-driven environment. Given that this is a learn-by-doing initiative, stakeholders do not expect the delivery of market-ready output by the end of the programme.

The programme runs for 12 weeks: four weeks during the June/July university vacation and eight weeks in December and January.

Current recruits include students from third year to PhD level, in various fields related to data science, including engineering, applied mathematics and business informatics.


The Coastal News Watch project and Project CoastCam utilise students to monitor the effects of global warming upon the South African coastline

Coastal News Watch

Some of the data-science solutions that have been developed have been implemented by government departments and municipalities. The DSIDE Coastal News Watch project, led by Bolelang Sibolla and a team that includes Mpheng Magome and Retief Lubbe from Unisa and Promise Msomi from the University of Pretoria, developed a project to protect our coastal areas.

Coastal News Watch is tasked with developing an oceans and coastal information management system that gives researchers and managers access to details of events happening in certain areas of interest, the location of these events and their causes. The team developed a dashboard for visualising geospatial events on South African coasts and topical media-based data about South Africa’s coastline and Exclusive Economic Zones, and applied exploratory geospatial visual analytics to harvest information by topic so that it can be delivered rapidly.

The team also focused on developing a core engine for classifying news articles about coastal events, and by the end of the DSIDE programme the engine was running.

The students felt that the DSIDE vacation work programme was incredibly educational and inspirational, since many of them were new to software development and machine learning, while others gained more experience with Python and JavaScript programming.

CoastCam

Another project focused on the coast was Project CoastCam, which was led by Dr Michael Burke, with team members Thembelani Bheza (Wits), Mokuwe Windy (Sefako Makgatho Health Sciences University) and Henneth Malatji (University of Limpopo).

Project CoastCam focuses on investigating the impacts of climate change on the coast. These include rising sea levels and flooding, which affect coastal activities, cause delays at ports, damage coastal infrastructure and harm the ecosystem. These effects can be worsened by sand erosion, so it is important to monitor sand movement over time. The CoastCam team is designing a classification tool that will be used to label coastal image areas as either dry sand, wet sand or water.

The tool will also allow researchers to label a small subset of image areas and then train a classifier that can label previously unseen images. The CoastCam dataset contains 25 453 images of Fish Hoek’s shoreline in Cape Town, captured from September 2014 to September 2015. During the first phase, the team investigated classification algorithms and machine learning approaches for images, and prototyped a supervised classification system. In the second phase of the project, these algorithms were deployed to a dashboard that allows image labelling, trains a classifier and returns a segmented image. Work on processing these segmented images to produce long-term measurements of sand volume changes is ongoing.
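To give a rough sense of the label-then-classify workflow described above, here is a minimal Python sketch. It is not the CoastCam team's code: it assumes a simple nearest-centroid rule on raw RGB values in place of whichever classifier the team chose, and the function names and pixel values are all made up for illustration.

```python
from statistics import fmean

# The three categories named in the article.
CLASSES = ("dry sand", "wet sand", "water")


def train_centroids(labelled_pixels):
    """Average the RGB values of the hand-labelled pixels for each class."""
    centroids = {}
    for label in CLASSES:
        samples = [rgb for rgb, lab in labelled_pixels if lab == label]
        if not samples:
            raise ValueError(f"no labelled samples for class {label!r}")
        # zip(*samples) groups the values channel by channel (R, G, B).
        centroids[label] = tuple(fmean(channel) for channel in zip(*samples))
    return centroids


def classify_pixel(rgb, centroids):
    """Return the class whose centroid is nearest in squared RGB distance."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(rgb, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def segment(image, centroids):
    """Label every pixel of an image given as rows of (R, G, B) tuples."""
    return [[classify_pixel(px, centroids) for px in row] for row in image]


# Hypothetical hand-labelled pixels (illustrative values, not real data).
labelled = [
    ((225, 205, 165), "dry sand"),
    ((215, 195, 155), "dry sand"),
    ((150, 125, 95), "wet sand"),
    ((140, 115, 85), "wet sand"),
    ((40, 90, 150), "water"),
    ((50, 100, 160), "water"),
]
centroids = train_centroids(labelled)

# A tiny 2x2 "unseen" image to segment.
unseen_image = [
    [(220, 200, 160), (145, 120, 90)],
    [(148, 122, 92), (45, 95, 155)],
]
segmented = segment(unseen_image, centroids)
for row in segmented:
    print(row)
```

A real system would more likely work on image patches or texture features than on single pixels, and could swap in one of the classifiers the students studied, such as a decision tree or a support vector machine.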

For this team, learning new languages and frameworks such as Python, JavaScript and Django to develop a web app was very important, as were teamwork and engaging with machine learning concepts including decision trees, support vector machines, naive Bayes classifiers, and supervised and unsupervised learning.