Students from universities across the country participate in the annual DSIDE programme aimed at growing local data science capacity
The Northern Cape (NC) is the largest province in South Africa, but has the smallest population. It has a colourful history and a wide variety of cultural tourist attractions; when diamonds were discovered in Kimberley, unprecedented growth took place in the province.
It is also home to the Karoo region, a desert-like area that has become a famous tourist attraction, particularly for the Namaqualand flower season. Tourism in the NC has become a key feature of its economy. Conserving and growing tourism is crucial, as mining in the region is no longer a key economic driver.
Weather patterns have a big impact on tourism. The weather affects how often tourists travel, which destinations they choose, what activities they take part in and how satisfied they are with their holiday. Some tourist sites are weather dependent, for example Namaqualand, which is filled with seasonal flowers. Weather can therefore be considered a determinant of the success of tourism in a location, as it controls the flow of tourists.
For this reason, it has become important to monitor the weather patterns in the region as they influence the numbers of visiting tourists. A data-science project for the department of economic development and tourism (Dedat) was developed to improve the tourism flow in the NC.
Led by Professor Sonali Das from the Council for Scientific and Industrial Research (CSIR), a team from the Data Science for Impact and Decision Enhancement (DSIDE) programme, comprising Kopano Motlapele (Sol Plaatje University), Lebohang Molapo (Sefako Makgatho University) and Boitumelo Matlapeng (University of Cape Town), is working hard to develop an instrument that will assist the province.
A case study was conducted to understand the flow of tourists in NC in relation to the weather. Two datasets recorded at different frequencies were used: tourism data for NC and weather data.
The tourism data fell into two categories, foreign and domestic, covering 2013-2016 and 2009-2016, with common variables: the “average length of stay”, “visits”, “bed nights”, “purpose of visits” and “cities”. The weather data covers 2012-2016, and temperature and humidity were the only variables considered. The aim was to build a regression model to predict how the weather affects tourists and to advise Dedat in NC on how to mitigate this effect. However, because the two datasets are recorded at different frequencies (tourism is observed annually and weather hourly), it was difficult to come up with a decisive tool.
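To illustrate the frequency mismatch described above, here is a minimal Python sketch of one common way to line up hourly weather readings with annual tourism figures: average the weather per year, join on the year, and fit a straight line. This is not the team's method. All values are synthetic, and the column names (`temperature`, `humidity`, `visits`) and the years chosen are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly weather for 2013-2016 (made-up values, not project data).
hours = pd.date_range("2013-01-01 00:00", "2016-12-31 23:00", freq="h")
rng = np.random.default_rng(0)
day = hours.dayofyear.to_numpy()
weather = pd.DataFrame(
    {
        # A seasonal cycle plus noise, in degrees Celsius.
        "temperature": 20 + 8 * np.sin(2 * np.pi * day / 365.25)
        + rng.normal(0, 2, len(hours)),
        # Relative humidity in percent.
        "humidity": 40 + rng.normal(0, 5, len(hours)),
    },
    index=hours,
)

# Synthetic annual tourism figures (made-up values, not project data).
tourism = pd.DataFrame(
    {"year": [2013, 2014, 2015, 2016], "visits": [1000, 1100, 1050, 1200]}
)

# Collapse the hourly readings to one mean value per calendar year.
annual_weather = weather.groupby(weather.index.year).mean()
annual_weather.index.name = "year"

# Join the two datasets on the year they now share.
merged = tourism.merge(annual_weather.reset_index(), on="year")

# Fit visits = slope * temperature + intercept by least squares.
slope, intercept = np.polyfit(merged["temperature"], merged["visits"], 1)

print(merged)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In this sketch, more than 35,000 hourly readings shrink to just four annual rows once the frequencies are matched. A regression fitted on so few points cannot support firm conclusions, which is consistent with the difficulty the team reports.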
The DSIDE programme aims to support capacity building in the ever-growing field of data science by recruiting students to take part in mentor-guided, learn-by-doing problem-solving that addresses real-world needs presented by different stakeholders.
The projects under DSIDE share a common theme: each adapts a visual analytics framework, with goals that include understanding the dataset through interactive visual exploration and model development. The insights extracted are intended to trigger actions towards better decision-making for various users.
The CSIR DSIDE programme places emphasis on problem-solving and creativity and encourages students to be curious. Experienced mentors from the CSIR data science community will introduce machine learning topics, tools and theories, and guide students in this project-driven environment. Given that this is a learn-by-doing initiative, our stakeholders do not expect the delivery of market-ready output at the end of the programme.
The programme is hosted at the CSIR and funded by the department of science and technology (DST).