26 July 2022

How can governments become data-driven organisations?

The idea of becoming a “data-driven” organisation has been creeping into the governance lexicon recently. On 21 April, Randall Williams, the mayor of Tshwane, announced “the city would move with the times and embrace the full opportunities provided by new technology and data solutions to build a data-driven service delivery”.

But what does it mean to be a data-driven organisation?

In the governance sphere, data is literally everywhere. Information is collected and collated every moment of the day in every government entity by thousands of officials. Outside of governance, millions of smart devices are gathering data about everything you can think of. In fact, data is so abundant, so ubiquitous, that it could take years for a statistician to make sense of all of it, let alone begin to pick up patterns.

Does this mean the governance space is data-driven? Unfortunately not. Having lots of information and reporting on it to portfolio committees, audit committees, the auditor general’s auditors, fiscal commissions etc constitutes only the first steps towards becoming data-driven in the emerging world of data science.

Continuing with the City of Tshwane as our example, our first problem is the data itself. It’s dirty and noisy. Data scientists spend about 80% of their time cleaning and massaging the data available to prepare it for machine learning. Only the remaining 20% is spent developing models to train on the data. 

For an entity like the City of Tshwane to embrace machine learning, all its data, from all the various departments, in all the various formats, would have to be collected, collated and scaled, with all the null values removed, to name but a few of the many procedures required. All these steps are necessary to be able to reasonably guarantee that the data is reliable and trustworthy. This is important because you need public trust and transparency to run a stable government.
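To make two of these procedures concrete, here is a minimal sketch in Python of removing null values and scaling a numeric field. It is an illustration only: the department names, the field names and the figures are invented for the example and do not come from any city dataset, and real cleaning pipelines involve far more than this.

```python
def clean_records(records, numeric_field):
    """Drop records with missing values, then min-max scale one numeric field."""
    # Remove any record that has a None (null) value in any of its fields.
    complete = [r for r in records if all(v is not None for v in r.values())]
    if not complete:
        return []

    values = [r[numeric_field] for r in complete]
    lo, hi = min(values), max(values)
    span = hi - lo

    cleaned = []
    for r in complete:
        # Min-max scaling maps the smallest value to 0.0 and the largest to 1.0.
        scaled = 0.0 if span == 0 else (r[numeric_field] - lo) / span
        cleaned.append({**r, numeric_field: scaled})
    return cleaned


# Hypothetical records from two departments, collated into one list.
water = [{"ward": 1, "spend": 100.0}, {"ward": 2, "spend": None}]
roads = [{"ward": 3, "spend": 300.0}, {"ward": 4, "spend": 200.0}]

cleaned = clean_records(water + roads, "spend")
```

In this toy case the record for ward 2 is discarded because its spend figure is missing, and the remaining figures are rescaled to a common range so that they can be compared across departments.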

Our second problem is also rooted in the data but requires the domain knowledge of the city official. We need to determine if we are asking the right questions, collecting the right data, and collecting it on time. 

One of my biggest concerns about the sector oversight model is that it is backward looking. Let us say that we have determined the best variables to measure irregular expenditure and all the officials are trained to collect the data in the correct format. By the time this data reaches a decision-maker, such as a city council, it is months old and the money is long gone.

Which brings us to the most pressing problem, which will prove to be the starting place for mayor Williams. Within the underbelly of the officialdom of the City of Tshwane, we will have to inculcate and foster a culture of being data-driven. This means we go beyond asking, “What happened?” 

For a government to harness the power of machine learning we need to ask forward-looking questions. Why did the irregular expenditure occur, what was the cause of the failure, what catalytic mechanisms can prevent the failure and what do we do on Monday morning to permanently disrupt the patterns we see in the data? Machine learning will show us the patterns but human decision-makers will have to change their behaviour and their work culture to effect changes in the audit outcomes and, ultimately, the tangible impact of their behaviour in communities.

This workflow, from collecting the data to measuring the impact on the community, is known as the analytics value chain.

Williams and many others in governance roles who also recognise the need to adopt a data-driven approach need to be commended and assisted. Done correctly, such an exercise is not only worth it but probably indispensable. Here are some of the positive impacts from pivoting towards data-driven service delivery models:

The algorithmic gathering of vast amounts of data is already a ubiquitous feature of smart urbanism. City officials could, for instance, use the data being collected from the smart devices carried by cyclists to help them design city spaces that are optimised for the use of cycling for transport. This is especially relevant now that fuel prices have reached record highs and people are evaluating the future of car ownership.

Many standard tasks in governance are boring or repetitive. Think of licensing processes, permit applications and evaluation, monitoring of public spaces and many more. It might be more cost-effective and kind to use machine learning to move these processes online. The research is already replete with examples and case studies of smart-city initiatives that accomplish automation, but much more can be done to base daily decisions on data visualisation from machine learning algorithms that are deployed on real data.

A third promise of being data-driven is that it could enhance public participation by bringing “the governed closer to the governors”. Studies have also echoed the huge potential that exists when governments use compelling and interesting digital hooks to gather information from citizens about their needs. Relevant free resources can be made available to residents, leading them to complete surveys and questionnaires and even play games that generate data for city planners.

But, as always when innovations seem too good to be true, caution is indicated. I am reminded of the Harry Potter books. In particular, the part where the British prime minister asks Rufus Scrimgeour (the Minister for Magic) why his ministry does not simply use magic to sort out all the societal ills of the day. He replies: “The trouble is, the other side can do magic too.”

It is important to be aware that “dataism” can have a dark side as well. If a government so intends, it can deploy data science to spy on citizens, take away their privacy and hold them to ransom.

I conclude with a key takeaway, for me at least: being a data-driven organisation means taking policy and operational decisions based on big data as opposed to intuition or ideology. To advance our future in South Africa and future-proof our country we, as law-makers and governments alike, must always seek to use data science to advance maximum personal freedom and maximum economic freedom. Measuring our impact against this core value should be the gold standard.

The views expressed are those of the author and do not necessarily reflect the official policy or position of the Mail & Guardian.