13 June 2023

The time is ripe for a large-scale South African genome project


Despite the proliferation of genomics research worldwide, fewer than 3% of participants in these studies are of African descent. This means that the genomic data of Africans is greatly underrepresented in international genomics research projects. That gap has serious consequences for efforts to improve population health and to achieve precision medicine.

There has, however, been a positive development in this regard. The department of science and innovation (DSI) is considering launching a “110 000 genomes project”. As presently conceived, this will involve the random selection of 100 000 participants from the South African population and an additional 10 000 participants with rare diseases. 

This is not the first planned large-scale genomics project in South Africa. In 2011, the Southern African Human Genome Programme was launched with the aim of understanding the impact of genetic variation on the health of the population. But it managed to sequence only 24 genomes and did not include health data.

The 110 000 genomes project can make waves in the genomics community and be at the forefront of genomics research across Africa. Its success could improve healthcare by increasing our understanding of the genomes of South Africans.

Although 110 000 participants might sound like a lot, the number is modest compared with other large-scale population-level genomics databases, such as UK Biobank (500 000 individuals), BioBank Japan (260 000 individuals), the China Kadoorie Biobank (512 000 individuals) and the Estonian Biobank (200 000 individuals).

Initiating the 110 000 genomes project and providing a database of the genomic data of South Africans would be a great triumph. But how should such a project be structured? And how accessible should the data be? Should the DSI decide to proceed with the project, we suggest that two considerations are crucial.

The first consideration is the data to be collected. To be effective, the database of the 110 000 genomes project should consist of both whole genome sequences and phenotypic data (demographic and clinical data). Whole genome sequences will be obtained by sequencing the genomes of participants, while phenotypic data will be collected through questionnaires and medical records.

Because clinical data can change, it is imperative that it be continually updated to ensure that research conducted using this data is as accurate as possible. To achieve this, the database should receive regular updates from the department of health's electronic medical record system (to cover participants who use public healthcare) and from private medical insurance schemes' electronic medical records (to cover participants who use private healthcare).

The second consideration is that the data must be as accessible as possible. If we want to increase the presence of African genomes in global genomics research projects, databases should be more open. In line with the DSI's commitment to open science, the 110 000 genomes project could promote unrestricted and free access to, and use of, genomic data.

Open science increases efficiency, transparency and collaboration among researchers, leading to more discoveries that benefit society. The Human Genome Project, which succeeded in sequencing the human genome a generation ago through international and cross-disciplinary collaboration, shows how beneficial an open science approach can be.

Importantly, individual participants should be free to exercise their autonomy in choosing whether to make their genomic data and health records public. However, given the privacy risks involved, how should this be managed? By ensuring that participants truly understand, and consent to, the privacy risks involved in the project. 

In this regard, the open consent model developed and used by Harvard University's Personal Genome Project holds the most promise. To ensure that prospective participants understand the risks, and that their consent is truly informed, they are given an information booklet beforehand and must take (and pass with full marks) a test based on it. In this way, the model helps ensure that participants make an informed choice when they agree to make their genomic data and health records public.

If the DSI's 110 000 genomes project succeeds in recruiting on a large scale, it could greatly benefit the health of the South African population. Making the database open access would increase the availability of (South) African genomic data for global genomics research projects. In other words, it would maximise the impact of the 110 000 genomes project.

There are privacy risks involved, but if the project is designed correctly, we believe openness and privacy can be reconciled. After all, people have a right to privacy, which they can choose to exercise or not in particular circumstances. What is important is that this choice be properly informed.

Amy Gooden is a doctoral research fellow at the University of KwaZulu-Natal.

Donrich Thaldar is a professor of law at the University of KwaZulu-Natal.