Boom time for bioinformatics
In 2000 the human genome was first sequenced – and since then the cost of sequencing has dropped over a millionfold. Couple that with the continuing downward spiral of the cost of computing, and the field is not only growing fast but also becoming increasingly affordable in developing countries. Part of this affordability arises from the large number of freely available data sets, which make it possible to do research without having to start in the wet lab.
Why is bioinformatics important in Africa? Historically, diseases of the poor have not attracted the same level of research as diseases of the rich. In Africa, we are especially interested in the prevention and cure of tuberculosis, malaria and HIV. Cost-effective methods for understanding the mechanisms of these diseases aid in prevention and in identifying new drugs.
I work mainly on animal genomes, but there is also significant work around the country in botany.
Traditional wet-lab biology requires delicate equipment and expensive chemicals. Bioinformatics does not replace biochemistry and other traditional lab techniques but complements them and reduces cost.
In one study in which I played a small role, my work allowed biologists to cut the amount of lab work significantly. Though they still had to go to the wet lab for final results, they saved on both lab time and costs, because I was able to show that a good fraction of their potential experiments was pointless.
There are limited options in South Africa for studying bioinformatics at undergraduate level. The universities of Cape Town, the Witwatersrand, Rhodes and Pretoria, for example, have options but, like most multidisciplinary subjects, bioinformatics is best tackled once one has a strong background in at least one underlying discipline. Although bioinformatics is the application of computer science to biology, many problems have a mathematical or statistical aspect to their solution.
Because few students studying bioinformatics have an undergraduate background in the area, higher degrees are generally open to students with a background in at least one related discipline.
The common path to bioinformatics is through a master’s, and a high percentage of graduates go on to do a PhD. As this is still a new area, industrial applications are limited, and most job opportunities are in academia. As costs drop further, that will change. Once sequencing a genome reaches a level that is affordable in a routine lab test, there will be rapidly growing demand for software specialists who can transform research software into tools that can be used in a commercial lab.
At Rhodes, we have a coursework MSc programme in bioinformatics, and the limited number of places is oversubscribed by up to a factor of 10. Our students include some from other African countries such as Zimbabwe, Nigeria and Kenya.
The coursework MSc starts with catch-up courses to bring students from disparate backgrounds to the same level, continues with more advanced courses and ends with a thesis.
Other universities have taken different paths to bioinformatics qualifications: some require that students do extra courses for background before entry into a bioinformatics programme, and others offer courses as a supplement to a master’s by thesis.
How can one make a good choice about where to study? Each university with bioinformatics offerings has different strengths and weaknesses, and I encourage you to check each university that may interest you for research you can relate to. The particular strength of the Rhodes offering is a one-year MSc that includes courses and research.
Other universities have other strengths, such as integration with a medical school, or a stronger focus on plant biology. You may also want to choose between small-town life and a big city.
Research questions in bioinformatics
Here is a brief taste of some of the research questions in bioinformatics.
Protein structure is a big area of research. A protein molecule folds up into a complex shape. Depending on the shape, different kinds of reactions with other proteins or DNA are possible. Determining the shape and identifying how differences in protein molecules change function is an example of how effective computational techniques can save a lot of lab time. With malaria, for instance, knowing protein structure is important both for understanding the disease progression and finding cures.
Another area of research is transcriptional regulation. Transcription refers to the steps by which a sequence of DNA is read off to create an RNA (ribonucleic acid) molecule, which is a template for creating a protein molecule. Transcription is regulated by special proteins called transcription factors, which bind to DNA.
Some transcription factors are directly involved in initiating transcription, whereas others have less direct effects such as making transcription more or less likely, or making it happen faster or slower. Transcription varies with tissue type, developmental stage and whether the tissue is healthy or diseased. Understanding transcription is at the core of understanding any disease with a genetic component, including cancer.
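To make this concrete, here is a minimal sketch in Python of the two ideas above: reading a DNA sequence off as RNA, and looking for a site where a transcription factor might bind. The function names and the example sequence are made up for illustration. The motif TATAAA is the textbook "TATA box" consensus. Exact string matching is a deliberate simplification: real binding sites vary, and research tools score candidate sites statistically rather than looking for one fixed string.

```python
def transcribe(coding_strand):
    """Return the RNA copy of a DNA coding strand (every T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def find_motif(dna, motif):
    """Return the 0-based start position of every exact match of motif in dna."""
    dna, motif = dna.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(dna) - width + 1) if dna[i:i + width] == motif]


# Hypothetical sequence, invented for this example.
promoter = "GGCTATAAAGGCATGGCC"

print(find_motif(promoter, "TATAAA"))  # start positions of the TATA-box motif
print(transcribe("ATGGCC"))            # RNA read off from a short stretch of DNA
```

Even this toy version hints at why the real problem is hard: a factor that tolerates a mismatch or two will match at many more places, most of them not functional.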
What are the big challenges? Although protein and DNA are molecules that superficially look very simple to process with computer software, things quickly become very complex when one digs deeper. Protein is a long string of amino acids, which one can think of as letters from a 20-letter alphabet. DNA is a long string of bases, which one can think of as letters from a four-letter alphabet. The complication arises from the way these molecules form structures.
A protein generally folds only one way, but DNA is wrapped into a complex called chromatin, which changes in different situations, so the information content of a strand of DNA changes as different parts of it become exposed and available to interact.
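The "superficially simple" view of these molecules fits in a few lines of Python. The sketch below, with names invented for illustration, treats a sequence as a string and checks which alphabet it is drawn from, using the standard one-letter codes for the four DNA bases and the 20 amino acids.

```python
DNA_ALPHABET = set("ACGT")                      # four bases
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def uses_alphabet(sequence, alphabet):
    """True if sequence is non-empty and every letter belongs to alphabet."""
    return bool(sequence) and set(sequence.upper()) <= alphabet


def classify(sequence):
    """Guess whether a string is DNA, protein or neither, from its letters alone."""
    if uses_alphabet(sequence, DNA_ALPHABET):
        return "DNA"
    if uses_alphabet(sequence, PROTEIN_ALPHABET):
        return "protein"
    return "unknown"


print(classify("GATTACA"))     # letters all drawn from A, C, G, T
print(classify("MKVLAAGIVG"))  # letters drawn from the amino-acid alphabet
```

Note the built-in ambiguity: because A, C, G and T are also amino-acid codes, a short protein made only of those letters would be labelled DNA. The string tells you nothing about folding or chromatin, which is where the hard problems start.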
What are the big issues and controversies in the field? One of the largest is the extent to which DNA has function. The central dogma of molecular biology is that DNA provides a template for producing RNA, and that RNA provides a template for a protein molecule. A tiny fraction of the human genome, less than 2%, encodes proteins.
What does the rest do? Some argue that most of it is “junk” – spare material available as building blocks for evolution. Others argue that 80% or more of the genome has a purpose. What makes the issue all the more perplexing is that simpler organisms sometimes have more protein-encoding DNA than more complex organisms. One explanation for this apparent paradox is that DNA in complex organisms encodes small RNAs that are used for various messaging and regulating purposes.
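The central dogma mentioned above, DNA to RNA to protein, can be sketched for the protein-coding part of the genome in Python. This is an illustration rather than a research tool: the function names are invented, the codon table is the standard genetic code, and the sketch assumes an upper-case sequence containing only A, C, G and T, read in threes from its first base. It says nothing about the non-coding DNA that the controversy is about.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """DNA coding strand to messenger RNA (T becomes U)."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Messenger RNA to protein, three bases at a time, until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTAA"  # made-up example: start codon, one more codon, stop codon
print(translate(transcribe(gene)))
```

The 64 three-letter codons map to only 20 amino acids plus a stop signal, so several codons encode the same amino acid.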
Genetically modified organisms
Another widely debated controversy is genetically modified organisms (GMOs). Much GMO research relates to plants. I am a sceptic of the benefits for Africa because GMO crops are a high-tech solution, and the risks are still unclear. Given that we do not know what a large fraction of the genome does in higher organisms, blasting in a chunk of DNA and expecting a very specific outcome seems dodgy to me.
Is bioinformatics for you? As with any cross-discipline field, it is not for everyone. Some biologists cannot relate to computation, and some computer scientists cannot relate to biology.
I entered the field after working as a computer science academic for about 30 years, without even high school biology. It took a while to develop confidence in talking to biologists, even though I was working with a very experienced researcher.
A good start is to talk to academics and students at a university with a bioinformatics offering. If you lack background in a core discipline, talk to someone who does have that background. See whether you can relate to the problems in that area.
As with any other cross-discipline field, there is a spectrum of abilities and job descriptions. A biologist with a working knowledge of computers can do a lot with existing tools – though it helps to know how to talk to a computer scientist if you need something new.
At the opposite end of the scale, a computer scientist with a little biology will find many grateful biologists who are very happy to accept help when they get stuck.
In the middle of the spectrum is a small group, made up of those who are really good at identifying and solving biology problems and are also good at computer science – and ideally the applicable mathematical and statistical methods.
Dr Philip Machanick is an associate professor in the department of computer science at Rhodes University