
Assessment results don't make sense

Nic Spaull

Annual literacy and numeracy tests for grades one, six and nine aren't comparable over time or across grades, so talk of improvement is misleading.

Annual National Assessment results are not comparable over time or across grades, which makes all talk of improvement or trends misleading. (Madelene Cronje, M&G)

I have a love-hate relationship with the yearly publication of the national assessment results.

On the one hand I am very proud of the annual national assessments and glad that we have them. Testing primary school children using standardised assessments is imperative to target support where it is needed and also to hold the basic education department and schools accountable. We definitely shouldn't scrap them, since without them we would be stabbing in the dark.

On the other hand, I get depressed when the results are released because, given the way they are currently implemented, we actually are stabbing in the dark. For the national assessments to fulfil the function for which they were created, the results need to be comparable across grades, over time and between geographical locations. Unfortunately, given the sorry state of affairs that is the 2013 national assessment, none of these criteria are currently met.

The highlights version of last week's release goes something like this: "Performance in grades one to three is adequate. Results for most grades show a steady increase. Grade nine performance is an unmitigated disaster."

The unfortunate part is that the only statement I actually believe is the last one.

The reason is that Basic Education Minister Angie Motshekga, presumably shocked to the core by the grade nine mathematics average of 14%, appointed a ministerial task team to assess whether the 2013 grade nine test was fair, valid and reliable. They concluded it was.

This result is in stark contrast with the much higher mathematics averages in grades one (60%), two (59%) and three (53%). If the minister had asked the task team to look into those tests as well, my suspicion is that it would have reached the opposite conclusion.

The reason I think this is that all the existing research in South Africa points to the fact that children are not acquiring the foundational numeracy and literacy skills in primary school and that this is the cause of underperformance in higher grades. The 2011 pre-Progress in International Reading Literacy Study, for example, found that 29% of South African grade four students could not "locate and retrieve an explicitly stated detail" — that is, they were completely illiterate.

Yet in the assessment results, the home language average was 49% and the first additional language was 39% — higher than one would expect given previous grade-appropriate tests.

Likewise, the average score on the grade three national assessment numeracy test was 53%, up from 41% in 2012. Putting aside for a moment the fact that such gargantuan improvements have never been seen around the world — in any country, ever — the level is also way out.

The National School Effectiveness Study, published in 2011, found that grade three students scored an average of 29% on a grade three-level test. If South Africa improved at the fastest rate ever seen globally (which is 0.08 standard deviations a year), the score in 2013 would be about 38% in grade three — not the 53% Motshekga reported. These results don't make sense.

Another peculiarity is the huge drop in mathematics performance between the relatively high grade six average of 39% and the depressingly low grade nine average of 14% only three grades later. From an educational assessment perspective, this is surely because the grade six test was much easier than the grade nine one, which, the task team established, was set at the appropriate level.

There are other possible reasons why the results in grades one, two and three are so high. One is that for grades one and two teachers were allowed to invigilate their own students during the test — which could obviously be problematic.

Another is that the grade three marks in unmonitored schools were significantly higher than those in the verification sample. In order to assess the fidelity of the administration, scoring and collating processes of the annual national assessments, 2 164 of the 24 355 schools were monitored by an external body. The exams from these schools were then marked and captured by an independent body.

The national unverified grade three literacy average was 51%, which was considerably higher than that found by the verification body, namely 42%.

But the report completely ignores these differences and rather focuses on the higher unverified scores, claiming that the verified scores are "not significantly different from the mean scores of pupils from the whole population" — something that is clearly untrue, as is evident from the report itself.

In some instances, the discrepancies are considerable. In the grade three literacy test, for example, the unverified provincial average for the Eastern Cape was 47% and for the Western Cape it was 50%.

This is in stark contrast with the true average found in the verification sample, which was 35% in the Eastern Cape and 49% in the Western Cape. And yet for some bizarre reason, the department decided to stick with the unverified marks for all grades and both subjects. There were also considerable discrepancies in the grade six and grade nine home language results.
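As a rough check on the size of these gaps, the grade three literacy figures quoted above can be put side by side. This is only a sketch using the averages cited in this article; the names `grade3_literacy` and `gap` are illustrative and do not come from the department's report.

```python
# Grade three literacy averages (percent) as quoted in this article.
# Tuple order: (unverified average, verified average).
grade3_literacy = {
    "National": (51, 42),
    "Eastern Cape": (47, 35),
    "Western Cape": (50, 49),
}


def gap(unverified, verified):
    """Percentage points by which the unverified mark exceeds the verified one."""
    return unverified - verified


gaps = {region: gap(u, v) for region, (u, v) in grade3_literacy.items()}

for region, points in gaps.items():
    print(f"{region}: unverified mark is {points} percentage points higher")
```

On these numbers the Eastern Cape gap is 12 percentage points and the national gap is nine, while the Western Cape figures differ by only one point.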

Lastly, one simply cannot compare national assessment results over time. To do this, the tests would have to be calibrated and linked using psychometric analysis — something the department did not do. Last week's national assessment report is somewhat bipolar on this point.

The report cautions: "The comparability of the tests from one year to the other cannot be guaranteed, which implies that comparability of the results from one year to the other may not be accurate."

But these cautions did not seem to deter the minister, who said when releasing the results: "I am confident that performance in the education system is on an upward trend and all our interventions and programmes are beginning to produce the desired outcomes" — a confidence I, unfortunately, do not share.

Elsewhere the report says: "There is currently a strong emphasis on ensuring that tests from different years are comparable to each other, so that trends over years can be reliably monitored. In this regard a process is already under way."

Is this meant to be reassuring? The fact that the process of ensuring psychometric comparability over time is "under way" is a half-baked admission that it was not ready or used for 2013, making any annual comparisons impossible.

Unfortunately, there is no technical report to confirm or deny this. Even without one, the erratic and colossal changes from one year to the next are simply implausible as real changes in learning.

In 2012, only 24% of grade six students had acceptable achievement (more than 50%) in the first additional language test — but this shot up to 41% in 2013 (a 71% increase) only one year later. By contrast, for grade nine students — only three grades later — the proportion achieving at an acceptable level came down from 21% in 2012 to 17% in 2013. Anyone familiar with educational assessments would balk at such large and inconsistent changes.
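The arithmetic behind these year-on-year swings is simple to reproduce. The sketch below uses only the proportions quoted in the paragraph above; `relative_change` is a helper written for this illustration, not anything taken from the department's report.

```python
def relative_change(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


# Share of students at an acceptable level in first additional language
# (percent), 2012 and 2013, as quoted in this article.
grade6_change = relative_change(24, 41)
grade9_change = relative_change(21, 17)

print(f"Grade six: {grade6_change:+.0f}%")
print(f"Grade nine: {grade9_change:+.0f}%")
```

Rounded, that is a 71% rise in grade six against a 19% fall in grade nine over the same twelve months.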

Many questions remain unanswered. Who sat on the technical committee that advised the department on the 2013 national assessments? Why were their names not included in the report? What "process" of psychometric comparability is "under way" and how far along is the department? Who was the "service provider" that verified the assessment results and why was it not named, as the Human Sciences Research Council was in 2011?

One could also speak about the dangers of giving erroneous feedback to teachers or allocating resources based on faulty data — both of which are the spectres we will have to live with in the future. In testing seven million children the department has bitten off more than it can chew and, in the process, undermined its own technical credibility.

If we could trade ambition for competence, we might have a test that actually tells us something clear instead of the muddled mess that is the national assessment for 2013. This testing must go on, but for heaven's sake do it properly.

Nic Spaull is a researcher in the economics department at Stellenbosch University. His education-focused research can be found at nicspaull.com/research. Follow him on Twitter @NicSpaull
