Education: Are International Tests Worth Anything?

Diane Ravitch is not a fan of international tests that compare the performance of students from different countries.  She believes that the observation that US students, on average, perform around the middle of the pack has led to the conclusion that this is a national tragedy requiring strong corrective measures in our schools.  Ravitch identifies the problem as being not with our school systems, but with our history of multigenerational poverty, and racial and ethnic discrimination.  She expresses her views in her book Reign of Error: The Hoax of the Privatization Movement and the Danger to America's Public Schools.

Ravitch provides an interesting perspective on the issue of performance testing.  She wishes us to conclude that striving to be at the top of the testing ladder is not a healthy strategy for a nation and is, in fact, counterproductive.  She introduces us to a study performed by Keith Baker, who was a long-time analyst in the Department of Education.

“He [Baker] reviewed the evidence and concluded that for the United States and about a dozen of the world’s most advanced nations ‘standings in the league tables of international tests are worthless.  There is no association between test scores and national success, and, contrary to one of the major beliefs driving U.S. education policy for nearly half a century, international test scores are nothing to be concerned about.  America’s schools are doing just fine on the world scene.’”

Baker looked at the results of an early international student comparison performed in 1964.

“Baker looked at the per capita gross domestic product of the nations whose students competed in 1964.  He found that ‘the higher a nation’s test score 40 years ago, the worse its economic performance on this measure of national health—the opposite of what the Chicken Littles raising the alarm over the poor test scores of U.S. children claimed would happen.’  The rate of economic growth improved, he held, as test scores dropped.  There was no relation between a nation’s productivity and its test scores.”

How might this make sense?  The goal of education is not just to provide students with knowledge; it is also to teach them how to acquire knowledge on their own and how to use that knowledge effectively.  Neither of these two skills shows up on tests.

“A certain level of educational achievement may be considered ‘a platform for launching national success, but once that platform is reached, it may be bad policy to pursue further gains in test scores because focusing on the scores diverts attention, effort, and resources away from other factors that are more important determinants of national success’.”

“The United States has been a successful nation, Baker argues, because its schools cultivate a certain ‘spirit’ which he defines as ‘ambition, inquisitiveness, independence, and perhaps most important, the absence of a fixation on testing and test scores’.”

Such a conclusion would certainly be remarkable.  Let us look closer now at what Baker actually provided in his study Are International Tests Worth Anything?

Baker’s paper was published in 2007.  The early study he referred to was the First International Mathematics Study (FIMS).

“FIMS was administered in 1964 to samples of 12-year-olds in 11 nations. Today’s world is largely a world created and operated by the now 55-year-old FIMS generation. If there is a connection between high test scores and national success, it will show up in looking at how well the 1964 FIMS scores predicted where nations are today. Among the 11 FIMS nations, the U.S. finished second to last (ahead of Sweden).”

The nations participating in this study were Australia, Belgium, England, Finland, France, Germany (FRG), Israel, Japan, the Netherlands, Scotland, Sweden, and the United States.  England and Scotland are combined in order for Baker to make his point.  He wishes to evaluate how these nations evolved between 1964 and 2002 in order to determine any correlation between test scores and national performance.  He evaluates six quantities: wealth, rate of growth, productivity, quality of life, democracy, and creativity.  This is his conclusion with respect to wealth.

“First, and perhaps most important to a nation, is the creation of wealth. The best measure of generating wealth is per-capita GDP adjusted for cost of living differences, or purchasing power parity (PPPGDP). The wealth of nations scoring higher than the U.S. on FIMS averaged 73% of the per-capita income in the U.S. in 2002.   FIMS scores in 1964 correlate at r = -0.48 with 2002 PPP-GDP. In short, the higher a nation’s test score 40 years ago, the worse its economic performance on this measure of national wealth….”

What Baker seems to be saying is that since the US was wealthier in 1964 than the countries whose students knew more about math, and the US is still wealthier, the poor test performance did not matter.  But wouldn’t the growth in wealth over the 1964-2002 interval be a more relevant comparison?  In 1964, many of the countries in the study were still rebuilding from the effects of World War II.  Their wealth had been depleted, but their economic growth would have been strong.

It should be noted that GDP is more closely aligned with income than with wealth.  Wealth and its growth will depend on tax and saving rates and could vary dramatically from country to country for reasons that have nothing to do with education or economic health.  The accumulation of wealth in a nation might not even be considered a good thing, let alone be targeted as a measure of economic prowess.  Consider this chart provided by Thomas Piketty in his book Capital in the Twenty-First Century.

Using the measure of private capital (wealth) divided by national income (essentially GDP), Italy would have to be considered the healthiest economy today.  In any event, the results can change dramatically over time and the US is far from the dominant nation.  Perhaps per capita GDP growth over time is a better indication of economic efficiency.

Baker chooses to address GDP growth, but he limits it to the decade before 2002.  He apparently wishes to look at a time when the children of 1964 would be of an age where they might be expected to control their nations.  That implies that the children of 1964 were somehow unique and different from those who came before or after, which is an unlikely assumption.

“One can argue that since the U.S. had a big post-WW II economic lead over the rest of the world, the rate of economic growth is at least as important as GDP as an indicator of national achievement.  The nations that scored better than the U.S. in 1964 had an average economic growth rate for the decade 1992-2002 of 2.5%; the growth rate for the U.S. during that decade was 3.3%. The average economic growth rate for the decade 1992-2002 correlates with FIMS at r = -0.24. Like the generation of wealth, the rate of economic growth for nations improved as test scores dropped.”

One hopes that Baker used per capita GDP growth, because most European countries, along with Japan, have experienced stagnant or decreasing populations, a factor that would depress their total GDP growth relative to that of the US, with its relatively fast-growing population.  Baker does not specify which data he used.  Let us then turn to Piketty and his data again.  He provides per capita GDP growth rates for North America and Western Europe that span the period of interest.  The numbers for North America would be dominated by US values because of its large population.

Using per capita GDP growth as a metric for the efficacy of a given school system would seem to indicate that the higher scoring European nations of 1964 had better scores and better economies than the US at the time.  Eventually, everyone appears to be headed for some common level of excellence.  Trying to use economic factors to determine the strength of a given approach to learning is a highly uncertain process.
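The distinction between total and per capita growth raised above can be made concrete with a small sketch.  The two economies and their growth rates below are hypothetical, chosen only for illustration; they are not taken from Baker or Piketty, and the function name is an assumption of this sketch.

```python
def per_capita_growth(gdp_growth: float, population_growth: float) -> float:
    """Annual per capita GDP growth implied by total GDP growth and population growth.

    All rates are fractions per year (0.03 means 3%).
    """
    return (1.0 + gdp_growth) / (1.0 + population_growth) - 1.0


# Hypothetical economies (illustrative numbers, not real data):
# one with a growing population, one with a stagnant population.
growing = per_capita_growth(gdp_growth=0.030, population_growth=0.010)
stagnant = per_capita_growth(gdp_growth=0.020, population_growth=0.000)

print(f"growing population:  {growing:.2%} per capita")
print(f"stagnant population: {stagnant:.2%} per capita")
```

Under these assumed numbers the headline growth rates differ by a full percentage point, yet both economies grow at roughly 2% per person.  A comparison based on total GDP growth would therefore favor the country with the faster-growing population even when the average citizen is doing no better.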

Baker wishes to make the case that the US has been better at fostering creativity because it has produced the most patents per capita compared to other countries.

“A good school system should foster creativity.  The number of patents issued in 2004 is one indicator of how creative the generation of students tested in 1964 turned out to be. The average number of patents per million people for the nations with FIMS scores higher than the U.S. is 127. America clobbered the world on creativity, with 326 patents per million people.”

Unfortunately, interpreting patent numbers also requires a number of qualifications.  The race to produce patents can be more an indication of a nation’s business composition and business practices than a direct indicator of creativity.  In addition, most patents arise in technical fields where advanced degrees are required to attain competence.  University technical departments in the US are typically heavily endowed with students from other countries.  Many of the patents that Baker is so proud of are actually being produced by students educated by school systems that he would claim are inferior to ours because they perform well on international tests.

The gold standard in international testing is currently PISA (Programme for International Student Assessment).  It is an OECD project that has invited many non-OECD countries to participate.  It tests 15-year-olds in math, reading, and science competency, and tries to deduce from the results which factors are effective in educating students.  The PISA people also conduct surveys to determine non-educational characteristics of those participating so that factors like income level can be taken into account when comparing results between students of the various countries.  PISA also produces country assessments which explain what they believe to be the relevant lessons learned from the testing.  The latest test was performed in 2012 and the results were released in December 2013.  The country rankings and the assessment of the US students can be found here.

The first PISA test was in 2000.  It has been given every three years since.  Baker had early PISA results available to compare with his FIMS data.  He drew these conclusions:

“On these indicators of success, the nations that scored at the PISA average generally outperformed those scoring either above or below average. For example, per-capita GDP was $22,495 for the 11 nations scoring above average, $34,414 for the five average nations, and $16,375 for the 11 below-average nations. The same pattern holds for quality of life, democracy, and creativity as measured by patents.”

“International comparisons on many factors show that Norway is the best place in the world to live, and, like the U.S., Norway scored right at the PISA average. Mediocre test scores correlate with better, more successful countries than do top scores (or lower scores). Mediocrity in test scores is, for nations, a good thing! This finding is highly counterintuitive. Why should it be?”

Baker provides interesting and compelling reasons why average test performance by economically developed countries might be a good thing.  There is more to life than studying for a given test.  Even the Asian countries that do so well on PISA would agree that having children spend all day, year after year, preparing for a national test that will determine their future is an unhealthy environment, even if it makes them proficient in PISA.

Baker’s explanation is presented again here.

“A certain level of educational achievement may be considered a platform for launching national success, but once that platform is reached, it may be bad policy to pursue further gains in test scores because focusing on the scores diverts attention, effort, and resources away from other factors that are more important determinants of national success.”

This is a wonderful hypothesis, but like so many other explanations for academic performance, it is just a hypothesis.  His paper does not provide confirmation.

Since we are in the mode of evaluating hypotheses, here is another one for consideration.

It is not difficult to see how a country with a poor school system might still succeed economically.  Such a country will produce a large number of intelligent, well-educated, and creative people in spite of general academic conditions.  The important factor is providing sufficient numbers with the opportunity to use their skills in a productive manner.  Knowledge, creativity and opportunity must come together.  Countries that are efficient at providing opportunities to excel can prosper even if a large fraction of the population is poorly educated.

That is yet another way to view the US.

