Visible Learning 1 – Cohen’s d statistic gives strange results because it’s not used in the way he intended

I am a Mathematics teacher who did a degree in Mathematics at the University of Sheffield, half of which was in Statistics. By a rough estimate I’ve sat through a thousand hours of lectures on Statistics and put in thousands more hours of study on my own. I’ve also taught A level Statistics for sixteen years.

Everything I say is easily checked: you only need to read Chapter 2 of Visible Learning, and really you could get away with reading just pages 8 and 9. Cohen’s book is available to read for free on Google Books.

————————————————————————————————-

In his book Visible Learning, John Hattie uses Cohen’s d statistic (or some slight variation) to calculate his “Effect size”. Hattie cites Cohen (1988) repeatedly on pages 8 and 9 of Visible Learning, where he introduces the “Effect size” measure he uses throughout the book.

Jacob Cohen was a psychologist who wrote about Statistics for the Social Sciences. Cohen set out the d statistic in his book Statistical Power Analysis for the Behavioral Sciences. The 1988 second edition is the one Hattie cites; the first edition was published in 1969.

Cohen’s d statistic is:

Effect size = change in means / standard deviation

Whenever I use the phrase “Effect size” I mean Cohen’s d statistic specifically, NOT the size of the effect. They are very different things, as we shall see.

————————————————————————————————-

To find the mean of a group of numbers we add up all the numbers and divide by how many numbers there are. It’s an average of the numbers.

The standard deviation measures how spread out a group of numbers is.

Suppose we take two groups of five people and give them a test.

The first group gets

10%   30%   50%   70%   90%

The second group gets

40%   45%   50%   55%   60%

Both groups have the same mean (50%), but the first group is more spread out than the second, so it has a higher standard deviation.
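If you want to check this yourself, here is a small Python sketch using the two groups above. I have assumed the population standard deviation (`pstdev`); the sample version (`stdev`) gives slightly larger numbers but the same comparison.

```python
from statistics import mean, pstdev

# Test scores (in %) for the two groups of five people
group_1 = [10, 30, 50, 70, 90]
group_2 = [40, 45, 50, 55, 60]

for name, scores in (("First group", group_1), ("Second group", group_2)):
    # Assumption: population standard deviation (divide by n, not n - 1)
    print(f"{name}: mean = {mean(scores):.1f}%, "
          f"standard deviation = {pstdev(scores):.1f}%")
```

Both means come out at 50.0%, while the standard deviations are about 28.3% for the first group and about 7.1% for the second.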

————————————————————————————————–

Let’s have a look at calculating the “Effect size” for a single class.

They take a test. You teach them a topic and then they take another test.

You find that the mean test score before you teach them is 50% and the mean test score after you’ve taught them is 55%. So the thing we’re interested in, how much the mean test score has increased because of the teaching, is 55% – 50% = 5 percentage points. This is the size of the effect (of teaching them).

Now we calculate the “Effect size”. We find that the test scores are really close together, with a standard deviation of 0.5%.

Effect size = change in mean scores / standard deviation = 5 / 0.5 = 10, a massive “Effect size”.

So we have a class for which teaching has led to only a 5 point increase in the mean test score but an “Effect size” of 10, when an “Effect size” of 0.8 is meant to be large according to Hattie on page 9 of Visible Learning and, originally, Cohen (1988).
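The single-class arithmetic can be sketched in a few lines of Python. The function name `effect_size` is my own, and it only implements the simple "change in means divided by standard deviation" formula used in this post, not any particular variation Hattie may have used.

```python
def effect_size(mean_before, mean_after, standard_deviation):
    """Change in means divided by standard deviation (the formula in this post)."""
    return (mean_after - mean_before) / standard_deviation


# Figures from the single-class example, all in percentage points
size_of_effect = 55 - 50                  # what the teacher actually cares about
d = effect_size(50, 55, 0.5)              # the "Effect size"

print(f"Size of the effect: {size_of_effect} percentage points")
print(f"Effect size (d):    {d}")
```

A modest 5 point gain becomes a d of 10.0 purely because the standard deviation in the denominator is so small.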

————————————————————————————————–

Now let’s use the d statistic to compare two classes. They all take a test. Then you teach them a topic and they take another test.

You find that in each class the mean test score before you teach them is 50% and the mean test score after you teach them is 60%.

The mean score after teaching minus the mean score before teaching is 60% – 50% = 10 percentage points, and this is what most of us would be interested in: by how much have the test scores increased because of the teaching? This is the size of the effect.

Now let’s take a look at each class’s “Effect size”.

The test scores of the first class are really spread out, with a standard deviation of 20%, so its “Effect size” is change in mean divided by standard deviation = 10 / 20 = 0.5.

The test scores of the second class are quite close together, with a standard deviation of 5%, so its “Effect size” is 10 / 5 = 2.

So both classes had a size of effect of 10 percentage points, but one class has an “Effect size” of 0.5 and the other has an “Effect size” of 2, purely because of the difference in standard deviations. One class has a standard deviation four times bigger than the other, so its “Effect size” is a quarter of the other’s. In fact we shall see that if the number of children in each class is different then we can’t compare them using the “Effect size” either.
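Here is the two-class comparison as a short Python sketch. Again, `effect_size` and the dictionary names are just my own labels for illustration, and the formula is the simple one used in this post.

```python
def effect_size(mean_before, mean_after, standard_deviation):
    """Change in means divided by standard deviation (the formula in this post)."""
    return (mean_after - mean_before) / standard_deviation


# Both classes go from a mean of 50% to a mean of 60%
mean_before, mean_after = 50, 60

# Only the spread of the scores differs between the classes
standard_deviations = {"first class": 20, "second class": 5}

results = {
    name: effect_size(mean_before, mean_after, sd)
    for name, sd in standard_deviations.items()
}

for name, d in results.items():
    print(f"{name}: size of effect = {mean_after - mean_before} points, "
          f"Effect size = {d}")
```

The size of the effect is 10 points in both rows, but the "Effect size" is 0.5 for the first class and 2.0 for the second.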

————————————————————————————————-

So we have a statistic, devised by a Psychology Professor, that gives different results for the same change in mean scores whenever the standard deviations differ. The d statistic is giving strange results because Cohen intended it to be used in a very different way, as we shall see in Post 2.