Banning Statistical Significance Testing in Education

Stephen Gorard is a Professor of Education at Durham University who evaluates studies for the EEF. He wants to ban significance testing in Education Research.

Significance testing was developed in the 1920s and 1930s by the Mathematician R.A. Fisher, “a genius who almost single-handedly laid down the framework for modern-day Statistics”. It has been used for decades by Mathematicians and Scientists as an absolutely fundamental part of Statistical Analysis, most recently by the people who found the Higgs boson. Stephen Gorard has a degree in Psychology.

He spells out the reasons he thinks we should ban significance testing in five articles here. 

Reason 1 – It’s too hard

He calls it “needlessly complex” and “confusing” (page 2) for Social Scientists and wants to replace it with something easier. In Maths we’re not concerned with whether something is hard, only with whether it’s right.

Reason 2 – People misinterpret it

When he says people misinterpret it he cites Falk and Greenbaum 1995. So when he says people misinterpret it, he means that 53 Psychology students at the Hebrew University of Jerusalem misinterpreted it in 1995.

If Social Scientists do misinterpret it, I suggest you teach them better in the Statistics bit of their course, rather than ban the bit they misinterpret.

Reason 3 – The logic behind significance testing is flawed

A significance test is an indirect proof; it’s an extension of the idea of proof by contradiction. You assume something (the null hypothesis), follow your thoughts logically, and if you end up with something improbable you discount your initial assumption. But he doesn’t understand the logic, therefore there is no logic.
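To make that chain of reasoning concrete, here is a minimal sketch of an exact permutation test. The scores are invented purely for illustration, and `permutation_p_value` is my own hypothetical helper, not anything from Gorard's articles. The assumption (the null hypothesis) is that the group labels make no difference, so every way of relabelling the pupils is equally likely. We follow that assumption through and see how often it produces a gap at least as large as the one observed.

```python
from itertools import combinations

# Hypothetical test scores, invented for illustration only.
treated = [72, 75, 78, 80, 83, 85, 88, 90]
control = [60, 62, 65, 68, 70, 71, 74, 77]


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test for 'group_a scores higher'.

    Under the null hypothesis the labels are arbitrary, so every way of
    choosing len(group_a) scores from the pooled data is equally likely.
    Because the group sizes and the pooled total are fixed, comparing the
    sum of group A is equivalent to comparing the difference in means.
    """
    pooled = group_a + group_b
    observed = sum(group_a)
    extreme = 0
    total = 0
    for relabelled in combinations(pooled, len(group_a)):
        total += 1
        if sum(relabelled) >= observed:
            extreme += 1
    return extreme / total


p_value = permutation_p_value(treated, control)
print(f"P(gap at least this large | labels are arbitrary) = {p_value:.5f}")
```

If that probability comes out very small, the assumption has led somewhere improbable, and that is the grounds for discounting it.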

He has asked me on Twitter to explain how to calculate the probability of the null Hypothesis given the Data from an experiment. This is something that is impossible to do and doesn’t even make any sense to ask. You can calculate the probability of the Data given the null Hypothesis, but you can’t calculate the probability of the null Hypothesis given the Data. Asking someone to do so is like asking them to measure ‘how tall is the colour blue?’. It’s possible he may have read some articles on Bayesian Inference, which would talk about finding the probability of the *parameter* given the data, and got confused, but this wouldn’t be applicable in this case anyway.
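As a rough sketch of the calculation that *is* possible, here is the probability of the Data given the null Hypothesis for a made-up example: 100 pupils answer a true/false question and 60 get it right, with a null hypothesis that they are all guessing. The numbers and the helper name `binomial_tail` are my own assumptions for illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): data at least this extreme, given p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical data: 60 correct answers out of 100.
# Null hypothesis: every pupil is guessing, so p = 0.5.
p_value = binomial_tail(100, 60, 0.5)
print(f"P(60 or more correct | pupils are guessing) = {p_value:.4f}")
```

Note which way round the conditioning goes: the null hypothesis is an input to the function, and what comes out is a probability about the data.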

He also shows that he doesn’t understand the reason Mathematicians do significance tests.

“Significance tests are really only another way of presenting the scale of a piece of research, saying little or nothing about the magnitude or importance of the finding” (page 2)

No, no, they’re not, Stephen. As with all Effect Sizers, he clearly does not understand why we do a significance test (it is to check if the data could actually just be explained by randomness). He then criticises significance tests for not doing something they weren’t designed to do. Mathematicians *do* have tools to estimate the size of an effect: they’re called Maximum Likelihood Estimators, and we use them all the time in conjunction with significance tests. Stephen Gorard is quite happy to throw significance testing away despite not understanding the vital role it plays. He doesn’t understand it, therefore it has no use.
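Here is a small sketch of the two tools working side by side, using invented numbers. For a binomial proportion the Maximum Likelihood Estimate is simply successes divided by trials, and the significance test asks whether that estimate could just be randomness under a null of p = 0.5. The function names and the three sample sizes are my own assumptions for illustration.

```python
from math import comb


def mle_proportion(successes, trials):
    """Maximum Likelihood Estimate of a binomial proportion: k / n."""
    return successes / trials


def p_value_fair(successes, trials):
    """P(X >= successes) under the null p = 0.5, using exact integer arithmetic."""
    tail = sum(comb(trials, i) for i in range(successes, trials + 1))
    return tail / 2**trials


# Hypothetical studies with the same observed proportion but different sizes.
studies = [(6, 10), (60, 100), (600, 1000)]
results = [(n, mle_proportion(k, n), p_value_fair(k, n)) for k, n in studies]

for n, estimate, p in results:
    print(f"n={n:5d}  estimated proportion={estimate:.2f}  p-value={p:.3g}")
```

The estimate tells you how big the effect appears to be; the p-value tells you whether a sample of that size could have produced it by chance alone. They answer different questions, which is why we use both.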

Well, you might say, what does it matter if some old Professor of Education has these crank ideas?

Well, here he is telling young people just entering the field of education research that significance tests, confidence intervals, p values and chi-squared tests don’t work. Not that *he* thinks they don’t work but no Mathematician agrees with him; just that they don’t work. These young people will often be coming from subjects that don’t have a great deal of statistical content, and will not be able to tell that what this apparently eminent Professor is telling them is false. That could do immeasurable damage to the field of education research for decades to come.

These are not even original ideas. Psychologists have dismissed Null Hypothesis Significance Testing several times over the years: Carver 1978 and, most influentially, Cohen 1994, to name a couple. The last was rebutted at the time by Richard Hagen. They have never managed to persuade any Mathematicians of their point of view.