How could all those people be wrong?

“How could thousands of Psychologists and Educationalists all make the same mistake? Entire fields doing incorrect Statistics. It’s simply not plausible.”

On Thursday night I read a piece called ‘The Art of being Right’ by Arthur Schopenhauer. Underneath I reproduce a few paragraphs from a section entitled ‘Appeal to Authority rather than Reason’.

“When we come to look into the matter, so-called universal opinion is the opinion of two or three people; and we should be persuaded of this if we could see the way in which it really arises.

We should find that it is two or three persons who, in the first instance, accepted it, or advanced it and maintained it; and of whom people were so good as to believe they had thoroughly tested it. Then a few other persons, persuaded beforehand that the first were men of the requisite capacity, also accepted the opinion. These, again, were trusted by many others, whose laziness suggested to them that it was better to believe at once, than to go through the troublesome task of testing the matter for themselves. Thus the number of these lazy and credulous adherents grew from day to day; for the opinion had no sooner obtained a fair measure of support than its further supporters attributed this to the fact that the opinion could only have obtained it by the cogency of its arguments. The remainder were then compelled to grant what was universally granted, so as not to pass for unruly persons who resisted opinions which everyone accepted. 

Since this is what happens, where is the value of the opinion even of a hundred millions? It is no more established than a historical fact reported by a hundred chroniclers who can be proved to have plagiarised it from one another; the opinion in the end being traceable to a single individual.”

Gene Glass should be the most famous man in Education. He is the person who changed the way the ‘Effect Size’ is used and spread its new use throughout Education. He became an Educational Psychologist in 1964. In the early Seventies he was receiving Psychotherapy and decided it had helped him so much that he wanted to prove to everyone that Psychotherapy worked. He’d learned about the ‘Effect Size’ from Jacob Cohen’s book ‘Statistical Power Analysis for the Behavioral Sciences’. (Jacob Cohen originally invented the ‘Effect Size’ and wrote a 500 page book explaining how to correctly use it to find the number of people you needed for your experiment.) Glass decided to completely change the way Jacob Cohen used the ‘Effect Size’, throw away the carefully constructed statistical look-up tables and use it for a completely different reason, sticking results together. While he was doing this, Glass was also elected as the President of the American Educational Research Association. He used his Presidential address to 1,500 educational researchers to announce his new method of putting results together using the new way of using the ‘Effect Size’. How many of those researchers would have thought that there was any element of doubt in what this eminent man was telling them at this prestigious occasion? How many of them would have had the necessary expertise to tell if it was correct or not? Glass wrote a 2 page pamphlet justifying his new way (this has a few sketches on it as proof) and published an article with his wife, Mary Lee Smith, in ‘American Psychologist’. Psychologists and Educationalists all started to copy him and the new method spread throughout Psychology and Education.
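To make Cohen's original use concrete, here is a rough sketch in Python of the kind of calculation his tables were built for: given an expected 'Effect Size' d, how many people do you need in each group? It uses the standard Normal approximation for comparing two group means rather than Cohen's actual look-up tables, so the answer can differ from his by a person or two. The function name and the default values (5% significance, 80% power) are my own assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate people needed in each of two groups to detect a
    standardised mean difference d with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(sample_size_per_group(0.5))  # d = 0.5 is a "medium" effect in Cohen's terminology
```

The smaller the effect you expect, the more people you need, which is exactly the planning question the 'Effect Size' was designed to answer.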

So, imagine all the children of the world, underneath them, supporting them are the teachers from all the different countries, underneath them is the whole of education research and all of this, resting on his shoulders, is just one man, Gene Glass. Given that Mathematicians have never taken the remotest bit of interest in the ‘Effect Size’, are we absolutely sure he’s correct?


Now the writer of the EEF report on Philosophy says that the way Mathematicians and Scientists do Statistics is all wrong

The writer of the EEF report on Philosophy, Professor Stephen Gorard, has now openly admitted that he thinks that the way that Mathematicians and Scientists do Statistics is wrong and should be banned.

The significance test is how Mathematicians and Scientists do Statistics.

Psychologists invented the Effect Size as the “New Statistics” to replace it. It is unknown to Mathematicians.



When the Physicists at the Large Hadron Collider were looking for the Higgs Boson particle, to be sure they had really found it they used a significance test, called the five sigma test.
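For anyone wondering what "five sigma" means as a probability, here is a minimal sketch in Python. It assumes the test statistic follows a standard Normal distribution under the null hypothesis and takes the one-sided tail, which is the convention usually quoted in particle physics. The function name is mine, for illustration only.

```python
import math

def one_sided_p_value(sigma: float) -> float:
    """Probability that a standard Normal variable exceeds `sigma`."""
    # Upper-tail area of the standard Normal via the complementary error function.
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p_five_sigma = one_sided_p_value(5.0)
print(f"{p_five_sigma:.2e}")  # roughly 2.87e-07, about 1 chance in 3.5 million
```

In other words, if there were no particle there, a result this extreme would turn up by chance about once in every 3.5 million attempts.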


So, on one side of the argument we have the people who found the Higgs Boson, on the other, Stephen Gorard. The decision is yours.

The Chinese Way

The Government announced today that they are spending £11 million to have 32 hub schools, bringing Chinese teachers to the UK and sending our teachers to China so we can learn how they do so well compared to us in Mathematics.

Yay, there’s a magic bullet that means the children can achieve without hard work or behaving themselves? I know we’ve been fooled many, many times before but this time it’s really true.

But wait, what’s this? If we look at the report released by Parliament, ‘Underachievement of White Working Class Children’, we see that the poorest Chinese children beat the richest non-Chinese children in this country.

It’s almost as if there is something else going on and the Chinese don’t have a magic way of teaching Maths at all.

The answer is of course that the Chinese have a culture of hard work and respect for education, as typified in the book ‘Battle Hymn of the Tiger Mother’ by Amy Chua. They bring this with them when they move countries, enabling them to come top in our country as well.

Maybe we need to be worrying more about copying the attitudes and culture of the Chinese parents and children towards Education and a little less trying to copy a mythical, magic, Chinese way of teaching.

Significance testing

When I was investigating the ‘Effect Size’ I found lots of criticism of significance testing on Social Science websites. Remember, this is once again, Social Scientists, often but not always Psychologists, criticising the way Mathematicians and Scientists do Statistics.

This is actually a fundamental part of the ‘Effect Size’ story as their failure to understand the significance testing procedure has led directly to the ‘Effect Size’ as they try to solve a ‘problem’ that isn’t really a problem, only a misunderstanding on their part.

It is also vital to recognise that the ‘Effect Size’ isn’t just another statistical method to choose from amongst many; it is the tip of the iceberg of a completely different ethos. The people who advocate using the ‘Effect Size’ think that the whole way Mathematicians and Scientists do Statistics is wrong, so they’ve decided to invent their own version. This has been mistakenly copied by people in Education like John Hattie.

In my next post I’ll be looking at the Maths of significance testing, but, what if you don’t know anything about Alpha levels or Type 1 and 2 errors, how could you judge? Well, a good place to start would be the mathematical credentials of the people making the criticism. So let’s have a look at the people who are criticising significance testing.

If we type ‘Criticism of Significance testing’ into Google, the first ten results are:

– Number one on the list, our old friend Robert Coe, Professor of Education at Durham University

– A general article on Statistics by Wikipedia

– CEM, Professor Coe’s organisation, publishing an article by Ronald P. Carver, Professor of Education and Psychology at the University of Missouri

– Deborah Mayo, Professor of Philosophy at the University of Pennsylvania

– John D Cook, Consultant in Applied Mathematics and Computing

– R. Chris Fraley, Professor of Psychology at the University of Chicago

– John Myles White, PhD student in Psychology

– Andrews University Education department. Authors: Jeffery Gliner, retired Professor of Psychology; Associate Professor Nancy Leech, PhD in Philosophy and MA in Counselling; George Morgan, retired Professor of Education

– No information

– Authors: Dr Fiona Fidler, Environmental Science, background in Psychology and Philosophy; Mark Burgman, Environmental Science, background in Zoology; Geoff Cumming, retired Professor of Psychology; Robert Buttrose, background in Philosophy; Neil Thomason, historical and philosophical studies

And so it goes on, page after page of Psychologists, Philosophers and Education Professors criticising the way Mathematicians and Scientists do Statistics.

So, you can judge for yourself the quality of the people criticising the way Mathematicians do Maths. Though this time we do seem to have a lot of Philosophers as well as Psychologists.

Now, this is important because, their mistakes in significance testing have led to the ‘Effect Size’, which has led to Education research being done incorrectly, which has an impact on real children in real classrooms.

In my next post, I will deal with the more Mathsy side of things. I will show that their criticisms of significance testing are baseless and just show their poor understanding of Statistics.

Can we stop yet?

[Image: Prof Coe]

If you were not a Mathematician you might think that all Mathematicians are pretty much the same. However, there are three main strands to the Maths that gets taught at University: Pure, Mechanics and Statistics. It’s a bit like the way Science splits into Biology, Chemistry and Physics. Pure is Algebra, proofs, very abstract things like that, whereas Statistics is all about analysing data from the real world. Someone who was very accomplished at Pure Maths would, nevertheless, be a total beginner at Statistics as the skills and knowledge aren’t really transferable.

What we have here is a classic case of someone who is an expert in their own field, switching to a different field, forgetting they are no longer an expert, yet, still being supremely confident in their own judgement and opinion. A good analogy would be someone who does a Physics degree up to Quantum Mechanics level who then moves over to Biology. They need to go right back to the beginning and start to quietly learn the different parts of a cell. Imagine if they started to loudly disagree with accepted Biological opinion after a week of lessons. Yet, Professor Coe does disagree with accepted Statistical opinion. No Mathematician uses the Effect Size.

Professor Coe did a Pure Maths degree which had no statistics in it.

John Hattie did an Arts degree which had no statistics in it.

There’s a very simple reason they advocate the use of statistics you won’t find in any Maths textbook: their degrees contained no statistics.

Nobody in Maths uses the Effect Size.

Can we stop yet?

Who works at the Education Endowment Foundation?

The Education Endowment Foundation was set up in 2011, with a £125 million grant from the Department for Education. So, who are the Executive Team spending this money?

Chief Executive

Dr Kevan Collins, Chief Executive, PhD in Literacy Development


Eleanor Stringer, Grants Manager, BA Philosophy, Politics and Economics

Emily Yeomans, Grants Manager, BSc Biology, PGCE Science

Matthew van Poortvilet, Grants Manager, no information


Camilla Nevill, Evaluations Manager, BA Experimental Psychology

Elena Rosa Brown, Evaluations Officer, MA Psychology and Education

Sarah Tang, Evaluations Officer, MSc Economics of Education

Dissemination and Impact

Robbie Coleman, Research and Communications Manager, no information

James Richardson, Senior Analyst, BA Politics, PGCE Geography, MA Education, Culture and Society

Dr Jonathan Sharples, Senior Researcher, M Biochem, Biochemistry, PhD Biochemistry

Peter Henderson, Research Officer, BA History, MSc Public Policy

Sharmini Selvarajah, Deputy Head of News, BA Social and Political Science, MA Public Policy

Development and Communications

Stephen Tall, BA Modern History, MA Modern History

Rebecca Clegg, BSc Marketing

So, my question is this. If none of the people who run the Education Endowment Foundation have any qualifications in Mathematics or Statistics, exactly whose expertise are they following? Would they know if they’d made an error? You might say, well they’ve obviously asked the experts, but the experts are Mathematicians and they haven’t asked them, because, no Mathematician uses the ‘Effect Size’, so whose advice are they following? The learning of millions of children may depend on it.

Who are the authors of the EEF toolkit?

The Sutton Trust-EEF Teaching and Learning Toolkit is a summary of all the research that the EEF has done so far. So, who are the authors of this Toolkit that advises the 500,000 teachers in this country on the most effective way to teach their pupils?


Professor Steve Higgins, Professor of Education at Durham University, BA Literae Humaniores (Classics – Ancient Rome, Ancient Greece, Latin, Ancient Greek and Philosophy)

Dr Maria Katsipatki, Research Associate at Durham University School of Education, BA Psychology

Dr Dimitra Kokotsaki, Lecturer at Durham University School of Education, no information about her degree but she lectures in Music education

Professor Rob Coe, Professor of Education at Durham University, BSc Mathematics, PhD Education

Dr Lee Elliott Major, Director Sutton Trust, BSc Physics, PhD Physics

Robbie Coleman, Research and Communications Manager EEF, MSc Comparative Social Policy

When I originally envisioned the EEF, I imagined dozens of Statisticians, bent over spreadsheets, all furiously discussing their statistical methodology. Yet, it appears that in the whole of the EEF only one person has any Mathematical training at all. And he’s using Statistics that no Mathematician has ever heard of. Are we willing to bet the lives and aspirations of millions of children on one man’s say so? I’m not sure we should.

Banning Statistical Significance testing in Education

Stephen Gorard is a Professor of Education at Durham University who evaluates studies for the EEF. He wants to ban significance testing in Education Research.

Significance testing was developed in the 1920s by the Mathematician R.A. Fisher, “a genius who almost single-handedly laid down the framework for modern-day Statistics”. Significance testing has been used for decades by Mathematicians and Scientists as an absolutely fundamental part of Statistical Analysis, most recently by the people who found the Higgs Boson particle. Stephen Gorard has a degree in Psychology.

He spells out the reasons he thinks we should ban significance testing in five articles here. 

Reason 1 – It’s too hard

He calls it “needlessly complex” and “confusing” (page 2) for Social Scientists and wants to replace it with something easier. In Maths we’re not concerned if something is hard, we’re only concerned if it’s right.

Reason 2 – People misinterpret it

When he says people misinterpret it he quotes Falk and Greenbaum 1995. So when he says people misinterpret it, he means, 53 Psychology students at the Hebrew University of Jerusalem misinterpreted it in 1995.

If Social Scientists do misinterpret it, I suggest you teach them better in the Statistics bit of their course, rather than ban the bit they misinterpret.

Reason 3 – The logic behind significance testing is flawed

A significance test is an indirect proof; it’s an extension of the idea of proof by contradiction. You assume something (the null hypothesis), follow your thoughts logically, and if you end up with something improbable you discount your initial assumption. But, he doesn’t understand the logic therefore there is no logic.
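That chain of reasoning can be sketched in a few lines of Python. This is an illustration only, with made-up numbers: a coin is tossed 100 times and comes up heads 61 times, and the null hypothesis is that the coin is fair. The function name and the 5% threshold are my assumptions for the example.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of a result at least
    this extreme if the null hypothesis (success probability p) is true."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 61 heads in 100 tosses; null hypothesis: the coin is fair.
p_value = binomial_upper_tail(100, 61)
alpha = 0.05  # assumed significance level

# If the data would be improbable under the null hypothesis, reject it.
print(f"p-value = {p_value:.4f}, reject null: {p_value < alpha}")
```

Notice what is being calculated: the probability of data like this given the null hypothesis. Nothing here claims to be the probability of the null hypothesis itself.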

He has asked me on Twitter to explain how to calculate the probability of the null Hypothesis given the Data from an experiment. This is something that is impossible to do and doesn’t even make any sense to ask. You can calculate the probability of the Data given the null Hypothesis but you can’t calculate the probability of the null Hypothesis given the Data. Asking someone to do so is like asking them to ‘measure how tall is the colour blue?’. It’s possible he may have read some articles on Bayesian Inference which would talk about finding the probability of the *parameter* given the data and got confused, but, this wouldn’t be applicable in this case anyway.

He also shows that he doesn’t understand the reason Mathematicians do significance tests.

“Significance tests are really only another way of presenting the scale of a piece of research, saying little or nothing about the magnitude or importance of the finding” (page 2)

No, no, they’re not Stephen. As with all Effect Sizers, he clearly does not understand why we do a significance test (it is to check if the data could actually just be explained by randomness). He then criticizes significance tests for not doing something they weren’t designed to do. Mathematicians *do* have tools to estimate the size of an effect, they’re called Maximum Likelihood Estimators and we use them all the time in conjunction with significance tests. Stephen Gorard is quite happy to throw significance testing away despite not understanding the vital role it plays. He doesn’t understand it therefore it has no use.
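To show the two jobs side by side, here is a small sketch in Python with made-up test scores for two classes. The difference in sample means estimates the size of the effect (under a Normal model the sample mean is the maximum likelihood estimate of a group's mean), and a separate test checks whether that difference could just be randomness. I have used a simple permutation test as a stand-in for whichever significance test a Statistician would actually choose; the data, function names and number of shuffles are all assumptions for illustration.

```python
import random
from statistics import mean

def mean_difference(a, b):
    """Difference in sample means; under a Normal model the sample mean
    is the maximum likelihood estimate of each group's mean."""
    return mean(a) - mean(b)

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test: how often does a random relabelling of
    the pooled scores give a difference at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(shuffled_diff) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical test scores for two classes (made up for illustration).
group_a = [72, 68, 75, 80, 77, 69, 74, 81]
group_b = [65, 70, 62, 68, 66, 71, 63, 67]

estimate = mean_difference(group_a, group_b)     # how big is the effect?
p_value = permutation_p_value(group_a, group_b)  # could it just be randomness?
print(f"estimated difference = {estimate:.2f}, p-value = {p_value:.4f}")
```

The estimate and the p-value answer different questions, which is why the two are reported together rather than one replacing the other.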

Well you might say, what does it matter if some old Professor of Education has these crank ideas?

Well, here he is telling young people just entering the field of education research that significance tests, confidence intervals, p values and chi-squared tests don’t work. Not that *he* thinks they don’t work but no Mathematician agrees with him, just that they don’t work. These young people will often be coming from subjects that don’t have a great deal of statistical content, and will not be able to tell that what this apparently eminent Professor is telling them is false. The result is immeasurable damage to the field of education research for decades to come.

These are not even original ideas. This dismissal of Null Hypothesis Significance Testing by Psychologists has occurred several times over the years: Carver 1978 and, most influentially, Cohen 1994, to name a couple. The last was rebutted at the time by Richard Hagen. They have never managed to persuade any Mathematicians of their point of view.

North London middle class call for more youth unemployment

That’s not how it was phrased of course, they were praising Lidl for introducing the living wage and that isn’t their intention, but, as night follows day, that will be the consequence.

There is a discussion about whether or not a minimum wage causes unemployment, and the answer is it depends on the level at which the minimum wage is set.

Employers pay people depending on the amount of profit they can make from them. Anyone worth less to their employer than the minimum wage won’t be employed at all.

Ann employs Billy in her supermarket. Billy is young and inexperienced so only makes £10 of profit per hour for the shop, so that’s what Ann pays him. If she tried to pay him £9 an hour then Chris who owns the supermarket next door would offer him more money to come and work for her, so, his wages will always increase to £10 per hour.

If the minimum wage is set at £8 per hour, i.e. below the wage he was getting anyway, it doesn’t affect anyone. Billy doesn’t get paid any more but he doesn’t lose any hours.

Now suppose the minimum wage is set at £11 per hour. The middle-class North Londoners are all patting themselves on the back at that week’s dinner party. They intend for poor people to be paid more. But what actually happens? Suddenly, Ann is losing £1 an hour for every hour she employs Billy. Billy finds that his hours start getting cut and Ann starts thinking about getting a self-serve machine in instead and getting rid of Billy. Worse, she was thinking about taking on more people. Dave would have been taken on at £10 an hour but will now remain unemployed. And he doesn’t even know he lost out.
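The arithmetic of the Ann and Billy example can be written out as a toy model in Python. It bakes in the simplifying assumptions of the story: competition pushes the wage up to exactly what the worker is worth to the shop, and nobody is employed at a loss. The function and field names are mine, purely for illustration.

```python
def hourly_outcome(value_per_hour: float, minimum_wage: float) -> dict:
    """Toy model from the example: competition pushes the wage up to the
    value the worker produces, unless the legal minimum is higher."""
    employed = value_per_hour >= minimum_wage
    wage = value_per_hour if employed else 0.0
    # What the employer would lose per hour if forced to pay the minimum.
    employer_loss = max(0.0, minimum_wage - value_per_hour)
    return {"employed": employed, "wage": wage, "employer_loss_if_kept": employer_loss}

print(hourly_outcome(10, 8))   # minimum below Billy's value: nothing changes
print(hourly_outcome(10, 11))  # minimum above Billy's value: £1 an hour loss, job at risk
```

Real labour markets are messier than this, but the sketch shows why the level at which the minimum is set is the whole question.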

The winners of minimum wage increases are visible, concentrated and vocal. The losers are invisible, spread out and may not even know how they were cheated.

The biggest losers are the young and inexperienced. To them, getting experience which will get them higher wages in the future is even more valuable than actual money. If they never get on the lowest rung, then how can they ever climb the ladder?

Nobody likes seeing people working hard for low wages, but, the alternative is not higher wages, it’s no wages. The main benefit to these policies seems to be so that some people can have a warm feeling in their bellies and damn the consequences.