Dodiscimus has pointed out that he has also had a closer look at some of the studies behind Hattie’s Effect Sizes.
Shortly after I started this blog, after I’d done a few posts, I wrote to John Hattie at the University of Melbourne pointing out some of my concerns. One of the things I pointed out was that he claimed the ‘Effect Size’ had units of standard deviation when it can be shown mathematically that it actually has no units (and it’s fine for it to have no units as long as you realise that).
In fairness to him, he wrote back quite a long letter taking each of my points in turn. When it came to my ‘the Effect size has no units’ point he said –
“It is not correct to claim that the Effect Size has no units, it does, from -infinity to +infinity but more normally between -3 and 3”
Now, up to this point, I couldn’t quite believe all that I’d found out about the Effect Size. I would say to myself ‘the Effect Size is wrong and you’re the only one who noticed. Yeah, right!’ I was constantly searching my mind thinking ‘You’ve missed something, what have you missed?’
When I read this statement from him my mouth just dropped open.
Not only does John Hattie not know what units the ‘Effect Size’ is measured in, he doesn’t even understand what units are. What he’s quoted are not the units but the typical magnitude of the ‘Effect Size’ as found in Education research. This is an error which throws doubt on John Hattie’s basic mathematical competence.
To give you an example of how big a gaffe this is, imagine you asked a Physics Teacher what the units of speed for a car are. ‘The units of speed of a car are between 0 and 70’ they answer. No, the units of speed are miles per hour (or kilometres per hour or metres per second). That is a significant mistake and you wouldn’t have a great deal of faith in the ability of the person who said it afterwards.
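For anyone who wants to check the ‘no units’ point for themselves, here is a minimal sketch in Python. It assumes the ‘Effect Size’ is calculated as Cohen’s d (the difference in means divided by the pooled standard deviation). The marks are made up purely for illustration, and the function name cohens_d is just an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Difference in means divided by the pooled standard deviation (assumed definition)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Made-up test marks out of 50 for two classes (illustrative only).
marks_a = [31, 35, 38, 40, 44]
marks_b = [28, 30, 33, 37, 39]

# The same marks expressed as percentages: every value is doubled.
percent_a = [2 * m for m in marks_a]
percent_b = [2 * m for m in marks_b]

d_marks = cohens_d(marks_a, marks_b)
d_percent = cohens_d(percent_a, percent_b)

# A speed changes its number when its units change (70 mph is about 113 km/h).
# The Effect Size does not, because the units cancel in the division.
print(d_marks, d_percent)
```

The two printed values agree (up to floating-point rounding) even though the measurements were rescaled, which is what you would expect from a quantity with no units.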
In his letter, John Hattie also admonished me, saying that he had read my blog and felt I made too many remarks about him for him to leave a comment. He said –
“. . . in Academia the criticism is of ideas not people”
Which is fine, except that most people have no way to gauge whether a Mathematical argument is correct, so they might need to rely on other questions to guide them, questions like –
– Do other people in relevant fields use this?
– What is the competence of the person using this?
Now these questions won’t give us the definite answer to the use of the ‘Effect Size’ but what they may do is indicate an area of concern that may be worthy of further investigation.
The answer to the first question is that Mathematicians and Scientists have never heard of the Effect Size; in fact, only Psychologists and Education Researchers use it.
If you’re going to use Maths that Mathematicians don’t, you’re either a genius or you don’t know what you’re doing.
John Hattie is an Arts Graduate, who doesn’t understand what units are, nor the importance of getting them correct. I’ll leave you to ponder for yourself which he is.
I noticed a few days ago that people were expressing surprise on Twitter that the EEF report on Philosophy for children had no tests for statistical significance.
OK. Maybe some people haven’t read my previous blogs or believed what I’ve said before (and it is quite shocking) so I will briefly explain again.
When Mathematicians invented modern-day Statistics in the 1930s, they needed a way to see if results from an experiment were a real effect or just randomness. (For example, I throw a coin 10 times and it comes up Heads 7 times, it’s probably just randomness. I throw a coin 100 times and it comes up Heads 70 times, it’s probably biased.) So, Mathematicians invented statistical significance and p values to separate randomness and real effects.
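The coin example can be checked directly. The sketch below is a minimal illustration, assuming a fair coin and a one-sided test (the chance of getting at least that many Heads). The function name p_value_at_least is hypothetical, chosen only for this example.

```python
from math import comb


def p_value_at_least(heads, tosses):
    """Chance of seeing `heads` or more Heads in `tosses` throws of a fair coin."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2**tosses


p_small = p_value_at_least(7, 10)     # 7 Heads out of 10
p_large = p_value_at_least(70, 100)   # 70 Heads out of 100

print(p_small)  # 0.171875
print(p_large)
```

Seven or more Heads in 10 throws happens about 17% of the time with a fair coin, so it is unremarkable. Seventy or more in 100 has a probability well below 0.001, which is why the same proportion of Heads points to a biased coin in the larger experiment.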
Now, along come some Psychologists. They said “Mathematicians are a bunch of idiots and they’re doing this all wrong, let’s invent our own way of doing things”. So they invented the Effect Size. Mathematicians and Scientists have continued using statistical significance and Psychologists and Educationalists have continued using the Effect Size. They have said repeatedly that Null Hypothesis Significance Testing (i.e. the way Mathematicians and Scientists do things) is wrong.
This kind of thing is repeated on numerous Social Science websites.
So, you’ve really got to understand this, it’s not a case of them choosing one technique over another.
The people who use the Effect Size think that statistical significance testing, i.e. the way Mathematicians and Scientists do things, is wrong, and they have invented their own way of doing Statistics.
You’ve really got to grasp that to understand what I’ve been saying in my blogs.
“How could thousands of Psychologists and Educationalists all make the same mistake? Entire fields doing incorrect Statistics. It’s simply not plausible.”
On Thursday night I read a piece called ‘The Art of being Right’ by Arthur Schopenhauer. Underneath I reproduce a few paragraphs from a section entitled ‘Appeal to Authority rather than Reason’.
“When we come to look into the matter, so-called universal opinion is the opinion of two or three people; and we should be persuaded of this if we could see the way in which it really arises.
We should find that it is two or three persons who, in the first instance, accepted it, or advanced it and maintained it; and of whom people were so good as to believe they had thoroughly tested it. Then a few other persons, persuaded beforehand that the first were men of the requisite capacity, also accepted the opinion. These, again, were trusted by many others, whose laziness suggested to them that it was better to believe at once, than to go through the troublesome task of testing the matter for themselves. Thus the number of these lazy and credulous adherents grew from day to day; for the opinion had no sooner obtained a fair measure of support than its further supporters attributed this to the fact that the opinion could only have obtained it by the cogency of its arguments. The remainder were then compelled to grant what was universally granted, so as not to pass for unruly persons who resisted opinions which everyone accepted.
Since this is what happens, where is the value of the opinion even of a hundred millions? It is no more established than a historical fact reported by a hundred chroniclers who can be proved to have plagiarised it from one another; the opinion in the end being traceable to a single individual.”
Gene Glass should be the most famous man in Education. He is the person who changed the way the ‘Effect Size’ is used and spread its new use throughout Education. He became an Educational Psychologist in 1964. In the early Seventies he was receiving Psychotherapy and decided it had helped him so much that he wanted to prove to everyone that Psychotherapy worked. He’d learned about the ‘Effect Size’ from Jacob Cohen’s book ‘Statistical Power Analysis for the Behavioral Sciences’. (Jacob Cohen originally invented the ‘Effect Size’ and wrote a 500-page book explaining how to correctly use it to find the number of people you needed for your experiment.) Glass decided to completely change the way Jacob Cohen used the ‘Effect Size’, throw away the carefully constructed statistical look-up tables and use it for a completely different purpose: sticking results together.
While he was doing this, Glass was also elected as the President of the American Educational Research Association. He used his Presidential address to 1,500 educational researchers to announce his new method of putting results together using the new way of using the ‘Effect Size’. How many of those researchers would have thought that there was any element of doubt in what this eminent man was telling them at this prestigious occasion? How many of them would have had the necessary expertise to tell if it was correct or not?
Glass wrote a two-page pamphlet justifying his new way (this has a few sketches on it as proof) and published an article with his wife, Mary Lee Smith, in ‘American Psychologist’. Psychologists and Educationalists all started to copy him and the new method spread throughout Psychology and Education.
So, imagine all the children of the world, underneath them, supporting them are the teachers from all the different countries, underneath them is the whole of education research and all of this, resting on his shoulders, is just one man, Gene Glass. Given that Mathematicians have never taken the remotest bit of interest in the ‘Effect Size’, are we absolutely sure he’s correct?
The writer of the EEF report on Philosophy, Professor Stephen Gorard, has now openly admitted that he thinks that the way that Mathematicians and Scientists do Statistics is wrong and should be banned.
The significance test is how Mathematicians and Scientists do Statistics.
Psychologists invented the Effect Size as the “New Statistics” to replace it. It is unknown to Mathematicians.
When the Physicists at the Large Hadron Collider were looking for the Higgs Boson particle, to be sure they had really found it they used a significance test, called the five sigma test.
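To give a feel for how strict five sigma is, here is a rough sketch that converts a number of sigmas into a p value. It assumes the test statistic follows a Normal distribution and that only one tail is counted; the function name one_sided_p is hypothetical.

```python
from math import erfc, sqrt


def one_sided_p(sigmas):
    """Upper-tail probability of a standard Normal beyond `sigmas` standard deviations."""
    return 0.5 * erfc(sigmas / sqrt(2))


p_five_sigma = one_sided_p(5)
print(p_five_sigma)
```

Under those assumptions the result is a little under 0.0000003, or roughly a 1 in 3.5 million chance of seeing a signal that strong from randomness alone.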
So, on one side of the argument we have the people who found the Higgs Boson, on the other, Stephen Gorard. The decision is yours.
The Government announced today that they are spending £11 million to have 32 hub schools, bringing Chinese teachers to the UK and sending our teachers to China so we can learn how they do so well compared to us in Mathematics.
Yay, there’s a magic bullet that means the children can achieve without hard work or behaving themselves? I know we’ve been fooled many, many times before but this time it’s really true.
But wait, what’s this? If we look at the report released by Parliament, ‘Underachievement of White Working Class children’,
we see that the poorest Chinese children beat the richest non-Chinese children in this country.
It’s almost as if there is something else going on and the Chinese don’t have a magic way of teaching Maths at all.
The answer is, of course, that the Chinese have a culture of hard work and respect for education, as typified in the book ‘Battle Hymn of the Tiger Mother’ by Amy Chua. They bring this with them when they move countries, enabling them to come top in our country as well.
Maybe we need to be worrying more about copying the attitudes and culture of the Chinese parents and children towards Education and a little less trying to copy a mythical, magic, Chinese way of teaching.
When I was investigating the ‘Effect Size’ I found lots of criticism of significance testing on Social Science websites. Remember, this is once again, Social Scientists, often but not always Psychologists, criticising the way Mathematicians and Scientists do Statistics.
This is actually a fundamental part of the ‘Effect Size’ story as their failure to understand the significance testing procedure has led directly to the ‘Effect Size’ as they try to solve a ‘problem’ that isn’t really a problem, only a misunderstanding on their part.
It is also vital to recognise that the ‘Effect Size’ isn’t just another statistical method to choose from amongst many; it is the tip of the iceberg of a completely different ethos. The people who advocate using the ‘Effect Size’ think that the whole way Mathematicians and Scientists do Statistics is wrong, so they’ve decided to invent their own version. This has been mistakenly copied by people in Education like John Hattie.
In my next post I’ll be looking at the Maths of significance testing, but, what if you don’t know anything about Alpha levels or Type 1 and 2 errors, how could you judge? Well, a good place to start would be the mathematical credentials of the people making the criticism. So let’s have a look at the people who are criticising significance testing.
If we type ‘Criticism of Significance testing’ into Google, the first ten results are –
http://community.dur.ac.uk/r.j.coe/teaching/critsig.htm – Number one on the list, our old friend Robert Coe, Professor of Education at Durham University
http://en.wikipedia.org/wiki/Statistics – A general article on Statistics by Wikipedia
http://www.cem.org/attachments/publications/CEMWeb037%20The%20Case%20Against%20Statistical%20Significance%20Testing.pdf – CEM, Professor Coe’s organisation, publishing an article by Ronald P. Carver, Professor of Education and Psychology at the University of Missouri
http://errorstatistics.com/2012/12/24/13-well-worn-criticisms-of-significance-tests-and-how-to-avoid-them/ – Deborah Mayo, Professor of Philosophy at the University of Pennsylvania
http://www.johndcook.com/blog/2008/11/18/five-criticisms-of-significance-testing/ – John D Cook, Consultant in Applied Mathematics and Computing
http://www.uic.edu/classes/psych/psych548/fraley/ – R. Chris Fraley, Professor of Psychology at the University of Illinois at Chicago
http://www.johnmyleswhite.com/notebook/2012/05/10/criticism-1-of-nhst-good-tools-for-individual-researchers-are-not-good-tools-for-research-communities/ – John Myles White, PhD student in Psychology
http://www.andrews.edu/~rbailey/Chapter%20two/7217331.pdf – Andrews University Education department. Authors, Jeffery Gliner, retired Professor of Psychology, Associate Professor Nancy Leech, PhD in Philosophy and MA in Counselling, George Morgan, retired Professor of Education
http://lesswrong.com/lw/g13/against_nhst/ – No information
http://www.ncbi.nlm.nih.gov/pubmed/17002771 – Authors – Dr Fiona Fidler, Environmental Science, background in Psychology and Philosophy, Mark Burgman, Environmental Science, background in Zoology, Geoff Cumming, retired Professor of Psychology, Robert Buttrose, background in Philosophy, Neil Thomason, historical and philosophical studies
And so it goes on, page after page of Psychologists, Philosophers and Education Professors criticising the way Mathematicians and Scientists do Statistics.
So, you can judge for yourself the quality of the people criticising the way Mathematicians do Maths. Though this time we do seem to have a lot of Philosophers as well as Psychologists.
Now, this is important because their mistakes in significance testing have led to the ‘Effect Size’, which has led to Education research being done incorrectly, which has an impact on real children in real classrooms.
In my next post, I will deal with the more Mathsy side of things. I will show that their criticisms of significance testing are baseless and just show their poor understanding of Statistics.