Psychology journal bans significance testing

The journal “Basic and Applied Social Psychology” has banned significance testing, confidence intervals and p values.


These are all mainstream statistical tools that mathematicians have used for decades.

Stephen Gorard, who evaluates reports for the EEF, has also recently called for these tools to be banned in education.

Massive hat-tip to @cbokhove.


2 thoughts on “Psychology journal bans significance testing”

  1. It seems extreme to be banning inferential statistics, but there is a deep problem in social science research (as demonstrated by the outcome of the recently published psychology reproducibility project). The blame for this cannot all be laid at the door of the p value, but it’s proving an intractable problem, and the bottom line is that, for whatever reasons, researchers (and, until very recently, publishers) tend to focus on meeting the p<.05 criterion instead of looking long and hard at the data and asking what it says. As further evidence, when large meta-analyses are split up to look for factors affecting the ES (sorry to swear), the only consistent finding is that published studies show stronger effects than unpublished ones.

    I know you have often drawn comparisons with the use of inferential statistics in the physical sciences but, to take your Higgs boson example, the fields are different. With the Higgs data, the question was essentially whether or not the data showed a spike (well, a subtle bump, really) at an energy that corresponded to the one expected. Because all kinds of weird shit happens in TeV collisions, the question was whether the spike seen was evidence for the Higgs or just noise. This is an excellent problem to apply inferential statistics to because there is a straightforward null hypothesis (no Higgs), and the chance of the spike occurring if the null hypothesis were true is exactly what the physicists wanted to know. Not wanting to be caught out announcing a false positive, they ran until they had p<3×10^-7. If the Higgs does not exist then they've been damn unlucky. I assume the data could be treated as a genuine random sample from the population of all possible collisions, but who knows what quantum weirdness may have been going on. Safe to say they probably had some 'real' mathematicians on the case, so will have adjusted if necessary.

    Education and other social science research just isn't like this. The first thing is that almost everything affects almost everything! Teachers aren't (or shouldn't be) interested in knowing whether x factor correlates with y outcome; the answer is very likely to be yes under some circumstances, and no (or yes but negatively) under others. What we actually want to know is whether or not a change in x correlates with a BIG change in y, and whether that is a causal link, reproducible across all settings or, if not, across which ones. A study hitting p<.05 doesn't help answer these questions.

    Second, basic inferential statistics require random sampling. I know there are ways to compensate for some non-random situations, but most social science samples aren't random (I think Gorard's argument is strong on this point) and there is a lot of published research that just ignores this problem. That may be good support for your argument that social scientists don't understand statistics, but it doesn't help put the house in order, whereas moving away from NHSTP (the null hypothesis significance testing procedure) might.

    Third (sorry, going to swear again), thinking at the level of individual studies in social science is not helpful. The studies are too small, and each one will have individual flaws. We need to amalgamate data from many studies with different designs, methodologies, settings, and research teams. This can be done by meta-analysis, starting from p values etc. and generating an ES, but there is a good argument in favour of looking at the actual data sets, and including qualitative as well as quantitative work; the focus on p values has tended to make this more difficult.

    Gorard takes a possibly unique position in questioning the fundamental validity of inferential statistics, but there is a real problem with their use in social science that doesn't affect mathematicians in the same way.
    Best wishes
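
For anyone who wants to check the arithmetic behind two of the points in the comment above, here is a rough sketch in Python. It assumes the p<3×10^-7 figure refers to the one-sided "5 sigma" convention used in particle physics, and it uses a simple z-test with made-up effect sizes and sample sizes (the function names and numbers are illustrative, not taken from any study) to show how a trivially small effect can clear p<.05 while a sizeable one in a small study does not.

```python
import math


def one_sided_p_from_z(z: float) -> float:
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_sample_z_p(effect_size: float, n_per_group: int) -> float:
    """Two-sided p value for a standardised mean difference.

    Assumptions (for illustration only): two equal-sized groups, unit
    variance, and an observed difference exactly equal to effect_size.
    """
    z = effect_size * math.sqrt(n_per_group / 2)
    return 2 * one_sided_p_from_z(abs(z))


# Assumed 5-sigma threshold: one-sided tail beyond 5 standard deviations.
higgs_p = one_sided_p_from_z(5.0)

# Hypothetical tiny effect (d = 0.05) with 5000 pupils per group.
small_effect_p = two_sample_z_p(0.05, 5000)

# Hypothetical medium effect (d = 0.5) with only 20 pupils per group.
big_effect_small_n_p = two_sample_z_p(0.5, 20)

print(f"5 sigma, one-sided:      p = {higgs_p:.3g}")
print(f"d = 0.05, n = 5000/group: p = {small_effect_p:.3g}")
print(f"d = 0.5,  n = 20/group:   p = {big_effect_small_n_p:.3g}")
```

Under these assumptions the tiny effect comes out "significant" and the medium one does not, which is the sense in which hitting p<.05 says little about whether a change in x goes with a BIG change in y.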

    • Re point 2: as Glass’ ‘effect size’ has shown, one arbitrary criterion will simply be replaced with a new one. The ‘solution’ Gorard seems to suggest, namely simple stats plus ‘judgement’, will just make this worse. It’s one thing to rightly point out the flaws in NHSTP, but if the ‘solution’ is no solution at all, then it might be better to ‘educate’ better.
