There is much to agree with in Yale University clinical neurologist Steven Novella’s recent article on the p-value. The latest news regarding the beleaguered statistic used in hypothesis testing is the call to reduce its associated threshold for statistical significance by an order of magnitude, from its venerable value of 0.05 to 0.005.
Over the years Major League Baseball has tweaked the dimensions of the field, specifically the distance and height of the pitcher’s mound and the area of the strike zone. They did this in order to adjust the balance between pitchers and hitters, mostly to shift the balance toward hitters to make games more exciting for the fans.
Scientists are debating similar tweaks to statistical significance, to adjust the balance between false positives and false negatives. As with pitchers and batters, some changes are a zero-sum game – if you lower false positives, you increase false negatives, and vice versa. Where the perfect balance lies is a complicated question and increasingly the subject of debate.
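That trade-off can be illustrated with a short simulation. The setup below is purely illustrative and not from the article: it assumes a one-sample z-test with a true effect of 0.5 standard deviations and a sample size of 30, then compares how often each threshold flags noise as significant (false positives) and how often it misses the real effect (false negatives).

```python
import math
import random

def p_value_from_z(z):
    # Two-sided p-value from a z statistic, via the normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def simulate(alpha, n_trials=20000, n=30, effect=0.5, seed=0):
    """Return (false positive rate, false negative rate) at a given alpha.

    Half the trials draw from a null distribution (mean 0, SD 1);
    half draw from a distribution with a real effect of `effect` SDs.
    Each trial runs a one-sample z-test on n observations.
    """
    rng = random.Random(seed)
    half = n_trials // 2
    false_pos = false_neg = 0
    for i in range(n_trials):
        true_mean = effect if i >= half else 0.0
        xs = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(xs) / n) / (1.0 / math.sqrt(n))  # sample mean / SE
        significant = p_value_from_z(z) < alpha
        if true_mean == 0.0 and significant:
            false_pos += 1
        elif true_mean != 0.0 and not significant:
            false_neg += 1
    return false_pos / half, false_neg / half

for alpha in (0.05, 0.005):
    fpr, fnr = simulate(alpha)
    print(f"alpha={alpha}: false positives {fpr:.3f}, false negatives {fnr:.3f}")
```

With these assumed parameters, tightening alpha from 0.05 to 0.005 cuts the false positive rate roughly tenfold, but the false negative rate rises substantially – the zero-sum dynamic described above.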
A recent paper (available in preprint) by a long list of authors, including some heavy hitters like John P.A. Ioannidis, suggests that the p-value threshold typically used for statistical significance be changed in the psychology and biomedical fields from 0.05 to 0.005.
This is a modest proposal compared to outright banning the use of p-values.
But in any case, while a move to 0.005 would likely help to reduce problems, what is more desperately needed is the underlying training and peer review to ensure proper statistical testing, period, regardless of the selected p-value threshold. This is because the issues discussed by Novella are dwarfed by hypothesis testing fallacies, such as false dichotomies, that routinely appear in the literature. Those problems, unfortunately, are just as routinely ignored.
Image credit: Charles Dixon [Public domain], via Wikimedia Commons.
Cross-posted at Darwin’s God.