Back in December I noted a study wherein two thirds of a representative sample of published psychological studies could not be replicated. NewScientist (16 April 2016, paywall) has more dirt in “The Unscientific Method” (Sonia van Gilder Cooke):
Science is often thought of as a dispassionate search for the truth. But, of course, we are all only human. And most people want to climb the professional ladder. The main way to do that if you’re a scientist is to get grants and publish lots of papers. The problem is that journals have a clear preference for research showing strong, positive relationships – between a particular medical treatment and improved health, for example. This means researchers often try to find those sorts of results. A few go as far as making things up. But a huge number tinker with their research in ways they think are harmless, but which can bias the outcome.
Science defenders often note how it’s self-correcting, which makes it better than “competing” ideologies. So this is rather dismaying:
Traditionally, once results are published they tend to go unchecked. “The current system does not reward replication – it often even penalizes people who want to rigorously replicate previous work,” wrote statistician John Ioannidis of Stanford University in California in a recent paper entitled “How to make more published research true”. Proponents of a new discipline called metascience (the science of science) aim to change that, and Ioannidis is in the vanguard.
Part of the problem is simply that we're trying to do difficult things:
Some fields of research are less susceptible than others, though. In astronomy, chemistry and physics, for instance, “people have a very strong tradition of sharing data, and of using common databases like big telescopes or high energy physical experiments”, Ioannidis says. “They are very cautious about making claims that eventually will be refuted.” But in fields where such checks and balances are absent, irreproducible results are rife.
Take the case of cancer researcher Anil Potti when he was at Duke University in Durham, North Carolina. In 2006, staff at the MD Anderson Cancer Center in Houston, Texas, wanted to investigate treatments based on Potti’s published work on gene expression. Before pressing ahead, they asked their colleagues, biostatisticians Keith Baggerly and Kevin Coombes, to look over the findings. Their efforts illustrate how hard it can be for peer reviewers to pick up on mistakes. It took them almost 2000 hours to disentangle the data and reveal a catalogue of errors. It later transpired that Potti had falsified data, but in the meantime, three clinical trials had been started on the basis of his research.
Bold mine. 2000 hours is 50 man-weeks of 40-hour weeks. I get the impression that the breadth of the field and a relative dearth of competent personnel make it difficult for self-correction to occur when the fields themselves are this difficult, and that personnel may be more interested in the creative, exploratory part of science than in the replicative side of things. Researchers who want to build on novel results, as above, are the ones most likely to attempt replication, but that isn't their commercial goal; it's simply part of the process responsible researchers should follow. Not all researchers will be as responsible as this group, as evidenced by the last clause noting that three clinical trials were already underway. I always wonder how scientists like Dr. Potti (is it possible to retract a PhD?) feel when they realize their false research is being used to build treatments.
Van Gilder Cooke goes on to note responses from the various subfields of science, including registering studies, along with the analysis methods to be used once the data are collected, on an online platform run by the Center for Open Science. In a sidebar, she notes this:
Blindfolding – Deciding on a data analysis method before the data are collected.
This safeguards researchers against cherry-picking whichever statistical method happens to give the most significant result. By selecting the analysis method beforehand, the researcher is more likely to justify why it's the appropriate one.
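To make that concrete, here is a minimal simulation sketch of my own (not from the article, and all parameters are hypothetical) showing why picking the analysis after seeing the data is dangerous. Both groups are drawn from the same distribution, so any "significant" difference is a false positive; reporting the best of ten candidate outcomes inflates the false-positive rate from the nominal 5% to roughly 1 − 0.95¹⁰ ≈ 40%, while committing to one outcome in advance keeps it near 5%.

```python
# Sketch: cherry-picked vs. pre-specified analysis under a true null hypothesis.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_experiments = 5_000   # hypothetical number of simulated studies
n_outcomes = 10         # hypothetical number of outcomes measured per study
n_per_group = 30        # hypothetical subjects per group
alpha = 0.05

cherry_hits = 0   # false positives when reporting the best-looking outcome
prereg_hits = 0   # false positives when the outcome was fixed in advance

for _ in range(n_experiments):
    # Both groups come from N(0, 1), so the null hypothesis is true by construction.
    group_a = rng.normal(size=(n_outcomes, n_per_group))
    group_b = rng.normal(size=(n_outcomes, n_per_group))
    p_values = [ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]

    if min(p_values) < alpha:   # report whichever outcome "worked"
        cherry_hits += 1
    if p_values[0] < alpha:     # report only the pre-specified outcome
        prereg_hits += 1

print(f"cherry-picked false-positive rate:  {cherry_hits / n_experiments:.2f}")  # roughly 0.40
print(f"pre-registered false-positive rate: {prereg_hits / n_experiments:.2f}")  # roughly 0.05
```

The point of the sketch is only that the inflation comes from the selection step itself, not from any flaw in the individual tests, which is exactly what pre-specifying the analysis is meant to prevent.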
Which reminds me of the struggle in software engineering between sitting down and hacking out a solution, and putting together a formal design that analyzes the requirements, anticipated data quantities, etc. I usually fall somewhere in the middle – the document exists, but only in my mind.