Science is a perfect ideal surrounded by imperfect servants, by whom I mean the scientists: poor sensory organs, poor tools, and the occasional dishonest creature make up the corps. The result is studies that defy known science, anomalous results, and the occasional out-and-out fraud. These problems are not particularly easy to manage, for reasons I take to include poor logistical tools, egotism and narcissism, and, in the commercial realm, a desire for corporate secrecy.
The advent of computers and then the Internet, however, presents an opportunity to put procedures in place that ameliorate the imperfections of the servants of science. But first, an example of the problem, courtesy of an anonymous psychologist and The Atlantic:
In the last few years, psychologists have become increasingly aware of, and unsettled by, these problems. Some have created an informal movement to draw attention to the “reproducibility crisis” that threatens the credibility of their field. Others have argued that no such crisis exists, and accused critics of being second-stringers and bullies, and of favoring joyless grousing over important science. In the midst of this often acrimonious debate, [Brian] Nosek has always been a level-headed figure, who gained the respect of both sides. As such, the results of the Reproducibility Project, published today in Science, have been hotly anticipated.
They make for grim reading. Although 97 percent of the 100 studies originally reported statistically significant results, just 36 percent of the replications did.
The original publication in Science is here. Lisa Feldman Barrett, writing in the New York Times, disagrees with the perception that a crisis is underway:
But the failure to replicate is not a cause for alarm; in fact, it is a normal part of how science works.
Suppose you have two well-designed, carefully run studies, A and B, that investigate the same phenomenon. They perform what appear to be identical experiments, and yet they reach opposite conclusions. Study A produces the predicted phenomenon, whereas Study B does not. We have a failure to replicate.
Does this mean that the phenomenon in question is necessarily illusory? Absolutely not. If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions. The scientist’s job now is to figure out what those conditions are, in order to form new and better hypotheses to test. …
Psychologists are usually well attuned to the importance of context. In our experiments, we take great pains to avoid any irregularities or distractions that might affect the results. But when it comes to replication, psychologists and their critics often seem to forget the powerful and subtle effects of context. They ask simply, “Did the experiment work or not?” rather than considering a failure to replicate as a valuable scientific clue.
(h/t Richard Soulen) And, of course, this can be absolutely true for some fields and false in others. The point, however, is to detect studies with flaws in design and execution, and to detect them early. To this end, Dr. Nosek is now running the Reproducibility Project. From the Atlantic article:
… would be the first big systematic attempt to answer questions that have been vexing psychologists for years, if not decades. What proportion of results in their field are reliable?
As Professor Barrett points out, this could be an opportunity to discover new information and follow leads to new discoveries. But replication of results remains an important facet of science, so this project is important in its own right, not only for psychology but for science in general. And replicability – or the lack of it – can be used as a criticism, as in this article from NewScientist (22 August 2015, paywall), “Suicidal behaviour predicted by blood test showing gene changes,” which claims tentative evidence that suicide can be predicted from a blood test. Here’s the criticism:
While mortality linked to physiological conditions like cardiac disease has fallen, suicide rates are at an all-time high across all age groups in the US. In the UK, rates have been rising steadily since 2007, and similar trends are seen in other countries. The desire to have psychiatry benefit from biological advances in the same way as the rest of medicine is why the NIMH has changed its approach. But the shift towards detecting biomarkers by neuroimaging or monitoring gene expression has drawn criticism.
“The NIMH is funding biomarker porn,” says James Coyne of University Medical Center in Groningen, the Netherlands. “It’s airbrushed, heavily edited, and you can’t replicate it at home.”
Coyne’s view is shaped by the small sample sizes used in early mental health biomarker research – something that can be problematic for rare conditions.
An allied problem is publication bias: journals preferentially publish papers that fail to falsify their hypothesis, which means we lose access to knowledge about the hypotheses that were falsified. This has been a particular problem in drug development, where we have not kept good track of which drugs do not work for which conditions – arguably a larger body of data than the record of which drugs do. One step toward resolving this is ClinicalTrials.gov, which enables the registration of studies as well as their results.
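To make the distortion concrete, here is a minimal sketch – purely my own illustration, not anything drawn from ClinicalTrials.gov or the studies discussed above – of what happens when only “significant” results see print. It simulates thousands of two-group trials of a treatment with no real effect and then keeps only the ones that clear p < 0.05:

```python
import math
import random
import statistics

def run_study(n_per_group=20, true_effect=0.0):
    """Simulate one two-group trial; return (observed effect, two-sided p-value)."""
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(control) / n_per_group +
                   statistics.variance(treated) / n_per_group)
    z = diff / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))  # normal approximation
    return diff, p

# A treatment with NO real effect, tested ten thousand times.
studies = [run_study() for _ in range(10_000)]
# "Publication bias": only the statistically significant results are published.
published = [diff for diff, p in studies if p < 0.05]

print(f"published {len(published)} of {len(studies)} studies")
print(f"mean effect across all studies:   {statistics.mean(d for d, _ in studies):+.3f}")
print(f"mean |effect| in published slice: {statistics.mean(abs(d) for d in published):.3f}")
```

Roughly five percent of the null studies sneak through, and because they were selected for being extreme, the published slice suggests a real effect where none exists. A registry that records every study and its result, significant or not, keeps the other ninety-five percent in view.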
Then we can talk about fraud. Scientific American has a review of a book, On Fact and Fraud: Cautionary Tales from the Front Lines of Science (Princeton University Press, 2010), by David Goodstein, containing this fascinating observation:
Knowing that scientists are highly motivated by status and rewards, that they are no more objective than professionals in other fields, that they can dogmatically defend an idea no less vehemently than ideologues and that they can fall sway to the pull of authority allows us to understand that, in Goodstein’s assessment, “injecting falsehoods into the body of science is rarely, if ever, the purpose of those who perpetrate fraud. They almost always believe that they are injecting a truth into the scientific record.” Goodstein should know because his job as the vice provost of Caltech was to investigate allegations of scientific misconduct. From his investigations Goodstein found three risk factors present in nearly all cases of scientific fraud. The perpetrators, he writes, “1. Were under career pressure; 2. Knew, or thought they knew, what the answer to the problem they were considering would turn out to be if they went to all the trouble of doing the work properly; and 3. Were working in a field where individual experiments are not expected to be precisely reproducible.”
And while scientific fraud is a serious matter, I must cite one that makes me laugh, the Columbia University ‘Miracle’ Study of 2004. From, appropriately enough, Skeptical Inquirer:
On October 2, 2001, the New York Times reported that researchers at prestigious Columbia University Medical Center in New York had discovered something quite extraordinary (1). Using virtually foolproof scientific methods the researchers had demonstrated that infertile women who were prayed for by Christian prayer groups became pregnant twice as often as those who did not have people praying for them. The study was published in the Journal of Reproductive Medicine (2). Even the researchers were shocked. The study’s results could only be described as miraculous. This was welcome and wonderful news for a shaken nation.
The upshot? Bad procedures. Two authors who backed away and refused to answer questions. A third who ended up in jail. Again.
So what? Don’t we already have “peer-reviewed” journals? Certainly, but they can only do so much. From Charles Seife at the L. A. Times:
[Science Magazine] Editor in Chief Marcia McNutt said the magazine was essentially helpless against the depredations of a clever hoaxer: “No peer review process is perfect, and in fact it is very difficult for peer reviewers to detect artful fraud.”
This is, unfortunately, accurate. In a scientific collaboration, a smart grad student can pull the wool over his advisor’s eyes — or vice versa. And if close collaborators aren’t going to catch the problem, it’s no surprise that outside reviewers dragooned into critiquing the research for a journal won’t catch it either. A modern science article rests on a foundation of trust.
So some scientists have feet of clay. Projects such as the Reproducibility Project function as a way to splash more light onto studies, and whether it’s disinterested scientists performing their duties or scientists with vendettas, once the subject is out in the light for viewing, it’s just like the encryption community – everything improves. Using technology to gather up studies in all their details, from data to collection methods to analysis methods to results, will certainly help to improve the quality of the studies, IF other scientists are willing to take that information seriously and use it for their own replication attempts.
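What would “a study in all its details” look like as a record? A minimal sketch – the field names here are my own invention, not the schema of ClinicalTrials.gov, the Open Science Framework, or any other registry – might be:

```python
from dataclasses import dataclass, field

@dataclass
class StudyRegistration:
    """Hypothetical registration record, filled in before and after the study
    runs, whether or not the result turns out to be significant."""
    title: str
    hypothesis: str                 # stated before any data is collected
    collection_method: str          # how, where, and from whom the data is gathered
    analysis_plan: str              # e.g. "two-sample t-test, alpha = 0.05", fixed in advance
    sample_size: int
    data_url: str = ""              # raw data, posted once collection is complete
    results_summary: str = ""       # reported even when the hypothesis is falsified
    replication_urls: list[str] = field(default_factory=list)  # links to later attempts
```

With records like this in the open, a replication attempt starts from the registered plan rather than from a compressed methods section – which is precisely the “willing to take that information seriously” part.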
After all, scientists don’t reinvent the wheel; they stand on the shoulders of those who already did.