Philosophers' Playground: Science and Authority: Vioxx, The New England Journal of Medicine, and the Corporate Funding of Research

In the volume released yesterday, The New England Journal of Medicine corrected a report it published in March 2005 concerning a study about the safety of the drug Vioxx, which pharmaceutical manufacturer Merck pulled off the shelves in September 2004 after similar studies linked it to an increased risk of heart attack and stroke. The correction undermines Merck's defense in many of the lawsuits brought against it, because the study had appeared to show that the risk of heart disease or stroke was the same in the Vioxx and control groups for the first 18 months of treatment. With the statistical error corrected, the study no longer supports this claim, which was used to rebut the cases of those who suffered heart ailments after using Vioxx for less than a year and a half.

There are all sorts of interesting questions around this correction and around Merck's rejection of it, which begins with the horribly ironic sentences:

At Merck, we are committed to rigorous scientific research conducted under high standards of ethical behavior. This is at the heart of who we are and how we do business.

Yes, these allegations attack the heart of Merck. What a stroke of bad luck. Could they have found worse verbiage to subject the families of victims to?

Merck contends, contrary to the editors of The New England Journal of Medicine, that the correction does not alter the findings. The company argues that the description of a statistical method was, in fact, in error, but that when "a battery of statistical methods" is employed, the original finding still stands.

In the manuscript submitted to NEJM, the methods section referred to the use of the logarithm of time. This description of the method used for the report of the p-value for the test of proportionality of hazards was in error. The reported result (p-value = 0.01) came from a method using linear time, not logarithm of time. Results of diagnostic analyses indicate that a model using linear time is more representative of the data than one using logarithm of time. Thus, the linear time analysis is an appropriate method to assess the changes in relative risk over time. Recent tests show that the result using logarithm of time has a p-value = 0.07. Even this borderline significant result justifies concern regarding changes in relative risk over time. The battery of statistical assessments together indicate that the relative risk was not constant over time.

They didn't do what they said they would do in terms of statistical analysis, and when they did do what they said they'd do, the results changed. But if you selected a battery of other statistical methods, then the results could be maintained. These other methods are perfectly legit. So, should they stay or should they go? This indecision's bugging me.

Over at The Intersection yesterday, Chris Mooney (he of The Republican War on Science) asked an interesting question about a quotation from Ernest Rutherford, who weighed in on the old battle between the hypothetico-deductive and inductivist models of scientific reasoning. The question concerns the source of purported laws of nature. The H-D folks contended that you could pull hypotheses out of the air as long as you rigorously test them. The inductivist camp contended that because an infinite number of hypotheses will be consistent with any finite set of data, you need to let the data speak for itself; that is, there is a logical mechanism that will extract the correct rule from the data. Newton, the most famous adherent of the inductivist group, put it this way in his master work, the Principia,

In experimental philosophy we are to look upon propositions inferred by general induction from phenomena as accurately or very nearly true, notwithstanding any contrary hypotheses that may be imagined, till such time as other phenomena occur, by which they are made more accurate, or liable to exception.

In other words, the data itself will suggest the rule which holds until more data overturns it.

But this view, when we look at the Vioxx issue, turns out to be quaintly naive. Newton's "general induction from the phenomena" is not quite as simple as he thinks. Finding the sort of regularities needed requires using sophisticated statistical tools, but there are always several tools to choose from. That choice, as we see, can make a difference. What to do?

One thing we do is not choose the tool after the fact. Just as we need a double-blind methodology in administering the drugs in order to make sure we don't slant the findings, so too we need to stick to our tools when doing the analysis. This illicit shift, a game of statistical three-card monte, is what Merck is trying to pull.
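To see why picking the tool after the fact is a problem, here is a toy simulation. It is a sketch under loud assumptions and has nothing to do with Merck's actual data or with the proportional-hazards test at issue: the two groups are drawn from the same made-up distribution, so there is no real effect to find, and the two "tools" are simply a comparison of means on the raw scale and on the log scale. The function names `two_sided_p` and `simulate` are invented for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means (large-sample z-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(trials=2000, n=50, alpha=0.05, seed=0):
    """Return (rate with the pre-chosen test, rate when either test may be picked).

    Both groups come from the same distribution (an assumption of this toy),
    so every "significant" result counted here is a false alarm.
    """
    rng = random.Random(seed)
    prechosen = either = 0
    for _ in range(trials):
        a = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        b = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        p_linear = two_sided_p(a, b)  # analysis on the raw scale
        p_log = two_sided_p([math.log(x) for x in a],
                            [math.log(x) for x in b])  # analysis on the log scale
        prechosen += p_linear < alpha            # tool fixed in advance
        either += min(p_linear, p_log) < alpha   # tool chosen after seeing results
    return prechosen / trials, either / trials


if __name__ == "__main__":
    fixed_rate, picked_rate = simulate()
    print(f"false alarms, tool fixed in advance: {fixed_rate:.3f}")
    print(f"false alarms, tool picked afterward: {picked_rate:.3f}")
```

The second rate can never be lower than the first, because every result that passes the pre-chosen test also passes the "whichever works" rule. Each tool is legitimate on its own; it is the freedom to choose between them after the results are in that loosens the standard.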

But couldn't it be right? If they had chosen that battery of tests in the first place -- which would have been a perfectly acceptable choice -- wouldn't the newly corrected study be wrong? Well, wrong is too strong, but the data would be taken to support the contrary hypothesis and thus their central claim in court would have been bolstered instead of undermined. How can science be so squishy? Doesn't this make science an art and not...well, a science? No. The fact that you can find tools to statistically massage your data does not mean that there is no fact of the matter, simply that the data that you have is not sufficient for conclusive evidence.

There are two senses of the word "evidence": there is evidence-for, or supporting evidence, and evidence-that, or conclusive evidence. We are dealing here with evidence-for the claim that Vioxx was safe for use up to 18 months. Where Merck thought they had evidence-for that claim, they don't. Might it be true? Yes, it might; but there is no evidence, no good reason for belief coming from this study. Additional studies or meta-studies that pull in and reanalyze the data from many studies could advance the hypothesis or undermine it. But in this case, because the pre-selected means of data analysis gave evidence for earlier correlation between the drug and the adverse effects, Merck cannot lean on this study as evidence against those who bring suit.

This raises an interesting question concerning the nature of science and the fallacy of questionable authority. On NPR this morning, a scientist commented on another Merck-funded study that was found to be shady after being published in The New England Journal of Medicine and said, "Burn me once, shame on you; burn me twice, shame on me." When is there good reason to be suspicious of scientific findings from research sponsored by an advocate?

Arguments from authority are fine arguments as long as you have a legitimate authority. A legitimate authority needs three things: (1) to exist -- "I read somewhere that" does not constitute an acceptable authority, (2) to be an expert in the field -- when you are sick, listen to your doctor, not your Uncle Murray the dry cleaner, and (3) to be independent -- the expert can't have a stake in getting you to believe one way or the other.

It is (3) that is interesting here. Now, if you are a climate scientist and you see good reason to suspect that global warming is happening and can be prevented, you damn well better advocate for that change (see yesterday's discussion on science and ethics). But does that advocacy immediately make your work suspect? Scientific work may be funded by a corporation with a financial interest, but does that interest affect how we ought to view the results once published?

These are, of course, two different questions: one for the peer reviewers and one for those reading the peer-reviewed article. But they aren't that different, since there is only so much that a peer reviewer can check. They can't oversee the lab work. They can't see if they can duplicate the data. They aren't going to redo the entire analysis. If a study is conducted by a respected researcher from Dartmouth, does the fact that his lab is corporately funded give us reason to be suspicious of his findings? That's a hard question. This case seems to say that the data itself may not be in doubt, but the corporate interpretation of the data is.

UPDATE: Mark Chu-Carroll at "Good Math/Bad Math" (doing a wonderful job filling an important niche) discusses this as well.