Suppose you have just been diagnosed with a rare illness. You go to your doctor and they put you through a series of tests. In the end, they recommend that you take a new drug — wonderzene — that has recently been approved by the FDA following several successful trials. How confident should you be that this drug will improve your condition?
You might think that this question cannot be answered in the abstract. It has to be assessed on a case-by-case basis. What is the survival rate for your particular illness? What is its underlying pathophysiology? What does the drug do? How successful were these trials? And in many ways you would be right. Your confidence in the success of the treatment does depend on the empirical facts. But that’s not all it depends on. It also depends on assumptions that medical scientists make about the nature of your illness and on the institutional framework in which the scientific evidence concerning the illness and its treatment is produced, interpreted and communicated to patients like you. When you think about these other aspects of the medical scientific process, it might be the case that you should be very sceptical about the prospects of your treatment being a success. This could be true irrespective of the exact nature of the drug in question and the evidence concerning its effectiveness.
That is the gist of the argument put forward by Jacob Stegenga in his provocative book Medical Nihilism. The book argues for an extreme form of scepticism about the effectiveness of medical interventions, specifically pharmaceutical interventions (although Stegenga intends his thesis to have broader significance). The book is a real tour-de-force in applied philosophy, examining in detail the methods and practices of modern medical science and highlighting their many flaws. It is eye-opening and disheartening, though not particularly surprising to anyone who has been paying attention to the major scandals in scientific research for the past 20 years.
I highly recommend reading the book itself. In this post I want to try to provide a condensed summary of its main argument. I do so partly to help myself understand the argument, and partly to provide a useful primer to the book for those who have not read it. I hope that reading it stimulates further interest in the topic.
1. The Master Argument for Medical Nihilism
Let’s start by clarifying the central thesis. What exactly is medical nihilism? As Stegenga notes in his introductory chapter, “nihilism” is usually associated with the view that “some particular kind of value, abstract good, or form of meaning” does not exist (Stegenga 2018, 6). Nihilism comes in both metaphysical and epistemological flavours. In other words, it can be understood as the claim that some kind of value genuinely does not exist (the metaphysical thesis) or that it is impossible to know/justify one’s belief in its existence (the epistemological thesis).
In the medical context, nihilism can be understood relative to the overarching goals of medicine. These goals are to eliminate both the symptoms of disease and, hopefully, the underlying causes of disease. Medical nihilism is then the view that this is (very often) not possible and that it is very difficult to justify our confidence in the effectiveness of medical interventions with respect to those goals. For what it’s worth, I think that the term ‘nihilism’ oversells the argument that Stegenga offers. I don’t think he quite justifies total nihilism with respect to medical interventions; though he does justify strong scepticism. That said, Stegenga uses the term nihilism to align himself with 19th century medical sceptics who adopted a view known as ‘therapeutic nihilism’ which is somewhat similar to the view Stegenga defends.
Stegenga couches the argument for medical nihilism in Bayesian terms. If that’s something that is unfamiliar to you, then I recommend reading one of the many excellent online tutorials on Bayes’ Theorem. Very roughly, Bayes’ Theorem is a mathematical formula for calculating the posterior probability of a hypothesis or theory (H) given some evidence (E). Or, to put it another way, it is a formula for calculating how confident you should be in a hypothesis given that you have received some evidence that appears to speak in its favour (or not, as the case may be). This probability can be written as P(H|E), which reads in English as “the probability of H given E”. There is a formal derivation of Bayes’ Theorem that I will not go through. For present purposes, it suffices to know that the P(H|E) depends on three other probabilities: (i) the prior probability of the hypothesis being true, irrespective of the evidence (i.e. P(H)); (ii) the probability (aka the “likelihood”) of the evidence given the hypothesis (i.e. P(E|H)); and (iii) the prior probability of the evidence, irrespective of the hypothesis (i.e. P(E)). This can be written out as an equation, as follows:
P(H|E) = P(H) x P(E|H) / P(E)*
In English, this equation states that the probability of the hypothesis given the evidence is equal to the prior probability of the hypothesis, multiplied by the probability of the evidence given the hypothesis, divided by the prior probability of the evidence.
This equation is critical to understanding Stegenga’s argument because, without knowing any actual figures for the relevant probabilities, you know from the equation itself that the P(H|E) must be low if the following three conditions are met: (i) the P(H) is low (i.e. if it is very unlikely, irrespective of the evidence, that the hypothesis is true); (ii) the P(E|H) is low (i.e. the evidence observed is not very probable given the hypothesis); and (iii) the P(E) is high (i.e. it is very likely that you would observe the evidence irrespective of whether the hypothesis was true or not). To confirm this, just plug figures into the equation and see for yourself.
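If you would rather let a computer do the plugging in, here is a minimal Python sketch. The figures are invented purely for illustration (they are not taken from Stegenga’s book), and the function name `posterior` is my own.

```python
def posterior(prior_h, likelihood_e_given_h, prior_e):
    """Short-form Bayes' Theorem: P(H|E) = P(H) x P(E|H) / P(E)."""
    if not 0 < prior_e <= 1:
        raise ValueError("P(E) must be greater than 0 and at most 1")
    joint = prior_h * likelihood_e_given_h  # this is P(H and E)
    if joint > prior_e:
        # P(E) can never be smaller than P(H and E), so such inputs are incoherent.
        raise ValueError("P(H) x P(E|H) cannot exceed P(E)")
    return joint / prior_e


# Hypothetical figures in which all three conditions are met:
# low P(H), low P(E|H), high P(E).
pessimistic = posterior(prior_h=0.1, likelihood_e_given_h=0.3, prior_e=0.8)

# Hypothetical figures in which none of the conditions are met.
optimistic = posterior(prior_h=0.5, likelihood_e_given_h=0.9, prior_e=0.5)

print(f"All three conditions met: {pessimistic:.4f}")
print(f"No conditions met:        {optimistic:.4f}")
```

With these made-up numbers the posterior comes out at roughly 0.04 in the first case and 0.9 in the second. The coherence check is there because you cannot pick the three probabilities completely independently of one another: P(E) always has to be at least as large as P(H) x P(E|H).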
That’s all the background on Bayes’ theorem that you need to understand Stegenga’s case for medical nihilism. In Stegenga’s case, the hypothesis (H) in which we are interested is the claim that any particular medical intervention is effective, and the evidence (E) in which we are interested is anything that speaks in favour of that hypothesis. So, in other words, we are trying to figure out how confident we should be about the claim that the intervention is effective given that we have been presented with evidence that appears to support its effectiveness. We calculate that using Bayes’ theorem and we know from the preceding discussion that our confidence should be very low if the three conditions outlined above are met. These three conditions thus form the premises of the following formal argument in favour of medical nihilism.
- (1) P(H) is low (i.e. the prior probability of any particular medical intervention being effective is low)
- (2) P(E|H) is low (i.e. the evidence observed is unlikely given the hypothesis that the medical intervention is effective)
- (3) P(E) is high (i.e. the prior probability of observing evidence that favours the treatment, irrespective of whether the treatment is actually effective, is high)
- (4) Therefore (by Bayes’ theorem) the P(H|E) must be low (i.e. the posterior probability of the medical intervention being successful, given evidence that appears to favour it, is low)
The bulk of Stegenga’s book is dedicated to defending the three premises of this argument. He dedicates most attention to defending premise (3), but the others are not neglected. Let’s go through each of them now in more detail. Doing so should help to eliminate lingering confusions you might have about this abstract presentation of the argument.
2. Defending the First Premise: The P(H) is Low
Stegenga offers two arguments in support of the claim that medical interventions have a low prior probability of success. The first argument is relatively straightforward. We can call it the argument from historical failure. This argument is an inductive inference from the fact that most historical medical interventions were unsuccessful. Stegenga gives many examples. Classic ones would include the use of bloodletting and mercury to cure many illnesses, “hydropathy, tartar emetic, strychnine, opium, jalap, Daffy’s elixir, Turlington’s Balsam of life” and many more treatments that were once in vogue but have now been abandoned (Stegenga 2018, 169).
Of course, the problem with focusing on historical examples of this sort is that they are often dismissed by proponents of the “standard narrative of medical science”. This narrative runs as follows: “once upon a time, it is true, most medical interventions were worse than useless, but then, sometime in the 1800s, we discovered scientific methods and things started to improve”. This is taken to mean that you can’t use these historical examples to question the prior probability of modern medical treatments.
Fortunately, you don’t need to. Even in the modern era most putative medical treatments are failures. Drug companies try out many more treatments than ever come to market, and among those that do come to market, a large number end up being withdrawn or restricted due to their relative uselessness or, in some famous cases, outright dangerousness. Stegenga gives dozens of examples on pages 170-171 of his book. I won’t list them all here but I will give a quick flavour of them (if you click on the links, you can learn more about the individual cases). The examples of withdrawn or restricted drugs include: isotretinoin, rosiglitazone, valdecoxib, fenfluramine, sibutramine, rofecoxib, cerivastatin, and nefazodone. The example of rofecoxib (marketed as Vioxx) is particularly interesting. It is a pain relief drug, usually prescribed for arthritis, that was approved in 1999 but then withdrawn due to associations with increased risk of heart attack and stroke. It was prescribed to more than 80 million people when it was on the market (there is some attempt to return it to market now). And, again, that is just one example among many. Other prominent medical failures include monoamine oxidase inhibitors, which were widely prescribed for depression in the mid-20th century, only later to be abandoned due to ineffectiveness, and hormone replacement therapy (HRT) for menopausal women.
These many examples of past medical failure, even in the modern era, suggest that it would be wise to assign a low prior probability to the success of any new treatment. That said, Stegenga admits that this is a suggestive argument only since it is very difficult to give an accurate statement of the ratio of effective to ineffective treatments from this data (one reason for this is that it is difficult to get a complete dataset and the dataset that we do have is subject to flux, i.e. there are several treatments that are still on the market that may soon be withdrawn due to ineffectiveness or harmfulness).
Stegenga’s second argument for assigning a low prior probability to H is more conceptual and theoretical in nature. It is the argument from the paucity of magic bullets. Stegenga’s book isn’t entirely pessimistic. He readily concedes that some medical treatments have been spectacular successes. These include the use of antibiotics and vaccines for the treatment of infectious diseases and the use of insulin for diabetic treatment. One property shared by these successful treatments is that they tend to be ‘magic bullets’ (the term comes from the chemist Paul Ehrlich). What this means is that they target a very specific cause of disease (e.g. a virus or bacterium) in an effective way (i.e. they can eliminate/destroy the specific cause of disease without many side effects).
Magic bullets are great, if we can find them. The problem is that most medical interventions are not magic bullets. There are three reasons for this. First, magic bullets are the “low-hanging fruit” of medical science: we have probably discovered most of them by now and so we are unlikely to find new ones. Second, many of the illnesses that we want to treat have complex, and poorly understood, underlying causal mechanisms. Psychiatric illnesses are a classic example. Psychiatric illnesses are really just clusters of symptoms. There is very little agreement on their underlying causal mechanisms (though there are lots of theories). It is consequently difficult to create a medical intervention that specifically and effectively targets a psychiatric disease. This is equally true for other cases where the underlying mechanism is complex or unclear. Third, even if the disease were relatively simple in nature, human physiology is not, and the tools that we have at our disposal for intervening into human physiology are often crude and non-specific. As a result, any putative intervention might mess up the delicate chemical balancing act inside the body, with deleterious side effects. Chemotherapy is a clear example. It helps to kill cancerous cells but in the process it also kills healthy cells. This often results in very poor health outcomes for patients.
Stegenga dedicates an entire chapter of his book to this argument (chapter 4) and gives some detailed illustrations of the kinds of interventions that are at our disposal and how non-specific they often are. Hopefully, my summary suffices for getting the gist of the argument. The idea is that we should assign a low prior probability to the success of any particular treatment because it is very unlikely that the treatment is a magic bullet.
3. Defending the Second Premise: The P(E|H) is Low
The second premise claims that the evidence we tend to observe concerning medical interventions is not very likely given the hypothesis that they are successful. For me, this might be the weakest link in the argument. That may be because I have trouble understanding exactly what Stegenga is getting at, but I’ll try to explain how I think about it and you can judge for yourself whether it undermines the argument.
My big issue is that this premise, more so than the other premises, seems like one that can really only be determined on a case-by-case basis. Whether a given bit of evidence is likely given a certain hypothesis depends on what the evidence is (and what the hypothesis is). Consider the following three facts: the fact that you are wet when you come inside the house; the fact that you were carrying an umbrella with you when you did; and the fact that you complained about the rain when you spoke to me. These three facts are all pretty likely given the hypothesis that it is raining outside (i.e. the P(E|H) is high). The facts are, of course, consistent with other hypotheses (e.g. that you are a liar/prankster and that you dumped a bucket of water over your head before you came in the door) but that possibility, in and of itself, doesn’t mean the likelihood of observing the evidence that was observed, given the hypothesis that it is raining outside, is low. It seems like the magnitude of the likelihood depends specifically on the evidence observed and how consistent it is with the hypothesis. In our case, we are assuming that the hypothesis is the generic statement that the medical intervention is effective, so before we can say anything about the P(E|H) we would really need to know what the evidence in question is. In other words, it seems to me like we would have to “wait and see” what the evidence is before concluding that the likelihood is low. Otherwise we might be conflating the prior probability of an effective treatment (which I agree is low) with the likelihood.
Stegenga’s argument seems to be that we can say something generic about the likelihood given what we know about the evidential basis for existing interventions. He makes two arguments in particular about this. First, he argues that in many cases the best available medical evidence suggests that many interventions are little better than placebo when it comes to ameliorating disease. In other words, patients who take an intervention usually do little better than those who take a placebo. This is an acknowledged problem in medicine, sometimes referred to as medicine’s “darkest secret”. He gives detailed examples of this on pages 171 to 175 of the book. For instance, the best available evidence concerning the effectiveness of anti-depressants and cholesterol-lowering drugs (statins) suggests they have minimal positive effects. That is not the kind of evidence we would expect to see on the hypothesis that the treatments are effective.
The second argument he makes is about discordant evidence. He points out that in many cases the evidence for the effectiveness of existing treatments is a mixed bag: some high quality studies suggest positive (if minimal) effects; others suggest there is no effect; and others suggest that there is a negative effect. Again, this is not the kind of evidence we would expect to see if the intervention is effective. If the intervention were truly effective, surely there would be a pronounced positive bias in the total set of evidence? Stegenga goes into some of the technical reasons why this argument from discordant evidence is correct, but we don’t need to do that here. This description of the problem should suffice.
I agree with both of Stegenga’s arguments, but I still have qualms about his general claim that the P(E|H) for any particular medical intervention is low. Why is this? Let’s see if I can set it out more clearly. I believe that Stegenga succeeds in showing that the evidence we do observe concerning specific existing treatments is not particularly likely given the hypothesis that those treatments are effective. That’s pretty irrefutable given the examples discussed in his book. But as I understand it, the argument for medical nihilism is a general one that is supposed to apply to any random or novel medical treatment, not a specific one concerning particular medical treatments. Consequently, I don’t see why the fact that the evidence we observe concerning specific treatments is unlikely generalises to an equivalent assumption about any random or novel treatment.
That said, my grasp of probability theory leaves a lot to be desired so I may have this completely wrong. Furthermore, even if I am right, I don’t think it undermines the argument for medical nihilism all that much. The claims that Stegenga defends about the evidential basis of existing treatments can be folded into how we calculate the prior probability of any random or novel medical treatment being successful. And it would certainly lower that prior probability.
4. Defending the Third Premise: The P(E) is High
This is undoubtedly the most interesting premise of Stegenga’s argument and the one he dedicates the most attention to in his book (essentially all of chapters 5-10). I’m not going to be able to do justice to his defence of it here. All I can provide is a very brief overview. Still, I will try my best to capture the logic of the argument he makes.
To start, it helps if we clarify what this premise is stating. It is stating that we should expect to see evidence suggesting that an intervention is effective even if the intervention is not effective. In other words, it is stating that the institutional framework through which medical evidence is produced and communicated is such that there is a significant bias in favour of positive evidence, irrespective of the actual effectiveness of a treatment. To defend this claim Stegenga needs to show that there is something rotten at the heart of medical research.
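One way to see why this matters is to expand P(E) using the law of total probability: P(E) = P(E|H) x P(H) + P(E|not-H) x P(not-H). The term P(E|not-H), i.e. the chance of seeing favourable evidence when the intervention does not actually work, is not a symbol Stegenga’s short-form equation uses; I introduce it here as an assumption for the sake of the sketch. The figures are, again, invented for illustration only.

```python
def prob_evidence(prior_h, p_e_given_h, p_e_given_not_h):
    """Law of total probability: P(E) = P(E|H)P(H) + P(E|not-H)P(not-H)."""
    return p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)


def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem with P(E) expanded via the law of total probability."""
    p_e = prob_evidence(prior_h, p_e_given_h, p_e_given_not_h)
    return prior_h * p_e_given_h / p_e


prior = 0.1  # hypothetical low prior that the intervention is effective

# Hypothetical "clean" research system: ineffective drugs rarely look good.
clean = posterior(prior, p_e_given_h=0.9, p_e_given_not_h=0.05)

# Hypothetical "biased" research system: ineffective drugs usually look good.
biased = posterior(prior, p_e_given_h=0.9, p_e_given_not_h=0.8)

print(f"P(E) clean:  {prob_evidence(prior, 0.9, 0.05):.3f}, posterior {clean:.3f}")
print(f"P(E) biased: {prob_evidence(prior, 0.9, 0.8):.3f}, posterior {biased:.3f}")
```

With these made-up numbers, favourable evidence lifts your confidence from 0.1 to about 0.67 in the clean system, but only to about 0.11 in the biased one. When positive evidence shows up regardless of whether the drug works, seeing it tells you almost nothing.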
The plausibility of that claim will be obvious to anyone who has been following the debates about the reproducibility crisis in medical science in the past decade, and to anyone who has been researching the many reports of fraud and bias in medical research. Still, it is worth setting out the methodological problems in general terms, and Stegenga’s presentation of them is one of the better ones.
Stegenga makes two points. The first is that the methods of medical science are highly malleable; the second is that the incentive structure of medical science is such that people are inclined to take advantage of this malleability in a way that produces evidence of positive treatment effects. These two points combine into an argument in favour of premise (3).
Let’s consider the first of these points in more detail. You might think that the methods of medical science are objective and scientific. Maybe you have read something about evidence based medicine. If so, you might well ask: Haven’t medical scientists established clear protocols for conducting medical trials? And haven’t they agreed upon a hierarchy of evidence when it comes to confirming whether a treatment is effective or not? Yes, they have. There is widespread agreement that randomised control trials are the gold standard for testing the effectiveness of a treatment, and there are detailed protocols in place for conducting those trials. Similarly, there is widespread agreement that you should not over-rely on one trial or study when making the case for a treatment. After all, one trial could be an anomaly or statistical outlier. Meta-analyses and systematic reviews are desirable because they aggregate together many different trials and see what the general trends in evidence are.
But Stegenga argues that this widespread agreement about evidential standards masks considerable problems with malleability. For example, when researchers conduct a meta-analysis, they have to make a number of subjective judgments about which studies to include, what weighting to give to them and how to interpret and aggregate their results. This means that different groups of researchers, conducting meta-analyses of the exact same body of evidence, can reach different conclusions about the effectiveness of a treatment. Stegenga gives examples of this in chapter 6 of the book. The same is true when it comes to conducting randomised control trials (chapter 7) and measuring the effectiveness of those trials (chapter 8). There are sophisticated tools for assessing the quality of evidence and the measures of effectiveness, but they are still prone to subjective judgment and assessment, and different researchers can apply them in different ways (more technically, Stegenga argues that the tools have poor ‘inter-rater reliability’ and poor ‘inter-tool reliability’). Again, he gives several examples of how these problems manifest in the book.
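To make the point about meta-analyses concrete, here is a small Python sketch in which three hypothetical teams analyse the same five studies but make different (defensible) choices about inclusion and weighting. The study data, the quality ratings and the team labels are all invented for illustration; none of it comes from Stegenga’s book. The weighted pooling is standard fixed-effect inverse-variance pooling.

```python
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    effect: float     # estimated treatment effect (positive = benefit)
    std_error: float  # standard error of that estimate
    quality: int      # hypothetical 1-5 quality rating assigned by a reviewer


# Invented data: small, lower-rated studies report larger effects.
STUDIES = [
    Study("A", 0.40, 0.20, 2),
    Study("B", 0.35, 0.25, 2),
    Study("C", 0.05, 0.10, 5),
    Study("D", -0.02, 0.08, 5),
    Study("E", 0.30, 0.15, 3),
]


def unweighted_mean(studies):
    """Treat every study as equally informative."""
    return sum(s.effect for s in studies) / len(studies)


def inverse_variance_pool(studies):
    """Fixed-effect pooling: weight each study by 1 / std_error squared."""
    weights = [1 / s.std_error ** 2 for s in studies]
    weighted_sum = sum(w * s.effect for w, s in zip(weights, studies))
    return weighted_sum / sum(weights)


team_a = unweighted_mean(STUDIES)                                        # all studies, equal weight
team_b = inverse_variance_pool(STUDIES)                                  # all studies, precision-weighted
team_c = inverse_variance_pool([s for s in STUDIES if s.quality >= 4])   # high-rated studies only

for label, estimate in [("Team A", team_a), ("Team B", team_b), ("Team C", team_c)]:
    print(f"{label}: pooled effect {estimate:.3f}")
```

In this toy example the pooled effect shrinks at each step, from a seemingly respectable benefit to something close to zero, even though nobody has touched the underlying data. The only things that changed were the subjective judgments about weighting and inclusion.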
The malleability of the evidential tools might not be such a problem if everybody used those tools in good faith. This is where Stegenga’s second claim, about the problem of incentives, rears its ugly head. The incentives in medical science are such that not everyone is inclined to use the tools in good faith. Pharmaceutical companies need treatments to be effective if they are to survive and make profits. Scientists also depend on finding positive effects to secure career success (even if they are not being paid by pharmaceutical companies). This doesn’t mean that people are always explicitly engaging in fraud (though some definitely are); it just means that everyone operating within the institutions of medical research has a significant interest in finding and reporting positive effects. If a study doesn’t find a positive effect, it tends to go unreported. Similarly, and because of the same incentive structures, there is a significant bias against finding and reporting on the harmful effects of interventions.
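The effect of selective reporting can be illustrated with a toy simulation. In the Python sketch below, a drug with no true effect at all is put through many small trials, and only the trials whose observed effect clears an arbitrary “looks positive” threshold get published. Every parameter (number of trials, patients per arm, the threshold, the random seed) is an assumption I have chosen for illustration; this is not a model of any real drug or dataset.

```python
import random
import statistics


def simulate_trials(n_trials, n_patients, true_effect, seed):
    """Return the observed effect (treated mean minus control mean) per trial."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = []
    for _ in range(n_trials):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_patients)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        observed.append(statistics.mean(treated) - statistics.mean(control))
    return observed


def published_subset(observed, threshold):
    """Hypothetical publication filter: only 'positive-looking' trials appear."""
    return [effect for effect in observed if effect > threshold]


# Hypothetical set-up: a completely ineffective drug (true effect = 0).
all_trials = simulate_trials(n_trials=200, n_patients=50, true_effect=0.0, seed=0)
published = published_subset(all_trials, threshold=0.2)

print(f"Trials run:       {len(all_trials)}")
print(f"Trials published: {len(published)}")
print(f"Mean effect, all trials:       {statistics.mean(all_trials):.3f}")
print(f"Mean effect, published trials: {statistics.mean(published):.3f}")
```

The full set of trials averages out close to zero, as it should for a useless drug, but anyone who only sees the published subset will find a consistently positive effect. That is the sense in which evidence of effectiveness can be highly probable whether or not the treatment works.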
Stegenga gives detailed examples of these incentive problems in the book. Some people might push back against his argument by pointing out that the problems to which he appeals are well-documented (particularly since the reproducibility crisis became common knowledge in the past decade or so) and steps have been taken to improve the institutional structure through which medical evidence is produced. So, for example, there is a common call now for trials to be pre-registered with regulators and there is greater incentive to try to replicate findings and report on negative results. But Stegenga argues that these solutions are still problematic. For example, the registration of trials and trial data, by itself, doesn’t seem to stop the over-reporting of positive results nor the approval of drugs with negative side effects. One illustration of this is the drug rosiglitazone, which is a drug for type-2 diabetes (Stegenga 2018, 148). Due to a lawsuit, the drug manufacturer (GlaxoSmithKline) was required to register all data collected from forty-two trials of the drug. Only seven trials were published, which unsurprisingly suggested that the drug had positive effects. The drug was approved by the FDA in 1999. Later, in 2007, a researcher called Steven Nissen accessed the data from all 42 trials, conducted a meta-analysis, and discovered that the drug increased the risk of heart attack by 43%. In more concrete terms, this meant that the drug was estimated to have caused somewhere in the region of 83,000 heart attacks since coming on the market. All of this information was available to both the drug manufacturer and, crucially, the regulator (the FDA) before Nissen conducted his study. Indeed, internal memos from the company suggested that they were aware of the heart attack risk years before. Yet they had no incentive to report it and the FDA, either through incompetence or lack of resources, had no incentive to check up on them. That’s just one case.
In other cases, the problem goes even deeper than this, and Stegenga gives some examples of how regulators are often complicit in maintaining the secrecy of trial data.
To reiterate, this doesn’t do justice to the nuance and detail that Stegenga provides in the book, but it does, I think, hint that there is a strong argument to be made in favour of premise (3).
5. Criticisms and Replies
What about objections to the argument? Stegenga looks at six in chapter 11 of the book (these are in addition to specific criticisms of the individual premises). I’ll review them quickly here.
The first objection is that there is no way to make a general philosophical case for medical nihilism. Whether any given medical treatment is effective depends on the empirical facts. You have to go out and test the intervention before you can reach any definitive conclusions.
Stegenga’s response to this is that he doesn’t deny the importance of the empirical facts, but he argues, as noted in the introduction to this article, that the hypothesis that any given medical intervention is effective is not purely empirical. It depends on metaphysical assumptions about the nature of disease and treatment, as well as epistemological/methodological assumptions about the nature of medical evidence. All of these have been critiqued as part of the argument for medical nihilism.
The second objection is that modern “medicine is awesome” and the case for medical nihilism doesn’t properly acknowledge its awesomeness. The basis for this objection presumably lies in the fact that some treatments appear to be very effective and that health outcomes, for the majority of people, have improved over the past couple of centuries, during which period we have seen the rise of scientific medicine.
Stegenga’s response is that he doesn’t deny that some medical interventions are awesome. Some are, after all, magic bullets. Still, there are three problems with this “medicine is awesome” objection. First, while some interventions are awesome, they are few and far between. For any randomly chosen or novel intervention the odds are that it is not awesome. Second, Stegenga argues that people underestimate the role of non-medical interventions in improving general health and well-being. In particular, he suggests (citing some studies in support of this) that changes in hygiene and nutrition have played a big role in improved health and well-being. Finally, Stegenga argues that people underestimate the role that medicine plays in negative health outcomes. For example, according to one widely-cited estimate, there are over 400,000 preventable hospital-induced deaths in the US alone every year. This is not “awesome”.
The third objection is that regulators help to guarantee the effectiveness of treatments. They are gatekeepers that prevent harmful drugs from getting to the market. They put in place elaborate testing phases that drugs have to pass through before they are approved.
This objection holds little weight in light of the preceding discussion. There is ample evidence to suggest that regulatory approval does not guarantee the effectiveness of an intervention. Many drugs are withdrawn years after approval when evidence of harmfulness is uncovered. Many approved drugs aren’t particularly effective. Furthermore, regulators can be incompetent, under-resourced and occasionally complicit in hiding the truth about medical interventions.
The fourth objection is that peer review helps to guarantee the quality of medical evidence. This objection is, of course, laughable to anyone familiar with the system of peer review. There are many well-intentioned researchers peer-reviewing one another’s work, but they are all flawed human beings, subject to a number of biases and incompetencies. There is ample evidence to suggest that bad or poor quality evidence gets through the peer review process. Furthermore, even if they were perfect, peer reviewers can only judge the quality of the studies that are put before them. If those studies are a biased sample of the total evidence, peer reviewers cannot prevent a skewed picture of reality from emerging.
The fifth objection is that the case for medical nihilism is “anti-science”. That’s a bad thing because there is lots of anti-science activism in the medical sphere. Quacks and pressure groups push for complementary therapies and argue (often with great success) against effective mainstream interventions (like vaccines). You don’t want to give these groups fodder for their anti-science activism, but that’s exactly what the case for medical nihilism does.
But the case for medical nihilism is definitely not anti-science. It is about promoting good science over bad science. This is something that Stegenga repeatedly emphasises in the book. He looks at the best quality scientific evidence to make his case for the ineffectiveness of interventions. He doesn’t reject or deny the scientific method. He just argues that the best protocols are not always followed, that they are not perfect, and that when they are followed the resulting evidence does not make a strong case for effectiveness. In many ways, the book could be read as a plea for a more scientific form of medical research, not a less scientific form. Furthermore, unlike the purveyors of anti-science, Stegenga is not advocating some anti-science alternative to medical science — though he does suggest we should be less interventionist in our approach to illness, given the fact that many interventions are ineffective.
The sixth and final objection is that there are, and will be soon, some “game-changing” medical breakthroughs (e.g. stem cell treatment or genetic engineering). These breakthroughs will enable numerous, highly effective interventions. The medical nihilist argument doesn’t seem to acknowledge either the reality or possibility of such game-changers.
The response to this is simple. Sure, there could be some game-changers, but we should be sceptical about any claim to the effect that a particular treatment is a game-changer. There are significant incentives at play that encourage people to overhype new discoveries. Few of the alleged breakthroughs in the past couple of decades have been game-changers. We also know that most new interventions fail or have small effect sizes when scrutinised in depth. Consequently, a priori scepticism is warranted.
That brings us to the end of the argument. To briefly summarise, medical nihilism is the view that we should be sceptical about the effectiveness of medical interventions. There are three reasons for this, each corresponding to one of the key probabilities in Bayes’ Theorem. The first reason is that the prior probability of a treatment being effective is low. This is something we can infer from the long history of failed medical interventions, and the fact that there are relatively few medical magic bullets. The second reason is that the probability of the evidence for effectiveness, given the hypothesis that an intervention is effective, is low. We know this because the best available evidence concerning medical interventions suggests they have very small effect sizes, and there is often a lot of discordant evidence. Finally, the third reason is that the prior probability of observing evidence suggesting that a treatment is effective, irrespective of its actual effectiveness, is high. This is because medical evidence is highly malleable, and there are strong incentives at play that encourage people to present positive evidence and hide/ignore negative evidence.
* For Bayes aficionados: yes, I know that this is the short form of the equation and I know I have reversed the order of two terms in the equation from the standard presentation.