Just over three years ago, when few had heard of him, Dr No wrote a post called The Collapse of the Probability Function. At its heart lies a troublesome paradox: while we might know how a group of patients will fare, we have no way of knowing how an individual patient will fare. We might know that, of a hundred patients, five will die in the next ten years from a heart attack. What we don’t know is which of the hundred will be the five; and the flip side of that is that, when as doctors we choose to intervene, as increasingly we do, there are ninety five souls now tangled in our medical web, with all that that entails, be it tests, treatments or general apprehension, who were never going to have a heart attack anyway, let alone die from one in the next ten years. That’s a whole lot of medical intervention without any benefit whatsoever – but what the heck – overall, we might save a handful of lives – or so the hopeful reasoning goes.
This problem – and a major problem it is – of not knowing who will benefit, and for that matter who will be harmed, also lies at the heart of the screening debate, which has once again been reignited by a ‘new’ report on the benefits and harms of breast cancer screening, ‘new’ being qualified because, though the report is new, the data it is based on are old. The arguments for and against screening symptom free women of a certain age for breast cancer have gone up and down like a tired see-saw for decades. Screening evangelists, cancer charities and cancer specialists will insist that not to be screened is to play Russian roulette, just as the sceptics will warn that to be screened enters you in an alternative game of Russian roulette, with its risk of over-diagnosis and unnecessary, and potentially harmful in itself, treatment. But what neither side tells you is that, for the vast majority of those screened, screening will make not one jot of difference.
Exactly the same dilemma applies to the vast majority of patients who take medication: for most patients, the drugs don’t work. To many, this might seem a counter-intuitive, not to mention heretical, statement, but it is true. To understand why it is true, we need to consider the science – evidence – that we are supposed, in this enlightened age of evidence based medicine, to use in deciding whether or not to prescribe.
The evidence comes from clinical trials. The best of these are, to coin a bit of a mouthful, randomised double blind placebo controlled trials. What the mouthful is about is doing our best to remove the influence of bias (systematic influences that twist the results) from our findings. Patients are randomly allocated to receive the active drug or a dummy pill, and in each case neither the patients nor their doctors know who gets the drug, and who the dummy. Patients are followed for a time, and the number of events – say heart attacks – counted for each group. The results can be expressed in various ways: as a relative risk reduction (ten percent fewer deaths in patients receiving the active drug), as an absolute number (sixteen fewer deaths) or, increasingly, as the NNT, the number needed to treat (we need to treat forty patients to prevent one death). The figures are mathematically related, but of the three, the NNT is, in information terms, the richest, though not always the most popular. Ten percent fewer deaths, or sixteen lives saved, sounds racier than an NNT of forty.
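For readers who like to see the arithmetic, here is a minimal Python sketch of how the three expressions relate. The counts are hypothetical, chosen so that the illustrative figures above (ten percent relative reduction, sixteen fewer deaths, NNT of forty) all drop out of the same trial:

```python
def trial_summary(deaths_control, n_control, deaths_treated, n_treated):
    """Express a trial result as RRR, ARR and NNT."""
    risk_control = deaths_control / n_control
    risk_treated = deaths_treated / n_treated
    arr = risk_control - risk_treated   # absolute risk reduction
    rrr = arr / risk_control            # relative risk reduction
    nnt = 1 / arr                       # number needed to treat
    return rrr, arr, nnt

# Hypothetical counts: 160 of 640 controls die, 144 of 640 treated patients die,
# i.e. sixteen fewer deaths in the treated group.
rrr, arr, nnt = trial_summary(160, 640, 144, 640)
print(f"RRR = {rrr:.0%}, ARR = {arr:.1%}, NNT = {nnt:.0f}")
# RRR = 10%, ARR = 2.5%, NNT = 40
```

The same trial, three ways of saying it: a ten percent relative reduction sounds impressive; a 2.5 percent absolute reduction less so; and the NNT of forty makes plain how many patients must be treated for one to benefit.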
Now, let us consider what an NNT actually tells us. Real world NNTs vary tremendously. Those in single figures are considered excellent. Many common preventative drugs have NNTs in the tens or even hundreds of patients: the NNT for a statin to prevent a heart attack over five years in people without a history of heart disease is, for example, around (estimates, unsurprisingly, vary) sixty. Around sixty patients need to take a statin for one to benefit; and that means – Dr No hopes by now you are getting the point – that for the other fifty nine there was no benefit: the drug didn’t work as intended. In fact, some patients are harmed, but that is another story for another day, for today’s post is about whether drugs work, not whether they do harm.
The perfect NNT is one: you only need treat one patient to have one patient benefit. Such a drug, we can confidently say, does work. But the moment the NNT starts to rise above one, we find increasingly that for most patients, the drug doesn’t work. Even when the NNT is excellent, in single figures, for most patients the drug doesn’t work.
Let us consider – as an example – the NNT for aspirin for period pain, which is around 9 (never mind that ibuprofen has a better NNT, at around 3 – the numbers are easier to appreciate with a slightly bigger NNT). Trials, of course, are ideally done on large numbers of patients, but what happens if we boil down the figures to the minimum needed to generate an NNT? [Empirical note: those who want to play around with the numbers can find an online RRR/NNT calculator here: select ‘Randomized Controlled Trial’ in the drop down box, and note that with small numbers of patients you will get comedy p/CI values.]
The trial, recall, has two groups: those who receive the active drug, and those who don’t. In our hypothetical boiled down example, we might have nine patients who receive aspirin, and nine who do not. Of those who do not receive aspirin, four are pain free at two hours, and five are not; of those who do, five are pain free, and four are not. Taking aspirin has converted one patient from still in pain to pain free.
But what of the other patients? From the control (no aspirin) group we know that four would be pain free anyway, even without aspirin; and from the treated group we know that four will still be in pain, despite taking aspirin. For eight of the nine patients in the active drug group, taking aspirin made no difference: they were going to be in pain or pain free, whether they took aspirin or not. The aspirin made no difference, which is why Dr No can confidently say, by a generous – some might say wild, even reckless – extrapolation, for most patients, most drugs don’t work.
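The bookkeeping can be set out exactly in a few lines of Python, using exact fractions to sidestep rounding (the 5/9 and 4/9 figures are the boiled-down ones above):

```python
from fractions import Fraction

n = 9                               # patients per group
pain_free_aspirin = Fraction(5, n)  # 5 of 9 pain free on aspirin
pain_free_control = Fraction(4, n)  # 4 of 9 pain free without
arr = pain_free_aspirin - pain_free_control
nnt = 1 / arr
helped = n * arr                    # patients the drug itself made pain free
print(f"NNT = {nnt}; aspirin helped {helped} of {n}, leaving {n - helped} unaffected")
# NNT = 9; aspirin helped 1 of 9, leaving 8 unaffected
```

One helped, eight unaffected: the drug group’s outcome differs from the control group’s by exactly one patient in nine, which is all an NNT of nine ever claimed.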
This, then, is the paradox at the heart of modern evidence based population focused medicine. Whether we are looking at screening or treatment, we have to accept that, for most of those we screen or treat, our intervention makes not one jot of difference. They were going to get better or die, whatever we did.
That is not to say we should never screen or treat. Of course not. But it does mean perhaps we should cut the paternalistic crap, and fess up that, most of the time, for most patients, medical intervention makes no difference. Somewhat paradoxically, the more we understand what evidence based medicine is really telling us, the more we should dump the bluff, provide the facts, and let the patient decide.
Anon – the ones who get better do so for one (or more – they are not necessarily mutually exclusive) of three reasons: the natural history of their condition (‘self-limiting’ in the jargon), a placebo effect, or the effect of an active drug. The randomised double blind placebo controlled trial is supposed to create a setup in which any difference in outcome (at whatever statistical significance, i.e. p value, it has) can only really be attributed to the active drug.
The randomisation is meant (among other things) to ensure each group has the same number who are going to get better anyway; the double blinding (neither patient nor doctor knows who is getting the active drug) is supposed to ensure any placebo effect applies equally to drug and placebo; and the placebo is there to, ahem, create a placebo effect in those who don’t get the active drug. The p value gives an estimate of how (un)likely the result was if in fact there is no difference in effect (the ‘null hypothesis’), and so gives us an idea of the influence of chance (low p value = unlikely to happen by chance = difference likely to be due to drug).
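The ‘influence of chance’ point can be demonstrated by simulation. The sketch below uses entirely hypothetical numbers and a normal-approximation test for a difference in proportions: it runs two thousand trials in which the ‘drug’ is in fact identical to placebo, and a p value below 0.05 still turns up in roughly one trial in twenty:

```python
import math
import random

def two_sided_p(better_a, n_a, better_b, n_b):
    """Two-sided p value for a difference in proportions (normal approximation)."""
    pooled = (better_a + better_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(better_a / n_a - better_b / n_b) / se
    return 1 - math.erf(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

random.seed(1)
n, recovery_rate, trials = 100, 0.4, 2000
false_positives = 0
for _ in range(trials):
    # Null hypothesis true: both arms recover at the same 40% rate.
    drug_arm = sum(random.random() < recovery_rate for _ in range(n))
    placebo_arm = sum(random.random() < recovery_rate for _ in range(n))
    if two_sided_p(drug_arm, n, placebo_arm, n) < 0.05:
        false_positives += 1
print(false_positives / trials)  # roughly 0.05: chance alone 'works' ~1 trial in 20
```

Which is all a p value of 0.05 means: a result this extreme would turn up by chance alone about once in every twenty trials of a drug that does nothing.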
That’s the idea, but there may still be other things going on. There was a rather interesting short report in the Lancet years ago (tiresomely behind a paywall) that suggests that a clinician’s knowledge of possible treatments can influence the placebo effect. Even in double blind trials, efficacy appears to be influenced by the clinician’s (but not patient’s) expectations of possible treatment.
BTB – clinical trials measure efficacy – how effective drugs are in ‘lab conditions’. Effectiveness is to do with how drugs work in the real world (where people forget to take drugs etc) and, strictly speaking, efficiency is about economic efficiency, as in how much benefit for a given unit cost.
The formal drug approval processes (and NICE) are supposed to take all these e-words into account, and perhaps mostly they do, but there are still big problems, not least of which is publication bias, otherwise known as burying bad news (as in only favourable trials are published). Ben Goldacre has recently highlighted this long-standing problem (Dr No remembers discussing it in the eighties): the real scandal is, given it has been recognised as a problem for so long, why is it still going on?
A classic example of the importance of publication bias is perhaps antidepressants. Most GPs and shrinks probably think they work (otherwise, why prescribe them in such huge numbers?), but a couple of studies published in 2008 (1, 2), which went out of their way to find and include all trials, including negative ones, showed that previously published data seriously over-egged the apparent efficacy of antidepressants. Indeed, the Hull study found “virtually no difference [between drug and placebo] at [mild to] moderate levels of initial depression to a relatively small difference for patients with very severe depression, reaching conventional criteria for clinical significance only for patients at the upper end of the very severely depressed category” – hardly a ringing endorsement for prescribing ADs on an industrial scale to all and sundry.
Whatever happened to the days when your doctor could prescribe you something mysterious with a fancy Latin name on the bottle that was just sugar pills? An elderly relative of mine was a pharmacist and remembers dispensing them many, many years ago. I wonder how the unwanted side-effects compare? (After taking into account that people can attribute just about any symptom to drugs they happen to be taking.)
Of course, these days the sugar pills would have to be in bubble strips in a packet with a brand name and lots of pretty colours and long words together with a small-print leaflet telling you everything and nothing. But the active ingredient could be disguised by tweaking the name a bit.
A new business proposition for you, Dr No?
The ethics of prescribing a placebo are interesting.
In practice, when wanting to prescribe a placebo, I usually do it with lifestyle advice. For example, I recommend the consumption of plenty of green vegetables and cutting out saturated fat. Patients are happier if they feel that they have some control over their condition, so it gives them a way of believing that they can influence events. It has the advantage of being suitable advice for almost all of us at all times! I sometimes find more specific advice for more obsessional patients.
Thank you for this post. Very informative.
For those who are interested – Dr No has noticed that the NNT calculator linked to (and yes, the link was missing for a while) in the post above (which is based at the prestigious Canadian Centre for Evidence-Based Medicine) appears to miscalculate NNTs (although it does get the ARR right). For example, if you put in
50 50 (50/100 got better in the treated group)
40 60 (40/100 got better in the placebo (control) group)
the ARR is 0.1 (10%) and so the NNT should be 10 (1/0.1) but in fact it shows up as 11…with similar errors for other numbers. OK, the error isn’t huge, but…
Just goes to show, perhaps, even the most prestigious EBM can’t always be relied on – or more generally, even simple black boxes (web based computer calculators) need rigorous checking. When it comes to the comedy computer models so beloved of economists, one can only wonder at the opportunities for error.
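Dr No can only guess at the calculator’s code, but the off-by-one has the fingerprints of binary floating point combined with the common convention of rounding NNTs up to the next whole number. A hypothetical Python reconstruction of the suspected failure mode:

```python
import math

cer, eer = 0.4, 0.5        # control and experimental 'got better' rates
arr = eer - cer            # mathematically exactly 0.1, but...
print(arr)                 # 0.09999999999999998 in binary floating point
print(math.ceil(1 / arr))  # 11 -- 'round the NNT up' turns 10.000...002 into 11
print(round(1 / arr))      # 10 -- rounding to nearest gives the expected answer
```

Neither 0.4 nor the difference 0.1 can be represented exactly in binary, so 1/ARR comes out a hair above 10, and a ceiling function dutifully rounds it up to 11. Whether that is what the Centre’s calculator actually does is, of course, conjecture.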
Dr No has notified the Centre of the apparent error and awaits its reply.
The Centre for EBM didn’t reply, but in the meantime a private correspondence Dr No has been engaged in has highlighted the fact that while Dr No’s assertion – most of the time, most drugs don’t work – is true, it isn’t necessarily the whole truth.
In the post, Dr No said: “But what of the other patients? From the control (no aspirin) group we know that four would be pain free anyway, even without aspirin; and from the treated group we know that four will still be in pain, despite taking aspirin. For eight of the nine patients in the active drug group, taking aspirin made no difference: they were going to be in pain or pain free, whether they took aspirin or not. The aspirin made no difference, which is why Dr No can confidently say, by a generous – some might say wild, even reckless – extrapolation, for most patients, most drugs don’t work.”
Now, we are so used to thinking in terms of placebo controlled RCTs, with their active treatment/placebo groups, we do perhaps forget a third (hypothetical, invisible) group: those who took no treatment at all. Let us now imagine that group incorporated (and the NNT 10 to make the sums easy; calculated NNTs below rounded to nearest whole number):
No treatment at all: 2/10 get better
Placebo: 4/10 get better
Active drug: 5/10 get better
The placebo-active drug ARR is 0.1, NNT = 10 (so far, so good…)
The no treatment-placebo ARR is 0.2, NNT = 5 (!)
The no treatment-active drug ARR is 0.3, NNT = 3(!!!)
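The three comparisons can be checked in a few lines of Python (the 2/10, 4/10 and 5/10 figures are the hypothetical ones above):

```python
from itertools import combinations

n = 10
better = {"no treatment": 2, "placebo": 4, "active drug": 5}  # better, out of ten

# Compare each pair of arms: ARR is the difference in recovery rates,
# NNT its reciprocal, rounded to the nearest whole number.
for baseline, treatment in combinations(better, 2):
    arr = (better[treatment] - better[baseline]) / n
    print(f"{baseline} vs {treatment}: ARR = {arr:.1f}, NNT = {round(1 / arr)}")
# no treatment vs placebo: ARR = 0.2, NNT = 5
# no treatment vs active drug: ARR = 0.3, NNT = 3
# placebo vs active drug: ARR = 0.1, NNT = 10
```

The same ten patients yield an NNT of 10, 5 or 3 depending on which pair of arms is compared, which is the nub of the point: an NNT is always relative to a comparator, and the conventional placebo-controlled NNT credits none of the placebo effect to the act of treating.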
So when Dr No said “For eight of the nine patients in the active drug group, taking aspirin made no difference: they were going to be in pain or pain free, whether they took aspirin or not”, it doesn’t tell the whole story. It is true to say that aspirin itself benefited only one of the patients, but taking any pill (be it aspirin or placebo) benefited three patients, one of whom benefited from aspirin’s pharmacology, and two of whom benefited from the placebo effect.
But that still leaves six (of the original nine/hypothetical ten) unaffected – a majority – so Dr No thinks he can still get away with saying that, for most patients, most drugs don’t work.
Yeah, right. They don’t work much if you are taking them wrongly, so be careful when taking your meds.
Really great blog – although I do not know English well, this is a subject that interests me. In Poland we do not have such well-developed blogs. Sorry for the mistakes, but my English is poor.