# The Cart and the Horse

Sir Liar ‘Tombstone’ Swansong, ex-CMO-elect, has let it be known that he intends to use his retirement to persuade government to impose a binding minimum price for alcohol, in the hope of curbing alcohol-related harm. A figure of 50p per unit sold has been suggested – which would raise the minimum price for a bottle of 12% ABV wine to £4.50, up some 50% on today’s minimum prices.
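The £4.50 figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes a standard 750 ml bottle (the post does not state the size) and the UK convention that one unit is 10 ml of pure alcohol. The function names are made up for illustration.

```python
# Assumptions (not stated in the post): a 750 ml bottle, and the UK
# convention that one unit of alcohol is 10 ml of pure alcohol.
ML_PER_UNIT = 10.0


def units_in_drink(volume_ml: float, abv_percent: float) -> float:
    """Units of alcohol = millilitres of pure alcohol / 10."""
    return volume_ml * (abv_percent / 100.0) / ML_PER_UNIT


def minimum_price_gbp(volume_ml: float, abv_percent: float,
                      pence_per_unit: float) -> float:
    """Floor price in pounds under a per-unit minimum price."""
    return units_in_drink(volume_ml, abv_percent) * pence_per_unit / 100.0


wine_units = units_in_drink(750, 12)
wine_floor = minimum_price_gbp(750, 12, 50)
print(f"{wine_units:.1f} units, minimum price £{wine_floor:.2f}")
```

Under those assumptions the bottle holds 9 units, and 9 units at 50p each gives the £4.50 quoted above.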

The government – fearful of losing votes – has announced that it has no intention of penalising Middle England in the hope of cutting heavy drinking and alcohol-related harm. But – ironically – that is what Tombstone and the health tyrants say must happen, if we are to stem the rising tide of alcohol that they say threatens to drown our nation. And Tombstone has a knack of getting his way.

Now, tyrants are not usually first in the queue when it comes to honest use of science, so let us look a little at what they have to say, and see how the science backs up their claim.

The seminal work on the relationship between average values in a population and extreme values was done by an epidemiologist called Geoffrey Rose. In a charmingly titled 1985 paper, Sick Individuals and Sick Populations, Rose separated out what he called ‘high risk’ (double-entendre no doubt intended) prevention strategies – strategies that target high risk individuals – from population strategies, where the aim is to shift risk factors in a favourable direction at a population level – and thereby, one hopes, reduce the incidence of disease. Population approaches, opined Rose, had powerful advantages: they were ‘radical’, had ‘large potential’, and were ‘behaviourally appropriate’; and had none of the disadvantages of ‘high-risk’ strategies.

Rose’s 1985 paper was more by way of an introduction to his thinking than a research paper. Five years later he co-authored an even more charmingly titled paper: The population mean predicts the number of deviant individuals – note that sick individuals are now *deviant* individuals – in which he showed, unsurprisingly, that populations with high mean values of ‘x’ had more individuals with very high values of ‘x’. Populations, for example, with a high average body mass index (BMI) tend to have more obese individuals in them than populations with lower average BMIs.

This is about as earth-shattering as observing that populations with high average height have more very tall individuals in them. It is what we would expect, and indeed the figures – for blood pressure, BMI, alcohol consumption and sodium intake, the variables included in Rose’s 1990 study – bear this out. In a strict, mathematical sense, it is true to say that the population mean predicts the number of deviant individuals. For alcohol, the sums are simple: the percentage of heavy drinkers in a population can indeed be predicted by dividing the mean alcohol consumption (in ml/wk) by ten. A country with a mean weekly alcohol consumption of 150ml will indeed have some 15% of the population classified as heavy drinkers.

Rose then reasoned that, because the population mean predicts the number of deviant individuals, any intervention that reduced the population mean would cause a reduction in the number of deviants: thus, a ten percent reduction in average alcohol consumption would naturally lead to a ten percent reduction in the number of heavy drinkers. And so, in the fullness of time, we have Tombstone and his ilk urging government to hike the price of alcohol, on the grounds that higher priced alcohol means lower average consumption – and so fewer ‘deviant’ individuals.
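Rose's chain of reasoning reduces to two lines of arithmetic, sketched below. The divide-by-ten rule is the one quoted in this post; the function names are hypothetical, and the second function bakes in the very assumption under dispute, namely that a rule observed across populations still holds when one population's mean is pushed down.

```python
def heavy_drinker_percent(mean_ml_per_week: float) -> float:
    """Cross-sectional rule quoted in the post: % heavy drinkers = mean / 10."""
    return mean_ml_per_week / 10.0


def projected_after_cut(mean_ml_per_week: float, cut_fraction: float) -> float:
    """Rose-style projection: reuse the same rule on the reduced mean.

    ASSUMPTION: the cross-sectional rule survives an intervention.
    The data cannot confirm this; it is the step the post disputes.
    """
    reduced_mean = mean_ml_per_week * (1.0 - cut_fraction)
    return heavy_drinker_percent(reduced_mean)


before = heavy_drinker_percent(150)      # the post's example country
after = projected_after_cut(150, 0.10)   # a ten percent cut in the mean
print(before, after)
```

For the 150 ml example this takes the projected share of heavy drinkers from 15% to 13.5%, a ten percent relative fall, but only because the rule has been assumed to be causal.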

But Rose – and all those who have since jumped on the Rose band-wagon – have made a fundamental epidemiological error. Certainly, within the limits of his 1990 study (it is not without other methodological concerns), Rose did show an association between mean population levels and the number of deviants. But – and Rose and Tombstone should know better than to ignore this fundamental epidemiological doctrine – *association is not the same as causation*.

The data Rose used in his 1990 study was cross-sectional data – that is, data that represents a snapshot in time. Such data can, and did, show an association, but it can tell us nothing about *why* the association exists. It could be that a determined cadre of ‘deviant’ individuals – say a subgroup with an unusually marked and invariable tendency to great height – pulls, by a simple mathematical effect, the mean height upwards. Or it may be that higher average height does indeed lead, on average, to more people of exceptional height. Or it may be something altogether different – say that better nutrition in childhood leads to better growth for all – and so both higher average height, and more exceptionally tall people.

Because we cannot know what is *causing* the association, we cannot make any predictions about how varying one variable – say average height – will affect the other variable – numbers of very tall people. And, by the same token, we cannot know that reducing average alcohol consumption – by a 50% hike in alcohol price for all drinkers – will curb the numbers of heavy drinkers. It is, instead, mere speculation that it might do so. Or it might not. We just don’t know.
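A toy simulation makes the point concrete. Everything in it is invented for illustration: the log-normal population, the 350 ml/week cut-off for "heavy" drinking and the two behavioural responses are assumptions, not data from Rose or anyone else. Both scenarios start from the same snapshot and both lower the mean, yet they disagree about what happens to the heavy drinkers.

```python
import random

HEAVY_THRESHOLD = 350.0  # ml/week; hypothetical cut-off for "heavy" drinking


def make_population(n, seed=1):
    """Skewed weekly consumption in ml, drawn from a log-normal (illustrative)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(4.5, 1.0) for _ in range(n)]


def summarise(pop):
    """Return (mean consumption, fraction at or above the heavy threshold)."""
    mean = sum(pop) / len(pop)
    heavy = sum(1 for x in pop if x >= HEAVY_THRESHOLD) / len(pop)
    return mean, heavy


def everyone_cuts(pop, cut=0.10):
    """Scenario A: every drinker reduces intake by the same fraction."""
    return [x * (1 - cut) for x in pop]


def only_moderates_cut(pop, cut=0.25):
    """Scenario B: heavy drinkers ignore the price; everyone else cuts back."""
    return [x if x >= HEAVY_THRESHOLD else x * (1 - cut) for x in pop]


population = make_population(20000)
mean0, heavy0 = summarise(population)
mean_a, heavy_a = summarise(everyone_cuts(population))
mean_b, heavy_b = summarise(only_moderates_cut(population))

print(f"baseline:   mean={mean0:.0f} ml/wk, heavy={heavy0:.1%}")
print(f"scenario A: mean={mean_a:.0f} ml/wk, heavy={heavy_a:.1%}")
print(f"scenario B: mean={mean_b:.0f} ml/wk, heavy={heavy_b:.1%}")
```

In scenario A the share of heavy drinkers falls with the mean; in scenario B the mean falls and the share of heavy drinkers does not move at all. A single cross-sectional snapshot cannot tell us which scenario, if either, describes real drinkers.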

Dr No has no doubt that alcohol can be profoundly harmful. But basing wide-sweeping public health policy – policy that will encroach on three quarters of the entire population, the majority of whom are not harmed by alcohol – on mere speculation is bad policy, not to mention **bad medicine**. It damages professional credibility. Tombstone should lighten up, enjoy a tipple, and do what most people do when they retire – retire.

As you say, Rose's research seems to say what is blindingly self-evident about predicting numbers of outliers (or "deviants") based on population mean values. However, this must to some extent assume that the distribution is normally distributed around the mean and not skewed.

I guess his research might have been sufficiently powered to "prove" that the measures he selected were in fact normally distributed in this way. However, as I think you suggest, it is likely that alcohol consumption is not normally distributed and will include a large number of die-hard heavy drinkers (self harmers?) at one tail.

PoH – you have spotted some key issues about this data set and its analysis. Dr No has no wish to turn this into a full-blown stats toot, but he is so regularly flabbergasted by the statistical ignorance amongst his colleagues that he is going to make a few brief points.

The first and most important one, made in the post itself, which should not be forgotten, is that *correlation tells us nothing about causation*. Even when A and B – let’s make A weekly tobacco consumption and B weekly alcohol consumption – are highly correlated, we have no way of knowing (let us make C some index of social deprivation) whether:

- A causes (influences) B
- B causes (influences) A
- A and B are caused (influenced) by C, but not by each other

– or whether the result is simply down to chance.
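The third possibility is easy to demonstrate with made-up numbers. In the sketch below, C drives both A and B, and A and B have no influence on each other; the effect sizes and noise levels are arbitrary assumptions. A and B come out strongly correlated, but when A is set by fiat (an intervention that ignores C) the correlation with B vanishes.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 5000

# Hypothetical model: C (deprivation index) drives both A and B.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 0.5) for ci in c]  # tobacco: depends on C only
b = [ci + rng.gauss(0, 0.5) for ci in c]  # alcohol: depends on C only
r_observed = pearson(a, b)

# Intervention: A is now set independently of C. B is untouched,
# because in this model A never had any effect on B.
a_forced = [rng.gauss(0, 1) for _ in range(n)]
r_after = pearson(a_forced, b)

print(f"observed r(A, B)       = {r_observed:.2f}")
print(f"r(A, B) after forcing A = {r_after:.2f}")
```

Had the model been "A causes B" instead, the observed correlation could look just the same, which is why the correlation alone cannot say what an intervention on A would do.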

A classic example of over-interpreting correlation is the correlation between the price of petrol and the divorce rate. The sums are correct – divorce rate does correlate with petrol price – but the interpretation – that high petrol prices cause high divorce rate – is plain wrong. To infer a causal link is simply not justified.

This is the real point of this post: however good the correlation, however valid the sums, the fact is you cannot infer causation from association. You cannot infer that lowering mean consumption will lower the number of heavy drinkers.

Having said that, you are right to raise the question of distributions.

None of the four plots in the paper suggest Normal distributions (systolic blood pressure gets closest, but is probably skewed), so strictly speaking Rose shouldn’t be using parametric methods such as Pearson’s correlation, let alone linear regression: at least three out of the four plots are non-linear! He does mention (and then dump) Spearman’s correlation coefficient (which is a rank-based non-parametric coefficient) and non-linear models. Had they been skewed Normal distributions (which the systolic blood pressure might be), then he could have tried applying a so-called transformation – for example plotting the log values – but he does not do this.
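The difference between the two coefficients, and the effect of a log transformation, can be seen on a small synthetic example. The data below are an invented exponential curve, not Rose's figures, and the rank function assumes there are no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: measures how well a straight line fits."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks 1..n; assumes no ties, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Synthetic, monotone but strongly non-linear relationship.
x = [float(i) for i in range(1, 21)]
y = [math.exp(0.4 * xi) for xi in x]

r_pearson = pearson(x, y)                           # penalised by curvature
r_spearman = spearman(x, y)                         # only order matters
r_log = pearson(x, [math.log(yi) for yi in y])      # log straightens the curve

print(f"Pearson {r_pearson:.2f}, Spearman {r_spearman:.2f}, "
      f"Pearson on log(y) {r_log:.2f}")
```

On a curve like this Pearson understates a relationship that is in fact perfectly monotone, while the rank-based coefficient and the log-transformed version both recover it.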

‘Power’ doesn’t really come into a study like this (in the way that it does in the ‘power’ of a clinical trial). The related values are the P values and/or confidence intervals (which are related to the number of individuals sampled), which are conspicuous by their absence (apart from in Table II). We simply don’t know, for example, the sample sizes behind the individual data points. Maybe the alcohol outlier (Mexico) had more to do with too much tequila (and perhaps seeing double) than actual numbers sampled…

And so on… the study has more holes in it than a Swiss cheese. But that is not the key point – the key point is:

*don’t infer causation from association!*