31 May 2011

Unhealthful News 151 - Logic and proportion have fallen sloppy dead

I figure I need to do a tobacco example since this is World No Tobacco Day, brought to you by the UN's champions of UN (that is, United Nations' and Unhealthful News, respectively), the World Health Organization.  Over at the tobacco harm reduction blog, Paul Bergen wrote a great translation of this year's WHO statement about WNTD (one might call it a parody, but that would understate how fundamentally accurate it is).  We started TobaccoHarmReduction.org five years ago today, in response to WHO's anti-harm-reduction WNTD that year, and if anything they have only become more unhealthful since then.  Almost as funny was WHO doling out WNTD awards to its own peeps, funny because I saw it reported on twitter with the hashtag #mutualmasturbation.

Today, in keeping with the anti-tobacco extremist mission – attack all forms of nicotine use except smoking, because the popularity of smoking helps support their abstinence-only agenda – their pet reporters seem to have focused on low-risk smokeless alternatives and hookah smoking.  I am writing about the New York Times's "contribution" to this, mostly because it is a great example of the WHO myths, as well as the funniest.

The humor commences with the human interest hook in the lead, about a physics major here at the University of Pennsylvania who has become such a fan of hookah smoking that he bought one for his fraternity.  He was quoted:
[He] believes that hookah smoke is less dangerous than cigarette smoke because it “is filtered through water, so you get fewer solid particles.”
Let's think about that.  "Filtered" by moving fairly large bubbles through a couple of inches of water.  Yes, some of the particles will touch the surface of the water and stick or dissolve, but most of them do not.  It is more efficient than trying to clear the smoke in your kitchen from a cooking mishap by running water in the sink, but the same basic idea.  Perhaps physics is not the strong field at our local supposed-part of the Ivy League.  I guess we can hope he specializes in quantum theory rather than something that would educate about everyday physical chemistry.  Or maybe the bit about the fraternity explains the problem.

Anyway, his conclusion that hookah smoking is less harmful than cigarettes is a reasonable guess, though we do not know enough about the risks to say for sure.  But if so, the main reason would be that it is mostly a heat-not-burn system that produces fewer combustion products.

After that hook, reporter Douglas Quenqua takes over the misleading claims himself.  He starts with the WHO propaganda (pdf) that an hour-long session of hookah smoking is like smoking 100 cigarettes.  Really?  It is bad enough that NYT (and most other) health reporters do not fact-check, but it is pretty sad when they do not think.  What the WHO report is claiming is not about health effects, because no such information exists, though that is what they are trying to imply.  It is really just about total amount inhaled.  But think about what it would take to inhale the equivalent of the smoke from 100 cigarettes.  About ten breaths a minute times sixty minutes is 600 breaths.  It is hard to smoke an entire cigarette in only six puffs, let alone 100 in a row, so it would be almost impossible to inhale that much in an hour.  And that does not even take into consideration that hookah smoke is usually far less concentrated.

The WHO numbers seem to be based on the total amount of inhaling that someone does in an hour, as if the hookah were being used as a scuba hose.  It might theoretically be possible to smoke a hookah as intensely as the WHO claims is typical, but it would be very difficult, and you would have to have a Nascar-style pit crew to quickly change the tobacco and charcoal when they ran out, several times.  This certainly is in no way similar to actual hookah smoking, by well over an order of magnitude, and probably more like two, as would be obvious to anyone who had bothered to witness it before writing a news story about it.  If he had gone as far as to try it, he would have discovered that each hit from a hookah is far less intense than one from a cigarette.  Doing all of your inhaling for an hour by drawing on a hookah would be unrealistically extreme, but doing it by drawing on cigarettes is completely absurd.
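For anyone who wants to check that arithmetic, here is a quick back-of-envelope sketch in Python.  Every number in it is a rough assumption drawn from the reasoning above (breathing rate, puffs per cigarette, the hour-long session), not a measurement, so treat it as an illustration of the scale of the problem rather than a calculation of anything real.

# Back-of-envelope check of the "one hookah session = 100 cigarettes" claim.
# All numbers are rough assumptions for illustration, not measurements.

breaths_per_minute = 10        # resting breathing rate, roughly
session_minutes = 60           # the WHO's "hour-long session"
puffs_per_cigarette = 8        # a cigarette takes well more than six puffs
cigarette_equivalent = 100     # the WHO claim

breaths_available = breaths_per_minute * session_minutes    # about 600
puffs_needed = puffs_per_cigarette * cigarette_equivalent   # about 800

print(f"Breaths available in an hour: {breaths_available}")
print(f"Puffs needed to match {cigarette_equivalent} cigarettes: {puffs_needed}")
print("Ratio needed even if every single breath were a full cigarette-strength puff: "
      f"{puffs_needed / breaths_available:.1f}x")

# And this ignores that each hookah draw is far less concentrated than a
# cigarette puff -- plausibly by one to two orders of magnitude -- which is
# the gap described above.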

The report goes on:
That study also found that the water in hookahs filters out less than 5 percent of the nicotine.
Well, that is good news, since nicotine is the good part of smoking, not the harmful part.  Though apparently the New York Times is not aware of that, since this is presumably intended to suggest that a bigger number would be better.
Moreover, hookah smoke contains tar, heavy metals and other cancer-causing chemicals.
It might have been useful for the reporter to look up "tar" before writing that conjunction, so that he would know that it refers to the particulate phase of the smoke/vapor – i.e., the bit with the heavy metals and other cancer-causing chemicals.
An additional hazard: the tobacco in hookahs is heated with charcoal, leading to dangerously high levels of carbon monoxide, even for people who spend time in hookah bars without actually smoking, according to a recent University of Florida study.
It is plausible that there is lots of CO produced, but "dangerously high levels" tends to imply acute poisoning, which readers presumably understand, having heard about it occurring in household accidents.  But this scare tactic ignores the lack of any reports, to my knowledge, about adverse acute effects (not that CO is good for your heart and the rest of your body in the medium term).  It would be nice if someone offered some non-propaganda analysis of this, but the press is certainly not going to help with that.
And because hookahs are meant to be smoked communally — hoses attached to the pipe are passed from one smoker to the next — they have been linked with the spread of tuberculosis, herpes and other infections.
Practices certainly vary, but I have only seen situations where everyone has their own mouthpiece that they insert for their turn.  In any case, tuberculosis?  If that is floating around the Penn student body (or the other American youth who were the focus of the story), transmission by hookah is not exactly the major concern.
Paul G. Billings, a vice president of the American Lung Association ... calls the emerging anti-hookah legislation a “top priority” for the lung association.
You would think they would be a bit more worried about the cigarettes that certainly cause (as opposed to the weak evidence about hookahs) lung cancer and COPD, and that remain about a thousand times more popular.  But since the ALA is opposed to the use of smoke-free THR products to reduce lung diseases (and other diseases), it is clear that they are not really about lungs, or public health.  They are just pursuing a political agenda that their donors are being tricked into supporting.  Funny that reporters do not know how to probe, as I learned to do in middle school journalism class – perhaps by asking the lung guy how much evidence there is that casual hookah use causes serious risk of lung disease, or maybe why it is the top priority.

The balance of the article goes on to recount how various localities are pursuing various forms of hookah bans, particularly focused on hookah bars, to eliminate what is slowly becoming a more popular pub-like center of social gathering for some young American adults.  Banning is, of course, the preferred and most effective solution when young people are doing other than what they are told.  If the trends and the bans are both popular enough, maybe it will lead to a generation that has a healthy distrust of the honesty and motives of those in power.  Maybe a few of them will become reporters.

(Oh, and in case that title was too obscure, it is a lyric from Jefferson Airplane's White Rabbit, the song with the hookah-smoking caterpillar and the advice to go ask Alice.  The current discourse on tobacco has far too much in common with Alice's adventures.)

30 May 2011

Unhealthful News 150 - Understanding (some of) the ethics of trials and stopping rules, part 3

A few days ago I made a comment about clinical trial stopping rules being based on dubious ethics, as practiced.  In part 1 and part 2, I made the following points: clinical trials almost always assign some people to an inferior treatment, an action which serves the greater good by giving better information for many future decisions; hurting people for the greater good is not necessarily unethical, but pretending you are not doing it seems indefensible; people who do clinical trials and act as "medical ethicists" for them usually deny that they are hurting some subjects, though this is clearly false; clinical trials are an example of what are known as "bandit problems" (a reference to playing slot machines), characterized by a tradeoff between gathering more information and making the best use of the information you have; there is a well-worked mathematics for optimizing that tradeoff.

Today I will conclude the analysis, starting by expanding on that last point.  The optimization depends on every variable in sight, notably including how many more times you are going to "play" (i.e., make the decision in question), as well as more subtle points like your prior beliefs and whether you only care about average payoff or might care about other details of the distribution of payoffs (e.g., you might want to take an option whose outcomes vary less, even though it is somewhat inferior on average).  Obviously the decision predominantly depends on how much evidence you have to support the claim that one option is better than the other, on average, and how different the apparent payoffs are, but the key is that that is not all it depends on.

I hinted at some of this point yesterday, pointing out that you would obviously choose to stop focusing on gathering information, switching over to making the apparently best choice all the time, at different times when you were expecting to play five or a thousand times.  Obviously the value of information varies greatly, with the value of being more certain of the best choice increasing with the number of future plays.  On the more subtle points, if you are pretty sure at the start that option X is better, but the data you collect is favoring option Y a bit, you would want to gather a bit more data before abandoning your old belief, as compared to demanding a bit less if the data was tending to support X after all.  And if the payoffs are complicated, rather than being simply "win P% of the time, lose (100-P)% of the time", with varying outcomes, maybe even some good and some bad, then more data gathering will be optimal.  This is the case even if you are just concerned with the average payoff, but even more so if people might have varying preferences about those outcomes, such as worrying more about one disease than another (I have to leave the slot machine metaphor to make that point).
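To make the point about prior beliefs concrete, here is a minimal sketch using a toy Beta-Binomial model and nothing but Python's standard library.  All of the numbers are invented for illustration; the only thing to take from it is that the same data can leave you fairly convinced or still on the fence depending on what you believed going in.

import random

def prob_y_beats_x(prior_x, prior_y, data_x, data_y, draws=100_000):
    # Posterior for each option is Beta(prior successes + observed successes,
    # prior failures + observed failures); estimate P(Y's true success rate
    # exceeds X's) by Monte Carlo.
    ax, bx = prior_x[0] + data_x[0], prior_x[1] + data_x[1]
    ay, by = prior_y[0] + data_y[0], prior_y[1] + data_y[1]
    wins = sum(random.betavariate(ay, by) > random.betavariate(ax, bx)
               for _ in range(draws))
    return wins / draws

# Hypothetical trial data mildly favoring Y: 30 successes and 20 failures,
# versus 25 and 25 for X.
data_x, data_y = (25, 25), (30, 20)

# Starting from a flat prior (no outside opinion at all), that data is
# fairly persuasive -- roughly an 84% chance that Y is better:
print(prob_y_beats_x((1, 1), (1, 1), data_x, data_y))

# Starting from a prior that favors X (as if we had already watched X succeed
# 30 times and fail 20), the same data leaves us much less sure -- roughly 70%:
print(prob_y_beats_x((30, 20), (1, 1), data_x, data_y))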

So, stopping rules make sense and can be optimized mathematically.  That optimization is based on a lot of information, but thanks to years of existing research it can be done with a lot less cost than, say, performing medical tests on a few study subjects.  So there is no excuse for not doing it right.

So what actually happens when these stopping rules that are demanded by "ethics" committees are designed in practice?  Nothing very similar to what I just described.  Typically the rule is some close variation on "stop if, when you check the data gathered so far, one of the regimens is statistically significantly better than the other(s)".  Why this rule, which ignores all of the factors that go into the bandit optimization other than how sure you are about which regimen is best, based only on the study data, ignoring all other sources of information?

It goes back to the first point I made in this exploration, the fiction that clinical trials do not involve giving some people a believed-inferior regimen.  As I noted, as soon as you make one false declaration, others tend to follow from it.  One resulting falsehood that is necessary to maintain the fiction is that in any trial, we actually know and believe absolutely nothing until the data from this study is available, so we are not giving someone a believed-inferior treatment.  A second resulting falsehood is that we must stop the study as soon as we believe that one treatment is inferior.  An additional falsehood that is needed to make the second one function is that we know nothing until we reach some threshold ("significance"), otherwise we would quit once the first round of data was gathered (at which time we would clearly know something). 

The first and last of these are obviously wrong, as I illustrated by pointing out that an expert faced with choosing one of the regimens for himself or a relative before the study was done would never flip a coin, as he would pretty much always have an opinion about which was better.  But they do follow from that "equipoise" assumption, the assumption that until the trial gives us an answer, we know nothing.  That assumption was, recall, what was required to maintain the fiction that no group in the trial was being assigned an inferior treatment.

As for stopping when one regimen is shown to be inferior, based on statistical significance, I believe this is the most compelling point of the whole story:  Based on the equipoise assumption, the statistical significance standard basically says stop when we believe there is a 97.5% chance that one choice is better.  (It is not quite that simple, since statistical significance cannot be directly interpreted in terms of phrases like "there is an X% chance", but since we are pretending we have no outside knowledge, it is pretty close for simple cases.)  So what is wrong with being this sure?  Because it is pretty much never (I have never heard of an exception) chosen based on how many more times we are going to play – that is, how many total people are going to follow the advice generated by the study.  If there are tens of millions of people who might follow the advice (as is the case with many preventive medicines or bits of nutritional advice), then that 2.5% chance of being wrong seems pretty large, especially if all you need to do is keep a thousand people on the believed-inferior regimen for just a few more years.

But *sputter* *gasp* we can't do that!  We cannot intentionally keep people on an inferior regimen!

Now we have come full circle.  Yes we can do that, and indeed always do that every time we start a trial.  That is, we have someone on a believed-inferior regimen because we never prove the answer to an empirical question.  There is nothing magical about statistical significance (or any variation thereof) – it is just an arbitrary threshold with no ethical significance (indeed, it also lacks any real significance statistically, despite the name).  It usually means that we have a stronger belief about what is inferior when we have statistically significant data as compared to when we do not, but there is no bright line between ignorance and believing, let alone between believing and "knowing".

So, if we are asking people to make a sacrifice of accepting assignment to the believed-inferior option in the first place, we must be willing to allow them to keep making the sacrifice after we become more sure of that belief, up to a point.  But since there is clearly no bright line, that point should start to consider some bandit optimization, like how many plays are yet to happen.

This is not to say that we should just use the standard bandit problem optimizations from operations research, which typically assume we are equally worried about losses during the data gathering phase as during the information exploiting phase.  It is perfectly reasonable that we are more concerned with the people in the trial, perhaps because we are more concerned with assigning someone to do something as compared to merely not advising people correctly.  We would probably not accept nine excess deaths in the study population (in expected value terms) to prevent ten expected excess deaths among those taking the advice.  We might even put the tradeoff at 1-to-1000, which might justify the above case, making 97.5% sure the right point to quit even though millions of people's actions were at stake.  But whatever that tradeoff, it should be reasonably consistent.  Thus, for other cases where the number of people who might heed the advice is only thousands, or a hundred million, the stopping rule should be pegged at a different point.
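Here is a back-of-envelope version of that consistency argument, with every number invented purely for illustration: hold the trial-versus-public weighting fixed and watch how the case for continuing the trial scales with the size of the audience that will eventually follow the advice.

def benefit_cost_ratio(p_wrong, future_users, harm_if_wrong_per_user,
                       trial_subjects, extra_harm_per_subject,
                       subject_weight=1000):
    # Rough upper bound on the harm that further data could prevent among
    # future users (if it corrected a wrong recommendation), divided by the
    # heavily weighted expected harm of keeping trial subjects on the
    # believed-inferior arm a while longer.
    benefit = p_wrong * future_users * harm_if_wrong_per_user
    cost = subject_weight * trial_subjects * extra_harm_per_subject
    return benefit / cost

# Stopping at "97.5% sure" means p_wrong = 0.025 in this toy accounting.
for users in (10_000, 10_000_000, 100_000_000):
    r = benefit_cost_ratio(0.025, users, 0.01, 1_000, 0.001)
    print(f"{users:>11,} future users -> benefit/cost ratio {r:g}")

# A fixed certainty threshold cannot be right in all three cases: it demands
# too much of the trial subjects when few people will ever follow the advice,
# and stops too soon when a hundred million will.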

So there is the critical problem.  Whatever you think about the right tradeoff, or how much to consider outside information, or other details, there is a tradeoff and there is an inconsistency.  Either we are asking people to make unreasonable levels of sacrifice when there is less at stake (fewer future "plays") or we are not calling for enough sacrifice when there is more at stake.  There is a lot of room for criticism on many other points that I have alluded to, and I would argue that almost all stopping rules kick in too soon and that most trials that accumulate data slowly should also be planned for a longer run (i.e., they should not yet stop at the scheduled ending time), though some trials should not be done at all because the existing evidence is already sufficient.  But those are debatable points and I can see the other side of them, while the failure to consider how many more plays seems inescapable.  The current practice can only be justified based on the underlying patent fiction.

When the niacin study that prompted this analysis was stopped, it was apparently because an unexpected side effect, stroke, had reached the level of statistical significance but perhaps also because there was no apparent benefit.  This one kind of feels like it was in the right ballpark in terms of when to stop – they were seeing no benefit, after all, and there was prior knowledge that made it plausible that there was indeed none.  But imagine a variation where the initial belief was complete confidence in the preventive regimen, and there was some apparent heart attack benefit in the study data, but the extra strokes (which were completely unexpected and thus more likely to have been a statistical fluke) outweighed the benefit by an amount that achieved statistical significance.  Would we really want to give up so quickly on something that we had thought would be beneficial to tens of millions of people?

The situation becomes even more complicated when there are multiple outcomes.  An example is the Women's Health Initiative, the trial that resulted in post-menopausal estrogen regimens being declared to be unhealthy rather than healthy.  It was stopped because the excess breast cancer cases in the treatment group hit the magic threshold.  But there were offsetting benefits in terms of hip fracture and other diseases, so the bottom line was really unclear.  Someone with particularly low risk of breast cancer and high risk of fracture might have still wanted to go with the therapy, but we cannot tease out enough detail because the trial ended too soon.  Whatever we might have learned from continuing longer could have helped millions and really would not have hurt subjects much on net, but now we will never know.  (Arguably the trial had become such a train wreck by the time it ended – with a huge portion of each trial arm being "noncompliant", i.e., those assigned to hormones having stopped taking them and those assigned to placebo having sought out hormone treatment, and many being lost to follow-up – that continuing might not have taught us much anyway.  Still, those were not the reasons the study was stopped, and everyone mostly just pretended they had not happened.)

Bottom line:  Pretending that trials do not hurt (in expected value terms) some subjects is unethical.  Engineering their design in a way that provides suboptimal information in order to maintain that fiction is even worse.

29 May 2011

Unhealthful News 149 - Understanding (some of) the ethics of trials and stopping rules, part 2

Yesterday I explained why clinical trials (aka randomized clinical trials, RCTs, medical experiments on people) almost always inflict harm on some of their subjects, as assessed based on current knowledge (which is, of course, the only way we can measure anything).  To clarify, this means that one group or another in the trial experiences harm in expected value terms.  "Expected value" means it is true for the average person, though some individuals might benefit while others suffer loss, and averaging across hypothetical repetitions of the world, because sometimes the luck of the draw causes an overall result that is very different from what would occur on average.

The critical ethical observation about this is that causing this harm is ok.  Some people have to suffer some loss – in this case by volunteering to be a study subject and getting assigned to the believed-inferior regimen – for the greater good.  In this case, the greater good is the knowledge that lets us choose/recommend a regimen for everyone in the future based on the additional knowledge we gained from the study.  There is nothing inherently unethical about causing some people harm for a greater good.  Sometimes that is unethical, certainly, but not always.  If we tried to impose an ethical rule that said we could never make some people worse off for the greater good (or even the much narrower variation, "...some identified people…"), a large fraction of human activity would grind to a halt.  Now it does turn out that it is always possible, when an action has a net gain for society, to compensate those who are being hurt so that everyone comes out ahead in expected value terms (for those with some economics, I am referring to potential Pareto improvement).  But it turns out that for most clinical trials, no such compensation is offered and, bizarrely, it is often considered "unethical" to provide it (another pseudo-ethical rule that some "health ethicists" subscribe to, and another story for another day:  they claim that it would be too coercive to offer someone decent compensation to be in a trial, which ...um... explains why it is considered unethical coercion to pay people to work their jobs?)

However, though it is not necessarily unethical to take actions that hurt people, there is a good case to be made that it is per se unethical to hurt people but claim to not be doing so.  Thus, an argument could be made that invading Iraq was ethical even though it was devastating for the Iraqi people (I am not saying I believe that, I am just saying there is room to argue).  But when US government apologists claim that the invasion was ethical because it made the average Iraqi better off, they are conceding the situation is unethical:  Not only are they lying, but they are implicitly admitting that the invasion was unethical because their defense of it requires making a false claim.  Similarly, banning smoking in bars/pubs is the subject of legitimate ethical debate even though it clearly hurts smokers and the pub business.  But when supporters of the ban pretend that pubs have not suffered they are being unethical and are implying that they think the truth ("the bans are costly for the pubs in most places, but we feel the benefits are worth the cost") would not be considered ethical or convincing.

So, it seems that those doing and justifying clinical trials are on rather shaky ethical ground based on their rhetoric alone, because they pretend that no one is being hurt.  This is simply false.  Their claim is that if we are doing the trial then we must not know which of the regimens being compared is better, so no one is being assigned to an inferior choice.  But as I explained yesterday, this is simply false in almost all cases – they are misrepresenting the inevitable uncertainty as being complete ignorance.  But it gets worse, because as is usually the case that once you take one nonsensical step, others follow from it (which you can interpret as either "one false assumption leads to bad conclusions via logical reasoning" or "trying to defend the indefensible usually requires more indefensible steps to patch over the mess you have made").  The stopping rules, as they now exist, are one of those bad steps that follow.

But it occurs to me that I need to explain one more epistemic principle before making the final point, so I will do that today and add a "part 3" to the plan here (you need to read part 1 to know what I am talking about here, btw).  I hope that anyone who likes reading what I write will find this worthwhile.

Clinical trials are an example of the tradeoff between gathering more information about which choice is better and exploiting the information you have to make the apparent best choice.  Yesterday I pointed out that if an expert is making a decision about a health regimen (e.g., a treatment option) for himself or a close relative right now, he almost certainly would have a first choice.  This is a case of just exploiting current knowledge because there is no time to learn more, so the choice is whichever seems to be better right now, even if it only seems a little better and is quite uncertain.  But if we are worried not just about the next member of the target population, but the next thousand or million who could benefit from a treatment or health-improving action, it would be worth resolving the uncertainty some.  The best way to do that is to mix up what we are doing a bit.  That is, instead of just going with the apparently better regimen (which would provide some information – it would help narrow down exactly what the expected outcomes are for that regimen) we seek the additional information of clarifying the effects of the other regimen.

Aside – yes, sorry; it is hard for me to present complicated topics that have subtle subpoints without getting all David Foster Wallace-esque – I already use his sentence structure, after all.  For a lot of trials, one of the regimens represents the current common practice, being used for comparison to the new drug/intervention/whatever of interest.  This is a regimen that we actually already have a lot of data about, and for which more usually continues to accumulate.  Thus, you might say, we can just assign everyone to the other regimen, if it is believed to be better, and use the data about the old standard from other sources.  This is true, and it is yet another epistemic disgrace that we do not make better use of that information in evaluating the new regimen.  But there are big advantages to having the data come from the same study that examined the new regimen.  This is often attributed to the value of randomization and blinding, but the main benefits have to do with people in studies being different enough from average that it is tricky to compare them to the population average.  People in studies experience placebo effects and Hawthorne effects (effects of merely being studied, apart from receiving any intervention, which are often confused with placebo effects – ironically including in the study that generated the name "Hawthorne effect"), and are just plain different.  Thus, though we should make better use of data from outside the study, there is still great value in assigning some people to each of the regimens that is being studied.

The tradeoff between exploiting best-available information and paying the price to improve our information is called a "two-armed bandit problem" (or more generally, just a "bandit problem"), a metaphor based on the slot machine, which used to be a mechanical device with an arm that you pulled to spin real mechanical dials, thus earning the epithet, "one-armed bandit" (this was back before it became all digital and able to take your money as fast as you could push a button).  Imagine a slot machine with a choice of two arms you can pull, which almost certainly have different expected payoffs.  If you are only going to play once, you should obviously act on whatever information you have.  If you are going to play a handful of times, and you have good information about which pays off better you should probably just stick with that one.  If you have no good information you could try something like alternating until one of them paid off, and then sticking with that one for the rest of your plays.  This strategy might well have you playing the poorer choice – winning is random, so the first win can easily come from the one that wins less – but you do not have much chance to learn any better. 

But imagine you planned to play a thousand times.  In that case, you would want to plan to play each of them some number of times to get a comparison.  If there is an apparent clear advantage for one of the choices, play it for the remainder of your plays (actually, if it starts to look like the test phase was a fluke because you are not winning as much in the later plays, you might reopen your inquiry – think of this as post-marketing surveillance).  On the other hand, if it still seems close, keep playing both of them some to improve your information.  The value of potential future information is that it might change your mind about which of the options is better (further information that confirms what you already believe has less practical value because it does not change your choice, though it does create a warm fuzzy feeling).  Now imagine an even more extreme case, where you can keep betting pennies for as long as you want, but eventually you have to bet the rest of your life's savings on one spin.  In that case you would want to play many times – we are talking perhaps tens of thousands of times (let's assume that the effort of playing does not matter) – to be extremely sure about which offers the better payoff.
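Here is a minimal simulation of that explore-then-commit idea, with toy payoff probabilities of 0.55 and 0.45 and every other number invented for illustration.  It estimates how often a test phase of a given length ends up identifying the better arm, and then what that implies over different horizons.  The exact figures do not matter; the pattern does: a longer test phase barely matters when you will only play a few more times, but clearly pays off when you will play many thousands of times.

import random

def prob_pick_better(p_good, p_bad, test_pulls, sims=2000):
    # Estimate the chance that a test phase of test_pulls per arm ends with
    # the genuinely better arm looking better (ties broken by a coin flip).
    correct = 0
    for _ in range(sims):
        wins_good = sum(random.random() < p_good for _ in range(test_pulls))
        wins_bad = sum(random.random() < p_bad for _ in range(test_pulls))
        if wins_good > wins_bad:
            correct += 1
        elif wins_good == wins_bad:
            correct += random.random() < 0.5
    return correct / sims

def expected_winnings(p_good, p_bad, test_pulls, total_plays):
    # Play each arm test_pulls times, then commit to the apparent winner
    # for all the remaining plays.
    remaining = total_plays - 2 * test_pulls
    p_correct = prob_pick_better(p_good, p_bad, test_pulls)
    exploit_rate = p_correct * p_good + (1 - p_correct) * p_bad
    return test_pulls * (p_good + p_bad) + remaining * exploit_rate

for horizon in (10, 1_000, 100_000):
    print(f"Total plays: {horizon:,}")
    for test in (1, 5, 25, 100, 500):
        if 2 * test < horizon:
            ev = expected_winnings(0.55, 0.45, test, horizon)
            print(f"  {test:>3} test pulls per arm -> expected winnings {ev:>10.1f}")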

There actually is an exact mathematics to this, with a large literature and some well-worked problems.  It is the type of problem that a particular kind of math geek really likes to work out (guess who?).  The calculations hinge on your prior beliefs about probability distributions and Bayesian updating, two things that are well understood by many people, but not by those who design the rules for most (not all) clinical trials.

Clinical trials are a bandit problem.  Each person in the study is a pull of the arm, just like everyone that comes after during the "exploit the information from the study to always play the best choice from now on" phase.  Many types of research are not like this because the study does not involve taking exactly the action that you want to eventually optimize, but clinical trials have this characteristic.

You may have seen emerging hints of the stopping rule.  The period of gathering more information in the bandit problem is, of course, the clinical trial period, while the exploitation of that knowledge is everyone else who is or will be part of the target population, now and into the future until some new development renders the regimen obsolete or reopens the question.  The stopping rule, then, is the point when we calculate that going further with the research phase has more costs (assigning some people to the inferior treatment) than benefits (the possibility of updating our understanding in a way that changes our mind about what is the better regimen).  It should also already be clear that the stopping rule should vary based on several different pieces of information.  Therein lies part (not all) of the ethical problem with existing stopping rules.

I hope to pull these threads together in part 3 (either tomorrow, or later in the week if a news story occurs that I do not want to pass up).

28 May 2011

Unhealthful News 148 - Understanding the ethics of trials and stopping rules, part 1, with an aside about alcohol and the NHS

A couple of people asked me about an allusion I made to clinical trial stopping rules yesterday – rules which are based on a very weak understanding of statistics and epistemology, and thus, arguably, weak ethics – which I said was a story for another day.  But since there is nothing I particularly want to cover in today's health news, I will let today be that day I start the explanation.  (For those looking for a more standard Unhealthful News-style analysis, you can find it in the second part of this post where I link to a couple of other bloggers who did that for recent UK statistics about alcohol and hospital admissions.)  Besides, whenever I move something to the category "to do later" it joins such a long list that it is often lost – note that this observation should serve as a hint to those of you who have asked me to analyze something and I said I would get to it: please ask again if you are still interested!  (If you do not want to post a comment, my gmail I use for work is cvphilo.)

Clinical trials (a prettied-up name for medical or health experiments conducted on people) which follow the study subjects for a long period of time (e.g., they give one group a drug they hope will prevent heart attacks and the other group a placebo, and then watch them for years to count heart attacks) often have a stopping rule.  Such rules basically say that someone will look at the accumulated data periodically, rather than waiting until the planned end, to make sure that it does not already clearly show one group is doing better (in terms of the main outcome of the study and major side effects).  If the data support the claim that one group is apparently suffering inferior health outcomes because of their treatment, the argument goes, then it would be unethical to continue the trial and thus continue to assign them to the inferior regimen.  Oh, except those who recite the textbook justification for the stopping rules would probably phrase that as something like "if one treatment is clearly inferior" rather than the much longer version of the conditional I wrote; therein lies much of the problem.

Backing up a couple of steps, to understand the problem it is useful to realize that most trials involve assigning some people to a treatment that is believed to be inferior.  Realizing this is not necessary for figuring out a statistically optimal stopping rule, but it does immediately get rid of a persistent ethical fantasy that interferes with good analysis.  A typical trial involves comparing some new treatment, preventative medicine, or public health intervention to whatever is currently being done.  Almost always this is because those who initiated, funded, and approved of the research believe that the new regimen will produce better outcomes than the old one.  There are other scenarios too, of course, such as comparing two existing competing regimens, but the point is that those with the greatest expertise almost always have a belief about which is better.  If they had to decide, right now, which would be used for the next few decades, ignoring all future information from the trial or any other source, they would be able to make a decision.  More realistically, if they had to decide which regimen to follow/use for themselves, or their parent or child, right now (because what we might learn over the next ten years cannot aid in today's decision), they would be able to make a decision.  Just because we are not sure which regimen is better (or how much better), and thus want to do research to become more sure, does not mean that there is not a prevailing expert opinion.

Many people who fancy themselves ethicists (and many more who just want to do trials without feeling guilty about it) take refuge in a fantasy concept called "equipoise".  The term (which is actually a rather odd jargonistic adoption of that word – not that it is used in conversation anyway) is used to claim that when we do a trial, we are exactly balanced in our beliefs about which regimen produces better outcomes.  Obviously this might be true on rare occasions (though incredibly rare – we are talking about perfect balance here).  But most of the time the user of the word is confusing uncertainty with complete ignorance.  That is, someone obviously feels inadequately certain about which regimen is better, but this is not the same as having no belief at all.  Keep in mind that we are talking about the experts here, not random people or policy makers.  They know what the existing evidence shows and, if forced to make a decision right now about which regimen to assign to a close relative who is in the target population, it would be an incredibly rare case where they were happy to flip a coin. 

Every now and then, there is a case of such incredible ignorance that no one has any guess as to whether a treatment will help or hurt (e.g., this condition is always fatal in a few weeks, so let's just start doing whatever we can think of – the results will be pretty random, but we have nothing to lose), and occasionally a situation is so complex that it starts bordering on chaos theory (e.g., what will the new cigarette plain packaging rule in Australia do? nothing? discourage smoking? expand the black market? provoke a political backlash? reinstate smoking's role as a source of rebellion?).  But such examples are extremely rare.

It is also sometimes the case that no one is being assigned to an option inferior to their best available option had they not been in the trial.  For example, offering a promising new drug – or even just condoms and education – for people at high risk of HIV in Africa, comparing them to a group that does not get the intervention, may hurt no one.  If the researchers only had enough budget to give the treatment to a limited group of people, that group is helped (according to our prior belief) while the other group is right where they otherwise would have been.  Their lack of access to the better regimen is due to their inability to afford a better lot in life, so while they are not helped, they are in no way hindered by being in the control arm of the trial.  (Bizarrely, it is situations like these that often provoke greater ethical objections than cases where people are assigned to a believed-inferior regimen when they could afford to buy either regimen for themselves, but that is yet another story of the confused world of "health ethics".)  Another example is the study I wrote about recently in which some smokers are given snus while others are not; setting aside all that is wrongheaded about the approach of this study, it does have the advantage that one group benefits (at least they get some free product they can resell) and the other is exactly where they would have been had there been no study.  There is a similar circumstance in which the trial only assigns people to the believed-better treatment, with the plan of comparing them to the known outcomes for people not getting that treatment.  This is similar to having a control group that just gets the standard treatment, though people who do trials do not like this approach because the data is harder to analyze (they have to consider the same challenges that exist for observational data).  But all of these cases, while certainly not rare, are unusual.

I will reiterate one point here, in case it is not yet clear (one of the challenges in turning seminar material into written essays is I get no realtime feedback, so I cannot be sure if I have not made something clear):  We are never sure about which of the regimens is better, so we might be wrong.  Handing out the condoms might actually increase HIV transmission; we are pretty sure that is not the case, but it is possible we are wrong.  Or niacin might not actually prevent any heart attacks, even though it seems like it should.  But there is still a belief about what is better when we start.

The bottom line, then, is that most trials involve assigning some people to do something that is believed to produce inferior health outcomes.  Why is this ok?  It is because it is for the greater good.  We want to be more sure about what is the better regimen so we can give better treatment/advice to thousands or millions of people, and so judge that it is ethical to let a few hundred informed volunteers follow the believed-inferior option to do so.  Also we usually want to measure how much better the better regimen is, perhaps because it costs more and we want to decide if it is worth the cost, or because we want to be able to compare it to new competing regimens that might emerge in the future, or perhaps just out of curiosity.

Asking people to suffer for what is declared to be the greater good is, of course, not an unusual act.  Every time someone writes a check to a humanitarian charity, they are doing this, and governments force such choices (taxation, zoning and eminent domain, conscription).  But the people who chatter about medical ethics, and make the rules about trials, like to pretend that they are not doing that.  From that pretense come the stopping rules, which I realize I have not mentioned yet.  But this is a complex subject and requires some foundations.  I will end there for today and continue tomorrow.

––––––

On a completely unrelated note, for those of you who want some regular Unhealthful News and do not read Chris Snowdon (I know a lot of you do), check out what he wrote, based on what Nigel Hawkes wrote about a recent UK report that hospital admissions due to alcohol consumption have skyrocketed.  I will not repeat their analysis and do not have much to add to it.  The simple summary is (a) the claim makes no sense because dangerous drinking has decreased a lot, as has alcohol-caused mortality, and (b) it is obvious that the apparent increase was due to a change in the way the counting was done.

It is pretty clear that the government knew they were making a misleading claim when they released this information.  Their own reports recognized the true observations, but their press release about their new estimate did not.  The National Health Service is on a crusade to vilify health-affecting behaviors they do not approve of.  But governments lie – we know that.  While the commonality of that does not make it any less disgraceful, the greater disgrace belongs to the press that is supposed to resist government lies, not transcribe them.  But, as Hawkes and Snowdon predicted (they wrote their posts right after the report came out, before the news cycle), the press picked it up, with hundreds of articles that report the misleading claims and seem to completely lack skepticism (examples here, here, here, and here).  This is not a difficult error to catch, either by running a Google search for the blogs that had already debunked the claim before the stories ran, or simply by asking "hey, we know that heavy drinking is way down, so how can this possibly be true?"

I suppose it is not too surprising that the average reader has no idea what stopping rules do when they read one was employed, let alone what is wrong with them, when the health reporters cannot even do simple arithmetic or fact checking.

27 May 2011

Unhealthful News 147 - Bad news about pharmaceutical niacin, pretty good news about health science

Since I recently wrote about statins, I thought I would follow up with today's story about cholesterol drugs.  It is actually a story of most everything proceeding in a way that makes perfect sense, though it seems to have created a lot of consternation.

Having lower "bad cholesterol" reduces cardiovascular disease (CVD) risk; statins lower bad cholesterol; and trials have shown that taking statins provides the health benefits.  Also, statins do not cost much (apart from the pharma industry profits that can be made via patents) and do not seem to have much downside.  All in all, a straightforward story of preventive medicine.  The story I wrote about was that they seemed not to be doing so well in practice in Sweden, but that was a "hmm, we should try to explain that" moment, not a case of "whoa, it looks like we were wrong."

In a story that turns out to be dissimilar, having higher "good cholesterol" and lower triglycerides reduces CVD risk; niacin, a B vitamin, raises good cholesterol and lowers triglycerides; but the studies do not seem to show that taking niacin improves CVD risk.  This is disappointing, since niacin is also cheap, though for some people it causes an annoying skin flush and sometimes other superficial side effects.  It is fairly odd to find a case where having a particular physiologic status is good for you, but causing that status is not good for you.  It becomes more likely, though, when the method of causing it departs substantially from what causes it in nature, as it were.  However, this was not the first time a drug to raise good cholesterol failed to have the expected health effect, so it was not totally shocking.

The way this transpired should not be seen as troubling, however, despite the way some news reports have portrayed it.  Consider the sequence of events:  Observational research supported the conclusion that the cholesterol levels in question (when not drug-induced) result in lower CVD risk.  Simple short-term studies supported the conclusion that niacin causes those levels.  Niacin is cheap and low-risk (and those who hate the side effects can rationally choose to not take it).  Therefore it was the obvious rational choice for people to have been taking niacin while awaiting further information.  Further research that connected up the whole proposed causal pathway (niacin causes reduced CVD risk) rather than breaking it into pieces, however, finally suggested there was no benefit.  Oh, well.  Not much harm done, and that is why we do this research: to find out if what we believed before seems to be wrong.

So, what is disturbing about this story?  One issue is that the study, and apparently the popular medical regimen, consisted of taking Abbott's drug, Niaspan, which is basically niacin with a slow release.  Presumably someone somewhere concocted a reason why this drug should be used rather than cheap generic niacin, but it certainly was not because it was shown to be more effective (obviously: we only just learned how effective it is, which is to say, not at all).  I guess the only good news in this turning of a common nutrient into private profits was that, sadly, if people had just been taking niacin from competitive market sources, the study might never have been done.

Also mildly disturbing was the cessation of the study early because the group taking niacin had a somewhat higher rate of stroke (and no reduction in heart attacks, as was hoped).  In this particular case, the good effects were not happening, so quitting the study was a good idea.  But the rules for stopping studies early because they have become "unethical" are quite misguided – but that is a story for another day.

But since there was no apparent benefit, rather than a complicated uncertain tradeoff between costs and benefits, stopping in this case seemed entirely sensible.  Not so sensible is:
Wells Fargo Securities analyst Larry Biegelsen said the surprise findings could cut Niaspan sales by 20 to 30 percent.
So the message will be "this does not seem to work, so we advise only three-quarters of you to keep buying and taking it"?  You really have to love our medical industry.  Notice also that it is the Wall Street guys who are assessing the effects of this.  I did not notice any broad comments about how this should affect behavior from the medical or public health people.  Health research, in the mind of those in the halls of power, is not primarily a story about health.

Of course, it is possible that some consumers will get a benefit, people different from those who were studied (who had a history of heart disease, and like most trial subjects, were rather different from most of the target population).  I think this is what the study leader was trying to say when interviewed, though it came out as unintentional comedy:
But it's not clear if niacin would have any effect on people at higher risk or those who don't have a diagnosis of heart disease yet but take niacin as a preventive, said study co-leader Dr. William Boden of the University at Buffalo.
 "We can't generalize these findings …to patients that we didn't study," he said.
I would have to say that any study whose results cannot be generalized beyond the few thousand people in the study is really not worth doing.  

Yes, it is always possible that some unstudied types of people will benefit, but there are three strikes here:  No studies show this drug helps, one study shows this drug does no good, and other good-cholesterol-raising drugs have not shown health benefits either.  I think this falls into the category of "stop recommending this unless some new evidence emerges to change our minds."

So if you piece together all the claims, we have a study that showed there is no benefit from causing what is known to be a beneficial difference under other circumstances, which focused on one patented version of a common nutrient, which was stopped for no good reason, but could not be generalized beyond a few thousand people, though it is relevant to maybe a million people taking the nutrient for the non-existent benefit, the implications of which are being studied by finance guys rather than health policy makers, and that will end some but not all use of the apparently useless treatment.  And yet, all in all, compared to much of what we see, this story arc is a case of health science working mostly like it should.

26 May 2011

Unhealthful News 146 - Tobacco harm reduction study is apparently designed to fail; it was only a matter of time

In yesterday's post I suggested that a study of statins that failed to find what we would expect to find, based on a lot of prior knowledge, might not have been looking at the data the right way.  (I also preemptively condemned what I am 99% sure will be the reaction of the medical establishment to the study, which is to ignore it without even trying to explain the result because they are sure it is wrong and do not understand the value of the study design.)  Also yesterday I wrote in our weekly readings in tobacco harm reduction at the THR blog about a new study that seemed to be designed to fail.  It is a similar theme, that it is very easy to do a study that purports to look for a phenomenon, but really does not do so.  I think that point would benefit from a bit more Unhealthful News detail.

The key point to keep in mind is that almost all public health science studies (along with most psychology studies, and some other fields) produce results that are much narrower and less interesting than the authors and others interpret them to be.  A lot of the confusion about this probably stems from medical research (which we read about in the newspaper and which most public health researchers were trained in, rather than in public health research) and physics (which is actually a very unusual science, but is misleadingly taught in school as if it were the canonical science that establishes the methods used in other sciences).  Those who are vaguely familiar with clinical research or physics will think of experiments as being designed to optimally answer a narrow well-defined question, like which of two surgical procedures produces fewer complications on average, or what happens when you swing a pendulum in a gravity field.

But when we want to know something in most social science realms, we often cannot do the experiment we want.  Want to know what will happen with health outcomes if you lower the salt content of lots of foods?  About the best experiment you can do is to pay a bunch of people to eat as assigned for a few years and see if the ones eating less salt have better health outcomes.  The problem is that the results of that study will be interpreted not as "forcing cooperative people to follow a monitored low-salt diet has health benefits", but as "lowering the salt content of foods will improve health outcomes."  Similarly, "give people condoms and actively bug them to use them, so that it becomes a matter of personal obligation and identity, and HIV transmission goes way down" will be interpreted as "condom distribution and education dramatically lowers HIV transmission".  This is why public health, economics, and other social sciences rely primarily on observational studies, which measure what we are really interested in.  It is not that experiments ("clinical trials") would be too expensive or unethical, as is often claimed; rather, doing the right experiments would be more or less impossible.

The new study was described in the news (unfortunately, I do not know any other good source of information about it, so I have to go with what was reported) as answering,
Can a smokeless product, in this instance Camel Snus, contribute to a smoker quitting cigarettes…? 
That is a pretty broad question, isn't it?  (For those readers who may not know, substituting smokeless tobacco for smoking provides nearly all of the health benefits of quitting entirely without depriving the user of nicotine.)  The study, by Matthew Carpenter at the University of South Carolina, obviously will not answer the broad question.  What could answer a question as broad as "can it contribute?"  Putting Camel Snus on the market and seeing what happens – done and underway.

Presumably we can narrow down what the study is really examining, right?  Well, it does not appear that the author is capable of doing so.  A few paragraphs later,
Carpenter's research team wants to learn whether Snus [sic - should not be capitalized except as part of a brand name] leads to quit attempts, smoking reduction and cessation
Again, the product is being given remarkable credit for independent action and perhaps even volition.  Is the product acting, or are we talking about the mere existence of the product?  Obviously whatever they are really doing is much more specific.  As best as I can figure out, they will be gathering a large collection of smokers who are not trying to quit and will give half of them a supply of Camel Snus.  What is wrong with that?  Nothing, so long as the results are interpreted as "what happens over a fairly short period of time if you give random smokers one particular variety of smokeless tobacco, without educating them about why they should want to switch, but kind of implying (to an unknown degree) that they ought to by virtue of the fact that you are giving it to them?"  Of course, that is not how it will be interpreted.  In theory the researchers might be honest about the extremely narrow implications of their research, answering a very limited and artificial question.  Not likely, though:
"The study will provide strong, clear and objective evidence to guide clinical and regulatory decision-making for this controversial area of tobacco control," Carpenter said.
If I did not already know what passes for policy research in tobacco and health, I would assume this was a joke.  Setting aside the fact that there is no such thing as objective evidence (this reflects generally bad science education that cuts across fields), there are still numerous problems with this claim.  The study will provide "strong" evidence?  Really?  A single highly-artificial intervention, with a single moderate-sized population, will tell us what we need to know? 

And even if it could, why are they assuming their evidence will be strong and clear?  Are they already writing their conclusions before they start (not unheard of in tobacco research, but trickier for a study like this than it was for, say, DiFranza's Joe Camel studies)?  Even if you believed that this study could ever give clear policy-relevant evidence for some outcome, Carpenter's assertion depends on him already deciding what the results will be.  Presumably if no smokers switched they would consider this strong evidence of something, and we would probably all agree it was strong evidence if 200 switched.  Somewhere in between is a crossover point (which will vary by observer) where someone would say "hmm, I am not quite sure what to say about this".  But apparently the researchers plan to keep that from happening, to make sure they have "strong, clear" evidence of something.  I suppose this is fairly realistic, since they do seem to be designing a study that avoids encouraging smokers to switch.

As for creating a "guide" for policy, perhaps you could conclude that the study result will help inform decision making.  Anything can help inform.  But no specific scientific result can guide policy decisions,  and certainly not one as obliquely informative as this one will be.

The study design might not be quite as bad as I am guessing.  There was an allusion to "or another smokeless product".  If they actually give someone a decent sample of many varieties of commercially available smokeless products, and then give them more of whichever they ask for, this at least solves the problem of conflating "not spontaneously attracted to switching" with "does not like the taste of Camel Snus" or "Camel Snus does not deliver enough nicotine fast enough to appeal to most smokers, even though other smokeless tobacco products do".  Studies that force people to choose from only one or just a few varieties of a highly variable consumer product are completely inappropriate for assessing overall preferences for the category. 
"We're just trying to mimic the real-world scenario of a smoker being exposed to these products in their own environment, such as a grocery store," Carpenter said.
It is indeed true that most American smokers (who have been sufficiently lied to about the risks that they would not consider switching to, say, Skoal products, which offer the same reduction in health risk compared to smoking) will see the beckoning of only the new Camel Snus and Marlboro Snus, and not the wide variety of other products that exist.  But why, exactly, would anyone care about the results of an experiment that mimics that situation?  Far worse, though, is the apparent failure to educate the smokers as to why they might want to switch.  Actually, this is a bit ambiguous and maybe they plan to provide some information, or maybe they will see that it is the right thing to do before they start and change their plan.  But between the previous quote and the following one, it does not appear so:
Carpenter said researchers are not trying to encourage the use of smokeless tobacco with the study.
What could possibly possess someone to do a study of whether people will adopt a more healthy behavior and not try to encourage them to do it?  Here are some condoms; do whatever you want with them.  Here is a course of statins, but we are not going to tell you how much to take or why you should want to.  Here is your healthy food; you might expect that I am going to suggest you not pour salt and butter on it, but do whatever you want.

But, as Carpenter alluded, the fix is probably in, and the plan by the US government (which is funding this) is to interpret the almost certainly unimpressive rate of switching as evidence that THR does not work, full stop, no caveats.  Frankly it is surprising it has taken this long.  I remember Brad Rodu and me, the first time we ever met, eight or nine years ago, wondering why the anti-THR activists (the anti-tobacco extremist faction) were not cranking out studies that were designed to "show" that smokers who were not interested in becoming abstinent would also not switch to low-risk nicotine products.  These would undermine the efforts of those of us who genuinely cared about public health to educate people about THR.  It was clearly easy to design studies that would show no interest in switching (e.g., by not suggesting there was any reason to switch, not helping people switch, etc.).  I think the failure to pursue this tactic was because the extremists were convinced that they could undermine THR by simply lying to the public and convincing them that there were no low-risk alternatives.  They were largely successful with that for a decade, and no doubt killed many smokers who would have switched if they had not been misled.

But in the information age, claims that are so obviously false can only stand up so long.  While most people still believe the lies, enough have learned the truth that this tactic is failing and a collection of other tactics has been adopted.  And it turns out that misrepresenting the scientific evidence to claim that low-risk products are not low-risk is rather more difficult than just paying to create "scientific" "evidence" that low-risk products should be banned or discouraged because most smokers do not want to switch to them.  Yes, I know, the logic of the previous sentence is even faultier than the science could be, but that is just how they roll.

25 May 2011

Unhealthful News 145 - Statins prevent heart attacks, except maybe in real life in Sweden?

There is a joke about economists that upon observing something working in practice they immediately set out to try to figure out if it works in theory.  No one ever seems to make a joke of it (perhaps because it is less funny), but a similar observation applies to health researchers.  In their case, they observe that something works in the real world and they wonder if it works in the highly artificial confines of a randomized trial.  What they far too seldom seem to wonder is if the opposite is true, if the semi-theoretical result that is based on trials really works out.

A new study (which does not seem to have made the news, which is probably just as well) in the Journal of Negative Results in BioMedicine looked at statin use and the rate of AMI (acute myocardial infarction – i.e., heart attack).  The study was more in the economic style than epidemiology (which is to say that the authors explained their methods and used a purpose-built comparison rather than just forcing everything into a logistic regression and not explaining what they did).  To summarize the basic result, they did not find a correlation between rate of statin use (across geography, time, and age range) and AMI rates.

This is rather troubling since one of the Accepted Truths of preventive medicine right now is that statins provide substantial benefit at very little cost.  But this information should not be dismissed because randomized trials got a different result.  Randomized trials do not represent the real-world circumstances in which people act.  For economic exposures (i.e., consumer choice or pure behavior – e.g., smoking cessation), trials are often almost useless.  For purely biological exposures (say, something in the water) or attempts to evaluate existing behaviors (such as the effects of an exposure that some people just happen to have, studied by forcing others to be exposed in a trial), this is not such a problem.  Most medical exposures fall somewhere in between – statins have a biological effect, but actually using them as directed is economic (a consumer behavior).

There are some obvious possible stories that make the new result misleading and the trial results exactly right after all.  If statins are used more by subpopulations that need them more (i.e., have more people at higher risk of disease) then there will be a simple confounding problem (called "confounding by indication") wherein high risk causes the exposure, so people with the exposure do worse than average even if the exposure is beneficial.  For a population where most everyone at high and moderate risk is consistently using statins, this confounding would largely disappear.
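
For readers who like to see the machinery, here is a minimal sketch in Python (with made-up numbers that have nothing to do with the Swedish data) of how confounding by indication plays out: the drug genuinely helps in the simulation, but because the highest-risk people are the most likely to be taking it, the crude comparison makes users look worse.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Hypothetical underlying risk of AMI over the follow-up period.
    baseline_risk = rng.uniform(0.005, 0.10, n)

    # Confounding by indication: the higher someone's risk, the more likely
    # they are to be put on a statin.
    on_statin = rng.random(n) < np.clip(baseline_risk * 8, 0, 0.9)

    # Pretend, for the sketch, that statins genuinely cut risk by 30%.
    risk = np.where(on_statin, baseline_risk * 0.7, baseline_risk)
    ami = rng.random(n) < risk

    print("AMI rate among statin users: %.3f" % ami[on_statin].mean())
    print("AMI rate among non-users:    %.3f" % ami[~on_statin].mean())
    # Despite the built-in benefit, users show the higher crude rate,
    # because the high-risk people are the ones who got the drug.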

Another possible explanation is that they did not look at the data correctly.  What they did sounds reasonable, but it is impossible to know that for sure.  For one thing, the rate of fatal AMI seemed to do the "right" thing even though non-fatal AMI seemed to go a little bit the wrong way.  You will recall that I often question whether authors who found a positive result hunted around for a statistical model that generated their preferred outcome.  It should be realized that using the wrong statistical model and getting a misleading negative result is a much simpler exercise.  It is very easy to fail to find something that really exists by analyzing the data wrong.  It is not clear if the authors hunted around a bit to see if maybe their negative result was not so robust if they changed their analysis (that is, if it might be that they just missed it by looking at the data one particular way).

And I think there is some reason to worry.  The authors demonstrate some holes in their knowledge of scientific epistemology.  They wrote:
Results from an ecological study are best not being interpreted at the individual level, thus avoiding the ecological fallacy. However, the results can be used as a basis for discussion and for the generation of new alternative hypotheses.
A disturbing number of people seem to think there is something called the "ecological fallacy" that implies that you cannot draw conclusions about the effect of an exposure on people based on ecological data.  That is simply wrong.  There is one odd way in which ecological data can steer you to an incorrect causal conclusion that is not present for other types of studies, which is that it is possible that having a higher rate of exposure in a population causes a higher rate of the outcome in the population, but for an individual this is not true.  An example is that having more guns in a population causes people to be more likely to be shot by a stranger.  However, having a gun yourself does not make you more likely to be shot by a stranger (I am setting aside the fact that it makes you enormously more likely to be shot by a family member or by yourself).
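
To make that concrete, here is a little simulation (hypothetical towns and invented risk numbers, so take it only as an illustration of the logic): across towns, more guns means more people shot by strangers, while within any given town owning a gun yourself makes no difference to that risk.

    import numpy as np

    rng = np.random.default_rng(1)
    prevalences = np.linspace(0.05, 0.6, 20)   # 20 hypothetical towns
    town_rates, within_town_differences = [], []

    for prevalence in prevalences:
        n = 50_000
        owns_gun = rng.random(n) < prevalence
        # Being shot by a stranger gets more likely as guns get more common
        # around you, but is the same whether or not you own one yourself.
        risk = 0.0005 + 0.004 * prevalence
        shot = rng.random(n) < risk
        town_rates.append(shot.mean())
        within_town_differences.append(shot[owns_gun].mean() - shot[~owns_gun].mean())

    # Ecological view: more guns in a town, more people shot by strangers.
    print("correlation across towns: %.2f" % np.corrcoef(prevalences, town_rates)[0, 1])
    # Individual view, within towns: owning a gun changes essentially nothing.
    print("average within-town risk difference: %.5f" % np.mean(within_town_differences))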

But oddities like this are rare and usually fairly predictable.  Beyond that, the challenges with ecological data are just the same as with any other study design: measurement error, confounding, etc.  There is no fallacy, and usually there is no reason to think there is an "ecological paradox" like with the guns (it is not really a paradox either, but that term is a lot closer to correct than "fallacy").  Indeed, population-wide ecological data has some advantages over other data, creating a tradeoff rather than a clear advantage for either approach.  There is no more an "ecological fallacy" that makes ecological data necessarily worse than there is a "sampling fallacy" that makes other study designs necessarily worse.

As for generating new alternative hypotheses, allow me:  Hypothesis 1 = statins do not work so well when used by regular people in real life as compared to the artificial situation in trials.  Hypotheses 2A (B,C…) = statins do not work as well in subpopulation A (or B or C…) as they do in trial populations.  Hypothesis Variant 3 = this is true in Sweden but not elsewhere.  Hypothesis Variant 4 = the observed lack of correlation will change when there is greater use of statins.  There, done.  I generated the hypotheses.  Shall we get on with figuring out what is really true?

Probably not.  The randomized trials have spoken, and any contrary evidence will be dismissed by those who do not understand it (which includes most of the people who make health policy).  There is probably no harm done in ignoring the other evidence in this case, because even if statins are a bit less impressive than currently thought, they still should be used a lot more.  Still, it is not so reassuring that the reaction to this from those who tell millions of people how to live healthier will likely be to ignore it because it must be wrong, rather than to act like scientists and make the effort to assure themselves by figuring out why this result occurred.

24 May 2011

Unhealthful News 144 - New Karolinska anti-snus study, part 2

I decided to follow up on yesterday's comments about the new study that used sneaky statistics to try to blame the weight-gain effects of men's typical lifecycle on snus after I talked to Brad Rodu about some other interesting aspects of the study.  We agreed that as Karolinska anti-snus broadsides go, it is not very important, but there are a few more interesting lessons about epidemiology and epidemiology-based propaganda to be found in it.  (Notes:  I am not going to repeat the background that I covered yesterday, so this post really requires reading that one.  A couple of these points are Brad's, but I am not going to attribute them specifically out of concern for accidentally misrepresenting him; he is invited to claim credit or clarify in the comments if so desired.)

Yesterday I commented about how the results will likely be misused should the anti-snus propagandists decide to adopt it: they will likely be interpreted as meaning that snus will cause about 1/3 of those who use it to gain weight.  This is how the game will likely play out:  The study estimated an odds ratio for snus users in the range of 1.3 for the probability of gaining 5% of body weight during the study period.  This OR is typically sloppily converted to prose as "snus users have a 30% greater chance".  But that phrase in quotes, because it is sloppy, is then often interpreted as meaning that 30% of the population who would not otherwise have gained weight will do so (which is properly described as a 30 percentage point increase rather than a 30% increase, but almost no one gets that right).  And presto-chango, thanks to the power of using sloppy language in a scientific context (like "addiction"), nonsense is created.

Just how wrong would that claim be?  Well, setting aside the errors in the study and just pretending that the results are correct: of the nonusers, 790 of 3877 had the 5% weight gain, or about 20%.  So if that is increased by about 1/3, we are talking about 6 or 7% of the population that gains weight because of snus.  And, again, that is based on pretending that the analysis was correct (and once it is clear how small that number is, it also becomes clear how easy it would be for it all to be the result of study errors, intentional or unintentional).
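
For anyone who wants to check that arithmetic, here it is spelled out (treating the reported OR as if it were a simple 30% relative increase in risk, as I did above):

    baseline = 790 / 3877       # about 0.20: share of nonusers who gained 5%+
    or_reported = 1.3           # the reported OR, read loosely as a risk ratio
    risk_users = baseline * or_reported

    print("extra weight-gainers per 100 users, read as a 30%% relative increase: %.0f"
          % (100 * (risk_users - baseline)))
    print("extra weight-gainers per 100 users, misread as 30 percentage points: 30")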

An interesting question about epidemiology in general is why they do not just report the 7% or equivalent numbers.  Why do they report this obscure statistic called the "odds ratio" that facilitates sloppy prose and misinterpretations?  You might be tempted to conclude that it is to make small numbers seem big, and there are no doubt those who are happy about that – an OR of 3 sounds a lot more impressive than "your lifetime chance of getting this disease increases from 1-in-1000 to 3-in-1000".  But the answer is actually both less political and more pathetic:  The OR is the easiest number to get out of the statistical calculation for a logistic regression, which (you may not know) is the never-thought-through assumed relationship between exposure and disease in almost every epidemiology study you ever see.  Or, put another way, it is the number that comes out of the software used by most people doing epidemiology, and that software is a black box to them, such that they just copy down the results.  People in other sciences are taught the math behind the calculations they do, such that they could write the software rather than just blindly using it, but that is rarely the case in epidemiology (particularly including psych and clinical research).
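
If you want to see what "the number that comes out of the software" looks like in practice, here is a sketch with made-up data (using the statsmodels package as a stand-in for whatever black box a given research group uses): the exponentiated coefficient on the exposure is the odds ratio, which is what gets copied into the paper, while the number a reader could actually use takes one more line.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 5_000
    exposed = rng.integers(0, 2, n)

    # Made-up outcome: 20% baseline risk, 26% among the exposed (OR about 1.4).
    p = np.where(exposed == 1, 0.26, 0.20)
    outcome = (rng.random(n) < p).astype(int)

    X = sm.add_constant(exposed.astype(float))
    fit = sm.Logit(outcome, X).fit(disp=0)

    print("odds ratio handed over by the software: %.2f" % np.exp(fit.params[1]))
    print("risk difference, one extra line of work: %.3f"
          % (outcome[exposed == 1].mean() - outcome[exposed == 0].mean()))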

Not that most epidemiology researchers could not figure out the risk difference that I reported, of course.  That was easy.  (The risk for nonusers was 20%, the risk for snus users was about 27%, and so the risk difference was 7%.)  They might need some help to calculate the confidence interval, but at least they could report a meaningful point estimate.  No one cares about the odds ratio.  One of the first lessons in my courses when I was teaching how to make decisions based on epidemiologic evidence was how to take the useless statistics that are typically reported and convert them into something people actually care about.  I have made this point before in Unhealthful News, but it is worth repeating.  No one cares about ratios, and most people do not really even understand what they mean.  If you read a lot of health news, you may think you know what it means when you hear there was an odds ratio (sometimes called by the generic term "relative risk") of 2.5, but do you really?

As an aside, here is a good example on that point for those who care about snus and THR:  Anti-THR activists like to try to claim that other sources of tobacco are lower risk than snus.  There is no evidence to support that claim, but imagine it was true and some other smokeless products were only half as risky – say 0.5% as risky as smoking versus 1% as risky.  These same activists like to insist that there is no difference in risk among varieties of cigarettes, which is obviously not true.  It is a safe bet that cigarette risk varies by 5%.  In ratio terms, the hypothetical factor of two reduction for smokeless seems a lot bigger than the mere reduction to .95 of what it would have been.  But in absolute terms – what really matters in terms of whether someone gets a disease – this difference among cigarette varieties is ten times as great as any hypothetical difference among smokeless products.  While the exact numbers are unknown, the magnitudes are probably about right, which means that there is no doubt that encouraging a smoker to switch to the lower risk cigarettes (whichever ones they might be) makes a lot more difference than encouraging a smokeless user to switch to the lower risk smokeless products (whichever ones they might be – again, we do not actually know).
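
The arithmetic behind that, using the same hypothetical numbers (everything scaled so the risk from smoking is 1):

    smokeless_a, smokeless_b = 0.01, 0.005   # one is "twice as risky" in ratio terms
    cigarette_a, cigarette_b = 1.00, 0.95    # a mere 5% spread among cigarettes

    print("absolute difference between smokeless products: %.3f" % (smokeless_a - smokeless_b))
    print("absolute difference between cigarettes:         %.3f" % (cigarette_a - cigarette_b))
    # 0.005 versus 0.050 of the risk from smoking: ten times as large.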

One last point on ORs:  Most people intuitively interpret an OR as a risk ratio – the ratio of the risk for one group (the probability that the event occurred) divided by the risk for the other – rather than as the ratio of the odds for one group (the probability the event occurred divided by the probability it did not occur) divided by the odds for the other.  Odds defy intuition for anyone who does not gamble (gambles are typically described in terms of odds rather than probability/risk), though there are some technical reasons why the odds ratio is often a legitimately better measure.  Fortunately it does not matter much, because the two are about the same if the event in question is fairly rare.  I mixed them together in my back-of-the-envelope calculations above.
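
A quick sketch of that last point, with made-up risks: for a rare outcome the two ratios are nearly interchangeable, but at the roughly 20% baseline in this study they have already started to drift apart.

    def risk_ratio(p_exposed, p_unexposed):
        return p_exposed / p_unexposed

    def odds_ratio(p_exposed, p_unexposed):
        return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))

    # Rare outcome: nearly identical.
    print(round(risk_ratio(0.013, 0.010), 2), round(odds_ratio(0.013, 0.010), 2))
    # Common outcome, like the ~20% weight gain here: they start to separate.
    print(round(risk_ratio(0.26, 0.20), 2), round(odds_ratio(0.26, 0.20), 2))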

Getting back to the study, there are a few other oddities about it.  For one thing, there are a lot more people in the "Other" category than there were in any of the use categories that were cleanly defined and analyzed.  This category includes anyone who had any history of using both snus and cigarettes.  Presumably it also includes anyone who ever used tobacco in any other form.  While there is no evidence that simply throwing out this group really matters, it might.  Because it is so much larger than the exclusive snus groups, the authors should have told us what happens if people in borderline groups are included, like those who formerly smoked a bit but are now exclusive snus users being included with snus users.  Chances are that this would not change much, but it would be interesting to know, as a reality check.

Definitions of exposure categories are far more arbitrary than readers of the abstracts and press releases understand.  For example, the methods report, "Cessation less than six months prior to baseline was regarded as current use."  Again, probably not a big deal, but, um, why?  Could it be that doing this made the results come out better?  Rather more interesting is why 5% weight gain is the measure of interest.  Why not a different percentage, or an absolute change?  Better still, why did they not compare the weight gain of the study subjects to the known average weight gain of those in the given age range (see yesterday's post)?  There is no explanation for why this arbitrary measure was chosen among the infinite possible choices, which should always be a cause for suspicion.  It would have been trivial for them to report how the results varied if we go with 4% or 8%, but they did not tell us.  Again, there might be no major differences, but it might be that most any other measure would have produced results they liked less – it would not be the first time this was ever done in the anti-smokeless-tobacco literature.
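
Here is roughly what that trivial check would look like (made-up weight-change numbers with no group difference built in, just to show how little work it would have been to report):

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical percentage weight changes for the two groups.
    nonusers = rng.normal(2.0, 5.0, 4000)
    users = rng.normal(2.0, 5.0, 1000)

    for cutoff in (4, 5, 6, 8):
        p0 = (nonusers >= cutoff).mean()
        p1 = (users >= cutoff).mean()
        odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
        print("cutoff %d%%: OR = %.2f" % (cutoff, odds_ratio))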

Finally, and perhaps most important, what is with the list of covariates they included?  Alcohol use, fruit and berry consumption, and how often someone eats breakfast, but no other dietary measures – huh?  As usual, there is no explanation for this rather odd choice.  Also, why were these, and the measure of physical exercise, only included at baseline and not as changes at follow-up?  They looked at tobacco use at both baseline and follow-up, so why the arbitrary choice not to do the same for exercise?  Perhaps that explains the differences between the groups, but we will never know.  It also could be that if there really is an effect of snus it operates because one of these variables is an intermediate step in the causal pathway, and so should not have been controlled for, but I suspect that is way over the heads of the authors.  Also, it is worth noting that alcohol use was much higher among snus users, with "heavy" use exceeding 40% of that population compared to less than 20% for non-tobacco-users.  That is probably just another effect of the age difference I talked about yesterday, but if it really does matter (and it might) then they did not effectively control for it by simply dividing people into "heavy" and "moderate" categories (it turns out there are no light drinkers among Swedish men).  This is another example of when "we controlled for that" is misleading, since there is no way these two categories can capture all of the effects.

The bottom line for a lot of this is that the reader always must put a huge amount of trust in the honesty and skill of epidemiologic researchers.  They inevitably make arbitrary choices which can affect their results.  And chances are that no one will ever even check their math, let alone their data quality (you can be certain that the "peer reviewers" never did, despite the widespread misperception that peer review actually vouches for the accuracy of analyses).  So when you are dealing with research groups that have already demonstrated their dishonesty, the sensible strategy is to assume that the results are wrong. 

Of course, we can still always learn something.  In this case, if the best they could cook up was ORs of 1.3 or so, we can parse the results of the study and our existing knowledge and conclude that this shows that snus use apparently does not cause weight gain, which is just as we would have predicted.

23 May 2011

Unhealthful News 143 - Men gain weight as they leave their youth; Karolinska blames it on snus

The anti-snus activist shop at Karolinska Institute just published (press release; article - open access) a report that claims that using snus causes weight gain.  Since KI snus researchers are notorious for changing their statistical analysis to get the anti-snus results they want (my colleagues and I, and Brad Rodu have written a lot about this), the smart money says that their data do not really support their conclusion.  I believe these Karolinska researchers account for the majority of studies from the last half decade that claimed to find an association between snus (the Swedish word for moist snuff) and disease.  They got these results by changing their statistical methods in order to create such an association – or so we suspect, since they continue to defy a court order to release their data, which would provide proof of their unethical behavior.  So, it is always an interesting little exercise to figure out how they cooked up their latest "result".

In this case, the most obvious candidate is the effect of age.  The authors claim that snus use increased the chance a man gained 5% of his body weight between 2002 and 2007 by about 30 or 40% compared to a nonuser of tobacco.  Presumably the authors are motivated by the fact that few people will worry about a trivial risk of cancer as much as they will a 30% increase in the chance of gaining weight (which someone will probably misconstrue as a 30% chance when they put it in the propaganda).  But what they do not tell us in the abstract or the press release is that the nonusers had an average age of 46 at baseline while the snus users had an average age that was ten years younger.  Unless Swedish men are radically different from those in the Western countries I am more familiar with, an unfortunate fact of life is that many of us gain well over 5% during some period between late-20s and 40, when family and too much work replace sports and such.  Fortunately most of us stabilize again after that.

So we would expect a group of men with an average age in the mid-30s to include many in this prime fattening range, while a group with average age in the mid-40s would have fewer.  Oh, but wait, they said they "controlled for" age.  Doesn't this solve the problem?

No!

Therein lies a great example of one of the biggest lies of epidemiology.  "Controlling for" is never perfect, but in many cases it is obviously so very far from perfect that it is dishonest to claim to be controlling for the confounder.  There are many different ways this can happen.  The reason in this case is that the effect of age is not linear, but they assumed it was.  At least I assume they assumed that, since they do not actually report what they did (failure to report one's methods is typical for bad epidemiology).  They wrote "Age and baseline weight were included in the analyses as continuous variables".  Apparently they are unaware that continuous variables can take all sorts of different shapes, and it did not occur to them to specify that they were assuming a simple linear trend.  I am quite confident they would have said so if they were using an appropriately complicated function.

(Aside:  They published this in an online-only journal that does not have page limits.  Thus, their failure to publish their methods, or alternative analyses, or sensitivity analyses, or the effect estimates for their covariates, etc., was entirely their choice.  It is bad enough that health science "publishing" in dead tree journals forbids actually including enough information.  It is worse, in cases like this, when the authors choose to not to report useful information.)

From the few words they wrote, I can only conclude that they controlled only for the linear trend across the entire age range, from 18 to 84 (i.e., basically the entire adult male population).  That is, they assumed that the effect of being 19 rather than 18 is the same as the effect of being 25 rather than 24, and 41 rather than 40, and so on.  There is presumably some minor linear trend across all ages, but this trend does not capture the fact that (I am just roughing this out without looking it up, but you can see the idea) a five year period starting at age 18 is likely to see major weight gain, starting at 23 less so, with the risk at 30 being higher again but 40 being lower.

If that is not clear, consider another example.  Imagine the claim that being unmarried increases your risk for auto accident.  It is probably true to some extent, but it is even more true that young drivers (mostly unmarried) and old drivers (more likely than average to be widowed) are much riskier than those in between.  Now imagine that we "control for" age, but use statistics that assume that the effect of age is a linear trend.  That line is going to be fit to a curve that looks like a bowl (high on each extreme, low in the middle), and so it is going to be fairly flat and not look at all like the real age effect.  Thus, having "controlled for" age with that variable, we still have almost all of the actual confounding effect of age, and so the statistics blame lack of marriage for the bad driving of inexperienced and rather less responsible youth and doddering and even less responsible old people.
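
To show that this is not just hand-waving, here is a simulation of the mechanism (all numbers invented for illustration; I obviously do not have their data): there is no true snus effect, snus users are younger, and the age effect on weight gain is strongly non-monotone.  "Controlling for" age as one linear term leaves most of the spurious association in place, while even crude age bands take it back to about 1.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 50_000
    age = rng.integers(18, 85, n)

    # No true snus effect, but snus use is much more common among younger men.
    snus = (rng.random(n) < np.where(age < 40, 0.40, 0.08)).astype(float)

    # Invented, non-monotone age effect on gaining 5%+ of body weight:
    # modest before 25, high from 25-39, low in middle age, up a bit late in life.
    p_gain = np.select([age < 25, age < 40, age < 60], [0.12, 0.33, 0.13], default=0.20)
    gained = (rng.random(n) < p_gain).astype(int)

    def snus_odds_ratio(age_columns):
        X = sm.add_constant(np.column_stack([snus] + age_columns))
        return np.exp(sm.Logit(gained, X).fit(disp=0).params[1])

    linear_age = [age.astype(float)]
    age_bands = [((age >= lo) & (age < hi)).astype(float)
                 for lo, hi in [(25, 40), (40, 60), (60, 85)]]

    print("crude OR:                     %.2f" % snus_odds_ratio([]))
    print("'controlled for' linear age:  %.2f" % snus_odds_ratio(linear_age))
    print("controlled for age in bands:  %.2f" % snus_odds_ratio(age_bands))
    # With these invented numbers the linear term removes only a small part of
    # the spurious association; the age bands remove essentially all of it.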

There are useful ways to control for the effect of age on a variable.  Assuming a continuous linear trend is usually not one of them, and clearly not in the case of weight gain.  It seems quite plausible that the snus users included more very young adults who were not up to their full adult weight yet, and others at greatest risk of late-youth weight gain, and that this explains the entire "effect"; moreover, the confounding by age would be only slightly reduced by controlling for the lifetime linear trend.  One reason I suspect this explanation is that they report three different statistical models, each controlling for more variables than the last, but the first still controls for age.  I would really like to see how much difference "controlling for age" makes compared to not doing so.  Since age is hugely predictive of someone's likelihood of weight gain in the next few years, it should matter a lot if correctly controlled for, but I am betting that putting in the covariate they used had very little effect.  It is very suspicious that they did not report the unadjusted association; it might have clearly demonstrated that they were not really controlling for the effect of age.

There are other possible explanations for the results.  This is particularly true for a second outcome they looked at, people becoming obese.  That outcome is likely affected by numerous confounding variables that the authors did not include at all.  My hypothesis about age being the explanation is also not so well supported for the result for former snus users (already quit at baseline and remaining abstinent at follow-up), who are only slightly younger than the never-users but were also more likely to gain weight.  This suggests there is something different about people who are inclined to use snus, other than age, that the study did not control for at all, since there is no plausible reason why having used snus in the past would cause weight gain.  There is definitely more to critique about this study.  Still, I am pretty confident that this age effect matters, and it makes for a nice Unhealthful News lesson about potential confounding and how vacuous the statement "we controlled for…" can be.

No doubt that if confronted with this observation the first thing the authors would do would be to retreat into the usual weasel words and say they were just looking to see if there was an association, not making a causal claim.  This is clearly disingenuous because, though they always use the word "association" when reporting their results, the prose surrounding the results makes quite clear that they are pursuing a causal hypothesis.  They certainly never say "but there is no reason to assume it is causal".  More important, if I am right about the effect of age, or about them manipulating their statistical methods in general, that weaseling is not even accurate (though I suspect that few epidemiologists would even understand why).  If they analyzed the statistics in a misleading way, then there is not really even an association in the data, when considered properly.  If they were just looking at the unadjusted comparison of the two variables then they could claim this, but they claim to be controlling for age.  If proper controlling for age would make much of the result go away, as I suspect, then not only is there no evidence of causation, but there is not even an association after adjusting for age.

Of course, it is possible that when confronted with this challenge they will release their data, or even just enough additional analyses and information beyond the almost-useless reporting they provided, to show that my hypothesis is wrong.  They might even report "sorry we were so cryptic about the functional form; it was continuous, but we realized that linear was the wrong functional form so we used a fourth degree polynomial [or three splines] to capture the nonlinearity that you describe; thank you for giving us a chance to clarify that and prove you wrong; nyah nyah!".  Most of you reading this will have no idea what the middle bit of that last sentence means, but there is no reason to worry about that, because I can assure you it is not true.  Such an analysis is way over their heads.  As for them having something they could release that would allow that "nyah nyah" at me, I am really not too worried.

22 May 2011

Unhealthful News 142 - Interpreting health science evidence, the case of wind turbines

Trying to draw scientific conclusions requires reviewing all of the evidence, whatever form it might take.  This is true of health science, though you might never know it if you just observed the way many ostensible scientists behave in that area.  There are activists and paid hacks who pretend to be doing science, but are just looking for sciency-sounding claims to support their goals.  But even apart from that, the majority of those writing in the field are basically lab technicians, not scientists:  They know how to carry out some specific tasks and interpret the results, but really have no idea what constitutes good scientific inference.

The big picture is that there is remarkably little supply of or demand for (among those who control the market) good health science.  One specific implication is the problem for those trying to communicate the health effects of wind turbines on nearby residents and have that considered in policy making.  There is quite a remarkable collection of information, but most of those commenting on it simply do not understand it (to say nothing of those who are paid to intentionally denigrate the evidence).  Quite a good story by Don Butler ran in the Ottawa Citizen yesterday – probably the best I have seen on the topic.  It covered points that are usually not talked about in a useful and intelligent manner.  Still, it had the obligatory statement,
...the health impact of turbines has yet to be conclusively demonstrated. In a May 2010 report, Ontario's chief medical officer of health, Dr. Arlene King, found that scientific evidence to date "does not demonstrate a direct causal link between wind turbine noise and adverse health effects."
That King report was pretty much a joke, ignoring most of the evidence.  It provides a great example of how medics are typically not very good at evaluating scientific evidence ("medical officers of health" are an odd Canadian institution that puts physicians rather than public health science experts in charge of the science side of public health policy – not much different from what happens to public health policy making elsewhere, frankly, but completely institutionalized).  Of course, being a physician does not prevent someone from understanding health science, it just does not promise it.  Butler's article followed immediately with another MOH (my only fault with Butler was not finding some scientists to quote rather than just government medics, but at least he found one who got the right answer):
But Dr. Hazel Lynn, medical officer of health for the Grey Bruce Health Unit, reached a different conclusion in a report in January. It's clear, she found, that many people have been "dramatically impacted by the noise and proximity of wind farms. To dismiss all these people as eccentric, unusual or hyper-sensitive social outliers does a disservice to constructive public discourse."
She is quite right.  It also does a disservice to science.  I will take this opportunity to post my paper, Properly Interpreting the Epidemiologic Evidence About the Health Effects of Industrial Wind Turbines on Nearby Residents (PDF).  Anyone interested in the topic or sufficiently interested in my analysis of health science might find the whole thing interesting.  (Note:  It is not as long as it looks from the page count.  There is a long appendix.)  I mentioned a few days ago, when I criticized one lame dismissal of the wind turbine evidence, that I would write more on this topic.  This paper offers some observations that are generalizable to interpreting health science that I will draw out on near-future slow health news days.  If you want a shorter read, I posted the abstract and final paragraph of an earlier version (which is almost the same) a few months ago.

21 May 2011

Unhealthful News 141 - Follow-up: Addiction & the more you read, the funnier Simon Chapman gets

Having written rather involved essays on addiction and on the poor scholarly skills and negative public health contributions of Simon Chapman lately, I thought I would keep it shorter and simpler today by doing a quick follow-up on both.

Yesterday I noted how almost all discussions of addiction to cigarettes lack any definition of their key term,  and so it just becomes a legal/political game.  Today I did a search of news items from the last day that mention addiction, and found the following:
Booming Addiction: Baby Boomers Using Drugs In Record Numbers (CBS News)
What did addiction mean in this article?  Basically it meant "use" (which was also synonymous with "abuse").  There was one story of someone who was "hooked" on prescription painkillers, which is a bit better than just equating addiction with "uses something the author does not approve of", but begs the question of what "hooked" means.  This is the typical usage of the word, basically as background noise, offering no distinction between use that might not be abuse and abuse that might not represent addiction (or vice versa).  So why even have a word?  More interesting was:
'Hypersexual disorder' might make DSM-5 (Los Angeles Times)
This article was actually broader than the headline and photo of Tiger Woods implied.  It also noted that in addition to sex,
Compulsive gambling will likely be grouped with substance-use disorders in the new DSM, the first so-called behavioral addiction to be added to a category that traditionally has been reserved for drug and alcohol problems. Other "behavioral addictions" — including Internet addiction, shopping addiction and exercise addiction — also have more in common with drug abuse than other types of mental disorders, experts said.
This is likely to produce an outcry that they want to call everything an addiction in the DSM (the American Psychiatric Association manual which is, unfortunately, considered to define what constitutes mental illness, as I wrote about before).  But it is not actually unreasonable.  We can start with the notion that whatever addiction is, it must be characterized by extreme economic choices (i.e., consumption behavior patterns) similar to those we see for highly destructive use of drugs that are the canonical examples of addiction, like opioids or amphetamines.  If that is the concept, then it is certainly true that some people engage in consumption of gambling, or sex or shopping, in a way that seems quite similar.  These behaviors seem much more similar to "drug addiction" than does almost all consumption of nicotine (or any specific delivery device thereof).  Recall from my previous post about the DSM how ridiculous the proposed new entry for Tobacco Use Disorder was, with references to loss of control and other acutely destructive behavior patterns that describe the canonical "addictive drugs" but that clearly do not describe tobacco use.

If all of the behaviors listed in the above quote become recognized as sometimes being a lot like hard drug use, I wonder if it will create some pressure to better define "addiction" (even though the DSM does not actually use that word).  If universal behaviors like sex and internet use can be addictive, then addiction cannot just be synonymous with "ongoing use", as it is typically used.  Similarly, if behaviors like these, especially exercise, can be addictive, then it will have to be recognized that addiction must have some specific meaning that distinguishes it from sensible and healthy versions of a behavior, even if they are intensive.

Perhaps this means that people will learn to distinguish sensible nicotine use from other variations on the behavior.

Probably not in Australia, though, as long as they have Simon Chapman.  His latest brilliant suggestion is that a total ban on smoking is but a few years away, and to pave the way for the ban, the government should start by issuing licenses to smoke:
"They would get a swipe card with their photo on it and - just like the pre-commitment gambling card - they could say how much they wanted to smoke a day. If it was 10 cigarettes a day you'd get a category one licence, 20 cigarettes would be a category two and there would be a higher cost to the card if you wanted to smoke more. The most anyone could buy would be 60 a day."
Do you remember when you were a child of about ten, making up games, fantasy worlds, or social policies with your friends, with elaborate rules and fancy devices?  My male readers probably do, anyway.  Doesn't this sound like one of those games?  (...and then we will make a blue card if you are only allowed to smoke on weekends, and you can buy one extra pack but then you get a penalty and you suffer minus 1 to all die rolls until you buy some Nicorette, unless you play the "dingos ate my last pack" card...)

Apart from the silliness, what's wrong with this picture?  First, creeping prohibition has been a remarkably effective strategy, because people have not resisted each little step, figuring that step alone was not so unreasonable and still allowed some freedom (it is that slowly-boiling-a-frog metaphor).  But when you actually declare that this is the penultimate step on the way to prohibition, that complacency may not continue to work so well.  Remember, this smoking ban would take place in a country where the appealing low-risk alternatives (both smokeless tobacco and e-cigarettes) are banned.  Second, this demeaning and controlling action is very different from even punitive sales taxes, which lots of people will pay until they become so high that the financial incentives to evade them are just too strong.  Forcing people to buy a license at any price will produce an instant backlash, with people who would not normally do so immediately turning to the black market.  I suppose maybe I am missing something and Australians are more complacent about such things than most people I know, but I am guessing that Chapman is more clueless about people than most people I know.

For those who choose to partially comply, the easiest black market would be for people to get licenses to buy cigarettes for resale.  Obviously, "The most anyone could buy" would be limited only by how many people they know (though 60 is well more than enough to saturate your nicotine receptors, so you might as well be smoking sham cigarettes for most of those anyway).  Presumably the per-cigarette license fee will increase with quantity, so there will be a whole new interest in "becoming a smoker" to get the low-priced low-quantity license and make a bit of money on the resale to those avoiding the upper-bracket fee, or just to have a supply to help out people who want to go over their limit this week.  Assuming, of course, that everyone does not just say f*** this, and switch to the black market.

Of course, Chapman is handicapped in his analysis not just by his unfamiliarity with human nature, but by not understanding that black markets exist, and that they sometimes get ugly.  Recall from my previous post that he seems to think that a tobacco black market does not exist in Australia, because such things only happen in backward corrupt countries, like Canada, and those with open borders (I am not sure what he meant by the latter, but he might want to look at a map – the borders of his country are pretty difficult to build a fence across).  He further argued that black markets are easy to shut down because if buyers know where to buy then cops can know where to make arrests. The mind boggles as to why he never explained this great insight to those fighting the War on Drugs; he could have cleared that whole problem right up. 

[Update:  Chris Snowdon also found this amusing and suggested that in future columns Chapman might use his great insight to come up with solutions that no one else ever thought of to solve the Palestinian problem or cure the common cold.]

(Aside: It may not be obvious, but this is another great example of why someone should have a modicum of understanding about economics before they start spouting off about public policy.  Economics introduces the assumption that people are doing the best they can, rather than being utter idiots who are missing obvious easy opportunities.  It is not always right, but it is a much better starting point than the opposite.)

Finally, Chapman says that getting a license "would involve them passing a test, not dissimilar to a driving test".  I am not sure I can think of a final punchline that is funnier than the concept itself.  But I am wondering (a) if they will offer a 'Smokers Ed' class in high school to prep for the test, (b) if parents will look forward to kids turning 18 so they can finally get their license and help out with the family cigarette purchases, (c) if the hardest part of the test will be smoking while backing up into a tight space between some orange cones, (d) if you need a chauffeur's license to offer someone else a light, (e) will the rule of thumb "always mark the answer that makes the most dire claim about smoking" get you through the written test (not so funny – that will undoubtedly be the case), (f) who will monitor the practical portion of the exam, since no one can be exposed to smoke in a work setting, (g) will beginning smokers get a learner's permit that allows them to smoke only when they are with an experienced smoker?

This could get addictive.  I had better quit and post this thing.