28 February 2011

Unhealthful News 59 - Many things trigger heart attacks, but does anything trigger "wait, what?"

A recent paper published in The Lancet (it was apparently too complicated for Pediatrics) claims to have calculated the percentage of heart attacks that are "triggered" by a random collection of exposures and compared the numbers.  The authors looked at air pollution, a few drugs, and physical exertion, and much to the delight of those who want to reduce air pollution and the consternation of those who think the world's horribly polluted cities are just fine, they put air pollution at the top of the list.  As you might guess, this resulted in a few headlines.  But no one reporting it (nor even those criticizing it) seemed to think about the critical bits of hubris, or even the glaring omissions.

The "triggered" concept is a vaguely-defined notion that refers to an immediate exposure that makes the heart attack occur on a particular day, as opposed to the long-term exposures or conditions that make someone have a heart attack they would not have otherwise had, or have it a decade earlier than they otherwise would have.  As for what exactly a trigger means, it could mean having the impending heart attack a day sooner than it would have otherwise have happened, or perhaps a week, or maybe even a few hours or a month.  There is little reason to believe that the authors had a precise (i.e., scientifically useful) definition in mind, and there is no doubt that the hodgepodge of sources they used to create their estimate all used different definitions, most of which were probably not well defined either.  Obviously there is a rather important difference between use of cocaine or another powerful stimulant that causes a heart attack in a generally healthy forty-five-year-old and air pollution that causes ailing ninety-year-olds to die a week earlier than they would have.

These observations should cause you to ask something that none of the reporters or other commentators I read thought to ask:  What does this trigger concept really mean?  If they are talking about events or deaths that occur a few days earlier than they otherwise would, it is bad but the impact is limited.  If they are talking about events that would not have otherwise occurred, or would have occurred decades later, then we are back in the realm of what we usually mean by causes of a disease.  But if it is what I guess, some random mix of these, then how do the results make any sense at all?

I know that the literature on air pollution and fatal disease events is confused and controversial.  I doubt anyone would seriously claim we know the effects within a factor of two.  Yet the authors of the new paper estimated the effects to two significant figures – i.e., numbers like exposure to traffic triggering 7.4% of heart attacks.  So they are claiming to not merely know it is around 7% – already way more precise than is possible to estimate – but that it is more than 7.3% but not as high as 7.5%.  And yet no one asked, how could you possibly know that?  Once again, the faith that non-experts seem to have in the abilities of epidemiology would be charming if it were not so credulous.
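To make the false-precision point concrete, here is a back-of-envelope sketch (all of the numbers are hypothetical illustrations, not figures from the paper) using Levin's standard formula for the population attributable fraction.  If the relative risk is only known to within roughly a factor of two, the attributable fraction swings across a range many times wider than a 7.3%-to-7.5% band:

```python
# Hypothetical sketch of how much a population attributable fraction (PAF)
# moves when its inputs are only roughly known.  The prevalence and the
# relative risks below are made-up illustrative numbers.
def paf(prevalence, rr):
    """Levin's formula: fraction of cases attributable to the exposure."""
    return prevalence * (rr - 1) / (prevalence * (rr - 1) + 1)

# Suppose ~30% of people are exposed to traffic, and the relative risk is
# "known" only to within a factor of about two, say between 1.15 and 1.6.
for rr in (1.15, 1.3, 1.6):
    print(f"RR={rr}: PAF = {paf(0.30, rr):.1%}")
# The PAF ranges from roughly 4% to roughly 15% -- an order-of-magnitude
# spread, not a two-significant-figure answer.
```

Nothing about this arithmetic is exotic; the point is simply that two significant figures of output require far more certainty in the inputs than anyone has.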

Oh, and you will be excused for not noticing this – unlike the reporters and commentators who also did not notice it – because I was vague in my list of factors they studied:  The study conspicuously left out cigarette smoking, even though it would probably top the list if it were actually possible to do this study legitimately.  I am guessing that the reason is that no one presumes to be able to estimate the "triggering" effect of smoking apart from the serious long-term damage that causes heart attacks that would not have otherwise happened.  But, then again, air pollution involves long-term exposure for most who are exposed.  In theory it is possible to sort out trigger effects from long-term exposure effects of anything, but perhaps someone should have asked "how can you do that, and how good are the estimates really?"

There are some more technical criticisms that can be made, such as the exposures not really being comparable to each other.  Also, for various reasons it is folly to try to divvy up disease cases to their individual causes for a disease with a long list of interacting causes.  But this level of understanding is not necessary for anyone who just thinks to ask "wait a minute; how could they possibly estimate that?"

Finally anyone who actually reads the abstract could discover that the authors separated "sexual activity" into its own category even though there was an aggregate "physical exertion" category.  (Maybe they just don't do it right.)  If they had included sexual activity in the physical exertion category, it would have been at the top of the list they made (recall UN35 about how you can always play this game).  But then the headlines would have had to read that physical exertion is an unrecognized public health hazard that we need to do something about.  If the authors had actually admitted to this conclusion, though, they probably would not have generated headlines or gotten published in The Lancet.  I am sure they thought about it for minutes before deciding to go for the headlines rather than taking on tough and complicated questions about exposures that have both costs and benefits.

Of course, air pollution is caused by activities that have benefits too, but why worry about that.  It was undoubtedly more fun to engage in an ad hoc arithmetic exercise, which the authors either did not realize was nonsense or did not care, and claim to have shown something of "considerable public health relevance" without addressing the larger worldly context.  A running joke at academic epidemiology meetings is that your relative risk for a heart attack in the hour after having sex is about 50 compared to the average hour of your life, but most people who know this realize that a 50-fold increase is really not of such considerable importance in that context.
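The arithmetic behind that joke is worth a quick sketch (the baseline risk below is a made-up illustrative number, not taken from any study): multiply a tiny per-hour baseline risk by 50 and you still have a tiny absolute risk.

```python
# Back-of-envelope sketch with made-up numbers: a 50-fold relative risk
# concentrated in one hour can still be a tiny absolute risk.
annual_risk = 1 / 1000             # assumed baseline heart attack risk per year
hours_per_year = 365 * 24          # 8760 hours
baseline_hourly = annual_risk / hours_per_year
risky_hour = 50 * baseline_hourly  # risk during the hour after sex

print(f"average hour:   {baseline_hourly:.2e}")  # about 1 in 9 million
print(f"hour after sex: {risky_hour:.2e}")       # about 1 in 175,000
```

In other words, the scary-sounding relative risk translates into odds that most people would shrug at, which is exactly why context matters more than the multiplier.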

27 February 2011

Unhealthful News 58 - Hierarchy of study types, "Hill criteria", and other anti-shibboleths

Over the last few months I have realized that a strategy I employ for identifying good science is to look for common oversimplifications that no real expert would present (at least without noting that they are simplifications).  These represent bits of advice that are better than believing the opposite and better than complete ignorance about what to believe, but I just realized what they are akin to.  They are analogous to the advice we give children who are learning to cross the street or to drive – something that is a good start on what to think about in those situations, but that would make us rather uncomfortable if repeated by an experienced adult as if it were still how she thinks about the challenge.

I am thinking about such things as the advice to look left, then right, then left again (or is it the other way around?) before crossing the street.  That is really good advice for a child who is just learning to cross the street and might only look in one direction if not advised otherwise.  But it is not necessary or useful to think in such mechanistic terms once you are skilled at recognizing traffic patterns as you approach the street, taking in what you need to know subconsciously.  But adults also know that it is not sufficient in some cases, like in London or someplace like Bangalore, where I use the strategy of furiously looking at every bit of pavement that could fit a car, just to make sure one is not attacking from that direction.  A similar bit of advice is "slow down when it is snowing", good advice to a new driver, and it remains true for an experienced driver.  But it would be a mistake for someone to interpret that as "all you have to know about driving in the snow is to slow down".

Encountering a writer or researcher who believes that a randomized trial is always more informative than other sources of information (I have written a lot about this error in the UN series, which I will not repeat now) is like walking down the street with a forty-year-old who stops you at the corner and talks himself through "look left, then right…."  Yes, it is better than him walking straight into traffic, just as fixating on trial results is better than physicians saying "based on my professional experience, the answer is…."  The latter is the common non-scientific nonsense that forced medical educators to hammer the value of randomized trials into the heads of physicians, so they did not get hit by cars.  Or something like that – I am getting a bit lost in my metaphors.  Anyway, the point is that you would conclude that perhaps your forty-year-old companion was not up to the forty-year-old level of sophistication in his dealings with the world.

Other such errors peg the author at a different point in the spectrum of understanding.  Yesterday I pointed out that anyone who writes "we applied the Bradford Hill criteria", or some equivalent, is sending the message that they do not really understand how to think scientifically and assess whether an observed association is causal.  They seem to recognize that it is necessary to think about how to interpret their results, but they just do not know how to do it.  They certainly should think about much of what was on Hill's list, but if they think it can be used as a checklist, their understanding seems to be at the level of "all you need to know is to drive slower".  That puts them a bit ahead of residents of the American sunbelt who do not seem to understand the "slower" bit, and have thousands of crashes when they get two centimeters of snow.  You have to start somewhere. 

Perhaps if it were forty-five years ago, when Hill wrote his list, their approach would be a bit more defensible.  As I wrote about Hill's list of considerations about causation in one of my papers about his ideas,
Hill's list seems to have been a useful contribution to a young science that surely needed systematic thinking, but it long since should have been relegated to part of the historical foundation, as an early rough cut.
I would like to be able to say that those who make this mistake are solidly a step above those who think that there is some rigid hierarchy of study types, with experiments at the top.  However, the authors who wrote the paper that appealed to Hill's "criteria" that I discussed yesterday also wrote, "Clearly, observational studies cannot establish causation."  As I have previously explained, no study can prove causation, but any useful study contributes to establishing (or denying) it to some degree.  The glaringly obvious response is that observational studies of smoking and disease – those that were on everyone's mind when Hill and some of his contemporaries wrote lists of considerations – clearly established causation.  (I love the "Clearly" they started the sentence with, because I know I am clearly guilty of overusing words like that.  But I certainly would like to think, of course, that I obviously only use them when making an indubitably true statement.)

More generally, it is always an error to claim that there is some rigid hierarchy of information, like the claims that a meta-analysis is more informative than its component parts.  As I wrote yesterday, not only are synthetic meta-analyses rather sketchy at best, but this particular one included a rather dubious narrowing of which results were considered.  The best study type to carry out to answer a question depends on what you want to know.  And assessing which already-existing source of information is most informative is more complicated still, since optimality of the study design has to be balanced against how close it comes to the question of interest and the quality of the study apart from its design.

When authors make an oversimplification that is akin to advice we give children, it is a good clue that they do not know they are in over their heads.  That is, I suspect that most people who repeat one of these errors not only do not know it is an error (obviously), but were not even of the mindset, as we all are at some point, of saying "uh oh, I have to say something about this, but it is really beyond my expertise, so I had better look up the right equation/background/whatever and try to be careful not to claim more than I can really learn by looking it up."  Rather, I suspect they thought they really understood how to engage in scientific inference at a deep level, but they are actually so far from understanding that they do not even know they do not understand.  It is kind of like, "what do you mean complicated? everyone knows how a car works; you just turn this key and it goes."

These errors are a good clue that the authors thought they understood the rest of their analysis, but might have been just as over their heads there too.  I may not be able to recognize where else they were wrong or naive, either because I am not an expert on the subject matter or simply because they did not explain how they did the analysis, as is usually the case.  But the generic sign that they know only enough to be dangerous is there.  This is why I am engaging in anger management self-therapy about these errors, telling myself "when I read things like that, I should not feel like my head is exploding with frustration yet again; rather, I should thank the authors for generously letting me know that I should not take anything they say too seriously."

If someone writes about a hierarchy of study designs or Bradford Hill criteria, it probably means they are following a recipe from a bad introductory epidemiology textbook or teacher, perhaps the only one they ever had.  This probably also means that the rest of their methods follow a simplistic recipe.  That certainly does not mean that they did a bad study; the recipes exist because they are passable ways to do simple studies of simple topics, after all.  But if they are trying to do something more complicated than crank out a field study, like do an analytic literature review or sort through a scientific controversy, the recipe-followers are definitely in over their heads.

These errors serve as a shibboleth, or more precisely a shibboleth failure.  Anyone who makes one of those statements is volunteering that he cannot pronounce the tricky test word correctly (i.e., is not really expert in the language of scientific analysis and is just trying to fake it).  We cannot count on everyone to volunteer this signal, of course, and we cannot stop them at the river and quiz them.  This approach is not useful for typical health news reporting, when a reporter basically just transcribes a press release about a research study, because they do not even attempt to make such analyses and so cannot make the error.  But researchers and news-analysis authors (and people giving "expert witness" testimony in legal matters) volunteer information about their limited understanding often enough that we can make use of it.  What is more, though a shibboleth is normally thought of as a way to recognize whether someone is "one of us", it can be used just as effectively to recognize when someone is pretending to have expertise even if you yourself do not have that expertise.  You can train your ear to recognize a few correct pronunciations even if you cannot lose your own accent.

26 February 2011

Unhealthful News 57 - Alcohol consumption is good for heart attack but meta-analyses and "causal criteria" are bad for the health news

Most synthetic meta-analyses are parlor tricks, set-pieces that produce a flashy result but signify nothing.  For those not familiar, a synthetic meta-analysis (which is almost always just called "a meta-analysis" though this is misleading because there are more useful types of meta-analysis) combines the results of studies on a topic based on the fiction that they were results from a single big study and reports the results of this fiction.  Occasionally this is useful and appropriate, but usually it is a misleading exercise.  But like any good parlor trick (or grade school level science museum demonstration show), synthetic meta-analyses tend to impress people who do not understand them and, much worse, give the illusion of education when actually they may do more harm than good to real understanding.

A new analysis, published as two journal articles (the first is the one relevant to this post) and widely over-hyped in the news (example), that looked at alcohol consumption and cardiovascular disease is a perfect example.  First, it should be noted that the authors, a group at the University of Calgary, came to the conclusion that we already knew beyond any serious doubt:  Moderate alcohol consumption is protective against heart disease to an impressive degree, and possibly protective against stroke to a minor degree.  As I noted in UN29 this has been well known for two decades.  So averaging together all of the many studies that tend to support the conclusion (and the very few that do not) tells us little we did not know already.

In theory it might help us quantify the effect.  That is, we might have studies that showed a variety of results – reducing risk by 10%, 20%, or 50% – so we were already sure there was a reduction, but the synthesis might let us consolidate them into a single estimate.  But this brings up one of the fatal flaws in the synthesis, the "separated at birth" assumption.  Basically an analysis of this type effectively pretends that the studies being summarized were a single large study and the resulting dataset was chopped up into pieces and analyzed by different research groups, and so the synthetic analysis is putting it all back together.  Obviously this is fiction, since each study was done on a different population, there were a variety of different definitions of exposure, outcome, follow-up time, and the analyses were done differently.  There are a few statistical tricks to deal with some of the differences but these mostly replace one of the many clearly false assumptions with another likely false assumption.

Thus, a study like the new one effectively says "let's average together this result of a five year study of American men who consumed an average of 1.9 drinks per day with this other twenty year study of French men and women who consumed an average of 1.2 drinks per day with…."  I suspect if they explained it in those terms rather than burying the essence of the methodology in statistical mumbo jumbo most readers would laugh at the concept.  A useful type of meta-analysis, sometimes called a "comparative meta-analysis" or the less descriptive "analytic meta-analysis", is one that attempts to learn from the difference between studies, rather than pretending there are none.  E.g., there is a great deal of uncertainty about exactly how much drink produces the optimal benefit, so a useful analysis would be to see if the existing studies, taken as a whole, could better inform that.  Another useful comparison of results across studies that has been done is the one that put to rest the myth that this is about red wine rather than alcohol itself; in that case the comparison was to see if there was any substantial difference between the protective effects of different sources of alcohol.
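For readers who want to see the "separated at birth" trick in miniature, here is a sketch of standard fixed-effect inverse-variance pooling applied to three invented study results (none of these numbers come from the actual paper).  The studies plainly disagree with one another, yet the pooled interval comes out tighter than any single study's, because the method simply assumes the disagreement away:

```python
import math

# Three invented study results as (log relative risk, standard error).
# They disagree substantially -- RRs of roughly 0.90, 0.70, and 0.50.
studies = [(-0.11, 0.05), (-0.36, 0.06), (-0.69, 0.07)]

# Fixed-effect pooling: weight each study by its inverse variance,
# i.e., pretend all three are slices of one big homogeneous study.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))  # smaller than ANY single study's SE

lo = math.exp(pooled - 1.96 * pooled_se)
hi = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled RR = {math.exp(pooled):.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

The deceptively narrow interval is the parlor trick: the arithmetic is fine, but the precision it reports is a property of the false single-study assumption, not of the evidence.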

A synthetic analysis inevitably hides all manner of problems, both with the individual studies and with the synthesis itself.  In this case, in order to make the combined studies somewhat more uniform (that is, to make the "separated at birth" assumption a little bit closer to true) the authors restricted the analysis to study results where the comparison group was lifelong non-drinkers.  But one of the persistent criticisms of the claim that alcohol is protective is that lifetime teetotalers in a free-drinking society are an unusual group that might just have extra disease risk due to being anti-social or following minority religious practices, or might refrain from drinking due to some known health problem.  The hypothesis, then, is that it is not so much that moderate drinking helps, but that being a non-drinker means something is wrong with you.  Most of the time this claim is put forth by nanny-state type activists who just do not want to admit the truth about alcohol (usually they omit the phrase "being a non-drinker means something is wrong with you"), but the hypothesis still cannot be dismissed out of hand.  This is why many researchers pursuing this topic have compared the moderate regular drinkers to those more-likely-normal folk who drink occasionally but not never.

Thus, the entire new study was done in a way that told us what we already knew and avoided even looking at the most important reason why this might be wrong.  And it avoided trying to answer any of the unanswered comparative questions.  It did provide an update to previous systematic reviews of the topic, a minor contribution, but did nothing that created any fundamentally new knowledge.  You would never know any of that from reading the news, of course.  In this particular case there is nothing wrong with the effect of the one day of hype that resulted from pretending we had learned something new, which was to get out a public health message that has long been known to the experts but is not widely understood in the population.  However, praising bad practice because it had a good outcome in a particular case is a very dangerous game.

What in my mind is the biggest problem with the new research reports, however, is the bit that starts:
    we can now examine the argument for causation based on Hill’s criteria
Those familiar with what I (and others expert on the subject) have written about the useful but universally mis-interpreted contributions of Austin Bradford Hill will immediately know why I respond to that with:  Really? 

Such a statement indicates a fundamental failure to understand how to think scientifically, a problem rather common in health science.  It is not an error that is likely to be quoted in a newspaper, since it is too arcane for that medium, but some pundits did fall for it.  I will write more about this tomorrow, and later in the series, but to start the point:

Over 45 years ago, Hill gave one of the greatest talks in the history of epidemiology with many brilliant lessons.  It also included a list of points (he called them "considerations"; many authors – like the recent ones – incorrectly refer to them as "criteria") that are worth considering when trying to decide if an observed association between an exposure and disease is causal rather than confounding.  These are also often referred to as "the Bradford Hill criteria", which represents confusion about his surname.  Such a common sense list of considerations was a valuable contribution to anyone who had no clue about how to do a scientific analysis to assess whether an association was causal.  It is also equally worth considering a separate point which he did not address in any depth, whether the association in the data might not represent a real association in the world that is either causal or confounding.  I.e., it might have been caused by biases in the data gathering or analysis process.  (If this seems a bit too technical to make any sense to you, do not worry because (a) I will address it again as the series continues (and had an example of some of it here) and (b) it presumably also makes no sense to anyone who would talk about applying "Hill's criteria".)

Hill's list was one of many that have been proposed.  It included some obvious valid points (make sure the ostensible cause occurred before the ostensible effect) and some slightly less obvious ones (consider whether there is a plausible pathway from the cause to effect; observe whether the greater the dose of exposure the more likely the outcome; observe similar results in different populations).  It also contains some points that are as likely wrong as right (make sure the effect is specific, which does not work for the many exposures that cause a constellation of different diseases).  It only obliquely touches on what is probably the best test for causation when it is in question: think like a scientist.  Figure out what the proposed confounding would be; figure out what observational or experimental data would be different if the relationship were causal rather than the confounding; gather and check the necessary data.

But most important, Hill's or others' lists of considerations are not a checklist and cannot be used that way.  The considerations are not definitive: it is easy to list examples where the "criteria are met" but we know the association is confounding and not causal.  Moreover, there are no rules for determining whether one of the considerations "has been met".  Indeed, in almost every case it is possible to make an observation that could be interpreted as being in line with the consideration (e.g., the Calgary authors argued "the protective association of alcohol has been consistently observed in diverse patient populations and in both women and men") and also an observation that could be interpreted toward the opposite (e.g., a few paragraphs after the above declaration, the authors wrote, "we observed significant heterogeneity across studies"). 

Another example from the Calgary paper is the declaration that the consideration of specificity was met because there was no protective effect against cancer, only against cardiovascular disease; they could just as easily have said it was not met because the results were observed for both coronary artery disease and stroke, or because moderate alcohol consumption also protects against gallstones.  Obviously the authors just chose a way to look at specificity that supported the conclusions they wanted to draw.  Finally, there is no method for converting one's list of observations ("the cause definitely preceded the effect; there is good biological plausibility; there is moderate agreement across studies; etc.") into the conclusion "and that is enough to conclude causation".  In other words, scientific conclusions require thinking, not recipes.

It has been shown that in general when someone tries to "apply" causal "criteria" they choose from the list of the considerations, pick the ones that fit their goal (either claiming or denying that an association is causal) and apply them in idiosyncratic ways.  This is certainly what was done in the present case.  Indeed, it is the only thing that can be done, since there is neither a definitive list nor a systematic way to apply the entries on it.  In other words, "we applied Hill's criteria to confirm this was causal" is, at best, similar to saying "we looked at this inkblot and confirmed it is a picture of a butterfly".  At worst, it is simply blatant rhetoric disguised as science.  Indeed, the last few times I had seen someone actually make an assertion about what "Hill's criteria" show, it was in consulting reports which were written as the most overt kind of advocacy for a particular conclusion about causation.

Of course, the Calgary authors seem to have come to the right conclusion.  There is little doubt that the relationship is causal based on all we know, though there remains the one source of doubt that teetotaling is caused by or has common cause with poor health.  Funny how they did not mention that possibility.

News reporters, not surprisingly, naively reported the quantitative conclusions from the meta-analysis as if the "separated at birth" assumption were correct and the comparison to teetotalers were reasonable.  More realistically, they had absolutely no idea that either of those was an issue they needed to worry about (and the authors apparently either shared this ignorance or were hiding their knowledge) and did not bother to ask anyone who might know better.  Since most readers will only remember the qualitative conclusion, though, the resulting message was basically correct.  Fortunately this was a case where the right answer was hard to miss.

25 February 2011

Unhealthful News 56 - Slumping toward feudalism and other economic observations

Today I noticed in the news that here in Pennsylvania, a state above the average in providing social services, a program to provide health care for the poorest adults in the state has run out of money.  Meanwhile, half of Americans do not realize that the new Congress has not actually repealed the "Obamacare" law, a lame excuse for a health financing system but currently the only hope of a first step in the right direction (i.e., the direction of healthcare not being a luxury for the rich and/or bankrupting the country).  And, "As Mental Health Cuts Mount, Psychiatric Cases Fill Jails".

Yes, I know that at the outset of Unhealthful News I said I would not report much on financing, and I do have three good epidemiology posts in mind but unfinished.  Actually, I suppose I am not really seriously posting about financing, since all I am doing is listing depressing news about it with no further analysis. 

It has proven to be a very difficult day to focus on the type of health news I analyze when looking at the news today.  I am not just talking about Libya, which is horrifying but by way of being the usual story of people trying to extract themselves from feudalism.  There is also the American people letting themselves be drawn back toward feudalism.  It is a feudalism that has a much higher level of health and material well-being than the visions of castles and peasantry that the word usually evokes, obviously, but feudalism just the same.  For those who do not follow U.S. politics, there is currently a showdown about whether we continue to have functional government employee unions, one of the few remaining bulwarks against growing oligarchy (since the U.S., unlike most Western countries, lacks a strong welfare state, labor unions play a particularly critical role in offering a backstop against wage serfdom).

The tragedy is not even so much that the oligarchs are threatening what is left of the middle class, but that what is left of the middle class is helping them do it.  In a story yesterday about my hometown, a professor from my old haunts put it quite depressingly:
Richard Freeman, an economist at Harvard, said he saw the hostility toward unions as a sign of decay in society. Some working-class people see so few possibilities for their lives that it is eroding the aspirational nature that has long been typical of Americans. 
“It shows a hopelessness,” he said. “It used to be, ‘You have something I don’t have; I’ll go to my employer to get it, too. Now I don’t see any chance of getting it. I don’t want to be the lowest one on the totem pole, so I don’t want you to have it either.’ ”
Of course, I know that no one comes to this blog to read yet another bit of random punditry about the economic situation.  So, I will move on to a brief observation about the economics of health-affecting behaviors:

Yesterday, Chris Snowdon, with the contributions of some of his readers in the comments, offered the insight that price hikes, in the form of taxes on cigarettes, are not entirely unlike prohibition in their effects, especially for the poor.  In particular, they increase the demand for smuggling lower-priced (tax evading) alternatives.  To take that one step further, economists generally think of prohibition as simply a large price hike, and you will be able to much better understand prohibitions if you think of them that way rather than the way they are typically reported, as if they are some qualitative change that suspends the laws of supply and demand.  Imposing a serious risk of legal penalties for owning or selling a good makes it very expensive, but this is no different from something just being so rare or hard to make that it is naturally expensive.  If demand is great enough then the price will be paid.  Of course, fewer people will buy it at the increased price, but quite often some still will do so. 
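That equivalence can be sketched with a toy constant-elasticity demand curve (the elasticity and baseline quantity are made-up numbers for illustration): whether the price rise comes from scarcity, taxes, or enforcement risk, consumption falls but does not go to zero.

```python
# Toy sketch: prohibition modeled as nothing but a price increase, using a
# constant-elasticity demand curve Q = Q0 * (P/P0)**elasticity.
# The elasticity of -0.5 and baseline of 100 units are assumptions only.
def quantity(price_ratio, elasticity=-0.5, baseline=100.0):
    """Units demanded when price is `price_ratio` times the legal price."""
    return baseline * price_ratio ** elasticity

for ratio in (1, 2, 5, 10):  # enforcement pushing the price to 2x, 5x, 10x
    print(f"price x{ratio:>2}: {quantity(ratio):5.1f} units demanded")
```

Even a tenfold price increase leaves a substantial residual demand in this sketch, which is the whole economic point: prohibition shifts along a demand curve, it does not repeal one.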

Indeed, the equivalence of prohibition and price takes other forms too.  The effectiveness of a prohibition regime is typically measured by the street price of the good.  That is, effectively enforced prohibition quite directly and literally means a higher purchase price, nothing more or less, and ineffective prohibition enforcement is evident in dropping prices, as has been the case with most street drugs over the last decade.  The fact that those who smuggle black market or divert gray market drugs are in jeopardy of arrest or violence means they charge more for their efforts than they would if the market were legal.  If their risks are low, the premium is less, unless they can get monopoly pricing by restricting supply, creating a cartel through violence of their own.  If this sounds remarkably like the economics of any other good, that should not be surprising.
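To make the "prohibition is just a large price hike" framing concrete, here is a toy sketch (my own illustrative numbers and elasticity, not anything from the post or any study): with an ordinary constant-elasticity demand curve, an enforcement-risk premium reduces the quantity bought exactly the way any other price increase would, without driving it to zero.

```python
# Toy constant-elasticity demand curve (illustrative numbers only).
# Prohibition enforcement acts like a price increase: quantity demanded
# falls, but as long as demand is strong, it does not fall to zero.
def quantity_demanded(price, base_price=5.0, base_quantity=100.0, elasticity=-0.5):
    """Quantity bought at a given price, relative to a baseline point."""
    return base_quantity * (price / base_price) ** elasticity

# A legal price, a modest smuggling premium, a heavy-enforcement premium:
for price in (5.0, 10.0, 20.0):
    print(f"price {price:5.1f} -> quantity {quantity_demanded(price):.1f}")
```

Whether the premium comes from scarcity, taxes, or the risk of arrest makes no difference to the arithmetic, which is the point of thinking of prohibition this way.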

So why are we not always at the mercy of monopolies?  Simple economics tells us that in most cases if someone is making monopoly rents, then competitors will be attracted to the market.  One way to keep this from happening is the threat of violence from the government or organized crime, which incidentally are more similar than most people realize.  No, that is not a joke or some kind of ultra-libertarian slogan.  Governments effectively evolved from organized crime, which is little different from feudalism; it offers huge advantages for those in power, who take most of society's surplus wealth, but offers just enough advantage for the rest of the population (compared to being at the mercy of invaders, non-organized crime, or even greater exploitation) that they do not rise up against it.

Anyway, the funny thing about cigarettes is that while large companies can make them very efficiently, if the price is raised high enough, then small operations become competitive and face little risk of legal or illegal violence.  This is evidenced by the story of do-it-yourself tobacco growing and cigarette making right in New York City.  It is a remarkably inefficient use of labor (as is smuggling).  But if changes to the remarkably efficient American economy that created the middle class in the 20th century force people into such inefficiency, then growing crops in one's back yard becomes the best option for many people.

Feudalism indeed.

24 February 2011

Unhealthful News 55 - Cribs are not a safe alternative

Last week's news included a report revealing, based on a review of hospital records, that each year in the U.S., 10,000 young children end up in the emergency room because of crib- and playpen-related injuries.  Cribs in the U.S. are governed by a remarkable number of regulations for such a simple object – not as many as there are for cars, but far more than for a more complicated and dangerous device, the bicycle.

There is some good reason for this.  Regulations like a maximum distance between the bars on the side, preventing infants from getting their head jammed between them or pushed clear through (which can have fatal consequences), make perfect sense.  There is no reason anyone would want a crib that created that risk, but it is not reasonable to just demand that parents all figure this out themselves.  I know that some of my readers are adamantly opposed to most regulations that are intended to protect people from their own decisions, but I suspect you have to agree that this one is a good idea.

On the other hand, some of the regulations are of the "deciding for someone how they want to trade off risk versus other considerations" variety, tending toward nanny state behavior.  For example, "drop-side" cribs, those with a side that can slide down to allow easier access to the bed and baby, are now banned because if the side is left lowered, babies and toddlers are at greater risk of falling out, and, if mis-assembled, the moving parts create a risk of catching or pinching the baby.  This new regulation was mentioned in most of the recent stories and cited as a reason to expect that the injury rate would go down.  Not mentioned was the burden this places on a 5-foot-tall mom, who cannot reach into a crib if the side does not drop, and how she might not be able to get her baby out of the crib until he learns how to crawl to the near side so she can reach him.  (Our society is a hostile place for short women to have kids without a man around.)  Regulations like the drop-side ban and the associated recalls of products are so onerous that regular furniture stores seem to have gotten out of the business of selling cribs at all, leaving them entirely to baby product specialty stores that are used to dealing with such hassles.

As for the alarming injured-baby statistic, it was noted that 1- and 2-year-olds account for most of the injuries, a group of roughly 10 million children at risk in the study population (a conservative estimate, since older and younger kids were also at risk).  So we have less than 1/1000th of the at-risk population experiencing an event each year.  This would be a disturbing number if many of the events were highly serious, but there is no indication of this.  Deaths were on the order of less than 1/100,000 per person-year.  To put that in perspective, the risk to the kid from car travel is easily ten times that great.
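For anyone who wants to check that arithmetic, here is the back-of-the-envelope version, using the approximate numbers from the news reports rather than exact data:

```python
# Rough risk arithmetic using the approximate numbers from the news reports.
injuries_per_year = 10_000        # annual ER visits for crib/playpen injuries
at_risk_children = 10_000_000     # conservative count of 1- and 2-year-olds

annual_injury_risk = injuries_per_year / at_risk_children
print(f"annual injury risk: about 1 in {at_risk_children // injuries_per_year:,}")

# Deaths were on the order of less than 1 per 100,000 person-years;
# car travel poses roughly ten times that risk to the same children.
crib_death_risk = 1 / 100_000
car_death_risk = 10 * crib_death_risk
print(f"crib death risk bound: {crib_death_risk:.0e} per person-year")
print(f"car travel comparison: {car_death_risk:.0e} per person-year")
```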

This does not mean that the problems should be ignored.  But it was noted that most of the injuries, especially the most serious ones, consisted of toddlers pulling themselves up out of the crib and then falling to the ground.  This is another case of operator error rather than bad tech, since the mattress can and should be lowered, effectively raising the walls of the cage, as the kid gets bigger.  And, of course, there comes a combination of height and arm strength when the pen walls create rather than reduce falling risk.  In other words, a lot of the injuries were to newly motile kids whose parents had not figured out that they needed to climb-proof their space, so those kids faced some risk wherever they were left alone.  Strange how none of the articles I saw led off with or even clearly noted the message "parents can eliminate almost all of the small but nonzero risk from cribs by making sure the kid cannot climb out of them."

Many of the news articles about the topic mentioned that, despite the hazards, putting the kid to sleep in a crib is safer than any other option.  I find it rather difficult to understand how this can be asserted, given that it is probably quite difficult to get good statistics on how often kids are put to sleep in socially frowned-upon places like, say, their parents' bed (common in most of the world, but widely condemned in the U.S.).  I would guess that there are no good statistics on alternative sleeping arrangements like the parents' bed, dresser drawers, or mattresses on the floor (hmm, that one seems safer).  This is one of those claims that should cause a reporter to say "how do you know that"; they seldom do.

All that got me thinking.  We have a fairly low-risk activity, and the risk is being further lowered by changing technology.  It is an alternative to a popular but officially socially-condemned activity.  That sounded really familiar.  So I had to wonder why the "health promotion" types are not attacking cribs, like they do other harm reduction practices, screaming that cribs are not a safe alternative to other sleeping arrangements.  After all, the study was published in Pediatrics.

How can we accept actions that merely lower the risk from cribs when there are government-sanctioned, proven methods of quitting...er...sleeping?  As shown here, there is tested and proven safety gear approved by the U.S. National Highway Traffic Safety Administration, the International Mountaineering and Climbing Federation, and the U.S. Consumer Product Safety Commission.  Photographer's Note:  The latter is not shown in use (it is sitting in the back) since my model started crying every time I put the bike helmet on him, and for some reason his mom then decided the shoot was over.  Models can be such divas.  But, if we have learned anything from anti-tobacco extremists and their ilk, it is that a little needless emotional distress and intense discomfort is a small price to pay to eliminate every last trace of risk.

With that in mind, why is there no demand to do away with these jungle-animal-decorated death traps?  My personal theory is that all the health activists are secretly in the pocket of Big Crib.  This explains why they condemn the most popular alternative in the world (and that mattress-on-the-floor idea) in favor of a slavish devotion to cribs, even as they desperately try to eliminate all crib features that could lead to faulty assembly or other operator error.  Coming soon will be cribs with a lid on top like a hamster cage, which will be a bit dehumanizing, but will be good for getting kids ready for their role in our increasingly feudal society.  Actually, I think it is more likely that the requirement will be cribs where the mattress cannot be raised above the lowest level, to make it impossible for anyone to fail to lower it properly when the kid grows.  Yes, it will cause all manner of orthopedic problems from back-bending lifts by a large portion of mothers, as well as shorter fathers, but how dare you worry about that?  Think of the children!

Where does it end?  I will bet that more babies are injured while being carried than while they are in their cribs.  Shouldn't we do something about this needless risk, perhaps creating some kind of approved device that eliminates the danger of unassisted carrying?  Naturally, U.S. government regulators will never endorse anything Swedish, even if it is the obvious natural and popular solution and has proven to be miraculously harm reducing.  But I am sure that Big Crib can come up with some convoluted solution to the problem that has not been contaminated by being an accepted lifestyle alternative for centuries.

[Disclaimers:  (1) No babies were harmed in the making of this blog post.  Not seriously, anyway.  (2)  The journal Pediatrics is a go-to outlet both for utter-crap junk science with a particular nanny-state bias and for legitimate research about children's health that the author thinks should be read by activists rather than just by medics and scientists.  The latter studies are not necessarily junk; they are merely written by authors who are willing to implicitly support the junk so that they can gain greater visibility among those who are not sufficiently expert to know that Pediatrics publishes junk.  Not high praise, I suppose, but it is only fair to concede the point.]

23 February 2011

Unhealthful News 54 - Exercising your brain is good, microwaving it perhaps not

If you glanced at the health news today, you undoubtedly learned of a new study that found, based on brain scans, that talking on a mobile phone has some effect on the brain, though it is not known if that is unhealthy (here is a version with still images of what the scans look like, which is of course utterly meaningless to the reader, but the colors are pretty).  There has long been speculation about whether the radiation (i.e., signals) from phones that enter the brain, due to being transmitted from a point close to it when the phone is held to the ear, might cause cancer or some other disease.  The new study found increased brain activity at the point nearest the transmitting phone.  I am not going to take on the subject as a whole, but I thought I would point out some specific observations that struck me about the stories.

First, it was remarkable how many stories made observations about how the radiation from cell phones is non-ionizing (that is, it cannot break molecular bonds, which is what makes some radiation carcinogenic) but did not mention that the frequency of the radiation is in the microwave range.  You might recognize that term as something that makes water molecules get hotter, which could alter the brain due to the minor heating effect (as could the direct thermal effect of the waste heat from the phone pressed against the head, or sunlight or just being warm).  I am not saying I believe there is some effect from this – I have almost no idea about the biophysics here – but it was very odd that no report bothered to tell us whether this was likely the explanation for the results that were observed, probably was not the likely explanation for some reason, or that the experts have no idea.

Also, I noticed a lot of the stories seemed to place great stock in the fact that the observed metabolic change was "highly statistically significant".  This seemed intended to cause the reader to believe that the change was of important magnitude, even though the magnitude of the change, a 7% increase in activity, seemed modest (though I have no idea whether this is truly small in context).  But all that "statistically significant" means is that the observed result was unlikely to occur by chance alone, which means that even though the effect seems small, either random spikes in metabolism are rare enough, or they repeated the experiment enough times, to see a clear signal above any noise.  This does not mean that the result matters or is even impressive, though presumably that is what the news reader is supposed to be tricked into believing.  (Also, as a more technical point, the phrase "highly statistically significant" is nonsense and indicates a lack of understanding of statistics on the part of the researchers.  Statistical significance is, by construction, a "yes or no" proposition; there are no degrees of "yes", nor is there an "almost" category.  There are other related statistics that have magnitude, but statistical significance does not.)  Note: I wrote more about the technical meaning of statistical significance in UN16.
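To spell out the "yes or no" nature of significance, here is a minimal illustration (generic code, nothing to do with the actual brain-scan data): any p-value below the chosen threshold is "significant", full stop, with no degrees of yes.

```python
# Statistical significance at the conventional 0.05 level is a binary verdict:
# p = 0.049 and p = 0.000001 are equally "significant", not more or less so.
ALPHA = 0.05

def is_significant(p_value, alpha=ALPHA):
    """Return the yes/no significance verdict; there is no 'highly yes'."""
    return p_value < alpha

for p in (0.049, 0.000001, 0.051):
    print(f"p = {p}: significant? {is_significant(p)}")
```

The p-value itself has magnitude, as do effect estimates and confidence intervals; it is only the significance verdict that does not.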

On a disappointing related note, one of my all-time favorite news clippings for teaching was from sometime in the 1990s, when an early epidemiologic study reported no statistically significant increase in brain cancer among mobile phone users.  But, the story reported, when researchers looked individually at each of the 20 different brain cancers studied, they did find a statistically significant result for one of them, which was portrayed as worrisome.  The beauty of this, if you do not recognize it, is that the concept of "statistically significant at the .05 level" (which is what is usually meant by "statistically significant") is often explained by saying that if you repeated a study multiple times and there was really no correlation between the exposure and the outcome, then you would get a statistically significant result due to bad luck only 5% of the time (1 time in 20).  Thus, we would expect to see 1 out of the 20 different cancers show up as statistically significant.

This is not actually quite correct, but it works in spirit, fitting the usual simplification story, so the fact that there were exactly 20 different brain cancers examined made it such a great example, kind of an inside joke for students learning this material.  Unfortunately, this was back in the days before digital copies of everything, and I apparently lost every copy of it.  I thought I had found it again a couple of days ago in an old file, pulled it out, and thought that everything had just come together perfectly when the stories about the new study ran today.  Alas, the clipping I found was a far less interesting random story about the same topic from about the same era.  My perfect example remains lost.
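For what it is worth, the 1-out-of-20 intuition is easy to check with a quick simulation (my own illustration, not anything from that lost clipping or the study): test 20 true-null hypotheses at the 0.05 level, and on average about one will come up "significant" by bad luck.

```python
import random

random.seed(0)  # reproducible illustration

def spurious_hits(n_tests=20, alpha=0.05):
    # Under a true null hypothesis, a p-value is uniform on [0, 1],
    # so each test independently has probability alpha of looking
    # "statistically significant" by chance alone.
    return sum(random.random() < alpha for _ in range(n_tests))

trials = 100_000
average = sum(spurious_hits() for _ in range(trials)) / trials
print(f"average spurious 'significant' results per 20 tests: {average:.2f}")
```

As noted above, this simple story is not exactly correct, but it captures why finding exactly one "significant" cancer out of 20 is just what bad luck alone would predict.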

So as to not finish on that note of minor tragedy, one last observation about the news stories.  One story caught my eye because the lead for it included the promise to explain how, "Many variables have prevented scientists from getting good epidemiological evidence about the potential health risks of cell phones."  That sounded interesting since only the epidemiology can tell us whether there is any actual health problem, and so far it has not supported the fears that there is.  However, it is far from definitive.  After all, with an exposure this common, a tiny increase in probability among those exposed could still be a lot of cases, and with brain problems – not restricted to cancer – being as complicated as they are, figuring out what to look for is not easy.  So it was disappointing that the article only included the above sentence and, "Radiation levels also change depending on the phone type, the distance to the nearest cell phone tower and the number of people using phones in the same area."

The claim was that because there is so much heterogeneity of exposure, it prevents us from getting good epidemiologic evidence.  But it is actually in cases where there is so much heterogeneity that observational epidemiology of the real-world variety of exposures is particularly important.  The experiment that was reported today, like most experiments, looked at only one very specific exposure (and, in fact, one that was not very realistic), but it served as a "proof of concept" – a demonstration that the phones can have some effect.  But other experiments or narrow studies might have missed this effect if they had looked at a different very specific exposure.  Epidemiology that measures any health problems associated with a varying collection of different but closely related exposures (e.g., all mobile phone use) can provide a proof of concept that does not run so much risk of missing the effect.  With a study of the right type and sufficient quality, observational epidemiology can show whether at least some variations on the exposure are causing a problem, even if not all of them are.  The same data can then be mined to suggest which specific exposures seem to be more strongly associated.

Oh, and just for the record, I try to use a plug-in earphone/microphone when I have a long conversation on a mobile phone.  I would not be surprised if no important health risk is ever found, and it seems that any risk must be small or we would have noticed it already.  On the other hand, why be part of the experiment if you do not have to?  Besides, I just do not like the feeling of the side of my head heating up.

22 February 2011

Unhealthful News 53 - Methadone and the urge to never be positive about harm reduction

It was gratifying to read that the first sub-Saharan African methadone-based harm reduction program for injection heroin users had been introduced in Tanzania's main city, a port of call for shipments from Afghanistan to the West.  (It is a perfect trade arrangement: We send troops to Afghanistan and they send back something that also produces adamant feelings of both love and hate, depending on who you ask.)  Tanzania's heroin problem is not the biggest harm reduction target in the world, but every little bit of civilized behavior toward drug users makes the world a better place.

Most of the news story was a matter-of-fact presentation of the situation, and a report on the value of providing an alternative to needle sharing, which has led to a very high prevalence of HIV in the target population.  Interestingly, there was no mention of the easier and less-invasive response to that problem, needle exchanges.  It is very strange to report a story that focuses on needle sharing and not even note "a needle exchange program is being considered" or "needle exchange is currently politically infeasible in this country".  But the painful part of the report was this sentence:
Methadone is even more addictive than heroin, though it is given in oral doses meant to be small enough to produce no high.
First, that "no high" observation, presented in the article with a tone of "this is the only/right way to do it", is often cited as a barrier to harm reduction.  For my readers more familiar with tobacco, it is the equivalent of limiting product substitution for cigarettes to nicotine patches, which, unless you use several at once, leave most smokers bereft of the effects they want, even if some of the pain of abstinence is removed.  It is possible to give enough methadone that it produces enough of a high to attract more product switching, rather than restricting doses to unrewarding levels.  Moreover, methadone patients who are unwilling to forgo the high end up scoring heroin periodically or (if the distribution logistics make it possible) taking multiple days' doses at once (which likely leaves them wanting heroin on the off day).

But worse is "more addictive".  Does anyone who writes or takes seriously a claim like that even pause to ask "what would 'more addictive' even mean?"  Even without going on to the next logical step, asking "for that matter, what does 'addictive' even mean", it seems like this would evoke some skepticism.  Even to the extent that there is a well-defined phenomenon that is labeled "addiction", there is no associated quantification, no addicto-meter or even an index of degrees of addiction.  Thus there is no room for comparative statements.

Often the original source of such a statement was someone claiming merely that one behavior is more readily ceased than another.  This often means one takes place for more calendar time than another – e.g., alcohol is "more addictive" than crack cocaine because the typical "addict" continues to consume the drug for more years.  An alternative claim consists of counting up how often someone "tries to quit", which can be little more than declarations of intent, and using that as a measure.  This tends to be higher for drugs that can be used more casually – e.g., smoking is "more addictive" than heroin use because the average smoker declares "ok, that's it, I'm quitting" much more often.  Yet another quantification is how often someone starts the behavior again after stopping for long enough to get clean.  By that measure, once again, alcohol use and smoking will be "more addictive" than more ominous behaviors because once someone extricates himself from the culture surrounding use of a highly-life-altering drug it is a huge step to go back. 

I suspect that almost no one thinks any of these is what they are being told when they read "more addictive".  After all, why would someone use a sweeping term like "more addictive" when what they really mean is something much more specific?  Actually that is pretty easy to answer, but the point is that the phrase misleads people who think they know what they are being told.

So what does today's news reporter mean by "more addictive"?  I would guess that he has no idea.  What was the basis for the claim that he heard and uncritically repeated?  I am not sure, but either of the first two above is plausible -- methadone gets used for a long time, and is not much fun, so users probably want to quit all the time (however, I do not know, offhand, what the relevant statistics are).  But what seems clearer than what the phrase means is that the counter-intuitive claim about drinking methadone being more addictive than shooting heroin is political rhetoric disguised as a scientific statement, albeit a meaningless pseudo-scientific one:  however positive a report about harm reduction is, someone still sneaks in innuendo about the evils of any intervention that does not just force people to stop.  At least in this case, the politics seem to actually be sympathetic to the poor addicted user, rather than the disturbingly common disdain that demands users suffer until they quit.

21 February 2011

Unhealthful News 52 - The more things change

From the New York Times:
A new study has found that most of the time, health information on the Internet is hard to find, hard to read and often incorrect or incomplete, even on the best sites.

The study, described in today's Journal of the American Medical Association, is the broadest on the topic to date; it includes detailed questions on four diseases, rates the difficulty of finding information and the site's reading level, and assesses its accuracy and completeness. ….

"Too many sites are just trying to sell something," [one interviewed researcher] said, "and it is scary how they can make a bad site look good."….

Even though the study reviewed only top-rated sites, researchers still found that the sites gave complete and accurate information only 45 percent of the time, on average.

On average, reviewers found that Web sites had some information that contradicted other information on the same site and the same topic 53 percent of the time. There was wide variation in whether sources for posted information were given. On average, 65 percent of the sites gave both pieces of source information — authors and a date — but the spread was large, from none to 95 percent across the Web sites.
That story ran ten years ago.  I came across a clipping of it as I was going through a box of my old files (which I am slowly digitizing and getting rid of – dusty paper, yuck!).  Conveniently, it is still available if you want to see the rest of it.

A lot has improved since 2001.  The researchers then lamented not even being able to figure out where to look to find any substantive information.  Now the problem is more with information that is incomplete, politically biased, and especially not up to the current state of knowledge.  What appears now is much easier to read and more professionally packaged (not necessarily a good thing).  There is still wading through required, but now it is through a lot of content, rather than through the content-free sites that showed up in keyword search results.  And I suspect that the statistics from those last two paragraphs are not any better now.

At least back then, it was clear that simply believing whatever came up at the top of a search was not such a good epistemic strategy.

20 February 2011

Unhealthful News 51 - Argument is good for the soul (of science)

Three weeks ago, in the first of my Sunday "how to figure out who to believe" posts, I started an examination of how Chris Snowdon is much more credible in his criticism of "The Spirit Level" than the authors of that book are.  What I emphasized was that Snowdon criticized specific points that the original authors made, challenging their methods and analytic choices, and the original authors in their replies did not address his (and other critics') points.  Also the critics offered further rebuttals to the indirect responses by the original authors.  Just because someone argues for their criticism does not make it right, of course, but when those who are criticized never respond to credible substantive criticism, it tends to suggest that the criticism is right.  This is especially true when the original authors continue to argue their points, as they did with The Spirit Level (if they just become silent, perhaps feeling that their original words speak for themselves, then it becomes a bit trickier to interpret their failure to respond).

Last week I was giving testimony in the matter reported here.  (Just for the record, I would not have summarized my testimony the way that reporter did.  There is nothing specifically wrong with it, it is just not exactly what I would want to say.)  As part of my testimony, I wrote a critique of a government document, and was cross examined about that by a government attorney.  The questioning brought to my mind an interesting point of clarification about the principle I talked about three weeks ago.  (Indulge me with one more note for the record – sorry about these, but in legal cases it is a necessary hassle:  This proceeding was open to the public and I believe there will be a transcript on the public record, so there is nothing secret about this.  But just in case it might annoy someone if I did, I am not going to say anything of substance about the testimony, and will make up an example instead.)

One of the points I had made in my written opinion was, to paraphrase, that if you are going to legitimately review a contentious issue and justifiably declare that a particular possible conclusion is right, then you must present the best possible opposing case and rebut it.  To make a legitimate argument, you cannot just ignore the information and arguments on the other side, present just one side of the issue, and then declare a conclusion.  Instead, it is necessary to acknowledge the opposing claims and explain why they are wrong, are over-matched by your claims, or are otherwise unconvincing.

To take a very simple example, if you want to claim that you are the tallest person in the room (assume that gets you some kind of prize, so you have some incentive to make the claim), you cannot just say, "I am six-foot-five-inches" (or whatever that is in centimeters – let's say a thousand – I am really lousy with centimeters) and "someone just said 'I'll bet you are the tallest person in the room'".  Those tend to support the claim, but we are only hearing one side of the story.  What if you are omitting facts like "but that guy over there looks like he is six-foot-seven", or counterarguments like "as soon as someone said I was tallest, someone else said 'no he's not – look over there'"?  If so, then you have not fully represented the situation.  Because of this possible incomplete reporting, if you say nothing about the apparent heights of everyone else and nothing about whether anyone agreed with the claim about you, then a careful listener should become suspicious; you have not even told us who else might have something to say about your claim.  Moreover, if that careful listener had already heard someone say "no he's not – look over there", then there is clearly a problem.  Failure to even acknowledge that there is opposing evidence does not prove you are wrong, of course.  It might be that you are indeed tallest despite those claims to the contrary.  But if you were confident that your claim stood up(!) to scrutiny and all opposing arguments, why did you just ignore the other information?

When I was being cross examined, I discovered that I may not have communicated my point as effectively as I might have (or maybe the attorney was just pretending to misinterpret it – that is one of the hazards of legal matters:  unlike in science writing, where you trust that most people are trying to understand you and if they do not it is because you communicated badly, in legal matters sometimes people are looking for a way to misunderstand you).  I was asked what I found to be good about that government report, and when I responded that I was not quite sure how to answer the question, the attorney pointed out that I had said that we have an obligation to present the best points of the opposition and respond to them.  I realized that what I had written might be interpreted as some kind of "you should always find something nice to say about someone" type of thing.

I could see why that would create some confusion because what I actually do, and what I am suggesting others do, is present the best arguments that are fielded in opposition to your claims and then explain why they are wrong.  (This does not, of course, mean that you need to acknowledge everything that someone says that is right, just address the bits that are in contention.)  This is something that people who are not very good scientists (e.g., in my experience, most people who fancy themselves to be epidemiologists) react to as if it were unacceptably combative.  Quote someone else's claims and then pointedly argue that they are wrong?  How rude!  But only by citing the opposing claims can you (a) make it clear that you understand them but still disagree (as opposed to reaching a contrary conclusion simply because you do not get the point) and (b) present an efficient argument that they are wrong.

But wait, someone might say, you want to present the opposing arguments in order to explain why they are wrong?  Does that mean you are assuming that the opposing arguments are wrong?  Absolutely not.  I am not assuming they are wrong because I believe a particular conclusion; I reached that conclusion, in part, because I have considered the opposing arguments and concluded that they are wrong.  If I had concluded without considering those arguments, then my conclusion would not be worth very much.  But also, my conclusion would not be worth much if I could not explain why opposing arguments were wrong after having considered them.

In most legal proceedings, it is the duty of an expert witness to inform the primary audience (judge, jury, board, tribunal, etc.) of the truth to the best of your ability and understanding, rather than to be an advocate like an attorney, who often has a duty to do anything within the bounds of the law to advocate for a client's preferred position.  This is not a difficult role to step into, because it is basically the same as the ethical duty to society of a scientist, educator, or journalist.  So the same principles apply in each case.  There is no duty to be nice to those who argue views opposite to your own (that is, frankly, hard to do while efficiently presenting the science; in any case, scientists learn that harsh and blunt critiques are part of the scientific process, not really unkindness).  But there is a duty to give full credit to their points.

And therein lies what must be the best possible clue about who to believe in a matter where you are not a subject matter expert:  The path of scientific righteousness passes through a complete and honest appraisal of the best of the other paths.  Beware of the guide who tries to tell you that there are no other paths out there.

Update on a related point:  Two Sundays ago I wrote about a New York Times article that urged readers to avoid WebMD and instead read the Mayo Clinic website.  I pointed out some flaws in that reasoning.  In the letters to the editor, a spokesman for WebMD reiterated my point that there is no reason to believe that the Mayo website is less biased.  However, he tried to argue that we should be impressed that WebMD separates advertising from content, which is obviously a minimum standard, not really a response to the points that generated the original criticism.  He also pointed out that Mayo's website is also a commercial entity that accepts advertising, which says nothing about their credibility, unlike the pattern of disinformation that I cited.  The spokesman quite reasonably responded to the spurious arguments about who we should trust that appeared in the original article; interestingly, though, he did not respond to the core question about whether we should trust WebMD.

Rather more disappointing, though, was the next letter (same link) by a librarian who suggested that readers should forgo both of those sources and go to Medline Plus instead because it is "without bias".  Apparently her basis for claiming this is that it is produced by the U.S. government.  I trust I do not have to address either of these absurdities.

19 February 2011

Unhealthful News 50 - Reasonably good reporting about bad breast cancer treatment

Readers will know that I am of the opinion that health reporting is, well, a bit weak most of the time.  Yet there is something about reports of bad medical practice that seems to bring out the best in health news coverage.  I am not entirely sure why; perhaps the difference is that such stories are really more technology reporting (even if triggered by scientific research), which tends to be better than reporting on science.  Also, the research methods tend to be pretty straightforward.

I also notice that there is something about breast cancer that brings out the worst in medical decision making.  Just over a week ago came the discovery that surgeons treating breast cancer had been removing more bits of women's bodies than served any useful purpose.  This time the story is that more than three times too many women in the U.S. get surgical biopsies rather than the cheaper, easier, less damaging needle biopsies that are recommended in almost all cases. 

One article (which when reprinted seemed to account for most of the coverage of this news) suggested that the reason might be that the needle biopsy is typically performed by a radiologist, and surgeons do not like to lose the income by referring the patient to someone else (after all, when you are only making a mid-six-figure income, you really have to scrimp for every penny, even if it means needlessly disfiguring a few hundred trusting patients).  A less disturbing speculation was simply that physicians were not keeping up with the research on what was optimal practice, though this is only slightly less disturbing.  How difficult is it for someone to pick up on a simple guideline for what is one of his specialty's most common activities?  What else do they not know?

That, actually, is a point that was disappointingly overlooked.  What else are cancer surgeons getting wrong and what will be the next piece of news about breast cancer treatment being too aggressive?  (Will it be the realization that it is insane to harangue 40-year-old women to all get mammograms?  Nah, that one will probably be a while longer.)  On the other hand, the leading report on the topic did include a recognition that this is not just imperfect, but is outlandish and demands accountability.  And the article pointed out that the surgeons doing these unnecessary surgeries might be "informing" patients of their choices in a biased way that favors the inferior option.  This observation could be useful to the news-reading public, especially if they realize it generalizes.

So the reporting was generally good.  But, as evidenced by there basically being one major story about it, the coverage was not so good.  Unlike many of the fairly pointless, over-hyped stories I often report on, this seemed to be picked up only by the two major news outlets linked above and re-run in only a handful of other print outlets.  I could not find it in any broadcast media (I am sure it was there somewhere, but it certainly was not common or prominent).  I suppose it might run tomorrow, though the first print story had already appeared yesterday, so perhaps not.  Anyway, who would want to read a story (one of the twenty or so found in a Google news search) about these 300,000 unnecessary surgeries per year when you can instead read one of the (>500) stories about the 10,000 toddlers injured each year in cribs, or more precisely, mostly by climbing out of the cribs?

Don't worry.  If you would rather read about the latter, I plan to cover it next week. 

Finally, since this was not the most entertaining of UNs, I will direct anyone who follows the politics of tobacco harm reduction to today's post about Obama quitting smoking on the THR blog – it is hilarious.  (Paul Bergen wrote it, though I will claim credit for the "CSNa" bit.)

18 February 2011

Unhealthful News 49 - Good analysis of health information? Priceless.

Every time the concept of the amount to spend to save one statistical life makes the news there is much confusion and kerfuffle about it.  Usually this happens when a government adjusts the number it uses, as the U.S. government just did (I assume the process is similar elsewhere).  Perhaps you have not heard of the concept of "amount to spend to save one statistical life", and that is probably because the news stories about it usually use the shorthand "value of a life".  Therein lies much of the problem.

In fairness to the news reporters, that phrase is the shorthand used by those of us who teach, write about, and make use of such numbers.  But that is clearly not actually an accurate descriptor of the concept, and so the reporters are not off the hook:  It is the job of the press to translate our technical jargon when writing a news story.

The concept of a statistical life refers to a situation where lots of people face a small risk, such that you have no idea who might die from an exposure, but you can predict that someone will (though you might not even know who it was after they die).  For example, a government might choose to save some money by leaving a known-dangerous highway interchange without an upgrade.  It is possible to predict that over the next ten years, one person will die as a result of foregoing the upgrade who would not have died otherwise.  You do not know which of the millions of people who use the interchange it will be.

A value for that statistical life is chosen because the government must choose a figure such that any expenditure below it to save a statistical life is deemed worthwhile and anything above it is not.  I used the word "must" literally:  This is not a case of deciding whether we ought to make such a choice.  It is always possible to spend more to reduce risk or spend less and allow more risk, so a decision must be made, and "eliminate all risk" is not one of the options.  Any government decision about spending resources to reduce risk or improve health implicitly invokes such a quantitative decision.  Every decision to spend to reduce risk creates a floor (that is, it implicitly declares that it is worth at least that much to save a life) and every decision to forego spending creates a ceiling (however much it is worth, it is not worth what that would have cost).  Yes, it is possible to avoid setting a common number, letting those decisions be made ad hoc, but that just leads to a lot of such numbers (or ranges) which probably are mutually contradictory.  So we either have to make a rational decision to pick a common number or default into decisions that are based on some number anyway, but are not rational.

The reason that a common number is needed for rationality is easy to see.  Imagine that the government decides to spend $10 million to save a statistical life on the highway but only $1 million per life for food safety.  Or even worse, imagine that that range of numbers occurred within traffic safety alone: we were willing to pay for very expensive policies to fix major intersections, but were only willing to spend a tenth as much (per life saved) on signage and enforcement to protect residential neighborhoods.  Obviously we could shift some resources from the first expenditure in each example to the second, and thereby save more lives, and could even do that while spending less money/resources.  This does not help us know what the number should be, but it makes it pretty clear that it should be fairly similar across different policies.
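The reallocation argument is just arithmetic.  Here is a back-of-the-envelope sketch using the hypothetical figures from the example above (the $10 million and $1 million are illustrative, not real agency numbers):

```python
# Hypothetical, inconsistent per-life spending thresholds in two programs.
highway_cost_per_life = 10_000_000  # $ spent per statistical life saved
food_cost_per_life = 1_000_000      # $ spent per statistical life saved

# Shift $10M of budget from the highway program to food safety.
budget_shifted = 10_000_000

lives_lost_on_highway = budget_shifted / highway_cost_per_life  # 1.0 life
lives_saved_in_food = budget_shifted / food_cost_per_life       # 10.0 lives

net_lives_saved = lives_saved_in_food - lives_lost_on_highway
print(net_lives_saved)  # 9.0 more lives saved for the same total spending
```

The same total expenditure saves nine more statistical lives, which is why mutually contradictory per-life numbers cannot be rational.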

As for the choice of number, it has some grounding in empiricism, though it need not have much.  In theory, those in charge of such numbers try to base them on what people are willing to spend to (statistically) save their own life.  That is, how much money does someone demand to face a 0.1% chance of dying, or how much will they spend to avoid a 0.01% chance?  We look at such things as how much more someone gets paid for a dangerous job compared to one that is equally difficult but with less danger.  Or we look at how much extra people will spend for a safer car.  These involve lots of tricky statistics that try to separate out the premium demanded for risk, or the amount paid to avoid risk, from the other features of the job/car/etc.  This does not, of course, reflect what someone would spend to save himself from a high probability of death, like 100% or even 10% – in such cases we would face a major wealth constraint – most people would spend all they have and would be willing to spend more if they had it, but most people do not have very much.  But the wealth constraint is not binding for smaller numbers.
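Once the tricky statistics have isolated the risk premium, the final step is a one-line calculation.  A hedged sketch (the dollar figure and risk level here are made up for illustration; real estimates come from the kind of statistical work described above):

```python
# Suppose a worker demands an extra $900/year to accept an extra
# 1-in-10,000 annual chance of dying on the job (hypothetical numbers).
wage_premium = 900.0     # extra $ per year for the riskier job
extra_risk = 1 / 10_000  # extra annual probability of death

# Implied value of a statistical life: premium per unit of risk accepted.
implied_vsl = wage_premium / extra_risk
print(implied_vsl)  # roughly $9 million
```

If 10,000 such workers each take the riskier job, they collectively accept one expected death in exchange for $9 million in total premiums, which is the sense in which the figure is "what people are willing to spend".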

However, saying that the number is based on these estimates is partially a convenient fiction.  Those estimates are rough and, furthermore, it is not necessary for government to make the same choice that people do for themselves.  A government can choose any number it thinks appropriate, though to deviate too much from the empirical estimates of what people spend themselves would create some problems kind of like the inconsistencies described above. 

The new figure is $9.1 million, up from about $5 million a few years ago.  Most of us considered the old figure to be too low.

Notice how different all this is from a case where an identified person is in peril and we can expend resources to save them.  There is a concept known as the "duty to rescue" that says if we know the specific person we are trying to save, the statistical calculation no longer counts – we have a moral obligation to do whatever we can.  We will spend a limited amount on mine safety, but if some miners are lucky enough to survive a collapse and be trapped underground, we will spare no expense to get them out (though generally it will cost a lot less to rescue someone in that situation than the accepted values of a statistical life – even the Chilean rescue cost a small fraction of $9 million per person).  It is actually hard to imagine spending $9 million to save someone.  But it is possible to spend more than that per statistical life to rescue identified cancer patients, giving a treatment that is very expensive and has only a 1% chance of saving them.  (That opens up the very similar question of rational restrictions on medical spending, which I will not go into here.)

The concept obviously has nothing to do with the existential question, "what is a human life really worth?"  But confusing the two is what careless readers (aided by careless reporters) often seem to do.  This generates no end of silly complaints about the whole concept.  No one presumes to offer an official answer to the question of value.  But we must provide an answer to the statistical life question.  Answering the existential question with "priceless" seems to cause people who are not familiar with the material world to suggest that there be no limit on spending to save every statistical life, an idea that I trust readers of this blog will see the problems with.

That is not to say that there are not flaws in the concept.  The biggest is that not all saved lives are equal, which is a fatal(!) flaw for setting a single number.  At the extreme we obviously want to spend less to save frail, lonely 97-year-olds than to save healthy, productive, 32-year-old mothers of small children.  Right?  If you do not agree, think of it this way, which is exactly equivalent, but does not bait you into objecting:  Figure out how much you would spend to save the (statistical) life of a 97-year-old.  However much that is, would you not want to spend more to save 32-year-old mothers?

A partial solution is to replace "lives saved" (a rather odd concept if you think about it) with "life years saved".  Better still conceptually (though almost impossible to legitimately calculate, despite implicit claims you might see to the contrary) are "quality adjusted life years".  Even that is not quite right, though, because many people would probably agree that it is worth greater expenditure to save 17-year-olds than newborns, who have more life expectancy.  A death of either would be tragic, obviously, but the 17-year-old is more a part of social networks and generally has greater value to more people, and, to be blunt about it, has consumed a lot more of society's resources and is on the verge of being productive.  (Again, if you think that is a terrible thought, go through the exercise above, picking a number for the infant and then asking if you should not pick an even bigger number for the teen.)
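The gap between counting lives and counting life-years is, again, simple arithmetic.  A sketch with illustrative numbers (the remaining-year figures and the per-life-year value are hypothetical, not actuarial data):

```python
# Approximate remaining life expectancy for two statistical lives
# (illustrative numbers only).
remaining_years = {"97-year-old": 3, "32-year-old mother": 50}

# Hypothetical value per life-year saved.
value_per_life_year = 150_000

# Under a per-life rule both rescues are "worth" the same.  Under a
# per-life-year rule the implied worthwhile spending differs sharply:
implied_spending = {who: years * value_per_life_year
                    for who, years in remaining_years.items()}
print(implied_spending)
# {'97-year-old': 450000, '32-year-old mother': 7500000}
```

Same number of lives saved, but more than a sixteen-fold difference in implied spending, which is exactly the intuition the exercise in the text is meant to elicit.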

Another complication that gets overlooked is that the same number does not have to be used for all sources of expenditure.  I simplified what I wrote above by talking about direct government spending, but the main role of the figure is to decide when a life-saving regulation is worth the resources it will cost.  But there is no reason why the government might not decide to make those two numbers different.  A cash-strapped government (perhaps one under attack by oligarchs who have tricked people into believing naive anti-government propaganda that demands cuts to childhood nutrition programs while cutting taxes on the rich) might decide it can only spend a few million per statistical life saved by government programs directly.  But it could still demand that profit-making companies that are creating risks for people spend a lot more than that to reduce those risks.

Further differences are possible and, indeed, seem very appropriate, and they tend to sneak through though they are seldom formally proposed.  Perhaps a polluter should be required to spend more to reduce the risk to innocent bystanders than a food company should to protect its customers.  Perhaps an auto maker should be required to spend more to protect innocent bystanders (from pollution or hazards a vehicle creates for other drivers) than to protect the driver.  In theory, of course, drivers or food buyers could choose their own level of risk, paying a bit more or less based on their own willingness to spend to save their own statistical life, but for obvious reasons this is not practical.

Another problem with the concept is that it still gives some harm away for free, as it were.  That is, if a company makes a product that kills a few people, but it is allowed to do that because to reduce the risk any more would be more expensive than the guidelines call for, then the company saves the money it would have cost to save those lives, but the people at risk do not get the money (except in the sense that resources are not consumed so all of society is a bit richer – that wealth accrues to the company and its customers).  This is not so bothersome when the person at risk is the customer, such as the "how strong should the roofs of cars be made" example that has been widely reported.  The customer is the one at risk from not spending more on safety features and is also the one who gets a cheaper vehicle.  It is more bothersome when the hazard is air pollution and those who are put at risk get nothing in exchange for regulators only demanding so much expenditure to reduce the risk.  Something more is demanded there, and it is seldom offered.

In short (and I write this to head off certain comments that this topic inevitably generates), there are limits to how much we can spend to reduce risk.  There are limits to how much regulators can demand to force companies to reduce risk (if there is no limit, no one will make anything).  It is good to have a number for those limits that is consistent within types of expenditure and fairly similar across types.  It is good to base it on rough empirical estimates of what people spend to reduce their own risk.  It is not so good to call it "value of a life", but we are kind of stuck with that.

17 February 2011

Unhealthful News 48 - It turns out that almost everyone disdains giving people information

A couple of years ago, New York City started requiring chain restaurants to post the calories for their food items.  It is not a perfect rule in its details but it seems like something no one could seriously object to.  Unlike New York's efforts to ban hydrogenated oils or force reductions in salt, this does not take away anyone's options.  Even if it does not have much effect it has little downside.  And it definitely has some effect – I have changed my ordering choices several times when visiting New York, with a moment of "wow, really, that many calories? I guess I really don't need that". 

Yes, perhaps the results are limited in terms of making people become less overweight.  But where is the harm in educating and informing?  And it inevitably does lead to some healthier choices.  The effects were very limited in a newly published study that looked at:
A total of 349 children and adolescents aged 1–17 years who visited [McDonald's, Burger King, Wendy's and Kentucky Fried Chicken] with their parents (69%) or alone (31%) before or after labeling was introduced. In total, 90% were from racial or ethnic minority groups.
They reported the result, "We found no statistically significant differences in calories purchased before and after labeling."  I will set aside for another time the fact that "no statistically significant" is not a result (actual measures are a result, reporting only statistical significance is a result mostly in the sense that it is a result of bad epidemiology education).  Instead, I want to note that you could probably not find a subpopulation that was less likely to reduce their calories in response to the labels.  First, we have no idea how many of them should have been reducing their calories.  Moreover, I know that when I was 17, I would have probably seen more calories as a way to get more food value from my limited budget; such an effect might even cancel out those who reacted the "right" way.  Obviously those particular establishments do not exactly attract the most health-conscious consumer; either they eat badly in general or are ignoring health for the moment.  At these restaurants, other than leaving out the fries or ordering smaller sizes (which do not take numbers to recognize as calorie reducing), there is really not a lot of room to maneuver with respect to calories. 

To interpret this study as showing that the calorie labels are not doing any good is an example of the Mythbusters fallacy.  It is not clear that the study authors made any such mistake.  So the question is why this single equivocal study, of a population that was far less likely than average to respond to the information and perhaps had no need to cut calories, translated into dramatic, general, and definitive headlines like:

"Calorie Labels Have 'No Effect' on Food Choices" (Associated Press)
"Study confirms calorie listings don’t have much bite" (American Council on Science and Health)
"Calorie labelling has no effect on food choices" (The Telegraph)
"Calorie Counts on Menus: Apparently, Nobody Cares" (Time)
In fairness, some other news outlets did qualify their headlines by reporting that the result applied only to kids (though not noting the focus on poorer kids who were eating at the most junk-foody restaurants).  Others, though, compounded the error, like the New York Times's article that included a photo of calorie counts on Starbucks pastries.  Starbucks is the opposite of what was studied, perhaps the best-case scenario for calorie labels.  It attracts customers who are more likely to understand and care, and offers some items that are very highly caloric but also choices that are less so.  Indeed, Starbucks is one place I have found the labels to be quite informative.  Could it be that something fails to have a measurable effect on one subpopulation, but still has a benefit for another?  Apparently it would just be too difficult for someone writing a news story to try to explain that (or perhaps even understand it).

It was interesting that those media outlets that usually rush to report any claim about the increasing scourge of obesity were so quick to dismiss the value of this intervention.  Perhaps they could at least recognize that the information could trickle through and have some overall effect, even if one observation about someone's fast food order did not reveal that.  Maybe the reminder even caused the study subjects to make up for their splurge with a bit less food that very same day.  Even more interesting is that among those who seemed most eager to over-conclude from this study, and then conclude that the labels are pointless, were media outlets whose politics makes them quick to condemn aggressive government interventions and to say that consumers should be able to make their own informed choices.  Apparently what they really mean is that companies should be free to sell what they want and consumers should be required to make their choices without being informed.

Maybe this was a perfect storm of those with different political biases wanting to spin the same tale of failure.  Those who object to all regulation jumped on board to call for an end to the labeling regulation.  Those who favor action, either more aggressive education or choice-restricting interventions, saw an opportunity to demand that something be done beyond providing information.  Either way, the actual health science, a modest project with a fairly boring and unsurprising result that is nothing more or less than one useful bit of our growing overall knowledge about the topic, was turned into headline news because that offered the chance to editorialize about the topic under the guise of reporting.

16 February 2011

Unhealthful News 47 - Two stories of the problems with randomized trials

One simple lesson I have focused on in a few UNs is to not fall into the trap of believing that randomized trials (RCTs, experiments on people) are always the best way of gathering information.  The usual mantra is "you cannot always do them" or "they would be unethical for some things" and so they are not always possible but, the claim goes, "they are always best if you can do them".

A news item a few days ago emphasizes the ethical problem with RCTs, even when you can do them.  To try to determine (as they eventually confirmed) whether prenatal surgery for spina bifida really is better than surgery shortly after birth (it was generally believed that the former has better outcomes but creates some risk of triggering premature birth, and apparently they were wondering whether it was enough better), the small number of American surgical centers that are capable of doing the former basically conspired to take away the option except as part of the RCT they were doing.  The article pointed out the heartbreak of a mother whose fetus had been diagnosed with spina bifida entering the trial and finding out she was randomized to the believed-inferior option.

Now this is not quite as nefarious as it sounds.  A conspiracy to deny a treatment to everyone except those in a trial is roughly equivalent to drug regulators allowing the use of a new drug only as part of a trial.  However, this was a case where the new and presumed better treatment had been available for two decades and was then taken away – and there is something significant that it was taken away by a conspiracy of private actors rather than a government agency that, for better or worse, we have granted police powers to.  (There is also the fact that a baby being born into a sufficiently rich family would presumably be able to get the better treatment outside the U.S., while those who rely on insurance or public funding are forced into the experiment; for the case of a new drug, the need to divert a controlled substance raises the bar for buying-out, at least somewhat.) 

There is one of those "yeah, sure – you just keep believing that if it lets you get through the day"-type myths, that if something is being studied in a trial then everyone's beliefs are supposed to be in a state of "equipoise" about the options, a fancy way of saying they have no idea which is better.  But those who claim this are generally taking refuge in the error of equating "we are not as sure as we would like about which is better" (true, or we would not be doing the experiment) with "we have no strong belief about which is better" (almost always false, especially since the experiments are often motivated by the belief that a new therapy is better and we want to confirm that).  Typically the "not always ethical to do a RCT" claim refers to cases where one of the exposures is actively bad for people, as it would be if we were studying the effects of smoking.  But the quasi-ethicists who govern these matters seem to not understand that assigning people to a believed-inferior treatment (even if both treatments are better than nothing) is functionally no different from assigning something harmful.  This story exemplifies that.

It is still possible to argue that, in effect, forcing some people to take the apparently inferior treatment for a while, in order to become sufficiently sure which is better, is an acceptable cost.  I suspect that most of us would agree with that characterization much of the time, that the value of the knowledge justifies using a group of people who have the bad luck to need the treatment as study subjects.  Nevertheless, the ethical concern about hurting some people for the greater good is not limited to those cases that are considered ethically unacceptable – virtually all RCTs suffer from it.

A more nuts-and-bolts problem, of the type that I have tried to explain, is reported in this story from this morning.  A review of studies found that many of them excluded older people, even though older people use far more medical care.  A closer look at the study, Examining the Evidence: A Systematic Review of the Inclusion and Analysis of Older Adults in Randomized Controlled Trials, reveals that the authors (the senior of whom, Rod Hayward, was one of my mentors and taught me much about evidence-based medicine and is one of the best thinkers in the field) actually seem to be more concerned about the general lack of research protocols that provide information that is optimal for making decisions about older people. These include failure to research diseases and outcomes that might be more relevant to older people, and not doing age-specific breakdowns to see if study results might only reflect the experience of the younger majority among the study subjects.  It appears that the news reporter focused on just the exclusions as either a hook or as a way to dumb-down the story.

All of the concerns seem valid.  The lack of breaking out subgroups is a problem with a lot of medical research (it is not so common with observational epidemiology); there is a strange obsession about reporting results by race, but amazingly little attention to the obviously more important characteristic, age.  As for lack of research on the right topics, there is much to complain about in terms of what research is prioritized.  Both of these are made worse for RCTs, where cost and logistics limit the number of studies that can be done (thus many topics are not covered) and the number of participants in each study (thus subgroup analysis is less informative because the subgroups get so small).

The point about trials excluding older people, either explicitly or because requirements that subjects be in good health and free-living are likely to disproportionately exclude them, is a problem more specific to experiments.  Some people are easier to experiment on than others, because they are less likely to suffer as badly from side effects, because they are more similar to one another, or because they are easier to deal with.  Such problems do not haunt observational research quite so much:  Subjects do not have to be as cooperative or functional, and any side effects would have happened anyway, so you do not have to endeavor to exclude people because of them.  Also, you can usually collect more data, and so having a homogeneous population, as a substitute for having enough data to analyze a mixed population, is not so necessary.

The bottom line remains that RCTs have advantages and disadvantages.  Perhaps observational evidence on spina bifida outcomes was so hopelessly confounded that it genuinely was difficult to be confident about what was better.  When I worked on similar issues (research about both birth outcomes and high-tech, high-skill, rare medical interventions) we saw lots of situations where this was the case.  Among the biggest problems (creating bias to unknown degrees) are that the highest-skilled clinicians tend to gravitate to the most exciting new technologies, and the sickest people often get the more aggressive treatment.  On the other hand, a lot of trials that are done end up telling us nothing much of value compared to the observational evidence we already have.  Sometimes they simply provide too little data, but often they provide a nice solid answer to a question we were not actually very interested in, like "does this treatment for a disease that is usually geriatric provide better outcomes for the rare 50-year-olds who get the disease?"