EP-ology by Carl V. Phillips: Smokeless Tobacco Junk Science, the Original Winn Sin

Following on the background from my previous post, this is further information about what Winn et al. avoided telling the world in their 1981 paper, thereby increasing how misleading the oft-quoted 50-fold risk estimate is. (Readers encountering this in a newest-first ordering of posts may want to skip down and read the next (i.e., earlier) post first. I thought I should break it up because the entire analysis is so long.)

Much of the bias is what I have labeled publication bias in situ (PBIS): authors making a biased choice, based on the results they prefer, of what to report from their data. This primarily takes the form of running many statistical analyses of a dataset (each of which will produce a different result) and reporting only the results they like, without acknowledging that other analyses were even done. This is a problem that calls a huge portion of the epidemiologic literature into question, though very few people seem to be aware of it.

I should point out to my readers that I have reanalyzed this data several times and written about the results, but for purposes that were not quite the same as those of the present analysis and so were not optimized for this purpose. I did not re-run any statistics or otherwise re-access the original data for this. So if someone wants me to turn this into a more formal publication for another forum, I would want to analyze the data again to be able to provide more precise reports of its contents.

1. Winn et al. failed to communicate just how intense the subjects’ exposures were. This is not a biased analysis of the data, per se, but is an important omission that is likely to mislead readers. Winn collected data on how frequently the exposed subjects used snuff and, as I recall, the median was about 21 hours per day. That means that most of them used it basically all the time – asleep, eating, etc. The subjects started using at early ages, some before they were ten years old. Not in the data, though undoubtedly known to Winn from her research, was that Appalachian women (the study was restricted to women) used powdered snuff by rubbing it all over the surface of all of their gums. These unreported details of the exposure mean that any risk for cancer of the gums and cancer of the inner surface of the cheek that touches them (the particular rare forms of oral cancer, referred to above, that showed the elevated risk in the reported results) would thus be elevated compared to those for a typical ST user, even apart from the product being different. Failure to report these obviously important facts appears intended to mislead the reader into thinking that the results are more generalizable than they really are.

2. The study’s main result is not directly relevant to the 50-fold figure, but it provides important perspective that should be considered.

The study is often cited as showing a 4.2-fold increase in risk from using smokeless tobacco. This, of course, only applies to the particular form and intensity of usage noted above. But this risk is also only for the non-black women who reported ever using snuff. The much lower result for black women is almost never mentioned. This does not seem to be because of the oft-quoted observation that the U.S. government did not care about black people, but rather because it makes the result seem larger than it actually was. It turns out that if this unjustified exclusion is not made (i.e., no race is left out of the “real” result), the risk is below 4. It remains above 3.5 because of the smaller number of blacks in the sample, so when someone does not exaggerate the precision and correctly rounds the result to a 4-fold increase, it actually still represents the statistics. As an interesting aside, the 4.2 is often reported as being for “white” women (as the authors themselves misreported it), but a look at the data shows that it is actually for non-black women, including Native Americans along with whites, a very odd unexplained choice that further suggests trying to cook the data.

More significant for present purposes is the definition of exposure that you may not have even noticed when reading the previous paragraph: Subjects who reported ever using snuff were included among the exposed. Anyone doing epidemiology today would recognize that some distinction needs to be made between long-ago former users and recent users, and would recognize that the failure to do so in this study – while perhaps understandable given the primitive state of epidemiology in the 1970s – makes the results suspect. This is especially bad given that many of the “exposed” used very little snuff in their lives and quit decades before the study. Winn et al. could have separated out former from current users, and made even finer divisions than that – they had the data to do so.

If exposure measurement were perfect, inclusion of former users in the exposed group would likely cause an underestimate of any risk, since long-quit former users would have the lower risk of a non-user and so would dilute the average risk for users. But this rule of thumb – that the bias would be downward – is misleading in this case for two reasons. One requires some context, and so is described below. The other is the well-known problem of recall bias: Someone who is suffering from OC will naturally seek an explanation for the disease, thinking hard about possible relevant exposures, wanting to recall them rather than hide them. By contrast, a healthy person who is chosen at random for the comparison group has little incentive to try to recall or admit to long-past substance use. Similarly for recent decedents (part of the sample compared women who had recently died from oral cancer to those who had recently died from other causes), the relatives of those who suffered from oral cancer were more likely to have recently heard the subject recount a story of long-past oral product use, during the time of her final illness, than were the relatives of those who died suddenly of a heart attack. Thus, including former users tends to dramatically increase this bias, which overstates the association of the exposure and the outcome (it is possible that current users or relatives of those who used until their deaths could also deny the exposure, of course, creating measurement error even for current users, but misreporting former use does not require such blatant conscious lying).
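The dilution half of that argument (the part that assumes perfect exposure measurement) is simple arithmetic, and a minimal sketch may help. The function name and the input values are illustrative assumptions, not figures from the Winn data; the 4.0 is just a stand-in for a true relative risk among current users. The sketch says nothing about recall bias, which pushes in the opposite direction.

```python
def ever_user_relative_risk(rr_current, fraction_long_quit):
    """Average relative risk among "ever users" under perfect measurement.

    Assumes (hypothetically) that long-quit former users have the same
    risk as never-users (RR = 1) and everyone else has rr_current.
    """
    return fraction_long_quit * 1.0 + (1.0 - fraction_long_quit) * rr_current


# Hypothetical inputs: a true RR of 4.0 for current users, and varying
# shares of long-quit former users mixed into the "exposed" group.
for fraction in (0.0, 0.25, 0.5):
    print(fraction, ever_user_relative_risk(4.0, fraction))
```

The more long-quit former users are counted as exposed, the closer the averaged figure moves toward 1, which is the downward bias the rule of thumb predicts when nothing else is going on.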

Returning to the relative risk of about 4, if this is the real main result, whatever its upward biases, why does the much larger figure of 50 even exist? There actually is an explanation that can be attributed to a legitimate honest analysis, though one that I need to explain here because the authors did not do so in their overly-abbreviated paper and Winn did not include this analysis in her much more complete dissertation. (The latter observation is interesting in itself: Though I can come up with an honest explanation for the existence of this analysis, it is a bit suspicious that Winn, and her advisors at what was one of the best epidemiology programs, did not feel the analysis was worth including in her dissertation, but the political actors who hired her after graduation wanted to add it.)

The analysis that included the 50-fold result appears primarily intended to help support the hypothesis that the observed actual risk (i.e., the 4) represents a real causal relationship rather than confounding (which is jargon for: there was something about those who used snuff that was different from those who did not – other than the use of snuff – and those other differences rather than the snuff were what caused the difference in disease outcomes). Sometimes a behavioral risk reflects a different personality type that is just generally less healthy for a variety of reasons, and so a naive analysis might suggest that the behavior is causing the risk when it really is not. For example, as I have previously argued, the Henley/American Cancer Society reports that are sometimes thought to show that there are disease risks from ST use really just show that among relatively wealthy, mostly middle-class, socially connected people in the U.S. in the mid-20th century, those who used ST were different from those who did not use tobacco – big surprise! ST users had a higher risk for a wide variety of diseases, including violence/trauma and cirrhosis, reflecting diseases that were obviously not caused by ST, thus suggesting that all of the other elevated risks were also not caused by ST. Indeed, when even very poor control variables were used to partially adjust for these differences, most of the apparent risk disappeared. The nature of the Winn study means that risks of diseases other than OC could not be measured, but a similar analysis could be done by comparing the cancers that were at the specific anatomical place where the snuff was used to the much more common cancers elsewhere in the mouth.

Another observation that can help support a conclusion of causation rather than confounding is the “dose response” trend, such that people who are most exposed have greatest risk, people with less exposure have some elevated risk but not as much, and those who are unexposed have lowest risk. While it is quite possible for the effect of confounding to also have such a dose-response trend, it is often reassuring to see some trend before concluding causation. Thus, the analysis that produced the 50, which separated the rare proximate cancers and looked at dose-response (albeit dishonestly – see below), could be seen as a reasonable attempt to test the claim of causation, or at least it could have been seen as honest and reasonable if the authors did not try to interpret the results as having any meaning beyond that.

[Aside: It is important to not overstate the value of these observations in discriminating causation from confounding or errors. Sometimes people who do not understand the nature of scientific inference refer to these considerations as “causal criteria”, but there is actually no such thing as causal criteria, let alone a method for proving an association is causal. But that is another story.]

3. Cutpoint bias: The choice of points at which to divide up continuous data (like the number of years someone used snuff) into categories offers great opportunities for biased analysis. Since there is no necessarily right way to do this – one method looks just as good as any other if done ad hoc (i.e., without reference to a stated standard or previously used methods, as was the case in the Winn paper) – it is impossible for readers to see that the choice was biased. We recently analyzed this problem and proposed a solution, though implementing it requires that the authors desire to report honest results, rather than to take advantage of the opportunity to cook up results the authors prefer, or that the editors/readership understand the problem well enough to demand it.

If the reader has access to the data, however, it is possible to assess what the result would have been if other cutpoints had been used. In the case of the Winn data, most other choices that could have been made produce a dramatically less clear dose-response trend, and a much smaller largest relative risk (i.e., less than 50). The authors did not quite choose the most biased results possible; there are (just) a few ways to get (slightly) more extreme results from the data by choosing different cutpoints. But authors who are trying to bias their results this way need to make their methods appear unbiased and so need to restrict themselves to round numbers and other characteristics that make the cutpoints seem like a “natural” choice, and given this constraint they did about as “well” as they could.
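For readers who want to see what such a check looks like mechanically, here is a minimal sketch of a cutpoint scan. Everything in it is an assumption for illustration: the data are synthetic (randomly generated, not the Winn data), the function names are made up, and the 0.5 added to each cell is a standard continuity correction used here only so that sparse cells do not cause division by zero. It is not the solution referred to above, just a demonstration of how much the top-category figure can move with the cutpoint.

```python
import random


def top_category_or(records, cutpoint):
    """Odds ratio of the top category (years >= cutpoint) vs. never-users.

    records: list of (years_of_use, is_case); years_of_use == 0 is a never-user.
    Adds 0.5 to every cell so that empty cells do not divide by zero.
    """
    a = sum(1 for y, case in records if y >= cutpoint and case) + 0.5      # top-category cases
    b = sum(1 for y, case in records if y >= cutpoint and not case) + 0.5  # top-category noncases
    c = sum(1 for y, case in records if y == 0 and case) + 0.5             # unexposed cases
    d = sum(1 for y, case in records if y == 0 and not case) + 0.5         # unexposed noncases
    return (a * d) / (b * c)


def scan_cutpoints(records, cutpoints):
    """Top-category odds ratio for every candidate cutpoint."""
    return {cp: top_category_or(records, cp) for cp in cutpoints}


def make_synthetic_records(n=400, seed=0):
    """Synthetic subjects: about half never-users, the rest with 1 to 60 years of use."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        years = 0 if rng.random() < 0.5 else rng.randint(1, 60)
        p_case = 0.2 + 0.005 * years  # hypothetical mild dose-response
        records.append((years, rng.random() < p_case))
    return records


records = make_synthetic_records()
results = scan_cutpoints(records, cutpoints=range(10, 60, 5))

# An author engaged in PBIS reports only this one and never mentions the rest.
reported = max(results, key=results.get)
for cp, value in results.items():
    flag = "  <- the one that gets reported" if cp == reported else ""
    print(f"top category = {cp}+ years: OR = {value:.2f}{flag}")
```

Restricting the candidates to multiples of five mimics the constraint described above: the chosen cutpoint has to look like a natural round number.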

4. Choice of dose measurement: Someone reading up to this point may not have questioned the reference to dosage defined in terms of counting up the years during which the subject consumed any snuff. Presumably Winn et al. were counting on exactly the same failure to notice the oddity. It is almost unheard of to measure dosage for an exposure like ST this way instead of in terms of total consumption (e.g., the “pack years” measurement for cigarette dosage). Even intensity of current consumption (quantity or time used per day) is probably a better measure of exposure than is a measure that conflates someone who used a few times a year with someone who used large quantities for 24 hours a day for the same period. If years of use were the only proxy for total exposure that the authors had, it might be worth analyzing the data with this as a proxy for dose (with a clear statement that they recognize it is not optimal). What readers who have not seen the data would not know is that Winn had data on intensity of consumption and total consumption, though this is not mentioned in either her dissertation (which did not include any analyses based on dosage, and so did not have need to mention it, though it might have been useful descriptive information) or the Winn et al. paper.

I bet that you can see where this is going. Yes, you guessed right: If you use the more legitimate measure of exposure dosage, it is impossible to get a dose-response trend that is so clear or such a big number at the top, even playing with the cutpoints. It is possible that there is an honest explanation for the choice that the authors made, but since the paper does not even acknowledge that this choice was made, let alone defend it, it seems like a stretch.
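To make the difference between the two dose measures concrete, here is a small sketch with two made-up subjects. The subjects, the function names, and the "cumulative hours" measure (a rough analog of pack-years) are assumptions for illustration only; the 21 hours per day simply echoes the recalled median mentioned earlier.

```python
def years_of_use(start_age, stop_age):
    """The duration-only measure: how many years the subject used at all."""
    return stop_age - start_age


def cumulative_hours(start_age, stop_age, hours_per_day):
    """A hypothetical total-consumption measure, analogous to pack-years."""
    return years_of_use(start_age, stop_age) * 365 * hours_per_day


# Two hypothetical subjects (not from the Winn data).
occasional = {"start_age": 20, "stop_age": 50, "hours_per_day": 0.01}
heavy = {"start_age": 20, "stop_age": 50, "hours_per_day": 21}

for label, subject in (("occasional", occasional), ("heavy", heavy)):
    print(
        label,
        years_of_use(subject["start_age"], subject["stop_age"]),
        cumulative_hours(**subject),
    )
```

Both subjects land in the same years-of-use category, even though their total exposure differs by a factor of a couple of thousand.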

5. Trimming away the unexposed OC cases: Most readers of epidemiology who have a bit of knowledge about how to critically analyze results think to look at the number of exposed cases. This number usually appears in the text, though seldom in the press release. When it is quite small, as it often is, there is greater potential for certain biases and definitely greater instability. But for the 50-fold result it is actually the low number of unexposed cases that drives the result. But wait, you might ask, is not the number of unexposed cases the same for the main analysis as the one that looks at the highest exposure level? It turns out not, for several reasons. The first reason is legitimate – recall that this analysis is restricted to the rare subset of cancers, which eliminates a lot of the cases.

But then Winn et al. decided to eliminate all of the deceased subjects, making some vague allusion to the possibility that relatives who were asked to recall periods of use would be less able to recall it than living subjects (this allusion is made especially vague since the very brief paragraph that includes it has prose editing errors that render it nonsense). This concern might be valid, but it seems like it is less of a problem than is throwing out half of the already rather sparse data, and so conveniently removing all but 2 of the unexposed cases. Because of this, the trimmed data makes it appear that almost no one who does not use snuff gets these particular cancers, and thus the relative risk is dramatically elevated.

If the authors had been genuinely concerned with recall bias, they would have not included long-past former users in the exposed category, as noted above, since this dramatically increases the chance of recall bias. Indeed, this is where the exposure definition really gives some misleading results. Presumably most readers would be bothered if they had been told that subjects who had not used snuff for half a century before the study, and who had only used it briefly before that, were classified as exposed rather than unexposed. As it turns out, if you change the classification for such individuals, the number of unexposed cases doubles and so the relative risk drops by about half. Seriously. If we merely do not count someone as “exposed” for purposes of the time of the study if she only used for a couple of years in her youth – half a century before she developed OC and well before the start of the Great Depression just to put that in perspective – then the 50 drops to 20. Further tightening the definition of exposure causes further changes. Even that 20 is still an exaggeration, but it is a much smaller exaggeration.
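The reason a handful of reclassified subjects can move the number so much is visible in the cross-product formula for the odds ratio (the quantity a case-control study uses to estimate relative risk): the count of unexposed cases sits in the denominator. The sketch below uses hypothetical counts chosen only so that the starting value comes out at 50; apart from the 2 unexposed cases mentioned above, none of them come from the Winn data, so the output illustrates the mechanism rather than reproducing the 50-to-20 change.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts (NOT the Winn data), picked so the trimmed table gives 50.
# Only the "2 unexposed cases" figure comes from the text.
trimmed = odds_ratio(
    exposed_cases=20, exposed_noncases=4, unexposed_cases=2, unexposed_noncases=20
)

# Reclassify two long-ago, brief users as unexposed cases. In a dose-response
# analysis they come out of the LOW exposure category, so the top-category
# counts are untouched and only the unexposed-case cell changes.
reclassified = odds_ratio(
    exposed_cases=20, exposed_noncases=4, unexposed_cases=4, unexposed_noncases=20
)

print(trimmed, reclassified)  # prints: 50.0 25.0
```

With only 2 subjects in the cell, every single reclassification changes the denominator by half of its original value, which is why the estimate is so unstable.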

Note that this is a case where the recall bias does not create a downward bias in the risk estimate, as the naive rule of thumb says it will. There will be some “exposed” noncases that had not used for decades also, who will thus increase the number of unexposed noncases if classified correctly, which would slightly increase the relative risk. But in the dose-response analysis, these are stripped out of the low exposure category (a long-former user could not have used for 50 years), which does not affect this estimate at all, and added to the never-user noncases, which had not been trimmed down to 2, and so the effect is not as great. If they were stripped out of the exposure group that was being studied (as would be the case if only exposed-vs-unexposed were being considered), this would increase the relative risk estimate, balancing the decrease, but that is not what happens in this case. Moreover, I believe that it turns out in this particular case that the subjects who were most absurd to count as exposed happened to be cases (perhaps because of that recall bias problem already noted).

In summary, a review of the data that was used to produce the Winn et al. paper reveals several ways in which the analysis was biased to exaggerate the statistics that were reported. Such methods of biasing the data in ways that can easily be hidden from most readers appear to be fairly common in analyses by activists, and this PBIS calls into question large parts of the epidemiology literature. It is difficult to even count all of the reasons why the 50-fold risk number has no place in an honest discussion of THR or the risks from ST more generally. Given that many of these reasons (from Part 1 of this analysis) are well-documented or simply obvious, anyone who presents that number is either trying to lie or is so ignorant of the science that they really should not pretend otherwise. As Part 2 of the analysis shows, those accusations go doubly for Winn, the NCI, and others involved with the analysis, since unlike most readers they know the additional hidden dirty secrets that can be found in the data.

22 June 2010

Smokeless Tobacco Junk Science, the Original Winn Sin – Part 2, what the data shows
