Comments on EP-ology by Carl V. Phillips: "Quick statistics lesson - difference of two proportions and limits of frequentist stats"

Carl V Phillips (2012-03-24, 11:42):

Thanks for the reply. To clarify, what I meant is that they never answer a probability question about the real world. They answer questions of "how likely would...?" (the subjunctive mood reflecting the hypothetical nature of the question) about some mathematical construct, as in "if X were true, how likely would Y be?", but not "is" questions that refer to reality. Even such an apparently mathematical question as "how likely is it that a fair die rolls a 1?" can only be answered with subjective (Bayesian) analysis; only hypothetical questions about non-real constructs can be answered with the nice clean math.

I definitely agree that ignoring random error in reporting or analysis is bad. But reporting just a p-value, as is often done, can be just as bad in its own way: it is not very informative, and to most readers it implies things that are not true. Reporting a CI is much better, so long as readers recognize that it is just a rough report of about how much random error there is, and that the exact numbers are meaningless.

But while we are at it, the "how likely is it...?" hypothetical (which is better phrased "how likely would it be...?", by the way) is incomplete in important ways that are often overlooked. The question also needs to include caveats that there is no measurement error (or else be rephrased to refer to the measured rather than the true values) and that selection is unbiased.
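[Editor's illustration: the kind of confidence interval discussed above can be sketched in a few lines. The counts below are made up for illustration, not taken from the Oregon data, and the interval is the standard Wald approximation.]

```python
import math

def diff_prop_ci(x1, n1, x2, n2, z=1.96):
    """Wald confidence interval for p1 - p2 (z = 1.96 gives ~95% coverage)."""
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - z * se, d + z * se

# hypothetical counts: 120/400 in one group, 90/400 in the other
lo, hi = diff_prop_ci(120, 400, 90, 400)
print(f"95% CI for the difference: ({lo:.3f}, {hi:.3f})")
```

This also illustrates the point about the exact numbers being meaningless: a slightly different method (a score interval, say) would move the endpoints a bit, while the rough size of the random error would look the same.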
Furthermore -- quite important and always ignored -- the question needs to refer to the statistical model: "analyzing the data using only the particular model that was used."

But if multiple models were tried and only the one the researchers liked best was reported (less likely to have happened for the simple proportion from Oregon, but possible even there), then the "using the particular model" error statistics are misleading. Instead, the relevant error statistic is an incredibly complicated (and never reported) one described by "if all of the following models were tried and the one that was the most X was reported." The error statistics that are reported are based on the "using this particular model" fiction, and are thus incorrect. (I have written several papers about this.)

Rory Morrison (2012-03-24, 07:54):

"You see, frequentist statistics never answer the question 'how likely is...?'"

Surely they do answer the question "how likely is... [something]"; it's just not the something that people intuitively expect (i.e. the probability of an outcome), which is why it gets mangled so much.

In your difference-of-proportions example, the p-value of a test of differences is a measure of "how likely is it that we'd observe the difference in proportions we are observing (or a larger difference) if there genuinely were no difference between the proportions in the underlying population?"

I'd agree that there is the whole "cult of the p-value" thing, where people don't consider the other issues of measurement quality.
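[Editor's illustration: the p-value Rory describes can be made concrete with a small sketch. The counts are hypothetical, and the test is the usual pooled two-proportion z-test.]

```python
import math

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion under the null
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| >= |z|) under a standard normal: how likely a difference at
    # least this large WOULD be, IF the true proportions were equal
    return z, math.erfc(abs(z) / math.sqrt(2))

z, pval = two_prop_z_test(120, 400, 90, 400)   # hypothetical counts
print(f"z = {z:.2f}, two-sided p = {pval:.4f}")
```

Note that the probability statement lives entirely inside the hypothetical "if the true proportions were equal"; nothing here is the probability that the hypothesis itself is true.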
But there are still <a href="http://www.guardian.co.uk/commentisfree/2011/aug/19/bad-science-unemployment-statistical-noise" rel="nofollow">many spheres of data presentation</a> where things like standard errors and p-values aren't even considered when it would be appropriate to do so. A frequentist approach has many limits, but even its strengths aren't always (or maybe even often) used.
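[Editor's illustration: Carl's earlier point about trying multiple models can be shown with a small simulation. The setup is entirely hypothetical: under a true null, several candidate analyses are run and only the best-looking one is "reported", which pushes the effective false-positive rate well above the nominal 0.05 that the "using this particular model" statistics claim.]

```python
import math
import random

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
alpha, trials, models = 0.05, 1000, 5
single = best = 0
for _ in range(trials):
    # five independent null comparisons (both groups drawn with the same
    # true proportion, 0.3), standing in for five candidate "models"
    pvals = [two_prop_p(sum(random.random() < 0.3 for _ in range(200)), 200,
                        sum(random.random() < 0.3 for _ in range(200)), 200)
             for _ in range(models)]
    single += pvals[0] < alpha   # one pre-specified model
    best += min(pvals) < alpha   # best of the five gets reported
print(f"false-positive rate, single model: {single / trials:.3f}")
print(f"false-positive rate, best of {models}: {best / trials:.3f}")
```

The pre-specified analysis rejects at roughly the nominal rate, while "report the best of five" rejects far more often, which is the sense in which the reported per-model error statistics become fiction.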