Sunday, July 08, 2007

Statistical fallacies

I commented Friday on David Sloan Wilson's criticism of Richard Dawkins. Reflecting on the article, I detected some egregious statistical fallacies:
Hypothesis testing does not always require quantification and the other trappings of modern science. ... It should be possible to use [descriptive data about religion] to evaluate the major evolutionary hypotheses. ...

Of course, it is necessary to gather the information systematically rather than picking and choosing examples that fit one’s pet theory. In Darwin’s Cathedral, I initiated a survey of religions drawn at random from the 16-volume Encyclopedia of World Religions, edited by the great religious scholar Mircea Eliade. The results are described in an article titled “Testing Major Evolutionary Hypotheses about Religion with a Random Sample,” which was published in the journal Human Nature and is available on my website. The beauty of random sampling is that, barring a freak sampling accident, valid conclusions for the sample apply to all of the religions in the encyclopedia from which the sample was taken.
Wilson does not correctly use statistical methodology; he presents statistical concepts in a way that harms rather than helps the reader's statistical intuition.

First, random sampling is about statistics, statistics are mathematical, and mathematics is about quantities, not qualities. You cannot use statistical methods on qualitative data. Period. Wilson proceeds to draw a quantitative conclusion from his method:
By my assessment, the majority of religions in the sample are centered on practical concerns, especially the definition of social groups and the regulation of social interactions within and between groups. [emphasis added]
"Majority" is a quantitative term: It means at least 51% of the sample has some characteristic. One must count something to get a percentage. But if you're going to quantify something, it's not sufficient to quantify it by pure subjective assessment, and it's doubly worse to draw subjective conclusions about the sample as a whole and call them statistically quantified. You need to have an objective procedure to precisely specify what you're counting and how you're counting it.

Second, you cannot use random sampling or statistics to draw conclusions about the population from the sample; you can use random sampling only to estimate the statistics of the population from the statistics of the sample. The difference is subtle but important. I cannot simply pick out 100 people and then say that 1% of all people are exactly like each person in the sample. I don't think Wilson is actually doing something this obviously stupid, but he doesn't even mention what he's quantifying and how he's drawing his statistical conclusions.
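What random sampling licenses is an estimate of a population statistic, with stated uncertainty, not a conclusion about every member of the population. As a minimal sketch, the Wilson score interval (named after the statistician Edwin B. Wilson, no relation) turns a sample count into an approximate 95% confidence interval for the population proportion. The count of 22 out of 35 below is hypothetical, and the sketch assumes the sample is small relative to the encyclopedia so draws can be treated as independent.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 gives an approximately 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Hypothetical: 22 of 35 sampled religions meet some explicitly stated criterion.
low, high = wilson_interval(22, 35)
print(f"sample proportion {22 / 35:.3f}; population proportion likely in [{low:.3f}, {high:.3f}]")
```

The output is a range for a population statistic; it says nothing about whether any particular unsampled religion has the trait.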

Even if you quantify something, you have to ensure that your sample is large enough to draw conclusions about the population statistics. For instance, a random sample of 10 people from the New York City phone book would not be large enough to draw conclusions about the prevalence of given names. One can actually quantify whether a sample is large enough with the aptly named technique of power analysis.
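Power is the probability that a test detects an effect that really exists. Here is a minimal power-analysis sketch for the same kind of one-sided exact binomial test ("is the true proportion above one half?"), using only the standard library. The alternative proportion of 0.7 and the 5% significance level are hypothetical choices for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n: int, p0: float, alpha: float) -> int:
    """Smallest count k with P(X >= k | p0) <= alpha.

    Terminates because binomial_tail(n + 1, ...) is an empty sum, i.e. 0.
    """
    k = 0
    while binomial_tail(k, n, p0) > alpha:
        k += 1
    return k


def power(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """Probability of rejecting H0: p = p0 when the true proportion is p1."""
    return binomial_tail(critical_count(n, p0, alpha), n, p1)


for n in (10, 35, 100):
    # hypothetical alternative: 70% of the population has the trait
    print(n, round(power(n, 0.5, 0.7), 3))
```

With 10 draws the test rejects only on 9 or 10 successes, so even a real 70% proportion is detected only about 15% of the time; larger samples raise the power substantially.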

One might hope that the presentation in the eSkeptic article was merely an unfortunate oversimplification for a lay audience. However, an examination of the published scientific study, "Testing Major Evolutionary Hypotheses about Religion with a Random Sample," reveals the same conceptual errors as the article: A detailed analysis is given for the selection criteria, but there is absolutely no description whatsoever of the methodology used to actually analyze each religion; the analysis is purely subjective.

The paper itself is not all that bad, actually. There's nothing wrong with subjective analysis per se, and a random survey has real value in a new scientific field. But the use of statistical language is misleading, and adds a purely pseudo-scientific veneer to an otherwise interesting paper. Given the subjective nature of the analysis, it would have been sufficient to just say, "I picked 35 religions more-or-less at random and evaluated them subjectively, here's what I saw."

Far better, from a scientific perspective, would be to formulate some hypothesis precisely and state that, "If my hypothesis were false, we would see thus-and-such; we picked 35 religions more or less at random and saw no evidence of thus-and-such."
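The falsification framing can itself be quantified. If thus-and-such occurs in a fraction q of the religions in the encyclopedia, the chance that a random sample of n shows none of it is (1 - q)^n, so an all-negative sample rules out large values of q. A minimal sketch, assuming independent draws (strictly true only for sampling with replacement, and approximately true when the population is much larger than the sample) and a hypothetical 5% threshold:

```python
def max_rate_consistent_with_zero(n: int, alpha: float = 0.05) -> float:
    """Largest true rate q for which seeing zero cases in n independent draws
    still has probability >= alpha; solves (1 - q) ** n == alpha for q."""
    return 1 - alpha ** (1 / n)


n = 35
q = max_rate_consistent_with_zero(n)
print(f"zero of {n}: rates above {q:.3f} would yield an all-negative sample <5% of the time")
print(f"rule-of-three approximation: {3 / n:.3f}")
```

For 35 religions this gives about 0.082, close to the familiar "rule of three" approximation 3/n. Note what it does not give: seeing no thus-and-such in 35 religions is still compatible with it occurring in several percent of the population.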


  1. In other words, I'm not very interested in how he picked the religions; I'm far more interested in the gory details of how he analyzed the religions. It's extremely important to consciously and explicitly control bias in the analysis.

  2. I read his paper, and it went thud not ding.

    I must read it again, properly, not so much because I am interested in the substance (although I am), but because I am interested in whether the thud feeling seems to be trustworthy: can I isolate and explicate proper reasons for it?

  3. They're talking about David Sloan Wilson's attack on "Stranger Fruit" here.

    You might want to pop over and link your blog.

  4. I thought DSW made some interesting points. Regarding your view about using statistical methods on qualitative data, I don't think things are quite as simple as you make out. Of course, there is a tautological point you could make there and I agree that the criteria of interpretation used should be transparent and open to challenge, but I see nothing wrong in DSW using a sampling method in order to mitigate potential bias.

  5. I think DSW made some interesting points too.

    It's not just that he has an over-elaborate sampling method. The real meat of his statistical fallacy comes here: "The beauty of random sampling is that, barring a freak sampling accident, valid conclusions for the sample apply to all of the religions in the encyclopedia from which the sample was taken." This assertion is fallacious; I invoke the elaboration of his sampling method as evidence that he's not just waxing poetical here.

  6. I'm not sure I see the fallacy. All the sampling method does is seek to minimise selection bias, it has no direct bearing on how the conclusions from each of the texts in the sample are arrived at.
    So, suppose you feed each text through a hermeneutic software package, assess the results, publish your assessment criteria and get a high level of agreement between independent assessors of the material. Then you might be able to make an argument that your conclusions are likely to be valid. If somebody then makes the accusation that you have cherry picked the candidate texts in order to support your thesis, you can point to your sampling method. As far as I can tell, Wilson is just preempting that argument.

  7. Psiomniac: The fallacy is that Wilson is making much too strong a statement about what can be inferred from a sampling. It is not the case that, in general, "valid conclusions for the sample apply to all of the religions," i.e. the population.

    It is the case that only some valid conclusions about the sample apply to the population. Specifically, only those conclusions apply for which one can calculate the probability of seeing the properties of the sample if the null hypothesis were true, and that probability is sufficiently small.

    To calculate a probability, you have to have numbers, you have to count something. To count something, you have to state precisely what you are counting, and how you're counting it.

    By definition one cannot count anything in a purely descriptive analysis. Wilson does not tell us what he's counting, how he's counting it, or what the count is. He is therefore unjustified in drawing any sort of statistical conclusions.

    Keep in mind that I'm not saying that Wilson's whole paper is invalid or unimportant, or that he's unjustified in drawing any conclusions at all. I'm saying only that he is using the language of statistics in a misleading manner.

  8. Null hypothesis testing has its own critics, of course, but I think we broadly agree. I read his random sample passage to mean that if he has a good way of drawing conclusions from a given passage, then the random sample specifically indemnifies him against the charge of choosing passages to ensure this result given the criteria he uses. I can see obvious vulnerabilities to criticism in the methodology he uses to draw conclusions from the text, but that is a separate issue.


Please pick a handle or moniker for your comment. It's much easier to address someone by a name or pseudonym than simply "hey you". I have the option of requiring a "hard" identity, but I don't want to turn that on... yet.

With few exceptions, I will not respond or reply to anonymous comments, and I may delete them. I keep a copy of all comments; if you want the text of your comment to repost with something vaguely resembling an identity, email me.

No spam, pr0n, commercial advertising, insanity, lies, repetition or off-topic comments. Creationists, Global Warming deniers, anti-vaxers, Randians, and Libertarians are automatically presumed to be idiots; Christians and Muslims might get the benefit of the doubt, if I'm in a good mood.

See the Debate Flowchart for some basic rules.

Sourced factual corrections are always published and acknowledged.

I will respond or not respond to comments as the mood takes me. See my latest comment policy for details. I am not a pseudonymous-American: my real name is Larry.

Comments may be moderated from time to time. When I do moderate comments, anonymous comments are far more likely to be rejected.

I've already answered some typical comments.

I have jqMath enabled for the blog. If you have a dollar sign ($) in your comment, put a backslash in front of it (\$), unless you want to include a formula in your comment.