Topics:
Lars Pålsson Syll considers the following as important: Statistics & Econometrics
Statistical assumptions as empirical commitments
Real probability samples have two great benefits: (i) they allow unbiased extrapolation from the sample; (ii) with data internal to the sample, it is possible to estimate how much results are likely to change if another sample is taken. These benefits, of course, have a price: drawing probability samples is hard work. An investigator who assumes that a convenience sample is like a random sample seeks to obtain the benefits without the costs—just on the basis of assumptions. If scrutinized, few convenience samples would pass muster as the equivalent of probability samples. Indeed, probability sampling is a technique whose use is justified because it is so unlikely that social processes will generate representative samples. Decades of survey research have demonstrated that when a probability sample is desired, probability sampling must be done. Assumptions do not suffice. Hence, our first recommendation for research practice: whenever possible, use probability sampling.
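The two benefits, and what is lost without them, can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population, the sample sizes, and the `convenience` selection rule (units with larger values are easier to reach) are all hypothetical and serve purely as an illustration. It uses only the Python standard library.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)

# Hypothetical finite population of 10,000 units (an assumption for illustration).
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def probability_sample(n):
    """Simple random sample without replacement: every unit equally likely."""
    return rng.sample(population, n)

# Hypothetical convenience mechanism: the chance of ending up in the sample
# grows with the unit's own value (drawn with replacement for simplicity).
weights = [max(x, 0.0) for x in population]

def convenience_sample(n):
    return rng.choices(population, weights=weights, k=n)

n, reps = 100, 500
prob_means = [statistics.fmean(probability_sample(n)) for _ in range(reps)]
conv_means = [statistics.fmean(convenience_sample(n)) for _ in range(reps)]

# Benefit (i): averaged over repeated draws, the probability sample is on target.
prob_bias = statistics.fmean(prob_means) - true_mean
conv_bias = statistics.fmean(conv_means) - true_mean

# Benefit (ii): the standard error computed inside ONE probability sample
# approximates the actual spread of the mean across repeated samples.
one_sample = probability_sample(n)
internal_se = statistics.stdev(one_sample) / sqrt(n)
actual_spread = statistics.stdev(prob_means)

print(f"bias, probability sample: {prob_bias:+.2f}")
print(f"bias, convenience sample: {conv_bias:+.2f}")
print(f"internal SE: {internal_se:.2f}, actual spread of means: {actual_spread:.2f}")
```

In a real study the selection mechanism behind a convenience sample is unknown, so the bias cannot simply be computed and subtracted as it can here.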
If the data-generation mechanism is unexamined, statistical inference with convenience samples risks substantial error. Bias is to be expected and independence is problematic. When independence is lacking, the p-values produced by conventional formulas can be grossly misleading. In general, we think that reported p-values will be too small; in the social world, proximity seems to breed similarity. Thus, many research results are held to be statistically significant when they are the mere product of chance variation.
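The claim that conventional p-values come out too small when observations are dependent can be checked by simulation. The sketch below rests on assumptions chosen only for illustration: the data arrive in clusters that share a common random effect ("proximity breeds similarity"), the true mean is exactly zero, and the analyst runs a naive z-test that treats all observations as independent. The function name `naive_rejection_rate` and all parameter values are hypothetical.

```python
import random
import statistics
from math import sqrt

def naive_rejection_rate(n_clusters, cluster_size, cluster_sd, noise_sd,
                         n_sims=1000, seed=0):
    """Share of simulations in which a naive two-sided 5% z-test rejects
    the (true) null hypothesis that the population mean is zero."""
    rng = random.Random(seed)
    crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    rejections = 0
    for _ in range(n_sims):
        sample = []
        for _ in range(n_clusters):
            # Members of the same cluster share one random effect,
            # which makes their observations positively correlated.
            shared = rng.gauss(0.0, cluster_sd)
            sample.extend(shared + rng.gauss(0.0, noise_sd)
                          for _ in range(cluster_size))
        n = len(sample)
        # Conventional formula: standard error assumes independent draws.
        z = statistics.fmean(sample) / (statistics.stdev(sample) / sqrt(n))
        if abs(z) > crit:
            rejections += 1
    return rejections / n_sims

# Same sample size (20 clusters of 10) with and without a shared cluster effect.
independent = naive_rejection_rate(20, 10, cluster_sd=0.0, noise_sd=1.0)
clustered = naive_rejection_rate(20, 10, cluster_sd=1.0, noise_sd=1.0)

print(f"false-positive rate, independent data: {independent:.3f}")
print(f"false-positive rate, clustered data:   {clustered:.3f}")
```

With independent data the naive test should reject at close to its nominal 5% rate. With clustering, the conventional standard error understates the true variability of the mean, so "significant" results appear far more often than 5% of the time even though nothing but chance variation is at work.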
In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.
The assumption of imaginary ‘super populations’ is one of many dubious assumptions used in modern econometrics and statistical analyses to handle uncertainty. As social scientists, and as economists, we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of (and, strictly speaking, do not exist at all) without specifying such system-contexts. Accepting a domain of probability theory and a sample space of infinite populations also implies that judgments are made on the basis of observations that are never actually made.
Economists who apply econometrics usually react to this kind of critique by telling us they — of course — already know it all. But, if so, why do they just keep on doing what they do without ever discussing the validity of these assumptions?
Fictional analysis does not give us anything but fictional results. Inferences to imaginary populations are — rather unsurprisingly — also imaginary. Convenient fiction? Yes. Of real-world significance? No!
Without knowledge of the real-world data-generating mechanisms, the use of statistics is of questionable value.