Case selection and selection bias in small-n research
Introduction
Designing social research is often an experience of blood, toil, sweat and tears, and the road to publication is usually long and winding. The researcher constantly has to weigh different options, and case selection is often considered a particularly delicate and demanding step. For King and colleagues, "poor case selection can vitiate even the most ingenious attempts, at a later stage, to make valid causal inferences." In small-n as well as in large-n approaches, "the cases you choose affect the answers you get." However, case selection usually differs between the two approaches, and for good reasons. Whereas large-n studies generally seek representativeness, for example through random sampling, case selection in small-n research usually follows an intentional logic. Intentional does not, however, mean arbitrary. In the end, the types of cases you select determine which inferences you can draw.
In an idealized research cycle, case selection usually takes place after the formulation of the research question, the elaboration or compilation of theories, and concept specification. Case selection thus links theory development to the empirical testing of these theories. Because they usually select their cases non-randomly, small-n researchers are particularly at risk of introducing selection bias. Selection bias results from a faulty inference that wrongly attributes the properties of the cases under scrutiny to the larger universe of cases. In this chapter, I will first identify different types of selection bias. I will then introduce some strategies for case selection that are commonly applied in small-n research. I will argue that case selection in small-n research should be considered a theory-guided, iterative process. Theory defines the variables that are to be included in the research design. On the basis of these variables we can construct multi-dimensional classification schemes that structure the possible universe of cases on theoretical grounds. Such typologies help in selecting cases as well as in discussing the generalizability of one's findings. I will illustrate some methods of case selection by referring to my own research on the consequences of French divided government. Doing so allows me to point out some problems and trade-offs around case selection that can arise during the research process. Methodological treatises often tend to merge the steps of case selection, data collection and data analysis. This, however, is not the practitioner's view. In practice, for example, the universe of cases is often not known from the start, and case selection must take place behind a veil of ignorance. But how should one proceed when the universe of cases is 'clouded in mist'?
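The idea of a typology built from theoretically defined variables can be made concrete with a small sketch. The two dimensions and the three placeholder cases below are invented purely for illustration and are not taken from the research discussed in this chapter. The sketch crosses the dimensions to obtain every possible cell, sorts the known cases into those cells, and lists the cells that remain empty.

```python
from itertools import product

# Hypothetical dimensions; in real research they would follow from the theory.
dimensions = {
    "government": ("unified", "divided"),
    "policy_area": ("domestic", "foreign"),
}

def build_typology(dimensions):
    """Return one empty cell for every combination of dimension values."""
    names = list(dimensions)
    return {combo: [] for combo in product(*(dimensions[n] for n in names))}

def classify(cases, dimensions):
    """Sort each case into the cell that matches its values."""
    typology = build_typology(dimensions)
    names = list(dimensions)
    for label, values in cases.items():
        cell = tuple(values[n] for n in names)
        typology[cell].append(label)
    return typology

# Placeholder cases, invented for illustration only.
cases = {
    "case_A": {"government": "unified", "policy_area": "domestic"},
    "case_B": {"government": "divided", "policy_area": "domestic"},
    "case_C": {"government": "divided", "policy_area": "domestic"},
}

typology = classify(cases, dimensions)

# Cells without any case show where the selection does not reach.
empty_cells = [cell for cell, members in typology.items() if not members]
```

Empty cells are useful in two ways: they point to types of cases that might still be sought, and they mark the limits of what the selected cases allow one to generalize about.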
In general, when it comes to case selection there is no single road to salvation; rather, "earthly sinners" must find their own paths. Some paths, however, seem more appropriate than others. Researchers should first of all be aware of the pitfalls of case selection and be as transparent as possible when describing their case selection strategies. Spelling out the problems around case selection and discussing the principal trade-offs is already a step that helps the reader gauge the impact of possible bias.
Design problem
This chapter analyzes case selection and selection bias in the social sciences. A case is considered here a unit or an object of comparison. It takes a particular value on each dimension that is subjected to comparison. As stated above, selection bias is a systematic error that results from improper inferences drawn from a sample. Accordingly, in "configurative-idiographic studies," "atheoretical" or "interpretative case studies," selection bias is not a major problem, since these kinds of research focus on cases per se. However, to conclude from this that we should refrain from drawing generalizations and inferences altogether is certainly not a satisfying response. In fact, such a strategy would amount to committing suicide for fear of death.
In theory, we can distinguish between different types of bias. In a real-world or contingency bias, the universe of available cases is biased by historical contingencies. For instance, in comparative country studies the number of available cases is clearly determined by the historical development of nation-states. If, however, nation-building is linked to what we wish to explain, such a contingency bias may raise some delicate problems of endogeneity. In this chapter we will, however, mostly focus on researcher-induced bias. Such bias results from improper measurement or from non-random case selection strategies. Measurement bias can be attributed to unreliable indicators or biased sources. For instance, it has been shown that an analysis of social movements that relies primarily on media reports is likely to suffer from bias. The media pre-select cases according to their own logic. For example, newspapers might over-report violent forms of participation. Similarly, legislative datasets can be biased. Mayhew's study on divided government, for example, is criticized by Fiorina for analyzing only the production of legislation and neglecting the demand for it. In response, Edwards and colleagues and Binder use datasets that include unsuccessful proposals as well. This indeed yields more nuanced findings about the consequences of divided government.
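The mechanism behind the media example can be illustrated with simple arithmetic. The event counts and reporting probabilities below are hypothetical numbers chosen only to make the logic visible; they do not come from any study. The sketch compares the true share of violent events with the share one would expect to find in a dataset compiled from media reports.

```python
from fractions import Fraction

def reported_share(true_counts, report_prob):
    """Expected share of each event type among the events that get reported."""
    expected = {kind: n * report_prob[kind] for kind, n in true_counts.items()}
    total = sum(expected.values())
    return {kind: value / total for kind, value in expected.items()}

# Hypothetical universe of protest events.
true_counts = {"peaceful": 90, "violent": 10}

# Hypothetical probabilities that a newspaper reports an event of each type.
report_prob = {"peaceful": Fraction(1, 10), "violent": Fraction(9, 10)}

true_share_violent = Fraction(true_counts["violent"], sum(true_counts.values()))
media_share_violent = reported_share(true_counts, report_prob)["violent"]
```

Under these assumed numbers, violent events make up one tenth of all events but half of the reported ones, so a media-based dataset would strongly overstate their prevalence.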
Self-selection, volunteer or participation bias is closely related to the measurement process. Pre-selection mechanisms such as response or non-response determine non-randomly who participates in a survey or experimental sample. In the case of such bias, the sample does not mirror the target population properly. Whereas real-world and measurement bias apply equally to large-n and small-n research, the bias resulting from an intentional or non-random selection of cases is more of a problem for small-n research. In small-n research, cases are generally selected intentionally. An intentional selection of cases, however, leaves room for manipulation. A rather obvious selection bias occurs if a researcher selects only cases that confirm the initial theory. Conspiracy theories display such confirmation bias most clearly. Conspiracy theorists collect only information that supports their theory and do not report any opposing evidence. Almost anything can be claimed by such a method; such practice, however, does not at all meet the standards of social scientific research. A more common type of confirmation bias consists in selecting cases that share the same value on the dependent variable. When interested in revolutions, researchers study revolutions; when interested in prosperity, they analyze economically successful countries. Not including "negative" cases, however, can lead to the false conclusion that any characteristic these cases share should be considered a cause. Finally, selection bias does not occur only in cross-sectional analysis.
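The problem of selecting cases that share the same value on the dependent variable can be shown with a toy example. The cases and traits below are invented for illustration. Looking only at "positive" cases, both traits appear as shared characteristics; once "negative" cases are included, one of them turns out to be present everywhere and therefore cannot distinguish the two groups.

```python
# Invented toy cases: an outcome and two traits that might be taken as causes.
cases = [
    {"outcome": True,  "trait_a": True, "trait_b": True},
    {"outcome": True,  "trait_a": True, "trait_b": True},
    {"outcome": True,  "trait_a": True, "trait_b": True},
    {"outcome": False, "trait_a": True, "trait_b": False},
    {"outcome": False, "trait_a": True, "trait_b": False},
]
traits = ["trait_a", "trait_b"]

def shared_traits(subset, traits):
    """Traits that are present in every case of the subset."""
    return [t for t in traits if all(case[t] for case in subset)]

positives = [c for c in cases if c["outcome"]]
negatives = [c for c in cases if not c["outcome"]]

# Design 1: positive cases only. Every shared trait looks like a cause.
candidates_positive_only = shared_traits(positives, traits)

# Design 2: negative cases included. Traits shared by the negatives as well
# cannot account for the difference in outcomes and are dropped.
also_in_negatives = shared_traits(negatives, traits)
candidates_with_negatives = [
    t for t in candidates_positive_only if t not in also_in_negatives
]
```

Even in the second design, the remaining trait is only a candidate; the comparison removes one false lead but does not by itself establish causation.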