Confidently Wrong: Why Ignoring Binaries Biases IMF Inference at Large Sample Sizes
ABSTRACT
The high-mass slope $\alpha$ of the stellar initial mass function (IMF) is routinely measured by fitting single-star models to photometric samples that contain 20% to 90% unresolved binaries. This practice introduces a systematic negative bias on $\alpha$ that is constant with sample size $N$. Because posterior credible intervals shrink as $1/\sqrt{N}$, at sufficiently large $N$ the bias exceeds the reported uncertainty and the true value falls outside the credible interval, a regime we call "confidently wrong." We bracket this bias between two limiting observation operators: mass-addition ($m_{\rm obs} = m_1 + m_2$), a formal upper bound on unresolved-system mass overestimation, and luminosity-addition ($m_{\rm obs} = L^{-1}(L_1 + L_2)$), an idealized lower-bias photometric case based on the ZAMS mass-luminosity relation. Across four astrophysical environments spanning $\alpha = 1.60$ to $2.30$, we find: (1) a mass-addition bias magnitude of 0.054 to 0.086, with crossover to confidently wrong at $N_{\rm cross} \approx 5{,}000$ to $10{,}000$; (2) a luminosity-addition bias magnitude of 0.011 to 0.021, with $N_{\rm cross} \approx 75{,}000$ to $150{,}000$; and (3) a binary-aware mixture likelihood that marginalizes over the Moe & Di Stefano (2017) binary population model recovers the true slope in the synthetic tests presented here. Published single-star IMF slopes can therefore plausibly carry systematic errors of order 0.01 to 0.09 if unresolved binaries are not modeled, comparable to or exceeding reported uncertainties in some regimes.
Since current and upcoming surveys (Gaia, JWST, Roman, LSST) will deliver $N = 10^4$ to $10^6$ resolved stars per rich cluster, binary-aware inference is likely necessary to avoid binary-driven systematic bias in the large-$N$ single-star-fitting regime.
1. INTRODUCTION
The stellar initial mass function, the distribution of stellar masses at birth, sets the supernova rate, chemical enrichment, and energy budget of star-forming galaxies. Its high-mass slope $\alpha$, defined so that $\xi(m) = dN/dm \propto m^{-\alpha}$ for $m \geq 1\,M_\odot$ (Salpeter: $\alpha = 2.35$ in this convention), is the single most consequential IMF parameter: it controls the ratio of massive to low-mass stars. Whether $\alpha$ is universal or varies with environment remains one of the oldest open questions in star and galaxy formation.
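To illustrate how strongly the slope controls the ratio of massive to low-mass stars, the following minimal sketch evaluates the number fraction of stars above a given mass for a pure power law. It assumes an untruncated power law starting at 1 M_sun with no upper mass cutoff, which is a simplification made here for illustration; the function name `fraction_above` and the 8 M_sun threshold are hypothetical choices, not quantities taken from this paper.

```python
def fraction_above(m, alpha, m_min=1.0):
    """Number fraction of stars with mass > m for dN/dm proportional to
    m**(-alpha) on [m_min, infinity).

    Assumption (illustrative only): no upper mass cutoff, so alpha must
    exceed 1 for the distribution to be normalizable.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for an untruncated power law")
    if m <= m_min:
        return 1.0
    # Integrating m**(-alpha) from m to infinity and normalizing over
    # [m_min, infinity) gives (m / m_min)**(1 - alpha).
    return (m / m_min) ** (1.0 - alpha)


if __name__ == "__main__":
    # Compare a shallow slope, a steep slope, and the Salpeter value.
    for alpha in (1.60, 2.30, 2.35):
        frac = fraction_above(8.0, alpha)
        print(f"alpha = {alpha:.2f}: fraction above 8 M_sun = {frac:.4f}")
```

Under these assumptions a shallower slope yields a much larger share of massive stars, which is why a small shift in the inferred slope propagates into the quantities the IMF sets.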
Unresolved binary stars contaminate every photometric IMF measurement. A substantial fraction of stars, from 22% of M dwarfs to 90% of O stars, reside in binary or higher-order multiple systems. When a binary system is unresolved, the observer infers a single "system mass" that is systematically higher than the primary mass. This shifts the observed mass function toward higher masses, mimicking a shallower IMF slope (a smaller $\alpha$, that is, a less negative exponent). The problem extends beyond the IMF: unresolved multiples inflate inferred cluster masses, create apparently ultramassive stars above the true stellar upper-mass limit, and bias stellar parameters and abundances in spectroscopic surveys.
Kroupa et al. showed that unresolved binaries significantly bias the low-mass luminosity function, and Maíz Apellániz showed that unresolved multiple systems and chance superpositions bias massive-star IMF determinations. Weidner et al. found that even 100% binarity shifts the observed high-mass slope by $\leq 0.1$. While tools such as BASE-9 can in principle model unresolved binaries, in practice most IMF analyses, including those using ASteCA, treat photometric sources as single stars or assume fixed binary fractions, because fully marginalizing over binary parameters is computationally expensive. However, no study has characterized the crossover sample size $N_{\rm cross}$ at which the bias exceeds the credible interval width, as a function of observation operator (mass- or luminosity-based) and birth environment.
The Moe & Di Stefano (2017) model provides primary-mass-dependent binary fractions and joint distributions of orbital period and mass ratio calibrated from solar-type to O stars and over nearly six orders of magnitude in orbital period. This framework makes it possible for the first time to compute the crossover sample size as a function of environment using empirically motivated binary statistics. The mass-dependent structure of Moe & Di Stefano (2017) is essential because the contamination from unresolved binaries itself depends on the underlying IMF slope.
The critical question is not whether the bias exists (prior work establishes that it does) but at what sample size it begins to matter statistically. Because the systematic bias on $\alpha$ is approximately constant while the statistical uncertainty shrinks as $1/\sqrt{N}$, there must exist a crossover sample size $N_{\rm cross}$ beyond which the bias exceeds the credible interval width and the posterior excludes the true value.
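The crossover argument can be made concrete with a minimal sketch. It assumes that the posterior width follows sigma(N) = sigma_1 / sqrt(N), where `sigma_1` is a hypothetical per-star uncertainty scale introduced here for illustration, and that the bias is exactly constant with N. The function names and the interval multiplier `z` are likewise illustrative. The crossover values quoted in this paper come from the full inference setup and are not reproduced by this simplified scaling.

```python
import math


def sigma_alpha(n, sigma_1):
    """Statistical uncertainty on the slope for n stars, assuming pure
    1/sqrt(n) scaling with a hypothetical per-star scale sigma_1."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma_1 / math.sqrt(n)


def n_cross(bias, sigma_1, z=1.0):
    """Sample size at which |bias| equals z * sigma_alpha(n).

    Solving |bias| = z * sigma_1 / sqrt(n) for n gives
    n = (z * sigma_1 / |bias|)**2. Beyond this n, the constant bias is
    larger than the z-sigma interval half-width.
    """
    if bias == 0:
        return math.inf  # an unbiased estimator never crosses over
    return (z * sigma_1 / abs(bias)) ** 2


if __name__ == "__main__":
    # Illustrative inputs only: sigma_1 = 1.0 is not a value from the paper.
    for bias in (0.08, 0.02):
        print(f"bias = {bias}: N_cross = {n_cross(bias, sigma_1=1.0):.0f}")
```

Under these assumptions the crossover size scales as the inverse square of the bias, so a bias four times smaller postpones the crossover by a factor of 16.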
Current and upcoming surveys are already delivering sample sizes where this effect becomes unavoidable. Gaia DR3 contains $>10^5$ stars in nearby open clusters, and Gaia DR4 will provide $\sim 10^6$ astrometric orbital solutions that directly constrain binary populations. The Vera C. Rubin Observatory's LSST, now commissioning, will deliver deep photometry of thousands of resolved stellar populations with $10^3$ to $10^6$ stars each. Early Rubin commissioning observations already reveal unresolved binary sequences in 47 Tucanae. JWST is resolving individual stars in Magellanic Cloud clusters where binary contamination must be modeled to recover the intrinsic IMF. The Nancy Grace Roman Space Telescope, scheduled to launch by approximately 2027, will extend resolved-star censuses to dust-obscured young massive clusters, including the Arches, Quintuplet, and Galactic Center young stellar populations, where the IMF slope measurements remain actively debated and sample sizes will reach $N > 10^4$. Because statistical uncertainties shrink as $1/\sqrt{N}$ while binary-driven bias remains approximately constant, these large surveys will inevitably enter the bias-dominated regime if binaries are ignored. In our benchmark tests this crossover occurs at $N_{\rm cross} \approx 5{,}000$ to $150{,}000$, depending on the observation operator.
In this paper, we bracket the binary contamination bias between two limiting observation operators: mass-addition ($m_{\rm obs} = m_1 + m_2$; worst case) and luminosity-addition ($m_{\rm obs} = L^{-1}(L_1 + L_2)$, where $L^{-1}$ inverts the mass-luminosity relation; best photometric case for unevolved populations). Both operators produce a bias-dominated regime at survey-relevant sample sizes, and a binary-aware mixture likelihood removes the bias in our synthetic tests. Section 2 describes the binary population model and inference framework; Section 3 presents the bias, crossover, and recovery results; and Section 4 discusses implications for current and future surveys.
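The two limiting observation operators can be sketched as follows. This is a minimal illustration and not the implementation used in this paper: it assumes a single power-law mass-luminosity relation, L proportional to m**beta, with a hypothetical exponent `beta = 3.5`, in place of the ZAMS relation adopted in the analysis. The function names are illustrative.

```python
def mass_addition(m1, m2):
    """Worst-case operator: the unresolved system is assigned the
    summed mass of both components."""
    return m1 + m2


def luminosity_addition(m1, m2, beta=3.5):
    """Photometric operator: sum the component luminosities, then invert
    the mass-luminosity relation to get the inferred single-star mass.

    Assumption (illustrative only): L is proportional to m**beta with a
    single exponent beta, so the inverse relation is m = L**(1/beta) in
    units where the proportionality constant is 1.
    """
    total_luminosity = m1 ** beta + m2 ** beta
    return total_luminosity ** (1.0 / beta)


if __name__ == "__main__":
    m1 = 5.0  # primary mass in solar masses (illustrative)
    for q in (0.1, 0.5, 1.0):  # mass ratio m2 / m1
        m2 = q * m1
        print(
            f"q = {q:.1f}: mass-addition = {mass_addition(m1, m2):.3f}, "
            f"luminosity-addition = {luminosity_addition(m1, m2):.3f}"
        )
```

For any exponent above 1, the luminosity-based mass lies between the primary mass and the summed mass, which is why the two operators bracket the overestimation: a faint companion adds little light, so it shifts the inferred mass far less than it shifts the total mass.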