Research Article: P-curve won’t do your laundry, but it will distinguish replicable from non-replicable findings in observational research: Comment on Bruns & Ioannidis (2016)

Date Published: March 11, 2019

Publisher: Public Library of Science

Author(s): Uri Simonsohn, Leif D. Nelson, Joseph P. Simmons, Iratxe Puebla.


P-curve, the distribution of significant p-values, can be analyzed to assess whether a set of findings has evidential value, that is, whether p-hacking and file-drawering can be ruled out as the sole explanations for them. Bruns and Ioannidis (2016) have proposed that p-curve cannot examine evidential value with observational data. Their discussion confuses false-positive findings with confounded ones, failing to distinguish correlation from causation. We demonstrate this important distinction by showing that a confounded but real, and hence replicable, association (gun ownership and number of sexual partners) leads to a right-skewed p-curve, while a false-positive one (respondent ID number and trust in the Supreme Court) leads to a flat p-curve. P-curve can distinguish between replicable and non-replicable findings. The observational nature of the data is not consequential.

Partial Text

P-curve is the observed distribution of statistically significant p-values (p ≤ .05) testing the hypotheses of interest from a set of studies. The shape of that distribution diagnoses if the findings contain evidential value, telling us whether we can statistically rule out selective reporting of studies (file-drawering) and/or analyses (p-hacking) as the sole cause of those statistically significant findings [1]. In follow-up work we have extended p-curve uses to estimate underlying statistical power in a way that corrects for selective reporting [2], made p-curve more robust to errors and fraud [3], and applied it to the popular and controversial power-posing literature [4]. An online app makes it easy to use p-curve by copy-pasting test results into a simple form (

To demonstrate p-curve’s ability to distinguish between replicable and non-replicable findings in observational data we provide two examples that use data from the General Social Survey [6]. In the first example we examine a confounded association: shotgun owners have had more female sexual partners. The omitted variable is gender.
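The contrast between a confounded-but-real association and a false-positive one can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' General Social Survey analysis: the ownership rates, the partner-count distributions, the sample sizes, the odd/even split of ID numbers, and the function names are all assumptions made for illustration. It summarizes each p-curve by the share of significant p-values that fall below .025, a crude stand-in for formal skewness tests: a flat p-curve puts about half of them there, a right-skewed one puts clearly more.

```python
import math
import random


def two_sided_p(group_a, group_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        return None
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    if se == 0:
        return None
    z = (ma - mb) / se
    return math.erfc(abs(z) / math.sqrt(2))


def confounded_study(rng, n):
    """Gender (omitted) drives both gun ownership and female partner counts."""
    owners, others = [], []
    for _ in range(n):
        male = rng.random() < 0.5
        # Assumed rates: men own guns more often than women.
        owns = rng.random() < (0.5 if male else 0.2)
        # Assumed counts: men report more female partners than women.
        partners = int(rng.expovariate(1 / 3)) if male else int(rng.random() < 0.1)
        (owners if owns else others).append(partners)
    return two_sided_p(owners, others)


def null_study(rng, n):
    """Odd vs. even respondent ID is unrelated to a trust score by construction."""
    odd, even = [], []
    for respondent_id in range(n):
        trust = rng.gauss(0, 1)
        (odd if respondent_id % 2 else even).append(trust)
    return two_sided_p(odd, even)


def share_below_025(study, n_studies=3000, n=200, seed=1):
    """Among significant results (p < .05), the fraction with p < .025."""
    rng = random.Random(seed)
    p_values = [study(rng, n) for _ in range(n_studies)]
    significant = [p for p in p_values if p is not None and p < 0.05]
    return sum(p < 0.025 for p in significant) / len(significant)


if __name__ == "__main__":
    print("confounded but real:", round(share_below_025(confounded_study), 2))
    print("false positive:     ", round(share_below_025(null_study), 2))
```

In this sketch the confounded association is not causal, yet it exists in the population and would show up again in a new sample, so its significant p-values pile up near zero. The ID-number comparison has no association to find, so its significant results are spread evenly between 0 and .05.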

It is as important to distinguish causation from correlation when interpreting results from single studies as it is when evaluating the performance of statistical procedures on sets of studies.



