Date Published: February 12, 2018
Publisher: Public Library of Science
Author(s): Diana E. Kornbrot, Richard Wiseman, George J. Georgiou, Sergi Lozano.
The quality of psychological studies is currently a major concern. The Many Labs Project (MLP) and the Open Science Collaboration (OSC) have collected key data on replicability and statistical effect sizes. We build on this work by investigating the role played by three measurement types: ratings, proportions and unbounded measures (measures without a conceptual upper limit, e.g. time). Both replicability and effect sizes depend on the amount of variability due to extraneous factors. We predicted that the influence of such extraneous factors might depend on measurement type, being greatest for ratings, intermediate for proportions and least for unbounded measures. Our results support this conjecture. OSC replication rates for unbounded (43%) and proportion (40%) measures combined are reliably higher than those for ratings (20%; effect size w = .20). MLP replication rates for the original studies are .74 for proportions and .40 for ratings (effect size w = .33). Original effect sizes (Cohen’s d) are highest for unbounded measures (OSC cognitive = 1.45, OSC social = .90), next for proportions (OSC cognitive = 1.01, OSC social = .84, MLP = .82) and lowest for ratings (OSC social = .64, MLP = .31). These findings are of key importance to scientific methodology and design, even if the reasons for their occurrence are still at the level of conjecture.
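The replication-rate differences above are reported as Cohen's w. The following is a minimal sketch that assumes the comparison is a 2 × 2 table of measure group (e.g. ratings vs. other measures) by outcome (replicated vs. not). The abstract gives neither the counts nor the exact test. For such a table, w equals the absolute phi coefficient, i.e. sqrt(chi-square / N) with no continuity correction. The function name `cohens_w_2x2` and the counts in the example are hypothetical and only illustrate the formula. They are not the OSC or MLP data.

```python
import math


def cohens_w_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's w for a 2x2 table of counts laid out as [[a, b], [c, d]].

    Rows: measure group (hypothetical, e.g. ratings vs. other measures).
    Columns: replicated vs. not replicated.
    For a 2x2 table, w = |phi| = sqrt(chi2 / N) (uncorrected Pearson chi-square).
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    return abs(a * d - b * c) / denom


# Hypothetical counts, for illustration only (not the study data).
print(round(cohens_w_2x2(30, 10, 10, 30), 2))
```

By Cohen's conventional benchmarks, w = .10, .30 and .50 mark small, medium and large effects.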
There has been much recent concern about the reproducibility of research in science (see, e.g., [1–3]). In psychology this has led to two major replication studies. The Open Science Collaboration (OSC) attempted to replicate 100 cognitive and social psychological effects from prestigious cognitive, social and general psychology journals [2]. The OSC attempted one replication of each of 98 originally significant effects. The Many Labs Project (MLP) made 36 attempts to replicate 16 effects from 13 studies, mostly from social psychology journals [1]. Overall, 36% of OSC effects and 72% of MLP effects replicated, where an effect is defined as replicable when both the original study and its replication are significant at the .05 level. Cohen’s d was used as a common measure of effect size, with the r values reported by the OSC converted to d using d = 2r/√(1 − r²). On this measure the OSC replications yielded an overall mean of .96 (.83, 1.09), whilst the studies involved in the MLP yielded a mean of .85 (.45, 1.28).
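The r-to-d conversion used for the OSC effect sizes can be written as a small helper. The following is a sketch, and the name `r_to_d` is hypothetical. It applies the formula d = 2r/√(1 − r²) directly and rejects |r| ≥ 1, where the formula is undefined.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation effect size r to Cohen's d via d = 2r / sqrt(1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# Example conversions for a few illustrative r values.
for r in (0.1, 0.3, 0.5):
    print(r, round(r_to_d(r), 3))
```

The mapping is non-linear. For example, r = .5 corresponds to d ≈ 1.15, so means computed on the d scale are not simple rescalings of means on the r scale.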
As predicted, for the MLP replication, proportion measures are superior to rating measures for replicability, d = .80. The OSC replication has weaker power but a larger and more diverse set of studies, and it also shows clear effects of measure type: rating measures are inferior to proportion and unbounded measures combined for replicability, with a moderate effect size of Cohen’s d = .47.
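Replication rates by measure type follow from the replicability rule: an effect counts as replicated when both the original study and its replication are significant at the conventional .05 level. The sketch below makes several simplifying assumptions. Each record is a hypothetical (measure type, original p, replication p) triple, there is one replication per effect, and the direction of the effect is ignored. It then computes the proportion replicated per measure type. The function names and records are illustrative only, not the authors' data or code.

```python
from collections import defaultdict

ALPHA = 0.05  # conventional significance threshold


def replicated(p_original: float, p_replication: float, alpha: float = ALPHA) -> bool:
    """True when both the original and the replication are significant at alpha."""
    return p_original < alpha and p_replication < alpha


def replication_rates(studies):
    """Map each measure type to its proportion of replicated effects.

    `studies` is an iterable of (measure_type, p_original, p_replication) tuples.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for measure_type, p_orig, p_rep in studies:
        totals[measure_type] += 1
        hits[measure_type] += replicated(p_orig, p_rep)  # bool adds as 0 or 1
    return {m: hits[m] / totals[m] for m in totals}


# Hypothetical records, for illustration only.
example = [
    ("rating", 0.01, 0.20),
    ("rating", 0.03, 0.04),
    ("proportion", 0.001, 0.01),
    ("unbounded", 0.02, 0.30),
]
print(replication_rates(example))
```

Per-type rates like these are the inputs to a comparison such as Cohen's w on a measure-type by outcome table.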