Date Published: March 17, 2017
Publisher: Public Library of Science
Author(s): Maurits Kaptein, Robin van Emden, Davide Iannuzzi, Lidia Adriana Braunstein.
Due to the ubiquitous presence of treatment heterogeneity, measurement error, and contextual confounders, numerous social phenomena are hard to study. Precise control of treatment variables and possible confounders is often key to the success of studies in the social sciences, yet it often proves to be beyond the experimenter's control. To remedy this situation we propose a novel approach, coined “lock-in feedback”, which is based on a method that is routinely used in high-precision physics experiments to extract small signals from a noisy environment. Here, we adapt the method to noisy social signals in multiple dimensions and evaluate it by studying an inherently noisy topic: the perception of (subjective) beauty. We show that the lock-in feedback approach allows one to select optimal treatment levels despite the presence of considerable noise. Furthermore, through the introduction of an external contextual shock we demonstrate that we can find previously unknown relationships between noisy variables. We therefore argue that lock-in methods may provide a valuable addition to the social scientist’s experimental toolbox and we explicitly discuss a number of future applications.
Social science experiments are often affected by large measurement errors. The effects under study are complex and the results of the experiments largely depend on the experimental context or on the particular group of people under study. Due to this complex nature of human behavior, even experiments demonstrating some of the most compelling principles of human decision making have proven difficult to replicate when conditions undergo minor changes or when researchers leave the confines of their laboratories [5, 6]. Hence, it is no surprise that recently there has been an increased interest in the development of experimental methods that are robust to noise or contextual changes. Apart from general guidelines that focus on averting bad research practices, these methods range from registering studies and adopting different reporting standards [8–10] to the application of Bayesian statistics. Considerable work has been devoted to optimally choosing possible treatment values to efficiently estimate effects [12–15] (for an extensive overview, we refer the reader to ), often focusing on the reduction of variance in estimates obtained given an a priori assumed experimental setup and functional relationship between dependent and independent variables. With the functional form of the effect of treatment variables at hand, these methods dictate at which points in treatment space stimuli should be positioned. In recent years, researchers have further turned their attention to sequential methods that could determine the optimal design of experiments, the optimal stimuli, or the optimal sample sizes even when the functional form of the effect of a treatment variable is unknown (for examples, see [13, 19]). In those cases, treatment assignments are continuously improved as the data are collected. These adaptive designs, and the associated early stopping of experiments, currently find application in the health and life sciences.
In our evaluation of the utility of lock-in feedback (LiF) for the social sciences, which was conducted online, we asked N = 7402 participants to express their opinion on the physical attractiveness of an avatar’s face (the dependent variable y). All faces were identical, except for the brow-nose-chin ratio (first independent variable x1) and the eye-to-eye distance (second independent variable x2). Our goal was to use LiF to sequentially and simultaneously determine the values of x1 and x2 that maximize y.
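To make the idea of sequentially and simultaneously tuning x1 and x2 concrete, the following is a minimal sketch of a lock-in style feedback loop. It is an illustration under stated assumptions, not the procedure used in the study: this text does not spell out the update rule, so the function name `lock_in_feedback`, its parameters (`amplitudes`, `omegas`, `gamma`, `batch`), and the quadratic response surface with its optimum at (0.6, 0.4) are all hypothetical. Each treatment oscillates around its current center at its own frequency; multiplying the noisy response by the same cosine and averaging over a whole number of periods isolates a term proportional to the local slope, which is then used to shift the center.

```python
import math
import random


def lock_in_feedback(f, x0, amplitudes, omegas, gamma, batch, n_obs,
                     noise_sd=0.0, seed=0):
    """Hypothetical lock-in feedback loop (illustrative sketch only).

    f          -- response function taking a list of treatment values
    x0         -- starting centers of the treatments
    amplitudes -- oscillation amplitude per treatment
    omegas     -- oscillation frequency per treatment (distinct values)
    gamma      -- step size of the feedback update
    batch      -- observations averaged before each update
    """
    rng = random.Random(seed)
    centers = list(x0)
    sums = [0.0] * len(centers)
    for t in range(1, n_obs + 1):
        # Oscillate every treatment around its current center.
        x = [c + a * math.cos(w * t)
             for c, a, w in zip(centers, amplitudes, omegas)]
        # Observe one noisy response (one participant's rating).
        y = f(x) + rng.gauss(0.0, noise_sd)
        # Demodulate: multiply the response by each reference cosine.
        for k, w in enumerate(omegas):
            sums[k] += y * math.cos(w * t)
        if t % batch == 0:
            # The batch average is roughly amplitude / 2 times the local
            # slope, so moving along it climbs towards the maximum.
            for k in range(len(centers)):
                centers[k] += gamma * sums[k] / batch
                sums[k] = 0.0
    return centers


def toy_rating(x):
    # Assumed response surface with a single maximum at (0.6, 0.4).
    return 50.0 - 100.0 * ((x[0] - 0.6) ** 2 + (x[1] - 0.4) ** 2)


# Frequencies chosen so that each batch spans whole periods (7 and 11).
BATCH = 100
OMEGAS = [2 * math.pi * 7 / BATCH, 2 * math.pi * 11 / BATCH]

if __name__ == "__main__":
    noisy = lock_in_feedback(toy_rating, [0.2, 0.8], [0.05, 0.05], OMEGAS,
                             gamma=0.02, batch=BATCH, n_obs=7400,
                             noise_sd=20.0)
    print(noisy)
```

Giving each treatment its own frequency is what lets a single stream of responses carry information about both dimensions at once: contributions oscillating at the other frequency average out over a batch. In the noisy run the step size `gamma` trades speed against jitter, and the values above are arbitrary choices for the toy surface.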
Our experiment had two objectives. First, we intended to test whether LiF would indeed converge towards an optimal value of two treatments simultaneously in the face of considerable noise. Second, we wanted to examine whether LiF would be able to withstand external shocks. Fig 4 displays, in sequence, the raw answers on the rating scale as provided by our N = 7402 participants. The gray line shows the raw scores and clearly illustrates the extremely noisy setting: raw ratings range from 0 to 100 at almost any configuration of the actual face. The solid black line presents a moving average rating over a sample of 150 participants; this line shows a clear upward trend, indicating increasing average attractiveness, over the first 2000 data points, after which the (average) ratings seem to stabilize. The “dip” in mean ratings around i = 3750 is caused by our external shock, as described later in the text.
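The smoothing used for the solid line can be sketched as follows. This is a minimal example that assumes a trailing window, meaning each point averages the most recent ratings up to and including the current participant; the text gives only the window size of 150, so the alignment of the window and the function name `moving_average` are assumptions.

```python
from collections import deque


def moving_average(ratings, window=150):
    """Trailing moving average; early points average what is available."""
    buf = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for r in ratings:
        if len(buf) == window:
            # The oldest rating is about to be pushed out of the window.
            total -= buf[0]
        buf.append(r)
        total += r
        smoothed.append(total / len(buf))
    return smoothed
```

Applied to the sequence of raw ratings with `window=150`, this returns one smoothed value per participant, which is the kind of curve against which a trend or a dip can be read despite individual ratings spanning the whole 0 to 100 scale.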
We have shown how the algorithm of lock-in feedback amplifiers, which is routinely used in high-precision physics experiments, can be applied to social science experiments. In this setting the algorithm allows experimenters to optimally choose treatment values in a multidimensional treatment space even in the face of large noise. Furthermore, we have demonstrated that this approach can quickly recover from external perturbations, an important feature that increases its potential for social science experiments, in which contextual changes are likely to introduce such perturbations. In the current study we track the (group)-average subjective evaluation of beauty; we assume that this is relatively constant within the study given shared timing and context. LiF would theoretically be able to measure fluctuations in the subjective experience within individuals if their opinions were measured sequentially over time, an approach not explored further here. Finally, we have demonstrated that the method can unveil non-trivial, unexpected correlations between the variables involved in a social experiment.