Research Article: Language can mediate eye movement control within 100 milliseconds, regardless of whether there is anything to move the eyes to

Date Published: June 1, 2011

Publisher: North Holland Publishing

Author(s): Gerry T.M. Altmann.


The delay between the signal to move the eyes and the execution of the corresponding eye movement is variable and skewed, with an early peak followed by a considerable tail. This skewed distribution renders the answer to the question “What is the delay between language input and saccade execution?” problematic: for a given task, there is no single number, only a distribution of numbers. Here, two previously published studies are reanalysed whose designs enable us to answer, instead, the question: How long does it take, as the language unfolds, for the oculomotor system to demonstrate sensitivity to the distinction between “signal” (eye movements due to the unfolding language) and “noise” (eye movements due to extraneous factors)? In both studies, participants heard either ‘the man…’ or ‘the girl…’, and the distribution of launch times towards the concurrently, or previously, depicted man in response to these two inputs was calculated. In both cases, the earliest discrimination between signal and noise occurred at around 100 ms. This rapid interplay between language and oculomotor control is most likely due to cancellation of about-to-be-executed saccades towards objects (or their episodic trace) that mismatch the earliest phonological moments of the unfolding word.

Partial Text

Until recently, interest in the speed with which saccades can be launched towards intended (or unintended) targets has been restricted to the domain of visual psychophysics. Studies of oculomotor capture (e.g. Theeuwes, Kramer, Hahn, & Irwin, 1998), as well as of the distinction between anti- and pro-saccades (Hallett, 1978), have informed estimates of the time it takes to plan and then launch a saccadic eye movement. Studies of how launch times are influenced by the likelihood that a target to which the eye should saccade will appear in one position or another, have also informed theories of the time-course with which information apprehended from the visual environment can drive the oculomotor system to launch a saccade (Carpenter & Williams, 1995). Unsurprisingly, there has been some considerable convergence on the properties of this time-course. Most estimates assume that the time to plan and then launch an eye movement varies between 150 and 200 ms. Interestingly, these estimates have not changed significantly since the earliest estimates from the 1930s (Travis, 1936). The present paper also explores the speed with which language input can modulate oculomotor control, but in the context of the ‘visual world’ paradigm (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995), in which participants’ eye movements are monitored as they listen to instructions to manipulate objects presented within their visual environment (either real objects or objects depicted on a computer screen), or as they listen to sentences or narratives that describe events which may affect the people and/or objects depicted in a concurrent (or previously seen) scene. 
The paradigm presents a number of challenges to estimates of saccadic launch times: For example, participants are not waiting at a pre-determined location for a GO signal as the dynamically changing, and noisy, linguistic signal unfolds – rather, they have shifted their gaze to a new location (for a variety of reasons not all of which are determined by the linguistic signal) and are engaged in one or more of apprehending information, shifting their attention, planning their next saccade, or even executing that saccade, as the linguistic signal unfolds. The interplay between the unfolding language and the saccadic control system(s) is therefore particularly noisy (most regulatory neurophysiological systems are inherently noisy, but psychophysical studies of oculomotor control minimize the noise), and a distinction therefore has to be drawn between eye movements due to the language signal itself, and those due to extraneous noise; this allows the critical time-course issue to be recast in terms of the time-course with which the saccadic system exhibits a distinction between signal-driven and noise-driven saccades, where the signal to move the eyes to a visual target is an unfolding word that refers to that target. It is this issue that will be the focus of the present article.

Early studies on the time-course of signal apprehension and saccadic planning involved eye movement studies in which participants fixated successive locations indicated by a range of visual stimuli, including lighted bulbs (Saslow, 1967) and fixation crosses (Rayner, Slowiaczek, Clifton, & Bertera, 1983). These studies measured the delay between stimulus onset and the initiation of the saccade towards the target, with latencies in the 200–220 ms range. Saslow (1967) observed a reduction in the launch latency if the participant knew in advance the location of the stimulus to which they would have to move their eye (see Carpenter & Williams, 1995, for an account of how such a reduction may come about). In a later study, Matin, Shao, and Boff (1993) attempted to determine the ‘saccadic overhead’ – in effect, the time it takes to plan and launch a saccade minus the time taken to process/apprehend the visual stimulus. That study in fact estimated an overhead of around 100 ms, based on a procedure in which they subtracted between conditions which either did or did not require a saccadic eye movement in order to view and judge the visual stimuli they presented to their participants. However, they did not monitor eye movements directly, but rather based their estimates on participants’ response times in a push-button judgment task. The current consensus on the time it takes to process a triggering cue (i.e. the cue to move the eyes), and programme a new eye movement indicates programming times of 150 to 200 ms (e.g. Becker, 1991; Findlay, 1997; Salthouse & Ellis, 1980). Programming and execution times in studies of oculomotor capture of eye movements (e.g. Theeuwes et al., 1998) are typically just over 200 ms.

The data reported below are based on re-analyses of studies reported by Kamide, Altmann, and Haywood (2003) and by Altmann (2004). Kamide et al. showed participants scenes such as the one shown in Fig. 1. One second after the onset of the scene, participants heard one of the following four sentences:
(1) The man will ride the motorbike
(2) The girl will ride the carousel
(3) The man will taste the beer
(4) The girl will taste the sweets

For Study 1, saccades were deemed to have landed on the man or the girl in just those cases that they landed on the actual pixels occupied by the man or girl (no claim is made about the accuracy of eye movements – this criterion was chosen simply to avoid an arbitrary decision about where to place the boundary if not at the object’s edge). For Study 2, saccades were calculated by quadrant (or equivalent for the stimuli arranged in a diamond configuration): saccades that landed in the quadrant that had been occupied by the man or the woman were deemed to have been launched “towards” the location previously occupied by these protagonists. This necessarily meant that approximately 25% of the data were lost (because the eye may have already been fixating, through chance alone, within the target quadrant, in which case saccades on that trial were not included in the analyses). Altmann (in press) demonstrates that the pattern and approximate timing of eye movements in this study remains the same regardless of how the regions of interest are defined (quadrants vs. rectangles covering the target objects vs. exact pixels occupied by the objects); however, the absolute number of saccades does change, even if the relative patterns do not, and it is for this reason that quadrant analyses were adopted here (to lessen the sparseness of the data – see below – despite losing 25% of the data in the quadrant analyses, there were still more saccades that could be entered into the analysis than if smaller regions of interest were employed).
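The quadrant criterion described for Study 2 can be sketched in code. This is a minimal illustration only, not the analysis code used in the studies: the `Saccade` record, the function names, the quadrant numbering, and the 1024 x 768 display size are all assumptions introduced here, and the exclusion is applied per saccade rather than per trial for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Saccade:
    """Hypothetical record: launch and landing coordinates in pixels."""
    start_x: float
    start_y: float
    end_x: float
    end_y: float


def quadrant(x: float, y: float, width: float, height: float) -> int:
    """Map a screen position to a quadrant index:
    0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right."""
    col = 1 if x >= width / 2 else 0
    row = 1 if y >= height / 2 else 0
    return row * 2 + col


def towards_target(saccade: Saccade, target_quadrant: int,
                   width: float = 1024, height: float = 768) -> Optional[bool]:
    """Classify one saccade against the target quadrant.

    Returns None when the eye was already inside the target quadrant at
    launch (excluded from the analysis), True when the saccade lands in
    the target quadrant, and False when it lands elsewhere.
    The display size defaults are assumed values, not taken from the studies.
    """
    if quadrant(saccade.start_x, saccade.start_y, width, height) == target_quadrant:
        return None
    return quadrant(saccade.end_x, saccade.end_y, width, height) == target_quadrant
```

With four equal quadrants, a gaze position that is unrelated to the target would fall in the target quadrant about one time in four, which is consistent with the approximately 25% data loss noted above.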

Despite the sparseness of the data, the fact that multiple statistical models converge on the same pattern supports the conclusion that language-mediation of oculomotor control can occur within 80–120 ms: examination of the distribution of launch times towards the man (or the girl) as the words ‘man’ or ‘girl’ unfold shows that more signal-driven saccades than noise-driven saccades were launched in the 80–120 ms bin and in subsequent bins. This would appear to have been the case not just when the scene was concurrent with the unfolding word (Study 1) but also when the scene had been removed prior to the spoken stimulus (Study 2, although interpretation of these data is hampered by their sparseness). Unlike prior estimates of the time-course with which language can influence oculomotor control, these studies were typical ‘visual world’ studies in which the eyes were free to fixate on any part of the scene prior to the onset of the target word. The speed with which language-mediation was observed in these studies is all the more remarkable given the nature of the look-and-listen task that was employed in both studies: participants were not under instructions to move their eyes as quickly as they could – indeed, they were under no explicit instruction at all. Whether more explicit task demands would change the patterns observed in these studies is an empirical question; most likely the timing with which signal is distinguished from noise would not change, but the number of saccades would.
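The bin-by-bin comparison of signal-driven and noise-driven saccades can be illustrated with a short sketch. It is purely descriptive and does not reproduce the statistical models referred to above. The function names are hypothetical, and the 40 ms bin width is an assumption inferred from the 80–120 ms bin mentioned in the text. Here "signal" stands for launch times of saccades towards the man after the onset of ‘man’, and "noise" for launch times of saccades towards the man after the onset of ‘girl’.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def bin_counts(launch_times_ms: Sequence[float],
               bin_width: int = 40, n_bins: int = 10) -> List[int]:
    """Count saccade launches per time bin, measured from word onset.
    Launch times outside [0, bin_width * n_bins) are ignored."""
    limit = bin_width * n_bins
    counts = Counter(int(t // bin_width) for t in launch_times_ms if 0 <= t < limit)
    return [counts.get(i, 0) for i in range(n_bins)]


def first_divergence(signal_times: Sequence[float],
                     noise_times: Sequence[float],
                     bin_width: int = 40,
                     n_bins: int = 10) -> Optional[Tuple[int, int]]:
    """Return the (start, end) in ms of the earliest bin from which signal
    counts exceed noise counts in that bin and in every later bin,
    or None if there is no such bin."""
    sig = bin_counts(signal_times, bin_width, n_bins)
    noi = bin_counts(noise_times, bin_width, n_bins)
    for i in range(n_bins):
        if all(s > n for s, n in zip(sig[i:], noi[i:])):
            return (i * bin_width, (i + 1) * bin_width)
    return None
```

Because the criterion requires signal to exceed noise in every remaining bin, `n_bins` should be chosen so that the analysis window ends before the counts fall to zero in both conditions; otherwise empty trailing bins prevent any bin from qualifying.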



