Research Article: On the Contribution of Binocular Disparity to the Long-Term Memory for Natural Scenes

Date Published: November 15, 2012

Publisher: Public Library of Science

Author(s): Matteo Valsecchi, Karl R. Gegenfurtner, Chris I. Baker.


Binocular disparity is a fundamental dimension defining the input we receive from the visual world, along with luminance and chromaticity. In a memory task involving images of natural scenes, we investigated whether binocular disparity enhances long-term visual memory. We found that forest images studied in the presence of disparity for relatively long times (7 s) were remembered better than the same kind of images presented in 2D. This enhancement was not evident for other categories of pictures, such as images containing cars and houses, which are mostly identified by the presence of distinctive artifacts rather than by their spatial layout. Evidence from a further experiment indicates that observers do not retain a trace of stereo presentation in long-term memory.

Partial Text

Human observers possess an astonishing long-term memory for images of objects and scenes. Early studies showed that observers can quite accurately recognize the gist of as many as 10,000 pictures of objects and scenes [1], [2].

In Experiment 1 we asked whether presenting scenes in 3D improves long-term memory performance. We used scenes belonging to three categories: scenes containing buildings, scenes containing cars, and forest scenes. The rationale behind the choice of stimulus categories was to vary the relative relevance of the spatial layout in the scenes and to vary the strength of the binocular disparity signals. The first two categories contain mainly man-made objects in an urban setting and could be identified mainly on the basis of the functional characteristics of the objects, whereas the forest scenes are entirely devoid of man-made objects, and we assumed that their memorization might be more strongly supported by a representation of the spatial layout. The building pictures contain objects that are comparatively more distant from the observer (usually beyond 5 meters); because the binocular disparity signal is weaker at such distances, their processing should be less affected by it.
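The claim that disparity signals weaken with distance follows from viewing geometry, and a minimal sketch can make it concrete. The code below is an illustration and not part of the study's methods: the function names are hypothetical, and the interocular distance of 0.063 m is an assumed typical adult value. It computes the vergence angle for a point straight ahead and the relative disparity between two depths as the difference of their vergence angles.

```python
import math

# Assumed typical adult interocular distance in meters (not taken from the article).
ASSUMED_INTEROCULAR_M = 0.063


def vergence_angle_deg(distance_m, interocular_m=ASSUMED_INTEROCULAR_M):
    """Vergence angle (degrees) for a point straight ahead at distance_m."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(2.0 * math.atan(interocular_m / (2.0 * distance_m)))


def relative_disparity_arcmin(near_m, far_m, interocular_m=ASSUMED_INTEROCULAR_M):
    """Relative disparity (arcminutes) between two points on the midline."""
    delta_deg = (vergence_angle_deg(near_m, interocular_m)
                 - vergence_angle_deg(far_m, interocular_m))
    return 60.0 * delta_deg


if __name__ == "__main__":
    # The same 0.5 m depth step, placed near the observer and beyond 5 m.
    print(relative_disparity_arcmin(1.0, 1.5))
    print(relative_disparity_arcmin(5.0, 5.5))
```

For a fixed depth step, the relative disparity falls off roughly with the square of the viewing distance, which is why objects beyond a few meters, such as the buildings, would be expected to carry a much weaker disparity signal than nearby foliage.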

In the first experiment, memory performance was much lower than one would expect based on the capacity of long-term visual memory for scenes [1], [2]. A possible reason is that memory for scenes is very often supported by a conceptual representation, which involves some form of distinctive categorization of the picture [7], [31]. Since we always used relatively similar items belonging to the same category as targets and distractors, this form of long-term memory for pictures was neutralized.

The results from the first two experiments indicate that stereo presentation enhances long-term visual memory for scene images only when observers are allowed to encode the scene for an extremely long time and when the scenes are maximally identified by their spatial arrangement. The lack of any advantage for 3D presentation in Experiment 1 has two possible interpretations. On the one hand, the observers might not have retained a trace of the stereo quality of the pictures. On the other hand, they might have retained a memory of whether the display was 3D, but this information might not be relevant for picture recognition, which could be mediated by non-spatial attributes of the picture. In the third and last experiment we explicitly tested our observers' memory for the pictures' display type in a modified version of Experiment 1.

In a delayed recognition task, we investigated whether the presence of binocular disparity can improve long-term memory for scenes. The results indicate that this is the case only when an extremely long time is available to encode the pictures and when the spatial layout of the scene is highly relevant for distinguishing target pictures from foils belonging to the same category. Specifically, we found a stereo enhancement of long-term memory only for forest pictures that had been studied for 7 seconds. When observers memorized images of houses or vehicles, which are more likely to be encoded conceptually, recognition performance, in terms of both Hit Rate and sensitivity, was no better for stereo pictures than for 2D scene pictures.
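Because recognition performance is reported in terms of both Hit Rate and sensitivity, a short sketch of how the two measures relate may help. The text here does not state which sensitivity index was used; the example assumes the standard signal-detection index d' (the difference between the z-transformed hit and false-alarm rates) and applies a log-linear correction so that rates of exactly 0 or 1 stay finite. The function names and the trial counts in the usage example are hypothetical and are not data from the study.

```python
from statistics import NormalDist


def hit_rate(hits, misses):
    """Proportion of old (studied) pictures correctly recognized."""
    return hits / (hits + misses)


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumes an equal-variance Gaussian signal-detection model.
    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps the z-transform finite when a rate would be 0 or 1.
    """
    corrected_hr = (hits + 0.5) / (hits + misses + 1.0)
    corrected_far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(corrected_hr) - z(corrected_far)


if __name__ == "__main__":
    # Hypothetical counts for one observer and one condition.
    print(hit_rate(hits=30, misses=10))
    print(d_prime(hits=30, misses=10, false_alarms=12, correct_rejections=28))
```

Reporting sensitivity alongside the Hit Rate matters because an observer can raise the Hit Rate simply by responding "old" more often; d' discounts this by taking false alarms on the foils into account.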