Four Frames

Visual Constancy

What the Stable Feature Frame Gets

(an interpretation of two papers about the visual system)

Jeffrey Ventrella
MIT Media Lab
December 7, 1992

Introduction

In this paper I bring together terms and ideas from two papers about the visual system, and discuss eye movements and the sense of constancy in the visual world. One paper, "Four Frames Suffice: A provisional model of vision and space" by Jerome Feldman (1985), presents a general computational treatment of how mammals deal with visual objects and environments. Feldman's paper explains his model but does not describe any experiments. The other paper, "The effect of smooth tracking and saccadic eye movements on the perception of size: the shrinking circle illusion" by Coren, Bradley, Hoenig, and Girgus (1973), describes a set of experiments. These experiments show that under certain conditions subjects experience an illusion caused by an artifact of smooth tracking (smooth, continuous eye movements, usually made while following a moving object). The results shed some light on how different kinds of eye movement are processed differently.

Frames of Seeing

Feldman suggests that, in the computational model he has constructed, four frames are simultaneously active in the perception and interpretation of visual stimuli. These frames are the "retinotopic frame", the "stable feature frame", the "environmental frame", and the "world knowledge formulary". I will discuss only the first two. The retinotopic frame collects visual stimuli and is updated at every saccade (a rapid, jumpy eye movement). This frame corresponds most literally to the retinal image, and incorporates a high-resolution fovea. The stable feature frame ties together the signals coming in from the retinotopic frame and, in turn, affects the retinotopic frame by influencing where the eyes look next, so that the two frames collaborate in maintaining a stable visual world. The stable feature frame assembles a holistic, integrated scene from the stimuli gathered across successive saccades, as well as from the influence of the other frames, such as the environmental frame and the world knowledge formulary.
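One way to picture the job Feldman assigns to the stable feature frame is as a coordinate transform followed by a merge. The sketch below is my own simplification, not Feldman's connectionist implementation: it assumes each fixation delivers features at eye-centered (retinotopic) grid offsets together with the current gaze direction, and it ignores the fovea's higher resolution and the feedback that steers the next saccade. The function name `integrate_fixations` is hypothetical.

```python
from collections import defaultdict


def integrate_fixations(fixations):
    """Merge saccadic 'snapshots' into one gaze-independent feature map.

    Each fixation is (gaze_x, gaze_y, features), where `features` maps an
    eye-centered offset (dx, dy) to a feature label. Adding the gaze
    direction to each offset gives a location that does not change when
    the eye moves, so the same feature seen from two fixations lands in
    the same cell of the stable map.
    """
    stable = defaultdict(set)
    for gaze_x, gaze_y, features in fixations:
        for (dx, dy), label in features.items():
            stable[(gaze_x + dx, gaze_y + dy)].add(label)
    return dict(stable)


# Two fixations see the same corner at different retinal positions.
fixations = [
    (0, 0, {(2, 1): "corner", (0, 0): "edge"}),
    (2, 0, {(0, 1): "corner", (1, 1): "handle"}),
]
print(integrate_fixations(fixations))
```

After the saccade, the corner's retinal position changes from (2, 1) to (0, 1), yet both snapshots write it to the same stable location. This is, in a very reduced form, the kind of constancy the essay attributes to the stable feature frame.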

Frame-Arrays

Feldman's stable feature frame reminds me of the ideas on frame-arrays in Minsky's "Society of Mind" (chapter 25). A frame-array is a collection of predicted instances of how a prototypical object or scene can appear from many views. It allows a scene within which one is moving to appear stable even though the "images" projected via the retinotopic frame are constantly changing. One's memories of how prototypical objects in a scene appear from various distances and angles are held in array-like structures, which are activated when one perceives an object that triggers the corresponding prototype. Minsky also suggests that frame-arrays help us "visualize" imaginary scenes, such as what we would see if we were moving through them: the visual instances we would expect to see are filled in automatically. This filling-in could be part of what Feldman's stable feature frame does. The development of new and more complex frame-arrays through experience can be seen as enriching the stable feature frame, making it robust across a wider variety of visual environments.
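A frame-array can be pictured as a small data structure. The sketch below is hypothetical: it assumes that a view is identified by a single viewing angle in degrees and that an "appearance" is just a set of feature labels. The names `FrameArray`, `add_view`, `predict`, and `trigger` are illustrative and are not Minsky's.

```python
from dataclasses import dataclass, field


@dataclass
class FrameArray:
    """Hypothetical frame-array: one prototype, many stored views, each
    keyed by a viewing angle in degrees and holding the visible features."""
    prototype: str
    views: dict[int, set[str]] = field(default_factory=dict)

    def add_view(self, angle, visible_features):
        self.views[angle % 360] = set(visible_features)

    def predict(self, angle):
        """Return the features expected from the stored view nearest `angle`
        (distance measured around the circle, so 350 is close to 0)."""
        target = angle % 360

        def circular_distance(stored):
            d = abs(stored - target)
            return min(d, 360 - d)

        nearest = min(self.views, key=circular_distance)
        return self.views[nearest]


def trigger(arrays, observed):
    """Activate the frame-array whose best-matching stored view shares the
    most features with the observed set."""
    def best_overlap(frame_array):
        return max(len(view & observed) for view in frame_array.views.values())
    return max(arrays, key=best_overlap)


box = FrameArray("box")
box.add_view(0, {"front face", "top edge"})
box.add_view(90, {"side face", "top edge"})
chair = FrameArray("chair")
chair.add_view(0, {"seat", "legs", "back"})

active = trigger([box, chair], {"front face", "top edge"})
print(active.prototype, active.predict(80))
```

Recognizing the front of the box activates the box's array; moving the viewpoint to 80 degrees then yields the stored side view as the prediction without a fresh recognition step. Nearest-angle lookup is a deliberate simplification: in Minsky's account the frames of an array share common terminals, whereas this sketch stores each view independently.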

Coren's Experiment

The findings of Coren and colleagues (the second paper) suggest that under certain circumstances the communication between the retinotopic and stable feature frames can break down (to use terms from the first paper). For instance, illusions can occur when a slowly moving target causes the eye to engage in smooth tracking rather than saccadic motion. The response to saccadic activity appears to merge disparate "snapshots" together, but smooth tracking seems to deliver a different kind of signal to the stable feature frame. Coren's experiments suggest that smooth tracking generates a continuous mapping from the physical sensation of smooth eye motion to some representation of real-world space. The experiments demonstrated that in smooth tracking the eye tends to drag a little behind the target, creating an artifact in the stable feature frame. This artifact produces a subtle illusion that distances traveled are shorter than they really are.

Briefly, the experiment went something like this: subjects sat in a dark room and watched a small white dot move along a circular trajectory, keeping their eyes fixed on the dot. After they had watched the dot circle for a while, this stimulus was removed and replaced by two horizontally aligned dots, which subjects adjusted to indicate the perceived diameter of the dot's circular path. While subjects viewed the moving dot, their horizontal eye movements were recorded with electrodes. Vertical motion was ignored because eyelid activity interfered too much with the vertical signals. At slow speeds of rotation, the eye movements consisted mostly of smooth tracking. At higher speeds some saccadic activity occurred, but the movements were still mostly smooth tracking. At very high speeds, eye movements became mostly saccadic.

What the experiments showed was that at high speeds (though not high enough to produce significant saccadic movement), the lagging effect of smooth tracking caused subjects to perceive the circular path as having a smaller diameter than it actually had. Higher speeds (still within the smooth tracking range) produced a stronger effect. At speeds at which saccadic motion took over, the perceived diameter was corrected, although foveation (keeping the target in the center of view) became erratic. My interpretation is that the stable feature frame has at least two ways of assimilating information from the retinotopic frame, corresponding to the two modes of eye movement. Smooth tracking gives a robust estimate of a moving target's rate, but it is not a reliable indicator of the absolute space covered by the trajectory.
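The idea that a lagging eye shrinks the circle, and shrinks it more at higher speeds, can be made concrete with a toy model. This model is my own assumption, not the analysis used by Coren and colleagues: smooth pursuit is treated as a first-order lag, in which the eye moves toward the target at a rate proportional to the current position error, with a hypothetical time constant `tau` of 0.05 s chosen only for illustration.

```python
import math

# Hypothetical toy model, not Coren et al.'s analysis: the eye moves toward
# the target at a speed proportional to the current position error.
# tau (seconds) is an illustrative time constant, not a measured value.


def simulate_pursuit(radius, rev_per_sec, tau, duration=20.0, dt=1e-3):
    """Euler-integrate the model for a target circling at rev_per_sec and
    return the horizontal diameter the eye traces once transients settle."""
    omega = 2 * math.pi * rev_per_sec
    eye_x, eye_y = radius, 0.0  # the eye starts on the target
    xs = []
    for step in range(int(duration / dt)):
        t = step * dt
        target_x = radius * math.cos(omega * t)
        target_y = radius * math.sin(omega * t)
        # Close a fraction dt/tau of the position error on each step.
        eye_x += (target_x - eye_x) * dt / tau
        eye_y += (target_y - eye_y) * dt / tau
        if t > duration / 2:  # record only the steady-state second half
            xs.append(eye_x)
    return max(xs) - min(xs)


def predicted_diameter(radius, rev_per_sec, tau):
    """Closed-form steady state: a first-order lag scales the circle by
    1 / sqrt(1 + (omega * tau)^2)."""
    omega = 2 * math.pi * rev_per_sec
    return 2 * radius / math.sqrt(1 + (omega * tau) ** 2)


if __name__ == "__main__":
    for speed in (0.25, 0.5, 1.0, 2.0):
        sim = simulate_pursuit(1.0, speed, tau=0.05)
        print(f"{speed:4.2f} rev/s: simulated {sim:.3f}, "
              f"predicted {predicted_diameter(1.0, speed, 0.05):.3f}")
```

A pure time delay would only shift the eye's phase without shrinking the circle; in this model the shrinkage comes from the tracking gain falling below 1 as rotation speed rises. The model also leaves out catch-up saccades, so it says nothing about the correction observed at very high speeds.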

From Many Small Worlds, One

Saccadic eye movements may seem, taken at face value, to be a source of noise and chaos within the visual system, because saccadic motion is so sporadic and seemingly random. In fact it is not random at all (in a drug-free, happy animal); it is complex, and exquisitely calibrated with "higher" frames of vision, which are themselves highly dynamic. These eye motions work in close collaboration with the stable feature frame, filling in gaps and contributing to a gestalt so efficiently that in daily life we don't notice how active our eyes really are. Similarly, higher frames such as the world knowledge formulary develop from (among other inputs and throughputs) a collection of many visual experiences from the stable feature frame, held together in this case with a sort of semantic glue. "Associationists" hold that the experience of complex wholes is built by combining more elementary sensations, whereas Gestalt psychologists generally hold that the whole precedes the parts. Anne Treisman (1980) suggests that even though the Gestalt view conforms to subjective experience, information processing most likely goes on in early stages of visual perception, binding fragments into coherent feature structures before they are brought to awareness. In Feldman's model, the stable feature frame could be an active participant in this pre-conscious processing.

Cubism - The Unstable Feature Frame

The Cubist painters (particularly of the "Analytic" school, represented by Picasso and Braque) may have been as influential as they were because they hit upon a strong piece of reality, usually hidden from "view" by our efficient visual system: they found a way to strip away the stable feature frame and leave just the raw superimposition of visual impressions, with multiple views and multiple moments in time.

A Note on the AI Influence

Feldman's paper reflects the influence of AI on cognitive science. The computational model has replaced other models (and metaphors), such as "descriptive" models based on physical mechanisms. The shift toward the computational model seems to have brought cognitive science closer to the way the brain may really work, although there are dangers in taking the metaphor too literally. It seems more productive in science to compare computers to brains than to compare brains to computers; the difference lies in which prototype is stronger (to use terms from the field). In a scientific world saturated with computers, it is easy to fall under this influence unintentionally. The emergence of massively parallel architectures and the rise of connectionism have made computational models of the brain's activities considerably more "natural". Feldman's model employs connectionist techniques. In the section on connectionism, he notes that standard (serial) computing models present a problem: animal brains do not compute like a conventional computer (which should be obvious to most people anyway, but apparently cognitive scientists have to remind each other occasionally). Parallel computation, however, is becoming more common, and may eventually be considered the "standard". The question, then, is whether animal brains compute like parallel computers. At most, they work more like parallel computers than like serial ones; again, it is better to say that parallel computers work more like animal brains.

Sources

Coren, S., Bradley, D. R., Hoenig, P., and Girgus, J. (1973). The Effect of Smooth Tracking and Saccadic Eye Movements on the Perception of Size: The Shrinking Circle Illusion. Vision Research, Vol. 15, pages 49-55.

Feldman, J. A. (1985). Four Frames Suffice: A Provisional Model of Vision and Space. The Behavioral and Brain Sciences, Vol. 8, pages 265-289.

Minsky, M. (1985). The Society of Mind.

Treisman, A., and Gelade, G. (1980). A Feature-Integration Theory of Attention. Cognitive Psychology, Vol. 12, pages 97-136.