Visual Constancy
What the Stable Feature Frame Gets
(an interpretation of two papers about the visual system)
Jeffrey Ventrella
MIT Media Lab
December 7, 1992
Introduction
In this paper I bring together terms and ideas from two different
papers about the visual system, and discuss eye movements and the
sense of constancy in the visual world. One paper, entitled
"Four Frames Suffice: A provisional model of vision and space",
by Jerome Feldman (1985), presents a general computational treatment
of how mammals deal with visual objects and environments.
Feldman's paper explains his model and does not describe any
experiments. The other paper, by Coren, Bradley, Hoenig, and Girgus
(1973), entitled "The effect of smooth tracking and saccadic
eye movements on the perception of size: the shrinking circle
illusion", describes a set of experiments. These experiments
show that under certain conditions subjects experience an
illusion arising from an artifact of smooth tracking (even, smooth eye
movements, usually following a moving object). The
results shed some light on how different kinds of eye activity
are processed in different ways.
Frames of Seeing
Feldman suggests that four frames are simultaneously
active in the perception and interpretation of visual stimuli,
at least in the computational model he has constructed. These
frames are: the "retinotopic frame", the "stable feature frame",
the "environmental frame", and the "world knowledge formulary".
I will discuss only the first two. The retinotopic frame
collects visual stimuli and is updated at every saccade
(a rapid, jumpy eye movement). This frame corresponds most
literally to the retinal image, and incorporates a high-resolution
fovea. The "stable feature frame" ties together
the signals coming in from the first frame and, in turn,
affects the retinotopic frame by influencing where it looks next,
in a collaborative maintenance of a stable visual world.
The stable feature frame builds a holistic, integrated
scene from the stimuli gathered at each saccade, as well as from
influences from the other frames, such as the environmental
frame and the world knowledge formulary.
Frame-Arrays
Feldman's stable feature frame reminds me of the ideas from Minsky's
"Society of Mind" on frame-arrays (chapter 25). A frame-array
is a collection of predicted instances of how a prototypical
object or scene can appear from many views. This allows a
scene within which one is moving to appear stable even though
the "images" projected via the retinotopic frame are constantly
changing. One's memories of how prototypical objects in a
scene appear from various distances and angles are held in
array-like structures, which are activated when the perception
of an object triggers that prototype. Minsky also suggests
that frame-arrays are helpful for "visualizing" imaginary
scenes, such as what we would see if we were moving through them:
the visual instances we would expect to see are filled in
automatically. This filling-in could contribute to
what Feldman's stable feature frame does. The development
of new and more complex frame-arrays through experience
can be seen as enriching the stable feature frame, making it
robust in a wider variety of visual environments.
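Minsky's idea can be made concrete with a toy data structure. The sketch below is my own illustration, not Minsky's formulation: the class name FrameArray, the use of a viewing angle in degrees as the index, and the feature labels are all hypothetical. It stores several views of one prototype and, when asked what should be seen from a new angle, supplies the nearest remembered view, a crude stand-in for the automatic filling-in described above.

```python
from dataclasses import dataclass, field


@dataclass
class FrameArray:
    """Toy sketch of a frame-array: one prototype, many remembered views.

    Assumption for illustration only: each view is keyed by a viewing
    angle in degrees and records which features are expected to be
    visible from that angle. Minsky's own frame-arrays are richer.
    """
    prototype: str
    views: dict[float, set[str]] = field(default_factory=dict)

    def add_view(self, angle: float, features: set[str]) -> None:
        """Remember how the prototype looks from the given angle."""
        self.views[angle % 360] = set(features)

    def predict(self, angle: float) -> set[str]:
        """Fill in the expected appearance from the nearest stored view."""
        if not self.views:
            raise ValueError("no views stored for " + self.prototype)

        def angular_distance(stored: float) -> float:
            # Distance around the circle, so 350 and 0 degrees are close.
            d = abs(stored - angle) % 360
            return min(d, 360 - d)

        nearest = min(self.views, key=angular_distance)
        return set(self.views[nearest])  # copy, so callers can't alter memory


if __name__ == "__main__":
    cube = FrameArray("cube")
    cube.add_view(0, {"front face"})
    cube.add_view(45, {"front face", "right face"})
    cube.add_view(90, {"right face"})
    # Moving from 40 to 50 degrees: both lookups land on the 45-degree view.
    print(cube.predict(40), cube.predict(50))
```

Because small movements of the viewer map to the same stored view, the prediction stays constant while the viewpoint changes, which is one way to picture how a frame-array could help a scene appear stable.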
Coren's Experiment
The findings of Coren and his colleagues (the second paper) suggest that under
certain circumstances, the communication between the retinotopic
and stable feature frames can break down (to use terms from the
first paper). For instance, illusions can occur when a slow-moving
target causes the eye to engage in smooth tracking rather than saccadic
motion. The response to the saccadic activity of the eye appears to
be one in which disparate "snapshots" are merged together.
But smooth tracking seems to deliver a different kind of signal
to the stable feature frame. Coren's experiments suggest that
smooth tracking generates a continuous mapping from the physical sensation of
smooth eye motion to some representation of real-world space.
The experiments demonstrated that in smooth tracking the eye
tends to drag a little behind the target, creating an artifact
in the stable feature frame. This artifact creates a subtle illusion
that distances traveled are shorter than they really are.
Briefly, the experiment went something like this: subjects sat
in a dark room and watched a small white dot move along a circular
trajectory. They were asked to keep their eyes fixed on the dot.
After they had watched the dot circle for a while, this
stimulus was removed and replaced by two horizontally aligned
dots, which they were instructed to adjust to indicate the diameter
they perceived the dot's circular trajectory to have.
During viewing of the moving dot, the subjects' horizontal eye movements
were recorded with electrodes. Vertical motion was ignored because
eyelid activity interfered too much with the vertical signals.
At slow speeds of rotation, the eye movements consisted
mostly of smooth tracking. At higher speeds, some saccadic activity
occurred, but the movements were still mostly smooth tracking. At very
high speeds, eye movements became mostly saccadic. The experiments
showed that at high speeds (but not high enough to trigger
significant saccadic movement) the lag of smooth tracking led
subjects to judge the circle's diameter as smaller than it
actually was. Higher speeds (still within the smooth tracking range)
strengthened this effect. At speeds at which saccadic motion took over,
the perceived diameter was corrected, although foveation (keeping the
target in the center of view) became erratic.
My take on this is that the stable feature frame has at least two ways
of assimilating information from the retinotopic frame, corresponding to the two
modes of eye movement. Smooth tracking gives a robust estimate
of the rate of a moving target, but it is not a good indicator of the
absolute distance covered by the trajectory.
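The lag described above can be illustrated with a toy model. The sketch below is an assumption of mine, not a model from Coren's or Feldman's paper: it treats smooth pursuit as a first-order lag, in which eye velocity is proportional to the distance between eye and target, with a hypothetical time constant tau. Driven by a dot moving on a circle of radius R at angular speed omega, such a system settles into a circle with the same angular speed but a smaller radius, R / sqrt(1 + (omega * tau)^2). It therefore preserves rate while understating extent, and the shrinkage grows with speed. Real pursuit is more complex (it includes prediction and catch-up saccades), so the numbers show only the direction of the effect, not its size.

```python
import math


def simulate_pursuit_radius(radius, omega, tau, dt=1e-3, n_revs=20):
    """Toy smooth-pursuit model (an assumption, not from Coren et al.).

    The eye follows d(eye)/dt = (target - eye) / tau, a first-order lag
    with a hypothetical time constant tau in seconds. The target moves
    on a circle of the given radius at angular speed omega (rad/s).
    Returns the mean distance of the eye from the circle's center over
    the final revolution, after the start-up transient has died away.
    """
    period = 2 * math.pi / omega
    steps = int(n_revs * period / dt)
    last_rev_start = steps - int(period / dt)
    ex, ey = radius, 0.0  # the eye starts on the target
    total, count = 0.0, 0
    for i in range(steps):
        t = i * dt
        tx = radius * math.cos(omega * t)
        ty = radius * math.sin(omega * t)
        # Euler step: eye velocity proportional to position error.
        ex += (tx - ex) * dt / tau
        ey += (ty - ey) * dt / tau
        if i >= last_rev_start:
            total += math.hypot(ex, ey)
            count += 1
    return total / count


def predicted_pursuit_radius(radius, omega, tau):
    """Closed-form steady-state radius of the same first-order lag."""
    return radius / math.sqrt(1 + (omega * tau) ** 2)


if __name__ == "__main__":
    tau = 0.05  # hypothetical 50 ms lag, chosen only for illustration
    for revs_per_sec in (1, 2, 3):
        omega = 2 * math.pi * revs_per_sec
        sim = simulate_pursuit_radius(1.0, omega, tau)
        exact = predicted_pursuit_radius(1.0, omega, tau)
        print(f"{revs_per_sec} rev/s: simulated {sim:.3f}, predicted {exact:.3f}")
```

A first-order lag was chosen because it is the simplest system that lags behind its input; any linear lag driven by circular motion would likewise keep the target's rate while shrinking the path.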
From Many Small Worlds, One
Saccadic eye movements, taken at face value,
may seem to inject a great deal of noise and chaos into the visual system,
because saccadic motion is so sporadic and seemingly random.
But in fact it is not at all random (in a drug-free, happy animal);
it is just complex, and exquisitely calibrated with the "higher" frames of
vision, which are highly dynamic themselves. These eye motions
work in close collaboration with the stable feature frame, filling
in gaps and contributing to a gestalt so efficiently that
in daily life we don't notice how active the eyes really are.
Similarly, higher frames, such as the world
knowledge formulary, are built from (among other inputs and
throughputs) a collection of many visual experiences from the stable
feature frame, held together in this case with a sort of semantic glue.
"Associationists" hold that the experience of complex wholes is built
by combining more elementary sensations, whereas Gestalt psychologists
generally hold that the whole precedes the parts. Anne Treisman (1980)
suggests that even though the Gestalt view conforms to subjective
experience, information processing most likely goes on in the early stages
of visual perception, bringing fragments together into coherent feature
structures before they reach awareness. In Feldman's model,
the stable feature frame could be an active participant in this pre-conscious
processing.
Cubism - The Unstable Feature Frame
The Cubist painters (particularly of the "Analytic" school, represented
by Picasso and Braque) may have been as influential as they were because
they hit upon an important piece of reality, usually hidden from "view"
by our efficient visual system: they found a way to strip away the
stable feature frame and leave only the raw superimposition of visual
impressions, from multiple viewpoints and multiple moments in time.
A Note on the AI Influence
Feldman's paper reflects the influence of AI on the field of cognitive
science. The computational model has replaced other models
(and metaphors), such as "descriptive" models based on physical
mechanisms. The shift toward the computational model seems
to have brought cognitive science closer to the way in which the
brain may really work, although there are some dangers in taking
the metaphor too literally. It seems more productive in science
to compare computers to brains than to compare brains to computers;
the difference lies in which is taken as the stronger prototype
(to use terms from the field).
It is easy, in a scientific world saturated with computers, to fall under
this influence unintentionally.
The emergence of massively parallel
architectures and the rise of connectionism have made computational
models of the brain's activities considerably more "natural".
Feldman's model employs connectionist techniques. He mentions, in the
section on connectionism, that standard (serial) computing models
present a problem: animal brains do not compute like a conventional
computer (which should be obvious to most people anyway, but apparently
cognitive scientists have to remind each other occasionally).
Parallel computation, however, is becoming more common, and may
eventually be considered the "standard". The question then is: do animal
brains compute like parallel computers? They certainly work more
like parallel computers than like serial ones, but again, it is better
to say that parallel computers work more like animal brains.
Sources
Coren, S., Bradley, D. R., Hoenig, P., and Girgus, J. (1973). The effect of smooth
tracking and saccadic eye movements on the perception of size: The shrinking
circle illusion. Vision Research, Vol. 15, pp. 49-55.
Feldman, J. A. (1985). Four frames suffice: A provisional model of vision and
space. The Behavioral and Brain Sciences, Vol. 8, pp. 265-289.
Minsky, M. (1985). The Society of Mind.
Treisman, A., and Gelade, G. (1980). A feature-integration theory of attention.
Cognitive Psychology, Vol. 12, pp. 97-136.