Computational Models of Vision


Summary: Eero Simoncelli constructs computational models of vision that are consistent with the properties of the visual world, the requirements of visual tasks, and the constraints of biological implementation.

I want to understand the process that we call "vision" by constructing computational models of biological visual processes. A successful model should be consistent with the observed behaviors of biological systems but should also take into account the properties of the visual environment and be based on a specific (albeit often hypothetical) functional goal or task. To fully understand biological visual processing, I find it valuable to implement engineering systems that solve the same computational problems. Thus, my work is inherently interdisciplinary, requiring empirical study of the structure of visual environments, construction of mathematical theories for representation and processing of that structure, implementation and simulation of biologically plausible instantiations of these theories, and physiological or psychophysical investigations that are motivated by the theories. Below, I describe a few recent projects.

Statistical Modeling of Visual Images
Recently, I have focused much of my research efforts on issues of image representation. It has long been assumed that visual systems are adapted, at evolutionary, developmental, and behavioral timescales, to the images to which they are exposed. Since not all images are equally likely, it is natural to assume that the system should be able to process best those images that occur most frequently. Thus, it is the statistical properties of the environment that are relevant for sensory processing. Such concepts are fundamental in engineering disciplines: compression, transmission, and enhancement of images all rely heavily on statistical models.

How can we determine the likelihood of occurrence of a given image, or portion of an image? The problem is difficult, because the set of all images is enormous. So we start with a few simplifying assumptions. First, structures in images occur at arbitrary sizes (dependent on distance from the observer), and it is thus intuitively sensible to analyze image content simultaneously at multiple scales. My work in this area began many years ago with a study of multiscale, multiorientation representations (now commonly referred to as "wavelets"). The basic properties of these representations bear a strong resemblance to the receptive fields derived from physiological measurements of neurons in primary visual cortex. Second, we examine local properties of images when decomposed into multiple scales. That is, we look for regularly occurring structures in localized patches of an image. In examining such patches over a large collection of natural images, we have uncovered surprising regularities that may be described using parametric probability models. We have applied these models to classical engineering problems of compression and noise removal and achieved state-of-the-art results.
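
As a rough illustration of what analyzing an image "simultaneously at multiple scales" can mean, the sketch below splits an image into detail bands of decreasing resolution plus a coarse residual. It is a deliberately simple block-averaging pyramid with no orientation selectivity, not the multiscale, multiorientation representation described above; the function names `multiscale_bands` and `reconstruct` are hypothetical, and the image sides are assumed to be divisible by 2 raised to the number of levels.

```python
import numpy as np

def upsample2(coarse):
    """Repeat every pixel in a 2x2 block (nearest-neighbor upsampling)."""
    return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

def multiscale_bands(image, levels=3):
    """Split an image into `levels` detail bands plus one coarse residual.

    Assumes both image dimensions are divisible by 2**levels.
    """
    current = np.asarray(image, dtype=float)
    bands = []
    for _ in range(levels):
        h, w = current.shape
        # Average each 2x2 block to get the next coarser scale.
        coarse = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        # The detail band is whatever the coarser scale cannot explain.
        bands.append(current - upsample2(coarse))
        current = coarse
    bands.append(current)  # low-resolution residual
    return bands

def reconstruct(bands):
    """Invert multiscale_bands by adding the detail bands back in."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = upsample2(current) + band
    return current
```

Statistics of local patches could then be gathered separately within each band, which is the spirit of the second assumption above.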

These probability models also have a direct implication for neural representation. If one assumes (following the British physiologist Horace Barlow) that neurons in a population strive to produce statistically independent responses, the model we have developed suggests that the optimal representation of an image should proceed by decomposing with multiscale, oriented functions, followed by a divisive gain-control mechanism. Specifically, the response of each "neuron" should be divided by a weighted linear combination of the responses of cells at adjacent locations, orientations, and scales. Such divisive mechanisms (often called normalization models) have been widely used to account for the nonlinear response properties of neurons in primary visual cortex. As such, our statistical model provides the first theoretical justification for these cortical normalization models.
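
The divisive gain-control step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the fitted model: the squaring of responses, the additive constant `sigma`, and the example weight matrix are all assumptions introduced here, and the function name is hypothetical.

```python
import numpy as np

def divisive_normalization(responses, weights, sigma=1.0):
    """Divide each squared response by a weighted sum of neighbors' squared responses.

    responses: 1-D array of linear filter outputs, one per model "neuron".
    weights:   square matrix; weights[i, j] sets how strongly neuron j
               contributes to the normalization pool of neuron i
               (illustrative values, not measured ones).
    sigma:     constant that keeps the denominator positive (assumption).
    """
    energy = np.asarray(responses, dtype=float) ** 2
    pool = np.asarray(weights, dtype=float) @ energy
    return energy / (sigma ** 2 + pool)

# Three model neurons, each normalized by the other two with equal weight.
example_weights = 0.5 * (np.ones((3, 3)) - np.eye(3))
example_output = divisive_normalization([2.0, 1.0, 0.0], example_weights)
```

In the approach described above, the weights would be chosen from statistical measurements of natural images rather than set by hand.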

Perhaps more importantly, the statistical measurements from natural images can be used to determine the optimal parameters of the model. We have used this observation to "derive" a model that can account for recent physiological data on suppression from beyond the classical receptive field. The model also makes predictions about the physiological effects of adaptation. Recently, we have found that the same structures are present in natural sounds and that an analogous model may be derived and compared with neurons of the auditory nerve. Thus, these models provide an opportunity for us to test directly (through physiological predictions and comparisons) the ecological hypothesis that neural computations are optimally matched to the statistics of the environment.

Functional Characterization of Neural Response
The functional properties of sensory neurons have been traditionally summarized using "receptive fields." But these do not provide a complete description of the response properties unless one makes additional simplifying assumptions (e.g., linearity). Furthermore, as sensory neuroscience research has been extended to areas that are farther removed from the sensory input, it has become increasingly difficult to describe the receptive fields of neurons, because it is difficult to construct parametric stimuli that elicit responses. Recently my laboratory has been developing new forms of stimuli and data analysis techniques for probing and characterizing neurons, specifically techniques for identifying and estimating various forms of nonlinear response behavior, such as short-timescale gain adjustments or nonlinearities associated with spike generation. We are also using the statistical models described above to explore the generation of stochastic stimuli with "naturalistic" properties, which we hope will be more effective at eliciting neuronal responses.

Visual Motion Estimation
When a person moves within his or her environment, the visual image projected onto the retina changes accordingly. These changes may be described as two-dimensional translations of the local intensity pattern. Physiological and psychophysical experiments have established that mammalian visual systems contain mechanisms that are sensitive to such local translational motions, and theoretical and computational studies have confirmed that such translations carry important information about the environment. The basic model for motion representation that I have developed comes from a classical estimation-theoretic formulation of the problem. Assuming that the light intensity pattern falling on the retina over time undergoes local translational motion, and assuming a slight preference for slower speed interpretations, one can derive an optimal method of estimating image velocities. A computer implementation of this method produces a state-of-the-art algorithm for visual motion estimation, useful in various image-processing or computer vision tasks. Surprisingly, the method also provides an excellent description of human perception of local image velocity. In particular, we have shown that this model can account for human psychophysical data regarding the perception of a variety of moving patterns, as well as motion aftereffects.
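
One common way to turn these two assumptions into an estimator is sketched below. It assumes a linearized brightness-constancy constraint with Gaussian noise and a zero-mean Gaussian prior on velocity, so the preference for slow speeds appears as a ridge term. The function name, the `prior_strength` parameter, and the use of finite differences are assumptions of this sketch, which may differ in detail from the method described above.

```python
import numpy as np

def estimate_velocity(frame0, frame1, prior_strength=1e-3):
    """Estimate one translational velocity (vx, vy) for a pair of image patches.

    Solves the regularized least-squares problem
        minimize  sum (gx*vx + gy*vy + gt)**2 + prior_strength * (vx**2 + vy**2)
    where the second term plays the role of a preference for slow speeds.
    Arrays are indexed [y, x]; velocity is in pixels per frame.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # Spatial derivatives of the temporally averaged frame (rows = y, columns = x).
    gy, gx = np.gradient(0.5 * (f0 + f1))
    gt = f1 - f0  # temporal derivative
    grads = np.stack([gx.ravel(), gy.ravel()], axis=1)
    M = grads.T @ grads       # 2x2 matrix of summed gradient products
    b = grads.T @ gt.ravel()  # correlation of spatial and temporal derivatives
    return -np.linalg.solve(M + prior_strength * np.eye(2), b)

# Synthetic smooth pattern translating by (0.2, -0.1) pixels per frame.
y, x = np.mgrid[0:64, 0:64].astype(float)
pattern = lambda xs, ys: np.sin(0.3 * xs) + np.cos(0.2 * ys)
velocity = estimate_velocity(pattern(x, y), pattern(x - 0.2, y + 0.1))
```

Raising `prior_strength` pulls the estimate toward zero, which matters most when the image gradients are weak or all point in one direction.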

The method may also be instantiated as a physiological model for motion representation. This model is constructed in two stages of identical architecture, corresponding to neurons in visual cortical areas known as V1 and MT. The commonality of structure in the two stages is an attractive feature of the model, since it is often noted that the pattern of connections is similar across a variety of cortical areas. Computations in each stage are based on a linear receptive field, followed by a rectifying nonlinearity, and divisive normalization (in which the response of each cell is divided by the summed responses of other cells). The population response of the V1 neurons provides a distributed encoding of local spatiotemporal orientation, and the population response of the MT neurons provides a distributed encoding of the local image velocity.
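
The shared architecture of the two stages can be sketched as one function applied twice. This is a schematic under several assumptions: the weights here are arbitrary matrices rather than spatiotemporal receptive fields, the rectifier is taken to be half-squaring, the normalization pool includes the cell itself, and `sigma` is an added constant. All names are hypothetical.

```python
import numpy as np

def stage(inputs, weights, sigma=0.1):
    """One model stage: linear receptive fields, rectification, divisive normalization."""
    linear = np.asarray(weights, dtype=float) @ np.asarray(inputs, dtype=float)
    rectified = np.maximum(linear, 0.0) ** 2  # half-squaring (assumption)
    # Divide every cell by the summed activity of the population.
    return rectified / (sigma ** 2 + rectified.sum())

def two_stage_response(stimulus, v1_weights, mt_weights):
    """Apply the same stage twice: stimulus -> "V1" population -> "MT" population."""
    v1 = stage(stimulus, v1_weights)
    mt = stage(v1, mt_weights)
    return v1, mt
```

Reusing `stage` for both populations mirrors the commonality of structure noted above; only the weights differ between the two calls.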

Simulations demonstrate that this two-stage model is remarkably consistent with a broad set of single-cell physiological data recorded in area MT. In addition, we have recently developed a novel class of stochastic stimuli for which this model is an ideal detector. By examining human detection performance for such stimuli, we have produced strong evidence for the existence of such mechanisms in the human visual system.

This research has also been partially supported by a National Science Foundation CAREER grant, an Alfred P. Sloan Fellowship, and the Sloan-Swartz Center for Theoretical Visual Neuroscience at New York University.

Last updated: January 13, 2006

HHMI INVESTIGATOR

Eero P. Simoncelli

Related Links

AT HHMI

How the Visual System Perceives Approaching Objects
(04.11.01)

ON THE WEB

The Laboratory for Computational Vision at NYU
(nyu.edu)

© 2008 Howard Hughes Medical Institute. A philanthropy serving society through biomedical research and science education.
4000 Jones Bridge Road, Chevy Chase, MD 20815-6789 | (301) 215-8500 | e-mail: webmaster@hhmi.org