Active inference and motion perception

If perception corresponds to hypothesis testing (Gregory, 1980), then visual searches might be construed as experiments that generate sensory data. In this work, we explore the idea that saccadic eye movements are optimal experiments, in which data are gathered to test hypotheses or beliefs about how those data are caused. This provides a plausible model of visual search that can be motivated from the basic principles of self-organized behavior: namely, the imperative to minimize the entropy of hidden states of the world and their sensory consequences.

This imperative is met if agents sample hidden states of the world efficiently. This efficient sampling of salient information can be derived in a fairly straightforward way, using approximate Bayesian inference and variational free-energy minimization. Simulations of the resulting active inference scheme reproduce sequential eye movements that are reminiscent of empirically observed saccades and provide some counterintuitive insights into the way that sensory evidence is accumulated or assimilated into beliefs about the world (Friston, 2012).


Role of prediction in motion detection


In the early visual system, the representation of the visual world carried by neural activity builds up dynamically from sensory input, but also from contextual information coming from neighboring cells and from re-entrant signals from other cortical areas. Low-level sensory areas are therefore an excellent model for exploring how neural computations solve the problem of selecting a single, coherent and global representation from the dispersed information collected locally and in parallel by neurons. Our goal in this program is to study the dynamics of neural fields implementing probabilistic computations for early sensory processing. Emphasis will be placed on the role of anisotropic diffusion, in particular within a cortical area through lateral interactions.

The aperture problem is a generic conundrum for the spatio-temporal integration and binding of sensory information from the local to the global scale. It is believed that its neural solution originates from the recursive propagation implemented by finely tuned feed-back and lateral interactions, the so-called "association field". We tested the long-held hypothesis that a propagation defined as a motion-based prediction may solve the aperture problem. Motion-based prediction is defined as the prediction that motion follows smooth trajectories, as is observed in natural scenes (Perrinet, 2012, Neural Computation).
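To make the ambiguity concrete, the following is a minimal sketch of the aperture problem in its classical textbook form. It is not the motion-based prediction model discussed here: it uses the standard intersection-of-constraints construction, and the function names `normal_speed` and `intersect_constraints` are hypothetical, chosen only for illustration. A local detector looking at a straight edge can only measure the velocity component along the edge normal, so one measurement leaves a whole line of candidate velocities, while two differently oriented edges pin the velocity down.

```python
import math


def normal_speed(velocity, edge_angle):
    """Speed component visible through a small aperture.

    edge_angle is the orientation of the edge in radians; only the
    motion along the edge normal can be measured locally.
    """
    nx, ny = -math.sin(edge_angle), math.cos(edge_angle)
    return velocity[0] * nx + velocity[1] * ny


def intersect_constraints(angle_a, speed_a, angle_b, speed_b):
    """Recover a 2D velocity from two differently oriented edges.

    Each edge gives one linear constraint n . v = s; the two
    constraints form a 2x2 system solved here with Cramer's rule.
    """
    ax, ay = -math.sin(angle_a), math.cos(angle_a)
    bx, by = -math.sin(angle_b), math.cos(angle_b)
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("parallel edges leave the velocity ambiguous")
    vx = (speed_a * by - speed_b * ay) / det
    vy = (ax * speed_b - bx * speed_a) / det
    return vx, vy


# Two different velocities look identical through a horizontal edge.
true_velocity = (1.0, 0.5)
other_velocity = (3.0, 0.5)
same = math.isclose(normal_speed(true_velocity, 0.0),
                    normal_speed(other_velocity, 0.0))

# A second edge at another orientation removes the ambiguity.
recovered = intersect_constraints(
    0.0, normal_speed(true_velocity, 0.0),
    math.pi / 3, normal_speed(true_velocity, math.pi / 3),
)
```

The sketch only shows why purely local measurements are insufficient; the hypothesis above is that propagating predictions along smooth trajectories can play the disambiguating role without relying on explicit feature detectors.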

We use probabilities as a generic framework for understanding the consequences of this hypothesis at the functional level. To overcome simulation problems, we use a simple method inherited from computer vision that gives a much more precise approximation to this complex problem than previous models. Using this dynamical model, we find that motion-based predictive coding is indeed sufficient to solve the aperture problem, without the need for ad-hoc edge detectors, a prior on slow speeds, or a selection process. We also find that the dynamical system exhibits many properties characteristic of low-level sensory areas, at both the behavioral and neurophysiological levels. In conclusion, the inclusion of such local interactions, inspired by the structure of natural scenes, proves to be a simple and efficient model for such a low-level sensory system. A neural implementation of such association fields would open up new perspectives for the implementation of new computational paradigms.

Spatio-temporal integration of motion

The machinery behind the visual perception of motion and the subsequent sensori-motor transformation, such as in the Ocular Following Response (OFR), is confronted with uncertainties which are efficiently resolved in the primate visual system. We may understand this response as that of an ideal observer in a probabilistic framework by using Bayesian theory (Weiss et al., 2002), which we previously showed can be successfully adapted to model the OFR for different levels of noise with full-field gratings (Perrinet, 2005, ECVP; Perrinet, 2006, FENS; Perrinet, 2007, Sec. 2.3).

In general, a Bayesian model is defined by introducing a prior for the inference of a latent state. For motion processing, this takes the form of a prior favoring slow speeds, as these are physically more probable given an observed motion signal. In particular, the dynamics of short-latency behavioral responses suggested that information is processed along two pathways in a Bayesian model, separating 1D cues from 2D cues (Barthélemy, 2007, Vision Research). However, these observations remain rather descriptive, and the function and mechanisms underlying the separation between 1D and 2D cues remain to be discovered.
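As an illustration of how a slow-speed prior acts, here is a minimal one-dimensional sketch. It assumes a Gaussian likelihood centered on the measured speed and a zero-mean Gaussian prior; this conjugate Gaussian form is an assumption made for simplicity and is not necessarily the exact model used in the cited work. The function name `map_speed` and the parameters `sigma_likelihood` and `sigma_prior` are hypothetical.

```python
def map_speed(measured_speed, sigma_likelihood, sigma_prior):
    """Most probable speed under a zero-mean Gaussian slow-speed prior.

    With a Gaussian likelihood N(measured_speed, sigma_likelihood^2)
    and a Gaussian prior N(0, sigma_prior^2), the posterior is Gaussian
    and its mode is the measurement scaled by a gain between 0 and 1.
    """
    gain = sigma_prior ** 2 / (sigma_prior ** 2 + sigma_likelihood ** 2)
    return gain * measured_speed


# A reliable measurement (small likelihood width) is barely changed.
reliable = map_speed(10.0, sigma_likelihood=0.5, sigma_prior=5.0)

# A noisy measurement (large likelihood width), as might be expected
# at low contrast, is pulled strongly toward zero speed.
noisy = map_speed(10.0, sigma_likelihood=5.0, sigma_prior=5.0)
```

In this sketch, the estimated speed shrinks toward zero as the measurement becomes less reliable, which is the qualitative signature of a prior favoring slow speeds.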

In that direction, more recent OFR experiments have used disk gratings and bipartite stimuli, which are optimized to study the dynamics of center-surround integration, and for which we extended the previous model using the integration of independent spatio-temporal "modules". These models behave similarly to physiological data (Perrinet, 2007, Journal of Physiology (Paris); Perrinet, 2008, COSYNE) and may be compared to the Ratio-of-Gaussians model (Perrinet, 2008, AREADNE). We also modeled the dynamical properties of the perception of motion in the visual flow as the recurrent interaction of elementary inferential processes (Montagnini, 2007).

The emergent properties of the system allow us to predict and understand some aspects of the psycho- and neuro-physiological observations obtained in the DyVA team, and to propose an architecture for understanding the properties of cortical processing for visual functions (see the FACETS presentation). In particular, it makes it possible to compare the relative importance of feed-forward, lateral and feed-back streams of information in the visual architecture.


Figure 1. Basic properties of human OFR. Several properties of motion integration for driving ocular following, as summarized from our previous work. (a) A leftward drifting grating elicits a brief acceleration of the eye in the leftward direction. Mean eye velocity profiles illustrate that both response amplitude and latency are affected by the contrast of the sine-wave grating, given by the numbers at the right end of the curves. Quantitative estimates of the sensori-motor transformation are given by measuring the response amplitude (i.e. change in eye position) over a fixed time window, at response onset. Relationships between (b) response latency or (c) initial amplitude and contrast are illustrated for the same grating motion condition. These curves define the contrast response function (CRF) of the sensori-motor transformation and are best fitted by a Naka–Rushton function (reprinted from Barthélemy et al., 2007). (d) At fixed contrast, the size of the circular aperture can be varied to probe the spatial summation of OFR. Response amplitude first grows linearly with stimulus size before reaching an optimal size, the integration zone. For larger stimulus sizes, response amplitudes are lowered (reprinted from Barthélemy et al., 2006). (e) OFRs are recorded for center-alone and center–surround stimuli. The contrast of the center stimulus is varied to measure the contrast response function and compute the contrast gain of the sensori-motor transformation at both an early and a late phase during response onset. Open symbols are data obtained for a center-alone stimulus, similar to those illustrated in (c). When adding a flickering surround, one can see that late (but not early) contrast gain is lowered, as illustrated by a rightward shift of the contrast response function (Barthélemy et al., 2006).
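For reference, the Naka–Rushton function mentioned in the caption can be sketched as below, in its standard form R(c) = R_max c^n / (c^n + c50^n). The parameter values used in the example are made up for illustration and are not fitted values from the cited studies; the function name `naka_rushton` is likewise hypothetical.

```python
def naka_rushton(contrast, r_max, c50, n):
    """Standard Naka-Rushton contrast response function.

    r_max is the saturation amplitude, c50 the semi-saturation
    contrast (where the response reaches r_max / 2), and n the
    exponent controlling the steepness of the curve.
    """
    return r_max * contrast ** n / (contrast ** n + c50 ** n)


# Illustrative (not fitted) parameters; contrast is expressed in percent.
contrasts = [2.5, 5.0, 10.0, 20.0, 40.0, 80.0]
responses = [naka_rushton(c, r_max=1.0, c50=10.0, n=2.0) for c in contrasts]
```

In this parameterization, a reduction of contrast gain that appears as a rightward shift of the curve, as described in panel (e), corresponds to an increase of `c50` with `r_max` and `n` held fixed.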

