Role of homeostasis in learning sparse representations
Animation of the formation of receptive fields (RFs) during aSSC learning. 
This paper explores the importance of homeostasis in the unsupervised learning of efficient, sparse codes. This work emerged as a necessity: while competition is essential to sparse coding, the role of homeostasis is often overlooked. We first define the common principle underlying the Sparse Hebbian Learning schemes that followed the seminal papers of Olshausen (1996 & 1998), which introduced the SparseNet model. We then define a generic homeostasis rule, based on observations of neural computations by Laughlin (1981), that tunes the competition such that, within the population of neurons, the probability of choosing any given neuron is a priori uniform (see Figure 1). This tuning uses output statistics gathered over slow time scales, of the order of the learning time scale. We demonstrate that a sparse coding algorithm such as Matching Pursuit can generate results qualitatively similar to SparseNet, that is, with simpler hypotheses (see Figure 2), and, more importantly, that it can overcome the problems that arise when no homeostasis is used (see Figure 3). We then argue that this strategy is more efficient: competition is optimal when it is optimally fair (see Figure 4). Obviously, neural computations are more complex: their large-scale architecture is non-uniform and branched. However, these results hint at the fact that a population of neurons related by a common fate (the cell assemblies pioneered by Hebb, 1949) should, over the long term, use gain control mechanisms in order to maximize, over the short term, the coding efficiency of each individual neuron.
Available at PubMed, MIT Press, or CiteULike; a reprint is available on HAL or arXiv.
reference
 Laurent U. Perrinet. Role of homeostasis in learning sparse representations. Neural Computation, 22(7):1812–36, 2010.
Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to efficiently represent sensory data with respect to the statistics of natural scenes. Furthermore, it is believed that such an efficient coding is achieved using a competition across neurons so as to generate a sparse representation, that is, where a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding coupled with Hebbian learning and homeostasis have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism which optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, the homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair: By contributing to optimize statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.

Figure 1: Simple neural model of sparse coding and role of homeostasis. (Left) We define the coding model as an information channel constituted by a bundle of Linear/Non-Linear spiking neurons. (L) A given input image patch is coded linearly by using the dictionary of filters and transformed by sparse coding (such as Matching Pursuit) into a sparse vector. Each coefficient is transformed into a driving coefficient in the (NL) layer by using a point non-linearity which drives (S) a generic spiking mechanism. (D) On the receiver end (for instance in an efferent neuron), one may then estimate the input from the neural representation pattern. This decoding is progressive, and if we assume that each spike carries a bounded amount of information, representation cost in this model increases proportionally with the number of activated neurons. (Right) However, for a given dictionary, the distribution of sparse coefficients and hence the probability of a neuron's activation is in general not uniform. We show (Lower panel) the log-probability distribution function and (Upper panel) the cumulative distribution of sparse coefficients for a dictionary of edge-like filters with similar selectivity (dotted scatter) except for one filter which was randomized (continuous line). This illustrates a typical situation which may occur during learning when some components have learned less than others: Since their activity will be lower, they are less likely to be activated in the spiking mechanism and, from the Hebbian rule, they are less likely to learn. When selecting an optimal sparse set for a given input, instead of comparing sparse coefficients with respect to a threshold (vertical dashed lines), the comparison should instead be done on the significance value z_i (horizontal dashed lines): In this particular case, the less selective neuron (a_1 < a_2) is selected by the homeostatic cooperation (z_1 > z_2). 
The role of homeostasis during learning is to ensure that, even if the dictionary of filters is not homogeneous, the point non-linearity in (NL) modifies the sparse code computed in (L) such that the probability of a neuron's activation is uniform across the population. 
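As an illustration, this selection rule can be sketched as a variant of Matching Pursuit in which atoms compete on their significance values z_i (the quantile of each coefficient under that atom's own activity statistics) rather than on raw coefficients. This is a minimal sketch, not the paper's exact implementation: the function names and the representation of the per-atom cumulative distributions F as callables are illustrative assumptions.

```python
import numpy as np

def matching_pursuit_homeo(x, D, F, n_active):
    """Sketch of Matching Pursuit with cooperative homeostasis.

    Atom selection uses the significance z_i = F_i(|a_i|), where F_i is
    the empirical cumulative distribution of atom i's past coefficients,
    instead of the raw coefficient a_i (hypothetical interface).

    x : input patch, shape (n_pixels,)
    D : dictionary, shape (n_pixels, n_atoms), columns normalized
    F : list of monotone callables mapping a coefficient to its quantile
    """
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_active):
        a = D.T @ residual                                   # linear stage (L)
        z = np.array([F[i](abs(a[i])) for i in range(len(a))])  # point non-linearity (NL)
        i_star = int(np.argmax(z))                           # fair competition on quantiles
        coeffs[i_star] += a[i_star]
        residual -= a[i_star] * D[:, i_star]                 # greedy MP residual update
    return coeffs, residual
```

With identity functions for F, the rule reduces to plain Matching Pursuit; during learning, the F_i would be re-estimated from each atom's coefficient histogram so that a priori every atom is equally likely to win the competition.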


Figure 2: Comparison of the dictionaries obtained with SparseNet and aSSC. We show the results of Sparse Hebbian Learning using two different sparse coding algorithms at convergence (20,000 learning steps): (Left) the conjugate gradient function (CGF) method as used in SparseNet (Olshausen, 1998) and (Right) COMP as used in aSSC. Filters of the same size as the image patches are presented in a matrix (separated by a black border). Note that their position in the matrix is arbitrary, as in ICA. 


Figure 3: Coding efficiency of SparseNet versus aSSC. We evaluate the quality of both learning schemes by comparing the coding efficiency of their respective coding algorithms, that is, CGF and COMP, each with the dictionary it learned (see Fig. 1). (Left) We show the probability distribution function of sparse coefficients obtained by both methods with random dictionaries (respectively 'SNinit' and 'aSSCinit') and with the dictionaries obtained after convergence of the respective learning schemes (respectively 'SN' and 'aSSC'). At convergence, sparse coefficients are more sparsely distributed than initially, with more kurtotic probability distribution functions for aSSC in both cases. (Right) We plot the average residual error (L_2 norm) as a function of the relative number of active (nonzero) coefficients. This provides a measure of the coding efficiency of each dictionary over the set of image patches (error bars are scaled to one standard deviation). The L_0 norm is equal to the coding step in COMP. The best results are those providing a lower error for a given sparsity (better compression) or a lower sparseness for the same error (Occam's razor). We observe similar coding results for aSSC despite its non-parametric definition. This result also holds when using the two different dictionaries with the same OOMP sparse coding algorithm: The dictionaries still have similar coding efficiencies. 
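The right panel's rate-distortion-like curve can be sketched as follows: for each sparsity level L, code every patch with a greedy pursuit and average the normalized residual L_2 norm. This is an illustrative sketch using plain Matching Pursuit, not the paper's COMP/CGF implementations; all names are assumptions.

```python
import numpy as np

def mp_residual(x, D, n_active):
    """Residual of plain Matching Pursuit after n_active greedy steps."""
    residual = x.copy()
    for _ in range(n_active):
        a = D.T @ residual
        i = int(np.argmax(np.abs(a)))   # pick the best-matching atom
        residual -= a[i] * D[:, i]      # subtract its contribution
    return residual

def efficiency_curve(X, D, max_active):
    """Average normalized residual error as a function of the L_0 norm.

    X : patches, one per row; D : dictionary with normalized columns.
    Returns an array of mean relative errors for L = 1 .. max_active.
    """
    errors = []
    for L in range(1, max_active + 1):
        rel = [np.linalg.norm(mp_residual(x, D, L)) / np.linalg.norm(x)
               for x in X]
        errors.append(np.mean(rel))
    return np.array(errors)
```

Since each greedy step removes an orthogonal projection, the curve is non-increasing in L; comparing such curves for two dictionaries at equal sparsity is exactly the comparison made in the figure.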


Figure 4: Cooperative homeostasis implements efficient quantization. (Left) When switching off the cooperative homeostasis during learning, the corresponding Sparse Hebbian Learning algorithm, Adaptive Matching Pursuit (AMP), converges to a set of filters which contains some less localized filters and some high-frequency Gabor functions which correspond to more "textural" features (Perrinet, 2003). One may wonder whether these filters are inefficient and capture noise, or whether they rather correspond to independent features of natural images in the LGM model. (Right, Inset) In fact, when plotting residual energy as a function of L_0 norm sparseness with the MP algorithm (as plotted in Fig. 3, Right), the AMP dictionary gives a slightly worse result than aSSC. (Right) Moreover, one should assess representation efficiency over the whole coding and decoding chain. We compare the efficiency of these dictionaries using the same coding method (SSC) and the same decoding method (rank-quantized coefficients). Representation length for this decoding method is proportional to the L_0 norm, with lambda = log(M)/L ~ 0.032 bits per coefficient and per pixel as defined in Eq. 1 (see text). We observe that the dictionary obtained by aSSC is more efficient than the one obtained by AMP, while the dictionary obtained with SparseNet (SN) gives an intermediate result owing to its geometric homeostasis: Introducing cooperative homeostasis globally improves the neural representation. 
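The cost accounting behind this comparison is straightforward: each of the L_0 active coefficients identifies one atom out of M, costing log2(M) bits, amortized over the number of pixels in the patch. A minimal sketch of the formula, with M and the patch size chosen purely for illustration (they are not the paper's exact values):

```python
import math

def representation_cost(M, n_pixels, n_active):
    """Bits per pixel of a rank-quantized sparse code.

    Each of the n_active spikes identifies one of M atoms, i.e.
    log2(M) bits per coefficient, amortized over n_pixels pixels
    (illustrative accounting; M and n_pixels are assumptions).
    """
    bits_per_coeff_per_pixel = math.log2(M) / n_pixels
    return bits_per_coeff_per_pixel * n_active
```

For instance, with M = 256 atoms and 16x16 = 256-pixel patches, each coefficient costs log2(256)/256 ~ 0.031 bits per pixel, the same order of magnitude as the lambda quoted in the caption.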

 Bruno A. Olshausen, David J. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?. Vision Research, 37:3311–25, 1998.
 Simon B. Laughlin. A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung C, 36(9–10):910–12, 1981.
 Donald O. Hebb. The organization of behavior: A neuropsychological theory. Wiley, New York, 1949.
All material (c) L. Perrinet. Please check the copyright notice.
This work was supported by European integrated project FP6015879, "FACETS". 