Role of homeostasis in learning sparse representations

Animation of the formation of RFs during aSSC learning.





Figure 1: Simple neural model of sparse coding and the role of homeostasis. (Left) We define the coding model as an information channel consisting of a bundle of Linear/Non-Linear spiking neurons. (L) A given input image patch is coded linearly using the dictionary of filters and transformed by sparse coding (such as Matching Pursuit) into a sparse vector. (NL) Each coefficient is transformed into a driving coefficient by a point non-linearity, which drives (S) a generic spiking mechanism. (D) On the receiver end (for instance in an efferent neuron), one may then estimate the input from the neural representation pattern. This decoding is progressive, and if we assume that each spike carries a bounded amount of information, the representation cost in this model increases proportionally with the number of activated neurons. (Right) However, for a given dictionary, the distribution of sparse coefficients, and hence the probability of a neuron's activation, is in general not uniform. We show (Lower panel) the log-probability distribution function and (Upper panel) the cumulative distribution of sparse coefficients for a dictionary of edge-like filters with similar selectivity (dotted scatter), except for one filter which was randomized (continuous line). This illustrates a typical situation that may occur during learning when some components have learnt less than others: since their activity is lower, they are less likely to be activated by the spiking mechanism and, by the Hebbian rule, less likely to learn further. When selecting an optimal sparse set for a given input, instead of comparing sparse coefficients to a threshold (vertical dashed lines), the comparison should be made on the significance values z_i (horizontal dashed lines): in this particular case, the less selective neuron (a_1 < a_2) is selected by the homeostatic cooperation (z_1 > z_2).
The role of homeostasis during learning is to ensure that, even if the dictionary of filters is not homogeneous, the point non-linearity in (NL) modifies the sparse coding in (L) such that the probability of a neuron's activation is uniform across the population.
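The selection rule described in Figure 1 can be sketched as a Matching Pursuit loop in which atoms compete on their significance z_i (the empirical cumulative distribution of each atom's past coefficients) rather than on the raw coefficient a_i. The dictionary, the coefficient histories, and the sparsity level below are toy assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed): M atoms of dimension L, with normalized columns.
L, M = 64, 128
dico = rng.standard_normal((L, M))
dico /= np.linalg.norm(dico, axis=0)

# Fake per-atom histories of past coefficient magnitudes; atom 0 stands in
# for an under-trained, less selective filter with systematically lower activity.
histories = np.abs(rng.standard_normal((M, 1000)))
histories[0] *= 0.3

def z_score(i, a):
    """Significance z_i = P(|coef_i| <= a), estimated from atom i's history."""
    return np.mean(histories[i] <= a)

def mp_homeo(x, n_active=5):
    """Matching Pursuit where atom selection uses the significance z_i,
    not the raw coefficient a_i (cooperative homeostasis)."""
    residual = x.copy()
    code = np.zeros(M)
    for _ in range(n_active):
        a = dico.T @ residual                        # linear coefficients
        z = np.array([z_score(i, abs(a[i])) for i in range(M)])
        i_star = int(np.argmax(z))                   # select by significance
        code[i_star] += a[i_star]
        residual -= a[i_star] * dico[:, i_star]      # greedy residual update
    return code, residual

x = rng.standard_normal(L)
code, residual = mp_homeo(x)
```

Because the subtracted term is an exact projection on the selected atom, the residual norm never increases, whichever atom homeostasis favours; an atom with a low raw coefficient can still win if its coefficient is high relative to its own history.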


Figure 2: Comparison of the dictionaries obtained with SparseNet and aSSC. We show the results of Sparse Hebbian Learning using two different sparse coding algorithms at convergence (20000 learning steps): (Left) the conjugate gradient function (CGF) method as used in SparseNet (Olshausen, 1998) and (Right) COMP as used in aSSC. Filters of the same size as the image patches are presented in a matrix (separated by a black border). Note that their position in the matrix is arbitrary, as in ICA.
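A single Sparse Hebbian Learning step, common to both schemes compared in Figure 2, can be sketched as: code a patch with a greedy sparse coder, then nudge each used atom along the residual in proportion to its coefficient and renormalize. The learning rate, patch size, and plain (non-homeostatic) Matching Pursuit coder below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, eta = 64, 32, 0.05   # patch dim, dictionary size, learning rate (assumed)

dico = rng.standard_normal((L, M))
dico /= np.linalg.norm(dico, axis=0)

def mp(x, dico, n_active=3):
    """Plain Matching Pursuit: returns the sparse code and final residual."""
    residual, code = x.copy(), np.zeros(dico.shape[1])
    for _ in range(n_active):
        a = dico.T @ residual
        i = int(np.argmax(np.abs(a)))
        code[i] += a[i]
        residual -= a[i] * dico[:, i]
    return code, residual

# One Sparse Hebbian Learning step: move each active atom toward the
# residual, scaled by its coefficient, then renormalize that atom.
x = rng.standard_normal(L)
code, residual = mp(x, dico)
for i in np.flatnonzero(code):
    dico[:, i] += eta * code[i] * residual
    dico[:, i] /= np.linalg.norm(dico[:, i])
```

Iterating this step over many patches is what drives the dictionaries toward the edge-like filters shown in the figure; only the sparse coder (CGF vs. COMP) differs between the two columns.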


Figure 3: Coding efficiency of SparseNet versus aSSC. We evaluate the quality of both learning schemes by comparing the coding efficiency of their respective coding algorithms, that is CGF and COMP, with the respective dictionary that was learnt (see Fig. 2). (Left) We show the probability distribution function of sparse coefficients obtained by both methods with random dictionaries (respectively 'SN-init' and 'aSSC-init') and with the dictionaries obtained after convergence of the respective learning schemes (respectively 'SN' and 'aSSC'). At convergence, sparse coefficients are more sparsely distributed than initially, with more kurtotic probability distribution functions for aSSC in both cases. (Right) We plot the average residual error (L_2 norm) as a function of the relative number of active (non-zero) coefficients. This provides a measure of the coding efficiency of each dictionary over the set of image patches (error bars are scaled to one standard deviation). The L_0 norm equals the number of coding steps in COMP. Best results are those providing a lower error for a given sparsity (better compression) or a lower L_0 norm for the same error (Occam's razor). We observe similar coding results for aSSC despite its non-parametric definition. This result also holds when using the two different dictionaries with the same OOMP sparse coding algorithm: the dictionaries still have similar coding efficiencies.
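The efficiency curve in the right panel of Figure 3 can be reproduced in miniature: run a greedy pursuit on a set of patches and record the average relative L_2 residual after each coding step, so that the x-axis is the L_0 sparseness. The random dictionary and random "patches" below are placeholders for the learnt dictionaries and natural image patches of the figure:

```python
import numpy as np

rng = np.random.default_rng(2)
L, M = 64, 128
dico = rng.standard_normal((L, M))
dico /= np.linalg.norm(dico, axis=0)
patches = rng.standard_normal((100, L))   # stand-in for natural image patches

def mp_residual_curve(patches, dico, n_max=20):
    """Average relative L2 residual after each Matching Pursuit step,
    i.e. error as a function of L_0 sparseness (number of coding steps)."""
    errors = np.zeros(n_max)
    for x in patches:
        residual = x.copy()
        x_norm = np.linalg.norm(x)
        for k in range(n_max):
            a = dico.T @ residual
            i = int(np.argmax(np.abs(a)))
            residual -= a[i] * dico[:, i]
            errors[k] += np.linalg.norm(residual) / x_norm
    return errors / len(patches)

curve = mp_residual_curve(patches, dico)
```

Each pursuit step removes an exact projection, so the curve is non-increasing; comparing such curves for two dictionaries at the same L_0 abscissa is exactly the comparison the figure makes.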


Figure 4: Cooperative homeostasis implements efficient quantization. (Left) When switching off cooperative homeostasis during learning, the corresponding Sparse Hebbian Learning algorithm, Adaptive Matching Pursuit (AMP), converges to a set of filters which contains some less localized filters and some high-frequency Gabor functions which correspond to more "textural" features (Perrinet, 2003). One may wonder whether these filters are inefficient and capture noise, or whether they rather correspond to independent features of natural images in the LGM model. (Right, Inset) In fact, when plotting residual energy as a function of L_0 norm sparseness with the MP algorithm (as plotted in Fig. 3, Right), the AMP dictionary gives a slightly worse result than aSSC. (Right) Moreover, one should assess representation efficiency at the level of the overall coding and decoding algorithm. We compare the efficiency of these dictionaries using the same coding method (SSC) and the same decoding method (using rank-quantized coefficients). The representation length for this decoding method is proportional to the L_0 norm, with lambda = log(M)/L ~ 0.032 bits per coefficient and per pixel as defined in Eq. 1 (see text). We observe that the dictionary obtained by aSSC is more efficient than the one obtained by AMP, while the dictionary obtained with SparseNet (SN) gives an intermediate result thanks to its geometric homeostasis: introducing cooperative homeostasis globally improves the neural representation.
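The cost model behind the x-axis of Figure 4 is simple to state in code: if decoding relies only on the rank order of spikes, each spike identifies one atom among M, so it carries on the order of log(M) bits, spread over the L pixels of the patch. The values of M and L below are purely illustrative and are not those used in the paper:

```python
import numpy as np

# Hypothetical sizes, for illustration only (not the paper's values).
M = 324        # dictionary size: each spike identifies one atom among M
L = 256        # number of pixels per patch (e.g. a 16x16 patch)

# Cost per coefficient and per pixel, as in the lambda = log(M)/L of the text
# (base-2 log assumed here, to express the cost in bits).
lam = np.log2(M) / L

# Total bits per pixel for a code with a given L_0 norm (hypothetical value).
n_active = 10
bits_per_pixel = lam * n_active
```

Under this model the representation length grows linearly with the L_0 norm, which is why the figure can compare dictionaries by plotting reconstruction error against a bit budget.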

All material (c) L. Perrinet. Please check the copyright notice.

This work was supported by European integrated project FP6-015879, "FACETS".

