An Annealed Self-Organizing Map for Source 
Channel Coding 
Matthias Burger, Thore Graepel, and Klaus Obermayer 
Department of Computer Science 
Technical University of Berlin 
FR 2-1, Franklinstr. 28/29, 10587 Berlin, Germany 
{burger, graepe12, oby} @cs.tu-berlin.de 
Abstract 
We derive and analyse robust optimization schemes for noisy vector 
quantization on the basis of deterministic annealing. Starting from a 
cost function for central clustering that incorporates distortions from 
channel noise we develop a soft topographic vector quantization al- 
gorithm (STVQ) which is based on the maximum entropy principle 
and which performs a maximum-likelihood estimate in an expectation- 
maximization (EM) fashion. Annealing in the temperature parameter/3 
leads to phase transitions in the existing code vector representation dur- 
ing the cooling process for which we calculate critical temperatures and 
modes as a function of eigenvectors and eigenvalues of the covariance 
matrix of the data and the transition matrix of the channel noise. A whole 
family of vector quantization algorithms is derived from STVQ, among 
them a deterministic annealing scheme for Kohonen's self-organizing 
map (SOM). This algorithm, which we call SSOM, is then applied to 
vector quantization of image data to be sent via a noisy binary symmetric 
channel. The algorithm's performance is compared to those of LBG and 
STVQ. While it is naturally superior to LBG, which does not take into 
account channel noise, its results compare very well to those of STVQ, 
which is computationally much more demanding. 
1 INTRODUCTION 
Noisy vector quantization is an important lossy coding scheme for data to be transmitted 
over noisy communication lines. It is especially suited for speech and image data which 
in many applieations have to be transmitted under low bandwidth/high noise level condi- 
tions. Following the idea of (Farvardin, 1990) and (Luttrell, 1989) of jointly optimizing 
the codebook and the data representation w.r.t. to a given channel noise we apply a deter- 
ministic annealing scheme (Rose, 1990; Buhmann, 1997) to the problem and develop a 
An Annealed Self-Organizing Map for Source Channel Coding 431 
soft topographic vector quantization algorithm (STVQ) (cf. Heskes, 1995; Miller, 1994). 
From STVQ we can derive a class of vector quantization algorithms, among which we 
find SSOM, a deterministic annealing variant of Kohonen's self-organizing map (Kohonen, 
1995), as an approximation. While the SSOM like the SOM does not minimize any known 
energy function (Luttrell, 1989) it is computationally less demanding than STVQ. The de- 
terministic annealing scheme enables us to use the neighborhood function of the SOM 
solely to encode the desired transition probabilities of the channel noise and thus opens 
up new possibilities for the usage of SOMs with arbitrary neighborhood functions. We 
analyse phase transitions during the annealing and demonstrate the performance of SSOM 
by applying it to lossy image data compression for transmission via noisy channels, 
2 DERIVATION OF A CLASS OF VECTOR QUANTIZERS 
Vector quantization is a method of encoding data by grouping the data vectors and pro- 
viding a representative in data space for each group. Given a set , of data vectors xi  
3 d, i -- 1,..., D, the objective of vector quantization is to find a set W of code vectors 
w, r -- 0,..., N - 1, and a set .A4 of binary assignment variables mir, --r mir -- 1 , i, 
such that a given cost function 
E (.A4, W I, ) ---  E mirEr(xi, W) 
i r 
(1) 
is minimized. E (Xi, W) denotes the cost of assigning data point xi to code vector w. 
Following an idea by (Luttrell, 1994) we consider the case that the code labels r form a 
compressed encoding of the data for the purpose of transmission via a noisy channel (see 
Figure 1). The distortion caused by the channel noise is modeled by a matrix H of tran- 
sition probabilities hrs, 5-s hrs --- 1, r, for the noise induced change of assignment of 
a data vector xi from code vector w. to code vector ws. After transmission the received 
index s is decoded using its code vector ws. Averaging the squared Euclidean distance 
[[xi - ws [[2 over all possible transitions yields the assignment costs 
1 
E(xi, W) --   hs Ilxi - wsll 
s 
, (2) 
where the factor 1/2 is introduced for computational convenience. 
Starting from the cost function E given in Eqs. (1), (2) the Gibbs-distribution 
e (.Ad, W},) :  exp (-/3 E (.A4, W I,)) can be obtained via the principle of maxi- 
mum entropy under the constraint of a given average cost (E/. The Lagrangian multiplier 
/3 is associated with {E} and is interpreted as an inverse temperature that determines the 
fuzziness of assignments. In order to generalize from the given training set ,Y we cal- 
culate the most likely set of code vectors from the probability distribution P (.A4,1421 ,) 
marginalized over all legal sets of assignments .A4. For a given value of/3 we obtain 
w. -- Ei xi Es h.se(xi  s) 
- }-i Y'-s hsP(xi  s) ' Vr, (3) 
where P(xi 
exp (- 2g rt hst Ilxi - wt[[  ) 
P(xi E s): , 
Eu exp Et hut ilxi _ wtl12) (4) 
is the assignment probability of data vector xi to code vector ws. Solving Eqs. (3), (4) by 
fixed-point iteration comprises an expectation-maximization algorithm, where the E-step, 
432 M. Burger, T. Graepel and K. Obermayer 
Figure 1: Cartoon of a generic data com- 
munication problem. The encoder assigns 
input vectors xi to labeled code vectors 
w. Their indices r are then transmit- 
ted via a noisy channel which is charac- 
terized by a set of transition probabilities 
hg. The decoder expands the received in- 
dex s to its code vector ws which repre- 
sents the data vectors assigned to it during 
encoding. The total error is measured via 
the squared Euclidean distance between 
the original data vector xi and its repre- 
sentative ws averaged over all transitions 
( ) Era'oder x,  wr I 
x 
I r 
Distortion [I xi- w 11 Channel Noise h,,' r -- s 
I $ 
Do2o(iol - S  W. 
Eq. (4), determines the assignment probabilities P(xi  s) for all data points xi and the 
old code vectors ws and the M-step, Eq. (3), determines the new code vectors wr from the 
new assignment probabilities P(xi  s). In order to find the global minimum of E, /3 -- 0 
is increased according to an annealing schedule which tracks the solution from the easily 
solvable convex problem at low/3 to the exact solution of Eqs. (1), (2) at infinite/3. In the 
following we call the solution of Eqs. (3), (4) soft topographic vector quantizer (STVQ). 
Eqs. (3), (4) are the starting point for a whole class of vector quantization algorithms (Fig- 
ure 2). The approximation h - 6. applied to Eq. (4) leads to a soft version of Koho- 
nen's self-organzing map (SSOM), if additionally applied to Eq. (3) soft-clustering (SC) 
(Rose, 1990) is recovered. fl - cc leads to the corresponding "hard" versions topographic 
vector quantisation (TVQ) (Luttrell, 1989), self-organizing map (SOM) (Kohonen, 1995), 
and LBG. In the following, we will focus on the soft self-organizing map (SSOM). SSOM 
is computationally less demanding than STVQ, but offers - in contrast to the traditional 
SOM - a robust deterministic annealing optimization scheme. Hence it is possible to ex- 
tend the SOM approach to arbitrary non-trivial neighborhood functions h as required, 
e.g. for source channel coding problems for noisy channels. 
3 PHASE TRANSITIONS IN THE ANNEALING 
From (Rose, 1990) it is known that annealing in/3 changes the representation of the data. 
Code vectors split with increasing/3 and the size of the codebook for a fixed/3 is given by 
the number of code vectors that have split up to that point. With non-diagonal H, however, 
permutation symmetry is broken and the "splitting" behavior of the code vectors changes. 
At infinite temperature every data vector xi is assigned to every code vector wr with equal 
probability P(x i G r) -- l/N, where N is the size of the codebook. Hence all code 
0 __ 1 
vectors are located in the center of mass, w,. - U i xi, Vr, of the data. Expanding the 
r.h.s. of Eq. (3) to first order around the fixed point {w  } and assuming h - hs, V r, s, 
we obtain the critical value 
1 
/3' = Ac A (5) 
max max 
An Annealed Self-Organizing Map for Source Channel Coding 433 
STVQ 
E- Step 
SSOM 
hl's " 5rs 
E- Step 
TVQ 
SOM 
hs ' 5r 
M - Step . 
sc 
' LBG 
M - Step . j 
Figure 2: Class of vector quantizers derived from STVQ, together with approximations and 
limits (see text). The "S" in front stands for "soft" to indicate the probabilistic approach. 
c 
for the inverse temperature, at which the center of mass solution becomes unstable. /max 
1 
is the largest eigenvalue of the covariance matrix C =  ]i xix' of the data and corre- 
sponds to their variance )%Cax 2 along the principal axis which is given by the asso- 
--- O-ma x 
ciated eigenvector c and along which code vectors split. Ama x is the largest eigenvalue 
Vmax 
of a matrix G whose elements are given by grt: Y-is hrs (hst - -). The r th component 
of the corresponding eigenvector ( determines for each code vector wr in which direc- 
Vmax 
0 and how it moves relative to the other code 
tion along the principal axis it departs from w r 
vectors. For SSOM a similar result is obtained with G in Eq. (5) simply being replaced by 
GSSOM _SSOM ___ hrt 1 
, rt -- . See (Graepel, 1997) for details. 
4 NUMERICAL RESULTS 
In the following we consider a binary symmetric channel (BSC) with a bit error rate (BER) 
e. Assuming that the length of the code indices is n bits, the matrix elements of the transi- 
tion matrix H are 
hs - (1 - e)n-dH(r,s) d.(r,s) , (6) 
where dki (r, s) is the Hamming-distance between the binary representations of r and s. 
4.1 TOY PROBLEM 
The numerical analysis of the phase transitions described in the previous section was per- 
formed on a toy data set consisting of 2000 data vectors drawn from a two-dimensional 
elongated Gaussian distribution P(x) = (2rr) - IcI-  exp(- xXC-lx) with diagonal 
covariance matrix C = diag(1, 0.04). The size of the codebook was N = 4 corresponding 
to n -- 2 bits. Figure 3 (left) shows the x-coordinates of the positions of the code vectors in 
data space as functions of the inverse temperature/3. At a critical inverse temperature/3* 
the code vectors split along the x-axis which is the principal axis of the distribution of data 
points. In accordance with the eigenvector Vmax = (1, 0, 0, -1) T for the largest eigen- 
G 
value )max of the matrix G two code vectors with Hamming distance dH = 2 move to 
opposite positions along the principal axis, and two remain at the center. Note the degener- 
acy of eigenvalues for matrix (6). Figure 3 (right) shows the critical inverse temperature fl* 
as a function of the BER for both STVQ (crosses) and SSOM (dots). Results are in very 
good agreement with the theoretical predictions of Eq. (5) (solid line). The inset displays 
the average cost (E) = -} Y']i ]]r e(xi  r) ]3 hr Ilxi - w112as a function of/3 for 
434 M. Burger, T. Graepel and K. Obermayer 
e -- 0.08 for STVQ and SSOM. The drop of the average cost occurs at the critical inverse 
temperature/3*. 
X 0 
-0.5 
0 1 2 3 4 
3.5 
3 
1'15 -" 0.5 0:1 0.15 0:2 0.25 
BER 
Figure 3: Phase transitions in the 2 bit "toy" problem. (left) X-coordinate of code vectors 
for the SSOM case plotted vs. inverse temperature/3, e = 0.08. The splitting of the four 
code vectors occurs at/3 = 1.25 which is in very good accordance with the theory. (right) 
Critical values of/3 for SSOM (dots) and STVQ (crosses), determined via the kink in the 
average cost (inset: e -- 0.08, top line STVQ), which indicates the phase transition. Solid 
lines denote theoretical predictions. Convergence parameter for the fixed-point iteration, 
giving the upper limit for the difference in successive code vector positions per dimension, 
was J = 5.0E - 10. 
4.2 SOURCE CHANNEL CODING FOR IMAGE DATA 
In order to demonstrate the applicability of STVQ and in particular of SSOM to source 
channel coding we employed both algorithms to the compression of image data, which 
were then sent via a noisy channel and decoded after transmission. As a training set we 
used three 512 x 512 pixel 256 gray-value images from different scenes with blocksize 
d = 2 x 2. The size of the codebook was chosen to be N = 16 in order to achieve a com- 
pression to 1 bpp. We applied an exponential annealing schedule given by/3t+1 = 2/3t 
and determined the start value/3o to be just below the critical/3* for the first split as given 
in Eq. (5). Note that with the transition matrix as given in Eq. (6) this optimization cor- 
responds to the embedding of an n -- 4 dimensional hypercube in the d -- 4 dimensional 
data space. We tested the resulting codebooks by encoding our test image Lena  (Figure 5), 
which had not been used for determining the codebook, simulating the transmission of the 
indices via a noisy binary symmetric channel with given bit error rate and reconstructing 
the image using the codebook. 
The results are summarized in Figure 4 which shows a plot of the signal-to-noise-ratio 
(SNR) as a function of the bit-error rate for STVQ (dots), SSOM (vertical crosses), and 
LBG (oblique crosses). STVQ shows the best performance especially for high BERs, 
where it is naturally far superior to the LBG-algorithm which does not take into account 
channel noise. SSOM, however, performs only slightly worse (approx. 1 dB) than STVQ. 
Considering the fact that SSOM is computationally much less demanding than STVQ 
The Lenna Story can be found at http://www. isr. com/chuck/lennapg/lenna.shtml 
An Annealed Self-Organizing Map for Source Channel Coding 435 
(O(N) for encoding) - due to the omission of the convolution with h.s in Eq. (4) - the re- 
sult demonstrates the efficiency of SSOM for source channel coding. Figure 4 also shows 
the generalization behavior of a SSOM codebook optimized for a BER of 0.05 (rectan- 
gles). Since this codebook was optimized for e = 0.05 it performs worse than appropri- 
ately trained SSOM codebooks for other values of BER, but still performs better than LBG 
except for low values of BERs. At low values, SSOMs trained for the noisy case are out- 
performed by LBG because robustness w.r.t. channel noise is achieved at the expense of 
an optimal data representation in the noise free case. Figure 5, finally, provides a visaal 
impression of the performance of the different vector quantizers at a BER of 0.033. While 
the reconstruction for STVQ is only slightly better than the one for SSOM, both are clearly 
superior to the reconstruction for LBG. 
Figure 4: Comparison between differ- 
ent vector quantizers for image com- 
pression, noisy channel (BSC) transmis- 
sion and reconstruction. The plot shows 
the signal-to-noise-ratio (SNR), defined 
as 101og10(O'signal/O'noise), as a func- 
tion of bit-error rate (BER) for STVQ 
and SSOM, each optimized for the given 
channel noise, for SSOM, optimized for 
a BER of 0.05, and for LBG. The train- 
ing set consisted of three 512 x 512 pixel 
256 gray-value images with blocksize 
d -- 2 x 2. The codebook size was N --- 
16 corresponding to 1 bpp. The anneal- 
ing schedule was given by ft+l -- 2 fit 
and Lena was used as a test image. Con- 
vergence parameter 5 was 1.0E - 5. 
5 CONCLUSION 
We presented an algorithm for noisy vector quantization which is based on deterministic 
annealing (STVQ). Phase transitions in the annealing process were analysed and a whole 
class of vector quantizers could be derived, includings standard algorithms such as LBG 
and "soft" versions as special cases of STVQ. In particular, a fuzzy version of Kohonen's 
SOM was introduced, which is computationally more efficient than STVQ and still yields 
very good results as demonstrated for noisy vector quantization of image data. The de- 
terministic annealing scheme opens up many new possibilities for the usage of SOMs, in 
particular, when its neighborhood function represents non-trivial neighborhood relations. 
Acknowledgements This work was supported by TU Berlin (FIP 13/41). We thank 
H. Bartsch for help and advice with regard to the image processing example. 
References 
J. M. Buhmann and T. Hofmann. Robust Vector Quantization by Competitive Learning. 
Proceedings of ICASSP' 97, Munich, (1997). 
N. Farvardin. A Study of Vector Quantization for Noisy Channels. IEEE Transactions on 
Information Theory, vol. 36, p. 799-809 (1990). 
436 M. Burger, T. Graepel and K. Obermayer 
Original LBG SNR 4.64 dB 
STVQ SNR 9.00 dB SSOM SNR 7.80 dB 
Figure 5: Lena transmitted over a binary symmetric channel with BER of 0.033 encoded 
and reconstructed using different vector quantization algorithms. 
T. Graepel, M. Burger, and K. Obermayer. Phase Transitions in Stochastic Self-Organizing 
Maps. Physical Review E, vol. 56, no. 4, p. 3876-3890 (1997). 
T. Heskes and B. Kappen. Self-Organizing and Nonparametric Regression. Artificial 
Neural Networks - ICANN'95, vol. 1 ,p. 81-86 (1995). 
T. Kohonen. Self-OrganizingMaps. Springer-Verlag, 1995. 
S. P. Luttrell. Self-Organisation: A Derivation from first Principles of a Class of Learning 
Algorithms. Proceedings of IJCNN'89, Washington DC, vol. 2, p. 495-498 (1989). 
S. P. Luttrell. A Baysian Analysis of Self-Organizing Maps. Neural Computation, vol. 6, 
p. 767-794 (1994). 
D. Miller and K. Rose. Combined Source-Channel Vector Quantization Using Determin- 
istic Annealing. IEEE Transactions on Communications, vol. 42, p. 347-356 (1994). 
K. Rose, E. Gurewitz, and G. C. Fox. Statistical Mechanics and Phase Transitions in 
Clustering. Physical Review Letters, vol. 65, No. 8, p. 945-948 (1990). 
