A Topographic Product for the Optimization 
of Self-Organizing Feature Maps 
Hans-Ulrich Bauer, Klaus Pawelzik, Theo Geisel 
Institut flit theoretische Physik and SFB Nichtlineaxe Dynamik 
Universitit Frankfurt 
Robert-Mayer-Str. 8-10 
W-6000 Frankfurt 11 
Fed. Rep. of Germany 
email: bauer@asgard.physik.uni-fr ankfur t.dbp 
Abstract 
Optimizing the performance of self-organizing feature maps like the Ko- 
honen map involves the choice of the output space topology. We present 
a topographic product which measures the preservation of neighborhood 
relations as a criterion to optimize the output space topology of the map 
with regard to the global dimensionality D A as well as to the dimensi- 
ons in the individual directions. We test the topographic product method 
not only on synthetic mapping examples, but also on speech data. In the 
latter application our method suggests an output space dimensionality of 
D A = 3, in coincidence with recent recognition results on the same data 
set. 
1 INTRODUCTION 
Self-organizing feature maps like the Kohonen map (Kohonen, 1989, Ritter et al., 
1990) not only provide a plausible explanation for the formation of maps in brains, 
e.g. in the visual system (Obermayer et al., 1990), but have also been applied to 
problems like vector quantization, or robot arm control (Martinetz et al., 1990). 
The underlying organizing principle is the preservation of neighborhood relations. 
For this principle to lead to a most useful map, the topological structure of the 
output space must roughly fit the structure of the input data. However, in technical 
1141 
1142 Bauer, Pawelzik, and Ceisel 
applications this structure is often not a priory known. For this reason several 
attempts have been made to modify the Kohonen-algorithm such, that not only 
the weights, but also the output space topology itself is adapted during learning 
(Kangas et al., 1990, Martinetz et al., 1991). 
Our contribution is also concerned with optimal output space topologies, but we 
follow a different approach, which avoids a possibly complicated structure of the 
output space. First we describe a quantitative measure for the preservation of neigh- 
borhood relations in maps, the topographic product P. The topographic product 
had been invented under the name of "wavering product" in nonlinear dynamics in 
order to optimize the embeddings of chaotic attractors (Liebert et al., 1991). P = 0 
indicates perfect match of the topologies. P  0 (P > 0) indicates a folding of 
the output space into the input space (or vice versa), which can be caused by a 
too small (resp. too large) output space dimensionality. The topographic product 
can be computed for any self-organizing feature map, without regard to its specific 
learning rule. Since judging the degree of twisting and folding by visually inspec- 
ting a plot of the map is the only other way of "measuring" the preservation of 
neighborhoods, the topographic product is particularly helpful, if the input space 
dimensionality of the map exceeds D t = 3 and the map can no more be visualized. 
Therefore the derivation of the topographic product is already of value by itself. 
In the second part of the paper we demonstrate the use of the topographic product 
by two examples. The first example deals with maps from a 2D input space with 
nonflat stimulus distribution onto rectangles of different aspect ratios, the second 
example with the map of 19D speech data onto output spaces of different dimen- 
sionality. In both cases we show, how the output space topology can be optimized 
using our method. 
2 DERIVATION OF THE TOPOGRAPHIC PRODUCT 
2.1 KOHONEN-ALGORITHM 
In order to introduce the notation necessary to derive the topographic product, we 
very briefly recall the Kohonen algorithm. It describes a map from an input space 
V into an output space A. Each node j in A has a weight vector wj associated with 
it, which points into V. A stimulus v is mapped onto that node i in the output 
space, which minimizes the input space distance dV(wi, v): 
i: dV(wi,v)--mindV(wj,v). (1) 
jA 
During a learning step, a random stimulus is chosen in the input space and mapped 
onto an output node i according to Eq. 1. Then all weights wj are shifted towards v, 
with the amount of shift for each weight vector being determined by a neighborhood 
function h9.. 
o (d.a(j, i))(v- wj) j & A. (2) 
5Wj = hj, i 
(d'4(j, i) measures distances in the output space.) h..i effectively restricts the nodes 
participating in the learning step to nodes in the vxcinity of i. A typical choice for 
A Topograhic Product for the Optimization of Self-Organizing Feature Maps 1143 
the neighborhood function is 
2o.   
In this way the neighborhood relations in the output space are enforced in the 
input space, and the output space topology becomes of crucial importance. Finally 
it should be mentioned that the learning step size e as well as the width of the 
neighborhood function r are decreased during the learning for the algorithm to 
converge to an equilibrium state. A typical choice is an exponential decrease. For 
a detailed discussion of the convergence properties of the algorithm, see (Ritter et 
al., 1988). 
2.2 TOPOGRAPHIC PRODUCT 
After the learning phase, the topographic product is computed as follows. For each 
output space node j, the nearest neighbor ordering in input space and output space 
is computed (n(j) denotes the k-th nearest neighbor of j in A, n? (j) in V). Using 
these quantities, we define the ratios 
dV(wj' w'(J)) (4) 
O(j,k) = dV(wj,w%v(j) ), 
(5) 
One has Ox(j,k) = O2(j,k) = 1 only, if the k-th nearest neighbors in V and A 
coincide. Any deviations of the nearest neighbor ordering will result in values for 
Qx,2 deviating from 1. However, not all differences in the nearest neighbor orderings 
in V and A are necessarily induced by neighborhood violations. Some can be due 
to locally varying magnification factors of the map, which in turn are induced by 
spatially varying stimulus densities in V. To cancel out the latter effects, we define 
the products 
1 
k) = (nLq(j, 0) (6) 
1 
= (n,=020, O) (7) 
P2(j,k)   
For these the relations 
Px(j,k) _> 1, 
P=(j, k) _< 1 
hold. Large deviations of Pi (resp. P=) from the value 1 indicate neighborhood 
violations, when looking from the output space into the input space (resp. from the 
input space into the output space). In order to get a symmetric overall measure, 
we further multiply Px and P2 and find 
1 
Pa(j,i) = =. (8) 
1144 Bauer, Pawelzik, and Geisel 
Further averaging over all nodes and neighborhood orders finally yields the topo- 
graphic product 
N N--1 
1 
P = N(N- 1) Z Z log(P3(j,k)). (9) 
j=l k=l 
The possible values for P are to be interpreted as follows' 
P _< 0  output space dimension D A too low, 
P = 0  output space dimension D A o.k., 
P _> 0  output space dimension D  too high. 
These formulas suffice to understand how the product is to be computed. A more 
detailed explanation for the rational behind each individual step of the derivation 
can be found in a forthcoming publication (Bauer et al., 1991). 
3 EXAMPLES 
We conclude the paper with two examples which exemplify how the method works. 
3.1 ILLUSTRATIVE EXAMPLE 
The first example deals with the mapping from a 2D input space onto rectangles of 
different aspect ratios. The stimulus distribution is fiat in one direction, Gaussian 
st, aped in the other (Fig la). The example demonstrates two aspects of our method 
at once. First it shows that the method works fine with maps resulting from nonfiat 
stimulus distributions. These induce spatially varying areal magnification factors 
of the map, which in turn lead to twists in the neighborhood ordering between 
input space and output space. Compensation for such twists was the purpose of 
the multiplication in Eqs (6) and (7). 
Table 1: Topographic product P for the map from a square input space with a 
Gaussian stimulus distribution in one direction, onto rectangles with different aspect 
ratios. The values for P are averaged over 8 networks each. The 43x 6-output space 
matches the input data best, since its topographic product is smallest. 
N 
256x 1 
128x2 
64x4 
43x6 
32x8 
21x12 
16x16 
aspect ratio 
256 
64 
16 
7.17 
4 
1.75 
1 
P 
-0.04400 
-0.03099 
-0.00721 
0.00127 
0.00224 
0.01335 
0.02666 
A Topograhic Product for the Optimization of Self-Organizing Feature Maps 1145 
1.0 
P(x.y) 
[18 
O.B 
0.4 
Q2 
0.8 
0.6 
0.4 
0.2[ 
o 
02 
0.4 0.6 
x 
Fig. la 
Fig. lb 
1.0 
0.8 
0.B 
02 
o 
02 
o., o.6 0.8 
x 
1.o 
0.8 - 
0.6 
0.4 
O2- 
O. 
0.2 
0.6 
Fig. lc Fig. ld 
Figure 1: Self-organizing feature maps of a Gaussian shaped (a) 2-dimensional 
stimulus distribution onto output spaces with 128 x 2 (b), 43 x 6 (c) and 16 x 16 
(d) output nodes. The 43 x 6-output space preserves neighborhood relations best. 
1146 Bauer, Pawelzik, and Geisel 
Secondly the method cannot only be used to optimize the overall output space 
dimensionality, but also the individual dimensions in the different directions (i.e. 
the different aspect ratios). If the rectangles are too long, the resulting map is 
folded like a Peano curve (Fig. lb), and neighborhood relations are severely violated 
perpendicular to the long side of the rectangle. If the aspect ratio fits, the map has 
a regular look (Fig. lc), neighborhoods are preserved. The zig-zag-form at the 
outer boundary of the rectangle does not correspond to neighborhood violations. 
If the rectangle approaches a square, the output space is somewhat squashed into 
the input space, again violating neighborhood relations (Fig. ld). The topographic 
product P coincides with this intuitive evaluation (Tab. 1) and picks the 43 x 6-net 
a.s the most neighborhood preserving one. 
3.2 APPLICATION EXAMPLE 
In our second example speech data is mapped onto output spaces of various di- 
mensionality. The data represent utterances of the ten german digits, given as 
19-dimensional acoustical feature vectors (Gramf et al., 1990). The P-values for 
the different maps are given in Tab. 2. For both the speaker-dependent as well as 
the speaker-independent case the method distinguishes the maps with D a = 3 as 
most neighborhood preserving. Several points are interesting about these results. 
First of all, the suggested output space dimensionality exceeds the widely used 
D A = 2. Secondly, the method does not generally judge larger output space di- 
mensions as more neighborhood preserving, but puts an upper bound on D '4. The 
data seems to occupy a submanifold of the input space which is distinctly lower 
than four dimensional. Furthermore we see that the transition from one to several 
speakers does not change the value of D a which is optimal under neighborhood 
considerations. This contradicts the expectation that the additional interspeaker 
variance in the data occupies a full additional dimension. 
Table 2: Topographic product P for maps from speech feature vectors in a 19D 
iLput space onto output spaces of different dimensionality D V. 
D V 
1 
2 
3 
4 
N 
256 
16x16 
7x6x6 
4x4x4x4 
P 
speaker- 
dependent 
-0.156 
-0.028 
0.019 
0.037 
P 
speaker- 
independent 
-0.229 
-0.036 
0.007 
0.034 
What do these results mean for speech recognition? Let us suppose that several 
utterances of the same word lead to closeby feature vector sequences in the input 
space. If the mapping was not neighborhood preserving, one should expect the tra- 
jectories in the output space to be separated considerably. If a speech recognition 
system compares these output space trajectories with reference trajectories corre- 
sponding to reference utterances of the words, the probability of misclassification 
rises. So one should expect that a word recognition system with a Kohonen-map 
A Topograhic Product for the Optimization of Self-Organizing Feature Maps 1147 
preprocessor and a subsequent trajectory classifier should perform better if the 
neighborhoods in the map are preserved. 
The results of a recent speech recognition experiment coincide with these heuristic 
expectations (Brandt et al., 1991). The experiment was based on the same data 
set, made use of a Kohonen feature map as a preprocessor, and of a dynamic time- 
warping algorithm as a sequence classifier. The recognition performance of this 
hybrid system turned out to be better by about 7% for a 3D map, compared to a 
2D map with a comparable number of nodes (0.795 vs. 0.725 recognition rate). 
Acknowledgement s 
This work was supported by the Deutsche Forschungsgemeinschaft through SFB 
185 "Nichtlineare Dynamik", TP A10. 
References 
H.-U. Bauer, K. Pawelzik, Quantifying the Neighborhood Preservation of Self- 
Organizing Feature Maps, submitted to IEEE TNN (1991). 
W.D. Brandt, H. Behme, H.W. Strube, Bildung von Merkmalen zur Spracherken- 
nuns mittels Phonotopischer Karten, Fortschritte der Akustik - Proc. of DAGA 91 
(DPG GmbH, Bad Honnef), 1057 (199). 
T. GramfJ, H.W. Strube, Recognition of Isolated Words Based on Psychoacoustics 
and Neurobiology, Speech Comm. 9, 35 (1990). 
J.A. Kansas, T.K. Kohonen, J.T. Laaksonen, Variants of Self-Organizing Maps, 
IEEE Trans. Neut. Net. 1, 93 (1990). 
T. Kohonen, Self-Organization and Associative Memory, 3rd Ed., Springer (1989). 
W. Liebert, K. Pawelzik, H.G. Schuster, Optimal Embeddings of Chaotic Attractors 
from Topological Considerations, Europhysics Left. 14, 521 (1991). 
T. Martinetz, H. Ritter, K. Schulten, Three-Dimensional Neural Net for Learning 
Visuomotor Coordination of a Robot Arm, IEEE Trans. Neut. Net. 1, 131 (1990). 
T. Martinetz, K. Schulten, A "Neural-Gas" Network Learns Topologies, Proc. 
ICANN 91 Helsinki, ed. Kohonen et al., North-Holland, 1-397 (1991). 
K. Obermaier, H. Ritter, K. Schulten, A Principle for the Formation of the Spatial 
Structure of Cortical Feature Maps, Proc. Nat. Acad. Sci. USA 87, 8345 (1990). 
H. Ritter, K. Schulten, Convergence Properties of Kohonen's Topology Conserving 
Maps: Fluctuations, Stability and Dimension Selection, Biol. Cyb. 60, 59-71 
(1988). 
H. Ritter, T. Martinetz, K. Schulten, Neuronale Netze, Addison Wesley (1990). 
PART XIV 
PERFORMANCE 
COMPARISONS 
