184 
THE CAPACITY OF THE KANERVA ASSOCIATIVE MEMORY IS EXPONENTIAL 
P. A. Chou  
Stanford University, Stanford, CA 94305 
ABSTRACT 
The capacity of an associative memory is defined as the maximum 
ntuzber of words that can be stored and retrieved reliably by an address 
within a given sphere of attraction. It is shown by sphere packing 
ar%uents that as the address length increases, the capacity of any 
associative memory is limited to an exponential growth rate of 1 - 
where h2() is the binary entropy function in bits, and  is the radius 
of the sphere of attraction. This exponential growth in capacity can 
actually be achieved by the Kanerva associative memory, if its 
parameters are optimally set. Formulas for these optimal values are 
provided. The exponential growth in capacity for the Kanerva 
associative memory contrasts sharply with the sub-linear growth in 
capacity for the Hopfield associative memory. 
ASSOCIATIVE MEMORY AND ITS CAPACITY 
Our model of an associative memory is the following. Let (X,Y) be 
an (address, datum) pair, where X is a vector of n ls and Y is a 
vector of rn ls, and let (XO),YO)),...,(x(M),y(M)), be M (address, 
datum) pairs stored in an associative memory. If the associative memory 
is presented at the input with an address X that is close to some 
stored address X (j), then it should produce at the output a word Y that 
is close to the corresponding contents Y(J). To be specific, let us say 
that an associative memory can corctfraction  errors if an X within 
Hamming distance n6 of X (3) retrieves Y equal to yO). The Hmming 
sphere around each X (3) will be called the sphere of attraction, and  
will be called the radius of attraction. 
One notion of the capacity of this associative memory is the 
maximum number of words that it can store while correcting fraction  
errors. Unfortunately, this notion of capacity is ill-defined, because 
it depends on exactly which (address, datum) pairs have been stored. 
Clearly, no associative memory can correct fraction  errors for ucry 
sequence of stored (address, datum) pairs. Consider, for example, a 
sequence in which several different words are written to the sme 
address. No memory can reliably retrieve the contents of the 
overwritten words. At the other extreme, any associative memory'can 
store an unlimited number of words and retrieve them all reliably, if 
their contents are identical. 
A useful definition of capacity must lie somewhere between these 
two extremes. In this paper, we are interested in the largest M such 
that for most sequences of addresses X(1),...,X (M) and most sequences of 
data y(X)...y(M), the memory can correct fraction  errors. We define 
This work was supported by the National Science Foundation under NSF 
grant IST-8S0980 and by an IBM Doctoral Fellowship. 
 American Institute of Physics 1988 
185 
'most sequences' in a probabilistic sense, as some set of sequences with 
total probability greater than say, .99. When all sequences are 
equiprobable, this reduces to the deterministic version: 99 of all 
sequences. 
In practice it is too difficult to compute the capacity of a given 
associative memory with inputs of length n and outputs of length n. 
Fortunately, though, it is easier to compute the asymptotic rate at 
which M increases, as n and rn increase, for a given family of 
associative memories. This is the approach taken by McEliece et al. 
towards the capacity of the Hopfield associative memory. We take the 
same approach towards the capacity of the Kanerva associative memory, 
and towards the capacities of associative memories in general. In the 
next section we provide an upper bound on the rate of growth of the 
capacity of any associative memory fitting our general model. It is 
shown by sphere packing arguments that capacity is limited to an 
exponential rate of growth of 1- h2(), where h2(5) is the binary entropy 
function in bits, and  is the radius of attraction. In a later section 
it will turn out that this exponential growth in capacity can actually 
be achieved by the Kanerva associative memory, if its parameters are 
optimally set. This exponential growth in capacity for the Kanerva 
associative memory contrasts sharply with the sub-linear growth in 
capacity for the Hopfield associative memory 
A UNIVERSAL UPPER BOUND ON CAPACITY 
Recall that our definition of the capacity of an associative memory 
is the largest M such that for most sequences of addresses 
.(1),...,x(M) and most sequences of data Y(x),...,Y(M), The memory can 
correct fraction  errors. Clearly, an upper bound to this capacity is 
the largest M for which there exists some sequence of addresses 
X (1),... ,X  such that for most sequences of data Y(1),... ,Y(M), the 
memory can correct fraction  errors. We now derive an expression for 
this upper bound. 
Let  be the radius of attraction and let DH(X(J),d) be the sphere 
of attraction, i.., the set of all Xs at most Hamming distance d = [nJ 
from .(J). Since by assumption the memory corrects fraction  errors, 
every address = 6 DH(X(3),d) retrieves the word y(3). The size of 
DH(X(),d) is easily shown to be independent of X(j) and equal to 
out of a total of 2  n-bit addresses, at least ,d addresses retrieve 
Y(), at least ,d addresses retrieve Y(2), at least w,d addresses 
retrieve y(3), and so forth. It follows that the total number of 
distinct Y(3)s can be at most 2"/,d. Now, from Stirling's formula it 
can be shown that if d _ n/2, then w. = 2 h2(d/)+O(1$) , where 
h2(5) =-51og25-(1-)1og2(1-6) is the binary entropy function in bits, 
and 69(logn) is some function whose magnitude grows more slowly than a 
constant times log n. Thus the total number of distinct Y()s can be at 
most 2 n(-:(5))+O(1g) . Since any set containing 'most sequences' of M 
m-bit words will contain a large number of distinct words (if rn is 
186 
1 2 --- n 
Figure 1: Neural net representation of the Kanerva associative memory. Signals propa- 
gate from the bottom (input) to the top (output). Each arc multiphes the signal by its 
weight; each node adds the incoming signals and then thresholds. 
sufficiently large --- see [2] for details), it follows that 
M _< 2 n(1-h(6))+O(lgn). (1) 
In general a function f(n) is said to be O(g(n)) if f(n)/g(n) is 
bounded, i.e., if there exists a constant a such tha; If(n)[ < aig(n)[ for 
all n. Thus (1) says that there exists a constant a such that 
M < 2 n(1-h2(5))+alg'. It should be emphasized that since ct is unknown, 
this bound has no meaning for fixed n. However, it indicates that 
asymptotically in n, the maximum exponential rate of growth of M is 
- 
Intuitively, only a sequence of addresses XO),...,X  that 
optimally pack the address space {-1,+1} ' can hope to achieve this 
upper bound. Remarkably, most such sequences are optimal in this sense, 
when n is large. The Kanerva associative memory can take advantage of 
this fact. 
THE KANERVA ASSOCIATIVE MEMORY 
The Kanerva associative memory [3,4] can be regarded as a two-layer 
neural network, as shown in Figure 1, where the first layer is a 
preprocessor and the second layer is the usual Hopfield style array. 
The preprocessor essentially encodes each n-bit input address into a 
very large k-bit internal representation, k>>n, whose size will be 
permitted to grow exponentially in n. It does not seem surprising, 
then, that the capacity of the Kanerva associative memory can grow 
exponentially in n, for it is known that the capacity of the Hopfield 
array grows almost linearly in k, assuming the coordinates of the 
k-vector are drawn at random by independent flips of a fair coin 
187 
1 k 
w 
 o 
z 
Figure 2: Matrix representation of the Kanerva associative memory. Signals propagate 
from the right (input) to the left (output). Dimensions are shown in the box corners. 
Circles stand for functional composition; dots stand for matrix multiplication. 
In this situation, however, such an assumption is ridiculous: Since the 
k-bit internal representation is a function of the m-bit input address, 
it can contain at most m bits of information, whereas independent flips 
of a fair coin contain k bits of information. Kanerva's primary 
contribution is therefore the specification of the preprocessor, that 
is, the specification of how to map each m-bit input address into a very 
large k-bit internal representation. 
The operation of the preprocessor is easily described. Consider 
the matrix representation shown in Figure 2. The matrix Z is randomly 
populated with ls. This randomness assumption is required to ease the 
analysis. The function fr is 1 in the ith coordinate if the ith row of 
Z is within Hamming distance r of X, and is 0 otherwise. This is 
accomplished by thresholding the ith input against m - 2r. The 
parameters r and k are two essential parameters in the Kanerva 
associative memory If r and k are set correctly, then the number of is 
in the representation (ZX) will be very small in comparison to the 
number of Os. Hence (ZX) can be considered to be a sparse internal 
representation of X. 
The second stage of the memory operates in the usual way, except on 
the internal representation of X. That is, Y = g(Wfr(ZX)), where 
M 
W = Y(J)[fr(ZX(J))] t, (2) 
j=l 
and g s the threshold function whose ith coordinate is +1 if the 
input is greater than 0 and -1 is the ith input is less than 0. The ith 
column of W can be regarded as a memory location whose address is the 
ith row of Z. Every X within Hamming distance r of the ith row of Z 
accesses this location Hence r is known as the access radius, and k is 
the number of memory cations. 
The approach taken in this paper is to fix the linear rate p at 
which r grows with n, and to fix the exponential rate  at which k grows 
with m. It turns out that the capacity then grows at a fixed 
exponential rate Cp,(6), depending on p, , and 6. These exponential 
rates are sufficient to overcome the standard loose but simple 
polynomial bounds on the errors due to combinatorial approximations. 
188 
THE CAPACITY OF THE KANERVA ASSOCIATIVE MEMORY 
Fix OS 1, OpS 1/2, and 0S5 J mJn{2p, 1/2}. Let n be the 
input address length, and let m be the output word length. It is 
assumed that m is at most polynomial in n, i.e., m = exp{O(logn)}. Let 
r = [pn] be the access radius, let k = 2 [n] be the number of memory 
locations, and let d = [Sn] be the radius of attraction. Let Mn be the 
number of stored words. The components of the n-vectors J(1),...,x(M"), 
... y(%4,), and the k x n matrix Z are assumed to be 
the m-vectors y(1), , 
IID equiprobable 1 random variables. Finally, given an n-vector X, 
let Y = 9(Wf.(ZX)) where W M . 
Define the quantity 
c.,(5) = { 25 + 2(1- 5)n(.;_- ) + - 2n(p) if  <_ o(p) 
c,o()(5) if  > o(p) ' 
() 
where 
and 
o(p) = 2n(p) - 2 - 2(1 - )n([_-) + 1 - n() 
(4) 
-r = ]- - 2p(1 - p). 
Theorem: If 
M <_ 2 c",()+0) 
then for all e > O, all sufficiently large n, all j 6 {1,...,Mn}, and all 
X  DH(X(J),d), 
P{Y  Y('/)} < e. 
Proof: See [2]. 
Interpretation: If the exponential growth rate of the number of 
stored words M is asymptotically less than Cp,(5), then for every 
sufficiently large address length n, there is some realization of the 
n X 2 " preprocessor matrix Z such that the associative memory can 
correct fraction 5 errors for most sequences of Mn (address, datum) 
pairs. Thus C,,(5) is a lower bound on the exponential growth rate of 
the capacity of the Kanerva associative memory with access radius n and 
number of memory locations 2 ". 
Figure 3 shows C,(5) as a function of the radius of attraction 5, 
for  = 0() and  = 0.1, 0.2, 0.3, 0.4 and 0.45. For. any fixed access 
radius p, C,,0()(5 ) decreases as 5 increases. This reflects the fact 
that fewer (address, datum) pairs can be stored if a greater fraction of 
errors must be corrected. As p increases, C,,o(,)(5 ) begins at a lower 
point but falls off less steeply. In a moment we shall see that  can 
be adjusted to provide the optimal performance for a given 5. 
Not shown in Figure 3 is the behavior of C,,(5) as a function of . 
However, the behavior is simple. For  > 0(), C,,(5) remains 
unchanged. while for  < 0(P). C,.(5) is simply shifted down by the 
difference 0(P)- - This establishes the conditions under which the 
Kanerva associative memory is robust against random component failures. 
Although increasing the number of memory locations beyond 2 "0(") does 
not increase the capacity, it does increase robustness. Random 
189 
t.1 
=0.2 
0.3 
Figure 3: Graphs of Cp,o(p)(5 ) as defined by (3). The upper envelope is 1 - ha(5). 
component failures will not affect the capacity until so many components 
have failed that the number of surviving memory locations is less than 
Perhaps the most important curve exhibited in Figure 3 is the 
sphere packing upper bound !- h(6), which is achieved for a particular 
3 - 2p(1- p). Equivalently, the upper bound is achieved 
pby=- 
for a particular 6 by p equal to 
(5) 
Thus (4) and ($) specify the optimal values of the parameters  and p, 
respectively. These functions are shown in Figure 4. With these 
optimal values, (3) simplifies to 
= - 
the sphere packing bound. 
It can also be seen that for  = 0 in (3), the exponential Towth 
rate of the capacity is asymptotically equal to , which is the 
exponential growth rate of the number of memory locations, k. That is, 
AJ = 2 n+O(g) = k.2 0(). Kanerva [3] and Keeler [5] have argued 
that the capacity at  = 0 is proportional to the number of memory 
locations, i.e., /=k. , for some constant . Thus our results are 
consistent with those of Kanerva and Keeler, provided the polynomial' 
O(n) can be proved to be a constant. However, the usual statement of 
their result, /= k. , that the capacity is simply proportional to the 
number of memory locations, is false, since in light of the universal 
190 
o 
0 0.1 0.2 0.3 0.9 0.5 
p 
Figure 4: Graphs of no(p) and So(p), the inverse of po(5), as defined by (4) and (5). 
upper bound, ir is impossible for the capacity to grow without bound, 
with no dependence on'the dimension n. In our formulation, this 
difficulty does nor arise because we have explicitly related the number 
of memory locations ro the input dimension: kn = 2 n. In fact, our 
formulation provides explicit, coherent relationships between all of the 
following variables: the capacity M, the number of memory locations k, 
the input and output dimensions n and m, the radius of attraction 6, 
and the access radius p. We are therefore able ro generalize the 
results of [3,S] ro the case  > 0, and provide explicit expressions for 
the asymptotically optimal values of p and n as well. 
CONCLUSION 
We described a fairly general model of associative memory and 
selected a useful definition of its capacity. A universal upper bound 
on the growth of the capacity of such an associative memory was shown by 
a sphere packing ar&xmenr ro be exponential with rate 1 - h2(), where 
h2(5) is the binary entropy function and  is the radius of attraction. 
We reviewed the operation of the Kanerva associative memory, and stared 
a lower bound on the exponential growth rare of its capacity. This 
lower bound meets the universal upper bound for optimal values of the 
memory parameters p and n. We provided explicit formulas for these 
optimal values. Previous results for  = 0 staring rhar the capacity of 
the Kanerva associative memory is proportional to the number of memory 
locations cannot be strictly true. Our formulation corrects the problem 
and generalizes those results ro the case  > 0. 
191 
References
1. R.J. McEliece, E.C. Posner, E.R. Rodemich, and S.S. Venkatesh, 
''The capacity of the Hopfield associative memory,'' IEEE 
Transactions on Information Theory, submitted. 
2. P.A. Chou, ''The capacity of the Kanerva associative memory,'' 
IEEE Transactions on Information Theory, submitted. 
3. P. Kanerva, ''Self-propagating search: a unified theory of 
memory,'' Tech. Rep. CSLI-84-7, Stanford Center for the Study of 
Language and Information, Stanford, CA, March 1984. 
4. P. Kanerva, ''Parallel structures in human and computer memory,'' 
in Neural NetworksforOomputing, (J.S. Denker, ed.), New York: 
American Institute of Physics, 1986. 
5. J.D. Keeler, ''Comparison between sparsely distributed memory and 
Hopfield-type neural network models,'' Tech. Rep. RIACS TR 86.31, 
NASA Research Institute for Advanced Computer Science, Mountain 
Vie, CA, Dec. 1986. 
