211 
ltI6H DENSITY ASSOCIATIVE ORIES 1 
Air Dembo 
Information Systems Laboratory, Stanford University 
Stanford, CA 94305 
Ofer Ze itouni 
Laboratory for Informat ion and Dec is ion Systems 
MIT, Cambridge, MA 02139 
A class of high density associative memories is constrcted, 
starting from a description of des ired properties those should 
exhibit. These properties include high capacity, oontrollable basins 
of attraction and fast speed of convergence. lortunately enough, the 
resulting memory is implementable by an artificial Neural Net. 
I I/fl!01) UCTION 
Most of the work on associative memories has been structure 
oriented! i.e., given a Neural architecture, efforts were directed 
towards the analysis of the resulting network. Issues like capacity, 
basins of attractions, etc. were the main objects to be analyzed of., 
e.g. [11, [21, [31, [41 and references there, among others. 
In this paper, we take a different approach! we start by 
explicitly stating the desired properties of the network, in terms of 
capacity, etc. Those requirements are given in terms of axioms (c.f. 
below). Then, we bring a synthesis method which enables one to design 
an architecture which will yield the des ired performance. 
Surprisingly enough, it turns out that one gets rather easily the 
following propert ies: 
(a) High capacity (unlimited in the continuous state-space case, 
bounded only by sphere-packing bounds in the discrete state 
case) . 
(b) Guaranteed basins of attractions in terms of the natural 
metric of the state space. 
(c) High speed of oonvergence in the guaranteed basins of 
attract ion. 
Moreover, it turns out that the architecture suggested below is the 
only one which satisfies all our axioms ('desired properties')! 
Our approach is based on defining a potential and following a 
descent algergrim (e.g., a gradient alger grim). The main design task 
is to construct such a potential (and, to a lesser extent, an 
implementation of the descent algorithm via a Neural network). In 
doing so, it turns out that, for reasons described below, it is useful 
to regard each desired memory location as a "particle' in the state 
space. It is natural to require now the following requirement from a 
1An expanded version of this work has been submitted to Phys. Rev. A. 
This work was carried out at the Center for Neural Science, Brown 
Univers Sty. 
American Institute of Physics 1988 
212 
memory: 
(P1) The potential should be linear w.r.t. adding partic les in 
the sense that the potential of two particles should be the sum of the 
potentials induced by the individual particles (i.e., we do not allow 
interpart ioles interact ion). 
(P2) Particle locations are the only possible sites of stable 
memory 1coat ions. 
(P3) The system should be invariant to translations and 
rotations of the coordinates. 
We note that the last requirement is made only for the sake of 
simplicity. It is not essential and may be dropped without affecting 
the results. 
In the sequel, we construct a potential which satisfies the above 
requirements. We refer the reader to [5] for details of the proofs, 
OtC. 
Acknowledgements. We would like to thank Prof. L.N. Cooper and C.M. 
Baolmann for many fruitful discussions. In particular, section 2 is 
part of a joint work with them ([6]). 
2. HIGH DENSITY STORAGE MODEL 
In what follows we present a particular case of a method for the 
construction of a high storage density neural memory. We define a 
function with an arbitrary number of minima that lie at preassigned 
points and define an appropriate relaxation procedure. The general 
case in presented in [5]. 
Let l,...,m be a set of m arbitrary distinct memories in R N. 
The 'energy' function we viii use is: 
m 
1 
:i 
i=1 
(1) 
where we assume tloughout that N _} 3, L _} (N - 2), and Qi ) 0 and use 
1... [ to denote the Euclidean distance. Note that for L = 1, N=3, [ 
is the electrostatic potential induced by negative fixed particles 
with charges -Qi' This 'energy' function possesses global minima at 
l,...,m (where (i) ----) and has no local minima except at these 
points. A rigorous proof is presented in [5] together with the 
complete characterization of functions having this property. 
As a relaxation procedure, we can choose any dynamical system for 
which [ is strictly decreasing, uniformly in compacts. In this 
instance, the theory of dynamical systems guarantees that for almost 
any initial data, the trajectory of the system converges to one of the 
desired points l,...,m. However, to give concrete results and to 
further exploit the resemblanoe to electrostatic, consider the 
relaxat ion: 
213 
 = - = - i (ti[ - i I-(L+2) ( - Xi) 
i=1 
where for N=3, L=I, equation (2) describes the motion of a positive 
test pazticle in the electrostatic field  Eenerated by the neEative 
fixed chazEes -QI' ' ' ' '- -O at El' ' ' ' 'Era' 
Since the field  is just minus the Eradient of 5, it is clear 
that alon E trajectories  of (2), d/dt _{ 0, with equality only at the 
fixed points of (2), which are exactly the stationary points of 
Therefore, usinE (2) as the relaxation procedure, we can conclude 
that enterinE at any (0), the system converEes to a stationary point 
of . The space of inputs is part itioned into m domains of 
attraction, each one correspondin E to a different memory, and the 
boundaries (a set of measure zero), on which (0) will converEe to a 
saddle point of 5. 
We can now explain why  has no spurious local minima, at least 
 . Suppose  a 
for L=I, N=3 usinE elementary physical arEuments has 
spurious local minima at   l,...,m, then in a small neiEhborhood 
of  which does not include any of the i' the field  points towards 
. Thus on any closed surface in that neiEhborhood, the inteEral of 
the normal inward component of .. is positive. However, this inteEral 
is just the total charEe included inside the surface, which is zero. 
Thus we arrive at a contradiction, so  can not be a local minimum. 
We now have a relaxation procedure, such that almost any (0) is 
attracted by one of the i' but we have not yet specified the shapes 
of the basins of attraction. By varyin E the charEes Qi' we can 
enlarEe one basin of attraction at the expense of the others (and vice 
versa) , 
]ven when all of the /t i are equal, the position of the 
cause (0) not to converEe to the closest memory, as emphasized in the 
example in fiE. 1. However, let r = minl_iij_%[ i - be the 
minimal distance between any two memories! then if It(0) - 
it can be shown that (0) will converEe to i' (provided that k = 
- 1). Thus, if the memories are densely packed in a hypersphere, by 
choosin E k larEe enouEh (i.e. enlarEin E the parameter L), converEence 
to the closest memory for any 'interest in E' input, that is an input 
(0) with a distinct closest memory, is Euaranteed. The detailed 
proof of the above property is Eiven in [5]. It is based on boundinE 
the number of _., ji, in a hypersphere of radius R(lr) around i' by 
[2R/r + 1] N, then boundinE the maEnitude of the field induced b,y any 
j, ji, on the bounda,] of such a hypersphere by (R-[(O)-i[)-IL+l), 
and finally inteEratinE to show that for i(0)_i[i C 
the converEence of (0) to i i is within finite time T, which behaves 
like 0 L+2 for L )) I and 0 { I and fixed. Intuitively the reason for 
214 
this behaviour is the short-range nature of the fields used in 
equation (2). Because of this. we also expect extremely low 
convergence rate for inputs (0) far away from all of the i' 
The radial nature of these fields suggests a 
way to overcome this difficulty. that is to 
increase the convergence rate from points very far 
away. without disturbing all of the aforementioned 
desirable properties of the model. Assume that we 
" know in advance that all of the i lie inside some 
large hypersphere S around the o_rigin. Then. at 
any point  outside S. the field  has a positive 
projection radially into S. By'adding a long- 
2 range force to p. effective only outside of S. we 
can hasten the movement towards S. from points far 
away. without creating additional minima inside of 
', S. As an example the force (- for   S! 0 for 
, 7  4   S) will pull any test input (0) to 
,, the boundary of S within the small finite time T  
 , 1/iS[. and from then on the system will behave 
inside S according to the original field h; 
Up to this point. our derivations e been 
for a continuous system. but from it we can deduce 
Figure 1 
a discrete system. We shall do this mainly for a 
R)) 1 and 5  1 clearer comparison between our high density memory 
model and the discrete version of Hopfield's 
model. Before continuing in that direction, note 
that our continuous system has unlimited storage 
o apac ity unlike flop field ' s cent inuous system, 
which Iike his disc fete model, has Iimit ed 
o ap ac ity. 
Nor the discrete system, assume that the i are composed of 
elements +1 and replace the Euclidean distance in (1) with the 
normalized Hamming distance Il - -I --0 [j=ll - jl. This places 
the vectors i on the unit hypersphere. 
The relaxation process for the discrete system will be of the 
type defined in Hopfield's model in [i] . Choose at random a 
component to be updated (that is, a neighbor ' of  such that 
[' -[ = 2/N), calculate the 'enersy' difference,  -- (-(), 
and only if [ ( 0, change this component, that is: 
i '9i sign([() - [()), () 
where () is the potential energy. in (1). Since there is a finite 
number of possible  vectors (2"), convergence in finite tie is 
guaranteed. 
This relaxation procedure is rigid since the movement is limited 
to points with components +1. Therefore, although the local minima of 
() defined in (2) are only at the desired points ii' the relaxation 
may et stuck at some  which is not a stationary point of (). 
However, the short range behaviour of the potential (), unlike the 
long-range behavior of the quadratic potential used by flopfield, gives 
215 
rise to results similar to those we have quoted for the continuous 
model (equation (1)). 
Speoifioally, let the stored memories l,...,m be separated from 
one another by hay ins at least pN different oomponents (0 ( p _( 1/2 
and p fixed), and let (0) asree up to at least one i with at most 
0pN errors between them (0 _( 0 ( 1/2, with 0 fixed), then (0) 
oonverses monotonioally to i by the relaxation prooedure siren in 
equation (3). 
This result holds independently of m, provided that N is larse 
enoush (typically, Np ln() -)1)and L is chosen so that _(ln({-) 
The proof is oonstuoted by bound in s the cummulative effeot of terms 
[ - it[ -L, ji, to the enersy differenoe $5 and showinS that it is 
dominaffed by [- ii[ -L. For details, we refer the reader asain to 
[51. 
Note the importance of this property: unlike the Hopfield model 
whioh is limited to m _( N, the sussested system is optimal in the 
sense of Information Theory, sinoe for every set of memories 
separated from eaoh other by a ltammin s distanoe pN, up to 1/2 pN 
errors in the input can be oorreoted, provided that N is larse and L 
properly ohosen. 
As for the oomplexity of the system, we note that the nonlinear 
operation a -L for a}0 and L inteser (whioh is at the heart of our 
system computat ionally)' is equivalent to e -Lln(a) and oan be 
implemented, therefore, by a simple electrical circuit composed of 
diodes, which have exponent ial input-output characteristics, and 
resisters, which can carry out the necessary multiplications (cf. the 
implementation of section 3). 
Further, since both [ii[ and [[ are held fixed in the discrete 
system, where all states are on the unit hypersphere, [ - i[ 2 is 
equivalent to the inner product of  and i' up to a constant. 
To conclude, the sussested model involves about m'N 
multiplications, followed by m nonlinear operation_s, and then 
additions. The orisinal model of flopfield involves N z multiplications 
and add it ions, and then N nonlinear operations, but is limited to 
m _( N. Therefore, whenever the flopfield model is applicable the 
complexity of both models is comparable. 
3. DPLF2NTATION 
We propose below one possible network which implements the 
discrete time and space version of the model described above. An 
implementation for the ocntinuous time case, which is even simpler, is 
also hinted. We point out that the implementation described below is 
by no means unique, (and maybe even not the simplest one). foreover, 
the 'neurons' used are artificial neurons which perform various tasks, 
as follows: There are (N+I) neurons which are delay elements, and 
pointwise non-linear functions (which may be interpreted as delay 
less, intermediate neurons). There are mB[ synaptic connections 
between those two layers of neurons. In addition, as in the Hopfield 
216 
model, we have at each iteration to specify (either deterministically 
or stochastically) which coordinate are we updating. To do that, we 
use an N dimensional 'control register' whose content is always a unit 
vectoz of [0, 1] N (and the location of the 'X' will denote the next 
coordiante to be changed). This vector may be varied from instant n 
to n + ! either by shift ('sequential coordinate update') or at 
random. 
Let Ai, i_(i(N be the i-th output of the '0ontrol' register, x i, 
_i(N and V be the (N+I) neurons taputs and x i = xi(1-2Ai) the 
corresponding outputs (where x, xs[+l,-1], as(0,1], but V is a real 
number), p,, l_j_m be the input of the j-th intermediate neuron 
(-1-(p,-l), Jq: :-(i-p,) -L be its output, and . .W_. i --u!J)/N be the 
i-th element of the j-th memory. 
The system's equations are: 
i "' xi(! - 2Ai) 
! _ i - N (4a) 
N 
= wj 
i=! 
! _( j _< m (4b) 
qj - -(x - j)-L 
I _ j _ m (4c) 
V = qj (4d) 
j=! 
! 
s = t-sign(V - v)) 
(4e) 
x i <--x i + 83. ! < i -< N (4f) 
V <--V + $ (4g) 
The system is initialized by x i = xi(O) (the probe vector), and 
V -- + m. A block diagram of this sytem appears in Fig. 2. Note that 
we made use of N + m + I neurons and O(Nm) oonnections. 
As for the continuous time case (with memories on the unit 
sphere) we wiII get the equations: 
217 
x. + 2m Vx = LN W iq , 
 i j j 
N N 
j = N  jixl, 5 =  X, 
i=l i=1 
-(- + 1) 
qj -- (1 + 5 - 2fj) , 
! - i _4 N (3a) 
 _< j _< m (st,) 
  j J m (Sc) 
j=l 
with similar interpretation (here there is no 'control' register as 
all components are updated continuously). 
Control Register 
\ / / Intermediate 
' , \l/ ~ / Neurons 
I _  I-q , l/,_' w,, / It,l IT, Auxiliary Neurons 
.-dij" 
' 7 2 
'--I_LJ - w.,.. I I lr,l 
s 
L._.egend 
l -,o 
Delay Unit (Neuron) 
 c=O 
Synoptic Switch (0 = i 2 c=l ) 
i--..  Synoptic Swilch (O:{_ii c:O 
tc c:l ) 
li' ? ComputotionUn,t(O-1,2__(I-sign(i2-il ))) 
F_[igure 2. Neural Network Implementorion 
218 
REFERENCES 
Y.Y. Hopfleld, 'Neural Networks and Physical Systems with 
Emergent Collective Computational Abilities', Proc. Nat. Acad. 
Sci. U.S.A., Vol. 79 (1982), pp. 2554-2558. 
R.Y. McEliece, et al., 'The Capacity of the Hop field Associative 
Memory', IEEE' Trans. on Inf. Theory, Vol. IT-33 (1987), pp. 461- 
482. 
A. Dembo, 'On the Capacity of the Hop field Memory', submitted, 
IEEB Trans. on Inf. Theory. 
Kohonen, T., Self Organization and Associative Memory, Springer, 
Berlin, 1984. 
Dembo, A. and Zeitouni, 0., General Potential Surfaces and Neural 
Networks, submitted, Phys. Rev. A. 
Bachmann, C.M., Cooper, L.N., Dembo, A. and Zeitouni, 0., A 
relazation Model for Memory with high storage density, to appear, 
Proc. Natl. Ac. Science. 
