Phase Diagram and Storage Capacity of 
Sequence Storing Neural Networks 
A. Diiring 
Dept. of Physics 
Oxford University 
Oxford OX1 3NP 
United Kingdom 
a.during 1 @physics.oxford.ac.uk 
A. C. C. Coolen 
Dept. of Mathematics 
King's College 
London WC2R 2LS 
United Kingdom 
tcoolen @mth.kcl.ac.uk 
D. Sherrington 
Dept. of Physics 
Oxford University 
Oxford OX1 3NP 
United Kingdom 
d.sherrington 1 @physics.oxford.ac.uk 
Abstract 
We solve the dynamics of Hopfield-type neural networks which store se- 
quences of patterns, close to saturation. The asymmetry of the interaction 
matrix in such models leads to violation of detailed balance, ruling out an 
equilibrium statistical mechanical analysis. Using generating functional 
methods we derive exact closed equations for dynamical order parame- 
ters, viz. the sequence overlap and correlation and response functions, 
in the limit of an infinite system size. We calculate the time translation 
invariant solutions of these equations, describing stationary limit-cycles, 
which leads to a phase diagram. The effective retarded self-interaction 
usually appearing in symmetric models is here found to vanish, which 
causes a significantly enlarged storage capacity of acm 0.269, com- 
pared to acm 0.139 for Hopfield networks s[oring static patterns. Our 
results are tested against extensive computer simulations and excellent 
agreement is found. 
212 A. Diring, A. C. C. Coolen and D. Sherrington 
1 INTRODUCTION AND DEFINITIONS 
We consider a system of N neurons or(t) = {ai(t) = +1}, which can change their states 
collectively at discrete times (parallel dynamics). Each neuron changes its state with a 
l[1-tanh/ai(t)[j Jijo'j(t)+Oi(t)]], so that the transition matrix is 
probability Pi (t) --  
W[o- + 1)1o')1 = 
N 
H e/rr'(s+l)[Y-;/v--1 d''rr3(s)+O'(s)]-ln2csh(fl[Y= 
i=1 
(1) 
with the (non-symmetric) interaction strengths Jij chosen as 
1 P 
Jij = NZ +1 
-- i sj, (2) 
=1 
The ' represent components of an ordered sequence of patterns to be stored 1. The gain 
parameter/ can be interpreted as an inverse temperature governing the noise level in the 
dynamics (1) and the number of patterns is assumed to scale as N, i.e. p - oN. If 
the interaction matrix would have been chosen symmetrically, the model would be acces- 
sible to methods originally developed for the equilibrium statistical mechanical analysis 
of physical spin systems and related models [1, 2], in particular the replica method. For 
the nonsymmetric interaction matrix proposed here this is ruled out, and no exact solution 
exists to our knowledge, although both models have been first mentioned at the same time 
and an approximate solution compatible with the numerical evidence at the time has been 
provided by Amari [3]. The difficulty for the analysis is that a system with the interactions 
(2) never reaches equilibrium in the thermodynamic sense, so that equilibrium methods 
are not applicable. One therefore has to apply dynamical methods and give a dynamical 
meaning to the notion of the recall state. Consequently, we will for this paper employ the 
dynamical method of path integrals, pioneered for spin glasses by de Dominicis [4] and 
applied to the Hopfield model by Rieger et al. [5]. 
We point out that our choice of parallel dynamics for the problem of sequence recall is 
deliberate in that simple sequential dynamics will not lead to stable recall of a sequence. 
This is due to the fact that the number of updates of a single neuron per time unit is not 
a constant for sequential dynamics. Schemes for using delayed asymmetric interactions 
combined with sequential updates have been proposed (see e.g. [6] for a review), but are 
outside the scope of this paper. 
Our analysis starts with the introduction of a generating functional Z[b] of the form 
z[p]=  p[o'(0),...,o-(t)le-iE<, "(s)'(s), (3) 
,(o) ..., (t) 
which depends on real fields {i(t)}. These fields play a formal role only, allowing for the 
identification of interesting order parameters, such as 
Upper (pattern) indices are understood to be taken modulo p unless otherwise stated. 
Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks 213 
for the average activation, response and correlation functions, respectively. Since this func- 
tional involves the probability p[cr(0),... , or(t)] of finding a 'path' of neuron activations 
{or(0),... , rr(t)}, the task of the analysis is to express this probability in terms of the 
macroscopic order parameters itself to arrive at a set of closed macroscopic equations. 
The first step in rewriting the path probability is to realise that (1) describes a one- 
step Markov process and the path probability is therefore just the product of the 
single-time transition probabilities, weighted by the probability of the initial state: 
p[cr(O),.. cr(t)] p(cr (0))t- 
 , = I-Is=o W[cr(s + 1)let(s)]. Furthermore, we will in the 
course of the analysis frequently isolate interesting variables by introducing appropriate 
5-functions, such as 
i----1 j=l 
f dh(s) dfa(s) N 
i=1 
The variable hi(t) can be interpreted as the local field (or presynaptic potential) at site i 
and time t and their introduction transforms z[Ip] into 
Z[Ip] =  p(cr(0))/ 
(o)...(t) 
t--1 
dh dfa [e3a(s+l).h(s)_i 
II 
In 2 cosh(hi(s)) 
ei(h(s).h(s)-N- X:. 3 h,(s) X:, ,"+ya,(s)-h(s).O(s)-p(s).a(s))]. 
(4) 
This expression is the last general form of z[Ip] we consider. To proceed with the analysis, 
we have to make a specific ansatz for the system behaviour. 
2 DYNAMIC MEAN FIELD THEORY 
As sequence recall is the mode of operation we are most interested in, we make the 
ansatz that, for large systems, we have an overlap of order (9 (N ) between the pattern 
s at time s, and that all other patterns are overlapping with order (9 (N -/2) at most. 
Accordingly, we introduce the macroscopic order parameters for the condensed pattern 
re(s) = N -1 Y-i ]o'i(s) and for the quantity k(s) = N - Y-i ]i(s), and their noncon- 
densed equivalents yU(s) = N -1/2 Y-i o'i($) and x(s) = N -1/2 Y,i i($) (I   $), 
where the scaling ansatz is reflected in the normalisation constants. Introducing these ob- 
jects using 5 functions, as with the local fields hi(s), removes the product of two patterns 
in the last line of eq. (4), so that the exponent will be linear in the pattern bits. 
Because macroscopic observables will in general not depend on the microscopic realisation 
of the patterns, the values of these observables do not change if we average z[Ip] over the 
realisations of the patterns. Performing this average is complicated by the occurrence of 
some patterns in both the condensed and the noncondensed overlaps, depending on the 
current time index, which is an effect not occurring in the standard Hopfield model. Using 
some simple scaling arguments, this difficulty can be removed and we can perform the 
average over the noncondensed patterns. The disorder averaged Z[Ip] acquires the form 
Z[b] = / dmdriadk dc dqd/ldQ d( dK d e N('I'['"]+a't'"]+at'"])+(N/) (5) 
214 A. Daring, A. C. C. Coolen and D. Sherrington 
where we have introduced the new observables q(s, s') -- 1IN Y']i o' (s)o'i (s'), Q( s, s') = 
1IN 5-i hi(s)hi(st), and K(s,s') = 1IN 5-i O'i($)hi($t) , and their corresponding conju- 
gate variables. The functions in the exponent turn out to be 
[m, fn, k, fc, q,(:l, Q, (, K,] = i E [rh(s)m(s)+ [c(s)k(s)-m(s)k(s)] + 
i E [O(s, s')q(s, s') + O(s, s')Q(s, s')+ t(s, s')K(s, s')l, (6) 
1 [ 
 [m,k,l,Q,]= . In 
.(o)...(t) 
Pi(a(O)) /l<t [dh(s)2(s) ] 
eel<, [fia(s+l)h(s)-In2cosh(2h(s))] X 
e - E.,,< [(s,s ),(s),(s )+q(s,s )h(s)h(')+R(,'),()hl')] x 
e i Z,<, (*)[a(s)-o,(,)-i(s)C +*] -i Z<, -(s)[m(*)C +,(s)] ], 
(7) 
and 
[q, Q, Q] =  In [ ((v 
--.>t  [uo(s)Q(s,s)uo(s)+u(s)K(s',s)vo(s')+vo(s)K(s,s)uo(s)+v.(s)q(s,s')v.(s')]. 
(8) 
The first of these expressions is just a result of the introduction of 5 functions, while the 
second will turn out to represent a probability measure given by the evolution of a single 
neuron under prescribed fields and the third reflects the disorder contribution to the local 
fields in that single neuron measure 2. We have thus reduced the original problem involving 
N neurons in a one-step Markov process to one involving just a single neuron, but at the 
cost of introducing two-time observables. 
3 DERIVATION OF SADDLE POINT EQUATIONS 
The integral in (5) will be dominated by saddle points, in our case by a unique saddle 
point when causality is taken into account. Extremising the exponent with respect to all 
occurring variables gives a number of equations, the most important of which give the 
physical meanings of three observables: q($, s') = C($, $'), K($, $') = iG($, $'), 
re(s) = lira 1 
m--,oc  E <oh($)> (9) 
with 
C($,$')= lim 1 
N'-'o  E <O'i(8)O'i(st)) G(8 s')= lim 1 
' 00i($') ' 
(lO) 
2We have assumed p(rr(0)) = 1-I, p,(a,(O)). 
Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks 215 
which are the single-site correlation and response functions, respectively. The overline... 
is taken to represent disorder averaged values. Using also additional equations arising from 
the normalisation Z[0] = 1, we can rewrite the single neuron measure (I, as 
/i<t [dh(s)dt(s)' [rr(s+l,h,s,-In cosh,/3h(s,,] 
(f[{er}]), = E i i' P(a(O))f[{er}]eEs<' 2 
o...a(t) 
x e i  <, h() [a()-0()-m(s)] -  a E, ,, <, R(s,' )h(s)h(') ( 11 ) 
with the short-hand R = t=o GttCGt' To simplify notation, we have here assumed 
that the initial probabilities p(ai(O)) are uniform and that the external fields Oi(s) are 
so-called staggered ones, i.e. Oi(s) = 0( +, which makes the single neuron measure 
i 
site-independent. This single neuron measure (11) represents the essential result of our 
calculations and is already properly normalised (i.e. (1). = 1). 
When one compares the present fo of the single neuron measure with that obtained for 
the symmetric Hopfield network, one finds in the latter model an additional term which 
coffesponds to a retarded self-interaction. The absence of such a term here suggests that 
the present model will have a higher storage capacity. It can be explained by the constant 
change of state of a large number of neurons as the network goes through the sequence, 
which prevents the build-up of microscopic memory of past activations. 
However, as is the case for the standard Hopfield model, the measure (11 ) is still too com- 
plicated to find explicit equations for the observables we are interested in. Although it is 
possible to evaluate the necessary integrals numerically, we instead concentrate on the inter- 
esting behaviour when transients have died out and time-translation invariance is present. 
4 STATIONARY STATE 
We will now concentrate on the behaviour of the network at the stage when transients have 
subsided and the system is on a macroscopic limit cycle. Then the relations 
re(s) = m C(s,s') -- C(s - s') G(s,s') - C(s - s'). (12) 
hold and also R(s, s') -- R(s - s'). We can then for simplicity shift the time origin 
to = -oc and the upper temporal bound to t = oc. Note, however, that this state is not 
to be confused with microscopic equilibrium in the thermodynamic sense. The stationary 
versions of the measure (11 ) for the interesting observables are then given by the following 
expressions (note that C(0) '- 1): 
m = / H dv(s)dw(s) eiV.w_w. Rw tanh film + 0 + av(0)] 
2- 
s 
C(r - 0) = / H dv(s) dw(s) eiV.W_W.mW x 
2' 
$ 
tanh [m + 19 + av(r)] tanh [rn + 19 + av(0)] 
G(r) = O&,, [1- f lIdv(s)dw(S) tanh2 (13) 
s 27r 
and we notice that the response function is now limited to a single time step, which again 
reflects the influence of the uncorrelated flips induced by the sequence recall. These equa- 
tions can be solved by separating the persistent and fluctuating parts of C(r) and/(r), 
216 A. Dtiiring, A. C. C. Coolen and D. Sherrington 
C(r) = q + (r), R(r) = r +/(r), lim 
Doing so eventually leads us to the coupled equations 
p -- [1 -- f12(1 -- ej)2] -1 
m=fDz tanhfl[m+0+z] 
q=f 
8(r) = lim /(r) = 0. 
(14) 
(15) 
Dz tanh 2 [rn + 0 + Zv/-G-fi ] 
Dz Dxtanh/[rn+O+z xfO-+xa(1-q) p 
(16) 
(17) 
Note that the three equations (14--16) form a closed set, from which the persistent corre- 
lation q simply follows. 
5 PHASE DIAGRAM AND STORAGE CAPACITY 
1.0 
0.4 q 
0.2 
0.0 ' ' 
0.6 
T 
0.0 0.1 0.2 0.3 
Figure 1: Phase diagram of the sequence storage network, in which one finds two phases: 
a recall phase (R), characterized by {m - 0, q > 0,  > 0}, and a paramagnetic phase 
(P), characterized by {m -- 0, q = 0,  > 0}. The solid line separating the two phases 
is the theoretical prediction for the (discontinuous) phase transition. The markers represent 
simulation results, for systems of N = 10,000 neurons measured after 2,500 iteration 
steps, and obtained by bisection in a. The precision in terms of a is at least Ac - 0.005 
(indicated by error bars); the values for T are exact. 
The coupled equations (14--17) can be solved numerically for 0 = 0 to find the area in 
the c--T plane where solutions m  0 -- corresponding to sequence recall -- exist. The 
boundary of this area describes the storage capacity of the system. This theoretical curve 
can then be compared with computer simulations directly performing the neural dynamics 
Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks 217 
given by (1) and (2). We show the result of doing both in the same accompanying diagram. 
We find that there are only two types of solutions, namely a recall phase R where m if- 0 
and q if- 0, and a paramagnetic phase where m -- q - 0. Unlike the standard Hopfield 
model, the present model does not have a spin glass phase with m -- 0 and q if- 0. The 
agreement between simulations (done here for N - 10,000 neurons) and theoretical re- 
sults is excellent and separate simulations of systems with up to N = 50,000 neurons to 
assess finite size effects confirm that the numerical data are reliable. 
6 DISCUSSION 
In this paper, we have used path integral methods to solve in the infinite system size limit 
the dynamics of a non-symmetric neural network model, designed to store and recall a se- 
quence of patterns, close to saturation. This model has been known for over a decade from 
numerical simulations to possess a storage capacity roughly twice that of the symmetric 
Hopfield model, but no rigorous analytic results were available. We find here that in con- 
trast to equilibrium statistical mechanical methods, which do not apply due to the absence 
of detailed balance, the powerful path integral formalism provides us with a solution and 
a transparent explanation of the increased storage capacity. It turns out that this higher ca- 
pacity is due to the absence of a retarded self-interaction, viz. the absence of microscopic 
memory of activations. 
The theoretically obtained phase diagram can be compared to the results of numerical sim- 
ulations and we find excellent agreement. Our confidence in this agreement is supported 
by additional simulations to study the effect of finite size scaling. Full details of the calcu- 
lations will be presented elsewhere [7]. 
References 
[1] Sherrington D and Kirkpatrick S 1975 Phys. Rev. Lett. 35 1972 
[2] Amir D J, Gutfreund H, and Sompolinsky H 1985 Phys. Rev. Lett. 55 1530 
[3] Amari S and Maginu K 1988 Neural Net,[,orks 1 63 
[4] de Dominicis G 1978 Phys. Rev. B 18 4913 
[5] Rieger H, Schreckenberg M, and Zittartz J 1988 J. Phys. A: Math. Gen. 21 L263 
[6] Kfihn R and van Hemmen J L 1991 Temporal Association ed E Domany, J L van 
Hemmen, and K Schulten (Berlin, Heidelberg: Springer) p 213 
[7] Dfiring A, Coolen A C C, and Sherrington D 1998 J. Phys. A: Math. Gen. 31 8607 
