Modern Analytic Techniques to Solve the 
Dynamics of Recurrent Neural Networks 
A.C.C. Coolen 
Dept. of Mathematics 
King's College London 
Strand, London WC2R 2LS, U.K. 
S.N. Laughton 
Dept. of Physics - Theoretical Physics 
University of Oxford 
I Keble Road, Oxford 0X1 3NP, U.K. 
D. Sherrington * 
Center for Non-linear Studies 
Los Alamos National Laboratory 
Los Alamos, New Mexico 87545 
Abstract 
We describe the use of modern analytical techniques in solving the 
dynamics of symmetric and nonsymmetric recurrent neural net- 
works near saturation. These explicitly take into account the cor- 
relations between the post-synaptic potentials, and thereby allow 
for a reliable prediction of transients. 
I INTRODUCTION 
Recurrent neural networks have been rather popular in the physics community, 
because they lend themselves so naturally to analysis with tools from equilibrium 
statistical mechanics. This was the main theme of physicists between, say, 1985 
and 1990. Less familiar to the neural network community is a subsequent wave of 
theoretical physical studies, dealing with the dynamics of symmetric and nonsym- 
metric recurrent networks. The strategy here is to try to describe the processes 
at a reduced level of an appropriate small set of dynamic macroscopic observables. 
At first, progress was made in solving the dynamics of extremely diluted models 
(Derrida et al, 1987) and of fully connected models away from saturation (for a 
review see (Coolen and Sherrington, 1993)). This paper is concerned with more 
recent approaches, which take the form of dynamical replica theories, that allow 
for a reliable prediction of transients, even near saturation. Transients provide the 
link between initial states and final states (equilibrium calculations only provide 
* On leave from Department of Physics - Theoretical Physics, University of Oxford 
254 A.C.C. COOLEN, S. N. LAUGHTON, D. SHERRINGTON 
information on the possible final states). In view of the technical nature of the 
subject, we will describe only basic ideas and results for simple models (full details 
and applications to more complicated models can be found elsewhere). 
2 RECURRENT NETWORKS NEAR SATURATION 
Let us consider networks of N binary neurons ai E {-1, 1}, where neuron states 
are updated sequentially and stochastically, driven by the values of post-synaptic 
potentials hi. The probability to find the system at time t in state er - (al,..., 
is denoted by pt(er). For the rates wi(er) of the transitions ai --> --ai and for the 
potentials hi () we make the usual choice 
1 [l--ai tanh[hi()]] hi() =  JijGj 
= 
ji 
The parameter  controls the degree of stochticity: the  = 0 dynamics is com- 
pletely random, whereas for  =  we find the deterministic rule ai  sgn[hi(a)]. 
The evolution in time of Pt() is given by the mter equation 
N 
Pt() =  t(Fk)wk(Fk) - pt()w()l (1) 
k=l 
with Fk(a) = (a,...,k,...,aN). For symmetric models, where 
for all (ij), the dynamics (1) leads asymptotically to the Boltzmann equilibrium 
distribution peq(a)  exp [-E(a)], with the ener E(a) = - i<j ai&jaj. 
For sociative memory models with Hebbian-type synapses, required to store a set 
of p random binary patterns u = (,..., ), the relevant macroscopic observable 
is the overlap m between the current microscopic state  and the pattern to be 
1 1 
retrieved (say, pattern 1)' m = W i i ai. Each post-synaptic potential can now 
be written as the sum of a simple signal term and an interference-noise term, e.g. 
p=aN 
1 1 
Jij N    = J j (2) 
All complications arise from the noise terms. 
The 'Local Chaos Hypothesis' (LCH) consists of suming the noise terms to be 
independently distributed Gaussian variables. The macroscopic description then 
consists of the overlap m and the width A of the noise distribution (Amari and 
Maginu, 1987). This, however, works only for states near the nominated pattern, 
see also (Nishimori and Ozeki, 1993). In reality the noise components in the po- 
tenrials have hr more complicated statistics 1. Due to the build up of correlations 
between the system state d the non-nominated patterns, the noise components 
can be highly correlated and described by bi-modal distributions. Another approach 
involves a description in terms of correlation- and response functions (with two time- 
arguments). Here one builds a generating functional, which is a sum over all possible 
trajectories in state space, averaged over the distribution of the non-nominated pat- 
terns. One finds equations which are exact for N  , but, unfortunately, also 
rather complicated. For the typical neural network models solutions are known 
only in equilibrium (Rieger et al, 1988); information on transients has so hr only 
been obtained through cumbersome approximation schemes (Horner et al, 1989). 
We now turn to a theory that takes into account the non-trivial statistics of the 
post-synaptic potentials, yet involves observables with one time-arment only. 
Correlations e negligible only in extremely diluted (ymmetric) networks (Derrida 
et al, 1987), and in networks with independently drawn (ymmetric) random synapses 
Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks 255 
3 DYNAMICAL REPLICA THEORIES 
The evolution of macroscopic observables 12(rr) = (Ft(rr),..., fiK(rr)) can be de- 
scribed by the so-called Kramers-Moyal expansion for the corresponding probability 
distribution Pt (12) (derived directly from (1)). Under certain conditions on the sen- 
sitivity of 12 to single-neuron transitions ai -> -ai, one finds on finite time-scales 
and for N --> oc the macroscopic state 12 to evolve deterministically according to: 
d 12 = Err pt(rr)5 [12-12(rr)] Y'-i wi(rr) [12(Firr)- 12(rr)] (3) 
d t Y' rr p t ( rr ) 5 [12-12(rr)] 
This equation depends explicitly on time through Pt (rr)- However, there are two nat- 
ural ways for (3) to become autonomous: (i) by the term Y'-i wi(rr) [12(Firr)- 12(rr)] 
depending on rr only through 12(rr) (as for attractor networks away from satura- 
tion), or (ii) by (1) allowing for solutions of the form Pt(rr) = ft[12(rr)] (as for 
extremely diluted networks). In both cases Pt(rr) drops out of (3). Simulations fur- 
ther indicate that for N --> c the macroscopic evolution usually depends only on 
the statistical properties of the patterns {u}, not on their microscopic realisation 
('self-averaging'). This leads us to the following closure assumptions: 
1. Probability equipartitioning in the 12 subshells of the ensemble: pt(rr)  
5 [12t- 12(rr)]. If 12 indeed obeys closed equations, this assumption is safe. 
2. Self-averaging of the 12 flow with respect to the microscopic details of the 
non-nominated patterns: --12 a 
dt -- (12)patt- 
Our equations (3) are hereby transformed into the closed set: 
_12 = (Err 5 [12-12(rr)] Ei wi(rr)[12(Firr)- 12(rr)])pt t 
dt 
Err 5 [12-12(rr)] 
The final observation is that the tool for averaging fractions is replica theory: 
d12= lim lim y (y'wi(rr ) [12(Firr)-12(rr)] HS[12-12(rr)])ptt (4) 
dt n-O 
rr...rr' i 
The choice to be made for the observables 12(rr), crucial for the closure assumptions 
to make sense, is constrained by requiring the theory to be exact in specific limits: 
exactness for a -> 0: 12 = (m,...) 
exactness for t --> oc: 12 = (E,...) (for symmetric models only) 
4 SIMPLE VERSION OF THE THEORY 
For the Hopfield model (2) the simplest two-parameter theory which is exact for a - 
0 and for t -> oc is consequently obtained by choosing 12 - (m, E). Equivalently 
we can choose 12 = (m, r), where r(rr) measures the 'interference energy': 
i i i 1 
= -- i ai E 2 a 
i /.t>l i 
The result of working out (4) for 12 = (m, r) is: 
d / 
rn= dz Dm,r[z]tanh3(m+z)-m 
i d i /dz Dm,r[z]ztanh3(m+z)+ 1-r 
256 A.C.C. COOLEN, S. N. LAUGHTON, D. SHERRINGTON 
15 
7' 
/ 
/ 
/ 
o 
o 
m 
Figure 1: Simulations (N = 32000, dots) versus simple RS theory (solid lines), for 
a - 0.1 and/3 = c. Upper dashed line: upper boundary of the physical region. 
Lower dashed line: upper boundary of the RS region (the AT instability). 
in which D,,r[z] is the distribution of 'interference-noise' terms in the PSP's, for 
which the replica calculation gives the outcome (in so-called RS ansatz): 
D,,r[z] = e 2 (a+z)2 a  - 
2 1- Dy tanh ,Xy 4-(A4-Z) ap r 4-1 
2 1- Dy tanh y  :+(A-Z)r-  
1 1.2 
with Dy = [2r]-e-V ely, A = apr-A2/p and A = pv/[1-p(1-q)]- and with 
the remaining parameters {q,/z, p} to be solved from the coupled equations: 
m= Dy tanh[Ay+/z] q = Dy tanh2[Ay+/z] r = [l_p(l_q)]: 
Here we only give (partly new) results of the calculation; details can be found 
in (Coolen and Sherrington, 1994). The noise distribution is not Gaussian (in 
agreement with simulations, in contrast to LCH). Our simple two-parameter theory 
is found to be exact for t  0, t - oc and for a - 0. Solving numerically the 
dynamic equations leads to the results shown in figures 1 and 2. We find a nice 
agreement with numerical simulations in terms of the flow in the (ra,r) plane. 
However, for trajectories leading away from the recall state m - 1, the theory 
fails to reproduce an overall slowing down. These deviations can be quantified by 
comparing cumulants of the noise distributions (Ozeki and Nishimori, 1994), or by 
applying the theory to exactly solvable models (Coolen and Franz, 1994). Other 
recent applications include spin-glass models (Coolen and Sherrington, 1994) and 
more general classes of attractor neural network models (Laughton and Coolen, 
1995). The simple two-parameter theory always predicts adequately the location of 
the transients in the order parameter plane, but overestimates the relaxation speed. 
In fact, figure 2 shows a remarkable resemblance to the results obtained for this 
model in (Horner et al, 1989) with the functional integral formalism; the graphs of 
m(t) are almost identical, but here they are derived in a much simpler way. 
Modem Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks 257 
.8 
.4 
.2 
0 2 4 6 8 10 
t 
Figure 2: Simulations (N - 32000, dots) versus simple RS theory (RS stable: solid 
lines, RS unstable: dashed lines), now as functions of time, for a = 0.1 and  = oe. 
5 ADVANCED VERSION OF THE THEORY 
Improving upon the simple theory means expanding the set/2 beyond/2 = (m, E). 
Adding a finite number of observables will only have a minor impact; a qualitative 
step forward, on the other hand, results from introducing a dynamic order parameter 
function. Since the microscopic dynamics (1) is formulated entirely in terms of 
neuron states and post-synaptic potentials we choose for/2(er) the joint distribution: 
1 
D[4, h](o' ) -  Y([4-(i]([h--i(')] 
i 
This choice has the advantages that (a) both m and (for symmetric systems) E are 
integrals over D[(, hi, so the advanced theory automatically inherits the exactness 
at t: 0 and t = oc of the simple one, (b) it applies equally well to symmetric and 
nonsymmetric models and (c) as with the simple version, generalisation to models 
with continuous neural variables is straightforward. Here we show the result of 
applying the theory to a model of the type (1) with synaptic interactions: 
Jo J c C )yij ] 
Jij = Nijq- [cos()xijq-sin( 
xij = xji, yij =-Yi (independent random Gaussian variables) 
(describing a nominated pattern being stored on a 'messy' synaptic background). 
The parameter v controls the degree of synaptic symmetry (e.g. v - 0: lyremetric, 
v = r: anti-symmetric). Equation (4) applied to the observable D[4, h](er) gives: 
0 D 02 h .A[4, h; Dt] 
 t[4, h] = j2[1-(atanh(H))z>t] --Dt[(,h] + 
+ Dt[4, h][h-Jo(tanh(H))Dt] 
1 1 [1-4 tanh(/h)] Dt[4, h] 
+ [1 +4 tanh(h)] Dt[-,h] -  
258 A.C.C. COOLEN, S. N. LAUGHTON, D. SHERRINGTON 
-.2 
-.8 
] I ' 
I 
0 2 4 6 
Figure 3: Comparison of simulations (N = 8000, solid line), simple two-parameter 
theory (RS stable: dotted line, RS unstable: dashed line) and advanced theory 
(solid line), for the v = 0 (symmetric background) model, with J0 = 0,  = oc. 
Note that the two solid lines are almost on top of each other at the scale shown. 
0.5 
I I [ I , I I  I  
0 2 4 6 0 2 4 6 
I (asymmetric 
Figure 4: Advanced theory versus N = 5600 simulations in the v - {r 
background) model, with  - cx and J - 1. Solid: simulations; dotted: solving the 
RS diffusion equation. 
Modem Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks 259 
with (f(a,H))D: Y'. fdH D[a,H]f(a,H). All complications are concentrated in 
the kernel A[, h; D], which is to be solved from a nontrivial set of equations emerg- 
ing from the replica formalism. Some results of solving these equations numerically 
are shown in figures 3 and 4 (for details of the calculations and more elaborate com- 
parisons with simulations we refer to (Laughton, Coolen and Sherrington, 1995; 
Coolen, Laughton and Sherrington, 1995)). It is clear that the advanced theory 
quite convincingly describes the transients of the simulation experiments, including 
the hitherto unexplained slowing down, for symmetric and nonsymmetric models. 
6 DISCUSSION 
In this paper we have described novel techniques for studying the dynamics of re- 
current neural networks near saturation. The simplest two-parameter theory (exact 
for t - 0, for t - oc and for ( - 0), which employs as dynamic order parameters 
the overlap with a pattern to be recalled and the total 'energy' per neuron, already 
describes quite accurately the location of the transients in the order parameter 
plane. The price paid for simplicity is that it overestimates the relaxation speed. 
A more advanced version of the theory, which describes the evolution of the joint 
distribution for neuron states and post-synaptic potentials, is mathematically more 
involved, but predicts the dynamical data essentially perfectly, as far as present 
applications allow us conclude. Whether this latter version is either exact, or just 
a very good approximation, still remains to be seen. 
In this paper we have restricted ourselves to models with binary neural variables, 
for reasons of simplicity. The theories generalise in a natural way to models with 
analogue neurons (here, however, already the simple version will generally involve 
order parameter functions as opposed to a finite number of order parameters). 
Ongoing work along these lines includes, for instance, the analysis of analogue and 
spherical attractor networks and networks of coupled oscillators near saturation. 
References 
B. Derrida, E. Gardner and A. Zippelius (1987), Europhys. Lett. 4:167-173 
A.C.C. Coolen and D. Sherrington (1993), in J.G. Taylor (ed.), Mathematical Ap- 
proaches to Neural Networks, 293-305. Amsterdam: Elsevier. 
S. Amari and K. Maginu (1988), Neural Networks 1:63-73 
H. Nishimori and T. Ozeki (1993), J. Phys. A 26:859-871 
H. Rieger, M. Schreckenberg and J. Zittartz (1988), Z. Phys. B 72:523-533 
H. Horner, D. Bormann, M. Frick, H. Kinzelbach and A. Schmidt (1989), Z. 'Phys. 
B 76:381-398 
A.C.C. Coolen and D. Sherrington (1994), Phys. Rev. E 49(3): 1921-1934 
H. Nishimori and T. Ozeki (1994), J. Phys. A 27:7061-7068 
A.C.C. Coolen and S. Franz (1994), J. Phys. A 27:6947-9954 
A.C.C. Coolen and D. Sherrington (1994), J. Phys. A 27:7687-7707 
S.N. Laughton and A.C.C. Coolen (1995), Phys. Rev. E 51:2581-2599 
S.N. Laughton, A.C.C. Coolen and D. Sherrington (1995), J. Phys. A (in press) 
A.C.C. Coolen, S.N. Laughton and D. Sherrington (1995), Phys. Rev. B (in press) 
