Optimal signalling in Attractor Neural 
Networks 
Isaac Meilijson Eytan Ruppin * 
School of Mathematical Sciences 
Raymond and Beverly Sackler Faculty of Exact Sciences 
Tel-Aviv University, 69978 Tel-Aviv, Israel. 
Abstract 
In [Meilijson and Ruppin, 1993] we presented a methodological 
framework describing the two-iteration performance of Hopfield- 
like attractor neural networks with history-dependent, Bayesian 
dynamics. We now extend this analysis in a number of directions: 
input patterns applied to small subsets of neurons, general con- 
nectivity architectures and more efficient use of history. We show 
that the optimal signal (activation) function has a slanted sigmoidal 
shape, and provide an intuitive account of activation functions with 
a non-monotone shape. This function endows the model with some 
properties characteristic of cortical neurons' firing. 
I Introduction 
It is well known that a given cortical neuron can respond with a different firing pat- 
tern for the same synaptic input, depending on its firing history and on the effects 
of modulator transmitters (see [Connors and Gutnick, 1990] for a review). The time 
span of different channel conductances is very broad, and the influence of some ionic 
currents varies with the history of the membrane potential [Lytton, 1991]. Moti- 
vated by the history-dependent nature of neuronal firing, we continue.our previous 
investigation [Meilijson and Ruppin, 1993] (henceforth, M & R) describing the per- 
formance of Hopfield-like attractor neural networks (ANN) [Hopfield, 1982] with 
history-dependent dynamics. 
*Currently in the Dept. of Computer science, University of Maryland 
485 
486 Meilijson and Ruppin 
Building upon the findings presented in M & R, we now study a more general 
framework: 
We differentiate between 'input' neurons receiving the initial input signal 
with high fidelity and 'background' neurons that receive it with low fidelity. 
Dynamics now depend on the neuron's history of firing, in addition to its 
history of input fields. 
The dependence of ANN performance on the network architecture can be 
explicitly expressed. In particular, this enables the investigation of cortical- 
like architectures, where neurons are randomly connected to other neurons, 
with higher probability of connections formed between spatially proximal 
neurons [Braitenberg and Schuz, 1991]. 
Our goal is twofold: first, to search for the computationally most efficient history- 
dependent neuronal signal (firing) function, and study its performance with relation 
to memoryless dynamics. As we shall show, optimal history-dependent dynamics 
are indeed much more efficient than memoryless ones. Second, to examine the 
optimal signal function from a biological perspective. As will shall see, it shares 
some basic properties with the firing of cortical neurons. 
2 The model 
Our framework is an ANN storing m + 1 memory patterns ,2,...,m+x, each 
an N-dimensional vector. The network is composed of N neurons, each of which 
is randomly connected to K other neurons. The (m + 1)N memory entries are 
independent with equally likely :t:l values. The initial pattern X, signalled by 
L(_< N) initially active neurons, is a vector of :t:l's, randomly generated from one 
of the memory patterns (say  = m+l) such that P(Xi = i) = 1+ for each of 
2 
1+, 
the L initially active neurons and P(Xi = i) = 2 for each initially quiescent 
(non-active) neuron. Although e,/  [0, 1) are arbitrary, it is useful to think of e 
as being 0.5 (corresponding to an initial similarity of 75%) and of/ as being zero 
- a quiescent neuron has no prior preference for any given sign. Let ax = rn/nx 
denote the initial memory load, where nx = LK/N is the average number of signals 
received by each neuron. 
The notion of 'iteration' is viewed as an abstraction of the overall dynamics for 
some length of time, during which some continuous input/output signal function 
(such as the conventional sigmoidal function) governs the firing rate of the neuron. 
We follow a Bayesian approach under which the neuron's signalling and activation 
decisions are based on the a-posteriori probabilities assigned to its two possible true 
memory states, :t:l. 
Initially, neuron i is assigned a prior probability hi  - P( = llXi Ii ()) = 14-e 
-- ' 2 
or 1___ which is conveniently expressed as hi  - x where, letting g(t) - 
-- lq.e_2gi(O ) ' 
- log l+t 
2 1-t' 
#i(o) _ ( #(e)Xi if i is active 
#()Xi if i is silent 
Optimal Signalling in Attractor Neural Networks 487 
The input field observed by neuron i as a result of the initial activity is 
N 
nl j=l 
where Ij () = 0, 1 indicates whether neuron j has fired in the first iteration, Iij = 0, 1 
indicates whether a connection exists from neuron j to neuron i and Wij denotes 
its magnitude, given by the Hopfield prescription 
rn+l 
(2) 
As a result of observing the input field fi(1), which is approximately normally 
distributed (given i,Xi and Ii()), neuron i changes its opinion about {i - 1} 
from hi  to 
( ) 1 
Ai (1) = P i = llXi,//(1) fi(1) = (3) 
' 1 q- e -201() ' 
expressed in terms of the (additive) generalized field gi(D = gi  + fi (). 
We now get to the second iteration, in which, as in the first iteration, some of the 
neurons become active and signal to the network. We model the signal function 
neuron i emits as h(gi(), Xi, Ii()). The field observed by neuron i (with n2 updating 
neurons per neuron) is 
N 
i,(2) = 1 
n2 j=l 
(4) 
on the basis of which neuron i computes its posterior belief A,  = P(i = 
11Xi , Ii (), fi (), fi ) and expresses its final choice of sign as Xi (2) = sign(Ai  - 
0.5). The two-iteration performance of the network is measured by the final simi- 
larity 
S! = 1 + ! = P(Xi(2 ) = i) = 1 +  E7=1Xj(2)j (5) 
2 2 
3 Analytical results 
The goals of our analysis have been: A. To present an expression for the performance 
under arbitrary architecture and activity parameters, for general signal functions 
h0 and h. B. Use this expression to find the best choice of signal functions which 
maximize performance. We show the following: 
The neuron's final decision is given by 
Xi  = Sign [(Ao + Boli(D)Xi + Alfi (D + A2fi (2)] (6) 
for some constants A0, B0, A and A2. 
488 Meilijson and Ruppin 
The performance achieved is 
where, for some Aa > 0 
n* nl q- mA3 ' 
2 + 2  x x 
d  is the standard normal cumulative distribution function. 
(7) 
(8) 
(9) 
The optimal analog signal function, illustrated in figure 1, is 
ho = h(gi (x), q,1, 0)= 
h = h(gi(X),+l,1)= R(gi(1),e)- 1 
where, for some A4 > 0 and A5 > 0, R(s,t) = A4 tanh(s) - As(s - g(t)). 
(10) 
(a) (b) 
4.0 
-- Silent neurons 
.... Acve neurons 
...... 
2.0 
V 
0.0 
-V 
..0 
Signal 
,5 i Input field 
-4.0 
-5.0 -3.0 -1.0 1.0 3.0 5.0 
Input field 
Figure 1: (a) A typical plot of the slanted sigmoid, Network parameters are N - 
5000, K - 3000, n = 200 and m = 50. (b) A sketch of its discretized version. 
The nonmonotone form of these functions, illustrated in figure 1, is clear. Neurons 
that have already signalled +1 in the first iteration have a lesser tendency to send 
positive signals than quiescent neurons. The signalling of quiescent neurons which 
receive no prior information (5 = 0) has a symmetric form. The optimal signal is 
shown to be essentially equal to the sigmoid modified by a correction term depending 
only on the current input field. In the limit of low memory load (e/vfG' --. oo), the 
best signal is simply a sigmoidal function of the generalized input field. 
Optimal Signalling in Attractor Neural Networks 489 
To obtain a discretized version of the slanted sigmoid, we let the signal be sign(h(y)) 
as long as Ih(y)l is big enough - where h is the slanted sigmoid. The resulting signal, 
as a function of the generalized field, is (see figure la and lb) 
+1 y < fix (j) or/?4 (i) < y < fi5 (j) 
hj(y) = -1 y > fi6 (i) or fi2 (j) < y < fia (j) (11) 
otherwise 
where -oo < fil  < 2  <_ fi3  < 4  5 5  < 6  <  and - < 1 (1) < 
2(1) g a()) < 4(1)  (1) < () <  define, respectively, the firing pattern of 
the neurons that were silent or active in the first iteration. To find the best such 
discretized version of the optimal signal, we search numerically for the activity level 
v which mimizes performance. Every activity level v, used as a threshold on Ih(u)l, 
defines the (at most) twelve parameters O) (which are identified numerically via 
the Newton-aphson method)  illustrated in figure lb. 
4 Numerical Results 
(a) (b) 
Ilty-baed tgnalling 
] ..... Discretlzed ;tnalllng 
..... Analog optimal tgnalllng 
-- Discrete signalling 
......... Analog signalling 
0.,5 , A , I ,  .  , 0.940 , . I , t , I . F , 
0.0 1000.0 2000.0 30(10.0 4000.0 5000.0 0.0 1000.0 2000.0 3000.0 4000.0 5000.0 
K K 
Figure 2: Two-iteration performance as a function of connectivity K. (a) Network 
parameters are N = 5000, n = 200, and m = 50. All neurons receive their input 
state with similar initial overlap e =/ = 0.5. (b) Network parameters are N = 5000, 
m=50, nx=200, e=0.5andS=0. 
Using the formulation presented in the previous section, we investigated numeri- 
cally the two-iteration performance achieved in several network architectures with 
optimal analog signalling and its discretization. Already in small scale networks of a 
few hundred neurons our theoretical calculations correspond fairly accurately with 
490 Meilijson and Ruppin 
0.970 
simulation results. First we repeat the example of a corticM-like network investi- 
gated in M & R, but now with optimal analog and discretized signalling. The nearly 
identical marked superiority of optimal analog and discretized dynamics over the 
previous, posterior-probability-based signalling is evident, as shown in figure 2 (a). 
While low activity is enforced in the first iteration, the number of neurons allowed 
to become active in the second iteration is not restricted, and best performance is 
typically achieved when about 70% of the neurons in the network are active (both 
with optimal signalling and with the previous, heuristic signalling). 
Figure 2 (b) displays the performance achieved in the same network, when the input 
signal is applied only to the small fraction (4%) of neurons which are active in the 
first iteration (expressing possible limited resources of input information). We see 
that (for K > 1000) near perfect final similarity is achieved even when the 96% 
initially quiescent neurons get no initial clue as to their true memory state, if no 
restrictions are placed on the second iteration activity level. 
 1 and contrasted the case (nx = 
Next we have fixed the value of o: = vr  - , 
200, e = 0.5) of figure 2 (b) with (nl = 50, e = 1). The overall initial similarity ander 
(hi = 50, e = 1) is only half its value under (hi = 200, e = 0.5). In spite of this, we 
have found that it achieves a slightly higher final similarity. This supports the idea 
that the input pattern should not be applied as the conventional uniformly distorted 
version of the correct memory, but rather as a less distorted pattern applied only 
to a small subset of the neurons. 
(a) (b) 
-- I signaling 
.... Analog signalllno 
0.94 .0 2000.0 4000.0 2)00.0 8000.0 10000.0 O. 0 1000.0 2000.0 
N K 
I ! [ / .... Mul-Iayered nee,,vok 
,/,) / ........ Lower bound peremnce 
Figure 3: (a) Two-iteration performance in a full-activity network as a function of 
network size N. Network parameters are nl = K -- 200, m = 40 and e - 0.5. 
(b) Two-iteration performance achieved with various network architectures, as a 
function of the network connectivity K. Network parameters are N = 5000, nx = 
200, m = 50, e = 0.5 and/ = 0. 
Figure 3 (a) illustrates the performance when connectivity and the number of sig- 
Optimal Signalling in Attractor Neural Networks 491 
hals received by each neuron are held fixed, but the network size is increased. 
A region of decreased performance is evident at mid-connectivity (K -, N/2) 
values, due to the increased residual variance. Hence, for neurons capable of 
forming K connections on the average, the network should either be fully con- 
nected or have a size N much larger than K. Since (unavoidable eventually) 
synaptic deletion would sharply worsen the performance of fully connected net- 
works, cortical ANNs should indeed be sparsely connected. The final similar- 
ity achieved in the fully connected network (with N = K = 200) should be 
noted. In this case, the memory load (0.2) is significantly above the criti- 
cal capacity of the Hopfield network, but optimal history-dependent dynamics 
still manage to achieve a rather high two-iterations similarity (0.975) from ini- 
tial similarity 0.75. This is in agreement with the findings of [Morita, 1993, 
Yoshizawa et al., 1993], who show that nonmonotone dynamics increase capacity. 
Figure 3 (b) illustrates the performance achieved with various network architec- 
tures, all sharing the same network parameters N, K, m and input similarity pa- 
rameters n, ,/, but differing in the spatial organization of the neurons' synapses. 
As evident, even in low-activity sparse-connectivity conditions, the decrease in per- 
formance with Gaussian connectivity (in relation, say, to the upper bound) does not 
seem considerable. Hence, history-dependent ANNs can work well in a cortical-like 
architecture. 
5 Summary 
The main results of this work are as follows: 
The Bayesian framework gives rise to the slanted-sigmoid as the optimal 
signal function, displaying the non monotone shape proposed by [Morita, 
1993]. It also offers an intuitive explanation of its form. 
Martingale arguments show that similarity Under Bayesian dynamics per- 
sistently increases. This makes our two-iteration results a lower bound for 
the final similarity achievable in ANNs. 
The possibly asymmetric form of the function, where neurons that have 
been silent in the previous iteration have an increased tendency to fire in 
the next iteration versus previously active neurons, is reminiscent of the 
hi-threshold phenomenon observed in biological neurons [Tam, 1992]. 
 In the limit of low memory load the best signal is simply a sigmoidal func- 
tion of the generalized input field. 
In an efficient associative network, input patterns should be applied with 
high fidelity on a small subset of neurons, rather than spreading a given 
level of initial similarity as a low fidelity stimulus applied to a large subset 
of neurons. 
If neurons have some restriction on the number of connections they may 
form, such that each neuron forms some K connections on the average, then 
efficient ANNs, converging to high final similarity within few iterations, 
should be sparsely connected. 
492 Meilijson and Ruppin 
With a properly tuned signal function, cortical-like Gaussian-connectivity 
ANNs perform nearly as well as randomly-connected ones. 
Investigating the 0, 1 (silent, firing) formulation, there seems to be an in- 
terval such that only neurons whose field values are greater than some low 
threshold and smaller than some high threshold should fire. This seemingly 
bizarre behavior may correspond well to the behavior of biological neurons; 
neurons with very high field values have most probably fired constantly in 
the previous 'iteration', and due to the effect of neural adaptation are now 
silenced. 
References 
[Braitenberg and Schuz, 1991] V. Braitenberg and A. Schuz. Anatomy of the Cor- 
tex: Statistics and Geometry. Springer-Verlag, 1991. 
[Connors and Gutnick, 1990] B.W. Connors and M.J. Gutnick. Intrinsic firing pat- 
terns of diverse neocortical neurons. TINS, 13(3):99-104, 1990. 
[Hopfield, 1982] J.J. Hopfield. Neural networks and physical systems with emergent 
collective abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982. 
[Lytton, 1991] W. Lytton. Simulations of cortical pyramidal neurons synchronized 
by inhibitory interneurons. J. Neurophysiol., 66(3):1059-1079, 1991. 
[Meilijson and Ruppin, 1993] I. Meilijson and E. Ruppin. History-dependent at- 
tractor neural networks. Network, 4:1-28, 1993. 
[Morita, 1993] M. Morita. Associative memory with nonmonotone dynamics. Neu- 
ral Networks, 6:115-126, 1993. 
[Tam, 1992] David C. Tam. Signal processing in multi-threshold neurons. In 
T. McKenna, J. Davis, and S.F. Zornetzer, editors, Single neuron computation, 
pages 481-501. Academic Press, 1992. 
[Yoshizawa et al., 1993] S. Yoshizawa, M. Morita, and S.-I. Amari. Capacity of as- 
sociative memory using a nonmonotonic neuron model. Neural Networks, 6:167- 
176, 1993. 
