A Theory for Neural Networks with Time Delays 
Bert de Vries 
Department of Electrical Engineering 
University of Florida, CSE 447 
Gainesville, FL 32611 
Jose C. Principe 
Department of Electrical Engineering 
University of Florida, CSE 444 
Gainesville, FL 32611 
Abstract 
We present a new neural network model for processing of temporal 
patterns. This model, the gamma neural model, is as general as a 
convolution delay model with arbitrary weight kernels w(t). We 
show that the gamma model can be formulated as a (partially 
prewired) additive model. A temporal hebbian learning rule is 
derived and we establish links to related existing models for 
temporal processing. 
INTRODUCTION 
In this paper, we are concerned with developing neural nets with short term memory for 
processing of temporal patterns. In the literature, basically two ways have been 
reported to incorporate short-term memory in the neural system equations. The first 
approach utilizes reverberating (self-recurrent) units of type  = - ao (x) + e, that 
hold a trace of the past neural net states x(t) or the input e(t). Elman (1988) and 
Jordan (1986) have successfully used this approach. The disadvantage of this method 
is the lack of weighting flexibility in the temporal domain, since the system equations 
are described by first order dynamics, implementing a recency gradient (exponential 
for linear units). 
The second approach involves explicit inclusion of delays in the neural system 
equations. A general formulation for this type requires a time-dependent weight 
matrix W(t). In such a system, multiplicative interactions are substituted by temporal 
convolution operations, leading to the following system equations for an additive 
convolution model- 
t 
dx = $W(t-s)o(x(s))ds+e. (1) 
dt 
0 
162 
A Theory for Neural Networks with Time Delays 163 
Due to the complexity of general convolution models, only strong simplifications of 
the weight kernel have been proposed. Lang et. al. (1990) use a delta function kernel, 
K 
W(t) =  WtJ(t-tt), which is the core for the Time-Delay-Neural-Network 
k=0 
(TDNN). Tank and Hopfield (1987) prewire W(t) as a weighted sum of dispersive 
delay kernels, W(t) = W t( ) e = Wthl(t, xt). The kernels 
/=0 /=0 
ht (t, xt) are the integrands of the gamma function. Tank and Hopfield described a 
one-layer system for classification of isolated words. We will refer to their model as 
a Concentration-In-Time-Network (CITN). The system parameters were non- 
adaptive, although a Hebbian rule equivalent in functional differential equation form 
was suggested. 
In this paper, we will develop a theory for neural convolution models that are 
expressed through a sum of gamma kernels. We will show that such a gamma neural 
network can be reformulated as a (Grossberg) additive model. As a consequence, the 
substantial learning and stability theory for additive models is directly applicable to 
gamma models. 
2 
THE GAMMA NEURAL MODEL - FORMAL DERIVATION 
Consider the N-dimensional convolution model - 
t 
d = -ax+ WOy+ dsW(t-s)y(s) +e, 
o 
where x(t), y(t)=o(x) and e(t) are N-dimensional signals; W 0 is NxN and W(t) is 
NxNx [0, oo]. The weight matrix W 0 communicates the direct neural interactions, 
whereas W(t) holds the weights for the delayed neural interactions. We will now 
assume that W(t) can be written as a linear combination of normalized gamma 
kernels, that is, - 
K 
W(t) =  Wtgt(t), 
k=l 
where - 
t  - 1 e-tt, (4) 
g/ (t) = (k- 1) ! 
where t is a decay parameter and k a (lag) order parameter. If W(t) decays 
exponentially to zero for t  oo, then it follows from the completeness of Laguerre 
polynomials that this approximation can be made arbitrarily close (Cohen et. al., 
164 de ies and Principe 
1979). In other words, for all physical plausible weight kernels there is a K such that 
W(t) can be expressed as (3), (4). The following properties hold for the gamma 
kernels gt (t) - 
- [1] The gamma kernels are related by a set of linear homogeneous ODEs - 
dg 1 
= 
dg k 
-- = - ggt + ggt_ , k=2,..,K 
dt 
dgt k- 1 
- [2] The peak value ( dt - O) occurs at tp - g 
- [3] The area of the gamma kernels is a normalized, that is, 
ldsgl (s) 
o 
=1. 
Substitution of (3) into (2) yields 
K 
-- =-ax+  Wy + e, 
dt 
k=0 
(o) 
where we defined Yo (t) = y (t) and the gamma state variables - 
t 
yn (t) = ldsgn (t- s) YO (s), k=l,..,K. 
o 
(z) 
The gamma state variables hold memory traces of the neural states Yo(t). The 
important question is how to compute yn (t). Differentiating (7) using Leibniz' rule 
yields - 
dt I (t - s) y (s) ds + gl (0) y (t) . (8) 
o 
We now utilize gamma kernel property [1] (eq. (5)) to obtain - 
dY k 
dt 
t 
I[- I. tg/ (t - s) 
o 
+ gg- 1 (t - s) ] y (s) ds + g (0) y (t) 
Note that since gn (0) = 0 for k > 2 and g (0) = g, (9) evaluates to - 
A Theory for Neural Networks with Time Delays 165 
dY k 
dt - - gyt +gyt-l' k=l,..,K. ( 
The gamma model is described by (6) and (10). This extended set of ordinary 
differential equations (ODEs) is equivalent to the convolution model, described by 
the set of functional differential equations (2), (3) and (4). 
It is a valid question to ask whether the system of ODEs that describes the gamma 
model can still be expressed as a neural network model. The answer is affirmative, 
since the gamma model can be formulated as a regular (Grossberg) additive model. 
To see this, define the N(K+l)-dimensional augmented state vector X = 
neural output signal Y = 
Y 
I 
YK 
of decay parameters M = 
W o W 1 
o go 
form - 
, an external input E = 
a o 
o\ 
e 
the 
, a diagonal matrix 
and the weight (super)matrix 
 Then the gamma model can be rewritten in the following 
dX 
d' = -MX+fY+ E, (11) 
the familiar Grossberg additive model. 
3 
HEBBIAN LEARNING IN THE GAMMA MODEL 
The additive model formulation of the gamma model allows a direct generalization 
166 de Vries and Principe 
of learning techniques to the gamma model. Note however that the augmented state 
vector X contains the gamma state variables Yl,'",YK, basically (dispersively) 
delayed neural states. As a result, although associative learning rules for 
conventional additive models only encode the simultaneous correlation of neural 
states, the gamma learning rules are able to encode temporal associations as well. 
Here we present Hebbian learning for the gamma model. 
The Hebbian postulate is often mathematically translated to a learning rule of the 
form dW _ lx (t)yT(t) where 1 is a learning rate constant, x the neural activation 
dt ' 
vector and yT the neuron output signal vector. This procedure is not likely to encode 
temporal order, since information about past states is not incorporated in the learning 
equations. 
Tank and Hopfield (1987) proposed a generalized Hebbian learning rule with delays 
that can be written as - 
dt - qx(t) dsg(s)y(t-s) , (1) 
where g (s) is a normalized delay kernel. Notice that (12) is a functional differential 
equation, for which explicit solutions and convergence criteria are not known (for 
t 
most implementations of g (s)). In the gamma model, the signals fdsgn (s) y (t - s) 
0 
are computed by the system and locally available as yn (t) at the synaptic junctions 
Wn. Thus, in the gamma model, (12) reduces to - 
dt = lx ( t) yT ( t) ' ( 
This learning rule encodes simultaneous correlations (for k=0) as well as temporal 
associations (for k > 1). Since the gamma Hebb rule is structurally similar to the 
conventional Hebb rule, it is also local both in time and space. 
4 
RELATION TO OTHER MODELS 
The gamma model is related to Tank and Hopfield's CITN model in that both models 
decompose W(t) into a linear combination of gamma kernels. The weights in the 
CITN system are preset and fixed. The gamma model, expressed as a regular additive 
system, allows conventional adaptation procedures to train the system parameters; g 
and K adapt the depth and shape of the memory, while W0,..,W K encode 
spatiotemporal correlations between neural states. 
Time-Delay-Neural-Nets (TDNN) are characterized by a tapped delay line memory 
structure. The relation is best illustrated by an example. Consider a linear one-layer 
A Theory for Neural Networks with Time Delays 167 
feedforward convolution model, described by - 
x(t) =e(t) 
t 
y (t) -' $W(t - s) x (s) ds 
0 
(14) 
where x(t), e(t) and y(t) are N-dimensional signals and W(t) a NxNx [0, oo] 
dimensional weight matrix. This system can be approximated in discrete time by - 
x(n) =e(n) 
y(n) =  W(n-m) x(m) 
m=0 
(15) 
which is the TDNN formulation. An alternative approximation of the convolution 
model by means of a (discrete-time) gamma model, is described by (figure 1) - 
xo(n) =e(n) 
xt(n) = (1-[t)xt(n- 1) +gxt_ 1 (n- 1) 
K 
y (n) =  Wtx t (n) 
k=0 
k=l,..,K 
(16) 
The recursive memory structure in the gamma model is stable for 0 < t < 2, but an 
interesting memory structure is obtained only for 0 < g < 1. For g = 0, this system 
collapses to a static additive net. In this case, no information from past signal values 
are stored in the net. For 0 < g < 1, the system works as a discrete-time CITN. The 
gamma memory structure consists of a cascade of first-order leaky integrators. Since 
the total memory structure is of order K, the shape of the memory is not restricted to 
a recency gradient. The effective memory depth approximates _K for small g. For 
 = 1, the gamma model becomes a TDNN. In this case, memory is implemented by 
a tapped delay line. The strength of the gamma model is that the parameters g and K 
can be adapted by conventional additive learning procedures. Thus, the optimal 
temporal structure of the neural system, whether of CITN or TDNN type, is part of 
the training phase in a gamma neural net. Finally, the application of the gamma 
memory structure is of course not limited to one-layer feedforward systems. The 
topologies suggested by Jordan (1986) and Elman (1988) can easily be extended to 
include gamma memory. 
168 de ies and Principe 
e(n) 
a one-layer 
ffw gamma net 
5 
CONCLUSIONS 
We have introduced the gamma neural model, a neural net model for temporal 
processing, that generalizes most existing approaches, such as the CITN and TDNN 
models. The model can be described as a conventional dynamic additive model, 
enabling direct application of existing learning procedures for additive models. In the 
gamma model, dynamic objects are encoded by the same learning equations as static 
objects. 
Acknowledgments 
This work has been partially supported by NSF grants ECS-8915218 and DDM- 
8914084. 
References 
Cohen et. al., 1979. Stable oscillations in single species growth models with 
hereditary effects. Mathematical Biosciences 44:255-268, 1979. 
DeVries and Principe, 1990. The gamma neural net - A new model for temporal 
processing. submitted to Neural Networks, Nov. 1990. 
Elman, 1988. Finding structure in time. CRL technical report 8801, 1988. 
Jordan, 1986. Attractor dynamics and parallelism in a connectionist sequential 
machine. Proc. Cognitive Science 1986. 
Lang et. al. 1990. A time-delay neural network architecture for isolated word 
recognition. Neural Networks, vol. 3 (1), 1990. 
Tank and Hopfield, 1987. Concentrating information in time: analog neural networks 
with applications to speech recognition problems. 1st int. conf. on neural 
networks, IEEE, 1987. 
