Sequential Adaptation of Radial Basis Function 
Neural Networks and its Application to 
Time-series Prediction 
V. Kadirkamanathan 
Engineering Department 
Cambridge University 
Cambridge CB2 1PZ, UK 
M. Niranjan 
Abstract 
F. Fallside 
We develop a sequential adaptation algorithm for radial basis function 
(RBF) neural networks of Gaussian nodes, based on the method of succes- 
rove '-Projections. This method makes use of each observation efficiently 
in that the network mapping function so obtained is consistent with that 
information and is also optimal in the least La-norm sense. The RBF 
network with the '-Projections adaptation algorithm was used for pre- 
dicting a chaotic time-series. We compare its performance to an adapta- 
tion scheme based on the method of stochastic approximation, and show 
that the '-Projections algorithm converges to the underlying model much 
faster. 
I INTRODUCTION 
Sequential adaptation is important for signal processing applications such as time- 
series prediction and adaptive control in nonstationary environments. With increas- 
ing computational power, complex algorithms that can offer better performance 
can be used for these tasks. A sequential adaptation scheme, called the method 
of successive '-Projections [Kadirkamanathan &; Fallside, 1990], makes use of each 
observation efficiently in that, the function so obtained is consistent with that ob- 
servation and is the optimal posterior in the least La-norm sense. 
In this paper we present an adaptation algorithm based on this method for the 
radial basis function (RBF) network of Gaussian nodes [Broomhead &; Lowe, 1988]. 
It is a memoryless adaptation scheme since neither the information about the past 
samples nor the previous adaptation directions are retained. Also, the observations 
are presented only once. The RBF network employing this adaptation scheme 
721 
722 Kadirkamanathan, Niranjan, and Fallside 
was used for predicting a chaotic time-series. The performance of the algorithm is 
compared to a memoryless sequential adaptation scheme based on the method of 
stochastic approximation. 
2 METHOD OF SUCCESSIVE -PROJECTIONS 
The principle of'-Projection [Kadirkamanathan et hi., 1990] is a general method 
of choosing a posterior function estimate of an unknown function f*, when there 
exists a prior estimate and new information about f* in the form of constraints. 
The principle states that, of all the functions that satisfy the constraints, one should 
choose the posterior fn that has the least L-norm, Ill. - f,- II, where is the 
prior estimate of f*. viz., 
f. - argrrinllf- f--ill such that f. E Hx (1) 
where Hi is the set of functions that satisfy the new constraints, and 
Ilf- f.-ll  - f II/(z--)- f.-(z-)llld-zl- O(f, fn_) (2) 
where z__ is the input vector, is the infinitesimal volume in the input space 
domain C. 
In functional analysis theory, the metric D(., .) describes the L-normed linear space 
of square integrabit functions. Since an inner product can be defined in this space, 
it is also the Itilbert space of square integrahie functions [Linz, 1984]. Constraints 
of the form Yn = f(x-.n) are linear in this space, and the functions that satisfy the 
constraint lie in a hyperplane subspace Hi. The posterior fn, obtained from the 
principle can be seen to be a projection of fn-1 onto the subspace Hi containing 
f*, the underlying function that generates the observation set, and hence is optimal 
(i.e., best possible choice), see Figure 1. 
Hilbert space 
Figure 1: Principle of f-Projection 
Neural networks can be viewed as constructing a function in its input space. The 
structure of the neural network and the finite number of parameters restrict the class 
Sequential Adaptation of Radial Basis Function Neural Networks 723 
of functions that can be constructed to a subset of functions in the ttilbert space. 
Neural networks, therefore approximate the underlying function that describes the 
set of observations. Hence, the principle of '-Projection now yields a posterior 
f(O_.n) e Hi that is an approximation of fn (see Figure 1). 
The method of successive '-Projections is the application of the principle of '- 
Projection on a sequence of observations or information [Kadirkamanathan et al., 
1990]. For neural networks, the method gives an algorithm that has the following 
two steps. 
 Initialise parameters with random values or values based on a priori knowledge. 
 For each pattern (x_i, Yi) i - 1... n, determine the posterior parameter estimate 
 = argrn f 
_EC 
such that f(x, ) = Yi. 
where (x_i , Yi), for i = 1... n constitutes the observation set, 0_ is the neural network 
parameter set and f(x_,_0) is the function constructed by the neural network. 
3 '-PROJECTIONS FOR AN RBF NETWORK 
The class of radial basis function (RBF) neural networks were first introduced by 
Broomhead & Lowe [1988]. One such network is the RBF network of Gaussian 
nodes. The function constructed at the output node of the RBF network of Gaussian 
nodes, f(x_), is derived from a set of basis functions of the form, 
;hi(x__) = exp{-(x_- )Tc-I(;r -- __./)} i = 1...m 
(3) 
Each basis function bi(x_) is described at the output of each hidden node and is 
centered on  in the input space. bi(x) is a function of the radial weighted distance 
between the input vector x_ and the node centre . In general, Ci is diagonal with 
elements [eril, erI2,..., eriN]. f(x_) is a linear combination of the ra basis functions. 
i(x_) = 
i=1 
(4) 
and 0_ = [..., ai,_yi,_ai,...] is then the parameter vector for the RBF network. 
There are two reasons for developing the sequential adaptation algorithm for the 
RBF network of Gaussian nodes. Firstly, the method of successive '-Projections is 
based on minimizing the hypervolume change in the hypersurface when learning new 
patterns. The RBF network of Gaussian nodes construct a localized hypersurface 
and therefore the changes will also be local. This results in the adaptation of a few 
nodes and therefore the algorithm is quite stable. Secondly, the L-norm measure of 
the hypervolume change can be solved analytically for the RBF network of Gaussian 
nodes. 
724 Kadirkamanathan, Niranjan, and Fallside 
The method of successive '-Projections is developed under deterministic noise-free 
conditions. When the observations are noisy, the constraint that f(0, x_i ) -- Yi must 
be relaxed to, 
Ilf(0, _) - yill e _< e (5) 
Hence, the sequential adaptation scheme is modified to, 
0_ n = argninJ(_0) (6) 
j(0) = + yi[[ (7) 
ci is the penalty parameter that trades off between the importance of learning the 
new pattern and losing the information of the past patterns. This minimization 
can be performed by the gradient descent procedure. The minimization procedure 
is halted when the change AJ falls below a threshold. The complete adaptation 
algorithm is as follows: 
 Choose 0_0 randomly 
 For each pattern (i = 1... P) 
 = 
 Repeat (k ta iteration) 
_ =_ ? -  ) 
Until A J() < eta 
where X7J is the gradient vector of J(0_) with respect to 0__, A J(}) = J(_0? )) - 
J(_0? -1)) is the change in the cost function and eta is a threshold. Note that 
ci,_Yi,_ai for i = 1... ra are all adapted. The details of the algorithm can be found 
in the report by Kadirkamanathan et al., [Kadirkamanathan, Niranjan & Fallside, 
1991]. 
4 TIME SERIES PREDICTION 
An area of application for sequential adaptation of neural networks is the prediction 
of time-series in nonstationary environments, where the underlying model generat- 
ing the time-series is time-varying. The adaptation algorithm must also result in 
the convergence of the neural network to the underlying model under stationary 
conditions. The usual approach to predicting time-series is to train the neural net- 
work on a set of training data obtained from the series [Lapedes & Farbet, 1987; 
Farmer & Sidorowich, 1988; Niranjan, 1991]. Our sequential adaptation approach 
differs from this in that the adaptation takes place for each sample. 
In this work, we examine the performance of the '-Projections adaptation algorithm 
for the RBF network of Gaussian nodes in predicting a deterministic chaotic series. 
The chaotic series under investigation is the logistic map [Lapedes & Farbet, 1987], 
whose dynamics is governed by the equation, 
xn = 4Xn-l(1- Xn_l) (8) 
Sequential Adaptation of Radial Basis Function Neural Networks 725 
This is a first order nonlinear process where only the previous sample determines 
the value of the present sample. Since neural networks offer the capability of con- 
structing any arbitrary mapping to a sufficient accuracy, a network with input nodes 
equal to the process order will find the underlying model. Hence, we use the RBF 
network of Gaussian nodes with a single input node. We are thus able to compare 
the map the RBF network constructed with that of the actual map given by eqn 
(s). 
First, RBF network with 2 input nodes and 8 Gaussian nodes was used to predict 
the logistic map chaotic series of 100 samples. Each sample was presented only once 
for training. The training was temporarily halted after 0, 20, 40, 60, 80 and 100 
samples, and in each case the prediction error residual was found. This is given in 
Figure 2 where the increasing darkness of the curves stand for the increasing number 
of patterns used for training. It is evident from this figure that the prediction model 
improves very quickly from the initial state and then slowly keeps on improving as 
the number of training patterns used is increased. 
0'51 t 
0.0  
10 20 30 40 50 50 70 80 90 100 
time(samples) 
Figure 2: Evolution of prediction error residuals 
In order to compare the performance of the sequential adaptation algorithm, a mem- 
oryless adaptation scheme was also used to predict the chaotic series. The scheme is 
the LMS or stochastic approximation (sequential back propagation [White, 1987]), 
where for each sample, one iteration takes place. The iteration is given by, 
where, 
0__ i -" 0_.i_ 1 -- 7VJ 1 (9) 
_0=_0, 
J(O) = IIf(O,)- y11 (10) 
and J(0_) is the squared prediction error for the present sample. 
Next, the RBF network with a single input node and 8 Gaussian units was used 
to predict the chaotic series. The '-Projections and the stochastic approximation 
726 Kadirkamanatan, Niranjan, and Fallside 
adaptation algorithms were used for training this network on 60 samples. Results 
on the map constructed by a network trained by each of these schemes for 0, 20 and 
60 samples and the samples used for training are shown in Figure 3. Again, each 
sample was presented only once for training. 
0 
...... (0 Panens) 
- - - (20 Paem) 
(60 Pallruns) 
'do 0.8 , .o 
Past Sample 
(a) 
...... (0 Pallerns) 
 - -- -- (20 Palerns) 
CL {60 Pagerns) 
0.0 o.2 o.4 o.6 0. 1.0 
Past Sample 
(b) 
Figure 3' Map f(x) constructed by the R. BF network. (a) f-projections (b) stochastic 
approximation. 
The stochastic approximation algorithm fails to construct a close-fit mapping of 
the underlying function after training on 60 samples. The '-Projections algorithm 
however, provides a close-fit map after training on 20 samples. It also shows stability 
by maintaining the map up to training on 60 samples. The speed of convergence 
achieved, in terms of the number of samples used for training, is much higher for 
the 5v-Projections. 
Comparing the cost functions being minimized for the 5v-Projections and the 
stochastic approximation algorithms, given by eqns (7) and (10), it is clear that 
the difference is only an additional integral term in eqn (7). This term is not a 
function of the present observation, but is a function of the a priori parameter 
values. The addition of such a term is to incorporate a priori knowledge of the 
network to that of the present observation in determining the posterior parameter 
values. The faster convergence result for the 5v-Projections indicate the importance 
of the extended cost function. Even though the cost term for the 5v-Projections 
was developed for a recursive estimation algorithm, it can be applied to a block 
estimation method as well. The cost function given by eqn (7) can be seen to be an 
extension of the nonlinear least squared error to incorporate a priori knowledge. 
5 CONCLUSIONS 
The principle of 5v-Projection proposed by Kadirkamanathan et al., [1990], pro- 
vides an optimal posterior estimate of a function, from the prior estimate and new 
Sequential Adaptation of Radial Basis Function Neural Networks 727 
information. Based on it, they propose a sequential adaptation scheme called, the 
method of successive f-projections. We have developed a sequential adaptation 
algorithm for the RBF network of Gaussian nodes based on this method. 
Applying the RBF network with the 5v-Projections algorithm to the prediction 
of a chaotic series, we have found that the RBF network was able to map the 
underlying function. The prediction error residuals at the end of training with 
different number of samples, indicate that, after a substantial reduction in the error 
in the initial stages, with increasing number of samples presented for training the 
error was steadily decreasing. By comparing with the performance of the stochastic 
approximation algorithm, we show the superior convergence achieved by the 
Projections. 
Comparing the cost functions being minimized for the 5v-Projections and the 
stochastic approximation algorithms reveal that the 5v-Projections uses both the 
prediction error for the current sample and the a priori values of the parameters, 
whereas the stochastic approximation algorithms use only the prediction error. We 
also point out that such a cost term that includes a priori knowledge of the network 
can be used for training a trained network upon receipt of further information. 
References 
[1] Broomhead.D.S & Lowe. D, (1988), "Multi-variable Interpolation and Adaptive 
Networks", RSRE memo No.4148, Royal Signals and Radar Establishment, 
Malvern. 
[2] Farmer. J.D  Sidorowich.J.J, (1988), "Exploiting chaos to predict the future 
and reduce noise", Technical Report, Los Alamos National Laboratory. 
[3] Kadirkamanathan.V &: Fallside. F (1990), "sv-Projections: A nonlinear recur- 
sive estimation algorithm for neural networks", Technical Report CUED/F- 
INFENG/TR.53, Cambridge University Engineering Department. 
[4] Kadirkamanathan.V, Niranjan. M & Fallside. F (1991), "Adaptive RBF network 
for time-series prediction", Technical Report CUED/F-INFEN G/TR.56, Cam- 
bridge University Engineering Department. 
[5] Lapedes. A.S z Farben. R, (1987), "Non-linear signal processing using neural 
networks: Prediction and system modelling", Technical report, Los Alamos 
National Laboratory, Los Alamos, New Mexico 87545. 
[6] Linz.P, (1984), "Theoretical Numerical Analysis", John Wiley, New York. 
[7] Niranjan.M, (1991), "Implementing threshold autoregressive models for time 
series prediction on a multilayer perceptnon", Technical Report CUED/F- 
INFENG/TR.50, Cambridge University Engineering Department. 
[8] White. H, 1987, "Some asymptotic results for learning in single hidden layer 
feedforward network models", Technical Report, Department of Economics, 
Univeristy of California, San Diego. 
