Oriented Non-Radial Basis Functions for Image 
Coding and Analysis 
Avijit Saha 1 Jim Christian D.S. Tang 
Microelectronics and Computer Technology Corporation 
3500 West Balcones Center Drive 
Austin, TX 78759 
Chuan-Lin Wu 
Department of Electrical and Computer Engineering 
University of Texas at Austin, 
Austin, TX 78712 
ABSTRACT 
We introduce oriented non-radial basis function networks (ONRBF) 
as a generalization of Radial Basis Function networks (RBF)- wherein 
the Euclidean distance metric in the exponent of the Gaussian is re- 
placed by a more general polynomial. This permits the definition of 
more general regions and in particular- hyper-ellipses with orienta- 
tions. In the case of hyper-surface estimation this scheme requires a 
smaller number of hidden units and alleviates the "curse of dimen- 
sionality" associated kernel type approximators.In the case of an im- 
age, the hidden units correspond to features in the image and the 
parameters associated with each unit correspond to the rotation, scal- 
ing and translation properties of that particular "feature". In the con- 
text of the ONBF scheme, this means that an image can be 
represented by a small number of features. Since, transformation of an 
image by rotation, scaling and translation correspond to identical 
transformations of the individual features, the ONBF scheme can be 
used to considerable advantage for the purposes of image recognition 
and analysis. 
1 INTRODUCTION 
Most, "neural network" or "connectionist" models have evolved primarily as adaptive 
function approximators. Given a set of input-output pairs <x,y> (x from an underlying 
function f (i.e. y = fix)), a feed forward, time-independent neural network estimates a 
1. Alternate address: Dept. of ECE, Univ. of Texas at Austin, Austin, TX 78712 
728 
Oriented Non-Radial Basis Functions for Image Coding and Analysis 729 
function y' = g(p,x) such that E= p(y - y') is arbitrarily small over all <x,y> pairs. Here, p 
is the set of parameters associated with the network model and p is a metric that measures 
the quality of approximation, usually the Euclidean norm. In this paper, we shall restrict 
our discussion to approximation of real valued functions of the form f:R n -> R. For a net- 
work of fixed structure (determined by g), all or part of the constituent parameter set p, 
that minimize E are determined adaptively by modifying the set of parameters. The prob- 
lem of approximation or hypersurface reconstruction is then one of determining what class 
of g to use, and then the choice of a suitable algorithm for determining the parameters p- 
given a set of samples {<x,y>).By far the most popular method for determining network 
parameters has been the gradient descent method. If the error surface is quadratic or con- 
vex, gradient descent methods will yield an optimal value for the network parameter- 
s.However, the burning problem in still remains the determination of network parameters 
when the error function is infested with local minimas. One way of obviating the problem 
of local minimas is to match a network architecture with an objective function such that 
the error surface is free of local minimas. However, this might limit the power of the net- 
work architecture such as in the case of linear perceptrons[1]. Another approach is to ob- 
tain algebraic transformations of the objective functions such that algorithms can be 
readily designed around the transformed functions to avoid local minimas. Random opti- 
mization method of Matyas and its variations have been studied recently [2], as alternate 
avenues for detennining the parameter set p. Perhaps the most probable reason for the BP 
algorithms popularity is that the error surface is relatively smooth [1],[3] 
The problem of local minimas is circumvented somewhat differently in local or kernel 
type estimators. The input space in such a method is partitioned into a number of local re- 
gions and if the number of regions defined is sufficiently large, then the output response in 
each local region is sufficiently uniform or smooth and the error will remain bounded i.e. a 
local minima will be close to the global minima. The problem with kernel type of esthna- 
tors is that the number of "bins", "kemels" or "regions" that need to be defined increases 
exponentially with the dimension of the input space. An improvement such as the one con- 
sidered by [4] is to define the kemels only in regions of the input space where there is data. 
However, our experiments indicate that even this may not be sufficient to lift the curse of 
dimensionality. If instead of limiting the shape of the kemels to be boxes or even hyper- 
spheres we select the kernels to be shapes defined by a second order polynomials then a 
larger class of shapes or regions can be defined resulting in significant reductions in the 
number of kernels required. This was the principal motivation behind our generalization 
of ordinary RBF networks. Also, we have determined that radial basis function networks 
will, given sufficiently large widths, linearize the output response between two hidden 
units. This gives rise to hyperacuity or coarse coding, whereby a high resolution of stimuli 
can be observed at the signal level despite poor resolution in the sensor array. In the con- 
text of function approximation this means that if the hyper-surface being approximated 
varies linearly in a certain region, the output behavior can be captured by suitably placing 
a single widely tuned receptive field in that region. Therefore, it is advantageous to choose 
the regions with proper knowledge of the output response in that region as opposed to 
choosing the bins based on the inputs alone. These were some of the principal motivations 
for our generalization. 
In addition to the architectural and learning issues, we have been concerned with approx- 
imation schemes in which the optimal parameter values have readily interpretable forms 
that may allow other useful processing elsewhere. In the following section we present 
ONBF as a generalization to RBF [4] and GRBF [5]. We show how rotation, scaling and 
730 Saha, Christian, 2mg, and Wu 
translation (center) information of these regions can be readily extracted from the parame- 
ter values associated with each hidden unit. In subsequent sections we present experimen- 
tal results illustrating the performance of ONRBF as a function approximator and 
feasibility of ONRBF for the purposes of image coding and analysis. 
2 ORIENTED NON-RADIAL BASIS FUNCTION NETWORKS 
Radial Basis Function networks can be described by the formula: 
k 
f(x) =  wcRc(x) 
Ct=0 
where fix) is the output of the network, k is the number of hidden units, w is the weight 
associated with hidden unit at, and P(x) is the response of unit at, The response P(x) of 
unit at is given by 
R(x=c 
Poggio and Girosi [5] have considered the generalization where a different width parame- 
ter o, is associated with each input dimension i. The response function P is then defined 
as 
i=l ai 
R(x(x) =  
Now each o, s can influence the response of the atth unit and the effect is that widths associ- 
ated with irrelevant or correlated inputs will tend to be increased. It has been shown that if 
one of the input components has a random input and a constant width (constant for that 
particular dimension) is used for each receptive field, then the width for that particular re- 
ceptive field is maximum [6]. 
The generalization we consider in this paper is a further shaping of the response P by 
composing it with a rotation function S designed to rotate the unit about its center in d- 
space, where d is the input dimension. This composition can be represented compactly by 
a response function of the form: 
-II M[x, ..... x d,1 2 
Rct=e 
where M is a d by d+l matrix. The matrix transforms the input vectors and these transfor- 
mations correspond to translation (center information), scaling and rotation of the input 
vectors. The response function presented above is the restricted form of a more general re- 
sponse function of the form: 
Rot = e-[P(x)] 
where the exponent is a general polynomial in the input variables. In the following sec- 
tions we present the learning rules and we show how center, rotation and scaling informa- 
tion can be extracted from the matrix elements. We do this for the case when the input 
dimension is 2 (as is the case for 2-dimensional images) but the results are generalized 
easily. 
Oriented Non-Radial Basis Functions for Image Coding and Analysis 731 
2.1 LEARNING RULES 
Consider the n-dimensional case where <x,...x.> represents the input vector and 
represents the matrix element of the j"' row and k"' column of the matrix M associated 
with the c"' unit. Then the response of the c"' unit is given by: 
Rct(x.y) = 'l,i-, maxidd 
The total stun square error over b patterns is given by: 
TE  [f(xl)- F (xl)] 2 
Then the derivative of the error due to the I h pattern with respect to the matrix element 
mmj of the W h unit is given by: 
i) (Ei) 2[f( F(x) ]_ i) f = 2[L] i) f 
'"Om.. = x)- Om.. 
lj lj Ij 
and: 
 .f = -2met ' 1)TxjRa(xl) 
3m..  (xl' 
ij 
where, 
nk6' is the i th row of the matrix corresponding to the c th unit 
x' is the input vector 
xj' is the jth variable in the input space. 
Then the update role for the matrix elements with learning rate 1 is given by: 
t+l t 
mij = mij-q'..(EiQ 
ij 
and the leaming role for the weights wc is given by: 
t+l t 
wc = wc-LRc(xi) 
2.2 EXTRACTING ROTATION, SCALE AND CENTER VALUES 
In this section we present the equations for extracting the rotation, translation and scal- 
ing values (widths) of the cz th receptive field from its associated matrix elements. We 
present these for the special case when n the input dimension is equal to 2, since that is 
the case for images. The input vector x is represented by <x,y> and the rules for con- 
verting the matrix elements into center, scaling and rotation information is as follows: 
 center (x0,y0) 
(m 12 m 11 +m21 m22) (m 11 m 13 + m23m21)- (m211 +ml ) (m 13 m 12 +m23 m22) 
x0-- A 
732 Saha, Christian, 11mg, and Wu 
(m12mll +m21 m22) (m12m13+m23m22)- (m122+m222) (m13mll +m23m21) 
Yo = & 
where, 
(m 121 + m221 ) (m122 + m222 )-(m ,,m 12 + m22m21) 
rotation (0) 
]tail-1 (. m2__ lm22+m12mll ) 
0 =  2 2 2 _m222 ) 
[mll + (m21-m12 
scaling or receptive field widths or sigmas 
1 m +m22m 
dl . (m211 +m221 2 2 12mll 21 1 
= +m 12 +m22 ) + sin20 -= 
1 
d2 . (m121 +m221 2 2 
= +m12+m22 )- 
m 12 m 11 +m22m21 
1 
sin 2 0 
2.3 HIERARCHICAL CLUSTERING 
We use a multi-resolution, hierarchical approach to determine where to place hidden units 
to maximize the accuracy of approximation and to locate image features. For illustration, 
we consider our method in the context of image processing, though the idea will work for 
any type of function approximation problem. The process begins with a small number of 
widely tuned receptive field units. The widths are made high my multiplying the value ob- 
tained from the nearest neighbor-heuristic by a large overlap parameter. The large widths 
force the units to excessively smooth the image being approximated. Then, errors will be 
observed in regions where detailed features occur. Those pixels for which high error (say, 
greater than one standard deviation from the mean) occurred are collected and new units 
are added in locations chosen randomly from this set. The entire process can be repeated 
until a desired level of accuracy is reached. Notice that, when the network is finally 
trained, the top levels in the hierarchy provide global information about the image under 
consideration. This scheme is slightly different than the one presented in [7], where units 
in each resolution learn the error observed in the previous resolution-- in our method, after 
the addition of the new units all the units learn the original function as opposed to the 
some error function. 
3 RESULTS 
3.1 ONRBF AS AN APPROXIMATOR 
Oriented non-radial basis function networks allow the definition of larger regions or recep- 
tive fields- this is due to the fact that rotation, along with the elliptical hyper-spheres as op- 
posed to mere spheres, permits the grouping of more nearby points into a single region. 
Oriented Non-Radial Basis Functions for Image Coding and Analysis 733 
Therefore, the approximation accuracy of such a network can be quite good with even a 
small number of units. For instance, Table I compares ordinary radial basis function net- 
works with oriented non-radial basis function networks in terms of the number of units re- 
quired to achieve various levels of accuracy. The function approximated is the Mackey- 
Glass differential delay equation: 
dx t xt. T 
dt - bxt+a l+----'- 
t-T 
TABLE 1. Normalized approximation error for radial and non-radial basis functions 
RBF 'lxr,in ONBF 'lxr,4n RBF Tc,t I ONBF Test 1 RBF Te,t 
10 unita .436 .30 .367 .161 .fi .308 
0 unltw .377 .110 .167 .071 .407 .166 
40 unitw .236 .057 .184 .065 .310 .105 
80 Units .10 .123 
160 units .15 .126 
320 unit .107 .131 .207 
S00 unt .061 .11 .08 
The series used was generated with t = 17, a = 0.1 and b = 0.2. A series of 500 consecutive 
points was used for training, and the next two sets of 500 points were used for cross-vali- 
dation. The training vector at time t is the tuple (xt,xt.6,Xt_12,Xt_lS,Xt+85), where the first 
four components form the input vector and the last forms the target, and xt is the value of 
the series at time t. Table I lists the normalized error for each experiment- that is, the root 
mean square prediction error divided by the standard deviation of the data series. Oriented 
non-radial basis function networks yield higher accuracy than do radial basis function net- 
works with the same number of units. In addition, ONRBF nets were found to generalize 
better. 
3.2 IMAGE CODING AND ANALYSIS 
For images each hidden unit corresponds some feature in the input space. This implies that 
there is some invariant property associated with the region spanned by the receptive field. 
For bitmaps this property could be the probability density function (ignoring higher order 
statistics) and a feature is a region over which the probability density function remains the 
same. For grey level images, instead of the linear weight this property could be described 
by a low order polynomial. We have found that when the parameters of an image function 
are determined adaptively using the learning rules in section 2.1-- the receptive fields or- 
ganize themselves so as to capture features in the input space. This is illustrated in Figure 
1, where the input image is a bitmap for a set of Chinese characters. The property of a fea- 
ture in this case is the value of the pixel (0 or 1) in the coordinate location specified by the 
input- and therefore a linear term (for the weight) as used in section 2.1 is sufficient. Fig- 
ure 1.a is the input bitmap image and figure 1.b shows the plot of the regions of influence 
of the individual receptive fields. Notice that the individual receptive fields tend to become 
"responsible" for entire strokes of the character. 
We would like to point out that if the initial positions of the hidden units are chosen ran- 
domly, then with each new start of the approximation process a single feature may be rep- 
resented by a collection of hidden units in many different manners- and the task of 
734 Saha, Christian, Tang, and Wu 
Figurel.a: Bitmap Of 
Chinese Character Which 
Is The Input Image 
Figure 1.b: Plot Of Regions Of Influence 
Of Receptive Fields After Training 
recognition becomes difficult. Therefore, for consistent approximation, a node deletion or 
region growing algorithm is needed. Such an algorithm has been developed and will be 
presented elsewhere. If with every approximation of the same image, we get the same fea- 
tures (parameters for the hidden units), then images under rotation and scaling can also be 
recognized easily-- since there will be a constant scaling and rotational change in all the 
hidden units. 
4 CONCLUSIONS 
We have presented a generalization of RBF networks that allows interpretation of the pa- 
rameter values associated with the hidden units and perfonns better as a function approxi- 
mator. The number of parameters associated with each hidden units grow quickly with the 
input dimension (O(d2)). However, the number of hidden units required is significantly 
lower if the function is relatively smooth. Alternatively, one can compose the Gaussian re- 
sponse of the original RBF by using a suitable clipping function in which the number of 
associated parameters grow linearly with the input dimension d. For images, the input di- 
mension is 2 and the number of parameters associated with each hidden unit is 6 as op- 
posed to 5- when the multidimensional Gaussian is represented by the superposition of 1- 
dimensional Gaussians, and 4 with RBF networks. 
References 
[1 ] Widrow, Bernard and Michael A. Lehr,"30 Years of Adaptive Neural Networks: 
Perceptton, Madaline, and Backpropagation", Proc. of the IEEE, vol.78, No. 9, Sept 
1990, pp 1415-1442. 
[2] Baba, Norio,"A New Approach for Finding the Global Minimum of Error Function 
of Neural Networks", Neural Networks, Vol. 2, pp 367-373, 1989. 
[3] Baldi, Pierre and Kurt Homik,"Neural Networks and Principal Component 
Analysis: Learning from Examples Without Local Minima", Neural Networks, Vol. 
2,pp 53-58, 1989. 
[4] Moody, John and Darken, Christen," Learning with Localized Receptive Fields", 
Proc. of the 1988 Connectionist Models Summer School,CMU. 
[5] Poggio Tomaso and Fedrico Giorsi,"Networks for Approximation and Learning", 
Proc. of IEEE, vol. 78, no. 9, September 1990, pp 1481- 1496. 
[6] Saha, Avijit, D. S. Tang and Chuan-Lin Wu,."Dimension Reduction Using 
Networks of Linear Superposition of Gaussian Units",MCC Technical Report,, 
Sept. 1990. 
[7] Moody, John and Darken, Christen," Learning with Localized Receptive Fields", 
Proc. of the 1988 Connecfionist Models Summer School, CMU. 
