Rational Parametrizations of Neural 
Networks 
Uwe Helmke 
Department of Mathematics 
University of Regensburg 
Regensburg 8400 Germany 
Robert C. Williamson 
Department of Systems Engineering 
Australian National University 
Canberra 2601 Australia 
Abstract 
A connection is drawn between rational functions, the realization 
theory of dynamical systems, and feedforward neural networks. 
This allows us to parametrize single hidden layer scalar neural 
networks with (almost) arbitrary analytic activation functions in 
terms of strictly proper rational functions. Hence, we can solve the 
uniqueness of parametrization problem for such networks. 
I INTRODUCTION 
Nonlinearly parametrized representations of functions b:  --  of the form 
(1.1) b(x) -  ci(x - ai) x  , 
i-1 
have attracted considerable attention recently in the neural network literature. 
a:  -  is typically a sigmoidal function such as 
(1.2) a(x) = (1 + e-) -x, 
but other choices than (1.2) are possible and of interest. 
representations such as 
(1.3) b(x) -  cia(bix-ai) 
i=1 
Here 
Sometimes more complex 
623 
624 Helmke and Williamson 
or even compositions of these are considered. 
The purpose of this paper is to explore some parametrization issues regarding (1.1) 
and in particular to show the close connection these representations have with the 
standard system-theoretic realization theory for rational functions. We show how 
to define a generalization of (1.1) parametrized by (A, b, c), where A is a matrix 
over a field, and b and c are vectors. (This is made more precise below). The 
parametrization involves the (A, b, c) being used to define a rational function. The 
generalized r-representation is then defined in terms of the rational function. This 
connection allows us to use results available for rational functions in the study of 
neural-network representations such as (1.1). It will also lead to an understanding 
of the geometry of the space of functions. 
One of the main contributions of the paper is to show how in general neural network 
representations are related to rational functions. In this summary all proofs have 
been omitted. A complete version of the paper is available from the second author. 
2 REALIZATIONS RELATIVE TO A FUNCTION 
In this section we explore the relationship between sigmoidal representations of real 
analytic functions b' lI -. IR defined on an interval lI C IR, real rational functions 
defined on the complex plane C, 
linear dynamical systems 
y(t) 
and the well established realization theory for 
= Az(t) +bu(t) 
= cx(t) + 
For standard textbooks on systems theory and realization theory we refer to [5, 7]. 
Let lK denote either the field IR of real numbers or the field C of complex numbers. 
Let A C C be an open and simply connected subset of the complex plane and let 
r: A -. C be an analytic function defined on A. For example, a may be obtained 
by an analytic continuation of some sigmoidal function a: IR -. IR into the domain 
of holomorphy of the complex plane. 
Let T:  -*  be a linear operator on a finite-dimensional K-vector space  such 
that T has all its eigenvalues in A. Let I' C A be a simple closed curve, oriented 
in the counter-clockwise direction, enclosing all the eigenvalues of T in its interior. 
More generally, I' may consist of a finite number of simple closed curves I'k with 
interiors A such that the union of the domains A contains all the eigenvalues of 
T. Then the matrix valued function rr(T) is defined as the contour integral [8, p.44] 
1 
(2.1) rr(T) := 2ri 
Note that for each linear operator 
operator on V. 
T'-., 
T)-  dz. 
r(T)'  -*  is again a linear 
If we now make the substitution T := xI + A for x  C and A' V -* V lK-linear, 
then 
o'(xI+A)= 1 fr'(z)((z-x)I-A) -dz 
2ri 
Rational Parametrizations of Neural Networks 625 
becomes a function of the complex variable z, at least as long as F contains all the 
eigenvalues of zI + A. Using the change of variables  := z - z we obtain 
(2.2) a(zI + A) = 1 fr a(z + ')('I- A) -1 d, 
2ri , 
where I" = I' - z C A encircles all the eigenvalues of A. 
Given an arbitrary vector b   and a linear functional c:  -- lK we achieve the 
representation 
2a-i 
(2.3) 
Note that in (2.3) the simple closed curve F C C is arbitrary,  long  it satisfies 
the two conditions 
(2.4) F encircles all the eigenvalues of A 
(2.5) + r = + e r) c 
Let 4: 1I -. 11 be a real analytic function in a single variable z  1I, defined on an 
interval 1I C 11. 
Definition 2.1 A quadruple (A, b, c, d) is called a finite-dimensional a-realization 
of qS : lI --.  over a field of constants IK if for all z  lI 
(2.6) 4,(x) = ca(x + A)t, + a 
holds, where the right hand side is given by (.$) and F is assumed to satisfy the 
conditions (.d)-(.5). Here d  lK, b  , and A:  --* , c:  -- lK are lK-linear 
maps and  is a finite dimensional lK-vector space. 
Definition 2.2 The dimension (or degree) of a a-realization is dim:. The r- 
degree of qS, denoted 5(q5), is the minimal dimension of all a-realizations of qS. A 
minimal a-realization is a a-realization of minimal dimension 5 (qS). 
a-realizations are a straightforward extension of the system-theoretic notion of a 
realization of a transfer function. In this paper we will address the following specific 
questions concerning a-realizations. 
Q1 What are the existence and uniqueness properties of a-realizations? 
Q2 How can one characterize minimal a-realizations? 
qa How can one compute 
3 EXISTENCE OF a-REALIZATIONS 
We now consider the question of existence of a-realizations. To set the stage, we 
consider the systems theory case a(z) = z -1 first. Assume we are given a formal 
power series 
N 
qi i 
(3.1) qb(x) = E i- 'x' 
i=0 
626 Helmke and Williamson 
and that (A, b, c) is a a-realization in the sense of definition 2.1. The Taylor expan- 
sion of c(xI 4- A)-lb at 0 is (for A nonsingular) 
(3.2) c(xI 4- A)-lb '- -.(-1) / cA-('+l)bx '. 
i--0 
Thus 
65i = (_l)icA_(i+l)b ' i = O, N. 
(3.3) ..., 
if and only if the expansions of (3.1) and (3.2) coincide up to order N. Observe [7] 
that 
c)(x) = c(xI + A)-lb and dim < c 
4(x) is rational with 4(c) = 0. 
The possibility of solving (3.3) is now easily seen as follows. Let V = ]I N'F1 -- 
Map({0,..., N},IR)be the finite or infinite (N + 1)-fold product space of I. (Here 
Map(X, Y) denotes the set of all maps from X to Y.) If N is finite let 
(3.4) A -1 
b 
0 .-. 0 1 
1 .-. 0 0 
0 .-. 1 0 
(10 ..- 0) T, 
For N = x we take A-I' 
(3.5) 
 ]](N+I) x (N+I), 
as a shift operator 
A-1. l TM -+ l N 
A -1. (xo, xo, 
and b=(1,O,...), c=(0,0o, 01,02/2!,...).' 
We then have 
x i be analytic at x = 0 and let (A,b,c) be a a- 
Lemma 3.1 Let () = ]]i i! 
realization of the formal power series 4(x) = ]]=0 xi N < cx> (i.e. matching of 
the first N + 1 derivatives of c)(x) and cer(xI + A)b at x = 0). Then 
(3.6) 
qSi = cr(i)(A)b for i = 0,... , N. 
Observe that for er(x) = x -1 we have er(i)(-A) = i!(A-1) i+1 as before. The exis- 
tence part of the realization question Q1 can now be restated as 
v' x i and a sequence of real numbers (qb0, N), does 
Q4 Given r(x) := z_,i=0 i! '"' 
there exist an (A, b, c) with 
(3.7) 4i = ca(0(A)b, i= 0,... ,N? 
Rational Parametrizations of Neural Networks 627 
Thus question Q1 is essentially a Loewner interpolation question [1, 3]. 
Let 7 = cAtb,   H0, and let 
(3.8) 
Write 
(3.9) [7] = 
0 1 2 ''' ] 
1 2 3 
oo 
F= a2 a3 a4 =(ai+j)i,j=O' 
72/2! 
73/3! 
, and [4]= 
Then (3.6) (for N = ) can formally be written as 
(3.10) [0] = F. [7]. 
Of course, any meaningful interpretation of (3.10) requires that the infinite sums 
j-0 ',_y__z. i  N0, exist. This happens, for example, if j :0 a/+j < , i  N0 
j! I. , 
and -]j=0 (7J/J!) 2 < cx exist. We have already seen that every finite or infinite 
sequence [71 has a realization (A, b, c). Thus we obtain 
Corollary 3.2 A function ok(a:) admits a a-realization if and only if [b]  
image(F). 
Corollary 3.3 Let H = (7i+j )i,j=o. There exists a finite dimensional tr-realization 
of ck(x) if and only if [,] = F[?] with rank H < co. In this case 5(,) = rank H. 
4 UNIQUENESS OF a-REALIZATIONS 
In this section we consider the uniqueness of the representation (2.3). 
Definition 4.1 (c.f. [21) A system {gl,... ,g,} of continuous functions gi' lI --+ 
1, defined on an interval lI C 1, is said to satisfy a Haft* condition of order 
n on lI if gl,...,g, are linearly independent, i.e. For every cl,..., c,  1 with 
Z?=l cigi(x) = 0 for all x  then cl =...= c. = 0. 
Remark 
that 
The Haar* condition is implied by the stronger classical Haar condition 
gl(zl) -.. gl(z.) ] 
det ' 3 0 
g,(xl) ... 
for all distinct (xi)?=l in ]I. Equivalently, if ]i"=1 cigi(x) has n distinct roots in 
then cl = ." = c, = O. 
Definition 4.2 A subset A of C is called self-conjugate if a  A implies ff  A. 
628 Helmke and Williamson 
a 0)' ' 
Let a: 1! -+ 1! be a continuous function and define z, x) := + zi). Let 
m 
where ZgJ--n, 
j=l 
gj H, gj >_ 1, j= 1,...,m 
denote a combination of n of size m. For a given combination  = (K1,... , Krn) of 
n, let I :-- {1,... ,m) and let Ji := {1,... ,i). Let Zm := {Zl,... ,z,) and let 
(4.1) 
zm):= -'). i  j  
Definition 4.3 If for all m < n, for all combinations a = (K1,' '' , tgrn) of n of size 
m, and for any self-conjugate set Z,, of distinct points, a(g, Z,,) satisfies a Haar* 
condition of order n, then a is said to be Haar generating of order n. 
Theorem 4.4 (Uniqueness).Let a' 1I --+ 1I be Haar generating of order at least 
2n on 1I and let (A, b, c) and (A, b, ) be minimal a-realizations of order n of functions 
c) and  respectively. Then the following equivalence holds 
(4.2) 
Conversely, if (d.) holds for almost all order n 
a:  --  is Haar generating on II of order > n. 
triples (A, b, c), 
(A, b, ), then 
The following result gives examples of activation functions a' IR --+ IR which are 
Haar generating. 
Lemma 4.5 Let d  No. Then 1) The function a(x) = x -a is Haar generating of 
arbitrary order. ) The monomial a(x) = x a is Haar generating of order d + 1. 3) 
The function e -z2 is Haar generating of arbitrary order. 
Remark A simple example of a a which is not Haar generating of order _> 2 is 
a(x) = e z. In fact, in this case a(x + zj) = cja(x + zi) for cj = e z-*', j = 2,... ,n. 
Remark The function a(z) = (1 +e-) -1 is not Haar generating of any order > 2. 
By the periodicity of the complex exponential function, a(z + 2ri) = a(z - 2ri), 
i = x/'-2--f, for all z. Thus the Haar* condition fails for Za = {2ri,-2ri}. 
In particular, the above uniqueness result fails for the standard sigmoid case. In 
order to cover this case we need a further definition. 
Definition 4.6 Let fl = fl C C be a self-conjugate subset of C. A function a: ]R --+ 
]R is said to be Haar generating of order n on Q, if for all m _< n, for all combinations 
 = (1,...,,,) of n of size m, and for any self-conjugate subset Z,, C Q of 
distinct points of Q, a(, Z,,) satisfies a Haar* condition of order n. 
Of course for fl = C, this definition coincides with definition 4.3. 
Rational Parametrizations of Neural Networks 629 
Theorem 4.7 (Local Uniqueness) Let a' ll -- ll be analytic and let Q C C 
be a self-conjugate subset contained in the domain of holomorphy of r. Let lI be a 
nontrivial subinterval of QCqll. Suppose a' ll -- 1 is Haar generating on Q of.o.rder 
at least 2n, n  N. Then for any two minimal a-realizations (A, b, c) and (A,b, C') 
of orders at most n with spect A, spect .3,  Q the following equivalence holds: 
(4.3) 
ca(zI + A)b = a(zI + A)b Vz  lI 
c(U - A)-b = (- )-' v  . 
Lemma 4.8 Let Q := {z  C: Izl < ). Then the standard sigmoid function 
o'(z) = (1 q-e-') -1 is Haar generating on Q of arbitrary order. 
5 MAIN RESULT 
As a consequence of the uniqueness theorems 4.4 and 4.7 we can now state our main 
result on the existence of minimal a-realizations of a function 4(z). It extends a 
parallel result for standard transfer function realizations, where a(z) = z -. 
Theorem 5.1 (Realization) Let Q C C be a self-conjugate subset, contained in 
the domain of holomorphy of a real meromorphic function a: ll-- ll. Suppose a is 
Haar generating on Q of order at least 2n and assume ok(z) has a finite dimensional 
realization (A, b, c) of dimension at most n such that A has all its eigenvalues in Q. 
1. There exists a minimal e-realization (A,b,c) of q)(x) of degree 5.(q)) _< 
dim(A, b, c). Furthermore, there exists an inverlible matrix S' such that 
(5.1) 
= 0 As , Sb= , cS- =[c,ca]. 
If (A,b,cl) and (A,b,c) are minimal r-realizations of ok(x)such that 
the eigenvalues of A1 and A t are contained in Q, then there exists a unique 
invertible matrix S' such that 
(5.) 
! 
(A,b,c) = (SAxS-,Sbx,cS'-). 
3. A e-realization (A, b, c) is minimal if and only if (A, b, c) is controllable and 
observable; i.e. if and only if(A, b, c) satisfies the generic rank conditions 
rank(b, Ab,... , A'-b) = n, rank 
c] 
cA n- 1 
for A  ]K TM, b  ] , cT  IK n . 
Remark The use of the terms "observable" and "controllable" is solely for formal 
correspondence with standard systems theory. There are no dynamical systems 
actually under consideration here. 
630 Helmke and Williamson 
Remark Note that for any a-realization (A,b,c) of the form A 
[All A12] lb01 ] [ o'(All) * ] 
0 A,, ' b = , c = [c,c,], we have a(A) = 0 
and thus ca(xI + A)b = ca(zI + A)b. Thus transformations of the above kind 
always reduce the dimension of a a-realization. 
Corollary 5.2 ([9]) Let a(z) = (1 + e-z) - and let ok(z) = 5-.in__ Cia(X- 
ai) = in= Ca(X- a) be two minimal length tr-representations with Iail  
r, Iai]  r, i = 1,... ,n. Then (ati,c) = (ap(i), cp(i)) for a unique permuta- 
tion p: {1,... ,n} --+ 1,... , n). In particular, minimal length representation (1.1) 
with real coefficients ai and ci are unique up to a permutation of the summantis. 
6 CONCLUSIONS 
We have drawn a connection between the realization theory for linear dynamical 
systems and neural network representations. There are further connections (not 
discussed in this summary) between representations of the form (1.3) and rational 
functions of two variables. There are other questions concerning diagonalizable 
realizations and Jordan forms. Details are given in the full length version of this 
paper. Open questions include the problem of partial realizations [4, 6].  
REFERENCES 
[1] A.C. Antoulas and B. D. O. Anderson, On the Scalar Rational Interpolation Prob- 
lem, IMA $ourna/oMathematical Control and Information, 3 (1986), pp. 61-88. 
[:2] E.W. Cheney, Introduction to Approximation Theory, Chelsea Publishing Com- 
pany, New York, 198:2. 
[3] W. F. Donoghue, Jr, Monotone Matrix Functions and Analytic Continuation, 
Springer-Verlag, Berlin, 1974. 
[4] W.B. Gragg and A. Lindquist, On the Partial Realization Problem, Linear Algebra 
and its Applications, 50 (1983), pp. :277-319. 
[5] T. Kailath, Linear Systems, Prentice-Hall, Englewood Cliffs, 1980. 
[6] R.E. Kalman, On Partial Realizations, Transfer Functions, and Canonical Forms, 
Acta Polytechnica Scandinavica, 31 (1979), pp. 9-3:2. 
[7] R.E. Kalman, P. L. Falb and M. A. Arbib, Topics in Mathematical System Theory, 
McGraw-Hill, New York, 1969. 
[8] T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 1966. 
[9] R. C. Williamson and U. Helmke, Existence and Uniqueness Results for Neural 
Network Approximations, To appear, IEEE Transactions on Neural Networks, 
1993. 
 This work was supported by the Australian Research Council, the Australian Telecom- 
munications and Electronics Research Board, and the Boeing Commercial Aircraft Com- 
pany (thanks to John Moore). Thanks to Eduardo Sontag for helpful comments also. 
