Finite State Automata that Recurrent 
Cascade-Correlation Cannot Represent 
Stefan C. Kremer 
Department of Computing Science 
University of Alberta 
Edmonton, Alberta, CANADA T6H 5B5 
Abstract 
This paper relates the computational power of Fahlman's Recurrent 
Cascade Correlation (RCC) architecture to that of finite state automata 
(FSA). While some recurrent networks are FSA equivalent, RCC is not. 
The paper presents a theoretical analysis of the RCC architecture in the 
form of a proof describing a large class of FSA which cannot be realized 
by RCC. 
1 INTRODUCTION 
Recurrent networks can be considered to be def'med by two components: a network 
architecture, and a learning rule. The former describes how a network with a given set 
of weights and topology computes its output values, while the latter describes how the 
weights (and possibly topology) of the network are updated to fit a specific problem. It is 
possible to evaluate the computational power of a network architecture by analyzing the 
types of computations a network could perform assuming appropriate connection weights 
(and topology). This type of analysis provides an upper bound on what a network can be 
expected to learn, since no system can learn what it cannot represent. 
Many recurrent network architectures have been proven to be finite state automaton or 
even Turing machine equivalent (see for example [Alon, 1991], [Goudreau, 1994], 
[Kremer, 1995], and [Siegelmann, 1992]). The existence of such equivalence proofs 
naturally gives confidence in the use of the given architectures. 
This paper relates the computational power of Fahlman's Recurrent Cascade Correlation 
architecture [Fahlman, 1991] to that of finite state automata. It is organized as follows: 
Section 2 reviews the RCC architecture as proposed by Fahlman. Section 3 describes 
finite state automata in general and presents some specific automata which will play an 
important role in the discussions which follow. Section 4 describes previous work by other 
Finite State Automata that Recurrent Cascade-Correlation Cannot Represent 613 
authors evaluating RCC's computational power. Section 5 expands upon the previous 
work, and presents a new class of automata which cannot be represented by RCC. Section 
6 further expands the result of the previous section to identify an infinite number of other 
unrealizable classes of automata. Section 7 contains some concluding remarks. 
2 THE RCC ARCHITECTURE 
The RCC architecture consists of three types of units: input units, hidden units and output 
units. After training, a RCC network performs the following computation: First, the 
activation values of the hidden units are initialized to zero. Second, the input unit 
activation values are initialized based upon the input signal to the network. Third, each 
hidden unit computes its new activation value. Fourth, the output units compute their new 
activations. Then, steps two through four are repeated for each new input signal. 
The third step of the computation, computing the activation value of a hidden unit, is 
accomplished according to the formula: 
Here, ai(t) represents the activation value of unit i at time t, o(e) represents a sigmoid 
squashing function with finite range (usually from 0 to 1), and W/j represents the weight 
of the connection from unit i to unitj. That is, each unit computes its activation value by 
multiplying the new activations of all lowered numbered units and its own previous 
activation by a set of weights, summing these products, and passing the sum through a 
logistic activation function. The recurrent weight Wj from a unit to itself functions as a 
sort of memory by transmitting a modulated version of the unit's old activation value. 
The output units of the RCC architecture can be viewed as special cases of hidden units 
which have weights of value zero for all connections originating from other output units. 
This interpretation implies that any restrictions on the computational powers of general 
hidden units will also apply to the output units. For this reason, we shall concern ourselves 
exclusively with hidden units in the discussions which follow. 
Finally, it should be noted that since this paper is about the representational power of the 
RCC architecture, its associated learning rule will not be discussed here. The reader 
wishing to know more about the learning rule, or requiring a more detailed description of 
the operation of the RCC architecture, is referred to [Fahlman, 1991]. 
3 FINITE STATE AUTOMATA 
A Finite State Automaton (FSA) [Hopcroft, 1979] is a formal computing machine def'med 
by a 5-tuple M=(Q,,6,qo,F), where Q represents a finite set of states, l a finite input 
alphabet, 6 a state transition function mapping Qxl to Q, qoQ the initial state, and FcQ 
a set of f'mal or accepting states. FSA accept or reject strings of input symbols according 
to the following computation: First, the FSA's current state is initialized to qo. Second, 
the next inut symbol of the string, selected from l, is presented to the automaton by the 
outside world. Third, the transition function, 6, is used to compute the FSA's new state 
based upon the input symbol, and the FSA's previous state. Fourth, the acceptability of 
the string is computed by comparing the current FSA state to the set of valid f'mal states, 
F. If the current state is a member of F then the automaton is said to accept the string of 
input symbols presented so far. Steps two through four are repeated for each input symbol 
presented by the outside world. Note that the steps of this computation mirror the steps 
of an RCC network's computation as described above. 
It is often useful to describe specific automata by means of a transition diagram [Hopcroft, 
1979]. Figure 1 depicts the transition diagrams of five FSA. In each case, the states, Q, 
614 $. C. KREMER 
are depicted by circles, while the transitions defined by 6 are represented as arrows from 
the old state to the new state labelled with the appropriate input symbol. The arrow 
labelled "Start" indicates the initial state, qo; and final accepting states are indicated by 
double circles. 
We now define some terms describing particular FSA which we will require for the 
following proof. The first concerns input signals which oscillate. Intuitively, the input 
signal to a FSA oscillates if every pth symbol is repeated for p > 1. More formally, a 
sequence of input symbols, s(t), s(t+ 1), s(t+2), ..., oscillates with a period ofp if and 
only if is the minimum value such that: Vt s(t)=s(t+p). 
Our second definition concerns oscillations of a FSA's internal state, when the machine is 
presented a certain sequence of input signals. Intuitively, a FSA's internal state can 
oscillate in response to a given input sequence if there is some starting state for which 
every subsequent t th state is repeated. Formally, a FSA's state can oscillate with a period 
of t in response to a sequence of input symbols, s(t), s(t+ 1), s(t+2), ..., if and only if 
to is the minimum value for which: 
qo s.t. Vt 6(qo, s(t)) = 6( ... 6( 6( 6(q0, s(t)), s(t+l)), s(t+2)), ..., s(t+)) 
The recursive nature of this formulation is based on the fact a FSA's state depends on its 
previous state, which in turn depends on the state before, etc.. 
We can now apply these two defmifions to the FSA displayed in Figure 1. The automaton 
labelled "a)" has a state which oscillates with a period of t =2 in response to any sequence 
consisting of Os and ls (e.g. "00000...", "11111 .... ", "010101...", etc.). Thus, we can 
say that it has a state cycle of period t=2 (i.e. qoqqoq...), when its input cycles with a 
period of p= 1 (i.e. "0000..."). Similarly, when automaton b)'s input cycles with period 
p= 1 (i.e. "000000..."), its state will cycle with period t=3 (i.e. qoqq2qoqq2...). 
For automaton c), things are somewhat more complicated. When the input is the sequence 
"0000... ", the state sequence will either be qoqoqoqo... or t3 t3 t3 t... depending on the 
initial state. On the other hand, when the input is the sequence "1111...", the state 
sequence will alternate between q0 and q. Thus, we say that automaton c) has a state cycle 
of t=2 when its input cycles with periodp= 1. But, this automaton can also have larger 
state cycles. For example, when the input oscillates with a period p=2 (i.e. 
"01010101..."), then the state of the automaton will oscillate with a period t=4 (i.e. 
qoqoqqqoqoqq...). Thus, we can also say that automaton c) has a state cycle of t=4 
when its input cycles with period p =2. 
The remaining automata also have state cycles for various input cycles, but will not be 
discussed in detail. The importance of the relationship between input period (p) and the 
state period (t) will become clear shortly. 
4 PREVIOUS RESULTS CONCERNING THE COMPUTATIONAL 
POWER OF RCC 
The first investigation into the computational powers of RCC was performed by Giles et. 
al. [Giles, 1995]. These authors proved that the RCC architecture, regardless of 
connection weights and number of hidden units, is incapable of representing any FSA 
which "for the same input has an output period greater than 2" (p. 7). Using our 
oscillation definitions above, we can re-express this result as: if a FSA's input oscillates 
with a period of p= 1 (i.e. input is constant), then its state can oscillate with a period of 
at most =2. As already noted, Figure lb) represents a FSA whose state oscillates with 
a period of =3 in response to an input which oscillates with a period of p= 1. Thus, 
Giles et. al.'s theorem proves that the automaton in Figure lb) cannot be implemented (and 
hence learned) by a RCC network. 
Finite State Automata that Recurrent Cascade-Correlation Cannot Represent 615 
a) Start b) Start 
0, 1 0,1 1 
c) Start 
0 
1 
d) 
Start 
1 0 
1 
o 
0 '%) 
) 
Start 
0 
0 
1 0,1 
1 
0 
Figure 1' Finite State Automata. 
Giles et. al. also examined the automata depicted in Figures la) and lc). However, unlike 
the formal result concerning FSA b), the authors' conclusions about these two automata 
were of an empirical nature. In particular, the authors noted that while automata which 
oscillated with a period of 2 under constant input (i.e. Figure la)) were realizable, the 
automaton of lc) appeared not be be realizable by RCC. Giles et. al. could not account 
for this last observation by a formal proof. 
616 S.C. KREMER 
5 AUTOMATA WITH CYCLES UNDER ALTERNATING INPUT 
We now turn our attention to the question: why is a RCC network unable to learn the 
automaton of lc)? We answer this question by considering what would happen if lc) were 
realizable. In particular, suppose that the input units of a RCC network which implements 
automaton lc) are replaced by the hidden units of a RCC network implementing la). In 
this situation, the hidden units of la) will oscillate with a period of 2 under constant input. 
But if the inputs to lc) oscillate with a period of 2, then the state of ic) will oscillate with 
a period of 4. Thus, the combined network's state would oscillate with a period of four 
under constant input. Furthermore, the cascaded connectivity scheme of the RCC 
architecture implies that a network constructed by treating one network's hidden units as 
the input units of another, would not violate any of the connectivity constraints of RCC. 
In other words, if RCC could implement the automaton of lc), then it would also be able 
to implement a network which oscillates with a period of 4 under constant input. Since 
Giles et. al. proved that the latter cannot be the case, it must also be the case that RCC 
cannot implement the automaton of lc). 
The line of reasoning used here to prove that the FSA of Figure lc) is unrealizable can also 
be applied to many other automata. In fact, any automaton whose state oscillates with a 
period of more than 2 under input which oscillates with a period 2, could be used to 
construct one of the automata proven to be illegal by Giles. This implies that RCC cannot 
implement any automaton whose state oscillates with a period of greater than to =2 when 
its input oscillates with a period of p=2. 
6 AUTOMATA WITH CYCLES UNDER OSCILLATING INPUT 
Giles et. al.'s theorem can be viewed as defining a class of automata which cannot be 
implemented by the RCC architecture. The proof in Section 5 adds another class of 
automata which also cannot be realized. More precisely, the two proofs concern inputs 
which oscillate with periods of one and two respectively. It is natural to ask whether 
further proofs for state cycles can be developed when the input oscillates with a period of 
greater than two. We now present the central theorem of this paper, a unified definition 
of unrealizable automata: 
Theorem: If the input signal to a RCC network oscillates with a period, p, then the 
network can represent only those FSA whose outputs form cycles of length to, where 
pmodto=0 ifp is even and 2pmodto=O ifp is odd. 
To prove this theorem we will first need to prove a simpler one relating the rate of 
oscillation of the input signal to one node in an RCC network to the rate of oscillation of 
that node's output signal. By "the input signal to one node" we mean the weighted sum 
of all activations of all connected nodes (i.e. all input nodes, and all lower numbered 
hidden nodes), but not the recurrent signal. I.e.: 
j-I 
Z(t+l) --  Wo.a,(t+l ) . 
Using this definition, it is possible to rewrite the equation to compute the activation of node 
j (given in Section 2) as: 
a(t+l) -- rl(.(t+l)+Wjia(t) ) . 
But if we assume that the input signal oscillates with a period of p, then every value of 
;(t+ 1) can be replaced by one of a finite number of input signals (;o, ;, ;2, ... ;p4). In 
other words, ;(t+ 1) = ;modp. Using this substitution, it is possible to repeatedly expand 
the addend of the previous equation to derive the formula: 
a(t + 1) - o( tmodp + Wjj' 
O( (t-1)modp -{- %' 
O( (t-2)modp 'It' Wjj.... o( (t-p+l)modp 'It' Wjj.aj(t-p+ 1) )... ) ) ) 
Finite State Automata that Recurrent Cascade-Correlation Cannot Represent 617 
The unravelling of the recursive equation now allows us to examine the relationship 
between aj(t+ 1) and c(t-p+ 1). Specifically, we note that if  >0 or ifp is even then 
aj(t+ 1) = f(aj(t-p+ 1)) implies thatf is a monotonically increasing function. Furthermore, 
since o is a function with finite range, f must also have finite range. 
It is well known that for any monotonically increasing function with f'mite range, f, the 
sequence, f(x), fif(x)), fiJx))), ..., is guaranteed to monotonically approach a fixed point 
(where f(x)=x). This implies that the sequence, aj(t+l), c(t+p+ 1), a)(t+2p+ 1), ..., 
must also monotonically approach a fixed point (where a2(t+ 1) = aj.(t-p+ 1)). In other 
words, the sequence does not oscillate. Since every ph value of ai(t) approaches a fixed 
point, the sequence ai(t), a(t+ 1), a(t+2), ... can have a period of at most p, and must 
have a period which divides p evenly. We state this as our first lemma: 
Lemma 1: If )40 oscillates with even period, p, or if W > 0, then state unit j's activation 
value must oscillate with a period 60, where pmodto =0. 
We must now consider the case where W<0 and p is odd. In this case, 
a(t+ 1) = f(a(t-p+ 1)) implies thatf is a monotonically decreasing function. But, in this 
situation the function f 2(x)=ri0fix)) must be monotonically increasing with finite range. 
This implies that the sequence: aj(t+ 1), aj(t+2p+ 1), a(t+4p+ 1), ..., must monotonically 
approach a fixed point (where a(t+ 1)=a(t-2p+ 1)). This in turn implies that the sequence 
a(O, aj(t+ 1), a(t+2), ..., can have a period of at most 2p, and must have a period which 
divides 2p evenly. Once again, we state this result in a lemma: 
Lemma 2: If )(t) oscillates with odd period p, and if W<0, then state unit j must 
oscillate with a period 60, where 2pmodto =0. 
Lemmas 1 and 2 relate the rate of oscillation of the weighted sum of input signals and 
lower numbered unit activations, )(t) to that of unit j. However, the theorem which we 
wish to prove relates the rate of oscillation of only the RCC network's input signal to the 
entire hidden unit activations. To prove the theorem, we use a proof by induction on the 
unit number, i: 
Bas/s: Node i= 1 is connected only to the network inputs. Therefore, if the input signal 
oscillates with periodp, then node i can only oscillate with period to, where pmodto=0 if 
p is even and 2pmod6o=0 ifp is odd. (This follows from Lemmas 1 and 2). 
Assumption: If the input signal to the network oscillates with period p, then node i can 
only oscillate with period to, where pmod6o=0 ifp is even and 2pmodto=0 ifp is odd. 
Proof: If the Assumption holds for all nodes i, then Lemmas 1 and 2 imply that it must 
also hold for node i+ 1.[2] 
This proves the theorem: 
Theorem: If the input signal to a RCC network oscillates with a period, p, then the 
network can represent only those FSA whose outputs form cycles of length 60, where 
pmod6o=0 ifp is even and 2pmodto =0 ifp is odd. 
7 CONCLUSIONS 
It is interesting to note that both Giles et. al.'s original proof and the constructive proof by 
contradiction described in Section 5 are special cases of the theorem. Specifically, Giles 
et. al.'s original proof concerns input cycles of length p= 1. Applying the theorem of 
Section 6 proves that an RCC network can only represent those FSA whose state transitions 
form cycles of length to, where 2(1)mod6o=0, implying that state cannot oscillate with a 
period of greater than 2. This is exactly what Giles et. al concluded, and proves that 
(among others) the automaton of Figure lb) cannot be implemented by RCC. 
618 S.C. KREMER 
Similarly, the proof of Section 5 concerns input cycles of length p=2. Applying our 
theorem proves that an RCC network can only represent those machines whose state 
transitions form cycles of length , where (2)modo=0. This again implies that state 
cannot oscillate with a period greater than 2, which is exactly what was proven in Section 
5. This proves that the automaton of Figure lc) (among others) cannot be implemented 
by RCC. 
In addition to unifying both the results of Giles et. al. and Section 5, the theorem of 
Section 6 also accounts for many other FSA which are not representable by RCC. In fact, 
the theorem identifies an infinite number of other classes of non-representable FSA (for 
p=3, p=4, p =5, ...). Each class itself of course contains an infinite number of machines. 
Careful examination of the automaton illustrated in Figure ld) reveals that it contains a 
state cycle of length 9 (qoqq2qq2q3q2q3q4qoqq2qq2q3q2q3q4...) in response to an input cycle 
of length 3 ("001001..."). Since this is not one of the allowable input/state cycle 
relationships def'med by the theorem, it can be concluded that the automaton of Figure ld) 
(among others) cannot be represented by RCC. 
Finally, it should be noted that it remains unknown if the classes identified by this paper's 
theorem represent the complete extent of RCC's computational limitations. Consider for 
example the automaton of Figure le). This device has no input/state cycles which violate 
the theorem, thus we cannot conclude that it is unrepresentable by RCC. Of course, the 
issue of whether or not this particular automaton is representable is of little interest. 
However, the class of automata to which the theorem does not apply, which includes 
automaton le), requires further investigation. Perhaps all automata in this class are 
representable; perhaps there are other subclasses (not identified by the theorem) which 
RCC cannot represent. This issue will be addressed in future work. 
References 
N. Alon, A. Dewdney, and T. Ott, Efficient simulation of finite automata by neural nets, 
Journal of the Association for Computing Machinery, 38 (2) (1991) 495-514. 
S. Fahlman, The recurrent cascade-correlation architecture, in: R. Lippmann, J. Moody 
and D. Touretzky, Eds., Advances in Neural Information Processing Systems 3 (Morgan 
Kaufmann, San Mateo, CA, 1991) 190-196. 
C.L. Giles, D. Chen, G.Z. Sun, H.H. Chen, Y.C. Lee, and M.W. Goudreau, 
Constructive Learning of Recurrent Neural Networks: Limitations of Recurrent Cascade 
Correlation and a Simple Solution, IEEE Transactions on Neural Networks, 6 (4) (1995) 
829-836. 
M. Goudreau, C. Giles, S. Chakradhar, and D. Chen, First-order v.s. second-order single 
layer recurrent neural networks, IEEE Transactions on Neural Networks, 5 (3) (1994) 511- 
513. 
J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages and 
Computation (Addison-Wesley, Reading, MA, 1979). 
S.C. Kremer, On the Computational Power of Elman-style Recurrent Networks, IEEE 
Transactions on Neural Networks, 6 (4) (1995) 1000-1004. 
H.T. Siegelmann and E.D. Sontag, On the Computational Power of Neural Nets, in: 
Proceedings of the Fifth ACM Workshop on Computational Learning Theory, (ACM, New 
York, NY, 1992) 440-449. 
