573 
BIT - SERIAL NEURAL NETWORKS 
Alan F. Murray, Anthony V. W. Smith and Zoe F. Buffer. 
Department of Electrical Engineering, University of Edinburgh, 
The King's Buildings, Mayfield Road, Edinburgh, 
Scoff and, EH9 3JL. 
ABSTRACT 
A bit - serial VLSI neural network is described from an initial architecture for a 
synapse array through to silicon layout and board design. The issues surrounding bit 
- serial computation, and analog/digital arithmetic are discussed and the parallel 
development of a hybrid analog/digital neural network is outlined. Learning and 
recall capabilities are reported for the bit - serial network along with a projected 
specification for a 64 - neuron, bit - serial board operating at 20 MHz. This tech- 
nique is extended to a 256 (2562 synapses) network with an update time of 3ms, 
using a "paging" technique to time - multiplex calculations through the synapse 
array. 
1. INTRODUCTION 
The functions a synthetic neural network may aspire to mimic are the ability to con- 
sider many solutions simultaneously, an ability to work with corrupted data and a 
natural fault tolerance. This arises from the parallelism and distributed knowledge 
representation which gives rise to gentle degradation as faults appear. These func- 
tions are attractive to implementation in VLSI and WSI. For example, the natural 
fault - tolerance could be useful in silicon wafers with imperfect yield, where the 
network degradation is approximately proportional to the non-functioning silicon 
area. 
To cast neural networks in engineering language, a neuron is a state machine that is 
either "on" or "off", which in general assumes intermediate states as it switches 
smoothly between these extrema. The synapses weighting the signals from a 
transmitting neuron such that it is more or less excitatory or inhibitory to the receiv- 
ing neuron. The set of synaptic weights determines the stable states and represents 
the learned information in a system. 
The neural state, Vi, is related to the total neural activity stimulated by inputs to 
the neuron through an activation function, F. Neural activity is the level of excita- 
tion of the neuron and the activation is the way it reacts in a response to a change 
in activation. The neural output state at time t, V[, is related to x! by 
V[ = F(x/) (1) 
The activation function is a "squashing" function ensuring that (say) Vi is 1 when 
xi is large and -1 when xi is small. The neural update function is therefore straight- 
forward: 
j =n -1 
x: *f= x[ ..... +8  T o V.[ (2) 
where 8 represents the rate of change of neural activity, Tq is the synaptic weight 
and n is the number of terms giving an n - neuron array [1]. 
Although the neural function is simple enough, in a totally interconnected n - neu- 
ron network there are n 2 synapses requiring n 2 multiplications and summations and 
 American Institute of Physics 1988 
574 
a large number of interconnects. The challenge in VLSI is therefore to design a sim- 
ple, compact synapse that can be repeated to build a VLSI neural network with 
manageable interconnect. In a network with fixed functionality, this is relatively 
straightforward. If the network is to be able to learn, however, the synaptic weights 
must be programmable, and therefore more complicated. 
2. DESIGNING A NEURAL NETWORK IN VLSI 
There are fundamentally two approaches to implementing any function in silicon - 
digital and analog. Each technique has its advantages and disadvantages, and these 
are listed below, along with the merits and demerits of bit - serial architectures in 
digital (synchronous) systems. 
Digital rs. analog: The primary advantage of digital design for a synapse array is 
that digital memory is well understood, and can be incorporated easily. Learning 
networks are therefore possible without recourse to unusual techniques or technolo- 
gies. Other strengths of a digital approach are that design techniques are advanced, 
automated and well understood and noise immunity and computational speed can 
be high. Unattractive features are that digital circuits of this complexity need to be 
synchronous and all states and activities are quantised, while real neural networks 
are asynchronous and unquantised. Furthermore, digital multipliers occupy a large 
silicon area, giving a low synapse count on a single chip. 
The advantages of analog circuitry are that asynchronous behaviour and smooth 
neural activation are automatic. Circuit elements can be small, but noise immunity 
is relatively low and arbitrarily high precision is not possible. Most importantly, no 
reliable analog, non - volatile memory technology is as yet readily available. For 
this reason, learning networks lend themselves more naturally to digital design and 
implementation. 
Several groups are developing neural chips and boards, and the following listing 
does not pretend to be exhaustive. It is included, rather, to indicate the spread of 
activity in this field. Analog techniques have been used to build resistor / opera- 
tional amplifier networks [2, 3] similar to those proposed by Hopfield and Tank [4]. 
A large group at Caltech is developing networks implementing early vision and 
auditory processing functions using the intrinsic nonlinearities of MOS transistors in 
the subthreshold regime [5, 6]. The problem of implementing analog networks with 
electrically programmable synapses has been addressed using CCD/MNOS technol- 
ogy [7]. Finally, Garth [8] is developing a digital neural accelerator board ('let- 
sire") that is effectively a fast SIMD processor with supporting memory and com- 
munications chips. 
Bit - serial rs. bit - parallel: Bit - serial arithmetic and communication is efficient 
for computational processes, allowing good communication within and between 
VLSI chips and tightly pipelined arithmetic structures. It is ideal for neural net- 
works as it minimises the interconnect requirement by eliminating multi - wire 
busses. Although a bit - parallel design would be free from computational latency 
(delay between input and output), pipelining makes optimal use of the high bit - 
rates possible in serial systernx, and makes for efficient circuit usage. 
2.1 An asynchronous pulse stream VLSI neural network: 
In addition to the digital system that forms the substance of this paper, we are 
developing a hybrid analog/digital network family. This work is outlined here, and 
has been reported in greater detail elsewhere [9, 10, 11]. The generic (logical and 
layout) architecture of a single network of n totally interconnected neurons is shown 
575 
schematically in figure 1. Neurons are represented by circles, which signal their 
states, Vi upward into a matrix of synaptic operators. The state signals are con- 
neeted to a n - bit horizontal bus running through the synaptic array, with a con- 
nection to each synaptic operator in every column. All columns have n operators 
(denoted by squares) and each operator adds its synaptic contribution, T 0 Vj, to the 
running total of activity for the neuron i at the foot of the column. The synaptic 
function is therefore to multiply the signalling neuron state, Vi, by the synaptic 
weight, T0, and to add this product to the running total. This architecture is com- 
mon to both the bit - serial and pulse - stream networks. 
I I _1 I 
Synapse 
States { Vj } 
Neurons 
Figure 1. Generic architecture for a network of n totally interconnected neurons. 
This type of architecture has many attractions for implementation in 2 - dimensional 
j =n -1 
silicon as the summation  To V  is distributed in space. The interconnect 
requirement (n inputs to each neuron) is therefore distributed through a column, 
reducing the need for long - range wiring. The architecture is modular, regular and 
can be easily expanded. 
In the hybrid analog/digital system, the circuitry uses a "pulse stream" signalling 
method similar to that in a natural neural system. Neurons indicate their state by 
the presence or absence of pulses on their outputs, and synapfic weighting is 
achieved by time - chopping the presynaptic pulse stream prior to adding it to the 
postsynaptic activity summation. It is therefore asynchronous and imposes no fun- 
damental limitations on the activation or neural state. Figure 2 shows the pulse 
stream mechanism in more detail. The synaptic weight is stored in digital memory 
local to the operator. Each synaptic operator has an excitatory and inhibitory pulse 
stream input and output. The resultant product of a synaptic operation, To V, is 
added to the running total propagating down either the excitatory or inhibitory 
channel. One binary bit (the MSBit) of the stored T 0 determines whether the con- 
tribution is excitatory or inhibitory. 
The incoming excitatory and inhibitory pulse stream inputs to a neuron are 
integrated to give a neural activation potential that varies smoothly from 0 to 5 V. 
This potential controls a feedback loop with an odd number of logic inversions and 
576 
EXe. Znh. EXe. 
.. ! ,.,,,, 
Yi 
Figure 2. Pulse stream arithmetic. Neurons are denoted by C) and synaptic operators 
byE. 
thus forms a switched "ring - oscillator". If the inhibitory input dominates, the feed- 
back loop is broken. If excitatory spikes subsequently dominate at the input, the 
neural activity rises to 5V and the feedback loop oscillates with a period determined 
by a delay around the loop. The resultant periodic waveform is then converted to a 
series of voltage spikes, whose pulse rate represents the neural state, Vi. Interest- 
ingly, a not dissimilar technique is reported elsewhere in this volume, although the 
synapse function is executed differently [12]. 
3. A 5 - STATE BIT - SERIAL NEURAL NETWORK 
The overall architecture of the 5 - state bit - serial neural network is identical to 
that of the pulse stream network. It is an array of n 2 interconnected synchronous 
synaptic operators, and whereas the pulse stream method allowed Vi to assume all 
values between "off" and "on", the 5 - state network Vj is constrained to 0, -0.5 or 
-1. The resultant activation function is shown in Figure 3. Full digital multiplica- 
tion is costly in silicon area, but multiplication of T by Vj = 0.5 merely requires 
the synaptic weight to be right - shifted by 1 bit. Similarly, multiplication by 0.25 
involves a further right - shift of T 0, and multiplication by 0.0 is trivially easy. Vj 
< 0 is not problematic, as a switchable adder/subtractor is not much more complex 
than an adder. Five neural states are therefore feasible with circuitry that is only 
slightly more complex than a simple serial adder. The neural state expands from a 1 
bit to a 3 bit (5 - state) representation, where the bits represent "add/subtract?", 
"shift?" and "multiply by 07". 
Figure 4 shows part of the synaptic array. Each synaptic operator includes an 8 bit 
shift register memory block holding the synaptic weight, T 0. A 3 bit bus for the 5 
neural states runs horizontally above each synaptic row. Single phase dynamic 
CMOS has been used with a clock frequency in excess of 20 MHz [13]. Details of 
a synaptic operator are shown in figure 5. The synaptic weight T 0 cycles around the 
shift register and the neural state Vj is present on the state bus. During the first 
clock cycle, the synaptic weight is multiplied by the neural state and during the 
second, the most significant bit (MSBit) of the resultant T o Vj is sign - extended for 
577 
THRESHOLD 
Sme 
--- AclivRy x 
State Vj harper" 
State V SIGM .)./..o' _ 
-'' ' - Activity xj 
x t 
Figure 3. "Hard - threshold", 5 - state and sigmoid activation functions. 
j = -ITsjV j 
j- 10 
Figure 4. Section of the synaptic array of the 5 - state activation function neural net- 
work. 
8 bits to allow for word growth in the running summation. A least significant bit 
(LSBit) signal running down the synaptic columns indicates the arrival of the LSBit 
of the xi running total. If the neural state is -+0.5 the synaptic weight is right 
shined by 1 bit and then added to or subtracted from the running total. A multipli- 
cation of -+ 1 adds or subtracts the weight from the total and multiplication by 0 
578 
o.o 
Add/Subtract 
Add/ 
Subtract 
J= ,-ITiiV j 
J:=p 
Figure 5. The synaptic operator with a 5 - state activation function. 
does not alter the running summation. 
The final summation at the foot of the column is thresholded externally according 
to the 5 - state activation function in figure 3. As the neuron activity xj, increases 
through a threshold value x,, ideal sigmoidal activation represents a smooth switch 
of neural state from -1 to 1. The 5 - state "staircase" function gives a superficially 
much better approximation to the sigrnoid 
ment) threshold function. The sharpness 
"tune" the neural dynamics for learning and 
referred to as temperature by analogy with 
form. High "temperature" gives a smoother 
form than a (much simpler to imple- 
of the transition can be controlled to 
computation. The control parameter is 
statistical functions with this sigmoidal 
staircase and sigrnoid, while a tempera- 
ture of 0 reduces both to the 'Iopfield" - like threshold function. The effects of 
temperature on both learning and recall for the threshold and 5 - state activation 
options are discussed in section 4. 
4. LEARNING AND RECALL WITH VLSI CONSTRAINTS 
Before implementing the reduced - arithmetic network in VLSI, simulation experi- 
ments were conducted to verify that the 5 - state model represented a worthwhile 
enhancement over simple threshold activation. The 'qaenchmark" problem was 
chosen for its ubiquitousness, rather than for its intrinsic value. The implications 
for learning and recall of the 5 - state model, the threshold (2 - state) model and 
smooth sigrnoidal activation ( o _ state) were compared at varying temperatures 
with a restricted dynamic range for the weights T 0 . In each simulation a totally 
interconnected 64 node network attempted to learn 32 random patterns using the 
delta rule learning algorithm (see for example [14]). Each pattern was then cor- 
rupted with 25% noise and recall attempted to probe the content addressable 
memory properties under the three different activation options. 
During learning, individual weights can become large (positive or negative). When 
weights are "driven" beyond the maximum value in a hardware implementation, 
579 
which is determined by the size of the synaptic weight blocks, some limiting 
mechanism must be introduced. For example, with eight bit weight registers, the 
limitation is -128 <- Tii <-- 127. With integer weights, this can be seen to be a prob- 
lem of dynamic range, where it is the relafonship between the smallest possible 
weight (-1) and the largest (+ 127/-128) that is the issue. 
Results: Fig. 6 shows examples of the results obtained, studying learning using 5 - 
state activation at different temperatures, and recall using both 5 - state and thres- 
hold activation. At temperature T=0, the 5 - state and threshold models are 
degenerate, and the results identical. Increasing smoothness of activation (tempera- 
ture) during learning improves the quality of learning regardless of the activation 
function used in recall, as more patterns are recognised succeasfully. Using 5 - state 
activation in recall is more effective than simple threshold activation. The effect of 
dynamic range restrictions can be assessed from the horizontal axis, where T:? is 
shown. The results from these and many other experiments may be summarised as 
follows:- 
5 - Stste activation rs. threshold: 
1) Learning with 5 - state activation was protracted over the threshold activation, 
as binary patterns were being learnt, and the inclusion of intermediate values 
added extra degrees of freedom. 
2) Weight sets learnt using the 5 - state activation function were 'l>etter" than 
those learnt via threshold activation, as the recall properties of both 5 - state 
and threshold networks using such a weight set were more robust against 
noise. 
3) Full sigmoidal activation was better than 5 - state, but the enhancement was 
less significant than that incurred by moving from threshold --, 5 - state. This 
suggests that the law of diminishing returns applies to addition of levels to the 
neural state Vj. This issue has been studied mathematically [15], with results 
that agree qualitatively with ours. 
Weight Saturation: 
Three methods were tried to deal with weight saturation. Firstly, inclusion of a 
decay, or "forgetting" term was included in the learning cycle [1]. It is our view 
that this technique can produce the desired weight limiting property, but in the time 
available for experiments, we were unable to "tune" the rate of decay sufficiently 
well to confirm it. Renormalisafion of the weights (division to bring large weights 
back into the dynamic range) was very unsuccessful, suggesting that information 
distributed throughout the numerically small weights was being destroyed. Finally, 
the weights were allowed to "clip" (ie any weight outside the dynamic range was set 
to the maximum allowed value). This method proved very successful, as the learn- 
ing algorithm adjusted the weights over which it still had control to compensate for 
the saturation effect. It is interesting to note that other experiments have indicated 
that Hopfield nets can "forget" in a different way, under different learning control, 
giving preference to recently acquired memories [16]. The results from the satura- 
tion experiments were:- 
I) For the 32 pattern/64 node problem, integer weights with a dynamic range 
greater than _ 30 were necessary to give enough storage capability. 
2) For weights with maximum values Ti? -- 50--70, "clipping" occurs, but net- 
work performance is not seriously degraded over that with an unrestricted 
weight set. 
580 
15 
0 
15 
0 20 30 40 50 60 70 
Limit 
5 - state activation function recall 
T-- 30 ....... 
T=20 
T=10 ............ 
T=O 
0 20 30 40 50 60 70 
Limit 
"Hopfield" activation function recall 
Figure 6. Recall of patterns learned with the 5 - state activation function and subse- 
quently restored using the 5-state and the hard - threshold activation functions. 
T is the "temperature", or smoothness of the activation function, and "limit" the value 
off,?. 
These results showed that the 5 - state model was worthy of implementation as a 
VLSI neural board, and suggested that 8 - bit weights were sufficient. 
5. PROJECTED SPECIFICATION OF A HARDWARE NEURAL BOARD 
The specification of a 64 neuron board is given here, using a 5 - state bit - serial 64 
x 64 synapse array with a derated clock speed of 20 MHz. The synaptic weights are 
8 bit words and the word length of the running summation x, is 16 bits to allow for 
growth. A 64 synapse column has a computational latency of 80 clock cycles or 
bits, giving an update time of 4xs for the network. The time to load the weights 
into the array is limited to 60xs by the supporting RAM, with an access time of 
120ns. These load and update times mean that the network is executing I x 10 9 
operations/second, where one operation is --+ T,jVj. This is much faster than a 
natural neural network, and much faster than is necessary in a hardware accelera- 
tor. We have therefore developed a "paging" architecture, that effectively "trades 
off' some of this excessive speed against increased network size. 
A "moving - patch" neural board: An array of the 5 - state synapses is currently 
being fabricated as a VLSI integrated circuit. The shift registers and the 
adder/subtractor for each synapse occupy a disappointingly large silicon area, allow- 
ing only a 3 x 9 synaptic array. To achieve a suitable size neural network from this 
array, several chips need to be included on a board with memory and control circu- 
itry. The "moving patch" concept is shown in figure 7, where a small array of 
synapses is passed over a much larger n x n synaptic array. 
Each time the array is "moved" to represent another set of synapses, new weights 
must be loaded into it. For example, the first set of weights will be Tu ... 
... T2j to Tj, the second set Tj+L to T, etc.. The final weight to be loaded will be 
581 
0000 
oooo 
n neurons -. nxn synaptic array 
Small Patch" 
moves over array 
Figure 7. The "moving patch" concept, passing a small synaptic 'oatch" over a larger 
nxn synapse array. 
T,. Static, off - the - shelf RAM is used to store the weights and the whole opera- 
tion is pipelined for maximum efficiency. Figure 8 shows the board level design for 
the network. 
Synaptic Accelerator Chips 
HOST 
Figure 8. A "moving patch" neural network board. 
The small "patch" that moves around the array to give n neurons comprises 4 VLSI 
synaptic accelerator chips to give a 6 x 18 synaptic array. The number of neurons to 
be simulated is 256 and the weights for these are stored in 0.5 Mb of RAM with a 
load time of 8ms. For each "patch" movement, the partial runninz summatin-,  
582 
calculated for each column, is stored in a separate RAM until it is required to be 
added into the next appropriate summation. The update time for the board is 3ms 
giving 2 x 10 ? operations/second. This is slower than the 64 neuron specification, 
but the network is 16 times larger, as the arithmetic elements are being used more 
efficiently. To achieve a network of greater than 256 neurons, more RAM is 
required to store the weights. The network is then slower unless a larger number of 
accelerator chips is used to give a larger moving "patch". 
6. CONCLUSIONS 
A strategy and design method has been given for the construction of bit - serial 
VLSI neural network chips and circuit boards. Bit - serial arithmetic, coupled to a 
reduced arithmetic style, enhances the level of integration possible beyond more 
conventional digital, bit - parallel schemes. The restrictions imposed on both synal> 
tic weight size and arithmetic precision by VLSI constraints have been examined 
and shown to be tolerable, using the associative memory problem as a test. 
While we believe our digital approach to represent a good compromise between 
arithmetic accuracy and circuit complexity, we acknowledge that the level of 
integration is disappoinfingly low. It is our belief that, while digital approaches 
may be interesting and useful in the medium term, essentially as hardware accelera- 
tors for neural simulations, analog techniques represent the best ultimate option in 2 
- dimensional silicon. To this end, we are currently pursuing techniques for analog 
pseudo - static memory, using standard CMOS technology. In any event, the full 
development of a nonvolatile analog memory technology, such as the MNOS tech- 
nique [7], is key to the long - term future of VLSI neural nets that can learn. 
7. ACKNOWLEDGEMENTS 
The authors acknowledge the support of the Science and Engineering Research 
Council (UK) in the execution of this work. 
References 
o 
S. Grossberg, "Some Physiological and Biochemical Consequences of Psycho- 
logical Postulates," Proc. Natl. Acad. Sci. USA, vol. 60, pp. 758 - 765, 1968. 
H. P. Graf, L. D. Jackel, R. E. Howard, B. Straughn, J. S. Denker, W. 
Hubbard, D. M. Tennant, and D. Schwartz, "VLSI Implementation of a 
Neural Network Memory with Several Hundreds of Neurons," Proc. AlP 
Conference on Neural Networks for Computing, Snowbird, pp. 182 - 187, 1986. 
W. $. Mackie, H. P. Graf, and J. S. Denker, "Microelectronic Implementa- 
tion of Connectionist Neural Network Models," IEEE Conference on Neural 
Information Processing Systems, Denver, 1987. 
J. J. Hopfield and D. W. Tank, "Neural" Computation of Decisions in Optim- 
isation Problems," Biol. Cybern., vol. 52, pp. 141 - 152, 1985. 
M. A. Sivilotti, M. A. Mahowald, and C. A. Mead, Real - Time Visual Com- 
putations Using Analog CMOS Processing Arrays, 1987. To be published 
C. A. Mead, "Networks for Real o Time Sensory Processing," IEEE Confer- 
ence on Neural Information Processing Systems, Denver, 1987. 
583 
7. J.P. Sage, K. Thompson, and R. S. Withers, "An Artificial Neural Network 
Integrated Circuit Based on MNOS/CCD Principles," Proc. AIP Conference on 
Neural Networks for Computing, Snowbird, pp. 381 - 385, 1986. 
8. S.C.J. Garth, "A Chipset for High Speed Simulation of Neural Network Sys- 
tems," IEEE Conference on Neural Networks, San Diego, 1987. 
9. A.F. Murray and A. V. W. Smith, "A Novel Computational and Signalling 
Method for VLSI Neural Networks," European Solid State Circuits Conference 
,1987. 
10. A. F. Murray and A. J. W. Smith, "Asynchronous Arithmetic for VLSI 
Neural Systems," Electronics Letters, vol. 23, no. 12, p. 642, June, 1987. 
11. A. F. Murray and A. V. W. Smith, "Asynchronous VLSI Neural Networks 
using Pulse Stream Arithmetic," IEEE Journal of Solid-State Circuits and Sys- 
tems, 1988. To be published 
12. M.E. Gaspar, "Pulsed Neural Networks: Hardware, Software and the Hop- 
field AfD Converter Example," IEEE Conference on Neural Information Pro- 
cessing Systems, Denver, 1987. 
13. M. S. McGregor, P. B. Denyet, and A. F. Murray, "A Single - Phase Clock- 
ing Scheme for CMOS VLSI," Advanced Research in VLSI: Proceedings of the 
1987 Stanford Conference, 1987. 
14. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal 
Representations by Error Propagation," Parallel Distributed Processing : 
Explorations in the Microstructure of Cognition, vol. 1, pp. 318 - 362, 1986. 
15. M. Fleisher and E. Levin, "The Hopfiled Model with Multilevel Neurons 
Models," IEEE Conference on Neural Information Processing Systems, Denver, 
1987. 
16. G. Parisi, "A Memory that Forgets," J. Phys. A : Math. Gert., vol. 19, pp. 
L617 - L620, 1986. 
