474 
OPTIMIZATION WITH ARTIFICIAL NEURAL NETWORK SYSTEMS: 
A MAPPING PRINCIPLE 
AND 
A COMPARISON TO GRADIENT BASED METHODS * 
Harrison MonFook Leong 
Research Institute for Advanced Computer Science 
NASA Ames Research Center 230-5 
Moffett Field, CA, 94035 
ABSTRACT 
General formulae for mapping optimization problems into systems of ordinary differential 
equations associated with artificial neural networks are presented. A comparison is made to optim- 
ization using gradient-search methods. The performance measure is the settling time from an initial 
state to a target state. A simple analytical example illustrates a situation where dynamical systems 
representing artificial neural network methods would settle faster than those representing gradient- 
search. Settling time was investigated for a more complicated optimization problem using com- 
puter simulations. The problem was a simplified version of a problem in medical imaging: deter- 
mining loci of cerebral activity from electromagnetic measurements at the scalp. The simulations 
showed that gradient based systems typically setfled 50 to 100 times faster than systems based on 
current neural network optimization methods. 
INTRODUCTION 
Solving optimization problems with systems of equations based on neurobiological principles 
has recently received a great deal of attention. Much of this interest began when an artificial 
neural network was devised to find near-optimal solutions to an np-complete problem t3. Since 
then, a number of problems have been mapped into the same acdficial neural network and varia- 
tions of it to.3.,.7.8.9.2L23.2, In this paper, a unifying principle underlying these mappings is 
derived for systems of first to n th-order ordinary differential equations. This mapping principle 
bears similarity to the mathematical tools used to generate optimization methods based on the gra- 
dient. In view of this, it seemed important to compare the optimization efficiency of dynamical 
systems constructed by the neural network mapping principle with dynamical systems constructed 
from the gradient. 
THE PRINCIPLE 
This paper concerns itself with networks of computational units having a state variable v, a 
function f that describes how a unit is driven by inputs, a linear ordinary differential operator with 
constant coefficients D (v) that describes the dynamical response of each unit, and a function g that 
describes how the output of a computational unit is determined from its state v. In particular, the 
paper explores how outputs of the computational units evolve with time in terms of a scalar func- 
tion E, a single state variable for the whole network. Fig. 1 summarizes the relationships between 
variables, functions, and operators associated with each computational unit. Eq. (1) summarizes the 
equations of motion for a network composed of such units: 
tY()(v ) = j'(g (vO ..... gN(vN ) ) 
where the i th element of/(t) is D(t)(v;) ' superscript (M) denotes that operator D is M tn order, 
the i tn element of f is fi(gt(vt) ..... gv(vv)), and the network is comprised of N computa- 
tional units. The network of Hopfield t2 has M=I, functions f are weighted linear sums, and func- 
tions  (where the i  element of  is gi(vi) ) are all the same sigmoJd function. We will exam- 
ine two ways of defining functions  given a function F. Along with these definitions will be 
Work supported by NASA Cooperative Agreement No. NCC 2408 
American Institute of Physics 1988 
475 
deftned corresponding functions E that will be used to describe the dynamics of Eq. (1). 
The first method corresponds to optimization methods introduced by artificial neural network 
research. It will be referred to as method V ("dell g"): 
-- 
with associated E function 
Here, 
dr,, (s) 
dt (2b) 
denotes the gradient of H, where partials are taken with respect to variables of 2?, and 
Ee denotes the E function associated with gradient operator Ve. With appropriate operator D and 
functions f and , EF is simply the "energy function" of Hopfield 2. Note that F_,q. (2a) makes 
explicit that we will only be concerned with f'" that can be derived from scalar potential functions. 
For example, this restriction excludes artificial neural networks that have connections between exci- 
tatory and inhibitory units such as that of Freeman 8. The second method corresponds to optimiza- 
tion methods based on the gradient. It will be referred to as method V v, ("dell v"): 
f = V-,F (3a) 
with associated E function 
t N 
= F() - ,. 9(U)(v:(s)) 
to that for Eqs. (2). 
dvi(s) ] dvi(s) 
dt J dt ds 
(3b) 
 computational unit i: 
transform that determines unit i's 
g;(, output from state variable vi 
/t D (v:)  differential operator specifying the 
\ - v(] function governing how inputs to 
 //'- unit i are combined to drive it 
Figure 1: Schematic of a computational unit i from which net- 
where notation is analogous 
The critical result 
that allows us to map 
optimization problems into 
networks described by Eq. 
(1) is that conditions on the 
constituents of the equation 
can be chosen so that along 
any solution trajectory, the 
E function corresponding 
to the system will be a 
monotonic function of time. 
For method Vxv, here are 
the conditions: all func- 
tions  axe 1) differentiable 
and 2) monotonic in the 
same sense. Only the first 
works considered in this paper are constructed. Trianes suggest 
condition is needed to 
make a similar assertion for connections between computational units. 
method V. When these conditions are met and when solutions of Eq. (1) exist, the dynamical sys- 
tems can he used for optimization. The appendix contains proofs for the monotonicity of function 
E along solution trajectories and references necessary existence theorems. In conclusion, mapping 
optimization problems onto dynamical systems summarized by Eq. (1) can he reduced to a matter 
of differentiation if a scalar function representation of the problem can be found and the integrals 
of Eqs. (2b) and (3b) are ignorable. This last assumption is certainly upheld for the case where 
operator D has no derivatives less than M t" order. In simulations below, it will be observed to 
hold for the case M=I with a nonzero 0 t" order derivative in D. (Also see Lapedes and Farber 9.) 
PERSPECTIVES OF RECENT WORK 
476 
The formulations above can be used to classify the neural network optimization techniques 
used in several recent studies. In these studies, the functions  were all identical. For the most 
part, following Hopfield's formulation, researchers o.3.1,.7,23. z have used method Vi to derive 
forms of Eq. (1) that exhibit the ability to find extrema of Ei with Ei quadratic in functions  and 
all functions  describable by sigrnoid functions such as tanh (x). However, several researchers 
have written about artificial neural networks associated with non-quadratic E functions. Method 
Vi has been used to derive systems capable of finding extrema of non-quadratic Ei 19. Method 
V has been used to derive systems capable of optimizing E- e where E- e were not necessarily qua- 
dratic in variables  2. A sort of hybrid of the two methods was used by leffery and Rosner a to 
find extrema of functions that were not quadratic. The important distinction is that their functions f 
were derived from a given function F using Eq. (3a) where, in addition, a sign definite diagonal 
matrix was introduced; the left side of Eq. (3a) was left multiplied by this matrix. A perspective 
on the relationship between all three methods to construct dynamical systems for optimization is 
summarized by Eq. (4) which describes the relationship between methods Vi and VN 
[as(v,) ]-, 
vf, = a,g [. (4) 
where diag [ xl ] is a diagonal matrix with xi as the diagonal element of row i. (A similar equation 
has been derived for quadratic F .) The relationship between the method of Jeffery and Rosner 
and V is simply F.q. (4) with the time dependent diagonal matrix replaced by a constant diagonal 
matrix of free parameters. It is noted that Jeffery and Rosner presented timing results that compared 
simulated annealing, conjugate-gradient, and artificial neural network methods for optimization. 
Their results axe not comparable to the results reported below since they used computation time as 
a performance measure, not settling times of analog systems. The perspective provided by F.q. (4) 
will be useful for anticipating the relative performance of methods Vi and V in the analytical 
example below and will aid in understanding the results of computer simulations. 
COMPARISON OF METHODS V AND 
When M=I and operator D has no 0 th order derivatives, method V is the basis of gradient- 
search methods of optimization. Given the long history of of such methods, it is important to know 
what possible benefits could be achieved by the relatively new optimization scheme, method Vi. 
In the following, the optimization efficiency of methods Vi,, ad V v, is compared by comparing set- 
tling times, the time required for dynamical systems described by Eq. (1) to traverse a continuous 
path to local optima. To qualify this performance measure, this study anticipates application to the 
creation of analog devices that would instantlate F.q. (1); hence, we are not interested in estimating 
the number of discrete steps that would be required to find local optima, an appropriate perfor- 
mance measure if the point was to develop new numerical methods. An analytical example will 
serve to illustrate the possibility of improvements in settling time by using method Vi instead of 
method V,. Computer simulations will be reported for more complicated problems following this 
example. 
For the analytical example, we will examine the case where all functions  are identical and 
g (v) = tanhG (v - Th ) (5) 
where G > 0 is the gain and Th is the threshold. Transforms similar to this are widely used in 
artificial neural network research. Suppose we wish to use such computational units to search a 
multi-dimensional binary solution space. We note that 
dg= O sech :O (v - Th ) (6) 
dv 
is near 0 at valid solution states (comers of a hypercube for the case of binary solution spaces). We 
see from Eq. O) that near a valid solution state, a network based on method Vi will allow compu- 
tational units to recede from incorrect states and approach correct states comparatively faster. Does 
477 
thin imply faster settling time for method V,? 
To obtain an analytical comparison of settling times, consider the case where M=I and 
operator D has no 0 tn order derivatives and 
1 Sij (tanhGvl XtanhGvj) (7) 
where matrix S is symmetric. Method V, gives network equations 
 = S tanhG (8) 
dt 
and method V, gives network equations 
d"V = diag [G sech 2Gvl ] S tanhG-v  (9) 
dt 
where tanhG-V denotes a vector with i's component tanhGv. For method V there is one stable 
point, i.e. where  = 0, at  = 0. For method V, the stable points are V' = 0 and V' V where 
V is the set of vectors with component values that are either +oo or --. Further trivialization 
allows for comparing estimates of settling times: Suppose S is diagonal. For this case, if v; = 0 is 
on the trajectory of any computational unit i for one method, v,. = 0 is on the trajectory of that unit 
for the other method; hence, a comparison of settling times can be obtained by comparing time 
estimates for a computational unit to evolve from near 0 to near an extremum or, equivalently, the 
converse. Specifically, let the interval be [o, 1-4] where 0<o<l-b and 0<b<l. For method Vv,, 
integrating velocity over time gives the estimate 
re = 
and for method V, the estimate is 
1- +in 'f2-b)--' 
(10) 
(11) 
From these estimates, method Vv will always take longer to satisfy the criterion for convergence: 
Note that only with the largest value for o, o = 1-4, is the first term of Eq. (10) zero; for any 
smaller o, this term is positive. Unfortunately, this simple analysis cannot be generalized to non- 
diagonal S. With diagonal S, all computational units operate independently. Hence, the derivation 
of -- is irrelevant with respect to convergence rates; convergence rate depends only on the diago- 
nal element of S having the smallest magnitude. In this sense, the problem is one dimensional. 
But for non-diagonal S, the problem would be, in general, multi-dimensional and, hence, the direc- 
tion of -- becomes relevant To compare settling times for non-diagonal S, computer simulations 
were done. These are described below. 
COMPUTER SIMULATIONS 
Methods 
The problem chosen for study was a much simplified version of a problem in medical imag- 
ing: Given electromagnetic field measurements taken from the human scalp, identify the location 
and magnitude of cerebral activity giving rise to the fields. This problem has received much atten- 
tion in the last 20 years 3,6.7. The problem, sufficient for our purposes here, was reduced to the 
following problem: given a few samples of the electric potential field at the surface of a spherical 
conductor wit_hin which reside several static electric dipoles, identify the dipole locations and 
moments. For this situation, there is a closed form solution for electric potential fields at the 
478 
spherical surface: 
(12) 
where  is the electric potential at the spherical conductor surface, ,,,,vt, is the location of the 
sample point ( :e denotes a vector,  the corresponding unit vector, and x the corresponding vector 
magnitude), ,. is the dipole moment of dipole i, and  is the vector from dipole i to :eja,,,, (This 
equafon can be derived from one derived by Brody, Terry, and Ideker a ). Fig. 2 facilitates pictur- 
ing these relationships. 
Figure 2: Vectors of Eq. (12). 
With this analytical solution, the problem was for- 
mulated as a least squares minimization problem where 
the variables were dipole moments. In short, the fol- 
lowing process was used: A dipole model was chosen. 
This model was used with Eq. (12) to calculate poten- 
6als at points on a sphere which covered about 60% of 
the surface. A cluster of internal locations that encom- 
passed the locations of the model was specified. The 
two optimization techniques were then required to deter- 
mine dipole moment values at cluster locations such 
that the collection of dipoles at cluster locations accu- 
rately reflected the dipole distribution specified by the 
model. 
This was to be done given only the potential values at the sample points and an initial guess of 
dipole moments at cluster locations. The optimization systems were to accomplish the task by 
m'mimizing the sum of squared differences between potentials calculated using the dipole model 
and potentials calculated using a guess of dipole moments at cluster locatious where the sum is 
taken over all sample points. Further simplifications of the problem included 
1) choosing the dipole model locations to correspond exactly to various locations of the cluster, 
2) requiring dipole model moments to .be 1, 0, or -1, and 
3) representing dipole moments at cluster locations with two bit binary numbers. 
To describe the dynamical systems used, it suffices to specify operator D and functions  of 
Eq. (1) and function F used in Eqs. (2a) and (3a). Operator D was 
d 
D = -- + 1. (13) 
dt 
Eq. (5) with a multiplicative factor of ,4 was used for all functions . Hence, regarding 
simplification 3) above, each cluster location was associated with two computational units. Consid- 
ering simplification 2) above, dipole moment magnitude 1 would be represented by both computa- 
tional units being in the high state, for -1, both in the low state, and for 0, one in the high state and 
one in the low state. Regarding function F, 
F = , mea,ured()--tt,r() -- C , g(v) 2 (14) 
all jampie all computational 
pointj j unJ'tJ j 
where ,,aJrd is calculated from the dipole model and Eq. (12) (The subscript measured is used 
because the role of the dipole model is to simulate electric potentials that would be measured in a 
real world situation. In real world situations, we do not know the source distribution underlying 
,,asr,a.), c is an experimentally determined constant (.002 was used), and ctt is Eq. (12) 
where the sum of Eq. (12) is taken over all cluster locations and the ktn coordinate of the i tn clus- 
ter location dipole moment is 
,p,.,= Y, g(v:,,). (15) 
all bitJ b 
479 
Index j of Eq. (14) corresponds to one combination of indices ikb. 
Sample points, 100 of them, were scattered semi-uniformly over the spherical surface 
emphasized by horizontal shading in Fig. 3. Cluster locations, 11, and model dipoles, 5, were scat- 
tered within the subset of the sphere emphasized by vertical shading. For the dipole model used, 
10 dipole moment components were non-zero; hence, optimization techniques needed to hold 56 
dipole moment components at zero and set 10 components to correct non-zero values in order to 
correctly identify the dipole model underlying 
0.8 
Figure 3: 111ustration of the distribution of 
sample points on the surface of the spheri- 
cal conductor (horizontal shading) and the 
distribution of model dipole locations and 
cluster locations within the conductor 
(vertical shading). 
The dynamical systems corresponding to 
methods V and V were integrated using the 
forward Euler method (e.g. Press, Flannery, 
Teukolsky, and Vettefiing 22). Numerical 
methods were observed to be convergent exper- 
imentally: settling time and path length were 
observed to asymtotically approach stable 
values as step size of the numerical integrator 
was decreased over two orders of magnitude. 
Settling times, path lengths, and relative 
directions of travel were calculated for the two 
optimization methods using several different 
initial bit patterns at the cluster locations. In 
other words, the search was staxted at different 
comers of the hypercube comprising the space 
of acceptable solutions. One comer of the 
hypercube was chosen to be the target solution. 
(Note that a zero dipole moment has a degen- 
erate two bit representation in the dynamical 
systems explored; the target comer was arbitrarily chosen to he one of the degenerate solutions.) 
Note from Eq. (5) that for the network to reach a hypercube comer, all elements of  would have 
to be singular. For this reason, setfling time and other measures 
proximity of the computational units to their extremum states. 
Computations were done on a Sequent Balance. 
5 
Results 
Graph 1 shows results for exploring settling 
time as a function of extrernum depth, the minimum of 
the deviations of variables 5* from the threshold of 
functions . Extremum depth is reported in multiples 3 I 
of the width of functions . The term transition, used 
in the caption of Graph 1 and below, refers to the  2 
movement of a computational unit from one extremum 
state to the other. The calculations were done for two 
initial states, one where the output of 1 computational 
unit was set to zero and one where outputs of 13 com- 
putational units were set to zero; hence, 1 and 13, 0 
respectively, half transitions were required to reach 
the target hypercube comer. It can be observed that 
settling time increases faster for method Vv than that 
for method V, just as we would expect from consid- 
ering Eqs. (4) and (5). However, it can be observed 
that method Vw is still an order of magnitude faster 
even when extremum depth is 3 widths of functions 
. For the purpose of unambiguously identifying 
what hypercube comer the dynamical system settles 
were studied as a function of the 
0 I 2 3 
exi.emum depth 
Graph 1: settling time as a function of 
extremum depth. #: method V,, 1 half 
transition requited. *: method V, 13 
half transitions requixed. +: method 
Vv,, 1 half transition requixed. -: 
13 half transitions required. 
48O 
to, this extremum depth is more than adequate. 
Table 1 displays results for various initial conditions. Angles are reported in degrees. These 
measures refer to the angle between directions of travel in -space as specified by the two optimi- 
zation methods. The average angle reported is taken over all trajectory points visited by the numer- 
ical integrator. Irdfal angle is the angle at the beginning of the path. Parasite cost percentage is a 
measure that compares parasite cost, the integral in F. qs. (2b) and (3b), to the range of function F 
over the path: 
parasite cost % = 100x parasite cost 
I F. tit - Fiiti,,t I (16) 
transitions time relative path inifal Mean angle extremum parasite 
required time length angle (std dev) depth cost % 
I 0.16 100 6.1 68 76 (3.8) 2.3 0.22 
0.0016 1.9 76 (3.5) 2.3 0.039 
2 0.14 78 4.7 75 72 (4.3) 2.5 0.055 
0.0018 1.9 73 (4.1) 2.5 0.016 
3 0.15 71 4.7 74 71 (3.7) 2.3 0.051 
0.0021 2.1 72 (3.0) 2.5 0.0093 
7 0.19 59 4.6 63 69 (4.1) 2.4 0.058 
0.0032 2.4 71 (7.0) 2.7 0.0033 
10 0.17 49 3.8 60 63 (2.8) 2.5 0.030 
0.0035 2.5 64 (4.7) 2.8 0.00060 
13 0.80 110 9.2 39 77 (11) 2.3 0.076 
0.0074 3.2 71 (8.9) 2.7 0.0028 
Table 1: Settling time and other measurements for various required transitions. For 
each transition case, the upper row is for V and the lower row is for V. Std der 
denotes standard deviation. See text for definition of measurement terms and units. 
Noting the differences in path length and angles reported, it is clear that the path taken to the target 
hypercube comer was quite different for the two methods. Method Vv settles from 1 to 2 orders of 
magnitude faster than method V, and usually takes a path less than half as long. These relation- 
ships did not change significantly for different values for c of Eq. (14) and coefficients of Eq. (13) 
(both unity in Eq. (13)). Values used favored method Vi. Parasite cost is consistently less 
significant for method V,, and is quite small for both methods. 
To further compare the ability of the optimization methods to solve the brain imaging prob- 
lem, a large variety of initial hypercube comers were tested. Table 2 displays results that suggest 
the ability of each method to locate the target comer or to converge to a solution that was con- 
sistent with the dipole model. Initial comers were chosen by randomly selecting a number of com- 
putational units and setting them to extremum states opposite to that required by the target solution. 
Five cases were run for each case of required transitions. It can be observed that the system based 
on method V v, is better at finding the target comer and is much better at finding a solution that is 
consistent with the dipole model. 
DISCUSSION 
The simulation results seem to contradict settling time predictions of the second analytical 
example. It is intuitively clear that there is no contradiction when considering the analytical exam- 
ple as a one dimensional search and the simulations as multi-dimensional searches. Consider Fig. 4 
which ilhstmtes one dimensional search starting at point I. Since both optimizafon methods must 
decrease function E monotonically, both must head along the same path to the minimum point A. 
Now consider Fig. 5 which illustrates a two dimensional search starting at point I: Here, the two 
methods needn't follow the same paths. The two dashed paths suggest that method V 1, can still be 
481 
transitions I V, 
required ferent dipole !different target different dipole different target 
solution comer comer,. solution comer comer 
3 I 0 4 0 0 5 
4 1 ! 3 0 1 4 
6 2 1 2 0 1 [ 4 
13 5 0 0 I 3 J 1 
20 5 0 0 0 5 I 0 
-' 26 5 0 0 2 3 I 0 
33 5 I 0 0 3 2t0 
40 5 0 0 3 { 2 i 0 
i o o I 2 !3t.o 
I o o I I L,o t 
Table 2: Solutions found starting from various initial conditions, five cases for each 
transition case. Different dipole solution indicates that the system assigned non-zero 
dipole moments at cluster locations that did not correspond to locations of the dipole 
model sources. Different corner indicates the solution was consistent with the dipole 
model but was not the target hypercube comer. Target corner indicates that the solu- 
tion was the target solution. 
monotonically decreasing E while traversing a more circui- 
tous route to minimum B or traversing a path to minimum 
A. The longer path lengths reported in Table 1 for method 
V, suggest the occurrence of the former. The data of Table 
2 verifies the occurrence of the latter: Note that for many 
cases where the system based on method Ve settled to the 
target comer, the system based on method V, settled to some 
other minimum. 
Would we observe similar differences in optimization 
efficiency for other optimization problems that also have 
binary solution spaces.'? A view that supports the plausibility 
of the affirmative is the following: Consider Eq. (4) and Eq. 
(5). We have already made the observation that method V,, 
would slow convergence into extrema of functions . We 
have observed this experimentally via Graph 1. These obser- 
vations suggest that computational units of V systems 
tend to stay closer to the transition regions of functions  
compared to computational units of Vi systems. It seems 
plausible that this property may allow Vv systems to avoid 
advancing too deeply toward ineffective solutions and, hence, 
allow the systems to approach effective solutions more 
efficiently. This behavior might also be the explanation for 
the comparative success of method Vv revealed in Table 2. 
E 
v 
Figure 4: One dimensional search 
for minima. 
v x t -- 
Ytgure 5: Two dimensional search 
for re'raima. 
Regarding the construction of electronic circuitry to instantiate .Eq. (1), systems based on 
method V v, would require the introduction of a component implementing multiplication by the 
derivative of functions . This additional complexity may hinder the use of method V v, for the 
482 
(a) 
Input 
? 
Ourput 
Input 
Ourput 
Figure 6: Schematized circuits for a com- 
putational unit. Notation is consistent 
with Horowitz and Hill . Shading of 
amplifiers is to earmark components 
referred to in the text. a) Computational 
unit for method Vr. b) Computational 
unit for method V v. 
construction of analog circuits for optimization. To 
illustrate the extent of this additional complexity, Fig. 
6a shows a schematized circuit for a computational 
unit of method V and Fig. 6b shows a schematized 
circuit for a computational unit of method Vv The 
simulations reported above suggest that there may be 
problems for which improvements in settling time 
may offset complications that might come with added 
circuit complexity. 
On the problem of imaging cerebral activity, the 
restfits above suggest the possibility of constructing 
analog devices to do the job. Consider the problem of 
analyzing electric potentials from the scalp of one per- 
son: It is noted that the measured electric potentials, 
measured, appear as linear coefficients in F of Eq. 
(lz0; hence, they would appear as constant terms in  
of Eq. (1). Thus, ,a,,r,d would be implemented as 
amplifier biases in the circuits of Figs. 6. This is a 
significant benefit. To understand this, note that func- 
tion f,. of Fig. 1 corresponding to the optimization of 
function F of Eq. (14) would involve a weighted 
linear sum of inputs gi(vi) ..... gtv(vtv). The weights 
would be the nonlinear coefficients of Eq. (lz0 and correspond to the strengths of the connections 
shown in Fig. 1. These connection strengths need only be calculated once for the person and can 
then be set in hardware using, for example, a resistor network. Electric potential measurements 
could then be analyzed by simply using the measurements to bias the input to shaded amplifiers of 
Figs. 6. For initialization, the system can be initialized with all dipole moments at zero (the 10 
transition case in Table 1). This is a reasonable first guess if it is assumed that cluster locations are 
far denser than the loci of cerebral activity to be observed. For subsequent measurements, the solu- 
tion for immediately preceding measurements would be a reasonable initial state if it is assumed 
that cerebral activity of interest waxes and wanes continuously. 
Might non-invasive real time imaging of cerebral activity be possible using such optimization 
devices? Results of this study are far from adequate for answering this question. Many complexi- 
ties that have been avoided may nullify the practicality of the idea. Among these problems are: 
1) The experiment avoided the possibility of dipole sources actually occurring at locations other 
than cluster locations. The minimization of function F of Eq. (lZ) may circumvent this 
problem by employing the superposition of dipole moments at neighboring cluster locations 
to give a sufficient model in the mean. 
2) The experiment assumed a very restricted range of dipole strengths. This might be dealt 
with by increasing the number of bits used to represent dipole moments. 
3) The conductor model, a homogeneously conducting sphere, may not be sufficient to model 
the human head 6. Non-sphericity and major inhomogeneities in conductivity can be dealt 
with, to a certain extent, by replacing Eq. (12) with a generalized equation based on a 
numerical approximation of a boundary integral equation 20 
z) The cerebral activity of interest may not be observable at the scalp. 
5) Not all forms of cerebral activity give rise to dipolar sources. (For example, this is well 
known in olfactory cortex s.) 
6) Activity of interest may be overwhelmed by irrelevant activity. Many methods have been 
devised to contend with this problem (For example, Gevins and Morgan 9.) 
Clearly, much theoretical work is left to be done. 
CONCLUDING REMARKS 
483 
In this study, the mapping principle underlying the application of artificial neural networks to 
the optimization of multi-dimensional scalar functions has been stated explicitly. Hopfield 2 has 
shown that for some scalar functions, i.e. functions F quadratic in functions , this mapping can 
lead to dynamical systems that can be easily implemented in hardware, notably, hardware that 
requires electronic components common to semiconductor technology. Here, mapping principles 
that have been known for a considerably longer period of time, those underlying gradient based 
optimization, have been shown capable of leading to dynamical systems that can also be imple- 
mented using semiconductor hardware. A problem in medical imaging which requires the search of 
a multi-dimensional surface full of local extrema has suggested the superiority of the latter mapping 
principle with respect to settling time of the corresponding dynamical system. This advantage may 
be quite significant when searching for global extrema using techniques such as iterated descent 2 
or iterated genetic hill climbing  where many searches for local extrema are required. This advan- 
tage is further emphasized by the brain imaging problem: volumes of measurements can be 
analyzed without reconfiguring the interconnections between computational units; hence, the cost of 
developing problem specific hardware for finding local extrema may be justifiable. Finally, simula- 
tions have contributed plausibility to a possible scheme for non-invasively imaging cerebral 
activity. 
APPENDIX 
To show that for a dynamical system based on method Vi, E is a monotonic function of 
time given that all functions  are differentiable and monotonic in the same sense, we need to 
show that the derivative of E, with respect to time is semi-definite: 
dE_,,i)F,dgi  D(t)(v,. dvildg , 
 3gi dt ,. )- dt 
dt 
Substituting Eq. (2a), 
dE, -D (st)(vi) + 
dt  
Using Eq. (1), 
(^la) 
(Alb) 
(Ale) 
as needed. The appropriate inequality depends on the sense in which functions  are monotonic. 
In a similar manner, the result can be obtained for method V With the condition that functions  
axe differentiable, we can show that the derivative of E-- e is semi-definite: 
dE e t 3Fv, dvi (t)(vg)_ -- dt 
dt  Ovl dt i 
Using Eqs. (3a) and (1), 
dE . ['dvi12> 
(A2b) 
as needed. 
In order to use the results derived above to conclude that Eq. (1) can be used for optimiza- 
6on of functions E- e and E, in the vicinity of some point o, we need to show that there exists a 
neighborhood of o in which there exist solution trajectories to Eq. (1). The necessary existence 
theorems and transformations of Eq. (1) needed in order to apply the theorems can be found in 
many texts on ordinary differential equations; e.g. Guckenheimer and Hoimes u. Here, it is mainly 
important to state that the theorems require that functions .eC ), functions  are differentiable, 
and initial conditions axe specified for all derivatives of lower order than M. 
484 
ACKNOWLEDGEMENTS 
I would like to thank Dr. Michael Raugh and Dr. Pen Kanerva for constructive criticism 
and support. I would like to thank Bill Baird and Dr. James Keeler for reviewing this work. I 
would like to thank Dr. Derek Fender, Dr. John Hopfield, and Dr. Stanley Klein for giving me 
opportunities that fostered this conglomeration of ideas. 
REFERENCES 
[1] Acldey D.H., "Stochastic iterated genetic hill climbing", Phi3. dissertation, Camegie Mellon 
U., 1987. 
[2] Baum E., Neural Networks for Computing, ed. Denker J.S. (AIP Confmc. Proc. 151, ed. 
Lerner R.G.), p53-58, 1986. 
[3] Brody D.A., IEEE Trans. vBME-32, n2, p106-110, 1968. 
[4] Brody D.A., Terry F.H., Ideker R.E., IEEE Trans. vBME-20, p141-143, 1973. 
[5] Cohen M.A., Grossberg S., IEEE Trans. vSMC-13, p815-826, 1983. 
[6] Cuffm B.N., IEEE Trans. vBME-33, n9, p854-861, 1986. 
[7] Darcey T.M., Ary J.P., Fender D.H., Prog. Brain Res., v54, p128-134, 1980. 
[8] Freeman WJ., "Mass Action in the Nervous System", Academic Press, Inc., 1975. 
[9] G-evins A.S., Morgan N.H., IEEE Trans., vBME-33, n12, p1051068, 1986. 
[10] Goles E., Vichniac G.Y., Neural Networks for Computing, ed. Denker J.S. (AIP Confrnc. 
Proc. 151, ed. Lerner R.G.), p165-181, 1986. 
[11] Guckenheimer J., Holmes P., "Nonlinear Oscillations, Dynamical Systems, and Bifurcations 
of Vector Fields", Springer Verlag, 1983. 
[12] Hopfield J.J., Proc. Natl. Acad. Sci., v81, p3088-3092, 1984. 
[13] Hopfield J.J., Tank D.W., Bio. Cybrn., v52, p141-152, 1985. 
[14] Hopfield J.J., Tank D.W., Science, v233, n4764, p625-633, 1986. 
[15] Horowitz P., Hill W., "The art of electronics", Cambridge U. Press, 1983. 
[16] Hosek R.S., Sances A., Jodat R.W., Larson S.J., IFEE Trans., vBME-25, n5, IM05-413, 1978. 
[17] Hutchinson J.M., Koch C., Neural Networks for Computing, ed. Derricer J.S. (AIP Confrnc. 
Proc. 151, ed. Lerner R.G.), p235-240, 1986. 
[18] Jeffery W., Rosner R., Astrophys. J., v310, p473-481, 1986. 
[19] Lapedes A., Farber R., Neural Networks for Computing, ed. Denker J.S. (AIP Confmc. Proc. 
151, ed. Lerner R.G.), p283-298, 1986. 
[20] Leong H.M.F., "Frequency dependence of electromagnetic fields: models appropriate for the 
brain", PhD. dissertation, California Institute of Technology, 1986. 
[21] Platt J.C., Hopfield J.J., Neural Networks for Computing, ed. Derricer J.S. (AIP Confmc. Proc. 
151, ed. Lerner R.G.), p364-369, 1986. 
[22] Press W.H., Flannery B.P., Teukolsky S.A., Vetterling W.T., "Numerical Recipes", Cam- 
bridge U. Press, 1986. 
[23] Takeda M., Goodman J.W., Applied Optics, v25, nlS, p3033-3046, 1986. 
[24] Tank D.W., Hopfield J.J., "Neural computation by concentrating information in time", pre- 
print, 1987. 
