A Lagrangian Formulation For 
Optical Backpropagation Training In 
Kerr-Type Optical Networks 
James E. Steck 
Mechanical Engineering 
Wichita State University 
Wichita, KS 67260-0035 
Steven R. Skinner 
Electrical Engineering 
Wichita State University 
Wichita, KS 67260-0044 
Aivaro A. Cruz-Cabrara 
Electrical Engineering 
Wichita State University 
Wichita, KS 67260-0044 
Elizabeth C. Behrman 
Physics Department 
Wichita State University 
Wichita, KS 67260-0032 
Abstract 
A training method based on a form of continuous spatially distributed 
optical error back-propagation is presented for an aH optical network 
composed of nondiscrete neurons and weighted interconnections. The all 
optical network is feed-forward and is composed of thin layers of a Kerr- 
type self focusing/defocusing nonlinear optical material. The training 
method is derived from a Lagrangian formulation of the constrained 
minimization of the network error at the output. This leads to a 
formulation that describes training as a calculation of the distributed error 
of the optical signal at the output which is then reflected back through the 
device to assign a spatially distributed error to the internal layers. This 
error is then used to modify the internal weighting values. Results from 
several computer simulations of the training are presented, and a simple 
optical table demonstration of the network is discussed. 
772 Elizabeth C. Behrman 
I KERR TYPE MATERIALS 
Kerr-type optical networks utilize thin layers of Kerr-type nonlinear materials, in which the 
index of refraction can vary within the material and depends on the amount of light striking 
the material at a given location. The material index of refraction can be described by: 
n(x)=n0+n2I(x), where no is the linear index of refraction, n: is the nonlinear coefficient, and 
I(x) is the irradiance of a applied optical field as a function of position x across the material 
layer (Armstrong, 1962). This means that a beam of light (a signal beam carrying 
information perhaps) passing through a layer of Kerr-type material can be steered or 
controlled by another beam of light which applies a spatially varying pattern of intensity 
onto the Kerr-type material. Steering of light with a glass lens (having constance index of 
refraction) is done by varying the thickness of the lens (the amount of material present) as 
a function of position. Thus the Kerr effect can be loosely thought of as a glass lens whose 
geometry and therefore focusing ability could be dynamically controlled as a function of 
position across the lens. Steering in the Kerr matehal is accomplished by a gradient or 
change in the matehal index of refraction which is created by a gradient in applied light 
intensity. This is illustrated by the simple experiment in Figure 1 where a small weak probe 
beam is steered away from a straight path by the intensity gradient of a more powerful pump 
beam. 
I(x) 
mp 
x Probe 
n2<0 
Figure 1: Light Steering In Kerr Materials 
2 OPTICAL NETWORKS USING KERR MATERIALS 
The Kerr optical network, shown in Figure 2, is made up of thin layers of the Kerr- type 
nonlinear medium separated by thick layers of a linear medium (free space) (Skinner, 1995). 
The signal beam to be processed propagates optically in a direction z perpendicular to the 
layers, from an input layer through several alternating linear and nonlinear layers to an output 
layer. The Kerr material layers perform the nonlinear processing and the linear layers serve 
as connection layers. The input (I(x)) and the weights (Wl(x),W2(x) ... Wn(x)) are irradiance 
fields applied to the Kerr type layers, as functions of lateral position x, thus varying the 
A Lagrangian Formulation for Optical Backpropagation 773 
refractive index profile of the nonlinear medium. Basically, the applied weight irradiences 
steer the signal beam via the Kerr effect discussed above to produce the correct output. The 
advantage of this type of optical network is that both neuron processing and weighted 
connections are achieved by uniform layers of the Kerr material. The all optical nature 
eliminates the need to physically construct neurons and connections on an individual basis. 
I(x,y) W. (x,y) Wn(x,y) 
, 
Figure 2: Kerr Optical Neural Network Architecture 
If Ei(e ) is the light entering the i th nonlinear layer at lateral position e, then the effect of the 
nonlinear layer is given by 
O) 
where W(a) is the applied weight field. Transmission of light at lateral location  at the 
beginning of the i t linear layer to location 13 just before the i+l t nonlinear layer is given by 
E,.l([3)_ J2, F(a) eJC'O-')2do where C,- (2) 
 2AL. 
3 OPTICAL BACK-PROPAGATION TRAINING 
Traditional feed-forward artificial neural networks composed of afinite number of discrete 
neurons and weighted connections can be trained by many techniques. Some of the most 
successful techniques are based upon the well known training method called back- 
propagation which results from minimizing the network output error, with respect to the 
network weights by a gradient descent algorithm. The optical network is trained using a 
form of continuous optical back-propagation which is developed for a nondiscrete network. 
Gradient descent is applied to minimize the error over the entire output region of the optical 
network. This error is a continuous distribution of error calculated over the output region. 
774 Elizabeth C. Behrman 
Optical back-propagation is a specific technique by which this error distribution is optically 
propagated backward through the linear and nonlinear optical layers to produce error signals 
by which the light applied to the nonlinear layers is modified. Recall that this applied light 
W i controls what serves as connection "weights" in the optical network. Optical back- 
propagation minimizes the error Lo over an output region rio, a subdomain of the final or n th 
layer of the network, 
ro - ---[ o- o l 2 
2 2 
where 
subject to the constraint that the propagated light, Ei(a), satisfies the equations of forward 
propagation (1) and (2). 003) = E,+l([t) and is the network output,  is a scaling factor on 
the output intensity. L0 then is the squared error between the desired output value D and the 
average intensity I0 of the output distribution O(3). 
This constrained minimization problem is posed in a Lagrange formulation similar to the 
work of (le Cunn, 1988) for conventional feedforward networks and (Pineda, 1987) for 
conventional recurrent networks; the difference being that for the optical network of this 
paper the Electric field E and the Lagrange multiplier are complex and also continuous in the 
spatial variable thus requiring the Lagrangian below. A Lagrangian is defined as; 
L= /+  f.. (a)[ Ei+l({Z ) - fF(f3)--Jc'(o .)a d[3]  
/=1 
i=1 
(4) 
Taking the variation of L with respect to Ei, the Lagrange multipliers Ji, and using gradient 
descent to minimize L with respect to the applied weight fields Wi gives a set of equations 
that amount to calculating the error at the output and propagating the error optically 
backwards through the network. The pertinent results are given below. The distributed 
assignment of error on the output field is calculated by 
X..l() = L o -() [ O - t 0 ] (S) 
This error is then propagated back through the n th or final linear optical layer by the equation 
jc. foX..l() 
8n([5 ) - e (6) 
which is used to update the "weight" light applied to the n th nonlinear layer. Optical back- 
propagation, through the i th nonlinear layer (giving .i(13)) followed by the linear layer 
(giving 8i4([3)) is performed according to the equations 
A Lagrangian Formulation for Optical Backpropagation 775 
+ ,ap2;'() 2  8) E,() e 
8,-1( [ ) jCi_t fq .(;x) e -.C,q(ll 
- do{ 
This gives the error signal 8i. 1([ ) used to update the "weight" light distribution Wi.([3) 
applied to the i-1 nonlinear layer. The "weights" are updated based upon these errors 
according to the gradient descent rule 
+tlO)koANLin2}Y,O) 2 /M[ E,.(0 ) 6,(0) e 
(s) 
where rh(13 ) is a learning rate which can be, but usually is not a function of layer number i 
and spatial position [3. Figure 3 shows the optical network (thick linear layers and thin 
nonlinear layers) with the uniform plane wave E0, the input signal distribution I, forward 
propagation signals E l E 2 ... E,, the weighting light distributions at the nonlinear layers W 
W 2 ... W,. Also shown are the error signal 3,,+ at the output and the back-propagated error 
signals 8, ... 82 8 for updating the nonlinear layers. Common nonlinear materials exist for 
which the material constants are such that the second term in the first of Equations 7 
becomes small. Ignoring this second term gives an approximate form of optical back- 
propagation which amounts to calculating the error at the output of the network and then 
reversing its direction to optically propagate this error backward through the device. 
This can be easily seen by comparing Equations 6 and 7 (with the second term dropped) for 
optical back-propagation of the output error 3., with Equations 1 and 2 for the forward 
propagation of the signal Ei. This means that the optical back-propagation training 
calculations potentially can be implemented in the same physical device as the forward 
network calculations. Equation (8) then becomes; 
())' ] 
which may be able to be implemented optically. 
4 SIMULATION RESULTS 
To prove feasibility, the network was then trained and tested on several benchmark 
classification problems, two of which are discussed here. More details on these and other 
simulations of the optical network can be found in (Skinner, 1995). In the first (Using 
Nworks, 1991), iris species were classified into one of three categories: Setosa, 
Versicolor or Virginica. Classification was based upon length and width of the sepals and 
776 Elizabeth C. Behrman 
petals. The network consisted of an input self-defocusing layer with an applied irradiance 
field which was divided into 4 separate Gaussian distributed input regions 25 microns in 
width followed by a linear layer. This pattern is repeated for 4 more groups composed 
of a nonlinear layer (with applied weights) followed by a linear layer. The final linear 
layer has three separate output regions 10 microns wide for binary classification as to 
species. The nonlinear layers were all 20 microns thick with n2=-.05 and the linear 
layers were 100 microns thick. The wavelength of applied light was 1 micron and the 
width of the network was 512 microns discretized into 512 pixels. This network was 
trained on a set of 50 training pairs to produce correct classification of all 50 training 
pairs. The network was then used to classify 50 additional pairs of test data which were 
not used in the training phase. 
accuracy level which is 
comparable to a 
standard feed forward 
network with discrete 
sigmoidal neurons. 
In the second problem, 
we tested the 
performance of the 
network on a set of 
data from a dolphin 
sonar discrimination 
experiment (Roitblat, 
1991). In this study a 
dolphin was presented 
with one of three 
different types of 
objects (a tube, a 
sphere, and a cone), 
allowed to echolocate, 
and rewarded for 
choosing the correct 
one from a comparison 
array. The Fourier 
transforms of his click 
echoes, in the form of 
average amplitudes in 
each of 30 frequency 
bin% were then used as 
inputs for a neural 
network. Nine 
nonlinear layers were 
used along with 30 
input regions and 3 
The network classified 46 of these correctly for a 92 % 
output plane 
output, region 
Wn(x,Y) 
Wi(x,Y) 
I I(x,y) 
1% 
Plane Wave 
Figure 3: Optical Network Forward Data and Backward Error 
Data Flow 
A Lagrangian Formulation for Optical Backpropagation 777 
output regions, the remainder of the network physical parameters were the same as above 
for the iris classification. Half the data (13 sets of clicks) was used to train the network, 
with the other half of the data (14 sets) used to test the training. After training, 
classification of the test data set was 100 % correct. 
5 EXPERIMENTAL RESULTS 
As a proof of the concept, the optical neural network was constructed in the laboratory to be 
trained to perform various logic functions. Two thermal self-defocusing layers were used, 
one for the input and the other for a single layer of weighting. The nonlinear coefficient of 
the index of refraction (rh) was measured to be -3x10 '4 cm2/W . The nonlinear layers had a 
thickness (ANL 0 and ANLi) of 630tm and were separated by a distance (AL0) of 15cm. The 
output region was 100tm wide and placed 15cm (ALl) behind the weighting layer. The 
experiment used HeNe laser light to provide the input plane wave and the input and 
weighting h-radiances. The spatial profiles of the input and weighting layers were realized 
by imaging a LCD spatial light modulator onto the respective nonlinear layers. The inputs 
were two bright or dark regions on a Gaussian input beam producing the intensity profile: 
I(x) = I o e -*/:)2 [1 +Q orect((x+ x o YKl) I 1 +Qrect ((x -xoYK 1 )] 
where Io = 12.5 mW/cm 2, K o = 900tm, x o = 600tm, K l = 400tm, and Q0 and Ql are the logic 
inputs taking on a value of zero or one. The weight profile Wl(X) = I0exp[-(x/K0)2][l+w(x)] 
where Wl(X ) can range from zero to one and is found through training using an algorithm 
which probed the weighting mask in order update the training weights. Table 1 shows the 
experimental results for three different logic gates. Given is the normalized output before 
and after training. The network was trained to recognize a logic zero for a normalized output 
< 0.9 and a logic one or a normalized output > 1.1. An output value greater than 1 is 
considered a logic one and an output value less than one is a logic zero. RME is the root 
mean error. 
6 CONCLUSIONS 
Work is in progress to improve the logic gate results by increasing the power of 
propagating signal beam as well as both the input and weighting beams. This will 
effectively increase the nonlinear processing capability of the network since a higher power 
produces more nonlinear effect. Also, more power will allow expansion of all of the beams 
thereby increasing the effective resolution of the thermal materials. Thisreduces the effect 
of heat transfer within the material which tends to wash out or diffuse benificial steep 
gradients in temperature which are what produce the gradients in the index of refraction. In 
addition, the use of photorefractive crystals for optical weight storage shows promise for 
being able to optically phase conjugate and backpropagate the output era-or as well as 
implement the weight update rule for all optical network training. This appears to be simpler 
than optical networks using volme hologram weight storage because the Kerr network 
requires only planar hologram storage. 
778 Elizabeth C. Behrman 
Inputs 0 0 0 1 1 0 1 1 RME 
Start 1.001 .802 .698 .807 7.3% 
Finish 1.110 .884 .772 .896 0 
NOR 
Change .109 .082 .074 .089 -7.3% 
Output i 0 0 0 
Start .998 1.092 1.148 1.440 16.4% 
Finish .757 .855 .894 1.124 0 
Change -.241 -.237 -.254 -.316 -16.4% 
Output 0 0 0 1 
Start .998 .880 .893 .994 7.3% 
Finish 1.084 .933 .928 1.073 2.7% 
XNOR 
Change .086 .053 .035 .079 -4.6% 
Output i 0 0 i 
Table 1: Preliminary Experimental Logic Gate Results 
References 
Armstrong, J.A., Bloembergen, N., Ducuing, J., and Pershan, P.S., (1962) "Interactions 
Between Light Waves in a Nonlinear Dielectric", Physical Review, Vol. 127, pp. 1918-1939. 
le Cun, Yann, (1988) "A Theoretical Framework for Back-Propagation", Proceedings of the 
1988 Connectionist Models Summer School, Morgan Kaufinann, pp. 21-28. 
Pineda, F.J., (1987) "Generalization of backpropagation to recurrent and higher order neural 
networks", Proceedings of IEEE Conference on Neural information Processing Systems, 
November 1987, IEEE Press. 
Roitblat., Moore, Nachtigall, and Penner, (1991) "atural dolphin echo recognition 
using an integrator gateway network,"in Advances in Neural Processing Systems 3 
Morgan Kaufmann, San Mateo, CA, 273-281. 
Skinner, S.R., Steck, J.E., Behrman, E.C., (1995) "An Optical Neural Network Using Kerr 
Type Nonlinear Materials", To Appear in Applied Optics. 
"Using Nworks, (1991)An Extended Tutorial for NeuralWorks Professional II/Plus and 
NeuralWorks Explorer, NeuralWare, Inc. Pittsburgh, PA, pg. UN- 18. 
