Computation of Heading Direction From 
Optic Flow in Visual Cortex 
Markus Lappe* Josef P. Rauschecker
Laboratory of Neurophysiology, NIMH, Poolesville, MD, U.S.A. and
Max-Planck-Institut für Biologische Kybernetik, Tübingen, Germany
Abstract 
We have designed a neural network which detects the direction of ego- 
motion from optic flow in the presence of eye movements (Lappe and 
Rauschecker, 1993). The performance of the network is consistent with 
human psychophysical c!t_a and its output neurons show great similarity 
to "triple component" cells in area MSTd of monkey visual cortex. We 
now show that by using assumptions about the kind of eye movements 
that the observer is likely to perform, our model can generate various 
other cell types found in MSTd as well. 
1 INTRODUCTION 
Following the ideas of Gibson in the 1950s, a number of studies in human psychophysics
have demonstrated that optic flow can be used effectively for navigation in space (Rieger and
Toet, 1985; Stone and Perrone, 1991; Warren et al., 1988). In the search for the neural basis of
optic flow processing, an area in the cat's extrastriate visual cortex (PMLS) was described
as having a centrifugal organization of neuronal direction preferences, which suggested
an involvement of area PMLS in the processing of expanding flow fields (Rauschecker
et al., 1987; Brenner and Rauschecker, 1990). Recently, neurons in the dorsal part of
the medial superior temporal area (MSTd) in monkeys have been described that respond
to various combinations of large expanding/contracting, rotating, or shifting dot patterns 
(Duffy and Wurtz, 1991; Tanaka and Saito, 1989). Cells in MSTd show a continuum 
of response properties ranging from selectivity for only one movement pattern ("single 
*Present address: Neurobiologie, ND7, Ruhr-Universität Bochum, 4630 Bochum, Germany.
component cells") to selectivity for one mode of each of the three movement types ("triple 
component cells"). An interesting property of many MSTd cells is their position invariance
(Andersen et al., 1990). A sizable proportion of cells, however, do change their selectivity
when the stimulus is displaced by several tens of degrees of visual angle, and their position
dependence seems to be correlated with the type of movement selectivity (Duffy and Wurtz,
1991; Orban et al., 1992): it is most common for triple component cells and occurs least
often in single component cells. Taken together, the wide range of directional tuning and
the apparent lack of specificity for the spatial position of a stimulus seem to suggest that
MSTd cells do not possess the selectivity needed to explain the high accuracy of human
observers in psychophysical experiments. Our simulation results, however, demonstrate
that a population encoding can be used, in which individual neurons are rather broadly
tuned while the whole network gives very accurate results.
2 THE NETWORK MODEL 
The major projections to area MST originate from the middle temporal area (MT). Area 
MT is a well known area of monkey cortex specialized for the processing of visual motion. 
It contains a retinotopic representation of local movement directions (Allman and Kaas, 
1971; Maunsell and Van Essen, 1983). In our model we assume that area MT comprises a 
population encoding of the optic flow and that area MST uses this input from MT to extract 
the heading direction. Therefore, the network consists of two layers. In the first layer, 300 
optic flow vectors at random locations within 50 degrees of eccentricity are represented.
Each flow vector is encoded by a population of directionally selective neurons. It has been
shown previously that a biologically plausible population encoding like this can also be
modelled by a neural network (Wang et al., 1989). For simplicity we use only four neurons
to represent an optic flow vector $\boldsymbol{\theta}_i$ as

$$\boldsymbol{\theta}_i = \sum_{k=1}^{4} s_{ik}\,\mathbf{e}_{ik} \tag{1}$$

with equally spaced preferred directions $\mathbf{e}_{ik} = (\cos(\pi k/2), \sin(\pi k/2))^t$. A neuron's
response to a flow vector of direction $\phi_i$ and speed $\theta_i = \|\boldsymbol{\theta}_i\|$ is given by the tuning curve

$$s_{ik} = \begin{cases} \theta_i \cos(\phi_i - \pi k/2) & \text{if } \cos(\phi_i - \pi k/2) > 0 \\ 0 & \text{otherwise.} \end{cases}$$
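The encoding of eq. (1) and the rectified cosine tuning curve can be illustrated with a minimal sketch. The function names `encode` and `decode` and the example flow vector are illustrative assumptions, not part of the model description; the sketch only shows that four half-wave rectified cosine responses are enough to recover a two-dimensional flow vector.

```python
import math

def preferred_direction(k):
    """Unit vector e_k = (cos(pi*k/2), sin(pi*k/2)) for k = 1..4."""
    angle = math.pi * k / 2.0
    return (math.cos(angle), math.sin(angle))

def encode(flow):
    """Responses s_k of the four neurons to a 2D flow vector (rectified cosine tuning)."""
    speed = math.hypot(flow[0], flow[1])
    direction = math.atan2(flow[1], flow[0])
    responses = []
    for k in range(1, 5):
        c = math.cos(direction - math.pi * k / 2.0)
        responses.append(speed * c if c > 0.0 else 0.0)
    return responses

def decode(responses):
    """Reconstruct the flow vector as sum_k s_k e_k, as in eq. (1)."""
    x = sum(s * preferred_direction(k)[0] for k, s in enumerate(responses, start=1))
    y = sum(s * preferred_direction(k)[1] for k, s in enumerate(responses, start=1))
    return (x, y)

flow = (0.3, -1.2)          # hypothetical example flow vector
s = encode(flow)
print(s)                    # at most two of the four neurons are active
print(decode(s))            # reproduces the input up to rounding
```

Because the preferred directions are 90 degrees apart, at most two neighbouring neurons respond to any flow vector, and their responses are the projections of the vector onto the two nearest preferred directions.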
The second layer contains a retinotopic grid of possible translational heading directions $\mathbf{T}_j$.
Each direction is represented by a population of neurons, whose summed activities give the
likelihood that $\mathbf{T}_j$ is the correct heading. The perceived direction is finally chosen to be
the one that has the highest population activity.
The calculation of this likelihood is based on the subspace algorithm by Heeger and Jepson
(1992). It employs the minimization of a residual function over all possible heading
directions. The neuronal populations in the second layer evaluate a related function that
is maximal for the correct heading. The subspace algorithm works as follows: When
an observer moves through a static environment all points in space share the same six
motion parameters, the translation $\mathbf{T} = (T_x, T_y, T_z)^t$ and the rotation $\boldsymbol{\Omega} = (\Omega_x, \Omega_y, \Omega_z)^t$.
The optic flow $\boldsymbol{\theta}(x, y)$ is the projection of the movement of a 3D-point $(X, Y, Z)^t$ onto the
retina, which, for simplicity, is modelled as an image plane. In a viewer centered coordinate
system the optic flow can be written as:

$$\boldsymbol{\theta}(x,y) = \frac{1}{Z(x,y)}\, A(x,y)\,\mathbf{T} + B(x,y)\,\boldsymbol{\Omega} \tag{2}$$

with the matrices

$$A(x,y) = \begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}, \qquad B(x,y) = \begin{pmatrix} xy/f & -f - x^2/f & y \\ f + y^2/f & -xy/f & -x \end{pmatrix}$$

depending only on coordinates $(x, y)$ in the image plane and on the "focal length" $f$ (Heeger
and Jepson, 1992). In trying to estimate $\mathbf{T}$, given the optic flow $\boldsymbol{\theta}$, we first have to note
that the unknowns $Z(x, y)$ and $\mathbf{T}$ are multiplied together. They can thus not be determined
independently, so that the translation is considered a unit vector pointing in the direction of
heading. Eq. (2) now contains six unknowns, $Z(x, y)$, $\mathbf{T}$ and $\boldsymbol{\Omega}$, but only two measurements
$\theta_x$ and $\theta_y$. Therefore, flow vectors from $m$ distinct image points are combined into the
matrix equation

$$\boldsymbol{\Theta} = C(\mathbf{T})\,\mathbf{q}, \tag{3}$$

where $\boldsymbol{\Theta} = (\boldsymbol{\theta}_1^t, \ldots, \boldsymbol{\theta}_m^t)^t$ is a $2m$-dimensional vector consisting of the components of
the $m$ image velocities, $\mathbf{q} = (1/Z(x_1,y_1), \ldots, 1/Z(x_m,y_m), \Omega_x, \Omega_y, \Omega_z)^t$ an $(m + 3)$-dimensional vector, and

$$C(\mathbf{T}) = \begin{pmatrix} A(x_1,y_1)\mathbf{T} & \cdots & 0 & B(x_1,y_1) \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & A(x_m,y_m)\mathbf{T} & B(x_m,y_m) \end{pmatrix} \tag{4}$$

a $2m \times (m + 3)$ matrix. Heeger and Jepson (1992) show that the heading direction can be
recovered by minimizing the residual function

$$R(\mathbf{T}) = \| \boldsymbol{\Theta}^t C^\perp(\mathbf{T}) \|^2.$$

In this equation $C^\perp(\mathbf{T})$ is defined as follows: Provided that the columns of $C(\mathbf{T})$ are
linearly independent, they form a basis of an $(m + 3)$-dimensional subspace of $\mathbb{R}^{2m}$,
which is called the range of $C(\mathbf{T})$. The matrix $C^\perp(\mathbf{T})$ spans the remaining $(2m - (m + 3))$-dimensional
subspace, which is called the orthogonal complement of $C(\mathbf{T})$. Every vector
in the orthogonal complement of $C(\mathbf{T})$ is orthogonal to every vector in the range of $C(\mathbf{T})$.
In the network, the population of neurons representing a certain $\mathbf{T}_j$ shall be maximally
excited when $R(\mathbf{T}_j) = 0$. Two steps are necessary to accomplish this. First an individual
neuron evaluates part of the argument of $R(\mathbf{T}_j)$ by picking out one of the column vectors of
$C^\perp(\mathbf{T}_j)$, denoted by $C_l^\perp(\mathbf{T}_j)$, and computing $\boldsymbol{\Theta}^t C_l^\perp(\mathbf{T}_j)$. This is done in the following
way: $m$ first layer populations are chosen to form the neuron's input receptive field. The
neuron's output is given by the sigmoid function

$$u_{jl} = g\Big( \sum_{i=1}^{m} \sum_{k=1}^{4} J_{ijkl}\, s_{ik} - \mu \Big), \tag{5}$$

in which $J_{ijkl}$ denotes the strength of the synaptic connection between the $l$-th output
neuron in the second layer population representing heading direction $\mathbf{T}_j$ and the $k$-th input
neuron in the first layer population representing the optic flow vector $\boldsymbol{\theta}_i$; $\mu$ denotes the
threshold. For the synaptic strengths we require that:

$$\sum_{i=1}^{m} \sum_{k=1}^{4} J_{ijkl}\, s_{ik} = \boldsymbol{\Theta}^t C_l^\perp(\mathbf{T}_j). \tag{6}$$
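At a single image location $i$ this is:

$$\sum_{k=1}^{4} J_{ijkl}\, s_{ik} = \boldsymbol{\theta}_i^t \begin{pmatrix} C^\perp_{l,2i-1}(\mathbf{T}_j) \\ C^\perp_{l,2i}(\mathbf{T}_j) \end{pmatrix}.$$

Substituting eq. (1) we find:

$$\sum_{k=1}^{4} J_{ijkl}\, s_{ik} = \sum_{k=1}^{4} s_{ik}\, \mathbf{e}_{ik}^t \begin{pmatrix} C^\perp_{l,2i-1}(\mathbf{T}_j) \\ C^\perp_{l,2i}(\mathbf{T}_j) \end{pmatrix}.$$

Therefore we set the synaptic strengths to:

$$J_{ijkl} = \mathbf{e}_{ik}^t \begin{pmatrix} C^\perp_{l,2i-1}(\mathbf{T}_j) \\ C^\perp_{l,2i}(\mathbf{T}_j) \end{pmatrix}.$$

Then, whenever $\mathbf{T}_j$ is compatible with the measured optic flow, i.e. when $\boldsymbol{\Theta}$ is in the range
of $C(\mathbf{T}_j)$, the neuron receives a net input of zero. In the second step, another neuron $u_{jl'}$ is
constructed so that the sum of the activities of the two neurons is maximal in this situation.
Both neurons are connected to the same set of image locations but their connection strengths
satisfy $J_{ijkl'} = -J_{ijkl}$. In addition, the threshold $\mu$ is given a slightly negative value. Then
both their sigmoid transfer functions overlap at zero input, and the sum has a single peak.
Finally, the neurons in every second layer population are organized in such matched pairs
so that each population $j$ generates its maximal activity when $R(\mathbf{T}_j) = 0$.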
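The matched-pair construction can be illustrated with a small numerical sketch. It assumes a logistic function for the sigmoid $g$ and a threshold of $-0.5$; both are illustrative choices, since the text only specifies a sigmoid and a slightly negative threshold. The helper name `pair_activity` is hypothetical.

```python
import math

def g(z):
    """Logistic sigmoid, used here as an assumed stand-in for the transfer function g."""
    return 1.0 / (1.0 + math.exp(-z))

def pair_activity(net_input, mu=-0.5):
    """Summed output of two neurons with opposite weights and the same threshold mu."""
    return g(net_input - mu) + g(-net_input - mu)

# Scan net inputs from -5 to 5 and locate the peak of the summed activity.
inputs = [x / 10.0 for x in range(-50, 51)]
activities = [pair_activity(x) for x in inputs]
best = inputs[activities.index(max(activities))]
print(best)
```

With a negative threshold both sigmoids are above their half-maximum at zero input, so the sum falls off symmetrically on either side and peaks where the net input, and hence the contribution to the residual, vanishes.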
In simulations, our network is able to compute the direction of heading with a mean error
of less than one degree in agreement with human psychophysical data (see Lappe and
Rauschecker, 1993). Like heading detection in human observers it functions over a wide 
range of speeds, it works with sparse flow fields and it needs depth in the visual environment 
when eye movements are performed. 
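The heading search underlying this section can be sketched outside the network as a brute-force minimization of the residual $R(\mathbf{T})$ over a grid of candidate headings. This is not the network simulation itself. The sketch assumes a unit focal length, ten random image points with random depths, a small hypothetical eye rotation, and an orthonormal basis for the orthogonal complement, in which case $R(\mathbf{T})$ equals the squared norm of the part of $\boldsymbol{\Theta}$ lying outside the range of $C(\mathbf{T})$. All function names are illustrative.

```python
import math
import random

F = 1.0  # focal length (assumed unit value)

def a_times_t(x, y, t):
    """A(x, y) T for A = [[-F, 0, x], [0, -F, y]]."""
    return (-F * t[0] + x * t[2], -F * t[1] + y * t[2])

def b_columns(x, y):
    """The three columns of B(x, y), each a 2-vector."""
    return [(x * y / F, F + y * y / F),
            (-F - x * x / F, -x * y / F),
            (y, -x)]

def flow(x, y, z, t, omega):
    """Optic flow of eq. (2) at image point (x, y) with depth z."""
    at = a_times_t(x, y, t)
    cols = b_columns(x, y)
    return tuple(at[r] / z + sum(cols[c][r] * omega[c] for c in range(3))
                 for r in range(2))

def build_columns(points, t):
    """Columns of the 2m x (m + 3) matrix C(T) of eq. (4)."""
    m = len(points)
    columns = []
    for i, (x, y) in enumerate(points):
        col = [0.0] * (2 * m)
        col[2 * i], col[2 * i + 1] = a_times_t(x, y, t)
        columns.append(col)
    for c in range(3):
        col = []
        for (x, y) in points:
            col.extend(b_columns(x, y)[c])
        columns.append(col)
    return columns

def residual(theta, columns):
    """Squared norm of the component of theta outside the range of C(T)."""
    basis = []
    for col in columns:  # modified Gram-Schmidt orthonormalization
        v = list(col)
        for q in basis:
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-12:
            basis.append([a / n for a in v])
    r = list(theta)
    for q in basis:
        d = sum(a * b for a, b in zip(r, q))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)

def heading(hx, hy):
    """Unit translation whose focus of expansion is the image point (hx, hy)."""
    n = math.sqrt(hx * hx + hy * hy + F * F)
    return (hx / n, hy / n, F / n)

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10)]
depths = [rng.uniform(2.0, 10.0) for _ in points]
true_heading = (0.2, -0.4)      # hypothetical ground truth
omega = (0.02, -0.01, 0.03)     # hypothetical eye rotation
theta = []
for (x, y), z in zip(points, depths):
    theta.extend(flow(x, y, z, heading(*true_heading), omega))

grid = [-0.4, -0.2, 0.0, 0.2, 0.4]
candidates = [(hx, hy) for hx in grid for hy in grid]
best = min(candidates,
           key=lambda h: residual(theta, build_columns(points, heading(*h))))
print(best)
```

The candidate grid is restricted to forward headings because $\mathbf{T}$ and $-\mathbf{T}$ span the same range and therefore give the same residual.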
3 DIFFERENT RESPONSE SELECTIVITIES 
For the remainder of this paper we will focus on the second layer neurons' response properties
by carrying out simulations analogous to neurophysiological experiments (Andersen et
al., 1990; Duffy and Wurtz, 1991; Orban et al., 1992). A single neuron is constructed that
receives input from 30 random image locations forming a 60 x 60 degree receptive field.
The receptive field occupies the lower left quadrant of the visual field and also includes
the fovea (Fig. 1A). The neuron is then presented with shifting, expanding/contracting and
rotating optic flow patterns. The center $(x_c, y_c)$ of the expanding/contracting and rotating
patterns is varied over the 100 x 100 degree visual field in order to test the position dependence
of the neuron's responses. Directional tuning is assessed via the direction of the
shifting patterns. All patterns are obtained by choosing suitable translations and rotations
in eq. (2). For instance, rotating patterns centered at $(x_c, y_c)$ are generated by

$$\mathbf{T} = 0 \quad \text{and} \quad \boldsymbol{\Omega} = \frac{1}{\sqrt{x_c^2 + y_c^2 + f^2}} \begin{pmatrix} x_c \\ y_c \\ f \end{pmatrix}. \tag{7}$$

In keeping with the most common experimental condition, all depth values $Z(x_i, y_i)$ are
taken to be equal.
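As a quick check of eq. (7), the following sketch computes the rotational flow $B(x, y)\boldsymbol{\Omega}$ for a pattern centered at a hypothetical image point and confirms that the flow vanishes at the center. The unit focal length, the chosen center, and the function names are assumptions made for illustration.

```python
import math

F = 1.0  # focal length (assumed unit value)

def rotation_axis(xc, yc):
    """Rotation vector of eq. (7) for a pattern centered at image point (xc, yc)."""
    n = math.sqrt(xc * xc + yc * yc + F * F)
    return (xc / n, yc / n, F / n)

def rotational_flow(x, y, omega):
    """B(x, y) Omega, i.e. eq. (2) with T = 0."""
    u = (x * y / F) * omega[0] - (F + x * x / F) * omega[1] + y * omega[2]
    v = (F + y * y / F) * omega[0] - (x * y / F) * omega[1] - x * omega[2]
    return (u, v)

center = (0.3, -0.2)                      # hypothetical pattern center
omega = rotation_axis(*center)
at_center = rotational_flow(center[0], center[1], omega)
right_of_center = rotational_flow(center[0] + 0.1, center[1], omega)
print(at_center)          # zero up to rounding
print(right_of_center)    # mostly vertical: the pattern circulates around the center
```

The rotation axis of eq. (7) is the line of sight through $(x_c, y_c)$, which is why that image point stays fixed while neighbouring points move perpendicular to their offset from it.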
Figure 1: Single Neuron Responding To All Three Types Of Optic Flow Stimuli
("Triple Component Cell")
In the following we consider different assumptions about the observer's eye movements.
These assumptions change the equations of the subspace algorithm. The rotational matrix
$B(x, y)$ takes on different forms. We will show that these changes result in different
cell types. First let us restrict the model to the biologically most important case: During
locomotion in a static environment the eye movements of humans or higher animals are
usually the product of intentional behavior. A very common situation is the fixation of a
visible object during locomotion. A specific eye rotation is necessary to compensate for the
translational body-movement and to keep the object fixed in the center (0, 0) of the visual
field, so that its image velocity eq. (2) vanishes:
$$\boldsymbol{\theta}(0,0) = \frac{1}{Z_F} \begin{pmatrix} -f T_x \\ -f T_y \end{pmatrix} + \begin{pmatrix} -f \Omega_y \\ +f \Omega_x \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{8}$$

$Z_F$ denotes the distance of the fixation point. We can easily calculate $\Omega_x$ and $\Omega_y$ from
eq. (8) and choose $\Omega_z = 0$. The optic flow eq. (2) in the case of the fixation of a stationary
object then is

$$\boldsymbol{\theta}(x,y) = \frac{1}{Z(x,y)}\, A(x,y)\,\mathbf{T} + \frac{1}{Z_F}\, \tilde{B}(x,y)\,\mathbf{T},$$

with

$$\tilde{B}(x,y) = \begin{pmatrix} f + x^2/f & xy/f & 0 \\ xy/f & f + y^2/f & 0 \end{pmatrix}.$$

We would like to emphasize that another common situation, namely no eye movements at
all, can be approximated by $Z_F \to \infty$. We can now construct a new matrix

$$\tilde{C}(\mathbf{T}) = \begin{pmatrix} A(x_1,y_1)\mathbf{T} & \cdots & 0 & \tilde{B}(x_1,y_1)\mathbf{T} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & A(x_m,y_m)\mathbf{T} & \tilde{B}(x_m,y_m)\mathbf{T} \end{pmatrix}$$

and form synaptic connections in the same way as described above. The resulting network
is able to deal with the most common types of eye movements. The response properties of a
single neuron from such a network are shown in Fig. 1. The neuron is selective for all three
types of flow patterns. It exhibits broad directional tuning (Fig. 1B) for upward shifting
patterns (90 deg.). The responses to expanding (Fig. 1C), contracting (Fig. 1D) and
rotating (Fig. 1E-F) patterns show large areas of position invariant selectivity. Inside the
receptive field, which covers the second quadrant (see distribution of input locations in
Fig. 1A), the neuron favors upward shifts, contractions and counterclockwise rotations. It is
thus compatible with a triple component cell in MSTd. Also, lines are visible along which
the selectivities reverse. This happens because the neuron's input is a linear function of the
stimulus position $(x_c, y_c)$. For example, for rotational patterns we can calculate the input
using eqs. (2), (6), and (7):

$$\sum_{i=1}^{m} \sum_{k=1}^{4} J_{ijkl}\, s_{ik} = \frac{1}{\sqrt{x_c^2 + y_c^2 + f^2}} \sum_{i=1}^{m} (x_c, y_c, f)\, B(x_i,y_i)^t \begin{pmatrix} C^\perp_{l,2i-1}(\mathbf{T}_j) \\ C^\perp_{l,2i}(\mathbf{T}_j) \end{pmatrix}.$$

As long as the threshold $\mu$ is small the neuron's output is halfway between its maximal and
minimal values whenever its input is zero, i.e. when

$$\sum_{i=1}^{m} (x_c, y_c, f)\, B(x_i,y_i)^t \begin{pmatrix} C^\perp_{l,2i-1}(\mathbf{T}_j) \\ C^\perp_{l,2i}(\mathbf{T}_j) \end{pmatrix} = 0.$$

This is the equation of a line in the $(x_c, y_c)$ plane. The neuron's selectivity for rotations
reverses along this line. A similar equation holds for expansion/contraction selectivity.

Figure 2: Neuron Selective For Two Components ("Double Component Cell")
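The fixation constraint of eq. (8) can be checked numerically with a short sketch. It solves eq. (8) for the eye rotation, confirms that the foveal flow vanishes, and compares the general flow of eq. (2) with the fixation form that uses $\tilde{B}(x, y)$. The unit focal length, the translation, the fixation distance and the test point are illustrative assumptions, as are the function names.

```python
F = 1.0  # focal length (assumed unit value)

def flow(x, y, z, t, omega):
    """Optic flow of eq. (2): (1/z) A(x, y) T + B(x, y) Omega."""
    u = ((-F * t[0] + x * t[2]) / z
         + (x * y / F) * omega[0] - (F + x * x / F) * omega[1] + y * omega[2])
    v = ((-F * t[1] + y * t[2]) / z
         + (F + y * y / F) * omega[0] - (x * y / F) * omega[1] - x * omega[2])
    return (u, v)

def fixation_rotation(t, z_fix):
    """Eye rotation that nulls the flow at the fovea (eq. 8), with Omega_z = 0."""
    return (t[1] / z_fix, -t[0] / z_fix, 0.0)

def fixation_flow(x, y, z, t, z_fix):
    """Fixation flow: (1/z) A(x, y) T + (1/z_fix) B~(x, y) T."""
    u = ((-F * t[0] + x * t[2]) / z
         + ((F + x * x / F) * t[0] + (x * y / F) * t[1]) / z_fix)
    v = ((-F * t[1] + y * t[2]) / z
         + ((x * y / F) * t[0] + (F + y * y / F) * t[1]) / z_fix)
    return (u, v)

t = (0.6, 0.0, 0.8)      # hypothetical unit translation
z_fix = 5.0              # hypothetical distance of the fixation point
omega = fixation_rotation(t, z_fix)

foveal = flow(0.0, 0.0, z_fix, t, omega)
general = flow(0.4, -0.3, 8.0, t, omega)
fixated = fixation_flow(0.4, -0.3, 8.0, t, z_fix)
print(foveal)             # zero: the fixated object stays in the center
print(general, fixated)   # the two expressions agree
```

Letting `z_fix` grow large shrinks the rotational term, which corresponds to the approximation of no eye movements by $Z_F \to \infty$.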
Now, what would the neuron's selectivity look like if we had not restricted the eye movements
to the case of the fixation of an object? The responses of a neuron that is constructed
following the unconstrained version of the algorithm, as described in section 2, are shown in
Fig. 2. There is no selectivity for clockwise versus counterclockwise rotations at all, since
both patterns elicit the same response everywhere in the visual field. Inside the receptive
field the neuron favors contractions and shifts towards the upper left (150 deg.). It
can thus be regarded as a double component cell. To understand the absence of rotational
selectivity we have to calculate the whole rotational optic flow pattern $\boldsymbol{\Theta}_{rot}$ by inserting $\mathbf{T}$
and $\boldsymbol{\Omega}$ from eq. (7) into eq. (3). $C(\mathbf{T})$ becomes

$$C(0) = \begin{pmatrix} 0 & \cdots & 0 & B(x_1,y_1) \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & B(x_m,y_m) \end{pmatrix}.$$

Denoting the three rightmost column vectors of $C(\mathbf{T})$ by $\mathbf{B}_1$, $\mathbf{B}_2$, and $\mathbf{B}_3$ we find

$$\boldsymbol{\Theta}_{rot} = \frac{1}{\sqrt{x_c^2 + y_c^2 + f^2}} \left( x_c \mathbf{B}_1 + y_c \mathbf{B}_2 + f \mathbf{B}_3 \right).$$

Comparison to $C(\mathbf{T})$, eq. (4), shows that $\boldsymbol{\Theta}_{rot}$ can be written as a linear combination of
column vectors of $C(\mathbf{T})$. Thus $\boldsymbol{\Theta}_{rot}$ lies in the range of $C(\mathbf{T})$ and is orthogonal to $C^\perp(\mathbf{T})$,
so that $\boldsymbol{\Theta}_{rot}^t C_l^\perp(\mathbf{T}_j) = 0$ for all $j$ and $l$. From eqs. (5) and (6) it follows that the neuron's
response to any rotational pattern is always $u_{jl} = g(-\mu)$.

Figure 3: Predominantly Rotation Selective Neuron ("Single Component Cell")
The last type of eye movements we want to consider is that of a general frontoparallel
rotation, which is defined by $\Omega_z = 0$. In addition to the fixation of a stationary object,
frontoparallel rotations also include smooth pursuit eye movements necessary for the fixation
of a moving object. Inserting $\Omega_z = 0$ into eq. (2) gives

$$\boldsymbol{\theta}(x,y) = \frac{1}{Z(x,y)}\, A(x,y)\,\mathbf{T} + \hat{B}(x,y) \begin{pmatrix} \Omega_x \\ \Omega_y \end{pmatrix}$$

with

$$\hat{B}(x,y) = \begin{pmatrix} xy/f & -(f + x^2/f) \\ f + y^2/f & -xy/f \end{pmatrix}$$

now being a 2 x 2 matrix, so that $C(\mathbf{T})$, eq. (4), becomes a $2m \times (m + 2)$ matrix
$\hat{C}(\mathbf{T})$. A neuron that is constructed using $\hat{C}(\mathbf{T})$ can be seen in Fig. 3. It best responds to
counterclockwise rotational patterns showing complete position invariance over the visual
field. The neuron is much less selective to expansions and unidirectional shifts, since
the responses never reach saturation. It therefore resembles a single component rotation
selective cell. The position invariant behavior can again be explained by looking at the
rotational optic flow pattern. Using the same argument as above, one can show that the
neuron's input is zero whenever $\Omega_z$ vanishes, i.e. when the rotational axis lies in the
$(X, Y)$-plane. Then the flow pattern becomes

$$\boldsymbol{\Theta}_{rot} = \frac{1}{\sqrt{x_c^2 + y_c^2 + f^2}} \left( x_c \hat{\mathbf{B}}_1 + y_c \hat{\mathbf{B}}_2 \right),$$

with $\hat{\mathbf{B}}_1$ and $\hat{\mathbf{B}}_2$ the two rightmost column vectors of $\hat{C}(\mathbf{T})$,
and is an element of the range of $\hat{C}(\mathbf{T})$. The $(X, Y)$-plane thus splits the space of all
rotational axes into two half spaces, one in which the neuron's input is always positive and
one in which it is always negative. Clockwise rotations are characterized by $\Omega_z > 0$ and
hence all lie in the same half space, while counterclockwise rotations lie in the other. As a
result the neuron is exclusively excited by one mode of rotation in all of the visual field.
4 CONCLUSION
Our neural network model for the detection of ego-motion proposes a computational map
of heading directions. A similar map could be contained in area MSTd of monkey visual
cortex. Cells in MSTd exhibit a varying degree of selectivity for basic optic flow patterns,
but often show a substantial indifference towards the spatial position of a stimulus. By using
a population encoding of the heading directions, individual neurons in the model exhibit
similar position invariant responses within large parts of the visual field. Different neuronal
selectivities found in MSTd can be modelled by assuming specializations pertaining to 
different types of eye movements. Consistent with experimental findings the position 
invariance of the model neurons is largest in the single component cells and less developed 
in the double and triple component cells. 
References 
Allman, J. M. and Kaas, J. H. 1971. Brain Res. 31, 85-105. 
Andersen, R., Graziano, M., and Snowden, R. 1990. Soc. Neurosci. Abstr. 16, 7.
Brenner, E. and Rauschecker, J. P. 1990. J. Physiol. 423, 641-660.
Duffy, C. J. and Wurtz, R. H. 1991. J. Neurophysiol. 65(6), 1329-1359.
Gibson, J. J. 1950. The Perception of the Visual World. Houghton Mifflin, Boston.
Heeger, D. J. and Jepson, A. 1992. Int. J. Comp. Vis. 7(2), 95-117.
Lappe, M. and Rauschecker, J. P. 1993. Neural Computation (in press).
Maunsell, J. H. R. and Van Essen, D. C. 1983. J. Neurophysiol. 49(5), 1127-1147.
Orban, G. A., Lagae, L., Verri, A., Raiguel, S., Xiao, D., Maes, H., and Torre, V. 1992.
Proc. Nat. Acad. Sci. 89, 2595-2599.
Rauschecker, J. P., von Grünau, M. W., and Poulin, C. 1987. J. Neurosci. 7(4), 943-958.
Rieger, J. H. and Toet, L. 1985. Biol. Cyb. 52, 377-381.
Stone, L. S. and Perrone, J. A. 1991. Soc. Neurosci. Abstr. 17, 857.
Tanaka, K. and Saito, H.-A. 1989. J. Neurophysiol. 62(3), 626-641.
Wang, H. T., Mathur, B. P. and Koch, C. 1989. Neural Computation 1, 92-103.
Warren, W. H. Jr., and Hannon, D. J. 1988. Nature 336, 162-163.
