Resolving motion ambiguities 
K. I. Diamantaras 
Siemens Corporate Research 
755 College Rd East 
Princeton, NJ 08540 
D. Geiger* 
Courant Institute, NYU 
Mercer Street 
New York, NY 10012 
Abstract 
We address the problem of optical flow reconstruction and in par- 
ticular the problem of resolving ambiguities near edges. They oc- 
cur due to (i) the aperture problem and (ii) the occlusion problem, 
where pixels on both sides of an intensity edge are assigned the same 
velocity estimates (and confidence). However, these measurements 
are correct for just one side of the edge (the non occluded one). 
Our approach is to introduce an uncertmnty field with respect to 
the estimates and confidence measures. We note that the confi- 
dence measures are large at intensity edges and larger at the con- 
vex sides of the edges, i.e. inside corners, than at the concave side. 
We resolve the ambiguities through local interactions via coupled 
Markov random fields (MRF) The result is the detection of motion 
for regions of images with large global convexity 
I Introduction 
In this paper we discuss the problem of figure ground separation, via optical flow, for 
homogeneous images (textured images just provide more information for the disam- 
biguation of figure-ground). We address the problem of optical flow reconstruction 
and in particular the problem of resolving ambiguities near intensity edges. We 
concentrate on a two frames problem, where all the motion ambiguities we discuss 
can be disambiguiated by the human visual system. 
*work done when the author was at the Isaac Newton Institute and at Siemens Corpo- 
rate Research 
977 
978 Diamantaras and Geiger 
Optical flow is a 2D (two dimensional) field defined as to capture the projection 
of the 3D (three dimensional) motion field into the view plane (retina). The Horn 
and Schunk[8] formulation of the problem is to impose (i) the brightness constraint 
a:(,y,t) _ O, where E is the intensity image, and (ii) the smoothness of the velocity 
dt -- 
field. The smoothness can be thought of coming from a rigidity or quasi-rigidity 
assumption (see Ullman [12]). 
We utilize two improvements which are important for the optical flow computation, 
(i) the introduction of the confidence measure (Nagel and Enkelman [10], Anandan 
[1]) and (ii) the application of smoothness while preserving discontinuities (Geman 
and Geman [6], Blake and Zisserman [2], Mumford and Shah [9]). It is clear that as 
an object moves with respect to a background not only optical flow discontinuities 
occur, but also occlusions occur (and revelations). In stereo, occlusions are related 
to discontinuities (e.g. Geiger et. al 1992 [5]), and for motion a similar relation 
must exist. We study ambiguities ocuring at motion discontinuities and occlusions 
in images. 
The paper is organized as follows: Section 2 describes the problem with examples 
and a brief discussion on possible approaches, section 3 presents our approach, with 
the formulation of the model and a method to solve it, section 4 gives the results. 
2 Motion ambiguities 
Figure 1 shows two synthetic problems involving a translation and a rotation of 
simple objects in front of stationary backgrounds. 
Consider the case of the square translation (see figure la.). Humans perceive the 
square translating, although block-matching (and any other matching technique) 
gives translation on both sides of the square edges. Moreover, there are other inter- 
pretations of the scene, such as the square belonging to the stationary background 
and the outside being a translating foreground with a square hole. The examples 
are synthetic, but emphasize the ambiguities. Real images may have more texture, 
thus many times helping resolve these ambiguities, but not everywhere. 
(a) (b) 
Figure 1' Two image sequences of 128 x 128. (a) Square translation of 3 pixels; (b) 
"Eight" rotation of 10o Note that the "eight" has concave and convex regions. 
Resolving Motion Ambiguities 979 
3 A Markov random field model 
We describe a model capable of solving these ambiguities. It is based on coupled 
Markov random fields and thus, based on local processes. Our main contribution is 
to introduce the idea of uncertainty on the estimates and confidence measures. We 
propose a Markov field that allows the estimates of each pixel to be chosen among 
a large neighborhood, thus each pixel estimate can be neglected. We show that 
convex regions of the image do bias the confidence measures such that the final 
motion solutions are expected to be the ones with global larger convexity Note 
that locally, one can have concave regions of a shape that give "wrong" bias (see 
figure i b). 
3.1 Block Matching 
Block matching is the process of correlating a block region of one image, say of size 
(2WM + 1) x (2WM + 1), with a block region of the other image. Block-matching yields 
a set of matching errors di ' , where (i, j) is a pixel in the image and v: [ca, n] is 
a displacement vector in a search window of size (2w$ + 1) x (2w$ + 1) around the 
pixel. We define the velocity measurements gij and the covariance matrix Cij as the 
mean and variance of the-vector v - [ca, n] averaged according to the distribution 
m,n e-kd"v m,n e-kd (V -- gij)(V -- gij) T 
gij -- _kd,, Vii  _kd,, 
Figure 2 shows the block matching data gij for the two problems discussed above 
and figure 3 shows the correspondent confidence measurse (inverse of the covariance 
matrix as defined below). 
3.2 The aperture problem and confidence 
The aperture problem [7] occurs where there is a low confidence on the measure- 
ments (data) in the direction along an edge; In particular we follow the approach 
by [1]. 
The eigenvalues A, A2, of Cij correspond to the variance of distribution of v along 
the directions of the corresponding eigenvectors Vl, v2o The confidence of the esti- 
mate should be inversely proportional to the variance of the distribution, i.e. the 
confidence along direction v (v2) is oc 1/A (cr 1/,X2). All this confidence informa- 
tion can be packaged inside the confidence matrix defined as follows: 
Rij: (cij + 
where e is a very small constant that guarantees invertibility. Thus the eigenvalues of 
lij are values between 0 and 1 corresponding to the confidence along the directions 
v and v2, whereas v and v2 are still eigenvectors of lij. 
The confidence measures at straight edges is high perpendincular to the edges and 
low (zero) along the edges. However, at corners, the confidence is high on both 
980 Diamantaras and Geiger 
directions thus through smoothness this result can be propagated through the other 
parts of the image, then resolving the aperture problem. 
3.3 The localization problem and a binary decision field 
The localization problem arises due to the local symmetry at intensity edges, where 
both sides of an edge give the same correspondences. These cases occur when 
occluded regions are homogeneous and so, block matching, pixel matching or any 
matching technique can not distinguish which side of the edge is being occluded or 
is occluding. Even if one considers edge based methods, the same problem arises in 
the reconstruction stage, where the edge velocities have to be propagated to the rest 
of the image. In this cases a localization uncertainty is introduced. More precisely, 
pixels whose matching block contains a strong feature (e.g. a corner) will obtain a 
high-confidence motion estimate along the direction in which this feature moved. 
Pixels on both sides of this feature, and at distances less than half the matching 
window size, M, will receive roughly the same motion estimates associated with 
high confidences. However, it could have been just one of the two sides that have 
moved in this direction. In that case this estimate should not be taken into account 
on the other side. We note however a bias towards inside of corner regions from the 
confidence measures. 
Note that in a corner, despite both sides getting roughly the same velocity estimate 
and high confidence measures, the inside pixel always get a larger confidence. This 
bias is due to having more pixels outside the edge of a closed contour than outside, 
and occurs at the convex regions (e.g. a corner). Thus, in general, the convex 
regions will have a stronger confidence measure than outside them. Note that at 
concavities in the "eight" rotation image, the confidence will be higher outside the 
"eight" and correct at convex regions. Thus, a global optimization will be required 
to decide which confidences to "pick up". 
Our approach to resolve this ambiguity is to allow for the motion estimate at pixel 
(i, j) to select data from a neighborhood Nij, and its goal is to maximize the total 
estimates (taking into account the confidence measures). More precisely, let fij be 
the vector motion field at pixel (i, j). We introduce a binary field cti  that indicates 
which data gi+m,j+n in a neighborhood Nij of (i, j) should correspond to a motion 
estimate fij. The size of Nij is given by M -{- 1 to overcome the localization 
uncertainty. For a given lattice point (i, j) the boolean parameters cti3 should be 
mutually exclusive, i.e. only one of them, ctj **, should be equal to 1 indicating 
that fij should correspond to gi+,*,j+,*, while the rest cti , rn - m*, n - n*, 
should be zero (or Y-rn*n*eN,3 ctir ** ---- 1). The conditional probability reflects 
both an uncertainty due to noise and an uncertainty due to spatial localization of 
the data 
1 
P(R, glf, a) = 22 exp{- '. 
(2) 
where Ilhl12 = hl + for h = [h,hy]. 
Resolving Motion Ambiguities 981 
3.4 The piecewise smooth prior 
The prior probability of the motion field fij is a piecewise smoothness condition, 
in [6]. 
as 
I 
where hij = 0 (vij = 0) if there is no motion discontinuity separating pixels 
(i- 1,j) ((i,j),(i,j- 1))9 otherwise hij - 1 (vii - 1). The parameter/ has to 
be estimated. We have considered that the cost to create motion discontinuities 
should be lowered at intensity edges (see Poggio et al. [11])9 i.e 7ij = 7(1 - 
where eij is the intensity edge and 0 _< 5 <_ 1 and 7 have to be estimated. 
3.5 The posterior distribution 
The posterior distribution is given by Bayes' law 
1 1 -v(j,,,;g) 
P(f9 a9 h, v[g, R) - p(g, R) P(g' Rlf, a)P(f9 a, h9 v) = e 
where 
V(f9a,h,v) 
(4) 
---- E( E Ozi? n[lli+m'j+n(fij -- 
ij mnN s 
+ (1 - hij)llfij -- fi-x,jl[ 2 + (1 - vij)llfij - fi,j-ll[ 2 d- 
ij(hij -- vii)} (5) 
is the energy of the system. Ideally, we would like to minimize V under all 
possible configurations of the fields f, h, v and c9 while obeying the constraint 
meN. "i? = 1. 
3.6 Mean field techniques 
Introducing the inverse temperature parameter fi(- l/T) we can obtain the trans- 
formed probability distribution 
P(f9alg9R ) - le-v(i') (6) 
zz 
where 
ZZ 
(7) 
982 Diamantaras and Geiger 
where i5 = (1 -Oii) and . -- (1 - 'd). 
We have to obey the constraint meN,, ci  = 1. For the sake of simplicity we 
have assumed that the neighborhood Nij around site (i,j) is Nij = {(i + rn, j + 
n)  -1 _< rn _< 1,-1 _< n < 1}. The second factor in (7) can be explicitly 
computed. Employing the mean field techniques proposed in [3] and extended in 
[4] we can average out the variables h, v and c (including the constraint) and yield 
(8) 
which yields the following effective energy 
since Z = Elf} e-V*1Y (') Using the saddle point approximation, i.e. considering 
Z  e-Vejy (f) with f minimizing Vjy(f;g ), the mean field equations become 
__ -m/, V 
o - + + 
v h - 
with laij - /x(1- Oij) and I. tij -- /x(1- ij), Anfiy = (fij- fi-,j), Afij -- 
(j -- fi,j-), and 
O ij : 
1 
, and ij = 1 + e'r,,-llAf,sll = (9) 
The normalization constant Z called the partition function, 
property that 
1 
lim- lnZ= min {V(f,a,h,v)} 
has the important 
(10) 
Then using an annealing method we let  - oo and the minimum of V: -  In Z 
approaches asymptotically the desired minimum. 
Resolving Motion Ambiguities 983 
4 Results 
We have applied an iterative method along with an annealing schedule to solve the 
above mean field equations for/ -- cw. The method was run on the two examples 
already described Figure 4 depicts the results of the experiments. The system 
chooses a natural interpretation (in agreement with human perception), namely 
it interprets the object (e.g. the square in the first example or the eight-shaped 
region in the second example) moving and the background being stationary. In the 
beginning of the annealing process the localization field c may produce "erroneous" 
results, however the neighbor information eventually forces the pixels outside the 
moving object to coincide with the rest of the background which has zero motion 
For the pixels inside the object, on the contrary, the neighbor information eventually 
reinforces the adoption of the motion of the edges. 
References 
[1] P. Anandan, "Measuring Visual Motion from Image Sequences", PhD thesis. 
COINS Dept., Univ. Massachusetts, Amherst, 1987. 
[2] A. Blake and A. Zisserman, "Visual Reconstruction", Cambridge, Mass, MIT 
press, 1987. 
[3] D. Geiger and F. Girosi, "Parallel and Deterministic Algorithms for MRFs: 
Surface Reconstruction and Integration", IEEE PAMI: 13(5), May 1991. 
[4] D. Geiger and A. Yuille, "A Common Framework for Image Segmentation", Int. 
J. Cornput. Vision, 6(3), ppo 227-243, 1991. 
[5] D. Geiger and B. Ladendorf and A. Yuille "Binocular stereo with occlusion", 
Computer Vision- ECCV92, ed. G Sandini, Springer-Verlag, 588, pp 423-433 
May 1992o 
[6] So Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions and the 
Bayesian Restoration of Images", IEEE PAMI 6, pp 721-741, 1984 
[7] E. C. Hildreth, "The measurement of visual motion", MIT press, 1983. 
[8] B.K.P. Horn and B.G. Schunk, "Determining optical flow", Artificial Intelli- 
gence, vol 17, pp 185-203, August 1981 
[9] D. Mumford and J. Shah, "Boundary detection by minimizing functionals, I", 
Proco IEEE Conf. on Computer Vision & Pattern Recognition San Francisco, 
CA, 1985. 
[10] H.-H. Nagel and W. Enkelmann "An Investigation of Smoothness Constraints 
for the Estimation of Displacement Vector Fields from Image Sequences", IEEE 
PAMI: 8, 1986. 
[11] T. Poggio and E. B. Gamble and J. J. Little "Parallel Integration of Vision 
Module", Science, vol 242, pp. 436-440, 1988. 
[12] S. Ullman, "The Interpretation of Visual Motion", Cambridge, Mass, MIT 
press, 1979. 
984 Diamantaras and Geiger 
(c) 
Figure 2: Block matching data gi3. Both sides of the edges have the same data 
(and same confidence). White represents motion to the right (x-direction) or up 
(y-direction). Black is the complement. (a) The x-component of the data for the 
square translation. (b) The x-component of the data for the rotation and (c) the 
y-component of the data. 
(a) (b) 
Figure 3: The confidence R extracted from the block matching data gij. The display 
is the sum of both eigenvalues, i.e. the trace of R. Both sides of the edges have the 
same confidence. White represents high confidence. (a) For the square translation. 
(b) For the rotation. 
(a) 
(b) 
(c) 
Figure 4: The final motion estimation, after 20000 iterations, resolved the ambigu- 
ities with a natural interpretation of the scene /t: 10, 5 = 1, 3' = 100. (a) square 
translation (b) x component of the motion rotation (c) y component of the motion 
rotation 
