660 Geiger and Girosi 
Coupled 
Markov Random Fields 
Mean Field Theory 
and 
Davi Geiger  
Artificial Intelligence 
Laboratory, MIT 
545 Tech. Sq.  792 
Cambridge, MA 02139 
and 
Federico Girosi 
Artificial Intelligence 
Laboratory, MIT 
545 Tech. Sq.  788 
Cambridge, MA 02139 
ABSTRACT 
In recent years many researchers have investigated the use of Markov 
Random Fields (MRFs) for computer vision. They can be applied 
for example to reconstruct surfaces from sparse and noisy depth 
data coming from the output of a visual process, or to integrate 
early vision processes to label physical discontinuities. In this pa- 
per we show that by applying mean field theory to those MRFs 
models a class of neural networks is obtained. Those networks can 
speed up the solution for the MRFs models. The method is not 
restricted to computer vision. 
1 Introduction 
In recent years many researchers (Geman and Geman, 1984) (Marroquin et. al. 
1987) (Gamble et. al. 1989) have investigated the use of Markov Random Fields 
(MRFs) for early vision. Coupled MRFs models can be used for the reconstruction 
of a function starting from a set of noisy sparse data, such as intensity, stereo, or 
motion data. They have also been used to integrate early vision processes to label 
physical discontinuities. Two fields are usually required in the MRFs formulation 
of a problem: one represents the function that has to be reconstructed, and the 
other is associated to its discontinuities. The reconstructed function, say f, has 
New address is Siemens Corporate Research, 755 College Road East, Princeton NJ 08540 
Coupled Markov Random Fields and Mean Field Theory 661 
and lint pross 
Figure 1: The square lattice with the line process I and the field f defined at some 
pixels. 
a continuous range and the discontinuity field, say l, is a binary field (1 if there 
is a discontinuity and 0 otherwise, see figure 1). The essence of the MRFs model 
is that the probability distribution of the configuration of the fields, for a given 
a set of data, has a Gibbs distribution for some cost functional dependent upon 
a small neighborhood. Since the fields have a discrete range, to find the solution 
becomes a combinatorial optimization problem, that can be solved by means of 
methods like the Monte Carlo one (simulated annealing (Kirkpatrick and all, 1983), 
for example). However it has a main drawback: the amount of computer time 
needed for the implementation. 
We propose to approximate the solution of the problem formulated in the MRFs 
frame with its "average solution." The mean field theory (MFT) allows us to find 
deterministic equations for MRFs whose solution approximates the solution of the 
statistical problem. A class of neural networks can naturally solve these equations 
(Hopfield, 1984) (Koch et. al., 1985) (Geiger and Yuille, 1989). An advantage of 
such an approach is that the solution of the networks is faster than the Monte Carlo 
techniques, commonly used to deal with MRFs. 
A main novelty in this work, and a quite general one, is to show that the binary 
field representing the discontinuities can be averaged out to yield an effective the- 
ory independent of the binary field. The possibility of writing a set of equations 
describing the network is also useful for a better understanding of the nature of the 
solution and of the parameters of the model. We show the network performance in 
an example of image reconstruction from sparse data. 
662 Geiger and Girosi 
2 MRFs and Bayes approach 
One of the main attractions of MRFs models in vision is that they can deal directly 
with discontinuities. We consider coupled MRFs depending upon two fields, f 
and 1. For the problem of image reconstruction the field f represents the field to 
be smoothed and I represents the discontinuities. In this case I is a binary field, 
assuming the values I if there is a discontinuity and 0 otherwise. The Markov 
property asserts that the probability of a certain value of the field at any given 
site in the lattice depends only upon neighboring sites. According to the Clifford- 
HammersIcy theorem, the prior probability of a state of the fields f and I has the 
Gibbs form: 
P(f, t) = e 
(2.1) 
where f and I are the fields, e.g. the surface-field and its discontinuities, Z is the 
normalization constant also known as the partition function, U(f, l) = Y-i Ui(f, l) 
is an energy function that can be computed as the sum of local contributions from 
each lattice site i, and/ is a parameter that is called the inverse of the natural 
temperature of the field. If a sparse observation g for any given surface-field f is 
given and a model of the noise is available then one knows the conditional probability 
P(g[f, t). Bayes theorem then allows us to write the posterior distribution: 
P(f, lla) = P(aIf'I)P(f'I) = 
P(g) - 2' ' 
For the case of a sparse image corrupted by white gaussian noise 
V(f, tlg) =  A,(f -g,)a + U(f,/) 
i 
(z3) 
where ,j = 1 or 0 depending on whether data are available or not. V(f, llg) is 
sometimes called the visual cost function. The solution for the problem is the given 
by some estimate of the fields. The maximum of the posterior distribution or other 
related estimates of the "true" data-field value can not be computed analytically, 
but sample distributions of the field with the probability distribution of (2.2) can 
be obtained using Monte Carlo techniques such as the Metropolis algorithm. These 
algorithms sample the space of possible values of the fields according to the proba- 
bility distribution P(f , llg). 
A drawback of coupled MRFs has been the amount of computer time used in the 
Metropolis algorithm or in simulated annealing (Kirkpatrick et. al., 1983). 
A justification for using the mean field (MF) as a measure of the fields, f for ex- 
ample, resides in the fact that it represents the minimum variance Bayes estimator. 
More precisely, the average variance of the field f is given by 
Coupled Markov Random Fields and Mean Field Theory 663 
Va'f - Z( f - ])2P(f, llg ) 
where f is a given estimate of the field, the }-q,! represents the sum over all the 
possible configurations of f and l, and Va'! is the variance. Minimizing Va'! with 
respect to all possible values of f we obtain 
OOfVa,.r= 0 f- Y.fP(f, llg) 
This equation for ] defines the deterministic MF equations. 
2.1 MFT and Neural Networks 
To connect MRFs to neural networks, we use Mean field theory (MFT) to obtain 
deterministic equations from MRFs that represent a class of neural networks. 
The mean field for the values f and 1 at site i are given by 
 = ZfiP(f,l[g) and i = -.liP(f,l[g) (2.4) 
The sum over the binary process, l = 0, 1 gives for (2.3), using the mean field 
approximation, 
(e-U'(l,' ,t'=O) + 
where the partition function Z where factorized as Ill zi. In this case 
(2.5) 
Another way to write the equation for f is 
(2.6) 
where 
664 Geiger and Girosi 
Vi'll'ti"(f) = Ai(fi - g)a - ln(e -r'('rj'"t'=) + e -rr'('rj'"t'=)) 
(2.7) 
The important result obtained here is that the effective potential does not depen- 
dend on the binary field Ii. The line process field has been eliminated to yield a 
temperature dependent effective potential (also called visual cost function). The 
interaction of the field f with itself has changed after the line process has been 
averaged out. We interpret this result as the effect of the interaction of the line 
processes with the field f to yield a new temperature dependent potential. 
The computation of the sum over all the configurations of the field f is hard and 
we use the saddle point approximation. In this case is equivalent to minimize 
V'IIecti'(f). A dynamical equation to find the minimum of V eIlti is given by 
introducing a damping force t that Brings the system to equilibrium. Therefore the 
mean field equation under the mean field and saddle point approximation Becomes 
Equation (2.8) represents a class of .nsupervised neural networks coupled to (2.5). 
The mean field solution is given by the fixed point of (2.8) and (2.5) it is attained 
after running (2.8) and (2.5) as t  oo. This network is Better understood with an 
example of image reconstruction. 
3 Example: Image reconstruction 
To reconstruct images from sparse data and to detect discontinuities we use the 
weak membrane model where U'i(f, l) in two dimensions is given by 
Ui,j(f ,h, v) =  E [(fi,j - fi,j_)2(1-hi,j)+(fi,j- fi_x,j)2(1-vi,j)]+7(hi,j +vi,j) 
i,j 
and a and  ae positive parameters. 
The first term, contns the intertion between the field d the line processes: if 
the horizont or vetic gradient is very high at site (i, j) the corresponding ne 
process wl be very likely to be tive (i, = 1 or i, = 1), to make the visu cost 
function decrease and sign a discontinuity. The second term takes into account 
the pice we pay each time we create a discontinuity and is necessary to prevent 
the creation of discontinuities everywhere. The effective cost function (2.7) then 
becomes 
Coupled Markov Random Fields and Mean Field Theory 665 
c 
gl-I gt +1 
Figure 2: The network is represented for the one dimensional case. The lines are 
the connections 
Vf 'l ! = Z 
I _ _,x.." (1..l_e_pt 7_ i,j ))]] 
.., )) -' ,, 
(3.2) 
where A. a  = fij - fi-lj, A,j = fij -- fi,j-I and (2.5) is then given by 
1 
1 
and i,j = 1 + efl(-"(L,-L,-)') (3.3). 
we point out here that while the line process field is a binary field, its mean value 
is a continuous (analog) function in the range between 0 and 1. 
Discretizing (2.8) in time and applying for (3.2), we obtain 
(3.4) 
where /,j and 91j are given by the network (3.3) and n is the time step on the 
algorithm. We notice that (3.4) is coupled with (3.3) such that the field f is updated 
by (3.4) at step n and then (3.3) updates the field h and v before (3.4) updates field 
f again at step n + 1. 
This is a simple unsupervised neural network where the imput are the fields f and 
the output is the line process field h or v. This network is coupled to the network 
(2.8) to solve for the field f and then constitute the global network for this problem 
(see figure 2). It has been shown by many authors and (Geiger and Yuille, 1989) 
that these class of networks is equivalent to Hopfield networks (Hopfield, 1984) 
(Koch et. al., 1985). 
666 Geiger and Girosi 
Figure 3: a. The still life image 18 x 18 pizels. The image smoothed with 
7 -- 1400 and  - 4 for  iterations. The line process field (needs thinning). b. A 
face image of 18 x 18 pizels. Randomly chosen 50  of the original image (for 
display the other 50 are filled with white dots). c. The network described above is 
applied to smooth and fill in using the same parameters and for 10 iterations. For 
comparison we show the results of simply bluring the sparse data (no line process 
field). 
An important connection we make is to show (Geiger and Girosi, 1989) (Geiger, 
1989) that the work of Blake and Zisserman (Blake and Zisserman, 1987) can be 
seen as an approximation of these results. 
Coupled Markov Random Fields and Mean Field Theory 667 
In the zero temperature limit ( -, go) (3.3) becomes the Heaviside function (1 
or O) and the interpretation is simple: when the horizontal or vertical gradient are 
larger than a threshold (') a vertical or horizontal discontinuity is created. 
4 Results 
We applied the network to a real still life image and the result was an enhancement 
of specular edges, shadow edges and some other contours while smoothing out the 
noise (see Figure 3a). This result is consistent with all the images we have used. 
From one face image we produced sparse data by randomly suppressing 50% of 
the data. (see Figure 3b). We then applied the neural network to reconstruct the 
image. 
.4, cknowledgement s 
We are grateful to Tomaso Poggio for his guidance and support. 
References 
A. Blake and A. Zisserman. (1987) Visual Reconstruction. Cambridge, Mass: 
MIT Press. 
E. Gamble and D. Geiger and T. Poggio and D. Weinshall. (1989) Integration of 
vision modules and labeling of surface discontinuities. Invited paper to IEEE Trans. 
Sustems, Man & Cybernetics, December. 
D. Geiger and F. Girosi. (1989) Parallel and deterministic algorithms for MR_Is: 
surface reconstruction and integration. A.I. Memo No.1114. Artificial Intelligence 
Laboratory of MIT. 
D. Geiger. (1989) Visual models with statistical field theory. Ph.D. thesis. MIT, 
Physics department and Artificial Intelligence Laboratory. 
D. Geiger and A. Yuille. (1989) A common framework for image segmentation 
and surface reconstruction. Harvard Robotics Laboratory Technical Report, 89-7, 
Harvard, August. 
S. Geman and D. Geman. (1984) Stochastic ReIazation, Gibbs Distributions, and 
the Bayesian Restoration of Images. Pattern Analysis and Machine Intelligence, 
PAMI-6:721-741. 
J.J. Hopfield. (1984) Neurons with graded response have collective computational 
properties like those of two.state neurons. Proc. Natl. Acad. Sci., 81:3088-3092, 
S. Kirkpatrick and C. D. Gelart and M.P. Vecchi. (1983) Optimization by 
Simulated Annealina. Science. 220:219-227. 
C. Koch and J. Marroquin and A. Yuille. (1985) Analog 'Neuronat' Networks in 
Early Vision. Proc. Natl. Acad. Sci., 83:4263-4267. 
J. L. Marroquin and S. Mitter and T. Poggio. (1987) Probabilistic Solution of 
Ill-Posed Problems in Computational Vision. J. Amer. Star. Assoc., 82:76-89. 
