An application of Reversible-Jump 
MCMC to multivariate spherical Gaussian 
mixtures 
Alan D. Marrs 
Signal & Information Processing Dept. 
Defence Evaluation & Research Agency 
Gr. Malvern, UK WR14 3PS 
marrs signal. dra. hrng. g b 
Abstract 
Applications of Gaussian mixture models occur frequently in the 
fields of statistics and artificial neural networks. One of the key 
issues arising from any mixture model apphcation is how to es- 
timate the optimum number of mixture components. This paper 
extends the Reversible-Jump Markov Chain Monte Carlo (MCMC) 
algorithm to the case of multivariate spherical Gaussian mixtures 
using a hierarchical prior model. Using this method the number 
of mixture components is no longer fixed but becomes a parm- 
eter of the model which we shall estimate. The Reversible-Jump 
MCMC algorithm is capable of moving between parameter sub- 
spaces which correspond to models with different numbers of mix- 
ture components. As a result a sample from the full joint distribu- 
tion of all unknown model parameters is generated. The technique 
is then demonstrated on a simulated example and a well known 
vowel dataset. 
1 Introduction 
Applications of Gaussian mixture models regularly appear in the neural networks 
literature. One of their most common roles in the field of neural networks, is in 
the placement of centres in a radial basis function network. In this case the basis 
functions are used to model the distribution of input data (Xi = [a:x,a:2, ...,a:d] r, 
(i = 1, n)), and the problem is one of mixture density estimation. 
578 A. D. Marrs 
k 
j----1 
where k is the number of mixture components, r the weight or mixing propor- 
tion for component j and O 5 the component parameters (mean & variance in this 
case). The mixture components represent the basis functions of the neurs.1 network 
and their parameters (centres & widths) my be estimated using the expectation- 
maximisafion (EM) algorithm. 
One of the key issues arising in the use of mixture models is how to estimate the 
number of components. This is a model selection problem: the problem of choosing 
the 'correct' number of components for a mixture model. This my be thought of 
as one of comparing two (or more) mixture models with different components, and 
choosing the model that is 'best' based upon some criterion. For example, we might 
compare a two component model to one with a single component. 
p(XlO); + - 
(2) 
This may appear to be a case of testing of nested hypotheses. However, it has 
been noted [5] that the standard frequentist hypothesis testing theory (generalised 
likelihood ratio test) does not apply to this problem because the desired regularity 
conditions do not hold. In addition, if the models being tested have 2 and 3 compo- 
nents respectively, they are not strictly nested. For example, we could equate any 
pair of components in the three component model to the components in the two 
component model, yet how do we choose which component to 'leave out'? 
2 Bayesian approach to Gaussian mixture models 
A full Bayesian analysis treats the number of mixture components as one of the 
parameters of the model for which we wish to find the conditional distribution. In 
this case we would represent the joint distribution as a hierarchical model where we 
may introduce prior distributions for the model parameters, ie. 
p(k, r, z, (9, X) = p(k)p(rJk)p(zpr, k)p((9Jz, r, k)p(XJ(9, z, r, k), 
(3) 
k 
where r = (rS)=, (9 ((95 k 
= )5=x and z = (zi)?=x are allocation variables introduced 
by treating mixture estimation as a hidden data problem with zi allocating the ith 
observation to a particular component. A simplified version of this model can be 
derived by imposing further conditional independencies, leading to the following 
expression for the joint distribution 
p(k, r, z, (9, X) = p( k)p(rlk)p (z]r, k)p((glk)p(Xl(9 , z). 
(4) 
In addition, we add an extra layer to the hierarchy representing priors on the model 
parameters giving the final form for the joint distribution 
p(A, 5, 7, k, z, (9, X) = k)x 
p((glk, r)p(Xl(9, z). 
(5) 
Until recently a full Bayesian analysis has been mathematically intractable. Model 
comparison was carried out by conducting an extensive search over all possible 
Reversible-Jump MCMC for Multivariate Spherical Gaussian Mixtures 579 
model orders comparing Bayes factors for all possible pairs of models. What we 
really desire is a method which will estimate the model order along with the other 
model parameters. Two such methods based upon Markov Chain Monte Carlo 
(MCMC) techniques are reversible-jump MCMC [2] and jump-diffusion [3]. 
In the following sections, we extend the reversible-jump MCMC technique to multi- 
vatlate spherical Gaussian mixture models. Results axe then shown for a simulated 
example and an example using the Peterson-Baxney vowel data. 
3 Reversible-jump MCMC algorithm 
Following [4] we define the priors for our hierarchical model and derive a set of 
5 move types for the reversible jump MCMC sampling scheme. To simplify some 
of the MCMC steps we choose a prior model where the prior on the weights is 
Dirichlet and the prior model for /j = [/j, ...,/ja]T and af 2 is that they are 
drawn independently with normal and gamma priors, 
r  D(6,...,6) , Ij ~ N(rI, A -) , aj ~ F(a,/3), (6) 
where for the purposes of this study we follow[4] and define the hyper-parameters 
thus: 6 - 1.0; r/ is set to be the mean of the data; A is the diagonal precision 
matrix for the prior on/j with components a1 which axe taken to be 1]r' where 
r) is the data range in dimension j; a = 2.0 and  is some small multiple of 1/r. 
The moves then consist of: I: updating the weights; II: updating the paxameters 
(/, a); III: updating the allocation; IV: updating the hyper-paxameters; V: split- 
ting one component into two, or combining two into one. 
The first 4 moves are relatively simple to define, since the conjugate nature of the 
priors leads to relatively simple forms for the full conditional distribution of the 
desired parameter. Thus the first 4 moves are Gibbs sampling moves and the full 
conditional distributions for the weights rj, means/j, variances aj and allocation 
variables zi are given by: 
p(rj]...) ., D(6 + n, ...,6 + nk ), 
(7) 
where n, is the number of observations allocated to component k; 
(s) 
where we recognise that/zj is an d dimensional vector with components j.. (m -- 
1, d), Tm are the components of the/j prior mean and am represent the diagonal 
components of A. 
p(er-2[...) = I'(, + nj - 1,  Z (Xi -/j):r(xi -/) +/); (9) 
i=l:zi=l 
and 
p(zi = Jl'") c -- exp(- 
Z (xi (10) 
m= 1 0 
580 A. D. Marrs 
The final move involves splitting/combining model components. The main criteria 
which need to be met when designing these moves are that they are irreducible, 
aperiodic, form a reversible pair and satisfy detailed balance/1]. The MCMC step 
for this move takes the form of a Metropolis-Hastings step where a move from state 
y to state y' is proposed, with r(y) the target probability distribution and qm(Y, 
the proposal distribution for the move m. The resulting move is then accepted with 
probability am 
{ r(Y')qm(Y"Y) 1 (11) 
am = rain 1, r(y)qm(y,y') ' 
In the case of a move from state y to a state y which lies in a higher dimensional 
space, the move may be implemented by drawing a vector of continuous random 
variables u, independent of y. The new state y is then set using an invertible de- 
terministic function of z and u. It can be shown [2] that the acceptance probability 
is then given by 
am=rain 1, (Y ) re(Y) Oy' 
r(y)rm(y)q(u) I O(y, u) I ' (12) 
where r,(y) is the probability of choosing move type m when in state y, and q(u) 
is the density function of u. 
The initial application of the reversible jump MCMC technique to normal mixtures 
[4] was limited to the univariate case. This yielded relatively simple expressions for 
the split/combine moves, and, most importantly, the determinant of the Jacobian of 
the traformation from a model with k components to one with k q- 1 components 
was simple to derive. In the more general case of multivariate normal models care 
must be taken in prescribing move transformations. A complicated transformation 
will leaA to problems when the IJacobianl for a d-dimensional model is required. 
For multivariate spherical Gaussian models, we randomly choose a model compo- 
nent from the current k component model. The decision is then maAe to split or 
combine with one of its neighbours with probability Pk and pck respectively (where 
pc = 1 -p ). If the choice is to combine the component, we label the chosen com- 
ponent zl, and choose z2 to be a neighbouring component j with probability c 1/rj 
where rj is the distance from the component zl. The new component resulting from 
the combination of z and z2 is labelled zc and its parameters are calculated from: 
(13) 
If the decision is to split, the chosen component is labelled zc and it is used to define 
two new model components zx and zo. with weights and parameters conforming to 
(13). In making this transformation there are 2 q- d degrees of freedom, so we need 
to generate 2 q- d random numbers to enable the specification of the new component 
parameters. The random numbers are denoted u, u = [u,...,u] :r and u3. 
All are drawn from Beta(2, 2) distributions while the components of u2 each have 
probability 0.5 of being negative. The split transformation is then defined by: 
V rz / 
Reversible-Jump MCMC for Multivariate Spherical Gaussian Mixtures 581 
= (14) 
tYzl U3tYzc , z 
z za 
Once the new components have bn defined it is necessary to evute the proba- 
bility of choosing to combine component zx with component z in this new model. 
Hving propod the spSt/combine move 1 that remns is to cculate the 
Metropos-Htings cceptance probability a, where a = rain(l, R) for the spt 
move d a = rain(l, l/R) for the combine move. Where in the ce of split move 
kom  model with k components to one with k + 1 components, or  combine move 
kom k + 1 to k, R is ven by: 
t--l+n I 8--1+n 
P* Pttoc 
'' (15) 
(2((1--ul)ul)(+l)/ (1--u)u)  
where g2,2() denotes  Beta(2, 2) density function. The first line on the R.H.S is 
due to the rtio of 5kelihoods for tho objections signed to the components in 
question, the subsequent thr lines re due to the prior ratios, the fifth line is due to 
the the propos rtio and the lt ne due to the [Jobian] of the trsformtion. 
The term padloc reprents  combination of the probbity of obtning the current 
ocation of dta to the components in question d the probabiSty of choosing to 
combine components zx and 
4 Results 
To assess this approach to the estimation of multivariate spherical Gaussian mix- 
ture models, we firstly consider a toy problem where 1000 bivariate samples were 
generated from a known 20 component mixture model. This is followed by an anal- 
ysis of the Peterson-Barney vowel data set comprising 780 samples of the measured 
amplitude of four formant frequencies for 10 utterances. For this mixture estima- 
tion example, we ignore the class labels and consider the straight forward density 
estimation problem. 
4.1 Simulated data 
The resulting reversible-jump MCMC chain of model order can be seen in figure 
1, along with the resulting histogram (after rejecting the first 2000 MCMC sam- 
ples). The histogram shows that the maximum a posterJori value for model order 
is 17. The MAP estimate of model parameters was obtained by averaging all the 
17 component model samples, the estimated model is shown in figure 2 alongside 
the original generating model. The results are rather encouraging given the large 
number of model components and the relatively small number of samples. 
582 A. D. Marrs 
4000 
3000 
2000 
lOOO 
o 
2000 4000 6000 BOO0 10000 16 17 lB 19 20 
Iteration {k} 
Figure 1' Reversible-jump MCMC chain and histogram of model order for simulated 
data. 
Figure 2: Example of model estimation for simulated data. 
4.2 Vowel data 
The reversible-jump MCMC chain of model order for the Peterson-Barney vowel 
data example is shown in figure 3, alongside the resulting MAP model estimate. 
For ease of visualisation, the estimated model and data samples have been pro- 
jected onto the first two principal components of the data. Again, the results are 
encouraging. 
5 Conclusion 
One of the key problems when using Gaussian mixture models is estimation of the 
optimum number of components to include in the model. !u this paper we extend 
the reversible-jump MCMC technique for estimating the parameters of Gaussian 
mixtures with an unknown number of components to the multivariate spherical 
Gaussian case. The technique is then demonstrated on a simulated data example 
and an example using a well known dataset. 
The attraction of this approach is that the number of mixture components is not 
fixed at the outset but becomes a parameter of the model. The reversible-jump 
MCMC approach is then capable of moving between parameter subspaces which 
Reversible-Jump MCMC for Multivariate Spherical Gaussian Mixtures 583 
14 
Figure 3: Reversible-jump MCMC chain of model order and MAP estimate of model 
(projected onto first two principal components) for vowel data. 
correspond to models with different numbers of mixture components. As a result 
a sample of the full joint distribution is generated from which the posterior distri- 
bution for the number of model components can be derived. This information may 
then either be used to construct a Bayesian classifier or to define the centres in a 
radial basis function network. 
References 
[1] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter Eds. Markov Chain Monte 
Carlo in Practice. Chapman and Hall, 1995. 
[2] P.J. Green. Reversible jump MCMC computation and Bayesian model determi- 
nation. Boirnetrika, 82:711-732, 1995. 
D.B. Phillips and A.F.M. Smith. Bayesian model comparison via jump diffu- 
sions. In W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, editors, Markov 
Chain Monte Carlo in Practice. Chapman and Hall, 1995. 
[4] S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an un- 
known number of components. J. Rotai Star. Soc. Series B, 59(4), 1997. 
[5] D.M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical Analysis of Finite 
Mixture Distributions. Wiley, 1985. 
British Crown Copyright 1998/DERA. 
Published with the permission of the controller of Her Britannic Majesty's 
Stationary Office. 
