Bayesian modelling of fMRI time series 
Pedro A. d. F. R. Højen-Sørensen, Lars K. Hansen and Carl Edward Rasmussen
Department of Mathematical Modelling, Building 321 
Technical University of Denmark 
DK-2800 Lyngby, Denmark 
phs,lkhansen,carl@imm.dtu.dk
Abstract 
We present a Hidden Markov Model (HMM) for inferring the hidden
psychological state (or neural activity) during single trial fMRI activation
experiments with blocked task paradigms. Inference is based on
Bayesian methodology, using a combination of analytical integration and a variety
of Markov Chain Monte Carlo (MCMC) sampling techniques. The advantage
of this method is that short-term learning effects between repeated trials
can be detected, since inference is based only on single
trial experiments.
1 Introduction 
Functional magnetic resonance imaging (fMRI) is a non-invasive technique that enables 
indirect measures of neuronal activity in the working human brain. The most common 
fMRI technique is based on an image contrast induced by temporal shifts in the relative 
concentration of oxyhemoglobin and deoxyhemoglobin (BOLD contrast). Since neuronal 
activation leads to an increased blood flow, the so-called hemodynamic response, the measured
fMRI signal reflects neuronal activity. Hence, when analyzing the BOLD signal
there are two unknown factors to consider: the task dependent neuronal activation and the
hemodynamic response. Bandettini et al. [1993] analyzed the correlation between a binary
reference function (representing the stimulus/task sequence) and the BOLD signal.
In the following we refer to the binary representation of the task as
the paradigm. Lange and Zeger [1997] discuss a parameterized hemodynamic response
adapted by a least squares procedure. Multivariate strategies have been pursued in [Wors- 
ley et al. 1997, Hansen et al. 1999]. Several explorative strategies have been proposed 
for finding spatio-temporal activation patterns without explicit reference to the activation 
paradigm. McKeown et al. [1998] used independent component analysis and found sev- 
eral types of activations including components with "transient task related" response, i.e., 
responses that could not simply be accounted for by the paradigm. The model presented 
in this paper draws on the experimental observation that the basic coupling between the 
net neural activity and hemodynamic response is roughly linear while the relation between 
neuronal response and stimulus/task parameters is often nonlinear [Dale 1997]. We will 
represent the neuronal activity (integrated over the voxel and sampling time interval) by a 
binary signal while we will represent the hemodynamic response as a linear filter of un- 
known form and temporal extent. 
2 A Bayesian model of fMRI time series 
Let $s = \{s_t : t = 0, \ldots, T-1\}$ be a hidden sequence of binary state variables
$s_t \in \{0, 1\}$, representing the state of a single voxel over time; the time variable, $t$,
indexes the sequence of fMRI scans. Hence, $s_t$ is a binary representation of the neural state.
The hidden sequence is governed by a symmetric first order Hidden Markov Model (HMM)
with transition probability $\alpha = P(s_{t+1} = j \mid s_t = j)$. We expect the activation to mimic
the blocked structure of the experimental paradigm, so for this reason we restrict $\alpha$ to be
larger than one half. The predicted signal (noiseless signal) is given by
$y_t = (h * s)_t + \theta_0 + \theta_1 t$, where $*$ denotes the linear convolution and $h$ is the
impulse response of a linear system of order $M_f$. The dc off-set and linear trend which are
typically seen in fMRI time series are given by $\theta_0$ and $\theta_1$, respectively. Finally, it is
assumed that the observable is given by $z_t = y_t + \epsilon_t$, where $\epsilon_t$ is iid. Gaussian
noise with variance $\sigma_n^2$. The generative model considered is therefore given by:

$$p(s_t \mid s_{t-1}, \alpha) = \alpha\, \delta_{s_t, s_{t-1}} + (1 - \alpha)(1 - \delta_{s_t, s_{t-1}}), \qquad (*)$$

$$p(z \mid s, \sigma_n, \theta, M_f) \sim \mathcal{N}(y, \sigma_n^2 I), \quad \text{where } y = \{y_t\} = H_s \theta \text{ and } z = \{z_t\}.$$

Furthermore, $\delta_{s_t, s_{t-1}}$ is the usual Kronecker delta and
$H_s = [\mathbf{1}, \xi, \gamma_0 s, \gamma_1 s, \ldots, \gamma_{M_f-1} s]$,
where $\mathbf{1} = (1, \ldots, 1)'$, $\xi = (1, \ldots, T)'/T$ and $\gamma_i$ is an $i$-step shift
operator, that is $\gamma_i s = (0, \ldots, 0, s_0, s_1, \ldots, s_{T-1-i})'$. The linear parameters
are collected in $\theta = (\theta_0, \theta_1, h)'$.
[Figure: the graphical model, with state nodes $s_t$ and auxiliary nodes $X_t$.]
The graphical model. The hidden states
$X_t = (s_{t-1}, s_{t-2}, \ldots, s_{t-(M_f-1)})$ have
been introduced to make the model first order.
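As an illustration of the generative model above, the following is a minimal Python sketch, not the authors' implementation. The function names `design_matrix` and `simulate` are hypothetical, NumPy is assumed, and the first state is drawn uniformly (an assumption consistent with a flat prior on the states).

```python
import numpy as np

def design_matrix(s, M_f):
    """Build H_s = [1, xi, gamma_0 s, ..., gamma_{M_f-1} s], shape (T, M_f + 2)."""
    s = np.asarray(s, dtype=float)
    T = s.size
    cols = [np.ones(T), np.arange(1, T + 1) / T]   # dc off-set and linear trend
    for i in range(M_f):
        shifted = np.zeros(T)
        shifted[i:] = s[:T - i]   # gamma_i s = (0, ..., 0, s_0, ..., s_{T-1-i})'
        cols.append(shifted)
    return np.column_stack(cols)

def simulate(T, alpha, theta, sigma_n, rng):
    """Draw (s, y, z) from the model; theta = (theta_0, theta_1, h_0, ..., h_{M_f-1})."""
    s = np.empty(T, dtype=int)
    s[0] = rng.integers(2)                  # assumption: uniform initial state
    for t in range(1, T):
        stay = rng.random() < alpha         # keep the previous state with prob. alpha
        s[t] = s[t - 1] if stay else 1 - s[t - 1]
    H = design_matrix(s, len(theta) - 2)
    y = H @ np.asarray(theta, dtype=float)  # noiseless predicted signal
    z = y + sigma_n * rng.standard_normal(T)
    return s, y, z
```

The columns $\gamma_i s$ of the design matrix reproduce the truncated linear convolution, so `design_matrix(s, M_f)[:, 2:] @ h` agrees with the first $T$ entries of `np.convolve(s, h)`.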
3 Analytic integration and Monte Carlo sampling 
In this section we introduce priors over the model parameters and show how inference may 
be performed. The filter coefficients and noise parameters may be handled analytically, 
whereas the remaining parameters are treated using sampling procedures (a combination 
of Gibbs and Metropolis sampling). As in the previous section, explicit reference to the
filter order $M_f$ may be omitted to ease the notation.
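The dc off-set $\theta_0$ and the linear trend $\theta_1$ are given (improper) uniform priors. The filter
coefficients are given priors that are uniform on an interval of length $\beta$ independently for
each coefficient:

$$p(h \mid M_f) = \begin{cases} \beta^{-M_f} & \text{for } |h_i| < \beta/2,\ 0 \le i \le M_f - 1 \\ 0 & \text{otherwise} \end{cases}$$

Assuming that all the values of $\theta$ for which the associated likelihood has non-vanishing
contributions lie inside the box where the prior for $\theta$ has support, we may integrate out the
filter coefficients via a Gaussian integral:

$$p(z \mid \sigma_n, s, M_f) = \int p(z \mid \theta, \sigma_n, s, M_f)\, p(\theta \mid M_f)\, d\theta = \frac{(2\pi\sigma_n^2)^{(M_f+2-T)/2}}{\beta^{M_f}\, |H_s' H_s|^{1/2}} \exp\left(-\frac{z'z - \hat{y}_s'\hat{y}_s}{2\sigma_n^2}\right)$$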
We have here defined the mean filter, $\hat{\theta}_s = (H_s' H_s)^{-1} H_s' z$, and mean predicted signal,
$\hat{y}_s = H_s \hat{\theta}_s$, for given state and filter length. We set the interval length $\beta$ to 4 times
the standard deviation of the observed signal $z$, since the response from the filter
should be able to model the signal, for which an interval of plus/minus two standard
deviations is thought to suffice.
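We now proceed to integrate over the noise parameter; using the (improper) non-informative
Jeffreys prior, $p(\sigma_n) \propto \sigma_n^{-1}$, we get a Gamma integral:

$$p(z \mid s, M_f) = \int p(z \mid \sigma_n, s, M_f)\, p(\sigma_n)\, d\sigma_n = \frac{\Gamma\!\left(\frac{T - M_f - 2}{2}\right) \pi^{(M_f+2-T)/2}}{2\, \beta^{M_f}\, |H_s' H_s|^{1/2}}\, (z'z - \hat{y}_s'\hat{y}_s)^{(M_f+2-T)/2}$$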
The remaining variables cannot be handled analytically, and will be treated using various 
forms of sampling as described in the following sections. 
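The marginal likelihood $p(z \mid s, M_f)$ obtained from the two analytic integrations can be evaluated in log form, which avoids overflow in the Gamma function and the determinant. The sketch below is illustrative only: the function name `log_marginal` is hypothetical, NumPy is assumed, and $H_s$ is assumed to have full column rank (which fails, for example, when $s$ is identically zero).

```python
import math
import numpy as np

def log_marginal(z, H, M_f, beta):
    """log p(z | s, M_f) with theta and sigma_n integrated out analytically.

    H is the T x (M_f + 2) matrix H_s for the current state sequence and
    beta is the length of the uniform prior interval on each filter coefficient.
    """
    z = np.asarray(z, dtype=float)
    T = z.size
    k = T - M_f - 2                                     # T minus the number of linear parameters
    theta_hat, *_ = np.linalg.lstsq(H, z, rcond=None)   # mean filter
    y_hat = H @ theta_hat                               # mean predicted signal
    rss = z @ z - y_hat @ y_hat                         # z'z - y_hat' y_hat
    _, logdet = np.linalg.slogdet(H.T @ H)              # log |H_s' H_s|
    return (math.lgamma(k / 2) - math.log(2.0) - 0.5 * k * math.log(math.pi)
            - M_f * math.log(beta) - 0.5 * logdet - 0.5 * k * math.log(rss))
```

Only differences of this quantity between candidate states or filter lengths enter the sampling steps, so constants common to all candidates could be dropped; they are kept here so that the value matches the formula term by term.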
3.1 Gibbs and Metropolis updates of the state sequence 
We use a flat prior on the states, $p(s_t = 0) = p(s_t = 1)$, together with the first order
Markov property for the hidden states and Bayes' rule to get the conditional posterior for
the individual states:

$$p(s_t = j \mid s \setminus s_t, \alpha, M_f) \propto p(s_t = j \mid s_{t-1}, \alpha)\, p(s_{t+1} \mid s_t = j, \alpha)\, p(z \mid s, M_f).$$
These probabilities may (in normalized form) be used to implement Gibbs updates for the 
hidden state variables, updating one variable at a time and sweeping through all variables. 
However, it turns out that there are significant correlations between states which makes it 
difficult for the Markov Chain to move around in the hidden state-space using only Gibbs 
sampling (where a single state is updated at a time). To improve the situation we also 
perform global state updates, consisting of proposing to move the entire state sequence 
one step forward or backward (the direction being chosen at random) and accepting the 
proposed state using the Metropolis acceptance procedure. The proposed movements are 
made using periodic boundary conditions. The Gibbs sweep is computationally involved, 
since it requires computation of several matrix expressions for every state-variable. 
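The two kinds of state update described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the state sequence is a Python list, `log_marginal` is a hypothetical callable returning $\log p(z \mid s, M_f)$ for a given sequence, and the boundary states simply omit the missing neighbour term.

```python
import math
import random

def log_transition(a, b, alpha):
    """log p(s_t = b | s_{t-1} = a, alpha) for the symmetric chain."""
    return math.log(alpha if a == b else 1.0 - alpha)

def log_prior(s, alpha):
    """Sum of log transition probabilities along the sequence."""
    return sum(log_transition(s[t - 1], s[t], alpha) for t in range(1, len(s)))

def gibbs_sweep(s, alpha, log_marginal, rng):
    """Resample each s_t in turn from its normalized conditional posterior."""
    T = len(s)
    for t in range(T):
        logp = []
        for j in (0, 1):
            s[t] = j
            lp = log_marginal(s)                          # log p(z | s, M_f)
            if t > 0:
                lp += log_transition(s[t - 1], j, alpha)
            if t < T - 1:
                lp += log_transition(j, s[t + 1], alpha)
            logp.append(lp)
        diff = min(logp[0] - logp[1], 700.0)              # guard against overflow
        p1 = 1.0 / (1.0 + math.exp(diff))                 # P(s_t = 1 | rest)
        s[t] = 1 if rng.random() < p1 else 0
    return s

def shift_move(s, alpha, log_marginal, rng):
    """Metropolis update: shift the whole sequence one scan, periodic boundary."""
    if rng.random() < 0.5:
        prop = s[-1:] + s[:-1]                            # one step forward
    else:
        prop = s[1:] + s[:1]                              # one step backward
    log_ratio = (log_marginal(prop) + log_prior(prop, alpha)
                 - log_marginal(s) - log_prior(s, alpha))
    if rng.random() < math.exp(min(0.0, log_ratio)):      # symmetric proposal
        return prop
    return s
```

Each Gibbs update calls `log_marginal` twice, so a full sweep costs $2T$ evaluations of the matrix expressions, which is the computational burden noted above; the shift move needs only two evaluations per proposal.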
3.2 Adaptive Rejection Sampling for the transition probability 
The likelihood for the transition probability $\alpha$ is derived from the Hidden Markov Model:

$$p(s \mid \alpha) = \prod_{t=1}^{T-1} p(s_t \mid s_{t-1}, \alpha) = \alpha^{E(s)} (1 - \alpha)^{T-1-E(s)},$$

where $E(s) = \sum_{t=1}^{T-1} \delta_{s_t, s_{t-1}}$ is the number of neighboring states in $s$ with identical values.
The prior on the transition probabilities is uniform, but restricted to be larger than one half, 
since we expect the activation to mimic the blocked structure of the experimental paradigm. 
It is readily seen that $p(\alpha \mid s) \propto p(s \mid \alpha)$, $\alpha \in [\frac{1}{2}, 1]$, is log-concave. Hence, we may use
the Adaptive Rejection Sampling algorithm [Gilks and Wild, 1992] to sample from the 
distribution for the transition probability. 
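For illustration, note that $\alpha^{E(s)}(1-\alpha)^{T-1-E(s)}$ restricted to $[\frac{1}{2}, 1]$ is a truncated Beta$(E(s)+1,\, T-E(s))$ density. The sketch below therefore swaps in a simpler technique than Adaptive Rejection Sampling: it draws from the untruncated Beta distribution and rejects values below one half. This is an alternative chosen for brevity, not the method used in the paper, and the function names are hypothetical.

```python
import random

def count_equal_neighbours(s):
    """E(s): number of adjacent pairs (s_{t-1}, s_t) with identical values."""
    return sum(1 for a, b in zip(s[:-1], s[1:]) if a == b)

def sample_alpha(s, rng, max_tries=10000):
    """Draw alpha from p(alpha | s), proportional to alpha^E (1 - alpha)^(T-1-E) on [1/2, 1]."""
    T = len(s)
    E = count_equal_neighbours(s)
    for _ in range(max_tries):
        a = rng.betavariate(E + 1, T - E)   # untruncated Beta(E + 1, T - E)
        if a >= 0.5:                        # enforce the restriction alpha >= 1/2
            return a
    raise RuntimeError("too many rejections: the state sequence switches too often")
```

With blocked state sequences $E(s)$ is close to $T-1$, so almost every draw is accepted; the rejection step only becomes inefficient when the sequence switches state at more than about half of the scans.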
3.3 Metropolis updates for the filter length 
In practical applications using real fMRI data, we typically do not know the necessary
length of the filter. The problem of finding the "right" model order is difficult and has re- 
ceived a lot of attention. Here, we let the Markov Chain sample over different filter lengths, 
effectively integrating out the filter-length rather than trying to optimize it. Although the 
value of $M_f$ determines the dimensionality of the parameter space, we do not need to use
specialized sampling methodology (such as Reversible Jump MCMC [Green, 1995]), since
those parameters are handled analytically in our model. We put a flat (improper) prior on
$M_f$ and propose new filter lengths using a Gaussian proposal centered on the current value,
with a standard deviation of 3 (non-positive proposed orders are rejected). This choice of
the standard deviation only affects the mixing rate of the Markov chain and does not have
any influence on the stationary distribution. The proposed values are accepted using the
Metropolis algorithm, using $p(M_f \mid s, z) \propto p(z \mid s, M_f)$.
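The filter-length update can be sketched as below. The callable `log_marginal_M`, assumed to return $\log p(z \mid s, M_f)$ for the current state sequence, is hypothetical, and rounding the Gaussian proposal to the nearest integer is an assumption, since the text does not say how the continuous proposal is mapped to an integer order.

```python
import math
import random

def update_filter_length(M, log_marginal_M, rng, step_sd=3.0):
    """One Metropolis update of the filter length M_f under a flat prior."""
    proposal = int(round(rng.gauss(M, step_sd)))   # assumption: round to an integer order
    if proposal < 1:
        return M                                   # non-positive orders are rejected
    log_ratio = log_marginal_M(proposal) - log_marginal_M(M)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal                            # accept with probability min(1, ratio)
    return M
```

Because the rounded Gaussian proposal is symmetric in the step size, no proposal correction is needed, and `step_sd` changes only how quickly the chain mixes.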
3.4 The posterior mean and uncertainty of the predicted signal 
Since $\theta$ has a flat prior the conditional probability for the filter coefficients is proportional
to the likelihood $p(z \mid \theta, \cdot)$ and by (*) we get:

$$p(\theta \mid z, s, \sigma_n, M_f) \sim \mathcal{N}(D_s z,\ \sigma_n^2 D_s D_s'), \qquad D_s = (H_s' H_s)^{-1} H_s'.$$
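The posterior mean of the predicted signal, $\hat{y}$, is then readily computed as:

$$\hat{y} = \langle y \rangle_{\theta, \sigma_n, s, M_f} = \langle H_s D_s z \rangle_{s, M_f} = \langle F_s \rangle_{s, M_f}\, z,$$

where $F_s = H_s D_s$. Here, the average over $\theta$ and $\sigma_n$ is done analytically, and the average
over the state and filter length is done using Monte Carlo. The uncertainty in the posterior
can also be estimated partly by analytical averaging, and partly by Monte Carlo:

$$\Sigma_y = \langle (y - \hat{y})(y - \hat{y})' \rangle_{\theta, \sigma_n, s, M_f} = \frac{1}{T - M_f - 2} \left\langle (z'z - \hat{y}_s'\hat{y}_s)\, F_s F_s' \right\rangle_{s, M_f} + \left\langle F_s z z' F_s' \right\rangle_{s, M_f} - \hat{y}\hat{y}'.$$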
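As a sketch of how these Monte Carlo averages might be accumulated, the following assumes that one matrix $H_s$ has been stored for each posterior sample of $(s, M_f)$. The function names are hypothetical, NumPy is assumed, and the noise variance factor uses the denominator $T - M_f - 2$ as read from the expression above.

```python
import numpy as np

def projection(H):
    """F_s = H_s (H_s' H_s)^{-1} H_s', computed via the pseudo-inverse."""
    return H @ np.linalg.pinv(H)

def posterior_signal_moments(z, designs):
    """Posterior mean and covariance of y; `designs` holds one H_s per sample."""
    z = np.asarray(z, dtype=float)
    T = z.size
    mean = np.zeros(T)
    second = np.zeros((T, T))
    for H in designs:
        F = projection(H)
        y_s = F @ z                               # mean predicted signal for this sample
        dof = T - (H.shape[1] - 2) - 2            # T - M_f - 2
        sigma2 = (z @ z - y_s @ y_s) / dof        # analytic average over sigma_n
        second += sigma2 * (F @ F.T) + np.outer(y_s, y_s)
        mean += y_s
    n = len(designs)
    mean /= n
    cov = second / n - np.outer(mean, mean)       # subtract y_hat y_hat'
    return mean, cov
```

The two standard deviation error bars on the predicted signal correspond to `2 * np.sqrt(np.diag(cov))`.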
4 Example: synthetic data 
In order to test the model, we first present some results on a synthetic data set. A signal 
z of length 100 is generated using a filter of order $M_f = 10$, and a hidden state sequence s
consisting of two activation bursts (indicated by dotted bars in figure 1 top left). In this
example, the hidden sequence is actually not generated from the generative model (*);
however, it still exhibits the kind of block structure that we wish to be able to recover.
The model is run for 10000 iterations, which is sufficient to generate 500 approximately
independent samples from the posterior; figure 2 (right) shows the autocovariance for
$M_f$ as a function of the iteration lag. Changes in $M_f$ are thought to be indicative of
the correlation time of the overall system.
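The link between the run length and the number of approximately independent samples can be made concrete. The caption of figure 2 computes the correlation length as the sum of auto-covariance coefficients from lag -400 to 400; the sketch below assumes these coefficients are normalized by the lag-zero value, and the function name `correlation_length` is hypothetical.

```python
import numpy as np

def correlation_length(x, max_lag=400):
    """Sum of normalized autocovariance coefficients over lags -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    var = x @ x / n                                    # lag-zero autocovariance
    rho = [(x[:n - k] @ x[k:]) / (n * var) for k in range(1, max_lag + 1)]
    return 1.0 + 2.0 * float(np.sum(rho))              # symmetric in the lag, rho_0 = 1
```

With a correlation length of about 20 iterations, a run of 10000 iterations corresponds to roughly 10000 / 20 = 500 approximately independent samples, which matches the figure quoted above.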
The correlation plot for the hidden states (figure 2, left) shows that the state activation onset 
correlates strongly with the second onset and negatively with the end of the activation (and 
vice versa). This indicates that the Metropolis updates described in section 3.1 may indeed 
be effective. Notice also that the very strong correlation among state variables does not 
strongly carry over to the predicted signal (figure 1, bottom right). 
To verify that the model can reasonably recover the parameters used to generate the data, 
posterior samples from some of the model variables are shown in figure 3. For all these 
parameters the posterior density is large around the correct values. Notice that in the
original model (*) there is an indeterminacy in the simultaneous inference of the state sequence
and the filter parameters (but no indeterminacy in the predicted signal); for example, the 
same signal is predicted by shifting the state sequence backward in time and introducing 
leading zero filter coefficients. However, the Bayesian methodology breaks this symmetry 
by penalizing complex models. 
[Figure 1: four panels, each plotted against scan number, t.]
Figure 1: Experiments with synthetic data. Top left, the measured response from a voxel
is plotted for 100 consecutive scans. In the bottom left, the underlying signal is seen in
thin, together with the posterior mean, $\hat{y}$ (thick), and two std. dev. error-bars in dotted. Top
right, the posterior probabilities are shown as a grey level, for each scan. The true activated
instances are indicated by the dotted bars and the pseudo MAP estimate of the activation
sequence is given by the crossed bars. Bottom right shows the posterior uncertainty.
The posterior mean and the two standard deviations are plotted in figure 1 bottom left. Notice,
however, that the distribution of $y_t$ is not Gaussian, but rather a mixture of Gaussians,
and is not necessarily well characterized by mean and variance alone. In figure 1 (top left),
the distribution of $y_t$ is visualized using grey-scale to represent density.
5 Simulations on real fMRI data and discussion 
In figure 4 the model has been applied to two measurements in the same voxel in visual 
cortex. The fMRI scans were acquired every 330 ms. The experimental paradigm consisted 
of 30 scans of rest followed by 30 scans of activation and 40 rest. Visual activation con- 
sisted of a flashing (8 Hz) annular checkerboard pattern. The model readily identifies the 
activation burst of somewhat longer duration than the visual stimulus and delayed around 
2 seconds. The delay is in part caused by the delay in the hemodynamic response. 
These results show that the integration procedure works in spite of the very limited data 
at hand. In figure 4 (top) the posterior model size suggests that (at least) two competing 
models can explain the signal from this trial. One of these models explains the measured 
signal as a simple square wave function which seems reasonable by considering the signal. 
Conversely, figure 4 (bottom) suggests that the signal from the second trial cannot be
explained by a simple model. This, too, seems reasonable because of the long signal rise
interval suggested in the signal.
[Figure 2: left panel, hidden state variables, s; right panel, horizontal axis: lag.]
Figure 2: The covariance of the hidden states based on a long run of the model is shown to
the left. Notice that the states around the front (back) of the activity "bumps" are highly
(anti-) correlated. Right: The auto-covariance for the filter length $M_f$ as a function of the
lag time in iterations. The correlation length is about 20, computed as the sum of
auto-covariance coefficients from lag -400 to 400.
Since the posterior distribution of the filter length is very broad it is questionable whether 
an optimization based procedure such as maximum likelihood estimation would be able 
to make useful inference in this case, where data are very limited. Also, it is not obvious
how one may use cross-validation in this setting. One might expect such optimization 
based strategies to get trapped in suboptimal solutions. This, of course, remains to be 
investigated. 
6 Conclusion 
We have presented a model for voxel based explorative data analysis of single trial fMRI 
signals during blocked task activation studies. The model is founded on the experimental 
observation that the basic coupling between the net neural activity and hemodynamic response
is roughly linear. The preliminary investigations reported here are encouraging in
that the model reliably detects reasonable hidden states from the very noisy fMRI data.
One drawback of this method is that the Gibbs sampling step is computationally expensive.
To improve on this step one could make use of the large class of variational/mean field
methods known from the graphical models literature. Finally, work is in progress
on generalizing the model to multiple voxels, including spatial correlation due to e.g.
spill-over effects.
[Figure 3: histogram panels; recoverable axis labels: DC off-set, Trend, $M_f$.]
Figure 3: Posterior distributions of various model parameters. The parameters used to
generate the data are: $\sigma_n = 1.0$, DC off-set = 2, trend = -1 and filter order $M_f = 10$.
[Figure 4: left panels plotted against scan number, t; right panels include histograms over $M_f$.]
Figure 4: Analysis of two experimental trials of the same voxel in visual cortex. The left
hand plot shows the posterior inferred signal distribution with the measured signal
superimposed. The dotted bar indicates the experimental paradigm and the crossed bar indicates
the pseudo MAP estimate of the neural activity. To the right the posterior noise level and
inferred filter length are displayed.
Acknowledgments 
Thanks to Egill Rostrup for providing the fMRI data. This work is funded by the Danish 
Research Councils through the Computational Neural Network Center (CONNECT) and 
the THOR Center for Neuroinformatics. 
References 
Bandettini, P. A. (1993). Processing strategies for time-course data sets in functional MRI of the
human brain. Magnetic Resonance in Medicine 30, 161-173.
Dale, A. M. and R. L. Buckner (1997). Selective Averaging of Individual Trials Using fMRI.
NeuroImage 5, Abstract S47.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model
determination. Biometrika 82, 711-732.
Gilks, W. R. and P. Wild (1992). Adaptive rejection sampling for Gibbs sampling. Applied
Statistics 41, 337-348.
Hansen, L. K. et al. (1999). Generalizable Patterns in Neuroimaging: How Many Principal
Components? NeuroImage, to appear.
Lange, N. and S. L. Zeger (1997). Non-linear Fourier time series analysis for human brain mapping
by functional magnetic resonance imaging. Journal of the Royal Statistical Society - Series C Applied
Statistics 46, 1-30.
McKeown, M. J. et al. (1998). Spatially independent activity patterns in functional magnetic
resonance imaging data during the Stroop color-naming task. Proc. Natl. Acad. Sci. USA 95, 803-810.
Worsley, K. J. et al. (1997). Characterizing the Response of PET and fMRI Data Using Multivariate
Linear Models (MLM). NeuroImage 6, 305-319.
