An Analysis of Turbo Decoding 
with Gaussian Densities 
Paat Rusmevichientong and Benjamin Van Roy 
Stanford University 
Stanford, CA 94305 
{paatrus, bvr}@stanford.edu
Abstract 
We provide an analysis of the turbo decoding algorithm (TDA) 
in a setting involving Gaussian densities. In this context, we are 
able to show that the algorithm converges and that - somewhat 
surprisingly - though the density generated by the TDA may differ 
significantly from the desired posterior density, the means of these 
two densities coincide. 
1 Introduction
In many applications, the state of a system must be inferred from noisy observations. 
Examples include digital communications, speech recognition, and control with in- 
complete information. Unfortunately, problems of inference are often intractable, 
and one must resort to approximation methods. One approximate inference method 
that has recently generated spectacular success in certain coding applications is the 
turbo decoding algorithm [1, 2], which bears a close resemblance to message-passing 
algorithms developed in the coding community a few decades ago [4]. It has been 
shown that the TDA is also related to well-understood exact inference algorithms 
[5, 6], but its performance on the intractable problems to which it is applied has 
not been explained through this connection. 
Several other papers have further developed an understanding of the turbo decoding 
algorithm. The exact inference algorithms to which turbo decoding has been related 
are variants of belief propagation [7]. However, this algorithm is designed for in- 
ference problems for which graphical models describing conditional independencies 
form trees, whereas graphical models associated with turbo decoding possess many 
loops. To understand the behavior of belief propagation in the presence of loops, 
Weiss has analyzed the algorithm for cases where only a single loop is present [11].
Other analyses that have shed significant light on the performance of the TDA in 
its original coding context include [8, 9, 10]. 
In this paper, we develop a new line of analysis for a restrictive setting in which un- 
derlying distributions are Gaussian. In this context, inference problems are tractable 
and the use of approximation algorithms such as the TDA is unnecessary. However,
studying the TDA in this context enables a streamlined analysis that generates
new insights into its behavior. In particular, we will show that the algorithm con- 
verges and that the mean of the resulting distribution coincides with that of the 
desired posterior distribution. 
While preparing this paper, we became aware of two related initiatives, both in- 
volving analysis of belief propagation when priors are Gaussian and graphs possess 
cycles. Weiss and Freeman [12] were studying the case of graphs possessing only 
cliques of size two. Here, they were able to show that, if belief propagation con- 
verges, the mean of the resulting approximation coincides with that of the true 
posterior distribution. At the same time, Frey [3] studied a case involving graphical 
structures that generalize those employed in turbo decoding. He also conducted an 
empirical study. 
The paper is organized as follows. In Section 2, we provide our working definition 
of the TDA. In Section 3, we analyze the case of Gaussian densities. Finally, a 
discussion of experimental results and open issues is presented in Section 4. 
2 A Definition of Turbo Decoding 
Consider a random variable $x$ taking on values in $\Re^n$ distributed according to a
density $p_0$. Let $y_1$ and $y_2$ be two random variables that are conditionally
independent given $x$. For example, $y_1$ and $y_2$ might represent outcomes of two
independent transmissions of the signal $x$ over a noisy communication channel. If $y_1$
and $y_2$ are observed, then one might want to infer a posterior density $f$ for $x$
conditioned on $y_1$ and $y_2$. This can be obtained by first computing densities $p_1^*$
and $p_2^*$, where the first is conditioned on $y_1$ and the second is conditioned on
$y_2$. Then,
$$f = \alpha\left(\frac{p_1^* p_2^*}{p_0}\right),$$
where $\alpha$ is a "normalizing operator" defined by
$$\alpha g = \frac{g}{\int g(x)\,dx},$$
and multiplication/division are carried out pointwise.
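The product form of the posterior can be justified in a few lines. The following is a sketch of the standard argument, assuming only Bayes' rule and the conditional independence of $y_1$ and $y_2$ given $x$; the likelihood notation $p(y_i \mid x)$ is introduced here for illustration and does not appear in the text above.

```latex
\begin{align*}
p(x \mid y_1, y_2)
  &\propto p_0(x)\, p(y_1 \mid x)\, p(y_2 \mid x)
     && \text{(conditional independence given } x\text{)} \\
  &\propto p_0(x)\, \frac{p_1^*(x)}{p_0(x)}\, \frac{p_2^*(x)}{p_0(x)}
     && \text{(Bayes' rule: } p_i^*(x) \propto p_0(x)\, p(y_i \mid x)\text{)} \\
  &= \frac{p_1^*(x)\, p_2^*(x)}{p_0(x)} .
\end{align*}
```

Applying the normalizing operator to the last line gives the posterior density as a normalized pointwise product of the two single-observation posteriors divided by the prior.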
Unfortunately, the problem of computing f is generally intractable. The computa- 
tional burden associated with storing and manipulating high-dimensional densities 
appears to be the primary obstacle. This motivates the idea of limiting attention 
to densities that factor. In this context, it is convenient to define an operator $\pi$
that generates a density that factors while possessing the same marginals as another
density. In particular, this operator is defined by
$$(\pi g)(a) = \prod_{i=1}^{n} \int_{x \in \Re^n : x_i = a_i} g(x)\, dx \wedge dx_i$$
for all densities $g$ and all $a \in \Re^n$, where
$dx \wedge dx_i = dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n$.
One may then aim at computing $\pi f$ as a proxy for $f$. Unfortunately, even this
problem is generally intractable. The TDA can be viewed as an iterative algorithm
for approximating $\pi f$.
Let operators F and F2 be defined by 
k p0/ 
and 
for any density $g$. The TDA is applicable in cases where computation of these two
operations is tractable. The algorithm generates sequences $q_1^{(k)}$ and $q_2^{(k)}$
according to
$$q_1^{(k+1)} = F_1 q_2^{(k)} \quad \text{and} \quad q_2^{(k+1)} = F_2 q_1^{(k)},$$
initialized with densities $q_1^{(0)}$ and $q_2^{(0)}$ that factor. The hope is that
$\alpha(q_1^{(k)} q_2^{(k)} / p_0)$ converges to an approximation of $\pi f$.
3 The Gaussian Case 
We will consider a setting in which the joint density of $x$, $y_1$, and $y_2$ is Gaussian. In this
context, application of the TDA is not warranted - there are tractable algorithms for 
computing conditional densities when priors are Gaussian. Our objective, however, 
is to provide a setting in which the TDA can be analyzed and new insights can be 
generated. 
Before proceeding, let us define some notation that will facilitate our exposition. 
We will write g - N(g, Eg) to denote a Gaussian density g whose mean vector e/nd 
covariance matrix are  and E, respectively. For any matrix A, 5(A) will denote 
a diagonal matrix whose entries are given by the diagonal elements of A. For any 
diagonal matrices X and Y, we write X _< Y if Xii _< Y/i for all i. For any pair of 
nonsingular covariance matrices Eu and E,such that y,l+ EI _ I is nonsingular, 
let a matrix A,r,v be defined by 
Ar,,r,v  (y,l + EI _ i)-1. 
To reduce notation, we will sometimes denote this matrix by Au, 
When the random variables x, Yl, and y: are jointly Gaussian, the densities p, 
f, and p0 are also Gaussian. We let 
and assume that both E1 and E: are symmetric positive definite matrices. We will 
also assume that P0 " N(0, I) where I is the identity matrix. It is easy to show 
that Ar,,r, is well-defined. 
The following lemma provides formulas for the means and covariances that arise 
from multiplying and rescaling Gaussian densities. The result follows from simple 
algebra, and we state it without proof. 
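Lemma 1 Let $u \sim N(\mu_u, \Sigma_u)$ and $v \sim N(\mu_v, \Sigma_v)$, where $\Sigma_u$
and $\Sigma_v$ are positive definite. If $\Sigma_u^{-1} + \Sigma_v^{-1} - I$ is positive
definite then
$$\alpha\left(\frac{uv}{p_0}\right) \sim N\left(A_{uv}(\Sigma_u^{-1}\mu_u + \Sigma_v^{-1}\mu_v),\ A_{uv}\right).$$
One immediate consequence of this lemma is an expression for the mean of $f$:
$$\mu = A_{\Sigma_1,\Sigma_2}(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2).$$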
Let $\mathcal{S}$ denote the set of covariance matrices that are diagonal and positive
definite. Let $\mathcal{G}$ denote the set of Gaussian densities with covariance matrices
in $\mathcal{S}$. We then have the following result, which we state without proof.
Lemma 2 The set $\mathcal{G}$ is closed under $F_1$ and $F_2$.
If the TDA is initialized with $q_1^{(0)}, q_2^{(0)} \in \mathcal{G}$, this lemma allows us
to represent all iterates using appropriate mean vectors and covariance matrices.
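To make the closure property concrete, the following sketch applies one of the two turbo operators to a Gaussian density with diagonal covariance, working only with mean vectors and variances. It is a derivation under stated assumptions rather than the authors' formulas: it uses Lemma 1 with p_0 = N(0, I) and the fact that, for a Gaussian, the factoring operator keeps the mean and the diagonal of the covariance. The function name `apply_F` and the numerical values are hypothetical.

```python
import numpy as np

def apply_F(mu_i, Sigma_i, m, v):
    """Apply F_i to g = N(m, diag(v)); return the mean and the variances of F_i g.

    Sketch assuming p0 = N(0, I) and p_i^* = N(mu_i, Sigma_i).
    """
    n = len(m)
    P_i = np.linalg.inv(Sigma_i)
    # Step 1: h = alpha(p_i^* g / p0) is Gaussian with covariance A (Lemma 1).
    A = np.linalg.inv(P_i + np.diag(1.0 / v) - np.eye(n))
    h_mean = A @ (P_i @ mu_i + m / v)
    # Step 2: the factoring operator keeps the mean and the marginal variances.
    a = np.diag(A)
    # Step 3: multiply by p0 / g; precisions add (p0) and subtract (g).
    prec_new = 1.0 / a + 1.0 - 1.0 / v
    m_new = (h_mean / a - m / v) / prec_new
    return m_new, 1.0 / prec_new

# Illustrative values (assumptions): Sigma_1^{-1} = I + E with E positive definite.
E = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
Sigma_1 = np.linalg.inv(np.eye(3) + E)
mu_1 = np.array([1.0, -0.5, 2.0])
m0, v0 = np.zeros(3), np.ones(3)

m_new, v_new = apply_F(mu_1, Sigma_1, m0, v0)
print(m_new, v_new)
```

The output is again described by a mean vector and a vector of positive variances, which is the sense in which Gaussian densities with diagonal covariance are preserved in this example.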
3.1 Convergence Analysis 
Under suitable technical conditions, it can be shown that the sequences of mean
vectors and covariance matrices generated by the TDA converge. Due to space
limitations, we will only present results pertinent to the convergence of covariance 
matrices. Furthermore, we will only present certain central components of the 
analyses. For more complete results and detailed analyses, we refer the reader to 
our upcoming full-length paper. 
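Recall that the TDA generates sequences $q_1^{(k)}$ and $q_2^{(k)}$ according to
$$q_1^{(k+1)} = F_1 q_2^{(k)} \quad \text{and} \quad q_2^{(k+1)} = F_2 q_1^{(k)}.$$
As discussed earlier, if the algorithm is initialized with elements of $\mathcal{G}$, by
Lemma 2,
$$q_1^{(k)} \sim N(m_1^{(k)}, \Sigma_1^{(k)}) \quad \text{and} \quad q_2^{(k)} \sim N(m_2^{(k)}, \Sigma_2^{(k)}),$$
for appropriate sequences of mean vectors and covariance matrices. It turns out
that there are mappings $T_1 : \mathcal{S} \to \mathcal{S}$ and
$T_2 : \mathcal{S} \to \mathcal{S}$ such that
$$\Sigma_1^{(k+1)} = T_1(\Sigma_2^{(k)}) \quad \text{and} \quad \Sigma_2^{(k+1)} = T_2(\Sigma_1^{(k)}),$$
for all $k$. Let $T = T_1 \circ T_2$. To establish convergence of $\Sigma_1^{(k)}$ and
$\Sigma_2^{(k)}$, it suffices to show that $T^n(\Sigma_1^{(0)})$ converges. The following
theorem establishes this and further points out that the limit does not depend on the
initial iterates.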
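The covariance mappings are not written out explicitly in the text. As a sketch, assuming p_0 = N(0, I) and using Lemma 1 together with the fact that the factoring operator replaces a Gaussian covariance by its diagonal part, one candidate closed form for the two mappings (i = 1, 2) acting on a diagonal positive definite matrix V is the following. This derivation is an illustration and is not taken from the paper.

```latex
\begin{align*}
A_{\Sigma_i,V} &= (\Sigma_i^{-1} + V^{-1} - I)^{-1}
   && \text{covariance of } \alpha(p_i^* g / p_0) \text{ for } g \sim N(m, V), \\
\delta(A_{\Sigma_i,V}) &
   && \text{covariance after applying } \pi, \\
T_i(V) &= \big(\delta(A_{\Sigma_i,V})^{-1} + I - V^{-1}\big)^{-1}
   && \text{covariance after multiplying by } p_0 / g .
\end{align*}
```

Under this form the mean of the input density plays no role in the covariance update, which is why the covariance sequences can be analyzed separately from the means.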
Theorem 1 There exists a matrix $V^* \in \mathcal{S}$ such that
$$\lim_{n \to \infty} T^n(V) = V^*,$$
for all $V \in \mathcal{S}$.
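The statement of Theorem 1 can be explored numerically. The sketch below iterates a covariance map from three very different diagonal starting points and compares the limits. It assumes p_0 = N(0, I) and a covariance update derived from Lemma 1 and the marginal-preserving property of the factoring operator (a derivation made here, not a formula given in the paper); the precision matrices `P_1`, `P_2` and the helper names `T_half` and `T` are hypothetical illustrative choices.

```python
import numpy as np

def T_half(P_i, v):
    """Variances of F_i g when g has variances v and p_i^* has precision P_i."""
    n = len(v)
    A = np.linalg.inv(P_i + np.diag(1.0 / v) - np.eye(n))
    return 1.0 / (1.0 / np.diag(A) + 1.0 - 1.0 / v)

def T(P_1, P_2, v):
    """Composite covariance map: first the F_2 update, then the F_1 update."""
    return T_half(P_1, T_half(P_2, v))

# Illustrative precisions of p_1^* and p_2^* (assumptions): identity plus a
# diagonally dominant, hence positive definite, symmetric matrix.
P_1 = np.eye(3) + np.array([[1.0, 0.3, 0.0],
                            [0.3, 1.0, 0.3],
                            [0.0, 0.3, 1.0]])
P_2 = np.eye(3) + np.array([[0.5, 0.0, 0.2],
                            [0.0, 0.5, 0.2],
                            [0.2, 0.2, 0.5]])

limits = []
for scale in (0.01, 1.0, 100.0):
    v = scale * np.ones(3)          # initial diagonal covariance
    for _ in range(200):
        v = T(P_1, P_2, v)
    limits.append(v)

for v in limits:
    print(v)
```

In this small instance the three runs print the same vector of variances, consistent with a limit that does not depend on the initial iterate. The example is only an illustration of the theorem on one problem, not evidence beyond it.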
3.1.1 Preliminary Lemmas 
Our proof of Theorem 1 relies on a few lemmas that we will present in this section. 
We begin with a lemma that captures important abstract properties of the function 
$T$. Due to space constraints, we omit the proof, even though it is nontrivial.
Lemma 3
(a) There exists a matrix $D \in \mathcal{S}$ such that for all $V \in \mathcal{S}$, $D \le T(V) \le I$.
(b) For all $X, Y \in \mathcal{S}$, if $X \le Y$ then $T(X) \le T(Y)$.
(c) The function $T$ is continuous on $\mathcal{S}$.
(d) For all $\beta \in (0, 1)$ and $D \in \mathcal{S}$, $(\beta + \alpha) T(D) \le T(\beta D)$ for some $\alpha > 0$.
The following lemma establishes convergence when the sequence of covariance ma- 
trices is initialized with the identity matrix. 
Lemma 4 The sequence $T^n(I)$ converges in $\mathcal{S}$ to a fixed point of $T$.
Proof: By Lemma 3(a), $T(I) \le I$, and it follows from monotonicity of $T$ (Lemma
3(b)) that $T^{n+1}(I) \le T^n(I)$ for all $n$. Since $T^n(I)$ is bounded below by a
matrix $D \in \mathcal{S}$, the sequence converges in $\mathcal{S}$. The fact that the
limit is a fixed point of $T$ follows from the continuity of $T$ (Lemma 3(c)).
Let $V^* = \lim_{n \to \infty} T^n(I)$. This matrix plays the following special role.
Lemma 5 The matrix $V^*$ is the unique fixed point in $\mathcal{S}$ of $T$.
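Proof: Because $T^n(I)$ converges to $V^*$ and $T$ is monotonic, no matrix
$V \in \mathcal{S}$ with $V \ne V^*$ and $V^* \le V \le I$ can be a fixed point.
Furthermore, by Lemma 3(a), no matrix $V \in \mathcal{S}$ with $V \ge I$ and $V \ne I$
can be a fixed point. For any $V \in \mathcal{S}$ with $V \le V^*$, let
$$\beta_V = \sup\{\beta \in (0, 1] : \beta V^* \le V\}.$$
For any $V \in \mathcal{S}$ with $V \ne V^*$ and $V \le V^*$, we have $\beta_V < 1$. For
such a $V$, by Lemma 3(d), there is an $\alpha > 0$ such that
$T(\beta_V V^*) \ge (\beta_V + \alpha)V^*$. By monotonicity,
$T(V) \ge T(\beta_V V^*) \ge (\beta_V + \alpha)V^*$, so $T(V) = V$ would contradict the
definition of $\beta_V$, and therefore $T(V) \ne V$. The result follows.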
3.1.2 Proof of Theorem 1 
Proof: For $V \in \mathcal{S}$ with $V^* \le V \le I$, convergence to $V^*$ follows from
Lemma 4 and monotonicity (Lemma 3(b)). For $V \in \mathcal{S}$ with $V \ge I$, convergence
follows from the fact that $V^* \le T(V) \le I$, which is a consequence of the two
previously invoked lemmas together with Lemma 3(a).
Let us now address the case of $V \in \mathcal{S}$ with $V \le V^*$. Let $\beta_V$ be
defined as in the proof of Lemma 5. Then, $\beta_V V^* \le T(\beta_V V^*)$. By
monotonicity, $T^n(\beta_V V^*) \le T^{n+1}(\beta_V V^*) \le V^*$ for all $n$. It follows
that $T^n(\beta_V V^*)$ converges, and since $T$ is continuous, the limit must be the
unique fixed point $V^*$. We have established convergence for elements $V$ of
$\mathcal{S}$ satisfying $V \le V^*$ or $V \ge V^*$. For other elements of $\mathcal{S}$,
convergence follows from the monotonicity of $T$.
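The last step of the proof is stated briefly. One way to spell it out, as a sketch, is a sandwich argument: a diagonal matrix that is not comparable to the fixed point is bracketed by its entrywise minimum and maximum with the fixed point, both of which fall under the cases already treated. The symbols with under- and overlines are introduced here for illustration.

```latex
% Sandwich argument for V in S that is not comparable to V^*.
\begin{align*}
\underline{V} &= \min(V, V^*), \qquad \overline{V} = \max(V, V^*)
   \quad \text{(entrywise on the diagonal, so } \underline{V}, \overline{V} \in \mathcal{S}\text{)}, \\
\underline{V} &\le V \le \overline{V}
   \;\Longrightarrow\;
   T^n(\underline{V}) \le T^n(V) \le T^n(\overline{V}) \quad \text{for all } n \text{ (Lemma 3(b))}, \\
T^n(\underline{V}) &\to V^* \ \text{ and } \ T^n(\overline{V}) \to V^*
   \;\Longrightarrow\; T^n(V) \to V^* .
\end{align*}
```

Here the lower bracket satisfies $\underline{V} \le V^*$ and the upper bracket satisfies $\overline{V} \ge V^*$, so their iterates converge by the cases established earlier in the proof.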
3.2 Analysis of the Fixed Point 
As discussed in the previous section, under suitable conditions, $F_1 \circ F_2$ and
$F_2 \circ F_1$ each possess a unique fixed point, and the TDA converges to these fixed
points. Let $q_1^* \sim N(\mu_{q_1^*}, \Sigma_{q_1^*})$ and
$q_2^* \sim N(\mu_{q_2^*}, \Sigma_{q_2^*})$ denote the fixed points of $F_1 \circ F_2$ and
$F_2 \circ F_1$, respectively. Based on Theorem 1, $\Sigma_{q_1^*}$ and $\Sigma_{q_2^*}$
are in $\mathcal{S}$.
The following lemma provides an equation relating means associated with the fixed
points. It is not hard to show that $A_{q_1^* q_2^*}$, $A_{\Sigma_1,\Sigma_{q_2^*}}$, and
$A_{\Sigma_{q_1^*},\Sigma_2}$, which are used in the statement, are well-defined.
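Lemma 6
$$A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right)
= A_{\Sigma_1,\Sigma_{q_2^*}}\left(\Sigma_1^{-1}\mu_1 + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right)
= A_{\Sigma_{q_1^*},\Sigma_2}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_2^{-1}\mu_2\right).$$
Proof: It follows from the definitions of $F_1$ and $F_2$ that, if $q_1^* = F_1 q_2^*$
and $q_2^* = F_2 q_1^*$,
$$\alpha\left(\frac{q_1^* q_2^*}{p_0}\right) = \alpha\,\pi\left(\frac{p_1^* q_2^*}{p_0}\right) = \alpha\,\pi\left(\frac{q_1^* p_2^*}{p_0}\right).$$
The result then follows from Lemma 1 and the fact that $\pi$ does not alter the mean
of a distribution.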
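We now prove a central result of this paper: the mean of the density generated by
the TDA coincides with the mean $\mu$ of the desired posterior density $f$.
Theorem 2 $\alpha(q_1^* q_2^* / p_0) \sim N(\mu, A_{q_1^* q_2^*})$.
Proof: By Lemma 1, $\mu = A_{\Sigma_1,\Sigma_2}(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2)$,
while the mean of $\alpha(q_1^* q_2^* / p_0)$ is
$A_{q_1^* q_2^*}(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*})$.
We will show that these expressions are equal.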
Figure 1: Evolution of errors.
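Multiplying the equations from Lemma 6 by appropriate matrices, we obtain
$$A_{\Sigma_1,\Sigma_{q_2^*}}^{-1} A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right) = \Sigma_1^{-1}\mu_1 + \Sigma_{q_2^*}^{-1}\mu_{q_2^*},$$
and
$$A_{\Sigma_{q_1^*},\Sigma_2}^{-1} A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right) = \Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_2^{-1}\mu_2.$$
It follows that
$$\left(A_{\Sigma_1,\Sigma_{q_2^*}}^{-1} + A_{\Sigma_{q_1^*},\Sigma_2}^{-1} - A_{q_1^* q_2^*}^{-1}\right) A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right) = \Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2,$$
and therefore
$$A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right) = \left(A_{\Sigma_1,\Sigma_{q_2^*}}^{-1} + A_{\Sigma_{q_1^*},\Sigma_2}^{-1} - A_{q_1^* q_2^*}^{-1}\right)^{-1}\left(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2\right).$$
Note that $A_{\Sigma_1,\Sigma_{q_2^*}}^{-1} + A_{\Sigma_{q_1^*},\Sigma_2}^{-1} - A_{q_1^* q_2^*}^{-1} = A_{\Sigma_1,\Sigma_2}^{-1}$. It follows that
$$A_{q_1^* q_2^*}\left(\Sigma_{q_1^*}^{-1}\mu_{q_1^*} + \Sigma_{q_2^*}^{-1}\mu_{q_2^*}\right) = A_{\Sigma_1,\Sigma_2}\left(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2\right) = \mu.$$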
4 Discussion and Experimental Results 
The limits of convergence $q_1^*$ and $q_2^*$ of the TDA provide an approximation
$\alpha(q_1^* q_2^* / p_0)$ to $\pi f$. We have established that the mean of this
approximation coincides with that of the desired density. One might further expect that
the covariance matrix of $\alpha(q_1^* q_2^* / p_0)$ approximates that of $\pi f$, and
even more so, that $q_1^*$ and $q_2^*$ bear some relation to $p_1^*$ and $p_2^*$.
Unfortunately, as will be illustrated by experimental results in this section, such
expectations appear to be inaccurate.
We performed experiments involving 20- and 50-dimensional Gaussian densities (i.e.,
$x$ was either 20- or 50-dimensional in each instance). Problem instances were sampled
randomly from a fixed distribution. Due to space limitations, we will not describe 
the tedious details of the sampling mechanism. 
Figure 1 illustrates the evolution of certain "errors" during representative runs of
the TDA on 20-dimensional problems. The first graph plots relative errors in means
of densities $\alpha(q_1^{(k)} q_2^{(k)} / p_0)$ generated by iterates of the TDA. As
indicated by our analysis, these errors converge to zero. The second chart plots a
measure of relative error for the covariance of $\alpha(q_1^{(k)} q_2^{(k)} / p_0)$ versus
that of $\pi f$ for representative runs. Though these covariances converge, the ultimate
errors are far from zero. The two
Figure 2: Errors after 50 iterations. 
final graphs plot errors between the means of $q_1^{(k)}$ and $q_2^{(k)}$ and those of
$p_1^*$ and $p_2^*$, respectively. Again, though these means converge, the ultimate
errors can be large.
Figure 2 provides plots of the same sorts of errors measured on 1000 different in- 
stances of 50-dimensional problems after the 50th iteration of the TDA. The hori- 
zontal axes are labeled with indices of the problem instances. Note that the errors 
in the first graph are all close to zero (the units on the vertical axis must be
multiplied by $10^{-5}$ and errors are measured in relative terms). On the other hand, errors
in the other graphs vary dramatically. 
It is intriguing that - at least in the context of Gaussian densities - the TDA can ef- 
fectively compute conditional means without accurately approximating conditional 
densities. It is also interesting to note that, in the context of communications, the 
objective is to choose a code word $\hat{x}$ that comes close to the transmitted code $x$.
One natural way to do this involves assigning to $\hat{x}$ the code word that maximizes
the conditional density f, i.e., the one that has the highest chance of being correct. 
In the Gaussian case that we have studied, this corresponds to the mean of f - a 
quantity that is computed correctly by the TDA! It will be interesting to explore 
generalizations of the line of analysis presented in this paper to other classes of 
densities. 
References 
[1] S. Benedetto and G. Montorsi, "Unveiling turbo codes: Some results on parallel concatenated coding 
schemes," in IEEE Trans. Inform. Theory, vol. 42, pp. 409-428, Mar. 1996. 
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding: Turbo codes,"
in Proc. 1993 Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064-1070.
[3] B. Frey, "Turbo Factor Analysis." To appear in Advances in Neural Information Processing Systems 1.
[4] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[5] F. R. Kschischang and B. J. Frey, "Iterative Decoding of Compound Codes by Probability Propagation in
Graphical Models," in IEEE Journal on Selected Areas in Commun., vol. 16, no. 2, pp. 219-230, Feb. 1998.
[6] R. J. McEliece, D. J. C. MacKay, and J-F. Cheng, "Turbo Decoding as an Instance of Pearl's "Belief
Propagation" Algorithm," in IEEE Journal on Selected Areas in Commun., vol. 16, no. 2, pp. 140-152, Feb.
1998.
[7] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA:
Morgan Kaufmann, 1988.
[8] T. Richardson, "The Geometry of Turbo-Decoding Dynamics," Dec. 1998. To appear in IEEE Trans. Inform.
Theory. 
[9] T. Richardson and R. Urbanke, "The Capacity of Low-Density Parity Check Codes under Message-Passing 
Decoding", submitted to the IEEE Trans. on Information Theory. 
[10] T. Richardson, A. Shokrollahi, and R. Urbanke, "Design of Provably Good Low-Density Parity Check 
Codes," submitted to the IEEE Trans. on Information Theory. 
[11] Y. Weiss, "Belief Propagation and Revision in Networks with Loops," November 1997. Available by ftp to
publications.ai.mit.edu.
[12] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary
topology." To appear in Advances in Neural Information Processing Systems 1.
