Blind Separation of Filtered Sources 
Using State-Space Approach 
Liqing Zhang* and Andrzej Cichocki t 
Laboratory for Open Information Systems, 
Brain Science Institute, RIKEN 
Saitama 351-0198, Wako shi, JAPAN 
Email: zha, cia)@open.brain.riken.go.jp 
Abstract 
In this paper we present a novel approach to multichannel blind 
separation/generalized deconvolution, assuming that both mixing 
and demixing models are described by stable linear state-space sys- 
tems. We decompose the blind separation problem into two pro- 
cess: separation and state estimation. Based on the minimization 
of Kullback-Leibler Divergence, we develop a novel learning algo- 
rithm to train the matrices in the output equation. To estimate the 
state of the demixing model, we introduce a new concept, called 
hidden innovation, to numerically implement the Kalman filter. 
Computer simulations are given to show the validity and high ef- 
fectiveness of the state-space approach. 
I Introduction 
The field of blind separation and deconvolution has grown dramatically during re- 
cent years due to its similarity to the separation feature in human brain, as well as its 
rapidly growing applications in various fields, such as telecommunication systems, 
image enhancement and biomedical signal processing. The blind source separation 
problem is to recover independent sources from sensor outputs without assuming 
any priori knowledge of the original signals besides certain statistic features. Refer 
to review papers [1] and [5] for the current state of theory and methods in the field. 
Although there exist a number of models and methods, such as the infomax, nat- 
ural gradient approach and equivariant adaptive algorithms, for separating blindly 
independent sources, there still are several challenges in generalizing mixture to dy- 
* On leave from South China University of Technology, China 
t On leave from Warsaw University of Technology, Poland 
Blind Separation of Filtered Sources 649 
namic and nonlinear systems, as well as in developing more rigorous and effective 
algorithms with general convergence.[1-9], [11-13] 
The state-space description of systems is a new model for blind separation and 
deconvolution[9,12]. There are several reasons why we use linear state-space systems 
as blind deconvolution models. Although transfer function models are equivalent 
to the state-space ones, it is difficult to exploit any common features that may 
be present in the real dynamic systems. The main advantage of the state space 
description for blind deconvolution is that it not only gives the internal description 
of a system, but there are various equivalent types of state-space realizations for a 
system, such as balanced realization and observable canonical forms. In particular 
it is known how to parameterize some specific classes of models which are of interest 
in applications. Also it is much easy to tackle the stability problem of state-space 
systems using the Kalman Filter. Moreover, the state-space model enables much 
more general description than standard finite impulse response (FIR) convolutive 
filtering. All known filtering (dynamic) models, like AR, MA, ARMA, ARMAX and 
Gamma filterings, could also be considered as special cases of flexible state-space 
models. 
2 Formulation of Problem 
Assume that the source signals are a stationary zero-mean i.i.d processes and mutu- 
ally statistically independent. Let s(t) = (s (t),..., sn(t)) be an unknown vector of 
independent i.i.d. sources. Suppose that the mixing model is described by a stable 
linear state discrete-time system 
(k + 1) = A(k) + Bs(k) + Lp(k), (1) 
u(k) = c(k) + Ds() + (2) 
where   R r is the state vector of system, s(k)  R n is the vector of source signals 
and u(k)  R ' is the vector of sensor signals. A, B, C and D are the mixing 
matrices of the state space model with consistent dimensions. p(k) is the process 
noise and O(k) is sensor noise of the mixing system. If we ignore the noise terms 
in the mixing model, its transfer function matrix is described by a m x n matrix of 
the form 
H(z) = U(zI - + (3) 
where z - is a delay operator. 
We formulate the blind separation problem as a task to recover original signals 
from observations u(t) without prior knowledge on the source signals and the state 
space matrices [A, B, C, D] besides certain statistic features of source signals. We 
propose that the demixing model here is another linear state-space system, which 
is described as follows, (see Fig. 1) 
x(k + 1) = Ax(k) + Bu(k) + LR(k), (4) 
= cx() + (5) 
where the input u(k) of the demixing model is just the output (sensor signals) 
of the mixing model and the R(k) is the reference model noise. A, B, C and 
D are the demixing matrices of consistent dimensions. In general, the matrices 
W = [A, B, C, D, L] are parameters to be determined in learning process. 
For simplicity, we do not consider, at this moment, the noise terms both in 
the mixing and demixing models. The transfer function of the demixing model 
is W(z) = C(zI - A)-B + D. The output y(k) is designed to recover the source 
signals in the following sense 
(6) 
650 L. Zhang and A. Cichocld 
 
Refe,'nces (k) 
+ y(k) 
Figure 1: General state-space model for blind deconvolution 
where P is any permutation matrix and A(z) is a diagonal matrix with iz -Ti 
in diagonal entry (i,i), here hi is a nonzero constant and ri is any nonnegative 
integer. It is easy to see that the linear state space model mixture is an extension 
of instantaneous mixture. When both the matrices A, B, C in the mixing model 
and A, B, C in the demixing model are null matrices, the problem is simplified to 
standard ICA problem [1-8]. 
The question here is whether exist matrices [A, B, C, D] in the demixing model (4) 
and (5), such that its transfer function W(z) satisfies (6). It is proven [12] that if 
the matrix D in the mixing model is of full rank, rank(D) = n, then there exist 
matrices [A, B, C, D], such that the output signal y of state-space system (4) and 
(5) recovers the independent source signal s in the sense of (6). 
3 Learning Algorithm 
Assume that p(y, W), Pi (Yi, W) are the joint probability density function of y and 
marginal pdf of yi, (i -- 1,. ., n) respectively. We employ the mutual information 
of the output signals, which measures the mutual independence of the output signals 
yi(k), as a risk function [1,2] 
l(W) = -H(y, W) q- y H(yi, W), (7) 
where 
H(y, W) = - f p(u, W) logp(y, W)dy, H(yi, W) = -/pi(yi, W)logPi(yi, W)dyi. 
In this paper we do not directly develop learning algorithms to update all param- 
eters W = [A, B, C, D] in demixing model. We separate the blind deconvolution 
problem into two procedures: separation and state-estimation. In the separation 
procedure we develop a novel learning algorithm, using a new search direction, to 
update the matrices C and D in output equation (5). Then we define a hidden 
innovation of the output and use Kalman filter to estimate the state vector x(k). 
For simplicity we suppose that the matrix D in the demixing model (5) is nonsin- 
gular n x n matrix. From the risk function (7), we can obtain a cost function for 
on line learning 
1 n 
l(y, W) = -3 log det(D T D)) - y 1ogpi(Yi, W), (8) 
i=1 
Blind Separation of Filtered Sources 651 
where det(DTD) is the determinant of symmetric positive definite matrix DTD. 
For the gradient of I with respect to W, we calculate the total differential dl of 
l(y, W) when we takes a differential dW on W 
at(u, w) = t(u, w + dw) - t(u, w). (9) 
Following Amari's derivation for natural gradient methods [1-3], we have 
dl(y, W) = -tr(dDD -i) + qT(y)dy, (10) 
where tr is the trace of a matrix and q(y) is a vector of nonlinear activation 
functions 
qi(Yi) = dlogpi(Yi)_ P'i(Yi) 
-- dyi - Pi(Yi)' (11) 
Taking the derivative on equation (5), we have following approximation 
dy = dCx(k) + dDu(k). (12) 
On the other hand, from (5), we have 
u(k) - D-l(y(k)-Cx(k)) (13) 
Substituting (13) into (12), we obtain 
dy = (dC - dDD-1C)x + dDD-ly. (14) 
In order to improve the computing efficiency of learning algorithms, we introduce a 
new search direction 
dX = dC-dDD-1C, (15) 
dX2 = dDD -1. (16) 
Then the total differential dl can be expressed by 
dl = -tr(dX2) + qT(y)(dX13 + dX2y). (17) 
It is easy to obtain the derivatives of the cost function I with respect to matrices 
X1 and X2 as 
O1 
= q(y(k))xT(k), (18) 
Ol 
= qo(u(k))uT(k)- I. (19) 
OX2 
From (15) and (16), we derive a novel learning algorithm to update matrices (7 and 
D. 
AC(k) -- r/(-q(y(k))xr(k) + (I- q(y(k))y'(k))(7(k)), (20) 
AD(k) =  (I- T(y(k))yr(k)) D(k). (21) 
The equilibrium points of the learning algorithm satisfy the following equations 
E[(y(k))xr(k)] = O, (22) 
E - = 0. (23) 
This means that separated signals y could achieve as mutually independent as 
possible if the nonlinear activation function q(y) are,suitably chosen and the state 
vector x(k) is well estimated. From (20) and (21), we see that the natural gradient 
learning algorithm [2] is covered as a special case of the learning algorithm when 
the mixture is simplified to instantaneous case. 
The above derived learning algorithm enable to solve the blind separation problem 
under assumption that state matrices A and B are known or designed appropriately. 
In the next section instead of adjusting state matrices A and B directly, we propose 
new approaches how to estimate state vector x. 
652 L. Zhang and A. Cichocki 
4 State Estimator 
From output equation (5), it is observed that if we can accurately estimate the state 
vector x(k) of the system, then we can separate mixed signals using the learning 
algorithm (20) and (21). 
4.1 Kalman Filter 
The Kalman filter is a useful technique to estimate the state vector in state-space 
models. The function of the Kalman Filter is to generate on line the state estimate 
of the state x(k). The Kalman filter dynamics are given as follows 
x(k + 1) - Ax(k) q- Bu(k) + Kr(k) + R(k), (24) 
where K is the Kalman filter gain matrix, and r(k) is the innovation or residual 
vector which measures the error between the measured(or expected) output y(k) 
and the predicted output Cx(k) + Du(k). There are varieties of algorithms to 
update the Kalman filter gain matrix K as well as the state x(k), refer to [10] for 
more details. 
However in the blind deconvolution problem there exists no explicit residual r(k) 
to estimate the state vector x(k) because the expected output y(t) means the 
unavailable source signals. In order to solve this problem, we present a new concept 
called hidden innovation to implement the Kalman filter in blind deconvolution case. 
Since updating matrices C and D will produces an innovation in each learning step, 
we introduce a hidden innovation as follows 
r(k) = u(k) = + (25) 
where AC = C(k + 1) - C(k) and AD = D(k + 1) - D(k). The hidden innovation 
presents the adjusting direction of the output of the demixing system and is used 
to generate an a posteriori state estimate. Once we define the hidden innovation, 
we can employ the commonly used Kalman filter to estimate the state vector x(k), 
as well as to update Kalman gain matrix K. The updating rule in this paper is 
described as follows: 
(1) Compute the Kalman gain matrix 
K(k) '- P(k)C(k)T(c(k)P(k)cT(k) q- t(k)) -1 
(2) Update state vector with hidden innovation 
(k) = x(k) + K(k)r(k) 
(3) Update the error covariance matrix 
P(k) = (I- K(k)C(k))P(k) 
(4) evaluate the state vector ahead 
k+l -- A(]g)(]g) q- $(])u(]) 
(5) evaluate the error covariance matrix ahead 
= r + 
with the initial condition P(0) = I, where Q(k), R(k) are the covariance matrices 
of the noise vector R and output measurement noise nk. 
The theoretic problems, such as convergence and stability, remain to be elaborated. 
Simulation experiments show that the proposed algorithm, based on the Kalman 
filter, can separate the convolved signals well. 
Blind Separation of Filtered Sources 653 
4.2 Information Back-propagation 
Another solution to estimating the state of a system is to propagate backward the 
mutual information. If we consider the cost function is also a function of the vector 
x, than we have the partial derivative of l(y, W) with respect to x 
OI(u, W) = Cr,(U) ' (26) 
Ox 
Then we adjust the state vector x(k) according to the following rule 
5:(k) = x(k) - ,1C(k)rq(y(k)). (27) 
Then the estimated state vector is used as a new state of the system. 
5 Numerical Implementation 
Several numerical simulations have been done to demonstrate the validity and ef- 
fectiveness of the proposed algorithm. Here we give a typical example 
Example 1. Consider the following MIMO mixing model 
10 10 
.(k) + A.(k - i) = + - i) + 
i=1 i=1 
where u, s, v E R a, and 
-0.48 -0.16 -0.64 ) -0.50 -0.10 -0.40 ) 
A2 -- -0.16 -0.48 -0.24 , As = -0.10 -0.50 -0.20 , 
-0.16 -0.16 -0.08 -0.10 -0.10 -0.10 
0.32 0.19 0.38 ) 0.42 0.21 0.14 ) 
Ao = 0.16 0.29 0.20 , B2 = 0.10 0.56 0.14 , 
0.08 0.08 0.10 0.21 0.21 0.35 
-0.40 -0.08 -0.08 ) -0.19 -0.15 -0.10 ) 
Bs = -0.08 -0.40 -0.16 , B0 - -0.11 -0.27 -0.12 , 
-0.08 -0.08 -0.56 -0.16 -0.18 -0.22 
and other matrices are set to the null matrix. The sources s are chosen to be 
i.i.d signals uniformly distributed in the range (-1,1), and v are the Gaussian noises 
with zero mean and a covariance matrix 0.11. We employ the state space approach 
to separate mixing signals. The nonlinear activation function is chosen q(y) = ya. 
The initial value for matrices A and B in the state equation are chosen as in 
canonical controller form. The initial values for matrix C is set to null matrix or 
given randomly in the range (-1,1), and D -- 13. A large number of simulations 
show that the state space method can easily recover source signals in the sense of 
W(z)H(z) - PA. Figure 2 illustrates the coefficients of global transfer function 
G(z) - W(z)H(z) after 3000 iterations, where the (i,j)th sub-figure plots the 
coefficients of the transfer function Gij (z) = -k__o gijkz - up to order of 50. 
References 
[1] S. Amari and A. Cichocki, "Adaptive blind signal processing- neural network 
approaches", Proceedings of the IEEE, 86(10):2026-2048, 1998. 
[2] S. Amari, A. Cichocki, and H.H. Yang, "A new learning algorithm for blind 
signal separation", Advances in Neural Information Processing Systems 1995 
(Boston, MA: MIT Press, 1996), pp. 752-763. 
654 L. Zhang and A. Cchocki 
0 2o 40 
0 20 40 
z) 3  
:I 
o 
Cz)   
0 2o 40 
O(z)   
0 20 4o 
z)3 
O(z)  3 
0 2o 4o 
G(z)  3 
0 2O 4o 
G(z) 3 3 
0 2o 40 0 2o 4o 0 2o 40 
Figure 2: The coefficients of global transfer function after 3000 iterations 
[3] S. Amari "Natural gradient works efficiently in learning", Neural Computation, 
Vol.10, pp251-276, 1998. 
[4] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to 
blind separation and blind deconvolution", Neural Computation, Vol.7, pp 
1129-1159, 1995. 
[5] J.-F Cardoso, "Blind signal separation: statistical principles", Proceedings 
of the IEEE, 86(10):2009-2025, 1998. 
[6] J.-F. Cardoso and B. Laheld, "Equivariant adaptive source separation," IEEE 
Trans. Signal Processing, vol. SP-43, pp. 3017-3029, Dec. 1996. 
[7] A.Cichocki and R. Unbehauen, "Robust neural networks with on-line learning 
for blind identification and blind separation of sources" IEEE Trans Circuits 
and Systems I: Fundamentals Theory and Applications, vol 43, No.11, pp. 
894-906, Nov. 1996. 
[8] P. Comon, "Independent component analysis: a new concept?", Signal Pro- 
cessing, vol.36, pp.287-314, 1994. 
[9] A. Gharbi and F. Salam, "Algorithm for blind signal separation and recov- 
ery in static and dynamics environments", IEEE Symposium on Circuits and 
Systems, Hong Kong, June, 713-716, 1997. 
[10] O. L. R. Jacobs, "Introduction to Control Theory", Second Edition, Oxford 
University Press, 1993. 
[11] T. W. Lee, A.J. Bell, and R. Lambert, "Blind separation of delayed and 
convolved sources", NIPS 9, 1997, MIT Press, Cambridge MA, pp758-764. 
[12] L. -Q. Zhang and A. Cichocki, "Blind deconvolution/equalization using state- 
space models", Proc. '98 IEEE Signal Processing Society Workshop on NNSP, 
pp123-131, Cambridge, 1998. 
[13] S. Choi, A. Cichocki and S. Amari, "Blind equalization of simo channels via 
spatio-temporal anti-Hebbian learning rule", Proc. '98 IEEE Signal Processing 
Society Workshop on NNSP, pp93-102, Cambridge, 1998. 
PART V 
IMPLEMENTATION 
