A Convergence Proof for the Softassign Quadratic Assignment Algorithm
Anand Rangarajan 
Department of Diagnostic Radiology 
Yale University School of Medicine 
New Haven, CT 06520-8042 
e-mail: anand@noodle.med.yale.edu
Alan Yuille 
Smith-Kettlewell Eye Institute 
2232 Webster Street 
San Francisco, CA 94115 
e-maih yuilleskivs. ski. org 
Steven Gold 
CuraGen Corporation 
322 East Main Street 
Branford, CT 06405 
e-maih gold-stevencs.yale. edu 
Eric Mjolsness 
Dept. of Comp. Sci. and Engg. 
Univ. of California San Diego (UCSD) 
La Jolla, CA 92093-0114 
e-mail: emj cs. ucsd. edu 
Abstract 
The softassign quadratic assignment algorithm has recently emerged as an effective strategy for a variety of optimization problems in pattern recognition and combinatorial optimization. While the effectiveness of the algorithm was demonstrated in thousands of simulations, there was no known proof of convergence. Here, we provide a proof of convergence for the most general form of the algorithm.
1 Introduction
Recently, a new neural optimization algorithm has emerged for solving quadratic assignment-like problems [4, 2]. Quadratic assignment problems (QAP) are characterized by quadratic objectives with the variables obeying permutation matrix constraints. Problems that roughly fall into this class are the traveling salesman problem (TSP), graph partitioning (GP) and graph matching. The new algorithm is based on the softassign procedure, which guarantees the satisfaction of the doubly stochastic matrix constraints (resulting from a "neural" style relaxation of the permutation matrix constraints). While the effectiveness of the softassign procedure has been demonstrated via thousands
of simulations, no proof of convergence was ever shown. 
Here, we present a proof of convergence for the softassign quadratic assignment algorithm. The proof is based on algebraic transformations of the original objective and on the non-negativity of the Kullback-Leibler measure. A central requirement of the proof is that the softassign procedure always returns a doubly stochastic matrix. After providing a general criterion for convergence, we separately analyze the cases of TSP and graph matching.
2 Convergence proof 
The deterministic annealing quadratic assignment objective function is written as [4, 5]:

$$E_{qap}(M, \mu, \nu) = -\frac{1}{2}\sum_{aibj} C_{ai;bj} M_{ai} M_{bj} + \sum_a \mu_a \Big(\sum_i M_{ai} - 1\Big) + \sum_i \nu_i \Big(\sum_a M_{ai} - 1\Big) - \frac{\gamma}{2}\sum_{ai} M_{ai}^2 + \frac{1}{\beta}\sum_{ai} M_{ai} \log M_{ai}. \quad (1)$$
Here $M$ is the desired $N \times N$ permutation matrix. This form of the energy function has a self-amplification term with a parameter $\gamma$, two Lagrange parameters $\mu$ and $\nu$ for constraint satisfaction, an $x \log x$ barrier function which ensures positivity of $M_{ai}$, and a deterministic annealing control parameter $\beta$. The QAP benefit matrix $C_{ai;bj}$ is preset based on the chosen problem, for example, graph matching or TSP. In the following deterministic annealing pseudocode, $\beta_0$ and $\beta_f$ are the initial and final values of $\beta$, $\beta_r$ is the rate at which $\beta$ is increased, $I_s$ is an iteration cap and $\rho$ is an $N \times N$ matrix of small positive-valued random numbers.
Initialize $\beta$ to $\beta_0$, $M_{ai}$ to $1/N + \rho_{ai}$
Begin A: Deterministic annealing. Do A until $\beta \ge \beta_f$
    Begin B: Relaxation. Do B until all $M_{ai}$ converge or number of iterations $> I_s$
        $Q_{ai} \leftarrow \sum_{bj} C_{ai;bj} M_{bj} + \gamma M_{ai}$
        Begin Softassign:
            $M_{ai} \leftarrow \exp(\beta Q_{ai})$
            Begin C: Sinkhorn. Do C until all $M_{ai}$ converge
                Update $M_{ai}$ by normalizing the rows: $M_{ai} \leftarrow M_{ai} / \sum_i M_{ai}$
                Update $M_{ai}$ by normalizing the columns: $M_{ai} \leftarrow M_{ai} / \sum_a M_{ai}$
            End C
        End Softassign
    End B
    Increase $\beta$: $\beta \leftarrow \beta_r \beta$
End A
The softassign is used for constraint satisfaction. The softassign is based on Sinkhorn's theorem [4] but can be independently derived as coordinate ascent on the Lagrange parameters $\mu$ and $\nu$. Sinkhorn's theorem ensures that we obtain a doubly stochastic matrix by the simple process of alternating row and column normalizations. The QAP algorithm above was developed using the graduated assignment heuristic [1] with no proof of convergence until now.
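The softassign step is simple enough to state as code. The following is a minimal numpy sketch of exponentiation followed by Sinkhorn row/column balancing; it is not the authors' implementation, and the function name, tolerance `tol`, and iteration cap `max_iter` are our own illustrative choices.

```python
import numpy as np

def softassign(Q, beta, tol=1e-6, max_iter=200):
    """Softassign: exponentiate beta*Q, then alternate row and column
    normalizations (Sinkhorn) until the matrix is doubly stochastic."""
    # Global shift for numerical stability; it cancels in the normalizations.
    M = np.exp(beta * (Q - Q.max()))
    for _ in range(max_iter):
        M_old = M.copy()
        M /= M.sum(axis=1, keepdims=True)   # normalize each row to sum to 1
        M /= M.sum(axis=0, keepdims=True)   # normalize each column to sum to 1
        if np.abs(M - M_old).max() < tol:   # Sinkhorn alternation has converged
            break
    return M
```

Since every entry of $\exp(\beta Q)$ is positive, Sinkhorn's theorem guarantees that the alternation converges to a doubly stochastic matrix.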
We simplify the objective function in (1) by collecting together all terms quadratic in $M_{ai}$. This is achieved by defining

$$C^{(\gamma)}_{ai;ai} \stackrel{\rm def}{=} C_{ai;ai} + \gamma \quad (2)$$

with the off-diagonal entries unchanged, i.e., $C^{(\gamma)}_{ai;bj} = C_{ai;bj} + \gamma\,\delta_{ab}\delta_{ij}$.
Then we use an algebraic transformation [3] to transform the quadratic form into a more manageable linear form:

$$-\frac{x^2}{2} = \min_{\sigma}\left[-\sigma x + \frac{\sigma^2}{2}\right]. \quad (3)$$
Application of the algebraic transformation (in a vectorized form) to the quadratic term in (1) yields:

$$E_{qap}(M, \sigma, \mu, \nu) = -\sum_{aibj} C^{(\gamma)}_{ai;bj} M_{ai}\sigma_{bj} + \frac{1}{2}\sum_{aibj} C^{(\gamma)}_{ai;bj}\sigma_{ai}\sigma_{bj} + \sum_a \mu_a \Big(\sum_i M_{ai} - 1\Big) + \sum_i \nu_i \Big(\sum_a M_{ai} - 1\Big) + \frac{1}{\beta}\sum_{ai} M_{ai}\log M_{ai}. \quad (4)$$

Extremizing (4) w.r.t. $\sigma$, we get

$$\sum_{bj} C^{(\gamma)}_{ai;bj} M_{bj} = \sum_{bj} C^{(\gamma)}_{ai;bj}\sigma_{bj} \;\Rightarrow\; \sigma_{ai} = M_{ai}, \quad (5)$$

which is a minimum, provided certain conditions hold which we specify below.
In the first part of the proof, we show that setting $\sigma_{ai} = M_{ai}$ is guaranteed to decrease the energy function. Restated, we require that

$$\sigma_{ai} = M_{ai} = \arg\min_{\sigma}\left(-\sum_{aibj} C^{(\gamma)}_{ai;bj} M_{ai}\sigma_{bj} + \frac{1}{2}\sum_{aibj} C^{(\gamma)}_{ai;bj}\sigma_{ai}\sigma_{bj}\right). \quad (6)$$

If $C^{(\gamma)}_{ai;bj}$ is positive definite in the subspace spanned by $M$, then $\sigma_{ai} = M_{ai}$ is a minimum of the energy function $-\sum_{aibj} C^{(\gamma)}_{ai;bj} M_{ai}\sigma_{bj} + \frac{1}{2}\sum_{aibj} C^{(\gamma)}_{ai;bj}\sigma_{ai}\sigma_{bj}$.
At this juncture, we make a crucial assumption that considerably simplifies the proof. Since this assumption is central, we formally state it here: "$M$ is always constrained to be a doubly stochastic matrix." In other words, for our proof of convergence, we require the softassign algorithm to return a doubly stochastic matrix (as Sinkhorn's theorem guarantees that it will) instead of a matrix which is merely close to being doubly stochastic (based on some reasonable metric). We also require the variable $\sigma$ to be a doubly stochastic matrix.
Since $M$ is always constrained to be a doubly stochastic matrix, $C^{(\gamma)}_{ai;bj}$ is required to be positive definite in the linear subspace of rows and columns of $M$ summing to one. The value of $\gamma$ should be set high enough that $C^{(\gamma)}_{ai;bj}$ does not have any negative eigenvalues in the subspace spanned by the row and column constraints. This is the same requirement imposed in [5] to ensure that we obtain a permutation matrix at zero temperature.
To derive a more explicit criterion for $\gamma$, we first define a matrix $r$ in the following manner:

$$r \stackrel{\rm def}{=} I_N - \frac{1}{N} e e^{\prime} \quad (7)$$

where $I_N$ is the $N \times N$ identity matrix, $e$ is the vector of all ones and the "prime" indicates a transpose operation. The matrix $r$ has the property that any vector $rs$ with $s$ arbitrary will sum to zero. We would like to extend such a property to cover matrices whose row and column sums stay fixed. To achieve this, take the Kronecker product of $r$ with itself:

$$R \stackrel{\rm def}{=} r \otimes r. \quad (8)$$
$R$ has the property that it will annihilate all row and column sums. Form a vector $m$ by concatenating all the columns of the matrix $M$ together into a single column [$m = \mathrm{vec}(M)$]. Then the vector $Rm$ has the equivalent property of the "rows" and "columns" summing to zero. Hence the matrix $R C^{(\gamma)} R$ (where $C^{(\gamma)}$ is the matrix equivalent of $C^{(\gamma)}_{ai;bj}$) satisfies the criterion of annihilated row and column sums in any quadratic form: $m^{\prime} R C^{(\gamma)} R m = (Rm)^{\prime} C^{(\gamma)} (Rm)$.
The parameter $\gamma$ is chosen such that all eigenvalues of $R C^{(\gamma)} R$ are positive:

$$\gamma = -\min \lambda(R C R) + \epsilon \quad (9)$$

where $\epsilon > 0$ is a small quantity. Note that $C$ is the original QAP benefit matrix whereas $C^{(\gamma)}$ is the augmented matrix of (2). We cannot always efficiently compute the largest negative eigenvalue of the matrix $RCR$. Since the original $C_{ai;bj}$ is four dimensional, the dimensions of $RCR$ are $N^2 \times N^2$ where $N$ is the number of elements in one set. Fortunately, as we show later, for specific problems it is possible to break up $RCR$ into its constituents, thereby making the calculation of the largest negative eigenvalue of $RCR$ more efficient. We return to this point in Section 3.
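For small $N$, the criterion (9) can be evaluated directly by forming $R = r \otimes r$. The sketch below is our own; the helper name `gamma_from_benefit` and the symmetrization step (only the symmetric part of $C$ enters the quadratic form) are assumptions, and the $O(N^4)$ storage makes this practical only for small problems.

```python
import numpy as np

def gamma_from_benefit(C, N, eps=1e-3):
    """Compute gamma = -min eigenvalue(R C R) + eps, as in Eq. (9).
    C is the N^2 x N^2 matrix form of the benefit C_{ai;bj}."""
    r = np.eye(N) - np.ones((N, N)) / N   # r = I_N - (1/N) e e', Eq. (7)
    R = np.kron(r, r)                     # R = r (kron) r, Eq. (8)
    RCR = R @ C @ R
    # Symmetrize before eigvalsh: only the symmetric part of C
    # contributes to the quadratic form m' R C R m.
    lam_min = np.linalg.eigvalsh((RCR + RCR.T) / 2).min()
    return -lam_min + eps
```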
The second part of the proof involves demonstrating that the softassign operation also decreases the objective in (4). (Note that the two Lagrange parameters $\mu$ and $\nu$ are specified by the softassign algorithm [4].)

$$M = \mathrm{Softassign}(Q, \beta) \;\;\text{where}\;\; Q_{ai} = \sum_{bj} C^{(\gamma)}_{ai;bj}\sigma_{bj}. \quad (10)$$

Recall that the step immediately preceding the softassign operation sets $\sigma_{ai} = M_{ai}$. We are therefore justified in referring to $\sigma_{ai}$ as the "old" value of $M_{ai}$. For convergence, we have to show that $E_{qap}(\sigma, \sigma) \ge E_{qap}(M, \sigma)$ in (4).
Minimizing (4) w.r.t. $M_{ai}$, we get

$$\frac{1}{\beta}\log M_{ai} = \sum_{bj} C^{(\gamma)}_{ai;bj}\sigma_{bj} - (\mu_a + \nu_i) - \frac{1}{\beta}. \quad (11)$$
From (11), we see that

$$\frac{1}{\beta}\sum_{ai} M_{ai}\log M_{ai} = \sum_{aibj} C^{(\gamma)}_{ai;bj}\sigma_{bj} M_{ai} - \sum_a \mu_a \sum_i M_{ai} - \sum_i \nu_i \sum_a M_{ai} - \frac{1}{\beta}\sum_{ai} M_{ai} \quad (12)$$

and

$$\frac{1}{\beta}\sum_{ai} \sigma_{ai}\log M_{ai} = \sum_{aibj} C^{(\gamma)}_{ai;bj}\sigma_{ai}\sigma_{bj} - \sum_a \mu_a \sum_i \sigma_{ai} - \sum_i \nu_i \sum_a \sigma_{ai} - \frac{1}{\beta}\sum_{ai} \sigma_{ai}. \quad (13)$$

From (12) and (13), we get (after some algebraic manipulations)

$$E_{qap}(\sigma, \sigma) - E_{qap}(M, \sigma) = \frac{1}{\beta}\sum_{ai} \sigma_{ai}\log\frac{\sigma_{ai}}{M_{ai}} \ge 0$$

by the non-negativity of the Kullback-Leibler measure. We have shown that the change in energy after $\sigma$ has been initialized with the "old" value of $M$ is non-negative. We require that $\sigma$ and $M$ are always doubly stochastic via the action of the softassign operation. Consequently, the terms involving the Lagrange parameters $\mu$ and $\nu$ can be eliminated from the energy function (4). Setting $\sigma = M$ followed by the softassign operation decreases the objective in (4) after excluding the terms involving the Lagrange parameters.
We summarize the essence of the proof to bring out the salient points. At each temperature, the quadratic assignment algorithm executes the following steps until convergence is established.

Step 1: $\sigma_{ai} \leftarrow M_{ai}$.
Step 2:
    Step 2a: $Q_{ai} \leftarrow \sum_{bj} C^{(\gamma)}_{ai;bj}\sigma_{bj}$.
    Step 2b: $M \leftarrow \mathrm{Softassign}(Q, \beta)$.
Return to Step 1 until convergence.
Our proof is based on demonstrating that an appropriately designed energy function decreases in both Step 1 and Step 2 (at fixed temperature). This energy function is Equation (4) after excluding the Lagrange parameter terms.

Step 1: Energy decreases due to the positive definiteness of $C^{(\gamma)}_{ai;bj}$ in the linear subspace spanned by the row and column constraints. $\gamma$ has to be set high enough for this statement to be true.

Step 2: Energy decreases due to the non-negativity of the Kullback-Leibler measure and due to the restriction that $M$ (and $\sigma$) are doubly stochastic.
3 Applications 
3.1 Quadratic Assignment 
The QAP benefit matrix is chosen such that the softassign algorithm will not converge without adding the $\gamma$ term in (1). To achieve this, we randomly picked a unit vector $v$ of dimension $N^2$. The benefit matrix $C$ is set to $-vv^{\prime}$. Since $C$ has only one negative eigenvalue, the softassign algorithm cannot possibly converge. We ran the softassign algorithm with $\beta_0 = 1$, $\beta_r = 0.9$ and $\gamma = 0$. The energy difference plot on the left in Figure 1 shows the energy never decreasing with increasing iteration number. Next, we followed the recipe for setting $\gamma$ exactly as in Section 2. After projecting $C$ into the subspace of the row and column constraints, we calculated the largest negative eigenvalue of the matrix $RCR$, which turned out to be $-0.8152$. We set $\gamma$ to $0.8162$ ($\epsilon = 0.001$) and reran the softassign algorithm. The energy difference plot (Figure 1, right) shows that the energy never increases. We have shown that a proper choice of $\gamma$ leads to a convergent algorithm.
[Figure 1 appears here: two panels plotting energy difference against iterations (left: roughly 0-30 iterations on a $10^{-7}$ scale; right: roughly 0-80 iterations).]
Figure 1: Energy difference plot. Left: $\gamma = 0$ and Right: $\gamma = 0.8162$. While the change in energy is always negative when $\gamma = 0$, it is always non-negative when $\gamma = 0.8162$. The negative energy difference (on the left) implies that the energy function increases, whereas the non-negative energy difference (on the right) implies that the energy function never increases.
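This experiment is easy to re-create in outline. The sketch below reuses the `softassign` and `gamma_from_benefit` helpers sketched earlier; the seed, problem size, and iteration count are our own illustrative choices, so the computed eigenvalue will differ from the $-0.8152$ reported for the paper's instance.

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
v = rng.standard_normal(N * N)
v /= np.linalg.norm(v)
C = -np.outer(v, v)                         # benefit matrix with one negative eigenvalue

gamma = gamma_from_benefit(C, N, eps=1e-3)  # Eq. (9); try gamma = 0.0 to see the energy rise
Cg = C + gamma * np.eye(N * N)              # C^(gamma), Eq. (2)

beta = 1.0
M = softassign(rng.random((N, N)), beta)    # doubly stochastic initial matrix

def energy(M, sigma):
    """Eq. (4) without the Lagrange terms (M and sigma doubly stochastic)."""
    m, s = M.ravel(), sigma.ravel()
    return -m @ Cg @ s + 0.5 * s @ Cg @ s + (M * np.log(M)).sum() / beta

for it in range(30):
    sigma = M.copy()                        # Step 1
    Q = (Cg @ sigma.ravel()).reshape(N, N)  # Step 2a
    e_before = energy(sigma, sigma)
    M = softassign(Q, beta)                 # Step 2b
    print(it, e_before - energy(M, sigma))  # non-negative when gamma satisfies Eq. (9)
```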
3.2 TSP 
The TSP objective function is written as follows: Given $N$ cities,

$$E_{tsp}(M) = \sum_{aij} d_{ij} M_{ai} M_{(a \oplus 1)j} = \mathrm{trace}(D M^{\prime} T M) \quad (14)$$

where the symbol $\oplus$ is used to indicate that the summation in (14) is taken modulo $N$, $d_{ij}$ ($D$) is the inter-city distance matrix and $M$ is the desired permutation matrix. $T$ is a matrix whose $(i,j)$th entry is $\delta_{(i \oplus 1)j}$ ($\delta_{ij}$ is the Kronecker delta function). Equation (14) is transformed into the $m^{\prime}Cm$ form:

$$E_{tsp}(m) = m^{\prime}(D \otimes T)m \quad (15)$$

where $m = \mathrm{vec}(M)$. We identify our general matrix $C$ with $-2D \otimes T$.
For convergence, we require the largest eigenvalue of

$$-RCR = 2(r \otimes r)(D \otimes T)(r \otimes r) = 2(rDr) \otimes (rTr) = 2(rDr) \otimes (rT). \quad (16)$$

The eigenvalues of $rT$ are bounded by unity. The eigenvalues of $rDr$ will depend on the form of $D$. Even in Euclidean TSP, the values will depend on whether the Euclidean distance or the distance squared between the cities is used.
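As an illustrative sketch of this factorization, the bound can be computed factor-by-factor without ever forming the $N^2 \times N^2$ matrix, since the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues. The random cities and the symmetrization of $T$ (only the symmetric part of $C$ enters the quadratic form) are our own choices.

```python
import numpy as np

N = 10
rng = np.random.default_rng(1)
cities = rng.random((N, 2))
D = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)  # inter-city distances

r = np.eye(N) - np.ones((N, N)) / N       # projection of Eq. (7)
T = np.roll(np.eye(N), 1, axis=1)         # T_{ij} = delta_{(i+1 mod N) j}
Ts = (T + T.T) / 2                        # symmetric part; it alone enters m'Cm

# Largest eigenvalue of -RCR = 2 (rDr) kron (r Ts r), found pairwise.
ev_D = np.linalg.eigvalsh(r @ D @ r)
ev_T = np.linalg.eigvalsh(r @ Ts @ r)     # magnitudes bounded by unity
gamma = max(2 * a * b for a in ev_D for b in ev_T) + 1e-3  # Eq. (9) with eps
print(gamma)
```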
3.3 Graph Matching 
The graph matching objective function is written as follows: Given $N_1$ and $N_2$ node graphs with adjacency matrices $G$ and $g$ respectively,

$$E_{gm}(M) = -\frac{1}{2}\sum_{aibj} C_{ai;bj} M_{ai} M_{bj} \quad (17)$$

where $C_{ai;bj} = 1 - 3|G_{ab} - g_{ij}|$ is the compatibility matrix [1]. The matching constraints are somewhat different from TSP due to the presence of slack variables [1]. This makes no difference, however, to our projection operators. We add an extra row and column of zeros to $g$ and $G$ in order to handle the slack variable case. Now $G$ is $(N_1 + 1) \times (N_1 + 1)$ and $g$ is $(N_2 + 1) \times (N_2 + 1)$. Equation (17) can be readily transformed into the $m^{\prime}Cm$ form. Our projection apparatus remains unchanged. For convergence, we require the largest negative eigenvalue of $RCR$.
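A sketch of the flattened compatibility matrix with the slack padding described above follows; the helper name and the row-major flattening convention are our own choices, not the authors'.

```python
import numpy as np

def graph_matching_benefit(G, g):
    """Build C_{ai;bj} = 1 - 3|G_ab - g_ij| as a 2-D matrix (Eq. 17),
    after padding G and g with a zero row and column for the slacks."""
    G = np.pad(G, ((0, 1), (0, 1)))            # now (N1+1) x (N1+1)
    g = np.pad(g, ((0, 1), (0, 1)))            # now (N2+1) x (N2+1)
    # C4[a, i, b, j] = 1 - 3 |G[a, b] - g[i, j]|
    C4 = 1.0 - 3.0 * np.abs(G[:, None, :, None] - g[None, :, None, :])
    n = G.shape[0] * g.shape[0]
    return C4.reshape(n, n)                    # rows and columns indexed by (a, i)
```

From here, (17) takes the $m^{\prime}Cm$ form and the eigenvalue criterion of Section 2 applies as before.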
4 Conclusion 
We have derived a convergence proof for the softassign quadratic assignment algorithm and specialized it to the cases of TSP and graph matching. An extension to graph partitioning follows along the same lines as graph matching. Central to our proof is the requirement that the QAP matrix $M$ is always doubly stochastic. As a by-product, the convergence proof yields a criterion by which the free self-amplification parameter $\gamma$ is set. We believe that the combination of good theoretical properties and experimental success makes the softassign algorithm the technique of choice for quadratic assignment neural optimization.
References 
[1] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377-388, 1996.

[2] S. Gold and A. Rangarajan. Softassign versus softmax: Benchmarks in combinatorial optimization. In Advances in Neural Information Processing Systems 8, pages 626-632. MIT Press, 1996.

[3] E. Mjolsness and C. Garrett. Algebraic transformations of objective functions. Neural Networks, 3:651-669, 1990.

[4] A. Rangarajan, S. Gold, and E. Mjolsness. A novel optimizing network architecture with applications. Neural Computation, 8(5):1041-1060, 1996.

[5] A. L. Yuille and J. J. Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6(3):341-356, May 1994.
