A Method for the Efficient Design 
of Boltzmann Machines for Classification 
Problems 
Ajay Gupta and Wolfgang Maass* 
Department of Mathematics, Statistics, and Computer Science 
University of Illinois at Chicago 
Chicago IL, 60680 
Abstract 
We introduce a method for the efficient design of a Boltzmann machine (or 
a Hopfield net) that computes an arbitrary given Boolean function f. This 
method is based on an efficient simulation of acyclic circuits with threshold 
gates by Boltzmann machines. As a consequence we can show that various 
concrete Boolean functions f that are relevant for classification problems 
can be computed by scalable Boltzmann machines that are guaranteed 
to converge to their global maximum configuration with high probability 
after constantly many steps. 
1 INTRODUCTION
A Boltzmann machine ([AHS], [HS], [AK]) is a neural network model in which the
units update their states according to a stochastic decision rule. It consists of a
set $\mathcal{U}$ of units, a set $\mathcal{C}$ of unordered pairs of elements of $\mathcal{U}$, and an assignment
of connection strengths $S : \mathcal{C} \to \mathbb{R}$. A configuration of a Boltzmann machine
is a map $k : \mathcal{U} \to \{0,1\}$. The consensus $C(k)$ of a configuration $k$ is given by
$C(k) = \sum_{\{u,v\} \in \mathcal{C}} S(\{u,v\}) \cdot k(u) \cdot k(v)$. If the Boltzmann machine is currently in
configuration $k$ and unit $u$ is considered for a state change, then the acceptance
probability for this state change is given by $\frac{1}{1 + e^{-\Delta C / c}}$. Here $\Delta C$ is the change in
the value of the consensus function $C$ that would result from this state change of
$u$, and $c > 0$ is a fixed parameter (the "temperature").
*This paper was written during a visit of the second author at the Department of
Computer Science of the University of Chicago.
Assume that n units of a Boltzmann machine B have been declared as input units
and m other units as output units. One says that B computes a function
$f : \{0,1\}^n \to \{0,1\}^m$ if for any clamping of the input units of B according to some
$\underline{a} \in \{0,1\}^n$ the only global maxima of the consensus function of the clamped Boltzmann
machine are those configurations where the output units are in the states given by
$f(\underline{a})$.
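The consensus function and the acceptance probability defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary representation of the connection strengths, the use of one-element sets for a connection of a unit with itself, and the function names are assumptions made for this example and are not part of the original text.

```python
import math

def consensus(strengths, k):
    """Consensus C(k): sum of S({u,v}) * k(u) * k(v) over all connections.

    `strengths` maps frozenset({u, v}) to a real number; a one-element
    frozenset({u}) stands for a connection of u with itself (assumption
    of this sketch). `k` maps every unit to 0 or 1.
    """
    return sum(s for conn, s in strengths.items() if all(k[u] == 1 for u in conn))

def acceptance_probability(strengths, k, u, c):
    """Probability 1 / (1 + exp(-dC / c)) of accepting a state change of unit u."""
    flipped = dict(k)
    flipped[u] = 1 - k[u]
    delta_c = consensus(strengths, flipped) - consensus(strengths, k)
    return 1.0 / (1.0 + math.exp(-delta_c / c))
```

A state change that raises the consensus is accepted with probability above 1/2, and one that lowers it with probability below 1/2; lowering the temperature c pushes both probabilities towards 1 and 0.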
Note that even if one leaves the determination of the connection strengths for a 
Boltzmann machine up to a learning procedure ([AHS], [HS], [AK]), one has to 
know in advance the required number of hidden units, and how they should be 
connected (see section 10.4.3 of [AK] for a discussion of this open problem). 
Ad hoc constructions of efficient Boltzmann machines tend to be rather difficult 
(and hard to verify) because of the cyclic nature of their "computations". 
We introduce in this paper a new method for the construction of efficient Boltzmann 
machines for the computation of a given Boolean function f (the same method can 
also be used for the construction of Hopfield nets). We propose to construct first an 
acyclic Boolean circuit T with threshold gates that computes f (this turns out to 
be substantially easier). We show in section 2 that any Boolean threshold circuit T 
can be simulated by a Boltzmann machine B(T) of the same size as T. Furthermore 
we show in section 3 that a minor variation of B(T) is likely to converge very fast. 
In Section 4 we discuss applications of our method for various concrete Boolean 
functions. 
2 SIMULATION OF THRESHOLD CIRCUITS BY BOLTZMANN MACHINES
A threshold circuit T (see [M], [PS], [R], [HMPST]) is a labeled acyclic directed
graph. We refer to the number of edges that are directed into (out of) a node of T
as the indegree (outdegree) of that node. Its nodes of indegree 0 are labeled by input
variables $x_i$ ($i \in \{1,\dots,n\}$). Each node g of indegree $l > 0$ in T is labeled by some
arbitrary Boolean threshold function $F_g : \{0,1\}^l \to \{0,1\}$, where $F_g(y_1,\dots,y_l) = 1$
if and only if $\sum_{i=1}^{l} \alpha_i y_i \ge t$ (for some arbitrary parameters $\alpha_1,\dots,\alpha_l, t \in \mathbb{R}$; w.l.o.g.
$\alpha_1,\dots,\alpha_l, t \in \mathbb{Z}$ [M]). One views such a node g as a threshold gate that computes
$F_g$. If m nodes of a threshold circuit T are in addition labeled as output nodes,
one defines in the usual manner the Boolean function $f : \{0,1\}^n \to \{0,1\}^m$ that is
computed by T.
We simulate T by the following Boltzmann machine $B(T) = \langle \mathcal{U}, \mathcal{C}, S \rangle$ (note that
T has directed edges, while B(T) has undirected edges). We reserve for each node g
of T a separate unit b(g) of B(T). We set
$\mathcal{U} := \{ b(g) \mid g \text{ is a node of } T \}$ and
$\mathcal{C} := \{ \{b(g'), b(g)\} \mid g', g \text{ are nodes of } T \text{ so that either } g' = g \text{ or } g', g \text{ are connected by an edge in } T \}$.
Consider an arbitrary unit b(g) of B(T). We define the connection strengths
$S(\{b(g)\})$ and $S(\{b(g'), b(g)\})$ (for edges $\langle g', g \rangle$ of T) by induction on the length
of the longest path in T from g to a node of T with outdegree 0.
If g is a gate of T with outdegree 0 then we define $S(\{b(g)\}) := -2t + 1$, where t is
the threshold of g, and we set $S(\{b(g'), b(g)\}) := 2\alpha(\langle g', g \rangle)$ (where $\alpha(\langle g', g \rangle)$
is the weight of the directed edge $\langle g', g \rangle$ in T).
Assume that g is a threshold gate of T with outdegree > 0. Let $g_1,\dots,g_k$ be the
immediate successors of g in T. Set $w := \sum_{i=1}^{k} |S(\{b(g), b(g_i)\})|$ (we assume that
the connection strengths $S(\{b(g), b(g_i)\})$ have already been defined). We define
$S(\{b(g)\}) := -(2w + 2) \cdot t + w + 1$, where t is the threshold of gate g. Furthermore
for every edge $\langle g', g \rangle$ in T we set $S(\{b(g'), b(g)\}) := (2w + 2) \cdot \alpha(\langle g', g \rangle)$.
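The inductive definition of the connection strengths can be sketched as follows. The sketch assumes a hypothetical dictionary encoding of a threshold circuit (each gate maps to its incoming integer weights and its threshold) and uses a small XOR circuit, chosen only for illustration, to check by exhaustive search that the clamped machine has a unique global maximum that agrees with the circuit. All names in the code are assumptions of this example.

```python
from itertools import product

def evaluate(gates, a):
    """Evaluate the threshold circuit on the input assignment a (dict input -> 0/1)."""
    value = dict(a)
    def val(g):
        if g not in value:
            weights, t = gates[g]
            value[g] = int(sum(alpha * val(p) for p, alpha in weights.items()) >= t)
        return value[g]
    return {g: val(g) for g in gates}

def build_boltzmann_machine(inputs, gates):
    """Connection strengths of B(T), keyed by frozensets of node names."""
    successors = {node: [] for node in list(inputs) + list(gates)}
    for g, (weights, _) in gates.items():
        for p in weights:
            successors[p].append(g)
    w = {}
    def w_of(g):
        # w = sum of |S({b(g), b(g_i)})| over the immediate successors g_i of g,
        # where S({b(g), b(g_i)}) = (2 * w(g_i) + 2) * alpha(<g, g_i>).
        if g not in w:
            w[g] = sum((2 * w_of(h) + 2) * abs(gates[h][0][g]) for h in successors[g])
        return w[g]
    strengths = {}
    for g, (weights, t) in gates.items():
        scale = 2 * w_of(g) + 2
        strengths[frozenset([g])] = -scale * t + w_of(g) + 1   # S({b(g)})
        for p, alpha in weights.items():
            strengths[frozenset([p, g])] = scale * alpha        # S({b(g'), b(g)})
    return strengths

def consensus(strengths, k):
    return sum(s for conn, s in strengths.items() if all(k[u] for u in conn))

def global_maxima(gates, strengths, a):
    """All configurations of the clamped machine with maximal consensus."""
    names = list(gates)
    best, argmax = None, []
    for bits in product((0, 1), repeat=len(names)):
        k = dict(a)
        k.update(zip(names, bits))
        c = consensus(strengths, k)
        if best is None or c > best:
            best, argmax = c, [k]
        elif c == best:
            argmax.append(k)
    return argmax

# Hypothetical example circuit: out = XOR(x1, x2) via OR and AND gates.
XOR_INPUTS = ["x1", "x2"]
XOR_GATES = {
    "or": ({"x1": 1, "x2": 1}, 1),
    "and": ({"x1": 1, "x2": 1}, 2),
    "out": ({"or": 1, "and": -1}, 1),
}
```

For this example the output gate has w = 0 and the two gates below it have w = 2, so the edges into them are scaled by 2w + 2 = 6. The exhaustive search is exponential in the number of gates and is meant only as a sanity check on tiny circuits.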
Remark: It is obvious that for problems in $TC^0$ (see section 4) the size of connection
strengths in B(T) can be bounded by a polynomial in n.
Theorem 2.1 For any threshold circuit T the Boltzmann machine B(T) computes 
the same Boolean function as T. 
Proof of Theorem 2.1:
Let $\underline{a} \in \{0,1\}^n$ be an arbitrary input for circuit T. We write $g(\underline{a}) \in \{0,1\}$ for the
output of gate g of T for circuit input $\underline{a}$.
Consider the Boltzmann machine $B(T)_{\underline{a}}$ with the n units b(g) for input nodes g of
T clamped according to $\underline{a}$. We show that the configuration $K_{\underline{a}}$ of $B(T)_{\underline{a}}$ where b(g)
is on if and only if $g(\underline{a}) = 1$ is the only global maximum (in fact: the only local
maximum) of the consensus function C for $B(T)_{\underline{a}}$.
Assume for a contradiction that configuration K of $B(T)_{\underline{a}}$ is a global maximum of
the consensus function C and $K \ne K_{\underline{a}}$. Fix a node g of T of minimal depth in T
so that $K(b(g)) \ne K_{\underline{a}}(b(g)) = g(\underline{a})$. By definition of $B(T)_{\underline{a}}$ this node g is not an
input node of T. Let K' result from K by changing the state of b(g). We will show
that $C(K') > C(K)$, which is a contradiction to the choice of K.
We have (by the definition of C)
$C(K') - C(K) = (1 - 2K(b(g))) \cdot (S_1 + S_2 + S(\{b(g)\}))$, where
$S_1 := \sum \{ K(b(g')) \cdot S(\{b(g'), b(g)\}) \mid \langle g', g \rangle \text{ is an edge in } T \}$
$S_2 := \sum \{ K(b(g')) \cdot S(\{b(g), b(g')\}) \mid \langle g, g' \rangle \text{ is an edge in } T \}$.
Let w be the parameter that occurs in the definition of $S(\{b(g)\})$ (set w := 0 if g
has outdegree 0). Then $|S_2| \le w$. Let $p_1,\dots,p_l$ be the immediate predecessors
of g in T, and let t be the threshold of gate g. By the minimality of the depth of g
we have $K(b(p_i)) = p_i(\underline{a})$ for $i = 1,\dots,l$. Assume first that $g(\underline{a}) = 1$. Then
$S_1 = (2w + 2) \cdot \sum_{i=1}^{l} \alpha(\langle p_i, g \rangle) \cdot p_i(\underline{a}) \ge (2w + 2) \cdot t$. This implies that $S_1 + S_2 >
(2w + 2) \cdot t - w - 1$, and therefore $S_1 + S_2 + S(\{b(g)\}) > 0$. We have in this case
$K(b(g)) = 0$, hence $C(K') - C(K) > 0$.
If $g(\underline{a}) = 0$ then we have $\sum_{i=1}^{l} \alpha(\langle p_i, g \rangle) \cdot p_i(\underline{a}) \le t - 1$ (since all weights and thresholds
are integers), thus $S_1 = (2w + 2) \cdot \sum_{i=1}^{l} \alpha(\langle p_i, g \rangle) \cdot p_i(\underline{a}) \le (2w + 2) \cdot t - 2w - 2$.
This implies that $S_1 + S_2 < (2w + 2) \cdot t - w - 1$, and therefore $S_1 + S_2 + S(\{b(g)\}) < 0$.
We have in this case $K(b(g)) = 1$, hence
$C(K') - C(K) = (-1) \cdot (S_1 + S_2 + S(\{b(g)\})) > 0$. ∎
828 Gupa and Maass 
3 THE CONVERGENCE SPEED OF THE CONSTRUCTED BOLTZMANN MACHINES
We show that the constructed Boltzmann machines will converge relatively fast to 
a global maximum configuration. This positive result holds both if we view B(T) as
a sequential Boltzmann machine (in which units are considered for a state change 
one at a time), and if we view B(T) as a parallel Boltzmann machine (where several 
units are simultaneously considered for a state change). In fact, it even holds for 
unlimited parallelism, where every unit is considered for a state change at every 
step. Although unlimited parallelism appears to be of particular interest in the 
context of brain models and for the design of massively parallel machines, there are 
hardly any positive results known for this case (see section 8.3 in [AK]). 
If g is a gate in T with outdegree > 1 then the current state of unit b(g) of B(T)
becomes relevant at several different time points (whenever one of the immediate
successors of g is considered for a state change). This effect increases the probability
that unit b(g) may cause an "error." Therefore the error probability of an output
unit of B(T) does not just depend on the number of nodes in T, but on the number
N(T) of nodes in a tree T' that results if we replace in the usual fashion the directed
graph of T by a tree T' of the same depth (one calls a directed graph a tree if all of
its nodes have outdegree $\le 1$).
To be precise, we define by induction on the depth of g for each gate g of T a
tree Tree(g) that replaces the subcircuit of T below g. If $g_1,\dots,g_k$ are the
immediate predecessors of g in T then Tree(g) is the tree which has g as root and
$Tree(g_1),\dots,Tree(g_k)$ as immediate subtrees (it is understood that if some $g_i$ has
another immediate successor $g' \ne g$ then different copies of $Tree(g_i)$ are employed
in the definition of Tree(g) and Tree(g')).
We write $|Tree(g)|$ for the number of nodes in Tree(g), and N(T) for
$\sum \{ |Tree(g)| \mid g \text{ is an output node of } T \}$. It is easy to see that if T is synchronous
(i.e. depth(g'') = depth(g') + 1 for all edges $\langle g', g'' \rangle$ in T) then $|Tree(g)| \le s^{d}$
for any node g in T of depth d which has s nodes in the subcircuit of T below g.
Therefore N(T) is polynomial in n if T is of constant depth and polynomial size
(this can be achieved for all problems in $TC^0$, see Section 4).
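The quantities |Tree(g)| and N(T) follow directly from the recursion |Tree(g)| = 1 + (sum of |Tree(g_i)| over the immediate predecessors g_i of g). The sketch below assumes a hypothetical dictionary that maps each gate to the list of its immediate predecessors; input nodes simply have no entry.

```python
def tree_size(g, predecessors, cache=None):
    """Number of nodes of Tree(g): g itself plus a separate copy of
    Tree(p) for every immediate predecessor p of g."""
    if cache is None:
        cache = {}
    if g not in cache:
        cache[g] = 1 + sum(tree_size(p, predecessors, cache)
                           for p in predecessors.get(g, ()))
    return cache[g]

def n_of_t(outputs, predecessors):
    """N(T): sum of |Tree(g)| over all output nodes g of T."""
    cache = {}
    return sum(tree_size(g, predecessors, cache) for g in outputs)

# Hypothetical example: a depth-2 circuit with two inputs shared by two gates.
PREDECESSORS = {"or": ["x1", "x2"], "and": ["x1", "x2"], "out": ["or", "and"]}
```

In this example the circuit has 5 nodes, but Tree("out") has 7 nodes, because the shared inputs x1 and x2 are copied once for each gate that reads them.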
We write B(T) for the variation of the Boltzmann machine B(T) of section 2 where 
each connection strength in B(T) is multiplied by 5 (5  0). Equivalently one could 
view B(T) as a machine with the same connection strengths as B(T) but a lower 
"temperature" (replace c by 
Theorem 3.1 Assume that T is a threshold circuit of depth d that computes a
Boolean function $f : \{0,1\}^n \to \{0,1\}^m$. Let $B_\delta(T)_{\underline{a}}$ be the Boltzmann machine that
results from clamping the input units of $B_\delta(T)$ according to $\underline{a}$ ($\underline{a} \in \{0,1\}^n$).
Assume that $0 = q_0 < q_1 < \dots < q_d$ are arbitrary numbers such that for every
$i \in \{1,\dots,d\}$ and every gate g of depth i in T the corresponding unit b(g) is
considered for a state change at some step during interval $(q_{i-1}, q_i]$. There is no
restriction on how many other units are considered for a state change at any step.
Let t be an arbitrary time step with $t \ge q_d$. Then the output units of $B_\delta(T)_{\underline{a}}$ are at
the end of step t with probability $\ge 1 - N(T) \cdot \frac{1}{1 + e^{\delta/c}}$ in the states given by $f(\underline{a})$.
Remarks:
1. For $\delta := n$ this probability converges to 1 for $n \to \infty$ if T is of constant depth
and polynomial size.
2. The condition on the timing of state changes in Theorem 3.1 has been
formulated in a very general fashion in order to make it applicable to all of the
common types of Boltzmann machines. For a sequential Boltzmann machine
(see [AK], section 8.2) one can choose $q_i - q_{i-1}$ sufficiently large (for example
polynomially in the size of T) so that with high probability every unit of
B(T) is considered for a state change during the interval $(q_{i-1}, q_i]$. On the
other hand, for a synchronous Boltzmann machine with limited parallelism
([AK], section 8.3) one may apply the result to the case where every unit b(g)
with g of depth i in T is considered for a state change at step i (set $q_i := i$).
Theorem 3.1 also remains valid for unlimited parallelism ([AK], section 8.3),
where every unit is considered for a state change at every step (set $q_i := i$). In
fact, not even synchronicity is required for Theorem 3.1, and it also applies to
asynchronous parallel Boltzmann machines ([AK], section 8.3.2).
3. For sequential Boltzmann machines in general the available upper bounds for
their convergence speed are very unsatisfactory. In particular no upper bounds
are known which are polynomial in the number of units (see section 3.5 of
[AK]). For Boltzmann machines with unlimited parallelism one can in general
not even prove that they converge to a global maximum of their consensus
function (section 8.3 of [AK]).
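The bound of Theorem 3.1 and the equivalence between scaling all connection strengths by a factor delta and lowering the temperature to c/delta can be checked numerically. The following is a minimal sketch; the function names are chosen for this example only, and the error term is rewritten as e^{-delta/c} / (1 + e^{-delta/c}) so that large values of delta/c do not overflow.

```python
import math

def error_term(delta, c):
    """1 / (1 + e^{delta/c}), computed in an overflow-free form."""
    z = math.exp(-delta / c)
    return z / (1.0 + z)

def success_lower_bound(n_t, delta, c):
    """Lower bound 1 - N(T) / (1 + e^{delta/c}) from Theorem 3.1.

    The bound is only informative when it is positive, i.e. when
    delta/c is large compared to log N(T).
    """
    return 1.0 - n_t * error_term(delta, c)

def acceptance(delta_consensus, c):
    """Acceptance probability 1 / (1 + e^{-dC/c}) for moderate |dC/c|."""
    return 1.0 / (1.0 + math.exp(-delta_consensus / c))
```

For example, with N(T) = 7, delta = 10 and c = 1 the bound is about 0.9997, and `acceptance(delta * x, c)` equals `acceptance(x, c / delta)` for any consensus change x, which is the temperature reformulation of the scaling by delta.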
Proof of Theorem 3.1: We prove by induction on i that for every gate g of depth
i in T and every step $t \ge q_i$ the unit b(g) is at the end of step t with probability
$\ge 1 - |Tree(g)| \cdot \frac{1}{1 + e^{\delta/c}}$ in state $g(\underline{a})$.
Assume that $g_1,\dots,g_k$ are the immediate predecessors of gate g in T. By definition
we have $|Tree(g)| = 1 + \sum_{j=1}^{k} |Tree(g_j)|$. Let $t' \le t$ be the last step up to t at which
b(g) has been considered for a state change. Since $t \ge q_i$ we have $t' > q_{i-1}$. Thus
for each $j = 1,\dots,k$ we can apply the induction hypothesis to unit $b(g_j)$ and step
$t' - 1 \ge q_{depth(g_j)}$. Hence with probability $\ge 1 - (|Tree(g)| - 1) \cdot \frac{1}{1 + e^{\delta/c}}$ the states of
the units $b(g_1),\dots,b(g_k)$ at the end of step $t' - 1$ are $g_1(\underline{a}),\dots,g_k(\underline{a})$. Assume now
that the unit $b(g_j)$ is at the end of step $t' - 1$ in state $g_j(\underline{a})$, for $j = 1,\dots,k$. If b(g) is
at the beginning of step t' not in state $g(\underline{a})$, then a state change of unit b(g) would
increase the consensus function by $\Delta C \ge \delta$ (independently of the current status
of units $b(\tilde{g})$ for immediate successors $\tilde{g}$ of g in T). Thus b(g) accepts in this case
the change to state $g(\underline{a})$ with probability
$\frac{1}{1 + e^{-\Delta C/c}} \ge \frac{1}{1 + e^{-\delta/c}} = 1 - \frac{1}{1 + e^{\delta/c}}$. On the
other hand, if b(g) is already at the beginning of step t' in state $g(\underline{a})$, then a change
of its state would decrease the consensus by at least $\delta$. Thus b(g) remains with
probability $\ge 1 - \frac{1}{1 + e^{\delta/c}}$ in state $g(\underline{a})$. The preceding considerations imply that
unit b(g) is at the end of step t' (and hence at the end of step t) with probability
$\ge 1 - |Tree(g)| \cdot \frac{1}{1 + e^{\delta/c}}$ in state $g(\underline{a})$. ∎
4 APPLICATIONS 
The complexity class $TC^0$ is defined as the class of all Boolean functions
$f : \{0,1\}^* \to \{0,1\}^*$ for which there exists a family $(T_n)_{n \in \mathbb{N}}$ of threshold circuits
of some constant depth so that for each n the circuit $T_n$ computes f for inputs of
length n, and so that the number of gates in $T_n$ and the absolute value of the weights
of threshold gates in $T_n$ (all weights are assumed to be integers) are bounded by a
polynomial in n ([HMPST], [PS]).
Corollary 4.1 (to Theorems 2.1, 3.1): Every Boolean function f that belongs
to the complexity class $TC^0$ can be computed by scalable (i.e. polynomial size)
Boltzmann machines whose connection strengths are integers of polynomial size and
which converge for state changes with unlimited parallelism with high probability in
constantly many steps to a global maximum of their consensus function.
The following Boolean functions are known to belong to the complexity class $TC^0$:
AND, OR, PARITY; SORTING, ADDITION, SUBTRACTION, MULTIPLICATION
and DIVISION of binary numbers; DISCRETE FOURIER TRANSFORM,
and approximations to arbitrary analytic functions with a convergent rational power
series ([CVS], [R], [HMPST]).
Remarks:
1. One can also use the method from this paper for the efficient construction
of a Boltzmann machine $B_{p_1,\dots,p_k}$ that can decide very fast to which of k
stored "patterns" $p_1,\dots,p_k \in \{0,1\}^n$ the current input $\underline{x} \in \{0,1\}^n$ to the
Boltzmann machine has the closest "similarity."
For arbitrary fixed "patterns" $p_1,\dots,p_k \in \{0,1\}^n$ let $f_{p_1,\dots,p_k} : \{0,1\}^n \to
\{0,1\}^k$ be the pattern classification function whose ith output bit is 1 if and
only if the Hamming distance between the input $\underline{x} \in \{0,1\}^n$ and $p_i$ is less than or
equal to the Hamming distance between $\underline{x}$ and $p_j$, for all $j \ne i$.
We write $HD(\underline{x}, \underline{y})$ for the Hamming distance $\sum_{i=1}^{n} |x_i - y_i|$ of strings
$\underline{x}, \underline{y} \in \{0,1\}^n$. One has $HD(\underline{x}, \underline{y}) = \sum_{y_i = 0} x_i + \sum_{y_i = 1} (1 - x_i)$, and therefore
$HD(\underline{x}, p_j) - HD(\underline{x}, p_l) = \sum_{i=1}^{n} \alpha_i x_i + c$ for suitable coefficients $\alpha_i \in
\{-2,-1,0,1,2\}$ and $c \in \mathbb{Z}$ (that depend on the fixed patterns $p_j, p_l \in \{0,1\}^n$).
Thus there is a threshold circuit that consists of a single threshold gate which
outputs 1 if $HD(\underline{x}, p_j) \le HD(\underline{x}, p_l)$, and 0 otherwise.
The function $f_{p_1,\dots,p_k}$ can be computed by a threshold circuit T of depth 2
whose jth output gate is the AND of k - 1 gates as above which check for
$l \in \{1,\dots,k\} - \{j\}$ whether $HD(\underline{x}, p_j) \le HD(\underline{x}, p_l)$ (note that the underlying
graph of T is the same for any choice of the patterns $p_1,\dots,p_k$). The
desired Boltzmann machine $B_{p_1,\dots,p_k}$ is the Boltzmann machine B(T) for this
threshold circuit T.
2. Our results are also of interest in the context of learning algorithms for Boltzmann
machines. For example, the previous remark provides a single graph
$\langle \mathcal{U}, \mathcal{C} \rangle$ of a Boltzmann machine with n input units, k output units, and
$k^2 - k$ hidden units, that is able to compute with a suitable assignment of
connection strengths (that may arise from a learning algorithm for Boltzmann
machines) any function $f_{p_1,\dots,p_k}$ (for any choice of $p_1,\dots,p_k \in \{0,1\}^n$).
Similarly we get from Theorem 2.1 together with a result from [M] the graph
$\langle \mathcal{U}, \mathcal{C} \rangle$ of a Boltzmann machine with n input units, n hidden units, and
one output unit, that can compute with a suitable assignment of connection
strengths any symmetric function $f : \{0,1\}^n \to \{0,1\}$ (f is called symmetric
if $f(x_1,\dots,x_n)$ depends only on $\sum_{i=1}^{n} x_i$; examples of symmetric functions are
AND, OR, PARITY).
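The depth-2 threshold circuit for the pattern classification function from the first remark can be sketched as follows. Expanding HD(x, p) = sum_i p_i + sum_i (1 - 2 p_i) x_i shows that HD(x, p_j) <= HD(x, p_l) holds if and only if sum_i 2 (p_{j,i} - p_{l,i}) x_i >= |p_j| - |p_l|, where |p| denotes the number of ones in p. The code uses these weights and this threshold for the first layer; the function names and the tuple representation of patterns are assumptions of this example.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance of two 0/1 tuples of equal length."""
    return sum(a != b for a, b in zip(x, y))

def comparison_gate(p_j, p_l):
    """Weights and threshold of one gate that tests HD(x, p_j) <= HD(x, p_l)."""
    weights = [2 * (a - b) for a, b in zip(p_j, p_l)]
    threshold = sum(p_j) - sum(p_l)
    return weights, threshold

def threshold_gate(weights, threshold, inputs):
    """Boolean threshold function: 1 iff the weighted sum reaches the threshold."""
    return int(sum(w * y for w, y in zip(weights, inputs)) >= threshold)

def classify(patterns, x):
    """Output bits of the depth-2 circuit: bit j is the AND of the k - 1
    comparison gates for pattern j (AND = threshold gate with unit weights
    and threshold k - 1)."""
    k = len(patterns)
    outputs = []
    for j in range(k):
        first_layer = [threshold_gate(*comparison_gate(patterns[j], patterns[l]), x)
                       for l in range(k) if l != j]
        outputs.append(threshold_gate([1] * (k - 1), k - 1, first_layer))
    return outputs
```

The circuit has k(k - 1) = k^2 - k gates in its first layer and k output gates, and only the weights and thresholds of the first layer depend on the stored patterns. Ties are reported for every nearest pattern, so more than one output bit can be 1.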
Acknowledgment: We would like to thank Georg Schnitger for his suggestion to 
investigate the convergence speed of the constructed Boltzmann machines. 
References 
[AK] E. Aarts, J. Korst, Simulated Annealing and Boltzmann Machines, John
Wiley & Sons (New York, 1989).
[AHS] D.H. Ackley, G.E. Hinton, T.J. Sejnowski, A learning algorithm for
Boltzmann machines, Cognitive Science, 9, 1985, pp. 147-169.
[HS] G.E. Hinton, T.J. Sejnowski, Learning and relearning in Boltzmann machines,
in: D.E. Rumelhart, J.L. McClelland, & the PDP Research Group
(Eds.), Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, MIT Press (Cambridge, 1986), pp. 282-317.
[CVS] A.K. Chandra, L.J. Stockmeyer, U. Vishkin, Constant depth reducibility,
SIAM J. Comp., 13, 1984, pp. 423-439.
[HMPST] A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, G. Turan, Threshold circuits
of bounded depth, to appear in J. of Comp. and Syst. Sci. (for an
extended abstract see Proc. of the 28th IEEE Conf. on Foundations of
Computer Science, 1987, pp. 99-110).
[M] S. Muroga, Threshold Logic and its Applications, John Wiley & Sons
(New York, 1971).
[PS] I. Parberry, G. Schnitger, Relating Boltzmann machines to conventional
models of computation, Neural Networks, 2, 1989, pp. 59-67.
[R] J. Reif, On threshold circuits and polynomial computation, Proc. of
the 2nd Annual Conference on Structure in Complexity Theory, IEEE
Computer Society Press, Washington, 1987, pp. 118-123.
