I I II 
The Storage Capacity 
of a Fully-Connected Committee Machine 
Yuansheng Xiong 
Department of Physics, Pohang Institute of Science and Technology, 
Hyoja San 31, Pohang, Kyongbuk, Korea 
xionggalaxy. postech. ac. kr 
Chulan Kwon 
Department of Physics, Myong Ji University, 
Yongin, Kyonggi, Korea 
ckwonwh. myongj . ac. kr 
Jong-Hoon Oh 
Lucent Technologies, Bell Laboratories, 
600 Mountain Ave., Murray Hill, NJ07974, U.S. A. 
j hohphys ics. bell-labs. com 
Abstract 
We study the storage capacity of a fully-connected committee ma- 
chine with a large number K of hidden nodes. The storage capac- 
ity is obtained by analyzing the geometrical structure of the weight 
space related to the internal representation. By examining the as- 
ymptotic behavior of order parameters in the limit of large K, the 
storage capacity ac is found to be proportional to K l up to the 
leading order. This result satisfies the mathematical bound given 
by Mitchison and Durbin, whereas the replica-symmetric solution 
in a conventional Gardner's approach violates this bound. 
I INTRODUCTION 
Since Gardner's pioneering work on the storage capacity of a single layer 
perceptron[1], there have been numerous efforts to use the statistical mechanics 
formulation to study feed-forward neural networks. The storage capacity of multi- 
layer neural networks has been of particular interest, together with the general- 
ization problem. Barkai, Hansel and Kanter[2] studied a parity machine with a 
The Storage Capacity of a Fully-Connected Committee Machine 379 
non-overlapping receptive field of continuous weights within a one-step replica sym- 
metry breaking (RSB) scheme, and their result agrees with a mathematical bound 
previously found by Mitchison and Durbin (MD)[3]. Subsequently Barkai, Hansel 
and Sompolinsky[4] and Engel et al.[5] have studied the committee machine, which 
is closer to the multi-layer perceptron architecture and is most frequently used in 
real-world applications. Though they have derived many interesting results, partic- 
ularly for the case of a finite number of hidden units, it was found that their the 
replica-symmetric (RS) result violates the MD bound in the limit where the number 
of hidden units K is large. 
Recently, Monasson and O'Kane[6] proposed a new statistical mechanics formalism 
which can analyze the weight-space structure related to the internal representations 
of hidden units. It was applied to single layer perceptrons[7, 8, 9] as well as multi- 
layer networks[10, 11, 12]. Monasson and Zecchina[10] have successfully applied this 
formalism to the case of both committee and parity machines with non-overlapping 
receptive fields (NRF)[10]. They suggested that analysis of the RS solution under 
this new statistical mechanics formalism can yield results just as good as the one- 
step RSB solution in the conventional Gardner's method. 
In this letter, we apply this formalism for a derivation of the storage capacity of a 
fully-connected committee machine, which is also called a committee machine with 
overlapping receptive field (ORF) and is believed to be a more relevant architecture. 
In particular, we obtain the value of the critical storage capacity in the limit of 
large K, which satisfies the MD bound. It also agrees with a recent one-step RSB 
calculation, using the conventional Gardner method, to within a small difference of 
a numerical prefactor[13]. Finally we will briefly discuss the fully-connected parity 
machine. 
2 
WEIGHT SPACE STRUCTURE OF THE 
COMMITTEE MACHINE 
We consider a fully-connected committee machine with N input units, K hidden 
units and one output unit, where weights between the hidden units and the output 
unit are set to 1. The network maps input vectors {x}, where p - 1,...,P, to 
output y as: 
y - sgn 
-- sgn 
(1) 
where Wji is the weight between the ith input node and the jth hidden unit. 
h. -- sgn(iN=l Wjix) is the jth component of the internal representation for 
input pattern {x}. We consider continuous weights with spherical constraint, 
E, = 
Given P = aN patterns, the learning process in a layered neural network can be 
interpreted as the selection of cells in the weight space corresponding to a set of 
suitable internal representations h -- {h}, each of which has a non-zero elementary 
380 Y. Xiong, C. Kwon and J-H. Oh 
volume defined by' 
where O(x) is the Heaviside step function. The Gardner's volume Vc, that is, the 
volume of the weight space which satisfies the given input-output relations, can be 
written as the sum of the cells over all internal representations: 
Vc - E Vh. (3) 
h 
The method developed by Monasson and his collaborators [6, 10] is based on analysis 
of the detailed internal structure, that is, how the Gardner's volume V6 is decom- 
posed into elementary volumes Vh associated with a possible internal representation. 
The distribution of the elementary volumes can be derived from the free energy, 
where ((...)) denotes the average over patterns. The entropy A/'[w(r)] of the volumes 
whose average sizes are equal to w(r) -- - 1IN In ((Vh)), can be given by the Legendre 
relations 
= og(,.) 
= o,- (5) 
respectively. 
The entropies A/'D = .Af[w(r = 1)] and A/'a = .Af[w(r = 0)] are of most impor- 
tance, and will be discussed below. In the thermodynamic limit, 1/N((ln(VG))) = 
-g(r - 1) is dominated by elementary volumes of size w(r = 1), of which there 
are exp(N.AfD). Furthermore, the most numerous elementary volumes have the size 
w(r = 0) and number exp(NA/'R). The vanishing condition for the entropies is re- 
lated to the zero volume condition for VG and thus gives the storage capacity. We 
focus on the entropy A/'D of elementary volumes dominating the weight space VG. 
3 ORDER PARAMETERS AND PHASE TRANSITION 
For a fully-connected machine, the overlaps between different hidden units should 
be taken into account, which makes this problem much more difficult than the tree- 
like (NRF) architecture studied in Ref. [10]. The replicated partition function for 
the fully-connected committee machine reads: 
with a = 1,-.. 
,r and c - 1,.-.,n. 
(6) 
Unlike Gardner's conventional approach, we 
need two sets of replica indices for the weights. We introduce the order parameters, 
Qab 1 
i 
where the indices a, b originate from the integer power r of elementary volumes, 
and c,  are the standard replica indices. The replica symmetry ansatz leads to five 
The Storage Capacity of a Fully-Connected Committee Machine 381 
order parameters as: 
q, 
i.) c j a b q 
d* 
d 
(j = k,c =/3, a - b), 
(j = k,c  ), 
(j  k,c =/,a = b), 
(j - k,c =/,a - b), 
(j # k,  # ), 
(8) 
where q* and q are, respectively, the overlaps between the weight vectors connected 
to the same hidden unit of the same (c = ) and different (c  ) replicas corre- 
sponding to the two different internal representations. The order parameters c, d* 
and d describe the overlaps between weights that are connected to different hid- 
den units, of which c and d* are the overlaps within the same replica whereas d 
correlates different replicas. 
Using a standard replica trick, we obtain the explicit form of g(r). One may notice 
that the free energy evaluated at r: i is reduced to the RS results obtained by the 
conventional method on the committee machine[4, 5], which is independent of q* 
and d*. This means that the internal structure of the weight space is overlooked by 
conventional calculation of the Gardner's volume. When we take the limit r - 1, 
the free energy can be expanded as: 
g(r,q* q,c,d*,d)=g(1,q,c,d)+ (r- 1) Og(r'q*'q'c'd*'d) r----1 
() 
As noticed, g(r, q*, q, c, d*, d) is the same as the RS free energy in the Gardner's 
method. From the relation: 
#(r) I -g(r---) I (10) 
-- , 
']rD ---- O(i/r) r=l 01' r--1 
we obtain the explicit form of Aft). 
In the case of the NRF committee machine, where each of the hidden units is 
connected to different input units, we do not have a phase transition. Instead, a 
single solution is applicable for the whole range of c. In contrast, the phase-space 
structure of the fully-connected committee machine is more complicated than that 
of the NRF committee machine. When a small number of input patterns are given, 
the system is in the permutation-symmetry (PS) phase[4, 5, 14], where the role of 
each hidden unit is not specialized. In the PS phase, the Gardner's volume is a single 
connected region. The order parameters associated with different hidden units are 
equal to the corresponding ones associated with the same hidden unit. When a 
critical number of patterns is given, the Gardner's volume is divided into many 
islands, each one of which can be transformed into other ones by permutation of 
hidden units. This phenomenon is called permutation symmetry breaking (PSB), 
and is usually accompanied by a first-order phase transition. In the PSB phase, 
the role of each hidden unit is specialized to store a larger number of patterns 
effectively. A similar breaking of symmetry has been observed in the study of 
generalization[14, 15], where the first-order phase transition induces discontinuity of 
the learning curve. It was pointed out that the critical storage capacity is attained in 
the PSB phase[4, 5], and our recent one-step replica symmetry breaking calculation 
confirmed this picture[13]. Therefore, we will focus on the analysis of the PSB 
solution near the storage capacity, in which q*, q - 1, and c, d*, d are of order 
1/K. 
382 Y. Xiong, C. Kwon and J-H. Oh 
4 STORAGE CAPACITY 
When we analyze the results for free energy, the case with q(r = 1), c{r = 1) and 
d(r = 1) is reduced to the usual saddle-point solutions of the replica symmetric 
expression of the Gardner's volume #(r: 1)[4, 5]. When K is large, the trace 
over all allowed internal representations can be evaluated similarly to Ref.[4]. The 
saddle-point equations for q* and d* are derived from the derivative of the free 
energy in the limit r % 1, as in Eq. (9). The details of the self-consistent equations 
are not shown for space consideration. In the following, we only summarize the 
asymptotic behavior of the order parameters for large a: 
128 K  
1-q+d-c.- (a'-2) a a a' (11) 
32 K 
1-q+(K- 1)(c-d) (12) 
--22' 
--2 
q + (K - 1)d  --, (13) 
aF a 
1-q*+d*-c 2a a , (14) 
aF  
1-q*+(K-1)(c-d*) 2a a , (15) 
where F: -[f du Z(u)lnZ(u)] -1  0.62. 
It is found that all the overlaps between weights connecting different hidden units 
have scaling of-l/K, where the typical overlaps between weights connecting the 
same hidden unit approach one. The order parameters c, d and d* are negative, 
showing antiferromagnetic correlations between different hidden units, which im- 
plies that each hidden unit attempts to store patterns different from those of the 
others[4, 5]. 
Finally, the ymptotic behavior of the entropy D in the large K limit can be 
derived using the scaling given above. Near the storage capacity, D can be written, 
up to the leading order, : 
D  KInK- (g- 2)aaa (16) 
- 256K 
Being the entropy of a discrete system, D cannot be negative. Therefore, 
D = 0 gives an indication of the upper bound of storage capacity, that is, 
ac  =laKlnK. The storage capacity per synapse, =aln K, satisfies the 
rigorous bound  ln K derived by Mitchison and Durbin (MD)[3], where the 
conventional RS rult[4, 5], which scales  , violates the MD bound. 
5 DISCUSSIONS 
Recently, we have studied this problem using a conventional Gardner approach in 
the one-step RSB scheme[13]. The result yields the same scaling with respect to K, 
but a coefficient smaller by a factor v. In the present paper, we are dealing with 
the fine structure of version space related to internal representations. On the other 
hand, the RSB calculation seems to handle this fine structure in association with 
symmetry breaking between replicas. Although the physics of the two approaches 
seems to be somehow related, it is not clear which of the two can yield a better 
The Storage Capacity of a Fully-Connected Committee Machine 383 
estimate of the storage capacity. It is possible that the present RS calculation does 
not properly handle the RSB picture of the system. Monasson and his co-workers 
reported that the Almeida-Thouless instability of the RS solutions decreases with 
increasing K, in the NRF case[10, 11]. A similar analysis for the fully-connected case 
certainly deserves further research. On the other hand, the one-step RSB scheme 
also introduces approximation, and possibly it cannot fully explain the weight-space 
structure associated with internal representations. 
It is interesting to compare our result with that of the NRF committee machine 
along the same lines[10]. Based on the conventional RS calculation, Angel et al. 
suggested that the same storage capacity per synapse for both fully-connected and 
NRF committee machines will be similar, as the overlap between the hidden nodes 
approaches zero.[5]. While the asymptotic scaling with respect to K is the same, 
the storage capacity in the fully-connected committee machine is larger than in the 
NRF one. It is also consistent with our result from one-step RSB calculation[13]. 
This implies that the small, but nonzero negative correlation between the weights 
associated with different hidden units, enhances the storage capacity. This may 
be good news for those people using a fully connected multi-layer perceptton in 
applications. 
From the fact that the storage capacity of the NRF parity machine is In K/In 2[2, 
10], which saturates the MD bound, one may guess that the storage capacity of a 
fully-connected parity machine is also proportional to K In K. It will be interesting 
to check whether the storage capacity per synapse of the fully-connected parity 
machine is also enhanced compared to the NRF machine[16]. 
Acknowledgements 
This work was partially supported by the Basic Science Special Program of 
POSTECH and the Korea Ministry of Education through the POSTECH Basic 
Science Research Institute(Grant No. BSRI-96-2438). It was also supported by 
non-directed fund from Korea Research Foundation, 1995, and by KOSEF grant 
971-0202-010-2. 
References 
[1] E. Gardner, Europhys, Lett. 4(4), 481 (1987); E. Gardner, J. Phys. A21, 257 
(1988); E. Gardner and B. Derrida, J. Phys. A21,271 (1988). 
[2] E. Barkai, D. Hansel and I. Kanter, Phys. Rev. Lett. V 65, N18, 2312 (1990). 
[3] G. J. Mitchison and R. M. Durbin, Boil. Cybern. 60,345 (1989). 
[4] E. Barkai, D. Hansel and H. Sompolinsky, Phys. Rev. E45, 4146 (1992). 
[5] A. Engel, H. M. KShler, F. Tschepke, H. Vollmayr, and A. Zippeelius, Phys. 
Rev. E45, 7590 (1992). 
[6] R. Monasson and D. O'Kane, Europhys. Lett. 27, 85(1994). 
[7] B. Derrida, R. B. Grifliths and A Prugel-Bennett, J. Phys. A 24, 4907 (1991). 
[8] M. Biehl and M. Opper, Neural Networks: The Statistical Mechanics Perspec- 
tive, Jong-Hoon Oh, Chulan Kwon, and Sungzoon Cho (eds.) (World Scientific, 
Singapore, 1995). 
[9] A. Engel and M. Weigt, Phys. Rev. E53, R2064 (1996). 
[10] R. Monasson and R. Zecchina, Phys. Rev. Lett. 75, 2432 (1995); 76, 2205 
(1996). 
384 E Xiong, C. Kwon and J-H. Oh 
[11] R. Monasson and R. Zecchina, Mod. Phys. B, Vol.9, 1887-1897 (1996). 
[12] S. Cocco, R. Monasson and R. Zecchina, Phys. Rev. E54, 717 (1996). 
[13] C. Kwon and J. H. Oh, J. Phys. A, in press. 
[14] K. Kang, J. H. Oh, C. Kwon and Y. Park, Phys. Rev. E48, 4805 (1993). 
[15] H. $chwarze and J. Hertz, Europhys. Lett. 21,785 (1993). 
[16] Y. Xiong, C. Kwon and J.-H. Oh, to be published (1997). 
