Single-iteration Threshold Hamming 
Networks 
Isaac Meilijson 
Eytan Ruppin 
Moshe Sipper 
School of Mathematical Sciences 
Raymond and Beverly Sackler Faculty of Exact Sciences 
Tel Aviv University, 69978 Tel Aviv, Israel 
Abstract 
We analyze in detail the performance of a Hamming network clas- 
sifying inputs that are distorted versions of one of its m stored 
memory patterns. The activation function of the memory neurons 
in the original Hamming network is replaced by a simple threshold 
function. The resulting Threshold Hamming Network (THN) cor- 
rectly classifies the input pattern, with probability approaching 1, 
using only $O(m \ln m)$ connections, in a single iteration. The THN 
drastically reduces the time and space complexity of Hamming Net- 
work classifiers. 
1 Introduction 
Originally presented in (Steinbuch 1961, Taylor 1964) the Hamming network (HN) 
has received renewed attention in recent years (Lippmann et al. 1987, Baum et
al. 1988). The HN calculates the Hamming distance between the input pattern 
and each memory pattern, and selects the memory with the smallest distance. It 
is composed of two subnets: The similarity subnet, consisting of an n-neuron input 
layer connected with an m-neuron memory layer, calculates the number of equal bits 
between the input and each memory pattern. The winner-take-all (WTA) subnet, 
consisting of a fully connected m-neuron topology, selects the memory neuron that 
best matches the input pattern. 
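The two-subnet computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the names `similarity` and `hamming_network_classify` are hypothetical, patterns are plain lists of +1/-1 values, and the iterative WTA dynamics are replaced by a direct arg-max over the similarities.

```python
def similarity(x, pattern):
    """Number of equal bits between two +1/-1 vectors (the similarity subnet)."""
    return sum(1 for xi, pi in zip(x, pattern) if xi == pi)


def hamming_network_classify(x, memories):
    """Return the index of the stored pattern closest to x in Hamming distance.

    The arg-max stands in for the winner-take-all subnet; ties go to the
    lowest index (an arbitrary choice made for this sketch).
    """
    scores = [similarity(x, p) for p in memories]
    return max(range(len(memories)), key=lambda j: scores[j])


memories = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
distorted = [-1, 1, 1, 1, 1, 1, 1, 1]  # pattern 0 with its first bit flipped
winner = hamming_network_classify(distorted, memories)
```

Here the three similarities are 7, 1 and 3, so the first stored pattern is selected.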
The similarity subnet uses $mn$ connections and performs a single iteration. The
WTA subnet has $m^2$ connections. With randomly generated input and memory
patterns, it converges in $O(m \ln(mn))$ iterations (Floreen 1991). Since $m$ is
exponential in $n$, the space and time complexity of the network is primarily due to
the WTA subnet (Domany & Orland 1987). We analyze the performance of the
HN in the practical scenario where the input pattern is a distorted version of some
stored memory vector. We show that it is possible to replace the original activation
function of the neurons in the memory layer by a simple threshold function,
and completely discard the WTA subnet. If the threshold is properly tuned, only
the neuron standing for the 'correct' memory is likely to be activated. The resulting
Threshold Hamming Network (THN) will perform correctly (with probability
approaching 1) in a single iteration, using only $O(m \ln m)$ connections instead of
the $O(m^2)$ connections in the original HN. We identify the optimal threshold, and
measure its performance relative to the original HN.
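As a rough illustration of the savings, the connection counts quoted above can be compared directly. The sketch below is a back-of-the-envelope calculation under stated assumptions: the helper names are hypothetical, the counts are the $mn$ and $m^2$ figures from the text, and the values n = 210, m = 825 are borrowed from the numerical example in Section 4.

```python
def hn_connections(n, m):
    """Connections in the original HN: similarity subnet (m*n) plus WTA subnet (m*m)."""
    return m * n + m * m


def thn_connections(n, m):
    """Connections in the THN: only the similarity subnet remains."""
    return m * n


n, m = 210, 825
hn = hn_connections(n, m)    # 173250 + 680625
thn = thn_connections(n, m)  # 173250
saving = 1 - thn / hn        # fraction of connections removed
```

Because $m$ grows exponentially with $n$, we have $n = O(\ln m)$, so the remaining $mn$ connections are $O(m \ln m)$.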
2 The Threshold Hamming Network 
We examine a HN storing $m+1$ memory patterns $\xi^\mu$, $1 \le \mu \le m+1$, each
being an n-dimensional vector of $\pm 1$. The input pattern $x$ is generated by selecting
some memory pattern $\xi^\mu$ (w.l.g., $\xi^{m+1}$), and letting each bit $x_i$ be either $\xi_i^\mu$ or
$-\xi_i^\mu$ with probabilities $\alpha$ and $(1 - \alpha)$ respectively, where $\alpha > 0.5$. To analyze this
HN, we use some tight approximations to the binomial distribution. Due to space
considerations, their proofs are omitted.
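Lemma 1.
Let $X \sim \mathrm{Bin}(n, p)$.
If $x_n$ are integers such that $\lim_{n\to\infty} x_n/n = \beta \in (p, 1)$, then
$$P(X \ge x_n) \approx \frac{1-p}{\left(1 - \frac{p}{\beta}\right)\sqrt{2\pi n\beta(1-\beta)}} \exp\left\{-n\left[\beta \ln\frac{\beta}{p} + (1-\beta)\ln\frac{1-\beta}{1-p}\right]\right\} \tag{1}$$
in the sense that the ratio between LHS and RHS converges to 1 as $n \to \infty$.
For the special case $p = \frac{1}{2}$, let $G(\beta) = \ln 2 + \beta\ln\beta + (1-\beta)\ln(1-\beta)$, then
$$P(X \ge x_n) \approx \frac{\exp\{-nG(\beta)\}}{\left(2 - \frac{1}{\beta}\right)\sqrt{2\pi n\beta(1-\beta)}} \tag{2}$$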
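Lemma 2.
Let $X_i \sim \mathrm{Bin}(n, \frac{1}{2})$ be independent, $\gamma \in (0, 1)$, and let $x_n$ be as in Lemma 1. If
$$m = \left(2 - \frac{1}{\beta}\right)\sqrt{2\pi n\beta(1-\beta)}\,\left(\ln\frac{1}{\gamma}\right) e^{nG(\beta)}, \tag{3}$$
then
$$P(\max(X_1, X_2, \ldots, X_m) < x_n) \to \gamma \tag{4}$$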
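The capacity formula (3) is easy to evaluate numerically. The sketch below is an illustration under stated assumptions (hypothetical function names; n = 210, threshold 135 and 1 - gamma = 1.776% taken from Table 1): it computes $m$ from (3) and then checks (4) against the exact binomial tail, using independence of the $X_i$.

```python
import math


def G(beta):
    """G(beta) = ln 2 + beta ln beta + (1 - beta) ln(1 - beta), as in Lemma 1."""
    return math.log(2) + beta * math.log(beta) + (1 - beta) * math.log(1 - beta)


def capacity(n, x, gamma):
    """Capacity m from (3) for threshold x (beta = x / n) and target probability gamma."""
    beta = x / n
    return ((2 - 1 / beta) * math.sqrt(2 * math.pi * n * beta * (1 - beta))
            * math.log(1 / gamma) * math.exp(n * G(beta)))


n, x = 210, 135
gamma = 1 - 0.01776  # 1 - gamma = 1.776% is the predicted value listed in Table 1
m = capacity(n, x, gamma)

# Exact check of (4): P(max(X_1..X_m) < x) = (1 - P(X >= x))^m for independent X_i.
p_tail = sum(math.comb(n, k) for k in range(x, n + 1)) / 2 ** n
gamma_exact = (1 - p_tail) ** round(m)
```

With these inputs the computed capacity is close to the value m = 825 listed in Table 1 for threshold 135.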
Lemma 3.
Let $Y \sim \mathrm{Bin}(n, \alpha)$ with $\alpha > \frac{1}{2}$, let $(X_i)$ and $\gamma$ be as in Lemma 2, and let $\eta \in (0, 1)$.
Let $x_n$ be the integer closest to $n\beta$, where
$$\beta = \alpha - z_\eta\sqrt{\frac{\alpha(1-\alpha)}{n}} - \frac{1}{2n} \tag{5}$$
and $z_\eta$ is the $\eta$-quantile of the standard normal distribution, i.e.,
$$\eta = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z_\eta} e^{-x^2/2}\,dx \tag{6}$$
Then, if $Y$ and $(X_i)$ are independent
$$P(\max(X_1, X_2, \ldots, X_m) < Y) \ge P(\max(X_1, X_2, \ldots, X_m) < x_n \le Y) \to \gamma\eta \tag{7}$$
as $n \to \infty$, for $m$ as in (3).
Based on the above binomial probability approximations, we can now propose and
analyze an n-neuron Threshold Hamming Network (THN) that classifies the input
patterns with probability of error not exceeding $\epsilon$, when the input vector is generated
with an initial bit-similarity $\alpha$: Let $X_j$ be the similarity between the input vector
and the $j$'th memory pattern ($1 \le j \le m$), and let $Y$ be the similarity with
the 'correct' memory pattern $\xi^{m+1}$. Choose $\gamma$ and $\eta$ so that $\gamma\eta \ge 1 - \epsilon$, e.g.,
$\gamma = \eta = \sqrt{1 - \epsilon}$; determine $\beta$ by (5) and $m$ by (3). Discard the WTA subnet, and
simply replace the neurons of the memory layer by $m$ neurons having a threshold
$x_n$, the integer closest to $n\beta$. If any memory neuron with similarity at least $x_n$
is declared 'the winner', then, by Lemma 3, the probability of error is at most $\epsilon$,
where 'error' may be due to the existence of no winner, wrong winner, or multiple
winners.
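The resulting decision rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the name `thn_classify` is hypothetical, patterns are lists of +1/-1 values, and returning `None` is simply the way this sketch reports the 'no winner' and 'multiple winners' error cases.

```python
def thn_classify(x, memories, threshold):
    """Single-iteration THN decision: each memory neuron fires iff its
    similarity with x is at least `threshold`.

    Returns the index of the unique firing neuron, or None when no neuron
    or more than one neuron fires (both count as errors in the text).
    """
    fired = [
        j for j, pattern in enumerate(memories)
        if sum(1 for xi, pi in zip(x, pattern) if xi == pi) >= threshold
    ]
    return fired[0] if len(fired) == 1 else None


memories = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
distorted = [-1, 1, 1, 1, 1, 1, 1, 1]  # pattern 0 with its first bit flipped

unique_winner = thn_classify(distorted, memories, threshold=6)   # only neuron 0 fires
no_winner = thn_classify(distorted, memories, threshold=8)       # threshold too high
many_winners = thn_classify(distorted, memories, threshold=1)    # threshold too low
```

The three calls show the trade-off that the threshold controls: set too high, the correct neuron fails to fire; set too low, wrong neurons fire as well.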
3 The Hamming Network and an Optimal Threshold Hamming Network
We now calculate the choice of the threshold $x_n$ that maximizes the storage capacity
$m = m(n, \epsilon, \alpha)$. Let $\phi$ ($\Phi$) denote the standard normal density (cumulative
distribution function), and let $r = \phi/(1 - \Phi)$ denote the corresponding failure rate
function. Then,
Lemma 4.
The optimal proportion between the two error probabilities is
$$\frac{1-\gamma}{1-\eta} \approx \frac{r(z_\eta)}{\sqrt{n\alpha(1-\alpha)}\,\ln\frac{\beta}{1-\beta}} \tag{8}$$
which we will denote by $\delta$.
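Proof:
Let $M = \max(X_1, X_2, \ldots, X_m)$, and let $Y$ denote the similarity with the
'correct' memory pattern, as before. We have seen that
$$P(M < x) \approx \exp\left\{-m\,\frac{\exp\{-nG(\beta)\}}{\sqrt{2\pi n\beta(1-\beta)}\left(2 - \frac{1}{\beta}\right)}\right\}.$$
Since $G'(\beta) = \ln\frac{\beta}{1-\beta}$, then by Taylor expansion around the threshold $x_0 = x_n$ of Lemma 3,
$$\begin{aligned}
P(M < x) = P(M < x_0 + x - x_0) &\approx \exp\left\{-m\,\frac{\exp\{-nG(\beta + \frac{x - x_0}{n})\}}{\sqrt{2\pi n\beta(1-\beta)}\left(2 - \frac{1}{\beta}\right)}\right\} \\
&\approx \exp\left\{-m\,\frac{\exp\{-nG(\beta) - (x - x_0)\ln\frac{\beta}{1-\beta}\}}{\sqrt{2\pi n\beta(1-\beta)}\left(2 - \frac{1}{\beta}\right)}\right\} = \gamma^{\left(\frac{\beta}{1-\beta}\right)^{x_0 - x}}
\end{aligned} \tag{9}$$
(in accordance with Gnedenko extreme-value distribution of type 1 (Leadbetter et
al. 1983)). Similarly,
$$P(Y < x) = \exp\{\ln P(Y < x_0 + x - x_0)\} \approx P(Y < x_0)\exp\left\{\frac{\phi(z_\eta)}{\Phi^*(z_\eta)}\,\frac{x - x_0}{\sqrt{n\alpha(1-\alpha)}}\right\} = (1 - \eta)\exp\left\{r(z_\eta)\,\frac{x - x_0}{\sqrt{n\alpha(1-\alpha)}}\right\} \tag{10}$$
where $\phi$ is the standard normal density function, $\Phi$ is the standard normal cumulative
distribution function, $\Phi^* = 1 - \Phi$ and $r = \phi/\Phi^*$ is the corresponding failure
rate function. The probability of correct recognition using a threshold $x$ can now
be expressed as
$$P(M < x)\,P(Y \ge x) \approx \gamma^{\left(\frac{\beta}{1-\beta}\right)^{x_0 - x}}\left(1 - (1 - \eta)\exp\left\{r(z_\eta)\,\frac{x - x_0}{\sqrt{n\alpha(1-\alpha)}}\right\}\right) \tag{11}$$
We differentiate expression (11) with respect to $x_0 - x$, and equate the derivative
at $x_0 = x$ to zero, to obtain the relation between $\gamma$ and $\eta$ that yields the optimal
threshold, i.e., that which maximizes the probability of correct recognition. This
yields
$$\gamma = \exp\left\{-\frac{r(z_\eta)(1 - \eta)}{\sqrt{n\alpha(1-\alpha)}\,\ln\frac{\beta}{1-\beta}}\right\} \tag{12}$$
We now approximate
$$1 - \gamma \approx -\ln\gamma \approx \frac{r(z_\eta)(1 - \eta)}{\sqrt{n\alpha(1-\alpha)}\,\ln\frac{\beta}{1-\beta}} \tag{13}$$
and thus the optimal proportion between the two error probabilities is
$$\frac{1-\gamma}{1-\eta} \approx \frac{r(z_\eta)}{\sqrt{n\alpha(1-\alpha)}\,\ln\frac{\beta}{1-\beta}} = \delta. \tag{14}$$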
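The ratio in (8) can be evaluated directly with the standard normal functions of the Python standard library. The sketch below is an illustration only: the function names are hypothetical, and the inputs (alpha = 0.7, n = 210, threshold 135, 1 - eta = 2.991%) are the predicted values of Table 1, so that the result can be compared with the ratio of the two listed error probabilities.

```python
import math
from statistics import NormalDist


def failure_rate(z):
    """Failure rate r(z) = phi(z) / (1 - Phi(z)) of the standard normal."""
    nd = NormalDist()
    return nd.pdf(z) / (1 - nd.cdf(z))


def optimal_ratio(n, alpha, beta, eta):
    """delta from (8): the ratio (1 - gamma) / (1 - eta) at the optimal threshold."""
    z_eta = NormalDist().inv_cdf(eta)
    sigma = math.sqrt(n * alpha * (1 - alpha))
    return failure_rate(z_eta) / (sigma * math.log(beta / (1 - beta)))


# Table 1 row for threshold 135: 1 - eta = 2.991%, 1 - gamma = 1.776%.
delta = optimal_ratio(n=210, alpha=0.7, beta=135 / 210, eta=1 - 0.02991)
table_ratio = 1.776 / 2.991
```

The two numbers agree to within a few percent, which is about what one would expect given the rounding of the threshold to an integer.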
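Based on Lemma 4, if the desired probability of error is $\epsilon$, we choose
$$\gamma = 1 - \frac{\delta\epsilon}{1+\delta}, \qquad \eta = 1 - \frac{\epsilon}{1+\delta} \tag{15}$$
We start with $\gamma = \eta = \sqrt{1 - \epsilon}$, obtain $\beta$ from (5) and $\delta$ from (8), and recompute $\eta$
and $\gamma$ from (15). The limiting values of $\beta$ and $\gamma$ in this iterative process give the
maximal capacity $m$ and threshold $x_n$.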
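The iterative process can be sketched as a short fixed-point loop. This is an illustration under stated assumptions, not the authors' procedure in detail: the name `design_thn`, the fixed iteration count, and the final rounding of $n\beta$ to an integer threshold are choices made for this sketch, and the loop assumes that $\beta$ stays inside $(\frac{1}{2}, 1)$.

```python
import math
from statistics import NormalDist


def design_thn(n, alpha, eps, iterations=50):
    """Iterate (5), (8) and (15) from gamma = eta = sqrt(1 - eps), then apply (3).

    Returns (threshold, m, gamma, eta, delta).
    """
    nd = NormalDist()
    sigma = math.sqrt(n * alpha * (1 - alpha))
    gamma = eta = math.sqrt(1 - eps)
    beta, delta = alpha, 1.0
    for _ in range(iterations):
        z = nd.inv_cdf(eta)
        beta = alpha - z * math.sqrt(alpha * (1 - alpha) / n) - 1 / (2 * n)  # (5)
        rate = nd.pdf(z) / (1 - nd.cdf(z))                                   # r(z_eta)
        delta = rate / (sigma * math.log(beta / (1 - beta)))                 # (8)
        gamma = 1 - delta * eps / (1 + delta)                                # (15)
        eta = 1 - eps / (1 + delta)
    threshold = round(n * beta)
    b = threshold / n
    g = math.log(2) + b * math.log(b) + (1 - b) * math.log(1 - b)            # G(b)
    m = ((2 - 1 / b) * math.sqrt(2 * math.pi * n * b * (1 - b))
         * math.log(1 / gamma) * math.exp(n * g))                            # (3)
    return threshold, m, gamma, eta, delta


threshold, m, gamma, eta, delta = design_thn(n=210, alpha=0.7, eps=0.04714)
```

By construction of (15), the two error probabilities returned by the sketch always sum to the requested $\epsilon$, so $\gamma\eta \ge 1 - \epsilon$ holds at every step of the loop.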
We now compute the error probability $\epsilon(m, n, \alpha)$ of the original HN (with the WTA
subnet) for arbitrary $m$, $n$ and $\alpha$, and compare it with $\epsilon$.
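Lemma 5.
For arbitrary $n$, $\alpha$ and $\epsilon$, let $m$, $\beta$, $\gamma$, $\eta$ and $\delta$ be as calculated above. Then, the
probability of error $\epsilon(m, n, \alpha)$ of the HN satisfies
$$\epsilon(m, n, \alpha) \approx \frac{1 - e^{-\delta\ln\frac{\beta}{1-\beta}}}{\delta\ln\frac{\beta}{1-\beta}}\,\Gamma(1 - \delta)\,\frac{(\epsilon\delta)^\delta}{(1+\delta)^{1+\delta}}\,\epsilon \tag{16}$$
where
$$\Gamma(t) = \int_0^\infty x^{t-1}e^{-x}\,dx \tag{17}$$
is the Gamma function.
Proof:
Since $M$ and $Y$ are independent,
$$P(Y \le M) = \sum_x P(M = x)\,P(Y \le x). \tag{18}$$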
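We now approximate this sum by the integral of the summand: let $b = \frac{\beta}{1-\beta}$ and
$c = \delta\ln\frac{\beta}{1-\beta}$. We have seen that the probability of incorrect performance of the
WTA subnet is equal to
$$P(Y \le M) \approx \sum_y P(Y \le x_0 - y - 1)\left[P(M < x_0 - y) - P(M < x_0 - y - 1)\right] \approx (1 - \eta)\int_{-\infty}^{\infty}\left(\gamma^{b^y} - \gamma^{b^{y+1}}\right)e^{-c(y+1)}\,dy \tag{19}$$
Now we transform variables $t = b^y\ln\frac{1}{\gamma}$ to get the integral in the form
$$e^{-c}(1 - \eta)\int_0^\infty \left(e^{-t} - e^{-bt}\right)\left(\frac{t}{\ln\frac{1}{\gamma}}\right)^{-\frac{c}{\ln b}}\frac{dt}{t\ln b} = K_1\int_0^\infty \left(e^{-t} - e^{-bt}\right)t^{-(1+\kappa)}\,dt \tag{20}$$
with $\kappa = c/\ln b = \delta$.
This is the convergent difference between two divergent Gamma function integrals.
We perform integration by parts to obtain a representation as an integral with $t^{-\kappa}$
instead of $t^{-(1+\kappa)}$ in the integrand. For $0 \le \kappa < 1$, the corresponding integral
converges. The final result is then
$$(1 - \eta)\,\frac{1 - e^{-c}}{c}\,\Gamma\left(1 - \frac{c}{\ln b}\right)\left(\ln\frac{1}{\gamma}\right)^{\frac{c}{\ln b}} \tag{21}$$
Hence, we have
$$P(Y \le M) \approx (1 - \eta)\,\frac{1 - e^{-\delta\ln\frac{\beta}{1-\beta}}}{\delta\ln\frac{\beta}{1-\beta}}\,\Gamma(1 - \delta)\left(\ln\frac{1}{\gamma}\right)^{\delta} \approx \frac{1 - e^{-\delta\ln\frac{\beta}{1-\beta}}}{\delta\ln\frac{\beta}{1-\beta}}\,\Gamma(1 - \delta)\,\frac{(\epsilon\delta)^\delta}{(1+\delta)^{1+\delta}}\,\epsilon \tag{22}$$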
                  predicted % error                         experimental % error
threshold    m    THN    (1 - gamma)  (1 - eta)   HN        THN    (1 - gamma)  (1 - eta)   HN
133         145   2.46     1.03        1.46      0.144     2.552     1.0         1.552     0.103
134         346   3.4      1.37        2.11      0.272     3.468     1.373       2.168     0.253
135         825   4.714    1.776       2.991     0.494     4.152     1.606       2.576     0.485
136        1970   6.346    2.274       4.167     0.857     6.447     2.335       4.162     0.863
Table 1: The performance of a HN and optimal THN: A comparison between calculated
and experimental results ($\alpha = 0.7$, $n = 210$).
as claimed. Expression (22) is presented as $K(\epsilon, \delta, \beta)\,\epsilon$, where $K(\epsilon, \delta, \beta)$ is the factor
($\le 1$) by which the probability of error $\epsilon$ of the THN should be multiplied in order
to get the probability of error of the original HN with the WTA subnet. For small
$\delta$, $K$ is close to 1; however, as will be seen in the next section, $K$ is typically much smaller.
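To give a feel for the size of $K$, the estimate can be evaluated numerically. The sketch below is an illustration based on one reading of (16) and (22), not a definitive implementation: the name `hn_error` is hypothetical, and the inputs are the predicted values of the Table 1 row for threshold 135, with $\delta$ taken as the listed ratio $(1-\gamma)/(1-\eta)$.

```python
import math


def hn_error(eps, delta, beta):
    """Estimate of the HN error probability as read from (16)/(22), given the
    THN error eps, the ratio delta of Lemma 4 and beta = threshold / n.

    Returns (error, K). Requires 0 < delta < 1 so that Gamma(1 - delta) is finite.
    """
    c = delta * math.log(beta / (1 - beta))
    k = ((1 - math.exp(-c)) / c * math.gamma(1 - delta)
         * (eps * delta) ** delta / (1 + delta) ** (1 + delta))
    return k * eps, k


# Table 1 row for threshold 135 (alpha = 0.7, n = 210): eps = 4.714%.
eps = 0.04714
delta = 1.776 / 2.991
error, k = hn_error(eps, delta, 135 / 210)
```

For this row the estimate comes out close to the predicted HN error of 0.494% listed in Table 1, with $K$ of the order of 0.1.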
4 Numerical results 
The experimental results presented in Table 1 testify to the accuracy of the HN and
THN calculations. Figure 1 presents the calculated error probabilities for various
values of input similarity $\alpha$ and memory capacity $m$, as a function of the input size
$n$. As is evident, the performance of the THN is worse than that of the HN, but due
to the exponential growth of $m$, it requires only a minor increment in $n$ to obtain
a THN that performs as well as the original HN.
To examine the sensitivity of the THN network to threshold variation, we have fixed
$\alpha = 0.7$, $n = 210$, $m = 825$, and let the threshold vary between 132 and 138. As we
can see in Figure 2, the threshold 135 is indeed optimal, but the performance with
threshold values of 134 and 136 is practically identical. The magnitude of the two
error types varies considerably with the threshold value, but this variation has no
effect on the overall performance near the optimum. These two error probabilities
might as well be taken equal to each other.
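The threshold sweep can be reproduced in outline with exact binomial tails in place of the asymptotic approximations. The sketch below is an illustration under stated assumptions and not the computation used for Figure 2: the function names are hypothetical, the similarities of the $m$ wrong memories are treated as independent Bin(n, 1/2) variables, and the similarity of the correct memory as Bin(n, alpha).

```python
import math


def binom_tail(n, p, x):
    """Exact P(X >= x) for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))


def thn_error(n, alpha, m, threshold):
    """Error probability of a THN with the given threshold: one minus the
    probability that no wrong memory fires and the correct memory does."""
    p_no_false_winner = (1 - binom_tail(n, 0.5, threshold)) ** m  # plays the role of gamma
    p_correct_fires = binom_tail(n, alpha, threshold)             # plays the role of eta
    return 1 - p_no_false_winner * p_correct_fires


errors = {x: thn_error(210, 0.7, 825, x) for x in range(132, 139)}
best = min(errors, key=errors.get)
```

Under these assumptions the minimum again falls at threshold 135, with the neighbouring thresholds 134 and 136 only slightly worse.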
Conclusion In this paper we analyzed in detail the performance of a Hamming 
Network and a Threshold Hamming Network. Given a desired storage capacity and 
performance, we described how to compute the corresponding minimal network size 
required. The THN drastically reduces the time and connectivity requirements of 
Hamming Network classifiers. 
[Figure 1 plots: three panels showing epsilon (error probability) versus n (network size), for alpha = 0.6, m = 10^3; alpha = 0.7, m = 10^6; and alpha = 0.8, m = 10^9.]
Figure 1: Probability of error as a function of network size: three networks are
depicted, displaying the performance at various values of $\alpha$ and $m$. For graphical
convenience, we have plotted $\log\frac{1}{\epsilon}$ versus $n$.
[Figure 2 plot: % error (0 to 10) versus threshold (132 to 138); legend entries: THN performance (epsilon), 1 - gamma.]
Figure 2: Threshold sensitivity of the THN ($\alpha = 0.7$, $n = 210$, $m = 825$).
References 
[1] K. Steinbuch. Die Lernmatrix. Kybernetik, 1:36-45, 1961.
[2] W.K. Taylor. Cortico-thalamic organization and memory. Proc. of the Royal
Society of London B, 159:466-478, 1964.
[3] R.P. Lippmann, B. Gold, and M.L. Malpass. A comparison of Hamming and
Hopfield neural nets for pattern classification. Technical Report TR-769, MIT
Lincoln Laboratory, 1987.
[4] E.B. Baum, J. Moody, and F. Wilczek. Internal representations for associative
memory. Biological Cybernetics, 59:217-228, 1988.
[5] P. Floreen. The convergence of Hamming memory networks. IEEE Trans. on
Neural Networks, 2(4):449-457, 1991.
[6] E. Domany and H. Orland. A maximum overlap neural network for pattern
recognition. Physics Letters A, 125:32-34, 1987.
[7] M.R. Leadbetter, G. Lindgren, and H. Rootzen. Extremes and related properties
of random sequences and processes. Springer-Verlag, Berlin-Heidelberg-New York, 1983.
