OPTIMAL NEURAL SPIKE CLASSIFICATION 
Amir F. Atiya(*) and James M. Bower(**)
(*) Dept. of Electrical Engineering 
(**) Division of Biology 
California Institute of Technology 
CA 91125
Abstract 
Being able to record the electrical activities of a number of neurons simultaneously is likely 
to be important in the study of the functional organization of networks of real neurons. Using 
one extracellular microelectrode to record from several neurons is one approach to studying 
the response properties of sets of adjacent and therefore likely related neurons. However, to 
do this, it is necessary to correctly classify the signals generated by these different neurons. 
This paper considers this problem of classifying the signals in such an extracellular recording, 
based upon their shapes, and specifically considers the classification of signals in the case when 
spikes overlap temporally. 
Introduction 
How single neurons in a network of neurons interact when processing information is likely 
to be a fundamental question central to understanding how real neural networks compute.
In the mammalian nervous system we know that spatially adjacent neurons are, in general, 
more likely to interact, as well as receive common inputs. Thus neurobiologists are interested
in devising techniques that allow adjacent groups of neurons to be sampled simultaneously. 
Unfortunately, the small scale of real neural networks makes inserting one recording electrode 
per cell impractical. Therefore, one is forced to use single electrodes designed to sample
neural signals evoked by several cells at once. While this approach provides the multi-neuron
recordings being sought, it also presents a rather serious waveform classification problem
because the actual temporal sequence of action potentials in each individual neuron must be
deciphered. This paper describes a method for classifying the activities of several individual 
neurons recorded simultaneously using a single electrode. 
Description of the Problem 
Over the last two decades considerable attention [1-8] has been devoted to the problem of
classification of action potentials in multi-neuron recordings. These action potentials (also 
referred to as "spikes") are the extracellularly recorded signal produced by a single neuron 
when it is passing information to other neurons (Fig. 1). Fortunately, spikes recorded from the 
same cell are more or less similar in shape, while spikes coming from different neurons usually 
have somewhat different shapes, depending on the neuron type, electrode characteristics, the 
distance between the electrode and the neuron, and the intervening medium. Fig. 1 illustrates
some representative variations in spike shapes. It is our objective to detect and classify different 
spikes based on their shapes. However, relying entirely on the shape of the spikes presents 
difficulties. For example, spikes from different neurons can overlap temporally, producing novel
waveforms (see Fig. 2 for an example of an overlap). To deal with these overlaps, one has first 
to detect the occurrence of an overlap, and then estimate the constituent spikes. Unfortunately, 
only a few of the available spike separation algorithms consider these events, even though they 
are potentially very important in understanding neural networks. Those few tend to rely 
American Institute of Physics 1988 
on heuristic rules and subtractive methods to resolve overlap cases. No currently published 
method we are aware of attempts to use knowledge of the likelihood of overlap events for 
detecting them, which is at the basis of the method we will describe. 
Fig. 1 
An example of a multi-neuron recording 
Fig. 2 
An example of a temporal overlap of action potentials 
General Approach 
The first step in classifying neural waveforms is obviously to identify the typical spike 
shapes occurring in a particular recording. To do this we have applied a learning algorithm 
on the beginning portion of the recording, which in an unsupervised fashion (i.e. without the 
intervention of a human operator) estimates the shapes. After the learning stage we have 
the classification stage, which is applied on the remaining portion of the recording. A new 
classification method is proposed, which gives minimum probability of error, even in case of the 
occurrence of overlapping spikes. Both the learning and the classification algorithms require 
a preprocessing step to detect the position of the spike candidate in the data record. 
Detectior: For the first task of detection most researchers use a simple level detecting 
algorithm, that signals a spike when recorded voltage levels cross a certain voltage threshold. 
However, variations in recording position due to natural brain movements during recording 
(e.g. respiration) can cause changes in relative height of the positive to the negative peak. 
Thus, a level detector (using either a positive or a negative threshold) can miss some spikes. 
Alternatively, we have chosen to detect an event by sliding a window of fixed length until a 
time when the peak to peak value within the window exceeds a certain threshold. 
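The window-based detector described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function name `detect_events`, the list-based signal, and the rule of jumping one full window ahead after a detection are all assumptions made here.

```python
def detect_events(signal, window, threshold):
    """Return start indices of windows whose peak-to-peak value exceeds threshold."""
    events = []
    start = 0
    while start + window <= len(signal):
        segment = signal[start:start + window]
        if max(segment) - min(segment) > threshold:
            events.append(start)
            start += window  # assumed rule: skip past the detected event
        else:
            start += 1  # slide the window by one sample
    return events


# Toy trace: flat baseline with one spike whose negative phase dominates.
trace = [0.0] * 10 + [0.2, -1.0, 0.1] + [0.0] * 10
print(detect_events(trace, window=4, threshold=1.0))
```

In this toy trace the positive peak is only 0.2, so a positive level detector set at, say, 0.5 would miss the spike, whereas the peak-to-peak value of 1.2 exceeds the window threshold.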
Learning: Learning is performed on the beginning portion of the sampled data using 
the Isodata clustering algorithm [9]. The task is to estimate the number of neurons n whose
spikes are represented in the waveform and learn the different shapes of the spikes of the 
various neurons. For that purpose we apply the clustering algorithm choosing only one feature 
from the spike, the peak-to-peak value, which we have found to be quite an effective feature.
Note that using the peak to peak value in the learning stage does not necessitate using it for 
classification (one might need additional or different features, especially for tackling the case 
of spike overlap). 
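To make the learning stage concrete, the sketch below groups detected spikes by their peak-to-peak value and averages each group into a template. It is not the Isodata algorithm: as a deliberately simpler stand-in, it starts a new cluster wherever two consecutive sorted peak-to-peak values differ by more than an assumed `gap`. All function names and the toy waveforms are hypothetical.

```python
def peak_to_peak(waveform):
    return max(waveform) - min(waveform)


def cluster_by_peak_to_peak(spikes, gap):
    """Group spike indices; a jump larger than `gap` in sorted
    peak-to-peak values starts a new cluster (assumes spikes is non-empty)."""
    order = sorted(range(len(spikes)), key=lambda i: peak_to_peak(spikes[i]))
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if peak_to_peak(spikes[cur]) - peak_to_peak(spikes[prev]) > gap:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters


def average_templates(spikes, clusters):
    """Average the member waveforms of each cluster, sample by sample."""
    n = len(spikes[0])
    return [[sum(spikes[i][m] for i in members) / len(members) for m in range(n)]
            for members in clusters]


# Four toy spikes from two apparent classes (small and large).
spikes = [[0.0, 1.0, -0.5, 0.0], [0.0, 2.0, -1.0, 0.0],
          [0.0, 1.1, -0.5, 0.0], [0.0, 2.1, -1.1, 0.0]]
clusters = cluster_by_peak_to_peak(spikes, gap=0.5)
print(len(clusters), "classes found")
print(average_templates(spikes, clusters))
```

The number of clusters found plays the role of the estimated number of neurons n, and the averaged waveforms play the role of the typical spike shapes.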
The Optimal Classification Rule: Once we have identified the number of different events 
present, the classification stage is concerned with estimating the identities of the spikes in the 
recording, based on the typical spike shapes obtained in the learning stage. In our classification 
scheme we make use of the information given by the shape of the detected spike as well 
as the firing rates of the different neurons. Although the shape plays in general the most 
important role in the classification, the rates become a more significant factor when dealing 
with overlapping events. This is because in general overlap is considerably less frequent than 
single spikes. The shape information is given by a set of features extracted from the waveform.
Let x be the feature vector of the detected spike (e.g. the samples of the spike waveform). Let 
N, ..., N,, represent the different neurons. The detection algorithm tells us only that at least 
one spike occurred in the narrow interval (t - T, t + T) (= say I) where t is the instant of 
the peak of the detected spike, T and Tg are constants chosen subjectively according to the 
smallest possible time separation between two consecutive spikes, identifiable as two separate 
(nonoverlapping) spikes. By definition, if more than one spike occurs in the interval I, then 
we have an overlap. As a matter of convention, the instant of the occurrence of a spike is
taken to be that of the spike peak. For simplicity, we will consider the case of two possibly 
overlapping spikes, though the method can be extended easily to more. The classification rule 
which results in minimum probability of error is the one which chooses the neuron (or pair of 
neurons in case of overlap) which has the maximum likelihood. We have therefore to compare 
the $P_i$'s and the $P_{lj}$'s, defined as
$$P_i = P(N_i \text{ fired in } I \mid x, A), \qquad i = 1, \ldots, n,$$
$$P_{lj} = P(N_l \text{ and } N_j \text{ fired in } I \mid x, A), \qquad l, j = 1, \ldots, n,$$
where $A$ represents the event that one or two spikes occurred in the interval $I$. In other words,
$P_i$ is the probability that what has been detected is a single spike from neuron $i$, whereas $P_{lj}$
is the probability that we have two overlapping spikes from neurons $l$ and $j$ (note that spikes
from the same neuron never overlap). Henceforth we will use $f$ to denote probability density.
For the purpose of abbreviation let Bi(t) mean "neuron Ni fired at t". The classification 
problem can be reduced to comparing the following likelihood functions: 
$$L_i = f(B_i(t)) \int_{t-T_1}^{t+T_2} f(x \mid B_i(t_1))\, dt_1, \qquad i = 1, \ldots, n \tag{1a}$$
$$L_{lj} = f(B_l(t))\, f(B_j(t)) \int_{t-T_1}^{t+T_2}\!\int_{t-T_1}^{t+T_2} f(x \mid B_l(t_1), B_j(t_2))\, dt_1\, dt_2, \qquad l, j = 1, \ldots, n, \quad j < l \tag{1b}$$
(for a derivation refer to the Appendix). Let $f_i$ be the density of the inter-spike interval and $\tau_i$ be
the most recent firing instant of neuron $N_i$. If we are given the fact that neuron $N_i$ has been
idle for at least a period of duration $t - \tau_i$, we get
$$f(B_i(t)) = \frac{f_i(t - \tau_i)}{\int_{t-\tau_i}^{\infty} f_i(u)\, du}. \tag{2}$$
A disadvantage of using (2) is that the available $f_i$'s and $\tau_i$'s are only estimates, which depend
on the previous classification results. Further, for reliable estimation of the densities $f_i$, one
needs a large number of spikes and therefore a long learning period since we are estimating a 
whole function. Therefore, we have not used this form, but instead have used the following two 
schemes. In the first one, we ignore the knowledge about the previous firing pattern except 
for the estimated firing rates $\lambda_1, \ldots, \lambda_n$ of the different neurons $N_1, \ldots, N_n$ respectively. Then
the probability of a spike coming from neuron $N_i$ in an interval of duration $dt$ is simply $\lambda_i\, dt$.
Hence
$$f(B_i(t)) = \lambda_i. \tag{3}$$
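As a consistency check between the rate-only form and the form based on the inter-spike interval density, suppose (an assumption made here for illustration, not stated in the text) that neuron $N_i$ fires as a Poisson process, so that its inter-spike interval density is exponential with rate $\lambda_i$. The density of firing at $t$, given that the neuron has been idle since its last firing instant $\tau_i$, is the hazard rate of the interval density, and it reduces to the constant $\lambda_i$:

```latex
f_i(u) = \lambda_i e^{-\lambda_i u}
\;\Longrightarrow\;
\frac{f_i(t-\tau_i)}{\int_{t-\tau_i}^{\infty} f_i(u)\,du}
= \frac{\lambda_i e^{-\lambda_i (t-\tau_i)}}{e^{-\lambda_i (t-\tau_i)}}
= \lambda_i .
```

Under that assumption the time elapsed since the last spike carries no extra information, so keeping only the firing rates loses nothing; for other interval densities the rate-only form is an approximation.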
In the second scheme we do not use any previous knowledge except for the total firing rate (of 
all neurons), say $\alpha$. Then
$$f(B_i(t)) = \frac{\alpha}{n}. \tag{4}$$
Although the second scheme does not use as much of the information about the firing 
pattern as the first scheme does, it has the advantage of obtaining and using a more reliable 
estimate of the firing rate, because in general the overall firing rate changes less with time than 
the individual rates and because the estimate of $\alpha$ does not depend on previous classification
results. However, it is useful mostly when the firing rates of the different neurons do not vary
much; otherwise the first scheme is preferred.
In real recording situations, one sometimes encounters voltage signals which are very
different from any of the previously learned typical spike shapes or their pairwise overlaps.
This can happen for example due to a falsely detected noise event, a spike from a class not 
encountered in the learning stage, or to the overlap of three or more spikes. To cope with 
these cases we use the reject option. This means that we refuse to classify the detected spike 
because of the unlikeliness of the assumed event A. The reject option is therefore employed 
whenever $P(A \mid x)$ is smaller than a certain threshold. We know that
$$P(A \mid x) = \frac{f(A, x)}{f(A, x) + f(A^c, x)},$$
where $A^c$ is the complement of the event $A$. The density $f(A^c, x)$ can be approximated as
uniform (over the possible values of $x$) because a large variety of cases are covered by the event
$A^c$. It follows that one can just compare $f(A, x)$ to a threshold. Hence the decision strategy
becomes finally: Reject if the sum of the likelihood functions is less than a threshold. Otherwise 
choose the neuron (or pair of neurons) corresponding to the largest likelihood function. Note
that the sum of the likelihood functions equals f(A,x) (refer to Appendix). 
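The decision strategy just stated (reject on a small sum, otherwise pick the largest likelihood) can be sketched in a few lines. The function name `classify`, the dictionary layout keyed by tuples of neuron indices, and the numerical values are illustrative assumptions, not part of the paper.

```python
def classify(likelihoods, reject_threshold):
    """likelihoods maps a label, e.g. (1,) for a single spike from neuron 1
    or (2, 1) for an overlap of neurons 2 and 1, to its likelihood value.
    Returns the winning label, or None when the event is rejected."""
    if sum(likelihoods.values()) < reject_threshold:
        return None  # the assumed event A (one or two spikes) is too unlikely
    return max(likelihoods, key=likelihoods.get)


print(classify({(1,): 0.6, (2,): 0.1, (2, 1): 0.05}, reject_threshold=0.2))
print(classify({(1,): 0.01, (2,): 0.02, (2, 1): 0.001}, reject_threshold=0.2))
```

The first call selects neuron 1; the second is rejected because the likelihoods sum to less than the threshold.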
Now, let us evaluate the integrals in (1). Overlapping spikes are assumed to add linearly. 
Since we intend to handle the overlap case, we have to use a set of features $x_m$ which obeys
the following: given the features of two waveforms, one can compute those of their
overlap. A good candidate is the set of the samples of the spike (or possibly just
part of the samples). The added noise, partly thermal noise from the electrode and partly 
due to firings from distant neurons, can usually be approximated as white Gaussian. Let the 
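variance be $\sigma^2$. The integrals in the likelihood functions can be approximated as summations
(note in fact that we have samples available, not a continuous waveform). Let $y^i$ represent the
typical feature vector (template) associated with neuron $N_i$, with the $m$th component being
$y^i_m$. Then
$$f(x \mid B_i(k)) = \frac{1}{(2\pi)^{M/2} \sigma^M} \exp\left[-\frac{1}{2\sigma^2} \sum_{m=1}^{M} \left(x_m - y^i_{m-k}\right)^2\right],$$
$$f(x \mid B_l(k_1), B_j(k_2)) = \frac{1}{(2\pi)^{M/2} \sigma^M} \exp\left[-\frac{1}{2\sigma^2} \sum_{m=1}^{M} \left(x_m - y^l_{m-k_1} - y^j_{m-k_2}\right)^2\right],$$
where $x_m$ is the $m$th component of $x$, and $M$ is the dimension of $x$. This leads to the following
likelihood functions (the Gaussian normalizing constant, common to all of them, is omitted):
$$L_i = f(B_i(t)) \sum_{k=-M_1}^{M_2} \exp\left[-\frac{1}{2\sigma^2} \sum_{m=1}^{M} \left(x_m - y^i_{m-k}\right)^2\right],$$
$$L_{lj} = f(B_l(t))\, f(B_j(t)) \sum_{k_1=-M_1}^{M_2} \sum_{k_2=-M_1}^{M_2} \exp\left[-\frac{1}{2\sigma^2} \sum_{m=1}^{M} \left(x_m - y^l_{m-k_1} - y^j_{m-k_2}\right)^2\right],$$
where $k$ is the spike instant, and the interval from $-M_1$ to $M_2$ corresponds to the interval $I$
defined at the beginning of the Section.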
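The discrete likelihood functions can be sketched as below, assuming white Gaussian noise and linear addition of overlapping spikes as stated in the text. Several details are assumptions made for this illustration: the helper names, the zero padding of templates outside their support, the convention that a shift k places the first template sample at index k, and the toy templates and prior values. The common Gaussian normalizing constant is dropped.

```python
import math


def shifted(template, k, length):
    """Template with its first sample at index k, zero elsewhere."""
    return [template[m - k] if 0 <= m - k < len(template) else 0.0
            for m in range(length)]


def single_likelihood(x, template, prior, sigma, shifts):
    """prior * sum over shifts k of exp(-||x - y_k||^2 / (2 sigma^2))."""
    total = 0.0
    for k in shifts:
        y = shifted(template, k, len(x))
        sq = sum((xm - ym) ** 2 for xm, ym in zip(x, y))
        total += math.exp(-sq / (2 * sigma ** 2))
    return prior * total


def overlap_likelihood(x, tmpl_l, tmpl_j, prior_l, prior_j, sigma, shifts):
    """Product of priors times a double sum over both spike instants."""
    total = 0.0
    for k1 in shifts:
        yl = shifted(tmpl_l, k1, len(x))
        for k2 in shifts:
            yj = shifted(tmpl_j, k2, len(x))
            sq = sum((xm - a - b) ** 2 for xm, a, b in zip(x, yl, yj))
            total += math.exp(-sq / (2 * sigma ** 2))
    return prior_l * prior_j * total


t1, t2 = [1.0, -0.5], [0.6, -0.9]  # toy templates
shifts = range(4)                   # stands in for -M1..M2
# Noise-free overlap: t1 at index 0 plus t2 at index 2.
x = [a + b for a, b in zip(shifted(t1, 0, 6), shifted(t2, 2, 6))]
print(single_likelihood(x, t1, 0.05, 0.2, shifts))
print(single_likelihood(x, t2, 0.05, 0.2, shifts))
print(overlap_likelihood(x, t1, t2, 0.05, 0.05, 0.2, shifts))
```

Although the overlap hypothesis carries the product of two small priors, it still wins here because neither single template explains the waveform; for an isolated spike the single-spike likelihood dominates instead.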
Implementation 
The techniques we have just described were tested in the following way. For the first 
experiment we identified two spike classes in a recording from the rat cerebellum. A signal 
is created, composed of a number of spikes from the two classes at random instants, plus 
noise. To make the situation as realistic as possible, the added noise is taken from idle periods 
(i.e. non-spiking) of a real recording. The reason for using such an artificially generated 
signal is to be able to know the class identities of the spikes, in order to test our approach 
quantitatively. We implement the detection and classification techniques on the obtained 
signal, with various values of noise amplitude. In our case the ratio of the peak to peak values 
of the templates turns out to be 1.375. Also, the spike rate of one of the classes is twice that of
the other class. Fig. 3a shows the results of applying the first scheme (i.e. using Eq. 3). The
overall percentage correct classification for all spikes (solid curve) and the percentage correct 
classification for overlapping spikes (dashed curve) are plotted versus the standard deviation 
of the noise $\sigma$ normalized with respect to the peak $h$ of the large template. Notice that the
overall classification accuracy is near 100% for $\sigma/h$ less than 0.15, which is actually the range
of noise amplitudes we mostly encountered in our work with real recordings. Observe also
the good results for classifying overlapping events. We have also applied the second scheme
(i.e. using Eq. 4) and obtained similar results. We wish to mention that the thresholds for 
detection and for the reject option are set up so as to obtain no more than 3% falsely detected 
spikes. 
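A test signal of the kind described above, with known class identities, could be generated along the following lines. This is only a sketch under stated assumptions: the paper adds noise taken from idle periods of a real recording, whereas this example substitutes synthetic Gaussian noise, and the function name `synth_signal`, the per-sample firing probabilities, and the toy templates are all hypothetical.

```python
import random


def synth_signal(templates, rates, n_samples, sigma, seed=0):
    """Superimpose templates at random instants on Gaussian noise.

    rates[i] is the per-sample firing probability of class i + 1.
    Returns the signal and the sorted ground truth as (onset, label) pairs.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    truth = []
    for label, (template, rate) in enumerate(zip(templates, rates), start=1):
        t = 0
        while t + len(template) <= n_samples:
            if rng.random() < rate:
                for m, value in enumerate(template):
                    signal[t + m] += value  # overlapping spikes add linearly
                truth.append((t, label))
                t += len(template)  # a class never overlaps with itself
            else:
                t += 1
    return signal, sorted(truth)


signal, truth = synth_signal(
    templates=[[0.0, 1.0, -0.6, 0.0], [0.0, 0.7, -0.5, 0.0]],
    rates=[0.02, 0.01],  # one class fires twice as often as the other
    n_samples=2000,
    sigma=0.1,
)
print(len(truth), "spikes placed")
```

Because the onset and class of every spike are recorded in `truth`, the output of a detector and classifier can be scored against it for different values of `sigma`.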
A similar experiment is performed with three waveforms (three classes), where two of the 
waveforms are the same as those used in the first experiment. The third is the average of 
the first two. All three neurons have the same spike rate (i.e. $\lambda_1 = \lambda_2 = \lambda_3$). Hence
both classification schemes are equivalent in this case. Fig. 3b shows the overall as well as 
the sub-category of overlap classification results. One observes that the results are worse than 
those for the two-class case. This is because the spacings between the templates are in general 
smaller. Notice also that the accuracy in resolving overlapping events is now tangibly less 
than the overall accuracy. However, one can say that the results are acceptable in the range 
of $\sigma/h$ less than 0.1. The following experiment is also performed using the same data. We
would like to investigate the importance of the information given by the (overall) firing rate on 
the problem of classifying overlapping events. In our method the summation in the likelihood 
functions for single spikes is multiplied by $\alpha/n$, while that for overlapping spikes is multiplied
by $(\alpha/n)^2$. Usually $\alpha/n$ is considerably less than one. Hence we have a factor which gives less
weight to overlapping events. Now, consider the case of ignoring completely the information
given by the firing rate and relying solely on shape information. We assume that overlapping
spikes from any two given classes represent a "new" class of waveforms and that each of these
overlap classes has the same rate as that of a single-spike class. In that case we can obtain
expressions for the likelihood functions consisting of just the summations, i.e. free of the rate
Fig. 3
a) Overall (solid curve) and overlap (dashed curve)
classification accuracy for a two class case
b) Overall (solid curve) and overlap (dashed curve)
classification accuracy for a three class case
c) Percent of incorrect classification of single spikes as overlap
solid curve: scheme utilizing the spike rate
dashed curve: scheme not utilizing the spike rate
factor / (refer to Appendix). An experiment is performed using that scheme (on the same 
three class data). One observes that the method classifies a number of single spikes wrongly 
as overlaps, much more than our original scheme does (see Fig. 3c), especially for the large 
noise case. On the other hand, the number of overlaps which are classified wrongly as single 
spikes is near zero for both schemes. 
Finally, in the last experiment the techniques are implemented on real recordings from the
rat cerebellum. The recorded signal is band-pass filtered in the frequency range 300 Hz to 10
kHz, then sampled at a rate of 20 kHz. For classification, we take 20 samples per spike as
features. Fig. 4 shows the results of the proposed method, using the first scheme (Eq. 3). The 
number of neurons whose spikes are represented in the waveform is estimated to be four. The 
detection threshold is set up so that spikes which are too small are disregarded, because they 
come from several neurons far away from the electrode and are hard to distinguish. Notice 
the overlap of classes 1 and 2, which was detected. We used the second scheme also on the 
same portion and it gave results similar to those of the first scheme (only one of the spikes is
classified differently). Overall, the discrepancies between classifications done by the proposed 
method and an experienced human observer were found to be small. 
Fig. 4
Classification results for a recording from the rat cerebellum
Conclusion 
Many researchers have considered the problem of spike classification in multi-neuron 
recordings, but only a few have tackled the case of spike overlap, which could occur frequently,
particularly if the group of neurons under study is stimulated. In this work we propose a 
method for spike classification, which can also aid in detecting and classifying overlapping 
spikes. By taking into account the statistical properties of the discharges of the neurons
sampled, this method minimizes the probability of classification error. The application of the
method to artificial as well as real recordings confirms its effectiveness.
Appendix 
Consider first Py. We can write 
102 
We can also obtain 
f+Tf+T= 
V, s = f(x' AIB'(t)' Bs(t2)) 
Now, consider the two events Bt(t) d Bi(t2 ). In the absense of any formation about thek 
dependence, we sume that they e independent. We get 
f(Bt(t),Bi(t2)) =/(Bt(t))/(By(t2)). 
Within the interval I, both /(Bt(t)) and /(Bs(t2)) hardly vy because the duration of 
I 2 very small comped to a typical inter-spe teal. Therefore we get the foowing 
approx ation: 
/(SS(*))  /(SS(*))' 
The expression for S becomes 
Notice that the term A was omitted from the gument of the density inside the integral, 
because the occuence of two spikes at t d t2eI implies the occurrence of A. A 
derivation for  results  
The erm f(x, A)  common o all he PU's and [he Pi's. Hence one can sply compare [he 
following lelood functions: 
= 
Acknowledgement 
Our thanks to Dr. Yaser Abu-Mostafa for his assistance with this work. This project was 
supported by the Caltech Program of Advanced Technology (sponsored by Aerojet, GM, GTE,
and TRW), and the Joseph Drown Foundation. 
References 
[1] M. Abeles and M. Goldstein, Proc. IEEE, 65, pp.762-773, 1977. 
[2] G. Dinning and A. Sanderson, IEEE Trans. Bio-Med. Eng., BME-28, pp. 804-812, 1981.
[3] E. D'Hollander and G. Orban, IEEE Trans. Bio-Med. Eng., BME-26, pp. 279-284, 1979. 
[4] D. Mishelevich, IEEE Trans. Bio-Med. Eng., BME-17, pp. 147-150, 1970. 
[5] V. Prochazka and H. Kornhuber, Electroenceph. clin. Neurophysiol., 32, pp. 91-93, 1973. 
[6] W. Roberts, Biol. Cybernet., 35, pp. 73-80, 1979. 
[7] W. Roberts and D. Hartline, Brain Res., 94, pp. 141-149, 1975. 
[8] E. Schmidt, J. Neurosci. Methods, 12, pp. 95-111, 1984. 
[9] R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley, 1973. 
