A Connectionist Learning Approach to Analyzing 
Linguistic Stress 
Prahlad Gupta 
Department of Psychology 
Carnegie Mellon University 
Pittsburgh, PA 15213 
David S. Touretzky 
School of Computer Science 
Carnegie Mellon University 
Pittsburgh, PA 15213 
Abstract 
We use connectionist modeling to develop an analysis of stress systems in terms 
of ease of learnability. In traditional linguistic analyses, learnability arguments 
determine default parameter settings based on the feasibilty of logically deducing 
correct settings from an initial state. Our approach provides an empirical alter- 
native to such arguments. Based on perceptron learning experiments using data 
from nineteen human languages, we develop a novel characterization of stress 
patterns in terms of six parameters. These provide both a partial description of the 
stress pattern itself and a prediction of its learnability, without invoking abstract 
theoretical constructs such as metrical feet. This work demonstrates that ma- 
chine learning methods can provide a fresh approach to understanding linguistic 
phenomena. 
1 LINGUISTIC STRESS 
The domain of stress systems in language is considered to have a relatively good linguistic 
theory, called metrical phonology . In this theory, the stress patterns of many languages 
can be described concisely, and characterized in terms of a set of linguistic "parameters," 
such as bounded vs. unbounded metrical feet, left vs. right dominant feet, etc. 2 In many 
languages, stress tends to be placed on certain kinds of syllables rather than on others; the 
former are termed heavy syllables, and the latter light syllables. Languages that distinguish 
1For an overview of the theory, see [Goldsmith 90, chapter 4]. 
2See [Dresher 90] for one such parameter scheme. 
225 
226 Gupta and Touretzky 
OUTPUT UNIT 
(PERCEPTRON) 
Input 
INPUT LAYER (2 x 13 units) 
Figure 1: Perceptton model used in simulations. 
between heavy and light syllables are termed quantity-sensitive (QS), while languages that 
do not make this distinction are termed quantity-insensitive (QI). In some QS languages, 
what counts as a heavy syllable is a closed syllable (a syllable that ends in a consonant), 
while in others it is a syllable with a long vowel. We examined the stress patterns of 
nineteen QI and QS systems, summarized and exemplified in Table 1. The data were drawn 
primarily from descriptions in [Hayes 80]. 
2 PERCEPTRON SIMULATIONS 
In separate experiments, we trained a perceptron to produce the stress pattern of each of 
these languages. Two input representations were used. In the syllabic representation, used 
for QI patterns only, a syllable was represented as a [1 1] vector, and [0 0] represented no 
syllable. In the weight-string representation, which was necessary for QS languages, the 
input patterns used were [1 0] for a heavy syllable, [0 1] for a light syllable, and [0 0] for 
no syllable. For stress systems with up to two levels of stress, the output targets used in 
training were 1.0 for primary stress, 0.5 for secondary stress, and 0 for no stress. For stress 
systems with three levels of stress, the output targets were 0.6 for secondary stress, 0.35 for 
tertiary stress, and 1.0 and 0 respectively for primary stress and no stress. The input data set 
for all stress systems consisted of all word-forms of up to seven syllables. With the syllabic 
input representation there are 7 of these, and with the weight-string representation, there are 
255. The perceptron's input array was a buffer of 13 syllables; each word was processed 
one syllable at a time by sliding it through the buffer (see Figure 1). The desired output at 
each step was the stress level of the middle syllable of the buffer. Connection weights were 
adjusted at each step using the back-propagation learning algorithm [Rumelhart 86]. One 
epoch consisted of one presentation of the entire training set. The network was trained for 
as many epochs as necessary to ensure that the stress value produced by the perceptron was 
within 0.1 of the target value, for each syllable of the word, for all words in the training set. 
A learning rate of 0.05 and momentum of 0.90 was used in all simulations. Initial weights 
were uniformly distributed random values in the range -t-0.5. Each simulation was run at 
least three times, and the learning times averaged. 
Connectionist Learning and Linguistic Stress 227 
REF I LANGUAGE I DESCRIPTION OF STRESS PATTERN I Ex^IES 
Quantity-Insensitive Languages: 
L1 Latvian Fixed word-initial slxess. S  SSSSSS  
L2 French Fixed word-final sixess. sO sO sO sO sO sO s  
L3 Maranungku Primary sixess on first syllable, secondary sixess on alternate S  sOs2sOs2sOs 2 
succeeding syllables. 
L4 Weri Primary sixess on last syllable, secondary sixess on alternate s2sOs2sOs2sOs 1 
preceding syllables. 
L5 Garawa Primary sixess on first syllable, secondary sixess on penulti- ssOsOs3sOs2s o 
mate syllable, tertiary sixess on alternate syllables preceding 
the penult, no sixess on second syllable. 
L6 Lakota Primary sixess on second syllable. sOssOssss  
L7 Swahili Primary sixess on penultimate syllable. sOsOsOsOsOss o 
L8 Paiute Primary sixess on second syllable, secondary sixess on alter- sOs1sOs2sOs2s 0 
nate succeeding syllables. 
L9 Warao Primary sixess on penultimate syllable, secondary sixess on sOs2sOs2sOs1s 0 
alternate preceding syllables. 
Quantity-Sensitive Languages: 
L10 Koya Primary sixess on first syllable, secondary sixess on heavy LLOLOH2LOLOL o 
syllables. L  LLLLLL  
(Heavy = closed syllable or syllable with long vowel.) 
L11 Eskimo (Primary) stress on final and heavy syllables. LOLOLOHLOLOL  
(Heavy = closed syllable.) LOLOLOLOLOLOL  
L12 Gurkhali Primary sixess on first syllable except when first syllable light LLOLOHOLOLOLO 
and second syllable heavy. LOHLOHOLOLOL o 
(Heavy = long vowel.) 
L13 Yapese Primary sixess on last syllable except when last is light and LOLOLOHOLOLOL  
penultimate heavy. LOHOLOHOLOHLO 
(Heavy = long vowel.) 
L14 Ossetic Primary sixess on first syllable if heavy, else on second syl- HLOLOHOLOLOL o 
lable. LOLLOLOLOLOL o 
(Heavy = long vowel.) 
L15 Romman Primary sixess on last syllable if heavy, else on penultimate LOLOLOHOLOLOH  
syllable. LOLOLOLOLOL  L o 
(Heavy = long vowel.) 
L16 Komi Primary sixess on first heavy syllable, or on last syllable if LOLOHLOLOHOLO 
none heavy. LOLOLOLOLOLOL  
(Heavy = long vowel.) 
L17 Cheremis Primary sixess on last heavy syllable, or on first syllable if LOLOHOLOLOHL o 
none heavy. LLOLOLOLOLOL o 
(Heavy = long vowel.) 
L18 Mongolian Primary sixess on first heavy syllable, or on first syllable if LOLOHLOLOHOL o 
none heavy. LLOLOLOLOLOL o 
(Heavy = long vowel.) 
L19 Mayan Primary sixess on last heavy syllable, or on last syllable if LOLOHOLOLOHL o 
none heavy. LOLOLOLOLOLOL  
(Heavy = long vowel.) 
Table 1: Stress patterns: description and example stress assignment. Examples are of 
stress assignment in seven-syllable words. Primary stress is denoted by the superscript 1 
(e.g., S), secondary stress by the superscript 2, tertiary stress by the superscript 3, and no 
stress by the superscript 0. "S" indicates an arbitrary syllable, and is used for the QI stress 
patterns. For QS stress patterns, "H" and "L" are used to denote Heavy and Light syllables, 
respectively. 
228 Gupta and Touretzky 
3 PRELIMINARY ANALYSIS OF LEARNABILITY OF STRESS 
The learning times differ considerably for {Latvian, French}, {Maranungku, Weft}, 
{Lakota, Polish} and Garawa, as shown in the last column of Table 2. Moreover, Paiute 
and Warao were unlearnable with this model. 3 Differences in learning times for the var- 
ious stress patterns suggested that the factors ("parameters") listed below are relevant in 
determining learnability. 
1. Inconsistent Primary Stress (IPS): it is computationally expensive to learn the pattern 
if neither edge receives pftmary stress except in mono- and di-syllables; this can be 
regarded as an index of computational complexity that takes the values {0, 1 }: 1 if an 
edge receives pftmary stress inconsistently, and 0, otherwise. 
2. Stress clash avoidance (SCA): if the components of a stress pattern can potentially 
lead to stress clash 4, then the language may either actually permit such stress clash, or 
it may avoid it. This index takes the values {0, 1 }: 0 if stress clash is permitted, and 
1 if stress clash is avoided. 
3. Alternation (A!t): an index of learnability with value 0 if there is no alternation, and 
value 1 if there is. Alternation refers to a stress pattern that repeats on alternate 
syllables. 
4. Multiple Primary Stresses (MPS): has value 0 if there is exactly one pftmary stress, 
and value 1 if there is more then onepftmary stress. It has been assumed that a repeating 
pattern of pftmary stresses will be on alternate, rather than adjacent syllables. Thus, 
[Alternation=O] implies [MPS=0]. Some of the hypothetical stress patterns examined 
below include ones with more than one pftmary stress; however, as far as is known, 
no actually occurring QI stress pattern has more than one pftmary stress. 
5. Multiple Stress Levels (MSL): has value 0 if there is a single level of stress (primary 
stress only), and value 1 otherwise. 
Note that it is possible to order these factors with respect to each other to form a five- 
digit binary string characterizing the ease/difficulty of learning. That is, the computational 
complexity of learning a stress pattern can be characteftzed as a 5-bit binary number whose 
bits represent the five factors above, in decreasing order of significance. Table 2 shows 
that this characteftzation captures the learning times of the QI patterns quite accurately. As 
an example of how to read Table 2, note that Garawa takes longer to learn than Latvian 
(165 vs. 17 epochs). This is reflected in the parameter setting for Garawa, "01101", being 
lexicographically greater than that for Latvian, "00000". A further noteworthy point is 
that this framework provides an account of the non-learnability of Paiute and Warao, viz,. 
that stress patterns whose parameter string is lexicographically greater than "10000" are 
unlearnable by the perceptron. 
4 TESTING THE QI LEARNABILITY PREDICTIONS 
We devised a series of thirty artificial QI stress patterns (each a variation on some language 
in Table 1) to examine our parameter scheme in more detail. The details of the patterns 
3They were learnable in a three-layer model, which exhibited a similar ordering of learning times 
[Gupta 92]. 
'*Placement of sixess on adjacent syllables. 
Connectionist Learning and Linguistic Stress 229 
IPS SCA Alt MPS MSL QI LANGUAGES REF EPOCHS 
(syllabic) 
0 0 0 0 0 Latvian L1 17 
French L2 16 
0 0 1 0 1 Maranungku L3 37 
Weri L4 34 
0 1 1 0 1 Garawa L5 165 
1 0 0 0 0 Lakota L6 255 
Swahili L7 254 
1 0 1 0 1 Paiute L8 ** 
Warao L9 ** 
Table 2: Preliminary analysis of learning times for QI stress systems, using the syllabic 
input representation. IPS=Inconsistent Primary Stress; SCA=Stress Clash Avoidance; 
Alt=Alternation; MPS=Multiple Primary Stresses; MSL=Multiple Stress Levels. Refer- 
ences L l-L9 refer to Table 1. 
II Agg lIPS I SC^ I AXtl MPS [MSL II QILANOS I REF I TM II Qs LANOS I REF I TXME II 
0 0 0 0 0 Latvian L1 2 
French L2 2 
0 0 0 0 1 Koya L10 2 
0 0 0 1 0 Eskimo L11 3 
0 0 1 0 1 Maranungku L3 3 
Weri L4 3 
0 1 1 0 1 Garawa L5 7 
0.25 0 0 0 0 Gurkhali L12 19 
Yapese L13 19 
0.50 0 0 0 0 Ossetic L14 30 
Rotuman L15 29 
1 0 0 0 0 Lakota L6 10 
Swahili L7 10 
1 0 1 0 1 Paiute L8 ** 
Warao L9 ** 
0 0 0 0 0 Komi L16 216 
Cheremis L17 212 
0 0 0 0 0 Mongolian L18 2306 
Mayan L19 2298 
Table 3: Summary of results and analysis of QI and QS learning (using weight- 
string input representations). Agg=Aggregative Information; IPS=Inconsistent Primary 
Stress; SCA=Stress Clash Avoidance; Alt=Alternation; MPS=Multiple Primary Stresses; 
MSL=Multiple Stress Levels. References index into Table 1. Time is learning time in 
epochs. 
230 Gupta and Touretzky 
are not crucial for present purposes (see [Gupta 92] for details). What is important to 
note is that the learnability predictions generated by the analytical scheme described in 
the previous section show good agreement with actual perceptron learning experiments on 
these patterns. 
The learning results are summarized in Table 4. It can be seen that the 5-bit characterization 
fits the learning times of various actual and hypothetical patterns reasonably well (although 
there are exceptions - for example, the hypothetical stress patterns with reference numbers 
h21 through h25 have a higher 5-bit characterization than other stress patterns, but lower 
learning times.) Thus, the "complexity measure" suggested here appears to identify a 
number of factors relevant to the learnability of QI stress patterns within a minimal two- 
layer connectionist architectme. It also assesses their relative impacts. The analysis is 
undoubtedly a simplification, but it provides a completely novel framework within which 
to relate the various learning results. The important point to note is that this analytical 
framework arises from a consideration of (a) the nature of the stress systems, and (b) the 
learning results from simulations. That is, this framework is empirically based, and makes 
no reference to abstract constructs of the kind that linguistic theory employs. Nevertheless, 
it provides a descriptive framework, much as the linguistic theory does. 
5 INCORPORATING QS SYSTEMS INTO THE ANALYSIS 
Consideration of the QS stress patterns led to refinement of the IPS parameter without 
changing its setting for the QI patterns. This parameter is modified so that its value indicates 
the proportion of cases in which primary stress is not assigned at the edge of a word. 
Additionally, through analysis of connection weights for QS patterns, a sixth parameter, 
Aggregative Information, is added as a further index of computational complexity. 
6. Aggregative Information (Agg) : has value 0 if no aggregative information is required 
(single-positional information suffices); 1 if one kind of aggregative information is 
required; and 2 if two kinds of aggregative information are required. 
Detailed discussion of the analysis leading to these refinements is beyond the scope of 
this paper; the interested reader is referred to [Gupta 92]. The point we wish to make 
here is that, with these modifications, the same parameter scheme can be used for both 
the QI and QS language classes, with good learnability predictions within each class, as 
shown in Table 3. Note that in this table, learning times for all languages are reported in 
terms of the weight-string representation (255 input patterns) rather than the unweighted 
syllabic representation (7 input patterns) used for the initial QI studies. Both the QI and QS 
results fall into a single analysis within this generalized parameter scheme and weight-string 
representation, but with a less perfect fit than the within-class results. 
6 DISCUSSION 
Traditional linguistic analysis has devised abstract theoretical constructs such as "metrical 
foot" to describe linguistic stress systems. Learnability arguments were then used to 
determine default parameter settings (e.g., whether feet should by default be assumed 
to be bounded or unbounded, left or right dominant, etc.) based on the feasibility of 
logically deducing correct settings from an initial state. As an example, in one analysis 
Connectionist Learning and Linguistic Stress 231 
1 0 
1 0 
1 0 
1 0 
1 1 
1 1 
1 1 
1 0 
MPS 
MSL [[LANGUAGE 
0 0 Latvian 
French 
0 1 
Latvian2stress 
Latvian3stress 
French2stress 
French3stress 
1 0 Latvian2edge 
1 Latvian2edge2stress 
0 impossible 
1 
0 
1 1 
Maranungku 
Weft 
Maranungku3stress 
Weri3stress 
Latvian2edge2stress-alt 
Garawa-SC 
Garawa2stress-SC 
Maranungkulstress 
Weft 1 stress 
Latvian2edge-alt 
Garawalstress-SC 
Latvian2edge2stres s- 1 alt 
0 0 impossible 
0 1 Garawa-non-alt 
Latvian3stress2edge-SCA 
1 0 Latvian2edge-SCA 
1 1 Latvian2edge2stress-SCA 
0 1 Garawa 
Garawa2stress 
Latvian2edge2stress-alt-SCA 
1 0 Garawalstress 
Latvian2edge-alt-SCA 
1 1 Latvian2edge2stress- lalt-SCA 
0 0 
1 
0 
1 
1 
0 
1 
Lakota 
Swahili 
Lakota2stress 
Lakota2edge 
Lakota2edge2stress 
Paiute 
Warao 
Lakota-alt 
Lakota2stress-alt 
I REF I EPOCHS 
(syllabic) 
17 
16 
21 
11 
23 
14 
30 
37 
37 
34 
43 
41 
58 
38 
5O 
61 
65 
78 
88 
85 
h17 I 164 
h18] 163 
[hl9 I 194 
I h20 I 206 
i,51 165 
h21 71 
I h22 I 91 
[h23 I 121 
ih41 126 
h25 129 
L7 254 
h26 ** 
I h27 I ** 
h29 ** 
I h30 I ** 
Table 4: Analysis of Quantity-Insensitive learning using the syllabic input representa- 
tion. IPS=Inconsistent Primary Stress; SCA=Stress Clash Avoidance; Air=Alternation; 
MPS=Multiple Primary Stresses; MSL=Mulfiple Stress Levels. References L1-L9 index 
into Table 1. 
232 Gupta and Touretzky 
[Dresher 90, p. 191], "metrical feet" are taken to be "iterative" by default, since there is 
evidence that can cause revision of this default if it tums out to be the incorrect setting, 
but there might not be such disconfirming evidence if the feet were by default taken to be 
"non-iterative". We provide an alternative to logical deduction arguments for determining 
"markedness" of parameter values, by measuring learnability (and hence markedness) 
empirically. The parameters of our novel analysis generate both a partial description of 
each stress pattern and a prediction of its learnability. Furthermore, our parameters encode 
linguistically salient concepts (e.g., stress clash avoidance) as well as concepts that have 
computational significance (single-positional vs. aggregative information.) Although our 
analyses do not explicitly invoke theoretical linguistic constructs such as metrical feet, there 
are suggestive similarities between such constructs and the weight patterns the perceptron 
develops [Gupta 91]. 
In conclusion, this work offers a fresh perspective on a well-studied linguistic domain, and 
suggests that machine learning techniques in conjunction with more traditional tools might 
provide the basis for a new approach to the investigation of language. 
Acknowledgements 
We would like to acknowledge the feedback provided by Deirdre Wheeler throughout 
the course of this work. The first author would like to thank David Evans for access 
to exceptional computing facilities at Carnegie Mellon's Laboratory for Computational 
Linguistics, and Dan Everett, Brian MacWhinney, Jay McClelland, Eric Nyberg, Brad 
Pritchett and Steve Small for helpful discussion of earlier versions of this paper. Of course, 
none of them is responsible for any errors. 
The second author was supported by a grant from Hughes Aircraft Corporation, and by the 
Office of Naval Research under contract number N00014-86-K-0678. 
References 
[Dresher 90] Dresher, B., & Kaye, J., A Computational Learning Model for Metrical 
Phonology, Cognition 34, 137-195. 
[Goldsmith 90] Goldsmith, J., Autosegmental and Metrical Phonology, Basil Blackwell, 
Oxford, England, 1990. 
[Gupta 91 ] Gupta, P. & Touretzky, D., What a perceptron reveals about metrical phonology. 
Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, 334- 
339. Lawrence Erlbaum, Hillsdale, NJ, 1991. 
[Gupta 92] Gupta, P. & Touretzky, D., Connectionist Models and Linguistic Theory: In- 
vestigations of Stress Systems in Language. Manuscript. 
[Hayes 80] Hayes, B., A Metrical Theory of Stress Rules, doctoral dissertation, Mas- 
sachusetts Institute of Technology, Cambridge, MA, 1980. Circulated by the Indiana 
University Linguistics Club, 1981. 
[Rumelhart 86] Rumelhart, D., Hinton, G., & Williams, R, Learning Internal Representa- 
tions by Error Propagation, in D. Rumelhart, J. McClelland & the PDP Research Group. 
Parallel Distributed Processing, Volume 1: Foundations, MIT Press, Cambridge, MA, 
1986. 
