Undebunking the phoneme

Un-debunking the phoneme: How the space character and the Roman alphabet led phonology astray.
Thomas M. Eccardt

Abstract:

In phonology, the phoneme used to be the essential unit that signaled and differentiated all the morphemes in a spoken human language. But then the space character essentially was renamed “juncture” and became a phoneme itself. The space occurs so frequently that it provides mutually exclusive environments where there really are none in connected speech. This led to two crises in phonemics. First, phonetically dissimilar phonemes such as /h/ and /ng/ seemed to be in complementary distribution. Second, Yuan-ren Chao's analysis of Mandarin Chinese seemed to show that more than two sounds could be in complementary distribution, and if they were phonetically dissimilar, combining them into one phoneme was an arbitrary choice. This paper will show that the space character does not behave like a letter in actual texts, turning these crises into false alarms.

The Roman alphabet perhaps was adequate for Latin, but it has no characters for aspirated phonemes. A careful analysis of English will show aspiration to be a phonemic rather than allophonic distinction. This paper reanalyzes aspirated /t/, unaspirated /t/, and /d/ into /d'/, /'d/ and /d/ respectively, where the apostrophe (') stands for a voice interruption. This transcription properly accounts for the phonemic distinction that can be found between “night rates” /nai'drei'dz/ vs. “nitrates” /naid'rei'dz/, without the use of a juncture phoneme. Furthermore, it vitiates the archiphoneme /S/ for the English plural, making it a regular /z/ phoneme “allophonically” devoiced by its proximity to the apostrophe phoneme. Finally, it accounts for such “oronyms” (homonymic strings) as “We backed Ann” and “We back Dan” with the simple concatenation of the individually phonemicized morphemes. No transformations necessary.

Text:

In the 1960's, it became possible to measure the neutrino output of the sun. To the surprise of astronomers, there were far fewer neutrinos arriving from the sun than had been predicted by the theory that the sun produces its energy through a nuclear fusion reaction as in the hydrogen bomb. Astronomers were faced with a choice: either reject the thermonuclear reaction theory or modify it somehow. Nobody knew how else the sun could be producing so much energy – no ordinary chemical reaction could explain it – and nobody knew how to fix the thermonuclear neutrino output prediction, either. The dilemma became known as the solar neutrino problem. Finally in 2002, the problem was solved: the sun was producing the right amount of neutrinos, but many of them were being transformed into two other types of neutrino that were not previously detectable. Nowadays the quantity of each of the three types of neutrinos matches up nicely with predictions from atomic physics.

/h/ vs. /ng/

In the 1930's linguists discovered problems with the phonemic principle, which states that words (actually, morphemes) are made up of strings of sounds, known as phonemes, none of which appear in mutually exclusive environments. But it was noticed that the /h/ and /ng/ sounds never appear in the same environments. /h/ appears at the beginning and middle of morphemes as in “head” and “ahead”, and /ng/ appears only at the end of morphemes, as in “sing”. Nobody liked the idea of combining /h/ and /ng/ into one phoneme, because they sound so different. Those linguists who wanted to save the phoneme proposed a requirement that in addition to appearing in at least one shared environment, all variations (allophones) of a given phoneme must be phonetically similar. The problem with that solution was that there would now be some phonemes which did not serve to differentiate morphemes, because every /h/ could theoretically be replaced by an /ng/ or vice versa, without changing the resulting message.

Linguists took a different approach than astronomers. Most simply abandoned the phonemic principle, and went on a course of pure description of language. They turned their back on the vital function of the phoneme, namely, to differentiate a limitless vocabulary of morphemes through a limited number of sounds. If you accepted the revised definition of the phoneme, then some phonemes did not serve to differentiate words, but apparently fulfilled the same function as allophones, which is to add redundancy to speech. If there was no strict functional difference between phonemes and allophones, why bother to differentiate between them? On the other hand, most linguists tacitly still recognize that somehow a huge vocabulary is signaled by a small number of sounds or “features”. Astronomers did not abandon the principle of nuclear fusion, since they believed the physicists who told them that such a huge amount of energy coming from the sun must be a nuclear reaction. But linguists decided to ignore coding theory and human behavior, which argued against a functionless phoneme. Nowadays phonology seeks to describe every pattern detectable in morphology, whether or not it has a function, whether it is a contemporary productive process or merely a historical vestige, and no matter how complicated this makes any model of production and comprehension of speech. Functions for the many new posited units are not sought.

It turns out that in reality /h/ and /ng/ do not have totally exclusive distributions. There is one word, “gingham,” not divisible into morphemes, which contains an /ng/ internally. Still, it seems like a remarkable coincidence for them to come so close to mutual exclusivity, and this leads us to believe that some language might exist allowing mutually exclusive phonemes. Then again, how does it happen that human languages, which typically contain a score of different phonemes, choose to arrange those phonemes in such restrictive environments as to make some phonemes very nearly redundant? The answer can be discovered in the same way that astronomers solved the solar neutrino problem. Instead of cutting off relations with nuclear physicists, astronomers looked at the problem more closely, and discovered that there was more than one kind of neutrino coming from the sun. Linguists do not have to abandon the phonemic principle; they need only correct it.

Even if “gingham” is the only English word that contains an /ng/ in an environment where an /h/ could be found, it is not the only message that contains such an environment. The expressions “raw hair” and “wrong air” phonetically differ only in their use of /h/ versus /ng/. If you are fooled by the written transcription and you count the space character as a sound, then the environments do not appear identical. But if you realize that spaces are not pronounced, then you will see that the environments are the same, except when a speaker intentionally pauses between the words, say, by inserting a glottal stop at the beginning of the word “air.” In normal speech, morpheme boundaries are not pronounced, and they ought not to count as part of the environment surrounding a phoneme in question.

Oronyms

You may protest that even though languages rarely modify pronunciation at morpheme boundaries, those boundaries are predictable, so they can count as phonetic environments. Well, unlike phonemes, boundaries are not pronounced, but like phonemes, boundaries are not completely predictable, either phonetically or through semantic cues. There are many so-called oronyms, such as “knows ink” and “no zinc”, which are phonetically the same, but differ by a morpheme boundary. These two expressions are completely homonymous, unless you unnaturally pause between the words. In fact, you can make a sentence oronym out of them: “Rob no zinc from foreign lands” vs. “Rob knows ink from foreign lands.”
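The claim that oronyms differ only by an unpronounced boundary can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the transcriptions are simplified ad hoc ones, the "|" boundary mark is an assumed notation, and the function names are hypothetical.

```python
# Simplified, assumed transcriptions; "|" marks the boundary written as a space.
KNOWS_INK = "nouz|iŋk"
NO_ZINC = "nou|ziŋk"


def connected_speech(transcription: str, boundary: str = "|") -> str:
    """Remove boundary marks, which are written but not pronounced."""
    return transcription.replace(boundary, "")


def are_oronyms(a: str, b: str) -> bool:
    """True if the two strings differ only in where the boundary falls."""
    return a != b and connected_speech(a) == connected_speech(b)


if __name__ == "__main__":
    print(connected_speech(KNOWS_INK), connected_speech(NO_ZINC))
    print(are_oronyms(KNOWS_INK, NO_ZINC))
```

Once the boundary is deleted, the two strings are the same sequence of phonemes, which is all that the term oronym asserts.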

The whole point is this: the morpheme boundary per se should not count as part of the environment that a phoneme finds itself in. In phonemic analysis, there are actually two distinct morpheme boundaries to be considered: initial boundaries and final boundaries. Initial boundaries represent all the phonemes that any previous morpheme may end in; final boundaries represent all the phonemes that any following morpheme may start with. Each type also includes its corresponding phrase boundary, because not every phoneme may be eligible to start an isolated phrase or to end one.

In terms of scientific investigation, this revised use of the morpheme boundary corresponds to the discovery that there are three types of neutrinos coming from the sun's nuclear fusion process. We should not abandon the phoneme any more than astronomers should have abandoned nuclear physics. When an apparent exception is found to a successful scientific theory, in most cases, the theory should not be abandoned, but the exception should be explained, possibly by a modification of the theory. Of course, this goes for successful theories only. In cases of theories that are unsuccessful at explaining anything useful, such as astrology or morphophonemics, sometimes the theory should be overturned. Unfortunately, the publicity rewards for overturning a theory are great, while the rewards for explaining an exception are usually small.

It often helps to examine how an apparent exception arose. In the solar neutrino problem, astronomers had no examples of the transformation of neutrinos, so they did not consider this a possibility. In the case of the phonemic principle, linguists did not consider oronyms. Furthermore, the morpheme boundary is a theoretic entity that occurs so often that restricting a particular phoneme to one side of it will not drastically reduce the frequency of this phoneme to near zero. Compare the letter Q in English, which has a very low frequency, partly because it's restricted to occurring before U. Since the average morpheme token is about four characters long, the morpheme boundary accounts for about one character in every five, or 20% of any text, making it more frequent than any true phoneme. So, /ng/ is not an impossibly rare sound in English, even though it is essentially restricted to the ends of morphemes. When the word boundary is treated like a phoneme, it is not surprising that it can provide mutually exclusive environments. But instead of being made redundant by silent morpheme boundaries, /ng/ almost always signals a morpheme boundary.
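The 20% figure follows from simple arithmetic: if each morpheme token of average length n is followed by one boundary character, the boundary makes up 1/(n+1) of the text. The sketch below only illustrates that arithmetic; the function names and the toy text are assumptions, not measurements.

```python
def boundary_share(avg_morpheme_length: float) -> float:
    """Expected share of boundary characters when every morpheme token
    of the given average length is followed by exactly one boundary."""
    return 1.0 / (avg_morpheme_length + 1.0)


def observed_boundary_share(text: str, boundary: str = " ") -> float:
    """Share of boundary characters actually found in a text."""
    return text.count(boundary) / len(text)


if __name__ == "__main__":
    print(boundary_share(4))  # four-character morphemes
    # Toy text: four 4-character tokens, each followed by a space.
    print(observed_boundary_share("abcd efgh ijkl mnop "))
```

Running `observed_boundary_share` on a real corpus segmented into morphemes would be the empirical counterpart of the estimate.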

So far this introductory discussion has been mostly about just one of the orientations that this study will follow, namely, that not every apparent exception to a scientific “law” overturns a useful theory. Another orientation is the bringing of theory closer to observable facts. As mentioned above, we recognized only observable phonemes as phonetic environments. We used speech in context, not isolated morphemes. Later we will account for aspiration directly as a consequence of phoneme sequences, not as a transformation. And we will solicit the opinions of other people as to what strings are truly homonymous, rather than relying entirely on our own judgments.

We also have a functional orientation. We are interested in phonemes, which serve the purpose of signalling and distinguishing morphemes. Other entities like syllables and their boundaries, morphophonemes, archiphonemes, and secondary stress will not be posited, since they have no function, although we might try to explain their apparitions. We study only phonemes, which combine with each other serially to form morphemes. Other observable phenomena such as intonation and loudness do not combine to form morphemes, so they are not studied here. The latter two may be linguistic signs, worthy of study, but they are not phonological. Non-functional historical processes like English vowel length will also be ignored.

As with most studies, we attempt to find the simplest explanation. In this way, we will be able to give encoding and decoding equal importance by avoiding many transformations. Under the traditional phonemicization of English, some strings that ought to be oronyms turn out not to be homonymous. This has been blamed on allophonic variation, but we will attempt to explain it in the normal way, directly through phonemic differences. As already implied, we avoid positing new entities whenever we can. Unlike most other studies, however, we strive to reduce the number of entities by explaining away entities posited by other theories. What have been called archiphonemes will be shown to be the direct result of certain serial arrangements of phonemes.

Another orientation regards the use of mathematics and logic. Normal science uses mathematics to measure and explain observed phenomena. Rarely does a scientist invent a new kind of mathematics in order to fit a theory, because that tends to explain little. Data are normally fitted to mathematical models in order for other people who already understand the math to see the analogy to other phenomena fitting the same mathematical model. Many linguists now recognize that Chomsky normal form plus transformations did little to explain grammar, because it can model almost any serial phenomenon. But it also had difficulty explaining grammar because it hadn't been applied to anything before, and there was nothing one could compare transformational grammar to. We will attempt to use a bit of already-existing coding or information theory to explain aspects of phonology, rather than inventing more math.

There are two other unconventional orientations that will guide us. Firstly, we consider the phoneme to be an articulatory gesture, a goal-oriented movement of the vocal apparatus, detectable by the sound produced, rather than consisting of the sound itself. If we use the word phoneme, it is only as a term of art – we really mean articulatory gesture. Secondly, we will strive to avoid the influence of the Roman alphabet on our phonemicizations. Not only will we reject the space character as a phonological unit, but we will not be wedded to the combinations of articulatory gestures that some of our alphabetic letters represent. Instead of combining a /k/ gesture and a voicing gesture into a Roman letter G, we might separate the two, as does the Japanese Kana alphabet.

The Phonemic Membership Dilemma

Now that we have finished listing our orientations, let us return to the supposed defects in the phonemic principle. Yuan-ren Chao says that there can be more than one valid phonemicization of a given language, and whether or not this is true, I don't think this undermines the phoneme. It still has the same vital function. However, at least one of Chao's conundrums can be resolved by banning the use of morpheme boundaries as phonetic environments.

In his famous paper, Chao presents the Mandarin syllable initial consonants plus the syllable medial semi-vowels that can follow them. Three rows of consonants cannot appear before a palatal medial, and one row must appear before a palatal medial. This seems like a mutually exclusive environment which would normally cause the phonemicist to combine what might have been two different consonants into one phoneme. One problem is that none of the rows of consonants are phonetically similar. A more serious dilemma, though, is deciding which of the three rows that can't be found before the palatal should be combined with the one that always does. In other words, even if we are willing to combine phonetically dissimilar mutually exclusive phonemes, sometimes it seems to be impossible to decide which ones to combine. We'll call this the phonemic membership dilemma, because we cannot decide which sound is a member of which phoneme.

Initials	Palatal Medials			Non-palatal medials
apicals	*zia	*cia	*sia	za	ca	sa
velars	*gia	*kia	*hia	ga	ka	ha
retroflexes	*zhia	*chia	*shia	zha	cha	sha
palatals	jia	qia	xia	*ja	*qa	*xa

THE PHONEMIC MEMBERSHIP DILEMMA (*denotes impossible combination)
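The dilemma can be restated as a small computation over the table above. This is a sketch under stated assumptions: each row of initials is reduced to the set of medial types it occurs before, and the labels and function names are hypothetical.

```python
from itertools import combinations

# Environments in which each row of initials occurs, read off the table.
ENVIRONMENTS = {
    "apicals": {"non-palatal medial"},
    "velars": {"non-palatal medial"},
    "retroflexes": {"non-palatal medial"},
    "palatals": {"palatal medial"},
}


def complementary_pairs(envs):
    """Pairs of rows that share no environment at all."""
    return [
        (a, b)
        for a, b in combinations(sorted(envs), 2)
        if not (envs[a] & envs[b])
    ]


def merge_candidates(row, envs):
    """Rows that are in complementary distribution with the given row."""
    return [
        other
        for pair in complementary_pairs(envs)
        if row in pair
        for other in pair
        if other != row
    ]


if __name__ == "__main__":
    print(merge_candidates("palatals", ENVIRONMENTS))
```

The palatals come out with three equally eligible partners, and distribution alone gives no way to choose among them.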

In this particular case, we have instead another example of excessive mutually exclusive environments created by the morpheme boundary. Actually it's considered a syllable boundary, but every syllable in Mandarin is a morpheme, even if it's a bit archaic like the “con-” in “con-sent,” “con-tain,” “con-vince” etc. I personally think that the palatal medial is not a sequential phoneme at all, but should be combined as a simultaneous feature of the initial consonant, as it is in Russian. I have other reasons to believe this, and incidentally this would solve the phonemic membership dilemma. However, there are some already-existing phonemicizations, like the Bopomofo alphabet system, that also resolve the dilemma, if only the morpheme boundary is not counted as a phoneme. In Bopomofo, some syllables are transcribed as an initial consonant alone, even though a vowel-like sound follows. This vowel-like sound is just a homorganic extension of the initial consonant, and it is exactly like the medial that would follow in other transcriptions. Two of the three rows of initial consonants that cannot appear before the palatal medial need no medial, and can constitute an entire morpheme. That means they appear finally, or just before a morpheme boundary. Take away this boundary, and these two rows of consonants do appear before palatals. In the diagram below, I give an example of a syllable with a palatal initial, namely, “yao,” which could appear directly after an apical or retroflex initial, if the morpheme boundary is not counted. That is, they can now appear directly before other morphemes that begin with palatals. Then we can assign the remaining row, the velar consonants, as the mutually exclusive partner of the palatal consonants, because it does not appear finally. We are obliged to combine the palatals with the velar initials as a single phoneme, because there is no other mutually exclusive environment.

bopomofo syllable	pinyin syllable (requires some vowel)	bopomofo syllable + a palatal initial syllable	pinyin syllable + a palatal initial syllable
ㄗ	z(i)	ㄗ+ㄧㄠ	z(i)+yao
ㄓ	zh(i)	ㄓ+ㄧㄠ	zh(i)+yao
ㄍㄜ	ge	ㄍㄜ+ㄧㄠ	ge+yao
ㄐㄧ	ji	ㄐㄧ+ㄧㄠ	ji+yao

BOPOMOFO MINUS MORPHEME-BOUNDARY SOLUTION TO DILEMMA

(shaded area indicates initials that are to be combined into one phoneme; following the discussion above, these are the velar and palatal initials in the last two rows)
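The effect of dropping the morpheme boundary can be sketched the same way. The environment sets below are an assumed encoding of the table: apical and retroflex initials can stand alone as a syllable and so may directly precede the palatal initial of a following syllable such as “yao”, whereas a velar initial is always separated from it by a vowel.

```python
# Environments once the morpheme boundary is no longer counted (assumed labels).
ENVS_NO_BOUNDARY = {
    "apicals": {"non-palatal", "palatal"},      # z(i)+yao
    "retroflexes": {"non-palatal", "palatal"},  # zh(i)+yao
    "velars": {"non-palatal"},                  # ge+yao: a vowel intervenes
    "palatals": {"palatal"},                    # ji+yao
}


def partners(row, envs):
    """Rows whose environments never overlap with those of the given row."""
    return sorted(
        other
        for other in envs
        if other != row and not (envs[other] & envs[row])
    )


if __name__ == "__main__":
    print(partners("palatals", ENVS_NO_BOUNDARY))
```

Under these assumptions only the velars remain in complementary distribution with the palatals, so the choice of partner is no longer arbitrary.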

I personally do not advocate this particular phonemicization of Mandarin, because for one thing, palatals and velars are not phonetically similar enough to be counted as the same phoneme. More on this later. But the phonemic membership dilemma is just another example of the spurious problems in the phonemic principle created by counting the morpheme boundary as a phoneme. Once again, we can blame it on the unusually high frequency of the morpheme boundary, which artificially creates many mutually exclusive environments that do not exist in connected speech.

Nitrates, night rates, hydrates, etc.

From personal experience, I find that connected speech is truly connected, and the languages I know do not pronounce any boundaries. In Dutch, for example, all stop or spirant consonant clusters are pronounced either completely voiced or completely unvoiced, despite the spelling, whether or not a word or morpheme boundary intervenes. But what about the various kinds of so-called “juncture” that have been posited for English? One of the most familiar examples is found in “nitrates” versus “night rates.” Most phonemic transcriptions would write these as oronyms, were it not for the “juncture phoneme” between “night” and “rates.” At least in my idiolect, no matter how fast I pronounce “night rates,” it always sounds different from “nitrates.” Now, if a so-called “open juncture” is audible, then perhaps it should be counted as a phoneme, and this is exactly what Charles Hockett does. Then we are back at square one, and /h/ and /ng/ appear in very nearly mutually exclusive environments, and it's impossible to tell which other mutually exclusive phoneme the Mandarin palatal consonant should be merged with.

But long ago I noticed that the difference in pronunciation is really due to the aspiration in the second syllable of “nitrates.” According to current theory, this is caused by the secondary stress in the “traits” syllable of “nitrates.” At least here, open juncture is not making the difference between a potential pair of oronyms. Current phonological theory says that the phoneme /t/ sometimes is aspirated, sometimes is not. Aspiration happens word-initially or in syllables with primary or secondary stress. There is a long tradition of calling aspirated /t/ and non-aspirated /t/ “allophones” of the same phoneme. In fact, aspiration is a favorite illustrative example of the allophone, of how a phoneme automatically changes its pronunciation in different contexts. But I know of no true automatic alternation between aspirated and non-aspirated phonemes in English.

The /k/ sound in “construction” is aspirated, supposedly because it is word-initial. But is this /k/ de-aspirated in the word “reconstruction?” I don't think so. Here, the “con” syllable is no longer initial. Nor is it secondarily stressed. Yet it is still aspirated. According to my dictionary, it is the “re-” which is secondarily stressed. And this is the key: “re-” is recognized as a productive prefix to the stem “construction,” so for the users of English, “construction” needs to be recognizable as having the same sound and meaning whether it is prefixed with “re-” or not. Same with “conclusive” and “inconclusive.” So the aspiration must not change. Compare this with the “re-” in “reconcile” where “concile” is not a word. The “con” is not aspirated either in “reconcile” or “reconciliation,” where it might be considered secondarily stressed. Of course there are examples like “Plato” and “Platonic” where some people may not aspirate the /t/ in the name, but do aspirate the /t/ in the adjective. But this is no automatic change. The /n/ in the word “Platonic” is in no way derivable from English “Plato” as it might have been in Latin. So this is not an example of an automatic allophonic variation of the phoneme /t/, but an example of two words that came into the English language at different times, and perhaps even from different sources.

For a better phonemic solution, we need to look closer at the /t/ in both “night rates” and “nitrates.” At the end of the word “night,” the vocal cords stop vibrating just before the tongue reaches the roof of the mouth for the /t/, at least in American English. It may be easier to see this if you pronounce the word in isolation, and avoid releasing (exploding) the /t/. In contrast, in a word like “hide,” with a final /d/, the vocal cords stop only after the closure, building up a little air pressure. In a context like “hide it,” there is no voice interruption at all, so we can attribute the final halting of the voice in the isolated word “hide” to the end of the phrase, not to the word itself. In sum, the essential difference between a final /t/ and a final /d/ is the pause in the voice that happens before the /t/, but not before the /d/. And so they are both called apical stops.

Now let us look closer at the word “nitrates”. There is an interruption of the voice not before, but after the /t/ this time. This gives the impression of “aspiration,” which is similar to – but not the same as – the sound of English /h/. It seems there are three ways to pronounce an apical stop, not just two. One can have no voice interruption at all, another can place a voice interruption before the stop, and a third places the voice interruption after the stop. Abandoning the letter /t/ of the Roman alphabet, and using the apostrophe to indicate voice interruption, we can transcribe (the first part of) “hydrates” as /haidrei.../, “night rates” as /nai'drei.../ and “nitrates” as /naid'rei.../. Then we need no open juncture to prevent “night rates” and “nitrates” from being oronyms. Transcribing “nitrates” with an apostrophe between the D and the R (/naid'rei.../) also gives a graphic explanation of why the /r/ in this word is slightly unvoiced, or “whispered,” yet is fully voiced in “night rates.” Notice that “hydrates” is an oronym with “hide rates,” and the reason is that there is no voice interruption anywhere to distinguish them. There is no audible “juncture” phoneme. Be careful not to pause between the morphemes and create an unnatural distinction that would not occur in connected speech!

Is this a fudge concocted to discredit English juncture? Well, first of all, the new transcription follows the actual phonetics more closely than the old one. Secondly, it eliminates a supposed automatic allophonic variation, and thirdly it eliminates the need for secondary stress as an imperfect explanation of aspiration. What we are saying essentially is that the difference between what the Roman alphabet transcribes as two different letters, namely T and D, is due to an unnoticed phoneme, namely, voice interruption, transcribed as an apostrophe, which optionally occurs between the consonant and the vowel. At the beginning of words, the apostrophe optionally occurs after the initial consonant: “tray” /d'rei/. At the end of words, the apostrophe optionally occurs before the final consonant: “rate” /rei'd/. The apostrophe never occurs initially or finally, because it's too hard to hear when the word is pronounced after or before a pause.

If there are three possible realizations of a stop consonant next to a morpheme boundary, why do there seem to be only two realizations in the middle of a morpheme? The space character substitutes for the supposed juncture, pronounced at times as “aspiration”, but how is it that the Roman alphabet suits English so well in the middle of morphemes? One reason is that most polysyllabic morphemes are borrowings from Latin, where there presumably was no aspiration, and in most cases, English speakers conventionally aspirated the longer syllables which resembled single-syllable Anglo-Saxon morphemes. These are the syllables with so-called secondary accent. Of course there could be no contrasting aspirated and non-aspirated pairs coming from Latin. Polysyllabic Anglo-Saxon morphemes tended to have at least one schwa syllable, whose initial consonant was never aspirated: “bacon, buckle, tinker, sample, Scranton,” etc. Compound Anglo-Saxon nouns simply retain the aspiration status of both morphemes: “tank-top, mail-drop, drop-box”, etc. So because there were no polysyllabic morphemes with medial aspiration in Anglo-Saxon, the Roman alphabet was adequate even before the wave of Latin words arrived. In fact, since aspiration was probably restricted to the beginnings of morphemes, it might have come close to being a juncture phoneme, if only for morphemes that begin with stops.

Phonemic aspiration

But there are exceptions to the rule in modern English that says “do not aspirate a completely unaccented syllable.” A single unstressed O is not considered long enough to be counted even as a secondarily stressed syllable, yet there are a few unanalyzable morphemes that have a mysterious aspiration there. The medial consonants in “Hypo”, “gecko,” and “bunco” are not aspirated, but Madison Avenue has taught us to aspirate “Maypo” and “Remco”. But if it weren't for recorded sound, the intended pronunciation of these brand names would never have caught on, since the Roman alphabet cannot transcribe aspiration. Secondary stress has always been controversial in English language linguistics because it's so hard to hear, and it doesn't seem a good way to predict aspiration. We choose to transcribe the aspiration directly. I believe that people more often detect a secondary stress through aspiration rather than produce an aspiration as a result of an inherently marked secondary stress.

Medial consonant (represented by d)	/d/	/b/	/g/
/d/ (voiced)	Hydrate /haidrei'd/	Mabel /meibəl/	Rego /rigou/
/d'/ (aspiration)	Nitrate /naid'rei'd/	Maypo /meib'ou/	Remco /remg'ou/
/'d/ (voiceless)	Night rate /nai'drei'd/	Hypo /hai'bou/	Gecko /ge'gou/

But do we have the right to recognize devoicing as a phoneme, just because it can appear on either side of some consonants? Of course we do, even if it appears nowhere else. All phonemes have some restrictions on their distributions; these are called phonotactics. The essence of the phoneme is to differentiate an unlimited number of morphemes simply by its serial order. If an articulatory gesture can be recognized as appearing before or after another one, it is a phoneme, no matter how much overlap there is between them in the speech chain. As long as a sound gesture can disambiguate any morphemes, it is a phoneme; otherwise it's an “allophone.” Incidentally, a devoicing phoneme is not much different from the well-recognized /h/ phoneme: neither has an upper vocal tract place of articulation, and both relate to the vocal cords only.

A new transcription

Speaking of overlap, many linguists have wondered at a different kind of overlap between voiced and unvoiced consonants in English, due to “positional variation.” For example, a supposedly voiced consonant like /g/ becomes completely devoiced after a non-voiced consonant. The /g/ in “that guy” sounds just like the /k/ in “baking,” since both are unvoiced and unaspirated. Compare “bad guy,” where the /g/ is fully voiced. In our new transcription, they both are simply a /g/ preceded by an apostrophe, a pause in voicing, except that in “that guy” the apostrophe occurs before the entire consonant cluster /dg/: “that guy” /đæ'dgai/ and “baking” /bei'giŋ/. The apostrophe devoices the entire consonant cluster here and elsewhere.

So the unvoiced stop consonants P, T, CH, K are to be transcribed as B, D, J, G with a leading or following apostrophe, representing a pause in the voice, and possibly “aspiration”. This reduces the phoneme inventory of English by a count of four unvoiced Roman letters, but increases it by introducing one voicing pause apostrophe. In order to explain more oronyms, we need to reduce the inventory further, by eliminating unvoiced spirants as well, and here we can re-use the already-posited apostrophe. Hockett gives the example of a potential oronym that he disambiguates by a juncture: “it swings” vs. “its wings.” We will examine the slightly simpler example of “race wings” vs. “Ray swings”. We can transcribe “race wings” as /rei'zwIŋz/ and “Ray swings” as /reiz'wIŋz/. Although it is less audible than the ones in stop consonants, I believe that you can detect an aspiration in “swings”, if necessary, by putting your hand over your mouth and feeling the extra air flow. The aspiration in “swings” is not very audible, probably because much of its air flow is exhausted by the excessive air flow in the /s/. But another thing that contributes to distinguishing these potential oronyms occurs when the voicing pause is moved backwards from the word “swings” into the word “race”. The vowel in “race” then becomes a so-called “clipped vowel,” shorter than the vowel in “Ray,” due to its following apostrophe. So now F, TH, S, and SH are to be transcribed as V, DH, Z, and ZH, respectively, with a leading or following apostrophe, representing a pause in the voice, and possibly “aspiration”.
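The re-transcription rule just stated can be sketched as a small procedure. This is only an illustration under assumptions: the input is a list of traditional symbols in which aspirated consonants carry a trailing “ʰ”, digraphs such as "ch" and "sh" are single tokens, and the function name is hypothetical.

```python
# Traditional unvoiced symbol -> voiced symbol used in the new transcription.
VOICED = {"p": "b", "t": "d", "ch": "j", "k": "g",
          "f": "v", "th": "dh", "s": "z", "sh": "zh"}


def retranscribe(tokens):
    """Rewrite unvoiced consonants as voiced ones plus an apostrophe.

    Aspirated tokens (ending in "ʰ") get a following apostrophe; a run of
    unaspirated unvoiced consonants gets one leading apostrophe.
    """
    out = []
    in_cluster = False  # True while inside an already devoiced cluster
    for tok in tokens:
        aspirated = tok.endswith("ʰ")
        base = tok.rstrip("ʰ")
        if base in VOICED:
            if aspirated:
                out.append(VOICED[base] + "'")
            else:
                if not in_cluster:
                    out.append("'")
                out.append(VOICED[base])
            in_cluster = True
        else:
            out.append(tok)
            in_cluster = False
    return "".join(out)


if __name__ == "__main__":
    print(retranscribe(["n", "ai", "tʰ", "r", "ei", "t", "s"]))  # nitrates
    print(retranscribe(["n", "ai", "t", "r", "ei", "t", "s"]))   # night rates
    print(retranscribe(["r", "ei", "sʰ", "w", "I", "ŋ", "z"]))   # Ray swings
```

The sketch takes aspiration as given in the input; it does not try to predict it from stress or position.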

Simplification of archiphonemes

We demonstrated above how we can simply append our new transcription of the word “guy” to the transcription of the word “that,” and account for the devoicing of the g in the phrase “that guy.” The apostrophe devoices both the final D in “that” and the initial G in “guy”: “that guy” /đæ'dgai/. By splitting the stops and spirants off from the devoicing phoneme, we also solve the conundrum of the English archiphoneme. The plural (and a homonymous verb ending) in English is always spelled with a Roman letter -S, even though it is noncommittal with reference to voicing. Same for the past tense ending, spelled -ED: it, too, follows the voicing or non-voicing of the phoneme just before it. Traditionally, these are called archiphonemes, and they seem to be an unusual example of a morpheme that consists of less than one phoneme. In our transcription, there are no archiphonemes. The plural and past tense markers are simply the sounds /z/ and /d/, respectively, and they become devoiced just as they always do when preceded by the apostrophe, the devoicing phoneme: “rat” /ræ'd/ + -S /z/ = “rats” /ræ'dz/. But in this case, the apostrophe comes from the word to which they are attached. No neutralization, no transformation is necessary.

Word (English spelling)	New transcription	Word with -S suffix (English spelling)	New transcription + /z/ (any apostrophe devoices the entire consonant cluster)	Old transcription + /s/ or /z/ as “realization of archiphoneme”	Word with -ED suffix (English spelling)	New transcription + /d/ (any apostrophe devoices the entire consonant cluster)	Old transcription + /t/ or /d/ as “realization of archiphoneme”
rack	ræ'g	racks	ræ'gz	ræks	racked	ræ'gd	rækt
bag	bæg	bags	bægz	bægz	bagged	bægd	bægd
ref	re'v	refs	re'vz	refs	reffed	re'vd	reft
rev	rev	revs	revz	revz	revved	revd	revd

How the voice interruption phoneme simplifies archiphonemes. Note the great variety of symbols at the end of the morphemes under the old transcription.
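The rule in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `concatenate` and `surface`, the small consonant set, and the mapping back to Roman letters are all hypothetical and cover just the four stems in the table. The only rule applied is the one named in the table header, that any apostrophe devoices the entire consonant cluster containing it.

```python
# Hypothetical sketch: morphemes are plain strings in the new transcription,
# where "'" stands for the voice-interruption phoneme.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}
CLUSTER_SYMBOLS = set(VOICED_TO_VOICELESS) | {"'"}

def concatenate(*morphemes):
    """Join morpheme transcriptions with no boundary symbol and no rewriting."""
    return "".join(morphemes)

def surface(phonemic):
    """Give a rough old-style (Roman-letter) rendering of a new transcription.

    Every obstruent in a cluster that contains an apostrophe is written as
    its voiceless counterpart; clusters without an apostrophe are unchanged.
    """
    out, i = [], 0
    while i < len(phonemic):
        if phonemic[i] not in CLUSTER_SYMBOLS:
            out.append(phonemic[i])
            i += 1
            continue
        j = i
        while j < len(phonemic) and phonemic[j] in CLUSTER_SYMBOLS:
            j += 1
        cluster = phonemic[i:j]
        if "'" in cluster:
            cluster = "".join(VOICED_TO_VOICELESS[c] for c in cluster if c != "'")
        out.append(cluster)
        i = j
    return "".join(out)

PLURAL, PAST = "z", "d"  # the suffixes are always /z/ and /d/ in the new transcription

for stem in ("ræ'g", "bæg", "re'v", "rev"):
    print(stem, surface(concatenate(stem, PLURAL)), surface(concatenate(stem, PAST)))
```

The suffixes never change in the phonemic layer; the four old-style endings fall out of the single devoicing rule applied to the concatenated string.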

Some examples...

There is a small difference between the behaviors of the voiced stops compared with the spirants. We transcribe the voiced stops as usual, namely, B, D, J and G. When preceded by a word with a devoiced sound, these initials can lose their voicing, yet the voicing is clear when they are preceded by voiced sounds. For example, compare “that guy” /đæ'dgai/ with “my guy” /maigai/. In our transcription, the apostrophe devoices both the /d/ and the /g/ in “that guy”. Now, when a voiced stop begins a whole phrase, voicing doesn't usually start until the immediately following vowel, so it too sounds like a voiceless stop. When it is said in isolation, the G in “guy!” sounds much like the K in “raking.” We could almost be tempted to transcribe an apostrophe at the beginning of every phrase that begins with a stop, as in /'gai/ (compare: /rei'giŋ/). English apparently prefers not to build up pressure at the start of a phrase. Be that as it may, voiced phrase-initial spirants do not behave that way – their voicing starts immediately, perhaps because no pressure would build up. That is, the Z in the word “zoo” never sounds like an S when “zoo” starts a sentence. Another reason may be that a spirant has a longer closure than a stop, and even if the voice does not start at the beginning of it, voicing will be heard before the end of the closure and it will be perceived as voiced.

We showed above how “night rate” and “nitrate” are saved from being oronyms by what was called an allophonic variation, but is in reality a phonemic difference. We now give a converse example: what the old phonemic transcription ought to disambiguate becomes ambiguous due to so-called allophonic variation. Our new transcription accurately writes them as phonemically homonymous. Now, if a pause in voicing, transcribed as an apostrophe, can affect a following morpheme ending, such as plural and past tense, then it ought to affect any old following morpheme, and it does. The expressions “we backed Ann” and “we back Dan” are oronyms (homonymous), when spoken at a normal speed without pauses. Under current theory, two separate processes conspire to create this homonymy. In “backed Ann,” the archiphoneme -ED ending is realized as unvoiced, due to the non-voiced /k/ sound in “back.” In “back Dan” the following rule from Wikipedia is applicable: “In English, a voiced obstruent is partially devoiced next to a pause or next to a voiceless sound, inside a word or across its boundary.” In my idiolect, and I suspect in most people's, the D is completely devoiced, unless I make a special effort to distinguish these oronyms. So I transcribe both expressions as /bæ'gdæn/, because they are the concatenations of /bæ'gd+æn/ or /bæ'g+dæn/. If vocally I want to disambiguate it to “backed Ann,” I insert a glottal stop before the word “Ann”. If I want to disambiguate it to “back Dan,” I make an extra effort to voice the D, unnaturally building up air pressure. This latter strategy is probably based on the analogy of the sound of D in “Dan” between voiced sounds. Although the -ED ending is also voiced between voiced sounds, it is never voiced here, probably because in this kind of oronym it would directly oppose the strategy of interposing a glottal stop when carefully pronouncing “Ann.” In any case, the phonemic transcription must reflect ordinary speech.

There is a more surprising example of this kind of homonymy, supposedly caused by allophonic variation, but actually caused by the concatenation of two morphemes plus the effects of a phonemic voicing pause in one of them. On October 28, 2014, I pronounced the expression “I know that's ink” to a classroom of high school students, being careful not to pronounce a glottal stop before the word “ink.” Half of the students heard “that's ink,” but the other half heard “that zinc.” The latter group assumed I said “zinc” but did not voice the /z/ because of the voice pause that carried over from the word “that,” in connected speech. Both expressions would be transcribed /đæ'dziŋ'g/, under our new phonemicization. The silent morpheme-boundary belongs in different places, but is not transcribed. Here again, if I wanted to artificially disambiguate, I would either insert a glottal stop or fully voice the Z in “zinc.”
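The concatenation argument can be sketched as a small search: given a toy lexicon in the new transcription, list every way a boundary-free string can be split into morphemes. The transcriptions are the ones used above; the function name `segmentations`, the dictionary layout, and the glosses are hypothetical, and the lexicon is deliberately tiny.

```python
# Toy lexicon: new-transcription form -> gloss. Forms are taken from the
# examples above; everything else here is an illustrative assumption.
LEXICON = {
    "bæ'g": "back",
    "d": "-ed",
    "æn": "Ann",
    "dæn": "Dan",
    "đæ'd": "that",
    "z": "-s",
    "iŋ'g": "ink",
    "ziŋ'g": "zinc",
}

def segmentations(string, lexicon):
    """Return every split of `string` into lexicon forms, as lists of glosses."""
    if not string:
        return [[]]
    results = []
    for form, gloss in lexicon.items():
        if string.startswith(form):
            for rest in segmentations(string[len(form):], lexicon):
                results.append([gloss] + rest)
    return results

for utterance in ("bæ'gdæn", "đæ'dziŋ'g"):
    parses = segmentations(utterance, LEXICON)
    status = "oronym" if len(parses) > 1 else "unambiguous"
    print(utterance, status, parses)
```

A string is an oronym in this sketch exactly when it has more than one parse; no boundary symbol is stored anywhere in the strings themselves.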

Nasalization

I'm afraid we also need to posit a nasalization phoneme in order to explain why certain phrases with nasal consonants are not oronyms. Unfortunately, I do not have any examples from Madison Avenue to provide contrastive environments within morphemes as I did with aspiration. Hockett gives the example “see Mabel” versus “seem able.” I differentiate these vocally without using the glottal stop. How? By nasalizing the /i/ vowel sound in “seem,” and not nasalizing the /i/ in “see.” This is my normal practice, even when I pronounce these words in isolation. When I say “peanuts,” I nasalize the “pea”; when I say “pea nuts” I don't. If you don’t think you nasalize a vowel before a nasal consonant, try pronouncing “limp” without exploding the P, and compare its vowel to the one in “lip.” Because the nasality happens before the voice stops, we transcribe “limp” as /li~'b/ rather than /li'~b/, where the tilde represents nasality. In the other case, when a nasal consonant begins a morpheme, we write the tilde just before the vowel, as in “nuts” /d~Λ'dz/. The nasalization phoneme works much like the voicing pause phoneme. So for the cocktail snack “peanuts” we write /b'i~dΛ'dz/ and for the imaginary nuts with some connection to peas “pea nuts,” we write /b'id~Λ'dz/. For “see Mabel” we write /z'ib~eibl/, and “seem able” is /z'i~beibl/.

It is interesting to see how nasality and voice interruption interact in a name like “Tomkins.” Originally, this name meant exactly what it sounded like, the “kin” or relatives of Tom. I would transcribe this as “tom” + “kins”, that is /d'a~b/+/g'i~dz/ = /d'a~bg'i~dz/ with an aspirated /k/, written as /g'/. The old transcription would be /tamkinz/. What happened historically is apparently that the apostrophe did not disappear, but moved backwards while the /i/ of “kins” became a schwa, since schwa syllables are never aspirated. The result is /d'a~b'gə~dz/ or, really, /d'a~'bgə~dz/, since we don't get the apostrophe between stops anywhere else in English. According to the old transcription, a P appears out of nowhere: /tampknz/. This historical process led to a new spelling as well: “Tompkins”. Most interesting of all, however, is the fact that it seems very difficult to pronounce the version of the word with a schwa or “syllabic N” in the “kins” part, without pronouncing a P! Not that there really exists a /p/ phoneme – it's really just a voice-interrupted manifestation of the /b/ phoneme – but a /p/ is what we hear because we're so accustomed to the Roman alphabet. Perhaps we should write /d'a~'bg~z/, without a schwa or even a second D. That would explain why the aspiration goes away in schwa syllables. I'm still working on how the apostrophe and the tilde interact in the new transcription, especially on so-called “syllabic nasals.” Perhaps the /d/ is not always necessary when transcribing what was formerly N, and the nasal could have a default apical pronunciation when it doesn't appear before a stop, much as it seems to function in the so-called syllable final N in Spanish.

Another advantage of transcribing a separate nasal phoneme rather than /m/, /n/, and /ŋ/ (ng), is that it doesn't commit itself on a palatal nasal. In the new phonemicization, there are no nasal consonants, per se, in any case – there are only combinations of consonants plus the nasal phoneme. In the case of a word like “onion” one either transcribes /Λ~yә~d/ for people who have a palatal nasal or /Λ~dyә~d/ for those who don't. Anyway, by creating a nasal phoneme, we have now lowered the phoneme inventory by 2 or 3.

Consonantal vs Vocalic R and L

Now, at this point, all we have to do is add two phonemes, namely “vocalic L” and “vocalic R”, and we have a phonemicization that correctly accounts for all the oronyms of English, without resorting to “juncture.” That is, it transcribes identically all true oronyms, and transcribes distinctly all expressions that are not oronyms, correcting much of the old transcription. Now, consonantal L and R are to vocalic L and R as Y and W are to I and U. Consonantal L contrasts with vocalic L in the words “Clyde” versus “collide”. Consonantal R contrasts with vocalic R in the words “train” versus “terrain.” I believe that vocalic R is less rounded than consonantal R. Also, vocalic L has the velarized sound of the so-called “dull” L. Both of the so-called vocalic sounds are found directly after vowels and color the vowel sounds. Of course, vocalic L and R are not considered separate phonemes in most phonemicizations of English, but are considered allophones that are predictable through syllable boundaries. Adding the vocalic versions of L and R liberates us from dealing with the syllable. Now the supposed syllable boundaries are predicted by the phonemes, rather than the syllable boundaries predicting the allophonic realizations. And the vocalic versions disambiguate “soar over” from “saw rover” and they disambiguate “I'll earn” from “I learn.” Please let me know if you think of an example where vocalic R or L alternates automatically with consonantal R or L.

Summarizing so far, we have introduced a total of four new phonemes, and decommissioned a total of at least eleven phonemes. Besides a net reduction of at least seven phonemes, what has this accomplished? Firstly, it has brought the phonemicization closer to the actual phonetics by explicitly transcribing aspiration, clipped vowels, and vowel nasalization, through the proximity of the apostrophe and the tilde to vowels in the transcription. Another way in which we get closer to the actual phonetics is by eliminating all or parts of previously-posited “levels,” like archiphonemes and certain allophones. It seems to me that aspiration in general is a bad candidate as an allophonic variation, since it is a rather discrete entity: either a consonant has it, or it doesn't. True allophonic variation happens when a phoneme's realization is slightly altered because of neighboring phonemes, usually due to overlap. The second accomplishment is the explicit representation of oronyms: when two morphemes are concatenated and their unmodified concatenated transcription is the same as another string, we get a homonymous pronunciation called an oronym. Otherwise we don't. Thirdly, we have shown that by eliminating the unnecessary juncture phoneme, we can not only explain oronyms more directly, but also eliminate the confusion over the distribution of /h/ and /ng/, explaining the apparent exception to the rule that phonemes must not have mutually exclusive distributions. It also solves the phoneme membership dilemma. And because of the high frequency of the morpheme boundary, we can be optimistic that by disqualifying the boundary from being a phoneme, these dilemmas will also be eliminated in the analyses of other languages.

With the return of the phoneme, we can also recover the implicit theory behind the phonemic principle, namely, that articulatory gestures alone definitively signal and differentiate the signs of spoken language – other cues are sporadic and irregular and may be speaker dependent. Let us return to the oronym “rob no zinc” versus “Rob knows ink.” Inserting the word “only,” might make the name “Rob” more probable than the verb, “to rob”: “Rob only knows ink”. However the other interpretation is still possible, at least in the minds of some people: “rob only no zinc.” But inserting the phoneme /p/ not only changes the morpheme from “ink” to “pink,” but “Rob knows pink” is no longer ambiguous. All speakers understand the mechanism that differentiates “pink” from “ink,” even though the mechanism is arbitrary, that is, there is no semantic connection between the two. But nobody knows for sure how or why “only” makes “rob” more likely a name, if indeed it does. Traditional homonyms work the same way.

Whereas it's sometimes difficult to determine whether a morpheme signals two homonyms, two distinct meanings or one meaning with various senses, it's always crystal clear whether two utterances are composed of the same phonemes. This is because phonology is somehow digital, even if there is much overlap in the continuous flow of speech. Although there may also be analog channels of information flowing from speaker to listener, such as intonation and gesticulation, we know that there is at least one digital channel, because for many centuries and in many languages it has been recorded digitally in writing. Writing in turn has been converted into Morse code and secret codes. And from the transmission of codes of all kinds arose information and coding theory.

Two of the units of coding theory are the alphabet symbol and the codeword. Sound familiar? The correspondence to the phoneme and the morpheme is no coincidence, since they correspond in turn to the letter and the printed word. And it turns out that the codeword boundary is an important consideration in coding theory. From a coding theory perspective, spoken language is a variable-length codeword system. As is well known, it uses an efficient strategy of assigning the shorter code symbol strings to the more frequent codewords, economizing on the use of symbols. So frequent morphemes like “of” or plural are short, while the morpheme “redundant” is long.

As opposed to fixed-length block codes, however, variable-length codes pose the decoding problem of finding the beginnings and ends of codewords. It is precisely the oronyms problem, because without knowing where a codeword boundary is, you could interpret a long string in more than one way. This problem can be solved through various coding strategies. The simplest one is the comma code, the comma being a special symbol dedicated exclusively to indicating the boundary between codewords in context. This corresponds to the assumption that the morpheme boundary is a phoneme, and that it has a physical manifestation, such as open juncture. This solution allows any and all combinations of all the other symbols, as in fixed-length codes. But this requires the gaps between text occurrences of the explicit codeword boundary to reflect the frequency distribution of the codewords, and I believe that this solution is slightly restricted in its efficiency when the frequency distribution of the codewords is not exponential. It also may require the source to produce the comma very frequently, perhaps more frequently than physically possible.
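A comma code is simple enough to sketch directly. In the following minimal Python illustration, the names `encode`, `decode`, and `comma_overhead` and the choice of "," as the dedicated symbol are assumptions made for the example; the point is only that decoding is trivial, at the price of one extra symbol per codeword.

```python
COMMA = ","  # a symbol reserved exclusively for marking codeword boundaries

def encode(codewords):
    """Concatenate codewords, closing each one with the dedicated comma."""
    for word in codewords:
        if COMMA in word:
            raise ValueError("the comma may not appear inside a codeword")
    return "".join(word + COMMA for word in codewords)

def decode(stream):
    """Recover the codewords by splitting on the comma."""
    return stream.split(COMMA)[:-1]

def comma_overhead(codewords):
    """Fraction of transmitted symbols that are commas."""
    stream = encode(codewords)
    return len(codewords) / len(stream) if stream else 0.0

message = ["ab", "a", "b", "ab"]
stream = encode(message)
print(stream, decode(stream), comma_overhead(message))
```

Any string of the remaining symbols is a legal codeword here, which is the freedom described above; the shorter the codewords, the larger the share of the stream spent on commas.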

I have done a statistical text analysis on the first two books of the Old Testament in English and Dutch. I didn't bother to convert them into phonemes, but I analyzed the letters and spaces only, eliminating punctuation characters. In both languages, the space is the most frequent character. In English, it is twice as frequent as the next most frequent character, E, making it look like an outlier in the frequency distribution of letters.
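A count of this kind can be sketched as follows. The function name `letter_space_counts` and the normalization choices (lowercasing, dropping every non-letter, collapsing runs of whitespace into a single space) are assumptions for illustration, not a record of the analysis actually run.

```python
from collections import Counter

def letter_space_counts(text):
    """Count letters and single spaces only; punctuation and digits are dropped."""
    kept = "".join(ch for ch in text.lower() if ch.isalpha() or ch.isspace())
    normalized = " ".join(kept.split())  # collapse whitespace runs, trim the ends
    return Counter(normalized)

sample = "In the beginning, God created the heaven and the earth."
counts = letter_space_counts(sample)
total = sum(counts.values())
for symbol, n in counts.most_common(5):
    label = "space" if symbol == " " else symbol
    print(label, n, round(n / total, 3))
```

On a full text, comparing the count for the space with the count for the next most frequent character gives the ratio discussed above.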

The other solution to the codeword oronyms problem is to make the codeword boundary predictable from the arrangement of characters in the words themselves. A certain theorem states that there always exists a code that will most efficiently encode a set of codewords of various frequencies without the use of a comma. It requires, however, that not every combination of letters be available as a codeword. This restriction is the cost that compensates for the savings gained by eliminating the comma of a comma code. The simplest way to do this is the prefix-free code, which does not allow any codeword to start with a string of letters that is already a complete codeword. So the word “testament” would be disallowed, because it begins with “test,” already a word in English.
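The prefix-free condition is easy to check mechanically. In this minimal Python sketch the names `prefix_violations` and `decode_prefix_free` and the small binary code are hypothetical; the sketch shows that a prefix-free set can be decoded left to right with no boundary symbol.

```python
def prefix_violations(codewords):
    """List (shorter, longer) pairs where one codeword begins another."""
    words = sorted(set(codewords))
    return [(a, b) for a in words for b in words if a != b and b.startswith(a)]

def decode_prefix_free(stream, codewords):
    """Decode a comma-less stream by emitting a codeword as soon as one is complete."""
    codewords = set(codewords)
    if prefix_violations(codewords):
        raise ValueError("code is not prefix-free")
    decoded, buffer = [], ""
    for symbol in stream:
        buffer += symbol
        if buffer in codewords:
            decoded.append(buffer)
            buffer = ""
    if buffer:
        raise ValueError("stream ends in the middle of a codeword")
    return decoded

print(prefix_violations(["test", "testament", "word"]))
print(decode_prefix_free("0101100111", ["0", "10", "110", "111"]))
```

The first call reports the “test”/“testament” clash; the second decodes a stream of an illustrative binary code without ever needing to look ahead.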

English obviously does not follow quite the prefix-free strategy, and it does have oronyms, even if they are rare. But English employs a number of restrictions on phoneme distribution within a morpheme that help to signal where the morpheme boundaries are and are not. For example, aspiration often occurs initially. Also, the short vowels in “pit, pat, pet, pot, put” never occur finally, but are always followed by some consonant within the same morpheme. Perhaps the reason that we have so few words beginning with Z is that this restriction tells us that a morpheme boundary likely occurs after the Z, for example, making the “no zinc” interpretation less likely. Some languages show vowel harmony, where all the vowels of a given morpheme must belong to one of two classes. Other languages restrict stressed syllables to the beginning or end of words, providing clues to morpheme boundaries. In fact, any so-called phonotactic rules that apply only within morphemes tell the hearer that morpheme boundaries occur wherever the rules do not seem to apply.

In a sense, every language gets a free ride on morpheme boundary signaling through natural phonotactics. Many languages have restrictions on which phonemes can occur at the beginning or end of phrases. These are the natural result of the physical restrictions on articulators and also because of the difficulties in perceiving certain sounds when speech starts or stops. Since a morpheme cannot be interrupted by pause, at least not intentionally, those phrase-initial and phrase-final restrictions automatically apply to many morpheme boundaries as well.

Whatever the physical or non-physical causes of the above-mentioned phoneme restrictions are, they help to make an explicit morpheme boundary unnecessary, because it is somewhat predictable, if not totally redundant. In information theory, that would reduce the information contribution of the morpheme boundary, compared to the information it would provide if it were distributed at random. That is to say, the information that the occurrence of a symbol supplies decreases not only with its frequency, but also with its predictability. Measuring letter frequency is relatively easy, but measuring predictability in context is much harder. I chose an offbeat methodology which I feel is unbiased, even if it only provides a relative, not an absolute measure of information.
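The standard way to quantify this in information theory is self-information: a symbol that occurs with probability p supplies −log2(p) bits, and if context raises that probability, the symbol supplies fewer bits. A minimal Python sketch follows; the function name `surprisal_bits` and the example probabilities are made up for illustration.

```python
import math

def surprisal_bits(probability):
    """Self-information in bits of an event with the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# Illustrative numbers only: a symbol that makes up 1/8 of a text ...
print(surprisal_bits(1 / 8))
# ... supplies less information where context makes it a 50/50 guess,
print(surprisal_bits(1 / 2))
# and none at all where context makes it certain.
print(surprisal_bits(1.0))
```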

Information Contribution of Letters

I downloaded a file compression program from the Internet which uses frequent letter string repetitions as a basis for compression. Then I wrote a program to repeatedly rewrite the Old Testament text files, each time omitting a different letter of the alphabet, including the space character. I ran the compression program on each of these files and compared the lengths of the compressed files with the compressed file of the unaltered text without any letter omissions. The length reductions of the single letter-omitting compressions compared to the full text compression ought to give us an idea of the relative information contribution of each letter free of its predictability by context.
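The procedure can be sketched in Python. Note the assumptions: the standard-library `zlib` compressor stands in for the unnamed downloaded program, the function names are hypothetical, and the short sample string is only there to make the sketch runnable; meaningful numbers require a long text.

```python
import zlib

def compressed_size(text):
    """Length in bytes of the zlib-compressed text (maximum compression level)."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def omission_reductions(text, alphabet):
    """For each symbol, how much smaller the compressed file gets when
    every occurrence of that symbol is deleted from the text."""
    baseline = compressed_size(text)
    return {
        symbol: baseline - compressed_size(text.replace(symbol, ""))
        for symbol in alphabet
    }

sample = "in the beginning god created the heaven and the earth " * 50
reductions = omission_reductions(sample, " etaoin")
for symbol, saved in sorted(reductions.items(), key=lambda item: -item[1]):
    label = "space" if symbol == " " else symbol
    print(label, saved)
```

A symbol that is highly predictable from its neighbors costs the compressor little, so deleting it should shrink the compressed file by less than its raw frequency would suggest.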

I have listed these reductions as the fifth column in the table for the most frequent letters. The line with the @-sign represents all letters combined. Now, the size reductions in the various letter-omitting compressions do not add up to the length of the full text compression – the actual total is about half the length of the total compressed file. I believe this is due to the fact that the compressions are not optimal. However, the information reduction of each letter can still be compared to the sum of the reductions of the individual letters, to give the relative information contribution of each letter in context. This I listed in the next column of the table. Then I computed the relative amount of information that each letter contributed because of its frequency without regard to the context; this is listed in the column marked “fraction of total information contributed by letter in isolation.” Then I divided it by the in-context information, and listed it under “Letter's relative redundancy.” This last column purports to be the ratio of the letter's information content in isolation to its information content in context. Most of these are about equal to one, meaning that the letter makes the same relative information contribution in context or out of context. But notice the value for the space character – twice as redundant in context as it would be if it were randomly distributed. This means that the morpheme boundary is far more predictable than it would be if it were a phoneme of the same frequency. This contrasts with an efficient comma code, where the explicit codeword boundary isn't predictable at all. Note that the high predictability detected here is due entirely to the arrangements of other characters, with no contribution from meanings or world knowledge. That seems to be a vindication of the use of coding and information theory in phonology, if not in grammar or semantics.
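The ratio in the last column can be sketched as follows. Two things here are assumptions: the in-isolation contribution of a letter is taken to be its count times −log2 of its relative frequency, which is one plausible reading of the description above, and the counts and reductions in the example are made-up toy numbers, not values from the table.

```python
import math

def relative_redundancy(counts, reductions):
    """Ratio of each symbol's share of information in isolation
    to its share of information in context.

    counts:     symbol -> number of occurrences in the text
    reductions: symbol -> compressed-size reduction when the symbol is omitted
    """
    total = sum(counts.values())
    # Assumed in-isolation measure: occurrences times self-information.
    isolated = {s: n * -math.log2(n / total) for s, n in counts.items()}
    isolated_sum = sum(isolated.values())
    context_sum = sum(reductions[s] for s in counts)
    if isolated_sum <= 0 or context_sum <= 0:
        raise ValueError("need at least two symbols and positive reductions")
    return {
        s: (isolated[s] / isolated_sum) / (reductions[s] / context_sum)
        for s in counts
        if reductions[s] > 0
    }

# Toy numbers only: two equally frequent symbols, one of which saves
# far less compressed space when omitted, i.e. is more predictable.
print(relative_redundancy({"a": 2, "b": 2}, {"a": 1, "b": 3}))
```

A value near one means a symbol contributes about the same share in context as in isolation; a value near two means its share in context is only about half its share in isolation.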

Other than a greater predictability than the phoneme, the morpheme boundary certainly has an unusual distribution within morphemes. It must occur at the end and beginning of every morpheme, but never in the middle. It must occur exactly twice per morpheme. Pretty unusual for the distribution of a phoneme. In texts, it separates morphemes, so perhaps it occurs only once per morpheme. Those linguists that count the morpheme boundary as a context in their search for mutually exclusive environments seem to be assuming that the morpheme boundary is entirely predictable, perhaps through semantic cues in the text, since it is not always identifiable vocally. If not entirely identifiable, then it could not be part of the mutually exclusive environment that makes trouble for /h/ and /ng/. But then again if it's entirely predictable, the morpheme boundary cannot be a phoneme, but only an allophone. Is this a contradiction or am I missing something?

Wikipedia gives a kind of flowchart in the form of a tree diagram for determining the phonemic status of two sounds. The first decision is whether the sounds are in complementary distribution. Only then do we ask whether they are phonetically similar. And this is how we solve the conundrum of /h/ and /ng/ being in complementary distribution. Since the answer to the second question is no, they are not phonetically similar, they are separate phonemes, even though they supposedly do not serve to distinguish morphemes. But just where did the mutually exclusive environments come from? It seems we must have already classified all other sounds successfully if we can already recognize mutually exclusive environments. The obvious question is how did we ever get started? If we must ask the mutual exclusivity question first, how do we answer it before we have posited any phonemes? The obvious answer is to start with phonetic similarity.

My method of determining a phoneme inventory is this: you make a list of all the sounds you think you can distinguish, hoping your ear and your mouth are skilled enough not to miss any distinctions that native speakers make. Then you group any phonetically similar sounds together into one phoneme if they have mutually exclusive distributions. It will not harm your analysis if some of those mutually exclusive environments are made up of allophones of other phonemes that you have not yet properly conglomerated. When you consider an environment that includes a morpheme boundary, you must ignore that boundary and consider any sounds that appear on the other side of it. Eventually you will have eliminated all allophonic variation and classified all phonemes properly. I predict that no language will leave you with any phonetically dissimilar sounds that appear in mutually exclusive environments, simply because human beings would not waste so much effort and articulatory options on something that only serves to produce redundancy, but does not distinguish messages.
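The effect of ignoring the boundary can be sketched with a toy complementary-distribution test. Everything in this Python example is an illustrative assumption: the two utterances (“a hat”, “sing a song”) in a rough old-style transcription, the crude vowel/consonant classes used as environments, and the function names. Only pauses at the edges of an utterance are always marked, with "#".

```python
VOWELS = set("aeiouæəɔ")

def sound_class(symbol):
    """Reduce a symbol to a coarse environment class: pause, vowel, or consonant."""
    if symbol == "#":
        return "#"
    return "V" if symbol in VOWELS else "C"

def environments(utterances, target, count_boundaries):
    """Collect (preceding class, following class) pairs for every occurrence of target."""
    found = set()
    joiner = "#" if count_boundaries else ""  # treat word boundaries as symbols, or ignore them
    for words in utterances:
        string = "#" + joiner.join(words) + "#"
        for i, symbol in enumerate(string):
            if symbol == target:
                found.add((sound_class(string[i - 1]), sound_class(string[i + 1])))
    return found

def complementary(utterances, a, b, count_boundaries):
    """True if the two sounds never share an environment."""
    return environments(utterances, a, count_boundaries).isdisjoint(
        environments(utterances, b, count_boundaries)
    )

DATA = [["ə", "hæt"], ["siŋ", "ə", "sɔŋ"]]
print(complementary(DATA, "h", "ŋ", count_boundaries=True))
print(complementary(DATA, "h", "ŋ", count_boundaries=False))
```

With the boundary counted as part of the environment, /h/ and /ŋ/ never share a context in this toy data; once the boundary is ignored and the sounds on the other side of it are considered, both occur between vowels and the complementary distribution disappears.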

I do not necessarily advocate making devoicing a phoneme in every language, at least not a traditionally-defined phoneme. If it is not possible to determine on which side of a stop a devoicing phoneme occurs, then devoicing would not be an ordinary phoneme. I find that in Spanish, for example, a voicing phoneme co-occurring with stops would be more useful in explaining allophonic variation, but it would have to be called a simultaneous phoneme, because it is impossible to tell whether it precedes or follows a stop. On the other hand, a regular nasalization phoneme really is necessary in Spanish. After a stop it signals the letters M or N. After a vowel, it nasalizes the previous vowel, and although it is always written as an N, it represents a following nasal consonant that is homorganic with the following consonant, even if that consonant is an F.

What would a devoicing phoneme in English mean for Phonology as Human Behavior? Very little, I think. Since the devoicing phoneme always accompanies a written devoiced stop no matter which side it appears on, statistical counts of non-voiced consonants should not be affected. Any statement about air pressure build-up in voiced stops would still be valid. The same can be said for the nasal phoneme – it always accompanies what has been considered a nasal consonant no matter which side it appears on.

Monday, November 10, 2014

Nasalization