thinking through some linguistics stuff
Jan. 26th, 2003 06:15 pm
I've been set to the task of figuring out how the written Chinese language works. Not to actually learn Chinese, for that would be a vast task and would take me many years, but to understand at a theoretical level how Hanzi are used to represent the language. Just to, in the abstract, get my head around the linguistic issues.
It's mostly a self-imposed task. Bob only really needed me to investigate so far as would be needed to gather a pile of Hanzi for use as arbitrary, nearly-random stuff in our memorization study, and yet be able to discuss the results without saying anything stupid or culturally insensitive. But I've been all-out geeking.
The fascinating puzzle: It's claimed that each character stands for a morpheme. (Definition of morpheme is not exact but roughly, it's the smallest unit bearing meaning. This might be a word: "dog" is a word and one morpheme; or part of a word: "dogs" is 2 morphemes, dog + s; "weekend" is 2 morphemes, week + end.) OK, fair enough. But it's also claimed that each character stands for a syllable, at least in Mandarin. One character = one syllable = one morpheme. This would mandate that, for the most part, morphemes consist of one syllable. Not that it would be impossible to deal with an exception to that rule in their writing system; but it seems, from my reading, that multiple-syllable morphemes are uncommon enough for the people establishing writing conventions to blithely make the syllables == morphemes assumption. Didn't even notice that they were making an assumption.
Now, my questions: (1) how does a language get to that state? It seems to me that there are historical linguistic trends that would tend to cause languages to acquire, and retain, morphemes that do not follow this rule. (2) And how does it stay that way for thousands of years? The Chinese writing system, as it is known today, was developed about 4 thousand years ago during the Shang dynasty. It fit the language then, and it hasn't popped any buttons since. (That I know of.)
English has many multi-syllable morphemes. Here's a sample from a book that seems to be on not too difficult a reading level:
If the other novice wizards on the row hadn't broken into Raeshaldis's rooms the previous day, pissed on her bed and written WHORE and THIEF on the walls, she probably would have been killed on the night of the full moon.
Leaving aside the proper name (since personal names are created anew every generation, and so can be made to conform to whatever cultural convention), I count 49 morphemes, 5 of which consist of more than one syllable. Hmm, 10% is not insignificant. How did English come to have so many multiple-syllable morphemes? Three historical-linguistic factors that I have come up with:
1) Units borrowed from other languages are no longer analyzable into their constituents. We can hear "week" and "end" in the word "weekend", because "weekend" was coined in English. To a speaker of Latin, "praevius" would be understandable as "prae" + "vius"; but when this was borrowed into English as "previous", we wound up with a monolithic blob with two syllables.
2) Old grammatical rules stop being productive but leave behind fossils: units which used to be word + ending but are now unanalyzable blobs. For example, it looks like "wizard" comes from Middle English "wysard", "wys" for "wise" plus some morphological thingie. I don't know this -ard suffix, it must have dropped out of the language, so hey presto, no separate morpheme.
3) Multi-syllabic words can be very conservative about staying that way. "Other" is related to Old High German "andar" and Sanskrit "antara". Sanskrit! If the similarities in these words are based on their being cognates, then the proto-Indo-European word must have been something like nasal vowel-dental-vowel-/r/. While some of our multi-syllable words are neologisms (such as radar), this one sure isn't.
Perhaps none of these factors apply to Chinese.
1) Chinese is not big on borrowing words, I think. Perhaps this has been true for thousands of years. It can't have been on a word-borrowing binge just prior to the Shang dynasty (~1766 BC) because the writing system, as it was being invented, would have had to deal with lots of non-decomposable foreign phrases. But perhaps historical factors that work against word borrowing (e.g., being the leading cultural innovators in the area, not having much immigration, ...) were in place in China for a long, long time.
2) Chinese is currently an isolating language. It doesn't inflect or add endings to anything. Perhaps it has been that way for many, many, many thousands of years. Which seems weird; in the past 4 thousand years, English has changed how much it inflects and agglutinates, from lots to not much. So languages are not necessarily very conservative in that respect. This would be a fascinating question for the field of historical linguistics: Do isolating languages tend to be conservative and stay that way? If so, where do agglutinating or inflecting languages come from?
3) I haven't shown that multi-syllabic words are necessarily conservative. Just because the P.I.E. word for "other" held onto all its syllables to the present day, in English, doesn't say anything about the rest of the P.I.E. vocabulary or what happens in other languages. Maybe two-syllable words do overall tend to erode to single syllables. Erosion... look at French! (Disclaimer: I don't know a bit of French.) Comparing how French words are pronounced with how they are spelled (which reflects how they were pronounced in the 11th century) shows that the ends of French words have eroded like a freshly deforested hillside. Across the board. Perhaps, in the time leading up to the Shang dynasty, Chinese suffered the same sort of malaise and lost its trailing syllables.
Whew. I didn't even get to speculating about the influence of the writing system on the spoken language, and my butt is asleep. Time to raid the Super Bowl party food.