Monday, May 19, 2008

Transitional Pinyin, a mixed semantic orthography

I propose that some Web pages of interest to learners of the Chinese language be made available in a new intermediate orthography that mixes Han Chinese characters and Latin letters. The new orthography can have several stages of difficulty, for learners at different levels. Certain words or elements that are targets for learning will have a special underline (or other visual indicator) so that a mouse-over on the target causes additional information to pop up.

The orthography is not phonologically regular like Pinyin; it recognizes the semantic value of characters and emphasizes it.

General objectives
  1. Allow learners of spoken Chinese to read as much as possible using their phonetic knowledge of spoken forms, supplemented with semantic knowledge of the most common characters or components. The key obstacle for learners of Chinese, which is not found in languages using alphabets, is that they cannot acquire vocabulary from reading, especially if they reside outside a Chinese-speaking locality. With Transitional Pinyin, they can acquire vocabulary much more quickly, especially when reading on the Web.
  2. Use Web display technologies to allow user control of options for display.

Common policies (all stages)
Secondary readings are always transcribed as Pinyin (or marked with some other visual indicator consistent with the chosen display mode).


Stage 1: Familiarity with up to 500 characters, and a corresponding vocabulary (identify what part of HSK vocabulary should be covered)

Objective:
Develop familiarity with about 170 Radical Characters that are frequently used. Develop familiarity with the most important Hanzi families, by frequent exposure to the family "head": the base component character, or the most frequently used radical+base form. A Hanzi family is defined as a group of characters that share a phonetic component, as well as a pronunciation with the same initial and rhyme, but may differ in tone. All tones for characters are marked.
Stage 1 will use Hanzi to distinguish homophonic variants of a syllable or word (considering two syllables homophones if they have the same initial and rhyme, but possibly differing tones), considering only the most productive families and open class words.
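
The family definition above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function names and the tiny hand-made table are assumptions, and a real list would come from a character dictionary. Characters are grouped by phonetic component plus toneless syllable, so a character with the same component but a different initial falls outside the family.

```python
import unicodedata
from collections import defaultdict

# Combining marks for the four Pinyin tones (macron, acute, caron, grave).
# The diaeresis of "ü" is deliberately not in this set, so it survives.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone(syllable):
    """Return a Pinyin syllable without its tone mark."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


# Illustrative entries only: (character, phonetic component, reading).
ENTRIES = [
    ("青", "青", "qīng"),
    ("清", "青", "qīng"),
    ("情", "青", "qíng"),
    ("请", "青", "qǐng"),
    ("晴", "青", "qíng"),
    ("精", "青", "jīng"),  # same component, different initial: near-homophone
]


def group_families(entries):
    """Group characters by (phonetic component, toneless syllable)."""
    families = defaultdict(list)
    for char, component, reading in entries:
        families[(component, strip_tone(reading))].append(char)
    return dict(families)


families = group_families(ENTRIES)
```

Here the five characters read "qing" form one family headed by 青, while 精 ends up in a separate group and would be treated as a near-homophone.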

Policies
  1. All functional morphemes are spelled in Pinyin.
  2. Foreign proper names and loan words are spelled as in international English (with Pinyin available as a mouse-over).
  3. Only 500 Han characters are used, and tones are marked for them. All other words and morphemes are written in Pinyin with tone marks.
  4. Other characters in the same family as the 500 are optionally displayed. By default they are shown with some simpler indicator, and the Han character is displayed upon mouse-over.
  5. All the 500 Han characters are selected from the 1000 most frequently used, based on corpus studies. Families with more members, or with more productive members, are preferred.
  6. Words are separated by spaces, following a standard to be defined. As an interim policy, it will follow the spacing of words in the ABC Dictionary of John DeFrancis and the Wenlin software.
  7. The Han characters are used as syllables in open class words (notional word categories, like noun, verb, adverb).
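
A minimal sketch of how policies 1, 2, 3, 6 and 7 might combine when rendering a pre-segmented sentence. Everything here is hypothetical: the `Syllable` and `Word` types, the `render_stage1` function, the choice of which words count as functional, and in particular the tone digit appended to each displayed character, which is only a placeholder because the proposal does not say how tones on characters are shown. Policy 4 (optional display of other family members) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Syllable:
    hanzi: str
    pinyin: str  # Pinyin with tone mark
    tone: int    # 1-4, or 0 for neutral tone


@dataclass
class Word:
    syllables: List[Syllable]
    functional: bool = False       # closed-class word or morpheme
    foreign: Optional[str] = None  # international English spelling, if any


def render_stage1(words: List[Word], known: Set[str]) -> str:
    """Render words in a Stage 1 style mix of Hanzi and Pinyin."""
    out = []
    for word in words:
        if word.foreign is not None:
            out.append(word.foreign)  # policy 2
        elif word.functional:
            out.append("".join(s.pinyin for s in word.syllables))  # policy 1
        else:
            # Policies 3 and 7: Hanzi only for characters in the known set.
            out.append("".join(
                f"{s.hanzi}{s.tone}" if s.hanzi in known else s.pinyin
                for s in word.syllables
            ))
    return " ".join(out)  # policy 6: words separated by spaces


# 我在伦敦学习中文 ("I study Chinese in London"), segmented by hand.
# Treating 我 and 在 as functional is an assumption of this sketch.
SAMPLE = [
    Word([Syllable("我", "wǒ", 3)], functional=True),
    Word([Syllable("在", "zài", 4)], functional=True),
    Word([Syllable("伦", "Lún", 2), Syllable("敦", "dūn", 1)], foreign="London"),
    Word([Syllable("学", "xué", 2), Syllable("习", "xí", 2)]),
    Word([Syllable("中", "zhōng", 1), Syllable("文", "wén", 2)]),
]

print(render_stage1(SAMPLE, known={"学", "中", "文"}))
```

With 学, 中 and 文 in the known set, the sentence comes out as Pinyin for the function words, "London" for the place name, and a mix of characters and Pinyin for the open class words.
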

Stage 2: 1000 character forms (head forms of families, or selected singletons) of open class words.

Objective
Use characters to distinguish common syllables that are homophonic (ignoring tone, which is explicitly marked). Near-homophones (that have a different but related initial or rhyme or both) are prioritized for Pinyin representation.

Policies
  1. 1000 forms are displayed.
  2. All family-related forms (differing in radical and tone) will also be displayed by default. Some options: the radical is displayed in a different color, with a brief definition of the character and word upon mouse-over.
  3. In addition to the 1000 character forms, all functional morphemes can be displayed as characters as a non-default option.

Stage 3: 2000+ forms

Policies
By default, all text is Hanzi, but any reading of a character that is not homophonic to the base form is underlined, and its Pinyin is displayed upon mouse-over. Other options are: 1) selective interlinear Pinyin (ruby text); 2) Pinyin as the default for any reading not homophonic to the regular Hanzi families, with the Hanzi form available as a mouse-over, as ruby text, or after the syllable/word.
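
The three display options could be produced with ordinary HTML, as in the following sketch. The `annotate` function, the mode names, and the `tp-secondary` class are invented for illustration; the underline itself would come from a stylesheet rule on that class. The `title` attribute is what browsers show as a tooltip on mouse-over, and `<ruby>`/`<rt>` are the standard elements for interlinear annotation.

```python
from html import escape


def annotate(hanzi, pinyin, secondary, mode="underline"):
    """Return HTML for one character under a Stage 3 display mode.

    secondary: True if this reading is not homophonic to the base form.
    mode: "underline" (default), "ruby", or "pinyin" (hypothetical names).
    """
    if not secondary:
        return escape(hanzi)
    if mode == "underline":
        # Hanzi shown; Pinyin appears as a tooltip on mouse-over.
        return f'<span class="tp-secondary" title="{escape(pinyin)}">{escape(hanzi)}</span>'
    if mode == "ruby":
        # Interlinear Pinyin above the character.
        return f"<ruby>{escape(hanzi)}<rt>{escape(pinyin)}</rt></ruby>"
    if mode == "pinyin":
        # Pinyin shown; Hanzi appears as a tooltip on mouse-over.
        return f'<span class="tp-secondary" title="{escape(hanzi)}">{escape(pinyin)}</span>'
    raise ValueError(f"unknown display mode: {mode}")


# 音乐 (yīnyuè): 乐 is read yuè here; treating lè as its base reading
# is an assumption made for this example.
html_out = annotate("音", "yīn", False) + annotate("乐", "yuè", True)
```

Because the mode is a single parameter, a page could let the reader switch options without changing the underlying text.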

Semantically-Annotated Pinyin Variants

All text is in Pinyin, but homophonic families (of open class words) are identified by various display options. Semantically related near-homophones are also available with a different visual indicator. The display options include two kinds of underlining, with mouse-overs, and separate display of the base character and the radical, with a mnemonic available for exploration. Different variants can be aligned with the various stages.

Related Concepts

Productivity
The productivity of a family base form is a function of the number of members, the closeness of pronunciation (including identical tones), the frequency of the characters in a corpus, and the frequency of words using the character in the HSK and similar learner vocabularies.
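
One possible way to turn these four factors into a single score is a weighted sum, sketched below. The proposal does not specify the function, so the field names, the normalization to the range 0 to 1, the cap on family size, and the weights are all assumptions that would need tuning against real corpus and HSK data.

```python
from dataclasses import dataclass


@dataclass
class FamilyStats:
    members: int             # number of characters in the family
    same_tone_share: float   # fraction of members sharing the head's tone, 0..1
    corpus_freq: float       # normalized corpus frequency of the members, 0..1
    learner_vocab_freq: float  # normalized frequency in HSK-like word lists, 0..1


# Hypothetical weights for (size, tone closeness, corpus, learner vocabulary).
WEIGHTS = (0.4, 0.2, 0.2, 0.2)


def productivity(stats, max_members=10, weights=WEIGHTS):
    """Score a family between 0 and 1; higher means more productive."""
    # Cap the family size so one very large family cannot dominate.
    size = min(stats.members, max_members) / max_members
    parts = (size, stats.same_tone_share, stats.corpus_freq,
             stats.learner_vocab_freq)
    return sum(w * p for w, p in zip(weights, parts))


# Made-up numbers, for illustration only.
small = FamilyStats(members=2, same_tone_share=0.5, corpus_freq=0.5,
                    learner_vocab_freq=0.5)
large = FamilyStats(members=5, same_tone_share=0.5, corpus_freq=0.5,
                    learner_vocab_freq=0.5)
ranked = sorted([small, large], key=productivity, reverse=True)
```

Sorting candidate families by this score is one way the 500 and 1000 character lists for Stages 1 and 2 might be drawn up.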

Next Steps

Prepare a text in this style, perhaps Quotations of Chairman Mao, or a culturally annotated reader based on it.

Consider how web display technology can allow the various options.
