Monday, August 3, 2009

old NYT article on origin of language

Early Voices: The Leap to Language
By NICHOLAS WADE
New York Times
Published: Tuesday, July 15, 2003

Wednesday, July 29, 2009

references: language documentation

Gippert, Jost, Nikolaus P. Himmelmann and Ulrike Mosel. 2006. Essentials of language documentation. Berlin: Walter de Gruyter.

This volume consists of essays from many linguists in the field of language documentation covering a range of subjects including community fieldwork, ethnography in linguistic fieldwork, annotation and archiving methods. [Himmelmann, in Germany, has worked on Tagalog, and collaborated with Australians]

Tuesday, July 28, 2009

notes: models for using jyutping

Models for standard Cantonese in Education and Social Life

For initial literacy (perhaps focused on families of non-Han migrants), use a system of Jyutping+Hanzi similar to Japanese writing (kana-kanji). Only introduce characters for reference if they are not the most frequent full homograph.

Use Jyutping translations of English science texts (retaining English technical terms with Cantonese glosses), then teach from the English texts later.

1. Occitan - language shift to national norm, historical vernacular disappears as a living language

2. Swiss German - Bilingual, vibrant spoken language (and sung in Cantopop) but no interest in written form, defer to a larger standard form

3. Frisian - No university teaches it, but social space enlarging.

4. Letzeburgisch - Generally accept a larger external language as standard for literacy and writing, a belated interest in promoting vernacular into a standard

5. Catalan - Bilinguals but vigorously defend social position of written standard from vernacular. Resist language shift, social policy to promote children of non-locals to become fluent in local vernacular.

6. Dutch - Full standard language, related foreign standard is seen much like a foreign language.

Negative model:
- Bokmål and Nynorsk - disputed standard, especially for newspapers and fangyan characters


From
John DeFrancis, “The Prospects for Chinese Writing Reform”
Sino-Platonic Papers, 171 (June, 2006)

The Zhuyin Shizi, Tiqian Duxie ‘Phonetically Annotated Recognition Promotes Earlier Reading and Writing’ experiment came into being in 1982 in the northeast province of Heilongjiang

Reference:

John S. Rohsenow, “The ‘Z.T.’ Experiment in the PRC,” Journal of the Chinese Language Teachers Association. 31, 3 (1996): 33-44.

links on learning Chinese Characters

From Wikipedia HSK article:
From MDBG:
练习 Practice

ABC toc at Pleco

  1. Encounters

Thursday, June 18, 2009

Moschovakis uses Typed Lambda Calculus for the Semantics of English

A LOGICAL CALCULUS OF MEANING AND SYNONYMY
YIANNIS N. MOSCHOVAKIS
date: December 13, 2004
Linguistics and Philosophy
source: A logical calculus of meaning and synonymy, Linguistics and Philosophy, v. 29 (2006), pp. 27-89.


Montague: meaning (= Frege: sense) of term A is its Carnap Intension or denotation(A)(a) for each state a (= possible world, time, context of use). Too coarse for synonyms.
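A minimal sketch of why the Carnap intension is too coarse, assuming a toy set of three states and two arithmetic terms of my own choosing (none of these names come from the paper). Both terms denote 4 at every state, so their tabulated intensions are identical even though the terms are intuitively not synonymous.

```python
# Hypothetical stand-ins for states (possible world, time, context of use).
STATES = ["a0", "a1", "a2"]

def carnap_intension(denotation):
    """Tabulate a term's denotation at every state."""
    return tuple(denotation(a) for a in STATES)

def den_two_plus_two(a):
    # Denotation of the term "2 + 2" at state a (state-independent here).
    return 2 + 2

def den_two_times_two(a):
    # Denotation of the term "2 * 2" at state a (state-independent here).
    return 2 * 2

# The two terms are computed differently but receive the same intension.
same_intension = carnap_intension(den_two_plus_two) == carnap_intension(den_two_times_two)
```

On this view the difference between adding and multiplying is invisible, which is the gap the referential intension (an algorithm rather than a table of values) is meant to fill.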

Other extreme: "structural" approaches to the modeling of meaning (like Russell's propositions,[n3] Church [1946], Church [1974] and Cresswell [1985]) basically tell us no more than that "the sense of a complex term A can be determined from the syntactic structure of A and the senses or denotations of the basic constituent parts of A", without explaining how this "determination" is to take place.

Judy Pelham and Alasdair Urquhart [1994], Russellian propositions, Logic, Methodology and Philosophy of Science IX (D. Prawitz et al., editors), Elsevier Science.
Alonzo Church [1946], A formulation of the logic of sense and denotation, abstract, The Journal of Symbolic Logic, vol. 11, p. 31.
Alonzo Church [1974], Outline of a revised formulation of the logic of sense and denotation, part II, Nous, vol. 8, pp. 135-156.
M. J. Cresswell [1985], Structured meanings: The semantics of propositional attitudes, The MIT Press, Cambridge, Mass.

- This is bottom-up compositional, does not allow the construction to contribute to meaning. The syntactic structures are empty combinatorial rules, without any meaning beyond what is determined by their constituents (deductively determined, not abductively-probabilistically inferred).
- but if the method constructs an algorithm (semantic contribution) before getting to the denotation (the explicit core of pragmatic interpretation), could that algorithm be construction-dependent rather than language-wide and uniform? Would it be enough to have every sense license a particular construction that triggers that sense for the wordform?

Davidson's eloquent criticism in Davidson [1967],
Theaetetus and the property of flying do not (by themselves) amount to the meaning of "Theaetetus flies"

Donald Davidson [1967], Truth and meaning, Synthese, vol. 17, pp. 304-333, reprinted in Martinich [1990] and in Davidson [1984].

- the meaning of the verb in use is not the (property of the) action but a situation type involving the participants dependent on the predicative verb; or better, it is both an antecedent situation-type involving the participants and a related consequent situation-type where the action is realized (with participant-role relationships between the instantiated action-type and each mentioned or implicit participant). This verb-meaning situation type exists in the discourse situation cognitively shared by speaker and hearer, their shared information state.

In Moschovakis [1994] I argued that the meaning of a term A can be faithfully modeled by its referential intension int(A), an (abstract, idealized, not necessarily implementable) algorithm which computes the denotation of A. The basic technical tool in that paper was the Formal language of recursion FLR, [for rendering NL as formal]

Yiannis N. Moschovakis [1994], Sense and denotation as algorithm and value, Logic colloquium '90

[note4] full rendering operation is of the form

natural language expression + context -- render--> formal expression + state

where the (informally understood) context determines not only the state (as we will make it precise in Subsection §2.2), but also which precise reading of the expression is appropriate and what formal transformations should be made (e.g., co-indexing), depending on information about "what the speaker meant", intonation, if the expression was spoken, punctuation and capitalization, if it was written, etc.

- I take this to mean:

NL expr + partly-construed situation --render-->
formal semantic contribution
+ anchoring situation (concrete or discourse) as construed in a verbal scheme (= parameter-resolved psoa + location + context of use (Wittg aspect))

I will not specify with any precision the all-important rendering (or translation) operation... I think that the theory of what-happens-next proposed here may be of some value, primarily because of two reasons.
> First, the modeling of meanings by referential intensions goes far beyond the imagery and analogy with computation often used to explain the relation between Frege's sense and denotation, especially by Dummett.[note5]

M. A. Dummett [1978], Frege's distinction between sense and reference, Truth and other enigmas, Harvard University Press, Cambridge, pp. 116-144.
G. Evans [1982], The varieties of reference, Clarendon Press, Oxford, Edited by J. N. McDowell.

> Second, the formal processing of L^{lambda}_ar-terms (the "calculus" of the title) sets conditions and limitations on the rendering operation, it provides new ways to implement some syntactic transformations which affect meaning (like co-indexing and co-ordination), and for some English phrases, it suggests some plausible, novel renderings directly in L^{lambda}_ar which are not referentially synonymous with any terms of the typed {lambda}-calculus.

§1. The typed {lambda}-calculus with acyclic recursion, L^{lambda}_ar. The language L^{lambda}_ar is a typed calculus of terms, an extension of the two-sorted type theory Ty_2 of Gallin [1975][sec.8] into which the language of intensional logic LIL of Montague [1973] can be interpreted by Gallin's Theorem 8.2.[note6]

Daniel Gallin [1975], Intensional and higher-order modal logic, North-Holland Mathematical Studies, no. 19, North-Holland, Elsevier, Amsterdam, Oxford, New York.

- Moschovakis builds on the theory of typed functions, typed substitution-evaluable relations. Russell originally made a (ramified) theory of typed sets. What are typed situations? They are set-like collections of individuals_s, properties_s on individuals and relations_s on pairs and sequences of individuals, where properties_s, relations_s and situations can also be individuated (reification in a cognitive schema?).

- can we make a typed calculus of terms for rendering natural language sentences in the scheme of a constructicon? A Natural Semantic Rendering Formalism, that is not truth-conditional but considers conditions of information flow and conditions of satisfaction up to a shared semantic contribution of (only) the expression. Instead of just types e~ and t~, we can have entities (individuals) e~, properties and relations r~_i for i=1..n (where n is the number of participants in the largest frame, let us call it less than 7, Miller's number for working memory), situations s, and j~ for information flow values (whether a concrete or discourse situation supports a situation-type or basic infon). Going beyond Russellian propositions, we consider that verbs have many senses, and that constructions also contribute sense-like meanings. We want our representation to support not simply deductive inference but abductive sense extension, so we can infer not only information that is already there in the expression, but can learn about the world and acquire extended schemas for classifying it.

- We would like to build on DRS, but again at the sense level rather than the proposition level. Perhaps we can refer to each signalled sense as a microsign, and we are interested in its semantic contribution to pragmatic construal.

- in SBCG, every verb is taken to have one _rel. If we model at the granularity of senses, one _rel per sense. However, in Sowa's conceptual graphs, a verb is a relational concept that has (labelled) relations with each participant mentioned in the expression. Using FrameNet, but at the level of senses + constructions, we can have a small set of participant-relations that identify how the mentioned individuals relate to the event-situation of the verb, call them _prel. We may want to classify each _prel in a way local to the frame, or using a language-wide or universal collection of generic _prels (agent, accessory, goal, location, instrument, beneficiary). We can have grammatically-compulsory _prels and optional (adjunct) _prels.
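To make the _rel/_prel idea concrete, here is a rough sketch of one possible data structure. Everything in it is hypothetical: the class name `Sense`, the `send_rel` entry and its choice of compulsory and optional _prels are invented for illustration, and only the six generic labels come from the note above.

```python
from dataclasses import dataclass

# The generic participant-relations listed in the note.
GENERIC_PRELS = frozenset(
    {"agent", "accessory", "goal", "location", "instrument", "beneficiary"}
)

@dataclass(frozen=True)
class Sense:
    rel: str                           # the single _rel of this sense
    compulsory: frozenset              # grammatically compulsory _prels
    optional: frozenset = frozenset()  # optional (adjunct) _prels

    def problems(self, participants):
        """participants maps a _prel label to the individual filling it."""
        used = set(participants)
        missing = sorted(self.compulsory - used)
        unlicensed = sorted(used - self.compulsory - self.optional)
        return [f"missing {p}" for p in missing] + [
            f"unlicensed {p}" for p in unlicensed
        ]

# Invented example entry; the role assignment is an assumption, not FrameNet data.
send = Sense(
    rel="send_rel",
    compulsory=frozenset({"agent", "goal"}),
    optional=frozenset({"beneficiary", "instrument", "location"}),
)
```

A frame-local classification would replace `GENERIC_PRELS` with a label set per frame; the checking logic would stay the same.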

the set of types is the smallest set which includes the distinct "symbols" e, t, s and is closed under the pairing operation ({sigma}_1 --> {sigma}_2). A type is pure (or state-free) if the state type s does not occur in it.
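The type definition can be sketched as a small recursive data structure. This is only an illustration: the class names `Basic` and `Arrow` and the helper names are mine, not the paper's.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    name: str  # one of "e", "t", "s"

@dataclass(frozen=True)
class Arrow:
    source: "Type"  # argument type
    target: "Type"  # result type

Type = Union[Basic, Arrow]

E, T, S = Basic("e"), Basic("t"), Basic("s")

def is_pure(ty):
    """A type is pure (state-free) if the state type s does not occur in it."""
    if isinstance(ty, Basic):
        return ty != S
    return is_pure(ty.source) and is_pure(ty.target)

def show(ty):
    """Render a type with explicit parentheses around every arrow."""
    if isinstance(ty, Basic):
        return ty.name
    return f"({show(ty.source)} -> {show(ty.target)})"

pure_example = Arrow(Arrow(E, T), T)  # (e -> t) -> t, no occurrence of s
state_dependent_example = Arrow(S, E)  # s -> e, mentions the state type
```

Because the only constructor besides the three symbols is the arrow, every function of several arguments has to be represented by nesting arrows.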

- Moschovakis only has a single pairing operation to generate his types, perhaps we need several.
- M uses currying (unary function types applied one argument at a time) to handle multiple arguments to a function. We may instead shift to a finer-grained level of _prels relating a verb's event-situation and its participants.
- in DRS we resolve referential indexes by equating i=j. But for events, we may need to place them in a partial order of a consistent discourse situation (one channel) and be prepared to shift channels. This is done pragmatically so we can abductively infer the best shared information state for information to flow in a successful communication context. Within this broad contingent pragmatic field, the semantic contribution is more predictable across similar and distinguished situations.

Constants, variables, terms

We assume given a (finite) set K of typed constants, the "vocabulary", and we write c : {sigma} to indicate that c has type {sigma}
For each type {sigma}, L^{lambda}_ar has two infinite sequences of variables,
-  the pure variables v^{sigma}_0, v^{sigma}_1, ... and
-  the recursion variables or locations p^{sigma}_0, p^{sigma}_1, ...
Syntactically, pure variables are quantified, while locations are assigned-to.

Terms are defined recursively, starting with the variables and the constants and using application, {lambda}-abstraction and (mutual) acyclic recursion. The definition also assigns a type to every term and specifies the free and bound occurrences of variables in it.
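The excerpt does not spell out the acyclicity condition, so the following is a sketch under my own assumption: a system of assignments is acyclic when no location depends, directly or through other locations, on itself. The function name and the dictionary encoding are hypothetical; `graphlib` is the Python standard-library module (3.9+).

```python
from graphlib import CycleError, TopologicalSorter

def evaluation_order(assignments):
    """Return an order in which locations can be evaluated.

    `assignments` maps each location to the set of locations that occur
    in its defining term. Raises ValueError if the system is cyclic.
    """
    try:
        # Each location's dependencies are emitted before the location itself.
        return list(TopologicalSorter(assignments).static_order())
    except CycleError as exc:
        raise ValueError(f"cyclic system of assignments: {exc.args[1]}") from exc

# p is defined in terms of q, and q mentions no location: acceptable.
acyclic_system = {"p": {"q"}, "q": set()}

# p and q are defined in terms of each other: rejected.
cyclic_system = {"p": {"q"}, "q": {"p"}}
```

If the assumption is right, an acyclic system can always be evaluated in finitely many steps by following such an order, which is one way to read the "idealized algorithm" behind a term.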

- locations are used for referential indexes

"referentially synonymous"

Formally, congruence is the smallest equivalence relation =_c between terms which respects alphabetic replacement of bound
variables (of both kinds), application, {lambda}-abstraction and acyclic recursion, and permuting the indexes of locations (so the system of assignments is a set, not a sequence)

Both the denotational and intensional semantics of L^{lambda}_ar(K) will respect congruence, and so we will sometimes tacitly identify congruent terms.

- we can model the acquisition of new vocabulary in a known wordsense group as associating a new constant with some idiosyncratic meaning to a wordsense group existing in the constructicon. If the new vocabulary item (a new form, or a new sense of an existing form) fits into the pattern and makes sense in terms of semantic analysis and pragmatic construal, it supports abduction to a particular semantic and pragmatic meaning, and the constructicon is (defeasibly) incremented with the new vocabulary item.

{beta}-conversion almost never preserves meaning, just as logical deduction does not--otherwise all theorems would be synonymous, which is absurd

States. To be specific, we will assume in this paper that a state is a quintuple
a = (i, j, k, A, {delta})
which specifies a possible world i, a moment of time j, a point in space k, a speaker (or "agent") A, and a function {delta} which assigns values to all possible occurrences of proper names and demonstratives, indexed by the order in which they appear in terms

More interesting for the natural language examples are the state-dependent versions of these [logical] operations, [where t evaluates to 1, 0, or er] summarized in Table 3.

We assume the language has a constant [] for the basic necessity operator, Montague's "full necessity", or "necessarily always", as Thomason calls it.
Kaplan [1978b] argues convincingly that this interpretation is inappropriate for terms which contain demonstratives, but in our determination to avoid philosophical commitments, it is best to allow his interpretation as a de re reading of the modality, without forbidding the de dicto reading.

David Kaplan [1978b], On the logic of demonstratives, Journal of Philosophical Logic, pp. 81-98, reprinted in Salmon and Soames [1988].


The natural definition of the description operator returns an
error if the existence and uniqueness conditions are not fulfilled
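A minimal sketch of such a description operator over a finite domain. The name `the`, the finite-domain setting and the use of the string "er" as the error value are assumptions made for illustration.

```python
ER = "er"  # hypothetical error value, standing in for the paper's error

def the(predicate, domain):
    """Return the unique element of `domain` satisfying `predicate`, else ER."""
    matches = [x for x in domain if predicate(x)]
    # Existence fails when there are no matches, uniqueness when there are several.
    return matches[0] if len(matches) == 1 else ER

unique = the(lambda n: n % 2 == 0, [1, 2, 3])     # exactly one even number
nonexistent = the(lambda n: n > 5, [1, 2, 3])     # existence fails
ambiguous = the(lambda n: n > 1, [1, 2, 3])       # uniqueness fails
```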

Local, modal

An object is local[n16] if each value p(x; a) depends only on x(a) and not on any other values x(b).

[16] See Montague [1973][Section 4]. Montague and Gallin use extensional and intensional for our local and modal, but this adds one more use to the already overloaded extension-intension distinction and suggests a connection between modality and meaning which is not in the spirit of this article.

What seems (at first) surprising is that some common nouns and verbs are also modal, in this abstract sense, and that the
distinction is worth noting.

e.g. the temperature is rising

then rises cannot be reasonably interpreted by a local object: because we cannot tell whether the temperature is rising in state a from the mere knowledge of its value in state a.

For another example, consider the sentence
the color of the sky ranged from light pink to deep, brooding red;
the verb "ranges" is modal in this usage since to determine whether ranges(color; a) we must evaluate color(b) for various states b which differ in "observed location" from the current state a--assuming, for the example, that "observed location" is
part of the state.
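The local/modal contrast can be illustrated with a toy model in which a state is reduced to a time index (a simplification of the paper's states) and the temperature readings are invented. The predicate `is_above_20` is my own example of a local object; `rises` follows the temperature example.

```python
# Hypothetical temperature readings, indexed by a time-only state.
temperature = {0: 18.0, 1: 21.0, 2: 21.0, 3: 19.5}

def is_above_20(x, a):
    """Local: the answer depends only on the value x(a)."""
    return x[a] > 20

def rises(x, a):
    """Modal: the answer compares x(a) with x(b) at the preceding state b."""
    b = a - 1
    return b in x and x[a] > x[b]

# States 1 and 2 carry the same value, so a local object cannot tell them apart...
same_value = temperature[1] == temperature[2]
# ...but rising holds at state 1 and fails at state 2.
rising_at_1 = rises(temperature, 1)
rising_at_2 = rises(temperature, 2)
```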

Roughly speaking, co-indexing occurs when the references of one or more indexical expressions in a term are identified with that of a subterm by the introduction of a bound variable which refers to all of them.

Co-indexing is part of the rendering operation, since whether and how it should be done is determined by the informal context

One of the most original innovations in Montague [1973] is the interpretation of "John", "I" and "the blond" by quantifiers, of type ~q = (~e -> ~t) -> ~t (in the present system). I will not adopt it,
however, because the Montague renderings produce the wrong logical form for the syntactical expressions that they purport to formalize, and thus lose the intended meaning.

The evening star is the morning star (should be synonymous to) The morning star is the evening star

It is not hard to formulate rules for rendering which avoid unnecessary type-raising and give plausible results for (at least) simple expressions which involve singular terms or quantifiers (or both). The basic technique is known as type-driven rendering (or translation), cf. Klein and Sag [1985] or the more recent textbook Heim and Kratzer [1998][Chapter 3], where it is applied using phrase structure trees to represent meanings.

Ewan Klein and Ivan A. Sag [1985], Type-driven translation, Linguistics and Philosophy, vol. 8, pp. 163-201.
Irene Heim and Angelika Kratzer [1998], Semantics in generative grammar, Blackwell.

For our purposes here, the main lesson is that meaning (intuitively understood) must be seriously considered in the rendering process--simply "getting the right denotation" is not enough; and that the subsequent, formal computation of referential intensions and synonymies provides some clues as to whether the informal meaning
was captured by the proposed rendering.

We call cf(A) the canonical form of A and we write
A =>_cf B <=> cf(A) =_c B.
The terms A0, A1, ..., An are the parts of A, and A0 is its head. It will be convenient to employ the notational convention
A where { } == A
introduced in (4), which allows us to assume that all canonical forms look like recursive terms, perhaps with an empty body.

10 reduction rules

We claim that it preserves meaning, so it had better preserve at least denotations: Thm 3.11

Proof is simple, by induction on the definition of the reduction relation.


Main Conjecture. If the set of constants K is finite, then the relation of referential synonymy between closed terms of L^{lambda}_ar(K) is decidable.

Still open.

For a satisfactory development of a theory of belief in which the belief carriers are utterances, we would also need to establish the decidability of synonymy between the parts of utterances in which the parameter a occurs. see:

Eleni Kalyvianaki and Yiannis N. Moschovakis [], Two aspects of local meaning, in preparation.

This is because, intuitively: if you mention an individual concept, then that (full) concept is part of the meaning of your utterance.[note34] In the two puzzles above, Los Angeles, LA, He and Scott are all parts of the relevant terms, but Los Angeles = LA, which dooms poor Petros, while He ≠ Scott, which saves the King.
We have already discussed in §4.2 the technical fact behind this claim: the state parameter a occurs only in the head of the canonical form of an utterance A(a) and not in its body.

5. English as a programming language.

(50) program P |--> algorithm(P) |--> den(P):
It is not hard to work out the mathematical theory of a suitably abstract notion of algorithm which makes this work; and once this is done, then it is hard to miss the similarity of (50) with the basic Fregean scheme for the interpretation of a natural language,
(51) term A |--> meaning(A) |--> den(A):

Aside from the relation between algorithms and meanings, programming languages resemble natural languages more than they resemble the classical, formal languages of logic, both in their complexity and also because they exhibit some natural language phenomena which are absent from formal languages.

I will also not try to explain my take on basic philosophical questions like what it means to "define", "represent faithfully" or "explicate" meaning (or any other notion) in set-theoretic terms; I tried my best to be as clear on these issues as I can in Moschovakis [1998]

Yiannis N. Moschovakis [1998], On founding the theory of algorithms, Truth in mathematics (H. G. Dales and G. Oliveri, editors), Clarendon Press, Oxford, pp. 71-104.

Denotational semantics for programming languages.

Scott's theory is peculiarly incomplete in that it makes no room for the notion of algorithm which (one would think) is at the heart of the matter. Consider, for example, the problem of "sorting" (putting in alphabetical order) a long list of words u. There are many algorithms which will do this--the bubble sort, the merge sort, the quick sort etc.--and they differ greatly in many ways, for example their efficiency. They can all be "programmed" (expressed) in every sufficiently rich programming language L, but the denotational semantics of L cannot distinguish between them, as they all have the same denotation, the function which assigns to each u its alphabetized rearrangement.
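The sorting point can be checked directly. The sketch below implements bubble sort and merge sort with a comparison counter added by me; the word list is invented. Both functions have the same denotation (the alphabetized list) while doing different amounts of work.

```python
def bubble_sort(words):
    """Return (sorted list, number of comparisons) using bubble sort."""
    items, comparisons = list(words), 0
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items, comparisons

def merge_sort(words):
    """Return (sorted list, number of comparisons) using merge sort."""
    items = list(words)
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid])
    right, c_right = merge_sort(items[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged, comparisons

words = ["pear", "apple", "fig", "kiwi", "date", "plum", "lime", "cherry"]
bubble_result, bubble_count = bubble_sort(words)
merge_result, merge_count = merge_sort(words)
```

A denotational semantics sees only `bubble_result == merge_result`; the comparison counts are one crude trace of the algorithmic difference that the scheme program |--> algorithm |--> denotation is meant to keep.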

And so it seemed to me that Scott semantics should be refined by the introduction of algorithms as the primary semantics values of programs, which then determine their denotations, i.e., by adopting the basic interpretation scheme (50).

5.2. What is an algorithm?

recursive equations, e.g. for Euclidean Algo for gcd

gcd(x; y) = p(x; y) where { p := {lambda}(x){lambda}(y)C(q1(x; y); y; r(x; y));
q1 := {lambda}(x){lambda}(y)rem(x; y);
r := {lambda}(x){lambda}(y)p(y; q2(x; y));
q2 := {lambda}(x){lambda}(y)rem(x; y) }

where the conditional construct
C(u; s; t) = if (u = 0) then s else t
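The recursive equations can be transcribed almost literally into Python. One detail is my own addition: the branches of C are passed as zero-argument functions, because Python evaluates arguments eagerly and would otherwise always run the recursive branch. The sketch assumes y is a positive integer and takes rem as the given.

```python
def rem(x, y):
    """The given: remainder of x divided by y (y must be nonzero)."""
    return x % y

def C(u, s, t):
    """Conditional: if u = 0 then s else t; s and t are delayed as thunks."""
    return s() if u == 0 else t()

def q1(x, y):
    return rem(x, y)

def q2(x, y):
    return rem(x, y)

def r(x, y):
    # Recursive call on (y, rem(x, y)).
    return p(y, q2(x, y))

def p(x, y):
    # If rem(x, y) = 0 the answer is y, otherwise recurse through r.
    return C(q1(x, y), lambda: y, lambda: r(x, y))

def gcd(x, y):
    return p(x, y)
```

For example, gcd(48, 18) unfolds as p(48, 18), p(18, 12), p(12, 6) and returns 6. Note that r mentions p and p mentions r, so this system is genuinely recursive rather than acyclic.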

Notice that these algorithms are always relative to the givens, and so they do not determine "absolutely computable" functions unless the givens are absolutely computable.

intended interpretation of L^{lambda}_ar(K) in §1.4 includes higher-type givens, and the simpler, acyclic recursors sufficed.


---------------

Alonzo Church [1951a], A formulation of the logic of sense and denotation, Structure, method and meaning (P. Henle, H. M. Kallen, and S. K. Langer, editors), Liberal Arts Press, New York, pp. 3-24.
Alonzo Church [1951b], The need for abstract entities, American Academy of Arts and Sciences Proceedings, vol. 80, pp. 100-113, reprinted in Martinich [1990] under the title Intensional Semantics.
Alonzo Church [1962], A remark concerning Quine's paradox about modality, Spanish version in Análisis Filosófico, pp. 25-32, reprinted in English in Salmon and Soames [1988].
Alonzo Church [1973], Outline of a revised formulation of the logic of sense and denotation, part I, Nous, vol. 7, pp. 24-33.
Donald Davidson [1984], Truth and interpretation, Clarendon Press, Oxford.
G. Frege [1952], Translations from the philosophical writings of Gottlob Frege, Blackwell, Oxford, edited by P. Geach and M. Black.
Gottlob Frege [1892], On sense and denotation, Zeitschrift für Philosophie und Philosophische Kritik, vol. 100, translated by Max Black in Frege [1952] and also by Herbert Feigl in Martinich [1990]. I have used "denotation" to render Frege's "Bedeutung," instead of Black's "meaning" or Feigl's "nominatum".
David Kaplan [1978a], Dthat, Syntax and semantics (Peter Cole, editor), vol. 9, Academic Press, New York, reprinted in Martinich [1990].
Saul A. Kripke [1979], A puzzle about belief, Meaning and use (A. Margalit, editor), Reidel, pp. 239-283, reprinted in Salmon and Soames [1988].
Leonard Linsky (editor) [1971], Reference and modality, Oxford University Press.
A. P. Martinich (editor) [1990], The philosophy of language, second ed., Oxford University Press, New York, Oxford.
R. Montague [1970a], English as a formal language, Linguaggi nella Società e nella Tecnica (Milan) (Bruno Visentini et al., editors), Edizioni di Comunità, pp. 189-284, reprinted in Montague [1974].
R. Montague [1970b], Pragmatics and intensional logic, Synthese, vol. 22, pp. 68-94, reprinted in Montague [1974].
R. Montague [1970c], Universal grammar, Theoria, vol. 36, pp. 373-398, reprinted in Montague [1974].
R. Montague [1973], The Proper Treatment of Quantification in Ordinary English, Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics (J. Hintikka et al., editors), D. Reidel Publishing Co, Dordrecht, pp. 221-242, reprinted in Montague [1974].
R. Montague [1974], Formal philosophy, Yale University Press, New Haven and London, Selected papers of Richard Montague, edited by Richmond H. Thomason.
Yiannis N. Moschovakis [1994], Sense and denotation as algorithm and value, Logic colloquium '90 (J. Väänänen and J. Oikkonen, editors), vol. 2, Association for Symbolic Logic, Lecture Notes in Logic, pp. 210-249.
Jamal Ouhalla [1994], Introducing transformational grammar, Arnold and Oxford University Press.
G. Plotkin [1977], LCF considered as a programming language, Theoretical Computer Science, vol. 5, pp. 223-255.
Nathan Salmon and Scott Soames [1988], Propositions and attitudes, Oxford University Press.
D. S. Scott and C. Strachey [1971], Towards a mathematical semantics for computer languages, Proceedings of the symposium on computers and automata (New York) (J. Fox, editor), Polytechnic Institute of Brooklyn Press, pp. 19-46.
J. van Heijenoort [1985], Frege on sense identity, Selected essays, Bibliopolis, Napoli, pp. 65-70.

Thursday, May 28, 2009

reviews: creoles

Contact Linguistics: Bilingual encounters and grammatical outcomes

By Carol Myers-Scotton

Oxford: Oxford University Press, 2002. Pp. 356. paper $45.00. ISBN 0198299532.

Reviewed by Alison Nicolle (BTL, East Africa) and Steve Nicolle (BTL, East Africa)

A creole can have several natural languages contributing to its Matrix language system, and the Embedded language is the superstratal lexifier.

-------

Defining Creole

By John H. McWhorter

Oxford: Oxford University Press, 2005. Pp. 444. paper $49.95. ISBN 0195166698.

Reviewed by Gerry Beimers

SIL International and University of New England (Australia)

ch 1 “official” statement of his Creole Prototype hypothesis. Here he explicates the three traits of the creole prototype, namely,
  1. “few or no inflectional affixes” (p. 12),
  2. “little or no use of tone to distinguish monosyllabic lexical items or to encode morphosyntactic distinctions” (p. 13), and
  3. a lack of noncompositional derivation.

ch 2: four diagnostics of grammatical complexity, namely,
  1. phonemic inventory,
  2. more syntactic rules to be processed,
  3. grammaticalized expressions of fine-grained semantic and pragmatic distinctions, and
  4. inflectional morphology.

ch 3: the developmental relationship between pidgins and creoles. In it he argues against the notion that the path from source language to creole is merely via “syntax-internal” (p. 74) transformation. The argument takes the shape of an examination of six features (which he designates as ornamental—metaphorically speaking) not found in creoles, namely:
  1. ergativity,
  2. inalienable possessive marking,
  3. overt marking of inherent reflexivity,
  4. evidential markers,
  5. grammaticalized referential marking, and
  6. consonant mutation.

ch 5: argues that the superstratist creole genesis model (advanced mainly by Chaudenson and Mufwene) is not supported by the data.

ch 11: English is “significantly less overspecified semantically and less complexified syntactically” (p. 268) compared to its Germanic sisters. His essential thesis, that this is due to a contact-based explanation, accounts for the facts. He outlines his view of overspecification and complexification and then goes on to examine ten features, namely,
  1. reflexivity marking,
  2. external possessor constructions,
  3. grammatical gender marking on the article,
  4. derivational morphology,
  5. directional adverbs,
  6. be with past participles,
  7. passive marking with become,
  8. verb-second word order,
  9. disappearance of thou, and
  10. disappearance of the indefinite pronoun man.

review: reduplication

SIL Review
of
Reduplication: Doubling in morphology

By Sharon Inkelas and Cheryl Zoll

Cambridge Studies in Linguistics 106. Cambridge: Cambridge University Press, 2005. Pp. 276. hardback $90.00. ISBN 0521806496.

Reviewed by Mike Cahill

Ch 4: "go beyond the daughter phonologies to argue there is a layer of phonology (a cophonology) associated with the mother node, that is, the construction as a whole. This is compatible with Kiparsky’s Stratal OT approach (Kiparsky 2000), though not identical."

Has examples from Tagalog in Ch 6. Argues that reduplication is mostly at level of morphology, rather than phonology.

Kiparsky, Paul. 2000. Opacity and cyclicity. The Linguistic Review 17:351-367.

Wednesday, January 21, 2009

For annotations to address NLU, need basic research on language-world mapping

Mark-up Barking Up the Wrong Tree
Annie Zaenen

In her Last Words column in CoLi, Zaenen describes the limitations of annotation applied to NLU, and calls for fundamental research "to understand the mapping between language and the world itself better"

"The interest in machine-learning methods to solve natural-language-understanding problems has led to the use of textual annotation as an important auxiliary technique. Grammar induction based on annotation has been very successful for the Penn Treebank" "The problems with the ‘coreference’ annotation tasks of MUC and the like are well documented and not solved. Kibble and van Deemter (2000), for instance, discuss the difficulties created by the assumption that coreference is an equivalence relation, and hence transitive"

2 difficulties: "The first is inherent in the kind of annotations that are currently needed. The field is moving from information retrieval to language understanding tasks. To understand a linguistic utterance is to map from it to a state of the world, a non-linguistic reality. Language understanding always has a non-linguistic component. In computational settings, unfortunately, most of the time we do not have independent access to this non-linguistic component. This means that language understanding systems have to be more than just language understanding systems: One expects them also to take care of some minimal understanding of the world the language is supposed to describe. But relations between linguistic entities and the world have not been studied in linguistics nor anywhere else in the systematic way that is required to develop reliable annotation schemas: Traditional formal semantics says how meanings are put together and remains silent about semantic primitives. Lexical semantics is very fragmented: One part of it tends to limit its scope to lexical items that exhibit syntactic alternations, whereas another part concentrates on improving traditional lexicography." "Annotation tasks typically involve these ill-understood phenomena. Current practice seems to assume that theoretical understanding can be circumvented and that the pristine intuitions of nearly naïve native speakers can replace analysis of what is going on in these complicated cases. The results that I have looked at suggest that this is wishful thinking."

"The second problem that annotation tasks face is not inherent. It is created by the current funding model. In the name of accountability, current NLP practice is wedded to quantifiable results, short time horizons, and strict financial control. In this setup there is no time for fundamental research. So, when research is necessary, it has to be called by another name. Part of it will be shuffled under the heading ‘annotation’."

"We should recognize that annotations are no substitute for the understanding of a phenomenon. They are an encoding of that understanding. The encoding is different from a rule-based encoding in that it does not require a generative formalization and it allows a more piecemeal approach."

Patrick Blackburn reviews Lambalgen and Hamm, The Proper Treatment of Events

The Proper Treatment of Events by Michiel van Lambalgen and Fritz Hamm, 2005
review

This is an intriguing review.

"It provides a formalization of the notion of event (using a modification of Shanahan’s [1997] version of the Event Calculus, a many-sorted first-order theory), defines a dynamic-style semantics for the system, and discusses how constraint logic programming can be used to cash out its computational content. ... a detailed exploration of the ramifications of a single idea:... to properly understand how temporal expressions in natural language work, we must understand how human beings construct time, and that the cognitive construction of time is best explicated in terms of planning and causality. Planning is the glue that lets human minds integrate past, present, and future, and episodic memory (which Lambalgen and Hamm view as a “generalised capacity for imagining or constructing possible worlds”) is the key to this capacity." I am interested in causality and causal constraints, partly to harness them in a situation-theoretic semantics of verbs. The role of episodic memory may help clarify the role of consciousness.

"they broadly agree with Moschovakis’s (1993) interpretation of the Fregean notion of sense: The sense of an expression is the algorithm that computes its reference....Causality, the key relation between events, is presented in two variants: instantaneous change and continuous change. Moreover, in addition to this general background theory, they also allow for the constructions of “scenarios,” microtheories stating the specific causal relationships holding in a given situation (this machinery underlies their account of lexical meanings)....a theory that is carefully axiomatized. The authors consider various models for their theory, paying particular attention to minimal models, for they make a closed-world assumption in which anything that is not forced to happen does not happen....the authors distance themselves both from DRT (Kamp and Reyle 1993) (because of its reliance on Davidson-style events with predicates corresponding to thematic roles) and from Amsterdam-style dynamic semantics (Groenendijk and Stokhof 1991) (which they view as treating computation implicitly rather than explicitly)." I definitely want to see why they reject DRT, and their ideas on events and thematic roles.
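
To make the closed-world idea concrete for myself, here is a minimal sketch of a toy event calculus in Python. It is not Lambalgen and Hamm's formalization (theirs is a first-order theory run with constraint logic programming); it only illustrates instantaneous change plus the assumption that nothing holds unless some event made it hold. The event names, the fluent name, and the `holds_at` helper are all hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical toy narrative type: (time, event) pairs.
Narrative = List[Tuple[int, str]]

# Hypothetical "scenario" (microtheory): which events start or stop which fluent.
INITIATES = {"switch_on": "light_on"}
TERMINATES = {"switch_off": "light_on"}


def holds_at(fluent: str, t: int, narrative: Narrative) -> bool:
    """Return True if `fluent` holds at time t, given events strictly before t."""
    state = False  # closed world: nothing holds unless an event initiated it
    for time, event in sorted(narrative):
        if time >= t:
            break  # events at or after t cannot affect the state at t
        if INITIATES.get(event) == fluent:
            state = True   # instantaneous change: the fluent starts to hold
        elif TERMINATES.get(event) == fluent:
            state = False  # instantaneous change: the fluent stops holding
    return state


narrative = [(1, "switch_on"), (5, "switch_off")]
for t in (0, 2, 6):
    print(t, holds_at("light_on", t, narrative))
```

A fluent that no event in the narrative ever initiates (say, a made-up `door_open`) comes out false at every time point, which is the closed-world reading in miniature. Continuous change, the second variant mentioned in the review, is not covered by this sketch.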

"Part III of the book (which, at around 160 pages, is by far the longest section) puts this apparatus to work to construct a theory of tense and aspect. Every VP is associated with a default scenario (that is, a microtheory) that determines the Aktionsart of the verb. The word “default” is important: temporal and aspectual operators, and many other linguistic items, may coerce the verb to assume a different Aktionsart...this book treats temporal and aspectual phenomena from a perspective very different from that of current corpus-based work. But it does so systematically and with great precision.... Interested in temporal semantics? Then this is essential reading." Again, this should help come up with a verb semantics, verb morphology construction semantics and also a clausal construction semantics.

Sowa reviews Halliday

The journal Computational Linguistics is now open access. I came across this review of Construing Experience through Meaning: A Language-based Approach to Cognition by M. A. K. Halliday and Christian M. I. M. Matthiessen, where John Sowa describes the ontology they use. They acknowledge a debt to the dyadic semiotics of Saussure, Hjelmslev and Firth, but Sowa suggests they are rediscovering (or not attributing) some of the insights of Peirce. Elements, Figures, Sequences correspond to referential indices, minimal clauses and discourse. “Elements are classified as participant, circumstance, or process. Figures are classified by another triad of relational (being or having), material (doing or happening), and mental (sensing or saying).” Sowa notes the latter “corresponds to Peirce’s fundamental triad of Quality, Reaction, and Representation.”

Sowa cites early Winograd and the USC ISI group (including Bateman), now in Germany, as researchers computationally applying Systemic Functional Linguistics. This reminds me that my original interest in NLP was sparked by Winograd's Language as a Cognitive Process, including his appendix on a natural language specification technique. That must have been 1988 or so, twenty years ago! I need access to a good university library, where I can reread it.

Wednesday, January 7, 2009

statistical NLP with R

One of the areas I still need to study is statistical NLP. I lack even the mathematical background, although I took a probability course for math majors at Cornell a long time ago.

I came across this article about the R programming language and open source software package (community?). It seems some people are applying it to speech processing, with the EMU Speech Database System.

There is even work related to language corpora (book draft) and multi-language annotation (not free :(  tho), which I haven't reviewed yet.

Saturday, January 3, 2009

Languages of Papua New Guinea

Taking data from Ethnologue 15th Edition, there are 820 living languages in PNG, of which I count 88 with 10 thousand or more speakers. The Austronesian ones fall into three subfamilies of Western Oceanic, listed below. The counts of languages with 10 thousand speakers or more, out of all languages in each subfamily, are 8/66 for Meso Melanesian, 4/102 for North New Guinea and 9/62 for Papuan Tip.

The biggest family in PNG is Trans-New Guinea, where 57/564 have over 10k speakers; I have listed only the seven with over 50k speakers. There are also Tok Pisin, English (with 50,000 speakers), and three smaller families that have languages with over 10k speakers: East Papuan 2/36, Sepik-Ramu 3/100, and Torricelli 3/53.
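
As a quick sanity check on the tally above, here is a small Python sketch that adds up the per-family counts quoted in these notes. The dictionary layout and variable names are my own, and counting Tok Pisin and English as one language each is an assumption about how the 88 was reached.

```python
# (languages with 10k+ speakers, total languages) per family, as quoted in the notes.
counts = {
    "Meso Melanesian": (8, 66),
    "North New Guinea": (4, 102),
    "Papuan Tip": (9, 62),
    "Trans-New Guinea": (57, 564),
    "East Papuan": (2, 36),
    "Sepik-Ramu": (3, 100),
    "Torricelli": (3, 53),
}

# Assumption: Tok Pisin and English each count as one additional language.
other_languages = 2

# Languages with 10k+ speakers across the listed families.
family_large = sum(large for large, _total in counts.values())
total_large = family_large + other_languages

print(family_large, total_large)
```

Under that assumption the families contribute 86 and the overall figure comes to 88, which matches the count given at the top of the post.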

Meso Melanesian (66) 

Kuanua [ksd] 61,000 (1991 SIL). East New Britain Province, Rabaul District, Gazelle Peninsula. Alternate names: Tolai, Gunantuna, Tinata Tuna, Tuna, Blanche Bay, New Britain Language.  Dialects: Vunadidir, Rapitok, Raluana, Vanumami, Livuan, Matupit, Kokopo, Kabakada, Nodup, Kininanggunan, Rakunei, Rebar, Watom, Masawa.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai   More information.
Halia [hla] 20,000 (1994 SIL). Bougainville Province, North Bougainville District, northeastern Buka Island. Alternate names: Tasi.  Dialects: Hanahan, Hangan, Touloun (Tulon, Tulun), Selau. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Nehan-North Bougainville, Buka, Halia   More information.
Bola [bnp] 13,746 (2000 census). Population includes 2,253 Harua. West New Britain Province, northeast coast, most of Willaumez Peninsula. Harua is on the east side of Kimbe. Alternate names: Bakovi, Bola-Bakovi.  Dialects: Harua (Karua, Xarua, Garua, Mai), Bola. Harua is a dialect that has developed as a result of a group of people being resettled on an oil palm plantation. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez   More information.
Lihir [lih] 12,571 (2000 census). New Ireland Province, Lihir Island, and 3 smaller islands. Alternate names: Lir.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Tabar   More information.
Nakanai [nak] 13,000 (1981 Wurm and Hattori). West New Britain Province, Hoskins District, northwest coast. 42 villages. Alternate names: Nakonai.  Dialects: Losa (Loso, Auka), Bileki (Lakalai, Muku, Mamuga), Vere (Vele, Tarobi), Ubae (Babata), Maututu.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez   More information.
Tungag [lcm] 12,000 (1990 SIL). New Ireland Province, Lamet District, New Hanover Island, Tingwon and Umbukul Islands. Alternate names: Tungak, Lavongai, Lavangai, Dang.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Lavongai-Nalik   More information.
Ramoaaina [rai] 10,266 (2000 census). East New Britain Province, Kokopo District, Duke of York Islands. Alternate names: Duke of York, Ramuaina.  Dialects: Makada, Molot (Main Island), Aalawa (Aalawaa, Alawa, Mioko, Ulu, South Islands). Makada dialect is very different. Possibly not intelligible to speakers of other dialects.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai   More information.
Uneapa [bbn] 10,000 (1998 SIL). West New Britain Province, Talasea District, Unea (Bali) Island off the northwest coast. Alternate names: Bali.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Bali-Vitu   More information.

North New Guinea (102)

Adzera [azr] 20,675 (1988 Holzknecht). Population includes 367 Ngariawan (1978 McElhanon), 497 Sarasira (1988 Holzknecht), 990 Sukurum (1990). Morobe Province, Markham Valley, Kaiapit District, Leron River. Alternate names: Azera, Atzera, Acira.  Dialects: Yarus, Amari, Azera, Ngarowapum, Tsumanggorun, Guruf-Ngariawang (Ngariawan), Sarasira (Sirasira), Sukurum. The dialects form a cluster.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, Markham, Upper, Adzera   More information.
Takia [tbc] 19,619 (2003 SIL). Southern half of Karkar Island, Bagabag Island, and coastal villages Megiar and Serang, Madang Province, Madang District. Dialects: Megiar, Serang.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Ngero-Vitiaz, Vitiaz, Bel, Nuclear Bel, Northern   More information.
Buang, Mapos [bzh] 10,484 (2000). 30% monolingual. Morobe Province, middle Snake River area, Mumeng District. 10 villages. Alternate names: Mapos, Central Buang.  Dialects: Wagau, Mambump, Buweyeu, Wins, Chimbuluk, Papakene, Mapos. Lexical similarity 61% between Mambump and Mangga.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, South, Hote-Buang, Buang   More information.
Bugawac [buk] 9,694 (1978 McElhanon). 40% monolingual. Morobe Province, coast of Huon Gulf. Alternate names: Bukawa, Bukaua, Bukawac, Kawa, Kawac, Yom Gawac.  Dialects: Close to Yabem. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, North   More information.

Papuan Tip (62)

Kilivila [kij] 20,000 (2000 Tryon). 60% monolingual. Milne Bay Province, Trobriand Islands. Alternate names: Kiriwina.  Dialects: Kitava, Vakuta, Sinaketa. Various dialects. Lexical similarity 68% with Muyuw. Kitava Island has 80% lexical similarity.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Kilivila   More information.
Tawala [tbo] 20,000 (2000 census). Milne Bay Province, Alotau District, from Awaiama to East Cape, north and south shores of Milne Bay, Sideia and Basilaki islands. Alternate names: Tawara, Tavara. Dialects: Awayama (Awaiama, Awalama), Huhuna, Kehelala (Keherara, East Cape), Lelehudi, Diwinai (Divinai), Labe (Rabe), Yaleba (Wagawaga, Gwawili, Gwavili, Ealeba), Bohilai (Bohira'i, Basilaki), Sideya (Sideia).  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Are-Taupota, Taupota   More information.
Keapara [khz] 19,400 (2000 D. Tryon). Central Province, coast from east of Hood Peninsula to Lalaura west of Cape Rodney. 3 villages. Alternate names: Keopara, Kerepunu.  Dialects: Babaga, Kalo, Keapara (Keopara), Aroma (Arona, Aloma, Galoma), Maopa, Wanigela, Kapari, Lalaura. Dialect continuum to Hula.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Mekeo [mek] 19,000 (2003 SIL). Central Province, Kaiyuku District, inland, bounded on the west by the Waima, on the east by the Kuni and Kunimaipa. Extends into Gulf Province. Alternate names: Mekeo-Kovio.  Dialects: East Mekeo, West Mekeo, North Mekeo, Northwest Mekeo (Kovio). Kovio is a peripheral dialect. The four dialects are mutually unintelligible to each other's speakers, except for North and West Mekeo, but most Mekeo are reported to have familiarity with neighboring dialects. Kovio, however, is not contiguous to the other dialects. Kovio has 81% lexical similarity with West Mekeo and North Mekeo, and 79% with East Mekeo. West and East Mekeo have 87% lexical similarity. North Mekeo has 99% lexical similarity with West Mekeo and 87% with East Mekeo. Mekeo has 41% with Waima.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear   More information.
Misima-Paneati [mpx] 18,000 (2002 SIL). 4,000 monolinguals. Milne Bay Province, Misima District, Misima Island, Panaieti, and all the islands of the Calvados Chain to (not including) Panawina, Alcester, Ole, and Tewatewan Islands, and Bowagis on Woodlark Island. 32 villages. Alternate names: Panaieti, Panaeati, Paneyate, Paneate, Panayeti.  Dialects: Nasikwabw (Tokunu), Tewatewa. Lexical similarity 33% with Nimowa and Dobu (closest).  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Misima   More information.
Sinaugoro [snc] 15,000 (1991 SIL). Central Province, Rigo District, south of Kwikila. Alternate names: Sinagoro.  Dialects: Ikolu, Balawaia, Saroa, Babagarupu, Kwaibida, Taboro, Kwaibo, Alepa, Omene, Tubulamo, Ikega, Boku, Buaga, Wiga, Vora, Kubuli, Oruone. Boku dialect may be most central. Lexical similarity 70% to 75% with Kalo (closest), 65% to 70% with Hula.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Waima [rro] 15,000 (2000 census). Central Province, Bereina District, near Kairuku, shores of Hall Sound, between Yule Island and mainland, 65 miles northwest of Port Moresby. Alternate names: Roro. Dialects: Waima, Paitana, Roro. Roro and Paitana populations are smaller and scattered. Lexical similarity 45% with Kuni (closest), 99% among all three dialects.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear   More information.
Motu [meu] 14,000 (1981 Wurm and Hattori). Central Province, in and around Port Moresby, villages along the coast from Manumanu, Galley Reach, to GabaGaba (Kapakapa). Alternate names: True Motu, Pure Motu.  Dialects: Western Motu, Eastern Motu.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Dobu [dob] 10,000 (1998 SIL). 60% monolingual. Milne Bay Province, Esa'ala District, Sanaroa, Dobu, and parts of Fergusson and Normanby islands. 500 villages. Dialects: Galubwa, Sanaroa, Ubuia, Central Dobu, Loboda (Roboda, Dawada-Siausi). Lexical similarity 56% with Morima (closest). Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Dobu-Duau   More information.

Trans-New Guinea (564)

Enga [enq] 164,750 (1981 Wurm and Hattori). Population includes 12,000 in Sau (1990 UBS). Enga Province. The Maramuni are nomadic, and are in the lower reaches of the central range. Alternate names: Caga, Tsaga, Tchaga.  Dialects: Kandepe, Layapo, Tayato, Mae (Mai, Wabag), Maramuni (Malamuni), Kaina, Kapona, Sau (Sau Enga, Wapi), Yandapo, Lapalama 1, Lapalama 2, Laiagam, Sari. Mae is the standard dialect; all understand it. Layapo is between Mae and Kyaka. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Enga   More information.
Melpa [med] 130,000 (1991 SIL). Western Highlands Province, Hagen District. Alternate names: Medlpa, Hagen.  Dialects: Tembagla. Only slight dialect differences.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Hagen   More information.
Kuman [kue] 80,000 (1994 SIL). 10,000 monolinguals. Simbu Province, northern third, overlapping into Minj Subprovince of Western Highlands Province. Alternate names: Chimbu, Simbu.  Dialects: Kuman, Nagane (Genagane, Genogane), Yongomugi.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.
Huli [hui] 70,000 (1991 UBS). Southern Highlands Province around Tari, and southern fringe of Enga Province. Alternate names: Huli-Hulidana, Huri.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Huli   More information.
Kamano [kbq] 63,170 (2000 census). Eastern Highlands Province, Kainantu and Henganofi districts. Alternate names: Kamano-Kafe.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, East-Central, Kamano-Yagaria   More information.
Golin [gvf] 51,105 (1981 Wurm and Hattori). Simbu Province, Gumine District. Alternate names: Gollum, Gumine.  Dialects: Yuri, Kia (Kiari), Golin, Keri, Marigl. Close to Dom.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.
Sinasina [sst] 50,079 (1981 Wurm and Hattori). Simbu Province. Dialects: Tabare, Guna. Close to Dom and Golin.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.

Other Families

Tok Pisin [tpi] 121,000 (2003 SIL). 50,000 monolinguals. Mainly in the northern half of the country, and now well established in Port Moresby, and into other regions. Alternate names: Pisin, Pidgin, Neomelanesian, New Guinea Pidgin English, Melanesian English.  Dialects: There are dialect differences between lowlands, highlands, and the islands. The highlands lexicon has more English influence (J. Holm).  Classification: Creole, English based, Pacific   More information.
Terei [buo] 26,500 (2003 SIL). Southern Bougainville Province, Buin District. Alternate names: Buin, Telei, Rugara.  Dialects: Closest to Uisai.  Classification: East Papuan, Bougainville, East, Buin   More information.
Naasioi [nas] 10,000 (1990 SIL). Bougainville Province, Kieta District, central mountains and southeast coast. Alternate names: Nasioi, Kieta, Kieta Talk, Aunge.  Dialects: Naasioi, Kongara, Orami (Guava), Pakia-Sideronsi.  Classification: East Papuan, Bougainville, East, Nasioi   More information.
Ambulas [abt] 44,000 (1991 SIL). Population includes 27,000 in Wosera (1991 SIL), 9,000 in Maprik (1991 SIL), 8,000 in Wingei (1991 SIL). East Sepik Province, Maprik District. Alternate names: Abulas, Abelam.  Dialects: Maprik, Wingei, Wosera-Kamu, Wosera-Mamu.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu   More information.
Boikin [bzf] 31,328 (2003 SIL). East Sepik Province, Yangoru District. Alternate names: Boiken, Nucum, Yangoru, Yengoru.  Dialects: West Boikin, Central Boikin, East Boikin, Munji, Haripmor, Kwusaun, Kunai, Island Boikin.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu   More information.
Kwanga [kwj] 10,000 (2001 SIL). East Sepik Province, extending beyond the western boundary of Maprik District; Makru-Klaplei Division, Nuku District; Sandaun Province, east of Mehek. 40 villages. Alternate names: Kawanga, Gawanga.  Dialects: Apos, Bongos (Bongomamsi, Bongomaise, Nambi), Tau (Kubiwat), Wasambu, Yubanakor (Daina). A dialect cluster of 5 subdialects, 2 main dialects.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Nukuma   More information.
Bukiyip [ape] 16,233 (2003 SIL). East Sepik Province, west Yangoru District, Torricelli Mountains. Alternate names: Bukiyúp, Mountain Arapesh.  Dialects: Coastal Arapesh, Bukiyip (Mountain Arapesh). Lexical similarity 60% with Mufian.  Classification: Torricelli, Kombio-Arapesh, Arapesh   More information.
Olo [ong] 13,667 (2003 SIL). Sandaun Province, Lumi District. 55 villages. Alternate names: Orlei. Dialects: Payi (Pay, North Olo), Wapi (Wape, South Olo). Related to Yis, Yau, Ningil, Valman. Classification: Torricelli, Wapei-Palei, Wapei   More information.
Mufian [aoj] 11,000 (1998 SIL). Population includes 6,000 Filifita (1999 SIL). East Sepik Province, Maprik District, Torricelli Mountains, west of Maprik. 36 villages. Alternate names: Southern Arapesh, Muhiang, Muhian.  Dialects: Supari, Balif, Filifita (Ilahita), Iwam-Nagalemb, Nagipaem. Classification: Torricelli, Kombio-Arapesh, Arapesh   More information.

ethnographic film

There is such a thing as visual anthropology (Wikipedia) (see the list of influential films), and an ethnographic film unit at ANU (there is a web page on a film about Roti). It occurs to me that linguistic documentation should increasingly use video, and not just for sign language. A good way to embed this in the local culture is to produce short films for elementary schools in local languages. These can range from traditional stories to Sesame Street-like (or Batibot-like) educational programs. This would be particularly helpful in language maintenance programs, where the teacher is not as fluent as resource speakers like elders, who could be captured on film.