Wednesday, January 21, 2009

For annotations to address NLU, need basic research on language-world mapping

Mark-up Barking Up the Wrong Tree
Annie Zaenen

In her Last Words column in Computational Linguistics (CoLi), Zaenen describes the limitations of annotation applied to NLU and calls for fundamental research "to understand the mapping between language and the world itself better".

"The interest in machine-learning methods to solve natural-language-understanding problems has led to the use of textual annotation as an important auxiliary technique. Grammar induction based on annotation has been very successful for the Penn Treebank" "The problems with the ‘coreference’ annotation tasks of MUC and the like are well documented and not solved. Kibble and van Deemter (2000), for instance, discuss the difficulties created by the assumption that coreference is an equivalence relation, and hence transitive"

Two difficulties: "The first is inherent in the kind of annotations that are currently needed. The field is moving from information retrieval to language understanding tasks. To understand a linguistic utterance is to map from it to a state of the world, a non-linguistic reality. Language understanding always has a non-linguistic component. In computational settings, unfortunately, most of the time we do not have independent access to this non-linguistic component. This means that language understanding systems have to be more than just language understanding systems: One expects them also to take care of some minimal understanding of the world the language is supposed to describe. But relations between linguistic entities and the world have not been studied in linguistics nor anywhere else in the systematic way that is required to develop reliable annotation schemas: Traditional formal semantics says how meanings are put together and remains silent about semantic primitives. Lexical semantics is very fragmented: One part of it tends to limit its scope to lexical items that exhibit syntactic alternations, whereas another part concentrates on improving traditional lexicography." "Annotation tasks typically involve these ill-understood phenomena. Current practice seems to assume that theoretical understanding can be circumvented and that the pristine intuitions of nearly naïve native speakers can replace analysis of what is going on in these complicated cases. The results that I have looked at suggest that this is wishful thinking."

"The second problem that annotation tasks face is not inherent. It is created by the current funding model. In the name of accountability, current NLP practice is wedded to quantifiable results, short time horizons, and strict financial control. In this setup there is no time for fundamental research. So, when research is necessary, it has to be called by another name. Part of it will be shuffled under the heading ‘annotation’."

"We should recognize that annotations are no substitute for the understanding of a phenomenon. They are an encoding of that understanding. The encoding is different from a rule-based encoding in that it does not require a generative formalization and it allows a more piecemeal approach."

Patrick Blackburn reviews Lambalgen and Hamm, The Proper Treatment of Events

The Proper Treatment of Events by Michiel van Lambalgen and Fritz Hamm, 2005
review

This is an intriguing review.

"It provides a formalization of the notion of event (using a modification of Shanahan’s [1997] version of the Event Calculus, a many-sorted first-order theory), defines a dynamic-style semantics for the system, and discusses how constraint logic programming can be used to cash out its computational content. ... a detailed exploration of the ramifications of a single idea:... to properly understand how temporal expressions in natural language work, we must understand how human beings construct time, and that the cognitive construction of time is best explicated in terms of planning and causality. Planning is the glue that lets human minds integrate past, present, and future, and episodic memory (which Lambalgen and Hamm view as a “generalised capacity for imagining or constructing possible worlds”) is the key to this capacity." I am interested in causality and causal constraints, partly to harness in a situation theoretic semantics of verbs. The role of episodic memory may help clarify the role of consciousness.

"they broadly agree with Moschovakis’s (1993) interpretation of the Fregean notion of sense: The sense of an expression is the algorithm that computes its reference....Causality, the key relation between events, is presented in two variants: instantaneous change and continuous change. Moreover, in addition to this general background theory, they also allow for the constructions of “scenarios,” microtheories stating the specific causal relationships holding in a given situation (this machinery underlies their account of lexical meanings)....a theory that is carefully axiomatized. The authors consider various models for their theory, paying particular attention to minimal models, for they make a closed-world assumption in which anything that is not forced to happen does not happen....the authors distance themselves both from DRT (Kamp and Reyle 1993) (because of its reliance on Davidson-style events with predicates corresponding to thematic roles) and from Amsterdam-style dynamic semantics (Groenendijk and Stokhof 1991) (which they view as treating computation implicitly rather than explicitly)." I definitely want to see why they reject DRT, and their ideas on events and thematic roles.

"Part III of the book (which, at around 160 pages, is by far the longest section) puts this apparatus to work to construct a theory of tense and aspect. Every VP is associated with a default scenario (that is, a microtheory) that determines the Aktionsart of the verb. The word “default” is important: temporal and aspectual operators, and many other linguistic items, may coerce the verb to assume a different Aktionsart...this book treats temporal and aspectual phenomena from a perspective very different from that of current corpus-based work. But it does so systematically and with great precision.... Interested in temporal semantics? Then this is essential reading." Again, this should help in coming up with a verb semantics, a semantics of verb morphology constructions, and a clausal construction semantics.

Sowa reviews Halliday

Computational Linguistics, the journal, is now open access. I came across this review of Construing Experience through Meaning: A Language-based Approach to Cognition by M. A. K. Halliday and Christian M. I. M. Matthiessen, in which John Sowa describes the ontology they use. They acknowledge a debt to the dyadic semiotics of Saussure, Hjelmslev and Firth, but Sowa suggests they are rediscovering (or not attributing) some of the insights of Peirce. Elements, Figures and Sequences correspond to referential indices, minimal clauses and discourse. “Elements are classified as participant, circumstance, or process. Figures are classified by another triad of relational (being or having), material (doing or happening), and mental (sensing or saying).” Sowa notes the latter “corresponds to Peirce’s fundamental triad of Quality, Reaction, and Representation.”

Sowa cites early Winograd and the USC ISI group (including Bateman), now in Germany, as researchers computationally applying Systemic Functional Linguistics. This reminds me that my original interest in NLP was sparked by Winograd's Language as a Cognitive Process, including his appendix on a natural language specification technique. That must have been 1988 or so, twenty years ago! I need access to a good university library, where I can reread it.

Wednesday, January 7, 2009

statistical NLP with R

One of the areas I still need to study is statistical NLP. I lack even the mathematical background, although I took a probability course for Math majors at Cornell a long time ago.

I came across this article about the R programming language and open source software package (community?). It seems there are some people applying it to speech processing, with the EMU Speech Database System.

There is even work related to language corpora (book draft) and multi-language annotation (not free :(  tho), which I haven't reviewed yet.

Saturday, January 3, 2009

Languages of Papua New Guinea

Taking data from Ethnologue 15th Edition, there are 820 living languages in PNG, of which I count 88 to have 10 thousand or more speakers. The Austronesian ones are divided into three subfamilies of Western Oceanic, listed below. So languages with 10 thousand speakers or more in PNG are 8/66 for Meso Melanesian, 4/102 for North New Guinea and 9/62 for Papuan Tip subfamily.

The biggest family in PNG is Trans-New Guinea, in which 57/564 have over 10k speakers; I have listed only the seven with over 50k speakers. There are also Tok Pisin, English (with 50,000 speakers), and three smaller families that include languages with over 10k speakers: East Papuan 2/36, Sepik-Ramu 3/100, and Torricelli 3/53.
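
The counts above can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: the per-family figures are the ones quoted in this post, and treating Tok Pisin and English as the two languages counted outside the listed families is an assumption made here to reconcile the total of 88.

```python
# (languages with >= 10k speakers, total languages in the family),
# using the figures quoted in this post from Ethnologue 15th Edition.
families = {
    "Meso Melanesian": (8, 66),
    "North New Guinea": (4, 102),
    "Papuan Tip": (9, 62),
    "Trans-New Guinea": (57, 564),
    "East Papuan": (2, 36),
    "Sepik-Ramu": (3, 100),
    "Torricelli": (3, 53),
}

# Assumption: these two are counted individually, outside the families above.
standalone = ["Tok Pisin", "English"]

total = sum(count for count, _ in families.values()) + len(standalone)
print(total)

# Share of each family's languages that reach the 10k threshold.
for name, (count, size) in families.items():
    print(f"{name}: {count}/{size} = {count / size:.1%}")
```

Under that assumption the family counts and the two standalone languages add up to the 88 given above.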

Meso Melanesian (66) 

Kuanua [ksd] 61,000 (1991 SIL). East New Britain Province, Rabaul District, Gazelle Peninsula. Alternate names: Tolai, Gunantuna, Tinata Tuna, Tuna, Blanche Bay, New Britain Language.  Dialects: Vunadidir, Rapitok, Raluana, Vanumami, Livuan, Matupit, Kokopo, Kabakada, Nodup, Kininanggunan, Rakunei, Rebar, Watom, Masawa.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai   More information.
Halia [hla] 20,000 (1994 SIL). Bougainville Province, North Bougainville District, northeastern Buka Island. Alternate names: Tasi.  Dialects: Hanahan, Hangan, Touloun (Tulon, Tulun), Selau. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Nehan-North Bougainville, Buka, Halia   More information.
Bola [bnp] 13,746 (2000 census). Population includes 2,253 Harua. West New Britain Province, northeast coast, most of Willaumez Peninsula. Harua is on the east side of Kimbe. Alternate names: Bakovi, Bola-Bakovi.  Dialects: Harua (Karua, Xarua, Garua, Mai), Bola. Harua is a dialect that has developed as a result of a group of people being resettled on an oil palm plantation. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez   More information.
Lihir [lih] 12,571 (2000 census). New Ireland Province, Lihir Island, and 3 smaller islands. Alternate names: Lir.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Tabar   More information.
Nakanai [nak] 13,000 (1981 Wurm and Hattori). West New Britain Province, Hoskins District, northwest coast. 42 villages. Alternate names: Nakonai.  Dialects: Losa (Loso, Auka), Bileki (Lakalai, Muku, Mamuga), Vere (Vele, Tarobi), Ubae (Babata), Maututu.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez   More information.
Tungag [lcm] 12,000 (1990 SIL). New Ireland Province, Lamet District, New Hanover Island, Tingwon and Umbukul Islands. Alternate names: Tungak, Lavongai, Lavangai, Dang.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Lavongai-Nalik   More information.
Ramoaaina [rai] 10,266 (2000 census). East New Britain Province, Kokopo District, Duke of York Islands. Alternate names: Duke of York, Ramuaina.  Dialects: Makada, Molot (Main Island), Aalawa (Aalawaa, Alawa, Mioko, Ulu, South Islands). Makada dialect is very different. Possibly not intelligible to speakers of other dialects.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai   More information.
Uneapa [bbn] 10,000 (1998 SIL). West New Britain Province, Talasea District, Unea (Bali) Island off the northwest coast. Alternate names: Bali.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Bali-Vitu   More information.

North New Guinea (102)

Adzera [azr] 20,675 (1988 Holzknecht). Population includes 367 Ngariawan (1978 McElhanon), 497 Sarasira (1988 Holzknecht), 990 Sukurum (1990). Morobe Province, Markham Valley, Kaiapit District, Leron River. Alternate names: Azera, Atzera, Acira.  Dialects: Yarus, Amari, Azera, Ngarowapum, Tsumanggorun, Guruf-Ngariawang (Ngariawan), Sarasira (Sirasira), Sukurum. The dialects form a cluster.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, Markham, Upper, Adzera   More information.
Takia [tbc] 19,619 (2003 SIL). Southern half of Karkar Island, Bagabag Island, and coastal villages Megiar and Serang, Madang Province, Madang District. Dialects: Megiar, Serang.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Ngero-Vitiaz, Vitiaz, Bel, Nuclear Bel, Northern   More information.
Buang, Mapos [bzh] 10,484 (2000). 30% monolingual. Morobe Province, middle Snake River area, Mumeng District. 10 villages. Alternate names: Mapos, Central Buang.  Dialects: Wagau, Mambump, Buweyeu, Wins, Chimbuluk, Papakene, Mapos. Lexical similarity 61% between Mambump and Mangga.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, South, Hote-Buang, Buang   More information.
Bugawac [buk] 9,694 (1978 McElhanon). 40% monolingual. Morobe Province, coast of Huon Gulf. Alternate names: Bukawa, Bukaua, Bukawac, Kawa, Kawac, Yom Gawac.  Dialects: Close to Yabem. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, North   More information.

Papuan Tip (62)

Kilivila [kij] 20,000 (2000 Tryon). 60% monolingual. Milne Bay Province, Trobriand Islands. Alternate names: Kiriwina.  Dialects: Kitava, Vakuta, Sinaketa. Various dialects. Lexical similarity 68% with Muyuw. Kitava Island has 80% lexical similarity.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Kilivila   More information.
Tawala [tbo] 20,000 (2000 census). Milne Bay Province, Alotau District, from Awaiama to East Cape, north and south shores of Milne Bay, Sideia and Basilaki islands. Alternate names: Tawara, Tavara. Dialects: Awayama (Awaiama, Awalama), Huhuna, Kehelala (Keherara, East Cape), Lelehudi, Diwinai (Divinai), Labe (Rabe), Yaleba (Wagawaga, Gwawili, Gwavili, Ealeba), Bohilai (Bohira'i, Basilaki), Sideya (Sideia).  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Are-Taupota, Taupota   More information.
Keapara [khz] 19,400 (2000 D. Tryon). Central Province, coast from east of Hood Peninsula to Lalaura west of Cape Rodney. 3 villages. Alternate names: Keopara, Kerepunu.  Dialects: Babaga, Kalo, Keapara (Keopara), Aroma (Arona, Aloma, Galoma), Maopa, Wanigela, Kapari, Lalaura. Dialect continuum to Hula.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Mekeo [mek] 19,000 (2003 SIL). Central Province, Kaiyuku District, inland, bounded on the west by the Waima, on the east by the Kuni and Kunimaipa. Extends into Gulf Province. Alternate names: Mekeo-Kovio.  Dialects: East Mekeo, West Mekeo, North Mekeo, Northwest Mekeo (Kovio). Kovio is a peripheral dialect. The four dialects are mutually unintelligible to each other's speakers, except for North and West Mekeo, but most Mekeo are reported to have familiarity with neighboring dialects. Kovio, however, is not contiguous to the other dialects. Kovio has 81% lexical similarity with West Mekeo and North Mekeo, and 79% with East Mekeo. West and East Mekeo have 87% lexical similarity. North Mekeo has 99% lexical similarity with West Mekeo and 87% with East Mekeo. Mekeo has 41% with Waima.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear   More information.
Misima-Paneati [mpx] 18,000 (2002 SIL). 4,000 monolinguals. Milne Bay Province, Misima District, Misima Island, Panaieti, and all the islands of the Calvados Chain to (not including) Panawina, Alcester, Ole, and Tewatewan Islands, and Bowagis on Woodlark Island. 32 villages. Alternate names: Panaieti, Panaeati, Paneyate, Paneate, Panayeti.  Dialects: Nasikwabw (Tokunu), Tewatewa. Lexical similarity 33% with Nimowa and Dobu (closest).  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Misima   More information.
Sinaugoro [snc] 15,000 (1991 SIL). Central Province, Rigo District, south of Kwikila. Alternate names: Sinagoro.  Dialects: Ikolu, Balawaia, Saroa, Babagarupu, Kwaibida, Taboro, Kwaibo, Alepa, Omene, Tubulamo, Ikega, Boku, Buaga, Wiga, Vora, Kubuli, Oruone. Boku dialect may be most central. Lexical similarity 70% to 75% with Kalo (closest), 65% to 70% with Hula.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Waima [rro] 15,000 (2000 census). Central Province, Bereina District, near Kairuku, shores of Hall Sound, between Yule Island and mainland, 65 miles northwest of Port Moresby. Alternate names: Roro. Dialects: Waima, Paitana, Roro. Roro and Paitana populations are smaller and scattered. Lexical similarity 45% with Kuni (closest), 99% among all three dialects.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear   More information.
Motu [meu] 14,000 (1981 Wurm and Hattori). Central Province, in and around Port Moresby, villages along the coast from Manumanu, Galley Reach, to GabaGaba (Kapakapa). Alternate names: True Motu, Pure Motu.  Dialects: Western Motu, Eastern Motu.  Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara   More information.
Dobu [dob] 10,000 (1998 SIL). 60% monolingual. Milne Bay Province, Esa'ala District, Sanaroa, Dobu, and parts of Fergusson and Normanby islands. 500 villages. Dialects: Galubwa, Sanaroa, Ubuia, Central Dobu, Loboda (Roboda, Dawada-Siausi). Lexical similarity 56% with Morima (closest). Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Dobu-Duau   More information.

Trans-New Guinea (564)

Enga [enq] 164,750 (1981 Wurm and Hattori). Population includes 12,000 in Sau (1990 UBS). Enga Province. The Maramuni are nomadic, and are in the lower reaches of the central range. Alternate names: Caga, Tsaga, Tchaga.  Dialects: Kandepe, Layapo, Tayato, Mae (Mai, Wabag), Maramuni (Malamuni), Kaina, Kapona, Sau (Sau Enga, Wapi), Yandapo, Lapalama 1, Lapalama 2, Laiagam, Sari. Mae is the standard dialect; all understand it. Layapo is between Mae and Kyaka. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Enga   More information.
Melpa [med] 130,000 (1991 SIL). Western Highlands Province, Hagen District. Alternate names: Medlpa, Hagen.  Dialects: Tembagla. Only slight dialect differences.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Hagen   More information.
Kuman [kue] 80,000 (1994 SIL). 10,000 monolinguals. Simbu Province, northern third, overlapping into Minj Subprovince of Western Highlands Province. Alternate names: Chimbu, Simbu.  Dialects: Kuman, Nagane (Genagane, Genogane), Yongomugi.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.
Huli [hui] 70,000 (1991 UBS). Southern Highlands Province around Tari, and southern fringe of Enga Province. Alternate names: Huli-Hulidana, Huri.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Huli   More information.
Kamano [kbq] 63,170 (2000 census). Eastern Highlands Province, Kainantu and Henganofi districts. Alternate names: Kamano-Kafe.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, East-Central, Kamano-Yagaria   More information.
Golin [gvf] 51,105 (1981 Wurm and Hattori). Simbu Province, Gumine District. Alternate names: Gollum, Gumine.  Dialects: Yuri, Kia (Kiari), Golin, Keri, Marigl. Close to Dom.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.
Sinasina [sst] 50,079 (1981 Wurm and Hattori). Simbu Province. Dialects: Tabare, Guna. Close to Dom and Golin.  Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu   More information.

Other Families

Tok Pisin [tpi] 121,000 (2003 SIL). 50,000 monolinguals. Mainly in the northern half of the country, and now well established in Port Moresby, and into other regions. Alternate names: Pisin, Pidgin, Neomelanesian, New Guinea Pidgin English, Melanesian English.  Dialects: There are dialect differences between lowlands, highlands, and the islands. The highlands lexicon has more English influence (J. Holm).  Classification: Creole, English based, Pacific   More information.
Terei [buo] 26,500 (2003 SIL). Southern Bougainville Province, Buin District. Alternate names: Buin, Telei, Rugara.  Dialects: Closest to Uisai.  Classification: East Papuan, Bougainville, East, Buin   More information.
Naasioi [nas] 10,000 (1990 SIL). Bougainville Province, Kieta District, central mountains and southeast coast. Alternate names: Nasioi, Kieta, Kieta Talk, Aunge.  Dialects: Naasioi, Kongara, Orami (Guava), Pakia-Sideronsi.  Classification: East Papuan, Bougainville, East, Nasioi   More information.
Ambulas [abt] 44,000 (1991 SIL). Population includes 27,000 in Wosera (1991 SIL), 9,000 in Maprik (1991 SIL), 8,000 in Wingei (1991 SIL). East Sepik Province, Maprik District. Alternate names: Abulas, Abelam.  Dialects: Maprik, Wingei, Wosera-Kamu, Wosera-Mamu.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu   More information.
Boikin [bzf] 31,328 (2003 SIL). East Sepik Province, Yangoru District. Alternate names: Boiken, Nucum, Yangoru, Yengoru.  Dialects: West Boikin, Central Boikin, East Boikin, Munji, Haripmor, Kwusaun, Kunai, Island Boikin.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu   More information.
Kwanga [kwj] 10,000 (2001 SIL). East Sepik Province, extending beyond the western boundary of Maprik District; Makru-Klaplei Division, Nuku District; Saundaun Province, east of Mehek. 40 villages. Alternate names: Kawanga, Gawanga.  Dialects: Apos, Bongos (Bongomamsi, Bongomaise, Nambi), Tau (Kubiwat), Wasambu, Yubanakor (Daina). A dialect cluster of 5 subdialects, 2 main dialects.  Classification: Sepik-Ramu, Sepik, Middle Sepik, Nukuma   More information.
Bukiyip [ape] 16,233 (2003 SIL). East Sepik Province, west Yangoru District, Torricelli Mountains. Alternate names: Bukiyúp, Mountain Arapesh.  Dialects: Coastal Arapesh, Bukiyip (Mountain Arapesh). Lexical similarity 60% with Mufian.  Classification: Torricelli, Kombio-Arapesh, Arapesh   More information.
Olo [ong] 13,667 (2003 SIL). Sandaun Province, Lumi District. 55 villages. Alternate names: Orlei. Dialects: Payi (Pay, North Olo), Wapi (Wape, South Olo). Related to Yis, Yau, Ningil, Valman. Classification: Torricelli, Wapei-Palei, Wapei   More information.
Mufian [aoj] 11,000 (1998 SIL). Population includes 6,000 Filifita (1999 SIL). East Sepik Province, Maprik District, Torricelli Mountains, west of Maprik. 36 villages. Alternate names: Southern Arapesh, Muhiang, Muhian.  Dialects: Supari, Balif, Filifita (Ilahita), Iwam-Nagalemb, Nagipaem. Classification: Torricelli, Kombio-Arapesh, Arapesh   More information.
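
If I ever want to re-count these entries automatically, something like the following might do. It is a rough sketch that assumes every entry starts with the pattern "Name [code] population (source)." as in the lines pasted above; the function name parse_entry and the 10,000 threshold constant are my own, not anything Ethnologue provides.

```python
import re
from typing import Optional

# Assumed entry shape: "Name [iso] 12,345 (source). rest of description..."
ENTRY_RE = re.compile(
    r"^(?P<name>.+?) \[(?P<code>[a-z]{3})\] "
    r"(?P<population>[\d,]+) \((?P<source>[^)]*)\)"
)

THRESHOLD = 10_000  # the cut-off used in this post


def parse_entry(line: str) -> Optional[dict]:
    """Return name, code, population and source, or None if the line is not an entry."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "code": match.group("code"),
        "population": int(match.group("population").replace(",", "")),
        "source": match.group("source"),
    }


# Three entry openings copied from the list above.
samples = [
    "Kuanua [ksd] 61,000 (1991 SIL). East New Britain Province",
    "Buang, Mapos [bzh] 10,484 (2000). 30% monolingual.",
    "Bugawac [buk] 9,694 (1978 McElhanon). 40% monolingual.",
]

for line in samples:
    entry = parse_entry(line)
    flag = "at or above" if entry["population"] >= THRESHOLD else "below"
    print(entry["name"], entry["code"], entry["population"], flag, "threshold")
```

Run on these samples, it also shows that the Bugawac figure of 9,694 falls just under the 10 thousand cut-off even though the entry is included in the list.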

ethnographic film

There is such a thing as visual anthropology (Wikipedia) (see the list of influential films), and an ethnographic film unit at ANU (there is a web page on a film about Roti). It occurs to me that linguistic documentation should increasingly use video, and not just for sign language. A good way to embed this in the local culture is to produce short films for elementary school in local languages. This can range from traditional stories, to Sesame Street (or Batibot) like educational programs. This would be particularly helpful in language maintenance programs, where the teacher is not as fluent as resource speakers like elders, who could be captured on film.