tag:blogger.com,1999:blog-9284987146360416862024-02-07T15:37:12.463-08:00Semantical: reviews on meaning & representationAnonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.comBlogger75125tag:blogger.com,1999:blog-928498714636041686.post-9879034495747747472010-09-21T00:39:00.000-07:002010-09-21T02:17:46.033-07:00An operational definition of "meaning" (in a formal language)<span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; "><span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "><h2 style="margin-top: 0.2em; "><span class="Apple-style-span" style="font-size: x-large;">Working notion of </span><em><span class="Apple-style-span" style="font-size: x-large;">meaning</span></em></h2></span><blockquote><div class="Real-P" style="margin-top: 0.6em; margin-bottom: 0.6em; "><span class="Apple-style-span" style="font-size: medium;">... we shall accept that </span><i><span class="Apple-style-span" style="font-size: medium;">the meaning of A is the set of sentences </span><i><span class="Apple-style-span" style="font-size: medium;">S</span></i><span class="Apple-style-span" style="font-size: medium;"> true because of </span><i><span class="Apple-style-span" style="font-size: medium;">A</span></i><span class="Apple-style-span" style="font-size: medium;">.</span></i><span class="Apple-style-span" style="font-size: medium;"> The set </span><i><span class="Apple-style-span" style="font-size: medium;">S</span></i><span class="Apple-style-span" style="font-size: medium;"> may also be called the set of consequences of </span><i><span class="Apple-style-span" style="font-size: medium;">A</span></i><span class="Apple-style-span" style="font-size: medium;">. 
Calling sentences of </span><i><span class="Apple-style-span" style="font-size: medium;">S</span></i><span class="Apple-style-span" style="font-size: medium;"> consequences of </span><i><span class="Apple-style-span" style="font-size: medium;">A</span></i><span class="Apple-style-span" style="font-size: medium;"> underscores the fact that there is an underlying logic which allows one to deduce that a sentence is a consequence of </span><i><span class="Apple-style-span" style="font-size: medium;">A</span></i><span class="Apple-style-span" style="font-size: medium;">.</span></div></blockquote><br /><div style="text-align: right; margin-left: 72px; margin-right: 72px; "><span class="Apple-style-span" style="font-size: medium;">Wladyslaw M. Turski and Thomas S. E. Maibaum, </span><em><span class="Apple-style-span" style="font-size: medium;">The specification of computer programs</span></em><span class="Apple-style-span" style="font-size: medium;"> (Wokingham: Addison-Wesley, 1987), p. 4.</span></div><div style="text-align: right; margin-left: 72px; margin-right: 72px; font-size: 23px; "><br /></div><div style="text-align: left;margin-left: 72px; margin-right: 72px; font-size: 23px; "><br /></div></span><br /><p>cited in: <span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; font-weight: bold; ">C. M. 
Sperberg-McQueen, <span class="Apple-style-span" style="font-weight: normal; font-family: Georgia, serif; font-size: 16px; "> <span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><a href="http://cmsmcq.com/2004/sa/sa.html">Meaning and interpretation of markup: a report on the Bechamel Project</a></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">, slides from talk sponsored by the W3C German/Austrian Office at the Fraunhofer Gesellschaft Institut Medienkommunikation in Sankt Augustin, Germany, 1 October 2004.</span></span></span></p><p><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; font-weight: bold; "><span class="Apple-style-span" style="font-weight: normal; font-family: Georgia, serif; font-size: 16px; "><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">How would one define "the epistemic sense of the utterance of a natural language expression"? What is the semantic contribution of individual lexical items that occur in that expression?</span></span></span></p><p><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; font-weight: bold; "><span class="Apple-style-span" style="font-weight: normal; font-family: Georgia, serif; font-size: 16px; "><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">The slides are about the meaning of markup, which is a technological vocabulary for describing the representamen of a text (a character stream in Unicode, a layer of content and markup, a layer of syntax tree, a layer of Infoset graph). 
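Turski and Maibaum's operational definition quoted above (the meaning of A is the set of sentences true because of A) can be sketched as a deductive closure under an underlying logic. The following is a minimal illustration; the sentences and rules are invented for the sketch and do not come from the slides:

```python
# Toy sketch of "the meaning of A is the set of consequences of A".
# Rules of the underlying logic are Horn clauses: (premises, conclusion).

def consequences(axioms, rules):
    """Forward-chain to a fixpoint; the result is the consequence set."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# A hypothetical underlying logic for a tiny markup vocabulary:
rules = [
    (frozenset({"title(x)"}), "heading(x)"),
    (frozenset({"heading(x)"}), "block(x)"),
]

# The "meaning" of the sentence title(x) under these rules:
meaning_of_A = consequences({"title(x)"}, rules)
# meaning_of_A == {"title(x)", "heading(x)", "block(x)"}
```

The consequence set here is finite only because the toy logic has no generative rules; in general the set is infinite, and (as discussed under "completeness" below) the best one can hope for is a finite basis from which the rest follows.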
The representamen of a text or expression is described in terms of a type-vocabulary for its parts.</span></span></span></p><p><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; font-weight: bold; "><span class="Apple-style-span" style="font-weight: normal; font-family: Georgia, serif; font-size: 16px; "><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; ">So ...</span></span></span></span></p><ul><li>what inferences are licensed by each element type?</li><li>by each attribute?</li><li>for each location? (i.e. how do you associate the meaning with a particular instance?)</li></ul><div><br /></div><p></p><p><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; font-weight: bold; "><span class="Apple-style-span" style="font-weight: normal; font-family: Georgia, serif; font-size: 16px; "><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><br /></span></span></span></p><br /><p>Part of the use-context of markup is that it describes parts of the representamen that currently, or at some future time, may receive different treatment, in terms of stylistic rendering or some other (semantic?) application. Element names are typically in the everyday language of users of the text.
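The three questions just listed (inferences per element type, per attribute, per location) can be given a toy operational form: map each element type to the sentences an instance licenses, generating an identifier per instance for deixis. The element names, rules, and IDs below are invented for illustration and are not taken from the Bechamel work:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-element-type inference rules; each takes an instance
# identifier (for deixis) and returns the sentences that instance licenses.
RULES = {
    "q":    lambda i: [f"quotation({i})"],
    "del":  lambda i: [f"deleted({i})", f"act_of_deletion_occurred({i})"],
    "name": lambda i: [f"names_a_person({i})"],
}

def licensed_inferences(xml_text):
    """Walk a toy document and collect the sentences its markup licenses."""
    sentences = []
    for n, el in enumerate(ET.fromstring(xml_text).iter()):
        instance = f"e{n}"  # a unique handle per location in the document
        for sentence in RULES.get(el.tag, lambda i: [])(instance):
            sentences.append(sentence)
    return sentences

doc = "<text><del>word</del><q>so it goes</q></text>"
print(licensed_inferences(doc))
# ['deleted(e1)', 'act_of_deletion_occurred(e1)', 'quotation(e2)']
```

Properties of arity greater than 1, as in F(a,b), would need rules that mention several instances at once; that is where argument structure and the "plumbing" challenges come in.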
Attribute names tend to be more technical, of interest more to technologists than end-users.</p><p><br /></p><p><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; "></span></p><h2 style="margin-top: 0.2em; "><span class="Apple-style-span" style="font-size: x-large;">Some premises</span></h2><p></p><p><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; "></span></p><ul><li>Rules are vocabulary-specific.</li><li>The coding <tt><f>a</f></tt> is both visually and semantically parallel to <i>Fa</i></li><li>Definition of <i>F</i> to be provided ...</li><li>In many cases, the relevant property has arity > 1: <i>F(a,b)</i>, <i>F(a,b,c)</i>, ...</li><li>As a consequence:</li><ul><li>We need <em>deixis</em>.</li><li>Argument structure is crucial.</li></ul></ul><br /><p>This may embody a rather extensional view of the representamen. The minimal character strings are individuals; an element type is predicated of one or more individuals. Deixis implies a need for reflexive content as well as referential.</p><p><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; "></span></p><ul><li>Challenges:</li><li>technical / plumbing<ul><li>distributed / non-distributed properties</li><li>overriding inheritance</li><li>milestone elements</li><li>unique identity of individuals</li></ul></li><br /><li>design / philosophical<br /><ul><li>completeness</li><ul><li>Can we really expect to list "all and only the inferences licensed by the markup"? </li><li>No: we cannot list them all (infinite set). </li><li>We may be able to identify a <em>basis</em> (a finite set of sentences from which the members of the infinite set follow). </li><li>Or maybe not?</li></ul><li>fertile valley vs.
desert landscape</li><li>meta-markup</li></ul></li></ul><p></p><br /><br /><div><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; font-size: 29px; "><table width="100%" border="0" id="id2607083"><colgroup><col width="80%" valign="top" align="left"><col width="15%" valign="top" align="right"></colgroup><tbody><tr><td><h2 style="margin-top: 0.2em; "><span class="Apple-style-span" style="font-size: x-large;">A ‘desert landscape’ view</span></h2></td></tr></tbody></table></span><div><span class="Apple-style-span" style="font-family: 'New Times Roman', serif, 'Lucida Sans Unicode'; "><span class="Apple-style-span" style="font-size: medium;">The Wittgenstein transcripts postulate</span><ul><li><span class="Apple-style-span" style="font-size: medium;">the manuscript</span></li><li><span class="Apple-style-span" style="font-size: medium;">the transcription</span></li><li><span class="Apple-style-span" style="font-size: medium;">pages</span></li><li><span class="Apple-style-span" style="font-size: medium;">text blocks (main block, left margin, ...)</span></li><li><span class="Apple-style-span" style="font-size: medium;">characters</span></li></ul><span class="Apple-style-span" style="font-size: medium;">And </span><em><span class="Apple-style-span" style="font-size: medium;">possibly</span></em><span class="Apple-style-span" style="font-size: medium;"> also</span><ul><li><span class="Apple-style-span" style="font-size: medium;">the von Wright catalog and its entries</span></li><li><span class="Apple-style-span" style="font-size: medium;">words and sentences as described by Duden</span></li><li><span class="Apple-style-span" style="font-size: medium;">people</span></li><li><span class="Apple-style-span" style="font-size: medium;">dates</span></li><li><span class="Apple-style-span" style="font-size: medium;">MECS-WIT version numbers</span></li></ul><span class="Apple-style-span" style="font-size: medium;">The name 
</span><i><span class="Apple-style-span" style="font-size: medium;">desert landscape</span></i><span class="Apple-style-span" style="font-size: medium;"> is borrowed from W.V.O. Quine.</span></span></div><div><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: 29px;"><br /></span></span></div><div><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: 29px;"><table width="100%" border="0" id="id2607221"><colgroup><col width="80%" valign="top" align="left"><col width="15%" valign="top" align="right"></colgroup><tbody><tr><td><h2 style="margin-top: 0.2em; "><span class="Apple-style-span" style="font-size: x-large;">A ‘fertile valley’ view</span></h2></td></tr></tbody></table></span></span></div><div><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: 29px;"><span class="Apple-style-span" style="font-size: medium;">We may also postulate</span><ul><li><span class="Apple-style-span" style="font-size: medium;">sections</span></li><li><span class="Apple-style-span" style="font-size: medium;">revisions</span></li><li><span class="Apple-style-span" style="font-size: medium;">acts of deletion</span></li><li><span class="Apple-style-span" style="font-size: medium;">insertions</span></li><li><span class="Apple-style-span" style="font-size: medium;">formulae</span></li><li><span class="Apple-style-span" style="font-size: medium;">quotations</span></li><li><span class="Apple-style-span" style="font-size: medium;">names, dates, things, ...</span></li></ul></span></span><p>Markup is not used to classify utterances signaling a speech act display. Historically, it emerged to classify a text as a renderable document. 
This is a display of a persistent artifact, made of parts which are visually distinguishable in the rendering.<br /></p></div></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-58608300392190993662010-09-19T02:59:00.000-07:002010-09-19T03:00:38.257-07:00Further research on Lexical Semantics of Construal<h2 style="font-family: 'Times New Roman'; font-size: medium; "><span class="Apple-style-span" style="font-weight: normal;">Some thoughts about the broader significance of LSCon</span></h2><ul style="font-family: 'Times New Roman'; font-size: medium; "><li>The LSCon Methodology could be applied to bilingual resources, fixing a signature that bridges two languages</li><ul><li>I am interested in </li><ul><li>German-English and English-German (perhaps using the Collins dictionary)<br /></li></ul><ul><li>Chinese-English</li><li>English-Filipino, Cebuano-English</li><li>Indonesian-English and English-Indonesian<br /></li><li>Tok Pisin-English</li></ul></ul><li>study AktionsArt, as empirically analyzed in PL-Onto and exemplars</li><li>develop applications in computer assisted language learning</li><ul><li>explore games and interactive learning environments with a NLU interface</li><li>another area for collaboration with undergraduates or masters students</li></ul><li>What tasks are involved in manually compiling an MRD suitable for conversion into MRS?</li><ul><li>cluster KWIC data from a corpus into candidate senses</li><li>isolate a distinctive Recognition Situation for a cluster, and enumerate the typical Grammar Patterns</li><li>initially characterize the Distinguishing Situations of candidate senses, including implicit participants and logical alternatives (which may still be split off into separate senses)</li><li>fix a set of senses, choosing the Head Relations (e.g. 
the more specific verb to use) in the Distinguishing Sits</li><ul><li>Head Relations are the symbols used to partition a word-level concept into sense-level concepts according to a fixed sense signature. For example, catch_capture_hrel and catch_seize_hrel are two senses out of 24. They are called Head Relations because they characterize the head of a clause with a symbol for the head of an infon.</li></ul><li>choose exemplars for each sense and Grammar Pattern</li><li>write the definition of each sense</li><li>verify the definition against exemplars and the KWIC data. Check that there is adequate coverage (all corpus data fits well into a sense, or is close enough, or is an obscure usage)</li><li>compare the definition to what is available about related words (near-synonyms, antonyms, semantic sets)<br /></li></ul><li>How would these tasks change if a team were to develop an LSCon resource directly from corpus data, with no pre-existing MRS?<br /></li><ul><li>an MRS could be automatically generated as one of the outputs from the LSCon Resource database</li><li>This approach could be used in developing resources for less-studied languages</li><li>It would be nice to provide integration with FieldWorks Language Explorer, so that SIL data could be massaged into an LSCon resource. Perhaps it is useful to develop a standard sense signature for bilingual dictionaries with English (and other languages of wider communication).</li><ul><li>This could take the form of a defining vocabulary for senses and exemplars in the Source language, with tools for (graphically?)
constructing logically precise definitions.</li><li>Since the target users of a FLEx dictionary include local translators and language workers, who have limited competence in English or other LWCs, it is useful to have a restricted and precise defining vocabulary.</li><li>In fact, the production of a full bilingual DevelopmentLanguage-English dictionary with well-written English definitions and glosses could be postponed if there are enough resources; it is sufficient to create sense definitions that are clear enough in LSCon, then move directly to other tasks like</li><ul><li>Producing an English-DevLang dictionary, where local workers do have the linguistic competence to compose good definitions and glosses. Their reading knowledge of English may be sufficient for this work, which is less demanding than trying to write good definitions and glosses in English. This also has important applications in schooling and literacy education.<br /></li><ul><li>This could start with matching the LSCon-EN encoded sense definitions of a DevLang headword from the DevLang-EN dictionary with the closest senses in English (senses from one or more English headwords). This ensures that well-described words in the DevLang are maximally utilized in a precise and consistent way.</li><li>This approach seems to be more sensitive to the sense-level nuances of meaning, compared to the typical reversal index.<br /></li><li>Perhaps I should experiment with doing this for an English-Filipino dictionary for the most common verbs.<br /></li></ul><li>Producing a monolingual dictionary in the Development Language. This could include the construction of an LSCon sense signature for that language (informed by the choices made in producing an English-DevLang dictionary, if there is one).
It includes encoding the dictionary entries and exemplars using LSCon Notation into a lexical resource that could be readily integrated into existing software and interactive applications.</li></ul><li>It might be useful to populate the SIL DDP topics with LSCon-EN senses....<br /></li></ul><li>One motivation for producing an LSCon resource during early stages of lexicography is to connect the language to existing NLP software and educational resources. Interactive learning resources and games might be particularly useful for literacy and multilingual education applications.</li></ul></ul>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-58522540381864317382010-09-19T02:56:00.000-07:002010-09-19T02:58:44.684-07:00PL-Onto and common sense<span class="Apple-style-span" ><span class="Apple-style-span" style="font-size: medium;">Part of my concept of developing Lexical Semantics of Construal (LSCon) is to develop a Populated Local Ontology (PL-Onto).<br /></span></span><ul style="font-family: 'Times New Roman'; font-size: medium; "><li>The detailed empirical results of PL-Onto in LSCon-EN should be of interest to researchers in AI because it gives a concrete window into several domains of common sense, providing a more precise characterization of what it is.</li><li>We assume that the vague notion of common sense is constructed with the help of general concepts about situations and how individuals and situations interact, the PL-Onto level of Situation. This level constitutes (at least in part) the upper ontology of PL-Onto.</li><ul><li>The "upper" level of Situation abstracts from the concrete details of other levels. It embodies the common sense intuitions of how situations work across other levels<br /></li></ul><li>The empirical work of deriving PL-Onto from MRD data can be done gradually, by layered level. The more basic levels make less complex assumptions about the world.
Presumably they use only part of the upper ontology: the more basic the level, the smaller or simpler the part.</li><li>The PL-Onto level of Physical builds on intuitions that the world is nothing but res extensa, physical bodies made of matter and occupying space and time. These bodies have physical qualities that change, and interact because of forces. The source of force is not analyzed in detail, though it may be the movement of some animate creature, an Agent. Clauses describing a Physical situation may not mention an underlying animate cause of interaction, and the SUBJ in the clause may be some inanimate participant which we can call an Actant (or Sowa's Initiator), a thematic role with fewer assumptions than Agent.</li><li>The PL-Onto level of Animate builds on intuitions that Actions in the world are caused by the movement of animate creatures, various species of animals or humans seen primarily as moving creatures with bodies (ignoring specifically human capacities, social dimensions, or mental life).</li><ul><li>[does "function" begin here, teleological functions relating to assumed biological regularities that are long-range and homeostatically maintained by biological systems?]<br /></li></ul><li>The PL-Onto level of Human allows the expression of basic common sense intuitions about how humans interact with each other and the world.</li><li>The PL-Onto level of Institution allows the expression of common sense intuitions about social institutions, and how humans interact socially.</li><ul><li>[Should Artifact be split off, to characterize objects that are intentionally constructed with a "functional" purpose? Need to investigate nouns first.
Perhaps verbs like "operate, build, construct" should be moved to level Artifact]</li><li>[examples, what they tell us about common sense]</li></ul><li>The PL-Onto level of Attitude allows the expression of common sense intuitions about Intentionality [Searle]</li><ul><li>This level embodies "folk psychology," providing Reasons for Action from the unseen domain of internal mental causes for human action. [see Dretske]<br /></li><li>The scheme of individuation at this level has an intrinsic 1&3P character, so intuitions about the phenomenality of mental states can be accounted for at this level.</li><li>Although mental states in others are unseen, they are readily inferrable from behavior (and language reports) as causes for human Action. Causality at this level is likely very different from the physical causes at work when the world is seen as simply Physical.<br /></li><ul><li>The intuitions about these differences underlie Descartes's distinction between res extensa and res cogitans. Thinking things, the mental states of thinking creatures, appear not to have physical extent since they are unseen. Descartes, working under the influence of a neo-Platonist tradition about ousia (Substantive-entities), considered primarily the Type-level of mental entities, taken to be immutable Forms. As a more modern alternative conception, we can consider the Type-level of mental entities to be location-parameter-absorbed situation-types where humans are in a mental state. With a nod to Perry's antecedent physicalism, there can be a purely physicalist account of the Token-situations that ground these situation-types, working within a physicalist 3P scheme of individuation.
However, situation-types that are intrinsically mental are individuated through a 1&3P scheme that associates phenomenality with mental entities, providing some kind of insight into the "hard problem" of consciousness.</li><li>Common sense intuitions about mental states, the stuff of folk psychology, give humans readily inferrable information about unseen causal states underlying the Actions of others. PL-Onto at this level gives a detailed characterization of (linguistic aspects of) common sense about Intentionality in human Action. It allows a characterization of what is cognitively accessible to humans as they behave motivated by mental reasons for action. This provides a window into the "easier" problem of cognitive access (what Nagel earlier called access consciousness).</li><ul><li>This characterization does not depend on functionalism or teleology; it can derive functions and purpose from more basic entities of mental states, as expressed in the content of utterances expressing common sense intuitions about Intentionality in human Action. Functionalist and teleological explanations are not necessarily wrong, but they may be a bit backward in explaining cognitive processes from the outside in. It is admittedly difficult to come up with alternative explanations starting from "inside" the head with unseen mental states. But this task becomes easier if we make use of the window into mental states provided by the empirically-derived PL-Onto level of Attitude (and Language).<br /></li><li>Mental states may not be epistemologically-objective because they cannot be directly seen (humans only physically "see" behavior, not mental causes). However, mental states can be readily inferred from behavior, and reliably confirmed by subsequent behavior and language.
Humans can indirectly "see" mental causes of behavior through the lens of folk psychology.</li><li>Once we accept both direct observations (epistemologically-objective) and indirect observations as a valid empirical basis for science, we can develop a rigorous cognitive psychology account of the phenomena characterized by common sense or "folk" psychology. The indirect observations are not arbitrarily subjective; they are intersubjective in consistent and observationally-confirmable ways.<br /></li><li>The subject matter of this characterization of mental states as entities for scientific study has been rejected out of hand by Behaviorism and related traditions in experimental psychology. Mental states are taken to be ontologically-subjective, as well as epistemologically-subjective. This is a methodological error in psychological research, grounded in unhelpful philosophical assumptions about the lived world of psychological phenomena. The alternative is to treat mental states as ontologically-objective real entities seen through the lens of human schemes of individuation, specifically the 1&3P scheme of individuation of mental states. In fact, modern technology developed after the demise of Behaviorism allows researchers to physically "see" traces of mental states using fMRI, PET, advanced EEG, etc.<br /></li><li>It will be difficult to characterize mental entities if only approached "bottom-up" from the physical tokens of mental phenomena. If we use the indirect inter-subjective evidence of verbal reports about mental entities, a "top-down" approach, we can come up with precise theoretical accounts that can be confirmed or refuted by physical evidence.
Extracting the PL-Onto levels of Attitude and Language depends on available print resources that characterize lexical knowledge (of the English speech community) and provide a window into common sense about Intentional mental states.<br /></li><li>This can be helpful to the scientific study of cognition, and also to weak AI technologists who need to characterize common sense precisely enough for some practical purpose.</li></ul><li>Therefore, we can move beyond Cartesian dualism and start treating mental entities as ontologically-objective real entities within the purview of cognitive science, observed using direct physical and indirect inter-subjective methods. The inter-subjective methods are reliable and confirmable. Extracting PL-Onto from lexical resources is a way of moving forward with this method to characterize (the linguistically accessible part of) common sense about human mental states.</li></ul><li>So the level of Attitude characterizes the vocabulary used in talking about the lived world of mental life, where Actions are caused by mental states inferred but not seen in other humans.</li><li>Although mental states in others are unseen, they are inferred tokens of "the same" types that a human uses to classify their own mental states. This helps account for the intuition that there is something "it is like" [Nagel] to be a human experiencing a mental state.
Through the shared 1&3P scheme of individuation of types of mental states (a subset of which are the verbal concepts for mental states expressed in verbs of propositional attitude), humans can infer states that have properties of cognitive access and phenomenality [what Nagel earlier called access consciousness and phenomenal consciousness].<br /></li></ul><li>The PL-Onto level of Language allows the expression of common sense intuitions about utterances using the human language capacity.</li></ul>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-56793688235012939902010-09-19T02:51:00.000-07:002010-09-19T02:55:37.383-07:00Lexical Semantics of Construal<h1 style="font-family: 'Times New Roman'; font-size: medium; ">Concept of LSCon</h1><ul style="font-family: 'Times New Roman'; font-size: medium; "><li>LSCon is a knowledge representation technology</li><ul><li>I also use the acronym to refer to the underlying methodology and approach to characterizing the semantics of words in construction as a contribution to a broader process of construal. This is studied in a principled way within a broad cognitive linguistics framework, but with immediate applications in mind. In case of ambiguity, this underlying approach to designing lexical resources can also be called LSCon Methodology.<br /></li><li>LSCon Notation refers to the specific notation as it develops for modeling English with the Sense Signature fixed by a single MRD source (COBUILD). The notation is influenced by CG and EKN: Conceptual Graphs of John Sowa and the ISO standard, and the Extended Kamp Notation of Situation Theory and Situation Semantics.
See also Controlled English.<br /></li><li>The trial lexical resource (and its specific variant of the notation given the fixed sense signature) that is built during research in LSCon is tentatively called "the LSCon Resource for English" or LSCon-EN, but this may be replaced by another name: SenseNet? Sense Graphbank?</li><ul><li>The initial resource includes an encoding of definitions, including the Distinguishing Situations of the most basic senses of lexical entries for the most frequent verbs in COCA (plus nouns mentioned in definitions).<br /></li><li>It also includes an integrated encoding of exemplars, one or two for each Grammar Pattern</li><ul><li>Perhaps some or all exemplars will be taken from a corpus other than COBUILD's BoE, such as COCA</li><li>This will help validate the sense signature, but takes more work</li></ul><li>The initial resource is a derived work from COBUILD, and may be subject to a restrictive copyright license<br /></li></ul><li>A future lexical resource may be developed with a liberal Creative Commons license</li><ul><li>This may involve generating a new sense signature from a corpus, consulting and synthesizing multiple dictionaries (so that the result is fair use of various dictionaries, not similar enough to any one to be called a derived work. Do we need clean-room methods for this?)<br /></li></ul></ul><li>LSCon has layered dependencies or interfaces with other domains of cognition and language processing</li><ul><li>It is layered above phonology-syntax (initially HPSG and/or SBCG, using MRS or an extension at the syntax-semantic interface).</li><ul><li>This includes pre-semantic uses of context, in Perry's terms (R&R 3.4)<br /></li></ul><li>It is layered above a populated ontology of encyclopedic knowledge. This is fixed by the sense signature, but it involves discovering an upper ontology that may be substantially universal or cross-linguistic.
</li><ul><li>Each lexical entry interfaces to encyclopedic knowledge, via a populated local ontology. This includes certain concepts underlying the substantive lexemes in the definition, specifically the information conditions expressed in Distinguishing Sits of each sense.</li><ul><li>I have not yet decided whether to "chase down sense distinctions" so that the concepts used in the local ontology are sense-level rather than word-level<br /></li></ul><li>The union of populated local ontologies of all lexical entries, including the shared upper ontology, is called PL-Onto.<br /></li><li>The upper ontology, empirically derived rather than theoretically motivated, is called Upper PL-Onto. It is theoretically influenced by situation semantics, perhaps also Peirce's triadic schema of signs and Sowa's upper ontology in Conceptual Structures.</li><li>PL-Onto is expected to be layered into levels, which are derived empirically per sense (not per lexical entry)<br /></li><ul><li>Initial levels include<br /></li><ul><li>Situation (mostly incorporated into the upper ontology)</li><li>Physical - assumes only bodies and substances interacting in space, sources of force are unanalyzed</li><li>Mixed - default level, senses that don't clearly fit elsewhere</li><li>Animate - the source of force or causation is an animal (including hominids, but not requiring human-specific capacities)</li><li>Human - a human agent is involved in a way that invokes human-specific capacities not classified in subsequent levels</li><li>Institution - involves dependencies on human social institutions such as economic institutions, marriage, schooling, etc. There may be a need to further distinguish technology and artifacts, but the initial analysis of verbs does not make this obvious</li><li>Attitude - this is the first level to distinguish mental states. 
It is called attitude because it includes verbs of propositional attitude (belief, desire, etc.)</li><li>Language - entities at this level involve human language capacity in essential ways</li></ul><li>Part of the motivation of layering is to simplify analysis by limiting the parts of the upper ontology (essentially) required for modeling.</li><li>These levels seem to capture the interface to "common sense" required by lexical entries. See the separate note on "PL-Onto and common sense"<br /></li></ul></ul></ul><li>LSCon has dependencies or interfaces with other domains that are expected to be less influential (at least on the development of LSCon), since LSCon is layered below them (more or less) in the anticipated architecture of cognition. LSCon or related outputs feed into</li><ul><li>DRS processing, at the semantic level (resolving anaphora) and post-semantic level (anchoring variables to a state or context, perhaps via some worked-out dynamic semantics)<br /></li><li>Designation processing. This includes the inference of referential content and its anchoring to individuals and worlds fixed by context. LSCon covers the ground of an intensional logic, Frege's Sinn. Designation processing uses that Sinn to get to a Bedeutung (reference, denotation).</li><li>Near-side pragmatics (which may include parts of Designation processing)</li><li>Far-side pragmatics, including speech acts, communicative intentions, conversational implicatures</li><li>Post-semantic connections to encyclopedic knowledge.
This includes specialized knowledge domains, in contrast to the very generic (common sense) knowledge domains of the layered levels of PL-Onto.<br /></li><li>Interface of lexical information with other cognitive capacities, such as linking lexical information about physical objects to "visual objects" in a scheme of concept types from Visual Object Recognition</li></ul><li>A KR technology is focused on structured data representations, and is relatively separate from the algorithms that use the representations.</li><li>LSCon is envisioned to be used in algorithms for cognitive construal. It is hoped that it will be a "broad spectrum" resource that is of interest to a wide range of researchers and technologists. Parts of that range can include people investigating:<br /></li><ul><li>logical processing, like DRS. Even if DRS, because it is closely tied to referential content, prefers to stick with word-level symbols, the sense-level distinctions of LSCon may be useful at intermediate stages and then discarded<br /></li><li>parallel distributed processing</li><li>modeling of functional schemas in the brain, like Arbib (and the Finnish guy in the US whose name escapes me)</li><li>Quillian's semantic networks (?)</li></ul><li>Many of these uses, and their algorithms, go beyond semantics narrowly defined</li><li>The major algorithmic use of LSCon that I expect to develop as part of research into LSCon is the construction of the Semantic Contribution of a sentence (and its MRS)</li><ul><li>Initially, I am most interested in clause constructions.
Perhaps I should simplify NPs to just their heads plus a few other constituents relevant to the Distinguishing Sits of the candidate senses.</li></ul><li>Sem Contrib is intended to be context-independent</li><ul><li>It avoids crossing the interface to other neighboring domains of construal.<br /></li></ul><ul><li>in practice, it may be necessary for any implementation to perform certain context-dependent processes in parallel (notably DRS processing and Designational processing of the referential content). I think it will be helpful to keep those conceptually separate at this stage, and see how much can be achieved by semantic representations independent of everything else.</li></ul><li>The algorithms to construct Sem Contrib from an input sentence and a lexical resource are called Semantic Integration. </li><ul><li>This is intended to be psychologically plausible: it is a fragment of the psychological (and neural) processes of cognitive construal of linguistic information.</li><li>Any implementations of Sem Contrib and linguistic construal are expected to make large and small departures from psychological plausibility, for whatever technological or empirical reasons are at hand.</li></ul><li>The integration of Sem Contrib is expected to be relevant to a variety of tasks, including certain shared tasks useful for evaluating the state of the art of language technology</li><ul><li>The WSD task may include DRS and Designational processing, but Sem Contrib may be the critical factor in resolving senses within a signature</li><li>The SRL task using LSCon would involve mapping to the lexical-entry local participant roles</li><li>It would be interesting to see implemented algorithms that integrate information from LSCon and other resources, such as WordNet, FrameNet, and Pustejovsky-style resources in the Generative Lexicon tradition (I believe both Pustejovsky and Buitelaar have resources).
The resources of Martha Palmer and her students may also be of interest.</li><ul><li>Perhaps undergrad and masters students might want to explore some of this.</li></ul><li>Other shared tasks or variants could be designed using a LSCon resource</li></ul></ul>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-27236191481333880542010-09-04T16:44:00.000-07:002011-02-17T03:05:55.111-08:00Precursors to Language[added on Feb 2011]<br /><br />Hominids experience reality and classify it. They classify scenes visually into visual objects that participate in visual scenes. Some of the visual scenes are action-scenes. Action-scenes involve other hominids visibly and audibly behaving, with their body parts like faces and eye-gaze and hand movements, and with extended vocalizations segmented into parts with different articulation. [The precursor to linguistic classification capacities is the segmental, and thus discrete, classification of display action-scenes. The breakthrough to linguistic classifications can work without vocalization, as demonstrated by the emergence of Deaf sign languages repeatedly in many communities in history. However, the typical case, which likely interacted with the biological evolution and cultural development of behavioral modernity, is that display action-scenes incorporate vocalizations classified phonemically. The breakthrough builds on prior cognitive capacities related to social behavior, conventionalized routines and intentionality. Behavior in the linguistic domain is unambiguously attained with the systematic intentionality of conventionalized phoneme sequences (or in the less common case, conventionalized sign lexical items). 
The intentionality of linguistic actions is marked by a sharp demarcation between the display situation of the source signal, and the described situation that becomes the cognitively accessible "content" of a linguistic display or utterance. The link is mediated by a shared lexical scheme of individuation, so that recovery of the source signal during the linguistic recognition performance of a listener depends critically on the conventionalized-arbitrary lexical association between phonemically-classified display and "cognitively established content". It is the shared lexical scheme that establishes the pragmatically available significance of vocalization displays. The established lexical senses extract a semantic level of meaning that is independent of the particular utterance situation, and indeed the broader linguistic context of usage. To be an established sense it is sufficient to associate with the phonemic form of a lexical item a discrete "semantic contribution" in terms of other (more or less basic) lexical items in the scheme.]<br /><br />[The level of "content" in the described situation is thus abstracted from particularities of the utterance display situations, with the important exception of indexicals. Content is in general practice transparently referential. (But see Perry's reflexive-referential theory of content, which analyzes the importance of additional levels of content to explicate indexicality and other reflexive phenomena, even proper names.)]<br /><br />Hominids classify a visual scene into an action-scene if it involves a hominid (the Agent a) behaving in a known pattern of behavior [as mentioned by a verb], with specific success-conditions. For example, an a-grabs-f situation is one where an agent a moves parts of their body to make a significant change in the scene. Before the action, object f was located in the scene but not located in the hands of a.
If the scene satisfies the conditions to be an a-grabs-f situation, the scene changes so that at a later time f is in the hands of a.<br /><br /><br />Event: <action: true=""><br />Prior sit: <spatial_rel: false=""><br />Resulting sit: <spatial_rel: true=""><br />t1 part-of t, t2 part-of t, t1 before t2<br /><br />We can say that an action-scene or event is the part of the world that can be classified according to such conditions. Let us say there is a hominid s that classifies the visual scene before it as an a-grabs-f scene. Then s can visually track the various individuals involved in the event, and tracks the n-ary condition [which underlies the sense of the verb] picked out by that classification, which are relations that individuals stand in (or not) in the scene. If s registers a scene as falling under the classification a-grabs-f, they can remember it as such. If, a few days later, they see a grabbing another object f2 of the same type F as f (a physical object, let us say food), they can classify the new scene as the same type of situation as earlier; call the situation-type a-grabs-F. The a-grabs-F situation-type is more general than the earlier a-grabs-f action-scene (a singular scene, involving fully identified individuals), because the participant F is parametrized to a type of object F. Similarly, s can classify the actions of another hominid b in a similar way, according to a still more general situation-type A-grabs-F. [A is the type of some particular hominid who takes the role of an Agent in the action-situation so classified.]<br /><br />The world in which hominids live is full of regularities, and their ability to recognize and remember those regularities allows a troop of hominids to be successful in survival, reproduction and maintaining group cohesion. Some regularities are related to others. For example, the action of grabbing results in a situation where A-holds-F.
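The progression from a singular a-grabs-f scene to the parametrized situation-types a-grabs-F and A-grabs-F can be sketched in code. This is my own illustration, not the author's formalism: the names `Individual`, `grab_conditions` and `a_grabs_F` are invented, and the prior/resulting situations are modeled crudely as dictionaries of relational facts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Individual:
    name: str
    kind: str  # e.g. "hominid", "food"

def grab_conditions(agent, obj, prior, result):
    """The event counts as a grab if the holds-relation between agent
    and obj is false in the prior sit and true in the resulting sit
    (with the prior sit temporally before the resulting sit)."""
    return (not prior.get((agent, "holds", obj), False)
            and result.get((agent, "holds", obj), False))

# Singular scene a-grabs-f: fully identified individuals.
a = Individual("a", "hominid")
f = Individual("f", "food")
assert grab_conditions(a, f,
                       {(a, "holds", f): False},
                       {(a, "holds", f): True})

# Parametrized situation-type a-grabs-F: the object slot is generalized
# to any individual of type F, so the same conditions cover a new scene.
def a_grabs_F(agent, obj, prior, result, F="food"):
    return obj.kind == F and grab_conditions(agent, obj, prior, result)

f2 = Individual("f2", "food")
assert a_grabs_F(a, f2,
                 {(a, "holds", f2): False},
                 {(a, "holds", f2): True})
```

Generalizing the agent slot in the same way (any individual of hominid type) would give the still more general A-grabs-F.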
For purposes of illustration, we do not treat hold as a significant change in a scene, but as a significant continuity or stative relation. An event that involves some A holding an F involves a stative relation rather than a proper change-based action (admittedly the boundary between actions and statives can be fuzzy or arbitrary; at some earlier time, the agent must have acted to come to hold the object).<br /><br />Event: <stative: true=""><br />Prior sit: <spatial_rel: false=""><br />Resulting sit: <spatial_rel: true=""><br />t1 part-of t, t2 part-of t, t1 before t2<br /><br />Still another situation-type related to the grab action is a catch action, where the object of type F was moving in the prior situation, and when it is caught and held it no longer moves. We can distinguish two variations (at least) of the catch action. If the object is of an animate type G (it is a hominid, or an animal) it is able to move on its own and avoid the catching action of the agent. The agent will often have to chase the object it wants to catch, and may have to use some instrument like a rope or net to restrain its movement. We will call this variant a catch-capture situation. If the object is of an inanimate type H, it is typically moving because it is falling through the air due to gravity. We call this variant with inanimate objects a catch-seize situation.
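The two catch variants are distinguished purely by the type of the caught object, which makes the sense discrimination easy to sketch. Again this is my own toy rendering, not the author's notation; `classify_catch` and the kind strings are invented for illustration.

```python
# Toy sense discrimination for the catch action: the animacy of the
# caught object selects between the two variants described above.
ANIMATE_KINDS = {"hominid", "animal"}  # type G in the text

def classify_catch(obj_kind: str) -> str:
    """catch-capture if the object is of animate type G (can evade the
    agent), catch-seize if it is of inanimate type H (e.g. falling)."""
    if obj_kind in ANIMATE_KINDS:
        return "catch-capture"
    return "catch-seize"

assert classify_catch("animal") == "catch-capture"
assert classify_catch("fruit") == "catch-seize"
```

A fuller model would also check the prior-sit condition that the object was moving, and the resulting-sit condition that it is held and no longer moves.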
We can characterize the conditions for a scene to be classified as a catch situation as follows:<br /><br />• Event: <action: true=""><br />o Sense1-event: <action: true=""><br /> Prior sit:<br />• <spatial_rel: hands="" or="" false=""><br />• <stative: true=""><br />• <action: true=""><br /> Resulting sit:<br />• <rope_rel, j=""> OR <net_rel, j=""><br />• <spatial_rel: hands="" or="" true=""><br />• <><br />• <><br />• <action: false=""><br /> t1 part-of t, t2 part-of t, t1 before t2<br />o Sense2-event: <action: true=""><br /> Prior sit:<br />• <spatial_rel: hands="" or="" false=""><br />• <air_rel, b=""><br />• <><br /> Resulting sit:<br />• <spatial_rel: true=""><br />• <><br />• <><br /> t1 part-of t, t2 part-of t, t1 before t2<br /><br />Our hominid s can observe the scenes around her, and if they involve individuals of the appropriate type, s can classify situations that involve grabbing, holding and catching. We say that s has a long-term cognitive memory, with a scheme of individuation (for situations involving individuals and relations) that is attuned to precisely those types of situations, and many others. Our hominid can track scenes, classifying them as being in a certain type of situation, or as not being in that situation-type, using their working memory to judge if a situation sit1 is of a certain type (for example: a-grabs-f) or not. We say that hominid s believes that sit1 is of Situation-Type-A if her working memory tracks the relevant individuals as standing in the relations specified in the conditions for Situation-Type-A.<br /><br />Hominid s lives in a troop with other hominids, and classifies their actions in ways that are relevant to the continuing social life of the troop. A hominid can call the attention of others in the troop to a certain situation of a type, perhaps by a vocalization display or by shifting their gaze in a way that can be observed by others, or by pointing. 
We call these attention-directing displays referring actions. If the belief state of s is that the scene they ["they" = s and troop members who can observe her display] can see is significant because it is of a certain type, they can call attention to the scene or to the individuals involved by referring displays. Members of the troop are attuned to the situations that significant others find important, which matters for the cohesion of the group. Young hominids are socialized to be aware of the same types of situations as adult members of the troop, and develop the same sort of cognitive scheme of individuation as the others in the troop.<br /><br />[In support of the idea of referring actions using monitored gaze-shift, we have the way that hominids apparently have some selectional advantage by having white scleras in their eyes. This makes more prominent a referring display by shifting the gaze from one location to another.]<br /><br />This framework allows us to propose a scenario for the evolution of language among hominids.</spatial_rel:></air_rel,></spatial_rel:></action:></action:></spatial_rel:></net_rel,></rope_rel,></action:></stative:></spatial_rel:></action:></action:></spatial_rel:></spatial_rel:></stative:></spatial_rel:></spatial_rel:></action:>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-9264284152922090092009-08-03T00:30:00.000-07:002009-08-03T00:32:47.396-07:00old NYT article on origin of language<a href="http://www.nytimes.com/2003/07/15/science/early-voices-the-leap-to-language.html?sec=health&pagewanted=all">Early Voices</a>: The Leap to Language<br />By NICHOLAS WADE<br />New York Times<br />Published: Tuesday, July 15, 2003Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-66775071740944759382009-07-29T03:38:00.000-07:002009-07-29T03:42:30.002-07:00references: language
documentation<p> <strong>Gippert, Jost, Nikolaus P. Himmelmann and Ulrike Mosel. </strong>2006. Essentials of language documentation. Berlin: Walter de Gruyter. </p> <p> This volume consists of essays from many linguists in the field of language documentation covering a range of subjects including community fieldwork, ethnography in linguistic fieldwork, annotation and archiving methods. [Himmelmann, in Germany, has worked on Tagalog, and collaborated with Australians]<br /></p>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-12960766975426231542009-07-28T19:49:00.000-07:002009-07-28T20:45:20.242-07:00notes: models for using jyutpingModels for standard Cantonese in Education and Social Life<br /><br />For initial literacy (perhaps focused on families of non-Han migrants), use a system of Jyutping+Hanzi similar to Japanese writing (kana-kanji). Only introduce characters for reference if they are not the most frequent full homograph.<br /><br />Use Jyutping translations of English science texts (retaining English technical terms with Cantonese glosses), then teach from the English texts later.<br /><br />1. Occitan - language shift to national norm, historical vernacular disappears as a living language<br /><br />2. Swiss German - Bilingual, vibrant spoken language (and sung in Cantopop) but no interest in written form, defer to a larger standard form<br /><br />3. Frisian - No university teaches it, but social space enlarging.<br /><br />4. Letzeburgisch - Generally accept a larger external<br />language as standard for literacy and writing, a belated interest in promoting vernacular into a standard<br /><br />5. Catalan - Bilinguals but vigorously defend social position of written standard from vernacular. Resist language shift, social policy to promote children of non-locals to become fluent in local vernacular.<br /><br />6. 
Dutch - Full standard language, related foreign standard<br />is seen much like a foreign language.<br /><br />Negative model:<br />. Bokmal and Nynorsk - disputed standard, especially<br />for newspapers and fangyan characters<br /><br /><br />From<br />John DeFrancis, “The <a href="http://sino-platonic.org/complete/spp171_chinese_writing_reform.pdf">Prospects </a>for Chinese Writing Reform”<br />Sino-Platonic Papers, 171 (June, 2006)<br /><br />The Zhuyin Shizi, Tiqian Duxie ‘Phonetically Annotated Recognition Promotes Earlier Reading and Writing’ experiment came into being in 1982 in the northeast province of Heilongjiang<br /><br />Reference:<br /><br />John S. Rohsenow, “The ‘Z.T.’ Experiment in the PRC,” Journal of the Chinese Language Teachers Association. 31, 3 (1996): 33-44.Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-6383417388220736332009-07-28T19:28:00.001-07:002009-07-28T19:47:26.514-07:00links on learning Chinese CharactersFrom Wikipedia HSK article:<br /><ul><li><a href="http://www.chinese-forums.com/vocabulary/" class="external text" title="http://www.chinese-forums.com/vocabulary/" rel="nofollow">Searchable Text Database of Complete HSK Vocabulary</a></li><li><a href="http://en.wiktionary.org/wiki/Appendix:HSK_list_of_Mandarin_words" class="external text" title="http://en.wiktionary.org/wiki/Appendix:HSK_list_of_Mandarin_words" rel="nofollow">Complete HSK Vocabulary on Wiktionary</a></li><li><a href="http://hsk.name/" class="external text" title="http://hsk.name/" rel="nofollow">World-Wide Association of HSK Examinees</a></li><li><a href="http://www.wisdeal.com/hsk.html" class="external text" title="http://www.wisdeal.com/hsk.html" rel="nofollow">HSK Grammar Overview</a></li><li><a href="http://hskflashcards.com/" class="external text" title="http://hskflashcards.com/" rel="nofollow">Multi-format Free HSK Flash Cards</a></li></ul>From MDBG:<br /><a 
href="http://us.mdbg.net/chindict/chindict.php?page=practice">练习 Practice</a><br /><br />ABC <a href="http://www.pleco.com/manual/abcec.html">toc </a>at Pleco<br /><br /><div><ol><li class="g w0"><h3 class="r"><a href="http://www.encounterschinese.com/" class="l"><em>Encounters</em></a>: Chinese Language and Culture</h3></li></ol></div>
CALCULUS OF MEANING AND SYNONYMY<br />YIANNIS N. MOSCHOVAKIS<br />date: December 13, 2004<br />Linguistics and Philosophy<br />source: <a href="http://www.math.ucla.edu/%7Eynm/papers/lcmsfinal.pdf"><span style="font-family:Arial;">A logical calculus of meaning and synonymy</span> </a><span style="font-family:Arial;">, <i>Linguistics and Philosophy,</i> v. 29 (2006), pp. 27 -- 89.</span><br /><br /><br />Montague: meaning (= Frege: sense) of term A is its Carnap Intension or denotation(A)(a) for each state a (= possible world, time, context of use). Too coarse for synonyms.<br /><br />Other extreme: "structural" approaches to the modeling of meaning (like Russell's propositions,[n3] Church [1946], Church [1974] and Cresswell [1985]) basically tell us no more than that "the sense of a complex term A can be determined from the syntactic structure of A and the senses or denotations of the basic constituent parts of A", without explaining how this "determination" is to take place.<br /><br /> Judy Pelham and Alasdair Urquhart [1994], Russellian propositions, Logic, Methodology and Philosophy of Science IX (D. Prawitz et al., editors), Elsevier Science.<br /> Alonzo Church [1946], A formulation of the logic of sense and denotation, abstract, The Journal of<br />Symbolic Logic, vol. 11, p. 31.<br /> Alonzo Church [1974], Outline of a revised formulation of the logic of sense and denotation, part II,<br />Nous, vol. 8, pp. 135-156.<br />M. J. Cresswell [1985], Structured meanings: The semantics of propositional attitudes, The MIT<br />Press, Cambridge, Mass.<br /><br />- This is bottom-up compositional, does not allow the construction to contribute to meaning. 
The syntactic structures are empty combinatorial rules, without any meaning beyond what is determined by their constituents (deductively determined, not abductively-probabilistically inferred).<br />- but if the method constructs an algorithm (semantic contribution) before getting to the denotation (the explicit core of pragmatic interpretation), could that algorithm be construction-dependent rather than language-wide and uniform? Would it be enough to have every sense license a particular construction that triggers that sense for the wordform?<br /><br />Davidson's eloquent criticism in Davidson [1967]:<br />Theaetetus and the property of flying do not (by themselves) amount to the meaning of "Theaetetus flies"<br /><br />Donald Davidson [1967], Truth and meaning, Synthese, vol. 17, pp. 304-333, reprinted in Martinich [1990] and in Davidson [1984].<br /><br />- the meaning of the verb in use is not the (property of) the action but a situation type involving the participants dependent on the predicative verb; or better, it is both an antecedent situation-type involving the participants and a related consequent situation-type (where the action is realized with participant-role relationships between the instantiated action-type and each mentioned or implicit participant). This verb-meaning situation type exists in the discourse situation cognitively shared by speaker and hearer, their shared information state.<br /><br />In Moschovakis [1994] I argued that the meaning of a term A can be faithfully modeled by its referential intension int(A), an (abstract, idealized, not necessarily implementable) algorithm which computes the denotation of A. The basic technical tool in that paper was the Formal language of recursion FLR, [for rendering NL as formal]<br /><br /> Yiannis N.
Moschovakis [1994], Sense and denotation as algorithm and value, Logic colloquium '90<br /><br />[note4] full rendering operation is of the form<br /><br /> natural language expression + context -- render--> formal expression + state<br /><br />where the (informally understood) context determines not only the state (as we will make it precise in Subsection 2.2), but also which precise reading of the expression is appropriate and what formal transformations should be made (e.g., co-indexing), depending on information about "what the speaker meant", intonation, if the expression was spoken, punctuation and capitalization, if it was written, etc.<br /><br />- I take this to mean:<br /><br />NL expr + partly-construed situation --render--> <br />formal semantic contribution<br />+ anchoring situation (concrete or discourse) as construed in a verbal scheme (= parameter-resolved psoa + location + context of use (Wittg aspect))<br /><br />I will not specify with any precision the all-important rendering (or translation) operation... I think that the theory of what-happens-next proposed here may be of some value, primarily for two reasons.<br /> > First, the modeling of meanings by referential intensions goes far beyond the imagery and analogy with computation often used to explain the relation between Frege's sense and denotation, especially by Dummett.[n5]<br /><br />M. A. Dummett [1978], Frege's distinction between sense and reference, Truth and other enigmas, Harvard University Press, Cambridge, pp. 116-144.<br />G. Evans [1982], The varieties of reference, Clarendon Press, Oxford, Edited by J. N.
McDowell.<br /><br /> > Second, the formal processing of L^{lambda}_ar-terms (the "calculus" of the title) sets conditions and limitations on the rendering operation; it provides new ways to implement some syntactic transformations which affect meaning (like co-indexing and co-ordination), and for some English phrases, it suggests some plausible, novel renderings directly in L^{lambda}_ar which are not referentially synonymous with any terms of the typed {lambda}-calculus.<br /><br />§1. The typed {lambda}-calculus with acyclic recursion, L^{lambda}_ar. The language L^{lambda}_ar is a typed calculus of terms, an extension of the two-sorted type theory Ty_2 of Gallin [1975][sec.8] into which the language of intensional logic LIL of Montague [1973] can be interpreted by Gallin's Theorem 8.2.[note6]<br /><br /> Daniel Gallin [1975], Intensional and higher-order modal logic, North-Holland Mathematical Studies, no. 19, North-Holland, Elsevier, Amsterdam, Oxford, New York.<br /><br />- Moschovakis builds on the theory of typed functions, typed substitution-evaluable relations. Russell originally made a (ramified) theory of typed sets. What are typed situations? They are set-like collections of individuals_s, properties_s on individuals and relations_s on pairs and sequences of individuals, where properties_s, relations_s and situations can also be individuated (reification in a cognitive schema?).<br /><br />- can we make a typed calculus of terms for rendering natural language sentences in the scheme of a constructicon? A Natural Semantic Rendering Formalism that is not truth-conditional but considers conditions of information flow and conditions of satisfaction up to a shared semantic contribution of (only) the expression.
Instead of just types e~ and t~, we can have entities (individuals) e~, properties and relations r~_i for i=1..n (where n is the number of participants in the largest frame, let us call it less than 7, Miller's number for working memory), situations s, and j~ for information flow values (whether a concrete or discourse situation supports a situation-type or basic infon). Going beyond Russellian propositions, we consider that verbs have many senses, and that constructions also contribute sense-like meanings. We want our representation to support not simply deductive inference but abductive sense extension, so we can infer not only information that is already there in the expression, but can learn about the world and acquire extended schemas for classifying it.<br /><br />- We would like to build on DRS, but again at the sense level rather than the proposition level. Perhaps we can refer to each signalled sense as a microsign, and we are interested in its semantic contribution to pragmatic construal.<br /><br />- in SBCG, every verb is taken to have one _rel. If we model at the granularity of senses, one _rel per sense. However, in Sowa's conceptual graphs, a verb is a relational concept that has (labelled) relations with each participant mentioned in the expression. Using FrameNet, but at the level of senses + constructions, we can have a small set of participant-relations that identify how the mentioned individuals relate to the event-situation of the verb, call them _prel. We may want to classify each _prel in a way local to the frame, or using a language-wide or universal collection of generic _prels (agent, accessory, goal, location, instrument, beneficiary). We can have grammatically-compulsory _prels and optional (adjunct) _prels.<br /><br />the set of types is the smallest set which includes the distinct "symbols" e; t; s<br />and is closed under the pairing operation ({sigma}-->{sigma}). 
A type is pure (or state-free) if the state type s does not occur in it.<br /><br />- Moschovakis only has a single pairing operation to generate his types; perhaps we need several.<br />- M uses currying to handle multiple arguments to a function. We may instead shift to a finer-grained level of _prels relating a verb's event-situation to its participants.<br />- in DRS we resolve referential indexes by equating i=j. But for events, we may need to place them in a partial order of a consistent discourse situation (one channel) and be prepared to shift channels. This is done pragmatically so we can abductively infer the best shared information state for information to flow in a successful communication context. Within this broad contingent pragmatic field, the semantic contribution is more predictable across similar and distinguished situations.<br /><br />Constants, variables, terms<br /><br />We assume given a (finite) set K of typed constants, the "vocabulary", and we write c : {sigma} to indicate that c has type {sigma}.<br /> For each type {sigma}, L^{lambda}_ar has two infinite sequences of variables,<br /> - the pure variables v^{sigma}_0, v^{sigma}_1, ... and<br /> - the recursion variables or locations p^{sigma}_0, p^{sigma}_1, ...<br />Syntactically, pure variables are quantified, while locations are assigned-to.<br /><br />Terms are defined recursively, starting with the variables and the constants and using application, {lambda}-abstraction and (mutual) acyclic recursion.
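The type grammar just described (basic symbols e, t, s closed under the pairing operation) and the purity test can be sketched directly. This is my own rendering of the definition, not code from Moschovakis's paper; the class names `Basic` and `Arrow` are invented.

```python
from dataclasses import dataclass
from typing import Union

# The set of types: the smallest set containing the basic symbols
# e, t, s and closed under the pairing (arrow) operation.
@dataclass(frozen=True)
class Basic:
    name: str  # "e", "t" or "s"

@dataclass(frozen=True)
class Arrow:
    left: "Type"
    right: "Type"

Type = Union[Basic, Arrow]
E, T, S = Basic("e"), Basic("t"), Basic("s")

def is_pure(tau: Type) -> bool:
    """A type is pure (state-free) iff the state type s does not occur in it."""
    if isinstance(tau, Basic):
        return tau.name != "s"
    return is_pure(tau.left) and is_pure(tau.right)

assert is_pure(Arrow(E, T))                # (e -> t) is state-free
assert not is_pure(Arrow(S, Arrow(E, T)))  # a state-dependent type mentions s
```

The frozen dataclasses make types hashable values, so the same representation could index the typed constants of the vocabulary K or the typed variable sequences.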
The definition also assigns a type to every term and specifies the free and bound occurrences of variables in it.<br /><br />- locations are used for referential indexes<br /><br />"referentially synonymous"<br /><br />Formally, congruence is the smallest equivalence relation =_c between terms which respects alphabetic replacement of bound variables (of both kinds), application, {lambda}-abstraction and acyclic recursion, and permuting the indexes of locations (so the system of assignments is a set, not a sequence).<br /><br />Both the denotational and intensional semantics of L^{lambda}_ar(K) will respect congruence, and so we will sometimes tacitly identify congruent terms.<br /><br />- we can model the acquisition of new vocabulary in a known wordsense group as associating a new constant, carrying some idiosyncratic meaning, with a wordsense group existing in the constructicon. If the new vocabulary item (a new form, or a new sense of an existing form) fits into the pattern and makes sense in terms of semantic analysis and pragmatic construal, it supports abduction to a particular semantic and pragmatic meaning, and the constructicon is (defeasibly) extended with the new vocabulary item.<br /><br />{beta}-conversion almost never preserves meaning, just as logical deduction does not--otherwise all theorems would be synonymous, which is absurd.<br /><br />States. 
To be specific, we will assume in this paper that a state is a tuple<br />a = (i, j, k, A, {delta})<br />which specifies a possible world i, a moment of time j, a point in space k, a speaker (or "agent") A, and a function {delta} which assigns values to all possible occurrences of proper names and demonstratives, indexed by the order in which they appear in terms.<br /><br />More interesting for the natural language examples are the state-dependent versions of these [logical] operations, [where t evaluates to 1, 0, or er] summarized in Table 3.<br /><br />We assume the language has a constant [] for the basic necessity operator, Montague's "full necessity", or "necessarily always", as Thomason calls it.<br /> Kaplan [1978b] argues convincingly that this interpretation is inappropriate for terms which contain demonstratives, but in our determination to avoid philosophical commitments, it is best to allow his interpretation as a de re reading of the modality, without forbidding the de dicto reading.<br /><br /> David Kaplan [1978b], On the logic of demonstratives, Journal of Philosophical Logic, pp. 81-98, reprinted in Salmon and Soames [1988].<br /><br /><br />The natural definition of the description operator returns an error if the existence and uniqueness conditions are not fulfilled.<br /><br />Local, modal<br /><br />An object is local [16] if each value p(x, a) depends only on x(a) and not on any other values x(b).<br /><br />[16] See Montague [1973][Section 4]. Montague and Gallin use extensional and intensional for our local and modal, but this adds one more use to the already overloaded extension-intension distinction and suggests a connection between modality and meaning which is not in the spirit of this article.<br /><br />What seems (at first) surprising is that some common nouns and verbs are also modal, in this abstract sense, and that the distinction is worth noting.<br /><br />e.g. 
the temperature is rising<br /><br />then rises cannot be reasonably interpreted by a local object: because we cannot tell whether the temperature is rising in state a from the mere knowledge of its value in state a.<br /><br />For another example, consider the sentence<br /> the color of the sky ranged from light pink to deep, brooding red;<br />the verb "ranges" is modal in this usage since to determine whether ranges(color, a) we must evaluate color(b) for various states b which differ in "observed location" from the current state a--assuming, for the example, that "observed location" is part of the state.<br /><br />Roughly speaking, co-indexing occurs when the references of one or more indexical expressions in a term are identified with that of a subterm by the introduction of a bound variable which refers to all of them.<br /><br />Co-indexing is part of the rendering operation, since whether and how it should be done is determined by the informal context.<br /><br />One of the most original innovations in Montague [1973] is the interpretation of "John", "I" and "the blond" by quantifiers, of type ~q = (~e -> ~t) -> ~t (in the present system). I will not adopt it, however, because the Montague renderings produce the wrong logical form for the syntactical expressions that they purport to formalize, and thus lose the intended meaning.<br /><br />The evening star is the morning star (should be synonymous with) The morning star is the evening star<br /><br />It is not hard to formulate rules for rendering which avoid unnecessary type-raising and give plausible results for (at least) simple expressions which involve singular terms or quantifiers (or both). The basic technique is known as type-driven rendering (or translation), cf. Klein and Sag [1985] or the more recent textbook Heim and Kratzer [1998][Chapter 3], where it is applied using phrase structure trees to represent meanings.<br /><br /> Ewan Klein and Ivan A. 
Sag [1985], Type-driven translation, Linguistics and Philosophy, vol. 8, pp. 163-201.<br /> Irene Heim and Angelika Kratzer [1998], Semantics in generative grammar, Blackwell.<br /><br />For our purposes here, the main lesson is that meaning (intuitively understood) must be seriously considered in the rendering process--simply "getting the right denotation" is not enough; and that the subsequent, formal computation of referential intensions and synonymies provides some clues as to whether the informal meaning was captured by the proposed rendering.<br /><br />We call cf(A) the canonical form of A and we write<br />A =>_cf B <=> cf(A) =_c B.<br />The terms A0, A1, ..., An are the parts of A, and A0 is its head. It will be convenient to employ the notational convention<br />A where { } == A<br />introduced in (4), which allows us to assume that all canonical forms look like recursive terms--perhaps with an empty body.<br /><br />10 reduction rules<br /><br />We claim that it preserves meaning, so it had better preserve at least denotations: Thm 3.11.<br /><br />Proof is simple, by induction on the definition of the reduction relation.<br /><br /><br />Main Conjecture. If the set of constants K is finite, then the relation of referential synonymy between closed terms of L^{lambda}_ar(K) is decidable.<br /><br />Still open.<br /><br />For a satisfactory development of a theory of belief in which the belief carriers are utterances, we would also need to establish the decidability of synonymy between the parts of utterances in which the parameter a occurs. see:<br /><br /> Eleni Kalyvianaki and Yiannis N. 
Moschovakis [], Two aspects of local meaning, in preparation.<br /><br />This is because, intuitively: if you mention an individual concept, then that (full) concept is part of the meaning of your utterance. [34] In the two puzzles above, Los Angeles, LA, He and Scott are all parts of the relevant terms, but Los Angeles = LA, which dooms poor Petros, while He ≠ Scott, which saves the King.<br />We have already discussed in §4.2 the technical fact behind this claim: the state parameter a occurs only in the head of the canonical form of an utterance A(a) and not in its body.<br /><br />5. English as a programming language.<br /><br />(50) program P |--> algorithm(P) |--> den(P).<br />It is not hard to work out the mathematical theory of a suitably abstract notion of algorithm which makes this work; and once this is done, then it is hard to miss the similarity of (50) with the basic Fregean scheme for the interpretation of a natural language,<br />(51) term A |--> meaning(A) |--> den(A).<br /><br />Aside from the relation between algorithms and meanings, programming languages resemble natural languages more than they resemble the classical, formal languages of logic, both in their complexity and also because they exhibit some natural language phenomena which are absent from formal languages.<br /><br />I will also not try to explain my take on basic philosophical questions like what it means to "define", "represent faithfully" or "explicate" meaning (or any other notion) in set-theoretic terms; I tried my best to be as clear on these issues as I can in Moschovakis [1998].<br /><br /> Yiannis N. Moschovakis [1998], On founding the theory of algorithms, Truth in mathematics (H. G. Dales and G. Oliveri, editors), Clarendon Press, Oxford, pp. 71-104.<br /><br />Denotational semantics for programming languages.<br /><br />Scott's theory is peculiarly incomplete in that it makes no room for the notion of algorithm which (one would think) is at the heart of the matter. 
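Scheme (50) can be made concrete with sorting, the standard illustration of why Scott-style denotational semantics underdetermines the algorithm. The two implementations below are my own textbook sketches, not code from the paper; they differ as algorithms yet share one denotation, the alphabetizing function:

```python
# Two different sorting algorithms with the same denotation: both compute
# the function mapping a list of words to its alphabetized rearrangement.
# Standard textbook implementations (an illustration, not from the source).

def bubble_sort(words):
    ws = list(words)
    for i in range(len(ws)):
        for j in range(len(ws) - 1 - i):
            if ws[j] > ws[j + 1]:
                ws[j], ws[j + 1] = ws[j + 1], ws[j]
    return ws

def merge_sort(words):
    if len(words) <= 1:
        return list(words)
    mid = len(words) // 2
    left, right = merge_sort(words[:mid]), merge_sort(words[mid:])
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right

# Denotational semantics identifies the two; a semantics at the level of (50)
# would assign them different algorithm values but the same den value.
u = ["pear", "apple", "quince", "fig"]
assert bubble_sort(u) == merge_sort(u) == sorted(u)
```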
Consider, for example, the problem of "sorting" (putting in alphabetical order) a long list of words u. There are many algorithms which will do this--the bubble sort, the merge sort, the quick sort etc.--and they differ greatly in many ways, for example their efficiency. They can all be "programmed" (expressed) in every sufficiently rich programming language L, but the denotational semantics of L cannot distinguish between them, as they all have the same denotation, the function which assigns to each u its alphabetized rearrangement.<br /><br />And so it seemed to me that Scott semantics should be refined by the introduction of algorithms as the primary semantic values of programs, which then determine their denotations, i.e., by adopting the basic interpretation scheme (50).<br /><br />5.2. What is an algorithm?<br /><br />recursive equations, e.g. the Euclidean algorithm for gcd:<br /><br />gcd(x, y) = p(x, y) where {p := {lambda}(x){lambda}(y)C(q1(x, y), y, r(x, y)),<br /> q1 := {lambda}(x){lambda}(y)rem(x, y),<br /> r := {lambda}(x){lambda}(y)p(y, q2(x, y)),<br /> q2 := {lambda}(x){lambda}(y)rem(x, y)}<br /><br />where the conditional construct<br />C(u, s, t) = if (u = 0) then s else t.<br /><br />Notice that these algorithms are always relative to the givens, and so they do not determine "absolutely computable" functions unless the givens are absolutely computable.<br /><br />The intended interpretation of L^{lambda}_ar(K) in §1.4 includes higher-type givens, and the simpler, acyclic recursors sufficed.<br /><br /><br />---------------<br /><br />Alonzo Church [1951a], A formulation of the logic of sense and denotation, Structure, method and meaning (P. Henle, H. M. Kallen, and S. K. Langer, editors), Liberal Arts Press, New York, pp. 3-24.<br />Alonzo Church [1951b], The need for abstract entities, American Academy of Arts and Sciences Proceedings, vol. 80, pp. 
100-113, reprinted in Martinich [1990] under the title Intensional Semantics.<br />Alonzo Church [1962], A remark concerning Quine's paradox about modality, Spanish version in Análisis Filosófico, pp. 25-32, reprinted in English in Salmon and Soames [1988].<br />Alonzo Church [1973], Outline of a revised formulation of the logic of sense and denotation, part I, Noûs, vol. 7, pp. 24-33.<br />Donald Davidson [1984], Truth and interpretation, Clarendon Press, Oxford.<br />G. Frege [1952], Translations from the philosophical writings of Gottlob Frege, Blackwell, Oxford, edited by P. Geach and M. Black.<br />Gottlob Frege [1892], On sense and denotation, Zeitschrift für Philosophie und Philosophische Kritik, vol. 100. Translated by Max Black in Frege [1952] and also by Herbert Feigl in Martinich [1990]. I have used "denotation" to render Frege's "Bedeutung," instead of Black's "meaning" or Feigl's "nominatum".<br />David Kaplan [1978a], Dthat, Syntax and semantics (Peter Cole, editor), vol. 9, Academic Press, New York, reprinted in Martinich [1990].<br />Saul A. Kripke [1979], A puzzle about belief, Meaning and use (A. Margalit, editor), Reidel, pp. 239-283, reprinted in Salmon and Soames [1988].<br />Leonard Linsky (editor) [1971], Reference and modality, Oxford University Press.<br />A. P. Martinich (editor) [1990], The philosophy of language, second ed., Oxford University Press, New York, Oxford.<br />R. Montague [1970a], English as a formal language, Linguaggi nella Società e nella Tecnica (Milan) (Bruno Visentini et al., editors), Edizioni di Comunità, pp. 189-284, reprinted in Montague [1974].<br />R. Montague [1970b], Pragmatics and intensional logic, Synthèse, vol. 22, pp. 68-94, reprinted in Montague [1974].<br />R. Montague [1970c], Universal grammar, Theoria, vol. 36, pp. 373-398, reprinted in Montague [1974].<br />R. 
Montague [1973], The Proper Treatment of Quantification in Ordinary English, Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics (J. Hintikka et al., editors), D. Reidel Publishing Co, Dordrecht, pp. 221-242, reprinted in Montague [1974].<br />R. Montague [1974], Formal philosophy, Yale University Press, New Haven and London. Selected papers of Richard Montague, edited by Richmond H. Thomason.<br />Yiannis N. Moschovakis [1994], Sense and denotation as algorithm and value, Logic colloquium '90 (J. Väänänen and J. Oikkonen, editors), vol. 2, Association for Symbolic Logic, Lecture Notes in Logic, pp. 210-249.<br />Jamal Ouhalla [1994], Introducing transformational grammar, Arnold and Oxford University Press.<br />G. Plotkin [1977], LCF considered as a programming language, Theoretical Computer Science, vol. 5, pp. 223-255.<br />Nathan Salmon and Scott Soames [1988], Propositions and attitudes, Oxford University Press.<br />D. S. Scott and C. Strachey [1971], Towards a mathematical semantics for computer languages, Proceedings of the symposium on computers and automata (New York) (J. Fox, editor), Polytechnic Institute of Brooklyn Press, pp. 19-46.<br />J. van Heijenoort [1985], Frege on sense identity, Selected essays, Bibliopolis, Napoli, pp. 65-70.Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-20740492851958998262009-05-28T04:03:00.000-07:002009-05-28T04:09:18.425-07:00reviews: creolesContact Linguistics: Bilingual encounters and grammatical outcomes<br /><br />By Carol Myers-Scotton<br /><br />Oxford: Oxford University Press, 2002. Pp. 356. paper $45.00. 
ISBN 0198299532.<br /><br />Reviewed by Alison Nicolle (BTL, East Africa) and Steve Nicolle (BTL, East Africa)<br /><br />A creole can have several natural languages contributing to its Matrix Language system, and the Embedded Language is the superstratal lexifier.<br /><br />-------<br /><br />Defining Creole<br /><br />By John H. McWhorter<br /><br />Oxford: Oxford University Press, 2005. Pp. 444. paper $49.95. ISBN 0195166698.<br /><br /><a href="http://www.sil.org:8090/silebr/2006/silebr2006-004">Review</a>ed by Gerry Beimers<br /><br />SIL International and University of New England (Australia)<br /><br />ch 1: the “official” statement of his Creole Prototype hypothesis. Here he explicates the three traits of the creole prototype, namely,<br /><ol><li>“few or no inflectional affixes” (p. 12), </li><li>“little or no use of tone to distinguish monosyllabic lexical items or to encode morphosyntactic distinctions” (p. 13), and </li><li>a lack of noncompositional derivation.</li></ol><br />ch 2: four diagnostics of grammatical complexity, namely,<br /><ol><li>phonemic inventory, </li><li>more syntactic rules to be processed, </li><li>grammaticalized expressions of fine-grained semantic and pragmatic distinctions, and </li><li>inflectional morphology.</li></ol><br />ch 3: the developmental relationship between pidgins and creoles. In it he argues against the notion that the path from source language to creole is merely via “syntax-internal” (p. 74) transformation. 
The argument takes the shape of an examination of six features (which he designates as ornamental—metaphorically speaking) not found in creoles, namely:<br /><ol><li>ergativity, </li><li>inalienable possessive marking, </li><li>overt marking of inherent reflexivity, </li><li>evidential markers, </li><li>grammaticalized referential marking, and </li><li>consonant mutation.</li></ol><br />ch 5: argues that the superstratist creole genesis model (advanced mainly by Chaudenson and Mufwene) is not supported by the data.<br /><br />ch 11: English is “significantly less overspecified semantically and less complexified syntactically” (p. 268) compared to its Germanic sisters. His essential thesis is that a contact-based explanation accounts for these facts. He outlines his view of overspecification and complexification and then goes on to examine ten features, namely,<br /><ol><li>reflexivity marking, </li><li>external possessor constructions, </li><li>grammatical gender marking on the article, </li><li>derivational morphology, </li><li>directional adverbs, </li><li>be with past participles, </li><li>passive marking with become, </li><li>verb-second word order, </li><li>disappearance of thou, and </li><li>disappearance of the indefinite pronoun man.</li></ol>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-80574787671941714442009-05-28T03:13:00.000-07:002009-05-28T03:20:08.257-07:00review: reduplication<a href="http://www.sil.org:8090/silebr/2008/silebr2008-001">SIL Review</a><br />of<br />Reduplication: Doubling in morphology<br /><br />By Sharon Inkelas and Cheryl Zoll<br /><br />Cambridge Studies in Linguistics 106. Cambridge: Cambridge University Press, 2005. Pp. 276. hardback $90.00. 
ISBN 0521806496.<br /><br />Reviewed by Mike Cahill<br /><br />Ch 4: "go beyond the daughter phonologies to argue there is a layer of phonology (a cophonology) associated with the mother node, that is, the construction as a whole. This is compatible with Kiparsky’s Stratal OT approach (Kiparsky 2000), though not identical."<br /><br />Has examples from Tagalog in Ch 6. Argues that reduplication is mostly at level of morphology, rather than phonology.<br /><br />Kiparsky, Paul. 2000. Opacity and cyclicity. <span class="booktitle">The Linguistic Review</span> 17:351-367.Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-15717359184652626212009-01-21T19:06:00.000-08:002009-01-21T19:07:09.774-08:00For annotations to address NLU, need basic research on language-world mappingMark-up Barking Up the Wrong Tree<br />Annie Zaenen<br /><br />In her Last Words <a href="http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.2006.32.4.577">column</a> in CoLi, Zaenen describes the limitations of annotation applied to NLU, and calls for fundamental research "to understand the mapping between language and the world itself better"<br /><p>"The interest in machine-learning methods to solve natural-language-understanding problems has led to the use of textual annotation as an important auxiliary technique. Grammar induction based on annotation has been very successful for the Penn Treebank" "The problems with the ‘coreference’ annotation tasks of MUC and the like are well documented and not solved. Kibble and van Deemter (2000), for instance, discuss the difficulties created by the assumption that coreference is an equivalence relation, and hence transitive"</p><p> 2 difficulties: "The first is inherent in the kind of annotations that are currently needed. The field is moving from information retrieval to language understanding tasks. 
To understand a linguistic utterance is to map from it to a state of the world, a non-linguistic reality. Language understanding always has a non-linguistic component. In computational settings, unfortunately, most of the time we do not have independent access to this non-linguistic component. This means that language understanding systems have to be more than just language understanding systems: One expects them also to take care of some minimal understanding of the world the language is supposed to describe. But relations between linguistic entities and the world have not been studied in linguistics nor anywhere else in the systematic way that is required to develop reliable annotation schemas: Traditional formal semantics says how meanings are put together and remains silent about semantic primitives. Lexical semantics is very fragmented: One part of it tends to limit its scope to lexical items that exhibit syntactic alternations, whereas another part concentrates on improving traditional lexicography." "Annotation tasks typically involve these ill-understood phenomena. Current practice seems to assume that theoretical understanding can be circumvented and that the pristine intuitions of nearly naïve native speakers can replace analysis of what is going on in these complicated cases. The results that I have looked at suggest that this is wishful thinking." </p><p>"The second problem that annotation tasks face is not inherent. It is created by the current funding model. In the name of accountability, current NLP practice is wedded to quantifiable results, short time horizons, and strict financial control. In this setup there is no time for fundamental research. So, when research is necessary, it has to be called by another name. 
Part of it will be shuffled under the heading ‘annotation’."</p> <p style="border-style: none none solid; border-color: -moz-use-text-color -moz-use-text-color rgb(0, 0, 0); border-width: medium medium 1px; padding: 0in 0in 0.03in;">"We should recognize that annotations are no substitute for the understanding of a phenomenon. They are an encoding of that understanding. The encoding is different from a rule-based encoding in that it does not require a generative formalization and it allows a more piecemeal approach." </p>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-84590771819680110362009-01-21T19:05:00.001-08:002009-01-21T19:05:35.179-08:00Patrick Blackburn reviews Lambalgen and Hamm, The Proper Treatment of EventsThe Proper Treatment of Events by Michiel van Lambalgen and Fritz Hamm, 2005<br /><a href="http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.2007.33.2.263">review</a><br /><br />This is an intriguing review. <p>"It provides a formalization of the notion of event (using a modification of Shanahan’s [1997] version of the Event Calculus, a many-sorted first-order theory), defines a dynamic-style semantics for the system, and discusses how constraint logic programming can be used to cash out its computational content. ... a detailed exploration of the ramifications of a single idea: ... to properly understand how temporal expressions in natural language work, we must understand how human beings construct time, and that the cognitive construction of time is best explicated in terms of planning and causality. Planning is the glue that lets human minds integrate past, present, and future, and episodic memory (which Lambalgen and Hamm view as a “generalised capacity for imagining or constructing possible worlds”) is the key to this capacity." I am interested in causality and causal constraints, partly to harness in a situation-theoretic semantics of verbs. 
The role of episodic memory may help clarify the role of consciousness.</p> <p> "they broadly agree with Moschovakis’s (1993) interpretation of the Fregean notion of sense: The sense of an expression is the algorithm that computes its reference. ... Causality, the key relation between events, is presented in two variants: instantaneous change and continuous change. Moreover, in addition to this general background theory, they also allow for the constructions of “scenarios,” microtheories stating the specific causal relationships holding in a given situation (this machinery underlies their account of lexical meanings). ... a theory that is carefully axiomatized. The authors consider various models for their theory, paying particular attention to minimal models, for they make a closed-world assumption in which anything that is not forced to happen does not happen. ... the authors distance themselves both from DRT (Kamp and Reyle 1993) (because of its reliance on Davidson-style events with predicates corresponding to thematic roles) and from Amsterdam-style dynamic semantics (Groenendijk and Stokhof 1991) (which they view as treating computation implicitly rather than explicitly)." I definitely want to see why they reject DRT, and their ideas on events and thematic roles.</p> <p> "Part III of the book (which, at around 160 pages, is by far the longest section) puts this apparatus to work to construct a theory of tense and aspect. Every VP is associated with a default scenario (that is, a microtheory) that determines the Aktionsart of the verb. The word “default” is important: temporal and aspectual operators, and many other linguistic items, may coerce the verb to assume a different Aktionsart. ... this book treats temporal and aspectual phenomena from a perspective very different from that of current corpus-based work. But it does so systematically and with great precision. ... Interested in temporal semantics? Then this is essential reading." 
Again, this should help in coming up with a verb semantics, a verb-morphology construction semantics, and a clausal construction semantics.</p> <p></p>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-39697386829911647722009-01-21T19:03:00.000-08:002009-01-21T19:04:16.829-08:00Sowa reviews Halliday<p> The journal Computational Linguistics is now open access. I came across <a href="http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.2000.27.1.140">this review</a> of <span style="font-family:Palatino-Bold, serif;"><span style="font-size:100%;"><b>Construing Experience through Meaning: A Language-based Approach to Cognition </b></span></span><span style="font-family:Palatino-Bold, serif;"><span style="font-size:100%;"><span style="font-weight: normal;">by </span></span></span><span style="font-family:Palatino-Bold, serif;"><span style="font-size:100%;"><b>M. A. K. Halliday and Christian M. I. M. Matthiessen</b></span></span> where John Sowa describes the ontology they use. They acknowledge a debt to the dyadic semiotics of Saussure, Hjelmslev and Firth, but Sowa suggests they are rediscovering (or not attributing) some of the insights of Peirce. Elements, Figures, and Sequences correspond to referential indices, minimal clauses, and discourse. “<span style="font-family:Times New Roman, serif;"><span style="font-size:100%;">Elements are classified as participant, circumstance, or process. Figures are classified by another triad of relational (being or having), material (doing or happening), and mental (sensing or saying).” Sowa notes the latter “corresponds to Peirce’s fundamental triad of Quality, Reaction, and Representation.”</span></span></p> <p>Sowa cites early Winograd and the USC ISI group (including Bateman, now in Germany) as researchers computationally applying Systemic Functional Linguistics. 
This reminds me that my original interest in NLP was sparked by Winograd's Language as a Cognitive Process, including his appendix on a natural language specification technique. That must have been 1988 or so, twenty years! I need access to a good university library, where I can reread it.</p>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-33425075791680484932009-01-07T20:34:00.000-08:002009-01-07T20:43:36.035-08:00statistical NLP with ROne of the areas I still need to study is statistical NLP; I lack even the mathematical background, although I took a probability course for Math majors at Cornell a long time ago. <div><br /></div><div>I came across <a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?em=&pagewanted=all">this article</a> about the R programming language and open source software package (community?). It seems there are some people applying it to speech processing, with the <span class="Apple-style-span" style="font-family: Verdana; font-size: 12px; "><a rel="nofollow" href="http://emu.sourceforge.net/" class="headerLink" target="_self" style="font-size: 15px; color: rgb(85, 26, 139); "><b>EMU Speech</b> Database System</a>. 
</span></div><div><br /></div><div>There is even work related to <a href="http://www.ipds.uni-kiel.de/PASCbook.html">language corpora</a> (book draft) and <a href="http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V1C-41SBGXX-4&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=477d83c92c018d1b53f59935e5f6e506">multi-language annotation</a> (not free :( tho), which I haven't reviewed yet.<br /></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-43956969449192120242009-01-03T16:04:00.000-08:002009-01-03T16:37:18.316-08:00Languages of Papua New GuineaTaking data from Ethnologue 15th Edition, there are 820 living languages in PNG, of which I count 88 as having 10 thousand or more speakers. The Austronesian ones are divided into three subfamilies of Western Oceanic, listed below. So the languages with 10 thousand or more speakers in PNG number 8/66 for the Meso Melanesian, 4/102 for the North New Guinea, and 9/62 for the Papuan Tip subfamily.<div><br /></div><div>The biggest family in PNG is Trans-New Guinea, in which 57/564 languages have over 10k speakers; I have listed only the seven with over 50k speakers. There is also Tok Pisin, English (with 50,000 speakers), and three smaller families with languages over 10k speakers: East Papuan 2/36, Sepik-Ramu 3/100, and Torricelli 3/53.<br /><h1><a href="http://www.ethnologue.com/show_family.asp?subid=91536" style="color: rgb(51, 102, 153); ">Meso Melanesian</a> (66) </h1><div><table frame="VOID" cellspacing="0" cols="2" rules="NONE" border="0"> <colgroup><col width="110"><col width="1001"></colgroup> <tbody> <tr> <td width="110" height="77" align="LEFT"><span style="font-family:Times New Roman;">Kuanua</span></td> <td width="1001" align="LEFT"><span style="font-family:Times New Roman;">[ksd] 61,000 (1991 SIL). East New Britain Province, Rabaul District, Gazelle Peninsula. 
Alternate names: Tolai, Gunantuna, Tinata Tuna, Tuna, Blanche Bay, New Britain Language. Dialects:Vunadidir, Rapitok, Raluana, Vanumami, Livuan, Matupit, Kokopo, Kabakada, Nodup, Kininanggunan, Rakunei, Rebar, Watom, Masawa. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Halia</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[hla] 20,000 (1994 SIL). Bougainville Province, North Bougainville District, northeastern Buka Island. Alternate names: Tasi. Dialects: Hanahan, Hangan, Touloun (Tulon, Tulun), Selau. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Nehan-North Bougainville, Buka, Halia <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Bola</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[bnp] 13,746 (2000 census). Population includes 2,253 Harua. West New Britain Province, northeast coast, most of Willaumez Peninsula. Harua is on the east side of Kimbe. Alternate names: Bakovi, Bola-Bakovi. Dialects: Harua (Karua, Xarua, Garua, Mai), Bola. Harua is a dialect that has developed as a result of a group of people being resettled on an oil palm plantation. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Lihir</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[lih] 12,571 (2000 census). New Ireland Province, Lihir Island, and 3 smaller islands. Alternate names: Lir. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Tabar <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Nakanai</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[nak] 13,000 (1981 Wurm and Hattori). West New Britain Province, Hoskins District, northwest coast. 42 villages. Alternate names: Nakonai. Dialects: Losa (Loso, Auka), Bileki (Lakalai, Muku, Mamuga), Vere (Vele, Tarobi), Ubae (Babata), Maututu. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Willaumez <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Tungag</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[lcm] 12,000 (1990 SIL). New Ireland Province, Lamet District, New Hanover Island, Tingwon and Umbukul Islands. Alternate names: Tungak, Lavongai, Lavangai, Dang. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, Lavongai-Nalik <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Ramoaaina</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[rai] 10,266 (2000 census). East New Britain Province, Kokopo District, Duke of York Islands. Alternate names: Duke of York, Ramuaina. Dialects: Makada, Molot (Main Island), Aalawa (Aalawaa, Alawa, Mioko, Ulu, South Islands). Makada dialect is very different. Possibly not intelligible to speakers of other dialects. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, New Ireland, South New Ireland-Northwest Solomonic, Patpatar-Tolai <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Uneapa</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[bbn] 10,000 (1998 SIL). West New Britain Province, Talasea District, Unea (Bali) Island off the northwest coast. Alternate names: Bali. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Meso Melanesian, Bali-Vitu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> </tbody> </table> </div><div><br /><h1><a href="http://www.ethnologue.com/show_family.asp?subid=89898" style="color: rgb(51, 102, 153); ">North New Guinea</a> (102)</h1></div><div><table frame="VOID" cellspacing="0" cols="2" rules="NONE" border="0"> <colgroup><col width="110"><col width="1001"></colgroup> <tbody> <tr> <td width="110" height="77" align="LEFT"><span style="font-family:Times New Roman;">Adzera</span></td> <td width="1001" align="LEFT"><span style="font-family:Times New Roman;">[azr] 20,675 (1988 Holzknecht). Population includes 367 Ngariawan (1978 McElhanon), 497 Sarasira (1988 Holzknecht), 990 Sukurum (1990). Morobe Province, Markham Valley, Kaiapit District, Leron River. Alternate names: Azera, Atzera, Acira. Dialects: Yarus, Amari, Azera, Ngarowapum, Tsumanggorun, Guruf-Ngariawang (Ngariawan), Sarasira (Sirasira), Sukurum. The dialects form a cluster. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, Markham, Upper, Adzera <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Takia</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[tbc] 19,619 (2003 SIL). Southern half of Karkar Island, Bagabag Island, and coastal villages Megiar and Serang, Madang Province, Madang District. Dialects: Megiar, Serang. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Ngero-Vitiaz, Vitiaz, Bel, Nuclear Bel, Northern <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Buang, Mapos</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[bzh] 10,484 (2000). 30% monolingual. Morobe Province, middle Snake River area, Mumeng District. 10 villages. Alternate names: Mapos, Central Buang. Dialects: Wagau, Mambump, Buweyeu, Wins, Chimbuluk, Papakene, Mapos. Lexical similarity 61% between Mambump and Mangga. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, South, Hote-Buang, Buang <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Bugawac</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[buk] 9,694 (1978 McElhanon). 40% monolingual. Morobe Province, coast of Huon Gulf. Alternate names: Bukawa, Bukaua, Bukawac, Kawa, Kawac, Yom Gawac. Dialects: Close to Yabem. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, North New Guinea, Huon Gulf, North <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> </tbody> </table> <h1><a href="http://www.ethnologue.com/show_family.asp?subid=91586" style="color: rgb(51, 102, 153); ">Papuan Tip</a> (62)</h1></div><div><table frame="VOID" cellspacing="0" cols="2" rules="NONE" border="0"> <colgroup><col width="110"><col width="1001"></colgroup> <tbody> <tr> <td width="110" height="62" align="LEFT"><span style="font-family:Times New Roman;">Kilivila</span></td> <td width="1001" align="LEFT"><span style="font-family:Times New Roman;">[kij] 20,000 (2000 Tryon). 60% monolingual. Milne Bay Province, Trobriand Islands. Alternate names: Kiriwina. Dialects: Kitava, Vakuta, Sinaketa. Various dialects. Lexical similarity 68% with Muyuw. Kitava Island has 80% lexical similarity. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Kilivila <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Tawala</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[tbo] 20,000 (2000 census). Milne Bay Province, Alotau District, from Awaiama to East Cape, north and south shores of Milne Bay, Sideia and Basilaki islands. Alternate names: Tawara, Tavara. Dialects: Awayama (Awaiama, Awalama), Huhuna, Kehelala (Keherara, East Cape), Lelehudi, Diwinai (Divinai), Labe (Rabe), Yaleba (Wagawaga, Gwawili, Gwavili, Ealeba), Bohilai (Bohira'i, Basilaki), Sideya (Sideia). 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Are-Taupota, Taupota <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Keapara</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[khz] 19,400 (2000 D. Tryon). Central Province, coast from east of Hood Peninsula to Lalaura west of Cape Rodney. 3 villages. Alternate names: Keopara, Kerepunu. Dialects: Babaga, Kalo, Keapara (Keopara), Aroma (Arona, Aloma, Galoma), Maopa, Wanigela, Kapari, Lalaura. Dialect continuum to Hula. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="106" align="LEFT"><span style="font-family:Times New Roman;">Mekeo</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[mek] 19,000 (2003 SIL). Central Province, Kaiyuku District, inland, bounded on the west by the Waima, on the east by the Kuni and Kunimaipa. Extends into Gulf Province. Alternate names: Mekeo-Kovio. Dialects: East Mekeo, West Mekeo, North Mekeo, Northwest Mekeo (Kovio). Kovio is a peripheral dialect. The four dialects are mutually unintelligible to each other's speakers, except for North and West Mekeo, but most Mekeo are reported to have familiarity with neighboring dialects. Kovio, however, is not contiguous to the other dialects. Kovio has 81% lexical similarity with West Mekeo and North Mekeo, and 79% with East Mekeo. West and East Mekeo have 87% lexical similarity. North Mekeo has 99% lexical similarity with West Mekeo and 87% with East Mekeo. Mekeo has 41% with Waima. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Misima-Paneati</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[mpx] 18,000 (2002 SIL). 4,000 monolinguals. Milne Bay Province, Misima District, Misima Island, Panaieti, and all the islands of the Calvados Chain to (not including) Panawina, Alcester, Ole, and Tewatewan Islands, and Bowagis on Woodlark Island. 32 villages. Alternate names: Panaieti, Panaeati, Paneyate, Paneate, Panayeti. Dialects: Nasikwabw (Tokunu), Tewatewa. Lexical similarity 33% with Nimowa and Dobu (closest). Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Kilivila-Louisiades, Misima <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Sinaugoro</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[snc] 15,000 (1991 SIL). Central Province, Rigo District, south of Kwikila. Alternate names: Sinagoro. Dialects: Ikolu, Balawaia, Saroa, Babagarupu, Kwaibida, Taboro, Kwaibo, Alepa, Omene, Tubulamo, Ikega, Boku, Buaga, Wiga, Vora, Kubuli, Oruone. Boku dialect may be most central. Lexical similarity 70% to 75% with Kalo (closest), 65% to 70% with Hula. 
Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="77" align="LEFT"><span style="font-family:Times New Roman;">Waima</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[rro] 15,000 (2000 census). Central Province, Bereina District, near Kairuku, shores of Hall Sound, between Yule Island and mainland, 65 miles northwest of Port Moresby. Alternate names: Roro. Dialects: Waima, Paitana, Roro. Roro and Paitana populations are smaller and scattered. Lexical similarity 45% with Kuni (closest), 99% among all three dialects. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, West Central Papuan, Nuclear <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Motu</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[meu] 14,000 (1981 Wurm and Hattori). Central Province, in and around Port Moresby, villages along the coast from Manumanu, Galley Reach, to GabaGaba (Kapakapa). Alternate names: True Motu, Pure Motu. Dialects: Western Motu, Eastern Motu. Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Peripheral, Central Papuan, Sinagoro-Keapara <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Dobu</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[dob] 10,000 (1998 SIL). 60% monolingual. 
Milne Bay Province, Esa'ala District, Sanaroa, Dobu, and parts of Fergusson and Normanby islands. 500 villages. Dialects: Galubwa, Sanaroa, Ubuia, Central Dobu, Loboda (Roboda, Dawada-Siausi). Lexical similarity 56% with Morima (closest). Classification: Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Western Oceanic, Papuan Tip, Nuclear, North Papuan Mainland-D'Entrecasteaux, Dobu-Duau <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> </tbody> </table> </div><div><br /></div><h1><a href="http://www.ethnologue.com/show_family.asp?subid=89879" style="color: rgb(51, 102, 153); ">Trans-New Guinea</a> (564)</h1><table frame="VOID" cellspacing="0" cols="2" rules="NONE" border="0"> <colgroup><col width="110"><col width="1001"></colgroup> <tbody> <tr> <td width="110" height="77" align="LEFT"><span style="font-family:Times New Roman;">Enga</span></td> <td width="1001" align="LEFT"><span style="font-family:Times New Roman;">[enq] 164,750 (1981 Wurm and Hattori). Population includes 12,000 in Sau (1990 UBS). Enga Province. The Maramuni are nomadic, and are in the lower reaches of the central range. Alternate names: Caga, Tsaga, Tchaga. Dialects: Kandepe, Layapo, Tayato, Mae (Mai, Wabag), Maramuni (Malamuni), Kaina, Kapona, Sau (Sau Enga, Wapi), Yandapo, Lapalama 1, Lapalama 2, Laiagam, Sari. Mae is the standard dialect; all understand it. Layapo is between Mae and Kyaka. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Enga <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Melpa</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[med] 130,000 (1991 SIL). Western Highlands Province, Hagen District. Alternate names: Medlpa, Hagen. Dialects: Tembagla. 
Only slight dialect differences. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Hagen <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Kuman</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[kue] 80,000 (1994 SIL). 10,000 monolinguals. Simbu Province, northern third, overlapping into Minj Subprovince of Western Highlands Province. Alternate names: Chimbu, Simbu. Dialects: Kuman, Nagane (Genagane, Genogane), Yongomugi. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Huli</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[hui] 70,000 (1991 UBS). Southern Highlands Province around Tari, and southern fringe of Enga Province. Alternate names: Huli-Hulidana, Huri. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, West-Central, Huli <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Kamano</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[kbq] 63,170 (2000 census). Eastern Highlands Province, Kainantu and Henganofi districts. Alternate names: Kamano-Kafe. 
Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, East-Central, Kamano-Yagaria <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Golin</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[gvf] 51,105 (1981 Wurm and Hattori). Simbu Province, Gumine District. Alternate names: Gollum, Gumine. Dialects: Yuri, Kia (Kiari), Golin, Keri, Marigl. Close to Dom. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Sinasina</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[sst] 50,079 (1981 Wurm and Hattori). Simbu Province. Dialects: Tabare, Guna. Close to Dom and Golin. Classification: Trans-New Guinea, Main Section, Central and Western, East New Guinea Highlands, Central, Chimbu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> </tbody> </table> <div><br /></div><h1>Other Families</h1><div><table frame="VOID" cellspacing="0" cols="2" rules="NONE" border="0"> <colgroup><col width="110"><col width="1001"></colgroup> <tbody> <tr> <td width="110" height="62" align="LEFT"><span style="font-family:Times New Roman;">Tok Pisin</span></td> <td width="1001" align="LEFT"><span style="font-family:Times New Roman;">[tpi] 121,000 (2003 SIL). 50,000 monolinguals. Mainly in the northern half of the country, and now well established in Port Moresby, and into other regions. Alternate names: Pisin, Pidgin, Neomelanesian, New Guinea Pidgin English, Melanesian English. Dialects: There are dialect differences between lowlands, highlands, and the islands. 
The highlands lexicon has more English influence (J. Holm). Classification: Creole, English based, Pacific <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Terei</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[buo] 26,500 (2003 SIL). Southern Bougainville Province, Buin District. Alternate names: Buin, Telei, Rugara. Dialects: Closest to Uisai. Classification: East Papuan, Bougainville, East, Buin <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Naasioi</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[nas] 10,000 (1990 SIL). Bougainville Province, Kieta District, central mountains and southeast coast. Alternate names: Nasioi, Kieta, Kieta Talk, Aunge. Dialects: Naasioi, Kongara, Orami (Guava), Pakia-Sideronsi. Classification: East Papuan, Bougainville, East, Nasioi <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Ambulas</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[abt] 44,000 (1991 SIL). Population includes 27,000 in Wosera (1991 SIL), 9,000 in Maprik (1991 SIL), 8,000 in Wingei (1991 SIL). East Sepik Province, Maprik District. Alternate names: Abulas, Abelam. Dialects: Maprik, Wingei, Wosera-Kamu, Wosera-Mamu. Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Boikin</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[bzf] 31,328 (2003 SIL). 
East Sepik Province, Yangoru District. Alternate names: Boiken, Nucum, Yangoru, Yengoru. Dialects: West Boikin, Central Boikin, East Boikin, Munji, Haripmor, Kwusaun, Kunai, Island Boikin. Classification: Sepik-Ramu, Sepik, Middle Sepik, Ndu <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="62" align="LEFT"><span style="font-family:Times New Roman;">Kwanga</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[kwj] 10,000 (2001 SIL). East Sepik Province, extending beyond the western boundary of Maprik District; Makru-Klaplei Division, Nuku District; Saundaun Province, east of Mehek. 40 villages. Alternate names: Kawanga, Gawanga. Dialects: Apos, Bongos (Bongomamsi, Bongomaise, Nambi), Tau (Kubiwat), Wasambu, Yubanakor (Daina). A dialect cluster of 5 subdialects, 2 main dialects. Classification: Sepik-Ramu, Sepik, Middle Sepik, Nukuma <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Bukiyip</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[ape] 16,233 (2003 SIL). East Sepik Province, west Yangoru District, Torricelli Mountains. Alternate names: Bukiyúp, Mountain Arapesh. Dialects: Coastal Arapesh, Bukiyip (Mountain Arapesh). Lexical similarity 60% with Mufian. Classification: Torricelli, Kombio-Arapesh, Arapesh <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Olo</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[ong] 13,667 (2003 SIL). Sandaun Province, Lumi District. 55 villages. Alternate names: Orlei. Dialects: Payi (Pay, North Olo), Wapi (Wape, South Olo). Related to Yis, Yau, Ningil, Valman. 
Classification: Torricelli, Wapei-Palei, Wapei <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> <tr> <td height="47" align="LEFT"><span style="font-family:Times New Roman;">Mufian</span></td> <td align="LEFT"><span style="font-family:Times New Roman;">[aoj] 11,000 (1998 SIL). Population includes 6,000 Filifita (1999 SIL). East Sepik Province, Maprik District, Torricelli Mountains, west of Maprik. 36 villages. Alternate names: Southern Arapesh, Muhiang, Muhian. Dialects: Supari, Balif, Filifita (Ilahita), Iwam-Nagalemb, Nagipaem. Classification: Torricelli, Kombio-Arapesh, Arapesh <a href="http://www.blogger.com/post-create.g?blogID=928498714636041686">More information.</a></span></td> </tr> </tbody> </table> </div><div><br /></div></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com4tag:blogger.com,1999:blog-928498714636041686.post-37131784995920315162009-01-03T15:45:00.000-08:002009-01-03T15:52:26.205-08:00ethnographic filmThere is such a thing as <a href="http://en.wikipedia.org/wiki/Visual_anthropology">visual anthropology (Wikipedia)</a> (see the list of influential films), and an <a href="http://rspas.anu.edu.au/anthropology/efu/">ethnographic film unit at ANU</a> (there is a web page on a film about Roti). It occurs to me that linguistic documentation should increasingly use video, and not just for sign language. A good way to embed this in the local culture is to produce short films for elementary school in local languages. This can range from traditional stories, to Sesame Street (or Batibot) like educational programs. 
This would be particularly helpful in language maintenance programs, where the teacher is not as fluent as resource speakers like elders, who could be captured on film.<br /><div><br /></div><div><br /></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-71667633463624772392008-12-15T21:24:00.000-08:002008-12-16T01:25:57.613-08:00Systemic Functional Linguistics<p>source: <a href="http://www.isfla.org/Systemics/">http://www.isfla.org/Systemics/</a></p><br /><h2>Systemics & Computation</h2> <hr /> SFL has been prominent in computational linguistics, especially in Natural Language Generation (NLG). Penman, an NLG system started at Information Sciences Institute in 1980, is one of the three main such systems, and has influenced much of the work in the field. John Bateman (currently in Bremen, Germany) has extended this system into a multilingual text generator, <a href="http://www.isfla.org/Systemics/Software/Generators.html#kpml">KPML</a>. Robin Fawcett in Cardiff has developed another systemic generator, called <a href="http://www.isfla.org/Systemics/Software/Generators.html#communal"> Genesys</a>. Mick O'Donnell has developed yet another system, called <a href="http://www.isfla.org/Systemics/Software/Generators.html#wag"> WAG</a>. Numerous other systems have been built using Systemic grammar, either in whole or in part.<br /><br />Macquarie U in Sydney is a center for SFL, and at PACLIC I met <a href="http://www.ling.mq.edu.au/about/staff/johnston_trevor/index.html">Trevor Johnston</a>, who helped develop the <a href="http://www.auslan.org.au/">Australian Sign Bank</a>. Dick Hudson's <a href="http://www.phon.ucl.ac.uk/home/dick/wg.htm">Word Grammar</a> is considered a spin-off that is based on <a href="http://en.wikipedia.org/wiki/Dependency_grammar">dependency grammar</a>. 
Rodney Huddleston, co-author of the <a href="http://www.cambridge.org/uk/linguistics/cgel/sample.htm">Cambridge Grammar of the English Language</a> with Geoffrey Pullum, has worked in that tradition. CGEL was cited by Culicover and Jackendoff as the right level of analysis for their Simpler Syntax approach (which addresses the syntactic part of Jackendoff's Parallel Architecture framework).Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-48020337423654353042008-12-13T17:06:00.001-08:002008-12-28T21:58:23.014-08:00links - logic - Feferman on Tarskian semantics<div>"Tarski’s Conceptual Analysis of Semantical Notions"<br /></div><div><br /></div><a href="http://math.stanford.edu/~feferman/papers/conceptanalysis.pdf">http://math.stanford.edu/~feferman/papers/conceptanalysis.pdf</a><div><br /></div><div><div>Expanded text of a lecture for the Colloque, “Sémantique et épistémologie”, Casablanca, April 24-26, 2002. </div><div>27 pp.</div><div><br /></div></div><div>Tarski's notions of</div><div><ul><li>truth for formal languages and the allied notions of </li><li>satisfaction, </li><li>definability, and </li><li>logical consequence<br /></li></ul><div><div>Two questions are of interest: </div><div><ol><li> what motivated Tarski to make these analyses, and </li><li> what led to their particular form?</li></ol></div></div></div><div>Turing's concept of computability</div><div><ul><li>how? the general notion of a computing machine</li><li>why? a precise notion of computability was needed to show that certain problems (and specifically the Entscheidungsproblem in logic) are uncomputable</li></ul><div><div>Tarski's concept of truth</div><div><ul><li>how? his definition of truth is given in general set-theoretical terms</li><li>why? 
there is no similarly compelling logical reason for Tarski’s work on the concept of truth; Feferman suggests instead a combination of </li><ol><li>psychological and<br /></li><li>programmatic reasons<br /></li></ol></ul><div><div>The main puzzle to be dealt with has to do with the relations between the notions of </div><div><ul><li>truth in a structure and<br /></li><li>absolute truth<br /></li></ul></div></div></div></div></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-64602651465118550562008-11-27T15:56:00.000-08:002008-12-08T18:22:37.094-08:00evo1: Vocalization Displays and the Recognition of Typed Object-FeaturesThe evolution of language is difficult to explain. It is a complex faculty with various aspects or components, and it is difficult to see why rudimentary versions of those aspects would have selectional advantages for differential reproduction in hominids. It would be helpful to identify preadaptations: particular characters that developed complexity in serving some other function, and that came to be used in the new function of language-based communication.<div><br /></div><div>My working hypothesis is that the capacity for visual recognition of objects and scenes, specifically recognition of physical objects and behaving animals in the shared immediate social environment of hominids, gave rise to a complex of preadaptations for language. One preadaptation is 1) a shared conceptual scheme of individuation, where certain scenes were classified as involving significant happenings or objects. This scheme would eventually give rise to the mental lexicon, and the related capacity for semantic memory. Another is 2) the capacity to represent a scene and individual objects in terms of typed structures of object-features. 
By the hyphenated term object-feature, I refer specifically to higher-level visual features like shape outline and component shape, which are distinguished from low-level features like edge, color, or texture. Both these preadaptations were refined and repurposed in the context of 3) recognizing social displays of individuals of the same species. Some displays, like pointing, have 4) a property of intentionality. Intentional displays are produced to trigger an observer's attunement to a relation between the structure of the display and a scene where some salient aspects are referenced (physical objects in the immediate environment are referred to by a shift of gaze or a hand movement, happenings are referenced in relation to a doer or an undergoer). Clearly vocalization displays are core to the evolution of language, but gestural displays that support the repeated emergence of sign languages (among deaf and hearing communities) are also of research interest.</div><div><br /></div><div>How did the use of articulatory gestures in vocalization evolve into phonology? One possibility is song, perhaps evolved from gibbon-like vocalizations to mark territory. Tone is differentiated and evolves to be worth remembering, so song is born. The continuous articulation of vowels is another dimension of song, and this could eventually evolve into a set of distinguished vowel phonemes. Why would consonants and syllabification be added to the mix? Why would song recognition and articulation capacities get repurposed for referring gestures, replacing eye gaze and pointing with nouns or noun phrases?</div><div><br /></div><div>From studying the semantics-syntax interface, we can theorize about two large layers in the structure of utterances. These layers define the linguistic types that an utterance (or sign language expression) is recognized to instantiate. 
One layer is the referring expression, where a concrete object or abstraction is referred to and a corresponding mental concept takes a place in a shared information structure where it can be referred to again. In other words, referring expressions introduce referential indices into discourse, and create the possibility of anaphoric resolution. We have three levels here: </div><div><ol><li>the etic level of a complex nominal utterance in the context where the speaker picks out a particular situation for the listener, </li><li>the emic level of the types from the mental lexicon and language faculty that recognize the utterance as an instance of a structured phrasal type involving lexical items and a distinguished head item, and </li><li>the level of the shared information state including referential indices.<br /></li></ol></div><div>The other large layer is that of speech acts made with clausal utterances (or the sentences used in them), which introduce a semantics and pragmatics of what Searle calls conditions of satisfaction. In a request, the shared information state is about a state which does not obtain but where the listener can act to fulfill those conditions. A promise mentions a possible future state where the speaker (or signer) will act to fulfill the conditions. And an assertive speech act calls attention to a state where the situation referred to (a visible scene, or some abstract situation) has the participants referred to with referential indices, and they stand in a happening-relation mentioned by the head verb or predicator. Any modifiers also introduce properties of the participants or situation. 
Returning to the three levels, at<br /></div><div><ol><li>the etic level, a clausal utterance is used to perform a speech act between speaker and listener;</li><li>the emic level, a sentence or clause is recognized as consisting of a head verb, arguments that have dependency relations to that verb, and additional modifiers;</li><li>the shared information state level, that of a discourse representation structure, referential indices are resolved to salient items in discourse, creating shared attunement to a described situation, which may be a part of the concrete situation surrounding speaker and listener, or else a more abstract situation that does not (yet) exist in the physical surroundings.</li></ol></div><div>This is the rudimentary model of language that I am using. I would need to demonstrate that a language faculty that supports this model is systematically related to existing visual recognition faculties (of objects, scenes, and social displays) that provided some preadaptations for the relatively rapid emergence of complex language among hominids.</div><div><br /></div><div>First I will discuss typed feature structures in language and high-level vision; then I will discuss shared schemes of information in gestural displays and language. 
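As an illustration only, here is a minimal Python sketch of this rudimentary model; all class names, role labels, and indices are hypothetical choices for exposition, not an established formalism. It shows an emic-level clause structure (head verb plus dependency-labeled arguments) alongside a shared information state in which referential indices are introduced and resolved.

```python
# A minimal, hypothetical sketch of the three-level model: an emic-level
# clause structure and a shared information state with referential indices.
from dataclasses import dataclass, field

@dataclass
class Referent:
    index: str      # referential index, e.g. "x1"
    concept: str    # concept from the mental lexicon

@dataclass
class Clause:
    head_verb: str                                  # distinguished head item
    arguments: dict                                 # dependency relation -> index
    modifiers: list = field(default_factory=list)   # extra properties

@dataclass
class InfoState:
    referents: dict = field(default_factory=dict)   # index -> Referent

    def introduce(self, ref: Referent) -> None:
        # A referring expression introduces an index into the discourse,
        # creating the possibility of later anaphoric resolution.
        self.referents[ref.index] = ref

    def resolve(self, index: str) -> Referent:
        # Anaphoric resolution: an index is resolved to a salient referent.
        return self.referents[index]

# "The dog chased a cat": two referring expressions introduce indices,
# and the clause relates them through the head verb.
state = InfoState()
state.introduce(Referent("x1", "dog"))
state.introduce(Referent("x2", "cat"))
clause = Clause(head_verb="chase", arguments={"doer": "x1", "undergoer": "x2"})
print(state.resolve(clause.arguments["doer"]).concept)  # prints: dog
```

The point of the sketch is only that the information state outlives any single clause, so a later utterance can resolve "x1" again, which is what level 3 above requires.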
Future posts will cover:</div><div><ul><li>Feature structures for phrasal features and visual object-features and scenes<br /></li><li>Typing and semantic memory<br /></li><li>Schemes of individuation<br /></li><li>Shared schemes and the first-person character of perceptual qualia<br /></li><li>Types in the mental lexicon<br /></li></ul></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-63144045366592082052008-11-27T04:36:00.000-08:002008-11-27T05:15:34.671-08:00A Minimalist Theory with Untyped FeaturesI read a <a href="http://http-server.carleton.ca/~asudeh/pdf/asudeh-toivonen06-JL-RA.pdf">review</a> by Ash Asudeh and Ida Toivonen of two introductory textbooks on Minimalism. What interests me is the book by Adger:<div><br /></div><div><div>David Adger, <cite>Core syntax: a Minimalist approach.</cite> Oxford: Oxford</div><div>University Press, 2003. Pp. xiii+424.</div><div><br /></div><div>This is described as developing a specific coherent theory within the Minimalist Program, and it ends up being close to a lexicalist unification grammar. Like LFG, and unlike HPSG, the feature theory is untyped.</div><div><br /></div><div>I don't know if there will be a lot of interest, but it may be useful to have a P&P (GB and/or MP) syntax module that can be plugged into a Linguistic Exploration Environment. Adger's version may be a candidate, because it is more formalizable, and may have some compatibility at the level of feature theory. This allows a clearer connection to the lexicon.</div><div><br /></div><div>I have also been reading the critique of GB and MP in Culicover and Jackendoff's Simpler Syntax. 
I am also interested in GB, because Paul Llido uses it in studying Case in Cebuano, mixing it with LFG.</div></div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-62111555537902370022008-11-26T19:24:00.000-08:002008-11-26T19:42:52.298-08:00forthcoming books by Stephen Wechsler<span class="Apple-style-span" style="font-family: 'times new roman'; "><ol><li><p>Wechsler, Stephen (in progress). <span style="font-style: italic; ">The Syntax-Lexicon Interface</span> (working title). Oxford Surveys in Syntax and Morphology, General editor: Robert D. Van Valin, Jr. Oxford University Press, Oxford. </p></li><li>Asudeh, Ash, Joan Bresnan, Ida Toivonen, and Stephen Wechsler (in progress). <span style="font-style: italic; ">Lexical Functional Syntax</span>. Blackwell Publishers. </li></ol><div>It seems Wechsler is now doing LFG, although he has worked extensively in HPSG. I think the cross-fertilization between the two is healthy. I Wayan Arka is also working on LFG for Indonesian, and he has collaborated with Wechsler on HPSG.</div></span>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-18417933470457431352008-11-25T22:09:00.000-08:002008-12-08T18:31:18.334-08:00Constructional SemanticsThere has been extensive research on Lexical Semantics (e.g. Levin, Pustejovsky), often presuming a bottom-up compositionality. This is sometimes contrasted with a Phrasal Semantics that is often in the tradition of Montague grammar, focusing on issues like anaphora and scoping, often assuming that each lexical token contributes one atomic symbol to a semantic representation.<div><br /></div><div>Rather than talk about Phrasal Semantics directly, I will assume that the semantic representation of a phrasal expression can be decomposed into both a lexical contribution and a constructional contribution. 
Compositionality is still possible, but it involves both bottom-up and top-down contributions.</div><div><br /></div><div>An initial focus will be Argument Structure Constructions (ASCx) of verbs, such as those analyzed by Goldberg for English. However, I am also interested in modeling Austronesian morphosyntactic alignment (comparing with Llido's analysis of Cebuano, for example). I am assuming that related but distinct senses of a lexical entry are actually expression-level construals arising from combining fewer word senses with meaning-contributing constructions, especially ASCx's. It is possible that a set of related lexical entry senses are actually the same underlying word sense, distinguished in usage by the constructions forming different expressions. It is further possible that the constructions involved are themselves related, perhaps linked by inheritance (Goldberg proposes several distinct types of inheritance, some of which involve Lakoff-style metaphor).</div><div><br /></div><div>Whether this approach captures a significant amount of generality about lexical senses can be explored empirically. If this proves productive, it could later be incorporated into a linguistic exploration environment.</div><div><br /></div><div>A gold standard for distinguishing senses in a lexical entry using a corpus-based approach is the COBUILD dictionary of English. Each entry has a stylized definition, which generally picks out the typical arguments of the verb, including a common noun as a general type constraint for each argument. To some extent, it identifies other participants in the mentioned situation which may not be explicitly realized as arguments. These additional participants may be a mechanism for characterizing the connotations of a specific sense.<br /></div><div><br /></div><div>I propose to explore some of the most common verbs of English (those in the top tier of the 680+ most frequently used lexical entries). 
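To make the intended decomposition concrete, here is a small Python sketch; the dictionaries, role names, and the construe function are hypothetical illustrations, not an implementation of Goldberg's theory. It combines a verb sense with a caused-motion ASCx, as in Goldberg's well-known example "she sneezed the napkin off the table".

```python
# Hypothetical sketch: an expression's meaning decomposed into a lexical
# contribution (a verb sense) and a constructional contribution (an ASCx).

# A verb sense contributes a relation and its own participant roles,
# with a fusion mapping onto construction roles.
VERB_SENSES = {
    "sneeze": {"relation": "SNEEZE", "fuses": {"sneezer": "cause"}},
}

# A Goldberg-style caused-motion construction: X causes Y to move along Z.
CAUSED_MOTION = {"relation": "CAUSE-MOVE", "roles": ["cause", "theme", "path"]}

def construe(verb: str, construction: dict, args: dict) -> dict:
    """Fuse a verb sense with a construction; roles the verb sense does
    not license are contributed by the construction itself."""
    sense = VERB_SENSES[verb]
    fused = set(sense["fuses"].values())
    contributed = [r for r in construction["roles"] if r not in fused]
    return {
        "lexical": sense["relation"],
        "constructional": construction["relation"],
        "construction_adds": contributed,
        "arguments": args,
    }

# "She sneezed the napkin off the table": SNEEZE licenses only a sneezer
# (fused with cause); theme and path come from the construction alone.
m = construe("sneeze", CAUSED_MOTION,
             {"cause": "she", "theme": "napkin", "path": "off the table"})
print(m["construction_adds"])  # prints: ['theme', 'path']
```

On this way of carving things up, a COBUILD-style sense entry would correspond to one verb sense plus one construction, rather than to a new primitive sense.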
For the sample sentences of each sense (and perhaps additional sentences from the underlying corpus, the Bank of English), I would try to identify any ASCx, based on typical arguments for that sense. Closely related senses may have related constructions imposed on the same underlying verb sense. </div><div><br /></div><div><span class="Apple-style-span" style="font-weight: bold;">Multilingual Explorations</span></div><div><br /></div><div>It may be useful to compare several languages at this level of semantics, for example Chinese or German or Filipino or Cebuano. I am interested in whether there are potential applications in basic education, adult L2 education, human translation, machine-assisted translation, etc. This may also be relevant to the documentation and description of less studied languages by people more familiar with another, more widespread language (the SIL use cases for Fieldworks).</div>Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0tag:blogger.com,1999:blog-928498714636041686.post-70353214907125565752008-11-17T22:04:00.000-08:002008-11-17T22:10:05.705-08:00teasing apart "word sense"Analysis of "word sense"<br /><br />From the point of view of constructions, a c:lexical_sense contributes only part of the meaning of a c:word_usage. The construction itself contributes something additional, which also interacts with the larger discourse environment and the social context of successful communication.<br /><br />We can divide the meaning of a c:word_usage into a purely semantic representation, which is the typical meaning of the sentence being used, or even of a simplified form of the basic sentence, with only head words, abstracting from embellishments that don't contribute to a specified lexical sense. The semantic representation includes the literal meaning, but may also introduce additional roles that follow from "typically" understood arguments and properties of the explicitly mentioned words. 
It covers denotation and the related intensional functions, as well as connotations relative to a "typical" mental lexicon (perhaps of an ideal listener/reader).<br /><br />In addition to the purely semantic representation, competent language users can infer a more c:pragmatic_construal of the c:word_usage.<br /><br />In the context of the COBUILD lexical word senses and the Bank of English used to generate them, a distinguished lexical sense classifies a class of sentences which share a similar word_usage. It should be possible to isolate the intuitions of this "gold standard" classification into a semantic representation. This abstracts from less reproducible aspects of pragmatic construal, and focuses on what is typically inferred from the sentence itself, as if it were used in isolation rather than in a larger discourse or social situation.<br /><br />We would like to tease apart the characterization of each lexical sense as a purely semantic representation into two parts: the contribution of the word (e.g. a free-morpheme verb, or the stem of an inflected verb) and the contribution of the construction or constructions (including morphological constructions, and argument structure constructions at least).<br /><br />We would like to discover whether the set of lexical senses of a word (initially, some of the more frequently used verbs) can be characterized with a set of constructions related by the inheritance relations distinguished by Goldberg. We also want to see to what extent these constructions are reused across different verbs, and whether we can identify verb classes that share the same inheritance-related constructions.Anonymoushttp://www.blogger.com/profile/13903310244059014555noreply@blogger.com0