Thursday, October 9, 2008

Describing Language 2/2 (meaning structures)

Accepting phonology, the original generative problem was to model a string of tokens that are wordforms or morphs. However, the type-token distinction is abandoned with the commitment to competence modeling independent of performance. Token strings are demarcated by speech acts (clauses). In the narrowly syntactic sense, a grammar is a generative production system (a set of combinatorial rules) that specifies, soundly and completely, the licensed set of strings: in effect, a state machine that generates strings.

This elevates the importance of not just word classes, but also the non-terminal categories or phrasal categories as the discrete tokens or alphabet for the generated strings.
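To make the production-system idea concrete, here is a minimal sketch in Python. The grammar, the category names and the tiny vocabulary are all my own toy assumptions, not taken from any published fragment; the point is only that a finite rule set over phrasal categories licenses a definite set of strings.

```python
from itertools import product

# Hypothetical toy grammar: each phrasal or lexical category maps to its
# alternative right-hand sides. Any symbol not listed here is a terminal.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sleeps"], ["sees"]],
}


def generate(symbol):
    """Return every terminal string (as a tuple of words) derivable from symbol."""
    if symbol not in RULES:
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Expand each daughter, then combine the alternatives left to right.
        expansions = [generate(daughter) for daughter in rhs]
        for combo in product(*expansions):
            results.append(tuple(word for part in combo for word in part))
    return results


# The licensed set of strings for this (non-recursive) toy grammar.
LICENSED = {" ".join(words) for words in generate("S")}
```

Because there is no subcategorization, this toy grammar also licenses strings like "the dog sleeps the cat", which is exactly the kind of overgeneration that feature bundles are later brought in to control.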

Later, the alphabet of tokens can be replaced with feature bundles, so the syntax rules can work over strings of feature bundles.

An additional move in the mainstream generative grammar tradition (of Chomsky) is to allow tree-to-tree transformations. This is somewhat proof-theoretic, rather than purely model-theoretic. Non-reversible (thus non-declarative) transformation rules somewhat correspond to proof steps. However, this was before the proofs-as-types approach (via the Curry-Howard isomorphism) brought out the equivalence of proof theory and model theory. For the critics in the declarative grammar traditions, the transformational move is a bad move that gives up the precision of modeling, and allows the theory to model anything rather than just reality. It is also the cultural foundation that lets mainstream generative grammar tinker with perfecting the theory around internal concerns like minimalism, rather than sharpening the fit with ever broader data.

Graddol et al. [1994] compare mainstream generative grammar with the systemic grammar alternative: "The framework that is used for analysing language has to be extravagant rather than economical. Where universal grammar seeks simplicity and economy, and draws on intuition [of competence] as its main data, systemic-functional grammar attempts to be comprehensive and gives much more emphasis to 'real' language that has been spoken or written."

Recently, declarative generative grammar approaches are converging with cognitive grammar, paying attention to performance and retaining the precision of generative models while taking on the early ambitions of systemic grammar to broadly model language in use.

Pollard in Convergent Grammar is also reclaiming a proof-theoretic approach, carried forward by Categorial Grammar, by bringing it in line with type-theoretic declarative grammars.

Jackendoff and Culicover are less concerned with building elaborate internal mechanisms for declarative grammar, although they are happy to inherit the detailed analysis of the generative tradition. They propose a flat model of syntax, returning to simple strings of tokens, with just enough structure in the representation to support the semantics-syntax interface and other interfaces in their parallel architecture.

HPSG is one of the more elaborate declarative grammar approaches. While it remains compatible with the concerns of Jackendoff and Culicover, it makes a series of moves that allow precise modeling of syntax and perhaps semantics as well. It would be nice if language specialists could model grammar in the flat, simpler syntax, while the linguistic exploration environment generates the complex representations of HPSG in the background.

HPSG moves away from strings of feature bundles combined according to atomic phrasal categories. The features in a bundle allow recursive feature values, so each terminal category itself becomes a tree. The phrasal categories are also recursive feature structures, which duplicate (or in fact share) features from their constituents; notably, heads pass certain features up to the phrase. By sharing rather than duplicating, the trees become directed graphs. This very expressive structure is tightly constrained by a set of type labels for nodes, with feature labels for edges. There is a type subsumption hierarchy as well.
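A rough sketch of the two ideas in that paragraph, sharing versus duplicating and combining feature information, using plain Python dicts. This is an illustration under heavy simplifying assumptions: the feature names (PHON, HEAD, AGR, DTRS) are just familiar labels, there are no node types or type hierarchy, and the `unify` function is a naive recursive merge, not HPSG's typed unification.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None when two values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            if feature in merged:
                sub = unify(merged[feature], value)
                if sub is None:
                    return None  # clash somewhere below this feature
                merged[feature] = sub
            else:
                merged[feature] = value
        return merged
    return a if a == b else None


# Structure sharing: the phrase's HEAD value is the very same object as the
# head daughter's HEAD value, so the "tree" is really a directed graph.
head = {"POS": "verb", "AGR": {"PER": 3, "NUM": "sg"}}
word = {"PHON": "sleeps", "HEAD": head}
phrase = {"HEAD": word["HEAD"], "DTRS": [word]}

# Agreement as unification: a singular subject is compatible, a plural clashes.
agreeing = unify({"NUM": "sg"}, head["AGR"])
clashing = unify({"NUM": "pl"}, head["AGR"])
```

The identity check `phrase["HEAD"] is word["HEAD"]` is what distinguishes sharing from duplication: a copy would compare equal but would be a different node in the graph.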

Note that every utterance is modeled as a large graph where phrases are subgraphs, and wordforms are subgraphs in turn. These graphs are complete or sort-resolved [review Carpenter's distinctions], but the wordtypes or morphemes in the lexical unit are not complete; they are partial descriptions. These types are less specific than the token-level utterances and utterance fragments that they model, and they allow the complete structures to be generated from the partial feature descriptions in the lexicon.
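The relation between a partial lexical description and a complete token-level structure can be sketched as subsumption. Again this is only my simplification with untyped nested dicts; the entry and token below are invented examples, and Carpenter's actual definitions involve typed structures that I have not reproduced here.

```python
def subsumes(general, specific):
    """True if every feature path in `general` is present, with a
    compatible value, in `specific` (so `general` is no more informative)."""
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            feature in specific and subsumes(value, specific[feature])
            for feature, value in general.items()
        )
    return general == specific


# Hypothetical partial description from the lexicon ...
lexical_entry = {"HEAD": {"POS": "verb"}}

# ... and a fuller structure for one wordform in one utterance.
token = {
    "PHON": "sleeps",
    "HEAD": {"POS": "verb", "AGR": {"PER": 3, "NUM": "sg"}},
}

entry_covers_token = subsumes(lexical_entry, token)
token_covers_entry = subsumes(token, lexical_entry)
```

The asymmetry is the point: the type-level entry subsumes the token, but not the other way around.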

HPSG was designed to work with the model-theoretic approach to modeling semantics of Montague Grammar (usually associated with Categorial Grammar for syntax). A related model-theoretic approach to modeling semantics is Minimal Recursion Semantics (and recently Robust MRS). 

I am dissatisfied with the propositional level addressed by these model-theoretic approaches, but it is early days yet in semantic modeling. The traditional emphasis has been set-theoretic concerns like the scoping of quantifiers. I am interested in richer semantic representations than austere sets. Situation theory may fit the bill, but a lot of work needs to be done, especially at the semantics-syntax interface.

At this interface, I am interested in integrating the contributions of Construction Grammar and Jackendoff's parallel architecture. I am also interested in insights from lexical semantics (Levin, Pustejovsky).

A lexical unit, when we model it for some application, provides explicit and implicit information that is realized in the wordforms of an utterance. The propositional level, including the conceptual structures of Jackendoff, is the more or less explicit level (some semantic elements in the conceptual or propositional structure may be implicit in the syntax, derivable from a fixed Event ontology). I believe that we can reuse some of this conceptual machinery for the implicit connoted information of individual word senses as well.

A lexical unit does not characterize a single unique meaning in usage; it subsumes a collection of related senses with the same wordform. Some of the meaning in usage may come from the construction, but perhaps a sense (learned from the shifting fads of usage, and construals passed on in a speaker's own utterances) brings in an additional circle of implicit concepts and conceptual relations.

For example, not all senses of "break" have an implicit participant of pieces, or the idea of sudden separation. But one fairly common sense does carry such information implicitly. What is explicit are the participant roles with arguments or complements, filled in the utterance by particular individuals of a noun (or referential index) type. What allows a hearer to zero in on one familiar sense or another, or to realize that this is a new sense in usage, is the outer circle of concepts that are implicit. So if one of the participants mentioned in an argument is a piece, then this sense or a related one is specified.
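One way to fix a discrete model of this is to store the implicit circle of concepts with each sense and let the concepts evoked by the arguments pick among them. The sketch below is a toy under my own assumptions: the sense labels and concept names are invented for illustration, and simple set overlap stands in for whatever a hearer actually does.

```python
# Hypothetical sense inventory for "break": each sense lists the implicit
# concepts it connotes (not its explicit participant roles).
SENSES = {
    "break/shatter": {"pieces", "sudden separation"},
    "break/interrupt": {"pause", "continuity"},
}


def select_senses(evoked_concepts):
    """Return the senses whose implicit concepts best overlap the concepts
    evoked by the arguments; an empty list suggests a possibly new sense."""
    scores = {
        name: len(implicit & evoked_concepts)
        for name, implicit in SENSES.items()
    }
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(name for name, score in scores.items() if score == best)
```

So if one of the mentioned participants is a piece, `select_senses({"pieces"})` picks out the shatter sense, while an argument that evokes none of the stored concepts returns no match and flags a candidate new sense.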

Where is the boundary between construal in performance, and precise type representation in competence? In the brain, there is certainly a fluid spectrum. For applications in computers, we would like to fix a particular discrete model. The account above can clarify some of the choices to fix.

Describing Language 1/2 (sounds and words)

We observe humans making not just vocalizations (social displays) but utterances.

An utterance makes a cognitive speech act, engendering a shared information state and modulating movement and cognitively-controlled behavior. There is intentionality between that shared information state (as well as the acoustic utterance, or inscribed expression) and the actual or possible situations that it is about.

What is perceptible in an utterance, or a sequence of utterances, is a string of phones.

The phones are clustered in cognition into words that bear a functional or substantive contribution to the shared information structure.

In the thirties, it was discovered that not all distinct phones are functionally distinct. This allows the layering of distinctions between phonetic distinctions on the one hand, which may be governed for example by articulatory constraints, and phonemic distinctions on the other hand. Phonemes classify segments (possibly a set of allophones) according to a phonological level that is the only interface with larger structure, such as semantics and morpho-syntax.
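The phone-to-phoneme layering can be sketched as a simple classification table. The example uses the familiar English case where aspirated [pʰ] (as in "pin") and plain [p] (as in "spin") are allophones of one phoneme /p/; the table itself is a tiny hand-made fragment, an assumption for illustration rather than a real phonological analysis.

```python
# Hypothetical fragment: each phone (phonetic segment) is classified
# under the phoneme it realizes.
PHONE_TO_PHONEME = {
    "pʰ": "p",
    "p": "p",
    "s": "s",
    "ɪ": "ɪ",
    "n": "n",
}


def phonemic(phones):
    """Map a phonetic transcription (list of phones) to its phonemic string."""
    return [PHONE_TO_PHONEME[phone] for phone in phones]


pin = phonemic(["pʰ", "ɪ", "n"])
spin = phonemic(["s", "p", "ɪ", "n"])
```

At the phonemic level, "pin" and the tail of "spin" come out identical, which is the sense in which only the phonemic level interfaces with larger structure.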

Thus the informal notion of words can be analyzed into strings of phonemic segments, as well as some supra-segmental phenomena. Some groups of segments carry conceptual contributions (substantive morphemes) to shared information structure; others only modulate how concepts are inter-related (functional morphemes, bound or free). This is hypothesized to account for the partitioning of all words into a group of open classes of words (words with a substantive morpheme at their core, possibly productively inflected with a bound morpheme, and perhaps historically derived in the lexicon with a bound morpheme) and a group of closed word classes.

Can we marshal evidence for this substantive/functional = open/closed hypothesis? What about prepositions used predicatively, as PP heads, or as particles with phrasal verbs? They do contribute to conceptual structure, but they are in a closed class. Perhaps there are only so many ways to stand in physical relations in a situation, and other meanings are metaphors, so the substantivity is in the construction layer rather than the morpheme layer. A predicative construction injects a substantive concept. Non-predicative uses allow participants to stand at the level of the bare furniture of the mechanisms of language. The upholstery of substantivity is what contributes connotations, the implicit participants that carry meanings beyond the bare prepositions of logical form. The model of functionals allows a fixed inventory of relations; substantives from the lexicon then elaborate that bare level.

Abandoning the furniture metaphor, we could talk about three or four circles of meaning: 1) a bare level of participants (with referential indexes tracked and resolved) in (more or less) spatiotemporal relations; 2) a propositional semantics level where every substantive contributes exactly one concept type or relational concept type (like Sowa's conceptual graphs); 3) an Event semantics level (like that being developed by Levin and perhaps Jackendoff), which uses a minimal ontology of types that are revealed, but perhaps only implicitly, in syntax; and 4) the additional implicit participants in the shared information state that are connoted by specific word senses, acquired by communicative experience.

We can distinguish wordforms in utterances, that carry the contextual meaning of a speech act (or written expression act), as a level of tokens, distinct but inextricably related to a level of types.

Types are wordtypes (lexical units) or morphemes in the shared scheme of individuation of speaker-hearers in the speech community. Wordforms are tokens of wordtypes participating in a particular utterance, and contributing in context to the shared information structure. We classify wordforms (at run-time, in occurrent cognitive states) according to their properties in the utterance and shared information state, both syntactic distribution and the situated meaning. This does not include the extralinguistic significance of pragmatics, just the linguistic shared information state.

We then construct a level of description of lexical units (words, and also bound and free functional morphemes), where we model the shared regularities that allow the interpretation and semantic-syntactic generation of shared information states.

Note: Morphological rules may govern parts of the phonology-syntax interface, e.g. go + ed >> went, irregular verbs.
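The go + ed >> went case can be sketched as a regular rule with a lexical override. This is a deliberately crude illustration: the irregular table is a two-entry sample I picked, and the regular rule ignores consonant doubling, y-to-i changes and everything else about English spelling.

```python
# Hypothetical sample of lexically listed irregular past forms.
IRREGULAR_PAST = {"go": "went", "sing": "sang"}


def past_tense(stem):
    """Realize stem + ed, letting a listed irregular form block the rule."""
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    if stem.endswith("e"):
        return stem + "d"  # bake + ed -> baked
    return stem + "ed"     # walk + ed -> walked
```

The design choice is the ordering: the lexicon is consulted first, so the general rule only applies where no stored form preempts it.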

Wednesday, October 8, 2008

Leipzig glossing rules

Part of Typological tools for field linguistics

a consistent and widely accepted standard for the interlinear glossing of text

http://www.eva.mpg.de/lingua/pdf/LGR08_09_12.pdf

http://www.eva.mpg.de/lingua/resources/glossing-rules.php

Appendix: List of Standard Abbreviations

1     first person
2     second person
3     third person
A     agent-like argument of canonical transitive verb
ABL   ablative
ABS   absolutive
ACC   accusative
ADJ   adjective
ADV   adverb(ial)
AGR   agreement
ALL   allative
ANTIP antipassive
APPL  applicative
ART   article
AUX   auxiliary
BEN   benefactive
CAUS  causative
CLF   classifier
COM   comitative
COMP  complementizer
COMPL completive
COND  conditional
COP   copula
CVB   converb
DAT   dative
DECL  declarative
DEF   definite
DEM   demonstrative
DET   determiner
DIST  distal
DISTR distributive
DU    dual
DUR   durative
ERG   ergative
EXCL  exclusive
F     feminine
FOC   focus
FUT   future
GEN   genitive
IMP   imperative
INCL  inclusive
IND   indicative
INDF  indefinite
INF   infinitive
INS   instrumental
INTR  intransitive
IPFV  imperfective
IRR   irrealis
LOC   locative
M     masculine
N     neuter
N-    non- (e.g. NSG nonsingular, NPST nonpast)
NEG   negation, negative
NMLZ  nominalizer/nominalization
NOM   nominative
OBJ   object
OBL   oblique
P     patient-like argument of canonical transitive verb
PASS  passive
PFV   perfective
PL    plural
POSS  possessive
PRED  predicative
PRF   perfect
PRS   present
PROG  progressive
PROH  prohibitive
PROX  proximal/proximate
PST   past
PTCP  participle
PURP  purposive
Q     question particle/marker
QUOT  quotative
RECP  reciprocal
REFL  reflexive
REL   relative
RES   resultative
S     single argument of canonical intransitive verb
SBJ   subject
SBJV  subjunctive
SG    singular
TOP   topic
TR    transitive
VOC   vocative
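As a sketch of how such a list might be used in a tool, the following expands a gloss into readable labels. It assumes the usual conventions that hyphens mark morpheme boundaries, "=" marks clitic boundaries, periods separate several labels for one element, and person digits are written fused to number labels (1SG, 3PL). The dictionary is only a small hand-picked subset of the list above, the function name is my own, and the special N- (non-) prefix is not handled.

```python
import re

# Small subset of the standard abbreviations, for illustration only.
ABBREVIATIONS = {
    "1": "first person",
    "3": "third person",
    "SG": "singular",
    "PL": "plural",
    "NOM": "nominative",
    "ACC": "accusative",
    "PST": "past",
    "NEG": "negation, negative",
}


def expand_gloss(gloss):
    """Expand one gloss word, e.g. 'go-PST.3SG', into a list of labels.

    Lowercase lexical glosses and unknown labels are passed through as-is.
    """
    labels = []
    for part in re.split(r"[.\-=]", gloss):
        # Split fused person+number labels such as 3SG into '3' and 'SG'.
        for piece in re.findall(r"\d|[A-Z]+|[a-z]+", part):
            labels.append(ABBREVIATIONS.get(piece, piece))
    return labels
```

For example, `expand_gloss("go-PST.3SG")` gives the lexical gloss followed by the spelled-out grammatical categories.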


Jackendoff article

Construction after construction and its theoretical challenges by Ray Jackendoff
Language 84-1

This analyzes NPN constructions like "day by day" where both N's are the same. Five prepositions are used: by, for, to, after and upon (with on as a variant).
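For corpus hunting, the surface shape of the construction is easy to approximate with a regular expression and a backreference. This is only a rough string-level sketch of my own, not anything from the article: it treats any repeated word as an N, so it will produce false positives and will miss cases with modifiers.

```python
import re

# Same word on both sides of one of the listed prepositions.
NPN = re.compile(r"\b(\w+) (by|for|to|after|upon|on) \1\b", re.IGNORECASE)


def find_npn(text):
    """Return all 'N P N' substrings where both N's are the same word."""
    return [match.group(0) for match in NPN.finditer(text)]


hits = find_npn(
    "day by day, construction after construction, they went from door to door"
)
```

Here `hits` collects the three repeated-noun sequences in the sample sentence.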

On the connectionist front, I came across this paper:

Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience

Gayler, Dr Ross W. (2003) Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. [Conference Paper] (Unpublished)




Tuesday, October 7, 2008

Central Okinawan is endangered with 1 million speakers

Japan Focus: Language Loss and Revitalization in the Ryukyu Islands, Patrick Heinrich, posted November 10, 2005. Also: What leaves a mark should no longer stain: Progressive erasure and reversing language shift activities in the Ryukyu Islands, 2005, citing Hattori, Shirō (1954) 'Gengo nendaigaku sunawachi goi tokeigaku no hōhō ni tsuite' ['Concerning the Method of Glottochronology and Lexicostatistics'], Gengo kenkyū [Journal of the Linguistic Society of Japan] v26/27

Are there Tagalog verbal roots? - Himmelmann

Tagalog semantics/lexical categories (or: "word classes") from Nikolaus Himmelmann at Bochum (also working with a Monash project).

Himmelmann, Nikolaus P. 1987. Morphosyntax und Morphologie - Die Ausrichtungsaffixe im Tagalog. München: Fink.
Himmelmann, Nikolaus P. 1991. The Philippine Challenge to Universal Grammar. Arbeitspapier Nr. 15. Köln: Institut für Sprachwissenschaft.
Himmelmann, Nikolaus P. 1998. ”Regularity in irregularity: Article use in adpositional phrases”. Linguistic Typology 2:315-353.

Himmelmann is a pioneer in defining the emerging subdiscipline within linguistics of language documentation and description, as a response to the crisis of endangered languages that has been accelerating over the last century and more. He is quoted at the website of the Hans Rausing Endangered Language Project:

The aim of a language documentation is to provide a comprehensive record of the linguistic practices characteristic of a given speech community... This... differs fundamentally from... language description [which] aims at the record of a language... as a system of abstract elements, constructions, and rules 

[p. 166, "Documentary and descriptive linguistics", Nikolaus P. Himmelmann (1998). Linguistics 36. pp. 161-195. Berlin: de Gruyter]


At Bochum and Köln, Leila Behrens is looking at the lexical typology of Tagalog.

"The second database component is intended as the foundation for a new Tagalog lexicon"

"Although we have no objection to making our "Prinz" database or the "Tagalog" database publicly accessible on the web, this still seems premature to us at the moment, for the reason mentioned."

Behrens & Sasse, H.-J. (1997), Lexical Typology: A Programmatic Sketch. Arbeitspapier Nr. 30 (Neue Folge). Institut für Sprachwissenschaft zu Köln.

- (2000), Semantics and Typology. In: Siemund, Peter (ed.), Methodology in Linguistic Typology. STUF 53 (1), 21-38.




Notes

"This proposal extends to the lexical level recent work challenging the categorial uniformity hypothesis (Bresnan 1994)"

Bresnan, Joan, 1994, ”Locative inversion and the architecture of Universal Grammar”. Language 70:72-131

"Therefore, it is not possible to define the subject simply as the phrase marked by ang. Instead, the subject is defined as the ang-phrase which follows the predicate (and there can be only one"

"if the predicate is marked with the CONVEYANCE VOICE prefix i-, then the subject expresses an argument bearing the semantic role of a displaced theme. ...(i.e. the entity viewed as moving) of the event expressed by the predicate"

"The suffix -an marks LOCATIVE VOICE. In locative voice, the subject expresses a locative argument, understood in a very broad sense. This may be the location at which something happened:
(11) tinirhán ko ang bahay na itó
Or the location to which (or from which) motion occurred:
(12) pinuntahán na namán nilá ang bata'
Locative voice is also used for recipients, addressees, and benefactees (13):
(13) tìtirán ninyó akó
Even more generally, locative voice may be used for all kinds of undergoers which are not directly affected by the action denoted by the predicate
(14) hindí'! tingnán mo si Maria
"

"The suffix -in marks PATIENT VOICE. It is the unmarked member of the undergoer-voice-marking affixes and is used for a broad variety of undergoers, including prototypical patients, i.e. entities directly affected or effected by the event denoted by the predicate:
(16) patayín natin itóng dalawang Hapón
The suffix -in differs from the other two undergoer suffixes in that it only occurs in non-realis mood (as in the preceding example). In realis mood, the predicate is simply marked by the realis undergoer voice infix -in-:"

(18) pùpunuín mo iyán ng kuto
pupunuin ng weyter ang baso ng tubig
Why is the non-subject actor immediately after the predicate? Is linear order governed by thematic role rather than grammatical function?
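To keep the undergoer voices straight, the affix-to-subject-role correspondences quoted above can be tabulated as data. This sketch records only what the quoted passages state; the class and function names are mine, and it does no morphological segmentation of actual Tagalog words (nor does it cover the realis infix -in- or actor voice).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceAffix:
    form: str          # affix as cited, with hyphen marking attachment side
    kind: str          # prefix or suffix
    voice: str         # voice label used in the quoted description
    subject_role: str  # semantic role expressed by the subject


VOICE_AFFIXES = [
    VoiceAffix("i-", "prefix", "conveyance voice", "displaced theme"),
    VoiceAffix("-an", "suffix", "locative voice",
               "locative argument, understood in a very broad sense"),
    VoiceAffix("-in", "suffix", "patient voice",
               "undergoer, including prototypical patients"),
]


def subject_role(affix_form):
    """Look up which semantic role the subject bears for a voice affix."""
    for affix in VOICE_AFFIXES:
        if affix.form == affix_form:
            return affix.subject_role
    raise KeyError(f"no voice affix recorded for {affix_form!r}")
```

A lookup such as `subject_role("i-")` then returns the role named in the description, and an unrecorded affix raises an error rather than guessing.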

"The locative marker sa marks a large variety of temporal and local adjuncts (20) and recipients/goals (21), as well as (some) definite patients and themes when they do not occur in subject function (cf. sa mga bata’ in (4) above)
(4) ang langgám rin ang tumulong sa mga bata’
"
I would analyze /bata'/ as a beneficiary, and thus a recipient rather than a patient.

"To summarize: the four basic syntactic functions predicate, subject, non-subject argument or adjunct, and modifier are easily identifiable in Tagalog because there is a set of markers which in combination with a few positional restrictions allows a straightforward identification of each of these functions (with the exception of the modifier function which necessarily involves reference to the semantics of the two items joined by a linker)."

"However, it is common to assume that terminal syntactic categories and lexical categories are commensurate in that lexical categories are but further subcategorisations of the more general terminal syntactic categories. That is, declension classes are but a further subcategorisation of the superclass of nouns, verb classes just a further subcategorisation of the superclass of verbs, etc. Such a neat correlation between terminal syntactic categories and lexical categories in fact appears to exist in a number of languages (including, in particular, the Indo-European languages), but this is not universally so."


Of major interest

DeWolf, Charles M. 1979. Sentential Predicates: A Cross-Linguistic Analysis. Honolulu: University of Hawaii dissertation.
DeWolf, Charles M. 1988. ”Voice in Austronesian languages of Philippine type: passive, ergative, or neither?”. In: Shibatani (ed.) 143-193.
Wolff, John U. 1993. ”Why roots add the affixes with which they occur”. In: Reesink (ed.) 217-244.
Gil, David. 1993. ”Tagalog Semantics”. BLS 19: 390-403.
Guzman, Videa P. de. 1978. Syntactic Derivation of Tagalog Verbs. Honolulu: University Press of Hawaii.
Guzman, Videa P. de. 1997. ”Verbal affixes in Tagalog: Inflection or derivation?”. In: Odé, Cecilia & Wim Stokhof (eds.), Proceedings of the Seventh International Conference on Austronesian Linguistics, 303-325. Amsterdam: Rodopi.

Naylor, Paz B. 1980. ”Linking, Relation-Marking, and Tagalog Syntax”. In: id. (ed.), Austronesian Studies, Papers from the 2. Eastern Conference on Austronesian Languages, 33-49. Ann Arbor: The University of Michigan.
Kaswanti Purwo, Bambang (ed.). 1984. Towards a description of contemporary Indonesian: Preliminary Studies. Part I, Jakarta: Universitas Atma Jaya (=NUSA 18).
Clynes, Adrian. 1995. Topics in the phonology and morphosyntax of Balinese, based on the dialect of Singaraja, North Bali. PhD thesis, The Australian National University.
Artawa, Ketut & Barry J. Blake. 1997. ”Patient Primacy in Balinese”. Studies in Language 21:483-508.

Other references in the paper

Anward, Jan, Edith Moravcsik & Leon Stassen. 1997. ”Parts of speech: A challenge for typology”. Linguistic Typology 1:167-183.
Austin, Peter & Joan Bresnan. 1996. ”Non-Configurationality in Australian Aboriginal Languages”. Natural Language and Linguistic Theory 14:215-268.
Broschart, Jürgen. 1997. ”Why Tongan does it differently: Categorial distinctions in a language without nouns and verbs”. Linguistic Typology 1:123-165.
Jacobs, Joachim, Arnim von Stechow, Wolfgang Sternefeld & Theo Vennemann (eds.). 1993. Syntax, Berlin: de Gruyter.
Jelinek, Eloise & Richard A. Demers. 1994. ”Predicates and pronominal arguments in Straits Salish”. Language 70:697-736.
Koptevskaja-Tamm, Maria. 1988. A typology of action nominal constructions. PhD thesis Stockholm University.
Lemaréchal, Alain. 1982. ”Semantisme des parties du discours et semantisme des relations”. Bulletin de la Société de Linguistique de Paris 77:1-39.
Lemaréchal, Alain. 1989. Les parties du discours. Sémantique et syntaxe. Paris: P.U.F.
Li, Charles N. (ed.). 1976. Subject and Topic. New York: Academic Press.
McFarland, Curtis D. 1976. A Provisional Classification of Tagalog Verbs. Tokio: Institute for the Study of Languages and Cultures of Asia and Africa.
Naylor, Paz B. 1995. ”Subject, Topic, and Tagalog syntax”. In: Bennett, David, Bynon, Theodora and Hewitt, George B. (eds.), Subject, Voice and Ergativity 161-201. London: SOAS.
Pittman, Richard, 1966, ”Tagalog -um- and mag-. An Interim Report”. Papers in Philippine Linguistics 1:9-20 (Canberra: Pacific Linguistics, Series A, Nr.8).
Ramos, Teresita V. 1974. The Case system of Tagalog verbs. Canberra: Pacific Linguistics (Series B-27).
Ramos, Teresita V. 1975. ”The Role of Verbal Features in the Subcategorization of Tagalog Verbs”. Philippine Journal of Linguistics 6:1-24.
Reesink, Ger P. (ed.). 1993. Topics in Descriptive Austronesian Linguistics. Leiden: Vakgroep Talen en Culturen van Zuidoost-Azië en Oceanië (= Semaian 11).
Rubino, Carl R.G. 1998b. ”The morphological realization and production of a nonprototypical morpheme: the Tagalog derivational clitic”. Linguistics 36:1147-1166.
Sasse, Hans-Jürgen. 1993a. ”Syntactic Categories and subcategories”. In: Jacobs et al. (eds.) 646-686.
Sasse, Hans-Jürgen. 1993b. ”Das Nomen - eine universale Kategorie?”. Sprachtypologie und Universalienforschung 46:187-221.
"two kinds of categorisation (lexical and syntactic/phrasal) should be clearly distinguished and that there is no necessary correlation between them."
Shibatani, Masayoshi (ed.). 1988. Passive and Voice. Amsterdam: Benjamins.
Verhaar, John W.M. 1984. ”Affixation in contemporary Indonesian”. In: Kaswanti Purwo (ed.) 1-26.
Walter, Heribert. 1981. Studien zur Nomen-Verb-Distinktion aus typologischer Sicht. München: Fink.

Standard references

Müller, Friedrich. 1882. Grundriss der Sprachwissenschaft. Bd.II, Abt.2. Wien: Alfred Hölder.

Blake, Frank R. 1925. A Grammar of the Tagalog Language. New Haven: American Oriental Society.
Bloomfield, Leonard. 1917. Tagalog Texts with Grammatical Analysis. 3 vols. Urbana, Ill.: University of Illinois.
Scheerer, Otto. 1924. ”On the Essential Difference Between the Verbs of the European and the Philippine Languages”. Philippine Journal of Education 7:1-10.
Lopez, Cecilio. 1937. ”Preliminary Study of Affixes in Tagalog”. In: id. 1977, Selected Writings in Philippine Linguistics, 28-104. Quezon City: University of the Philippines.
Capell, Arthur. 1964. ”Verbal systems in Philippine languages”. Philippine Journal of Science 93:231-249.
Ramos, Teresita V. 1971. Tagalog Structures. Honolulu: Univ. Press of Hawaii.
Schachter, Paul & Fay Otanes. 1972. Tagalog Reference Grammar. Berkeley/Los Angeles: University of California Press.
Schachter, Paul. 1976. ”The Subject in Philippine Languages, Topic, Actor, Actor-Topic or None of the Above”. In: Li (ed.) 491-518.
Schachter, Paul. 1995. The Subject in Tagalog: Still none of the above. UCLA Occasional Papers in Linguistics vol. 15. Los Angeles: UCLA/Department of Linguistics.

Cruz, Emilita L. 1975. A Subcategorization of Tagalog Verbs. Quezon City: University of the Philippines (= The Archive Special Monograph No.2).
Wolff, John U. with Maria Theresa C. Centeno and Der-Hwa V. Rau. 1991. Pilipino through Self-Instruction. 4 vols. Ithaca: Cornell Southeast Asia Program.
Kroeger, Paul R. 1993. Phrase Structure and Grammatical Relations in Tagalog. Stanford: Stanford University Press.

Keenan, Edward L. 1976. ”Towards a Universal Definition of ‘Subject’”. In: Li (ed.) 305-333.
Foley, William A. & Robert D. Van Valin. 1984. Functional Syntax and Universal Grammar. Cambridge: Cambridge University Press.
Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge/Mass.: The MIT Press.

English, Leo J. 1986. Tagalog-English Dictionary. Manila: National Book Store.
Panganiban, José V. 1972. Diksyunario-Tesauro Pilipino-Ingles. Quezon City: Manlapaz Publishing Co.
Rubino, Carl R.G. 1998a. Tagalog Standard Dictionary. New York: Hippocrene Books.
Santos, Vito C. 1983. Pilipino-English Dictionary. 2nd revised edition. Metro Manila: National.