Concept of LSCon

LSCon (Lexical Semantics of Construal) is a knowledge representation (KR) technology.

I also use the acronym to refer to the underlying methodology: an approach to characterizing the semantics of words in constructions as a contribution to a broader process of construal. This approach is studied in a principled way within a broad cognitive linguistics framework, but with immediate applications in mind. Where the two uses might be confused, this approach to designing lexical resources can also be called the LSCon Methodology.
LSCon Notation refers to the specific notation, as it develops, for modeling English with the Sense Signature fixed by a single machine-readable dictionary (MRD) source, COBUILD. The notation is influenced by CG and EKN: John Sowa's Conceptual Graphs (and the associated ISO standard), and the Extended Kamp Notation of Situation Theory and Situation Semantics. See also Controlled English.
The trial lexical resource being built during LSCon research (together with the variant of the notation determined by its fixed sense signature) is tentatively called "the LSCon Resource for English", or LSCon-EN. It may be renamed; candidates include SenseNet and Sense Graphbank.

The initial resource includes an encoding of definitions, including the Distinguishing Situations of the most basic senses of the lexical entries for the most frequent verbs in COCA (plus the nouns mentioned in their definitions).
It also includes an integrated encoding of exemplars, one or two for each Grammar Pattern.
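As a rough illustration of what one entry in such a resource might hold, here is a minimal sketch, assuming a Distinguishing Situation can be approximated as a list of infon-style conditions (relation, role bindings, polarity) in the spirit of Situation Semantics, and that exemplars are keyed by COBUILD-style Grammar Pattern labels such as "V n". The class names, field names, and the sense identifier "drink.1" are hypothetical, and the definition wording is invented rather than copied from COBUILD. The helper flags patterns that do not have the intended one or two exemplars.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Infon:
    """Hypothetical infon-style condition <<relation, role bindings; polarity>>."""
    relation: str
    roles: tuple[tuple[str, str], ...]  # (role name, parameter) pairs
    polarity: bool = True


@dataclass
class Sense:
    sense_id: str  # hypothetical identifier scheme, e.g. "drink.1"
    definition: str
    distinguishing_situation: list[Infon]
    # Grammar Pattern label (e.g. "V n") -> one or two exemplar sentences
    exemplars: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    lemma: str
    pos: str
    senses: list[Sense] = field(default_factory=list)


def patterns_out_of_range(entry: LexicalEntry) -> list[tuple[str, str]]:
    """Return (sense_id, pattern) pairs that lack the expected one or two exemplars."""
    return [
        (sense.sense_id, pattern)
        for sense in entry.senses
        for pattern, examples in sense.exemplars.items()
        if not 1 <= len(examples) <= 2
    ]


# Illustrative entry; the definition wording is invented, not taken from COBUILD.
drink = LexicalEntry(
    lemma="drink",
    pos="verb",
    senses=[
        Sense(
            sense_id="drink.1",
            definition="take a liquid into the mouth and swallow it",
            distinguishing_situation=[
                Infon("take-into-mouth", (("agent", "x"), ("theme", "y"))),
                Infon("swallow", (("agent", "x"), ("theme", "y"))),
                Infon("liquid", (("entity", "y"),)),
            ],
            exemplars={"V n": ["She drank a glass of water."]},
        )
    ],
)
```

Keeping the infons as plain frozen records leaves the notation itself open, since the actual LSCon Notation is still developing.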

Some or all exemplars may be taken from a corpus other than COBUILD's Bank of English (BoE), such as COCA.
This would help validate the sense signature, but takes more work.

The initial resource is a derived work of COBUILD, and may be subject to a restrictive copyright license.

A future lexical resource may be developed under a liberal Creative Commons license.

This may involve generating a new sense signature from a corpus, and consulting and synthesizing multiple dictionaries so that the result makes fair use of several dictionaries without being similar enough to any one of them to count as a derived work. (Do we need clean-room methods for this?)

LSCon has layered dependencies on, or interfaces with, other domains of cognition and language processing.

It is layered above phonology and syntax (initially HPSG and/or SBCG, using MRS or an extension of it at the syntax-semantics interface).

This layering includes pre-semantic uses of context, in Perry's terms (R&R 3.4).

It is layered above a populated ontology of encyclopedic knowledge. This ontology is fixed by the sense signature, but building it involves discovering an upper ontology that may be substantially universal or cross-linguistic.

Each lexical entry interfaces to encyclopedic knowledge via a populated local ontology. This local ontology includes the concepts underlying the substantive lexemes in the definition, specifically those expressing the information conditions in the Distinguishing Situations of each sense.

I have not yet decided whether to "chase down sense distinctions" so that the concepts used in the local ontology are sense-level rather than word-level.

The union of populated local ontologies of all lexical entries, including the shared upper ontology, is called PL-Onto.
The upper ontology, empirically derived rather than theoretically motivated, is called Upper PL-Onto. It is theoretically influenced by situation semantics, perhaps also Peirce's triadic schema of signs and Sowa's upper ontology in Conceptual Structures.
PL-Onto is expected to be layered into levels, which are derived empirically per sense (not per lexical entry).
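The "union of populated local ontologies" can be made concrete with a minimal sketch, assuming each ontology is reduced to a set of is-a edges (subconcept, superconcept) and that local ontologies are keyed by lexical entry. The function names and the toy concepts are hypothetical; real local ontologies would carry far richer relations than is-a.

```python
IsA = tuple[str, str]  # (subconcept, superconcept); a simplifying assumption


def build_pl_onto(upper: frozenset[IsA], local: dict[str, frozenset[IsA]]) -> frozenset[IsA]:
    """Union of the shared upper ontology and every lexical entry's local ontology."""
    edges = set(upper)
    for local_edges in local.values():
        edges |= local_edges
    return frozenset(edges)


def ancestors(onto: frozenset[IsA], concept: str) -> set[str]:
    """All superconcepts reachable from `concept` by following is-a edges."""
    parents: dict[str, set[str]] = {}
    for child, parent in onto:
        parents.setdefault(child, set()).add(parent)
    seen: set[str] = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy data: hypothetical upper ontology plus two hypothetical local ontologies.
UPPER = frozenset({("substance", "entity"), ("liquid", "substance"), ("physical-object", "entity")})
LOCAL = {
    "drink": frozenset({("water", "liquid"), ("beverage", "liquid")}),
    "eat": frozenset({("food", "substance")}),
}
pl_onto = build_pl_onto(UPPER, LOCAL)
```

Here `ancestors(pl_onto, "water")` climbs from a local concept into the shared upper ontology, which is the kind of link the layering is meant to support.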

Initial levels include:

Situation (mostly incorporated into the upper ontology)
Physical - assumes only bodies and substances interacting in space; sources of force are unanalyzed
Mixed - default level, senses that don't clearly fit elsewhere
Animate - the source of force or causation is an animal (including hominids, but not requiring human-specific capacities)
Human - a human agent is involved in a way that invokes human-specific capacities not classified in subsequent levels
Institution - involves dependencies on human social institutions such as economic institutions, marriage, schooling, etc. There may be a need to further distinguish technology and artifacts, but the initial analysis of verbs does not make this obvious
Attitude - this is the first level to distinguish mental states. It is called attitude because it includes verbs of propositional attitude (belief, desire, etc.)
Language - entities at this level involve human language capacity in essential ways

Part of the motivation for layering is to simplify analysis by limiting the parts of the upper ontology that are (essentially) required for modeling.
These levels seem to capture the interface to "common sense" that lexical entries require. See the separate note on "PL-Onto and common sense".
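One way to read the levels is as an ordered scale, which can be sketched as follows. This is a minimal sketch under several assumptions not stated above: the levels are totally ordered in the order listed, a sense is assigned the highest level among the concepts its Distinguishing Situation uses, Mixed is the default when none of its concepts is tagged, and modeling at a level needs only concepts tagged at or below it. The function names and concept tags are hypothetical, and placing Mixed second simply follows the list order.

```python
from enum import IntEnum


class Level(IntEnum):
    # Order follows the list above; treating it as a total order is an assumption.
    SITUATION = 0
    PHYSICAL = 1
    MIXED = 2
    ANIMATE = 3
    HUMAN = 4
    INSTITUTION = 5
    ATTITUDE = 6
    LANGUAGE = 7


def assign_level(concepts: set[str], concept_level: dict[str, Level]) -> Level:
    """Highest level among a sense's tagged concepts; MIXED if none is tagged."""
    tagged = [concept_level[c] for c in concepts if c in concept_level]
    return max(tagged, default=Level.MIXED)


def visible_concepts(level: Level, concept_level: dict[str, Level]) -> set[str]:
    """Concepts tagged at or below `level` (assumes cumulative layering)."""
    return {c for c, lvl in concept_level.items() if lvl <= level}


# Hypothetical tags for a handful of concepts.
TAGS = {
    "liquid": Level.PHYSICAL,
    "swallow": Level.ANIMATE,
    "money": Level.INSTITUTION,
    "believe": Level.ATTITUDE,
}
```

Under these assumptions a sense such as "drink" (liquid, swallow) lands at Animate and never needs Institution or Attitude concepts, which is the kind of restriction the layering is meant to buy.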

LSCon also has dependencies on, or interfaces with, other domains that are expected to be less influential (at least on the development of LSCon), since LSCon is layered (more or less) below them in the anticipated architecture of cognition. LSCon or related outputs feed into:

DRS processing, at the semantic level (resolving anaphora) and the post-semantic level (anchoring variables to a state or context, perhaps via some worked-out dynamic semantics)
Designation processing. This includes inferring referential content and anchoring it to individuals and worlds fixed by context. LSCon covers the ground of an intensional logic, Frege's Sinn; Designation processing uses that Sinn to get to a Bedeutung (reference, denotation).
Near-side pragmatics (which may include parts of Designation processing)
Far-side pragmatics, including speech acts, communicative intentions, conversational implicatures
Post-semantic connections to encyclopedic knowledge. This includes specialized knowledge domains, in contrast to the very generic (common sense) knowledge domains of the layered levels of PL-Onto.
Interfaces between lexical information and other cognitive capacities, such as linking lexical information about physical objects to "visual objects" in a scheme of concept types from Visual Object Recognition
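The division of labor between LSCon (Sinn) and Designation processing (Bedeutung) can be illustrated with a minimal sketch, assuming a sense-level output is a list of (relation, arguments) conditions whose arguments are unanchored parameters written "?x", and that context supplies a partial assignment from parameters to individuals. The "?" convention, the function names, and the individuals "alice" and "acme" are all hypothetical; this step sits outside LSCon in the architecture described above.

```python
Infon = tuple[str, tuple[str, ...]]  # (relation, arguments); "?x"-style args are parameters


def anchor(infons: list[Infon], assignment: dict[str, str]) -> list[Infon]:
    """Substitute context-fixed individuals for parameters; leave others unanchored."""
    return [(rel, tuple(assignment.get(arg, arg) for arg in args)) for rel, args in infons]


def is_anchored(infons: list[Infon]) -> bool:
    """True when no parameter remains unanchored."""
    return not any(arg.startswith("?") for _, args in infons for arg in args)


# Context-independent, sense-level content (hypothetical relations).
sem_contrib = [("manage", ("?x", "?y")), ("person", ("?x",)), ("organization", ("?y",))]
# Hypothetical anchoring supplied by context.
context = {"?x": "alice", "?y": "acme"}
anchored = anchor(sem_contrib, context)
```

A partial assignment leaves some parameters open, which mirrors the point that anchoring depends on whatever the context actually fixes.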

A KR technology is focused on structured data representations, and is relatively separate from the algorithms that use the representations.
LSCon is envisioned for use in algorithms for cognitive construal. I hope it will be a "broad spectrum" resource of interest to a wide range of researchers and technologists. Parts of that range include people investigating:

logical processing, like DRS. Even if DRS prefers to stick with word-level symbols because it is closely tied to referential content, the sense-level distinctions of LSCon may be useful at intermediate stages and then discarded
parallel distributed processing
modeling of functional schemas in the brain, like Arbib (and the Finnish guy in the US whose name escapes me)
Quillian's semantic networks (?)

Many of these uses, and their algorithms, go beyond semantics narrowly defined.
The major algorithmic use of LSCon that I expect to develop as part of this research is the construction of the Semantic Contribution (Sem Contrib) of a sentence (and of its MRS).

Initially, I am most interested in clause constructions. Perhaps I should simplify NPs to just their heads, plus a few other constituents relevant to the Distinguishing Situations of the candidate senses.

Sem Contrib is intended to be context-independent.

It avoids crossing the interface to other neighboring domains of construal.

In practice, any implementation may need to perform certain context-dependent processes in parallel (notably DRS processing and Designation processing of the referential content). I think it will help to keep these conceptually separate at this stage, and to see how much can be achieved with semantic representations that are independent of everything else.

The algorithms to construct Sem Contrib from an input sentence and a lexical resource are called Semantic Integration.
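To make the idea of Semantic Integration more tangible, here is a minimal sketch, assuming a clause simplified to its verb, its Grammar Pattern, and the heads of its NPs (as suggested above), and assuming each sense contributes a frame that maps pattern slots to local participant roles with type constraints drawn from its Distinguishing Situation. Everything here is hypothetical: the `SenseFrame` fields, the slot names "subj"/"obj", the single-parent type hierarchy, and the two "run" senses. It is not the intended algorithm, only a context-independent filter that keeps senses whose constraints the clause satisfies and records the role bindings.

```python
from dataclasses import dataclass


@dataclass
class SenseFrame:
    lemma: str
    sense_id: str    # hypothetical sense identifier
    pattern: str     # Grammar Pattern, e.g. "V n"
    slot_roles: dict  # syntactic slot -> local participant role
    role_types: dict  # local participant role -> required concept type


def is_a(concept, target, parents):
    """Follow single-parent is-a links upward from `concept` (a simplification)."""
    while concept is not None:
        if concept == target:
            return True
        concept = parents.get(concept)
    return False


def integrate(clause, frames, parents):
    """Return context-independent readings as (sense_id, role bindings) pairs."""
    readings = []
    for frame in frames:
        if frame.lemma != clause["verb"] or frame.pattern != clause["pattern"]:
            continue
        bindings = {}
        for slot, head in clause["slots"].items():
            role = frame.slot_roles.get(slot)
            if role is None or not is_a(head, frame.role_types[role], parents):
                break
            bindings[role] = head
        else:  # every slot was bound to a role whose type constraint holds
            readings.append((frame.sense_id, bindings))
    return readings


# Hypothetical toy resource: two senses of "run" sharing the pattern "V n".
PARENTS = {"woman": "person", "company": "organization", "engine": "machine"}
FRAMES = [
    SenseFrame("run", "run.manage", "V n",
               {"subj": "manager", "obj": "enterprise"},
               {"manager": "person", "enterprise": "organization"}),
    SenseFrame("run", "run.operate", "V n",
               {"subj": "operator", "obj": "device"},
               {"operator": "person", "device": "machine"}),
]
clause = {"verb": "run", "pattern": "V n", "slots": {"subj": "woman", "obj": "company"}}
```

Calling `integrate(clause, FRAMES, PARENTS)` keeps only `run.manage` and binds its local roles, so the same pass performs a crude sense selection and a mapping to entry-local participant roles without consulting any context.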

This is intended to be psychologically plausible: it is a fragment of the psychological (and neural) processes of cognitive construal of linguistic information.
Any implementations of Sem Contrib and linguistic construal are expected to make large and small departures from psychological plausibility, for whatever technological or empirical reasons are at hand.

The integration of Sem Contrib is expected to be relevant to a variety of tasks, including certain shared tasks useful for evaluating the state of the art in language technology:

The word sense disambiguation (WSD) task may involve DRS and Designation processing, but Sem Contrib may be the critical factor in resolving senses within a signature.
The semantic role labeling (SRL) task, using LSCon, would involve mapping to the participant roles local to each lexical entry.
It would be interesting to see implemented algorithms that integrate information from LSCon and other resources, such as WordNet, FrameNet, and Pustejovsky-style resources in the Generative Lexicon tradition (I believe both Pustejovsky and Buitelaar have resources). The resources of Martha Palmer and her students may also be of interest.

Undergraduate and master's students might want to explore some of this.

Other shared tasks, or variants of them, could be designed using an LSCon resource.

Semantical: reviews on meaning & representation

Sunday, September 19, 2010

Lexical Semantics of Construal
