Tuesday, September 21, 2010

An operational definition of "meaning" (in a formal language)

Working notion of meaning

... we shall accept that the meaning of A is the set of sentences S true because of A. The set S may also be called the set of consequences of A. Calling sentences of S consequences of A underscores the fact that there is an underlying logic which allows one to deduce that a sentence is a consequence of A.

Wladyslaw M. Turski and Thomas S. E. Maibaum, The specification of computer programs (Wokingham: Addison-Wesley, 1987), p. 4.

cited in: C. M. Sperberg-McQueen, Meaning and interpretation of markup: a report on the Bechamel Project, slides from talk sponsored by the W3C German/Austrian Office at the Fraunhofer Gesellschaft Institut Medienkommunikation in Sankt Augustin, Germany, 1 October 2004.
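The working notion in the quotation (meaning as the set of consequences under an underlying logic) can be made concrete with a toy deduction procedure. The sketch below is only an illustration under stated assumptions: the function name `consequences`, the premise/conclusion rule format, and the sample sentences are hypothetical and do not come from Turski and Maibaum or from the slides.

```python
def consequences(axioms, rules):
    """Forward-chain to a fixed point.

    axioms: the sentences taken as given (the "A" of the quotation).
    rules:  hypothetical (premises, conclusion) pairs standing in for
            the underlying logic.
    """
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule once all of its premises are already known.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy logic about one stretch of text x.
rules = [
    (("quotation(x)",), "text(x)"),
    (("text(x)", "german(x)"), "german_text(x)"),
]

meaning_of_a = consequences({"quotation(x)"}, rules)
```

On this reading, two sentences have the same meaning exactly when they generate the same consequence set under the same rules.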

How would one define "the epistemic sense of the utterance of a natural language expression"? What is the semantic contribution of individual lexical items that occur in that expression?

The slides are about the meaning of markup, which is a technological vocabulary for describing the representamen of a text (a character stream in Unicode, a layer of content and markup, a layer of syntax tree, a layer of Infoset graph). The representamen of a text or expression is described in terms of a type-vocabulary for its parts.

So ...

what inferences are licensed by each element type?
by each attribute?
for each location? (i.e. how do you associate the meaning with a particular instance?)

Part of the use-context of markup is that it describes parts of the representamen that currently, or at some later time, may get different treatment, in terms of stylistic rendering or some other (semantic?) application. Element names are typically in the everyday language of users of the text. Attribute names tend to be more technical, of interest more to technologists than end-users.

Some premises

Rules are vocabulary-specific.
The coding <F>a</F> is both visually and semantically parallel to Fa
Definition of F to be provided ...
In many cases, the relevant property has arity > 1: F(a,b), F(a,b,c), ...
As a consequence:

We need deixis.
Argument structure is crucial.

This may embody a rather extensional view of the representamen. The minimal character strings are individuals; an element type is predicated of one or more individuals. Deixis implies a need for reflexive content as well as referential content.
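The premise that an element of type F wrapping content a reads as the predication F(a) can be sketched in a few lines. This is a minimal illustration, not the Bechamel Project's method: the function name `licensed_predications`, the treatment of attributes as three-place facts, and the sample element names (`p`, `name`, `ref`) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def licensed_predications(xml_text):
    """Read each element as a predicate applied to the text it wraps."""
    root = ET.fromstring(xml_text)
    facts = []
    for elem in root.iter():  # document order, root first
        # itertext() gathers all character data below the element, so the
        # predicate applies to the whole span that the element wraps.
        content = "".join(elem.itertext()).strip()
        facts.append((elem.tag, content))            # F(a)
        for name, value in elem.attrib.items():
            facts.append((name, content, value))     # arity > 1: F(a, b)
    return facts


# Hypothetical sample; the vocabulary is invented for illustration.
sample = '<p><name ref="LW">Wittgenstein</name> wrote this.</p>'
facts = licensed_predications(sample)
```

The sketch sidesteps the hard parts named in the challenges list (inheritance, milestone elements, identity of individuals); it only shows the parallel between nesting and argument structure.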

Challenges:
technical / plumbing
- distributed / non-distributed properties
- overriding inheritance
- milestone elements
- unique identity of individuals

design / philosophical
- completeness
- fertile valley vs. desert landscape
- meta-markup

A ‘desert landscape’ view

The Wittgenstein transcripts postulate

the manuscript
the transcription
pages
text blocks (main block, left margin, ...)
characters

And possibly also

the von Wright catalog and its entries
words and sentences as described by Duden
people
dates
MECS-WIT version numbers

The name desert landscape is borrowed from W.V.O. Quine.

A ‘fertile valley’ view

We may also postulate

sections
revisions
acts of deletion
insertions
formulae
quotations
names, dates, things, ...

Markup is not used to classify utterances signaling a speech act display. Historically, it emerged to classify a text as a renderable document. This is a display of a persistent artifact, made of parts which are visually distinguishable in the rendering.

Sunday, September 19, 2010

Further research on Lexical Semantics of Construal

Some thoughts about the broader significance of LSCon

The LSCon Methodology could be applied to bilingual resources, fixing a signature that bridges two languages

I am interested in

German-English and English-German (perhaps using the Collins dictionary)

Chinese-English
English-Filipino, Cebuano-English
Indonesian-English and English-Indonesian
Tok Pisin-English

study AktionsArt, as empirically analyzed in PL-Onto and exemplars
develop applications in computer assisted language learning

explore games and interactive learning environments with an NLU interface
another area for collaboration with undergraduates or masters students

What tasks are involved in manually compiling an MRD suitable for conversion into MRS?

cluster KWIC data from a corpus into candidate senses
isolate a distinctive Recognition Situation for a cluster, and enumerate the typical Grammar Patterns
initially characterize the Distinguishing Situations of candidate senses, including implicit participants and logical alternatives (which may still be split off into separate senses)
fix a set of senses, choosing the Head Relations (e.g. the more specific verb to use) in the Distinguishing Sits

Head Relations are the symbols used to partition a word-level concept into sense-level concepts according to a fixed sense signature. For example, catch_capture_hrel and catch_seize_hrel are two senses out of 24. They are called Head Relations, because they characterize the head of a clause with a symbol for the head of an infon.

choose exemplars for each sense and Grammar Pattern
write the definition of each sense
verify the definition against exemplars and the KWIC data. Check there is adequate coverage (all corpus data fits well into a sense, or is close enough, or is an obscure usage)
compare the definition to what is available about related words (near-synonyms, antonyms, semantic sets)
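The first task in the list starts from KWIC (keyword in context) lines. A minimal sketch of producing such lines, sorted by right context so that similar usages sit next to each other, might look like this. The function name `kwic`, the window width, and the sample sentences are assumptions; the function matches only the exact word form, so inflected forms such as "caught" would need lemmatization that is not shown.

```python
import re


def kwic(sentences, keyword, width=3):
    """Return (left, keyword, right) triples, sorted by right context."""
    lines = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    # Sorting on the right context is a crude aid to spotting clusters;
    # grouping lines into candidate senses remains a manual judgment.
    return sorted(lines, key=lambda line: line[2])


# Hypothetical corpus fragment.
corpus = [
    "She tried to catch the ball.",
    "They catch fish here.",
    "Police catch the thief.",
]
lines = kwic(corpus, "catch", width=2)
```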

How would these tasks change if a team were to develop an LSCon resource directly from corpus data, with no pre-existing MRS?

an MRS could be automatically generated as one of the outputs from the LSCon Resource database
This approach could be used in developing resources for less-studied languages
It would be nice to provide integration with FieldWorks Language Explorer, so that SIL data could be massaged into an LSCon resource. Perhaps it is useful to develop a standard sense signature for bilingual dictionaries with English (and other languages of wider communication).

This could take the form of a defining vocabulary for senses and exemplars in the Source language, with tools for (graphically?) constructing logically precise definitions.
Since the target users of a FLEx dictionary include local translators and language workers who have limited competence in English or another LWC, it is useful to have a restricted and precise defining vocabulary.
In fact, the production of a full bilingual DevelopmentLanguage-English dictionary with well written English definitions and glosses could be postponed if there are enough resources. It is sufficient to create sense definitions that are clear enough in LSCon, then move directly to other tasks like

Producing an English-DevLang dictionary, where local workers do have the linguistic competence to compose good definitions and glosses. Their reading knowledge of English may be sufficient for this work, which is less demanding than writing good definitions and glosses in English. This also has important applications in schooling and literacy education.

This could start with matching the LSCon-EN encoded sense definitions of a DevLang headword from the DevLang-EN dictionary with the closest senses in English (senses from one or more English headwords). This ensures that well-described words in the DevLang are maximally utilized in a precise and consistent way.
This approach seems to be more sensitive to the sense-level nuances of meaning, compared to the typical reversal index.
Perhaps I should experiment with doing this for an English-Filipino dictionary for the most common verbs.
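One way to picture "matching with the closest senses" is as set overlap between encoded definitions. The sketch below is purely illustrative: it assumes each sense can be flattened to a set of defining symbols, uses Jaccard similarity as a stand-in for whatever comparison LSCon Notation would really support, and invents the feature sets. Only the two head relation names are taken from these notes.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_senses(source_sense, english_senses):
    """Rank English senses by overlap with one DevLang sense, best first."""
    scored = [
        (jaccard(source_sense, features), name)
        for name, features in english_senses.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical encodings; the feature symbols are invented.
english_senses = {
    "catch_capture_hrel": {"agent", "animate_object", "moving", "restrain"},
    "catch_seize_hrel": {"agent", "inanimate_object", "moving", "hold"},
}
devlang_sense = {"agent", "animate_object", "moving", "instrument"}

ranking = closest_senses(devlang_sense, english_senses)
```

Unlike a reversal index, which pairs headwords, a ranking of this kind pairs individual senses and keeps the runner-up candidates visible for a human lexicographer to review.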

Producing a monolingual dictionary in the Development Language. This could include the construction of an LSCon sense signature for that language (informed by the choices made in producing an English-DevLang dictionary, if there is one). It includes encoding the dictionary entries and exemplars using LSCon Notation into a lexical resource that could be readily integrated into existing software and interactive applications.

It might be useful to populate the SIL DDP topics with LSCon-EN senses....

One motivation for producing an LSCon resource during early stages of lexicography is to connect the language to existing NLP software and educational resources. Interactive learning resources and games might be particularly useful for literacy and multilingual education applications.

PL-Onto and common sense

Part of my concept of developing Lexical Semantics of Construal (LSCon) is to develop a Populated Local Ontology (PL-Onto).

The detailed empirical results of PL-Onto in LSCon-EN should be of interest to researchers in AI because they give a concrete window into several domains of common sense, providing a more precise characterization of what it is.
We assume that the vague notion of common sense is constructed with the help of general concepts about situations and how individuals and situations interact: the PL-Onto level of Situation. This level constitutes (at least in part) the upper ontology of PL-Onto.

The "upper" level of Situation abstracts from the concrete details of other levels. It embodies the common sense intuitions of how situations work across other levels

The empirical work of deriving PL-Onto from MRD data can be done gradually, by layered level. The more basic levels make less complex assumptions about the world. Presumably they use only part of the upper ontology: the more basic the level, the smaller or simpler the part.
The PL-Onto level of Physical builds on intuitions that the world is nothing but res extensa, physical bodies made of matter and occupying space and time. These bodies have physical qualities that change, and interact because of forces. The source of force is not analyzed in detail, though it may be the movement of some animate creature, an Agent. Clauses describing a Physical situation may not mention an underlying animate cause of interaction, and the SUBJ in the clause may be some inanimate participant which we can call an Actant (or Sowa's Initiator), a thematic role with fewer assumptions than Agent.
The PL-Onto level of Animate builds on intuitions that Actions in the world are caused by the movement of animate creatures, various species of animals or humans seen primarily as moving creatures with bodies (ignoring specifically human capacities, social dimensions, or mental life).

[does "function" begin here, teleological functions relating to assumed biological regularities that are long-range and homeostatically maintained by biological systems?]

The PL-Onto level of Human allows the expression of basic common sense intuitions about how humans interact with each other and the world.
The PL-Onto level of Institution allows the expression of common sense intuitions about social institutions, and how humans interact socially.

[Should Artifact be split off, to characterize objects that are intentionally constructed with a "functional" purpose? Need to investigate nouns first. Perhaps verbs like "operate, build, construct" should be moved to level Artifact]
[examples, what they tell us about common sense]

The PL-Onto level of Attitude allows the expression of common sense intuitions about Intentionality [Searle]

This level embodies "folk psychology," providing Reasons for Action from the unseen domain of internal mental causes for human action. [see Dretske]
The scheme of individuation at this level has an intrinsic 1&3P character, so intuitions about the phenomenality of mental states can be accounted for at this level.
Although mental states in others are unseen, they are readily inferrable from behavior (and language reports) as causes for human Action. Causality at this level is likely very different from physical causes when the world is seen as simply Physical.

The intuitions about these differences underlie Descartes' distinction between res extensa and res cogitans. Thinking things, the mental states of thinking creatures, appear not to have physical extent since they are unseen. Descartes, working under the influence of a neo-Platonist tradition about ousia (Substantive-entities), considered primarily the Type-level of mental entities, taken to be immutable Forms. As a more modern alternative conception, we can consider the Type-level of mental entities to be location-parameter-absorbed situation-types where humans are in a mental state. With a nod to Perry's antecedent physicalism, there can be a purely physicalist account of the Token-situations that ground these situation-types, working within a physicalist 3P scheme of individuation. However, situation-types that are intrinsically mental are individuated through a 1&3P scheme that associates phenomenality with mental entities, providing some kind of insight into the "hard problem" of consciousness.
Common sense intuitions about mental states, the stuff of folk psychology, give humans readily inferrable information about unseen causal states underlying the Actions of others. PL-Onto at this level gives a detailed characterization of (linguistic aspects of) common sense about Intentionality in human Action. It allows a characterization of what is cognitively accessible to humans as they behave motivated by mental reasons for action. This provides a window into the "easier" problem of cognitive access (what Block called access consciousness).

This characterization does not depend on functionalism or teleology. It can derive functions and purpose from more basic entities of mental states, as expressed in the content of utterances expressing common sense intuitions about Intentionality in human Action. Functionalist and teleological explanations are not necessarily wrong, but they may be a bit backward in explaining cognitive processes from the outside in. It is admittedly difficult to come up with alternative explanations starting from "inside" the head with unseen mental states. But this task becomes easier if we make use of the window into mental states provided by the empirically-derived PL-Onto level of Attitude (and Language).
Mental states may not be epistemologically-objective because they cannot be directly seen (humans only physically "see" behavior, not mental causes). However, mental states can be readily inferred from behavior, and reliably confirmed by subsequent behavior and language. Humans can indirectly "see" mental causes of behavior through the lens of folk psychology.
Once we accept both direct observations (epistemologically-objective) and indirect observations as a valid empirical basis of science, we can develop a rigorous cognitive psychology account of the phenomena characterized by common sense or "folk" psychology. The indirect observations are not arbitrarily subjective; they are intersubjective in consistent and observationally-confirmable ways.
The subject matter of this characterization of mental states as entities for scientific study has been rejected out of hand by Behaviorism and related traditions in experimental psychology. Mental states are taken to be ontologically-subjective, as well as epistemologically-subjective. This is a methodological error in psychological research, grounded in unhelpful philosophical assumptions about the lived world of psychological phenomena. The alternative is to treat mental states as ontologically-objective real entities seen through the lens of human schemes of individuation, specifically the 1&3P scheme of individuation of mental states. In fact, modern technology developed after the demise of Behaviorism allows researchers to physically "see" traces of mental states using fMRI, PET, advanced EEG, etc.
It will be difficult to characterize mental entities if only approached "bottom-up" from the physical tokens of mental phenomena. If we use the indirect inter-subjective evidence of verbal reports about mental entities, a "top-down" approach, we can come up with precise theoretical accounts that can be confirmed or refuted by physical evidence. Extracting the PL-Onto levels of Attitude and Language depends on available print resources that characterize lexical knowledge (of the English speech community) and a window into common sense about Intentional mental states.
This can be helpful to the scientific study of cognition, and also to weak AI technologists who need to characterize common sense precisely enough for some practical purpose.

Therefore, we can move beyond Cartesian dualism and start treating mental entities as ontologically-objective real entities within the purview of cognitive science, observed using direct physical and indirect inter-subjective methods. The inter-subjective methods are reliable and confirmable. Extracting PL-Onto from lexical resources is a way of moving forward with this method to characterize (the linguistically accessible part of) common sense about human mental states.

So the level of Attitude characterizes the vocabulary used in talking about the lived world of mental life, where Actions are caused by mental states inferred but not seen in other humans.
Although mental states in others are unseen, they are inferred tokens of "the same" types that a human uses to classify their own mental states. This helps account for the intuition that there is something "it is like" [Nagel] to be a human experiencing a mental state. Through the shared 1&3P scheme of individuation of types of mental states (a subset of which are the verbal concepts for mental states expressed in verbs of propositional attitude), humans can infer states that have properties of cognitive access and phenomenality [what Block called access consciousness and phenomenal consciousness].

The PL-Onto level of Language allows the expression of common sense intuitions about utterances using the human language capacity.

Lexical Semantics of Construal

Concept of LSCon

LSCon is a knowledge representation technology

I also use the acronym to refer to the underlying methodology and approach to characterizing the semantics of words in construction as a contribution to a broader process of construal. This is studied in a principled way within a broad cognitive linguistics framework, but with immediate applications in mind. In case of ambiguity, this underlying approach to designing lexical resources can also be called LSCon Methodology.
LSCon Notation refers to the specific notation as it develops for modeling English with the Sense Signature fixed by a single MRD source (COBUILD). The notation is influenced by CG and EKN: Conceptual Graphs of John Sowa and the ISO standard, and the Extended Kamp Notation of Situation Theory and Situation Semantics. See also Controlled English.
The trial lexical resource (and its specific variant of the notation given the fixed sense signature) that is built during research in LSCon is tentatively called "the LSCon Resource for English" or LSCon-EN, but this may be replaced by another name: SenseNet? Sense Graphbank?

The initial resource includes an encoding of definitions, including the Distinguishing Situations of the most basic senses of lexical entries for the most frequent verbs in COCA (plus nouns mentioned in definitions).
It also includes an integrated encoding of exemplars, one or two for each Grammar Pattern

Perhaps some or all exemplars will be taken from a different corpus from COBUILD's BoE, like COCA
This will help validate the sense signature, but takes more work

The initial resource is a derived work from COBUILD, and may be subject to restrictive copyright licensing

A future lexical resource may be developed with a liberal Creative Commons license

This may involve generating a new sense signature from a corpus, consulting and synthesizing multiple dictionaries (so that the result is fair use of various dictionaries, not similar enough to any one to be called a derived work. Do we need clean-room methods for this?)

LSCon has layered dependencies or interfaces with other domains of cognition and language processing

It is layered above phonology-syntax (initially HPSG and/or SBCG, using MRS or an extension at the syntax-semantic interface).

This includes pre-semantic uses of context, in Perry's terms (R&R 3.4)

It is layered above a populated ontology of encyclopedic knowledge. This is fixed by the sense signature, but it involves discovering an upper ontology that may be substantially universal or cross-linguistic.

Each lexical entry interfaces to encyclopedic knowledge, via a populated local ontology. This includes certain concepts underlying the substantive lexemes in the definition, specifically the information conditions expressed in Distinguishing Sits of each sense.

I have not yet decided whether to "chase down sense distinctions" so that the concepts used in the local ontology are sense-level rather than word-level

The union of populated local ontologies of all lexical entries, including the shared upper ontology, is called PL-Onto.
The upper ontology, empirically derived rather than theoretically motivated, is called Upper PL-Onto. It is theoretically influenced by situation semantics, perhaps also Peirce's triadic schema of signs and Sowa's upper ontology in Conceptual Structures.
PL-Onto is expected to be layered into levels, which are derived empirically per sense (not per lexical entry)

Initial levels include

Situation (mostly incorporated into the upper ontology)
Physical - assumes only bodies and substances interacting in space, sources of force are unanalyzed
Mixed - default level, senses that don't clearly fit elsewhere
Animate - the source of force or causation is an animal (including hominids, but not requiring human-specific capacities)
Human - a human agent is involved in a way that invokes human-specific capacities not classified in subsequent levels
Institution - involves dependencies on human social institutions such as economic institutions, marriage, schooling, etc. There may be a need to further distinguish technology and artifacts, but the initial analysis of verbs does not make this obvious
Attitude - this is the first level to distinguish mental states. It is called attitude because it includes verbs of propositional attitude (belief, desire, etc.)
Language - entities at this level involve human language capacity in essential ways
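The listed levels can be held in a small ordered data structure. This is only a sketch: the numeric order simply follows the order of the list above, which is an assumption about how "basic" each level is, and the example sense-to-level assignments are hypothetical rather than results of the analysis.

```python
from enum import IntEnum


class Level(IntEnum):
    """PL-Onto levels in the order listed; the numbering is an assumption."""
    SITUATION = 0
    PHYSICAL = 1
    MIXED = 2
    ANIMATE = 3
    HUMAN = 4
    INSTITUTION = 5
    ATTITUDE = 6
    LANGUAGE = 7


def level_of(sense, assignments):
    """Levels are assigned per sense; unassigned senses default to MIXED."""
    return assignments.get(sense, Level.MIXED)


# Hypothetical assignments, for illustration only.
assignments = {
    "catch_seize_hrel": Level.PHYSICAL,
    "catch_capture_hrel": Level.ANIMATE,
}
```

Keying the mapping by sense rather than by headword reflects the point that levels are derived per sense, so two senses of the same verb can land on different levels.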

Part of the motivation of layering is to simplify analysis by limiting the parts of the upper ontology (essentially) required for modeling.
These levels seem to capture the interface to "common sense" required by lexical entries. See the separate note on "PL-Onto and common sense"

LSCon has dependencies or interfaces with other domains that are expected to be less influential (at least on the development of LSCon), since LSCon is layered below them (more or less) in the anticipated architecture of cognition. LSCon or related outputs feed into

DRS processing, at the semantic level (resolving anaphora) and post semantic level (anchoring variables to a state or context, perhaps via some worked out dynamic semantics)
Designation processing. This includes the inference of referential content and its anchoring to individuals and worlds fixed by context. LSCon covers the ground of an intensional logic, Frege's Sinn. Designation processing is using that Sinn to get to a Bedeutung (reference, denotation).
Near-side pragmatics (which may include parts of Designation processing)
Far-side pragmatics, including speech acts, communicative intentions, conversational implicatures
Post-semantic connections to encyclopedic knowledge. This includes specialized knowledge domains, in contrast to the very generic (common sense) knowledge domains of the layered levels of PL-Onto.
Interface of lexical information with other cognitive capacities, such as linking lexical information about physical objects to "visual objects" in a scheme of concept types from Visual Object Recognition

A KR technology is focused on structured data representations, and is relatively separate from the algorithms that use the representations.
LSCon is envisioned to be used in algorithms for cognitive construal. It is hoped that it will be a "broad spectrum" resource that is of interest to a wide range of researchers and technologists. Parts of that range can include people investigating:

logical processing, like DRS. Even if DRS, because it is closely tied to referential content, prefers to stick with word-level symbols, the sense-level distinctions of LSCon may be useful at intermediate stages then discarded
parallel distributed processing
modeling of functional schemas in the brain, like Arbib (and the Finnish guy in the US whose name escapes me)
Quillian's semantic networks (?)

Many of these uses, and their algorithms, go beyond semantics narrowly defined
The major algorithmic use of LSCon that I expect to develop as part of research into LSCon is the construction of the Semantic Contribution of a sentence (and its MRS)

Initially, I am most interested in clause constructions. Perhaps I should simplify NP's to just their heads plus a few other constituents relevant to the Distinguishing Sits of the candidate senses.

Sem Contrib is intended to be context-independent

It avoids crossing the interface to other neighboring domains of construal.

In practice, it may be necessary for any implementation to perform certain context-dependent processes in parallel (notably DRS processing and Designational processing of the referential content). I think it will be helpful to keep those conceptually separate at this stage, and see how much can be achieved by semantic representations independent of everything else.

The algorithms to construct Sem Contrib from an input sentence and a lexical resource are called Semantic Integration.

This is intended to be psychologically plausible: it is a fragment of the psychological (and neural) processes of cognitive construal of linguistic information.
Any implementations of Sem Contrib and linguistic construal are expected to make large and small departures from psychological plausibility, for whatever technological or empirical reasons are at hand.

The integration of Sem Contrib is expected to be relevant to a variety of tasks, including certain shared tasks useful for evaluating the state of the art of language technology

The WSD task may include DRS and Designational processing, but Sem Contrib may be the critical factor in resolving senses within a signature
The SRL task using LSCon would involve mapping to the lexical-entry local participant roles
It would be interesting to see implemented algorithms that integrate information from LSCon and other resources, such as WordNet, FrameNet, and Pustejovsky-style resources in the Generative Lexicon tradition (I believe both Pustejovsky and Buitelaar have resources). The resources of Martha Palmer and her students may also be of interest.

Perhaps undergrad and masters students might want to explore some of this.

Other shared tasks or variants could be designed using an LSCon resource

Saturday, September 4, 2010

Precursors to Language

[added on Feb 2011]

Hominids experience reality and classify it. They classify scenes visually into visual objects that participate in visual scenes. Some of the visual scenes are action-scenes. Action-scenes involve other hominids visibly and audibly behaving, with their body parts like faces and eye-gaze and hand movements, and with extended vocalizations segmented into parts with different articulation. [The precursor to linguistic classification capacities is the segmental, and thus discrete, classification of display action-scenes. The breakthrough to linguistic classifications can work without vocalization, as demonstrated by the emergence of Deaf sign languages repeatedly in many communities in history. However, the typical case, which likely interacted with the biological evolution and cultural development of behavioral modernity, is that display action-scenes incorporate vocalizations classified phonemically. The breakthrough builds on prior cognitive capacities related to social behavior, conventionalized routines and intentionality. Behavior in the linguistic domain is unambiguously attained with the systematic intentionality of conventionalized phoneme sequences (or in the less common case, conventionalized sign lexical items). The intentionality of linguistic actions is marked by a sharp demarcation between the display situation of the source signal, and the described situation that becomes the cognitively accessible "content" of a linguistic display or utterance. The link is mediated by a shared lexical scheme of individuation, so that recovery of the source signal during the linguistic recognition performance of a listener depends critically on the conventionalized-arbitrary lexical association between phonemically-classified display and "cognitively established content". It is the shared lexical scheme that determines which pragmatically available significance of vocalization displays becomes established.
The established lexical senses extract a semantic level of meaning that is independent of the particular utterance situation, and indeed the broader linguistic context of usage. To be an established sense it is sufficient to associate with the phonemic form of a lexical item a discrete "semantic contribution" in terms of other (more or less basic) lexical items in the scheme.]

[The level of "content" in the described situation is thus abstracted from particularities of the utterance display situations, with the important exception of indexicals. Content is in general practice transparently referential. (But see Perry's reflexive-referential theory of content, which analyzes the importance of additional levels of content to explicate indexicality and other reflexive phenomena, even proper names.)]

Hominids classify a visual scene into an action-scene if it involves a hominid (the Agent a) behaving in a known pattern of behavior [as mentioned by a verb], with specific success-conditions. For example, an a-grabs-f situation is one where an agent a moves parts of their body to make a significant change in the scene. Before the action, object f was located in the scene but not located in the hands of a. If the scene satisfies the conditions to be an a-grabs-f situation, the scene changes so that at a later time f is in the hands of a.

Event:
Prior sit:
Resulting sit:
t1 part-of t, t2 part-of t, t1 before t2
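The prose conditions for an a-grabs-f situation (f in the scene but not in a's hands beforehand, f in a's hands afterwards) can be sketched as a check over two snapshots of a scene. The representation is an assumption made for illustration: the `Infon` class, the relation names `located-in-scene` and `in-hands-of`, and the encoding of a snapshot as a set of positive facts are all hypothetical. The temporal condition "t1 before t2" is taken for granted by the order of the two snapshot arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infon:
    """A relation, its arguments, and a polarity (True = holds)."""
    relation: str
    args: tuple
    polarity: bool = True


def holds(infon, facts):
    """facts is a set of positive (relation, args) pairs for one snapshot."""
    return ((infon.relation, infon.args) in facts) == infon.polarity


def is_grab(a, f, facts_t1, facts_t2):
    """Check the prior conditions at t1 and the resulting condition at t2."""
    prior = [
        Infon("located-in-scene", (f,)),
        Infon("in-hands-of", (f, a), polarity=False),
    ]
    resulting = [Infon("in-hands-of", (f, a))]
    return (all(holds(i, facts_t1) for i in prior)
            and all(holds(i, facts_t2) for i in resulting))


# Hypothetical snapshots before and after the action.
before = {("located-in-scene", ("fruit",))}
after = {("located-in-scene", ("fruit",)), ("in-hands-of", ("fruit", "a"))}
```

Passing the same snapshot for both times fails the check, which matches the idea that a grab is a significant change in the scene rather than a continuing state.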

We can say that an action-scene or event is the part of the world that can be classified according to such conditions. Let us say there is a hominid s that classifies the visual scene before it as an a-grabs-f scene. Then s can visually track the various individuals involved in the event, and track the n-ary conditions [which underlie the sense of the verb] picked out by that classification, which are relations that individuals stand in (or not) in the scene. If s registers a scene as falling under the classification a-grabs-f, they can remember it as such. If, a few days later, they see a grabbing another object f2 of the same type F as f (a physical object, let us say food), they can classify the new scene as the same type of situation as earlier; call this situation-type a-grabs-F. The a-grabs-F situation-type is more general than the earlier a-grabs-f action-scene (a singular scene, involving fully identified individuals), because the participant f is parametrized to a type of object F. Similarly, s can classify the actions of another hominid b in a similar way, according to a still more general situation-type A-grabs-F. [A is the type of some particular hominid who takes the role of an Agent in the action-situation so classified.]
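The step from a-grabs-f to a-grabs-F to A-grabs-F amounts to replacing identified individuals with typed parameters. A minimal sketch, assuming a scene is a tuple of a relation and its participants and that each slot of a situation-type is either a fixed individual or a type; the slot encoding, the `types` table, and the type names `Hominid` and `Food` are hypothetical.

```python
def slot_accepts(slot, individual, types):
    """A slot is ("ind", name) for a fixed individual or ("type", T)."""
    kind, value = slot
    if kind == "ind":
        return individual == value
    return value in types.get(individual, set())


def classify(scene, situation_type, types):
    """True if the scene falls under the (possibly parametrized) type."""
    relation, *participants = scene
    type_relation, *slots = situation_type
    return (relation == type_relation
            and len(participants) == len(slots)
            and all(slot_accepts(s, p, types)
                    for s, p in zip(slots, participants)))


# Hypothetical typing of the individuals in the example.
types = {"a": {"Hominid"}, "b": {"Hominid"}, "f": {"Food"}, "f2": {"Food"}}

a_grabs_f = ("grabs", ("ind", "a"), ("ind", "f"))        # singular scene
a_grabs_F = ("grabs", ("ind", "a"), ("type", "Food"))    # object parametrized
A_grabs_F = ("grabs", ("type", "Hominid"), ("type", "Food"))
```

Each more general situation-type accepts every scene the more specific one accepts, plus new ones: the later scene with f2 falls under a-grabs-F but not a-grabs-f, and a scene with agent b falls only under A-grabs-F.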

The world in which hominids live is full of regularities, and their ability to recognize and remember those regularities allows a troop of hominids to be successful in survival, reproduction and maintaining group cohesion. Some regularities are related to others. For example, the action of grabbing results in a situation where A-holds-F. For purposes of illustration, we do not treat hold as a significant change in a scene, but as a significant continuity or stative relation. An event that involves some A holding an F involves a stative relation rather than a proper change-based action (admittedly the boundary between actions and statives can be fuzzy or arbitrary: at some earlier time, the agent must have acted to come to hold the object).

Event:
Prior sit:
Resulting sit:
t1 part-of t, t2 part-of t, t1 before t2

Still another related situation-type to the grab action is a catch action, where the object of type F was moving in the prior situation, and when it is caught and held it no longer moves. We can distinguish two variations (at least) of the catch action. If the object is of an animate type G (it is a hominid, or an animal) it is able to move on its own and avoid the catching action of the agent. The agent will often have to chase the object it wants to catch, and may have to use some instrument like a rope or net to restrain its movement. We will call this variant a catch-capture situation. If the object is of an inanimate type H, it is typically moving because it is falling through the air due to gravity. We call this variant with inanimate objects a catch-seize situation. We can characterize the conditions for a scene to be classified as a catch situation as follows:

Event:
    Sense1-event:
        Prior sit:
        Resulting sit:
            OR
        t1 part-of t, t2 part-of t, t1 before t2
    Sense2-event:
        Prior sit:
        Resulting sit:
        t1 part-of t, t2 part-of t, t1 before t2
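The two catch variants described in the prose differ only in the animacy of the caught object, so the distinction can be sketched as a small decision function. This is an illustration under stated assumptions: the boolean parameters are a hypothetical summary of the prior and resulting situations, and the returned labels reuse the head relation names catch_capture_hrel and catch_seize_hrel mentioned in the notes on Head Relations. The optional chase and instrument in the capture variant are not modeled.

```python
def classify_catch(object_is_animate, moving_before, held_after, moving_after):
    """Return a sense label for a catch event, or None if it is not a catch.

    Shared conditions from the prose: the object was moving in the prior
    situation, and in the resulting situation it is held and no longer moves.
    """
    if not (moving_before and held_after and not moving_after):
        return None
    if object_is_animate:
        # The object could move on its own and try to avoid being caught.
        return "catch_capture_hrel"
    # An inanimate object, typically falling through the air.
    return "catch_seize_hrel"


# Hypothetical scenes: a fleeing animal, a falling fruit, a missed catch.
animal = classify_catch(True, True, True, False)
fruit = classify_catch(False, True, True, False)
missed = classify_catch(False, True, False, True)
```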

Our hominid s can observe the scenes around her, and if they involve individuals of the appropriate type, s can classify situations that involve grabbing, holding and catching. We say that s has a long-term cognitive memory, with a scheme of individuation (for situations involving individuals and relations) that is attuned to precisely those types of situations, and many others. Our hominid can track scenes, classifying them as being in a certain type of situation, or as not being in that situation-type, using their working memory to judge if a situation sit1 is of a certain type (for example: a-grabs-f) or not. We say that hominid s believes that sit1 is of Situation-Type-A if her working memory tracks the relevant individuals as standing in the relations specified in the conditions for Situation-Type-A.

Hominid s lives in a troop with other hominids, and classifies their actions in ways that are relevant to the continuing social life of the troop. A hominid can call the attention of others in the troop to a certain situation of a type, perhaps by a vocalization display or by shifting their gaze in a way that can be observed by others, or by pointing. We call these attention-directing displays referring actions. If the belief state of s is that the scene they ["they" = s and troop members who can observe her display] can see is significant because it is of a certain type, they can call attention to the scene or to the individuals involved by referring displays. Members of the troop are attuned to the situations that significant others find important, which is important to the cohesion of the group. Young hominids are socialized to be aware of the same types of situations as adult members of the troop, and develop the same sort of cognitive scheme of individuation as the others in the troop.

[In support of the idea of referring actions using monitored gaze-shift, we have the way that hominids apparently have some selectional advantage by having white scleras in their eyes. This makes a referring display by shifting the gaze from one location to another more prominent.]

This framework allows us to propose a scenario for the evolution of language among hominids.