Sunday, September 19, 2010

Further research on Lexical Semantics of Construal

Some thoughts about the broader significance of LSCon

  • The LSCon methodology could be applied to bilingual resources, fixing a sense signature that bridges two languages
    • I am interested in
      • German-English and English-German (perhaps using the Collins dictionary)
      • Chinese-English
      • English-Filipino, Cebuano-English
      • Indonesian-English and English-Indonesian
      • Tok Pisin-English
  • study Aktionsart, as empirically analyzed in PL-Onto and in exemplars
  • develop applications in computer assisted language learning
    • explore games and interactive learning environments with an NLU (natural language understanding) interface
    • another area for collaboration with undergraduate or master's students
  • What tasks are involved in manually compiling an MRD suitable for conversion into MRS?
    • cluster KWIC data from a corpus into candidate senses
    • isolate a distinctive Recognition Situation for a cluster, and enumerate the typical Grammar Patterns
    • initially characterize the Distinguishing Situations of candidate senses, including implicit participants and logical alternatives (which may still be split off into separate senses)
    • fix a set of senses, choosing the Head Relations (e.g. the more specific verb to use) in the Distinguishing Situations
      • Head Relations are the symbols used to partition a word-level concept into sense-level concepts according to a fixed sense signature. For example, catch_capture_hrel and catch_seize_hrel are two of the 24 senses of catch. They are called Head Relations because they characterize the head of a clause with a symbol for the head of an infon.
    • choose exemplars for each sense and Grammar Pattern
    • write the definition of each sense
    • verify the definition against the exemplars and the KWIC data, and check that coverage is adequate (every corpus line fits a sense well, is close enough to one, or is an obscure usage)
    • compare the definition to what is available about related words (near-synonyms, antonyms, semantic sets)
  • How would these tasks change if a team were to develop an LSCon resource directly from corpus data, with no pre-existing MRS?
    • an MRS could be automatically generated as one of the outputs from the LSCon Resource database
    • This approach could be used in developing resources for less-studied languages
    • It would be nice to provide integration with FieldWorks Language Explorer (FLEx), so that SIL data could be massaged into an LSCon resource. It may also be useful to develop a standard sense signature for bilingual dictionaries with English (and other languages of wider communication).
      • This could take the form of a defining vocabulary for senses and exemplars in the Source language, with tools for (graphically?) constructing logically precise definitions.
      • Since the target users of a FLEx dictionary include local translators and language workers who have limited competence in English or another language of wider communication (LWC), a restricted and precise defining vocabulary is useful.
      • In fact, the production of a full bilingual DevelopmentLanguage-English (DevLang-EN) dictionary with well-written English definitions and glosses could be postponed. If there are enough resources, it is sufficient to create sense definitions that are clear enough in LSCon, and then move directly to other tasks such as:
        • Producing an English-DevLang dictionary, for which local workers do have the linguistic competence to compose good definitions and glosses. Their reading knowledge of English may be sufficient for this work, which is more realistic than trying to write good definitions and glosses in English. This also has important applications in schooling and literacy education.
          • This could start by matching the LSCon-EN encoded sense definitions of a DevLang headword in the DevLang-EN dictionary with the closest English senses (drawn from one or more English headwords). This would ensure that well-described words in the DevLang are used as fully as possible, in a precise and consistent way.
          • This approach seems more sensitive to sense-level nuances of meaning than a typical reversal index.
          • Perhaps I should experiment with this for an English-Filipino dictionary covering the most common verbs.
        • Producing a monolingual dictionary in the Development Language. This could include constructing an LSCon sense signature for that language (informed by the choices made in producing an English-DevLang dictionary, if there is one). It would also involve encoding the dictionary entries and exemplars in LSCon Notation as a lexical resource that could be readily integrated into existing software and interactive applications.
      • It might be useful to populate the SIL DDP topics with LSCon-EN senses.
    • One motivation for producing an LSCon resource during the early stages of lexicography is to connect the language to existing NLP software and educational resources. Interactive learning resources and games might be particularly useful for literacy and multilingual education.
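
The sense-fixing and coverage-checking tasks in the MRD compilation list above could be prototyped with a small data model. This Python sketch is a hypothetical illustration only, not part of LSCon: the names `Sense`, `Assignment`, `Fit`, `make_head_relation` and `coverage_report` are my own assumptions, and the `lemma_head_hrel` naming simply generalizes the catch_capture_hrel example. Each sense records its Head Relation, Grammar Patterns and exemplars, and KWIC lines that no sense covers are flagged.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Fit(Enum):
    """How well a KWIC line fits its assigned sense (the coverage criterion)."""
    GOOD = "good"        # fits the sense well
    CLOSE = "close"      # close enough to the sense
    OBSCURE = "obscure"  # obscure usage: tolerated, but worth noting


@dataclass
class Sense:
    head_relation: str                      # e.g. "catch_capture_hrel"
    definition: str
    grammar_patterns: list[str] = field(default_factory=list)
    exemplars: list[str] = field(default_factory=list)


@dataclass
class Assignment:
    kwic_line: str
    head_relation: Optional[str]            # None: no candidate sense found
    fit: Optional[Fit] = None


def make_head_relation(lemma: str, head: str) -> str:
    """Build a symbol in the hypothetical lemma_head_hrel style."""
    return f"{lemma}_{head}_hrel"


def coverage_report(senses: list[Sense], assignments: list[Assignment]):
    """Count fit levels for covered lines; collect lines no known sense covers."""
    known = {s.head_relation for s in senses}
    fits: Counter = Counter()
    gaps: list[str] = []
    for a in assignments:
        if a.head_relation in known and a.fit is not None:
            fits[a.fit] += 1
        else:
            # unassigned, or assigned to a Head Relation outside the signature
            gaps.append(a.kwic_line)
    return fits, gaps


if __name__ == "__main__":
    # Placeholder definitions and toy KWIC lines, invented for illustration.
    senses = [
        Sense(make_head_relation("catch", "capture"), "placeholder definition"),
        Sense(make_head_relation("catch", "seize"), "placeholder definition"),
    ]
    assignments = [
        Assignment("... tried to catch the thief ...", "catch_capture_hrel", Fit.GOOD),
        Assignment("... caught hold of the rope ...", "catch_seize_hrel", Fit.CLOSE),
        Assignment("... catch a cold ...", None),
    ]
    fits, gaps = coverage_report(senses, assignments)
    print({f.value: n for f, n in fits.items()})
    print("uncovered:", gaps)
```

The three fit levels mirror the coverage check described above (fits well, close enough, obscure usage); any line left among the gaps would suggest splitting off a new candidate sense or revising a definition.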
