Wednesday, January 21, 2009

For annotations to address NLU, we need basic research on language-world mapping

Mark-up Barking Up the Wrong Tree
Annie Zaenen

In her Last Words column in Computational Linguistics (CoLi), Zaenen describes the limitations of annotation as applied to NLU and calls for fundamental research "to understand the mapping between language and the world itself better".

"The interest in machine-learning methods to solve natural-language-understanding problems has led to the use of textual annotation as an important auxiliary technique. Grammar induction based on annotation has been very successful for the Penn Treebank"

"The problems with the ‘coreference’ annotation tasks of MUC and the like are well documented and not solved. Kibble and van Deemter (2000), for instance, discuss the difficulties created by the assumption that coreference is an equivalence relation, and hence transitive"
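The transitivity problem can be made concrete. What follows is a minimal Python sketch, not taken from the column, assuming an annotation scheme in which every pairwise coreference link is merged into an equivalence class (here with a union-find structure). The mention strings and the function name `coreference_chains` are hypothetical. Linking one description to two different values forces both values into a single chain, although no annotator ever asserted that they corefer.

```python
# Hypothetical sketch: pairwise coreference links treated as an
# equivalence relation, so chains are the transitive closure of the links.


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen mentions as their own representative.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreference_chains(links):
    """Merge annotated (mention, mention) links into equivalence classes."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    chains = {}
    for mention in list(uf.parent):
        chains.setdefault(uf.find(mention), set()).add(mention)
    return list(chains.values())


# Hypothetical annotations: one description linked to two distinct values.
links = [("the stock price", "$4.02"), ("the stock price", "$3.85")]
chains = coreference_chains(links)
```

Here `chains` holds a single set containing all three mentions, so the scheme implicitly equates "$4.02" with "$3.85". Any scheme that computes chains as a transitive closure behaves this way, whatever data structure it uses.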

She identifies two difficulties: "The first is inherent in the kind of annotations that are currently needed. The field is moving from information retrieval to language understanding tasks. To understand a linguistic utterance is to map from it to a state of the world, a non-linguistic reality. Language understanding always has a non-linguistic component. In computational settings, unfortunately, most of the time we do not have independent access to this non-linguistic component. This means that language understanding systems have to be more than just language understanding systems: One expects them also to take care of some minimal understanding of the world the language is supposed to describe. But relations between linguistic entities and the world have not been studied in linguistics nor anywhere else in the systematic way that is required to develop reliable annotation schemas: Traditional formal semantics says how meanings are put together and remains silent about semantic primitives. Lexical semantics is very fragmented: One part of it tends to limit its scope to lexical items that exhibit syntactic alternations, whereas another part concentrates on improving traditional lexicography."

"Annotation tasks typically involve these ill-understood phenomena. Current practice seems to assume that theoretical understanding can be circumvented and that the pristine intuitions of nearly naïve native speakers can replace analysis of what is going on in these complicated cases. The results that I have looked at suggest that this is wishful thinking."

"The second problem that annotation tasks face is not inherent. It is created by the current funding model. In the name of accountability, current NLP practice is wedded to quantifiable results, short time horizons, and strict financial control. In this setup there is no time for fundamental research. So, when research is necessary, it has to be called by another name. Part of it will be shuffled under the heading ‘annotation’."

"We should recognize that annotations are no substitute for the understanding of a phenomenon. They are an encoding of that understanding. The encoding is different from a rule-based encoding in that it does not require a generative formalization and it allows a more piecemeal approach."
