Overview
Word Sense Disambiguation: Algorithms and Applications (English reprint edition) is a volume in the "Original Texts in Computational Linguistics and Language Technology" series. For a computer to understand human language, it must resolve ambiguity, and within computational linguistics, word sense disambiguation (WSD) has long been an active topic of research. This book brings together recent international research on WSD, covering nearly every subtopic of the field, and is of significant academic value.
About the Author
Eneko Agirre is an associate professor at the University of the Basque Country, Spain.
Table of Contents
Reader's Guide
Contributors
Foreword
Preface
1 Introduction
Eneko Agirre and Philip Edmonds
1.1 Word Sense Disambiguation
1.2 A Brief History of WSD Research
1.3 What is a Word Sense?
1.4 Applications of WSD
1.5 Basic Approaches to WSD
1.6 State-of-the-Art Performance
1.7 Promising Directions
1.8 Overview of This Book
1.9 Further Reading
References
2 Word Senses
Adam Kilgarriff
2.1 Introduction
2.2 Lexicographers
2.3 Philosophy
2.3.1 Meaning is Something You Do
2.3.2 The Fregean Tradition and Reification
2.3.3 Two Incompatible Semantics?
2.3.4 Implications for Word Senses
2.4 Lexicalization
2.5 Corpus Evidence
2.5.1 Lexicon Size
2.5.2 Quotations
2.6 Conclusion
2.7 Further Reading
Acknowledgments
References
3 Making Sense About Sense
Nancy Ide and Yorick Wilks
3.1 Introduction
3.2 WSD and the Lexicographers
3.3 WSD and Sense Inventories
3.4 NLP Applications and WSD
3.5 What Level of Sense Distinctions Do We Need for NLP, If Any?
3.6 What Now for WSD?
3.7 Conclusion
References
4 Evaluation of WSD Systems
Martha Palmer, Hwee Tou Ng and Hoa Trang Dang
4.1 Introduction
4.1.1 Terminology
4.1.2 Overview
4.2 Background
4.2.1 WordNet and Semcor
4.2.2 The Line and Interest Corpora
4.2.3 The DSO Corpus
4.2.4 Open Mind Word Expert
4.3 Evaluation Using Pseudo-Words
4.4 Senseval Evaluation Exercises
4.4.1 Senseval-1
Evaluation and Scoring
4.4.2 Senseval-2
English All-Words Task
English Lexical Sample Task
4.4.3 Comparison of Tagging Exercises
4.5 Sources of Inter-Annotator Disagreement
4.6 Granularity of Sense: Groupings for WordNet
4.6.1 Criteria for WordNet Sense Grouping
4.6.2 Analysis of Sense Grouping
4.7 Senseval-3
4.8 Discussion
References
5 Knowledge-Based Methods for WSD
Rada Mihalcea
5.1 Introduction
5.2 Lesk Algorithm
5.2.1 Variations of the Lesk Algorithm
Simulated Annealing
Simplified Lesk Algorithm
Augmented Semantic Spaces
Summary
5.3 Semantic Similarity
5.3.1 Measures of Semantic Similarity
5.3.2 Using Semantic Similarity Within a Local Context
5.3.3 Using Semantic Similarity Within a Global Context
5.4 Selectional Preferences
5.4.1 Preliminaries: Learning Word-to-Word Relations
5.4.2 Learning Selectional Preferences
5.4.3 Using Selectional Preferences
5.5 Heuristics for Word Sense Disambiguation
5.5.1 Most Frequent Sense
5.5.2 One Sense Per Discourse
5.5.3 One Sense Per Collocation
5.6 Knowledge-Based Methods at Senseval-2
5.7 Conclusions
References
6 Unsupervised Corpus-Based Methods for WSD
Ted Pedersen
6.1 Introduction
6.1.1 Scope
6.1.2 Motivation
Distributional Methods
Translational Equivalence
6.1.3 Approaches
6.2 Type-Based Discrimination
6.2.1 Representation of Context
6.2.2 Algorithms
Latent Semantic Analysis (LSA)
Hyperspace Analogue to Language (HAL)
Clustering By Committee (CBC)
6.2.3 Discussion
6.3 Token-Based Discrimination
6.3.1 Representation of Context
6.3.2 Algorithms
Context Group Discrimination
McQuitty's Similarity Analysis
6.3.3 Discussion
6.4 Translational Equivalence
6.4.1 Representation of Context
6.4.2 Algorithms
6.4.3 Discussion
6.5 Conclusions and the Way Forward
Acknowledgments
References
7 Supervised Corpus-Based Methods for WSD
8 Knowledge Sources for WSD
9 Automatic Acquisition of Lexical Information and Examples
10 Domain-Specific WSD
11 WSD in NLP Applications
Excerpt
Ironically, the very "statistical semantics" that Weaver proposed might have applied in cases such as this: Yarowsky (2000) notes that the trigram "in the pen" is very strongly indicative of the enclosure sense, since one almost never refers to what is in a writing pen, except for ink.
WSD was resurrected in the 1970s within artificial intelligence (AI) research on full natural language understanding. In this spirit, Wilks (1975) developed "preference semantics", one of the first systems to explicitly account for WSD. The system used selectional restrictions and a frame-based lexical semantics to find a consistent set of word senses for the words in a sentence. The idea of individual "word experts" evolved over this time (Rieger and Small 1979). For example, in Hirst's (1987) system, a word was gradually disambiguated as information was passed between the various modules (including a lexicon, parser, and semantic interpreter) in a process he called "Polaroid Words". "Proper" knowledge representation was important in the AI paradigm. Knowledge sources had to be handcrafted, so the ensuing knowledge acquisition bottleneck inevitably led to limited lexical coverage of narrow domains and would not scale.
The 1980s were a turning point for WSD. Large-scale lexical resources and corpora became available, so handcrafting could be replaced with knowledge extracted automatically from the resources (Wilks et al. 1990). Lesk's (1986) short but extremely seminal paper used the overlap of word sense definitions in the Oxford Advanced Learner's Dictionary of Current English (OALD) to resolve word senses. Given two (or more) target words in a sentence, the pair of senses whose definitions have the greatest lexical overlap is chosen (see Chap. 5 (Sect. 5.2)). Dictionary-based WSD had begun and the relationship of WSD to lexicography became explicit. For example, Guthrie et al. (1991) used the subject codes (e.g., Economics, Engineering, etc.) in the Longman Dictionary of Contemporary English (LDOCE) (Procter 1978) on top of Lesk's method. Yarowsky (1992) combined the information in Roget's International Thesaurus with co-occurrence data from large corpora in order to learn disambiguation rules for Roget's classes, which could then be applied to words in a manner reminiscent of Masterman (1957) (see Chap. 10 (Sect. 10.2.1)). Although dictionary methods are useful for some cases of word sense ambiguity (such as homographs), they are not robust since dictionaries lack complete coverage of information on sense distinctions.
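To make the overlap idea concrete, the following minimal Python sketch scores every candidate sense pair by counting the content words shared by their dictionary glosses, in the spirit of Lesk's method as summarized above. The toy gloss inventory and stopword list are invented for illustration only; they are not taken from the OALD, the book, or any real dictionary.

```python
# Minimal sketch of gloss-overlap disambiguation (Lesk-style), on toy data.
from itertools import product

# Hypothetical sense inventory: word -> {sense label: gloss}. Invented glosses.
GLOSSES = {
    "pine": {
        "pine#1": "a kind of evergreen tree with needle-shaped leaves",
        "pine#2": "to waste away through sorrow or illness",
    },
    "cone": {
        "cone#1": "a solid body which narrows to a point",
        "cone#2": "fruit of certain evergreen tree",
    },
}

STOPWORDS = {"a", "of", "to", "the", "with", "or", "which", "through"}


def gloss_words(gloss):
    """Return the set of content words in a gloss."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def lesk_pair(word1, word2):
    """Choose the sense pair whose glosses share the most content words."""
    best_pair, best_overlap = None, -1
    for s1, s2 in product(GLOSSES[word1], GLOSSES[word2]):
        overlap = len(gloss_words(GLOSSES[word1][s1]) & gloss_words(GLOSSES[word2][s2]))
        if overlap > best_overlap:
            best_pair, best_overlap = (s1, s2), overlap
    return best_pair, best_overlap


if __name__ == "__main__":
    # For "pine cone", the tree sense of "pine" and the fruit sense of "cone"
    # share "evergreen" and "tree", so that pair wins.
    print(lesk_pair("pine", "cone"))  # (('pine#1', 'cone#2'), 2)
```

Real systems score glosses from an actual dictionary or WordNet, and the chapter's variants (Sect. 5.2) refine this basic overlap count rather than replace it.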
The 1990s saw three major developments: WordNet became available, the statistical revolution in NLP swept through, and Senseval began.
WordNet (Miller 1990) pushed research forward because it was both computationally accessible and hierarchically organized into word senses called synsets. Today, English WordNet (together with wordnets for other languages) is the most-used general sense inventory in WSD research.
Statistical and machine learning methods have been successfully applied to the sense classification problem. Today, methods that train on manually sense-tagged corpora (i.e., supervised learning methods) have become the mainstream approach to WSD, with the best results in all tasks of the Senseval competitions. Weaver had recognized the statistical nature of the problem as early as 1949, and early corpus-based work by Weiss (1973), Kelly and Stone (1975), and Black (1988) presaged the statistical revolution by demonstrating the potential of empirical methods to extract disambiguation clues from manually tagged corpora. Brown et al. (1991) were the first to use corpus-based WSD in statistical MT.
Before Senseval, it was extremely difficult to compare and evaluate different systems because of disparities in test words, annotators, sense inventories, and corpora. For instance, Gale et al. (1992:252) noted that "the literature on word sense disambiguation fails to offer a clear model that we might follow in order to quantify the performance of our disambiguation algorithms," and so they introduced lower bounds (choosing the most frequent sense) and upper bounds (the performance of human annotators).
However, these could not be used effectively until sufficiently large test corpora were generated. Senseval was first discussed in 1997 (Resnik and Yarowsky 1999; Kilgarriff and Palmer 2000) and, after hosting three evaluation exercises, has now grown into the primary forum for researchers to discuss and advance the field. Its main contribution was to establish a framework for WSD evaluation that includes standardized task descriptions and an evaluation methodology. It has also focused research, enabled scientific rigor, produced benchmarks, and generated substantial resources in many languages (e.g., sense-annotated corpora), thus enabling research in languages other than English.
Recently, at the …