《Text Data Mining》 offers a thorough and detailed introduction to the fundamental theories and methods of text data mining, ranging from preprocessing (for both Chinese and English texts), text representation, and feature selection to text classification and text clustering. It also presents the predominant applications of text data mining, such as topic models, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization.
【About the Author】
Chengqing Zong is a professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. He has served as chair for many prestigious conferences, such as ACL-IJCNLP, IJCAI, IJCAI-ECAI, AAAI, and COLING, and as an associate editor for prestigious journals such as TALLIP and Machine Translation. He is the President of the Asian Federation of Natural Language Processing and a member of the International Committee on Computational Linguistics.
【Table of Contents】
1 Introduction 1
1.1 The Basic Concepts 1
1.2 Main Tasks of Text Data Mining 3
1.3 Existing Challenges in Text Data Mining 6
1.4 Overview and Organization of This Book 9
1.5 Further Reading 12
2 Data Annotation and Preprocessing 15
2.1 Data Acquisition 15
2.2 Data Preprocessing 20
2.3 Data Annotation 22
2.4 Basic Tools of NLP 25
2.4.1 Tokenization and POS Tagging 25
2.4.2 Syntactic Parser 27
2.4.3 N-gram Language Model 29
2.5 Further Reading 30
3 Text Representation 33
3.1 Vector Space Model 33
3.1.1 Basic Concepts 33
3.1.2 Vector Space Construction 34
3.1.3 Text Length Normalization 36
3.1.4 Feature Engineering 37
3.1.5 Other Text Representation Methods 39
3.2 Distributed Representation of Words 40
3.2.1 Neural Network Language Model 41
3.2.2 C&W Model 45
3.2.3 CBOW and Skip-Gram Model 47
3.2.4 Noise Contrastive Estimation and Negative Sampling 49
3.2.5 Distributed Representation Based on the Hybrid Character-Word Method 51
3.3 Distributed Representation of Phrases 53
3.3.1 Distributed Representation Based on the Bag-of-Words Model 54
3.3.2 Distributed Representation Based on Autoencoder 54
3.4 Distributed Representation of Sentences 58
3.4.1 General Sentence Representation 59
3.4.2 Task-Oriented Sentence Representation 63
3.5 Distributed Representation of Documents 66
3.5.1 General Distributed Representation of Documents 67
3.5.2 Task-Oriented Distributed Representation of Documents 69
3.6 Further Reading 72
4 Text Representation with Pretraining and Fine-Tuning 75
4.1 ELMo: Embeddings from Language Models 75
4.1.1 Pretraining Bidirectional LSTM Language Models 76