Natural Language Processing Basics (NLP Basics)
Natural Language Processing (NLP) is an important direction within computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. NLP is a discipline that blends linguistics, computer science, and mathematics. Because it deals with natural language, the language people use every day, it is closely related to linguistics, yet differs from it in an important way: NLP does not study natural language in general, but rather aims to build computer systems, especially software systems, that can effectively carry out natural-language communication. It is therefore a part of computer science.
Why is NLP Important?
Turing Test: a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human; language is the communication tool in the test. Alan Turing's earliest formulation was the Imitation Game, and the film The Imitation Game (starring Benedict Cumberbatch) dramatizes it: to read German military communications, Turing and a team of brilliant people worked on breaking the cipher. Cracking it purely by hand was practically impossible, so they built one of the earliest code-breaking machines. At first the machine's search would not converge, until they noticed that the intercepted messages almost always contained the phrase "Heil Hitler"; this gave the machine its starting conditions and the problem was cracked in one stroke.
Distributed Word Representation
Word Representation
Word representation: a process that transforms symbols into meanings a machine can understand.
Definition of meaning (Webster Dictionary):
- The thing one intends to convey, especially by language
- The logical extension of a word
How can we represent meaning so that a machine can understand it?
Goal of Word Representation
- Compute word similarity
- Infer word relations (discover semantic relations between words)
Synonym and Hypernym
Use a set of related words, such as synonyms and hypernyms, to represent a word, as in the sketch below.
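As an illustration, the following sketch queries WordNet through NLTK to collect synonyms and hypernyms for a word; the query word "good" and the printing format are arbitrary choices for the example, not anything prescribed by the lecture.

```python
# A minimal sketch of synonym/hypernym lookup with NLTK's WordNet interface.
# Assumes `pip install nltk`; downloads the WordNet data on first run.
import nltk
nltk.download("wordnet", quiet=True)

from nltk.corpus import wordnet as wn

def related_words(word):
    """Collect synonyms and hypernyms of `word` from WordNet."""
    synonyms, hypernyms = set(), set()
    for synset in wn.synsets(word):
        synonyms.update(lemma.name() for lemma in synset.lemmas())
        for hyper in synset.hypernyms():
            hypernyms.update(lemma.name() for lemma in hyper.lemmas())
    return synonyms, hypernyms

syns, hypers = related_words("good")
print("synonyms:", sorted(syns)[:10])
print("hypernyms:", sorted(hypers)[:10])
```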
Problems of Synonym/Hypernym Representation
- Missing nuance: some subtle differences cannot be captured, e.g., proficient vs. good
- Missing new meanings of words: when a word acquires a new sense, the synonym/hypernym sets miss its actual meaning, e.g., Apple (fruit vs. IT company)
- Subjective: the sets reflect human judgment
- Data sparsity
- Requires human labor to create and maintain the dictionary
One-Hot Representation
Regard words as discrete, independent symbols: word IDs, or a one-hot representation. This works reasonably well for computing the similarity between two documents.

Problems of One-Hot Representation
similarity(star, sun) = v_star · v_sun = 0. The problem is that this representation assumes every pair of word vectors is orthogonal, so the computed similarity between any two words is zero: all the vectors are orthogonal, and there is no natural notion of similarity for one-hot vectors.
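A tiny numpy sketch of this failure mode; the toy vocabulary is made up purely for illustration:

```python
# Minimal sketch: one-hot vectors make every distinct word pair look equally unrelated.
import numpy as np

vocab = ["star", "sun", "movie", "banana"]               # toy vocabulary, illustrative only
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(one_hot["star"], one_hot["sun"]))    # 0.0 -- orthogonal, no similarity signal
print(cosine(one_hot["star"], one_hot["movie"]))  # 0.0 -- same for every distinct pair
```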
Represent Word by Context
The meaning of a word is given by the words that frequently appear close by: a word's meaning is closely tied to its context. For example, the words in its context can be used to represent the word "stars".

Co-Occurrence Counts
Count-based distributional representation:
- Term-term matrix: how often a word occurs with another word
- Term-document matrix: how often a word occurs in a document
The co-occurrence count vector of a word can then be used to compute the similarity between two words, as sketched below.
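A small sketch of a term-term co-occurrence matrix and count-based similarity; the toy corpus, the window size of 2, and the use of cosine similarity are assumptions made for the example:

```python
# Minimal sketch: build a term-term co-occurrence matrix, then compare words by cosine similarity.
import numpy as np

corpus = [
    "shining stars in the night sky",
    "the bright sun in the sky",
    "stars and the sun are shining",
]
window = 2  # context window size, an arbitrary illustrative choice

tokens = [sent.split() for sent in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

print(cosine(counts[index["stars"]], counts[index["sun"]]))  # > 0: the two words share contexts
```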
Problems of Count-Based Representation
- Increases in size with the vocabulary
- Requires a lot of storage
- Sparsity issues for less frequent words
As the vocabulary grows, the storage requirement grows with it, and for words that occur rarely the context counts become very sparse.
Word Embedding
Distributed representation: build a dense vector for each word, learned from large-scale text corpora. The idea is to construct a low-dimensional, dense vector space and represent each word by the vector at its position in that space. Learning method: Word2Vec (we will cover it in the next class).
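As a preview, here is a minimal sketch of learning dense word vectors with gensim's Word2Vec on a toy corpus; the corpus, vector size, and other hyperparameters are illustrative assumptions, not the course's setup.

```python
# Minimal sketch: learn dense word vectors with gensim's Word2Vec (assumes `pip install gensim`).
from gensim.models import Word2Vec

sentences = [
    ["stars", "shine", "in", "the", "night", "sky"],
    ["the", "sun", "shines", "in", "the", "sky"],
    ["stars", "and", "the", "sun", "are", "bright"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100, seed=0)
print(model.wv["stars"].shape)              # (50,) dense vector for "stars"
print(model.wv.similarity("stars", "sun"))  # similarity learned from shared contexts
```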
Language Modeling

Language modeling is the task of predicting the upcoming word, i.e., predicting the next word from the words that came before it. A language model is a probability distribution over a sequence of words.
- Compute the joint probability of a sequence of words: P(W) = P(w1, w2, …, wn), the probability of a whole sentence, i.e., the joint probability of its word sequence
- Compute the conditional probability of an upcoming word wn: P(wn | w1, w2, …, wn-1), predicting the next word from the words already seen
How do we compute the sentence probability? By the chain rule, together with the assumption that the probability of an upcoming word is determined only by the words that precede it:
P(w1, w2, …, wn) = P(w1) * P(w2 | w1) * … * P(wn | w1, …, wn-1)
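A tiny sketch of this chain-rule factorization: given any model of P(wn | w1 … wn-1), the sentence probability is the product of the conditionals. The `cond_prob` argument and the uniform toy model are hypothetical placeholders, not part of the lecture.

```python
# Minimal sketch of the chain rule for sentence probability:
# P(w1, ..., wn) = product over i of P(w_i | w_1, ..., w_{i-1})
def sentence_probability(words, cond_prob):
    """`cond_prob(word, history)` is any model of P(word | history) -- a placeholder here."""
    prob, history = 1.0, []
    for w in words:
        prob *= cond_prob(w, tuple(history))
        history.append(w)
    return prob

# Example with a hypothetical uniform model over a 10-word vocabulary: 0.1 ** 4
print(sentence_probability(["too", "late", "to", "learn"], lambda w, h: 0.1))
```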
Language Model: N-gram Model

Collect statistics about how frequent different n-grams are, and use these counts to predict the next word. E.g., with a 4-gram model, the probability of the next word wj after "too late to" is estimated from the ratio of the count of "too late to wj" to the count of "too late to":
P(wj | too late to) = count(too late to wj) / count(too late to)
Problem: we need to store counts for all possible n-grams, so the model size is O(exp(n)).
Markov assumption: simplify the language model by conditioning only on the most recent words, e.g., bigram (n-gram with N = 2) or trigram (n-gram with N = 3).
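A minimal count-based n-gram sketch mirroring the 4-gram example above; the toy corpus is invented for illustration:

```python
# Minimal sketch: estimate P(next_word | context) from raw n-gram counts (4-gram here).
from collections import Counter

corpus = ("it is too late to apologize . it is never too late to learn . "
          "too late to change now").split()

n = 4
ngram_counts = Counter(tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1))
context_counts = Counter(tuple(corpus[i:i + n - 1]) for i in range(len(corpus) - n + 2))

def ngram_prob(context, word):
    """P(word | context) = count(context + word) / count(context)."""
    return ngram_counts[tuple(context) + (word,)] / context_counts[tuple(context)]

print(ngram_prob(("too", "late", "to"), "learn"))      # count('too late to learn') / count('too late to')
print(ngram_prob(("too", "late", "to"), "apologize"))
```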
Problems of N-gram
- Not considering contexts farther than 1 or 2 words: in practice only bigrams or trigrams are used, so longer contexts cannot be taken into account.
- Not capturing the similarity between words: behind an n-gram model is essentially a one-hot representation. It assumes all words are mutually independent, and the statistics are collected over raw symbols, so the model has no way to understand how similar two words are.
Neural Language Model
A neural language model is a language model based on neural networks that learns distributed representations of words and uses them to model the conditional probability of the current word given the preceding context.
- Associate each word with a distributed vector
- Compute the joint probability of word sequences in terms of these feature vectors
- Optimize the word feature vectors (embedding matrix E) and the parameters of the loss function (mapping matrix W)
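To make the E and W notation concrete, below is a minimal fixed-window neural language model sketch in PyTorch; the window size, dimensions, toy word ids, and single training step are illustrative assumptions rather than the lecture's exact setup.

```python
# Minimal sketch of a fixed-window neural language model (assumes `pip install torch`).
# E is the word embedding matrix; W maps the concatenated context embeddings to vocabulary logits.
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, context_size=3):
        super().__init__()
        self.E = nn.Embedding(vocab_size, embed_dim)               # embedding matrix E
        self.W = nn.Linear(context_size * embed_dim, vocab_size)   # mapping matrix W

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the preceding words
        e = self.E(context_ids).flatten(start_dim=1)   # concatenate the context embeddings
        return self.W(e)                               # logits over the next word

# Toy usage: predict the 4th word from the previous 3 (vocabulary of 10 made-up word ids).
model = FixedWindowLM(vocab_size=10)
contexts = torch.tensor([[1, 2, 3], [4, 5, 6]])
targets = torch.tensor([4, 7])
loss = nn.functional.cross_entropy(model(contexts), targets)
loss.backward()  # gradients flow into both E and W
print(loss.item())
```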