Distributed representations of human language content and structure had a brief boom in the 1980s, but it quickly faded, and the past 20 years have been dominated by continued use of categorical representations of language, despite the use of probabilities or weights over elements of these categorical representations. However, the last five years have seen a resurgence, with highly successful use of distributed vector space representations, often in the context of "neural" or "deep learning" models. One great success has been distributed word representations, and I will look at some of our recent work and that of others on better understanding word representations and how they can be thought of as global matrix factorizations, much more similar to the traditional literature. But we need more than just word representations: We need to understand the larger linguistic units that are made out of words, a problem which has been much less addressed. I will discuss the use of distributed representations in tree-structured recursive neural network models, showing how they can provide sophisticated linguistic models of semantic similarity, sentiment, syntactic parse structure, and logical entailment.
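The abstract's point that word representations can be viewed as global matrix factorizations can be illustrated with a minimal LSA-style sketch. The toy corpus, window size, and embedding dimension below are invented for illustration: a word-word co-occurrence matrix is built from the corpus and factored with a truncated SVD, and the resulting low-rank rows serve as dense word vectors.

```python
import numpy as np

# Toy corpus: two short sentences (invented for illustration).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-word co-occurrence counts with a symmetric window of 2.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                C[idx[w], idx[words[j]]] += 1.0

# Global matrix factorization: truncated SVD of the (log-smoothed)
# co-occurrence matrix yields dense word embeddings, LSA-style.
U, S, Vt = np.linalg.svd(np.log1p(C))
k = 2  # toy embedding dimension
vectors = U[:, :k] * S[:k]  # each row is a k-dim word vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(vectors[idx["cat"]], vectors[idx["dog"]])
```

On this toy corpus, "cat" and "dog" occur in identical contexts, so their factorized vectors coincide and their cosine similarity is 1; the same distributional principle, at scale, is what makes learned word vectors capture similarity.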
Christopher Manning is a professor of computer science and linguistics at Stanford University. His research goal is computers that can intelligently process, understand, and generate human language material. Manning focuses on machine learning approaches to problems in computational linguistics, including syntactic parsing, computational semantics and pragmatics, textual inference, machine translation, and deep learning for NLP. He is an ACM Fellow, an AAAI Fellow, and an ACL Fellow, and has coauthored leading textbooks on statistical natural language processing and information retrieval.
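The tree-structured recursive neural networks mentioned in the abstract can be sketched as follows. The weights are untrained random toys and the binary parse is hand-written, purely for illustration: one shared composition function is applied bottom-up over a parse tree, producing a vector for every constituent, so that larger linguistic units get representations in the same space as words.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy vector dimension; real models use hundreds

# Toy leaf embeddings; in real models these are learned lookups.
embed = {w: rng.standard_normal(d) for w in ["the", "movie", "was", "great"]}

# One shared composition layer, p = tanh(W [left; right] + b),
# applied at every internal node of the parse tree.
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)

def compose(left, right):
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree):
    # A tree is either a word (str) or a (left, right) pair.
    if isinstance(tree, str):
        return embed[tree]
    left, right = tree
    return compose(encode(left), encode(right))

# Binary parse of "the movie was great": ((the movie) (was great))
root = encode((("the", "movie"), ("was", "great")))
```

In the actual models the composition weights and embeddings are trained end-to-end on tasks such as sentiment, parsing, or entailment; this sketch shows only the compositional structure, where the root vector represents the whole phrase.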