A Mutual Information Maximization Perspective of Language Representation Learning

Title: A Mutual Information Maximization Perspective of Language Representation Learning

Speaker: Lingpeng Kong

Time: Tuesday, December 3, 2019, 14:00-16:00

Venue: Conference Room 106, Wangxuan Institute of Computer Technology Building, Peking University

Abstract: In this talk, we show that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspiration from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods that helps transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).
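
As a rough illustration of what "a lower bound on the mutual information" can look like in practice, the sketch below computes an InfoNCE-style contrastive estimate from a matrix of scores. It assumes the bound is of the InfoNCE type, which the abstract itself does not specify. The function names and the toy score matrices are hypothetical, chosen for illustration only, and are not taken from the talk.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_lower_bound(scores):
    """InfoNCE-style estimate of a lower bound on I(A; B).

    Assumption: scores[i][j] is a learned compatibility score f(a_i, b_j),
    where (a_i, b_i) is the true pair (e.g., a context and the word or
    n-gram it was paired with) and every b_j with j != i is a negative.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Log-probability of picking the true partner among n candidates.
        total += row[i] - logsumexp(row)
    # Adding log(n) turns the average log-probability into the bound,
    # which can therefore never exceed log(n).
    return total / n + math.log(n)


# Uninformative scores: the estimate is 0 (no dependence detected).
uninformative = infonce_lower_bound([[0.0, 0.0], [0.0, 0.0]])

# Scores that single out the true pair: the estimate approaches log(2).
informative = infonce_lower_bound([[50.0, 0.0], [0.0, 50.0]])
```

Under this reading, a training objective raises the estimate by learning scores that rank each true pair above its negatives. The estimate saturates at the logarithm of the number of candidates, so larger candidate sets allow tighter bounds.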

Biography: Lingpeng Kong is a Senior Research Scientist at Google DeepMind. His research addresses the problem of natural language understanding in two ways. First, he designs better representation learners that exploit linguistic structure in the form of inductive bias. Second, he brings theoretical insights from machine learning to the search for better algorithms that specifically utilize the linguistic cues in the data. These methods advance applications in syntactic parsing, speech recognition, social media analysis, machine translation, and other areas. Lingpeng Kong received his Ph.D. from Carnegie Mellon University, where he was co-advised by Professor Noah Smith and Professor Chris Dyer.