Academic Talk: Effort-Light StructMine: Turning Massive Text Corpora into Structures

Time: Wednesday, June 21, 2017, 10:00-12:00

Venue: Room 106, Institute of Computer Science and Technology, Peking University

Title: Effort-Light StructMine: Turning Massive Text Corpora into Structures

Abstract:

Real-world data, though massive, are hard for machines to interpret because they are largely unstructured and take the form of natural-language text. One of the grand challenges is to turn such massive corpora into machine-actionable structures. Yet most existing systems rely heavily on human effort to structure each corpus, which slows the development of downstream applications.

In this talk, I will introduce a data-driven framework, Effort-Light StructMine, that extracts structured facts from massive corpora without explicit human labeling effort. In particular, I will discuss how to solve three structure mining tasks under the Effort-Light StructMine framework: identifying typed entities in text, fine-grained entity typing, and extracting typed relationships between entities. Together, these three solutions form a clear roadmap for turning a massive corpus into a structured network that represents its factual knowledge. Finally, I will share some directions towards mining corpus-specific structured networks for knowledge discovery.

Bio:

Xiang Ren is a Computer Science PhD candidate at the University of Illinois at Urbana-Champaign, working with Jiawei Han, and will join the University of Southern California (USC) Computer Science department as an assistant professor in 2018. Xiang's research develops machine learning and data-driven methods for turning unstructured text data into machine-actionable structures. More broadly, his research interests span data mining, machine learning, and natural language processing, with a focus on making sense of massive text data and graph data. His results have been covered in several top conference tutorials and keynotes (SIGKDD, WWW, SIGMOD, ACL). Xiang's research has been recognized with several prestigious awards, including a Google PhD Fellowship, Yahoo!-DAIS Research Excellence Award, Yelp Dataset Challenge Award, C. W. Gear Outstanding Graduate Student Award, and David J. Kuck Outstanding M.S. Thesis Award. Technologies he developed have been transferred to the US Army Research Lab, National Institutes of Health, Microsoft, Yelp, and TripAdvisor.
