Multi-Column-at-a-Time Main-Memory Column-Stores: Algorithms, Systems and Implementation

Time: Thursday, March 16, 2017, 3:00 PM to 4:00 PM


Talk title: Multi-Column-at-a-Time Main-Memory Column-Stores: Algorithms, Systems and Implementation

Abstract: Main-memory analytical databases are gaining ground rapidly because of the strong demand for real-time analytics and the growing capacity of modern servers to house terabytes of main memory. Modern main-memory analytical databases are “column-stores”: data tables are physically stored in memory as sections of columns rather than as rows. Query processing in main-memory column-stores has been based on the “column-at-a-time” approach, i.e., a query is evaluated as a sequence of primitive operations (e.g., hashing, sorting) on individual attributes/columns, one at a time. With the advent of several key techniques such as SIMD-accelerated data processing, column encoding, and code generation, our preliminary work showed that a main-memory column-store can attain substantial performance improvement if it can support “multi-column-at-a-time” processing.


Multi-column-at-a-time means a column-store processes multiple columns together instead of one by one. It is a novel query processing paradigm that opens up a much finer level of optimization (e.g., bytes from different columns can be processed together). We are now building the community’s first multi-column-at-a-time enabled main-memory column-store. In this talk, I will cover its design, algorithms, and implementation details. We plan to open-source it afterwards.
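As a rough illustration of the contrast described above (not the speaker's actual algorithm), the sketch below orders rows by two columns in two ways: once with one stable sort pass per column, and once with a single pass over a key that packs the bits of both columns together. The function names, the sample data, and the `b_bits` parameter are all assumptions made for this example.

```python
# Hypothetical illustration; names and data are assumptions, not from the talk.

def sort_column_at_a_time(a, b):
    """Order row ids by (a, b) using one stable sort pass per column."""
    row_ids = list(range(len(a)))
    # Sort by the secondary column first, then by the primary column.
    # Python's sort is stable, so ties on `a` keep their order by `b`.
    row_ids.sort(key=lambda i: b[i])
    row_ids.sort(key=lambda i: a[i])
    return row_ids


def sort_multi_column(a, b, b_bits=8):
    """Order row ids by (a, b) in a single pass over a packed key.

    Assumes all values are non-negative integers and that every value
    in `b` is below 2**b_bits, so the two columns cannot overlap in the key.
    """
    packed = [(a[i] << b_bits) | b[i] for i in range(len(a))]
    return sorted(range(len(a)), key=lambda i: packed[i])


# Two small columns of the same table; position i is row i.
a = [2, 1, 2, 1]
b = [7, 9, 3, 4]

result_single = sort_column_at_a_time(a, b)
result_multi = sort_multi_column(a, b)
print(result_single, result_multi)
```

Both functions return the same row order; the second one touches each row once with a combined key instead of making one pass per column, which is the kind of cross-column optimization the abstract alludes to.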


Eric Lo is an associate professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK).

He began his PhD studies at ETH Zurich (Switzerland) in 2005 and obtained his PhD degree in 2006. Before he returned to Hong Kong, he worked at Google and Microsoft. His recent research focuses on large-scale data processing on modern architectures (e.g., lock-free programming on many-core), distributed Bayesian inference systems for big data, and data science. He has served on the program committees of all major data engineering conferences and will serve as a program vice chair of CIKM’18 and ICDE’18. His research papers have three times been selected among the best papers of a conference (VLDB’05, ICDE’12, and DASFAA’14).

Copyright © 2007 Institute of Computer Science and Technology, Peking University