Main Page

From Data Management Lab
Jump to: navigation, search
Data Management Lab is subordinate to Institute of Computer Science and Technology of Peking University. The instructor is Lei Zou. Our lab's research is mainly about Graph-based Data Management.Graph Data Management involves how to store and search semantic data based on RDF(Resource Description Framework), graph stream data analysis, distributed graph database engine development and how to use new hardware technology to improve graph computing and graph query.


  • [2018-05-24] Peng Peng's paper (Multi-Query Optimization in Federated RDF Systems) is granted the DASFAA 2018 Best Paper Award. This paper revisits the classical problem of multiple query optimization in federated RDF systems. We propose a heuristic query rewriting-based approach to share the common computation during evaluation of multiple queries while considering the cost of both query evaluation and data shipment. Furthermore, we propose an e�cfficient method to use the interconnection topology between RDF sources to filter out irrelevant sources and share the common computation of intermediate results joining. The experiments over both real and synthetic RDF datasets show that our techniques are efficient.
  • [2018-05-03] Peng Peng's paper (Adaptive Distributed RDF Graph Fragmentation and Allocation based on Query Workload) is accepted by TKDE. As massive volumes of Resource Description Framework (RDF) data are growing, designing a distributed RDF database system to manage them is necessary. In designing this system, it is very common to partition the RDF data into some parts, called fragments, which are then distributed. Thus, the distribution design comprises two steps: fragmentation and allocation. In this study, we explore the workload for fragmentation and allocation, which aims to reduce the communication cost during SPARQL query processing. Specifically, we adaptively maintain some frequent access patterns (FAPs) to reflect the characteristics of the workload while ensuring the data integrity and approximation ratio. Based on these frequent access patterns, we propose three fragmentation strategies, namely vertical, horizontal and mixed fragmentation, to divide RDF graphs while meeting different types of query processing objectives. After fragmentation, we discuss how to allocate these fragments to various sites while balancing the fragments. Finally, we discuss how to process queries based on the results of fragmentation and allocation. Experiments over large RDF datasets confirm the superior performance of our proposed solutions.
  • [2018-04-03] Xinbo Zhang’s paper (IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems)is accepted by SIGMOD 2018 as a demonstration. we design an Interactive Mechanism aiming for PROmotion Via feedback to Q/A systems (IMPROVE-QA), a whole platform to make existing Q/A systems return more precise answers (denoted as Q′(D)) to users. Based on user’s feedback over Q(D), IMPROVE-QA automatically refines the original query Q into a new query graph Q′ with minimum modifications, where Q′(D) provides more precise answers. We will also demonstrate how IMPROVE-QA can apply the “lesson” learned from the user in each query to improve the precision of Q/A systems on subsequent natural language questions.
  • [2018-04-03] Shuo Han’s paper (Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions) is accepted by SIGMOD 2018. In this paper, we focus on accelerating a widely employed computing pattern --- set intersection, to boost a group of graph algorithms. Graph’s adjacency-lists can be naturally considered as node sets, thus set intersection is a primitive operation in many graph algorithms. We propose QFilter, a set intersection algorithm which can quickly filter out most of unnecessary comparisons in one byte-checking step by SIMD instructions. We also present a binary representation called BSR that encodes sets in a compact layout. By combining QFilter and BSR, we achieve data parallelism in two levels — inter-chunk and intra-chunk parallelism. Moreover, we find that node ordering impacts the performance of intersection by affecting the compactness of BSR. We formulate the graph reordering problem as an optimization of the compactness of BSR, and prove its strong NP-completeness. Thus we propose an approximate algorithm that can find a better ordering to enhance the intra-chunk parallelism. Extensive experiments on both synthetic and real datasets confirm that our approach can improve the performance of set intersection in graph algorithms significantly.
  • [2018-02-01] The research work “Large Scale Graph Data Management" done by DB Lab has achieved The Second Class Prize in Natural Science Award of Ministry of Education in 2017.
  • More...


  • For recruitment information, please see the navigation link.
  • Our lab is recruiting interns. See Interns Recruitment for more details.

Upcoming Conferences

  • WWW 2017
  • ICDE 2017
  • SIGMOD 2017 Ⅱ