Main Page

From Data Management Lab
Jump to: navigation, search
Data Management Lab is subordinate to Institute of Computer Science and Technology of Peking University. The instructor is Lei Zou. Our lab's research is mainly about Graph-based Data Management.Graph Data Management involves how to store and search semantic data based on RDF(Resource Description Framework), graph stream data analysis, distributed graph database engine development and how to use new hardware technology to improve graph computing and graph query.

News

  • [2017-05-19]Our team gStreamPKU wins the finalist award (top-5) in SIGMOD Programming Contest 2017 among 100+ teams. SIGMOD Programming Contest is held annually for undergraduate and graduate students to solve challenging database problems. The task this year is to search documents and return matchings from a dynamic updated dictionary. Please see http://sigmod17contest.athenarc.gr for details. It is worth mentioning that we are also the finalists in the last year's contest.
  • [2017-04-10]Weiguo Zheng's paper (Efficient SimRank-based Similarity Join) is accepted by ACM Transactions on Database Systems (ACM TODS). Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt “SimRank” to evaluate the similarity between two vertices in a large graph because of its generality. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose a shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover+, to efficiently compute the SimRank score of any single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (i.e., the h-go cover+ vertex pairs), based on which, the SimRank score of any vertex pair can be computed easily. To find the h-go cover+ vertex pairs, we propose an efficient method without building the vertex-pair graph. Hence, large graphs can be dealt with easily. Extensive experiments over both real and synthetic datasets confirm the efficiency of our solution.
  • [2017-03-28]Peng Peng's paper(Answering top-K query combined keywords and structural queries on RDF graphs) is published on Information System. Although SPARQL has been the predominant query language over RDF (Resource Description Framework) graphs, some query intentions cannot be captured well using only SPARQL syntax. On the other hand, keyword search enjoys widespread usage because of its intuitive way of specifying information needs, but suffers from the problem of low precision. To maximize the advantages of both SPARQL and keyword search, we introduce a novel paradigm that combines them and propose a hybrid query (called a SPARQLKeyword (SK) query) that integrates SPARQL and keyword search. To answer SK queries efficiently, we propose a novel integrated query algorithm based on a structural index. We also present a distance-based optimization technique to further improve the efficiency of SK queries evaluation. We test our method in three large real RDF graphs and the experiments demonstrate both the effectiveness and efficiency of our method.
  • [2017-03-08]Xinbo Zhang's paper(Improving the Precision of RDF Question/Answering Systems— A Why Not Approach) is accepted by WWW 2017 as a poster. In this paper, we propose a self-learning solution based on the users' feedback over the original query graph. Specifically, our method automatically refines the SPARQL query into a new query graph with minimum modifications. The new query will fix the errors and omissions of the query results. Furthermore, each amendment can also be used to improve the precision in answering subsequent natural language questions.
  • [2016-11-17]Shuo Yang's paper(Efficiently Answering Technical Questions — A Knowledge Graph Approach) is accepted by AAAI 2017. In this paper, we propose a novel question understanding method. We construct a hierarchical knowledge graph automatically which contains abundant technical information, to understand a question. Next, an augmented knowledge graph is used for relating the questions to documents via the knowledge graph. Then we use a question-driven method to select candidate solutions. Finally, random walk approach is applied to candidates ranking and an index-based random walk is proposed to support the online search.
  • [2016-10-17]Youhuan Li's paper(Computing Longest Increasing Subsequences over Sequential Data Streams) is accepted by VLDB 2016. In this paper, we propose a data structure, a quadruple neighbor list (QN-list, for short), to support real time queries of all longest increasing subsequence (LIS) and LIS with constraints over sequential data streams. The QN-List built by our algorithm requires O(w) space and O(w) update time, where w is the time window size. The QN-list can efficiently support both LIS enumeration and LIS with constraints computation over real time sequential data streams. Our method utperforms the state-of-the-art methods in both time and space cost, not only theoretically, but also empirically.
  • [2016-10-17]Peng Peng's paper(Processing SPARQL queries over distributed RDF graphs) is accepted by VLDB 2016. We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a “partial evaluation and assembly” framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over RDF graph G. Based on properties of subgraph matching over a distributed graph, we introduce local partial match as partial answers in each fragment of RDF graph G. For assembly, we propose two methods: centralized and distributed assembly.We analyze our algorithms from both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories of billions of triples confirm that our method is superior to the state-of-the-art methods in both the system’s performance and scalability.
  • More...

Notice

  • For recruitment information, please see the navigation link.
  • Our lab is recruiting interns. See Interns Recruitment for more details.

Upcoming Conferences

  • WWW 2017
  • ICDE 2017
  • SIGMOD 2017 Ⅱ