News

 [20170928]Youhuan Li’s paper on longest increasing subsequence (LIS) computation over data stream (Longest Increasing Subsequence Computation over Streaming Sequences) is accepted by TKDE 2017. This is an extension paper over previous work on streaming LIS computation that is published on VLDB 2017. This paper also focuses on the LIS computation over data stream with the proposed quadruple neighbor list (QNlist) data structure. The core idea of the extension part lies in the computation of LIS with all existing constraints. In previous work, we design algorithm for computing LIS with extreme gap/weight, while, in this version, we not only include the computation of LIS with extreme width (width is defined as the positional distance between the head item and the tail item of an LIS), but also consider the range/slopeconstrained LIS computation. The rangeconstrained LIS is the LIS where any two consecutive items satisfy that their difference and distance are within given ranges. The slopeconstrained LIS is the LIS where the slope between any two consecutive items (i.e., the ratio of the difference over the distance) should be not less than a certain parameter m. Computing range/slopeconstrained LIS are quite different that of LIS with extreme gap/weight/width while we innovatively design efficient algorithm to support range/slopeconstrained LIS and our solution is better than comparative work not only theoretically but also experimentally.
 [20170805]Shuo Han’s paper (Keyword Search on RDF Graphs — A Query Graph Assembly Approach) is accepted by CIKM 2017. Keyword search provides ordinary users an easytouse interface for querying RDF data. In this paper, we study how to assemble a query graph that is to represent user’s query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Although we prove theoretically that QGA is a NPcomplete problem,we design some heuristic lower bounds and propose a bipartite graph matchingbased bestfirst search algorithm. The algorithm’s time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system in large RDF graphs.
 [20170519]Our team gStreamPKU wins the finalist award (top5) in SIGMOD Programming Contest 2017 among 100+ teams. SIGMOD Programming Contest is held annually for undergraduate and graduate students to solve challenging database problems. The task this year is to search documents and return matchings from a dynamic updated dictionary. Please see http://sigmod17contest.athenarc.gr for details. It is worth mentioning that we are also the finalists in the last year's contest.
 [20170410]Weiguo Zheng's paper (Efficient SimRankbased Similarity Join) is accepted by ACM Transactions on Database Systems (ACM TODS). Graphs have been widely used to model complex data in many realworld applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt “SimRank” to evaluate the similarity between two vertices in a large graph because of its generality. Specifically, we define a SimRankbased join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose a shortestpath distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called hgo cover+, to efficiently compute the SimRank score of any single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (i.e., the hgo cover+ vertex pairs), based on which, the SimRank score of any vertex pair can be computed easily. To find the hgo cover+ vertex pairs, we propose an efficient method without building the vertexpair graph. Hence, large graphs can be dealt with easily. Extensive experiments over both real and synthetic datasets confirm the efficiency of our solution.
 [20170328]Peng Peng's paper(Answering topK query combined keywords and structural queries on RDF graphs) is published on Information System. Although SPARQL has been the predominant query language over RDF (Resource Description Framework) graphs, some query intentions cannot be captured well using only SPARQL syntax. On the other hand, keyword search enjoys widespread usage because of its intuitive way of specifying information needs, but suffers from the problem of low precision. To maximize the advantages of both SPARQL and keyword search, we introduce a novel paradigm that combines them and propose a hybrid query (called a SPARQLKeyword (SK) query) that integrates SPARQL and keyword search. To answer SK queries efficiently, we propose a novel integrated query algorithm based on a structural index. We also present a distancebased optimization technique to further improve the efficiency of SK queries evaluation. We test our method in three large real RDF graphs and the experiments demonstrate both the effectiveness and efficiency of our method.
 [20170308]Xinbo Zhang's paper(Improving the Precision of RDF Question/Answering Systems— A Why Not Approach) is accepted by WWW 2017 as a poster. In this paper, we propose a selflearning solution based on the users' feedback over the original query graph. Specifically, our method automatically refines the SPARQL query into a new query graph with minimum modifications. The new query will fix the errors and omissions of the query results. Furthermore, each amendment can also be used to improve the precision in answering subsequent natural language questions.
 More...



Notice

 For recruitment information, please see the navigation link.
 Our lab is recruiting interns. See Interns Recruitment for more details.

Upcoming Conferences

 WWW 2017
 ICDE 2017
 SIGMOD 2017 Ⅱ

