TY - GEN
T1 - A new approach for processing ranked subsequence matching based on ranked union
AU - Han, Wook Shin
AU - Lee, Jinsoo
AU - Moon, Yang Sae
AU - Hwang, Seung Won
AU - Yu, Hwanjo
PY - 2011
Y1 - 2011
N2 - Ranked subsequence matching finds top-k subsequences most similar to a given query sequence from data sequences. Recently, Han et al. [12] proposed a solution (referred to here as HLMJ) to this problem by using the concept of the minimum distance matching window pair (MDMWP) and a global priority queue. By using the concept of MDMWP, HLMJ can prune many unnecessary accesses to data subsequences using a lower bound distance. However, we notice that HLMJ may incur serious performance overhead for important types of queries. In this paper, we propose a novel systematic framework to solve this problem by viewing ranked subsequence matching as ranked union. Specifically, we propose a notion of the matching subsequence equivalence class (MSEQ) and a novel lower bound called the MSEQ-distance. To completely eliminate the performance problem of HLMJ, we also propose a cost-aware density-based scheduling technique, where we consider both the density and cost of the priority queue. Extensive experimental results with many real datasets show that the proposed algorithm outperforms HLMJ and the adapted PSM [22], a state-of-the-art index-based merge algorithm supporting non-monotonic distance functions, by up to two to three orders of magnitude, respectively.
AB - Ranked subsequence matching finds top-k subsequences most similar to a given query sequence from data sequences. Recently, Han et al. [12] proposed a solution (referred to here as HLMJ) to this problem by using the concept of the minimum distance matching window pair (MDMWP) and a global priority queue. By using the concept of MDMWP, HLMJ can prune many unnecessary accesses to data subsequences using a lower bound distance. However, we notice that HLMJ may incur serious performance overhead for important types of queries. In this paper, we propose a novel systematic framework to solve this problem by viewing ranked subsequence matching as ranked union. Specifically, we propose a notion of the matching subsequence equivalence class (MSEQ) and a novel lower bound called the MSEQ-distance. To completely eliminate the performance problem of HLMJ, we also propose a cost-aware density-based scheduling technique, where we consider both the density and cost of the priority queue. Extensive experimental results with many real datasets show that the proposed algorithm outperforms HLMJ and the adapted PSM [22], a state-of-the-art index-based merge algorithm supporting non-monotonic distance functions, by up to two to three orders of magnitude, respectively.
UR - http://www.scopus.com/inward/record.url?scp=79959918495&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959918495&partnerID=8YFLogxK
U2 - 10.1145/1989323.1989371
DO - 10.1145/1989323.1989371
M3 - Conference contribution
AN - SCOPUS:79959918495
SN - 9781450306614
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 457
EP - 468
BT - Proceedings of SIGMOD 2011 and PODS 2011
PB - Association for Computing Machinery
T2 - 2011 ACM SIGMOD and 30th PODS 2011 Conference
Y2 - 12 June 2011 through 16 June 2011
ER -