TY - GEN
T1 - Efficient text proximity search
AU - Schenkel, Ralf
AU - Broschart, Andreas
AU - Hwang, Seungwon
AU - Theobald, Martin
AU - Weikum, Gerhard
PY - 2007
Y1 - 2007
N2 - In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.
UR - http://www.scopus.com/inward/record.url?scp=38049093465&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049093465&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-75530-2_26
DO - 10.1007/978-3-540-75530-2_26
M3 - Conference contribution
AN - SCOPUS:38049093465
SN - 9783540755296
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 287
EP - 299
BT - String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings
PB - Springer Verlag
T2 - 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007
Y2 - 29 October 2007 through 31 October 2007
ER -