Efficient text proximity search

Ralf Schenkel, Andreas Broschart, Seungwon Hwang, Martin Theobald, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

37 Citations (Scopus)


In addition to purely occurrence-based relevance models, term proximity has frequently been used to enhance the retrieval quality of keyword-oriented retrieval systems. While there have been approaches to effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for evaluating them efficiently. This paper presents an efficient evaluation framework that integrates a proximity scoring function into a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. Extensive experiments on a very large text benchmark collection demonstrate the improved retrieval effectiveness and efficiency of our framework. In combination with static index pruning for the proximity lists, our algorithm achieves a performance improvement of two orders of magnitude over term-based top-k evaluation, while also significantly improving result quality.
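To make the idea of precomputed proximity lists with static pruning more concrete, here is a minimal sketch. It assumes a hypothetical, simplified proximity score that is not the scoring function from the paper: every pair of occurrences of two distinct terms within `WINDOW` positions contributes 1/distance². The names `pair_scores`, `build_pair_index`, `top_k`, `WINDOW` and `max_len` are illustrative only. Static pruning is modeled as truncating each term-pair list to its `max_len` highest-scoring entries. Query evaluation simply scans the (pruned) lists and sums scores. A real top-k engine would instead stop early using score thresholds.

```python
from collections import defaultdict
from itertools import combinations
import heapq

# Hypothetical window size: only term occurrences at most WINDOW positions apart count.
WINDOW = 10


def pair_scores(tokens, window=WINDOW):
    """Simplified proximity score per unordered term pair for one document.

    Each pair of occurrences of two distinct terms at distance d <= window
    contributes 1 / d**2 (an assumption for illustration only).
    """
    scores = defaultdict(float)
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            b = tokens[j]
            if a != b:
                key = tuple(sorted((a, b)))  # unordered pair, canonical order
                scores[key] += 1.0 / (j - i) ** 2
    return scores


def build_pair_index(docs, max_len=None):
    """Materialize one list of (score, doc_id) per term pair, best scores first.

    If max_len is given, each list is statically pruned to its top max_len entries.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for key, score in pair_scores(tokens).items():
            index[key].append((score, doc_id))
    for key in index:
        index[key].sort(reverse=True)
        if max_len is not None:
            index[key] = index[key][:max_len]
    return index


def top_k(index, query, k):
    """Sum proximity scores over all query-term pairs and return the k best docs.

    This is an exhaustive merge of the lists, not a threshold-based top-k algorithm.
    """
    acc = defaultdict(float)
    for a, b in combinations(sorted(set(query)), 2):
        for score, doc_id in index.get((a, b), []):
            acc[doc_id] += score
    return heapq.nlargest(k, acc.items(), key=lambda item: item[1])


if __name__ == "__main__":
    docs = {
        "d1": "efficient text proximity search".split(),
        "d2": "text retrieval with term proximity and text search".split(),
        "d3": "proximity of the search text".split(),
    }
    full = build_pair_index(docs)
    pruned = build_pair_index(docs, max_len=1)
    print(top_k(full, ["text", "search"], 2))    # expected: d2 ranks first, then d3
    print(top_k(pruned, ["text", "search"], 3))  # only d2 survives pruning
```

Pruning trades recall for speed: documents that fall off a truncated pair list can no longer be found via that list, which is why the sketch's pruned query returns only one document.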

Original language: English
Title of host publication: String Processing and Information Retrieval - 14th International Symposium, SPIRE 2007, Proceedings
Publisher: Springer Verlag
Number of pages: 13
ISBN (Print): 9783540755296
Publication status: Published - 2007
Event: 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007 - Santiago, Chile
Duration: 2007 Oct 29 to 2007 Oct 31

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4726 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 14th International Symposium on String Processing and Information Retrieval, SPIRE 2007

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

