Abstract
In natural language processing, a common task is to compute the probability that a given phrase appears, or the total probability of all phrases matching a given pattern. For instance, one computes affix (prefix, suffix, infix, etc.) probabilities of a string or a set of strings with respect to a probability distribution over patterns. Computing infix probabilities of strings when the pattern distribution is given by a probabilistic context-free grammar or by a probabilistic finite automaton is a solved problem, but computing them incrementally remained open. Incremental computation is crucial when a new query is built from a previous query. We tackle this problem and propose a method that computes infix probabilities incrementally for probabilistic finite automata by representing the probabilities of all matching strings as a series of transition matrix calculations. We show that the proposed approach is theoretically faster than the previous method and, using real-world data, demonstrate that our approach has vastly better performance in practice.
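As a rough illustration of what "incremental" means when affix probabilities are expressed through transition matrices, the sketch below handles the simpler prefix case only; it is not the infix method of the paper. It assumes a toy two-state probabilistic finite automaton that is proper and consistent (every state eventually stops with probability 1), so the prefix probability is just the sum of the forward vector. The class name `IncrementalPrefix`, the matrices in `TRANS`, and the vectors `INITIAL` and `FINAL` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical toy PFA over the alphabet {a, b} with two states.
# TRANS[s][i][j] = probability of emitting s while moving from state i to j.
# FINAL[i] = probability of stopping in state i.
# Each state's outgoing mass plus its stopping mass sums to 1.
TRANS: Dict[str, Matrix] = {
    "a": [[0.5, 0.0], [0.0, 0.4]],
    "b": [[0.0, 0.3], [0.2, 0.0]],
}
FINAL: List[float] = [0.2, 0.4]
INITIAL: List[float] = [1.0, 0.0]


class IncrementalPrefix:
    """Keeps the forward vector of a growing prefix (illustrative helper)."""

    def __init__(self, initial: List[float], trans: Dict[str, Matrix]) -> None:
        self.trans = trans
        # alpha[j] = probability of having read the prefix so far and being in state j
        self.alpha = list(initial)

    def extend(self, symbol: str) -> float:
        """Append one symbol: a single vector-matrix product, no recomputation."""
        m = self.trans[symbol]
        n = len(self.alpha)
        self.alpha = [
            sum(self.alpha[i] * m[i][j] for i in range(n)) for j in range(n)
        ]
        return self.prefix_probability()

    def prefix_probability(self) -> float:
        # Assumption: the PFA is consistent, so the continuation mass from
        # every state is 1 and the prefix probability is the sum of alpha.
        return sum(self.alpha)

    def string_probability(self, final: List[float]) -> float:
        # Probability of generating exactly the symbols read so far.
        return sum(a * f for a, f in zip(self.alpha, final))


query = IncrementalPrefix(INITIAL, TRANS)
p_a = query.extend("a")    # prefix "a"
p_ab = query.extend("b")   # prefix "ab", reusing the state left by "a"
p_exact_ab = query.string_probability(FINAL)
```

Extending the query from "a" to "ab" costs one vector-matrix product rather than a recomputation from scratch, which is the kind of reuse the abstract refers to when a new query is built from a previous one. The infix case additionally has to account for strings that contain the pattern at more than one position, which this sketch does not attempt.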
Original language | English |
---|---|
Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
Editors | Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii |
Publisher | Association for Computational Linguistics |
Pages | 2732-2741 |
Number of pages | 10 |
ISBN (Electronic) | 9781948087841 |
Publication status | Published - 2018 |
Event | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium. Duration: 2018 Oct 31 → 2018 Nov 4 |
Publication series
Name | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
---|---|
Conference
Conference | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
---|---|
Country/Territory | Belgium |
City | Brussels |
Period | 2018 Oct 31 → 2018 Nov 4 |
Bibliographical note
Funding Information: This article is adapted from the author's PhD dissertation Constructing Global Amman: Petrodollars, Identity, and the Built Environment in the Early Twenty-First Century (University of Illinois at Urbana-Champaign, 2013). The author is grateful to professors John Stallmeyer, D. Fairchild Ruggles, Lynne Dearborn, and Kenneth Cuno for their thoughtful comments on the draft of the dissertation. The author is also thankful to the research participants in Amman for their valuable contributions.
Publisher Copyright:
© 2018 Association for Computational Linguistics
All Science Journal Classification (ASJC) codes
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems