TY - GEN
T1 - Non-blocking parallel subset construction on shared-memory multicore architectures
AU - Choi, Hyewon
AU - Burgstaller, Bernd
PY - 2013
Y1 - 2013
N2 - We discuss ways to effectively parallelize the sub- set construction algorithm, which is used to con- vert non-deterministic finite automata (NFAs) to deterministic finite automata (DFAs). This con- version is at the heart of string pattern match- ing based on regular expressions and thus has many applications in text processing, compilers, scripting languages and web browsers, security and more recently also with DNA sequence analysis. We discuss sources of parallelism in the sequen- tial algorithm and their profitability on shared- memory multicore architectures. Our NFA and DFA data-structures are designed to improve scal- ability and keep communication and synchroniza- tion overhead to a minimum. We present three dif- ferent ways for synchronization; the performance of our non-blocking synchronization based on a compare-and-swap (CAS) primitive compares fa- vorably to a lock-based approach. We consider structural NFA properties and their relationship to scalability on highly-parallel multicore architec- tures. We demonstrate the efficiency of our paral- lel subset construction algorithm through several benchmarks run on a 4-CPU (40 cores) node of the Intel Manycore Testing Lab. Achieved speedups are up to a factor of 32x with 40 cores.
AB - We discuss ways to effectively parallelize the sub- set construction algorithm, which is used to con- vert non-deterministic finite automata (NFAs) to deterministic finite automata (DFAs). This con- version is at the heart of string pattern match- ing based on regular expressions and thus has many applications in text processing, compilers, scripting languages and web browsers, security and more recently also with DNA sequence analysis. We discuss sources of parallelism in the sequen- tial algorithm and their profitability on shared- memory multicore architectures. Our NFA and DFA data-structures are designed to improve scal- ability and keep communication and synchroniza- tion overhead to a minimum. We present three dif- ferent ways for synchronization; the performance of our non-blocking synchronization based on a compare-and-swap (CAS) primitive compares fa- vorably to a lock-based approach. We consider structural NFA properties and their relationship to scalability on highly-parallel multicore architec- tures. We demonstrate the efficiency of our paral- lel subset construction algorithm through several benchmarks run on a 4-CPU (40 cores) node of the Intel Manycore Testing Lab. Achieved speedups are up to a factor of 32x with 40 cores.
UR - http://www.scopus.com/inward/record.url?scp=84906993782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906993782&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84906993782
SN - 9781921770258
T3 - Conferences in Research and Practice in Information Technology Series
SP - 13
EP - 20
BT - Parallel and Distributed Computing 2013 - Proceedings of the Eleventh Australasian Symposium on Parallel and Distributed Computing, AusPDC 2013
PB - Australian Computer Society
T2 - 11th Australasian Symposium on Parallel and Distributed Computing, AusPDC 2013
Y2 - 29 January 2013 through 1 February 2013
ER -