Smaller Representation of Compiled Regular Expressions

Sicheol Sung, Sang Ki Ko, Yo Sub Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We consider the problem of running the regex pattern matching in a space-efficient manner. Given a regex, we suggest a bit-packing scheme for representing a compiled regex in a compressed way, which is its position automaton. Our scheme reduces its representation size further by relying on the homogeneous property of position automata and practical features of regexes. We implement the proposed scheme and evaluate the memory consumption using a practical regex benchmark dataset. Our approach produces a much smaller representation compared to two common FA representations. In addition, experimental results show that our bit-packing regex engine is effective for matching regexes that have large compiled forms, by showing less memory consumption compared to the current state-of-the-art regex engine (RE2).

Original languageEnglish
Title of host publicationImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings
EditorsBenedek Nagy, Benedek Nagy
PublisherSpringer Science and Business Media Deutschland GmbH
Pages290-301
Number of pages12
ISBN (Print)9783031402463
DOIs
Publication statusPublished - 2023
EventImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings - Famagusta, Cyprus
Duration: 2023 Sept 192023 Sept 22

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume14151 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings
Country/TerritoryCyprus
CityFamagusta
Period23/9/1923/9/22

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Smaller Representation of Compiled Regular Expressions'. Together they form a unique fingerprint.

Cite this