Smaller Representation of Compiled Regular Expressions

Sicheol Sung, Sang Ki Ko, Yo Sub Han

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We consider the problem of running the regex pattern matching in a space-efficient manner. Given a regex, we suggest a bit-packing scheme for representing a compiled regex in a compressed way, which is its position automaton. Our scheme reduces its representation size further by relying on the homogeneous property of position automata and practical features of regexes. We implement the proposed scheme and evaluate the memory consumption using a practical regex benchmark dataset. Our approach produces a much smaller representation compared to two common FA representations. In addition, experimental results show that our bit-packing regex engine is effective for matching regexes that have large compiled forms, by showing less memory consumption compared to the current state-of-the-art regex engine (RE2).

Original languageEnglish
Title of host publicationImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings
EditorsBenedek Nagy, Benedek Nagy
PublisherSpringer Science and Business Media Deutschland GmbH
Pages290-301
Number of pages12
ISBN (Print)9783031402463
DOIs
StatePublished - 2023
EventImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings - Famagusta, Cyprus
Duration: 19 Sep 202322 Sep 2023

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume14151 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceImplementation and Application of Automata - 27th International Conference, CIAA 2023, Proceedings
Country/TerritoryCyprus
CityFamagusta
Period19/09/2322/09/23

Keywords

  • Bit-packing scheme
  • Position automata
  • Regex engine
  • Regular expressions

Fingerprint

Dive into the research topics of 'Smaller Representation of Compiled Regular Expressions'. Together they form a unique fingerprint.

Cite this