SPLITREGEX: EFFICIENT REGEX SYNTHESIS BY NEURAL EXAMPLE SPLITTING

  • Seongmin Kim
  • Hyunjoon Cheon
  • Su Hyeon Kim
  • Yo Sub Han
  • Sang Ki Ko

Research output: Contribution to journal, Article, peer-review

Abstract

Due to the practical importance of regular expressions (regexes, for short), much research has been done on automatically generating regexes from positive and negative string examples. We tackle the problem of learning to generate regexes faster and more precisely from positive and negative examples by relying on a novel approach called 'neural example splitting.' Using a neural network trained to group similar substrings of positive strings, our approach splits each positive example string into multiple parts. We propose an effective regex synthesis framework called SPLITREGEX that synthesizes subregexes from the 'split' positive substrings and produces the final regex by concatenating the synthesized subregexes. To verify the correctness of the subregexes, which are generated independently, we match the negative strings against them during the subregex synthesis process. As a result, we guarantee that the final regex rejects all negative strings. SPLITREGEX is a divide-and-conquer framework that learns target regexes as follows: split (=divide) the positive strings, infer a partial regex for each part (which is much more accurate than inferring the whole regex at once), and concatenate (=conquer) the inferred regexes. We empirically demonstrate that SPLITREGEX improves on previous regex synthesis approaches on two practical datasets and one synthetic dataset.
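The split, synthesize, verify, and concatenate pipeline described in the abstract can be illustrated with a toy Python sketch. It is not the authors' implementation, and every name in it is a hypothetical assumption: the neural splitter is replaced by a heuristic `split_example` that cuts each positive string wherever the character class (ASCII letter, ASCII digit, other) changes, `generalize` is a deliberately simple stand-in for the subregex synthesizer, and the negative-string check is applied to the concatenated regex rather than to each subregex individually.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Coarse character class (hypothetical splitting criterion)."""
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "alpha"
    return "other"


def split_example(s: str) -> list[str]:
    """Hypothetical stand-in for the neural splitter: cut where the class changes."""
    return ["".join(run) for _, run in groupby(s, key=char_class)]


def literal_alternation(parts: list[str]) -> str:
    """Most specific subregex: an alternation of the observed substrings."""
    return "(?:" + "|".join(re.escape(p) for p in sorted(set(parts))) + ")"


def generalize(parts: list[str]) -> str:
    """Toy subregex synthesizer: a class run if every substring is one class."""
    classes = {char_class(ch) for p in parts for ch in p}
    if classes == {"digit"}:
        return "[0-9]+"
    if classes == {"alpha"}:
        return "[A-Za-z]+"
    return literal_alternation(parts)


def rejects_all(regex: str, negatives: list[str]) -> bool:
    """True if no negative string is fully matched by the regex."""
    return not any(re.fullmatch(regex, n) for n in negatives)


def synthesize(positives: list[str], negatives: list[str]) -> str:
    # Whole-string alternation: rejects every negative that is not also a positive.
    fallback = literal_alternation(positives)
    splits = [split_example(p) for p in positives]  # divide
    if len({len(s) for s in splits}) != 1:
        return fallback  # parts do not line up position by position
    columns = [list(col) for col in zip(*splits)]  # i-th part of every positive
    subregexes = [generalize(col) for col in columns]  # one subregex per part
    # Conquer: concatenate; while a negative is accepted, specialize parts left to right.
    for i, col in enumerate(columns):
        candidate = "".join(subregexes)
        if rejects_all(candidate, negatives):
            return candidate
        subregexes[i] = literal_alternation(col)
    candidate = "".join(subregexes)
    return candidate if rejects_all(candidate, negatives) else fallback
```

For example, the positives `abc123`, `xy45`, `q7` with the negatives `123abc`, `abc`, `abc12x` yield `[A-Za-z]+[0-9]+`. Every subregex accepts its part of each positive, so the concatenation accepts all positives; the final fallback is what makes rejection of every negative hold in this sketch, assuming no string appears in both sets.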

Original language: English
Pages (from-to): 157-179
Number of pages: 23
Journal: Journal of Automata, Languages and Combinatorics
Volume: 30
Issue number: 1-3
State: Published - 2025

Keywords

  • divide-and-conquer paradigm
  • neuro-symbolic algorithm
  • program synthesis
  • regex inference
  • regular expression

