\documentclass[oneside]{book}
\usepackage[a4paper,margin=2cm]{geometry}
\newcommand*{\myversion}{2025B}
\newcommand*{\mydate}{Version \myversion\ (\the\year-\mylpad\month-\mylpad\day)}
\newcommand*{\mylpad}[1]{\ifnum#1<10 0\the#1\else\the#1\fi}
\setlength{\parindent}{0pt}
\setlength{\parskip}{4pt plus 1pt minus 1pt}
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  urlcolor=blue3,
  linkcolor=green3,
}
\usepackage{tabularray}
\NewTblrEnviron{spectblr}
\SetTblrOuter[spectblr]{long}
\SetTblrInner[spectblr]{
  hlines = {gray3}, column{Z} = {co=1}, colsep = 5pt,
  row{2-Z} = {brown8},
  row{1} = {fg=white, bg=purple2, font=\bfseries\sffamily},
  rowhead = 1,
}
\usepackage{codehigh}
\colorlet{highback}{azure9}
\CodeHigh{language=latex/latex2,style/main=gray9,style/code=gray9}
\NewDocumentCommand\PP{m}{\texttt{\fakeverb{#1}}} % package
\NewDocumentCommand\CC{m}{\texttt{\fakeverb{#1}}} % command
\NewDocumentCommand\VV{m}{\texttt{\fakeverb{#1}}} % variable
\NewDocumentCommand\VT{m}{\texttt{\fakeverb{#1}}} % variable type
\NewDocumentCommand\TT{m}{\texttt{\fakeverb{#1}}} % text
\usepackage{pegmatch}

\begin{document}

\title{\textsf{\color{green3}PEGMATCH: Parsing Expression Grammars for TeX}}
\author{Jianrui Lyu (tolvjr@163.com)\\ \url{https://github.com/lvjr/pegmatch}}
\date{\mydate}
\maketitle

\tableofcontents

\chapter{Package Interfaces}

\section{Introduction}

The \PP{pegmatch} package ports PEG (Parsing Expression Grammars)%
\footnote{See the Parsing Expression Grammars page: \url{https://bford.info/packrat/}.}
to TeX. Following the design of LPEG (Parsing Expression Grammars for Lua),%
\footnote{See the Parsing Expression Grammars for Lua page: \url{https://www.inf.puc-rio.br/~roberto/lpeg/}.}
it defines patterns as LaTeX3 variables, and offers several operators to compose patterns.
In general, PEG matching is much more powerful than RE (Regular Expressions) matching.
At this time, the \PP{pegmatch} package supports only TeX strings.%
\footnote{I started to write it for my \PP{codehigh} package to get rid of the \PP{l3regex} dependency.}
It is also still experimental, so some interfaces may change in future releases.

\section{The first example}

The following is the first example:
\begin{demohigh}
\NewSpeg\lMyTestSpeg
\SetSpeg\lMyTestSpeg{\SpegP{abc}}
\IfSpegMatchTF\lMyTestSpeg{a}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{ab}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{abc}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{abcd}{T}{F}
\end{demohigh}
In this example, we use \CC{\NewSpeg} to create a new \VT{speg} variable \VV{\lMyTestSpeg},
use \CC{\SetSpeg} to set the variable to a pattern expression,
and then use \CC{\IfSpegMatchTF} to match it against different subject strings.
The pattern \TT{\SpegP{abc}} matches the string \TT{abc} literally.

\section{Basic commands}

This package provides the following commands for creating and matching \VT{speg} patterns:\nopagebreak
\begin{spectblr}[
  caption = Basic commands
]{}
  Command & Description \\
  \CC{\NewSpeg#1} & create \VT{speg} variable \TT{#1}\\
  \CC{\SetSpeg#1{#2}} & set \VT{speg} variable \TT{#1} with \VT{speg} expression \VT{#2}\\
  \CC{\IfSpegMatchT#1{#2}{#3}} & match \TT{#1} against string \TT{#2}, then run code \TT{#3} if the match succeeds\\
  \CC{\IfSpegMatchF#1{#2}{#3}} & match \TT{#1} against string \TT{#2}, then run code \TT{#3} if the match fails\\
  \CC{\IfSpegMatchTF#1{#2}{#3}{#4}} & match \TT{#1} against string \TT{#2}, then run code \TT{#3} if the match succeeds, otherwise run code \TT{#4}\\
  \CC{\IfSpegExtractT#1{#2}#3{#4}} & match \TT{#1} against string \TT{#2}, then store captures in \TT{#3} and run code \TT{#4} if the match succeeds\\
  \CC{\IfSpegExtractF#1{#2}#3{#4}} & match \TT{#1} against string \TT{#2}, then clear \TT{#3} and run code \TT{#4} if the match fails\\
  \CC{\IfSpegExtractTF#1{#2}#3{#4}{#5}} & match \TT{#1} against string \TT{#2}, then store captures in \TT{#3} and run code \TT{#4} if
  the match succeeds, otherwise clear \TT{#3} and run code \TT{#5}
\end{spectblr}

\section{Scratch variables}

There are two predefined scratch \VT{speg} variables for setting \VT{speg} patterns:
\VV{\lTmpaSpeg} and \VV{\lTmpbSpeg}.
There are also two predefined scratch \VT{seq} variables for storing captures (see Section~\ref{sect:capture}):
\VV{\lSpegTmpaSeq} and \VV{\lSpegTmpbSeq}.%

\section{Primitive patterns}

This package provides the following commands for making primitive patterns:%\nopagebreak
\begin{spectblr}[
  caption = Primitive patterns
]{}
  Pattern & Description \\
  \TT{\SpegP{}} & match \TT{} literally \\
  \TT{\SpegQ{}} & match exactly \TT{} characters \\
  \TT{\SpegR{}} & match any character between \TT{} and \TT{} or between \TT{} and \TT{} \\
  \TT{\SpegS{}} & match any character in \TT{}
\end{spectblr}

The following examples demonstrate pattern matching with the other primitive patterns:
\begin{demohigh}
\SetSpeg\lMyTestSpeg{\SpegQ{2}}
\IfSpegMatchTF\lMyTestSpeg{u}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{vw}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{xyz}{T}{F}
\end{demohigh}
\begin{demohigh}
\SetSpeg\lMyTestSpeg{\SpegR{AZ}}
\IfSpegMatchTF\lMyTestSpeg{Qq}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{q1}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{1Q}{T}{F}
\SetSpeg\lMyTestSpeg{\SpegR{AZaz}}
\IfSpegMatchTF\lMyTestSpeg{Qq}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{q1}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{1Q}{T}{F}
\end{demohigh}
\begin{demohigh}
\SetSpeg\lMyTestSpeg{\SpegS{world}}
\IfSpegMatchTF\lMyTestSpeg{one}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{two}{T}{F}
\end{demohigh}
By default, PEG matching always starts at the first character of the subject string.
Since both \CC{\SpegR} and \CC{\SpegS} match only one character,
the last command in each of the previous two examples gives \TT{F}.
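As a further sketch of this anchoring behaviour, the following uses the scratch variable \VV{\lTmpaSpeg}
with a set of vowels; the set and the subject strings are arbitrary choices made for illustration.
Only a subject string whose first character belongs to the set should match,
so the two tests are expected to give \TT{T} and \TT{F} respectively,
even though \TT{pear} contains vowels further along:
\begin{demohigh}
% Illustrative only: the set and the subject strings are arbitrary.
\SetSpeg\lTmpaSpeg{\SpegS{aeiou}}
\IfSpegMatchTF\lTmpaSpeg{apple}{T}{F}
\IfSpegMatchTF\lTmpaSpeg{pear}{T}{F}
\end{demohigh}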

\section{Pattern operators}

This package provides the following pattern operators for composing patterns:
\begin{spectblr}[
  caption = Pattern operators
]{}
  Operator & Precedence & Description \\
  \TT{patt1/patt2} & 1 (choice) & match \TT{patt1} or \TT{patt2} (ordered choice) \\
  \TT{patt1*patt2} & 2 (concat) & match \TT{patt1} followed by \TT{patt2} \\
  \TT{!patt} & 3 (not predicate) & match only if \TT{patt} does not match, and consume no input \\
  \TT{&patt} & 3 (and predicate) & match \TT{patt} but consume no input \\
  \TT{patt^{}} & 4 (repeat) & match at least \TT{} ($n\ge0$) repetitions of \TT{patt} \\
  \TT{patt^{-}} & 4 (repeat) & match at most \TT{} ($n>0$) repetitions of \TT{patt} \\
  \TT{{patt expr}} & 5 (group) & match \TT{patt expr} (pattern expression)
\end{spectblr}

With the \TT{!} and \TT{*} operators, we can create negated character sets:
\begin{demohigh}
\SetSpeg\lMyTestSpeg{!\SpegR{09} * \SpegQ{1}}
\IfSpegMatchTF\lMyTestSpeg{A}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{5}{T}{F}
\SetSpeg\lMyTestSpeg{!\SpegS{abc} * \SpegQ{1}}
\IfSpegMatchTF\lMyTestSpeg{B}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{b}{T}{F}
\end{demohigh}
With the \TT{^} operator, we can match words:
\begin{demohigh}
\SetSpeg\lMyTestSpeg{\SpegR{AZaz} ^ {1}}
\IfSpegMatchTF\lMyTestSpeg{HELLO}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{world}{T}{F}
\IfSpegMatchTF\lMyTestSpeg{ text }{T}{F}
\IfSpegMatchTF\lMyTestSpeg{(text)}{T}{F}
\end{demohigh}
In fact, \TT{patt^{-1}} is similar to \TT{expr?},
\TT{patt^0} is similar to \TT{expr*},
and \TT{patt^1} is similar to \TT{expr+} in regular expression matching.

\section{Pattern variables}

When using the \CC{\SetSpeg} command to set a \VT{speg} variable with a pattern expression,
you can refer to other \VT{speg} variables.
For example:
\begin{demohigh}
\SetSpeg\lTmpaSpeg{\SpegR{AZ} / \SpegR{az}}
\SetSpeg\lTmpbSpeg{\SpegS{135} * \lTmpaSpeg}
\IfSpegMatchTF\lTmpbSpeg{2ab}{[T]}{[F]}
\IfSpegMatchTF\lTmpbSpeg{3ab}{[T]}{[F]}
\SetSpeg\lTmpbSpeg{\SpegS{135} * \lTmpaSpeg^{3}}
\IfSpegMatchTF\lTmpbSpeg{3ab}{[T]}{[F]}
\IfSpegMatchTF\lTmpbSpeg{3abcd}{[T]}{[F]}
\end{demohigh}
By using a recursive pattern, we can make a \VT{speg} pattern match anywhere in a string.
The following example demonstrates how to match a word with at least three letters inside a string:\nopagebreak
\begin{demohigh}
\NewSpeg\lMyWordSpeg
\NewSpeg\lMyAnywhereSpeg
\SetSpeg\lMyWordSpeg{\SpegR{AZaz}^{3}}
\SetSpeg\lMyAnywhereSpeg{\lMyWordSpeg / \SpegQ{1} * \lMyAnywhereSpeg}
\IfSpegMatchTF\lMyAnywhereSpeg{foo bar}{[T]}{[F]}
\IfSpegMatchTF\lMyAnywhereSpeg{fo bar}{[T]}{[F]}
\IfSpegMatchTF\lMyAnywhereSpeg{123 ba}{[T]}{[F]}
\IfSpegMatchTF\lMyAnywhereSpeg{123 bar}{[T]}{[F]}
\end{demohigh}
In this example, \VV{\lMyAnywhereSpeg} first tries to match \VV{\lMyWordSpeg};
if that fails, it skips one character and tries again.
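The same idiom should work for any inner pattern.
As a minimal sketch (the variable names \VV{\lMyDigitSpeg} and \VV{\lMyFindSpeg} are hypothetical
and are not defined by the package), the following searches for a digit anywhere in the subject string.
The recursion stops at the end of the string, where \TT{\SpegQ{1}} can no longer consume a character,
so a string without digits is expected to give \TT{[F]}:
\begin{demohigh}
% Hypothetical variable names, chosen for this illustration only.
\NewSpeg\lMyDigitSpeg
\NewSpeg\lMyFindSpeg
\SetSpeg\lMyDigitSpeg{\SpegR{09}}
% Try a digit here; otherwise skip one character and search again.
\SetSpeg\lMyFindSpeg{\lMyDigitSpeg / \SpegQ{1} * \lMyFindSpeg}
\IfSpegMatchTF\lMyFindSpeg{abc7}{[T]}{[F]}
\IfSpegMatchTF\lMyFindSpeg{abcd}{[T]}{[F]}
\end{demohigh}
Keeping the inner pattern in its own variable means that only \VV{\lMyDigitSpeg} has to change
when the search target changes.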

\section{Capture patterns}%
\label{sect:capture}

This package provides the following commands for making capture patterns:
\begin{spectblr}[
  caption = Capture patterns
]{}
  Pattern & Name & Description \\
  \CC{\SpegC{}} & simple capture & capture the match for \TT{}\\
  \CC{\SpegCp} & position capture & capture current position
\end{spectblr}

Position capture \CC{\SpegCp} must be concatenated with other patterns (by using the \TT{*} operator):
\begin{demohigh}
\SetSpeg\lTmpaSpeg{\SpegCp * \SpegR{az}^{1} * \SpegCp * \SpegR{09}^{1} * \SpegCp}
\IfSpegExtractTF\lTmpaSpeg{12ab}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\IfSpegExtractTF\lTmpaSpeg{ab12}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\IfSpegExtractTF\lTmpaSpeg{abcd12345}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\end{demohigh}
In this example, we use the \CC{\IfSpegExtractTF} command to extract all captures,
which are stored in the \VT{seq} variable (\CC{\lSpegTmpaSeq}) specified by the third argument.
Then we use the \CC{\MapSpegSeqInline} command to print each capture.

If you want to capture the substrings, you can modify the above example as follows:\nopagebreak
\begin{demohigh}
\SetSpeg\lTmpaSpeg{\SpegC{\SpegR{az}^{1}} * \SpegC{\SpegR{09}^{1}}}
\IfSpegExtractTF\lTmpaSpeg{12ab}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\IfSpegExtractTF\lTmpaSpeg{ab12}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\IfSpegExtractTF\lTmpaSpeg{abcd12345}\lSpegTmpaSeq{%
  \MapSpegSeqInline\lSpegTmpaSeq{[#1]}%
}{Failed}
\end{demohigh}

\chapter{The Source Code}

\dochighinput[language=latex/latex3]{pegmatch.sty}

\end{document}