FunctionalParsers Raku package
Introduction
This Raku package provides a (monadic) system of Functional Parsers (FPs).
The package design and implementation follow closely the article "Functional parsers" by Jeroen Fokker, [JF1].
That article can be used as both a theoretical- and a practical guide to FPs.
Two in one
The package provides both FPs and
Extended Backus-Naur Form (EBNF)
parsers and interpreters.
The reasons for including the EBNF functionalities are:
- EBNF parsing is discussed in [JF1]
- EBNF parsers and interpreters are very good examples of FPs application
Previous work
Anton Antonov
- FPs packages implementations in Lua, Mathematica, and R.
See these blog posts and [AAp1, AAp2].
Remark: In this document Mathematica and Wolfram Language (WL) are used as synonyms.
Jeroen Fokker
- "Functional parsers" article using Haskell, [JF1].
Wim Vanderbauwhede
Installation
From Zef ecosystem:
zef install FunctionalParsers;
From GitHub:
zef install https://github.com/antononcube/Raku-FunctionalParsers.git
Motivation
Here is a list of motivations for implementing this package:
- Word-based backtracking
- Elevate the "tyranny" of Raku grammars
- Easier transfer of Raku grammars into other languages
- Monadic parser construction
- Quick, elegant implementation
Word-based backtracking
I had certain assumptions about certain slow parsing with Raku using regexes.
For example, is not that easy to specify backtracking over sequences of words (instead of characters) in grammars.
To check my assumptions I wrote the basic eight FPs (which is quick to do.) After my experiments,
I could not help myself making a complete package.
Elevate the "tyranny" of Raku grammars and transferring to other languages
The "first class citizen" treatment of grammars is one of the most important and unique features of Raku.
It is one of the reason why I treat Raku as a "secret weapon."
But that uniqueness does not necessarily facilitate easy utilization or transfer in large software systems.
FPs, on the other hand, are implemented in almost every programming language.
Hence, making or translating grammars with- or to FPs would provide greater knowledge transfer and integration
of Raku-derived solutions.
Monadic parser construction
Having a monadic way of building parsers or grammars is very appealing. (To some people.)
Raku's operator making abilities can be nicely utilized.
Remark: The monad of FPs produces Abstract Syntax Trees (ASTs) that are simple lists.
I prefer that instead of using specially defined types (as, say, in [WV1, WVp1].) That probably,
comes from too much usage of LISP-family programming languages. (Like Mathematica and R.)
Quick, elegant implementation
The Raku code implementing FPs was quick to write and looks concise and elegant.
(To me at least. I would not be surprised if that code can be simplified further.)
Naming considerations
Package name
I considered names like "Parser::Combinator", "Parser::Functional", etc. Of course, looked up
names of similar packages.
Ultimately, I decided to use "FunctionalParsers" because:
- Descriptive name that corresponds to the title of the article by Jeroen Fokker, [JF1].
- The package has not only parser combinators, but also parser transformers and modifiers.
- The connections with corresponding packages in other languages are going to be more obvious.
- For example, I have used the name "FunctionalParsers" for similar packages in other programming languages (Lua, R, WL.)
Actions vs Contexts
I considered to name the directory with EBNF interpreters "Context" or "Contexts", but
since "Actions" is used a lot I chose that name.
Remark: In [JF1] the term "contexts" is used.
Examples
Make a parser for a family of (two) simple sentences:
use FunctionalParsers :ALL;
my &p1 = (symbol('numerical') «|» symbol('symbolic')) «&» symbol('integration');
# -> @x { #`(Block|6352212853176) ... }
Here we parse sentences adhering to the grammar of the defined parser:
.say for ("numerical integration", "symbolic integration")>>.words.map({ $_ => &p1($_)});
# (numerical integration) => ((() (numerical integration)))
# (symbolic integration) => ((() (symbolic integration)))
These sentences are not parsed:
("numeric integration", "symbolic summation")>>.words.map({ $_ => &p1($_)});
# ((numeric integration) => () (symbolic summation) => ())
Infix operators
Several notation alternatives are considered for the infix operations corresponding to
the different combinators and transformers. Here is a table with different notation styles:
Description | set | double | n-ary |
---|
sequential combination | (&) | «&» | ⨂ |
left sequential pick | (<&) | «& | ◁ |
right sequential pick | (&>) | &» | ▷ |
alternatives combination | (⎸) | «⎸» | ⨁ |
function application | (^) | «o | ⨀ |
Consider the parsers:
my &p1 = apply( {1}, symbol('one'));
my &p2 = apply( {2}, symbol('two'));
my &p3 = apply( {3}, symbol('three'));
my &p4 = apply( {4}, symbol('four'));
my &pM = symbol('million');
my &pTh = symbol('things');
# -> @x { #`(Block|6352245759584) ... }
Here are spec examples for each style of infix operators:
# set
my &p = (&p1 (|) &p2 (|) &p3 (|) &p4) (&) (&pM (^) {10**6}) (&) &pTh;
&p('three million things'.words.List).head.tail;
# (3 (1000000 things))
# double
(&p1 «|» &p2 «|» &p3 «|» &p4) «&» &pM «o {10**6} «&» &pTh;
# n-ary
(&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4) ⨂ {10**6} ⨀ &pM ⨂ &pTh
Remark: The arguments of the apply operator ⨀
are "reversed" when compared to the arguments of the operators (^)
and «o
.
For ⨀
the function to be applied is the first argument.
Parser generation
Here is an EBNF grammar:
my $ebnfCode = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<top> = <integer> ;
END
# <digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
# <integer> = <digit> , { <digit> } ;
# <top> = <integer> ;
Here generation is the corresponding functional parsers code:
use FunctionalParsers::EBNF;
.say for fp-ebnf-parse($ebnfCode, actions => 'Raku::Code').head.tail;
# my &pDIGIT = alternatives(symbol('0'), symbol('1'), symbol('2'), symbol('3'), symbol('4'), symbol('5'), symbol('6'), symbol('7'), symbol('8'), symbol('9'));
# my &pINTEGER = sequence(&pDIGIT, many(&pDIGIT));
# my &pTOP = &pINTEGER;
For more detailed examples see "Parser-code-generation.md".
Random sentence generation
Here is an EBNF grammar:
my $ebnfCode2 = q:to/END/;
<top> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = 'love' | 'hate' | { '♥️' } | '🤮';
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
END
# <top> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = 'love' | 'hate' | { '♥️' } | '🤮';
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
Here is generation of random sentences with the grammar above:
.say for fp-random-sentence($ebnfCode2, 12);
# We love R
# We hate WL
# I love Perl
# I hate Julia
# We 🤮 R
# We R
# We Julia
# I hate R
# We ♥️ WL
# I ♥️ ♥️ WL
# We 🤮 WL
# We love Julia
Generating Mermaid diagrams for EBNFs
The function fp-ebnf-parse
can produce
Mermaid-JS diagrams
corresponding to grammars with the target "MermaidJS::Graph".
Here is an example:
my $ebnfCode3 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ];
<b> = 'b' , ( 'B' | '2' );
END
fp-ebnf-parse($ebnfCode3, target=>"MermaidJS::Graph", dir-spec => 'LR').head.tail
graph LR
alt1((or))
NT:top["top"]
seq12((and))
T:A("A")
NT:b["b"]
T:B("B")
T:2("2")
T:a("a")
NT:a["a"]
T:b("b")
rep7((*))
alt14((or))
opt9((?))
T:1("1")
seq5((and))
alt1 --> NT:a
alt1 --> NT:b
NT:top --> alt1
rep7 --> T:A
opt9 --> T:1
seq5 --> |1|T:a
seq5 --> |2|rep7
seq5 --> |3|opt9
NT:a --> seq5
alt14 --> T:B
alt14 --> T:2
seq12 --> |1|T:b
seq12 --> |2|alt14
NT:b --> seq12
Here is a legend:
- The non-terminals are shown with rectangles
- The terminals are shown with round rectangles
- The "conjunctions" are shown in disks
Remark: The Markdown cell above has the parameters result=asis, output-lang=mermaid, output-prompt=NONE
which allow for direct diagram rendering of the obtained Mermaid code in various Markdown viewers (GitHub, IntelliJ, etc.)
Compare the following EBNF grammar and corresponding diagram with the ones above:
my $ebnfCode4 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ] ;
<b> = 'b' , 'B' | '2' ;
END
fp-grammar-graph($ebnfCode4, dir-spec => 'LR')
graph LR
T:A("A")
rep7((*))
NT:a["a"]
T:B("B")
T:2("2")
seq5((and))
alt12((or))
NT:top["top"]
NT:b["b"]
T:a("a")
seq13((and))
alt1((or))
opt9((?))
T:b("b")
T:1("1")
alt1 --> NT:a
alt1 --> NT:b
NT:top --> alt1
rep7 --> T:A
opt9 --> T:1
seq5 --> |1|T:a
seq5 --> |2|rep7
seq5 --> |3|opt9
NT:a --> seq5
seq13 --> |1|T:b
seq13 --> |2|T:B
alt12 --> seq13
alt12 --> T:2
NT:b --> alt12
CLI
The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:
fp-ebnf-parse --help
# Usage:
# fp-ebnf-parse <ebnf> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar.
# fp-ebnf-parse <file> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar file.
#
# <ebnf> EBNF text.
# -t|--actions=<Str> Actions ('t' for 'target'.) [default: 'Raku::Class']
# -n|--parser-name=<Str> Parser name. [default: 'MyParser']
# -p|--rule-name-prefix=<Str> Rule names prefix. [default: 'p']
# -m|--rule-name-modifier=<Str> Rule names modifier. [default: 'WhateverCode']
# -s|--style=<Str> EBNF style, one of 'G4', 'Inverted', 'Standard', 'Relaxed', or 'Whatever'. [default: 'Whatever']
# <file> EBNF file name.
If mermaid-cli
is installed here is an example UNIX shell pipeline with it:
fp-ebnf-parse ./resources/Arithmetic.ebnf -s=relaxed -t=mermaid > diag.md && mmdc -i diag.md -o diag.png -w 1200 && open diag.png
Implementation considerations
Infix operators
The infix operators have to be reviewed and probably better sets of symbols would be chosen.
The challenge is to select operators that are "respected" by the typical Raku IDEs.
(I only experimented with Emacs and Comma IDE.)
EBNF parser
All EBNF parser functions in FunctionalParsers::EBNF
have apply-transformers that use the attributes of
a dedicated object:
unit module FunctionalParsers::EBNF;
...
our $ebnfActions = FunctionalParsers::EBNF::Actions::Raku::AST.new;
....
By assigning instances of different classes to $ebnfActions
we get different parsing interpretations.
Not having abstract class
Looking at the Raku EBNF interpreter classes it can be easily seen that each can inherit from a common abstract class.
But since the EBNF parsing methods (or attributes that callables) are approximately a dozen one-liners,
it seems more convenient to have all class method- and attribute definitions on “one screen.”
Flowchart
graph TD
FPs[[EBNF<br/>Functional Parsers]]
RakuAST[Raku::AST]
RakuClass[Raku::Class]
RakuCode[Raku::Code]
RakuGrammar[Raku::Grammar]
WLCode[WL::Code]
WLGrammar[WL::Grammar]
JavaFuncJ[Java::FuncJ]
JavaANTLR[Java::ANTLR]
Input[/- EBNF code<br/>- Properties/]
PickTarget[Assign context]
Parse[Parse]
QEVAL{Evaluate?}
EVAL[[EVAL]]
Code>Code]
Context[Context object]
Result{{Result}}
Input --> PickTarget
PickTarget -.- WL
PickTarget -.- Java
PickTarget -..- Raku
PickTarget -.-> Context
PickTarget --> Parse
Parse -.- FPs
Context -.-> FPs
Parse --> QEVAL
Parse -.-> Code
QEVAL --> |yes|EVAL
Code -.-> EVAL
EVAL ---> Result
QEVAL ---> |no|Result
subgraph Raku
RakuAST
RakuClass
RakuCode
RakuGrammar
end
subgraph WL
WLCode
WLGrammar
end
subgraph Java
JavaFuncJ
JavaANTLR
end
TODO
- TODO Parsing EBNF refactoring & additional features
- DONE Parse any combination of sequence operators
- Initially, only these were parsed:
'a' <& 'b' <& 'c' | 'a' &> 'd';
'a' , 'b' , 'c' | 'a' &> 'd';
- These are also parsed:
'a' , 'b' &> 'c'
'a' <& 'b' &> 'c'
- DONE Class-based parsers
- DONE From characters
- DONE From tokens
- DONE Themed parsers
- DONE Inheritance based implementation
- DONE "Simpler"
- DONE ANTLR / G4
- DONE Whatever
- TODO "Named" tokens
'_?StringQ'
or '_String'
'_WordString'
, '_LetterString'
, and '_IdentifierString'
'_?NumberQ'
and '_?NumericQ'
'_Integer'
'Range[*from*, *to*]'
- TODO Interpreters of EBNF
- DONE Java
- TODO GraphViz
- DONE MermaidJS
- TODO Scala
- MAYBE Python
- TODO Raku
- DONE AST
- DONE Class
- DONE Code
- DONE Grammar
- TODO Tokenizer (of character sequences)
- Other EBNF styles
- TODO WL
- DONE FunctionalParsers, [AAp1, AAp2]
- [P] TODO GrammarRules
- Implemented to a point, not tested in WL.
- Graph
- TODO Translators
- TODO FPs code into EBNF
- Very cool to have, but seems to be a lot of work.
- DONE Raku grammars to FPs
- See the class "Grammar::TokenProcessing::Actions::EBNF" of the package "Grammar::TokenProcessing".
- DONE Stand-alone grammar-graph translation function.
- TODO Extensions
- DONE First-matched alternation
- TODO Extra parsers
- DONE
pInteger
- DONE
pNumber
- DONE
pWord
- DONE
pLetterWord
- DONE
pIdentifier
- TODO
pNumberRange
- Other?
- TODO Zero-width assertions implementation
- TODO Lookahead
- TODO Lookbehind
- DONE Random sentence generation
- DONE Basic class code
- DONE Preventing infinite recursion
- DONE "Named" tokens interpretation
'_?StringQ'
or '_String'
'_WordString'
, '_LetterString'
, and '_IdentifierString'
'_?NumberQ'
and '_?NumericQ'
'_Integer'
'Range[*from*, *to*]'
- TODO Documentation
- DONE README
- DONE Parser code generation
- TODO Raku
- DONE Class
- TODO Code
- DONE Grammar
- DONE WL
- DONE FunctionalParsers
- DONE GrammarRules
- TODO Java
- TODO Random sentences generation
- TODO Mermaid flowchart
- TODO Mermaid class diagram?
- TODO Videos
- TODO Introduction
- TODO TRC-2023 presentation
References
Articles
[JF1] Jeroen Fokker,
"Function Parsers",
(1997),
Conference: Advanced Functional Programming,
First International Spring School on Advanced Functional Programming Techniques-Tutorial Text.
10.1007/3-540-59451-5_1.
[WV1] Wim Vanderbauwhede,
"List-based parser combinators in Haskell and Raku",
(2020),
Musings of an Accidental Computing Scientist at codeberg.page.
Packages, paclets, repositories
[AAp1] Anton Antonov,
"FunctionalParsers.m",
(2014),
MathematicaForPrediction at GitHub.
[AAp2] Anton Antonov,
"FunctionalParsers" WL paclet,
(2023),
Wolfram Language Paclet Repository.
[WVp1] Wim Vanderbauwhede,
List-based parser combinator library in Raku,
(2020),
GitHub/wimvanderbauwhede.
[WVp2] Wim Vanderbauwhede,
Parser::Combinators Perl package,
(2013-2015),
GitHub/wimvanderbauwhede.