FunctionalParsers Raku package

Introduction

This Raku package provides a (monadic) system of Functional Parsers (FPs).

The package design and implementation follow closely the article "Functional parsers" by Jeroen Fokker, [JF1]. That article can be used as both a theoretical- and a practical guide to FPs.

Two in one

The package provides both FPs and Extended Backus-Naur Form (EBNF) parsers and interpreters. The reasons for including the EBNF functionalities are:

EBNF parsing is discussed in [JF1]
EBNF parsers and interpreters are very good examples of FPs application

Previous work

Anton Antonov

FPs packages implementations in Lua, Mathematica, and R. See these blog posts and [AAp1, AAp2].

Remark: In this document Mathematica and Wolfram Language (WL) are used as synonyms.

Jeroen Fokker

"Functional parsers" article using Haskell, [JF1].

Wim Vanderbauwhede

Interesting and insightful blog post "List-based parser combinators in Haskell and Raku".
- The corresponding Raku code repository [WVp1] is not (fully) productized.
Perl package "Parser::Combinators", [WVp2].

Installation

From Zef ecosystem:

zef install FunctionalParsers;

From GitHub:

zef install https://github.com/antononcube/Raku-FunctionalParsers.git

Motivation

Here is a list of motivations for implementing this package:

Word-based backtracking
Elevate the "tyranny" of Raku grammars
Easier transfer of Raku grammars into other languages
Monadic parser construction
Quick, elegant implementation

Word-based backtracking

I had certain assumptions about certain slow parsing with Raku using regexes. For example, is not that easy to specify backtracking over sequences of words (instead of characters) in grammars. To check my assumptions I wrote the basic eight FPs (which is quick to do.) After my experiments, I could not help myself making a complete package.

Elevate the "tyranny" of Raku grammars and transferring to other languages

The "first class citizen" treatment of grammars is one of the most important and unique features of Raku. It is one of the reasons why I treat Raku as a "secret weapon."

But that uniqueness does not necessarily facilitate easy utilization or transfer in large software systems. FPs, on the other hand, are implemented in almost every programming language. Hence, making or translating grammars with- or to FPs would provide greater knowledge transfer and integration of Raku-derived solutions.

Monadic parser construction

Having a monadic way of building parsers or grammars is very appealing. (To some people.) Raku's operator making abilities can be nicely utilized.

Remark: The monad of FPs produces Abstract Syntax Trees (ASTs) that are simple lists. I prefer that instead of using specially defined types (as, say, in [WV1, WVp1].) That probably, comes from too much usage of LISP-family programming languages. (Like Mathematica and R.)

Quick, elegant implementation

The Raku code implementing FPs was quick to write and looks concise and elegant.

(To me at least. I would not be surprised if that code can be simplified further.)

Naming considerations

Package name

I considered names like "Parser::Combinator", "Parser::Functional", etc. Of course, I looked up names of similar packages.

Ultimately, I decided to use "FunctionalParsers" because:

Descriptive name that corresponds to the title of the article by Jeroen Fokker, [JF1].
The package has not only parser combinators, but also parser transformers and modifiers.
The connections with corresponding packages in other languages are going to be more obvious.
- For example, I have used the name "FunctionalParsers" for similar packages in other programming languages (Lua, R, WL.)

Actions vs Contexts

I considered to name the directory with EBNF interpreters "Context" or "Contexts", but since "Actions" is used a lot I chose that name.

Remark: In [JF1] the term "contexts" is used.

Examples

Make a parser for a family of (two) simple sentences:

use FunctionalParsers :ALL;

my &p1 = (symbol('numerical') «|» symbol('symbolic')) «&» symbol('integration');

# -> @x { #`(Block|6352212853176) ... }

Here we parse sentences adhering to the grammar of the defined parser:

.say for ("numerical integration", "symbolic integration")>>.words.map({ $_ => &p1($_)});

# (numerical integration) => ((() (numerical integration)))
# (symbolic integration) => ((() (symbolic integration)))

These sentences are not parsed:

("numeric integration", "symbolic summation")>>.words.map({ $_ => &p1($_)});

# ((numeric integration) => () (symbolic summation) => ())

Infix operators

Several notation alternatives are considered for the infix operations corresponding to the different combinators and transformers. Here is a table with different notation styles:

Description	set	double	n-ary
sequential combination	(&)	«&»	⨂
left sequential pick	(<&)	«&	◁
right sequential pick	(&>)	&»	▷
alternatives combination	(⎸)	«⎸»	⨁
function application	(^)	«o	⨀

Consider the parsers:

my &p1 = apply( {1}, symbol('one'));
my &p2 = apply( {2}, symbol('two'));
my &p3 = apply( {3}, symbol('three'));
my &p4 = apply( {4}, symbol('four'));
my &pM = symbol('million');
my &pTh = symbol('things');

# -> @x { #`(Block|6352245759584) ... }

Here are spec examples for each style of infix operators:

# set
my &p = (&p1 (|) &p2 (|) &p3 (|) &p4) (&) (&pM (^) {10**6}) (&) &pTh;
&p('three million things'.words.List).head.tail;

# (3 (1000000 things))

# double 
(&p1 «|» &p2 «|» &p3 «|» &p4) «&» &pM «o {10**6} «&» &pTh;

# n-ary
(&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4) ⨂ {10**6} ⨀ &pM ⨂ &pTh

Remark: The arguments of the apply operator ⨀ are "reversed" when compared to the arguments of the operators (^) and «o. For ⨀ the function to be applied is the first argument.

Parser generation

Here is an EBNF grammar:

my $ebnfCode = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<top> = <integer> ;
END

# <digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
# <integer> = <digit> , { <digit> } ;
# <top> = <integer> ;

Here generation is the corresponding functional parsers code:

use FunctionalParsers::EBNF;
.say for fp-ebnf-parse($ebnfCode, actions => 'Raku::Code').head.tail;

# my &pDIGIT = alternatives(symbol('0'), symbol('1'), symbol('2'), symbol('3'), symbol('4'), symbol('5'), symbol('6'), symbol('7'), symbol('8'), symbol('9'));
# my &pINTEGER = sequence(&pDIGIT, many(&pDIGIT));
# my &pTOP = &pINTEGER;

For more detailed examples see "Parser-code-generation.md".

Random sentence generation

Here is an EBNF grammar:

my $ebnfCode2 = q:to/END/;
<top> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = 'love' | 'hate' | { '♥️' } | '🤮';
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ; 
END

# <top> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = 'love' | 'hate' | { '♥️' } | '🤮';
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;

Here is generation of random sentences with the grammar above:

.say for fp-random-sentence($ebnfCode2, 12);

# We love R
# We hate WL
# I love Perl
# I hate Julia
# We 🤮 R
# We  R
# We  Julia
# I hate R
# We ♥️ WL
# I ♥️ ♥️ WL
# We 🤮 WL
# We love Julia

Generating Mermaid diagrams for EBNFs

The function fp-ebnf-parse can produce Mermaid-JS diagrams corresponding to grammars with the target "MermaidJS::Graph". Here is an example:

my $ebnfCode3 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ];
<b> = 'b' , ( 'B' | '2' );
END

fp-ebnf-parse($ebnfCode3, target=>"MermaidJS::Graph", dir-spec => 'LR').head.tail

graph LR
	alt1((or))
	NT:top["top"]
	seq12((and))
	T:A("A")
	NT:b["b"]
	T:B("B")
	T:2("2")
	T:a("a")
	NT:a["a"]
	T:b("b")
	rep7((*))
	alt14((or))
	opt9((?))
	T:1("1")
	seq5((and))
	alt1 --> NT:a
	alt1 --> NT:b
	NT:top --> alt1
	rep7 --> T:A
	opt9 --> T:1
	seq5 --> |1|T:a
	seq5 --> |2|rep7
	seq5 --> |3|opt9
	NT:a --> seq5
	alt14 --> T:B
	alt14 --> T:2
	seq12 --> |1|T:b
	seq12 --> |2|alt14
	NT:b --> seq12

Here is a legend:

The non-terminals are shown with rectangles
The terminals are shown with round rectangles
The "conjunctions" are shown in disks

Remark: The Markdown cell above has the parameters result=asis, output-lang=mermaid, output-prompt=NONE which allow for direct diagram rendering of the obtained Mermaid code in various Markdown viewers (GitHub, IntelliJ, etc.)

Compare the following EBNF grammar and corresponding diagram with the ones above:

my $ebnfCode4 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ] ;
<b> = 'b' , 'B' | '2' ;
END

fp-grammar-graph($ebnfCode4, dir-spec => 'LR')

graph LR
	T:A("A")
	rep7((*))
	NT:a["a"]
	T:B("B")
	T:2("2")
	seq5((and))
	alt12((or))
	NT:top["top"]
	NT:b["b"]
	T:a("a")
	seq13((and))
	alt1((or))
	opt9((?))
	T:b("b")
	T:1("1")
	alt1 --> NT:a
	alt1 --> NT:b
	NT:top --> alt1
	rep7 --> T:A
	opt9 --> T:1
	seq5 --> |1|T:a
	seq5 --> |2|rep7
	seq5 --> |3|opt9
	NT:a --> seq5
	seq13 --> |1|T:b
	seq13 --> |2|T:B
	alt12 --> seq13
	alt12 --> T:2
	NT:b --> alt12

CLI

The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:

fp-ebnf-parse --help

# Usage:
#   fp-ebnf-parse <ebnf> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar.
#   fp-ebnf-parse <file> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar file.
#   
#     <ebnf>                           EBNF text.
#     -t|--actions=<Str>               Actions ('t' for 'target'.) [default: 'Raku::Class']
#     -n|--parser-name=<Str>           Parser name. [default: 'MyParser']
#     -p|--rule-name-prefix=<Str>      Rule names prefix. [default: 'p']
#     -m|--rule-name-modifier=<Str>    Rule names modifier. [default: 'WhateverCode']
#     -s|--style=<Str>                 EBNF style, one of 'G4', 'Inverted', 'Standard', 'Relaxed', or 'Whatever'. [default: 'Whatever']
#     <file>                           EBNF file name.

If mermaid-cli is installed here is an example UNIX shell pipeline with it:

fp-ebnf-parse ./resources/Arithmetic.ebnf -s=relaxed -t=mermaid > diag.md && mmdc -i diag.md -o diag.png -w 1200 && open diag.png

Implementation considerations

Infix operators

The infix operators have to be reviewed and probably better sets of symbols would be chosen. The challenge is to select operators that are "respected" by the typical Raku IDEs. (I only experimented with Emacs and Comma IDE.)

EBNF parser

All EBNF parser functions in FunctionalParsers::EBNF have apply-transformers that use the attributes of a dedicated object:

unit module FunctionalParsers::EBNF;
...
our $ebnfActions = FunctionalParsers::EBNF::Actions::Raku::AST.new;
....

By assigning instances of different classes to $ebnfActions we get different parsing interpretations.

Not having abstract class

Looking at the Raku EBNF interpreter classes it can be easily seen that each can inherit from a common abstract class. But since the EBNF parsing methods (or attributes that callables) are approximately a dozen one-liners, it seems more convenient to have all class method- and attribute definitions on “one screen.”

Flowchart

graph TD
    FPs[[EBNF<br/>Functional Parsers]]
    RakuAST[Raku::AST]
    RakuClass[Raku::Class]
    RakuCode[Raku::Code]
    RakuGrammar[Raku::Grammar]
    WLCode[WL::Code]
    WLGrammar[WL::Grammar]
    JavaFuncJ[Java::FuncJ]
    JavaANTLR[Java::ANTLR]
    Input[/- EBNF code<br/>- Properties/]
    PickTarget[Assign context]
    Parse[Parse]
    QEVAL{Evaluate?}
    EVAL[[EVAL]] 
    Code>Code]
    Context[Context object]
    Result{{Result}}
    Input --> PickTarget
    PickTarget -.- WL
    PickTarget -.- Java
    PickTarget -..- Raku
    PickTarget -.-> Context
    PickTarget --> Parse
    Parse -.- FPs
    Context -.-> FPs
    Parse --> QEVAL
    Parse -.-> Code
    QEVAL --> |yes|EVAL
    Code -.-> EVAL 
    EVAL ---> Result
    QEVAL ---> |no|Result
    subgraph Raku
        RakuAST
        RakuClass
        RakuCode
        RakuGrammar
    end    
    subgraph WL
        WLCode
        WLGrammar
    end
    subgraph Java
        JavaFuncJ
        JavaANTLR
    end

TODO

TODO Parsing EBNF refactoring & additional features
- DONE Parse any combination of sequence operators
  - Initially, only these were parsed:
    - 'a' <& 'b' <& 'c' | 'a' &> 'd';
    - 'a' , 'b' , 'c' | 'a' &> 'd';
  - These are also parsed:
    - 'a' , 'b' &> 'c'
    - 'a' <& 'b' &> 'c'
- DONE Class-based parsers
  - DONE From characters
  - DONE From tokens
- DONE Themed parsers
  - DONE Inheritance based implementation
  - DONE "Simpler"
  - DONE ANTLR / G4
  - DONE Whatever
- TODO "Named" tokens
  - '_?StringQ' or '_String'
  - '_WordString', '_LetterString', and '_IdentifierString'
  - '_?NumberQ' and '_?NumericQ'
  - '_Integer'
  - 'Range[*from*, *to*]'
TODO Interpreters of EBNF
- DONE Java
  - DONE "funcj.parser"
- TODO GraphViz
  - TODO DOT
- DONE MermaidJS
- TODO Scala
  - TODO built-in
  - TODO parsley
- MAYBE Python
- TODO Raku
  - DONE AST
  - DONE Class
  - DONE Code
  - DONE Grammar
  - TODO Tokenizer (of character sequences)
  - Other EBNF styles
- TODO WL
  - DONE FunctionalParsers, [AAp1, AAp2]
  - [P] TODO GrammarRules
    - Implemented to a point, not tested in WL.
  - Graph
    - Via MermaidJS classes.
TODO Translators
- TODO FPs code into EBNF
  - Very cool to have, but seems to be a lot of work.
- DONE Raku grammars to FPs
  - See the class "Grammar::TokenProcessing::Actions::EBNF" of the package "Grammar::TokenProcessing".
- DONE Stand-alone grammar-graph translation function.
  - fp-grammar-graph
TODO Extensions
- DONE First-matched alternation
  - The standard alterations parser is "longest alternation" (in Raku's terms.)
- TODO Extra parsers
  - DONE pInteger
  - DONE pNumber
  - DONE pWord
  - DONE pLetterWord
  - DONE pIdentifier
  - TODO pNumberRange
  - Other?
- TODO Zero-width assertions implementation
  - TODO Lookahead
  - TODO Lookbehind
DONE Random sentence generation
- DONE Basic class code
- DONE Preventing infinite recursion
- DONE "Named" tokens interpretation
  - '_?StringQ' or '_String'
  - '_WordString', '_LetterString', and '_IdentifierString'
  - '_?NumberQ' and '_?NumericQ'
  - '_Integer'
  - 'Range[*from*, *to*]'
TODO Documentation
- DONE README
- DONE Parser code generation
  - TODO Raku
    - DONE Class
    - TODO Code
    - DONE Grammar
  - DONE WL
    - DONE FunctionalParsers
    - DONE GrammarRules
  - TODO Java
- TODO Random sentences generation
- TODO Mermaid flowchart
- TODO Mermaid class diagram?
TODO Videos
- TODO Introduction
- TODO TRC-2023 presentation

References

Articles

[JF1] Jeroen Fokker, "Function Parsers", (1997), Conference: Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques-Tutorial Text. 10.1007/3-540-59451-5_1.

[WV1] Wim Vanderbauwhede, "List-based parser combinators in Haskell and Raku", (2020), Musings of an Accidental Computing Scientist at codeberg.page.