Rand Stats

FunctionalParsers

zef:antononcube

FunctionalParsers Raku package

Introduction

This Raku package provides a (monadic) system of Functional Parsers (FPs).

The package design and implementation follow closely the article "Functional parsers" by Jeroen Fokker, [JF1]. That article can be used as both a theoretical- and a practical guide to FPs.

Two in one

The package provides both FPs and Extended Backus-Naur Form (EBNF) parsers and interpreters. The reasons for including the EBNF functionalities are:

Previous work

Anton Antonov

Remark: In this document Mathematica and Wolfram Language (WL) are used as synonyms.

Jeroen Fokker

Wim Vanderbauwhede


Installation

From Zef ecosystem:

zef install FunctionalParsers;

From GitHub:

zef install https://github.com/antononcube/Raku-FunctionalParsers.git

Motivation

Here is a list of motivations for implementing this package:

  1. Word-based backtracking
  2. Elevate the "tyranny" of Raku grammars
  3. Easier transfer of Raku grammars into other languages
  4. Monadic parser construction
  5. Quick, elegant implementation

Word-based backtracking

I had certain assumptions about certain slow parsing with Raku using regexes. For example, is not that easy to specify backtracking over sequences of words (instead of characters) in grammars. To check my assumptions I wrote the basic eight FPs (which is quick to do.) After my experiments, I could not help myself making a complete package.

Elevate the "tyranny" of Raku grammars and transferring to other languages

The "first class citizen" treatment of grammars is one of the most important and unique features of Raku. It is one of the reason why I treat Raku as a "secret weapon."

But that uniqueness does not necessarily facilitate easy utilization or transfer in large software systems. FPs, on the other hand, are implemented in almost every programming language. Hence, making or translating grammars with- or to FPs would provide greater knowledge transfer and integration of Raku-derived solutions.

Monadic parser construction

Having a monadic way of building parsers or grammars is very appealing. (To some people.) Raku's operator making abilities can be nicely utilized.

Remark: The monad of FPs produces Abstract Syntax Trees (ASTs) that are simple lists. I prefer that instead of using specially defined types (as, say, in [WV1, WVp1].) That probably, comes from too much usage of LISP-family programming languages. (Like Mathematica and R.)

Quick, elegant implementation

The Raku code implementing FPs was quick to write and looks concise and elegant.

(To me at least. I would not be surprised if that code can be simplified further.)


Naming considerations

Package name

I considered names like "Parser::Combinator", "Parser::Functional", etc. Of course, looked up names of similar packages.

Ultimately, I decided to use "FunctionalParsers" because:

Actions vs Contexts

I considered to name the directory with EBNF interpreters "Context" or "Contexts", but since "Actions" is used a lot I chose that name.

Remark: In [JF1] the term "contexts" is used.


Examples

Make a parser for a family of (two) simple sentences:

use FunctionalParsers :ALL;

my &p1 = (symbol('numerical') «|» symbol('symbolic')) «&» symbol('integration');
# -> @x { #`(Block|6352212853176) ... }

Here we parse sentences adhering to the grammar of the defined parser:

.say for ("numerical integration", "symbolic integration")>>.words.map({ $_ => &p1($_)});
# (numerical integration) => ((() (numerical integration)))
# (symbolic integration) => ((() (symbolic integration)))

These sentences are not parsed:

("numeric integration", "symbolic summation")>>.words.map({ $_ => &p1($_)});
# ((numeric integration) => () (symbolic summation) => ())

Infix operators

Several notation alternatives are considered for the infix operations corresponding to the different combinators and transformers. Here is a table with different notation styles:

Descriptionsetdoublen-ary
sequential combination(&)«&»
left sequential pick(<&)«&
right sequential pick(&>)
alternatives combination(⎸)«⎸»
function application(^)«o

Consider the parsers:

my &p1 = apply( {1}, symbol('one'));
my &p2 = apply( {2}, symbol('two'));
my &p3 = apply( {3}, symbol('three'));
my &p4 = apply( {4}, symbol('four'));
my &pM = symbol('million');
my &pTh = symbol('things');
# -> @x { #`(Block|6352245759584) ... }

Here are spec examples for each style of infix operators:

# set
my &p = (&p1 (|) &p2 (|) &p3 (|) &p4) (&) (&pM (^) {10**6}) (&) &pTh;
&p('three million things'.words.List).head.tail;
# (3 (1000000 things))
# double 
(&p1 «|» &p2 «|» &p3 «|» &p4) «&» &pM «o {10**6} «&» &pTh;
# n-ary
(&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4) ⨂ {10**6} ⨀ &pM ⨂ &pTh

Remark: The arguments of the apply operator are "reversed" when compared to the arguments of the operators (^) and «o. For the function to be applied is the first argument.


Parser generation

Here is an EBNF grammar:

my $ebnfCode = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<top> = <integer> ;
END
# <digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
# <integer> = <digit> , { <digit> } ;
# <top> = <integer> ;

Here generation is the corresponding functional parsers code:

use FunctionalParsers::EBNF;
.say for fp-ebnf-parse($ebnfCode, actions => 'Raku::Code').head.tail;
# my &pDIGIT = alternatives(symbol('0'), symbol('1'), symbol('2'), symbol('3'), symbol('4'), symbol('5'), symbol('6'), symbol('7'), symbol('8'), symbol('9'));
# my &pINTEGER = sequence(&pDIGIT, many(&pDIGIT));
# my &pTOP = &pINTEGER;

For more detailed examples see "Parser-code-generation.md".


Random sentence generation

Here is an EBNF grammar:

my $ebnfCode2 = q:to/END/;
<top> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = 'love' | 'hate' | { '♥️' } | '🤮';
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ; 
END
# <top> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = 'love' | 'hate' | { '♥️' } | '🤮';
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;

Here is generation of random sentences with the grammar above:

.say for fp-random-sentence($ebnfCode2, 12);
# We love R
# We hate WL
# I love Perl
# I hate Julia
# We 🤮 R
# We  R
# We  Julia
# I hate R
# We ♥️ WL
# I ♥️ ♥️ WL
# We 🤮 WL
# We love Julia

Generating Mermaid diagrams for EBNFs

The function fp-ebnf-parse can produce Mermaid-JS diagrams corresponding to grammars with the target "MermaidJS::Graph". Here is an example:

my $ebnfCode3 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ];
<b> = 'b' , ( 'B' | '2' );
END

fp-ebnf-parse($ebnfCode3, target=>"MermaidJS::Graph", dir-spec => 'LR').head.tail
graph LR
	alt1((or))
	NT:top["top"]
	seq12((and))
	T:A("A")
	NT:b["b"]
	T:B("B")
	T:2("2")
	T:a("a")
	NT:a["a"]
	T:b("b")
	rep7((*))
	alt14((or))
	opt9((?))
	T:1("1")
	seq5((and))
	alt1 --> NT:a
	alt1 --> NT:b
	NT:top --> alt1
	rep7 --> T:A
	opt9 --> T:1
	seq5 --> |1|T:a
	seq5 --> |2|rep7
	seq5 --> |3|opt9
	NT:a --> seq5
	alt14 --> T:B
	alt14 --> T:2
	seq12 --> |1|T:b
	seq12 --> |2|alt14
	NT:b --> seq12

Here is a legend:

Remark: The Markdown cell above has the parameters result=asis, output-lang=mermaid, output-prompt=NONE which allow for direct diagram rendering of the obtained Mermaid code in various Markdown viewers (GitHub, IntelliJ, etc.)

Compare the following EBNF grammar and corresponding diagram with the ones above:

my $ebnfCode4 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ] ;
<b> = 'b' , 'B' | '2' ;
END

fp-grammar-graph($ebnfCode4, dir-spec => 'LR')
graph LR
	T:A("A")
	rep7((*))
	NT:a["a"]
	T:B("B")
	T:2("2")
	seq5((and))
	alt12((or))
	NT:top["top"]
	NT:b["b"]
	T:a("a")
	seq13((and))
	alt1((or))
	opt9((?))
	T:b("b")
	T:1("1")
	alt1 --> NT:a
	alt1 --> NT:b
	NT:top --> alt1
	rep7 --> T:A
	opt9 --> T:1
	seq5 --> |1|T:a
	seq5 --> |2|rep7
	seq5 --> |3|opt9
	NT:a --> seq5
	seq13 --> |1|T:b
	seq13 --> |2|T:B
	alt12 --> seq13
	alt12 --> T:2
	NT:b --> alt12

CLI

The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:

fp-ebnf-parse --help
# Usage:
#   fp-ebnf-parse <ebnf> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar.
#   fp-ebnf-parse <file> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar file.
#   
#     <ebnf>                           EBNF text.
#     -t|--actions=<Str>               Actions ('t' for 'target'.) [default: 'Raku::Class']
#     -n|--parser-name=<Str>           Parser name. [default: 'MyParser']
#     -p|--rule-name-prefix=<Str>      Rule names prefix. [default: 'p']
#     -m|--rule-name-modifier=<Str>    Rule names modifier. [default: 'WhateverCode']
#     -s|--style=<Str>                 EBNF style, one of 'G4', 'Inverted', 'Standard', 'Relaxed', or 'Whatever'. [default: 'Whatever']
#     <file>                           EBNF file name.

If mermaid-cli is installed here is an example UNIX shell pipeline with it:

fp-ebnf-parse ./resources/Arithmetic.ebnf -s=relaxed -t=mermaid > diag.md && mmdc -i diag.md -o diag.png -w 1200 && open diag.png

Implementation considerations

Infix operators

The infix operators have to be reviewed and probably better sets of symbols would be chosen. The challenge is to select operators that are "respected" by the typical Raku IDEs. (I only experimented with Emacs and Comma IDE.)

EBNF parser

All EBNF parser functions in FunctionalParsers::EBNF have apply-transformers that use the attributes of a dedicated object:

unit module FunctionalParsers::EBNF;
...
our $ebnfActions = FunctionalParsers::EBNF::Actions::Raku::AST.new;
....

By assigning instances of different classes to $ebnfActions we get different parsing interpretations.

Not having abstract class

Looking at the Raku EBNF interpreter classes it can be easily seen that each can inherit from a common abstract class. But since the EBNF parsing methods (or attributes that callables) are approximately a dozen one-liners, it seems more convenient to have all class method- and attribute definitions on “one screen.”

Flowchart

graph TD
    FPs[[EBNF<br/>Functional Parsers]]
    RakuAST[Raku::AST]
    RakuClass[Raku::Class]
    RakuCode[Raku::Code]
    RakuGrammar[Raku::Grammar]
    WLCode[WL::Code]
    WLGrammar[WL::Grammar]
    JavaFuncJ[Java::FuncJ]
    JavaANTLR[Java::ANTLR]
    Input[/- EBNF code<br/>- Properties/]
    PickTarget[Assign context]
    Parse[Parse]
    QEVAL{Evaluate?}
    EVAL[[EVAL]] 
    Code>Code]
    Context[Context object]
    Result{{Result}}
    Input --> PickTarget
    PickTarget -.- WL
    PickTarget -.- Java
    PickTarget -..- Raku
    PickTarget -.-> Context
    PickTarget --> Parse
    Parse -.- FPs
    Context -.-> FPs
    Parse --> QEVAL
    Parse -.-> Code
    QEVAL --> |yes|EVAL
    Code -.-> EVAL 
    EVAL ---> Result
    QEVAL ---> |no|Result
    subgraph Raku
        RakuAST
        RakuClass
        RakuCode
        RakuGrammar
    end    
    subgraph WL
        WLCode
        WLGrammar
    end
    subgraph Java
        JavaFuncJ
        JavaANTLR
    end

TODO


References

Articles

[JF1] Jeroen Fokker, "Function Parsers", (1997), Conference: Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques-Tutorial Text. 10.1007/3-540-59451-5_1.

[WV1] Wim Vanderbauwhede, "List-based parser combinators in Haskell and Raku", (2020), Musings of an Accidental Computing Scientist at codeberg.page.

Packages, paclets, repositories

[AAp1] Anton Antonov, "FunctionalParsers.m", (2014), MathematicaForPrediction at GitHub.

[AAp2] Anton Antonov, "FunctionalParsers" WL paclet, (2023), Wolfram Language Paclet Repository.

[WVp1] Wim Vanderbauwhede, List-based parser combinator library in Raku, (2020), GitHub/wimvanderbauwhede.

[WVp2] Wim Vanderbauwhede, Parser::Combinators Perl package, (2013-2015), GitHub/wimvanderbauwhede.