EBNF::Grammar Raku package
Introduction
Raku package for Extended Backus-Naur Form (EBNF) parsing and interpretation.
The grammar follows the description in the Wikipedia entry
"Extended Backus–Naur form", [Wk1],
which refers to the proposed ISO/IEC 14977 standard, by R. S. Scowen, page 7, table 1. [RS1, ISO1].
Motivation
The main motivation for this package is to have:
- Multiple EBNF styles parsed (quickly)
- Grammar generation for multiple languages
The motivation comes from the "need" to parse (and interpret) EBNF grammars
generated with Large Language Models (LLMs), like ChatGPT and PaLM. For more details see
"Incremental grammar enhancement".
I considered extending "Grammar::BNF",
but ultimately decided that "Grammar::BNF" needs too much refactoring for my purposes,
and, well, it is for BNF not EBNF.
Installation
From Zef ecosystem:
zef install EBNF::Grammar;
From GitHub:
zef install https://github.com/antononcube/Raku-EBNF-Grammar.git
Usage examples
Here is an EBNF grammar for integers and its interpretation into a Raku grammar:
use EBNF::Grammar;
my $ebnf = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<TOP> = <integer> ;
END
ebnf-interpret($ebnf);
# grammar EBNF_1702441429_8786073 {
# regex digit { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' }
# regex integer { <digit> <digit>* }
# regex TOP { <integer> }
# }
Here the obtained Raku grammar is evaluated and used to parse a few strings:
my $gr = ebnf-interpret($ebnf):eval;
.say for <212 89 9090>.map({ $gr.parse($_) });
# 「212」
# integer => 「212」
# digit => 「2」
# digit => 「1」
# digit => 「2」
# 「89」
# integer => 「89」
# digit => 「8」
# digit => 「9」
# 「9090」
# integer => 「9090」
# digit => 「9」
# digit => 「0」
# digit => 「9」
# digit => 「0」
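To see how such a grammar treats input that is not an integer, the generated grammar can be copied out by hand and tried directly. The following is a minimal sketch that does not use the package at all: the grammar body is copied from the interpretation output above, and the name `IntegerSketch` is made up for illustration.

```raku
# Hand-written copy of the grammar produced by ebnf-interpret above.
# The name IntegerSketch is hypothetical; the package generates its own name.
grammar IntegerSketch {
    regex digit   { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' }
    regex integer { <digit> <digit>* }
    regex TOP     { <integer> }
}

# .parse has to match the whole string, so input containing
# a non-digit character gives a failed (false) result.
for <212 89 12a> -> $s {
    my $m = IntegerSketch.parse($s);
    say $s, ' => ', $m ?? 'parsed' !! 'no match';
}
```

Checking the truth value of the result, as done here, is a simple way to guard code that feeds LLM-generated strings to a generated grammar.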
Random sentence generation
Random sentences of grammars given in EBNF can be generated with the help of the package
"Grammar::TokenProcessing", [AAp2].
Here is an EBNF grammar:
my $ebnfCode = q:to/END/;
<statement> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = [ 'really' ] , ( 'love' | 'hate' | { '♥️' } | '🤮' );
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
END
# <statement> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = [ 'really' ] , ( 'love' | 'hate' | { '♥️' } | '🤮' );
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
Here is the corresponding Raku grammar:
ebnf-interpret($ebnfCode, name=>'LoveHateProgLang');
grammar LoveHateProgLang {
regex statement { <who> <verb> <lang> }
regex who { 'I' | 'We' }
regex verb { 'really'? ['love' | 'hate' | '♥️'* | '🤮'] }
regex lang { 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' }
}
Here we generate random sentences:
use Grammar::TokenProcessing;
my $gr = ebnf-interpret($ebnfCode, name=>'LoveHateProgLang'):eval;
.say for random-sentence-generation($gr, '<statement>') xx 12;
# We really love Perl
# We hate Perl
# I really hate R
# We hate Python
# I really 🤮 Python
# I really ♥️ R
# We really love Perl
# I hate WL
# I hate Python
# I really ♥️ WL
# I love Julia
# I love Julia
CLI
The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:
ebnf-parse --help
# Usage:
# /Users/antonov/.rakubrew/versions/moar-2023.11/share/perl6/site/bin/ebnf-parse <ebnf> [-t|--target=<Str>] [--name|--parser-name=<Str>] [-s|--style=<Str>] -- Generates a parser code for a given EBNF grammar.
#
# <ebnf> EBNF text.
# -t|--target=<Str> Target. [default: 'Raku::Grammar']
# --name|--parser-name=<Str> Parser name. [default: 'Whatever']
# -s|--style=<Str> EBNF style, one of 'Standard', 'Inverted', 'Relaxed', or 'Whatever'. [default: 'Standard']
Implementation notes
- The first version of "EBNF::Grammar::Standardish" was generated with "FunctionalParsers", [AAp1], using the EBNF grammar (given in EBNF) in [Wk1].
- Refactored <term> (originally <pTERM>) into separate parenthesized, optional, and repeated specs.
  - This corresponds to the design in "FunctionalParsers".
- Tokens and regexes were renamed. (More concise, easier to read names.)
- Implemented the "relaxed" version of the standard EBNF.
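The idea of splitting a term into separate parenthesized, optional, and repeated specs can be sketched with a small stand-alone Raku grammar. This is only an illustration under stated assumptions: the grammar name `TermSketch` and all of its token names are made up here, it handles just the right-hand side of a rule, and it is not the actual code of "EBNF::Grammar::Standardish".

```raku
# Hypothetical sketch, not the package's grammar: each bracketed form
# of EBNF gets its own token instead of being folded into <term>.
grammar TermSketch {
    token TOP           { \s* <alternation> \s* }
    token alternation   { <sequence>+ % [ \s* '|' \s* ] }
    token sequence      { <term>+ % [ \s* ',' \s* ] }
    token term          { <parenthesized> | <optional> | <repeated> | <terminal> | <non-terminal> }
    token parenthesized { '(' \s* <alternation> \s* ')' }   # ( ... ) grouping
    token optional      { '[' \s* <alternation> \s* ']' }   # [ ... ] zero or one
    token repeated      { '{' \s* <alternation> \s* '}' }   # { ... } zero or more
    token terminal      { "'" <-[']>* "'" }
    token non-terminal  { '<' \w+ '>' }
}

my $spec = q:to/END/;
[ 'really' ] , ( 'love' | 'hate' | { 'heart' } )
END

my $m = TermSketch.parse($spec);

# For each top-level term, show which of the sub-tokens matched it.
for $m<alternation><sequence>[0]<term>.list -> $t {
    say $t.keys;
}
```

With separate tokens, an actions class can attach one method to each bracketed form (for example, mapping `optional` to `?` and `repeated` to `*` in the generated Raku regex) instead of branching inside a single `term` method.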
Comparison with other packages
The following table overviews the similarities and differences of this package
with the packages "FunctionalParsers" and "Grammar::TokenProcessing":
| Feature | FunctionalParsers | EBNF::Grammar | Grammar::TokenProcessing |
|---|---|---|---|
| Parsing EBNF: | | | ✔ |
| Standard | ✔ | ✔ | |
| Modified versions | ✔ | ✔ | |
| Whatever | ✔ | | |
| Automatic top rule determination | ✔ | | |
| Parsing Raku grammar: | | | ✔ |
| Pick left and pick right | ✔ | | |
| Skip element | | | ✔ |
| Automatic top rule determination | ✔ | | ✔ |
| Comprehensive quantifiers | | | ✔ |
| Interpretation: | ✔ | ✔ | |
| Raku grammar | ✔ | ✔ | |
| EBNF grammar (standard) | ✔ | | ✔ |
| WL grammar | ✔ | | |
| Java functional parsers | ✔ | | |
| Raku functional parsers | ✔ | | |
| Scala functional parsers | ✔ | | |
| WL functional parsers | ✔ | ✔ | |
| Random sentence generation | ✔ | | ✔ |
| CLI | ✔ | ✔ | ✔ |
Here are some additional clarifying points:
Since one of the motivations for "FunctionalParsers" and "EBNF::Grammar" is parsing and interpretation of EBNF
grammars derived with Large Language Models (LLMs), multiple EBNF variants have to be parsed.
- A Whatever parsing method would be of great convenience here.
It is envisioned that "EBNF::Grammar" will be completed with functionalities from "Grammar::TokenProcessing".
- (For example, random sentence generation.)
Both "FunctionalParsers" and "EBNF::Grammar" generate Functional Parsers (FPs) for other programming languages
because many languages have packages implementing FPs.
The interpretations to FPs of other programming languages (Java, Swift) with "EBNF::Grammar" will also be implemented.
In many cases parsing with "EBNF::Grammar" is much faster than with "FunctionalParsers".
- The conjecture that this would be the case was one of the motivations for implementing "EBNF::Grammar".
Cross-interfacing:
- The package "Grammar::TokenProcessing" can translate Raku grammars into EBNFs.
- Both "FunctionalParsers" and "EBNF::Grammar" can translate EBNFs into Raku grammars.
- "EBNF::Grammar" can generate parser classes that utilize the FPs of "FunctionalParsers".
The following diagram summarizes the relationships (and implied workflows) in the comparison table
and clarification points above:
graph TD
EBNF>EBNF]
RakuGrammar>"Raku grammar"]
FPClass>"Functional parsers class<br/>(grammar)"]
FPs[[FunctionalParsers::EBNF]]
FPsEBNFMmdGraph[[FunctionalParsers::EBNF::Actions::MermaidJS::Graph]]
FPsEBNFWLGraph[[FunctionalParsers::EBNF::Actions::WL::Graph]]
EBNFGram[[EBNF::Grammar]]
GT[[Grammar::TokenProcessing]]
RS>Random sentences]
RakuAST>Raku AST]
MmdGraph>Mermaid JS<br>graph]
WLGraph>Mathematica/WL<br>graph]
EBNF --> FPs
EBNF --> EBNFGram
EBNFGram --> |ebnf-interpret|FPClass
EBNFGram --> |ebnf-grammar-graph|RakuAST
FPs --> |fp-ebnf-parse|FPClass
GT --> |random-sentence-generation|RS
FPClass --> |fp-random-sentence|RS
FPs --> |fp-ebnf-parse|RakuAST
RakuAST --> |fp-grammar-graph|FPsEBNFMmdGraph
FPsEBNFMmdGraph --> MmdGraph
RakuAST --> |fp-grammar-graph|FPsEBNFWLGraph
FPsEBNFWLGraph --> WLGraph
EBNFGram --> |ebnf-interpret|RakuGrammar
FPs --> |fp-ebnf-interpret|RakuGrammar
RakuGrammar --> GT
TODO
- TODO Parsing of EBNF
- DONE Parse apply function, <@
- TODO Sequence-pick-left, <&
- TODO Sequence-pick-right, &>
- TODO "Named" tokens
- '_?StringQ' or '_String'
- '_WordString', '_LetterString', and '_IdentifierString'
- '_?NumberQ' and '_?NumericQ'
- '_Integer'
- 'Range[*from*, *to*]'
- TODO Interpreters of EBNF
- TODO Java
- TODO Mermaid JS
- DONE Simple
- TODO Proper
- Most likely, via "FunctionalParsers"
- TODO Scala
- MAYBE Python
- TODO Raku
- DONE Grammar
- DONE FunctionalParsers
- TODO MermaidJS
- Other EBNF styles
- TODO WL
- DONE FunctionalParsers, [AAp1, AAp2]
- TODO GrammarRules
- DONE Implement grammar-graph translator
- DONE CLI
References
Articles
[Wk1] Wikipedia entry, "Extended Backus–Naur form".
[RS1] Roger S. Scowen: Extended BNF — A generic base standard. Software Engineering Standards Symposium 1993.
[ISO1] ISO/IEC 14977:1996.
Packages, repositories
[AAp1] Anton Antonov, FunctionalParsers Raku package, (2023), GitHub/antononcube.
[AAp2] Anton Antonov, Grammar::TokenProcessing Raku package, (2022-2023), GitHub/antononcube.