Rand Stats

EBNF::Grammar

zef:antononcube

EBNF::Grammar Raku package

Introduction

Raku package for Extended Backus-Naur Form (EBNF) parsing and interpretation.

The grammar follows the description of the Wikipedia entry "Extended Backus–Naur form", [Wk1], which refers to the proposed ISO/IEC 14977 standard, by R. S. Scowen, page 7, table 1. [RS1, ISO1].

Motivation

The main motivation for this package is to have:

  1. Multiple EBNF styles parsed (quickly)
  2. Grammar generation for multiple languages

The motivation comes from the the "need" to parse (and interpret) EBNF grammars generated with Large Language Models (LLMs), like ChatGPT and PaLM. For more details see "Incremental grammar enhancement".

I considered extending "Grammar::BNF", but ultimately decided that "Grammar::BNF" needs too much refactoring for my purposes, and, well, it is for BNF not EBNF.


Installation

From Zef ecosystem:

zef install EBNF::Grammar;

From GitHub:

zef install https://github.com/antononcube/Raku-EBNF-Grammar.git

Usage examples

Here is an EBNF grammar for integers and its interpretation into a Raku grammar:

use EBNF::Grammar;

my $ebnf = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<TOP> = <integer> ;
END

ebnf-interpret($ebnf);
# grammar EBNF_1702441429_8786073 {
# 	regex digit { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' }
# 	regex integer { <digit> <digit>* }
# 	regex TOP { <integer> }
# }

Here the obtained Raku grammar is evaluated and used to do a few parsings:

my $gr = ebnf-interpret($ebnf):eval;

.say for <212 89 9090>.map({ $gr.parse($_) });
# 「212」
#  integer => 「212」
#   digit => 「2」
#   digit => 「1」
#   digit => 「2」
# 「89」
#  integer => 「89」
#   digit => 「8」
#   digit => 「9」
# 「9090」
#  integer => 「9090」
#   digit => 「9」
#   digit => 「0」
#   digit => 「9」
#   digit => 「0」

Random sentence generation

Random sentences of grammars given in EBNF can be generated with additional help of the package "Grammar::TokenProcessing", [AAp2].

Here is an EBNF grammar:

my $ebnfCode = q:to/END/;
<statement> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = [ 'really' ] , ( 'love' | 'hate' | { '♥️' } | '🤮' );
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ; 
END
# <statement> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = [ 'really' ] , ( 'love' | 'hate' | { '♥️' } | '🤮' );
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;

Here is the corresponding Raku grammar:

ebnf-interpret($ebnfCode, name=>'LoveHateProgLang');
grammar LoveHateProgLang {
	regex statement { <who> <verb> <lang> }
	regex who { 'I' | 'We' }
	regex verb { 'really'? ['love' | 'hate' | '♥️'* | '🤮'] }
	regex lang { 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' }
}

Here we generate random sentences:

use Grammar::TokenProcessing;

my $gr = ebnf-interpret($ebnfCode, name=>'LoveHateProgLang'):eval;

.say for random-sentence-generation($gr, '<statement>') xx 12;
# We really love Perl
# We hate Perl
# I really hate R
# We hate Python
# I really 🤮 Python
# I really ♥️ R
# We really love Perl
# I hate WL
# I hate Python
# I really ♥️ WL
# I love Julia
# I love Julia

CLI

The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:

ebnf-parse --help
# Usage:
#   /Users/antonov/.rakubrew/versions/moar-2023.11/share/perl6/site/bin/ebnf-parse <ebnf> [-t|--target=<Str>] [--name|--parser-name=<Str>] [-s|--style=<Str>] -- Generates a parser code for a given EBNF grammar.
#   
#     <ebnf>                        EBNF text.
#     -t|--target=<Str>             Target. [default: 'Raku::Grammar']
#     --name|--parser-name=<Str>    Parser name. [default: 'Whatever']
#     -s|--style=<Str>              EBNF style, one of 'Standard', 'Inverted', 'Relaxed', or 'Whatever'. [default: 'Standard']

Implementation notes

  1. The first version of "EBNF::Grammar::Standardish" was generated with "FunctionalParsers", [AAp1], using the EBNF grammar (given in EBNF) in [Wk1].
  2. Refactored <term> (originally <pTERM>) into separate parenthesized, optional, and repeated specs.
    • This corresponds to the design in "FunctionalParsers".
  3. Tokens and regexes were renamed. (More concise, easier to read names.)
  4. Implemented the "relaxed" version of the standard EBNF.

Comparison with other packages

The following table overviews the similarities and differences of this package with the packages "FunctionalParsers" and "Grammar::TokenProcessing":

FeatureFunctionalParsersEBNF::GrammarGrammar::TokenProcessing
Parsing EBNF:
Standard
Modified versions
Whatever
Automatic top rule determination
Parsing Raku grammar:
Pick left and pick right
Skip element
Automatic top rule determination
Comprehensive quantifiers
Interpretation:
Raku grammar
EBNF grammar (standard)
WL grammar
Java functional parsers
Raku functional parsers
Scala functional parsers
WL functional parsers
Random sentence generation
CLI

Here are some additional- and clarification points:

The following diagram summarizes the relationships (and implied workflows) in the comparison table and clarification points above:

graph TD
    EBNF>EBNF]
    RakuGrammar>"Raku grammar"]
    FPClass>"Functional parsers class<br/>(grammar)"]
    FPs[[FunctionalParsers::EBNF]]
    FPsEBNFMmdGraph[[FunctionalParsers::EBNF::Actions::MermaidJS::Graph]]
    FPsEBNFWLGraph[[FunctionalParsers::EBNF::Actions::WL::Graph]]
    EBNFGram[[EBNF::Grammar]]
    GT[[Grammar::TokenProcessing]]
    RS>Random sentences]
    RakuAST>Raku AST]
    MmdGraph>Mermaid JS<br>graph]
    WLGraph>Mathematica/WL<br>graph]
    EBNF --> FPs 
    EBNF --> EBNFGram
    EBNFGram --> |ebnf-interpret|FPClass
    EBNFGram --> |ebnf-grammar-graph|RakuAST
    FPs --> |fp-ebnf-parse|FPClass
    GT --> |random-sentence-generation|RS
    FPClass --> |fp-random-sentence|RS
    FPs --> |fp-ebnf-parse|RakuAST
    RakuAST --> |fp-grammar-graph|FPsEBNFMmdGraph
    FPsEBNFMmdGraph --> MmdGraph
    RakuAST --> |fp-grammar-graph|FPsEBNFWLGraph
    FPsEBNFWLGraph --> WLGraph
    EBNFGram --> |ebnf-interpret|RakuGrammar
    FPs --> |fp-ebnf-interpret|RakuGrammar
    RakuGrammar --> GT

TODO


References

Articles

[Wk1] Wikipedia entry, "Extended Backus–Naur form".

[RS1] Roger S. Scowen: Extended BNF — A generic base standard. Software Engineering Standards Symposium 1993.

[ISO1] ISO/IEC 14977:1996.

Packages, repositories

[AAp1] Anton Antonov, FunctionParsers Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, Grammar::TokenProcessing Raku package, (2022-2023), GitHub/antononcube.