Raku DSL::English::DataAcquisitionWorkflows


In brief
This Raku (Perl 6) package has grammar classes and action classes for parsing and
interpreting natural language commands that specify Data Acquisition (DA) workflows.
The interpreters (actions) are intended to target different
programming languages: R, Mathematica, Python, etc.
This mind-map shows the conversational agent components this grammar addresses:

This org-mode file is used to track the project's progress.
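The split between grammar classes and per-language action classes can be illustrated with a toy sketch. Everything named here (`ToyAcquisitionCommand`, `ToyWLActions`, `ToyRActions`, `to-toy-code`, and the `Kind` column) is hypothetical and is not part of this package; the sketch only assumes standard Raku grammars and covers a single command shape.

```raku
# Toy grammar (hypothetical): recognizes only "how many times I acquired <kind> data".
grammar ToyAcquisitionCommand {
    rule  TOP       { 'how' 'many' 'times' 'I' 'acquired' <data-kind> 'data' }
    token data-kind { \w+ }
}

# One actions class per target language; both reuse the same grammar.
class ToyWLActions {
    method TOP($/) {
        # The "Kind" column is an assumption made for this illustration.
        make 'Length[Select[dsDataAcquisitions, #Kind == "' ~ $<data-kind> ~ '" &]]'
    }
}

class ToyRActions {
    method TOP($/) {
        make 'nrow(dplyr::filter(dsDataAcquisitions, Kind == "' ~ $<data-kind> ~ '"))'
    }
}

# Dispatch on the target name, parse, and return the generated code (or Nil).
sub to-toy-code(Str $command, Str $target) {
    my %actions = 'WL' => ToyWLActions, 'R' => ToyRActions;
    die "Unknown target: $target" unless %actions{$target}:exists;
    my $match = ToyAcquisitionCommand.parse($command, actions => %actions{$target}.new);
    return $match ?? $match.made !! Nil;
}

say to-toy-code('how many times I acquired anatomical data', 'WL');
say to-toy-code('how many times I acquired anatomical data', 'R');
```

Keeping the grammar fixed and swapping only the actions class is what lets one parsed command be rendered for several target languages.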
Installation
From the Zef ecosystem:
zef install DSL::English::DataAcquisitionWorkflows
From GitHub:
zef install https://github.com/antononcube/Raku-DSL-Shared.git
zef install https://github.com/antononcube/Raku-DSL-Entity-English-Metadata.git
zef install https://github.com/antononcube/Raku-DSL-English-DataAcquisitionWorkflows.git
Examples
Here is an introspection query command interpreted into Wolfram Language (WL) code:
use DSL::English::DataAcquisitionWorkflows;
ToDataAcquisitionWorkflowCode('How many times I acquired anatomical structure data last year', "WL-Ecosystem")
# Length[dsDataAcquisitions[Select[AbsoluteTime[DateObject["2025-01-01"]] <= AbsoluteTime[#Timestamp] <= AbsoluteTime[DateObject["2025-12-31"]]&]]]
Recommend by profile:
ToDataAcquisitionWorkflowCode('recommend bike store and collection page data', "WL-Ecosystem")
# smrDataAcquisitions \[DoubleLongRightArrow]
# SMRMonRecommendByProfile[ {"ColumnHeading:BikeStore", "ColumnHeading:CollectionPage"}] \[DoubleLongRightArrow]
# SMRMonJoinAcross["Warning"->False] \[DoubleLongRightArrow]
# SMRMonTakeValue[]
TODO
Currently, the package does not parse or interpret the following commands; future versions are expected to.
Here is a metadata query command to be interpreted into Wolfram Language (WL) code:
ToDataAcquisitionWorkflowCode('how many datasets contain both categorical and numerical columns', "WL-Ecosystem")
General recommendation request:
ToDataAcquisitionWorkflowCode(
"what data can I get for time series investigations?;
why did you recommend those",
"WL-Ecosystem");
Recommendation request with subsequent filtering:
ToDataAcquisitionWorkflowCode(
"I want to investigate data that cross references good purchases with customer demographics;
keep only datasets that can be transformed to star schema",
"WL-Ecosystem");
Data quality verification specification:
ToDataAcquisitionWorkflowCode(
"verify the quality of the database dbGJ99;
what fraction of records have missing data;
what are the distributions of the numerical columns",
"WL-Ecosystem");
Here is a more complicated statistics pipeline specification:
ToDataAcquisitionWorkflowCode(
"how many people used customer service data last month;
what is the breakdown of data sources over data types;
where textual data is utilized the most;
plot the results;",
"R-tidyverse")
Here is a recommendation specification (by collaborative filtering):
ToDataAcquisitionWorkflowCode(
"what data people like me acquired last month;
which of those I can use for classifier investigations;
show me the data sizes and metadata;",
"WL-Ecosystem")
Implementation notes
The general structure of this package and the design of its grammar (and sub-grammars) are analogous
to the structure and grammars of the Raku package
DSL::English::FoodPreparationWorkflows,
[AAr3].
The original versions of the grammars were generated using Mathematica.
See the notebook "Data-Acquisition-Workflows-grammar-generation.nb".
References
Repositories
[AAr1] Anton Antonov,
DSL::Shared Raku package,
(2020),
GitHub/antononcube.
[AAr2] Anton Antonov,
DSL::Entity::Metadata Raku package,
(2021),
GitHub/antononcube.
[AAr3] Anton Antonov,
DSL::English::FoodPreparationWorkflows Raku package,
(2021),
GitHub/antononcube.
Videos
[AAv1] Anton Antonov,
"Multi-language Data Acquisition Conversational Agent (extended version)",
(2021),
YouTube.com.