DSL::English::DataAcquisitionWorkflows

zef:antononcube

Raku DSL::English::DataAcquisitionWorkflows

MacOS Linux Win64

In brief

This Raku (Perl 6) package provides grammar classes and action classes for parsing and interpreting natural language commands that specify Data Acquisition (DA) workflows.

The interpreters (actions) are intended to target different programming languages: R, Mathematica, Python, etc.
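To illustrate the grammar-plus-actions split described above, here is a minimal sketch of one grammar shared by two actions classes, each emitting code for a different target language. Everything in it is hypothetical and for illustration only: `ToyCommand`, `ToyActionsWL`, `ToyActionsR`, `toy-interpret`, and the `data` and `Topic` names in the generated strings are assumptions, not the package's actual grammar, actions, or output.

```raku
# Toy grammar (hypothetical; not the package's actual grammar).
# It accepts only commands of the form "recommend <topic> data".
grammar ToyCommand {
    rule TOP { 'recommend' <topic> 'data' }
    token topic { \w+ }
}

# One actions class per target language; both reuse the same grammar.
class ToyActionsWL {
    method topic($/) { make $/.Str }
    method TOP($/)   { make 'Select[data, #Topic == "' ~ $<topic>.made ~ '" &]' }
}

class ToyActionsR {
    method topic($/) { make $/.Str }
    method TOP($/)   { make 'dplyr::filter(data, Topic == "' ~ $<topic>.made ~ '")' }
}

# Parse a command and return the generated code, or Nil if it does not parse.
sub toy-interpret(Str $command, $actions) {
    my $match = ToyCommand.parse($command, :$actions);
    return $match ?? $match.made !! Nil;
}

say toy-interpret('recommend bike data', ToyActionsWL);
say toy-interpret('recommend bike data', ToyActionsR);
```

Keeping the grammar separate from the actions means a new target language needs only a new actions class; the parsing rules stay untouched.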

This mind-map shows the conversational agent components this grammar addresses:

MindMap

This org-mode file is used to track the project's progress.


Installation

From the Zef ecosystem:

zef install DSL::English::DataAcquisitionWorkflows

From GitHub:

zef install https://github.com/antononcube/Raku-DSL-Shared.git
zef install https://github.com/antononcube/Raku-DSL-Entity-English-Metadata.git
zef install https://github.com/antononcube/Raku-DSL-English-DataAcquisitionWorkflows.git

Examples

Here is the interpretation of an introspection query command into Wolfram Language (WL) code:

use DSL::English::DataAcquisitionWorkflows;
ToDataAcquisitionWorkflowCode('How many times I acquired anatomical structure data last year', "WL-Ecosystem")
# Length[dsDataAcquisitions[Select[AbsoluteTime[DateObject["2025-01-01"]] <= AbsoluteTime[#Timestamp] <= AbsoluteTime[DateObject["2025-12-31"]]&]]]
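The generated code above contains absolute dates, so the relative phrase "last year" is evidently resolved into a concrete date range at interpretation time. A minimal sketch of such a resolution follows; the function name `last-year-range` is hypothetical, and the logic is an assumption about how this could be done, not the package's actual implementation.

```raku
# Hypothetical helper: resolve "last year" relative to a reference date.
sub last-year-range(Date $today = Date.today) {
    my $year = $today.year - 1;
    return Date.new($year, 1, 1), Date.new($year, 12, 31);
}

# With a reference date in 2026 the range matches the dates in the output above.
my ($from, $to) = last-year-range(Date.new(2026, 3, 15));
say "$from .. $to";   # 2025-01-01 .. 2025-12-31
```

Because the range depends on the reference date, running the same command in a different year yields different dates in the generated code.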

Recommend by profile:

ToDataAcquisitionWorkflowCode('recommend bike store and collection page data', "WL-Ecosystem")
# smrDataAcquisitions \[DoubleLongRightArrow]
# SMRMonRecommendByProfile[ {"ColumnHeading:BikeStore", "ColumnHeading:CollectionPage"}] \[DoubleLongRightArrow]
# SMRMonJoinAcross["Warning"->False] \[DoubleLongRightArrow]
# SMRMonTakeValue[]

TODO

Currently, the package does not parse or interpret the following commands; future versions are expected to.

Here is a dataset contents query command to be interpreted into Wolfram Language (WL) code:

ToDataAcquisitionWorkflowCode('how many datasets contain both categorical and numerical columns', "WL-Ecosystem")

General recommendation request:

ToDataAcquisitionWorkflowCode(
    "what data can I get for time series investigations?;
     why did you recommend those",
    "WL-Ecosystem");

Recommendation request with subsequent filtering:

ToDataAcquisitionWorkflowCode(
    "I want to investigate data that cross references good purchases with customer demographics
     keep only datasets that can be transformed to star schema",
    "WL-Ecosystem");

Data quality verification specification:

ToDataAcquisitionWorkflowCode(
    "verify the quality of the database dbGJ99;
     what fraction of records have missing data;
     what are the distributions of the numerical columns",
    "WL-Ecosystem");

Here is a more complicated statistics pipeline specification:

ToDataAcquisitionWorkflowCode(
    "how many people used customer service data last month;
     what is the breakdown of data sources over data types;
     where textual data is utilized the most;
     plot the results;", 
    "R-tidyverse")

Here is a recommendation specification (by collaborative filtering):

ToDataAcquisitionWorkflowCode(
    "what data people like me acquired last month;
     which of those I can use for classifier investigations;
     show me the data sizes and metadata;", 
    "WL-Ecosystem")

Implementation notes

The general structure of this package and the design of its grammar (and sub-grammars) are analogous to the structure and grammars of the Raku package DSL::English::FoodPreparationWorkflows, [AAr3].

The original versions of the grammars were generated using Mathematica. See the notebook "Data-Acquisition-Workflows-grammar-generation.nb".


References

Repositories

[AAr1] Anton Antonov, DSL::Shared Raku package, (2020), GitHub/antononcube.

[AAr2] Anton Antonov, DSL::Entity::Metadata Raku package, (2021), GitHub/antononcube.

[AAr3] Anton Antonov, DSL::English::FoodPreparationWorkflows Raku package, (2021), GitHub/antononcube.

Videos

[AAv1] Anton Antonov, "Multi-language Data Acquisition Conversational Agent (extended version)", (2021), YouTube.com.