Rand Stats

DSL::Entity::MachineLearning

zef:antononcube

DSL::Entity::MachineLearning (Raku package)

Raku grammar classes for Machine Learning (ML) entities (names.)

The package does entity name recognition using regexes over a set of hashes (dictionaries) that map entity name phrases into entity Identifiers (IDs). The hashes are obtained from the resource files.

In short, we say that the package has grammar-resource-based architecture. The same architecture is used in the Domain Specific Language (DSL) entity packages [AAr3-AAr6].

Remark: It is assumed that the associations between entity name phrases and entity IDs placed in the resource files are going to be changed in the future, because of classifier systems updates, or usage feedback. This is one of the main reasons to use grammar-resource-based architecture: subsequent package versions would have better, fuller associations.


Installation

zef install https://github.com/antononcube/Raku-DSL-Entity-MachineLearning.git

Usage examples

use DSL::Entity::MachineLearning;
use DSL::Entity::MachineLearning::ResourceAccess;

my $pCOMMAND = DSL::Entity::MachineLearning::Grammar;
$pCOMMAND.set-resources(DSL::Entity::MachineLearning::resource-access-object());

say $pCOMMAND.parse('DecisionTree', rule => 'machine-learning-entity-command');
say $pCOMMAND.parse('gradient boosted trees', rule => 'machine-learning-entity-command');
say $pCOMMAND.parse('roc curve', rule => 'machine-learning-entity-command');
# 「DecisionTree」
#  classifier-entity-command => 「DecisionTree」
#   entity-classifier-name => 「DecisionTree」
#    0 => 「DecisionTree」
#     word-value => 「DecisionTree」
# 「gradient boosted trees」
#  classifier-entity-command => 「gradient boosted trees」
#   entity-classifier-name => 「gradient boosted trees」
#    0 => 「gradient boosted trees」
#     word-value => 「gradient」
#     word-value => 「boosted」
#     word-value => 「trees」
# 「roc curve」
#  classifier-measurement-entity-command => 「roc curve」
#   entity-classifier-measurement-name => 「roc curve」
#    0 => 「roc curve」
#     word-value => 「roc」
#     word-value => 「curve」

Command line interface

The package provide as Command Line Interface (CLI) to its functionalities:

> ToMachineLearningEntityCode --help 
# Usage:
#   ToMachineLearningEntityCode <command> [--target=<Str>] [--user=<Str>] -- Conversion of (natural) DSL machine learning entity name into code.
#   ToMachineLearningEntityCode <target> <command> [--user=<Str>] -- Both target and command as arguments.
#   
#     <command>         natural language command (DSL commands)
#     --target=<Str>    target language/system/package (defaults to 'WL-System') [default: 'WL-System']
#     --user=<Str>      user identifier (defaults to '') [default: '']
#     <target>          Programming language.

Remark: (Currently) the CLI script always returns results in JSON format.


Resource files

The resource file:

  1. "ClassifierNameToEntityID_EN.csv", was derived from the Mathematica function page for Classify, [WRI1].

  2. "ClassifierMeasurementNameToEntityID_EN.csv" was derived using Mathematica's built-in function ClassifierMeasurements, [WRI3]. Some additional associations were put in following [WK1].

  3. "ClassifierPropertyNameToEntityID_EN.csv", was derived using Mathematica's built-in function Information, [WRI4].

  4. "ROCFunctionNameToEntityID_EN.csv" uses the names and mappings in [WK1]. (See also the related package [AAr7].)

The initial versions Bulgarian versions of the resource files with name suffix "_BG.csv" were derived by automatic translations of the corresponding English content. Afterwards the Bulgarian mappings were reviewed and manually modified.


References

Articles

[WK1] Wikipedia entry, "Receiver operating characteristic".

Wolfram Language (WL) articles and functions

[WRI1] Wolfram Research (2014), Classify, Wolfram Language function, https://reference.wolfram.com/language/ref/Classify.html (updated 2021).

[WRI2] Wolfram Research, Inc., Machine Learning Methods.

[WRI3] Wolfram Research (2014), ClassifierMeasurements, Wolfram Language function, https://reference.wolfram.com/language/ref/ClassifierMeasurements.html (updated 2021).

[WRI4] Wolfram Research (1988), Information, Wolfram Language function, https://reference.wolfram.com/language/ref/Information.html (updated 2021).

Repositories

[AAr1] Anton Antonov, DSL::English::ClassificationWorkflows Raku package, (2020-2022), GitHub/antononcube.

[AAr2] Anton Antonov, DSL::Shared Raku package, (2020), GitHub/antononcube.

[AAr3] Anton Antonov, DSL::Entity::Geographics Raku package, (2021), GitHub/antononcube.

[AAr4] Anton Antonov, DSL::Entity::Jobs Raku package, (2021), GitHub/antononcube.

[AAr5] Anton Antonov, DSL::Entity::Foods Raku package, (2021), GitHub/antononcube.

[AAr6] Anton Antonov, DSL::Entity::Metadata Raku package, (2021), GitHub/antononcube.

[AAr7] Anton Antonov, ML::ROCFunctions Raku package, (2022), GitHub/antononcube.