Sparse Matrix Recommender (SMR) Raku package

Introduction

This Raku package, "ML::SparseMatrixRecommender", provides functions for computing recommendations based on a (user) profile or history using Sparse Linear Algebra (SLA). The package mirrors the Wolfram Language (WL) implementation, [AAp1]. There are also corresponding implementations in Python and R; see [AAp6, AAp2].

The package is based on a certain "standard" Information retrieval paradigm -- it utilizes Latent Semantic Indexing (LSI) functions like IDF, TF-IDF, etc. Hence, the package also has document-term matrix creation functions and LSI application functions. I included them in the package since I wanted to minimize the external package dependencies.
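
As a rough illustration of the LSI term-weighting idea mentioned above, here is a minimal Python sketch over a tiny item-by-tag matrix stored as nested dictionaries. The helper name `apply_term_weights`, the toy data, and the particular IDF formula, log(N / df), are assumptions made for illustration; the package's own functions may use a different IDF variant and operate on sparse matrix objects instead.

```python
import math

def apply_term_weights(matrix):
    """Apply IDF global weights, identity local weights, and cosine normalization.

    `matrix` maps item -> {tag: count}. Hypothetical helper, not the package's API.
    """
    n_items = len(matrix)
    # Document frequency: the number of items in which each tag occurs.
    doc_freq = {}
    for row in matrix.values():
        for tag, value in row.items():
            if value != 0:
                doc_freq[tag] = doc_freq.get(tag, 0) + 1
    # Global weight (assumed IDF variant): log(N / df).
    idf = {tag: math.log(n_items / df) for tag, df in doc_freq.items()}

    weighted = {}
    for item, row in matrix.items():
        # Local weight "None": the entries are used as they are.
        new_row = {tag: value * idf[tag] for tag, value in row.items() if value != 0}
        # Cosine normalizer: scale each row to unit Euclidean length.
        norm = math.sqrt(sum(v * v for v in new_row.values()))
        if norm > 0:
            new_row = {tag: v / norm for tag, v in new_row.items()}
        weighted[item] = new_row
    return weighted

toy = {
    "id1": {"passengerSex:male": 1, "passengerClass:1st": 1},
    "id2": {"passengerSex:male": 1, "passengerClass:3rd": 1},
    "id3": {"passengerSex:female": 1, "passengerClass:1st": 1},
    "id4": {"passengerSex:male": 1, "passengerClass:2nd": 1},
}
weighted = apply_term_weights(toy)
```

In this toy matrix the rare tag "passengerClass:3rd" ends up with a larger weight than the common tag "passengerSex:male" within the same row, which is the effect IDF weighting is meant to have.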

The package includes the dataset dfTitanic to make it easier to write introductory examples and unit tests.

For a more theoretical description see the article "Mapping Sparse Matrix Recommender to Streams Blending Recommender", [AA1].

For detailed examples see the files "Basic-usage.raku" and "Classification.raku", and the Jupyter notebooks in the GitHub repository "./docs" folder.

Remark: "SMR" stands for "Sparse Matrix Recommender". Most of the operations of this Raku package mirror the operations of the software monads "MonadicSparseMatrixRecommender", "SMRMon-R", [AAp1, AAp2] and the attributes and methods of the Python package [AAp7].

Workflows

Here is a diagram that encompasses the workflows this package supports (or will support):

Here is a narration of a certain workflow scenario:

1. Get a dataset.
2. Create contingency matrices for a given identifier column and a set of "tag type" columns.
3. Examine recommender matrix statistics.
4. If the assumptions about the data hold, apply LSI functions.
   - For example, the "usual trio" IDF, Frequency, Cosine.
5. Do (verify) example profile recommendations.
6. If satisfactory results are obtained, use the recommender as a nearest neighbors classifier.
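
The contingency matrix and statistics steps of this scenario can be sketched in a few lines of Python. The helper names `wide_form_to_matrices` and `column_sums`, the nested-dictionary representation, and the three toy records are all assumptions for illustration; they are not the package's API, which works with sparse matrix objects.

```python
from collections import Counter

def wide_form_to_matrices(records, item_column, tag_types):
    """Build one item-by-tag contingency matrix per tag type.

    Returns {tag_type: {item: {"tagType:value": count}}}. Hypothetical helper.
    """
    matrices = {}
    for tag_type in tag_types:
        matrix = {}
        for rec in records:
            # Prefix the value with its tag type, e.g. "passengerSex:male".
            tag = f"{tag_type}:{rec[tag_type]}"
            row = matrix.setdefault(rec[item_column], {})
            row[tag] = row.get(tag, 0) + 1
        matrices[tag_type] = matrix
    return matrices

def column_sums(matrix):
    """Count how many times each tag occurs across all items."""
    totals = Counter()
    for row in matrix.values():
        totals.update(row)
    return dict(totals)

records = [
    {"id": "1", "passengerSex": "male", "passengerClass": "1st"},
    {"id": "2", "passengerSex": "female", "passengerClass": "1st"},
    {"id": "3", "passengerSex": "male", "passengerClass": "3rd"},
]
matrices = wide_form_to_matrices(records, "id", ["passengerSex", "passengerClass"])
sex_stats = column_sums(matrices["passengerSex"])
```

Column sums like `sex_stats` are one simple statistic to examine before deciding whether term weighting is appropriate for the data.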

Monadic design

Here is a diagram of typical pipeline building using a ML::SparseMatrixRecommender object:

flowchart TD
%% --- Top / Legend ---
%% SMR = Sparse Matrix Recommender

%% --- Inputs & Constructors ---
    %%subgraph IO["Input/Output"]
        IN1[/"data frame<br>or<br>a hashmap of<br>Math::SparseMatrix objects"/]
        WIDE[("Data<br>(wide form)")]
        ECHO[/Echo<br>output/]
        OUT[/dataset<br>or<br>hashmap/]
    %%end
    IN1 -.- WIDE

    IN1 --> cfwf
    WIDE --> join

%% --- SMR object & pipeline value (context/state) ---
    subgraph MON[" "]
        SMR{{<br>SMR<br>object<br>}}
        VAL([SMR<br>pipeline value])
    end
    
%% --- Pipeline container ---
    subgraph PIPE[SMR monad pipeline]
        direction LR
        unit["ML::SparseMatrixRecommender.new"]
        cfwf[create-from-wide-form]
        echo[echo-data-summary]
        twf[apply-term-weight-functions]
        rec[recommend]
        join[join-across]
        prove[prove-by-metadata]

        unit ==> cfwf ==> echo ==> twf ==> rec ==> join ==> prove
    end

    cfwf -.- |data<br>matrices<br>M|SMR
    echo -.- |data|SMR
    twf  -.- |M|SMR
    echo -- echo-value --> ECHO
    join -- take-value --> OUT
    prove -- take-value --> OUT

    VAL === PIPE
    VAL -.- SMR
    SMR === PIPE

Remark: The monadic design allows "pipelining" of the SMR operations -- see the usage example section.
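
To make the "pipelining" idea concrete, here is a toy Python sketch of the pattern: an object carries a context (the matrix) and a pipeline value, and every operation returns the object itself so calls can be chained. The class `TinyPipeline` and its methods are hypothetical and only mimic the shape of the design; they are not the interface of this package.

```python
class TinyPipeline:
    """Toy fluent object: each step updates the state and returns self."""

    def __init__(self):
        self.matrix = {}   # context: the recommender matrix
        self.value = None  # pipeline value: the result of the last step

    def create(self, matrix):
        self.matrix = dict(matrix)
        self.value = self.matrix
        return self

    def summarize(self):
        # Put a small data summary into the pipeline value.
        n_tags = len({tag for row in self.matrix.values() for tag in row})
        self.value = {"items": len(self.matrix), "tags": n_tags}
        return self

    def echo_value(self, prefix=""):
        # Side effect only; the pipeline value is left unchanged.
        print(prefix + str(self.value))
        return self

    def take_value(self):
        # Leave the pipeline and return the current value.
        return self.value

summary = (TinyPipeline()
           .create({"id1": {"a": 1, "b": 1}, "id2": {"a": 1}})
           .summarize()
           .echo_value("summary: ")
           .take_value())
```

Echo steps can be inserted anywhere in such a chain without changing the final result, which is convenient when inspecting intermediate values.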

Installation

To install from GitHub use the shell command:

zef install https://github.com/antononcube/Raku-ML-SparseMatrixRecommender

To install from the Zef ecosystem:

zef install ML::SparseMatrixRecommender

Usage example

Here is an example of an SMR pipeline for creation of a recommender over Titanic data and recommendations for the profile "passengerSex:male" and "passengerClass:1st":

use ML::SparseMatrixRecommender;
use ML::SparseMatrixRecommender::Utilities;

my @dsTitanic = ML::SparseMatrixRecommender::Utilities::get-titanic-dataset();

my $smrObj = 
        ML::SparseMatrixRecommender
        .new
        .create-from-wide-form(
                @dsTitanic,
                tag-types => Whatever,
                item-column-name => <id>)
        .apply-term-weight-functions('IDF', 'None', 'Cosine')
        .recommend-by-profile(["passengerSex:male", "passengerClass:1st"], 10, :!normalize)
        .echo-value('recommendation by profile: ');

# recommendation by profile: [10 => 2 101 => 2 102 => 2 107 => 2 11 => 2 110 => 2 111 => 2 115 => 2 116 => 2 119 => 2]

Remark: More examples can be found in the directory "./docs".
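
Conceptually, a recommendation by profile is a product of the item-by-tag matrix with a profile vector, followed by taking the top scores. The following Python sketch shows that idea on a tiny unweighted 0/1 matrix. The function `recommend_by_profile` defined here, the toy data, and the tie-breaking by item identifier are assumptions for illustration and do not reproduce the package's implementation.

```python
def recommend_by_profile(matrix, profile, nrecs=10):
    """Score each item by the dot product of its row with the profile vector.

    `matrix` maps item -> {tag: weight}; `profile` maps tag -> weight.
    Hypothetical helper, not the package's API.
    """
    scores = {}
    for item, row in matrix.items():
        score = sum(row.get(tag, 0) * weight for tag, weight in profile.items())
        if score > 0:
            scores[item] = score
    # Highest score first; ties are broken by item identifier (an assumption).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:nrecs]

matrix = {
    "id1": {"passengerSex:male": 1, "passengerClass:1st": 1},
    "id2": {"passengerSex:male": 1, "passengerClass:3rd": 1},
    "id3": {"passengerSex:female": 1, "passengerClass:1st": 1},
    "id4": {"passengerSex:female": 1, "passengerClass:3rd": 1},
}
profile = {"passengerSex:male": 1, "passengerClass:1st": 1}
recs = recommend_by_profile(matrix, profile, nrecs=3)
```

In this toy setting an item that matches both profile tags scores 2, an item that matches one tag scores 1, and items that match none are dropped.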

Related Python packages

The Python package "SparseMatrixRecommender", [AAp6], implements a software monad for SMR workflows.

The Python package "LatentSemanticAnalyzer", [AAp7], can be used to make matrices for "SparseMatrixRecommender".

The Python package "SSparseMatrix", [AAp6], is fundamental in both "SparseMatrixRecommender" and "LatentSemanticAnalyzer". "SSparseMatrix" corresponds to the Raku package "Math::SparseMatrix", [AAp9], which is fundamental for this package.

Here is the Python "SparseMatrixRecommender" pipeline that corresponds to the Raku pipeline above:

from SparseMatrixRecommender.SparseMatrixRecommender import *
from SparseMatrixRecommender.DataLoaders import *

dfTitanic = load_titanic_data_frame()

smrObj = (SparseMatrixRecommender()
          .create_from_wide_form(data = dfTitanic, 
                                 item_column_name="id", 
                                 columns=None, 
                                 add_tag_types_to_column_names=True, 
                                 tag_value_separator=":")
          .apply_term_weight_functions(global_weight_func = "IDF", 
                                       local_weight_func = "None", 
                                       normalizer_func = "Cosine")
          .recommend_by_profile(profile=["passengerSex:male", "passengerClass:1st"], 
                                nrecs=12)
          .join_across(data=dfTitanic, on="id")
          .echo_value())

Related R packages

The package "SMRMon-R", [AAp2], implements a software monad for SMR workflows. Most of the "SMRMon-R" functions delegate to "SparseMatrixRecommender", [AAp3].

The package "SparseMatrixRecommenderInterfaces", [AAp4], provides functions for interactive Shiny interfaces for the recommenders made with "SparseMatrixRecommender" and/or "SMRMon-R".

The package "LSAMon-R", [AAp5], can be used to make matrices for "SparseMatrixRecommender" and/or "SMRMon-R".

Here is the "SMRMon-R" pipeline that corresponds to the Raku pipeline above:

smrObj <-
  SMRMonCreate( data = dfTitanic, 
                itemColumnName = "id", 
                addTagTypesToColumnNamesQ = TRUE, 
                sep = ":") %>%
  SMRMonApplyTermWeightFunctions(globalWeightFunction = "IDF", 
                                 localWeightFunction = "None", 
                                 normalizerFunction = "Cosine") %>%
  SMRMonRecommendByProfile( profile = c("passengerSex:male", "passengerClass:1st"), 
                            nrecs = 12) %>%
  SMRMonJoinAcross( data = dfTitanic, by = "id") %>%
  SMRMonEchoValue

Related Wolfram Language packages

The Wolfram Language (WL) software monad "MonadicSparseMatrixRecommender", [AAp1], provides recommendation pipelines similar to the pipelines created with this package.

Here is a WL monadic pipeline that corresponds to the Raku pipeline above:

smrObj =
  SMRMonUnit[]⟹
   SMRMonCreate[dfTitanic, "id", 
                "AddTagTypesToColumnNames" -> True, 
                "TagValueSeparator" -> ":"]⟹
   SMRMonApplyTermWeightFunctions["IDF", "None", "Cosine"]⟹
   SMRMonRecommendByProfile[{"passengerSex:male", "passengerClass:1st"}, 12]⟹
   SMRMonJoinAcross[dfTitanic, "id"]⟹
   SMRMonEchoValue[];

(Compare the pipeline diagram above with the corresponding diagram using Mathematica notation.)

Recommender comparison project

The project repository "Scalable Recommender Framework", [AAr1], has documents, diagrams, tests, and benchmarks of a recommender system implemented in multiple programming languages.

The Python recommender package is a decisive winner in the comparison -- see the first 10 minutes of the video recording [AAv1] or the benchmarks at [AAr1].

Code generation with natural language commands

Using grammar-based interpreters

The project "Raku for Prediction", [AAr2, AAv2, AAp7], has a Domain Specific Language (DSL) grammar and interpreters that generate SMR code for the corresponding Mathematica, Python, R, and Raku packages, [AAp11].

Here is a Command Line Interface (CLI) invocation example that generates code for this package:

ToRecommenderWorkflowCode Raku 'create with dfTitanic; apply the LSI functions IDF, None, Cosine;recommend by profile 1st and male'

# my $obj = ML::SparseMatrixRecommender.new.create-from-wide-form(dfTitanic).apply-term-weight-functions(global-weight-func => "IDF", local-weight-func => "None", normalizer-func => "Cosine").recommend-by-profile(["1st", "male"])

NLP Template Engine

Here is an example using the NLP Template Engine, [AAp12, AAr2, AAv3], (which uses LLMs to fill in static templates):

use ML::NLPTemplateEngine;
'create recommender with dfTitanic; apply the LSI functions IDF, None, Cosine;recommend by profile 1st and male' 
==> concretize(lang => "Raku")

# my $smrObj = ML::SparseMatrixRecommender.new
# .create-from-wide-form(dfTitanic, item-column-name='id', :add-tag-types-to-column-names, tag-value-separator=':')
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile(["male"], 12, :!normalize)
# .join-across(dfTitanic)
# .echo-value();

By DSL examples

Instead of using grammars, the individual commands can be translated using LLMs and few-shot training examples; see "DSL::Examples", [AAp13]. Here is an example:

use DSL::Examples;
use LLM::Functions;
my &llm-pipeline-segment = llm-example-function(dsl-examples()<Raku><SMRMon>);

my $spec = q:to/END/;
new recommender;
create from @dsData; 
apply LSI functions IDF, None, Cosine; 
recommend by profile for passengerSex:male, and passengerClass:1st;
join across with @dsData on "id";
echo the pipeline value;
classify by profile passengerSex:female, and passengerClass:1st on the tag passengerSurvival;
echo value
END

my @commands = $spec.lines;

@commands
        .map({ .&llm-pipeline-segment })
        .map({ .subst(/:i Output \h* ':'?/, :g).trim })
        .join("\n.")

# ML::SparseMatrixRecommender.new
# .create(@dsData)
# .apply-term-weight-functions('IDF', 'None', 'Cosine')
# .recommend-by-profile(['male', '1st'])
# .join-across(@dsData, on => 'id')
# .echo-value()
# .classify-by-profile('passengerSurvival', ['passengerSex.female', 'passengerClass.1st'])
# .echo-value()
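
The classify-by-profile step in the generated pipeline uses the recommender as a nearest neighbors classifier. A minimal Python sketch of that idea follows: find the items that best match the profile, then let them vote, weighted by score, for the value of the target tag type. The function name, the `labels` mapping, the `n_top` parameter, and the score-weighted voting are assumptions for illustration; the package may select neighbors and aggregate votes differently.

```python
from collections import defaultdict

def classify_by_profile(matrix, labels, profile, n_top=3):
    """Classify a profile by score-weighted voting of its best-matching items.

    `matrix` maps item -> {tag: weight}; `labels` maps item -> class label.
    Hypothetical helper, not the package's API.
    """
    scored = []
    for item, row in matrix.items():
        score = sum(row.get(tag, 0) for tag in profile)
        if score > 0:
            scored.append((score, item))
    # Best matches first; ties are broken by item identifier (an assumption).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    votes = defaultdict(float)
    for score, item in scored[:n_top]:
        votes[labels[item]] += score
    total = sum(votes.values())
    # Normalize the votes so that they sum to 1.
    return {label: v / total for label, v in votes.items()} if total else {}

matrix = {
    "id1": {"passengerSex:female": 1, "passengerClass:1st": 1},
    "id2": {"passengerSex:female": 1, "passengerClass:3rd": 1},
    "id3": {"passengerSex:male": 1, "passengerClass:1st": 1},
    "id4": {"passengerSex:male": 1, "passengerClass:3rd": 1},
}
labels = {"id1": "survived", "id2": "survived", "id3": "died", "id4": "died"}
result = classify_by_profile(
    matrix, labels, ["passengerSex:female", "passengerClass:1st"])
```

The label with the largest normalized vote is taken as the predicted class; the toy labels here are made up and do not reflect the actual Titanic data.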

Performance

Two performance topics are more important than the rest:

- Recommender object creation
- Recommendation computations

See the dedicated document "Performance.md" for a detailed discussion.

References

Articles

Mathematica / Wolfram Language (WL)

R packages

Python packages

Raku packages

Repositories

Videos