Math::SparseMatrix
Raku package for sparse matrix algorithms.
It implements (some of) the algorithms described (and spelled out in FORTRAN) in the book
"Sparse Matrix Technology" by S. Pissanetzky, [SP1].
It provides a convenient interface for accessing sparse matrix elements, rows, columns, and sub-matrices.
Motivation
Sparse Matrix Algebra (SMA) is a "must have" for many computational workflows.
Here is a (non-exhaustive) list given in the order of my preferences:
- Recommendation Systems (RS)
- I make recommenders often during Exploratory Data Analysis (EDA).
- For me, RS are "first order regression."
- I also specialize in the making of RS.
- I implemented a Raku recommender without SMA,
"ML::StreamsBlendingRecommender", [AAp1],
but it is too slow for "serious" datasets.
- Still useful; see [AAv1].
- Latent Semantic Analysis (LSA)
- LSA is one of my favorite Unsupervised Machine Learning (ML) workflows.
- That means that this SMA package should have algorithms facilitating the programming of:
- Singular Value Decomposition (SVD)
- Non-Negative Matrix Factorization (NNMF)
- Graphs
- There is a natural (representation) connection between sparse matrices and graphs.
- Many graph algorithms can leverage (fast) SMA.
- So far (2024-09-25) the algorithms in "Graph", [AAp2], do not use SMA, and that is feature- and speed-limiting.
- Optimization
- For large-scale optimization problems, using SMA is a must,
- since their constraints are given with sparse matrices.
- Partial Differential Equations (PDE) solving
Usage examples
Here is a simple sparse matrix in Compressed Sparse Row (CSR) format:
use Math::SparseMatrix;
use Math::SparseMatrix::Utilities;
my $nrow = 5;
my $ncol = 8;
my $density = 0.2;
my $tol = 0.01;
my $type = 'CSR';
my $matrix1 = generate-random-sparse-matrix($nrow, $ncol, :$density, :$tol, :$type):!decorated;
say $matrix1;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))
Here it is "pretty printed":
$matrix1.print;
# . . . . . . . .
# . . . . . . . .
# . . 0.35 0.85 . . 0.52 .
# . . . . . 0.92 . .
# . . 0.43 . . 0.07 . .
Here all of the elements are multiplied by 10:
my $matrix2 = $matrix1.multiply(10);
$matrix2.print;
# . . . . . . . .
# . . . . . . . .
# . . 3.5 8.5 . . 5.2 .
# . . . . . 9.2 . .
# . . 4.3 . . 0.7 . .
Here is the dot-product of the original matrix with its transpose:
my $matrix3 = $matrix1.dot($matrix1.transpose);
$matrix3.print;
# . . . . .
# . . . . .
# . . 1.1154 . 0.1505
# . . . 0.8464 0.06440000000000001
# . . 0.1505 0.06440000000000001 0.18979999999999997
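The design section below also mentions a Dictionary Of Keys (DOK) representation; presumably it can be generated with the same function. (This is a sketch, assuming 'DOK' is an accepted type value, by analogy with 'CSR' above.)
my $dok = generate-random-sparse-matrix($nrow, $ncol, :$density, :$tol, type => 'DOK'):!decorated;
say $dok;
# expected: a Math::SparseMatrix::DOK object with the same dimensions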
Special features
Here are a few features that other SMA packages typically do not provide.
Named rows and columns
It is very convenient to have named rows and columns that are respected (or preserved)
in the typical SMA operations.
Here is an example:
my $smat = Math::SparseMatrix.new($matrix1, row-names => 'a' .. 'e', column-names => 'A' .. 'H');
$smat.print;
# –––––––––––––––––––––––––––––––––––––––––––
# A B C D E F G H
# ––┼––––––––––––––––––––––––––––––––––––––––
# a │ . . . . . . . .
# b │ . . . . . . . .
# c │ . . 0.35 0.85 . . 0.52 .
# d │ . . . . . 0.92 . .
# e │ . . 0.43 . . 0.07 . .
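The names can presumably also be used for element access, via the postcircumfix operators mentioned in the Acknowledgements below. (A hypothetical check; it assumes string keys are accepted:)
say $smat['c'; 'D'];
# expected: 0.85 (row "c", column "D" in the print-out above)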
Here is the dot-product of that matrix with its transpose:
my $smat2 = $smat.dot($smat.transpose);
$smat2.round(0.02).print;
# ––––––––––––––––––––––––––––
# a b c d e
# ––┼–––––––––––––––––––––––––
# a │ . . . . .
# b │ . . . . .
# c │ . . 1.12 . 0.16
# d │ . . . 0.84 0.06
# e │ . . 0.16 0.06 0.18
Implicit value
The sparse matrices can have an implicit value that is different from 0.
For example, adding a number to a sparse matrix produces another sparse matrix
with a different implicit value:
my $matrix4 = $matrix1.add(10);
say $matrix4;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))
say $matrix4.implicit-value;
# 10
Here is the pretty print:
$matrix4.print(:iv);
# 10 10 10 10 10 10 10 10
# 10 10 10 10 10 10 10 10
# 10 10 10.35 10.85 10 10 10.52 10
# 10 10 10 10 10 10.92 10 10
# 10 10 10.43 10 10 10.07 10 10
Remark: Currently, the implicit values are ignored in dot.
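That can be checked quickly. (A hypothetical session; the variable name is illustrative, and it is assumed that the result of dot reports its implicit value as shown above:)
my $m10 = $matrix1.add(10);
say $m10.dot($m10.transpose).implicit-value;
# expected: 0, since dot ignores the implicit value 10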
Design
General
- There should be a "main" class, Math::SparseMatrix, that:
- Provides the SMA functionalities
- Delegates to concrete sparse matrix classes that are based on different representation formats
- Can have named rows, columns, and dimensions
- Gives access to sparse matrix elements, rows, columns, and sub-matrices
- The default or "main" core sparse matrix class should use Compressed Sparse Row (CSR) format.
- Also, a class using Dictionary Of Keys (DOK) format should be provided.
- The core sparse matrix classes do not have named rows, columns, and dimensions.
- Ideally, a class using NativeCall should be implemented at some point.
- It looks like this is "a must", since the CSR and DOK classes are fairly slow.
- Both "plain C" and macOS Accelerate implementations should be made.
- The most important operation is Matrix-Vector Dot Product.
- The current design is to use one-row or one-column matrices for the vectors.
- Dense vectors are (of course) also supported; see the sketch below.
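Here is a minimal sketch of a matrix-vector product under that design (it assumes dot accepts a plain Raku array as a dense vector, and that ncol is the public accessor shown in the class diagram below):
my @v = 1 xx $matrix1.ncol;        # dense vector of ones
my $row-sums = $matrix1.dot(@v);   # dotting with a ones-vector gives the row sums
say $row-sums;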
Object-Oriented Programming (OOP) architecture
- The OOP Decorator Design Pattern is used to organize the SMA functionalities.
- In that pattern:
- The Component is played by the class Math::SparseMatrix::Abstract.
- The ConcreteComponent is played by the classes:
- Math::SparseMatrix::CSR
- Math::SparseMatrix::DOK
- Math::SparseMatrix::NativeAdapter
- The concrete component classes provide the core SMA operations.
- The Decorator is played by Math::SparseMatrix.
- That is a "top level", interface class.
- Allows access using named rows and columns.
- "Hides" the actual component class used.
Here is a corresponding diagram:
classDiagram
class Abstract["Math::SparseMatrix::Abstract"] {
<<abstract>>
+value-at()
+row-at()
+column-at()
+row-slice()
+AT-POS()
+print()
+transpose()
+add()
+multiply()
+dot()
}
class CSRStruct {
<<C struct>>
}
class NativeCSR["Math::SparseMatrix::CSR::Native"] {
$row_ptr
$col_index
@values
nrow
ncol
implicit_value
}
class NativeAdapter["Math::SparseMatrix::NativeAdapter"] {
+row-ptr
+col-index
+values
+nrow
+ncol
+implicit-value
}
class CSR["Math::SparseMatrix::CSR"] {
@row-ptr
@col-index
@values
nrow
ncol
implicit-value
}
class DOK["Math::SparseMatrix::DOK"] {
%adjacency-map
nrow
ncol
implicit-value
}
class SparseMatrix["Math::SparseMatrix"] {
Abstract core-matrix
+AT-POS()
+print()
+transpose()
+add()
+multiply()
+dot()
}
CSR --> Abstract : implements
DOK --> Abstract : implements
NativeAdapter --> Abstract : implements
SparseMatrix --> Abstract : Hides actual component class
SparseMatrix *--> Abstract
NativeAdapter *--> NativeCSR
NativeCSR -- CSRStruct : represents
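To make the delegation concrete, here is a minimal, self-contained sketch of the Decorator idea (illustrative code, not the package's actual implementation):
class MyDecorator {
    has $.core-matrix is required;   # plays the role of Math::SparseMatrix::Abstract
    has @.row-names;
    has @.column-names;
    # Delegate the core operation, then re-attach the (swapped) names to the result
    method transpose() {
        self.new(
            core-matrix  => $!core-matrix.transpose,
            row-names    => @!column-names,
            column-names => @!row-names)
    }
}
Note how transpose swaps the row and column names; Math::SparseMatrix has to do analogous bookkeeping for each delegated operation.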
Implementation details
- Again, the most important operation is Matrix-Vector Dot Product.
- It has to be as fast as possible.
- There are two Dot Product implementations for CSR:
- (Currently) the direct one is 20-50% faster.
- It seems it is a good idea to provide for some operations symbolic (or sparse matrix elements pattern) methods.
- For example: add-pattern / add, and dot-pattern / dot-numeric.
- It is important to have access methods / operators.
- All three of AT-POS, postcircumfix:<[ ]>, and postcircumfix:<[; ]> are used in the accessor implementation; see the sketch after this list.
- Performance of the CSR and DOK sparse matrices is not good: between 40 and 150 times slower than Wolfram Language.
- (Using the same matrices, of course.)
- It is somewhat surprising that DOK is faster than CSR.
- NativeCall-based implementations, [AAp3], are ≈100 times faster.
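Here is a hypothetical positional-access session over the matrix from the usage examples (it assumes integer indexing dispatches through AT-POS, and that the semicolon form selects a single element):
say $matrix1[2; 3];   # expected: 0.85 -- element at row 2, column 3
say $matrix1[2];      # expected: row 2 as a one-row slice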
Acknowledgements
Thanks to @lizmat and @tony-o for helping to figure out the proper use of postcircumfix:<[ ]>
and postcircumfix:<[; ]>
in order to have the named rows and columns functionalities.
References
Books
[SP1] Sergio Pissanetzky, Sparse Matrix Technology, Academic Pr (January 1, 1984), ISBN-10: 0125575807, ISBN-13: 978-0125575805.
Packages
[AAp1] Anton Antonov,
ML::StreamsBlendingRecommender Raku package,
(2021-2024),
GitHub/antononcube.
[AAp2] Anton Antonov,
Graph Raku package,
(2024),
GitHub/antononcube.
[AAp3] Anton Antonov,
Math::SparseMatrix::Native Raku package,
(2024),
GitHub/antononcube.
Videos
[AAv1] Anton Antonov,
"TRC 2022 Implementation of ML algorithms in Raku",
(2022),
YouTube/antononcube.