Math::SparseMatrix
Raku package for sparse matrix algorithms.
It implements (some of) the algorithms described (and spelled out in FORTRAN) in the book
"Sparse Matrix Technology" by S. Pissanetzky, [SP1].
It provides a convenient interface for accessing sparse matrix elements, rows, columns, and sub-matrices.
Motivation
Sparse Matrix Algebra (SMA) is a "must have" for many computational workflows.
Here is a (non-exhaustive) list given in the order of my preferences:
- Recommendation Systems (RS)
- I make recommenders often during Exploratory Data Analysis (EDA).
- For me, RS are "first order regression."
- I also specialize in the making of RS.
- I implemented a Raku recommender without SMA,
"ML::StreamsBlendingRecommender", [AAp1],
but it is too slow for "serious" datasets.
- Still useful; see [AAv1].
- Latent Semantic Analysis (LSA)
- LSA is one of my favorite Unsupervised Machine Learning (ML) workflows.
- That means that this SMA package should have algorithms facilitating the programming of:
- Singular Value Decomposition (SVD)
- Non-Negative Matrix Factorization (NNMF)
- Graphs
- There is a natural (representation) connection between sparse matrices and graphs.
- Many graph algorithms can leverage (fast) SMA.
- So far (2024-09-25) the algorithms in "Graph", [AAp2], do not use SMA, and that is feature- and speed-limiting.
- Optimization
- For large-scale optimization problems, using SMA is a must,
- since their constraints are given with sparse matrices.
- Partial Differential Equations (PDE) solving
Usage examples
Here is a simple sparse matrix in Compressed Sparse Row (CSR) format:
use Math::SparseMatrix;
use Math::SparseMatrix::Utilities;
my $nrow = 5;
my $ncol = 8;
my $density = 0.2;
my $tol = 0.01;
my $type = 'CSR';
my $matrix1 = generate-random-sparse-matrix($nrow, $ncol, :$density, :$tol, :$type):!decorated;
say $matrix1;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))
Here it is "pretty printed":
$matrix1.print;
# . . . . . . . .
# . . . . . . . .
# . . 0.35 0.85 . . 0.52 .
# . . . . . 0.92 . .
# . . 0.43 . . 0.07 . .
Here all of the elements are multiplied by 10:
my $matrix2 = $matrix1.multiply(10);
$matrix2.print;
# . . . . . . . .
# . . . . . . . .
# . . 3.5 8.5 . . 5.2 .
# . . . . . 9.2 . .
# . . 4.3 . . 0.7 . .
Here is the dot-product of the original matrix with its transpose:
my $matrix3 = $matrix1.dot($matrix1.transpose);
$matrix3.print;
# . . . . .
# . . . . .
# . . 1.1154 . 0.1505
# . . . 0.8464 0.06440000000000001
# . . 0.1505 0.06440000000000001 0.18979999999999997
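The design section below also mentions a Dictionary Of Keys (DOK) representation; presumably it can be generated with the same function. (This is a sketch, assuming 'DOK' is an accepted type value, by analogy with 'CSR' above.)
my $dok = generate-random-sparse-matrix($nrow, $ncol, :$density, :$tol, type => 'DOK'):!decorated;
say $dok;
# expected: a Math::SparseMatrix::DOK object with the same dimensions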
Special features
Here are a few features that other SMA packages typically do not provide.
Named rows and columns
It is very convenient to have named rows and columns that are respected (or preserved)
in the typical SMA operations.
Here is an example:
my $smat = Math::SparseMatrix.new($matrix1, row-names => 'a' .. 'e', column-names => 'A' .. 'H');
$smat.print;
# –––––––––––––––––––––––––––––––––––––––––––
# A B C D E F G H
# ––┼––––––––––––––––––––––––––––––––––––––––
# a │ . . . . . . . .
# b │ . . . . . . . .
# c │ . . 0.35 0.85 . . 0.52 .
# d │ . . . . . 0.92 . .
# e │ . . 0.43 . . 0.07 . .
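The names can presumably also be used for element access, via the postcircumfix operators mentioned in the Acknowledgements below. (A hypothetical check; it assumes string keys are accepted:)
say $smat['c'; 'D'];
# expected: 0.85 (row "c", column "D" in the print-out above)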
Here is the dot-product of that matrix with its transpose:
my $smat2 = $smat.dot($smat.transpose);
$smat2.round(0.02).print;
# ––––––––––––––––––––––––––––
# a b c d e
# ––┼–––––––––––––––––––––––––
# a │ . . . . .
# b │ . . . . .
# c │ . . 1.12 . 0.16
# d │ . . . 0.84 0.06
# e │ . . 0.16 0.06 0.18
Implicit value
The sparse matrices can have an implicit value that is different from 0.
For example, adding a number to a sparse matrix produces another sparse matrix
with a different implicit value:
my $matrix4 = $matrix1.add(10);
say $matrix4;
# Math::SparseMatrix::CSR(:specified-elements(6), :dimensions((5, 8)), :density(0.15))
say $matrix4.implicit-value;
# 10
Here is the pretty print:
$matrix4.print(:iv);
# 10 10 10 10 10 10 10 10
# 10 10 10 10 10 10 10 10
# 10 10 10.35 10.85 10 10 10.52 10
# 10 10 10 10 10 10.92 10 10
# 10 10 10.43 10 10 10.07 10 10
Remark: Currently, the implicit values are ignored in dot.
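That can be checked quickly. (A hypothetical session; the variable name is illustrative, and it is assumed that the result of dot reports its implicit value as shown above:)
my $m10 = $matrix1.add(10);
say $m10.dot($m10.transpose).implicit-value;
# expected: 0, since dot ignores the implicit value 10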
Design
General
- There should be a "main" class, Math::SparseMatrix, that:
- Provides the SMA functionalities
- Delegates to concrete sparse matrix classes that are based on different representation formats
- Can have named rows, columns, and dimensions
- Gives access to sparse matrix elements, rows, columns, and sub-matrices
- The default or "main" core sparse matrix class should use Compressed Sparse Row (CSR) format.
- Also, a class using Dictionary Of Keys (DOK) format should be provided.
- The core sparse matrix classes do not have named rows, columns, and dimensions.
- Ideally, a class using NativeCall should be implemented at some point.
- It looks like this is "a must", since the CSR and DOK classes are fairly slow.
- Both "plain C" and macOS Accelerate implementations should be made.
- The most important operation is Matrix-Vector Dot Product.
- The current design is to use one-row or one-column matrices for the vectors.
- Dense vectors are (of course) also supported; see the sketch below.
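Here is a minimal sketch of a matrix-vector product under that design (it assumes dot accepts a plain Raku array as a dense vector, and that ncol is the public accessor shown in the class diagram below):
my @v = 1 xx $matrix1.ncol;        # dense vector of ones
my $row-sums = $matrix1.dot(@v);   # dotting with a ones-vector gives the row sums
say $row-sums;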
Object-Oriented Programming (OOP) architecture
- The OOP Decorator Design Pattern is used to organize the SMA functionalities.
- In that pattern:
- The Component is played by the class Math::SparseMatrix::Abstract.
- The ConcreteComponent is played by the classes:
- Math::SparseMatrix::CSR
- Math::SparseMatrix::DOK
- Math::SparseMatrix::NativeAdapter
- The concrete component classes provide the core SMA operations.
- The Decorator is played by Math::SparseMatrix.
- That is a "top level", interface class.
- Allows access using named rows and columns.
- "Hides" the actual component class used.
Here is a corresponding diagram:
classDiagram
class Abstract["Math::SparseMatrix::Abstract"] {
<<abstract>>
+value-at()
+row-at()
+column-at()
+row-slice()
+AT-POS()
+print()
+transpose()
+add()
+multiply()
+dot()
}
class CSRStruct {
<<C struct>>
}
class NativeCSR["Math::SparseMatrix::CSR::Native"] {
$row_ptr
$col_index
@values
nrow
ncol
implicit_value
}
class NativeAdapter["Math::SparseMatrix::NativeAdapter"] {
+row-ptr
+col-index
+values
+nrow
+ncol
+implicit-value
}
class CSR["Math::SparseMatrix::CSR"] {
@row-ptr
@col-index
@values
nrow
ncol
implicit-value
}
class DOK["Math::SparseMatrix::DOK"] {
%adjacency-map
nrow
ncol
implicit-value
}
class SparseMatrix["Math::SparseMatrix"] {
Abstract core-matrix
+AT-POS()
+print()
+transpose()
+add()
+multiply()
+dot()
}
CSR --> Abstract : implements
DOK --> Abstract : implements
NativeAdapter --> Abstract : implements
SparseMatrix --> Abstract : Hides actual component class
SparseMatrix *--> Abstract
NativeAdapter *--> NativeCSR
NativeCSR -- CSRStruct : represents
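To make the delegation concrete, here is a minimal, self-contained sketch of the Decorator idea (illustrative code, not the package's actual implementation):
class MyDecorator {
    has $.core-matrix is required;   # plays the role of Math::SparseMatrix::Abstract
    has @.row-names;
    has @.column-names;
    # Delegate the core operation, then re-attach the (swapped) names to the result
    method transpose() {
        self.new(
            core-matrix  => $!core-matrix.transpose,
            row-names    => @!column-names,
            column-names => @!row-names)
    }
}
Note how transpose swaps the row and column names; Math::SparseMatrix has to do analogous bookkeeping for each delegated operation.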
Implementation details
- Again, the most important operation is Matrix-Vector Dot Product.
- It has to be as fast as possible.
- There are two Dot Product implementations for CSR:
- (Currently) the direct one is 20-50% faster.
- It seems it is a good idea to provide for some operations symbolic (or sparse matrix elements pattern) methods.
- For example: add-pattern / add, and dot-pattern / dot-numeric.
- It is important to have access methods / operators.
- All three of AT-POS, postcircumfix:<[ ]>, and postcircumfix:<[; ]> are used in the accessor implementation; see the sketch after this list.
- Performance of the CSR and DOK sparse matrices is not good: between 40 and 150 times slower than Wolfram Language.
- (Using the same matrices, of course.)
- It is somewhat surprising that DOK is faster than CSR.
- NativeCall-based implementations, [AAp3], are ≈100 times faster.
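Here is a hypothetical positional-access session over the matrix from the usage examples (it assumes integer indexing dispatches through AT-POS, and that the semicolon form selects a single element):
say $matrix1[2; 3];   # expected: 0.85 -- element at row 2, column 3
say $matrix1[2];      # expected: row 2 as a one-row slice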
Acknowledgements
Thanks to @lizmat and @tony-o for helping to figure out the proper use of postcircumfix:<[ ]>
and postcircumfix:<[; ]>
in order to have the named rows and columns functionalities.
References
Books
[SP1] Sergio Pissanetzky, Sparse Matrix Technology, Academic Pr (January 1, 1984), ISBN-10: 0125575807, ISBN-13: 978-0125575805.
Packages
[AAp1] Anton Antonov,
ML::StreamsBlendingRecommender Raku package,
(2021-2024),
GitHub/antononcube.
[AAp2] Anton Antonov,
Graph Raku package,
(2024),
GitHub/antononcube.
[AAp3] Anton Antonov,
Math::SparseMatrix::Native Raku package,
(2024),
GitHub/antononcube.
Videos
[AAv1] Anton Antonov,
"TRC 2022 Implementation of ML algorithms in Raku",
(2022),
YouTube/antononcube.