Rand Stats

Math::Fitting

zef:antononcube

Math::Fitting

Raku package for line, curve, and hyper-plane fitting over sets of points.


Installation

From Zef ecosystem:

zef install Math::Fitting

From GitHub:

zef install https://github.com/antononcube/Raku-Math-Fitting.git

Usage examples

Linear regression (1D regressor)

Here is data that is an array of pairs logarithms of GDP and electricity production of different countries:

my @data =
        (11.773906839650255e0, 11.253864992156679e0), (12.171286902172696e0, 12.009204578014298e0),
        (12.107713405070312e0, 11.437962903093876e0), (12.696829332236298e0, 12.00401094268482e0),
        (11.28266350477445e0, 11.499283586840601e0), (12.585056018658895e0, 11.775388329493465e0),
        (10.723949075934968e0, 10.54745313463026e0), (11.389785159542825e0, 10.928944887584644e0),
        (13.167988368224217e0, 12.856756553879103e0), (11.019424327650983e0, 10.472888033769635e0),
        (10.198154686798192e0, 9.331143140569578e0), (11.395702941127277e0, 10.772811339725912e0),
        (12.031878183906066e0, 11.516569745919236e0), (11.430963567030055e0, 10.855585778720567e0),
        (10.21302480711734e0, 10.138888016785716e0), (11.857393542118762e0, 11.520472966745002e0),
        (11.223299730942632e0, 10.933368427760566e0), (12.15978767072575e0, 11.78868016913858e0),
        (11.190367918582032e0, 10.557867961568022e0), (11.733248321280731e0, 11.226857570288722e0),
        (10.839510106614012e0, 10.677157910153495e0), (11.191959374358786e0, 11.17621400228234e0),
        (12.215911663039382e0, 11.808235068507617e0), (12.420008212627797e0, 11.723537761532056e0),
        (11.551553518127756e0, 10.519077512018512e0), (11.845171164859963e0, 11.528956530193586e0),
        (12.212327463463307e0, 11.782617762483701e0), (11.479897117669006e0, 11.355095745306354e0),
        (13.320906155798967e0, 12.6405552706939e0);
        
@data.elems;        
# 29

Remark: The data was taken from "Data::Geographics", [AAp1].

Here is a corresponding plot (using "Text::Plot", [AAp2]):

use Text::Plot;

text-list-plot(@data, title => 'lg(GDP) vs lg(Electricity Production)', x-label => 'GDP', y-label => 'EP');
# lg(GDP) vs lg(Electricity Production)            
# +--------+-------+-------+-------+--------+-------+-------++         
# +                                                          +  13.00  
# |                                                    * *   |         
# |                                                          |         
# +                                   *        *             +  12.00  
# |                                   **  *  *               |         
# |                     *  *     *  **                       |        E
# +                   *        **                            +  11.00 P
# |             *      * **                                  |         
# |           *    *  *     *                                |         
# +   *                                                      +  10.00  
# |                                                          |         
# |   *                                                      |         
# +                                                          +   9.00  
# +--------+-------+-------+-------+--------+-------+-------++         
#          10.50   11.00   11.50   12.00    12.50   13.00   13.50    
#                             GDP

Here is corresponding linear model fit, a functor (i.e. objects that does Callable):

use Math::Fitting;
my &f = linear-model-fit(@data);
# Math::Fitting::FittedModel(type => linear, data => (29, 2), response-index => 1, basis => 2)

Here are the best fit parameters (fit coefficients):

&f('BestFitParameters');
# [0.6719587872748085 0.9049080671822028]

Here we call the functor over a range of values:

(10..13)».&f
# (9.721039459096838 10.62594752627904 11.530855593461244 12.435763660643445)

Here are the corresponding residuals:

text-list-plot(&f('residuals'))
# +---+--------+--------+--------+---------+--------+--------+     
# |                                                          |     
# +          *                                               + 0.60
# |                                                          |     
# +    *                                    *          *     + 0.40
# +                 *          *           *                 + 0.20
# |              *               * * *        *     *        |     
# +                         *                         *      + 0.00
# |   *    *       *          *          *               *   |     
# +      *            *   *            *        *            +-0.20
# |            *                                             |     
# +                                                          +-0.40
# +                     *                         *          +-0.60
# |                                                          |     
# +---+--------+--------+--------+---------+--------+--------+     
#     0.00     5.00     10.00    15.00     20.00    25.00

Multidimensional regression (2D regressor)

Here is a matrix with 3D data:

use Data::Reshapers;

(1..2) X (1..3)
==> { .map({ [|$_, sin($_.sum)] }) }()
==> my @data3D;

to-pretty-table(@data3D);
# +---+---+---------------------+
# | 0 | 1 |          2          |
# +---+---+---------------------+
# | 1 | 1 |  0.9092974268256817 |
# | 1 | 2 |  0.1411200080598672 |
# | 1 | 3 | -0.7568024953079283 |
# | 2 | 1 |  0.1411200080598672 |
# | 2 | 2 | -0.7568024953079283 |
# | 2 | 3 | -0.9589242746631385 |
# +---+---+---------------------+

Here is a functor for the multidimensional fit:

my &mf = linear-model-fit(@data3D);
# Math::Fitting::FittedModel(type => linear, data => (6, 3), response-index => 2, basis => Whatever)

Here are the best parameters:

&mf('BestFitParameters');
# [2.1036843161171217 -0.62274056716294 -0.6915360512141538]

Here is a predicated value:

&mf(2.5, 2.5);
# -1.182007229825613

Here is plot of the residuals:

text-list-plot(&mf.fit-residuals);
# +---+---------+---------+----------+---------+---------+---+     
# +                                                          + 0.30
# |                                                      *   |     
# +                                                          + 0.20
# |                                                          |     
# +   *                                                      + 0.10
# |             *                                            |     
# +                                                          + 0.00
# |                                  *                       |     
# |                                                          |     
# +                                                          +-0.10
# |                       *                                  |     
# +                                            *             +-0.20
# |                                                          |     
# +---+---------+---------+----------+---------+---------+---+     
#     0.00      1.00      2.00       3.00      4.00      5.00

Using function basis

Data:

use Data::Generators;

my @data = (-1, -0.95 ... 1);
@data = @data.map({ [$_, sqrt(abs($_/2)) + sin($_*2) + random-real((-0.25, 0.25))] });

text-list-plot(@data);
# +---+------------+-----------+------------+------------+---+      
# +                                                          +  2.00
# |                                           *** **** **    |      
# +                                         *            *   +  1.50
# |                                                          |      
# +                                     * **                 +  1.00
# |                                   **                     |      
# |                                  *                       |      
# +                               **                         +  0.50
# |                       *  * * *                           |      
# +                 *   *     *                              +  0.00
# |   ***  *   *   * *   *  *                                |      
# +      *  **  **    *                                      + -0.50
# |                                                          |      
# +---+------------+-----------+------------+------------+---+      
#     -1.00        -0.50       0.00         0.50         1.00

Basis functions:

my @basis = {1}, {$_}, {-1 + 2 * $_ **2}, {-3 * $_ + 4 * $_ **3}, {1 - 8 * $_ ** 2 + 8 * $_ **4};
# [-> ;; $_? is raw = OUTER::<$_> { #`(Block|3831690270616) ... } -> ;; $_? is raw = OUTER::<$_> { #`(Block|3831690270688) ... } -> ;; $_? is raw = OUTER::<$_> { #`(Block|3831690270760) ... } -> ;; $_? is raw = OUTER::<$_> { #`(Block|3831690270832) ... } -> ;; $_? is raw = OUTER::<$_> { #`(Block|3831690270904) ... }]

Compute the fit and show the best fit parameters:

my &mf = linear-model-fit(@data, :@basis);

say 'BestFitParameters : ', &mf('BestFitParameters');
# BestFitParameters : [0.551799047700142 1.173011790424553 0.2158197987446535 -0.2733280230300862 -0.1178403498782696]

Plot the residuals:

text-list-plot(&mf('FitResiduals'));
# +---+------------+-----------+------------+------------+---+     
# |                                                          |     
# |                 *     *                                  |     
# +                     *                                    + 0.20
# |                                     *   * *              |     
# +   **             *       *                 *       *     + 0.10
# |            *   *                            *  *    *    |     
# +        *                                                 + 0.00
# |     *        *            **  *   **          *          |     
# |                      *           *    *          *       |     
# +          *  *           *                       *    *   +-0.10
# |                   *          *                           |     
# +      *  *                      *       *                 +-0.20
# |                                                          |     
# +---+------------+-----------+------------+------------+---+     
#     0.00         10.00       20.00        30.00        40.00

(Compare the plot values with the range of the added noise when @data is generated.)


TODO


References

[AAp1] Anton Antonov, Data::Geographics Raku package, (2024), GitHub/antononcube.

[AAp2] Anton Antonov, Text::Plot Raku package, (2022-2023), GitHub/antononcube.

[PSp1] Paweł Szulc, Statistics::LinearRegression Raku package, (2017), GitHub/hipek8.