Raku Data::Reshapers

This Raku package has data reshaping functions for different data structures that are coercible to full arrays.

The supported data structures are:

Positional-of-hashes
Positional-of-arrays
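
As a quick illustration of the two shapes, here is a minimal sketch in core Raku. The records and column names below are made-up toy data, not taken from the package:

```raku
# Positional-of-hashes: each record is a hash keyed by column name.
# (Toy data; the column names are illustrative assumptions.)
my @records =
    { id => 1, passengerSex => 'female', passengerClass => '1st' },
    { id => 2, passengerSex => 'male',   passengerClass => '3rd' };

# Positional-of-arrays: each record is an array; columns are identified by position.
my @rows =
    [1, 'female', '1st'],
    [2, 'male',   '3rd'];

say @records[0]<passengerSex>;   # female
say @rows[1][2];                 # 3rd
```

Both variables hold the same two records; only the way a column is addressed (by name versus by position) differs.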

The five data reshaping operations provided by the package over those data structures are:

Cross tabulation, cross-tabulate
Long format conversion, to-long-format
Wide format conversion, to-wide-format
Join across (aka SQL JOIN), join-across
Transpose, transpose

The first four operations are fundamental in data wrangling and data analysis; see [AA1, Wk1, Wk2, AAv1-AAv2].

(Transposing tabular data is, of course, also fundamental, but it can also be seen as a basic functional programming operation.)

Usage examples

Cross tabulation

Making contingency tables -- or cross tabulation -- is a fundamental statistics and data analysis operation, [Wk1, AA1].

Here is an example using the Titanic dataset (provided by this package through the function get-titanic-dataset):

use Data::Reshapers;

my @tbl = get-titanic-dataset();
my $res = cross-tabulate( @tbl, 'passengerSex', 'passengerClass');
say $res;

# {female => {1st => 144, 2nd => 106, 3rd => 216}, male => {1st => 179, 2nd => 171, 3rd => 493}}

say to-pretty-table($res);
# +--------+-----+-----+-----+
# |        | 1st | 2nd | 3rd |
# +--------+-----+-----+-----+
# | female | 144 | 106 | 216 |
# | male   | 179 | 171 | 493 |
# +--------+-----+-----+-----+
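
To make the idea behind a contingency table concrete, the following is a minimal sketch in core Raku that counts co-occurrences by hand. It is not the package's implementation; the records and the column names sex and class are hypothetical toy data:

```raku
# Toy records (hypothetical, not the package's Titanic dataset).
my @records =
    { sex => 'female', class => '1st' },
    { sex => 'male',   class => '3rd' },
    { sex => 'female', class => '1st' },
    { sex => 'male',   class => '1st' };

# Count how often each pair of column values occurs, as a hash of hashes.
my %counts;
for @records -> %r {
    %counts{ %r<sex> }{ %r<class> }++;
}

say %counts<female><1st>;   # 2
say %counts<male><3rd>;     # 1
```

The outer keys play the role of table rows and the inner keys the role of table columns, which matches the hash-of-hashes shape printed by say $res above.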

Long format

Conversion to long format allows column names to be treated as data.

(More precisely, when converting to long format, the specified column names of a tabular dataset become values in a dedicated column of the long format, e.g. "Variable".)

my @tbl1 = @tbl.roll(3);
.say for @tbl1;

.say for to-long-format( @tbl1 );

my @lfRes1 = to-long-format( @tbl1, 'id', [], variablesTo => "VAR", valuesTo => "VAL2" );
.say for @lfRes1;
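
The conversion itself can be sketched by hand in core Raku. This is only an illustration of the idea, under the assumption that id is the identifier column; the toy records and the output column names Variable and Value are chosen for the example and do not come from the package:

```raku
# Toy wide-format records (hypothetical data).
my @wide =
    { id => 1, age => 20, sex => 'male'   },
    { id => 2, age => 40, sex => 'female' };

# Every non-identifier column of every record becomes its own
# (id, Variable, Value) record in the long format.
my @long;
for @wide -> %r {
    for %r.keys.grep( * ne 'id' ).sort -> $col {
        @long.push: %( id => %r<id>, Variable => $col, Value => %r{$col} );
    }
}

.say for @long;
```

Two records with two non-identifier columns each yield four long-format records, and the former column names age and sex now appear as data in the Variable column.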

Wide format

Here we transform the long format result @lfRes1 above into wide format -- the result has the same records as @tbl1:

say to-pretty-table( to-wide-format( @lfRes1, 'id', 'VAR', 'VAL2' ) );

# +-------------------+----------------+--------------+--------------+-----+
# | passengerSurvival | passengerClass | passengerAge | passengerSex |  id |
# +-------------------+----------------+--------------+--------------+-----+
# |        died       |      1st       |      20      |     male     | 308 |
# |        died       |      2nd       |      40      |    female    | 412 |
# |      survived     |      2nd       |      50      |    female    | 441 |
# |        died       |      3rd       |      20      |     male     | 741 |
# |        died       |      3rd       |      -1      |     male     | 932 |
# +-------------------+----------------+--------------+--------------+-----+
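
The reverse direction can be sketched in core Raku as well. This is a hand-written illustration rather than the package's implementation; the toy records reuse the column names id, VAR, and VAL2 from the call above, and the values are made up:

```raku
# Toy long-format records (hypothetical data).
my @long =
    { id => 1, VAR => 'age', VAL2 => 20       },
    { id => 1, VAR => 'sex', VAL2 => 'male'   },
    { id => 2, VAR => 'age', VAL2 => 40       },
    { id => 2, VAR => 'sex', VAL2 => 'female' };

# Group by identifier; each (VAR, VAL2) pair becomes a column
# of the wide record that belongs to that identifier.
my %byId;
for @long -> %r {
    %byId{ %r<id> }<id> = %r<id>;
    %byId{ %r<id> }{ %r<VAR> } = %r<VAL2>;
}

# One wide record per identifier, ordered by identifier.
my @wide = %byId.sort( *.key ).map( *.value );
.say for @wide;
```

Each identifier ends up with a single record whose keys are the former VAR values, which is why a long-to-wide round trip recovers the original records.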

Transpose

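Using a cross tabulation of passenger sex against survival, computed in the same way as the cross tabulation result above:

my $res2 = cross-tabulate( @tbl, 'passengerSex', 'passengerSurvival' );
my $tres = transpose( $res2 );

say to-pretty-table($res2, title => "Original");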
# +--------------------------+
# |         Original         |
# +--------+------+----------+
# |        | died | survived |
# +--------+------+----------+
# | female | 127  |   339    |
# | male   | 682  |   161    |
# +--------+------+----------+

say to-pretty-table($tres, title => "Transposed");
# +--------------------------+
# |        Transposed        |
# +----------+--------+------+
# |          | female | male |
# +----------+--------+------+
# | died     |  127   | 682  |
# | survived |  339   | 161  |
# +----------+--------+------+
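
For a hash of hashes, transposing amounts to swapping the outer and inner keys. A minimal sketch in core Raku, using the counts from the tables above and not the package's own code, could look like this:

```raku
# Counts copied from the "Original" table above.
my %counts =
    female => { died => 127, survived => 339 },
    male   => { died => 682, survived => 161 };

# Swap the outer (row) and inner (column) keys.
my %transposed;
for %counts.kv -> $row, %cols {
    for %cols.kv -> $col, $value {
        %transposed{$col}{$row} = $value;
    }
}

say %transposed<died><male>;        # 682
say %transposed<survived><female>;  # 339
```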

TODO

Simpler, more convenient interface.
- ~~Currently, a user has to specify four different namespaces in order to be able to use all package functions.~~
More extensive long format tests.
More extensive wide format tests.
Implement verifications for
- Positional-of-hashes
- Positional-of-arrays
- Positional-of-key-to-array-pairs
- Positional-of-hashes, each record of which has:
  - Same keys
  - Same type of values of corresponding keys
- Positional-of-arrays, each record of which has:
  - Same length
  - Same type of values of corresponding elements
Implement "nice tabular visualization" using Pretty::Table and/or Text::Table::Simple.
Document examples using pretty tables.
Implement transposing operation for:
- hash of hashes
- hash of arrays
- array of hashes
- array of arrays
- array of key-to-array pairs
Implement to-pretty-table for:
- hash of hashes
- hash of arrays
- array of hashes
- array of arrays
- array of key-to-array pairs
Implemented join-across:
- inner, left, right, outer
- single key-to-key pair
- multiple key-to-key pairs
- optional fill-in of missing values
- handling collisions
Implement conversion to long format for:
- hash of hashes
- hash of arrays
Speed/performance profiling.