
CSV::Table

zef:tbrowder


NAME

CSV::Table - Provides routines for querying and modifying a CSV file with or without a header row.

SYNOPSIS

For example, take a file containing a header row plus an M x N matrix of data (M rows and N columns), whose first three lines are:

name, age, ...
John, 40,  ...
Sally, 38, ...
...

Handle the file with CSV::Table in a Raku program:

use CSV::Table;
# with indexing from zero
my $t = CSV::Table.new: :csv($my-csv-file);
say $t.fields;       # OUTPUT: «N␤»     # zero if no header row
say $t.rows;         # OUTPUT: «M␤»     # M+1 if no header row
say $t.cols;         # OUTPUT: «N␤»
say $t.field[0];     # OUTPUT: «name␤»  # (Any) if no header row
say $t.cell[0][0];   # OUTPUT: «John␤»

There are multiple ways to query a data cell:

say $t.cell[1][0];   # OUTPUT: «Sally␤»
say $t.rowcol(1, 0); # OUTPUT: «Sally␤»
say $t.rc(1, 0);     # OUTPUT: «Sally␤»
say $t.ij(1, 0);     # OUTPUT: «Sally␤»
say $t.colrow(0, 1); # OUTPUT: «Sally␤»
say $t.cr(0, 1);     # OUTPUT: «Sally␤»
say $t.ji(0, 1);     # OUTPUT: «Sally␤»

You can change the value of any cell:

$t.cell[0][1] = 48;     # set the cell at row 0, column 1 (John's age) to 48
$t.rowcol(0, 1, 50);    # set the same cell to 50
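The accessor forms above differ only in argument order. As a rough illustration, and assuming nothing about CSV::Table's internals, here is a hypothetical ToyTable class (not part of the module) in which rc and ij are aliases of rowcol, cr and ji are aliases of colrow, and an optional third argument sets the cell, mirroring $t.rowcol(0, 1, 50):

```raku
# Hypothetical toy class, NOT CSV::Table's source: it only shows how the
# row-first accessors (rowcol/rc/ij) relate to the column-first ones
# (colrow/cr/ji). An optional third argument sets the cell.
class ToyTable {
    has @.cell;   # array of row arrays, indexed from zero

    method rowcol(Int:D $i, Int:D $j, $value?) {
        @!cell[$i][$j] = $value if $value.defined;
        @!cell[$i][$j]
    }
    method rc(|c) { self.rowcol(|c) }
    method ij(|c) { self.rowcol(|c) }

    # Column-first forms simply swap the first two arguments.
    method colrow(Int:D $j, Int:D $i, $value?) { self.rowcol($i, $j, $value) }
    method cr(|c) { self.colrow(|c) }
    method ji(|c) { self.colrow(|c) }
}

my $t = ToyTable.new(cell => [['John', 40], ['Sally', 38]]);
say $t.rc(1, 0);        # Sally
say $t.ji(0, 1);        # Sally
$t.rowcol(0, 1, 50);    # set John's age
say $t.cell[0][1];      # 50
```

Delegating every alias to one method keeps a single code path for reads and writes, so the aliases cannot drift apart.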

You can also change the names of fields, but unless you also change the corresponding field names in the data hashes, you will most likely have problems. It is much easier to modify the original CSV file.

You can save the changed data ($t.save) at any time, but you will be asked to confirm the save.

You can also save the data to new files with $t.save: $stem, where $stem is the desired basename without a suffix. The new files will be named with the '.csv' and '-raw.csv' endings (or with your chosen 'raw' string in place of '-raw').

You can define a CSV table with row names by passing the :has-row-names option to the constructor, and then query the table by row name and column name:

say $t.rowcol("water", "Feb"); # OUTPUT: «10.80␤»
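Conceptually, a table with row names behaves like a two-level hash keyed first by row name and then by column name. The plain-Raku sketch below illustrates that idea only; it is not CSV::Table's implementation, the "item" header is made up, and only the "water"/"Feb" value comes from the example above:

```raku
# Hypothetical sketch, NOT CSV::Table's implementation: index cells by row
# name (first column) and column name (header row). The "item" header is
# invented for illustration.
my @lines  = 'item, Feb', 'water, 10.80';
my @header = @lines[0].split(',').map(*.trim);
my %rowcol;
for @lines[1..*] -> $line {
    my @v = $line.split(',').map(*.trim);
    # @v[0] is the row name; pair the remaining values with column names
    for 1 ..^ @header.elems -> $j {
        %rowcol{@v[0]}{@header[$j]} = @v[$j];
    }
}
say %rowcol<water><Feb>;   # 10.80
```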

DESCRIPTION

CSV::Table is a class enabling access to a CSV table's contents. Tables with a header row must have unique field names. Input files are read immediately, so very large files may overwhelm system resources.

By default, text in a cell is 'normalized': it is trimmed of leading and trailing whitespace, and each run of contiguous interior whitespace is collapsed into a single space character (' '). In this context, 'whitespace' means one or more characters from the set (" ", "\t", "\n"). Exceptions to that rule occur when the user wishes to use a newline in a cell or uses a tab as the cell separator. In those cases, some other character must be chosen as the line ending or cell separator. The normalization algorithm then does not treat the newline as whitespace, while any tab not used as a cell separator is still treated as whitespace. (See more details and examples below.)
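The default normalization rule can be sketched in a few lines of Raku. The sub name normalize-cell and its :newline-in-cell flag are hypothetical (they are not CSV::Table API); the flag only models the exception above in which the newline is removed from the whitespace set:

```raku
# Minimal sketch of the normalization rule described above; normalize-cell
# and :newline-in-cell are hypothetical names, not CSV::Table API.
sub normalize-cell(Str:D $cell, Bool :$newline-in-cell = False --> Str) {
    # Whitespace is the set (" ", "\t", "\n"); when newlines are allowed
    # inside cells, the newline is dropped from that set.
    my $ws = $newline-in-cell ?? rx/<[\x20\t]>+/ !! rx/<[\x20\t\n]>+/;
    # Splitting on whitespace runs and discarding empty pieces trims both
    # ends; rejoining with one space collapses the interior runs.
    $cell.split($ws, :skip-empty).join(' ')
}

say normalize-cell("  John \t\n  Smith  ");   # John Smith
```

Splitting on runs of whitespace and rejoining does trimming and collapsing in one pass, which keeps the sketch short.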

It can handle the following which other CSV handlers may not:

Note that a header line with any empty field causes an exception. This is a valid header line:

field0, field1, field2

This header line is not valid (notice that the trailing comma has no text following it):

field0, field1, field2,
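The two header rules stated so far (no empty field names, and unique field names for tables with a header row) can be checked with a short helper. check-header is a hypothetical name used only for illustration; CSV::Table's own check and its exception messages may differ:

```raku
# Hypothetical helper, NOT CSV::Table API: applies the header rules from
# the text (every field name is non-empty and all names are unique).
sub check-header(Str:D $line, Str:D :$sep = ',' --> List) {
    # A trailing separator leaves a final empty string after split.
    my @names = $line.split($sep).map(*.trim);
    for @names.kv -> $i, $name {
        die "header field $i is empty" if $name eq '';
    }
    die 'header field names must be unique' if @names.unique.elems != @names.elems;
    @names.List
}

say check-header('field0, field1, field2');                       # (field0 field1 field2)
say (try check-header('field0, field1, field2,')) // 'rejected';  # rejected
```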

As simple as it is, it also has some uncommon features that are very useful:

Limitations

It cannot currently handle:

Also, quoted words receive no special treatment, and unbalanced quote characters are not detected.

Constructor with default options

my $t = CSV::Table.new: :$csv,
                        :has-header(True),
                        :separator('auto'),
                        :normalize(True),
                        :trim(True),
                        :comment-char('#'),
                        :line-ending("\n"),
                        :empty-cell-value(""),
                        :has-row-names(False),
                        :raw-ending('-raw'),
                        # :$config,  # optional path to a YAML or JSON configuration file
                        ;

Following are the allowable values for the named arguments. Values that are not listed are probably untested. If your value of choice is not listed, file an issue and it can be added and tested.

There are a lot of options, any or all of which can be defined in a YAML (or JSON) configuration file whose path is provided by the config option. You can get a prefilled YAML config file by executing:

$ raku -e'use CSV::Table; CSV::Table.write-config'
See CSV::Table YAML configuration file 'config-csv-table.yml'

or

$ raku -e'use CSV::Table; CSV::Table.write-config(:type<json>)'
See CSV::Table JSON configuration file 'config-csv-table.json'

Alternatively, you can call the method on the CSV::Table class in the REPL:

$ raku
> use CSV::Table;
> CSV::Table.write-config
See CSV::Table YAML configuration file 'config-csv-table.yml'

Accessing the table

The following table shows how to access each cell in a table $t with a header row plus R rows and C columns of data. (In matrix terminology it is an M x N rectangular matrix with M rows and N columns.)

$t.field[0]       ...  $t.field[C-1]
$t.cell[0][0]     ...  $t.cell[0][C-1]
...                    ...
$t.cell[R-1][0]   ...  $t.cell[R-1][C-1]

The table's data cells can also be accessed by field name and row number:

$t.col{$field-name}{$row-number}

If row names are provided, data cells can be accessed by row and column names:

$t.rowcol: $row-name, $field-name
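As a mental model for the positional ($t.field, $t.cell) and field-name ($t.col) access forms above, the plain-Raku sketch below builds equivalent structures from the synopsis data. The variables @field, @cell and %col are hypothetical stand-ins, the parsing handles only simple comma-separated lines, and this is not how CSV::Table stores its data internally:

```raku
# Hypothetical plain-Raku model of the access patterns above; it does not
# use CSV::Table and only handles simple comma-separated lines.
my $text = q:to/END/;
name, age
John, 40
Sally, 38
END

my @lines = $text.lines.grep(*.trim.chars);               # skip blank lines
my @field = @lines.shift.split(',').map(*.trim);          # header row
my @cell  = @lines.map({ [ .split(',').map(*.trim) ] });  # data rows

# Column view: field name -> row number -> value
my %col;
for @cell.kv -> $i, $row {
    for @field.kv -> $j, $name {
        %col{$name}{$i} = $row[$j];
    }
}

say @field[0];       # name
say @cell[1][0];     # Sally
say %col<age>{0};    # 40
```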

Possible new features

The following features can be implemented fairly easily if users want them and file an issue.

Other matrix-related features could be implemented, but most are available in the published modules Math::Libgsl::Matrix and Math::Matrix.

CREDITS

Thanks to @lizmat and @[Coke] for pushing for a more robust CSV handling module including quotes and newlines.

Thanks to @librasteve for the idea of the slice method and his suggestion of the aliases slice2d and view for slice.

AUTHOR

Tom Browder tbrowder@acm.org

COPYRIGHT AND LICENSE

© 2024 Tom Browder

This library is free software; you may redistribute it or modify it under the Artistic License 2.0.