Data::Translators

zef:antononcube

Raku package for translation of JSON specs or JSON-like data structures into other formats.

It is envisioned that this package will have translators to multiple formats, for example: HTML, JSON, R, and Wolfram Language (WL).

The main motivation for making the package is to have a convenient way of making tables while doing Literate programming with Raku, using, for example, "Text::CodeProcessing", [AAp4].

The use of JSON came into focus because, when working with Large Language Model (LLM) functions, [AAp3], LLMs are very often requested to produce output in JSON format, [AA1, AA2].

The package "Data::Reshapers", [AAp1], complements "Data::Translators" nicely, and vice versa. The package "Data::TypeSystem", [AAp2], is used for "translation decisions" and for conversions into more regular datasets.

The package "Mathematica::Serializer", [AAp5], has a very similar mission -- it is for translating Raku data structures into Mathematica (aka Wolfram Language or WL) code.

"Data::Translators" can be utilized while doing Literate programming with, for example, "Text::CodeProcessing", [AAp4], or "Jupyter::Kernel", [BDp1]; concrete usage examples are given in the sections below.

Remark: The provided converters are made for communication purposes, so they might not be very performant. I have used or tested them with datasets that have fewer than 5000 rows.


Installation

Package installations from both sources use the zef installer (which should be bundled with the "standard" Rakudo installation file).

To install the package from Zef ecosystem use the shell command:

zef install Data::Translators

To install the package from the GitHub repository use the shell command:

zef install https://github.com/antononcube/Raku-JSON-Translators.git

Basic usage

Main use case

Here is a "main use case" example:

  1. Get a dataset that is an array of hashes
  2. Filter or sample the records
  3. Make an HTML table with those records

The HTML table outputs can be used to present datasets nicely in notebooks, Markdown files, and other documents that render HTML.

Here we get the Titanic dataset and sample it:

use Data::Reshapers;
use Data::TypeSystem;
use Data::Translators;

my $tbl = get-titanic-dataset.pick(3);
# ({id => 1256, passengerAge => -1, passengerClass => 3rd, passengerSex => male, passengerSurvival => died} {id => 1033, passengerAge => -1, passengerClass => 3rd, passengerSex => male, passengerSurvival => died} {id => 1037, passengerAge => -1, passengerClass => 3rd, passengerSex => female, passengerSurvival => survived})

Here is the corresponding dataset type:

deduce-type($tbl);
# Vector(Assoc(Atom((Str)), Atom((Str)), 5), 3)

Here is the corresponding HTML table:

$tbl ==> data-translation
| id   | passengerClass | passengerSurvival | passengerAge | passengerSex |
|------|----------------|-------------------|--------------|--------------|
| 1256 | 3rd            | died              | -1           | male         |
| 1033 | 3rd            | died              | -1           | male         |
| 1037 | 3rd            | survived          | -1           | female       |

We can specify field names and HTML table attributes:

$tbl ==> data-translation(field-names => <id passengerSurvival>, table-attributes => 'id="info-table" class="table table-bordered table-hover" text-align="center"');
| id   | passengerSurvival |
|------|-------------------|
| 1256 | died              |
| 1033 | died              |
| 1037 | survived          |

Here is how the transposed dataset is tabulated:

$tbl ==> transpose() ==> data-translation;
passengerSurvival
  • died
  • died
  • survived
id
  • 1256
  • 1033
  • 1037
passengerClass
  • 3rd
  • 3rd
  • 3rd
passengerSex
  • male
  • male
  • female
passengerAge
  • -1
  • -1
  • -1

From JSON strings

Here is a JSON string translation to HTML:

my $json1 = q:to/END/;
{
    "sample": [
        {"name": "json2html", "desc": "coverts json 2 html table format", "lang": "python"},
        {"name": "testing", "desc": "clubbing same keys of array of objects", "lang": "python"}
    ]
}
END

data-translation($json1);
sample

| name      | lang   | desc                                   |
|-----------|--------|----------------------------------------|
| json2html | python | coverts json 2 html table format       |
| testing   | python | clubbing same keys of array of objects |

From HTML strings

Get the data of an HTML table as a Raku dataset (array of hashes). Here is an HTML table string:

sink my $html = q:to/END/;
<table>
    <tr>
        <th>Name</th>
        <th>Age</th>
        <th>City</th>
    </tr>
    <tr>
        <td>John</td>
        <td>25</td>
        <td>New York</td>
    </tr>
    <tr>
        <td>Alice</td>
        <td>30</td>
        <td>London</td>
    </tr>
</table>
END

Here is the Raku dataset:

data-translation($html, target => 'dataset')
# [{Age => 25, City => New York, Name => John} {Age => 30, City => London, Name => Alice}]
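The two translation directions can be combined into a round-trip sketch. This assumes (as the examples above suggest) that the default translation of an array of hashes yields an HTML table string, and that the 'dataset' target parses such a string back:

```raku
use Data::Translators;

# A small dataset (array of hashes)
my @ds = { Name => 'John', Age => '25' }, { Name => 'Alice', Age => '30' };

# Translate the dataset into an HTML table string
my $html-out = data-translation(@ds);

# Parse the HTML table string back into a Raku dataset
my @ds2 = |data-translation($html-out, target => 'dataset');

say @ds2;
```

Note that the round trip preserves the records but not necessarily the column order or the value types (everything comes back as strings).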

Cross-tabulated data

Here is a more involved data example:

data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
female
  • survived : 339
  • died : 127
male
  • died : 682
  • survived : 161

Compare the HTML table above with the following plain text table:

to-pretty-table(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
# +--------+----------+------+
# |        | survived | died |
# +--------+----------+------+
# | female |   339    | 127  |
# | male   |   161    | 682  |
# +--------+----------+------+

Generation of R and WL code

Here is the R code version of the Titanic data sample:

$tbl ==> data-translation(target => 'R', field-names => <id passengerClass passengerSex passengerAge passengerSurvival>)
data.frame(`id` = c("1256", "1033", "1037"),
`passengerClass` = c("3rd", "3rd", "3rd"),
`passengerSex` = c("male", "male", "female"),
`passengerAge` = c("-1", "-1", "-1"),
`passengerSurvival` = c("died", "died", "survived"))

Here is the R code version of the contingency table:

data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'), target => 'R')
list("male"=list("died"=682, "survived"=161), "female"=list("survived"=339, "died"=127))

Here is the WL code version of the contingency table:

data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'), target => 'WL')
Association["male"->Association["survived"->161, "died"->682], "female"->Association["survived"->339, "died"->127]]
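The CLI usage message further below also lists 'JSON' among the targets. Presumably the same contingency table can be translated into a JSON string through the Raku function as well (a sketch, assuming the 'JSON' target accepts the same nested-hash input as the 'R' and 'WL' targets):

```raku
use Data::Reshapers;
use Data::Translators;

# Translate the contingency table into a JSON string
# (assumes the 'JSON' target shown in the CLI usage message
#  is also available through the Raku function)
data-translation(
    cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'),
    target => 'JSON'
).say;
```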

Nicer datasets

In order to obtain datasets, or more regular datasets, the function to-dataset can be used. Here a ragged dataset is made regular and converted into an HTML table:

my @tbl2 = get-titanic-dataset.pick(6);
@tbl2 = @tbl2.map({ $_.pick((1..5).pick).Hash });
@tbl2 ==> to-dataset(missing-value=>'・') ==> data-translation
| passengerAge | id  | passengerClass | passengerSex | passengerSurvival |
|--------------|-----|----------------|--------------|-------------------|
| ・           | ・  | ・             | ・           | survived          |
| -1           | 119 | 1st            | male         | ・                |
| ・           | ・  | 1st            | male         | ・                |
| ・           | ・  | 3rd            | ・           | died              |
| 0            | 360 | 2nd            | male         | survived          |
| 40           | 692 | 3rd            | ・           | ・                |

Here a hash is transformed into a dataset with the columns <Key Value> and then converted into an HTML table:

{ 4 => 'a', 5 => 'b', 8 => 'c'} ==> to-dataset() ==> data-translation
| Key | Value |
|-----|-------|
| 4   | a     |
| 5   | b     |
| 8   | c     |

Implementation notes


CLI

The package provides a Command Line Interface (CLI) script. Here is its usage message:

data-translation --help
# Usage:
#   data-translation <data> [-t|--target=<Str>] [--encode] [--escape] [--field-names=<Str>] -- Convert data into another format.
#   
#     <data>                 Data to convert.
#     -t|--target=<Str>      Target to convert to, one of <JSON HTML R>. [default: 'HTML']
#     --encode               Whether to encode or not. [default: False]
#     --escape               Whether to escape or not. [default: False]
#     --field-names=<Str>    Field names to use for Map objects, separated with ';'. [default: '']

Here is an example application to a JSON file from the package's resources:

data-translation ./resources/professionals.json --field-names='data;id;name;age;profession'
data

| id | name    | age | profession |
|----|---------|-----|------------|
| 1  | Alice   | 25  | Engineer   |
| 2  | Bob     | 30  | Doctor     |
| 3  | Charlie | 28  | Artist     |
| 4  | Diana   | 32  | Teacher    |

References

Articles

[AA1] Anton Antonov, "Workflows with LLM functions", (2023), RakuForPrediction at WordPress.

[AA2] Anton Antonov, "TLDR LLM solutions for software manuals", (2023), RakuForPrediction at WordPress.

Packages

[AAp1] Anton Antonov, Data::Reshapers Raku package, (2021-2023), GitHub/antononcube.

[AAp2] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, Text::CodeProcessing Raku package, (2021-2023), GitHub/antononcube.

[AAp5] Anton Antonov, Mathematica::Serializer Raku package, (2021-2022), GitHub/antononcube.

[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.

[VMp1] Varun Malhotra, json2html Python package, (2013-2021), GitHub/softvar.