Rand Stats

App::samaki

zef:bduggan

Actions Status Actions Status

NAME

Samaki -- Stich Associated Modes of Accessing and Keeping Information

SYNOPSIS

Usage:
  sam            -- start the default UI, and browser the current directory
  sam <name>     -- start with the named samaki page or directory
  sam import <file> [--format=jupyter] -- import from another format to samaki
  sam export <name> [--format=html] -- export a samaki file to HTML (or other formats)
  sam conf       -- edit the configuration file ~/.samaki.conf
  sam reset-conf -- reset the configuration file to the default

Type `sam -h` for the full list of options.

DESCRIPTION

Samaki is a file format and tool for using multiple programming languages in a single file.

It's a bit like Jupyter notebooks (or R or Observable notebooks), but with multiple types of cells in one notebook and all the cells belong to a simple text file. It has a plugin architecture for defining the types of cells, and for describing the types of output. Outputs from cells are serialized, often as CSV files. Cells can reference each others' content or output.

Some use cases for samaki include

Here's an example:

-- duck:hello
select 'hello' as world;

-- duck:earth
select 'earth' as planet;

-- llm:planet.txt
Which planet from the sun is 〈 cells('earth').rows[0]<planet> 〉?

To use this:

  1. save it as a file, e.g. "planets.samaki"

  2. run `sam planets'

  3. press 'm' to toggle between raw mode and rendered mode

  4. highlight the second cell and press enter to run the query

  5. press r to refresh the page, also press m to change the mode, and notice that it has changed to

    "Which planet from the sun is earth?"

  6. highlight the third cell and press enter to run the LLM query

For more examples, check out the eg/ directory.

Below is what the screen looks like during this interactive session before earth.csv is created, when the cell is in raw mode:

╔═════════════════════════════════════════════════════════════════════╗
╢                           -- planets --                             ║
║    ┌── duck (txt)           [run] ➞  hello.txt                      ║
║  0 │ select 'hello' as world;                                       ║
║  1 └                                                                ║
║    ┌── duck (csv)           [run] ➞  earth.csv                      ║
║  0 │ select 'earth' as planet;                                      ║
║  1 └                                                                ║
║    ┌── llm (txt)            [run] ➞  planet.txt                     ║
║  0 │ Which planet from the sun is 〈 cells(1).rows[0]<planet> 〉?   ║
║  1 └                                                                ║
║                                                                     ║
╟─────────────────────────────────────────────────────────────────────╢
║                       planets/                                      ║
║ planet.txt                 45 b         9 hours and 52 minutes ago  ║
║ hello.csv                  12 b            7 days and 18 hours ago  ║
║                                                                     ║
╚═════════════════════════════════════════════════════════════════════╝

FORMAT

A samaki page (or notebook) consists of two things

  1. a text file, ending in .samaki

  2. a directory containing data files.

The directory name will be the same as the basename of the file, and it will be created if it doesn't exist. e.g.

planets.samaki
planets/
   hello.txt
   earth.csv
   planet.txt

The samaki file is a text file divided into cells, each of which looks like this:

-- <cell type> [ : <name> ['.' <ext>]? ]?
| <conf-key 1> : <conf-value 1>
| <conf-key 2> : <conf-value 2>
[... cell content ..]

That is:

  1. New cells are indicated with a line starting with two dashes and a space ("-- ") folowed by the type of the cell. (Other similar unicode dashes like "─" can also be used)

  2. The type of the cell should be a single word with alphanumeric characters.

  3. An optional colon and name can give a name to the cell.

  4. After the dashes, optional configuration options can be set as name : value pairs with a leading pipe symbol (|)

Another example: a cell named "the_answer" that runs a query and uses a duckdb file named life.duckdb

-- duck : the_answer
| file: life.duckdb

select 42 as life_the_universe_and_everything

Running the cell above creates the_answer.csv in the data directory. Note that if the extension is omitted, it is assumed to be .csv. the_answer.csv could also have been written.

Angle brackets are interpolated into cell content. For instance :

〈 cells(0).content 〉

alternatively, an ASCII equivalent <<< can be used:

<<< cells(0).content >>>

Cells can be referenced by name or by number, e.g.

〈 cells('the_answer').content 〉

Also c and cell are synonyms for cells. Also out and src refer to the output and source for the cell respectively. For instance:

〈 c('the_answer').out 〉

Calling res will return a Duckie::Result object for cells with CSV data.

The API is still evolving, suggestions are welcome!

CONFIGURATION

The configuration file for samaki is a raku file located at ~/.config/samaki/samaki-conf.raku. Environment variables $SAMAKI_HOME and $SAMAKI_CONFIG can be used to override the directory and file name respectively. Also $XDG_CONFIG_HOME is used if set.

Samaki is configured with a set of regular expressions which are used to determine how to handle each cell. The "type" of the cell above is matched against the regexes, and whichever one matches first will be used to parse the input and generate output.

Samaki comes with a default configuration file and some default plugins. The default configuration looks something like this (see here for the actual contents) :

# samaki-conf.raku
#
%*samaki-conf =
  plugins => [
    / duck /   => 'Samaki::Plugin::Duck',
    / llm  /   => 'Samaki::Plugin::LLM',
    / text /   => 'Samaki::Plugin::Text',
    / bash /   => 'Samaki::Plugin::Bash',
    / html/    => 'Samaki::Plugin::HTML',
  ],
  plugouts => [
    / csv  /   => 'Samaki::Plugout::Duckview',
    / csv  /   => 'Samaki::Plugout::DataTable',
    / html /   => 'Samaki::Plugout::HTML',
    / .*   /   => 'Samaki::Plugout::Raw',
  ]
;

RELOADING

Starting sam with "--watch" will autoreload the page when the file is changed.

INIT BLOCKS

A special type of cell that has no type can be used to run Raku code when the page loads, like this:

--
my $p = 'mars';

-- duck
select * from planets where name = '〈 $p 〉';

The two dashes without a type indicate that this code should immediately be evaluated. Blocks like this can be used throughout the page, and are executed when the page loads, at the same time that interpolated code is evaluated.

PLUGINS

Plugin classes should do the Samaki::Plugin role, and at a minimum should implement the execute method and have name and description attributes. The usual RAKULIB directories are searched for plugins, so adding local plugins is a matter of adding a new calss and placing it into this search path.

When interacting with external programs, there are three (and probably more) distinct ways to do this. There is some redundancy in the plugins because we offer more than one way to interact with external programs. The three ways that are currently abstracted across plugins are are

Of these methods, there are a few functional differences.

  1. persistence: currently only the last one offers persistence -- i.e. definitions between cells will persist within the REPL process. e.g. if one cell has x=12 and another has print(x) then the second will print 12 if it is run after the first. The other plugins are executed once and are stateless.

  2. output shown vs output saved: for native drivers the output that is shown on the screen is precisely what is stored. The second one stores output in a file, but does not necessarily display it all. This can be useful running programs that create large datasets. There may be some inconsistency depending on the plugin, so consult the individual plugin's implementation to see what it does.

In addition to classes defined in code, class definitions may be placed directly into the configuration file.

For instance, this snippet below is sufficient to implement a plugin called python for executing python code, saving the result to a file for that cell:

/ python / => class SamakiPython does Samaki::Plugin {
                has $.name = 'samaki-python';
                has $.description = 'run some python!';
                method execute(:$cell, :$mode, :$page, :$out) {
                   my $content = $cell.get-content(:$mode, :$page);
                   $content ==> spurt("in.py");
                   shell "python in.py > out.py 2> errs.py";
                   $out.put: slurp "out.py";
                }

An even simpler version could make use of the Process base class described above:

use Samaki::Plugin::Process;

%*samaki-conf =
    / python / => class SamakiPython does Samaki::Plugin::Process[
                   name => 'python',
                   cmd => 'python3' ] {
       has %.add-env = PYTHONUNBUFFERED => '1';
      },

INCLUDED PLUGINS

The following plugins are included with samaki:

PluginTypeDescription
BashProcessExecute contents as a bash program
CodeEvaluate raku code in the current process
DuckProcessRun SQL queries via duckdb executable
DuckieinlineRun SQL queries via Duckie inline driver
FileDisplay file metadata and info
HTMLGenerate HTML from contents
LLMinlineSend contents to LLM via LLM::DWIM
MarkdowninlineGenerate HTML from markdown
PostgresProcessExecute SQL via psql process
RakuProcessRun raku in a separate process
Repl::RakuReplInteractive raku REPL (persistent session)
Repl::PythonReplInteractive python REPL (persistent session)
Repl::RReplInteractive R REPL (persistent session)
TextWrite contents to a text file

Plugin documentation:

PLUGIN OPTIONS

Options may be given using a vertical line after the name of the plugin like this:

-- llm
| model: claude

Options are plugin-specific. See the documentation for each plugin for details.

PLUGOUTS

Output files are also matched against a sequence of regexes, and these can be used for visualizing or showing output.

These should also implement execute which has this signature:

method execute(IO::Path :$path!, IO::Path :$data-dir!, Str :$name!) { ... }

Plugouts are intended to either visualize or export data. The plugout for viewing an HTML file is basically:

method execute(IO::Path :$path!, IO::Path :$data-dir!, Str :$name!) {
  shell <<open $path>>;
}

INCLUDED PLUGOUTS

The following plugouts are included with samaki:

PlugoutDescription
ChartJSDisplay CSV as interactive charts in browser (via Chart.js)
CSVGeoDisplay CSV that has geojson data using a map in browser (via leaflet)
D3Display CSV as D3.js visualizations in browser
DataTableDisplay CSV in browser with sorting/pagination/search
DeckGLBinDisplay spatial bins (H3, geohash, GeoJSON) as 3D extrusions (via deck.gl)
DuckviewShow CSV summary in bottom pane (via duckdb)
GeojsonDisplay GeoJSON on map in browser (via leaflet)
HTMLOpen HTML content in browser
JSONDisplay prettified JSON in bottom pane
PlainDisplay plain text in browser
RawOpen file with system default application
TJLessView JSON in new tmux window (requires jless)

Plugout documentation:

IMPORTS/EXPORTS

An entire samaki page can be exported as HTML or imported from Jupyter. This is still evolving. For now, for instance:

sam export eg/planets

will generate a nice HTML page based on the samaki input. It will embed output files into the HTML.

USAGE

Usage is described at the top. For help, type sam -h.

Have fun!

BUGS

The backronym is a bit forced. Here's another one: Simple Arrangements of Modules with Any Kind of Items

TODO

A lot, especially more documentation.

Contributions are welcome!

AUTHOR

Brian Duggan (bduggan at matatu.org)