WWW::Ollama

Raku package for accessing Ollama models. (Ollama models can be run "locally", on the user's computer.)
The implementation is based on the Ollama API, [Ol1], and on observing (and trying to imitate) the
Ollama client of Wolfram Language (WL).
The package has the following features:
- If ollama is not running, the corresponding executable is found and started
- If a request specifies the use of a known Ollama "local-evaluation" model, but that model is not available locally, then the model is downloaded first
Installation
From GitHub:
zef install https://github.com/antononcube/Raku-WWW-Ollama.git
From Zef ecosystem:
zef install WWW::Ollama
Usage
Here is a list of usage design items:
- The Ollama access is done via a client object, WWW::Ollama::Client
- It is not necessary to use a WWW::Ollama::Client object explicitly
- Instead, the following functions / subs can be used (see the example after this list):
  - ollama-base-url
  - ollama-list-models
  - ollama-model-info
  - ollama-embedding
  - ollama-completion
  - ollama-chat-completion
  - ollama-client ("umbrella" function for all of the above)
- The Ollama functions -- and client methods -- take the named option "format"
  - With format => 'hash', more details of the request are obtained in Raku hash form
- The Ollama host and port can be explicitly specified when a new WWW::Ollama::Client object is created
- The method WWW::Ollama::Client.new takes the Boolean option (adverb) :ensure-running
  - Which is set to false by default, i.e. :!ensure-running
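Here is a minimal sketch of the functional interface. The exact signatures are not documented above, so the positional prompt argument and the model named argument are assumptions, and 'gemma3:1b' is just an example of a small, locally runnable model:

use WWW::Ollama;

# Plain text completion (prompt as first positional argument, model as a named argument -- assumed)
say ollama-completion('Why is the sky blue?', model => 'gemma3:1b');

# With format => 'hash' the full response record is obtained as a Raku hash
my $res = ollama-completion('Why is the sky blue?', model => 'gemma3:1b', format => 'hash');
say $res.keys;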
This diagram summarizes the Ollama client interaction:
flowchart LR
Gemma1b(gemma3:1b)
Gemma12b(gemma3:12b)
Gemma27b(gemma3:27b)
Llama4(llama4)
Etc("...")
OllamaRepo[(Ollama<br>LLM Repository)]
Ollama{{Ollama<br>executable}}
OClient("Ollama client<br>(Raku)")
Req[LLM request]
LLMAval{LLM locally<br>available?}
Download[[Download LLM]]
Req --> OClient --> Ollama --> LLMAval
LLMAval --> |No| Download
Download <-.-> OllamaRepo
NB>notebook]
NB --> Req
Ollama -.- Gemma1b
Ollama -.- Gemma12b
Ollama -.- Gemma27b
Ollama -.- Llama4
subgraph LocalLLMs["Locally stored LLMs"]
Gemma1b
Gemma12b
Gemma27b
Llama4
Etc
end
IsRunning{"Is Ollama running?"}
EnsureVal{"ensure-running"}
IsRunning --> |No|EnsureVal --> |Yes|StartOllama[[Start Ollama executable]]
StartOllama -.-> Ollama
Download -.-> LocalLLMs
subgraph Raku["Raku session"]
OClient
Req
subgraph Ensure["Ensure running"]
IsRunning
EnsureVal
StartOllama
end
OClient <--> Ensure
end
Usage examples
For detailed usage examples see:
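In the meantime, here is a small, self-contained sketch using two of the functions listed above; the shown named arguments and return structures are assumptions rather than documented signatures:

use WWW::Ollama;

# Models currently available locally
say ollama-list-models();

# Embedding of a short text; the model name is an illustrative placeholder
say ollama-embedding('Raku is a gradually typed language.', model => 'gemma3:1b', format => 'values');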
CLI
The package provides the Command Line Interface (CLI) script ollama-client for making Ollama LLM generations.
Here is the usage message:
ollama-client --help
# Usage:
# ollama-client [<words> ...] [--path=<Str>] [-m|--model=<Str>] [-f|--format=<Str>] -- Ollama client invocation.
#
# --path=<Str> Path, one of 'completion', 'chat', 'embedding', 'model-info', 'list-models', or 'list-running-models'. [default: 'completion']
# -m|--model=<Str> Model to use. [default: 'Whatever']
# -f|--format=<Str> Format of the result; one of "json", "hash", "values", or "Whatever". [default: 'Whatever']
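For illustration, here are two invocations that use only the options shown in the help message above; the model name is just an example of a locally available model, and the actual output is omitted:

ollama-client 'Why is the sky blue?' --model=gemma3:1b --format=values
ollama-client --path=list-models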
Design and implementation details
Separate OOP and functional interfaces
- From the very beginning it was decided to have an Object-Oriented Programming (OOP) implementation, not a Functional Programming (FP) one.
- A functional interface / front-end is, of course, desirable.
- The functions ollama-list-models, ollama-model-info, ollama-embedding, ollama-completion, and ollama-chat-completion use the umbrella function ollama-client.
- The umbrella function ollama-client, in turn, has an optional argument :$client that takes WWW::Ollama::Client objects or Whatever.
  - If the value is Whatever, then a new WWW::Ollama::Client object is created (see the sketch after this list).
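The following sketch illustrates that relationship. The first positional argument of ollama-client is assumed to mirror the CLI --path values, and the spelling client => ... for the :$client option is likewise an assumption:

use WWW::Ollama;

# Functional call: a WWW::Ollama::Client object is created behind the scenes
say ollama-list-models();

# Reusing an explicit client object through the umbrella function (argument form assumed)
my $client = WWW::Ollama::Client.new;
say ollama-client('list-models', client => $client);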
Automatic start and download
- Initially, the idea was to have a "seamless" experience (as in WL). In other words, these steps are automatic:
  - The running of the executable ollama is detected
  - If it is not running, then the executable ollama is located and started
  - The LLM model specified in the request is downloaded (if not already available)
- From a certain sysadmin point of view, the "seamless" running of ollama and the model downloading are not that great.
  - Say, at a certain organization, Jupyter notebooks that have a "WWW::Ollama" setup are mass-deployed.
  - This might produce too much internet traffic because of the LLM models being downloaded.
- So, now, by default, the automatic start is disabled when WWW::Ollama::Client.new() is used.
  - To enable it, use WWW::Ollama::Client.new(:ensure-running); see the example after this list.
- Also, there was the idea to have the Ollama executables for different platforms be part of the package (placed in "./resources").
  - But that would make the package too big, ≈95 MB.
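Here is a sketch of the corresponding client creation calls; the host and port argument names are assumptions, and 11434 is Ollama's conventional default port:

use WWW::Ollama;

# Default: the client does not try to start the ollama executable (i.e. :!ensure-running)
my $client = WWW::Ollama::Client.new;

# Opt in to the "seamless" behavior: detect whether ollama is running and start it if needed
my $auto = WWW::Ollama::Client.new(:ensure-running);

# Explicit host and port (argument names assumed; 11434 is the usual Ollama port)
my $remote = WWW::Ollama::Client.new(host => 'http://localhost', port => 11434, :ensure-running);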
TODO
- TODO Implementation
  - DONE Reasonable gists for the different objects.
  - TODO Authorization
    - DONE Initialize the client with an API key and use that key
    - TODO Pass & use an API key per client method call
    - TODO Automatic discovery and use of OLLAMA_API_KEY
  - DONE Functional interface
    - I.e. without the need to explicitly make a client object.
  - TODO Refactor
    - TODO Change the streaming method to use the output of "HTTP::Tiny"
    - TODO Review and simplify the code
      - There are, probably, too many classes.
- TODO CLI
  - DONE MVP
  - TODO Detect JSON file with valid chat records
  - TODO Detect JSON string with valid chat records
- DONE Unit tests
  - DONE Client object creation
  - DONE Completion generation
  - DONE Chat generation
  - DONE Embeddings
- TODO Documentation
  - DONE Basic usage script
  - DONE Basic usage notebook
  - DONE Using via the LLM-function framework
  - TODO Benchmarking over DSL translations
  - TODO Demo video
References
[Ol1] "Ollama API".