
# LLM::Chat

A simple framework for LLM inference in Raku. It supports multiple backends (OpenAI-compatible, KoboldCpp), chat templates (ChatML, Llama 3/4, Mistral, Gemma 2, and any HuggingFace Jinja2 template), conversation management with context shifting, and token counting.
## Synopsis

```raku
use LLM::Chat::Backend::KoboldCpp;
use LLM::Chat::Template::ChatML;
use LLM::Chat::Conversation;

my $backend = LLM::Chat::Backend::KoboldCpp.new(
    api_url  => 'http://localhost:5001/v1',
    template => LLM::Chat::Template::ChatML.new,
);

my $conv = LLM::Chat::Conversation.new;
$conv.add-message('user', 'Hello!');

my $response = $backend.text-completion($conv.messages);
```
## Templates

### Built-in Templates

```raku
use LLM::Chat::Template::ChatML;
use LLM::Chat::Template::Llama3;
use LLM::Chat::Template::Llama4;
use LLM::Chat::Template::MistralV7;
use LLM::Chat::Template::Gemma2;

my $template = LLM::Chat::Template::ChatML.new;
```
### Jinja2 Templates (HuggingFace)

Load any HuggingFace `chat_template` directly from a `tokenizer_config.json`:

```raku
use LLM::Chat::Template::Jinja2;

# From tokenizer_config.json
my $json = 'tokenizer_config.json'.IO.slurp;
my $template = LLM::Chat::Template::Jinja2.from-tokenizer-config($json);

# Or provide the template string directly
my $template = LLM::Chat::Template::Jinja2.new(
    template  => $jinja2-string,
    bos-token => '<s>',
    eos-token => '</s>',
);
```
The Jinja2 template support is powered by Template::Jinja2, a complete Jinja2 engine for Raku with byte-identical output to Python Jinja2.
## Backends

### KoboldCpp

```raku
use LLM::Chat::Backend::KoboldCpp;

my $backend = LLM::Chat::Backend::KoboldCpp.new(
    api_url    => 'http://localhost:5001/v1',
    template   => $template,  # for text completions
    max_tokens => 200,
);
```
### OpenAI-compatible

Any OpenAI-compatible API (vLLM, Ollama, etc.):

```raku
use LLM::Chat::Backend::OpenAICommon;

my $backend = LLM::Chat::Backend::OpenAICommon.new(
    api_url => 'http://localhost:8000/v1',
    model   => 'my-model',
);
```
## Conversation Management

```raku
use LLM::Chat::Conversation;

my $conv = LLM::Chat::Conversation.new;
$conv.add-message('system', 'You are helpful.');
$conv.add-message('user', 'Hello!');
$conv.add-message('assistant', 'Hi there!');

# Access messages
say $conv.messages;
```
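A complete round trip can be sketched by combining a conversation with the KoboldCpp backend from the synopsis. This uses only the calls shown above; treating the return value of `text-completion` as the reply text is an assumption carried over from the synopsis:

```raku
use LLM::Chat::Backend::KoboldCpp;
use LLM::Chat::Template::ChatML;
use LLM::Chat::Conversation;

my $backend = LLM::Chat::Backend::KoboldCpp.new(
    api_url  => 'http://localhost:5001/v1',
    template => LLM::Chat::Template::ChatML.new,
);

my $conv = LLM::Chat::Conversation.new;
$conv.add-message('system', 'You are helpful.');
$conv.add-message('user', 'What is Raku?');

# Send the accumulated messages, then store the reply
# so the next turn includes the full history.
my $response = $backend.text-completion($conv.messages);
$conv.add-message('assistant', $response);
```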
## Token Counting

```raku
use LLM::Chat::TokenCounter;

my $counter = LLM::Chat::TokenCounter.new(
    tokenizer-path => 'path/to/tokenizer.json',
    template       => $template,
);

my $count = $counter.count-messages(@messages);
```
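Token counting pairs naturally with conversation management to guard against overflowing the model's context window. A minimal sketch using only the calls shown above; the 4096-token limit is an arbitrary example, not a module default:

```raku
use LLM::Chat::TokenCounter;
use LLM::Chat::Conversation;

my $counter = LLM::Chat::TokenCounter.new(
    tokenizer-path => 'path/to/tokenizer.json',
    template       => $template,
);

my $conv = LLM::Chat::Conversation.new;
$conv.add-message('system', 'You are helpful.');
$conv.add-message('user', 'Hello!');

# Check the rendered prompt size before sending it to a backend.
my $count = $counter.count-messages($conv.messages);
say "Prompt too long ($count tokens)" if $count > 4096;
```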
## Dependencies

- Cro::HTTP — HTTP client for API calls
- Template::Jinja2 — Jinja2 template engine
- Tokenizers — HuggingFace tokenizers via Rust FFI
- JSON::Fast — JSON parsing
## Author

Matt Doughty

## License

Artistic-2.0