LLM::Chat
Introduction
LLM::Chat is a module for inferencing large language models. It automatically manages pruning old messages, retaining the system prompt (:sysprompt) and other sticky (:sticky) messages, and inserting messages at a given depth (:depth).
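For illustration, here is a minimal sketch of how those message attributes might be used. Only `sysprompt` appears in the full example below; the `sticky` and `depth` arguments and their exact semantics are assumptions inferred from the description above.

```raku
# Sketch only: marking messages so that pruning keeps the system prompt and a
# sticky note, and pinning a reminder near the end of the context.
# The :sticky and :depth arguments are assumptions based on the description above.
use LLM::Chat::Conversation::Message;

my @messages = (
    LLM::Chat::Conversation::Message.new(
        role      => 'system',
        content   => 'You are a helpful assistant.',
        sysprompt => True,          # always retained
    ),
    LLM::Chat::Conversation::Message.new(
        role    => 'system',
        content => "The user's name is Alex.",
        sticky  => True,            # assumed: survives pruning of old messages
    ),
    LLM::Chat::Conversation::Message.new(
        role    => 'system',
        content => 'Answer in one paragraph.',
        depth   => 2,               # assumed: inserted two messages from the end
    ),
);
```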
Example Usage
This is an implementation of a terminal-based conversational loop with LLM::Chat:
```raku
#!/usr/bin/env raku
use lib 'lib';

use LLM::Chat::Backend::OpenAICommon;
use LLM::Chat::Backend::Settings;
use LLM::Chat::Conversation;
use LLM::Chat::Conversation::Message;
use LLM::Chat::Template::MistralV7;
use LLM::Chat::TokenCounter;
use Tokenizers;

## EDIT THESE TO MATCH YOUR ENVIRONMENT
constant $API_URL     = 'http://192.168.1.193:5001/v1';
constant $MAX_TOKENS  = 1024;
constant $MAX_CONTEXT = 32768;

# Seed the conversation with a system prompt that is always retained.
my @conversation = (
    LLM::Chat::Conversation::Message.new(
        role      => 'system',
        content   => 'You are a helpful assistant.',
        sysprompt => True,
    ),
);

my $template  = LLM::Chat::Template::MistralV7.new;
my $tokenizer = Tokenizers.new-from-json(
    slurp('t/fixtures/tokenizer.json')
);
my $token-counter = LLM::Chat::TokenCounter.new(
    tokenizer => $tokenizer,
    template  => $template,
);

my $settings = LLM::Chat::Backend::Settings.new(
    max_tokens  => $MAX_TOKENS,
    max_context => $MAX_CONTEXT,
);

# Reserve room for the model's reply when budgeting prompt tokens.
my $con = LLM::Chat::Conversation.new(
    token-counter  => $token-counter,
    context-budget => $MAX_CONTEXT - $MAX_TOKENS,
);

my $backend = LLM::Chat::Backend::OpenAICommon.new(
    api_url  => $API_URL,
    settings => $settings,
);

loop {
    # Collect multi-line user input until a lone 'DONE'.
    my @lines;
    say "Enter your input. Type 'DONE' on a line by itself when finished:\n";
    loop {
        print "> ";
        my $line = $*IN.get // last;
        last if $line.trim eq 'DONE';
        @lines.push: $line;
    }
    last if @lines.elems == 0;

    @conversation.push: LLM::Chat::Conversation::Message.new(
        role    => 'user',
        content => @lines.join("\n"),
    );

    # Prune and assemble the messages that fit the context budget, then
    # stream the completion, printing only the newly arrived text.
    my @prompt = $con.prepare-for-inference(@conversation);
    my $resp   = $backend.chat-completion-stream(@prompt);
    my $last   = '';
    loop {
        my $new = $resp.latest.subst(/^$last/, '');
        $last = $resp.latest;
        print $new if $new ne "";
        last if $resp.is-done;
        sleep(0.1);
    }
    print "\n";
    print "ERROR: {$resp.err}\n" if !$resp.is-success;
}
```
See examples/* and t/* for more usage examples.
Current Support
Inference Types
- Chat completion (with or without streaming)
- Text completion (with or without streaming)
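For example, a blocking (non-streaming) chat completion might look like the sketch below; the `chat-completion` method name and the shape of its response object are assumptions modelled on the streaming call used in the example above.

```raku
# Hypothetical sketch: a non-streaming chat completion.
# Assumes a blocking `chat-completion` counterpart to the
# `chat-completion-stream` call shown in the example above.
my @prompt = $con.prepare-for-inference(@conversation);
my $resp   = $backend.chat-completion(@prompt);
say $resp.is-success ?? $resp.latest !! "ERROR: {$resp.err}";
```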
API Types
- OpenAI compatible (most backends) - LLM::Chat::Backend::OpenAICommon
- KoboldCpp (additional samplers & cancel function) - LLM::Chat::Backend::KoboldCpp
To implement more API types, extend LLM::Chat::Backend.
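A sketch of what a new backend might look like is below. The base-class interface is not documented here, so the method name and the response-object contract are assumptions taken from the calls used in the example above.

```raku
# Hypothetical sketch of a new API backend. Method names and attributes are
# assumptions; consult LLM::Chat::Backend for the real interface.
use LLM::Chat::Backend;

class LLM::Chat::Backend::MyAPI is LLM::Chat::Backend {
    has Str $.api_url is required;

    method chat-completion-stream(@prompt) {
        # Translate @prompt into your API's request format, issue the HTTP
        # call, and return a response object exposing .latest, .is-done,
        # .is-success and .err as used in the example above.
        ...
    }
}
```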
Chat Templates
- ChatML (LLM::Chat::Template::ChatML)
- Gemma 2 (LLM::Chat::Template::Gemma2)
- Llama 3 (LLM::Chat::Template::Llama3)
- Llama 4 (LLM::Chat::Template::Llama4)
- Mistral V7 (LLM::Chat::Template::MistralV7)
To implement more chat templates, extend LLM::Chat::Template. You will need a correct chat template for accurate context shifting and/or text completion.
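As a rough illustration, a custom template might render messages into the model's prompt format along the lines of the sketch below. The LLM::Chat::Template interface is not documented here; the `render` method and its signature are hypothetical and shown only to convey the general shape.

```raku
# Hypothetical sketch of a ChatML-style template. The `render` method name
# and signature are assumptions, not the documented interface.
use LLM::Chat::Template;

class LLM::Chat::Template::MyChatML is LLM::Chat::Template {
    method render(@messages) {
        @messages.map({
            "<|im_start|>{.role}\n{.content}<|im_end|>\n"
        }).join ~ "<|im_start|>assistant\n";
    }
}
```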
Planned
- Tool Calling
- VLM Capabilities
- More APIs & templates
- Automatically fetching tokenizers & chat templates from HF model identifiers
Contributing
Pull requests and issues welcome.
License
Artistic License 2.0
(C) 2025 Matt Doughty <matt@apogee.guru>
The file at t/fixtures/tokenizer.json is (C) 2025 Mistral AI.
It is extracted from Mistral Nemo, which is an Apache 2.0 licensed model.