Revision history for LLM::Chat
0.5.1 2026-04-29T23:47:12+01:00
- Bump GitHub Actions to use Node 24+
- LLM::Chat::Backend::OpenAICommon gains an
`_on-blocking-complete` hook on the non-streaming path,
mirroring the existing `_on-stream-complete` contract: it
fires after the response body has been parsed and `_lift-usage`
has lifted the OAI/provider usage fields, and before
`$response.done`. The default implementation is a no-op.
Subclasses use it to attach post-call metadata that isn't
in the body itself, which was already possible on streams.
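A minimal Python sketch of the ordering contract above (usage lifted first, then the subclass hook, and only then is the response marked done). All class, method, and field names here (`complete_blocking`, `lookup`, the `cost` key) are hypothetical stand-ins, not the module's Raku API; `lookup` stands in for the `/generation?id=...` request.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Response:
    text: str = ""
    usage: dict = field(default_factory=dict)
    cost: Optional[float] = None
    done: bool = False
    events: list = field(default_factory=list)  # records step order for illustration


class OpenAICommonSketch:
    """Hypothetical stand-in for the non-streaming path of OpenAICommon."""

    def complete_blocking(self, raw_body: str) -> Response:
        resp = Response()
        body = json.loads(raw_body)
        resp.text = body["choices"][0]["message"]["content"]
        self.lift_usage(resp, body)            # 1. lift usage fields
        self.on_blocking_complete(resp, body)  # 2. subclass hook
        resp.done = True                       # 3. only now signal completion
        resp.events.append("done")
        return resp

    def lift_usage(self, resp: Response, body: dict) -> None:
        resp.usage = dict(body.get("usage", {}))
        resp.events.append("usage-lifted")

    def on_blocking_complete(self, resp: Response, body: dict) -> None:
        pass  # default implementation: no-op


class OpenRouterSketch(OpenAICommonSketch):
    def __init__(self, lookup: Callable[[str], Optional[dict]]):
        # Hypothetical stand-in for the /generation?id=... request.
        self.lookup = lookup

    def on_blocking_complete(self, resp: Response, body: dict) -> None:
        gen_id = body.get("id")
        if gen_id is None:
            return  # guard: nothing to look up
        meta = self.lookup(gen_id)
        if meta:
            resp.cost = meta.get("cost")
        resp.events.append("hook")
```

Because the hook runs before `done` is set, anything waiting on completion already sees `cost` populated.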
- LLM::Chat::Backend::OpenRouter wires the new hook through
to the same `/generation?id=...` lookup the streaming path
already uses, so blocking callers (e.g. App::Storygen, which
calls `chat-completion` rather than `chat-completion-stream`)
now see `.cost` populated by the time `$response.done` fires.
This fixes a silent regression from dropping
`usage: { include: true }` in 0.5.0:
`Response::OpenRouter.cost` stayed Nil on every blocking call.
- LLM::Chat::Backend::OpenRouter refactor: the lookup logic
is now in a private `!fetch-generation-metadata` helper that
both `_on-stream-complete` and `_on-blocking-complete`
delegate to. No behaviour change on the streaming path.
- Tests: t/12-openrouter-backend.rakutest gains a subtest
covering the defensive guards on both completion hooks
(no-op when generation-id is undefined; no crash when the
response isn't OpenRouter-augmented). The plan goes from 11
to 12; the LLM::Chat suite now totals 130 tests.
0.5.0 2026-04-27T22:52:25+01:00
- LLM::Chat::Backend::OpenRouter request shape now mirrors
SillyTavern's wire bytes verbatim. Removed two body fields
that were causing OpenRouter's upstream router to hold the
200 OK headers indefinitely against some providers (~80%
header-phase timeouts in App::Cantina vs ~0% in SillyTavern
on the same models / keys / network), and trimmed one nested
object:
* `usage: { include: true }` is no longer sent.
* `stream_options: { include_usage: true }` is no longer sent
(also removed from OpenAICommon.chat-completion-stream, so
all OAI-compatible streams now match this shape).
* `reasoning: { effort, enabled }` loses its `enabled` key;
we now send only `{ effort }` when reasoning_effort is
configured, matching ST.
Added on every request, also matching ST:
* `include_reasoning: Bool`, for parity with ST's flag.
* `top_k`, plumbed from Settings into the OAI body.
`repetition_penalty` is now omitted when it is at the default
of 1.0 (it was previously sent unconditionally).
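A Python sketch of the body-shaping rules above, assuming the settings arrive as plain keyword arguments. The function name and parameters are hypothetical, and other OAI sampling fields are left out for brevity.

```python
from typing import Optional


def build_openrouter_body(model: str, messages: list, *, stream: bool,
                          top_k: int, include_reasoning: bool,
                          repetition_penalty: float = 1.0,
                          reasoning_effort: Optional[str] = None) -> dict:
    """Hypothetical sketch of the 0.5.0 request shape."""
    body = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "include_reasoning": bool(include_reasoning),  # always sent
        "top_k": top_k,                                # always sent
    }
    if repetition_penalty != 1.0:
        # Omitted at the 1.0 default rather than sent unconditionally.
        body["repetition_penalty"] = repetition_penalty
    if reasoning_effort is not None:
        # Only {effort}; no "enabled" key.
        body["reasoning"] = {"effort": reasoning_effort}
    # Deliberately never set: "usage" and "stream_options".
    return body
```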
- Cost telemetry that the inline `usage: { include: true }`
block used to carry now arrives via a one-shot post-stream
GET against `/generation?id=...` after `[DONE]`. The lookup
is async and best-effort; on failure, $resp.cost stays Nil
and the error is not escalated. It lifts cost, provider-name,
and (when not already populated from the stream)
prompt/completion tokens. Latency: .cost becomes readable
~50–200ms after .is-done becomes True. A new
_on-stream-complete hook on OpenAICommon lets future provider
subclasses do the same kind of post-stream metadata fetch.
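A Python `asyncio` sketch of the best-effort enrichment step. `fetch` is a hypothetical coroutine standing in for the `/generation?id=...` GET, and the metadata keys it returns are assumptions, not OpenRouter's documented response shape.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def enrich_from_generation(
    resp: dict,
    gen_id: Optional[str],
    fetch: Callable[[str], Awaitable[dict]],
) -> None:
    """Best-effort post-stream lookup; fetch errors are swallowed."""
    if gen_id is None:
        return
    try:
        meta = await fetch(gen_id)
    except Exception:
        return  # failure leaves resp["cost"] as None instead of escalating
    resp["cost"] = meta.get("cost")
    resp["provider_name"] = meta.get("provider_name")
    for key in ("prompt_tokens", "completion_tokens"):
        if resp.get(key) is None:  # counts already taken from the stream win
            resp[key] = meta.get(key)
```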
- LLM::Chat::Backend::OpenAICommon stream parser now buffers
bytes across body-byte-stream emissions and splits on the
SSE `\n\n` event delimiter before parsing, instead of
decoding and parsing each TCP chunk independently. Pre-fix, a
`data: {...}` JSON object split across two TCP packets would
crash from-json on the truncated half and terminate the
stream with error class 'unknown'. Both chat-completion-stream
and text-completion-stream got the fix. Heartbeat / SSE
comment lines (`: OPENROUTER PROCESSING`) are dropped per
the SSE spec and never produce a chunk.
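A Python sketch of the buffering strategy above, assuming LF-only line endings (the SSE spec also permits CR and CRLF, which this sketch ignores). `SSEBuffer` and `parse_payloads` are hypothetical names. Decoding only complete events also keeps a multi-byte UTF-8 character that straddles two TCP chunks intact, because the `\n\n` delimiter bytes cannot occur inside a multi-byte sequence.

```python
import json


class SSEBuffer:
    """Accumulates raw bytes and returns complete SSE `data` payloads."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        self._buf += chunk
        payloads = []
        # Only cut at the event delimiter; a partial event stays buffered.
        while b"\n\n" in self._buf:
            raw, self._buf = self._buf.split(b"\n\n", 1)
            data_lines = []
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith(":"):
                    continue  # comment / heartbeat line: never yields a chunk
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    if value.startswith(" "):
                        value = value[1:]  # spec: strip one leading space
                    data_lines.append(value)
            if data_lines:
                payloads.append("\n".join(data_lines))
        return payloads


def parse_payloads(payloads: list) -> list:
    """Parse JSON chunks, skipping the OpenAI-style [DONE] sentinel."""
    return [json.loads(p) for p in payloads if p != "[DONE]"]
```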
- LLM::Chat::Backend::OpenAICommon.!classify-exception no
longer string-matches "timeout" / "timed out" in the default
arm; only X::Cro::HTTP::Client::Timeout maps to error-class
'timeout' now. Substring matching was masking unrelated
errors (JSON parse failures, stream-cancel messages) as
header timeouts. The connection-error pattern (refused /
reset / DNS / unreachable) remains, since those errors have
no typed exception class to discriminate on.
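A Python sketch of the classification order described above. The two exception classes are hypothetical stand-ins for X::Cro::HTTP::Client::Timeout and X::Cro::HTTP::Error, and the regex is an assumed approximation of the connection-error pattern. A message that merely contains "timed out" now falls through to 'unknown'.

```python
import re
from typing import Optional, Tuple


class ClientTimeout(Exception):
    """Hypothetical stand-in for X::Cro::HTTP::Client::Timeout."""


class HTTPError(Exception):
    """Hypothetical stand-in for X::Cro::HTTP::Error."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


# Assumed approximation of the refused / reset / DNS / unreachable pattern.
CONNECTION_PATTERN = re.compile(r"refused|reset|dns|unreachable", re.IGNORECASE)


def classify_exception(exc: Exception) -> Tuple[str, Optional[int]]:
    """Return (error_class, http_status) for a failed request."""
    if isinstance(exc, ClientTimeout):
        return ("timeout", None)      # only the typed exception means timeout
    if isinstance(exc, HTTPError):
        return ("http", exc.status)
    if CONNECTION_PATTERN.search(str(exc)):
        return ("connection", None)   # no typed class, so match on text
    return ("unknown", None)          # "timed out" text is no longer enough
```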
- LLM::Chat::Debug log format gains elapsed-ms timestamps for
streaming requests: HEADERS RECEIVED, FIRST BODY BYTE, and
EXCEPTION lines all carry "+Nms" relative to the call start
so latency can be diagnosed without external instrumentation.
Existing log labels are unchanged.
0.3.0 2026-04-23T15:56:28+01:00
- LLM::Chat::Backend::Response gains structured error metadata:
$.error-status (Int HTTP code) and $.error-class (Str —
'http' / 'timeout' / 'connection' / 'response' / 'unknown').
Populated via _set-error-info(:$class, :$status) alongside
the existing .quit path. Lets consumers branch on error kind
without regex-parsing raw messages — used by the
LLM::Data::Inference::Task model-fallback policy.
- LLM::Chat::Backend::OpenAICommon CATCH blocks classify Cro
exceptions (X::Cro::HTTP::Error, taking the status from
.response.status; X::Cro::HTTP::Client::Timeout), plus
heuristically detected socket errors, into the Response's
error fields before quitting. Finish-reason quits (length /
content_filter / unknown) are tagged error-class => 'response'
so the fallback layer advances on them.
- LLM::Chat::Backend::Mock gains &.error-producer — an optional
(Int $call-index --> Hash) callback that scripts per-call
failures for fallback / retry tests. Returning a hash like
{ class => 'http', status => 500, message => 'x' } fails
that call without consuming a slot from @.responses. Also
exposes $.call-index (monotonic per-backend call count,
bumped on every completion regardless of outcome).
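A Python sketch of how an error-producer callback can fail a call without consuming a canned response. All names are hypothetical; the sketch assumes a 0-based call index and, like the 0.2.6 Mock's default, repeats the last response once the queue runs out.

```python
from typing import Callable, Optional


class MockBackend:
    """Hypothetical Python analogue of LLM::Chat::Backend::Mock."""

    def __init__(self, responses: list,
                 error_producer: Optional[Callable[[int], Optional[dict]]] = None):
        if not responses:
            raise ValueError("need at least one canned response")
        self.responses = list(responses)
        self.error_producer = error_producer
        self.call_index = 0  # bumped on every call, success or failure
        self._next = 0       # next canned response to hand out

    def chat_completion(self, messages: list) -> dict:
        idx = self.call_index
        self.call_index += 1
        if self.error_producer is not None:
            err = self.error_producer(idx)
            if err:
                # Scripted failure: no response slot is consumed.
                return {"ok": False, "error_class": err["class"],
                        "error_status": err.get("status"),
                        "message": err.get("message", "")}
        text = self.responses[min(self._next, len(self.responses) - 1)]
        self._next += 1
        return {"ok": True, "text": text}
```

A retry test can script an `http` 500 for the first call and then check that the retry receives the first canned response.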
0.2.6 2026-04-13T17:03:42+01:00
- LLM::Chat::Backend::Mock: new test-only backend that returns
canned responses in order. Supports streaming (split on
whitespace by default, configurable via :token-splitter), an
optional delay between tokens via :stream-delay, and an
:initial-delay (10ms default) so consumers can attach taps
before tokens flow.
:fail-on-empty makes the backend die when the response queue
is exhausted instead of repeating the last entry. Useful for
exercising error paths in downstream consumers.
Recording: every completion call is logged to @.recorded-calls
as a hash with kind / messages / tools / response / at. Tests
can assert on what reached the backend, not just what came
back — catches prompt-assembly and template-substitution
regressions. clear-recorded-calls resets the log between
test phases.
0.2.5 2026-04-09T12:30:48+01:00
- Optional :@tools parameter on chat-completion and chat-completion-stream
- Response.tool-calls and has-tool-calls for detecting LLM tool call requests
- Response.finish-reason field
0.2.4 2026-04-09T05:15:41+01:00
- CI: exclude Windows (Tokenizers Rust FFI build not yet supported)
0.2.3 2026-04-09T05:09:48+01:00
- Add GitHub Actions CI workflow with Rust toolchain for Tokenizers
- Add dist.ini for mi6 (UploadToZef, ReadmeFromPod, Badges)
- Add docs/Readme.rakudoc
0.2.2 2026-04-09T04:59:41+01:00
- Add a stub LLM::Chat module so mi6 stops renaming the module.
0.2.1 2026-04-09T04:41:36+01:00
- Add LLM::Chat::Template::Jinja2 for HuggingFace chat template support
- from-tokenizer-config class method loads templates from tokenizer_config.json
- Supports bos_token and eos_token passthrough
- Continuation mode maps to add_generation_prompt=false
0.2.0
- Previous releases