LLM::Data::Inference

zef:apogee

NAME

LLM::Data::Inference - Structured LLM task layer with model-fallback, JSON parsing, and query-based routing

SYNOPSIS

use LLM::Data::Inference;

# Simple blocking LLM call (single backend — legacy shape)
my $task = LLM::Data::Inference::Task.new(
    :backend($my-backend),
    :system-prompt('You are a helpful assistant.'),
    :user-prompt('What is 2+2?'),
);
say $task.execute;  # "4"

# Model-fallback chain: try backends in order, advance to the next
# on model-specific failures (timeout, malformed output, 429, etc.),
# retry the head on transient errors (connection drop, 5xx), abort
# immediately on config errors (401 / 402 / 403).
my $task = LLM::Data::Inference::Task.new(
    :backends($primary, $secondary, $cheap-fallback),
    :user-prompt('Write a scene.'),
);
say $task.execute;  # serves from whichever backend produced a good response

# JSON output with schema checks — parser failures advance through
# the chain rather than retrying the same model on malformed JSON.
my $json-task = LLM::Data::Inference::JSONTask.new(
    :backends($primary, $fallback),
    :user-prompt('Return a JSON object with name and age.'),
    :required-keys('name', 'age'),
);
my %result = $json-task.execute;
say %result<name>;  # "Alice"

# Template-based prompts
my $pb = LLM::Data::Inference::PromptBuilder.new(
    :template('Write a {{genre}} story about {{topic}}.')
);
say $pb.render(%(:genre('fantasy'), :topic('dragons')));

# Query-based routing (orthogonal to the fallback chain — routers
# pick a backend, Tasks run fallbacks against that backend).
my $router = LLM::Data::Inference::Router.new(:default-backend($cloud-api));
$router.add-route('confidential OR restricted', $local-model);
$router.add-route('genre:technical', $reasoning-model);

my $backend = $router.select-backend($tags, $doc-id);

DESCRIPTION

LLM::Data::Inference provides a structured task layer on top of LLM::Chat for use in data generation pipelines. It wraps the async LLM::Chat API into blocking calls with a three-bucket retry + model-fallback policy, JSON extraction, and content-based model routing.

LLM::Data::Inference::Task

Blocking LLM call with a model-fallback chain. Accepts either the legacy single :$backend or an ordered :@backends list; internally both are stored as a chain, and a single-backend chain behaves exactly like the pre-fallback Task on retry-same errors.

my $task = LLM::Data::Inference::Task.new(
    :backends($primary, $fallback),   # OR :backend($single)
    :system-prompt('Be helpful.'),    # Optional system prompt
    :user-prompt('Hello'),            # User message (required)
    :max-retries(3),                  # Per-backend same-model retry
                                      # budget for transient errors
                                      # (see "Fallback policy" below)
    :timeout(120e0),                  # Seconds per HTTP round-trip
    :parser(-> $text { ... }),        # Optional parser; die to flag
                                      # malformed output (advances
                                      # to the next backend)
    :on-call-complete(-> %p { ... }), # Optional per-call telemetry hook
);

my $result = $task.execute;  # blocks until a backend returns a good
                             # response, or dies if every backend
                             # in the chain fails

Fallback policy

Failures classify into three buckets:

retry-same — transient errors (connection drops, 5xx responses): retry the same backend, up to the :max-retries budget.

advance — model-specific failures (timeouts, malformed output, HTTP 429, parser exceptions): advance to the next backend in the chain.

abort — configuration errors (HTTP 401 / 402 / 403): die immediately without trying further backends.

Parser failures advance (behavioural change from pre-fallback)

When :&parser is set, a thrown exception from inside the parser is classified as an advance-class failure. The Task does NOT retry the same backend on parser failure — in practice, a model that emits malformed JSON once rarely recovers on a second attempt against the same model, and a chain of [primary, fallback] produces cleaner recovery with lower latency.

Consumers on a single-backend Task that previously relied on parser recovery via retry should either (a) add a fallback model to the chain, (b) pass :backends($primary, $primary) to preserve the old "try the same model twice" shape on advance-class errors, or (c) build a retry loop at the application layer.
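As a sketch of options (a)/(b), assuming $primary and $fallback are already-configured backends: the parser below is a hypothetical example that dies on non-integer output, which classifies as an advance-class failure and moves execution to the next entry in the chain.

```raku
use LLM::Data::Inference;

my $task = LLM::Data::Inference::Task.new(
    # Listing the same backend twice restores the old
    # "try the same model twice" shape on advance-class errors;
    # :backends($primary, $fallback) is the usual alternative.
    :backends($primary, $primary),
    :user-prompt('Return a single integer.'),
    :parser(-> $text {
        # die to flag malformed output — the Task advances, it
        # does not retry this backend.
        $text ~~ /^ \s* (\d+) \s* $/
            or die "expected a bare integer, got: $text";
        +$0;
    }),
);
```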

Telemetry

:&on-call-complete fires once per HTTP round-trip with a hash:

%(
    attempt       => 1,         # monotonic across the execute call
    backend-index => 0,         # 0-based position within @.backends
    model-name    => 'z-ai/glm-5.1',
    latency-ms    => 1234,
    success       => True,
    stage         => 'network',
    error         => Str,       # present on failure
    error-class   => Str,       # 'http' / 'timeout' / 'connection' /
                                # 'response' / 'unknown' (on failure)
    error-status  => Int,       # HTTP code (when error-class eq 'http')
    # Provider-reported usage — presence-gated:
    prompt-tokens, completion-tokens, total-tokens,
    cost, model-used, provider-id, finish-reason,
)

classify-error — inspect the policy

The public classify-error method maps an error shape to the bucket name for consumers that want to implement the same policy outside the Task:

my $bucket = $task.classify-error(
    error-class  => 'http',
    error-status => 401,
);
# returns 'abort'

$task.classify-error(error-class => 'timeout');       # 'advance'
$task.classify-error(error-class => 'connection');    # 'retry-same'
$task.classify-error(:parser-failed);                 # 'advance'

LLM::Data::Inference::JSONTask

JSON extraction from LLM responses with key validation and optional custom validator. Handles LLMs that wrap JSON in prose by extracting the outermost { } or [ ]. Accepts :$backend or :@backends and threads either through to the inner Task unchanged — all fallback semantics come from the Task layer.

my $task = LLM::Data::Inference::JSONTask.new(
    :backends($primary, $fallback),
    :user-prompt('Give me a character card as JSON.'),
    :required-keys('name', 'description'),    # Missing key → advance
    :validator(-> %h { %h<name>.chars > 0 }), # False return → advance
    :max-retries(3),                          # Same-model retry budget
                                              # for transient errors
);

my %character = $task.execute;

LLM::Data::Inference::Router

Query-based routing using Roaring::Tags. Each route is a tag query string paired with a backend. Routes are evaluated in order — first match wins. Orthogonal to the Task-level fallback chain: a Router selects which backend (or chain) to hand to the Task, and the Task handles retries / fallbacks on top.

my $router = LLM::Data::Inference::Router.new(
    :default-backend($cloud-api),
);

$router.add-route('confidential', $local-model);
$router.add-route('confidential, sensitive', $air-gapped-model);
$router.add-route('genre:technical', $reasoning-model);

my $backend = $router.select-backend($tags, $doc-id);

LLM::Data::Inference::PromptBuilder

Mustache-style template rendering with {{variable}} substitution.

my $pb = LLM::Data::Inference::PromptBuilder.new(
    :template('Write a {{length}} word {{genre}} story about {{topic}}.')
);
my $prompt = $pb.render(%(:length('500'), :genre('sci-fi'), :topic('AI')));

Dies if a {{variable}} has no matching key in the vars hash.
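Callers can trap that death with try; a sketch (the exception is handled generically because the module's exception type is not documented here):

```raku
my $pb = LLM::Data::Inference::PromptBuilder.new(
    :template('Write a {{genre}} story about {{topic}}.')
);

# :topic is missing, so render dies; try yields Nil and sets $!.
my $prompt = try $pb.render(%(:genre('fantasy')));
without $prompt {
    note "template error: {$!.message}";
}
```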

AUTHOR

Matt Doughty matt@apogee.guru

COPYRIGHT AND LICENSE

Copyright 2026 Matt Doughty

This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.