LLM::Data::ContentTag

zef:apogee

NAME

LLM::Data::ContentTag - Content classification and model routing for LLM data pipelines

SYNOPSIS

use LLM::Data::ContentTag;

# Define classification rules: tag name → trigger keywords
my $classifier = LLM::Data::ContentTag::Classifier.new(
    :rules(%(
        confidential => <secret classified restricted>,
        technical    => <algorithm database server>,
        creative     => <story poem song>,
    )),
    :restricted('confidential'),  # Tags that need a local/unrestricted model
);

# Classify content
my $tags = $classifier.classify('Deploy the database server update.');
say $tags.has-tag('technical');         # True
say $tags.needs-unrestricted-model;     # False
say $tags.all-tags;                     # (technical)

# Classify from metadata
my $tags2 = $classifier.classify-from-metadata(%(
    confidential => True,
    technical    => False,
));

# Route to appropriate backend
my $router = LLM::Data::ContentTag::Router.new(
    :default-backend('cloud-api'),
);
$router.add-route('local-model', 'confidential', 'restricted');
$router.add-route('reasoning-model', 'technical');

say $router.select-backend($tags);      # reasoning-model

DESCRIPTION

LLM::Data::ContentTag provides content classification and model routing for LLM data generation pipelines. Tags and rules are fully configurable; the module has no hardcoded categories.

LLM::Data::ContentTag::Tags

Immutable content tag set. Tags are arbitrary string keys with boolean values.

my $t = LLM::Data::ContentTag::Tags.new(
    :tags(%(confidential => True, draft => True)),
    :restricted('confidential'),   # Which tags need an unrestricted model
);

$t.has-tag('confidential');        # True
$t.has-tag('missing');             # False
$t.needs-unrestricted-model;       # True (confidential is restricted and true)
$t.all-tags;                       # List of tag names that are true
my %data = $t.to-hash;             # Serializable Hash
my $copy = LLM::Data::ContentTag::Tags.from-hash(%data);  # Deserialize
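
The semantics described above can be pictured with core Raku alone. The following is a rough sketch under assumptions, not the module's implementation: the names `%tags`, `@restricted`, `@active` and `$needs-unrestricted` are hypothetical, and the real class may store its data differently.

```raku
# Hypothetical stand-ins for a tag set and its list of restricted tags.
my %tags       = confidential => True, draft => True, technical => False;
my @restricted = 'confidential';

# Names of the tags whose value is true, sorted for stable output
# (this mirrors what all-tags is described as returning).
my @active = %tags.grep(*.value).map(*.key).sort;

# True when at least one restricted tag is present and true.
# Looking up a missing tag yields an undefined value, which counts as false.
my $needs-unrestricted = so %tags{@restricted}.any;

say @active;              # [confidential draft]
say $needs-unrestricted;  # True
```

A tag that is listed as restricted but set to False, or absent altogether, does not trigger the unrestricted-model requirement in this sketch.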

LLM::Data::ContentTag::Classifier

Assigns tags to content via configurable keyword rules.

my $c = LLM::Data::ContentTag::Classifier.new(
    :rules(%(
        tag-name => <keyword1 keyword2 keyword3>,
    )),
    :restricted('tag-name'),       # Optional: tags needing unrestricted model
);

my $tags  = $c.classify($content);             # Str in, Tags out; keyword match is case-insensitive
my $tags2 = $c.classify-from-metadata(%meta);  # Hash in, Tags out; sets tags from a metadata hash
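
To make the keyword rules concrete, here is a minimal sketch of case-insensitive keyword matching in core Raku. It assumes a plain substring test, which may differ from what the module actually does (for example, it may match on word boundaries). The helper `matching-tags` is hypothetical and not part of the module.

```raku
# Hypothetical rule table in the same shape as the :rules argument above.
my %rules =
    technical => <algorithm database server>,
    creative  => <story poem song>;

# Return the names of all tags that have at least one keyword in $content.
# Both sides are case-folded, so the comparison ignores case.
sub matching-tags(Str $content, %rules) {
    my $text = $content.fc;
    %rules.grep({ .value.first({ $text.contains(.fc) }).defined }).map(*.key).sort.List
}

say matching-tags('Deploy the DATABASE server update.', %rules);  # (technical)
```

A substring test also fires inside longer words ("song" inside "songbird"), which is one reason a real classifier might prefer word-boundary matching.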

LLM::Data::ContentTag::Router

Maps content tags to backend identifiers. Routes are checked in order, and the first match wins.

my $r = LLM::Data::ContentTag::Router.new(
    :default-backend('cloud-api'),
);

# Route if content has ANY of these tags
$r.add-route('local-model', 'confidential', 'restricted');
$r.add-route('reasoning-model', 'technical');

my $backend = $r.select-backend($tags);  # Str: the first matching backend, or the default
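
The first-match-wins rule can be illustrated with a short, self-contained sketch. It is not the module's code: `@routes` and `pick-backend` are hypothetical names, and the route table is assumed to be an ordered list of backend-to-tags pairs.

```raku
# Hypothetical route table: an ordered list of backend => tag-list pairs.
my @routes =
    'local-model'     => ['confidential', 'restricted'],
    'reasoning-model' => ['technical'];

# Return the backend of the first route that shares ANY tag with @active,
# or $default when no route matches.
sub pick-backend(@active, @routes, Str $default) {
    for @routes -> $route {
        return $route.key if so $route.value.any eq @active.any;
    }
    $default
}

say pick-backend(['technical'], @routes, 'cloud-api');                  # reasoning-model
say pick-backend(['confidential', 'technical'], @routes, 'cloud-api');  # local-model
say pick-backend(['draft'], @routes, 'cloud-api');                      # cloud-api
```

Because the routes are scanned in order, content tagged both confidential and technical goes to the earlier `local-model` route, so more sensitive routes would normally be added first.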

AUTHOR

Matt Doughty <matt@apogee.guru>

COPYRIGHT AND LICENSE

Copyright 2026 Matt Doughty

This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.