Revision history for LLM::Classifiers::Emotions

0.1.3 2026-04-24T06:47:14+01:00
- test.yml: exclude windows-latest × hf-fallback from the
matrix. The HF fallback goes through Cro::HTTP →
IO::Socket::Async::SSL, which on the Windows runner fails
with "Server certificate verification failed: unable to
get local issuer certificate" — a known Raku-on-Windows
CA-bundle issue, not fixable inside this module. The
primary path (our hosted Release) works fine on Windows
and is still tested under windows-latest × default.

0.1.2 2026-04-24T06:26:59+01:00
- resources/checksums.txt: populate the tarball sha256
produced by the first hf-mirror.yml run
(2257f061…74ca1). It was previously an all-zeros
placeholder — macOS/Linux survived via the HF fallback
path, but Windows did not fall through cleanly, so the
primary path is the one that must work on every platform.

0.1.1 2026-04-24T06:10:52+01:00
- META6: fix HuggingFace::API build-depends version spec.
Was `ver<0.2.0+>` (invalid — the `+` suffix isn't part of
Raku's identity syntax and zef couldn't resolve it). Now
`ver<0.2.0>`, matching the pinned-exact convention every
other module in the monorepo uses (Cro::HTTP:ver<0.8.11>,
JSON::Fast:ver<0.19>, etc.). Requires HuggingFace::API
0.2.0 to be published to the ecosystem.
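As a sketch, the corrected entry as it would sit in META6.json — HuggingFace::API in build-depends per the note above; which list Cro::HTTP and JSON::Fast live in is an assumption, they appear here only as the convention examples already named:

```json
{
    "build-depends": [
        "HuggingFace::API:ver<0.2.0>"
    ],
    "depends": [
        "Cro::HTTP:ver<0.8.11>",
        "JSON::Fast:ver<0.19>"
    ]
}
```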

0.1.0 2026-04-24T06:02:15+01:00
- Initial release.
- Workflow triggers aligned with the CRoaring / Tokenizers /
ONNX-Native convention: hf-mirror.yml fires on
workflow_dispatch OR push of a `binaries-*` tag (releases
are deliberate, not incidental). Added pushed-tag-vs-
BINARY_TAG mismatch guard.
- .github/workflows/test.yml: runs `zef install .` + prove6
on macOS / Linux / Windows for every branch push and PR,
ignoring binaries-* tags so tag-push doesn't double-fire.
Matrix dimension `install-mode` exercises both the default
path (our Release primary, HuggingFace secondary) AND the
explicit `LLM_EMOTIONS_FROM_HF=1` backup path so both code
paths stay green.
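Sketched as shell, the two install modes differ only in that one env knob — this loop is illustrative of how the matrix selects the code path, not the workflow's actual steps (the real job runs `zef install .` + prove6 per mode):

```shell
# Illustrative only: mirror the test.yml install-mode matrix locally.
for mode in default hf-fallback; do
  if [ "$mode" = "hf-fallback" ]; then
    export LLM_EMOTIONS_FROM_HF=1   # force the HuggingFace backup path
  else
    unset LLM_EMOTIONS_FROM_HF      # default: GitHub Release primary
  fi
  echo "mode=$mode LLM_EMOTIONS_FROM_HF=${LLM_EMOTIONS_FROM_HF:-unset}"
done
```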
- 28-way text emotion classifier built on ONNX::Native +
Tokenizers, backed by Cohee's quantized DistilBERT
go-emotions model on HuggingFace
(Cohee/distilbert-base-uncased-go-emotions-onnx at revision
d22488bc83be87678f12eee8a3f65a65de94ef85).
- Public class `LLM::Classifiers::Emotions::Classifier` with
`new`, `classify`, `top`, `labels`, `raw-logits`, `dispose`.
Softmax + argmax + top-k pipeline on top of the ONNX session;
returns a score-sorted `List[Hash]` or a single argmax label.
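A usage sketch of that surface — the zero-argument constructor, the `use` line, and the `label`/`score` hash keys are assumptions, not documented above:

```raku
use LLM::Classifiers::Emotions;

my $clf = LLM::Classifiers::Emotions::Classifier.new;

say $clf.labels.elems;                    # 28 go-emotions labels
say $clf.top('I can not stop smiling');   # single argmax label

# Score-sorted List of Hashes; the key names here are assumptions.
for $clf.classify('I can not stop smiling') -> %r {
    say "%r<label>: %r<score>";
}

$clf.dispose;   # subsequent calls hit the disposed-classifier error path
```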
- Fully deterministic on the CPU execution provider —
bit-identical logits across runs for the same input on the
same ONNX Runtime version. Tests assert this invariant.
- CoreML supported via `:providers`; scores diverge
at the ~1e-3 level but the argmax label agrees with CPU.
- Build.rakumod stages the three model files
(tokenizer.json, config.json, model.onnx) under
$XDG_DATA_HOME/LLM-Classifiers-Emotions// at
install time. Primary download path: this repo's GitHub
Release. Fallback: direct HuggingFace download via
HuggingFace::API at the pinned revision.
- Input truncation: 512 tokens (DistilBERT's positional
embedding limit; Cohee's tokenizer.json has truncation
disabled, so the classifier handles it internally).
- Env knobs: LLM_EMOTIONS_BINARY_URL,
LLM_EMOTIONS_BINARY_ONLY, LLM_EMOTIONS_FROM_HF,
LLM_EMOTIONS_CACHE_DIR, LLM_EMOTIONS_DATA_DIR,
LLM_EMOTIONS_MODEL_DIR.
- Exceptions: X::LLM::Classifiers::Emotions::ModelMissing,
X::LLM::Classifiers::Emotions::InvalidConfig.
- New CI workflow: .github/workflows/hf-mirror.yml fetches
the three files from HuggingFace at the pinned revision,
sha256-verifies, bundles into
cohee-goemotions-.tar.gz, and publishes to a
GitHub Release on manual dispatch or BINARY_TAG change.
- Tests: t/01-smoke (constructor + labels + softmax invariant),
t/02-known-inputs (argmax + score bounds + determinism for
seven hand-picked inputs), t/03-edge-cases (empty string,
>512-token input, unicode / emoji, min-score threshold,
disposed-classifier error path, 50-iteration leak check).
- Upstream dep: HuggingFace::API 0.2.0+ (extended in this release
with `get-file` / `get-file-blob` / `get-file-to-file` and
`:$revision` on existing tokenizer methods; back-compatible
with 0.1.x callers).