Tokenizers

zef:apogee
Revision history for Tokenizers

0.2.1  2026-04-15T23:58:11+01:00
    - Build.rakumod: detect system glibc via `ldd --version` and
      fall back to cargo source compile when it's older than the
      prebuilt target (currently v2.35, matching the ubuntu-22.04 CI
      runner). Previously, users on Ubuntu 20.04 / Debian 11 / RHEL 8
      downloaded a prebuilt .so that loaded but failed at first
      symbol use with "GLIBC_2.xx not found". The guard fires before
      the download, so affected users just see a one-line note and a
      ~5-minute cargo build instead of a broken install.
      TOKENIZERS_BINARY_ONLY=1 now hard-fails with a clear message
      on old-glibc systems rather than producing a broken install.
    - New CI workflow .github/workflows/glibc-fallback.yml: runs
      `zef install .` inside an ubuntu:20.04 container (glibc 2.31)
      with rustup-installed toolchain and asserts both that the
      fallback message appears in the build log and that the
      source-compiled .so loads.
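
The glibc guard described in 0.2.1 can be sketched roughly as follows. This is an illustrative Python sketch, not the code in Build.rakumod: the function names are hypothetical, the parsing assumes the version is the last token on the first line of `ldd --version` output, and treating an undetectable glibc as "use the prebuilt" is an assumption the changelog does not confirm.

```python
import re
import subprocess

# The prebuilt target named in the 0.2.1 entry (glibc 2.35).
PREBUILT_GLIBC = (2, 35)


def parse_glibc_version(ldd_output):
    """Return (major, minor) from the first line of `ldd --version`, or None."""
    lines = ldd_output.splitlines()
    if not lines:
        return None
    # Assumed shape: "ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31"
    match = re.search(r"(\d+)\.(\d+)\s*$", lines[0])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def system_glibc():
    """Run `ldd --version`; return None when ldd is missing or unparsable."""
    try:
        result = subprocess.run(
            ["ldd", "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    return parse_glibc_version(result.stdout)


def choose_install_path(glibc, binary_only=False):
    """Pick "prebuilt" or "source" before any download is attempted."""
    # Tuples compare numerically, so (2, 4) sorts below (2, 35).
    if glibc is not None and glibc < PREBUILT_GLIBC:
        if binary_only:
            # Mirrors the TOKENIZERS_BINARY_ONLY=1 hard failure.
            raise RuntimeError(
                "system glibc %d.%d is older than the prebuilt target" % glibc
            )
        return "source"
    # Assumption: an undetected glibc does not block the prebuilt path.
    return "prebuilt"
```

Comparing versions as integer tuples rather than as strings or floats avoids mis-ordering releases such as 2.4 and 2.35.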

0.2.0  2026-04-14T23:55:42+01:00
    - Prebuilt-binary-first install path. Build.rakumod now attempts
      to download a prebuilt .dylib / .so / .dll from the repo's
      GitHub Releases for the detected (OS, arch) pair before
      falling back to the existing `make` (cargo build --release)
      source compilation. Shaves the ~5-minute cargo build from the
      default install flow.
    - SHA256 verification against bundled resources/checksums.txt;
      refuses any prebuilt whose hash isn't recorded (hard security
      boundary — no downloaded checksums trusted).
    - Cache downloaded artefacts in $XDG_CACHE_HOME/Tokenizers-binaries/
      (or $HOME/.cache/ fallback) so reinstalls skip the network.
    - macOS publishes a universal dylib (arm64 + x86_64 slices in one
      fat binary) built via lipo from per-slice cargo builds. One
      macOS artefact covers both Apple architectures.
    - New env knobs: TOKENIZERS_BUILD_FROM_SOURCE=1 to skip prebuilts;
      TOKENIZERS_BINARY_ONLY=1 to refuse fallback; TOKENIZERS_BINARY_URL
      to override the release base URL; TOKENIZERS_CACHE_DIR to
      override cache location; TOKENIZERS_LIB for runtime lib path
      override.
    - New BINARY_TAG file at repo root as single source of truth for
      the pinned binary release tag (binaries-tokenizers--r), read by
      both Build.rakumod and the CI workflow.
    - New .github/workflows/build-binaries.yml: builds + publishes
      prebuilt artefacts for five platforms (macOS universal, Linux
      x86_64/aarch64 glibc, Windows x86_64/arm64) via Rust
      cross-compilation targets on manual dispatch or binaries-*
      tag push. Cargo cache keyed by Cargo.lock so repeat builds hit
      warm caches.
    - FFI lookup in Tokenizers::Wrapper now respects the
      TOKENIZERS_LIB env override before falling back to %?RESOURCES.
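
The checksum boundary and cache lookup from 0.2.0 can be sketched as below. This is a hedged Python illustration, not the Raku code in Build.rakumod: the function names are hypothetical, the `<sha256-hex>  <filename>` layout of checksums.txt is assumed (sha256sum style), and the fallback is assumed to resolve to $HOME/.cache/Tokenizers-binaries.

```python
import hashlib
from pathlib import Path


def cache_dir(env):
    """Resolve the artefact cache directory from an environment mapping."""
    override = env.get("TOKENIZERS_CACHE_DIR")
    if override:
        return Path(override)
    xdg = env.get("XDG_CACHE_HOME")
    # Assumed fallback layout: $HOME/.cache/Tokenizers-binaries
    base = Path(xdg) if xdg else Path(env.get("HOME", "")) / ".cache"
    return base / "Tokenizers-binaries"


def load_checksums(text):
    """Parse bundled checksum lines into {filename: sha256-hex}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # sha256sum marks binary-mode entries with a leading "*".
            table[parts[1].lstrip("*")] = parts[0].lower()
    return table


def verify(data, filename, checksums):
    """Accept data only when its hash matches a bundled entry."""
    expected = checksums.get(filename)
    if expected is None:
        # No recorded hash: refuse rather than trust anything downloaded.
        return False
    return hashlib.sha256(data).hexdigest() == expected
```

A file with no bundled entry is rejected in the same way as a file whose hash differs, which is what keeps downloaded checksums out of the trust path.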