Encoding::Emacs

zef:winfredraj

Character encodings for Raku, auto-generated from Emacs MULE definitions. Provides 80+ legacy encodings, including the ISO-8859 series, Windows codepages, DOS codepages, KOI8 variants, EBCDIC, and more.

Installation

zef install Encoding::Emacs

Or from source:

git clone https://github.com/winfred-raj/encoding-emacs.git
cd encoding-emacs
zef install .

Quick Start

#!/usr/bin/env raku
use Encoding::Emacs::Full::Cp850;

# Decode CP850 bytes (DOS encoding with box-drawing chars)
my $bytes = Buf.new(0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x9D);  # "Hello Ø"
my $decoder = Encoding::Emacs::Full::Cp850.decoder;
my $text = $decoder.consume($bytes);
say $text;  # Output: Hello Ø

# Encode Unicode string to CP850
my $encoder = Encoding::Emacs::Full::Cp850.encoder;
my $encoded = $encoder.encode-chars("Grüß Gott");
say $encoded;  # Output: Buf:0x<47 72 81 E1 20 47 6F 74 74>

Supported Encodings

ISO-8859 Series (Latin)

Windows Codepages

DOS Codepages

KOI8 Variants

Other Legacy Encodings

Usage Examples

Decode Legacy Files

use Encoding::Emacs::Full::Iso_8859_5;  # Cyrillic

my $bytes = slurp("old-file.txt", :bin);
my $decoder = Encoding::Emacs::Full::Iso_8859_5.decoder;
my $text = $decoder.consume($bytes);
say $text;  # Readable Russian text

Encode for Legacy Systems

use Encoding::Emacs::Full::Windows_1252;

my $unicode-text = "Café résumé";
my $encoder = Encoding::Emacs::Full::Windows_1252.encoder;
my $bytes = $encoder.encode-chars($unicode-text);
spurt("output.txt", $bytes);
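
For a quick sanity check of the bytes to expect, the same string can be encoded with Raku's built-in `windows-1252` support. This sketch is independent of Encoding::Emacs and only shows what Windows-1252 output for this text looks like. The variable names are illustrative.

```raku
# Cross-check using Raku's built-in windows-1252 encoding
# (not part of Encoding::Emacs).
my $unicode-text = "Café résumé";
my $expected     = $unicode-text.encode('windows-1252');

# Each "é" becomes the single byte 0xE9 instead of two UTF-8 bytes.
my $hex = $expected.list.map(*.fmt('%02X')).join(' ');
say $hex;

# Decoding with the same encoding restores the original string.
say $expected.decode('windows-1252') eq $unicode-text;
```

Comparing these bytes with the contents of the file written above is one way to confirm that the legacy output is what the target system expects.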

Round-Trip Conversion

use Encoding::Emacs::Full::Cp866;  # Russian DOS

my $original = "Привет мир!";
my $encoder = Encoding::Emacs::Full::Cp866.encoder;
my $decoder = Encoding::Emacs::Full::Cp866.decoder;

my $encoded = $encoder.encode-chars($original);
my $decoded = $decoder.consume($encoded);

say $decoded eq $original;  # True

How It Works

This module parses the charset definitions and mapping tables from Emacs' MULE (MULtilingual Enhancement) support and generates Raku encoding classes from them automatically. Each encoding:

  1. Lazily loads its mapping table on first use
  2. Caches mappings for performance
  3. Uses hash-based O(1) lookups for encode/decode
  4. Implements Raku's Encoding::Encoder and Encoding::Decoder roles
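
The first three points can be sketched in a few lines. This is a minimal illustration, not the module's actual `MappingTable` code: the class name `ToyMappingTable`, its method names, the three sample CP850 entries, and the choice of fallback characters are all assumptions made for the example.

```raku
# Illustrative sketch only; ToyMappingTable is a hypothetical name,
# not the module's MappingTable implementation.
class ToyMappingTable {
    # A few CP850 high-byte entries (byte => Unicode codepoint), sample only.
    my constant @SAMPLE = (0x81 => 0xFC, 0x9D => 0xD8, 0xE1 => 0xDF);

    has %!decode;               # byte      -> codepoint
    has %!encode;               # codepoint -> byte
    has Bool $!loaded = False;

    # Build both hashes once; later calls return immediately (the cache).
    method !load() {
        return if $!loaded;
        for @SAMPLE -> $pair {
            %!decode{$pair.key}   = $pair.value;
            %!encode{$pair.value} = $pair.key;
        }
        $!loaded = True;
    }

    method decode-byte(Int $byte --> Int) {
        return $byte if $byte < 0x80;   # ASCII maps to itself
        self!load;                      # lazy load on first non-ASCII byte
        %!decode{$byte} // 0xFFFD;      # assumed fallback: U+FFFD
    }

    method encode-codepoint(Int $cp --> Int) {
        return $cp if $cp < 0x80;
        self!load;
        %!encode{$cp} // '?'.ord;       # assumed fallback: '?'
    }
}

my $table = ToyMappingTable.new;

my $bytes = Buf.new(0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x9D);
my $text  = $bytes.list.map({ $table.decode-byte($_).chr }).join;
say $text;   # Hello Ø

my $encoded = Buf.new("Grüß".ords.map({ $table.encode-codepoint($_) }));
say $encoded;
```

Keeping one hash per direction means each byte or codepoint costs a single hash lookup, at the price of holding both tables in memory once they are loaded.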

Performance

Architecture

Encoding::Emacs/
├── Parser.rakumod          # Parses Emacs Lisp charset definitions
├── Generator.rakumod       # Generates Raku encoding classes
├── MappingTable.rakumod    # Runtime mapping table loader
└── Full/                   # 81 generated encoding classes
    ├── Iso_8859_5.rakumod
    ├── Cp850.rakumod
    ├── Windows_1252.rakumod
    └── ...
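
To give a rough idea of what the parsing step involves, the sketch below pulls the charset name and mapping-file name out of a single `define-charset` form and derives a class name from it. It is not the code in `Parser.rakumod`. The sample form is hand-written in the style of `mule-conf.el`, the regex and the `parse-charset` helper are hypothetical, and the naming rule (capitalize the first letter, turn hyphens into underscores) is inferred from the file names listed above.

```raku
# Hand-written sample in the style of mule-conf.el; values are illustrative.
my $form = q:to/LISP/;
    (define-charset 'windows-1252
      "WINDOWS-1252 (Latin I)"
      :short-name "WINDOWS-1252"
      :code-space [0 255]
      :map "CP1252")
    LISP

# Hypothetical regex: capture the quoted symbol and the :map string.
# Declared with `regex` so the frugal .*? is allowed to backtrack.
my regex charset-form {
    '(define-charset' \s+ \' $<name>=[ <[\w\-]>+ ]
    .*? ':map' \s+ \" $<map>=[ <-["]>+ ] \"
}

# Hypothetical helper: returns name, map file, and derived class name.
sub parse-charset(Str $src) {
    $src ~~ / <charset-form> / or return Nil;
    my $name  = ~$<charset-form><name>;
    my $map   = ~$<charset-form><map>;
    my $class = $name.tc.subst('-', '_', :g);   # assumed naming rule
    return %( name => $name, map => $map, class => $class );
}

my %info = parse-charset($form);
say "%info<name> uses map file %info<map>, class name %info<class>";
```

A full parser would also need to handle the other keyword arguments and any macro-based definitions; a Raku grammar would likely be a better fit than a single regex for that.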

Development

Generate Encodings

raku bin/generate-encodings.raku emacs-src/mule-conf.el lib/Encoding/Emacs/Full

Run Tests

raku -I lib t/01-parser.rakutest          # Parser tests
raku -I lib t/02-generator.rakutest       # Generator tests
raku -I lib t/03-mapping-table.rakutest   # Mapping loader tests
raku -I lib t/04-encoding-decoding.rakutest  # Encode/decode tests
raku -I lib t/05-phase3-encode-decode.rakutest  # Round-trip tests
raku -I lib t/06-cp850-encoding.rakutest  # CP850 specific tests

All 52 tests should pass.

Status

Limitations

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

Artistic License 2.0 (same as Raku)

Author

Winfred Raj

See Also

Acknowledgments

Built using the comprehensive charset definitions and mapping tables from Emacs' MULE (MULtilingual Enhancement) support, originally created by the GNU Emacs team.