Lang::JA::Kana - Japanese Kana Conversion Utilities
Languages: English • 日本語
Documentation:
Overview
Lang::JA::Kana is a Raku module for converting between different Japanese kana scripts (Hiragana and Katakana) and their various forms. It provides support for modern kana, historical variants, half-width characters, and specialized Unicode symbols.
Features
- Bidirectional Script Conversion: Seamless conversion between Hiragana and Katakana
- Half-width Support: Handling of half-width katakana (ハンカク) conversion
- Historical Kana: Support for Hentaigana (変体仮名) and obsolete characters
- Modern Extensions: Foreign sound adaptations (ファ, ティ, ウィ, etc.)
- Specialized Symbols: Circled and squared katakana processing
- Sound Mark Analysis: Diacritical mark separation and analysis
- Cross-script Integration: Built-in integration with Romaji, Cyrillic, and Hangul converters
Installation
use Lang::JA::Kana;
Basic Usage
Hiragana ↔ Katakana Conversion
use Lang::JA::Kana;
# Basic conversions
say to-katakana("こんにちは"); # → コンニチハ
say to-hiragana("コンニチハ"); # → こんにちは
# With modern extensions
say to-katakana("ふぁみりー"); # → ファミリー
say to-hiragana("ファミリー"); # → ふぁみりー
# Combination sounds (拗音)
say to-katakana("きゃりーぱみゅぱみゅ"); # → キャリーパミュパミュ
say to-hiragana("キャリーパミュパミュ"); # → きゃりーぱみゅぱみゅ
# Mixed text (non-kana characters pass through unchanged)
say to-katakana("Hello こんにちは World"); # → Hello コンニチハ World
say to-hiragana("Hello コンニチハ World"); # → Hello こんにちは World
Half-width Katakana Conversion
# Half-width to full-width conversion
say to-fullwidth-katakana("アイウエオ"); # → アイウエオ
say to-fullwidth-katakana("ガギグ"); # → ガギグ (voiced combinations)
say to-fullwidth-katakana("パピプ"); # → パピプ (semi-voiced combinations)
# Full-width to half-width conversion
say to-halfwidth-katakana("アイウエオ"); # → アイウエオ
say to-halfwidth-katakana("ガギグ"); # → ガギグ
say to-halfwidth-katakana("パピプ"); # → パピプ
# Integration with other conversions
say to-hiragana("カタカナ"); # → かたかな (auto-converts half-width)
Character Support
Standard Kana
All 50-sound (五十音) characters:
# Basic vowels (母音)
to-katakana("あいうえお"); # → アイウエオ
# K-series (カ行)
to-katakana("かきくけこ"); # → カキクケコ
to-katakana("がぎぐげご"); # → ガギグゲゴ
# S-series (サ行)
to-katakana("さしすせそ"); # → サシスセソ
to-katakana("ざじずぜぞ"); # → ザジズゼゾ
# T-series (タ行)
to-katakana("たちつてと"); # → タチツテト
to-katakana("だぢづでど"); # → ダヂヅデド
# N-series (ナ行)
to-katakana("なにぬねの"); # → ナニヌネノ
# H-series (ハ行)
to-katakana("はひふへほ"); # → ハヒフヘホ
to-katakana("ばびぶべぼ"); # → バビブベボ (voiced)
to-katakana("ぱぴぷぺぽ"); # → パピプペポ (semi-voiced)
# M-series (マ行)
to-katakana("まみむめも"); # → マミムメモ
# Y-series (ヤ行)
to-katakana("やゆよ"); # → ヤユヨ
# R-series (ラ行)
to-katakana("らりるれろ"); # → ラリルレロ
# W-series (ワ行) and N
to-katakana("わゐゑをん"); # → ワヰヱヲン
Small Kana (小文字)
# Small vowels
to-katakana("ぁぃぅぇぉ"); # → ァィゥェォ
# Small Y-sounds
to-katakana("ゃゅょ"); # → ャュョ
# Small tsu (促音)
to-katakana("っ"); # → ッ
# Small wa
to-katakana("ゎ"); # → ヮ
Combination Sounds (拗音)
# Y-combinations
to-katakana("きゃきゅきょ"); # → キャキュキョ
to-katakana("しゃしゅしょ"); # → シャシュショ
to-katakana("ちゃちゅちょ"); # → チャチュチョ
to-katakana("にゃにゅにょ"); # → ニャニュニョ
to-katakana("ひゃひゅひょ"); # → ヒャヒュヒョ
to-katakana("みゃみゅみょ"); # → ミャミュミョ
to-katakana("りゃりゅりょ"); # → リャリュリョ
# Voiced Y-combinations
to-katakana("ぎゃぎゅぎょ"); # → ギャギュギョ
to-katakana("じゃじゅじょ"); # → ジャジュジョ
to-katakana("びゃびゅびょ"); # → ビャビュビョ
to-katakana("ぴゃぴゅぴょ"); # → ピャピュピョ
Modern Extensions for Foreign Sounds
# F-sounds
to-katakana("ふぁふぃふぇふぉ"); # → ファフィフェフォ
# T/D-sounds
to-katakana("てぃでぃ"); # → ティディ
to-katakana("とぅどぅ"); # → トゥドゥ
# W-sounds
to-katakana("うぃうぇうぉ"); # → ウィウェウォ
# V-sounds
to-katakana("ゔぁゔぃゔぇゔぉ"); # → ヴァヴィヴェヴォ
# Kw/Gw-sounds
to-katakana("くぁくぃくぇくぉ"); # → クァクィクェクォ
to-katakana("ぐぁぐぃぐぇぐぉ"); # → グァグィグェグォ
# Ts-sounds
to-katakana("つぁつぃつぇつぉ"); # → ツァツィツェツォ
# Other combinations
to-katakana("ちぇじぇしぇいぇ"); # → チェジェシェイェ
Historical and Obsolete Kana
# Historical Wi/We/Wo
to-katakana("ゐゑを"); # → ヰヱヲ
# VU sound
to-katakana("ゔ"); # → ヴ
# Digraph Yori
to-katakana("ゟ"); # → ヿ
Hentaigana (変体仮名) Support
Hentaigana are historical variant forms of kana characters used before standardization. The module provides support for these Unicode characters.
Hentaigana to Hiragana Conversion
# Basic conversion
say hentaigana-to-hiragana("𛀁𛀂𛀃"); # → あいう
# Multiple readings (some Hentaigana have ambiguous readings)
say hentaigana-to-hiragana("𛀒"); # → し・せ (can be "shi" or "se")
say hentaigana-to-hiragana("𛁆"); # → ゐ・い (can be "wi" or "i")
# Complex examples
say hentaigana-to-hiragana("𛀆𛀈𛀊"); # → かきく
Hiragana to Hentaigana Conversion
# Single variant
say hiragana-to-hentaigana("る"); # → 𛁂
# Multiple variants (shows all possibilities)
say hiragana-to-hentaigana("あ"); # → 𛀁・𛄀・𛄁
say hiragana-to-hentaigana("し"); # → 𛀒・𛀖・𛂡
# With voiced marks
say hiragana-to-hentaigana("が"); # → (variants)゛
Hentaigana Character Origins
Many Hentaigana derive from specific Chinese characters (kanji):
- 𛀁 (A): From 安 (an)
- 𛀂 (I): From 以 (i)
- 𛀃 (U): From 宇 (u)
- 𛀆 (KA): From 加 (ka)
- 𛀈 (KI): From 幾 (ki)
- 𛀐 (SA): From 左 (sa)
- 𛀒 (SHI/SE): From 之 (shi/se) - ambiguous reading
- 𛀚 (TA): From 太 (ta)
- 𛀜 (CHI): From 知 (chi)
Sound Mark Processing
The module provides utilities for analyzing and manipulating diacritical marks (濁点・半濁点).
Sound Mark Splitting
# Split voiced characters into base + mark
my @parts = split-sound-marks("が");
say @parts[0]; # → か (base character)
say @parts[1]; # → ゛ (voiced mark)
# Split semi-voiced characters
@parts = split-sound-marks("ぱ");
say @parts[0]; # → は (base character)
say @parts[1]; # → ゜ (semi-voiced mark)
# Regular characters return as-is
@parts = split-sound-marks("あ");
say @parts[0]; # → あ (no splitting)
Practical Applications
# Analyze character composition
sub analyze-kana($char) {
my @parts = split-sound-marks($char);
if @parts.elems == 2 {
say "$char = {@parts[0]} + {@parts[1]}";
} else {
say "$char = base character";
}
}
analyze-kana("が"); # → が = か + ゛
analyze-kana("ぱ"); # → ぱ = は + ゜
analyze-kana("あ"); # → あ = base character
Specialized Unicode Symbols
Circled Katakana
# Convert circled katakana to components
say decircle-katakana("㋐㋑㋒"); # → アイウ
say decircle-katakana("㋕㋖㋗"); # → カキク
# Convert components to circled katakana
say encircle-katakana("アイウ"); # → ㋐㋑㋒
say encircle-katakana("カキク"); # → ㋕㋖㋗
Squared Katakana (Units and Abbreviations)
# Convert squared katakana to full forms
say desquare-katakana("㌔"); # → キロ (kilo)
say desquare-katakana("㌧"); # → トン (ton)
say desquare-katakana("㍍"); # → メートル (meter)
say desquare-katakana("㍑"); # → リットル (liter)
# Convert full forms to squared katakana
say ensquare-katakana("キロ"); # → ㌔
say ensquare-katakana("メートル"); # → ㍍
# Complex examples
say desquare-katakana("㌔㍍"); # → キロメートル
say desquare-katakana("㍉㍍"); # → ミリメートル
Common Squared Katakana Units:
- ㌔ (キロ) - kilo
- ㌧ (トン) - ton
- ㍍ (メートル) - meter
- ㍑ (リットル) - liter
- ㍉ (ミリ) - milli
- ㌢ (センチ) - centi
- ㌦ (ドル) - dollar
- ㌫ (パーセント) - percent
- ㍗ (ワット) - watt
Cross-script Conversion Integration
The module seamlessly integrates with other script converters, automatically handling half-width conversion.
Romaji Conversion
# Automatic half-width handling
say kana-to-romaji("コンニチハ"); # → konnichiha
say kana-to-romaji("こんにちは"); # → konnichiha
# Multiple romanization systems
say kana-to-romaji("しんぶん", :system<hepburn>); # → shinbun
say kana-to-romaji("しんぶん", :system<kunrei>); # → sinbun
say kana-to-romaji("しんぶん", :system<nihon>); # → sinbun
# Sokuon (っ) handling
say kana-to-romaji("がっこう"); # → gakkou
say kana-to-romaji("ちょっと"); # → chotto
Cyrillic Conversion
# Polivanov system (default)
say kana-to-kuriru-moji("こんにちは"); # → конничиха
say kana-to-kuriru-moji("ひらがな"); # → хирагана
# Phonetic system
say kana-to-kuriru-moji("しんぶん", :system<phonetic>); # → шинбун
say kana-to-kuriru-moji("ちゃちゅちょ", :system<phonetic>); # → чачучо
# Slavic language variants
say kana-to-kuriru-moji("さくら", :system<ukrainian>); # → сакура
say kana-to-kuriru-moji("さくら", :system<serbian>); # → сакура
Hangul Conversion
# Standard system (default)
say kana-to-hangul("こんにちは"); # → 곤니치하
say kana-to-hangul("ひらがな"); # → 히라가나
# Academic system (with consonant doubling)
say kana-to-hangul("がっこう", :system<academic>); # → 깍코우
say kana-to-hangul("ばっば", :system<academic>); # → 빱빠
# Phonetic system (preserves Japanese pronunciation)
say kana-to-hangul("ちゃちゅちょ", :system<phonetic>); # → 치야치유치요
# Popular system (K-pop/media usage)
say kana-to-hangul("ちゅう", :system<popular>); # → 추우
Advanced Features
Mixed Text Processing
All functions handle mixed text gracefully, processing only kana characters:
say to-katakana("Hello こんにちは 123"); # → Hello コンニチハ 123
say to-hiragana("Hello コンニチハ 123"); # → Hello こんにちは 123
say to-fullwidth-katakana("Hello アイウ 123"); # → Hello アイウ 123
Chained Conversions
# Complex conversion chains
my $text = "コンニチハ"; # Half-width katakana
$text = to-fullwidth-katakana($text); # → コンニチハ
$text = to-hiragana($text); # → こんにちは
say kana-to-romaji($text); # → konnichiha
# Historical processing
$text = "𛀆𛀈𛀊"; # Hentaigana
$text = hentaigana-to-hiragana($text); # → かきく
$text = to-katakana($text); # → カキク
say encircle-katakana($text); # → ㋕㋖㋗
Empty String and Edge Case Handling
say to-katakana(""); # → "" (empty string)
say to-hiragana(""); # → "" (empty string)
say to-fullwidth-katakana(""); # → "" (empty string)
say hentaigana-to-hiragana(""); # → "" (empty string)
Character Coverage
Unicode Ranges Supported
- Hiragana: U+3040-U+309F (ひらがな)
- Katakana: U+30A0-U+30FF (カタカナ)
- Half-width Katakana: U+FF61-U+FF9F (ハンカク)
- Hentaigana: U+1B001-U+1B11E (𛀁-𛄟)
- Circled Katakana: U+32D0-U+32FE (㋐-㋾)
- Squared Katakana: U+3300-U+3357 (㌀-㍗)
Character Count
- Basic Hiragana/Katakana: 46 + 25 (voiced/semi-voiced) = 71 characters
- Small Kana: 10 characters
- Y-combinations: 33 combinations
- Modern Extensions: 25+ foreign sound adaptations
- Half-width Forms: 63 characters
- Hentaigana: 300+ historical variants
- Circled Katakana: 47 symbols
- Squared Katakana: 88 unit abbreviations
Optimization Features
- Longest-First Matching: Multi-character combinations processed before single characters
- Efficient Hash Lookups: O(1) character mapping using Raku hashes
- Minimal Regex Usage: Direct string substitution where possible
- Lazy Evaluation: Conversion tables computed only when needed
Best Practices
# Efficient: Single conversion call
my $result = to-katakana($large-text);
# Less efficient: Multiple small conversions
for @small-texts -> $text {
$result ~= to-katakana($text); # Consider batching
}
# Efficient: Reuse conversion results
my $katakana = to-katakana($text);
my $romaji = kana-to-romaji($katakana); # Uses already-converted katakana
Error Handling and Edge Cases
# Invalid or unknown characters are preserved
say to-katakana("こんにちは🎌"); # → コンニチハ🎌
say to-hiragana("カタカナ🗾"); # → かたかな🗾
# Mixed scripts handled appropriately
say to-katakana("ひらがなカタカナ"); # → ヒラガナカタカナ
say to-hiragana("カタカナひらがな"); # → かたかなひらがな
# Partial conversions work correctly
say to-fullwidth-katakana("Normal アイウ text"); # → Normal アイウ text
Ambiguous Character Handling
# Hentaigana with multiple readings
say hentaigana-to-hiragana("𛀒"); # → し・せ (shows all possibilities)
# Historical characters preserved if no modern equivalent
say to-katakana("古い𛀁文字"); # → 古イ𛀁文字 (𛀁 processed separately)
Integration Examples
Text Processing Pipeline
sub normalize-japanese-text($text) {
# Step 1: Convert half-width to full-width
my $normalized = to-fullwidth-katakana($text);
# Step 2: Standardize to hiragana for processing
$normalized = to-hiragana($normalized);
# Step 3: Convert historical kana
$normalized = hentaigana-to-hiragana($normalized);
# Step 4: Expand abbreviated forms
$normalized = desquare-katakana($normalized);
$normalized = decircle-katakana($normalized);
return $normalized;
}
# Example usage
my $text = "𛀁イウ㌔㋖";
say normalize-japanese-text($text); # → あいうキロキ
Multilingual Conversion
sub convert-to-all-scripts($japanese-text) {
# Normalize input
my $normalized = to-fullwidth-katakana($japanese-text);
return {
hiragana => to-hiragana($normalized),
katakana => to-katakana($normalized),
romaji => kana-to-romaji($normalized),
cyrillic => kana-to-kuriru-moji($normalized),
hangul => kana-to-hangul($normalized)
};
}
# Example usage
my %scripts = convert-to-all-scripts("コンニチハ");
say %scripts<hiragana>; # → こんにちは
say %scripts<romaji>; # → konnichiha
say %scripts<cyrillic>; # → конничиха
say %scripts<hangul>; # → 곤니치하
Limitations
- Kanji Processing: Does not convert Kanji characters (漢字)
- Context Sensitivity: Pure character-level conversion without semantic analysis
- Historical Accuracy: Hentaigana mappings based on Unicode standards, not historical manuscripts
- Regional Variants: Based on standard Japanese, not dialectal pronunciations
Use Cases
Educational Applications
- Japanese language learning materials
- Script conversion exercises
- Historical text modernization
- Unicode character reference
Text Processing
- Document normalization
- Search and indexing systems
- Legacy text conversion
- Character encoding migration
Digital Humanities
- Historical manuscript digitization
- Classical Japanese text processing
- Unicode compliance testing
- Script evolution research
Entertainment Industry
- Game localization
- Anime subtitle processing
- Manga text conversion
- Social media content adaptation
Contributing
Contributions are welcome. Please visit the project repository at:
https://github.com/slavenskoj/raku-lang-ja-kana
We apologize for any errors and welcome suggestions for improvements.
References
Unicode Standards
- Unicode Standard Annex #15: Unicode Normalization Forms
- Unicode block specifications for Japanese scripts
- Unicode Consortium Hentaigana guidelines
Academic Sources
- Japanese Ministry of Education kana standardization
- Historical kana usage studies
- Unicode Consortium technical reports
License
This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.
Author
Danslav Slavenskoj
For specific script conversions (Romaji, Cyrillic, Hangul), see the specialized README files: