Text::Homoglyph::Cyrillic

zef:slavenskoj

Identifies homoglyphs (characters that look identical or nearly identical to Cyrillic letters) and replaces them with their proper Cyrillic equivalents. The module is language independent: it converts every homoglyph it recognizes to a Cyrillic letter. If your text needs to keep some Latin letters, you will have to apply your own language-specific logic first.
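
If some Latin text has to survive, one possible pre-filter is to clean only the whitespace-separated tokens that already contain at least one Cyrillic letter. The sketch below is an assumption about how such logic could look, not part of the module: clean-mixed-tokens is a hypothetical helper, and the cleaning routine is passed in as a parameter so the block runs on its own with a one-letter stand-in cleaner.

```raku
# Hypothetical helper, not part of Text::Homoglyph::Cyrillic.
# Cleans only tokens that contain at least one Cyrillic-script character;
# tokens without any Cyrillic (e.g. plain English words) pass through untouched.
sub clean-mixed-tokens(Str $text, &cleaner --> Str) {
    # :v keeps the whitespace separators so the text is re-joined unchanged
    $text.split(/\s+/, :v).map(-> $piece {
        my $token = ~$piece;
        $token ~~ /<:Script<Cyrillic>>/ ?? cleaner($token) !! $token
    }).join
}

# Stand-in cleaner for the demo: maps only Latin "p" to Cyrillic "er".
my &demo-cleaner = -> Str $s { $s.subst("p", "\c[CYRILLIC SMALL LETTER ER]", :g) };

# The first token mixes Latin "p" with Cyrillic letters; the others are pure Latin.
say clean-mixed-tokens("pека and pepper", &demo-cleaner);  # "река and pepper"
```

With the module installed, the same helper could presumably be called as clean-mixed-tokens($text, &clean-cyrillic). Note the limitation of this heuristic: a token made up entirely of Latin look-alikes contains no Cyrillic letter and is therefore left untouched.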

Problem

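Text containing Cyrillic script that has been corrupted or modified with look-alike characters from other Unicode blocks is difficult to process correctly. For example, "мoре" written with a Latin "o" renders the same as the all-Cyrillic "море" but is a different string, so exact comparison and search against the correct spelling fail.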
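
A quick way to see the problem is to build both spellings of the same word and compare them. This sketch uses only core Raku; the variable names are illustrative.

```raku
# Spell the two "o" letters by code point name so the difference is visible.
my $cyrillic-o = "\c[CYRILLIC SMALL LETTER O]";
my $latin-o    = "\c[LATIN SMALL LETTER O]";

my $correct   = "м" ~ $cyrillic-o ~ "ре";
my $corrupted = "м" ~ $latin-o    ~ "ре";

say $correct eq $corrupted;   # False, although both render identically

# Listing the Unicode name of each character exposes the intruder:
# the second name begins with LATIN instead of CYRILLIC.
say $corrupted.comb.map(*.uniname).join("; ");
```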

Installation

zef install Text::Homoglyph::Cyrillic

Usage

Basic Cleaning

use Text::Homoglyph::Cyrillic;

# Clean mixed text
my $dirty = "мoре";  # Contains Latin "o"
my $clean = clean-cyrillic($dirty);
say $clean;  # "море" (proper Cyrillic)

# More examples
say clean-cyrillic("pека");       # "река" (Latin "p" replaced)
say clean-cyrillic("Αлександр");  # "Александр" (look-alike first letter replaced)

Detecting Look-alikes

# Find all look-alike characters in text
my @lookalikes = detect-cyrillic-lookalikes("мoре");
for @lookalikes -> %item {
    say "Found '{%item<char>}' at positions {%item<positions>}, should be '{%item<replacement>}'";
}
# Output: Found 'o' at positions [1], should be 'о'

Verbose Cleaning

# Get detailed information about the cleaning process
my %result = clean-cyrillic-verbose("Пpивет, мир!");
say "Original: {%result<original>}";
say "Cleaned:  {%result<cleaned>}";
say "Changed:  {%result<changed>}";
say "Replacements: {%result<replacements>}";

API Reference

Functions

clean-cyrillic(Str $text --> Str)

Replaces every recognized look-alike character in $text with its proper Cyrillic equivalent and returns the cleaned string.
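
The idea behind this kind of cleaning can be sketched as a per-character table lookup. The three-entry table and the name toy-clean below are hypothetical and only illustrate the technique; the module's actual table and implementation are not shown in this README.

```raku
# Hypothetical, deliberately tiny look-alike table (source char => Cyrillic char).
my %toy-map =
    "o" => "\c[CYRILLIC SMALL LETTER O]",
    "p" => "\c[CYRILLIC SMALL LETTER ER]",
    "\c[GREEK CAPITAL LETTER ALPHA]" => "\c[CYRILLIC CAPITAL LETTER A]";

# Replace each character that has a table entry; keep all others as they are.
sub toy-clean(Str $text --> Str) {
    $text.comb.map(-> $ch { %toy-map{$ch} // $ch }).join
}

say toy-clean("мoре");   # the Latin "o" becomes Cyrillic "о"
```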

detect-cyrillic-lookalikes(Str $text --> Array)

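Detects all look-alike characters in the text without modifying it. As the usage example shows, each element of the returned array is a hash with the keys char, positions, and replacement.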
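
A detection pass of this kind can be sketched by recording the index of every character that has a table entry. The key names char, positions, and replacement are taken from the usage example; the table, the name toy-detect, and the zero-based indexing are assumptions made for illustration.

```raku
# Hypothetical two-entry table, as in a toy cleaner.
my %toy-map =
    "o" => "\c[CYRILLIC SMALL LETTER O]",
    "p" => "\c[CYRILLIC SMALL LETTER ER]";

sub toy-detect(Str $text --> Array) {
    my %positions;   # look-alike character => list of zero-based indices
    for $text.comb.kv -> $i, $ch {
        %positions{$ch}.push($i) if %toy-map{$ch}:exists;
    }

    # One hash per distinct look-alike, sorted by character for stable output.
    my @found;
    for %positions.sort(*.key) -> $pair {
        @found.push: %(
            char        => $pair.key,
            positions   => $pair.value,
            replacement => %toy-map{$pair.key},
        );
    }
    return @found;
}

for toy-detect("мoре") -> %item {
    say "Found '%item<char>' at {%item<positions>.join(', ')}";
}
```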

clean-cyrillic-verbose(Str $text --> Hash)

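Performs cleaning and returns detailed information about the process. As the usage example shows, the returned hash includes the keys original, cleaned, changed, and replacements.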
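
A verbose variant can be sketched by counting substitutions while cleaning. The key names follow the usage example, but the value types are assumptions: here changed is a Bool and replacements is a simple count, which may differ from what the module returns. The name toy-clean-verbose and the table passed to it are hypothetical.

```raku
# Hypothetical sketch: clean with a caller-supplied table and report what happened.
sub toy-clean-verbose(Str $text, %map --> Hash) {
    my $replacements = 0;
    my $cleaned = $text.comb.map(-> $ch {
        if %map{$ch}:exists { $replacements++; %map{$ch} } else { $ch }
    }).join;

    return %(
        original     => $text,
        cleaned      => $cleaned,
        changed      => $cleaned ne $text,   # assumed to be a Bool
        replacements => $replacements,       # assumed to be a count
    );
}

my %result = toy-clean-verbose("Пpивет, мир!", %("p" => "\c[CYRILLIC SMALL LETTER ER]"));
say "Changed: {%result<changed>}, replacements: {%result<replacements>}";
```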

Use Cases

Contributing

https://github.com/slavenskoj/Raku-Text-Homoglyph-Cyrillic

Testing

Run the test suite:

prove -e "raku -I lib" t/

License

This module is available under the Artistic License 2.0.

Author

Danslav Slavenskoj

See Also