Unicode grapheme cluster boundary detection
say GCB.always(0x600, 0x30);
say GCB.maybe(
    "\c[REGIONAL INDICATOR SYMBOL LETTER G]".ord,
    "\c[REGIONAL INDICATOR SYMBOL LETTER B]".ord);
Implements the Unicode 9.0 or Unicode 11.0 grapheme cluster boundary rules, depending on the Rakudo version in use.
In contrast to earlier versions of the standard, it is no longer possible
to decide unambiguously whether there is a cluster break between two Unicode
characters by looking at just those two characters.
In particular, there is a break between a pair of regional indicator symbols
only if the first symbol has already been paired up with another indicator.
Likewise, there is no break between extension characters and emoji modifiers
if the current cluster forms an emoji sequence. [FIXME: Unicode 11.0 rules]
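The regional indicator case can be illustrated with a small standalone sketch. It does not use this module and is not its implementation. The helper names is-regional-indicator and ri-break-before are hypothetical, and the code only covers the pairing rule for a position where both neighbouring codepoints are regional indicators.

```raku
# Hypothetical helpers for illustration only; not part of the module's API.

# Regional indicator symbols occupy U+1F1E6 (letter A) .. U+1F1FF (letter Z).
sub is-regional-indicator(Int $cp --> Bool) {
    0x1F1E6 <= $cp <= 0x1F1FF
}

# Decide whether there is a break before position $i, assuming that both
# @cps[$i - 1] and @cps[$i] are regional indicators.
sub ri-break-before(@cps, Int $i --> Bool) {
    # Count the unbroken run of regional indicators ending just before $i.
    my $run = 0;
    my $j = $i - 1;
    while $j >= 0 && is-regional-indicator(@cps[$j]) {
        $run++;
        $j--;
    }
    # An odd run means the preceding indicator is still unpaired, so the
    # two indicators join; an even run means it is already paired: break.
    $run %% 2
}

# Letters G, B, D, E: two flag sequences written back to back.
my @cps = 0x1F1EC, 0x1F1E7, 0x1F1E9, 0x1F1EA;
say (1..3).map({ ri-break-before(@cps, $_) });   # (False True False)
```

The same pair of codepoints (B followed by D) would not be separated if it stood at the start of the text, which is why two codepoints alone are not enough to decide.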
Therefore, the module provides two different methods, GCB.always() and
GCB.maybe(), which both expect two Unicode codepoints as arguments.
GCB.clusters() expects a Uni object as argument and returns a sequence
of such objects split along cluster boundaries.
Bugs and Development
Development happens on GitHub. If you find a bug or have a feature
request, use the issue tracker there.
Copyright and License
Copyright (C) 2016 by email@example.com
Distributed under the Boost Software License, Version 1.0