⚠︎ Warning ⚠︎
v0.11+ is mostly backwards compatible with v.0.10 and prior. The following is not backwards compatible from v0.10-:
- Heavy extensions introspection (possible in v0.12, but via new API)
- Grandfathered / legacy tags (possible in v0.12, but via new API)
- Creation by means other than a
Support for all will be addressed in forthcoming updates.
use Intl::LanguageTag; # ← Imports as 'LanguageTag'
use Intl::LanguageTag::BCP-47; # ← Imports as 'LanguageTag::BCP-47'
use Intl::LanguageTag::BCP-47 <enums>; # ← Include enums
use Intl::LanguageTag::BCP-47 <utils>; # ← Include lookup-language-tags
# and filter-language-tags subs
# Create a new LanguageTag from a string
Most of the time,
use Intl::LanguageTag is what you will want (the BCP-47 tag type is set as the default for a reason).
use Intl::LanguageTag::BCP-47 when interacting with other language tag types in the same scope to avoid a symbol collision.
Everything is value typed! This means you can easily use them in sets, bags, and mixes and routines like
unique will operate as you'd expect.
Once you've created a language tag, you have the following simple methods to introspect it.
The language code.
Introspections: .well-formed, .valid, .preferred, .deprecated, .macrolanguage, .default-script)
The script used, will be omitted if the same as the default script.
Introspections: .well-formed, .valid, .deprecated)
The region code
Introspections: .well-formed, .valid, .deprecated, .preferred)
The variant codes. This object provides positional access to its subtags.
Introspections: .well-formed, .valid, .valid-for-tag, .deprecated, .prefixes)
Any extensions. This object provides hashy access (currently recognizing
Any private use tags. This object provides positional access to its subtags.
Each of the above will stringify into the exact code, but also has introspective methods.
.language.default-script tells you what the default script for the language is.
Language tags are canonicalized to the extent possible upon creation.
This is done in accordance with BCP 47, RFC 6067, RFC 6497, and TR 35 and helps to guarantee value typing mechanics.
Most likely, you may notice that a script will disappear.
Less likely, if you use grandfathered tags, tags like
i-navajo will be automatically converted to their preferred form (
nv) when those exist.
There are five grandfathered tags without preferred forms which will preserve the entire tag as the “language” (e.g.
i-default), and issue a warning since those tags should not be used.
Extended languages tags are preserved, and with on-demand and automatic conversion to preferred forms planned for a future release.
If you include
<utils> in your use statement, you will have access to two subs to aid working with language tags.
They are the following:
sub filter-language-tags(@source, $filter, :$basic = False)
This performs a 'filter' operation. The source is a list of BCP47 objects, and the filter is also a BCP47.
When in basic mode, all source tags that are identical to, or begin with tags identical to the filter are returned.
sub lookup-language-tags(@available, @preferred, $default)
Performs a 'lookup' operation to return an optimally matching language tag.
A common usage might be in an HTML server to receive the client's
@preferred languages and compare them
@available languages on the server. The
.head is the best language to use (or use
:single if you have no need for backup choices).
If the names of these functions is too verbose, you can alias them easily by doing
my &filter = filter-language-tags.
In likely order of completion:
- Better introspection of extensions U / T
- Logical cloning with changes
- Better validation (
- Extlang and grandfathered tag support (via automatic canonicalization)
- Improve filtering by enabling wildcards in matches.
- More exhaustive test files
- Updated to IANA subtag registry dated 2023-03-22
- Fixed minor signature bug with
EmptyExtension (found thanks to
- Updated to IANA subtag registry dated 2023-02-14
- Fixed an issue with a potentially consumed sequence when accessing extensions
- Updated to IANA subtag registry dated 2022-08-08
- Moved to zef ecosystem
- Extensions and their subelements no longer prefix their type in
.Str (technically not backwards compatible) for more intuitive use
- Some canonicalization during creation (making
- Smart introspection (
Latn, as it's the specified default)
- Updated to IANA subtag registry dated 2021-05-11
- 'Shortcuts' available for introspection to Unicode extensions T / U
- API makes this available for modules creating custom extensions (although it's unlikely anyone will)
- Internal code overhaul for better long-term maintenance
- Tags are now value types
- All tags automatically canonicalize upon creation.
- Update to IANA subtag registry dated 2020-12-18
- Added a
COERCE(Str) method for Raku's new coercion protocol.
- Final update before near total reworking of the innards for better performance, code cleanliness/maintainability, etc.
- Updated to IANA subtag registry dated 2020-06-10
- Temporarily removed
is DEPRECATED from
LanguageTag::Script::type until more extensive recoding can be done.
- Preliminary Wildcard support
- Updated to IANA subtag registry dated 2019-09-16
- Language and Region enums available under Language:: and Region:: namespaces
- Preliminary semantic support for the T extension (U support still very early)
- Preliminary creation of POD6 documentation (both inline and separate).
- Particularly evident for the T extension
- Lookups available (no wildcard support yet)
- Added initial support for parsing -t and -u extensions.
- Added initial support for grandfathered tags
- Fixed bug on parsing variants when no region code was present
- Laid groundwork for various validation and canonicalization options (not fully implemented yet)
All files (unless noted otherwise) can be used, modified and redistributed
under the terms of the Artistic License Version 2. Examples (in the
documentation, in tests or distributed as separate files) can be considered
“Unless noted otherwise”
The resources directory "cldr" contains the contents of the "bcp47" folder
of the Unicode CLDR data. These files are copyrighted by Unicode, Inc., and
are available and distributed in accordance with
The resources file “language-subtag-registry” comes from the
IANA. I do not
currently distribute it because I am not aware of its exact license, but it
will be automatically downloaded when running the parsing script. Its data
is not needed for distribution, and so is gitignored