Intl::LanguageTag
Usage
use Intl::LanguageTag; # ← Imports as 'LanguageTag'
use Intl::LanguageTag::BCP-47; # ← Imports as 'LanguageTag::BCP-47'
use Intl::LanguageTag::BCP-47 <enums>; # ← Include enums
use Intl::LanguageTag::BCP-47 <utils>; # ← Include lookup-language-tags
# and filter-language-tags subs
# Create a new LanguageTag from a string
LanguageTag.new: 'en-US';
Which to use
Most of the time, use Intl::LanguageTag
is what you will want (the BCP-47 tag type is set as the default for a reason).
Prefer use Intl::LanguageTag::BCP-47
when interacting with other language tag types in the same scope to avoid a symbol collision.
Features
Everything is value typed! This means you can easily use them in sets, bags, and mixes and routines like unique
will operate as you'd expect.
Once you've created a language tag, you have the following simple methods to introspect it.
.language
The language code.
Introspections: .well-formed, .valid, .preferred, .deprecated, .macrolanguage, .default-script).script
The script used, will be omitted if the same as the default script.
Introspections: .well-formed, .valid, .deprecated).region
The region code
Introspections: .well-formed, .valid, .deprecated, .preferred).variants
The variant codes. This object provides positional access to its subtags.
Introspections: .well-formed, .valid, .valid-for-tag, .deprecated, .prefixes).extensions
Any extensions. This object provides hashy access (currently recognizing <t>
and <u>
).private-use
Any private use tags. This object provides positional access to its subtags.
Each of the above will stringify into the exact code, but also has introspective methods.
For instance, .language.default-script
tells you what the default script for the language is.
Canonicalization
Language tags are canonicalized to the extent possible upon creation.
This is done in accordance with BCP 47, RFC 6067, RFC 6497, and TR 35 and helps to guarantee value typing mechanics.
Most likely, you may notice that a script will disappear.
Less likely, if you use grandfathered tags, tags like i-navajo
will be automatically converted to their preferred form (nv
) when those exist.
There are five grandfathered tags without preferred forms which will preserve the entire tag as the “language” (e.g. i-default
), and issue a warning since those tags should not be used.
Extended languages tags are preserved, and with on-demand and automatic conversion to preferred forms planned for a future release.
Utility functions
If you include <utils>
in your use statement, you will have access to two subs to aid working with language tags.
They are the following:
sub filter-language-tags(@source, $filter, :$basic = False)
This performs a 'filter' operation. The source is a list of BCP47 objects, and the filter is also a BCP47.
When in basic mode, all source tags that are identical to, or begin with tags identical to the filter are returned.sub lookup-language-tags(@available, @preferred, $default)
Performs a 'lookup' operation to return an optimally matching language tag.
A common usage might be in an HTML server to receive the client's @preferred
languages and compare them
to the @available
languages on the server. The .head
is the best language to use (or use :single
if you have no need for backup choices).
If the names of these functions is too verbose, you can alias them easily by doing my &filter = filter-language-tags
.
Usage notes
Major architectural changes were made between v0.10 and v0.11. Some functionality is not backwards compatible,
however, I am not aware of any module making use of those previous features. The changes include
how introspection and grandfathered tags work, removal of creation by anything other than a string, removal of enums,
removal of LanguageTagFilter
objects.
Todo
In likely order of completion:
- Restore
enum
access - Better introspection of extensions U / T
- Logical cloning with changes
- Better validation (
.deprecated
, .valid
, etc) - Extlang and grandfathered tag support (via automatic canonicalization)
- Improve filtering by enabling wildcards in matches.
- More exhaustive test files
Version history
- 0.12.4
- Updated to IANA subtag registry dated 2024-06-14
- 0.12.3
- Updated to IANA subtag registry dated 2023-03-22
- 0.12.2
- Fixed minor signature bug with
EmptyExtension
(found thanks to Intl::Format::Number
's <u><nu>
usage) - Updated to IANA subtag registry dated 2023-02-14
- 0.12.1
- Fixed an issue with a potentially consumed sequence when accessing extensions
- Updated to IANA subtag registry dated 2022-08-08
- Moved to zef ecosystem
- 0.12.0
- Extensions and their subelements no longer prefix their type in
.Str
(technically not backwards compatible) for more intuitive use - Some canonicalization during creation (making
en-Latn-US-u-rg-us
yields en-US
) - Smart introspection (
LanguageTag.new('en').script
yields Latn
, as it's the specified default) - Updated to IANA subtag registry dated 2021-05-11
- 'Shortcuts' available for introspection to Unicode extensions T / U
- API makes this available for modules creating custom extensions (although it's unlikely anyone will)
- 0.11.0
- Internal code overhaul for better long-term maintenance
- Added
LanguageTaggish
role - Tags are now value types
- All tags automatically canonicalize upon creation.
- 0.10.0
- Update to IANA subtag registry dated 2020-12-18
- Added a
COERCE(Str)
method for Raku's new coercion protocol. - Final update before near total reworking of the innards for better performance, code cleanliness/maintainability, etc.
- 0.9.1
- Updated to IANA subtag registry dated 2020-06-10
- Temporarily removed
is DEPRECATED
from LanguageTag::Script::type
until more extensive recoding can be done.
- 0.9.0
- Preliminary Wildcard support
- Updated to IANA subtag registry dated 2019-09-16
- Language and Region enums available under Language:: and Region:: namespaces
- Preliminary semantic support for the T extension (U support still very early)
- Preliminary creation of POD6 documentation (both inline and separate).
- Particularly evident for the T extension
- 0.8.5
- Lookups available (no wildcard support yet)
- 0.8.3
- Added initial support for parsing -t and -u extensions.
- Added initial support for grandfathered tags
- Fixed bug on parsing variants when no region code was present
- Laid groundwork for various validation and canonicalization options (not fully implemented yet)
License
All files (unless noted otherwise) can be used, modified and redistributed
under the terms of the Artistic License Version 2. Examples (in the
documentation, in tests or distributed as separate files) can be considered
public domain.
“Unless noted otherwise”
The resources directory "cldr" contains the contents of the "bcp47" folder
of the Unicode CLDR data. These files are copyrighted by Unicode, Inc., and
are available and distributed in accordance with
their terms.
The resources file “language-subtag-registry” comes from the
IANA. I do not
currently distribute it because I am not aware of its exact license, but it
will be automatically downloaded when running the parsing script. Its data
is not needed for distribution, and so is gitignored