Intl::LanguageTag

zef:guifa

⚠︎ Warning ⚠︎ v0.11+ is mostly backwards compatible with v0.10 and prior. The following are not backwards compatible with v0.10 and earlier:

  • Heavy extensions introspection (possible in v0.12, but via new API)
  • Grandfathered / legacy tags (possible in v0.12, but via new API)
  • Creation by means other than a Str
  • Enums
  • LanguageTagFilter objects

Support for all of these will be addressed in forthcoming updates.

Usage

use Intl::LanguageTag;                  # ← Imports as 'LanguageTag'
use Intl::LanguageTag::BCP-47;          # ← Imports as 'LanguageTag::BCP-47'
use Intl::LanguageTag::BCP-47 <enums>;  # ← Include enums
use Intl::LanguageTag::BCP-47 <utils>;  # ← Include lookup-language-tags
                                        #       and filter-language-tags subs

# Create a new LanguageTag from a string
LanguageTag.new: 'en-US';

Which to use

Most of the time, the statement use Intl::LanguageTag is what you will want (the BCP-47 tag type is set as the default for a reason). Prefer the statement use Intl::LanguageTag::BCP-47 when working with other language tag types in the same scope, since the longer class name avoids a symbol collision.
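As a minimal sketch of the explicit import, assuming the statement exports the class under the name given in the Usage comments (LanguageTag::BCP-47) and that the module is installed:

```raku
# Sketch only: relies on the export name listed under Usage.
use Intl::LanguageTag::BCP-47;

# The fully qualified name leaves the short name 'LanguageTag' free
# for another tag type that might be imported into the same scope.
my $tag = LanguageTag::BCP-47.new: 'en-US';
say $tag.defined;
```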

Features

Everything is value typed! This means you can easily use language tags in sets, bags, and mixes, and routines like unique will operate as you'd expect.
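The following sketch illustrates what value typing should mean in practice. It uses only the constructor shown under Usage plus core Raku routines; the tag strings are arbitrary examples.

```raku
use Intl::LanguageTag;

# Two separately constructed tags for the same string, plus a different one
my $a = LanguageTag.new: 'en-US';
my $b = LanguageTag.new: 'en-US';
my $c = LanguageTag.new: 'es-MX';

# With value typing, === compares by value rather than by object identity,
# so $a and $b should be treated as the same thing.
say $a === $b;

# unique and Set rely on the same identity check, so the duplicate
# should collapse, leaving two distinct tags in each case.
say ($a, $b, $c).unique.elems;
say set($a, $b, $c).elems;
```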

Once you've created a language tag, you have the following simple methods to introspect it.

Each of the above will stringify into the exact code, but also has introspective methods. For instance, .language.default-script tells you what the default script for the language is.
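A short sketch of this kind of introspection, using only the .language and .default-script methods named above (the tag string is an arbitrary example):

```raku
use Intl::LanguageTag;

my $tag = LanguageTag.new: 'en-US';

# .language should stringify to the language code itself
say ~$tag.language;

# .default-script reports the script the language is normally written in
say $tag.language.default-script;
```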

Canonicalization

Language tags are canonicalized to the extent possible upon creation. This is done in accordance with BCP 47, RFC 6067, RFC 6497, and TR 35, and helps to guarantee value typing mechanics. Most commonly, you may notice that a script disappears. Less commonly, if you use grandfathered tags, tags like i-navajo will be automatically converted to their preferred form (nv) when one exists. There are five grandfathered tags without preferred forms; for these, the entire tag is preserved as the “language” (e.g. i-default) and a warning is issued, since those tags should not be used. Extended language tags are preserved, with on-demand and automatic conversion to preferred forms planned for a future release.
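A sketch of the most common case, a script being dropped. It assumes that Latn is treated as the default script for en, so that the two tags below canonicalize to the same value. That pairing is an illustrative assumption, not something this document states.

```raku
use Intl::LanguageTag;

# Assumption: 'Latn' counts as the default script for 'en', so
# canonicalization is expected to drop it from the first tag.
my $explicit = LanguageTag.new: 'en-Latn-US';
my $implicit = LanguageTag.new: 'en-US';

# Stringified form of the tag after canonicalization
say ~$explicit;

# If both canonicalize to the same tag, value typing makes them identical
say $explicit === $implicit;
```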

Utility functions

If you include <utils> in your use statement, you will have access to two subs to aid working with language tags: lookup-language-tags and filter-language-tags.

If the names of these functions are too verbose, you can alias them easily by doing my &filter = &filter-language-tags.

Todo

In likely order of completion:

Version history

License

All files (unless noted otherwise) can be used, modified and redistributed under the terms of the Artistic License Version 2. Examples (in the documentation, in tests or distributed as separate files) can be considered public domain.

“Unless noted otherwise”

The resources directory "cldr" contains the contents of the "bcp47" folder of the Unicode CLDR data. These files are copyrighted by Unicode, Inc., and are available and distributed in accordance with their terms.

The resources file “language-subtag-registry” comes from the IANA. I do not currently distribute it because I am not aware of its exact license, but it will be automatically downloaded when running the parsing script. Its data is not needed for distribution, so the file is gitignored.