Sort::Naturally

zef:thundergnat

NAME

Sort::Naturally

Provides a transform routine to modify sorting into a "natural" order.

SYNOPSIS

use Sort::Naturally;

# sort strings containing a mix of letters and digits sensibly
my @a = <1 11 100 141 14th 2 210 21 30 3rd d1 Any any d10 D2 D21 d3 aid Are ANY>;

say @a.sort: { .&naturally };

yields:

1 2 3rd 11 14th 21 30 100 141 210 aid ANY Any any Are d1 D2 d3 d10 D21

compared to @a.sort:

1 100 11 141 14th 2 21 210 30 3rd ANY Any Are D2 D21 aid any d1 d10 d3

DESCRIPTION

This implementation of a natural sort order transform yields results similar, though not identical, to the Perl 5 Sort::Naturally. When sorting strings that contain groups of digits, it sorts the groups of consecutive digits by "order of magnitude", then lexically by lower-cased terms. "Order of magnitude" is something of a simplification: the transformation routine &naturally doesn't try to interpret or evaluate a group of digits as a number; it just counts how many digits are in each group and uses that count as its order of magnitude.

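The implications are: very long groups of digits still compare sensibly, because nothing is ever converted to a number and so nothing can overflow or lose precision.

However, that also means: leading zeros count as digits, so a group such as 0010 is treated as four digits long and sorts after 100.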

It could have been modified to ignore leading zeros, and in fact I experimented with that a bit, but ran into issues with strings where leading zeros WERE significant. Just remember, it is for sorting strings, not numbers. It makes some attempt at treating groups of digits in a kind of numbery way, but they are still strings. If you truly want to sort numbers, use a numeric sort.
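
To make the digit-counting idea concrete, here is a minimal sketch of such a transform. It is not the module's actual source: the name digit-count-key is made up for illustration, and the sketch assumes plain string comparison of the resulting keys.

```raku
# Hypothetical helper (not part of Sort::Naturally): lower-case the term and
# prefix each run of digits with a character encoding the run's length, so
# that shorter runs compare as smaller than longer ones.
sub digit-count-key($term) {
    $term.Str.lc.subst(/ \d+ /, -> $m { $m.chars.chr ~ $m }, :g)
}

my @terms = '100', '0010', '9', '21', 'd10', 'd3', 'D2';

# A one-argument routine passed to sort is used to compute each sort key.
say @terms.sort(&digit-count-key).join(' ');
# 9 21 100 0010 D2 d3 d10
```

Note that 0010 lands after 100: its four digits outrank the three digits of 100, which is the leading-zero behavior described above.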

USAGE

Sort::Naturally provides a transformation routine: &naturally, which you can use as a sorting block modifier. It performs a transform of the terms so they will end up in the natural order.
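
As a rough illustration of the two common ways to apply it, the sketch below sorts some made-up file names. The %size hash and its contents are hypothetical; only &naturally comes from the module.

```raku
use Sort::Naturally;

# Hypothetical data: file names mapped to sizes.
my %size = 'file10.txt' => 120, 'file9.txt' => 80, 'file100.txt' => 45;

# Pass the routine itself as the key function...
say %size.keys.sort(&naturally).join(' ');

# ...or call it on part of each element inside a sorting block.
say %size.sort({ .key.&naturally }).map(*.key).join(' ');

# Both lines print: file9.txt file10.txt file100.txt
```

The block form is the more flexible of the two, since it lets you pick which part of each element gets the natural-order treatment.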

BACKWARD COMPATIBILITY BREAKING CHANGES

Previous versions provided some pre-made augmented methods and infix comparators. These have been removed, partly because they were causing compilation failures due to incomplete or not-yet-implemented compiler features, and partly because I decided it was a bad idea to pollute the namespace unnecessarily. If you would like to have the syntactic sugar, it can be added easily.

To create a method .nsort that can be used like .sort:

use Sort::Naturally;

use MONKEY-TYPING;
augment class Any {
    method nsort (*@) { self.list.flat.sort( { .&naturally } ) };
}

(Note: since this was originally written, augmenting rules have changed in Raku. Now, if you augment a parent class (Any), you must then re-compose any subtypes that you want to see the augmentation.)

For a natural sorting infix comparator:

sub infix:<ncmp>($a, $b) { $a.&naturally cmp $b.&naturally }
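
A short sketch of how such a comparator might be used, repeating the definition so the snippet stands alone. The sample strings are arbitrary.

```raku
use Sort::Naturally;

sub infix:<ncmp>($a, $b) { $a.&naturally cmp $b.&naturally }

# Compare two terms directly; the result is an Order value.
say 'd3' ncmp 'd10';   # Less
say 'd3' cmp 'd10';    # More (plain string comparison)

# A two-argument routine passed to sort is used as the comparator.
say <d10 d3 D2>.sort(&infix:<ncmp>).join(' ');   # D2 d3 d10
```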

PERL 5 BACKWARD COMPATIBILITY

Perl 5 Sort::Naturally has an odd convention in that numbers at the beginning of strings are sorted in ASCII order (digits sort before letters) but numbers embedded inside strings are sorted in non-ASCII order (digits sort after letters). While this is just plain strange in my opinion, some people may rely on or prefer this behavior, so Raku Sort::Naturally has a "p5 compatibility mode" routine: p5naturally().

For comparison:

('       sort:',<foo12z foo foo13a fooa Foolio Foo12a foolio foo12 foo12a 9x 14>\
   .sort).join(' ').say;
('  naturally:',<foo12z foo foo13a fooa Foolio Foo12a foolio foo12 foo12a 9x 14>\
   .sort({ .&naturally })).join(' ').say;
('p5naturally:',<foo12z foo foo13a fooa Foolio Foo12a foolio foo12 foo12a 9x 14>\
   .sort({ .&p5naturally })).join(' ').say;

yields:

       sort: 14 9x Foo12a Foolio foo foo12 foo12a foo12z foo13a fooa foolio
  naturally: 9x 14 foo foo12 Foo12a foo12a foo12z foo13a fooa Foolio foolio
p5naturally: 9x 14 foo fooa Foolio foolio foo12 Foo12a foo12a foo12z foo13a

BUGS

Tests and the p5 routine will fail under locales that specify lower case letters to sort before upper case (notably EBCDIC locales). They will still sort consistently, just not in the order advertised. I can probably implement some kind of run time check to modify the behavior based on the current locale. I'll look into it more seriously later if necessary. Right now, there are no Raku compilers for any EBCDIC operating systems, so it is not really an issue yet.

AUTHOR

Stephen Schulze (also known as thundergnat)

LICENSE

Licensed under The Artistic License 2.0; see LICENSE.