Libarchive - Multi-format archive and compression
libarchive is a multi-format archive and compression library. This module provides a composable, high-level interface to the library for reading, processing, and writing archives of files.
See PHLPM Talk for some more description and examples of usage (mostly duplicates what is here).
Simple, streaming archive reading
use Libarchive::Simple;
.put for archive-read 'myfile.tar.gz';               # Print listing
.extract for archive-read $*IN;                      # Extract all files
# Print a custom listing, using field accessors
for archive-read($*IN) {
    put "dir: {.pathname}" if .is-dir;
    put "file: {.pathname} {.human-size}" if .is-file;
}
for archive-read('this.tar.gz') {
   .content.put if .pathname eq 'README'
}
archive-read('this.zip'.IO)           # Process Seq in normal ways
    .grep({ .pathname ~~ /README/ })  # with for, grep, map, etc.
    .map: { .extract :verbose };      # print listing to STDERR while extracting
# Many extract options to customize, either in object or extract()
for archive-read('dvd.iso', :extract-no-overwrite,
                 destpath => '/somewhere') {
    next unless .pathname eq 'the-file-i-want';
    .extract(perm => 0o600);
}
archive-read can read from a filename, an IO::Path, an IO::Handle, an in-memory Buf, a Supply of Blobs, or a Channel of Blobs.
archive-read() is just shorthand for Libarchive::Read.new()
Simple, streaming archive writing
use Libarchive::Simple;
with archive-write('foo.zip')
{
    .add: 'afile';         # Add a file from the filesystem to the archive
    .add: 'somedir';       # Add a directory, but not contents
    .add: dir('somedir');  # Add every file in a directory
    .add: 'thisdir', dir('thisdir');  # Add directory and contents
    .write: 'afile', "Some content\n";      # Create a file from a Str
    .write: 'bfile', buf8.new(1,2,3,4);     # or from a Blob
    .write: 'bigrandomfile',                # or an IO::Handle
            '/dev/urandom'.IO.open(:bin),
            size => 100_000;                # override size
    .mkdir: 'adir';                               # Create a directory
    .mkdir: 'bdir', perm => 0o700;                # Override perm
    .write: 'cdir'.IO.add('another'), "this\n";   # IO::Path is fine too
    .symlink: 'linked', 'adir/anotherfile';       # Create a symlink
    .symlink: 'anotherlink' => 'adir/yetanother'; # Pair symlink is ok
    .close;                                       # Always close!
}
archive-write can write to a filename, an IO::Path, an IO::Handle, an in-memory Buf, a Supplier of Blobs, or a Channel of Blobs. Unless the destination is a filename, you must specify a format (and optionally filters):
archive-write($*OUT, format => 'zip');  # Send zip file to STDOUT
archive-write() is just shorthand for Libarchive::Write.new()
Simple, Slurping all content into memory:
use Libarchive::Simple;
my $archive := archive-slurp 'this.tar';
say $archive;                                   # Print listing
put $archive<README>;                           # content of a file
$archive<afile>.content = "Change content\n";   # change existing file
$archive<adir/bad>:delete;                      # Remove file
$archive.spurt: 'foo.zip';                      # Dump archive back to disk
archive-slurp() is just shorthand for Libarchive::Archive.new()
It creates an object that is both Iterable just like archive-read,
and also Associative, including all the data/content from the
archive instead of reading it out of the stream as it goes, so you can
use hyper processing in parallel without worry. The keys are paths,
not just filenames. If the archive has two files with exactly the
same path, you'll just get one. (Why would you do that anyway?)
Processing Archives in a pipeline
Libarchive::Read (and archive-read) produces a Seq of
Libarchive::Entrys. You can use the .copy method to copy them into
a Libarchive::Write.
For example, you could hook up a reader to a writer to convert a tar file to a zip file (or ISO or whatever):
use Libarchive::Simple;
with archive-write($*OUT, format => 'zip')
{
    .copy: archive-read($*IN, format => 'tar');
    .close;
}
Or even process the contents in various ways as they go:
use Libarchive::Simple;
with archive-write($*OUT, format => 'zip')
{
    .write: 'NEWREADME', "This is my README\n";      # Add some extra files
    .write: 'LICENSE', "Special license file\n";
    .copy: archive-read($*IN, format => 'tar')
           .grep({ .pathname ~~ /good/})             # Only pass good files
           .map({ .pathname(.pathname.uc) })         # Uppercase filenames
           .map({ .uname('fred').perm(0o600)});      # Change owner and perm
    .close;
}
When streaming, make sure you keep the sequence lazy; otherwise the
reader will already have moved past the data before the copy occurs.
If you want random access, use Libarchive::Archive or archive-slurp.
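As a rough sketch of the lazy pattern described above, the following builds a small tar file and then copies a filtered, still-lazy Seq of entries into a zip file. It uses only routines shown in this document. The scratch file names are made up for illustration, and it assumes the .tar and .zip extensions are recognized when choosing a format.

```raku
use Libarchive::Simple;

# Hypothetical scratch files used only for this illustration.
my $src = $*TMPDIR.add('lazy-demo.tar');
my $dst = $*TMPDIR.add('lazy-demo.zip');

# Build a small source archive with two files.
with archive-write($src) {
    .write: 'good-one', "keep me\n";
    .write: 'other', "drop me\n";
    .close;
}

with archive-write($dst) {
    # grep returns a lazy Seq, so each entry is copied while the
    # reader is still positioned on that entry's data.
    .copy: archive-read($src).grep({ .pathname ~~ /good/ });
    # Avoid assigning the entries to an array first: that would pull
    # every entry before any of them is copied.
    .close;
}

# List what ended up in the new archive.
my @names = archive-read($dst).map(*.pathname);
put @names;
```

Only the pathname is collected in the final step, so making that last Seq eager is harmless; it is the entry data that has to be consumed in stream order.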
Filtering without an Archive, format 'raw'
libarchive supports a special format 'raw' that works on a single
virtual file, passing it through the specified filters. This can be
used to apply compress, gzip, bzip2, and similar filters.
The manual process is something like this:
with archive-write($dest, format => 'raw', filter => 'gzip')
{
    .write('ignore-filename', $source, size => ...);
    .close
}
or
with archive-read($source, format => 'raw')
{
    my $header = .read;  # Read and ignore the archive header
    while my $buf = .read-data(<blocksize>)
    {
        ...do something with $buf...
    }
}
These constructs have been packaged up into Libarchive::Filter with
two subroutines, archive-encode and archive-decode. Each takes a
$source and a $destination, which can be most of the normal things.
archive-encode, of course, must include one or more filters to be
useful.
For example, you can read/write files:
use Libarchive::Filter;
archive-encode('Some content', 'file.gz', filter => 'gzip');
my $content = archive-decode('file.gz');
... $content eq 'Some content';
or just use a memory buffer:
use Libarchive::Filter;
my $buf = archive-encode('Some content', filter => 'gzip');
...encoded into $buf...
my $content = archive-decode($buf);
...$content eq 'Some content'
archive-encode sources can be anything that archive-write will write:
content in a Str or Buf, or a filename IO::Path, an
IO::Handle, a Supply or Channel of Blobs.
archive-encode destinations can be anything that archive-write
will produce: Buf, IO::Handle, Supplier, Channel, or a Str
or IO::Path filename.
archive-decode sources can be anything that archive-read will read:
filename in a Str or IO::Path, Blob, Supply, IO::Handle, or
Channel.
archive-decode destinations can be Blob, IO::Handle, IO::Path,
Supplier, Channel. If you don't set a destination, a Str with
the content is returned.
Note that the Str into archive-encode or out of
archive-decode is the content itself, but Str out of
archive-encode or into archive-decode are filenames. You can
always use IO::Path for a filename.
A number of shortcuts for various filters have also been defined:
use Libarchive::Filter :gzip;
my $buf = gzip('Some content');
my $content = gunzip($buf);
These include:
:gzip -> gzip() and gunzip()
:compress -> compress() and uncompress()
:bzip2 -> bzip2() and bunzip2()
:lz4 -> lz4() and unlz4()
:uuencode -> uuencode() and uudecode()
:lzma -> lzma() and unlzma()
You can also specify use Libarchive::Filter :all to get all the
shortcut routines.
These all take the same options that archive-encode() and
archive-decode() do and go to/from files, IO::Handles, Supplies,
Channels, etc.
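A minimal sketch of a shortcut pair going through a file rather than a memory buffer. It assumes that bzip2() accepts a source and a destination positionally in the same way archive-encode() does, as the paragraph above suggests. The scratch file name is hypothetical.

```raku
use Libarchive::Filter :bzip2;

# Hypothetical scratch file; any writable path would do.
my $path = $*TMPDIR.add('filter-demo.bz2');

# A Str source is the content itself; an IO::Path destination is a filename.
bzip2('Some content', $path);

# An IO::Path source is a filename; with no destination a Str is returned.
my $content = bunzip2($path);
put $content;
```

Using an IO::Path for the file sidesteps the question of whether a Str means content or a filename.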
Formats and Filters
Valid read formats:
'7zip', 'ar', 'cab', 'cpio', 'empty', 'gnutar', 'iso9660', 'lha', 'mtree', 'rar', 'raw', 'tar', 'warc', 'xar', 'zip', 'zip-streamable', 'zip-seekable'
Valid read filters:
'bzip2', 'compress', 'gzip', 'grzip', 'lrzip', 'lz4', 'lzip', 'lzma', 'lzop', 'none', 'rpm', 'uu', 'xz', 'zstd'
You can specify a list of multiple formats/filters to consider if you want to limit which types you support. You can also specify 'all' for either format or filter, which is the default.
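For instance, a reader could be limited to a couple of formats and filters. This is a sketch only: it assumes that format and filter accept a list on archive-read in the same way filter does on Libarchive::Write.new, and the scratch file name is made up.

```raku
use Libarchive::Simple;

# Hypothetical scratch archive; format and filter come from the extension.
my $file = $*TMPDIR.add('formats-demo.tar.gz');
with archive-write($file) {
    .write: 'README', "hello\n";
    .close;
}

# Only consider tar or zip archives that are gzip-compressed or uncompressed.
my @names = archive-read($file, format => <tar zip>, filter => <gzip none>)
    .map(*.pathname);
put @names;
```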
Valid write formats:
'7zip', 'ar', 'arbsd', 'argnu', 'arsvr4', 'bsdtar', 'cd9660', 'cpio', 'gnutar', 'iso', 'iso9660', 'mtree', 'mtree-classic', 'newc', 'odc', 'oldtar', 'pax', 'paxr', 'posix', 'raw', 'rpax', 'shar', 'shardump', 'ustar', 'v7tar', 'v7', 'warc', 'xar', 'zip'
Valid write filters:
'b64encode', 'bzip2', 'compress', 'grzip', 'gzip', 'lrzip', 'lz4', 'lzip', 'lzma', 'lzop', 'uuencode', 'xz', 'zstd'
By default, when you write to a file, the filename extension determines the format (and possibly the filter).
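As an illustration, the sketch below writes the same content twice and lets the extension pick the format each time. The file names are hypothetical, and which extensions are recognized is an assumption based on the examples elsewhere in this document (.tar.gz and .zip).

```raku
use Libarchive::Simple;

# Hypothetical scratch files: one gzip-compressed tar and one zip.
my @files = <ext-demo.tar.gz ext-demo.zip>.map({ $*TMPDIR.add($_) });

for @files -> $file {
    # No format or filter given, so the extension decides.
    with archive-write($file) {
        .write: 'hello.txt', "hello\n";
        .close;
    }
}

put "wrote $_" for @files;
```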
You can override by explicitly specifying a format and/or filters:
Libarchive::Write.new('myfile.tar.gz', format => 'zip');
will create a zip file named 'myfile.tar.gz' (but don't do that).
If you are writing to a stream, you must specify a format:
Libarchive::Write.new($*OUT, format => 'zip');
You can optionally specify one or more filters to use while writing.
Libarchive::Write.new('myfile', format => 'gnutar',
                             filter => <gzip b64encode>);
Multiple filters are built into a pipeline, so the order they are listed is significant.
For more details on the specific way that libarchive handles each format, including some limitations, see the man page: libarchive-formats.5 and the libarchive wiki.
Libarchive Entry methods
A Libarchive::Entry is sort of like a super-stat, holding all of the
information about a file system component.
Str and gist return a single line summary of the archive entry,
kind of like an 'ls -l' or 'tar t' listing.
The other methods can query and/or set various information about the entry:
pathname, size, uid, gid, uname, gname, fflags
perm - Integer permissions, for new files, defaults to 0o644, for new
directories, defaults to 0o755.
atime, mtime, ctime, birthtime - Various times, returned as
DateTimes. Depending on the archive format, these might not be set.
symlink - for a symbolic link, this is what it points to
strmode - Read-only Unix-style string for filetype/permissions
(like -rw-r--r-- or drwxr-xr-x)
mode - file mode, better to use perm and/or filetype
human-size - uses Number::Bytes::Human to process the size, so you get values like "15M", "25K" or "96B" for the
size of a file.
filetype - returns a Libarchive::Filetype object that numifies to
the Unix/C filetype bits and stringifies to: REG, LINK, SOCK, CHAR,
BLOCK, DIR, FIFO. You can pass in :dir to set filetype to DIR (or
just use '.mkdir').
is-file - Bool shortcut to query for filetype REG
is-dir - Bool shortcut to query for filetype DIR
Libarchive Entry Extraction
A Libarchive::Read produces Libarchive::Entry::Read objects that
are Libarchive::Entrys with several additional methods:
data - reads the content of the entry from the data stream and returns
it as a Buf.
content - same as data, but decodes the Buf into a Str
(encoding utf-8 -- if you want other encodings, just call decode
on data).
extract - extracts the entry into a filesystem entity (file,
directory, symlink, socket, fifo, etc.)
You can change the pathname to rename or move the file around. You
can also pass in :destpath either to the main object on creation, or
to extract() and it will be prepended to the pathname.
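To illustrate the data and decode combination mentioned above, here is a small sketch that stores a Latin-1 encoded file and reads it back with an explicit encoding. The scratch file name and sample text are made up for the example, and it assumes write() accepts the Blob returned by Str.encode.

```raku
use Libarchive::Simple;

# Hypothetical scratch archive holding one Latin-1 encoded file.
my $file = $*TMPDIR.add('decode-demo.tar');
with archive-write($file) {
    .write: 'latin1.txt', 'café'.encode('iso-8859-1');
    .close;
}

# content would assume utf-8, so take the raw Buf from data and
# decode it with the encoding the file was written in.
my @texts = archive-read($file).map({ .data.decode('iso-8859-1') });
put @texts;
```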
You can also pass in extract flags, either to the main object, or to
individual extract calls to control the extraction:
Extract flags:
Extract flags can be specified to Libarchive::Read.new(), or to the
.open(), or to .extract(). Flags to .new() and .open() are
sticky, and will affect all future .opens as well. Flags to
.extract() are not -- they affect only the specific extract.
:extract-owner - The user and group IDs should be set on the restored
file. By default, the user and group IDs are not restored.
:extract-perm - Full permissions (including SGID, SUID, and sticky
bits) should be restored exactly as specified, without obeying the
current umask. Note that SUID and SGID bits can only be restored if
the user and group ID of the object on disk are correct. If
:extract-owner is not specified, then SUID and SGID bits will only be
restored if the default user and group IDs of newly-created objects on
disk happen to match those specified in the archive entry. By default,
only basic permissions are restored, and umask is obeyed.
:extract-time - The timestamps (mtime, ctime, and atime) should be
restored. By default, they are ignored. Note that restoring of atime
is not currently supported.
:extract-no-overwrite - Existing files on disk will not be
overwritten. By default, existing regular files are truncated and
overwritten; existing directories will have their permissions updated;
other pre-existing objects are unlinked and recreated from scratch.
:extract-unlink - Existing files on disk will be unlinked before any
attempt to create them. In some cases, this can prove to be a
significant performance improvement. By default, existing files are
truncated and rewritten, but the file is not recreated. In particular,
the default behavior does not break existing hard links.
:extract-acl - Attempt to restore ACLs. By default, extended ACLs are
ignored.
:extract-fflags - Attempt to restore extended file flags. By default,
file flags are ignored.
:extract-xattr - Attempt to restore POSIX.1e extended attributes. By
default, they are ignored.
:extract-secure-symlinks - Refuse to extract any object whose final
location would be altered by a symlink on disk. This is intended to
help guard against a variety of mischief caused by archives that
(deliberately or otherwise) extract files outside of the current
directory. The default is not to perform this check. If
:extract-unlink is specified together with this option, the library
will remove any intermediate symlinks it finds and return an error
only if such symlink could not be removed.
:extract-secure-nodotdot - Refuse to extract a path that contains a
.. element anywhere within it. The default is to not refuse such
paths. Note that paths ending in .. always cause an error, regardless
of this flag.
:extract-secure-noabsolutepaths - Refuse to extract an absolute
path. The default is to not refuse such paths.
:extract-sparse - Scan data for blocks of NUL bytes and try to
recreate them with holes. This results in sparse files, independent of
whether the archive format supports or uses them.
:extract-clear-nochange-fflags - Before removing a file system object
prior to replacing it, clear platform-specific file flags which might
prevent its removal.
Creating a new archive
Writing to an archive
Using either Libarchive::Write.new() or archive-write(), there are
a number of methods for adding/creating filesystem entities.
Add existing filesystem entities:
add() adds existing entities by filename or IO::Path.
You may find ecosystem modules such as
File::Find or
Concurrent::File::Find
useful for generating lists of files:
    use Libarchive::Simple;
    use Concurrent::File::Find;
    with archive-write('somefile.tar.gz')
    {
        .add: '/somedir', find('/somedir'); # Recursively add files
        .close;
    }
If you add files within a directory, don't forget to add the directory itself if you want it to be created on extraction too.
Create new files
write($filename, $content) will create a new file
$filename can be a Str or something that will convert to a Str,
like an IO::Path. $content can be a Str, a Blob, an
IO::Handle from which the content will be read, or an IO::Path
from which the content will be read.
Create directories
mkdir($pathname) will add a new directory to the archive
Create new symbolic links
symlink($pathname, $symlink) or
symlink($pathname => $symlink)
Add a sequence of Libarchive::Entrys
Use copy() to read from an archive-read() or archive-slurp()
sequence into a new archive.
LICENSE
Copyright © 2019 United States Government as represented by the Administrator, National Aeronautics and Space Administration. No Copyright is claimed in the United States under Title 17, U.S. Code. All Other Rights Reserved.