[Raku PDF Project] / PDF::Tags


A small DOM-like API for the creation of tagged PDF files.

This module enables manipulation of tagged PDF content, with simple construction, XPath queries and basic XML serialization.


use PDF::Tags;
use PDF::Tags::Elem;

use PDF::API6;
use PDF::Annot;
use PDF::XObject::Image;
use PDF::XObject::Form;

my PDF::API6 $pdf .= new;
my PDF::Tags $tags .= create: :$pdf;
# create the document root
my PDF::Tags::Elem $root = $tags.Document;

my $page = $pdf.add-page;
my $header-font = $page.core-font: :family<Helvetica>, :weight<bold>;
my $body-font = $page.core-font: :family<Helvetica>;

$page.graphics: -> $gfx {

    $root.Header1: $gfx, {
        .say('Marked Level 1 Header',
             :position[50, 120]);
    };

    $root.Paragraph: $gfx, {
        .say('Marked paragraph text', :position[50, 100], :font($body-font), :font-size(12));
    };

    # add a marked image
    my PDF::XObject::Image $img .= open: "t/images/lightbulb.gif";
    $root.Figure: $gfx, $img, :Alt('Incandescent apparatus');

    # add a marked link annotation
    my $destination = $pdf.destination( :page(2), :fit(FitWindow) );
    my PDF::Annot $annot = $pdf.annotation: :$page, :$destination, :rect[71, 717, 190, 734];

    $root.Link: $gfx, $annot;

    # tagged XObject Form
    my PDF::XObject::Form $form = $page.xobject-form: :BBox[0, 0, 200, 50];
    my $form-elem = $root.Form;
    $form.text: {
        my $font-size = 12;
        .text-position = [10, 38];

        # each block receives the form's graphics as its topic
        $form-elem.Header2: $_, {
            .say: "Tagged XObject header", :font($header-font), :$font-size;
        };

        $form-elem.Paragraph: $_, {
            .say: "Some sample tagged text", :font($body-font), :$font-size;
        };
    }

    # render the form contained in $form-elem
    $form-elem.do: $gfx, :position[150, 70];
}

$pdf.save-as: "/tmp/marked.pdf";


A tagged PDF contains additional markup that describes the logical structure of the document.

PDF tagging may assist PDF readers and other automated tools in reading PDF documents and locating content such as text and images.

This module provides a DOM-like interface for creating and traversing PDF structure and content via tags. It also provides an XPath-like search capability. It is designed for use in conjunction with PDF::Class or PDF::API6.

Standard Tags

Elements may be constructed using their Tag name or Mnemonic, as listed below. For example:

$root.P: $gfx, { .say('Marked paragraph text') };

Can also be written as:

$root.Paragraph: $gfx, { .say('Marked paragraph text') };

Or as:

$root.add-kid(:name<P>).mark: $gfx, { .say('Marked paragraph text') };

The documentation in this section is adapted from pdfkit.

"Grouping" elements:

Tag | Mnemonic | Description
Document | | whole document; must be used if there are multiple parts or articles
Part | | part of a document
Sect | Section | may nest
Div | Division | generic division
BlockQuote | | block quotation
Caption | | describing a figure or table
TOC | TableOfContents | may be nested, and may be used for lists of figures, tables, etc.
TOCI | TableOfContentsItem | table of contents (leaf) item
Index | | index (text with accompanying Reference content)
NonStruct | NonStructural | non-structural grouping element (element itself not intended to be exported to other formats like HTML, but 'transparent' to its content which is processed normally)
Private | | content only meaningful to the creator (element and its content not intended to be exported to other formats like HTML)

"Block" elements:

Tag | Mnemonic | Description
H | Heading | heading (first element in a section, etc.)
H1 - H6 | Heading1 - Heading6 | heading of a particular level, intended for use only if nesting sections is not possible for some reason
L | List | should include optional Caption, and list items
LI | ListItem | should contain Lbl and/or LBody
Lbl | Label | bullet, number, or "dictionary headword"
LBody | ListBody | (item text, or "dictionary definition"); may have nested lists or other blocks
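
As a rough sketch of how the list tags nest, the following builds a one-item list using only calls that appear elsewhere in this README (`add-kid`, `mark` and `say`). The item text, coordinates and output path are illustrative assumptions, and the resulting structure has not been checked against a PDF validator.

```raku
use PDF::API6;
use PDF::Tags;
use PDF::Tags::Elem;

my PDF::API6 $pdf .= new;
my PDF::Tags $tags .= create: :$pdf;
my PDF::Tags::Elem $root = $tags.Document;

my $page = $pdf.add-page;
my $font = $page.core-font: :family<Helvetica>;

$page.graphics: -> $gfx {
    # L (List) holds LI (ListItem) children; both are structure-only nodes
    my $list = $root.add-kid: :name<L>;
    my $item = $list.add-kid: :name<LI>;

    # Lbl (Label) carries the bullet, LBody (ListBody) the item text
    $item.add-kid(:name<Lbl>).mark: $gfx, {
        .say('*', :position[50, 100], :$font, :font-size(12));
    };
    $item.add-kid(:name<LBody>).mark: $gfx, {
        .say('First list item', :position[65, 100], :$font, :font-size(12));
    };
}

# hypothetical output path
$pdf.save-as: "/tmp/tagged-list.pdf";
```

Further items would be added as extra LI children of the same L element, each with its own Lbl and LBody.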

"Table" elements:

Tag | Mnemonic | Description
Table | | table; should either contain TR, or THead, TBody and/or TFoot
TH | TableHeader | table heading cell
TD | TableData | table data cell
THead | TableHead | table header row group
TBody | TableBody | table body row group; may have more than one per table
TFoot | TableFoot | table footer row group

"Inline" elements:

Tag | Mnemonic | Description
Span | | generic inline content
Quote | | inline quotation
Note | | e.g. footnote; may have a Lbl (see "block" elements)
Reference | | content in a document that refers to other content (e.g. page number in an index)
BibEntry | BibliographyEntry | may have a Lbl (see "block" elements)
Link | | hyperlink; should contain a link annotation
Annot | Annotation | annotation (other than a link)
Ruby | | Chinese/Japanese pronunciation/explanation
RB | RubyBaseText | Ruby base text
RT | RubyText | Ruby annotation text
Warichu | | Japanese/Chinese longer description

"Illustration" elements (should have Alt and/or ActualText set):

Tag | Mnemonic | Description
Form | | form widget

Non-structure tags:

Tag | Mnemonic | Description
Artifact | | used to mark all content not part of the logical structure
ReversedChars | | every string of text has characters in reverse order for technical reasons (due to how fonts work for right-to-left languages); strings may have spaces at the beginning or end to separate words, but may not have spaces in the middle

Classes in this Distribution

See Also

Further Work

The PDF accessibility standard ISO 14289-1 cannot be distributed and needs to be purchased from ISO.