WWW::YouTube
Raku package for getting metadata and transcripts of YouTube videos.
The Raku implementation closely follows the Wolfram Language function YouTubeTranscript [AAf1].
Installation
From the Zef ecosystem:
zef install WWW::YouTube
From GitHub:
zef install https://github.com/antononcube/Raku-WWW-YouTube.git
Usage
youtube-metadata($id)
- Get the metadata of the YouTube video with identifier $id.
youtube-playlist($id)
- Get the video identifiers of the YouTube playlist with identifier $id.
youtube-transcript($id)
- Get the transcript of the YouTube video with identifier $id.
Details
All three subs, youtube-metadata, youtube-playlist, and youtube-transcript, accept strings that are either identifiers or full URLs.
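The identifier-or-URL handling can be illustrated with a small sketch. The sub video-id below is hypothetical: it is not exported by WWW::YouTube, and the package's own parsing may differ. It assumes that video identifiers are 11 characters drawn from letters, digits, "_", and "-", and it only recognizes the "watch?v=" and "youtu.be/" URL forms.

```raku
# Hypothetical helper (not part of WWW::YouTube): reduce a full YouTube URL
# to its video identifier; anything else is assumed to be an identifier already.
sub video-id(Str:D $spec --> Str) {
    # "...watch?v=<id>" or "...&v=<id>" query parameter
    return ~$0 if $spec ~~ m{ [ '?' | '&' ] 'v=' ([\w | '-'] ** 11) };
    # Short links of the form "https://youtu.be/<id>"
    return ~$0 if $spec ~~ m{ 'youtu.be/' ([\w | '-'] ** 11) };
    # No recognized URL form: return the string unchanged
    return $spec;
}

say video-id('https://www.youtube.com/watch?v=S_3e7liz4KM');  # S_3e7liz4KM
say video-id('https://youtu.be/ewU83vHwN8Y');                 # ewU83vHwN8Y
say video-id('S_3e7liz4KM');                                  # S_3e7liz4KM
```

Returning the input unchanged when no URL form is recognized keeps bare identifiers working without a separate code path.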
youtube-metadata extracts the metadata associated with a YouTube video identifier.
- Returns a record (hashmap) with the keys <channel-title description publish-date title view-count>.
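As a sketch of working with the returned record, the hypothetical sub normalize-metadata below converts two of its string fields into Raku values. The formats it expects ("139 views" for view-count, and an ISO 8601 timestamp with a UTC offset for publish-date) are inferred from the example output in the Examples section, not from a documented contract.

```raku
# Hypothetical post-processing (not part of WWW::YouTube). The input formats are
# inferred from the example output: view-count like "139 views" and publish-date
# as an ISO 8601 timestamp with a UTC offset.
sub normalize-metadata(%meta --> Hash) {
    my %res = %meta;
    # "139 views" -> 139 (commas, if any, are dropped; other strings are kept as-is)
    with %meta<view-count> -> $vc {
        %res<view-count> = $0.Str.subst(',', '', :g).Int if $vc ~~ / ^ (<[\d,]>+) /;
    }
    # "2024-11-28T11:24:44-08:00" -> DateTime (non-timestamp strings are kept as-is)
    with %meta<publish-date> -> $pd {
        %res<publish-date> = DateTime.new($pd) if $pd ~~ / ^ \d ** 4 '-' \d\d '-' \d\d 'T' /;
    }
    return %res;
}

my %meta = title         => 'Graph neat examples in Raku (Set 3)',
           publish-date  => '2024-11-28T11:24:44-08:00',
           view-count    => '139 views',
           channel-title => 'N/A';

my %norm = normalize-metadata(%meta);
say %norm<view-count>;            # 139
say %norm<publish-date>.year;     # 2024
```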
youtube-playlist extracts the video identifiers of the YouTube playlist with a given identifier.
- Currently, only the first 100 videos are returned.
youtube-transcript extracts the captions of the video, if they exist.
- The transcript can be returned as plain text, as an array of hashmaps (a dataset), or as a JSON string.
- The YouTube Data API has usage quotas.
- Not all YouTube videos have automatic or manual captions. If no captions are available, the function returns a message indicating this.
youtube-transcript processes the "captionTracks" field of YouTube's video metadata (as exposed through the YouTube Data API).
The field "captionTracks" is an array of objects, where each object represents a single caption track (e.g., for a specific language or type).
From a caption track in "captionTracks", the "baseUrl" string is extracted; it is the URL from which the caption content is fetched.
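The track-selection step can be sketched as follows. This is not the package's implementation: the sub caption-url is hypothetical, the caption tracks are hand-written stand-ins for already-parsed metadata with placeholder URLs, and the keys languageCode and kind (with "asr" marking automatic captions) are assumptions about YouTube's format.

```raku
# Hypothetical sketch (not the WWW::YouTube implementation): choose a caption track
# and return its "baseUrl". The keys languageCode and kind, and kind = "asr" for
# automatic captions, are assumptions about the shape of YouTube's metadata.
sub caption-url(@caption-tracks, Str:D :$lang = 'en' --> Str) {
    # Keep only the tracks in the requested language
    my @candidates = @caption-tracks.grep({ .<languageCode> eq $lang });
    return Str unless @candidates;
    # Prefer a manually created track over an automatic ("asr") one
    my $track = @candidates.first({ (.<kind> // '') ne 'asr' }) // @candidates.head;
    return $track<baseUrl>;
}

# Hand-written stand-in for the parsed "captionTracks" array (the URLs are placeholders)
my @captionTracks =
    %(languageCode => 'en', kind => 'asr', baseUrl => 'https://example.com/timedtext?lang=en&kind=asr'),
    %(languageCode => 'en', baseUrl => 'https://example.com/timedtext?lang=en'),
    %(languageCode => 'de', baseUrl => 'https://example.com/timedtext?lang=de');

say caption-url(@captionTracks);                  # https://example.com/timedtext?lang=en
say caption-url(@captionTracks, lang => 'de');    # https://example.com/timedtext?lang=de
```

Returning the Str type object when no track matches lets a caller distinguish "no captions in this language" from an actual URL.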
Examples
Get the metadata associated with a YouTube video identifier:
use WWW::YouTube;
use Data::Translators;
youtube-metadata('S_3e7liz4KM')
==> to-html(align => 'left')
description | Computationally neat examples with Raku packages featuring graphs and graph plots. (3rd set.)\n\nHere is the presentation Jupyter notebook: https://github.com/antononcube/RakuForPrediction-blog/blob/main/Presentations/Notebooks/Graph-neat-examples-set-3.ipynb\n\n------------------\n\nPlease, consider buying me a coffee: https://buymeacoffee.com/antonov70 |
publish-date | 2024-11-28T11:24:44-08:00 |
title | Graph neat examples in Raku (Set 3) |
view-count | 139 views |
channel-title | N/A |
Transcripts
my $transcript = youtube-transcript('ewU83vHwN8Y');
say $transcript.chars;
say $transcript.substr(^300);
# 36700
# Hi everyone, welcome to a wolf from
# language design review for version 14.3.
# We are talking about LLM
# graph. So,
# okay. So this is for the purpose of of
# knitting together LLM calls like LLM
# function type calls.
# Exactly.
# To support more complex workflows
# um and and to have asynchronous calls to
# LLMs.
Summarize using a Large Language Model (LLM):
use LLM::Functions;
use LLM::Prompts;
llm-synthesize(llm-prompt('Summarize')($transcript), e => 'Gemini')
# This language design review introduces LLM graphs, which orchestrate calls to LLMs for complex workflows, including asynchronous execution. LLM graphs use nodes containing prompts or code (node functions) that can depend on each other, with inputs and outputs managed through associations. The design includes features like listable templates and conditional execution, and it aims to provide a powerful, yet simple, way to build agentic workflows.
Get the transcript as a dataset:
my @t = youtube-transcript('S_3e7liz4KM', format => 'dataset');
@t.head(10) ==> to-html(field-names => <time duration content>, align => 'left')
time | duration | content |
---|---|---|
0.52 | 4.64 | this presentation is titled graph neat |
2.8 | 5.2 | examples in Raku set |
5.16 | 4.84 | three my name is Anton Antonov today's |
8 | 5 | November 28th |
10 | 6 | 2024 I have prepared two sets of |
13 | 6.68 | examples nested graphs and file system |
16 | 5.72 | graphs the neat examples in general are |
19.68 | 3.96 | defined as concise or straightforward |
21.72 | 3.76 | code that produce compelling visual |
23.64 | 4.399 | textual outputs I'm going to be |
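The dataset format makes other caption formats straightforward to derive. As a sketch of the SRT output listed under TODO (not part of WWW::YouTube), the hypothetical subs below assume that each record has the keys time, duration, and content, with time and duration in seconds, as in the table above.

```raku
# Hypothetical sketch of the SRT format listed under TODO (not part of WWW::YouTube).
# Assumes dataset records with keys time, duration, and content; times in seconds.
sub srt-timestamp($seconds --> Str) {
    my $ms = (+$seconds * 1000).round.Int;    # total milliseconds as an Int
    sprintf '%02d:%02d:%02d,%03d',
        $ms div 3_600_000, ($ms div 60_000) % 60, ($ms div 1000) % 60, $ms % 1000;
}

sub to-srt(@records --> Str) {
    # One numbered cue per record: index, "start --> end", text; cues separated by a blank line
    @records.kv.map(-> $i, %r {
        "{$i + 1}\n"
        ~ srt-timestamp(%r<time>) ~ ' --> ' ~ srt-timestamp(%r<time> + %r<duration>) ~ "\n"
        ~ "%r<content>\n"
    }).join("\n");
}

my @records =
    %(time => 0.52, duration => 4.64, content => 'this presentation is titled graph neat'),
    %(time => 2.8,  duration => 5.2,  content => 'examples in Raku set');

print to-srt(@records);
```

With the package installed, to-srt could presumably be applied directly to the result of youtube-transcript($id, format => 'dataset'), provided the records have that shape.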
Playlists
youtube-playlist('PLke9UbqjOSOiMnn8kNg6pb3TFWDsqjNTN')
# [fwQrQyWC7R0 S_3e7liz4KM E7qhutQcWCY kQo3wpiUu6w JHO2Wk1b-Og 5qXgqqRZHow 0uJl9q7jIf8]
CLI
The package provides Command Line Interface (CLI) scripts. Here are their usage messages:
youtube-metadata --help
# Usage:
# youtube-metadata <id> [--format=<Str>] -- Get YouTube video metadata.
#
# <id> Video identifier
# --format=<Str> Format of the result, one of 'json', 'raku', 'asis'. [default: 'json']
youtube-playlist --help
# Usage:
# youtube-playlist <id> -- Get video identifiers of a YouTube playlist.
#
# <id> Video playlist identifier
youtube-transcript --help
# Usage:
# youtube-transcript <id> [--format=<Str>] -- Get YouTube transcripts.
#
# <id> Video identifier
# --format=<Str> Format of the result, one of 'text', 'dataset', or 'json'. [default: 'text']
TODO
- TODO Implementation
  - DONE Get transcript for a video identifier
  - DONE Video metadata retrieval
  - TODO Video identifiers for a playlist
    - DONE For playlists with ≤ 100 videos
    - TODO Large playlists
  - TODO Different transcript output formats
    - DONE Text
    - DONE Dataset (array of hashmap records)
    - DONE JSON
    - TODO WebVTT
    - TODO SRT
  - Implement versions of the subs using a YouTube API key
- TODO Documentation
  - DONE Basic usage
  - TODO Transcripts retrieval for a playlist
References
[AAf1] Anton Antonov, YouTubeTranscript, (2025), Wolfram Function Repository.