LLM::RetrievalAugmentedGeneration

Raku package for doing LLM Retrieval Augmented Generation (RAG).


Motivation and general procedure

Assume we have a large (or largish) collection of (Markdown) documents, and we want to interact with it as if a certain LLM had been specially trained on that collection.

Here is one way to achieve this (a code sketch follows the list):

  1. The "data wrangling problem" is the conversion of the a collection of documents into Markdown files, and then partitioning those files into text chunks.
    • There are several packages and functions that can do the conversion.
    • It is not trivial to partition texts into reasonable text chunks.
      • Certain text paragraphs might be too big for certain LLMs to make embeddings for.
  2. Each of the text chunks is "vectorized" via LLM embedding.
  3. Then the vectors are put in a vector database or "just" into a "nearest neighbors" finding function object.
  4. When a user query is given:
    • The query's LLM embedding vector is computed.
    • The closest text chunk vectors are found.
  5. The corresponding closest text chunks are given to the LLM to formulate a response to the user's query.
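
Here is a minimal sketch of that procedure (the code sketch referenced above). It uses `openai-embeddings` from "WWW::OpenAI", [AAp1], and `llm-synthesize` from "LLM::Functions", [AAp3]; the nearest neighbors are found with a plain cosine-distance scan. The exact shape of the values returned by `openai-embeddings` is an assumption to be checked against the package documentation.

```raku
use WWW::OpenAI;
use LLM::Functions;

# Step 1 (assumed done): the collection is already split into text chunks
my @chunks =
    'Raku has gradual typing: both dynamic and static typing are supported.',
    'Raku grammars are a built-in mechanism for parsing structured text.';

# Step 2: one embedding vector per chunk
# (assumption: format => 'values' gives one numeric array per input text)
my @vectors = openai-embeddings(@chunks, format => 'values');

# Step 3: a simple "nearest neighbors" finding function object
sub cosine-distance(@u, @v) {
    1e0 - sum(@u Z* @v) / sqrt(sum(@u Z* @u) * sum(@v Z* @v))
}
my &find-nearest = -> $qvec, $k {
    @vectors.pairs.sort({ cosine-distance($qvec, $_.value) }).head($k)».key
};

# Step 4: embed the user query and find the closest chunk vectors
my $query = 'Does Raku have static types?';
my $qvec  = openai-embeddings([$query,], format => 'values').head;
my @top   = find-nearest($qvec, 2);

# Step 5: give the corresponding closest chunks to the LLM
say llm-synthesize([
    'Answer the question using only the passages below.',
    "Question: $query",
    'Passages:',
    @chunks[@top].join("\n")
]);
```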

Workflow

The Retrieval Augmented Generation (RAG) workflow we consider is shown in the component diagram below.

Component diagram

Here is a Mermaid-JS component diagram that shows the components of the Retrieval Augmented Generation (RAG) workflow:

```mermaid
flowchart TD
    subgraph LocalVDB[Local Folder]
        A(Vector Database 1)
        B(Vector Database 2)
        C(Vector Database N)
    end
    ID[Ingest document collection]
    SD[Split Documents]
    EV[Get LLM Embedding Vectors]
    CD[Create Vector Database]
    ID --> SD --> EV --> CD

    EV <-.-> LLMs
    
    CD -.- CArray[[CArray<br>representation]]

    CD -.-> |export| LocalVDB

    subgraph Creation
        ID
        SD
        EV
        CD
    end

    LocalVDB -.- JSON[[JSON<br>representation]]

    LocalVDB -.-> |import|D[Ingest Vector Database]
 
    D -.- CArray
    F -.- |nearest neighbors<br>distance function|CArray
    D --> E
    E[/User Query/] --> F[Retrieval]
    F --> G[Document Selection]
    G -->|Top K documents| H(Model Fine-tuning)
    H --> I[[Generation]]
    I <-.-> LLMs
    I -->J[/Output Answer/]
    G -->|Top K passages| K(Model Fine-tuning)
    K --> I

    subgraph RAG[Retrieval Augmented Generation]
        D 
        E
        F
        G
        H
        I
        J
        K
    end
    
    subgraph LLMs
        direction LR
        OpenAI{{OpenAI}}
        Gemini{{Gemini}}
        MistralAI{{MistralAI}}
        LLaMA{{LLaMA}}
    end
```

In this diagram:

  • The "Creation" subgraph shows how a document collection is ingested, split into chunks, "vectorized" via LLM embeddings, and turned into a vector database.
  • The vector databases are exported to (and imported from) a local folder using a JSON representation.
  • The vector databases keep a native CArray representation of the embedding vectors, over which the nearest-neighbors distance functions are computed.
  • The "Retrieval Augmented Generation" subgraph shows how a user query is used to retrieve and select the top K documents or passages, which are given to the LLM to generate the output answer.
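
The sketch below traces the diagram in code. The function `create-vector-database`, the class `LLM::RetrievalAugmentedGeneration::VectorDatabase`, and the methods `export`, `import`, and `nearest` are assumed names, not verified against the package's actual interface; `llm-synthesize` is from "LLM::Functions", [AAp3].

```raku
use LLM::RetrievalAugmentedGeneration;
use LLM::Functions;

# Creation: ingest the document collection, split it into chunks,
# get the embedding vectors, and create a vector database
# (create-vector-database is an assumed function name)
my @chunks = slurp('notes.md').split(/\n\n+/);
my $vdb = create-vector-database(texts => @chunks, name => 'notes');

# Export: write the database to a local folder (JSON representation)
$vdb.export('notes.json');

# ... later, in another session: import the database
# (class and method names are assumed)
my $vdb2 = LLM::RetrievalAugmentedGeneration::VectorDatabase.new.import('notes.json');

# Retrieval: the top K passages for a user query
my $query = 'What is said about grammars?';
my @passages = $vdb2.nearest($query, 3);

# Generation: the top K passages are given to the LLM
say llm-synthesize([
    'Answer the query using the following passages.',
    "Query: $query",
    @passages.join("\n")
]);
```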


Implementation notes

Fast nearest neighbors

As the diagram indicates, the vector databases keep a native CArray representation of the embedding vectors, and the nearest-neighbors distance computations (cf. "Math::Nearest", [AAp6], and "Math::DistanceFunctions::Native", [AAp7]) are done over that representation, which is considerably faster than operating on plain Raku arrays.
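
A sketch of the idea: convert the embedding vectors into native `CArray[num32]` arrays once, at creation time, so that C-implemented distance functions can be applied to them. That `euclidean-distance` of "Math::DistanceFunctions::Native", [AAp7], accepts `CArray` arguments is an assumption here.

```raku
use NativeCall;
use Math::DistanceFunctions::Native;

# Plain Raku vectors converted into native float arrays
my @vectors = [0.1, 0.2, 0.3], [0.9, 0.8, 0.7];
my @native  = @vectors.map({ CArray[num32].new($_».Num) });

# The distance is then computed in C over the native arrays
# (the routine name and signature are assumed)
say euclidean-distance(@native[0], @native[1]);
```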

Smaller export files, faster imports

The vector databases are exported to (and imported from) a local folder in a JSON representation (see the diagram); keeping that representation compact makes the export files smaller and the imports faster.
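
One way to get smaller files and faster imports, sketched below, is to round the embedding values before serialization; the JSON representation is per the diagram, but that the package does exactly this rounding is an assumption.

```raku
use JSON::Fast;

my %vdb =
    name    => 'notes',
    texts   => ['chunk 1', 'chunk 2'],
    vectors => [[0.123456789, -0.987654321], [0.5, 0.25]];

# Rounding to ~4 decimals shrinks the JSON file considerably,
# with little effect on the nearest-neighbors distances (assumed optimization)
%vdb<vectors> = %vdb<vectors>.map({ $_.map(*.round(1e-4)).Array }).Array;

spurt 'notes.json', to-json(%vdb);

# Importing is then just a JSON read
my %vdb2 = from-json(slurp('notes.json'));
```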


References

Packages

[AAp1] Anton Antonov, WWW::OpenAI Raku package, (2023), GitHub/antononcube.

[AAp2] Anton Antonov, WWW::PaLM Raku package, (2023), GitHub/antononcube.

[AAp3] Anton Antonov, LLM::Functions Raku package, (2023-2024), GitHub/antononcube.

[AAp4] Anton Antonov, LLM::Prompts Raku package, (2023-2024), GitHub/antononcube.

[AAp5] Anton Antonov, ML::FindTextualAnswer Raku package, (2023-2024), GitHub/antononcube.

[AAp6] Anton Antonov, Math::Nearest Raku package, (2024), GitHub/antononcube.

[AAp7] Anton Antonov, Math::DistanceFunctions::Native Raku package, (2024), GitHub/antononcube.

[AAp8] Anton Antonov, ML::StreamsBlendingRecommender Raku package, (2021-2023), GitHub/antononcube.