Embedder Pipeline

Overview

The Embedder Pipeline converts text documents into vector embeddings that can be used for:

  • Semantic search
  • Retrieval-augmented generation (RAG)
  • Similarity matching
  • Vector database storage

It typically runs after the Tokenizer Pipeline, which has already cleaned the documents and split them into chunks.

What It Does

  • Takes cleaned, chunked documents as input
  • Generates numerical vector embeddings using a selected model
  • Attaches embeddings to each document
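The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the `Document` class, `toy_embed`, and `embed_documents` are hypothetical names, and `toy_embed` is a hashing-based stand-in for a real embedding model such as sentence-transformers/all-MiniLM-L6-v2. It produces vectors of the right shape but carries no semantic meaning.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical document record: chunk text, metadata, optional embedding."""
    content: str
    meta: dict = field(default_factory=dict)
    embedding: Optional[list[float]] = None


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a real model (assumption): hash each token into one of
    `dim` buckets, count occurrences, then L2-normalize the vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def embed_documents(docs: list[Document], dim: int = 8) -> list[Document]:
    """Generate an embedding for each cleaned, chunked document and attach it."""
    for doc in docs:
        doc.embedding = toy_embed(doc.content, dim)
    return docs


chunks = [Document("Vectors enable semantic search."), Document("RAG retrieves relevant chunks.")]
embedded = embed_documents(chunks)
print(len(embedded), len(embedded[0].embedding))  # 2 8
```

Swapping `toy_embed` for a real model changes only the vector values and dimension; the attach-to-each-document flow stays the same.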

The resulting documents are ready to be:

  • Stored in a vector database
  • Retrieved during question answering
  • Used in RAG pipelines

Note: The Embedder Pipeline does not clean or split text. For best results, connect it after the Tokenizer Pipeline.

Using the Embedder Pipeline

Add to DocProcessorAgent

  • Go to Pipelines
  • Select Embedder Pipeline
  • Add it to DocProcessorAgent
  • Connect it after the Tokenizer Pipeline
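The stage order matters because the embedder embeds whatever text it receives: without a tokenizer stage in front, a long document becomes a single vector instead of one vector per chunk. The sketch below models stages as plain Python functions over lists of dicts. `tokenizer_stage`, `embedder_stage`, and `run` are hypothetical stand-ins chosen for illustration, not DocProcessorAgent's API, and the 5-word chunk size and 2-component "embedding" are arbitrary.

```python
from typing import Callable

# A hypothetical stage takes a list of document dicts and returns a new list.
Stage = Callable[[list], list]


def tokenizer_stage(docs: list) -> list:
    """Stand-in for the Tokenizer Pipeline: normalize whitespace and split
    each document into chunks of at most 5 words."""
    size = 5
    out = []
    for doc in docs:
        words = doc["content"].split()
        for i in range(0, len(words), size):
            out.append({"content": " ".join(words[i:i + size])})
    return out


def embedder_stage(docs: list) -> list:
    """Stand-in for the Embedder Pipeline: attach a toy vector
    (word count, character count) where a real model would be used."""
    for doc in docs:
        text = doc["content"]
        doc["embedding"] = [float(len(text.split())), float(len(text))]
    return docs


def run(stages: list[Stage], docs: list) -> list:
    """Apply stages in order (hypothetical; mirrors connecting them in sequence)."""
    for stage in stages:
        docs = stage(docs)
    return docs


raw_text = "The Embedder Pipeline   does not clean or split text on its own."
result = run([tokenizer_stage, embedder_stage], [{"content": raw_text}])
unsplit = run([embedder_stage], [{"content": raw_text}])
print(len(result), len(unsplit))  # 3 1
```

With the tokenizer first, the 12-word input yields three embedded chunks; embedding it directly yields one vector for the whole, unnormalized text.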

Choose an Embedding Model

You can select the embedding model used to generate vectors.

The default model is sentence-transformers/all-MiniLM-L6-v2.

This model offers:

  • Fast inference, since it is a compact six-layer model
  • Good semantic quality for general-purpose text
  • Compact 384-dimensional vectors
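all-MiniLM-L6-v2 produces 384-dimensional vectors, which is what keeps storage small. The estimate below assumes each component is stored as a 32-bit float (common, but it depends on your vector store) and counts raw vector bytes only, excluding index structures and document metadata. `embedding_storage_bytes` is a hypothetical helper for illustration.

```python
def embedding_storage_bytes(num_vectors: int, dim: int = 384, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` embeddings of `dim` components each,
    assuming float32 (4 bytes per value). Excludes index and metadata overhead."""
    return num_vectors * dim * bytes_per_value


per_vector = embedding_storage_bytes(1)            # 384 * 4 = 1,536 bytes
one_million = embedding_storage_bytes(1_000_000)   # 1,536,000,000 bytes
print(per_vector, round(one_million / 2**30, 2))   # 1536 1.43
```

Storage scales linearly with dimension, so a 768-dimensional model would need twice as much raw space for the same number of chunks.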

Input Requirements

The pipeline accepts either:

  • A list of parsed documents
  • A single text string, which is automatically converted into a document

Each document should contain meaningful, non-empty text to embed.
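The input rules above can be expressed as a small normalization step. This is a sketch under assumptions: `Document` and `to_documents` are hypothetical names, and rejecting empty or whitespace-only text is a choice made here to enforce the "meaningful text" guideline, not documented pipeline behavior.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Document:
    """Hypothetical document type used for illustration."""
    content: str


def to_documents(data: Union[str, list]) -> list:
    """Accept a single text string or a list of documents and always return a
    list of documents. Empty text is rejected (an assumption of this sketch)."""
    if isinstance(data, str):
        docs = [Document(content=data)]  # wrap a single text input
    elif isinstance(data, list):
        docs = list(data)  # copy so the caller's list is not modified
    else:
        raise TypeError(f"expected str or list of Document, got {type(data).__name__}")
    for doc in docs:
        if not isinstance(doc, Document) or not doc.content.strip():
            raise ValueError("each document needs non-empty text content")
    return docs


print(to_documents("What is RAG?"))  # [Document(content='What is RAG?')]
```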

Output

After execution:

  • Each document includes a vector embedding
  • The number of output documents matches the number of input documents
  • Output can be directly passed to:
    • Writer Pipeline
    • Vector store indexing
    • Retrieval pipelines

If no embeddings are generated, the pipeline raises an error.
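Before handing results to a Writer Pipeline or vector store, a downstream step could check the same output contract. The sketch below represents documents as dicts; `check_output` and `EmbeddingError` are hypothetical names, and the check for individually missing embeddings is stricter than the documented behavior, which only states that an error is raised when no embeddings are generated.

```python
class EmbeddingError(RuntimeError):
    """Hypothetical error type for this sketch."""


def check_output(inputs: list, outputs: list) -> list:
    """Verify the output contract: at least one embedding exists, the document
    count matches the input, and every document carries an embedding."""
    embeddings = [doc.get("embedding") for doc in outputs]
    if not any(embeddings):
        raise EmbeddingError("no embeddings were generated")
    if len(outputs) != len(inputs):
        raise EmbeddingError(f"expected {len(inputs)} documents, got {len(outputs)}")
    missing = [i for i, emb in enumerate(embeddings) if not emb]
    if missing:
        raise EmbeddingError(f"documents without embeddings at positions {missing}")
    return outputs
```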

Common Use Cases

  • Preparing documents for vector databases
  • Powering semantic and hybrid search
  • Enabling RAG-based assistants
  • Similarity comparison between documents
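Semantic search and similarity comparison both reduce to measuring how close two embeddings are, most commonly with cosine similarity. The sketch below ranks documents against a query embedding by brute force. The 3-component vectors are made-up toy values (real all-MiniLM-L6-v2 vectors have 384 components), and `cosine_similarity` and `top_k` are hypothetical helpers; vector databases typically use approximate nearest-neighbor indexes rather than scoring every document.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[dict], k: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query and return the k most similar."""
    scored = [(d["content"], cosine_similarity(query, d["embedding"])) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


docs = [
    {"content": "doc A", "embedding": [1.0, 0.0, 0.0]},
    {"content": "doc B", "embedding": [0.7, 0.7, 0.0]},
    {"content": "doc C", "embedding": [0.0, 0.0, 1.0]},
]
print([name for name, _ in top_k([1.0, 0.1, 0.0], docs)])  # ['doc A', 'doc B']
```

In a RAG workflow, the top-ranked chunks would be passed to the language model as context for answering the query.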

Summary

The Embedder Pipeline transforms text into vector representations.
It is a core building block for semantic search, retrieval, and RAG workflows.