Embedder Pipeline

Overview

The Embedder Pipeline converts text documents into vector embeddings that can be used for:

- Semantic search
- Retrieval-augmented generation (RAG)
- Similarity matching
- Vector database storage

It is typically used after the Tokenizer Pipeline, which cleans documents and splits them into chunks.

What It Does

- Takes cleaned, chunked documents as input
- Generates a numerical vector embedding for each document using the selected model
- Attaches each embedding to its document
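The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code. `Document`, `toy_embed`, and `embed_documents` are hypothetical names. `toy_embed` is a hashing bag-of-words stand-in (16 buckets, L2-normalized) for a real embedding model such as sentence-transformers/all-MiniLM-L6-v2.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    # Hypothetical stand-in for a cleaned, chunked document.
    content: str
    embedding: Optional[list[float]] = None


def toy_embed(text: str, dim: int = 16) -> list[float]:
    # Placeholder for a real model: hash each lowercase token into one of
    # `dim` buckets, count occurrences, then L2-normalize the vector.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_documents(docs: list[Document]) -> list[Document]:
    # Generate one vector per chunk and attach it to that same document.
    for doc in docs:
        doc.embedding = toy_embed(doc.content)
    return docs


chunks = [Document("Vector databases store embeddings."),
          Document("RAG retrieves relevant chunks.")]
for doc in embed_documents(chunks):
    print(len(doc.embedding), doc.content)
```

Swapping `toy_embed` for a real model changes the vector values and size. The shape of the flow stays the same: one vector per input chunk, stored on that chunk.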

The resulting documents are ready to be:

- Stored in a vector database
- Retrieved during question answering
- Used in RAG pipelines

Note

The Embedder Pipeline does not clean or split text.
For best results, connect it after the Tokenizer Pipeline.

Using the Embedder Pipeline

Add to DocProcessorAgent

- Go to Pipelines
- Select the Embedder Pipeline
- Add it to DocProcessorAgent
- Connect it after the Tokenizer Pipeline
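Connecting the Embedder Pipeline after the Tokenizer Pipeline means it receives chunks that have already been cleaned and split. The sketch below models each pipeline as a plain function and runs them in sequence. `tokenizer_stage`, `embedder_stage`, and `run_pipelines` are hypothetical names, not the DocProcessorAgent API. The fixed-size word chunking and the two-number placeholder "vector" are assumptions for illustration only.

```python
from typing import Callable


def tokenizer_stage(text: str, max_words: int = 5) -> list[str]:
    # Stand-in for the Tokenizer Pipeline: normalize whitespace and split
    # the text into chunks of at most `max_words` words.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embedder_stage(chunks: list[str]) -> list[tuple[str, list[float]]]:
    # Stand-in for the Embedder Pipeline: it takes chunks as given and does
    # no cleaning or splitting. The (word count, character count) pair is a
    # placeholder, not a semantic embedding.
    return [(chunk, [float(len(chunk.split())), float(len(chunk))]) for chunk in chunks]


def run_pipelines(data, stages: list[Callable]):
    # Each stage's output becomes the next stage's input, so order matters.
    for stage in stages:
        data = stage(data)
    return data


text = "  Embeddings   power semantic search and retrieval-augmented generation.  "
for chunk, vector in run_pipelines(text, [tokenizer_stage, embedder_stage]):
    print(vector, chunk)
```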

Choose an Embedding Model

You can select the embedding model used to generate vectors.

By default, the pipeline uses sentence-transformers/all-MiniLM-L6-v2.

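This model offers:

- Fast inference, as a compact six-layer MiniLM model
- Good semantic understanding for general-purpose English text
- Compact 384-dimensional vectors, which keep storage and search costs low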
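A vector index expects every vector to have the same size, and vectors from different models are not comparable. For both reasons, changing the model generally means re-embedding stored documents. The sketch below records the default model in a hypothetical `EmbedderConfig` and checks vector sizes against it. The config class, the `KNOWN_DIMENSIONS` table, and the `check_vector_size` helper are illustrative assumptions, not the pipeline's settings API. With the sentence-transformers library itself, `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(texts)` returns one 384-dimensional vector per input text.

```python
from dataclasses import dataclass

# Known output size of the default model (384 dimensions).
KNOWN_DIMENSIONS = {"sentence-transformers/all-MiniLM-L6-v2": 384}


@dataclass(frozen=True)
class EmbedderConfig:
    # Hypothetical config object; the default mirrors the documented model.
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2"


def check_vector_size(config: EmbedderConfig, vector: list[float]) -> None:
    # Reject vectors whose size differs from the configured model's known
    # dimension; unknown models are not checked.
    expected = KNOWN_DIMENSIONS.get(config.model_name)
    if expected is not None and len(vector) != expected:
        raise ValueError(
            f"{config.model_name} yields {expected}-dim vectors, got {len(vector)}"
        )


config = EmbedderConfig()
check_vector_size(config, [0.0] * 384)  # passes silently
```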

Input Requirements

The pipeline accepts either:

- A list of parsed documents
- A single text string, which is automatically converted to a document

Each document should contain non-empty, meaningful text, because that text is what gets embedded.
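A minimal sketch of how the two accepted input forms might be normalized before embedding. `Document` and `normalize_input` are hypothetical names. Rejecting blank text with a `ValueError` is an assumption of this sketch; the documentation only says each document should contain meaningful text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Document:
    # Hypothetical document type; only the text content matters here.
    content: str


def normalize_input(data: Union[str, list[Document]]) -> list[Document]:
    # A single string is wrapped into a one-element document list.
    if isinstance(data, str):
        docs = [Document(content=data)]
    else:
        docs = list(data)
    # Assumed validation: refuse documents with nothing to embed.
    for i, doc in enumerate(docs):
        if not doc.content.strip():
            raise ValueError(f"document {i} has no text to embed")
    return docs


print(normalize_input("What is RAG?"))  # [Document(content='What is RAG?')]
```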

Output

After execution:

- Each document includes its vector embedding
- The number of output documents matches the number of input documents
- The output can be passed directly to:
  - Writer Pipeline
  - Vector store indexing
  - Retrieval pipelines

If no embeddings are generated, the pipeline raises an error.
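The output guarantees above can be expressed as a post-execution check. In this sketch, `Document`, `EmbeddingError`, and `validate_output` are hypothetical. The first two checks mirror the documented behavior: an error when no embeddings exist, and one output document per input document. The per-document check is an extra assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    content: str
    embedding: Optional[list[float]] = None


class EmbeddingError(RuntimeError):
    """Hypothetical error type for failed embedding runs."""


def validate_output(inputs: list[Document], outputs: list[Document]) -> list[Document]:
    # No embeddings at all: fail, matching the documented behavior.
    if not any(doc.embedding for doc in outputs):
        raise EmbeddingError("no embeddings were generated")
    # One output document per input document.
    if len(outputs) != len(inputs):
        raise EmbeddingError(f"expected {len(inputs)} documents, got {len(outputs)}")
    # Assumed stricter check: every document carries an embedding.
    missing = [i for i, doc in enumerate(outputs) if not doc.embedding]
    if missing:
        raise EmbeddingError(f"documents without embeddings: {missing}")
    return outputs


docs = [Document("chunk one", [0.1, 0.2]), Document("chunk two", [0.3, 0.4])]
print(len(validate_output(docs, docs)))  # 2
```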

Common Use Cases

- Preparing documents for vector databases
- Powering semantic and hybrid search
- Enabling RAG-based assistants
- Similarity comparison between documents
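Similarity comparison and semantic search both come down to comparing vectors. The sketch below ranks stored vectors against a query vector using cosine similarity, a common choice. Cosine similarity is an assumption here, since the document does not say which metric a given vector store uses. The three-dimensional vectors and document IDs are made up; real embeddings would come from the selected model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    # Score every stored document and keep the k most similar.
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy vectors for illustration only.
store = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.2, 0.1],
}
# Ranks "refund-policy" first, then "returns-howto".
print(top_k([1.0, 0.0, 0.0], store))
```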

Summary

The Embedder Pipeline transforms text into vector representations.
It is a core building block for semantic search, retrieval, and RAG workflows.
