Embedder Pipeline
Overview
The Embedder Pipeline converts text documents into vector embeddings that can be used for:
- Semantic search
- Retrieval-augmented generation (RAG)
- Similarity matching
- Vector database storage
It is typically used after the Tokenizer Pipeline, by which point documents have already been cleaned and split into chunks.
What It Does
- Takes cleaned, chunked documents as input
- Generates numerical vector embeddings using a selected model
- Attaches embeddings to each document
The resulting documents are ready to be:
- Stored in a vector database
- Retrieved during question answering
- Used in RAG pipelines
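The steps listed under What It Does can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual implementation: `Document`, `embed_documents`, and `toy_embed` are hypothetical names, and `toy_embed` is a hash-based stand-in whose vectors carry no semantic meaning.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a cleaned, chunked document."""
    text: str
    embedding: Optional[List[float]] = None


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Deterministic placeholder embedder (NOT semantic): hashes the text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def embed_documents(
    docs: List[Document],
    embed_fn: Callable[[str], List[float]] = toy_embed,
) -> List[Document]:
    """Attach one embedding to each document; the document count is unchanged."""
    for doc in docs:
        doc.embedding = embed_fn(doc.text)
    return docs


chunks = [Document("Cats are small mammals."), Document("Vectors encode meaning.")]
embedded = embed_documents(chunks)
```

In a real setup, `embed_fn` would be replaced by a call into the selected embedding model.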
Note: The Embedder Pipeline does not clean or split text.
For best results, connect it after the Tokenizer Pipeline.
Using the Embedder Pipeline
Add to DocProcessorAgent
- Go to Pipelines
- Select Embedder Pipeline
- Add it to DocProcessorAgent
- Connect it after the Tokenizer Pipeline
Choose an Embedding Model
You can select the embedding model used to generate vectors.
By default:
sentence-transformers/all-MiniLM-L6-v2
This model offers:
- Fast performance
- Good semantic understanding
- Compact vectors (384 dimensions)
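To try the default model outside the pipeline, the open-source sentence-transformers library can load it by name. The sketch below assumes that library is installed (`pip install sentence-transformers`) and that the model weights can be downloaded on first use. `embed_texts` and its `model` parameter are hypothetical helpers for illustration, not part of the Embedder Pipeline.

```python
from typing import List, Sequence

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def embed_texts(
    texts: Sequence[str],
    model=None,
    model_name: str = DEFAULT_MODEL,
) -> List[List[float]]:
    """Return one vector (as a list of floats) per input text.

    `model` may be any object with an `encode(list_of_str)` method; when it
    is omitted, a SentenceTransformer is loaded by name.
    """
    if model is None:
        # Imported lazily so this module loads even without the library.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(model_name)
    vectors = model.encode(list(texts))
    return [[float(x) for x in vec] for vec in vectors]
```

With the default model, each returned vector has 384 dimensions. Passing an already loaded model through `model` avoids reloading the weights on every call.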
Input Requirements
The pipeline accepts either:
- A list of parsed documents, or
- A single text input, which is automatically converted to a document
Each document should contain meaningful text content for embedding.
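The two accepted input shapes can be normalized into one, roughly as follows. This is a sketch under assumptions: documents are modeled as dictionaries with a `text` key, `normalize_input` is a hypothetical name, and rejecting blank text with a `ValueError` is an illustrative choice, since how the pipeline itself treats empty documents is not stated here.

```python
from typing import Dict, List, Union

Doc = Dict[str, str]  # assumed document shape: {"text": "..."}


def normalize_input(data: Union[str, List[Doc]]) -> List[Doc]:
    """Wrap a single string into a one-document list; pass lists through."""
    if isinstance(data, str):
        data = [{"text": data}]
    for index, doc in enumerate(data):
        # Assumption: blank text is treated as an error rather than skipped,
        # so the number of documents never changes silently.
        if not doc.get("text", "").strip():
            raise ValueError(f"document {index} has no text to embed")
    return data
```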
Output
After execution:
- Each document includes a vector embedding
- The number of output documents matches the input
- Output can be passed directly to:
  - Writer Pipeline
  - Vector store indexing
  - Retrieval pipelines
If no embeddings are generated, the pipeline raises an error.
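A downstream consumer can check these output guarantees before indexing. The sketch below assumes documents are dictionaries whose vector is stored as a list under an `embedding` key. `validate_output` is a hypothetical helper, and `ValueError` is an assumed error type, since the exception the pipeline actually raises is not specified here.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]  # assumed shape: {"text": "...", "embedding": [...]}


def validate_output(inputs: List[Doc], outputs: List[Doc]) -> List[Doc]:
    """Check that every output document has a non-empty embedding and that
    the output count matches the input count."""
    missing = [i for i, doc in enumerate(outputs) if not doc.get("embedding")]
    if not outputs or missing:
        raise ValueError(f"no embedding generated for documents: {missing}")
    if len(outputs) != len(inputs):
        raise ValueError(
            f"expected {len(inputs)} documents, got {len(outputs)}"
        )
    return outputs
```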
Common Use Cases
- Preparing documents for vector databases
- Powering semantic and hybrid search
- Enabling RAG-based assistants
- Similarity comparison between documents
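Similarity comparison between two embedded documents is commonly done with cosine similarity, the dot product of the two vectors divided by the product of their lengths. The function below is a dependency-free sketch. The source does not say which metric a given vector store or retrieval pipeline uses, so treat cosine similarity here as one common choice.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0, orthogonal vectors score 0.0, and higher scores indicate more similar documents.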
Summary
The Embedder Pipeline transforms text into vector representations.
It is a core building block for semantic search, retrieval, and RAG workflows.