Email Processor Pipeline

Overview

The Email Processor Pipeline processes parsed email documents and stores them in a vector-enabled document store for search and retrieval.

It takes the output of the Email Parser Pipeline, cleans, tokenizes, splits, and embeds it, and then indexes both email content and attachments.

This pipeline makes emails searchable, retrievable, and usable in RAG workflows.

What It Does

Accepts parsed email documents as input
Cleans content (whitespace, empty lines, ASCII filtering)
Splits documents into chunks by word, sentence, or passage
Generates embeddings using a configurable model
Indexes:
- Email content into a content index
- Attachments into a separate attachments index
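The cleaning and splitting steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the function names `clean_text` and `split_by_words`, the `chunk_size` and `overlap` parameters, and the reading of "ASCII filtering" as "drop non-ASCII characters" are all assumptions made for the example.

```python
import re


def clean_text(text: str) -> str:
    """Drop non-ASCII characters, collapse whitespace runs, remove empty lines.

    Assumption: "ASCII filtering" means discarding non-ASCII characters.
    """
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in ascii_only.splitlines())
    return "\n".join(line for line in lines if line)


def split_by_words(text: str, chunk_size: int = 5, overlap: int = 1) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words (hypothetical parameters).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Splitting by sentence or passage would follow the same pattern with a different unit in place of words.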

Using the Email Processor Pipeline

Add to Agent

Go to Pipelines
Select Email Processor Pipeline
Drag it into DocProcessorAgent
Place it after the Email Parser Pipeline

Configure Dataset and Indices

Configure the agent dataset with the following:

Content index for email body and metadata
Attachments index for email attachments
Both indices are selected in the dataset configuration, and the pipeline writes processed data to them.

Figure: Email Processor dataset and indices configuration
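As a rough mental model of the two-index setup, the sketch below represents the dataset configuration as a small Python object. The class name `DatasetConfig`, its field names, and the index names used in the usage note are hypothetical. The product's real configuration is set in the UI and may use different names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical stand-in for the dataset configuration described above."""

    content_index: str      # index for email body and metadata
    attachments_index: str  # index for email attachments

    def validate(self) -> None:
        """Check that both indices are set and that they are separate."""
        if not self.content_index or not self.attachments_index:
            raise ValueError("both a content index and an attachments index are required")
        if self.content_index == self.attachments_index:
            raise ValueError("attachments must be written to a separate index")
```

For example, `DatasetConfig("emails-content", "emails-attachments").validate()` passes, while reusing one name for both indices raises a `ValueError`.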

Input & Output

Input: Parsed email documents from the Email Parser Pipeline
Output: Embedded documents and attachments written to their respective indices. Duplicates and runtime issues are reported as errors.
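One way duplicate reporting could work is sketched below, under stated assumptions: documents are identified by a hash of their text, the index is a plain dictionary, and errors are collected in a report instead of being raised. The names `WriteReport` and `write_documents` are hypothetical, and the pipeline's actual duplicate policy may differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class WriteReport:
    """Outcome of one write pass: IDs written and errors encountered."""

    written: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def write_documents(index: dict[str, str], documents: list[str]) -> WriteReport:
    """Write documents to an in-memory index, reporting duplicates as errors."""
    report = WriteReport()
    for text in documents:
        # Assumption: a content hash serves as the document ID.
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        if doc_id in index:
            report.errors.append(f"duplicate document {doc_id}")
            continue
        index[doc_id] = text
        report.written.append(doc_id)
    return report
```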

Pipeline Flow

Email Parser Pipeline extracts content and attachments
Email Processor Pipeline:
- Cleans content
- Splits into chunks
- Generates embeddings
- Writes to configured indices
Indexed data becomes available for search and retrieval
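The flow above can be sketched end to end as follows. Everything here is illustrative: the parsed-email shape (a dict with `body` and `attachments` keys), the list-based indices, and `toy_embed`, which is a deterministic placeholder and not a real embedding model.

```python
import hashlib


def toy_embed(text: str, dim: int = 4) -> list[float]:
    """Placeholder embedding: deterministic floats derived from a hash.

    A configured embedding model would be called here instead.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


def process_email(parsed: dict, content_index: list, attachments_index: list) -> None:
    """Clean, split, embed, and write one parsed email to the two indices."""
    routes = (
        (content_index, [parsed["body"]]),
        (attachments_index, parsed.get("attachments", [])),
    )
    for target, texts in routes:
        for text in texts:
            # Clean and split: one chunk per non-empty, blank-line-separated passage.
            passages = [p.strip() for p in text.split("\n\n") if p.strip()]
            for passage in passages:
                target.append({"text": passage, "embedding": toy_embed(passage)})
```

Keeping the two indices as separate write targets mirrors the document's split between email content and attachments, so each can be queried on its own.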

Common Use Cases

Indexing enterprise inboxes for semantic search
Preparing email data for RAG-based agents
Storing attachments separately for structured retrieval

Summary

The Email Processor Pipeline cleans, splits, embeds, and indexes parsed email data for vector search.
Once it has run, email content and attachments are searchable and ready for downstream workflows such as RAG.
