
Exporting Messages

EmailEngine provides a bulk message export feature that allows you to export large volumes of email messages from any account. Exports are processed asynchronously and output to compressed NDJSON files for efficient storage and downstream processing.

Overview

The export feature:

  • Creates gzip-compressed NDJSON files containing message data
  • Processes exports asynchronously via a job queue
  • Supports date range filtering and folder selection
  • Optionally includes message text content and attachments
  • Automatically encrypts export files when EENGINE_SECRET is configured
  • Provides progress tracking and status monitoring

Common use cases:

  • Email backup and archival
  • Migration to other systems
  • Compliance and legal discovery
  • Data analysis and machine learning training
  • Bulk message processing pipelines

Creating an Export

Create a new export job using the Create Export API endpoint:

curl -X POST "https://your-emailengine.com/v1/account/{account}/export" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "2024-01-01T00:00:00Z",
    "endDate": "2024-12-31T23:59:59Z",
    "folders": ["INBOX", "\\Sent"],
    "textType": "*",
    "maxBytes": 5242880,
    "includeAttachments": false
  }'

Request Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startDate | ISO 8601 | Required | Export messages from this date |
| endDate | ISO 8601 | Required | Export messages until this date |
| folders | array | All Mail (Gmail/Outlook API); all folders except Junk and Trash (other accounts) | Folder paths or special-use flags to export |
| textType | string | * | Text content: plain, html, or * (both) |
| maxBytes | number | 5242880 | Maximum bytes for text content (0 = unlimited) |
| includeAttachments | boolean | false | Include attachment content as base64 |

Response

{
  "exportId": "exp_abc123def456abc123def456",
  "status": "queued",
  "created": "2024-01-15T10:30:00.000Z"
}
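
As a rough sketch, the same request can be issued from Node.js. The helper names `buildExportRequest` and `createExport`, the base URL, and the token are placeholders, not part of EmailEngine; only the endpoint path and body fields come from the curl example above. The sketch assumes Node.js 18 or later for the global `fetch`.

```javascript
// Hypothetical helper: builds the URL and fetch options for the Create Export call.
function buildExportRequest(baseUrl, account, token, options) {
  return {
    url: `${baseUrl}/v1/account/${encodeURIComponent(account)}/export`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(options)
    }
  };
}

// Sends the request and returns the parsed JSON response
// (exportId, status, created in the example response above).
async function createExport(baseUrl, account, token, options) {
  const { url, init } = buildExportRequest(baseUrl, account, token, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Create export failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping the request construction separate from the network call makes it easy to log or unit-test the payload before anything is sent.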

Monitoring Export Progress

Check export status using the Get Export Status API endpoint:

curl "https://your-emailengine.com/v1/account/{account}/export/{exportId}" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Export States

Exports progress through these states:

| Status | Phase | Description |
| --- | --- | --- |
| queued | - | Export is waiting in the queue (no phase field is set) |
| processing | indexing | Scanning folders and queuing messages |
| processing | exporting | Fetching and writing messages to file |
| completed | complete | Export finished successfully |
| failed | - | Export encountered an error |
| cancelled | - | Export was cancelled before completion |
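
The states above suggest a simple polling loop that stops once the export reaches `completed`, `failed`, or `cancelled`. The following is a minimal sketch, not an official client: `waitForExport` is a hypothetical name, `fetchStatus` is a function you supply that calls the status endpoint and returns the parsed response, and the 5-second default interval is an arbitrary choice.

```javascript
// States after which the export no longer changes (taken from the table above).
const TERMINAL_STATES = new Set(['completed', 'failed', 'cancelled']);

// Polls until the export reaches a terminal state and returns the last response.
// fetchStatus: caller-supplied async function returning the status response.
// intervalMs: delay between polls; the default is an assumption, not a recommendation.
async function waitForExport(fetchStatus, intervalMs = 5000) {
  for (;;) {
    const status = await fetchStatus();
    if (TERMINAL_STATES.has(status.status)) {
      return status;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In production you would probably also add an overall timeout so a stuck job does not keep the loop running indefinitely.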

Progress Fields

The response includes detailed progress information:

{
  "exportId": "exp_abc123def456abc123def456",
  "status": "processing",
  "phase": "exporting",
  "progress": {
    "foldersScanned": 2,
    "foldersTotal": 3,
    "messagesQueued": 1500,
    "messagesExported": 750,
    "messagesSkipped": 5,
    "bytesWritten": 52428800
  },
  "created": "2024-01-15T10:30:00.000Z",
  "expiresAt": "2024-01-16T10:30:00.000Z"
}

| Field | Description |
| --- | --- |
| foldersScanned | Number of folders indexed so far |
| foldersTotal | Total folders to index |
| messagesQueued | Messages found and queued for export |
| messagesExported | Messages successfully written to file |
| messagesSkipped | Messages skipped (deleted or inaccessible) |
| bytesWritten | Total bytes written to export file |

The response also includes a top-level truncated field (boolean) that indicates whether the export was cut short due to message count or size limits. When true, the export file does not contain all matching messages.
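
The progress fields can be condensed into a single percentage for dashboards or logs. The sketch below is one possible approach: `summarizeProgress` is a hypothetical helper, and using `messagesQueued` as the denominator is an assumption that only gives a stable figure once indexing has finished, because the queue can still grow while folders are being scanned.

```javascript
// Summarizes a status response into a percentage and a truncation flag.
// Assumption: exported + skipped messages, divided by queued messages,
// approximates overall progress.
function summarizeProgress(status) {
  const p = status.progress || {};
  const handled = (p.messagesExported || 0) + (p.messagesSkipped || 0);
  const queued = p.messagesQueued || 0;
  return {
    // Rounded to one decimal place; 0 when nothing has been queued yet.
    percent: queued > 0 ? Math.round((handled / queued) * 1000) / 10 : 0,
    truncated: status.truncated === true
  };
}
```

For the sample response above (1500 queued, 750 exported, 5 skipped) this gives 50.3 percent.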

Downloading Export Files

Download a completed export using the Download Export API endpoint:

curl "https://your-emailengine.com/v1/account/{account}/export/{exportId}/download" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -o export.ndjson.gz

The response is a gzip-compressed NDJSON file. Each line contains one message as a JSON object:

{"id":"AAAAAQAACnA","uid":12345,"folder":"INBOX","subject":"Hello","from":{"name":"Sender","address":"sender@example.com"},"date":"2024-01-15T10:30:00.000Z","text":{"plain":"Message content..."},"attachments":[]}
{"id":"AAAAAQAACnB","uid":12346,"folder":"INBOX","subject":"Re: Hello","from":{"name":"Reply","address":"reply@example.com"},"date":"2024-01-15T11:00:00.000Z","text":{"plain":"Reply content..."},"attachments":[]}

If the export was encrypted (when EENGINE_SECRET is set), decryption happens automatically during download.

Concurrency Tuning

Export jobs are processed by dedicated worker threads. You can tune concurrency based on your system resources.

Configuration Options

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| EENGINE_WORKERS_EXPORT | env | 1 | Export worker threads |
| EENGINE_EXPORT_QC | env | 1 | Concurrent jobs per worker |
| exportMaxConcurrent | setting | 2 | Max concurrent exports per account |
| exportMaxGlobalConcurrent | setting | 8 | Max concurrent exports system-wide |

Calculating Total Concurrency

The maximum number of exports that can run simultaneously is:

MAX_CONCURRENT = EENGINE_WORKERS_EXPORT x EENGINE_EXPORT_QC

This is further capped by exportMaxGlobalConcurrent to prevent system overload.

Example: With EENGINE_WORKERS_EXPORT=2 and EENGINE_EXPORT_QC=2, you can have up to 4 concurrent exports. If exportMaxGlobalConcurrent=8, the global limit won't be a factor. But if you set exportMaxGlobalConcurrent=3, only 3 exports will run concurrently even though the worker configuration allows 4.
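
The calculation can be written as a one-line function. This is only an illustration of the arithmetic described above; `effectiveExportConcurrency` is a hypothetical name, and the per-account limit (`exportMaxConcurrent`) is deliberately left out.

```javascript
// Worker capacity (workers x jobs per worker), capped by the global limit.
function effectiveExportConcurrency(workers, jobsPerWorker, globalCap) {
  return Math.min(workers * jobsPerWorker, globalCap);
}

console.log(effectiveExportConcurrency(2, 2, 8)); // 4: worker capacity is the limit
console.log(effectiveExportConcurrency(2, 2, 3)); // 3: the global cap is the limit
```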

Resource Requirements

Each export job consumes memory and disk I/O. Use the table below to estimate resource needs:

| Concurrent Exports | Memory (Est.) | Disk I/O | Redis Load |
| --- | --- | --- | --- |
| 1 (default) | ~150 MB | Low | Low |
| 4 (2x2) | ~400 MB | Medium | Medium |
| 8 (4x2 or 2x4) | ~800 MB | High | High |
| 16 (4x4) | ~1.5 GB | Very High | Very High |

Small deployment (2-4GB RAM)

EENGINE_WORKERS_EXPORT=1
EENGINE_EXPORT_QC=1
# exportMaxGlobalConcurrent=2

Conservative settings for resource-constrained environments. One export at a time.

Medium deployment (8GB RAM)

EENGINE_WORKERS_EXPORT=2
EENGINE_EXPORT_QC=2
# exportMaxGlobalConcurrent=8

Balanced settings for typical production servers. Up to 4 concurrent exports.

Large deployment (16GB+ RAM)

EENGINE_WORKERS_EXPORT=4
EENGINE_EXPORT_QC=2
# exportMaxGlobalConcurrent=16

Higher throughput for large-scale operations. Up to 8 concurrent exports.

Provider-Specific Batch Sizes

EmailEngine fetches messages in batches during export. The batch size can be tuned per email provider to optimize throughput and avoid rate limits.

Configure via the Settings API:

curl -X POST "https://your-emailengine.com/v1/settings" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "gmailExportBatchSize": 10,
    "outlookExportBatchSize": 20
  }'

| Setting | Default | Range | Description |
| --- | --- | --- | --- |
| gmailExportBatchSize | 10 | 1-50 | Messages fetched in parallel from Gmail API per batch |
| outlookExportBatchSize | 20 | 1-20 | Messages fetched in parallel from Microsoft Graph API per batch |

Gmail: Supports up to 50 parallel fetches. Higher values speed up exports but increase memory usage and may trigger rate limits on accounts with heavy concurrent usage.

Outlook: Microsoft Graph API limits batch requests to 20 items per batch. Setting a higher value has no effect.

IMAP accounts: Batch size is not configurable for IMAP; messages are fetched sequentially.
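
If batch sizes come from user input or another config source, it may be worth clamping them to the documented ranges before sending them to the Settings API. The sketch below assumes nothing about how EmailEngine itself treats out-of-range values; `clampBatchSettings` and `BATCH_LIMITS` are hypothetical names.

```javascript
// Allowed ranges, copied from the documented ranges for each setting.
const BATCH_LIMITS = {
  gmailExportBatchSize: { min: 1, max: 50 },
  outlookExportBatchSize: { min: 1, max: 20 }
};

// Returns a settings payload with each known batch size clamped into its range.
// Unknown keys and non-numeric values are dropped by this sketch.
function clampBatchSettings(settings) {
  const result = {};
  for (const [key, value] of Object.entries(settings)) {
    const limits = BATCH_LIMITS[key];
    if (!limits || !Number.isFinite(value)) continue;
    result[key] = Math.min(limits.max, Math.max(limits.min, Math.trunc(value)));
  }
  return result;
}

console.log(clampBatchSettings({ gmailExportBatchSize: 80, outlookExportBatchSize: 0 }));
// { gmailExportBatchSize: 50, outlookExportBatchSize: 1 }
```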

Export Limits

Additional settings cap the size of each export and the number of exports that can run at once:

| Setting | Default | Description |
| --- | --- | --- |
| exportMaxMessages | 500,000 | Maximum messages per export job |
| exportMaxSize | 10 GB | Maximum export file size |
| exportMaxConcurrent | 2 | Max concurrent exports per account |
| exportMaxGlobalConcurrent | 8 | Max concurrent exports system-wide |

Tuning Considerations

  1. Memory: Each export batch loads message data into memory. Monitor memory usage and reduce concurrency if you see memory pressure.

  2. Disk I/O: Multiple concurrent gzip streams can saturate disk bandwidth. Use SSDs for best performance.

  3. Email Provider Limits: High concurrency may trigger rate limits from email providers. Watch for 429 errors in logs.

  4. Redis: Message queues consume approximately 100 bytes per message. Large exports with many messages increase Redis memory usage.
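
To make the Redis figure concrete, the estimate can be worked through in code. This is back-of-the-envelope arithmetic based on the approximate 100 bytes per message quoted above, not a measurement; `estimateQueueBytes` is a hypothetical helper.

```javascript
// Approximate Redis queue overhead per queued message (figure quoted above).
const BYTES_PER_QUEUED_MESSAGE = 100;

// Rough Redis memory needed for the message queue of a single export.
function estimateQueueBytes(messageCount) {
  return messageCount * BYTES_PER_QUEUED_MESSAGE;
}

// An export at the default exportMaxMessages limit of 500,000 messages:
console.log(estimateQueueBytes(500000)); // 50000000 bytes, roughly 50 MB
```

Multiply the result by the number of exports running at the same time to get an upper bound for the whole system.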

Tuning tips:

  • Start with conservative settings and increase gradually
  • Monitor memory usage with docker stats or top
  • Check logs for rate limiting errors from email providers
  • Use exportMaxGlobalConcurrent to cap total system load regardless of worker configuration

File Storage

Configuration

File storage is configured with environment variables (these options are not available through the settings API or UI):

| Environment Variable | Default | Description |
| --- | --- | --- |
| EENGINE_EXPORT_PATH | OS temp dir | Directory for export files |
| EENGINE_EXPORT_MAX_AGE | 86400000 (24 hours) | File retention time in milliseconds |
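
Because the retention time is given in milliseconds, a small conversion helper can prevent off-by-a-factor mistakes. The snippet is purely illustrative; `hoursToMs` is a hypothetical name and 48 hours is an arbitrary example value.

```javascript
// Converts a retention period in hours to the millisecond value
// expected by EENGINE_EXPORT_MAX_AGE.
const hoursToMs = (hours) => hours * 60 * 60 * 1000;

console.log(`EENGINE_EXPORT_MAX_AGE=${hoursToMs(48)}`); // EENGINE_EXPORT_MAX_AGE=172800000
```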

Encryption

When EENGINE_SECRET is configured, export files are automatically encrypted using AES-256-GCM:

  • Encrypted files have .ndjson.gz.enc extension
  • Unencrypted files have .ndjson.gz extension
  • Downloads are automatically decrypted by EmailEngine

This ensures exported data is protected at rest without requiring separate encryption handling.

Webhooks

Export completion triggers webhook notifications:

| Event | Description |
| --- | --- |
| exportCompleted | Export finished successfully |
| exportFailed | Export encountered an error |

Example webhook payload for exportCompleted:

{
  "event": "exportCompleted",
  "account": "user123",
  "data": {
    "exportId": "exp_abc123def456abc123def456",
    "messagesExported": 1495,
    "messagesSkipped": 5,
    "bytesWritten": 104857600
  }
}
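
A webhook receiver might route these events along the following lines. This is a sketch under assumptions: `routeExportWebhook` and the returned `action` values are hypothetical, and the shape of the `exportFailed` payload is assumed to mirror the `exportCompleted` example above, which the example does not confirm.

```javascript
// Decides what to do with an incoming webhook payload.
// Returns a plain description of the next step instead of performing it,
// so the same logic can sit behind any HTTP framework.
function routeExportWebhook(payload) {
  switch (payload.event) {
    case 'exportCompleted':
      return {
        action: 'download',
        account: payload.account,
        exportId: payload.data.exportId
      };
    case 'exportFailed':
      // Assumed payload shape: account and data.exportId, as in exportCompleted.
      return {
        action: 'inspect',
        account: payload.account,
        exportId: payload.data?.exportId
      };
    default:
      // Not an export event; leave it to other handlers.
      return { action: 'ignore' };
  }
}
```

A handler would typically parse the request body as JSON, pass it to this function, and then start the download or look up the error field in the status response.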

Managing Exports

List Exports

Get all exports for an account using the List Exports API endpoint:

curl "https://your-emailengine.com/v1/account/{account}/exports" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Response:

{
  "total": 3,
  "page": 0,
  "pages": 1,
  "exports": [
    {
      "exportId": "exp_abc123def456abc123def456",
      "status": "completed",
      "created": "2024-01-15T10:30:00.000Z",
      "expiresAt": "2024-01-16T10:30:00.000Z"
    }
  ]
}

Delete Export

Cancel a pending export or delete a completed export file using the Delete Export API endpoint:

curl -X DELETE "https://your-emailengine.com/v1/account/{account}/export/{exportId}" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

This will:

  • Cancel the export if it's still queued or processing
  • Delete the export file from disk
  • Remove the export record from the system

Handling Failed Exports

Failed exports cannot be resumed. If an export ends up in the failed state, check the error field in the status response for details, then delete the failed export and create a new one (using a narrower date range if the failure was caused by size or rate limits).

Best Practices

Large Exports

For very large exports (millions of messages):

  1. Use date range filtering - Split large exports into smaller date ranges
  2. Monitor progress - Poll the status endpoint to track completion
  3. Handle failures gracefully - Check the error field if status is failed; delete the failed export and create a new one (exports cannot be resumed)
  4. Download promptly - Files expire after the retention period set by EENGINE_EXPORT_MAX_AGE (default 24 hours)
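
Splitting a long period into smaller date ranges, as suggested in step 1, could look like the sketch below. `splitDateRange` is a hypothetical helper that divides the period into equal-length chunks. Adjacent chunks share a boundary timestamp, and whether `endDate` is treated as inclusive is not stated here, so a message dated exactly on a boundary might appear in two exports; deduplicate by message id if that matters.

```javascript
// Splits [start, end] into `parts` consecutive ranges of equal length.
// Returns objects with startDate/endDate as ISO 8601 strings, ready to be
// merged into a Create Export request body.
function splitDateRange(start, end, parts) {
  if (!Number.isInteger(parts) || parts < 1) {
    throw new RangeError('parts must be a positive integer');
  }
  const startMs = new Date(start).getTime();
  const endMs = new Date(end).getTime();
  const step = (endMs - startMs) / parts;
  const ranges = [];
  for (let i = 0; i < parts; i++) {
    ranges.push({
      startDate: new Date(Math.round(startMs + i * step)).toISOString(),
      endDate: new Date(Math.round(startMs + (i + 1) * step)).toISOString()
    });
  }
  return ranges;
}
```

Each returned range can then be submitted as its own export job, keeping in mind the per-account and global concurrency limits.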

Production Usage

  1. Configure storage path - Set the EENGINE_EXPORT_PATH environment variable to a dedicated volume with sufficient space
  2. Set appropriate retention - Adjust the EENGINE_EXPORT_MAX_AGE environment variable (in milliseconds) based on your download SLA
  3. Monitor disk space - Large exports can consume significant disk space
  4. Use webhooks - Set up webhook handlers for exportCompleted and exportFailed events instead of polling

Processing Export Files

NDJSON format allows streaming processing without loading the entire file into memory:

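const fs = require('fs');
const readline = require('readline');
const zlib = require('zlib');

// Decompress on the fly so the whole file never has to fit in memory
const source = fs.createReadStream('export.ndjson.gz');
const gunzip = zlib.createGunzip();

// pipe() does not forward errors, so handle both streams explicitly
source.on('error', (err) => console.error(`Read failed: ${err.message}`));
gunzip.on('error', (err) => console.error(`Decompression failed: ${err.message}`));

// crlfDelay: Infinity treats \r\n as a single line break
const rl = readline.createInterface({ input: source.pipe(gunzip), crlfDelay: Infinity });

rl.on('line', (line) => {
  if (!line.trim()) return; // skip blank lines
  const message = JSON.parse(line);
  // Process each message
  console.log(`Processing: ${message.subject}`);
});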