Configuration

This guide covers all configuration options available in AnonDocs, from basic setup to advanced tuning.

Environment Variables

AnonDocs is configured through environment variables, set either in a .env file or directly in the system environment.
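
For scripts that consume the same variables, a .env file can be loaded with the widely used dotenv package. The sketch below assumes a Node.js environment with dotenv installed; it illustrates the mechanism and is not AnonDocs' own startup code:

// Illustrative only: reads the variables described in this guide.
import 'dotenv/config';

const port = Number(process.env.PORT ?? 3000);
const provider = process.env.DEFAULT_LLM_PROVIDER ?? 'ollama';
console.log(`Listening on ${port}, LLM provider: ${provider}`);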

Server Configuration

# Port for the AnonDocs API server
PORT=3000

LLM Provider Configuration

Using Ollama

DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo

Using OpenAI-Compatible APIs

DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required

Note: OPENAI_API_KEY can be set to a placeholder such as not-required for local OpenAI-compatible APIs like vLLM, LM Studio, or LocalAI, which ignore the key. For the actual OpenAI API, provide a valid key.
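
To show how the two provider blocks relate, here is a hypothetical TypeScript sketch of how a server might resolve the active endpoint from these variables; the shape of the config object is invented for illustration and is not AnonDocs source code:

// Hypothetical provider resolution, not AnonDocs source code.
const provider = process.env.DEFAULT_LLM_PROVIDER ?? 'ollama';

const llmConfig =
  provider === 'openai'
    ? {
        baseUrl: process.env.OPENAI_BASE_URL ?? 'http://localhost:8000/v1',
        model: process.env.OPENAI_MODEL,
        apiKey: process.env.OPENAI_API_KEY, // placeholder is fine for local APIs
      }
    : {
        baseUrl: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
        model: process.env.OLLAMA_MODEL ?? 'mistral-nemo',
      };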

Processing Configuration

# Characters per chunk for text processing
# Smaller chunks = more accurate but slower
# Larger chunks = faster but may miss context
CHUNK_SIZE=1500

# Overlap between chunks (in characters)
# Helps maintain context across chunk boundaries; 0 disables overlap
CHUNK_OVERLAP=0

# Enable parallel chunk processing
# true = faster but uses more memory
# false = sequential, safer for limited resources
ENABLE_PARALLEL_CHUNKS=false

Configuration Reference

Complete .env Example

# ============================================
# Server Configuration
# ============================================
PORT=3000

# ============================================
# LLM Provider Configuration
# ============================================
# Options: ollama, openai
DEFAULT_LLM_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo

# OpenAI-Compatible API Configuration (if using vLLM, LM Studio, LocalAI)
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required

# ============================================
# Processing Configuration
# ============================================
CHUNK_SIZE=1500
CHUNK_OVERLAP=0
ENABLE_PARALLEL_CHUNKS=false

Configuration Profiles

Development Profile

For local development with minimal resource usage:

PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
ENABLE_PARALLEL_CHUNKS=false

Production Profile

For production with high accuracy requirements:

PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo
CHUNK_SIZE=2000
CHUNK_OVERLAP=200
ENABLE_PARALLEL_CHUNKS=true

High-Performance Profile

For high-throughput deployments with GPU:

PORT=3000
DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
CHUNK_SIZE=2500
CHUNK_OVERLAP=250
ENABLE_PARALLEL_CHUNKS=true

Performance Tuning

Chunk Size Optimization

The CHUNK_SIZE parameter controls how much text is processed at once:

  • Smaller chunks (1000-1500): More accurate PII detection, better context handling, slower processing
  • Larger chunks (2000-3000): Faster processing, may miss context-dependent PII

Recommendation: Start with 1500 and adjust based on your documents:

# For short documents with dense PII
CHUNK_SIZE=1000

# For longer documents with scattered PII
CHUNK_SIZE=2000

Parallel Processing

Enable parallel processing for faster handling of large documents:

ENABLE_PARALLEL_CHUNKS=true

Trade-offs:

  • ✅ Faster processing
  • ✅ Better GPU utilization (if available)
  • ❌ Higher memory usage
  • ❌ May hit rate limits on LLM provider

When to enable: you have sufficient RAM and a fast LLM provider

When to disable: resources are limited, you are hitting provider rate limits, or sequential processing proves more stable
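
Conceptually, the flag switches between the two strategies sketched below; processChunk is a hypothetical stand-in for whatever the server actually calls per chunk, so this is an illustration of the trade-off rather than AnonDocs source code:

// Illustrative sketch of the trade-off, not AnonDocs source code.
declare function processChunk(chunk: string): Promise<string>;

async function processAll(chunks: string[], parallel: boolean): Promise<string[]> {
  if (parallel) {
    // Every chunk is in flight at once: faster overall, but memory use
    // and concurrent requests to the LLM provider grow with chunk count.
    return Promise.all(chunks.map(processChunk));
  }
  // One chunk at a time: slower, but resource usage stays flat.
  const results: string[] = [];
  for (const chunk of chunks) {
    results.push(await processChunk(chunk));
  }
  return results;
}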

Chunk Overlap

Overlap helps maintain context across chunk boundaries:

CHUNK_OVERLAP=200

Use overlap when:

  • PII might span chunk boundaries
  • Context is important for detection
  • The extra processing time is acceptable (overlapping text is processed twice)
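
As a rough illustration of how CHUNK_SIZE and CHUNK_OVERLAP interact, this sketch splits text the way a sliding-window chunker typically does; it is an assumption about the mechanism, not AnonDocs' actual splitter:

// Illustrative sliding-window chunker, not AnonDocs source code.
function chunkText(text: string, size = 1500, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap); // each window starts `overlap` chars before the previous one ends
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// With size=1500 and overlap=200, each chunk repeats the final 200
// characters of the previous chunk, so PII that straddles a boundary
// appears intact in at least one chunk.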

Environment-Specific Configuration

Using Different Configs for Different Environments

Development

# .env.development
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral
ENABLE_PARALLEL_CHUNKS=false

Production

# .env.production
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo
ENABLE_PARALLEL_CHUNKS=true
CHUNK_SIZE=2000

Loading Environment-Specific Config

# Development
cp .env.development .env
npm start

# Production
cp .env.production .env
npm start

Docker Configuration

Using Environment Files with Docker

# Build the image (the env file is applied at run time, not build time)
docker build -t anondocs .

# Run with environment file
docker run -d \
  --name anondocs \
  -p 3000:3000 \
  --env-file .env.production \
  anondocs

Docker Compose Configuration

version: '3.8'
services:
  anondocs:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.production
    environment:
      - PORT=3000
    restart: unless-stopped
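
With that file saved as docker-compose.yml, start the service with:

docker compose up -d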

Kubernetes Configuration

ConfigMap Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: anondocs-config
data:
  PORT: "3000"
  DEFAULT_LLM_PROVIDER: "ollama"
  OLLAMA_BASE_URL: "http://ollama:11434"
  OLLAMA_MODEL: "mistral-nemo"
  CHUNK_SIZE: "1500"
  CHUNK_OVERLAP: "0"
  ENABLE_PARALLEL_CHUNKS: "false"

Using ConfigMap in Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anondocs
spec:
  selector:
    matchLabels:
      app: anondocs
  template:
    metadata:
      labels:
        app: anondocs
    spec:
      containers:
        - name: anondocs
          image: anondocs # the image built in the Docker section above
          envFrom:
            - configMapRef:
                name: anondocs-config
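
Assuming the two manifests are saved as configmap.yaml and deployment.yaml (the filenames here are illustrative), apply them with:

kubectl apply -f configmap.yaml -f deployment.yaml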

Validation

Check Configuration

The server validates configuration on startup. Check logs for any configuration errors:

npm start

# Look for:
# ✓ Configuration loaded successfully
# ✓ LLM provider: ollama
# ✓ Model: mistral-nemo
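
If you want wrapper scripts to fail fast before starting the server, a minimal pre-flight check along these lines works; this is an illustration for the variables in this guide, not the server's built-in validation:

// Hypothetical pre-flight check, separate from the server's own validation.
const required = ['DEFAULT_LLM_PROVIDER'];
const numeric = ['PORT', 'CHUNK_SIZE', 'CHUNK_OVERLAP'];

for (const name of required) {
  if (!process.env[name]) throw new Error(`Missing required variable: ${name}`);
}
for (const name of numeric) {
  const value = process.env[name];
  if (value !== undefined && Number.isNaN(Number(value))) {
    throw new Error(`${name} must be numeric, got "${value}"`);
  }
}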

Test Configuration

# Health check
curl http://localhost:3000/health

# Test anonymization
curl -X POST http://localhost:3000/api/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Contact John Doe at john@example.com",
    "provider": "ollama"
  }'
