# Configuration
This guide covers all configuration options available in AnonDocs, from basic setup to advanced tuning.
## Environment Variables

Configuration is done through environment variables, either via a `.env` file or system environment variables.

### Server Configuration

```bash
# Port for the AnonDocs API server
PORT=3000
```
### LLM Provider Configuration

#### Using Ollama

```bash
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo
```
#### Using OpenAI-Compatible APIs

```bash
DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required
```

**Note:** `OPENAI_API_KEY` can be set to a placeholder such as `not-required` for local APIs like vLLM, LM Studio, or LocalAI. For the actual OpenAI API, provide a valid key.
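The base URL depends on which local server you run. The values below are commonly used defaults for each tool, not AnonDocs requirements; check your own installation:

```bash
# Commonly used defaults -- adjust to match your local server
# vLLM:      OPENAI_BASE_URL=http://localhost:8000/v1
# LM Studio: OPENAI_BASE_URL=http://localhost:1234/v1
# LocalAI:   OPENAI_BASE_URL=http://localhost:8080/v1
```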
### Processing Configuration

```bash
# Characters per chunk for text processing
# Smaller chunks = more accurate but slower
# Larger chunks = faster but may miss context
CHUNK_SIZE=1500

# Overlap between chunks (in characters)
# Helps maintain context across chunk boundaries
CHUNK_OVERLAP=0

# Enable parallel chunk processing
# true = faster but uses more memory
# false = sequential, safer for limited resources
ENABLE_PARALLEL_CHUNKS=false
```
## Configuration Reference

### Complete .env Example

```bash
# ============================================
# Server Configuration
# ============================================
PORT=3000

# ============================================
# LLM Provider Configuration
# ============================================
# Options: ollama, openai
DEFAULT_LLM_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo

# OpenAI-Compatible API Configuration (if using vLLM, LM Studio, LocalAI)
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required

# ============================================
# Processing Configuration
# ============================================
CHUNK_SIZE=1500
CHUNK_OVERLAP=0
ENABLE_PARALLEL_CHUNKS=false
```
## Configuration Profiles

### Development Profile

For local development with minimal resource usage:

```bash
PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
ENABLE_PARALLEL_CHUNKS=false
```
### Production Profile

For production with high accuracy requirements:

```bash
PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo
CHUNK_SIZE=2000
CHUNK_OVERLAP=200
ENABLE_PARALLEL_CHUNKS=true
```
### High-Performance Profile

For high-throughput deployments with a GPU:

```bash
PORT=3000
DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
CHUNK_SIZE=2500
CHUNK_OVERLAP=250
ENABLE_PARALLEL_CHUNKS=true
```
## Performance Tuning

### Chunk Size Optimization

The `CHUNK_SIZE` parameter controls how much text is processed at once:

- **Smaller chunks (1000-1500):** More accurate PII detection, better context handling, slower processing
- **Larger chunks (2000-3000):** Faster processing, may miss context-dependent PII

**Recommendation:** Start with `1500` and adjust based on your documents:

```bash
# For short documents with dense PII
CHUNK_SIZE=1000

# For longer documents with scattered PII
CHUNK_SIZE=2000
```
### Parallel Processing

Enable parallel processing for faster handling of large documents:

```bash
ENABLE_PARALLEL_CHUNKS=true
```

**Trade-offs:**

- ✅ Faster processing
- ✅ Better GPU utilization (if available)
- ❌ Higher memory usage
- ❌ May hit rate limits on the LLM provider
**When to enable:** you have sufficient RAM and a fast LLM provider.

**When to disable:** limited resources, rate-limiting issues, or when sequential processing is more stable.
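As an illustration of the trade-off (a minimal sketch, not the AnonDocs implementation; `anonymizeChunk` is a hypothetical stand-in for a single LLM call):

```typescript
// Hypothetical stand-in for one anonymization request to the LLM provider.
async function anonymizeChunk(chunk: string): Promise<string> {
  // ... send the chunk to the configured LLM and return the anonymized text ...
  return chunk;
}

// ENABLE_PARALLEL_CHUNKS=false: one chunk at a time, low memory, predictable load.
async function processSequentially(chunks: string[]): Promise<string[]> {
  const results: string[] = [];
  for (const chunk of chunks) {
    results.push(await anonymizeChunk(chunk));
  }
  return results;
}

// ENABLE_PARALLEL_CHUNKS=true: all chunks in flight at once, faster but heavier
// on memory and more likely to trip provider rate limits.
async function processInParallel(chunks: string[]): Promise<string[]> {
  return Promise.all(chunks.map((chunk) => anonymizeChunk(chunk)));
}
```

`Promise.all` keeps every chunk and every response in memory and issues all requests at once, which is exactly the trade-off listed above.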
### Chunk Overlap

Overlap helps maintain context across chunk boundaries:

```bash
CHUNK_OVERLAP=200
```
Use overlap when:
- PII might span chunk boundaries
- Context is important for detection
- Processing speed is acceptable
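To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a simplified character-based sketch; it is illustrative only and not necessarily how AnonDocs splits text internally:

```typescript
// Each new chunk starts (chunkSize - overlap) characters after the previous one,
// so consecutive chunks share `overlap` characters of context.
function splitIntoChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// With CHUNK_SIZE=1500 and CHUNK_OVERLAP=200, a 4000-character document yields
// chunks starting at 0, 1300, 2600 and 3900, so a name straddling position 1500
// still appears whole inside the second chunk.
```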
## Environment-Specific Configuration

### Using Different Configs for Different Environments

#### Development

```bash
# .env.development
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral
ENABLE_PARALLEL_CHUNKS=false
```
#### Production

```bash
# .env.production
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo
ENABLE_PARALLEL_CHUNKS=true
CHUNK_SIZE=2000
```
### Loading Environment-Specific Config

```bash
# Development
cp .env.development .env
npm start

# Production
cp .env.production .env
npm start
```
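A single value can also be overridden for one run without editing the file. This assumes the server follows standard dotenv behavior, where variables already set in the environment take precedence over the `.env` file:

```bash
# One-off override of a single setting (assumes standard dotenv precedence)
PORT=3001 npm start
```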
## Docker Configuration

### Using Environment Files with Docker

```bash
# Build the image
docker build -t anondocs .

# Run with an environment file
docker run -d \
  --name anondocs \
  -p 3000:3000 \
  --env-file .env.production \
  anondocs
```
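Individual variables can also be passed with `-e` flags instead of an env file. This is useful when the container needs a different `OLLAMA_BASE_URL`, since `localhost` inside the container does not reach an Ollama instance running on the host. The `host.docker.internal` value below assumes Docker Desktop; on Linux, add `--add-host=host.docker.internal:host-gateway` or use the host's IP:

```bash
docker run -d \
  --name anondocs \
  -p 3000:3000 \
  -e DEFAULT_LLM_PROVIDER=ollama \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -e OLLAMA_MODEL=mistral-nemo \
  anondocs
```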
### Docker Compose Configuration

```yaml
version: '3.8'

services:
  anondocs:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.production
    environment:
      - PORT=3000
    restart: unless-stopped
```
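If you run Ollama in the same Compose project, a sketch like the following keeps both services on one network. The `ollama` service name and volume are assumptions rather than part of the AnonDocs repository; note that `OLLAMA_BASE_URL` then points at the service name instead of `localhost`, and that values under `environment` override the env file:

```yaml
version: '3.8'

services:
  anondocs:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.production
    environment:
      # Reach the sibling container by its service name, not localhost
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama
    restart: unless-stopped

volumes:
  ollama-data:
```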
## Kubernetes Configuration

### ConfigMap Example

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: anondocs-config
data:
  PORT: "3000"
  DEFAULT_LLM_PROVIDER: "ollama"
  OLLAMA_BASE_URL: "http://ollama:11434"
  OLLAMA_MODEL: "mistral-nemo"
  CHUNK_SIZE: "1500"
  CHUNK_OVERLAP: "0"
  ENABLE_PARALLEL_CHUNKS: "false"
```
### Using ConfigMap in Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anondocs
spec:
  selector:
    matchLabels:
      app: anondocs
  template:
    metadata:
      labels:
        app: anondocs
    spec:
      containers:
        - name: anondocs
          # Image built in the Docker section above (adjust to your registry tag)
          image: anondocs:latest
          envFrom:
            - configMapRef:
                name: anondocs-config
```
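If you use an OpenAI-compatible provider that requires a real key, the key is better kept out of the ConfigMap. A minimal sketch using a standard Kubernetes Secret (the `anondocs-secrets` name and the placeholder value are assumptions):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: anondocs-secrets
type: Opaque
stringData:
  # Placeholder -- substitute your real key
  OPENAI_API_KEY: "sk-..."
```

Then add a `secretRef` entry for `anondocs-secrets` alongside the `configMapRef` under `envFrom` in the Deployment.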
## Validation

### Check Configuration

The server validates configuration on startup. Check the logs for any configuration errors:

```bash
npm start

# Look for:
# ✓ Configuration loaded successfully
# ✓ LLM provider: ollama
# ✓ Model: mistral-nemo
```
### Test Configuration

```bash
# Health check
curl http://localhost:3000/health

# Test anonymization
curl -X POST http://localhost:3000/api/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Contact John Doe at john@example.com",
    "provider": "ollama"
  }'
```
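The same request can be made from application code. A minimal TypeScript sketch using the endpoint and payload shown above; the response shape is not documented here, so it is simply logged:

```typescript
// Minimal client call against the anonymization endpoint shown above.
async function anonymize(text: string): Promise<unknown> {
  const response = await fetch("http://localhost:3000/api/anonymize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, provider: "ollama" }),
  });
  if (!response.ok) {
    throw new Error(`Anonymization failed with status ${response.status}`);
  }
  return response.json();
}

anonymize("Contact John Doe at john@example.com").then((result) =>
  console.log(result),
);
```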
## Next Steps
- 🤖 LLM Provider Setup - Configure your LLM provider
- 🚀 Deployment Options - Production deployment strategies
- 🔒 Production Considerations - Security and scaling