Configuration

This guide covers all configuration options available in AnonDocs, from basic setup to advanced tuning.

Environment Variables

AnonDocs is configured through environment variables, set either in a .env file or directly in the system environment.
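
For scripts that consume the same variables, a .env file can be loaded with the widely used dotenv package. The sketch below assumes a Node.js environment with dotenv installed; it illustrates the mechanism and is not AnonDocs' own startup code:

// Illustrative only: reads the variables described in this guide.
import 'dotenv/config';

const port = Number(process.env.PORT ?? 3000);
const provider = process.env.DEFAULT_LLM_PROVIDER ?? 'ollama';
console.log(`Listening on ${port}, LLM provider: ${provider}`);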

Server Configuration

# Port for the AnonDocs API server
PORT=3000

LLM Provider Configuration

Using Ollama

DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo

Using OpenAI-Compatible APIs

DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required

Note: OPENAI_API_KEY can be set to a placeholder such as not-required for local OpenAI-compatible APIs like vLLM, LM Studio, or LocalAI, which ignore the key. For the actual OpenAI API, provide a valid key.
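
To show how the two provider blocks relate, here is a hypothetical TypeScript sketch of how a server might resolve the active endpoint from these variables; the shape of the config object is invented for illustration and is not AnonDocs source code:

// Hypothetical provider resolution, not AnonDocs source code.
const provider = process.env.DEFAULT_LLM_PROVIDER ?? 'ollama';

const llmConfig =
  provider === 'openai'
    ? {
        baseUrl: process.env.OPENAI_BASE_URL ?? 'http://localhost:8000/v1',
        model: process.env.OPENAI_MODEL,
        apiKey: process.env.OPENAI_API_KEY, // placeholder is fine for local APIs
      }
    : {
        baseUrl: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
        model: process.env.OLLAMA_MODEL ?? 'mistral-nemo',
      };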

Processing Configuration

# Characters per chunk for text processing
# Smaller chunks = more accurate but slower
# Larger chunks = faster but may miss context
CHUNK_SIZE=1500

# Overlap between chunks (in characters)
# Helps maintain context across chunk boundaries; 0 disables overlap
CHUNK_OVERLAP=0

# Enable parallel chunk processing
# true = faster but uses more memory
# false = sequential, safer for limited resources
ENABLE_PARALLEL_CHUNKS=false

Configuration Reference

Complete .env Example

# ============================================
# Server Configuration
# ============================================
PORT=3000

# ============================================
# LLM Provider Configuration
# ============================================
# Options: ollama, openai
DEFAULT_LLM_PROVIDER=ollama

# Ollama Configuration (if using Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo

# OpenAI-Compatible API Configuration (if using vLLM, LM Studio, LocalAI)
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
OPENAI_API_KEY=not-required

# ============================================
# Processing Configuration
# ============================================
CHUNK_SIZE=1500
CHUNK_OVERLAP=0
ENABLE_PARALLEL_CHUNKS=false

Configuration Profiles

Development Profile

For local development with minimal resource usage:

PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
CHUNK_SIZE=1000
CHUNK_OVERLAP=100
ENABLE_PARALLEL_CHUNKS=false

Production Profile

For production with high accuracy requirements:

PORT=3000
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo
CHUNK_SIZE=2000
CHUNK_OVERLAP=200
ENABLE_PARALLEL_CHUNKS=true

High-Performance Profile

For high-throughput deployments with GPU:

PORT=3000
DEFAULT_LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
CHUNK_SIZE=2500
CHUNK_OVERLAP=250
ENABLE_PARALLEL_CHUNKS=true

Performance Tuning

Chunk Size Optimization

The CHUNK_SIZE parameter controls how much text is processed at once:

  • Smaller chunks (1000-1500): More accurate PII detection, better context handling, slower processing
  • Larger chunks (2000-3000): Faster processing, may miss context-dependent PII

Recommendation: Start with 1500 and adjust based on your documents:

# For short documents with dense PII
CHUNK_SIZE=1000

# For longer documents with scattered PII
CHUNK_SIZE=2000

Parallel Processing

Enable parallel processing for faster handling of large documents:

ENABLE_PARALLEL_CHUNKS=true

Trade-offs:

  • ✅ Faster processing
  • ✅ Better GPU utilization (if available)
  • ❌ Higher memory usage
  • ❌ May hit rate limits on LLM provider

When to enable: you have sufficient RAM and a fast LLM provider

When to disable: resources are limited, you are hitting provider rate limits, or sequential processing proves more stable
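
Conceptually, the flag switches between the two strategies sketched below; processChunk is a hypothetical stand-in for whatever the server actually calls per chunk, so this is an illustration of the trade-off rather than AnonDocs source code:

// Illustrative sketch of the trade-off, not AnonDocs source code.
declare function processChunk(chunk: string): Promise<string>;

async function processAll(chunks: string[], parallel: boolean): Promise<string[]> {
  if (parallel) {
    // Every chunk is in flight at once: faster overall, but memory use
    // and concurrent requests to the LLM provider grow with chunk count.
    return Promise.all(chunks.map(processChunk));
  }
  // One chunk at a time: slower, but resource usage stays flat.
  const results: string[] = [];
  for (const chunk of chunks) {
    results.push(await processChunk(chunk));
  }
  return results;
}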

Chunk Overlap

Overlap helps maintain context across chunk boundaries:

CHUNK_OVERLAP=200

Use overlap when:

  • PII might span chunk boundaries
  • Context is important for detection
  • The extra processing time is acceptable (overlapping text is processed twice)
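
As a rough illustration of how CHUNK_SIZE and CHUNK_OVERLAP interact, this sketch splits text the way a sliding-window chunker typically does; it is an assumption about the mechanism, not AnonDocs' actual splitter:

// Illustrative sliding-window chunker, not AnonDocs source code.
function chunkText(text: string, size = 1500, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap); // each window starts `overlap` chars before the previous one ends
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// With size=1500 and overlap=200, each chunk repeats the final 200
// characters of the previous chunk, so PII that straddles a boundary
// appears intact in at least one chunk.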

Environment-Specific Configuration

Using Different Configs for Different Environments

Development

# .env.development
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral
ENABLE_PARALLEL_CHUNKS=false

Production

# .env.production
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_MODEL=mistral-nemo
ENABLE_PARALLEL_CHUNKS=true
CHUNK_SIZE=2000

Loading Environment-Specific Config

# Development
cp .env.development .env
npm start

# Production
cp .env.production .env
npm start

Docker Configuration

Using Environment Files with Docker

# Build the image (the env file is applied at run time, not build time)
docker build -t anondocs .

# Run with environment file
docker run -d \
  --name anondocs \
  -p 3000:3000 \
  --env-file .env.production \
  anondocs

Docker Compose Configuration

version: '3.8'
services:
  anondocs:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env.production
    environment:
      - PORT=3000
    restart: unless-stopped
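
With that file saved as docker-compose.yml, start the service with:

docker compose up -d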

Kubernetes Configuration

ConfigMap Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: anondocs-config
data:
  PORT: "3000"
  DEFAULT_LLM_PROVIDER: "ollama"
  OLLAMA_BASE_URL: "http://ollama:11434"
  OLLAMA_MODEL: "mistral-nemo"
  CHUNK_SIZE: "1500"
  CHUNK_OVERLAP: "0"
  ENABLE_PARALLEL_CHUNKS: "false"

Using ConfigMap in Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anondocs
spec:
  selector:
    matchLabels:
      app: anondocs
  template:
    metadata:
      labels:
        app: anondocs
    spec:
      containers:
        - name: anondocs
          image: anondocs # the image built in the Docker section above
          envFrom:
            - configMapRef:
                name: anondocs-config
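
Assuming the two manifests are saved as configmap.yaml and deployment.yaml (the filenames here are illustrative), apply them with:

kubectl apply -f configmap.yaml -f deployment.yaml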

Validation

Check Configuration

The server validates configuration on startup. Check logs for any configuration errors:

npm start

# Look for:
# ✓ Configuration loaded successfully
# ✓ LLM provider: ollama
# ✓ Model: mistral-nemo
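
If you want wrapper scripts to fail fast before starting the server, a minimal pre-flight check along these lines works; this is an illustration for the variables in this guide, not the server's built-in validation:

// Hypothetical pre-flight check, separate from the server's own validation.
const required = ['DEFAULT_LLM_PROVIDER'];
const numeric = ['PORT', 'CHUNK_SIZE', 'CHUNK_OVERLAP'];

for (const name of required) {
  if (!process.env[name]) throw new Error(`Missing required variable: ${name}`);
}
for (const name of numeric) {
  const value = process.env[name];
  if (value !== undefined && Number.isNaN(Number(value))) {
    throw new Error(`${name} must be numeric, got "${value}"`);
  }
}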

Test Configuration

# Health check
curl http://localhost:3000/health

# Test anonymization
curl -X POST http://localhost:3000/api/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Contact John Doe at john@example.com",
    "provider": "ollama"
  }'
