Introduction
Welcome to AnonDocs, an open-source document anonymization tool designed to help you protect privacy while sharing knowledge. Developed by AI SmartTalk, AnonDocs lets individuals and organizations remove sensitive information from documents before sharing them. This supports compliance with privacy regulations such as GDPR while keeping documents readable (structure is preserved for DOCX files).
What is AnonDocs?
AnonDocs is a privacy-first, self-hostable microservice that uses AI to automatically detect and anonymize Personally Identifiable Information (PII) in documents. It supports multiple file formats (PDF, DOCX, TXT) and can process both uploaded files and raw text input.
Key Features
- 🔒 Privacy-First: When self-hosted with a local LLM provider, all processing happens on your infrastructure and no data leaves your control
- 🤖 AI-Powered: Uses LLMs served through Ollama or any OpenAI-compatible API for intelligent PII detection
- 📄 Multi-Format Support: Handles PDF, DOCX, and plain text files
- ⚡ Real-Time Progress: Server-Sent Events (SSE) for live progress updates during anonymization
- 🌍 Open Source: Fully open-source under the MIT license, transparent and auditable
- 🚀 Self-Hostable: Deploy on your own infrastructure for maximum control and compliance
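The real-time progress feature relies on Server-Sent Events, a plain-text streaming format. As a rough sketch of how a client might consume such a stream, the parser below follows the standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). The event names and the JSON payload fields (`progress`, `message`) are assumptions made for illustration, not AnonDocs' documented schema.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines.

    Follows the standard framing: "field: value" lines, with a blank
    line terminating each event. Lines starting with ":" are comments.
    """
    event_type = "message"  # default event type in the SSE format
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the event if it carried any data.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)


# Hypothetical stream: event names and payload fields are illustrative only.
sample_stream = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"progress": 50, "message": "Detecting PII"}\n',
    "\n",
    "event: complete\n",
    'data: {"progress": 100, "message": "Done"}\n',
    "\n",
]

events = list(parse_sse(sample_stream))
for event in events:
    payload = json.loads(event["data"])
    print(event["event"], payload["progress"], payload["message"])
```

In a real client, `lines` would come from a streaming HTTP response rather than an in-memory list; the parsing logic stays the same.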
How It Works
AnonDocs is a microservice that processes each request in five steps:
- Upload/Input: Documents or text are sent to the API endpoints
- Parsing: Files are parsed to extract text content (PDF, DOCX, TXT)
- Detection: The LLM analyzes the text to detect PII (names, emails, phone numbers, addresses, IDs)
- Anonymization: Detected PII is replaced with generic placeholders
- Output: Anonymized text is returned (DOCX structure is preserved, PDFs are converted to plain text)
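To make the Detection and Anonymization steps concrete, here is a minimal sketch of the replacement step. It assumes detection has already produced a list of PII strings with type labels. The `DetectedEntity` structure and the `apply_placeholders` function are hypothetical names, not AnonDocs internals, and the hard-coded entities stand in for LLM output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedEntity:
    """One PII finding (hypothetical shape, standing in for LLM output)."""
    text: str   # the exact substring found in the document
    label: str  # e.g. "NAME", "EMAIL", "PHONE"


def apply_placeholders(text: str, entities: List[DetectedEntity]) -> str:
    """Replace every occurrence of each detected value with [LABEL]."""
    # Replace longer values first so a short value that is a substring
    # of a longer one cannot split it.
    for entity in sorted(entities, key=lambda e: len(e.text), reverse=True):
        text = text.replace(entity.text, f"[{entity.label}]")
    return text


original = "Contact John Doe at john@example.com or call 555-1234"
detected = [
    DetectedEntity("John Doe", "NAME"),
    DetectedEntity("john@example.com", "EMAIL"),
    DetectedEntity("555-1234", "PHONE"),
]
anonymized = apply_placeholders(original, detected)
print(anonymized)  # Contact [NAME] at [EMAIL] or call [PHONE]
```

Plain string replacement is the simplest option and works on extracted text; preserving DOCX structure would require applying the same substitutions inside the document's text runs instead.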
Architecture Overview
Quick Start
Try It Online
The easiest way to try AnonDocs is through our web interface at anondocs.org/anonymize. Simply upload a document or paste text, and get instant anonymization results.
SDK Quick Example (Recommended)
import { AnonDocsClient } from '@aismarttalk/anondocs-sdk';

const client = new AnonDocsClient({
  baseUrl: 'http://localhost:3000'
});

const result = await client.anonymizeText(
  'Contact John Doe at john@example.com or call 555-1234'
);

console.log(result.anonymizedText);
// Output: Contact [NAME] at [EMAIL] or call [PHONE]
API Quick Example
import requests

# Anonymize text via the REST API
response = requests.post('http://localhost:3000/api/anonymize', json={
    'text': 'Contact John Doe at john@example.com or call 555-1234',
    'provider': 'ollama'
})
response.raise_for_status()  # stop on an HTTP error instead of failing on the JSON lookup

print(response.json()['data']['anonymizedText'])
# Output: Contact [NAME] at [EMAIL] or call [PHONE]
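For scripts that should not fail silently, it can help to separate building the request from reading the response. The sketch below uses only the Python standard library. It reuses the endpoint, the request fields, and the `data.anonymizedText` response path from the example above; the helper names, the timeout value, and the error handling are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "http://localhost:3000/api/anonymize"  # endpoint from the example above


def build_request(text: str, provider: str = "ollama") -> urllib.request.Request:
    """Build a JSON POST request for the anonymize endpoint."""
    body = json.dumps({"text": text, "provider": provider}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_anonymized_text(payload: Dict[str, Any]) -> str:
    """Return data.anonymizedText, or raise if the response lacks it."""
    try:
        return payload["data"]["anonymizedText"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {payload!r}") from exc


def anonymize(text: str, timeout: float = 60.0) -> str:
    """Send text to a running AnonDocs instance and return the result."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return extract_anonymized_text(json.load(resp))


# Offline check of the two pure helpers (no server needed).
request = build_request("Contact John Doe at john@example.com")
sample_response = {"data": {"anonymizedText": "Contact [NAME] at [EMAIL]"}}
print(request.get_method(), request.full_url)
print(extract_anonymized_text(sample_response))
```

With an instance running locally, `anonymize("some text")` performs the actual call. LLM-backed detection can be slow, so a generous timeout is a reasonable default.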
Self-Host (5 Minutes)
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Pull a model
ollama pull mistral-nemo
# 3. Clone AnonDocs and install dependencies
git clone https://github.com/AI-SmartTalk/AnonDocs.git
cd AnonDocs
npm install
# 4. Configure (create .env)
echo "DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo" > .env
# 5. Start
npm start
For detailed self-hosting instructions, see our Self-Hosting Guide.
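Before starting the server, a quick sanity check of the `.env` file can catch typos. The following sketch parses simple `KEY=value` lines and reports which of the three variables from step 4 are missing. Treating all three as required is an assumption, and the parser is deliberately minimal (no quoting or variable expansion); it is not part of AnonDocs.

```python
from typing import Dict, List

# The three variables written to .env in step 4 (assumed to be required).
REQUIRED_KEYS = ["DEFAULT_LLM_PROVIDER", "OLLAMA_BASE_URL", "OLLAMA_MODEL"]


def parse_env(content: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


env_text = """DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo"""

settings = parse_env(env_text)
print(missing_keys(settings))                    # []
print(missing_keys(parse_env("OLLAMA_MODEL=")))  # all three keys are reported
```

To check a real file, read it first, for example with `pathlib.Path(".env").read_text()`, and pass the result to `parse_env`.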
Use Cases
- 🏥 Healthcare: Anonymize patient records before sharing with researchers
- ⚖️ Legal: Redact sensitive information from legal documents for public disclosure
- 💼 HR: Process employee data while maintaining privacy
- 🏦 Finance: Sanitize financial documents for analysis
- 📊 Research: Share datasets without exposing personal information
- 🔐 Compliance: Support compliance with GDPR, HIPAA, and other privacy regulations
Privacy & Security
GDPR Compliance
✅ Data Never Leaves Your Infrastructure - All processing happens locally on your servers
✅ Zero Data Retention - Files are deleted immediately after processing; nothing is stored
✅ Open Source & Auditable - Review every line of code yourself
For more details, see our Privacy & Security documentation.
What's Next?
- 📦 SDK & API Reference - Use the TypeScript/JavaScript SDK or REST API
- 📚 Supported Formats - Learn about file format support
- 🚀 Self-Hosting Guide - Deploy your own instance
- 🔒 Privacy & Security - Understand our privacy guarantees
Getting Help
- 💬 GitHub Discussions - Ask questions and share ideas
- 🐛 GitHub Issues - Report bugs or request features
- 📖 GitHub Repository - View source code and contribute
AnonDocs - Protect Privacy, Share Knowledge. Open-source document anonymization by AI SmartTalk.