
Introduction

Welcome to AnonDocs, an open-source document anonymization tool designed to help you protect privacy while sharing knowledge. Proudly developed by AI SmartTalk, AnonDocs lets individuals and organizations remove sensitive information from documents before sharing them. This helps them comply with privacy regulations such as GDPR while keeping documents readable (for DOCX files, the original structure is preserved).

What is AnonDocs?

AnonDocs is a privacy-first, self-hostable microservice that uses AI to automatically detect and anonymize Personally Identifiable Information (PII) in documents. It supports multiple file formats (PDF, DOCX, TXT) and can process both uploaded files and raw text input.

Key Features

  • 🔒 Privacy-First: All processing happens on your own infrastructure; paired with a local LLM provider such as Ollama, no data ever leaves your control
  • 🤖 AI-Powered: Uses LLMs (via Ollama or OpenAI-compatible APIs) for intelligent PII detection
  • 📄 Multi-Format Support: Handles PDF, DOCX, and plain text files
  • Real-Time Progress: Server-Sent Events (SSE) deliver live progress updates during anonymization
  • 🌍 Open Source: Fully open source under the MIT license, transparent and auditable
  • 🚀 Self-Hostable: Deploy on your own infrastructure for maximum control and compliance
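Server-Sent Events arrive as a plain-text stream of `field: value` lines, with a blank line ending each event. The sketch below shows one way a Python client could group such a stream into events. It follows only the general SSE wire format (`event:` and `data:` fields, `:` comment lines, blank-line dispatch) and does not assume any particular AnonDocs endpoint or event payload; `parse_sse_events` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into events (hypothetical helper, not part of AnonDocs).

    Each yielded dict has an "event" key (defaulting to "message", as in the
    SSE format) and a "data" key holding the data lines joined with newlines.
    """
    event_type = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Other fields (id, retry) are ignored in this sketch.
    # An event left unterminated at end of stream is discarded, as the SSE format specifies.
```

With the `requests` library, the input could come from `response.iter_lines(decode_unicode=True)` on a streamed request; which URL your deployment exposes for progress events is not covered here.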

How It Works

AnonDocs follows a microservice architecture:

  1. Upload/Input: Documents or text are sent to the API endpoints
  2. Parsing: Files are parsed to extract text content (PDF, DOCX, TXT)
  3. Detection: An LLM analyzes the text to detect PII (names, email addresses, phone numbers, postal addresses, IDs)
  4. Anonymization: Detected PII is replaced with generic placeholders such as [NAME] or [EMAIL]
  5. Output: The anonymized text is returned (DOCX structure is preserved; PDFs are converted to plain text)
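Steps 3 and 4 can be sketched as follows, assuming the LLM is prompted to return its findings as a JSON list of `{"text": ..., "type": ...}` objects. That response format and the helper names `parse_detections` and `apply_placeholders` are illustrative assumptions, not AnonDocs' actual prompt or code; the placeholder style mirrors the `[NAME]`, `[EMAIL]`, `[PHONE]` output shown in the examples on this page.

```python
import json
from typing import List, Tuple


def parse_detections(llm_output: str) -> List[Tuple[str, str]]:
    """Parse a JSON list of {"text": ..., "type": ...} objects (assumed LLM output format)."""
    detections = []
    for item in json.loads(llm_output):
        text = str(item.get("text", "")).strip()
        kind = str(item.get("type", "")).strip().upper()
        if text and kind:  # skip incomplete entries instead of failing
            detections.append((text, kind))
    return detections


def apply_placeholders(text: str, detections: List[Tuple[str, str]]) -> str:
    """Replace every occurrence of each detected value with a [TYPE] placeholder."""
    # Longer values go first so "John Doe" is replaced before a bare "John".
    for value, kind in sorted(detections, key=lambda d: len(d[0]), reverse=True):
        text = text.replace(value, f"[{kind}]")
    return text
```

Replacing longest-first avoids leaving fragments behind when one detected value contains another.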

Quick Start

Try It Online

The easiest way to try AnonDocs is through our web interface at anondocs.org/anonymize. Simply upload a document or paste text, and get instant anonymization results.

SDK Quick Example

import { AnonDocsClient } from '@aismarttalk/anondocs-sdk';

// Point the client at your AnonDocs instance
const client = new AnonDocsClient({
  baseUrl: 'http://localhost:3000'
});

const result = await client.anonymizeText(
  'Contact John Doe at john@example.com or call 555-1234'
);

console.log(result.anonymizedText);
// Output: Contact [NAME] at [EMAIL] or call [PHONE]

API Quick Example

import requests

# Anonymize text
response = requests.post('http://localhost:3000/api/anonymize', json={
    'text': 'Contact John Doe at john@example.com or call 555-1234',
    'provider': 'ollama'
})
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body

print(response.json()['data']['anonymizedText'])
# Output: Contact [NAME] at [EMAIL] or call [PHONE]
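For use beyond a one-off script, the same call can be wrapped in a function that sets a timeout, surfaces HTTP errors, and checks the response shape. This sketch assumes only what the example above shows (the `/api/anonymize` endpoint, the `text` and `provider` fields, and a `data.anonymizedText` response field); the function name `anonymize_text` and the 120-second default timeout are hypothetical choices. Accepting an optional session object lets you reuse connections and test without a running server.

```python
from typing import Any, Optional


def anonymize_text(
    text: str,
    base_url: str = "http://localhost:3000",
    provider: str = "ollama",
    session: Optional[Any] = None,
    timeout: float = 120.0,
) -> str:
    """POST to /api/anonymize and return the anonymized text (hypothetical wrapper)."""
    if session is None:
        import requests  # imported lazily so a custom session can be injected
        session = requests.Session()
    response = session.post(
        f"{base_url.rstrip('/')}/api/anonymize",
        json={"text": text, "provider": provider},
        timeout=timeout,  # LLM-backed requests can be slow; never wait forever
    )
    response.raise_for_status()  # raise on 4xx/5xx responses
    payload = response.json()
    try:
        return payload["data"]["anonymizedText"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {payload!r}") from exc
```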

Self-Host (5 Minutes)

# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Pull a model
ollama pull mistral-nemo

# 3. Clone and start AnonDocs
git clone https://github.com/AI-SmartTalk/AnonDocs.git
cd AnonDocs
npm install

# 4. Configure (create .env)
echo "DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral-nemo" > .env

# 5. Start
npm start
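Before running `npm start`, it can help to sanity-check the `.env` file and confirm that Ollama has the configured model. The sketch below parses simple `KEY=value` lines (no quoting or variable expansion, unlike full dotenv parsers) and checks a model list in the shape returned by Ollama's `GET /api/tags` endpoint, i.e. `{"models": [{"name": "mistral-nemo:latest", ...}]}`. The helper names and the list of required keys are assumptions based on the three variables set above, not AnonDocs' own validation.

```python
from typing import Dict, List

# Assumed minimum configuration, taken from the .env created in step 4.
REQUIRED_KEYS = ("DEFAULT_LLM_PROVIDER", "OLLAMA_BASE_URL", "OLLAMA_MODEL")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


def model_available(model: str, tags_response: dict) -> bool:
    """Check a model name against an Ollama /api/tags response.

    Ollama lists pulled models with an explicit tag, so "mistral-nemo"
    is compared as "mistral-nemo:latest".
    """
    wanted = model if ":" in model else f"{model}:latest"
    names = {m.get("name") for m in tags_response.get("models", [])}
    return wanted in names
```

The tags response itself could be fetched with `requests.get(env["OLLAMA_BASE_URL"] + "/api/tags").json()`.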

For detailed self-hosting instructions, see our Self-Hosting Guide.

Use Cases

  • 🏥 Healthcare: Anonymize patient records before sharing with researchers
  • ⚖️ Legal: Redact sensitive information from legal documents for public disclosure
  • 💼 HR: Process employee data while maintaining privacy
  • 🏦 Finance: Sanitize financial documents for analysis
  • 📊 Research: Share datasets without exposing personal information
  • 🔐 Compliance: Help meet the requirements of GDPR, HIPAA, and other privacy regulations

Privacy & Security

GDPR Compliance

  • Data Never Leaves Your Infrastructure: All processing happens on your own servers (when paired with a local LLM provider such as Ollama)
  • Zero Data Retention: Files are deleted immediately after processing; nothing is stored
  • Open Source & Auditable: Review every line of code yourself
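A common way to enforce a no-retention policy like this is to write each upload to a temporary file and delete it in a `finally` block, so the file is removed even when processing fails. The Python sketch below illustrates that general pattern only; it is not taken from the AnonDocs codebase, and `process_upload` and its `handler` callback are hypothetical.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload(data: bytes, suffix: str, handler: Callable[[str], T]) -> T:
    """Write an upload to a temp file, run handler on its path, then always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return handler(path)
    finally:
        # Runs on success and on error, so no copy of the upload is left on disk.
        if os.path.exists(path):
            os.remove(path)
```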

For more details, see our Privacy & Security documentation.


AnonDocs: Protect Privacy, Share Knowledge. Open-source document anonymization by AI SmartTalk.