Technology
The DocZoom technology platform
for on-premises AI
A production-ready architecture that combines ingestion, hybrid retrieval, and AI agents on an open-source stack. Everything stays inside your infrastructure boundary.
ARCHITECTURE
6 Modular Layers
Fully on-premises architecture ensuring modularity, scalability, and maintainability.
FRONTEND LAYER
React + TypeScript
PWA-ready with Chat Interface, Document Viewer, Semantic Search
API GATEWAY
FastAPI + LangChain
REST API OpenAPI 3.0, JWT/SSO/SAML/MFA, LangGraph Search Agent
DOCUMENT PROCESSING
Rust Engine
High-performance async, 20+ formats, Semantic chunking, Queue Manager
AI ENGINE
PyTorch + vLLM
LLM, Embeddings, OCR, Reranker, NER - all local on GPU
DATA LAYER
pgvector + PostgreSQL
Vector DB, MeiliSearch full-text, Redis cache, Object Storage NFS
INFRASTRUCTURE
Docker + NVIDIA Runtime
GPU Passthrough CUDA, DGX Spark / ZGX with GB10 Superchip
AI ENGINE
Integrated AI Models
Exclusively open-source models running locally. Independence from external APIs, full data control.
LLM Chat & Reasoning
Nemotron-3-Nano-30B
Embeddings
BGE-M3
OCR
Surya
Reranker & NER
Llama-3.1-Nemotron-Nano-VL-8B
RAG ARCHITECTURE
Retrieval-Augmented Generation
Hybrid RAG architecture combining vector semantic search and full-text to maximize precision and recall.
Retrieval Pipeline
Query Embedding
BGE-M3 generates multilingual embedding of user query
Hybrid Search
Parallel search on pgvector (semantic) and MeiliSearch (full-text)
Reranking
Llama-3.1-Nemotron-8B reorders results by contextual relevance
Context Assembly
Top-K chunks assembled with metadata (file, page, score)
Generation Pipeline
Prompt Engineering
System prompt + context chunks + user query with chain-of-thought
LLM Inference
Nemotron-3-Nano-30B via vLLM with CUDA GPU acceleration
Citation Extraction
NER to identify citations and link to source documents
Response Streaming
Streaming output with token-by-token delivery via WebSocket
Multi-RAG Collections
Each collection is an isolated namespace with its own vector index. Supports tenant isolation for multi-client environments.
∞
Collections
HNSW
Index Algorithm
1536
Vector Dimensions
SECURITY & COMPLIANCE
Designed for Critical Environments
AES-256 at rest, TLS 1.3 in transit. SSO/SAML, LDAP/AD, MFA. Complete audit trail for GDPR, SOC2, ISO 27001.
Data Sovereignty
Data physically stays at customer premises, never on external servers.
Native GDPR
No extra-EU data transfers. Automatic regulatory compliance.
Air-Gap Ready
Works completely offline. Ideal for high-security environments.
Zero Vendor Lock-in
100% open-source stack with full portability and independence.
WHO IT'S FOR
Three profiles that get value quickly
Same technology foundation, different adoption paths based on business goals.
CTO / IT Manager
You need document and email AI without external API dependency, while keeping full platform governance.
Outcome: controlled technical rollout on a portable stack.
Compliance / Legal
You operate under strict constraints on data residency, auditability, and access segregation by team and client.
Outcome: simpler and more auditable compliance path.
Operations / Customer Care
You need to reduce time spent across documents and email, and standardize responses on real internal knowledge.
Outcome: faster operations with fewer manual escalations.
PROOF & BENCHMARK
Useful numbers, not just claims
Indicative metrics from real enterprise workloads for an initial technical evaluation.
< 500 ms
Semantic query latency
Top-20 retrieval on corpora up to 1M chunks
1000+ doc/min
Indexing throughput
Parallel pipeline on mixed documents (text + OCR)
99.9%
Service availability target
With redundant on-premises deployment
Metrics vary by hardware, document quality, and security policies. In the demo we share methodology, test dataset, and configuration.
TECHNICAL DOWNLOADS
Download technical materials
Leave your email to unlock practical resources for IT, architecture and compliance teams.
Architecture Vertical One-Pager
Single-slide technical summary for architecture review sessions.
Available after unlocking downloads.
Technology Deep-Dive
Detailed architecture material for CTO, platform and security teams.
Available after unlocking downloads.
Product & Platform Overview
End-to-end overview from ingestion to grounded AI answers with source traceability.
Available after unlocking downloads.
Ready for a technical evaluation?
Our team can arrange an architecture deep-dive and a POC in your environment.