Technology

The DocZoom technology platform
for on-premises AI

A production-ready architecture that combines ingestion, hybrid retrieval, and AI agents on an open-source stack. Everything stays inside your infrastructure boundary.

< 500 ms
Semantic Query
< 10 ms
Full-text Search
1000+ doc/min
Throughput
99.9%
Uptime

ARCHITECTURE

6 Modular Layers

Fully on-premises architecture ensuring modularity, scalability, and maintainability.

FRONTEND LAYER

React + TypeScript

PWA-ready with Chat Interface, Document Viewer, Semantic Search

API GATEWAY

FastAPI + LangChain

REST API (OpenAPI 3.0), JWT/SSO/SAML/MFA authentication, LangGraph search agent

DOCUMENT PROCESSING

Rust Engine

High-performance async processing, 20+ formats, semantic chunking, queue manager

AI ENGINE

PyTorch + vLLM

LLM, Embeddings, OCR, Reranker, NER - all local on GPU

DATA LAYER

pgvector + PostgreSQL

Vector DB, MeiliSearch full-text, Redis cache, object storage (NFS)

INFRASTRUCTURE

Docker + NVIDIA Runtime

GPU passthrough (CUDA), DGX Spark / ZGX with GB10 Superchip

AI ENGINE

Integrated AI Models

Exclusively open-source models running locally. Independence from external APIs, full data control.

LLM Chat & Reasoning

Nemotron-3-Nano-30B

30B (A3B) · NVIDIA License

Embeddings

BGE-M3

568M · MIT License

OCR

Surya

Apache 2.0

Reranker & NER

Llama-3.1-Nemotron-Nano-VL-8B

8B · NVIDIA License

RAG ARCHITECTURE

Retrieval-Augmented Generation

Hybrid RAG architecture that combines semantic vector search with full-text search to maximize precision and recall.

Retrieval Pipeline

1

Query Embedding

BGE-M3 generates a multilingual embedding of the user query

2

Hybrid Search

Parallel search on pgvector (semantic) and MeiliSearch (full-text)

3

Reranking

Llama-3.1-Nemotron-Nano-VL-8B reorders results by contextual relevance

4

Context Assembly

Top-K chunks assembled with metadata (file, page, score)
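
The page does not say how the semantic and full-text result lists are merged before reranking. As a minimal sketch, assuming reciprocal rank fusion (RRF) is an acceptable stand-in, the merge and context assembly steps could look like this. The chunk IDs, the `chunk_store` mapping, and both function names are hypothetical, and the reranking step is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one ranked list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (chunk_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def assemble_context(fused, chunk_store, top_k=3):
    """Attach metadata (file, page, score) to the top-K fused chunks."""
    return [
        {"id": chunk_id, "score": score, **chunk_store[chunk_id]}
        for chunk_id, score in fused[:top_k]
    ]


# Hypothetical result lists, standing in for pgvector and MeiliSearch output.
semantic_hits = ["c2", "c1", "c4"]
fulltext_hits = ["c1", "c3", "c2"]

# Hypothetical chunk metadata keyed by chunk ID.
chunk_store = {
    "c1": {"file": "policy.pdf", "page": 3},
    "c2": {"file": "policy.pdf", "page": 7},
    "c3": {"file": "faq.docx", "page": 1},
    "c4": {"file": "contract.pdf", "page": 12},
}

fused = reciprocal_rank_fusion([semantic_hits, fulltext_hits])
context = assemble_context(fused, chunk_store, top_k=3)
```

Chunks that appear in both lists accumulate two contributions, so they rise above chunks found by only one engine. That is the property a hybrid search relies on.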

Generation Pipeline

1

Prompt Engineering

System prompt + context chunks + user query with chain-of-thought

2

LLM Inference

Nemotron-3-Nano-30B via vLLM with CUDA GPU acceleration

3

Citation Extraction

NER identifies citations and links them to source documents

4

Response Streaming

Token-by-token output delivered over WebSocket
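
The prompt assembly and citation steps above can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the system prompt wording, the `[n]` citation convention, and both function names are hypothetical. A simple regular expression stands in for the NER-based citation extraction described above.

```python
import re

# Hypothetical system prompt; the real wording is not given on this page.
SYSTEM_PROMPT = (
    "Answer using only the numbered context below. "
    "Cite sources as [n]. Reason step by step before answering."
)


def build_prompt(query, chunks):
    """Combine the system prompt, numbered context chunks, and user query."""
    context_lines = [
        f"[{i}] ({chunk['file']}, p. {chunk['page']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    return "\n\n".join([
        SYSTEM_PROMPT,
        "Context:\n" + "\n".join(context_lines),
        f"Question: {query}",
    ])


def link_citations(answer, chunks):
    """Map [n] markers in the answer back to source files and pages.

    Regex matching is an assumption made for this sketch; it replaces
    the NER step and ignores markers that point outside the context.
    """
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [
        {"marker": n, "file": chunks[n - 1]["file"], "page": chunks[n - 1]["page"]}
        for n in cited
        if 1 <= n <= len(chunks)
    ]


# Hypothetical context chunks and model answer.
chunks = [
    {"file": "handbook.pdf", "page": 12, "text": "Remote work requires approval."},
    {"file": "handbook.pdf", "page": 15, "text": "Annual leave is 20 days."},
]
prompt = build_prompt("How many days of annual leave?", chunks)
citations = link_citations("Annual leave is 20 days [2].", chunks)
```

Numbering the chunks in the prompt gives the model a stable handle for each source, which keeps the link back to file and page mechanical.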

Multi-RAG Collections

Each collection is an isolated namespace with its own vector index. Supports tenant isolation for multi-client environments.

Collections

HNSW

Index Algorithm

1536

Vector Dimensions
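
One way to picture a collection with its own vector index is a dedicated pgvector table and HNSW index per collection. The sketch below only generates the SQL statements. The table-per-collection layout, the column names, the naming scheme, and the choice of cosine distance are all assumptions made for illustration; only HNSW and the 1536 dimensions come from this page.

```python
import re


def collection_ddl(collection, dimensions=1536):
    """Return SQL statements that create one table and HNSW index
    for a single collection (hypothetical layout)."""
    # Restrict names to a safe identifier pattern, since they are
    # interpolated into SQL text.
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,40}", collection):
        raise ValueError(f"invalid collection name: {collection!r}")
    table = f"chunks_{collection}"
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "file text NOT NULL, "
        "page integer, "
        "content text NOT NULL, "
        f"embedding vector({dimensions}) NOT NULL);",
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


# Hypothetical tenant collection.
statements = collection_ddl("tenant_a")
for statement in statements:
    print(statement)
```

Keeping each collection in its own table means a query for one tenant never touches another tenant's index, at the cost of more objects to manage.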

SECURITY & COMPLIANCE

Designed for Critical Environments

AES-256 at rest, TLS 1.3 in transit. SSO/SAML, LDAP/AD, MFA. Complete audit trail for GDPR, SOC2, ISO 27001.

Data Sovereignty

Data physically stays at customer premises, never on external servers.

Native GDPR

No data transfers outside the EU. Automatic regulatory compliance.

Air-Gap Ready

Works completely offline. Ideal for high-security environments.

Zero Vendor Lock-in

100% open-source stack with full portability and independence.

WHO IT'S FOR

Three profiles that get value quickly

Same technology foundation, different adoption paths based on business goals.

CTO / IT Manager

You need document and email AI without external API dependency, while keeping full platform governance.

Outcome: controlled technical rollout on a portable stack.

Compliance / Legal

You operate under strict constraints on data residency, auditability, and access segregation by team and client.

Outcome: a simpler, more auditable compliance path.

Operations / Customer Care

You need to reduce time spent across documents and email, and standardize responses on real internal knowledge.

Outcome: faster operations with fewer manual escalations.

PROOF & BENCHMARK

Useful numbers, not just claims

Indicative metrics from real enterprise workloads for an initial technical evaluation.

< 500 ms

Semantic query latency

Top-20 retrieval on corpora up to 1M chunks

1000+ doc/min

Indexing throughput

Parallel pipeline on mixed documents (text + OCR)

99.9%

Service availability target

With redundant on-premises deployment

Metrics vary by hardware, document quality, and security policies. In the demo we share methodology, test dataset, and configuration.
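
To put the headline figures in planning terms, the short sketch below converts an availability target into a yearly downtime budget and an indexing rate into total indexing time. It is plain arithmetic on the numbers above; the function names and the one-million-document corpus are hypothetical.

```python
def downtime_budget_hours(availability_pct, period_hours=365 * 24):
    """Hours of downtime allowed per period at a given availability target."""
    return period_hours * (1 - availability_pct / 100)


def indexing_hours(documents, docs_per_minute=1000):
    """Hours needed to index a corpus at a sustained documents-per-minute rate."""
    return documents / docs_per_minute / 60


# 99.9% over a 365-day year leaves about 8.76 hours of downtime.
yearly_downtime = downtime_budget_hours(99.9)

# A hypothetical 1,000,000-document corpus at 1000 doc/min takes about 16.7 hours.
initial_load = indexing_hours(1_000_000)

print(f"Downtime budget: {yearly_downtime:.2f} h/year")
print(f"Initial indexing: {initial_load:.1f} h")
```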

TECHNICAL DOWNLOADS

Download technical materials

Leave your email to unlock practical resources for IT, architecture and compliance teams.

By submitting, you agree to be contacted by the DocZoom team.

Architecture Vertical One-Pager

Single-slide technical summary for architecture review sessions.

Available after unlocking downloads.

Technology Deep-Dive

Detailed architecture material for CTO, platform and security teams.

Product & Platform Overview

End-to-end overview from ingestion to grounded AI answers with source traceability.

Ready for a technical evaluation?

Our team can arrange an architecture deep-dive and a POC in your environment.