A complete technical walkthrough of how documents move from upload to cited answer — without ever leaving your server.
Every step runs locally. No data is transmitted to any external server, API, or cloud service. The entire pipeline executes within a single Docker Compose deployment on your hardware.
Documents are uploaded through the Rendex web interface via a standard HTTPS multipart form. Each document is assigned to a matter (case/project) and stored on the local filesystem.
| Parameter | Value |
|---|---|
| Endpoint | POST /api/documents/upload/:matterId |
| Max file size | 50 MB |
| Storage location | /app/uploads/ (Docker volume) |
| File naming | UUID + original extension |
| Format | MIME Type | Extraction Method |
|---|---|---|
| PDF | application/pdf | pdf-parse — extracts embedded text layer |
| DOCX | application/vnd.openxmlformats-... | mammoth — raw text extraction |
| DOC | application/msword | mammoth — raw text extraction |
| TXT | text/plain | UTF-8 buffer read |
1. File is saved to the local /app/uploads/ volume.
2. A database record is created with status processing.
3. An audit log entry is written (user, document, timestamp, IP).
4. Asynchronous background processing begins immediately.
The system detects the MIME type and dispatches to the appropriate parser. All extraction happens in-process using Node.js libraries — no external services.
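The dispatch step can be sketched as a switch on the detected MIME type. The parser names match the format table above; the function shape itself is illustrative, not Rendex's actual code.

```typescript
// Illustrative MIME-type dispatch matching the format table above.
type Parser = "pdf-parse" | "mammoth" | "utf8";

function resolveParser(mimeType: string): Parser {
  switch (mimeType) {
    case "application/pdf":
      return "pdf-parse"; // embedded text layer; OCR fallback if empty
    case "application/msword":
      return "mammoth"; // legacy .doc
    case "text/plain":
      return "utf8"; // plain UTF-8 buffer read
    default:
      // DOCX uses the long application/vnd.openxmlformats-... MIME prefix.
      if (mimeType.startsWith("application/vnd.openxmlformats-")) {
        return "mammoth";
      }
      throw new Error(`Unsupported MIME type: ${mimeType}`);
  }
}
```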
Scanned PDFs: If a PDF contains no extractable text layer (i.e., it is a scanned image), the system falls back to OCR to recover text from the page images.
The extracted text is a single UTF-8 string containing the full document content, which is then passed to the chunking stage.
The extracted text is split into overlapping chunks. Each chunk becomes a separate vector in the database. The overlap ensures that information spanning a chunk boundary is captured in at least one chunk.
Each chunk retains its character position in the original document (start/end offsets), enabling the UI to highlight the exact source passage when a citation is clicked.
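A minimal sketch of overlapping chunking with character offsets follows. The chunk size and overlap values are illustrative defaults, not Rendex's actual configuration.

```typescript
// Illustrative overlapping chunker. Each chunk records its start/end
// character offsets in the original document, as described above.
interface Chunk {
  text: string;
  start: number; // character offset in the original document
  end: number;   // exclusive end offset
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // windows advance by (size - overlap)
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ text: text.slice(start, end), start, end });
    if (end === text.length) break;
  }
  return chunks;
}
```

Because each window starts `overlap` characters before the previous one ends, any passage shorter than the overlap is guaranteed to appear whole in at least one chunk.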
Each chunk is converted into a 768-dimensional vector using a local embedding model running on your GPU. This is the mathematical representation that enables semantic search.
No external API calls. The embedding model runs entirely on your GPU via Ollama. The vectors never leave the machine. nomic-embed-text is downloaded once during installation and persists in a Docker volume.
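The embedding call can be sketched against Ollama's local HTTP API (its `/api/embeddings` endpoint, on the default port 11434 listed in the network table below). The surrounding function names and error handling are illustrative.

```typescript
// Illustrative local embedding call. The request shape ({ model, prompt })
// follows Ollama's /api/embeddings endpoint; the URL is the default
// loopback address, so nothing leaves the machine.
const OLLAMA_URL = "http://127.0.0.1:11434";

function buildEmbedRequest(text: string): { model: string; prompt: string } {
  return { model: "nomic-embed-text", prompt: text };
}

async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // 768 dimensions for nomic-embed-text
}
```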
The embedding vector and its metadata are stored in Qdrant, an open-source vector database running locally. Each chunk becomes a "point" in the collection.
Qdrant maintains payload indexes on matter_id and document_id for fast filtered queries. This is how ethical walls are enforced at the retrieval layer — queries only search vectors belonging to matters the user has permission to access.
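A stored chunk can be sketched as a Qdrant point like the one below. The payload field names (`matter_id`, `document_id`, offsets) follow the description above; the ID scheme and helper name are illustrative.

```typescript
// Illustrative shape of a chunk as a Qdrant point: the vector plus a
// payload carrying the fields the payload indexes above rely on.
interface QdrantPoint {
  id: string;
  vector: number[]; // 768-dim embedding from the previous stage
  payload: {
    matter_id: string;
    document_id: string;
    chunk_index: number;
    start: number; // character offsets, for citation highlighting
    end: number;
  };
}

function buildPoint(
  id: string,
  vector: number[],
  matterId: string,
  documentId: string,
  chunkIndex: number,
  start: number,
  end: number
): QdrantPoint {
  return {
    id,
    vector,
    payload: { matter_id: matterId, document_id: documentId, chunk_index: chunkIndex, start, end },
  };
}
```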
The document's database record is updated to status indexed, the chunk_count is set, and the indexed_at timestamp is recorded. If any step fails, the status becomes failed with a stored error message.
When a user asks a question, Rendex converts it into a vector, searches for the most relevant chunks, builds a context window, and generates a cited answer using a local language model.
| # | Action | Detail |
|---|---|---|
| A | Resolve permissions | Determine which matters the user can access based on their role and matter-level permissions. |
| B | Embed the question | Convert the user's natural-language question into a 768-dim vector using nomic-embed-text. |
| C | Vector search | Query Qdrant for the top 8 most similar chunks, filtered by the user's accessible matter IDs. Similarity metric: cosine. |
| D | Build context | Assemble the 8 retrieved chunks into a numbered context block with source labels. |
| E | Generate answer | Send the context + question to the local LLM (Llama 3 via Ollama) with instructions to cite sources using [Source N] notation. |
| F | Return with citations | The API returns the answer, citation metadata (document name, matter, page, excerpt, similarity score), and confidence metrics. |
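Step D above, context assembly, can be sketched as a small pure function: each retrieved chunk becomes a numbered block the LLM can cite as [Source N]. The field names are illustrative.

```typescript
// Illustrative context-assembly step: retrieved chunks become a numbered
// context block with source labels, matching row D of the table above.
interface RetrievedChunk {
  documentName: string;
  text: string;
  score: number; // cosine similarity from the vector search
}

function buildContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c, i) => `[Source ${i + 1}] (${c.documentName})\n${c.text}`)
    .join("\n\n");
}
```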
Queries are filtered at the vector layer. The Qdrant search includes a matter_id filter that restricts results to only the matters the user has been granted access to. A user on Matter A cannot retrieve chunks from Matter B, even if the content is semantically similar. This is enforced at the database level, not the UI level.
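The filtered search can be sketched as the body of a Qdrant `POST /collections/{name}/points/search` request; the `matter_id` filter is the ethical wall. The helper name is illustrative, and the limit of 8 matches the pipeline table above.

```typescript
// Illustrative Qdrant search body: the filter restricts results to the
// matter IDs this user may access, enforced at the database level.
function buildSearchBody(queryVector: number[], accessibleMatterIds: string[]) {
  return {
    vector: queryVector,
    limit: 8,
    with_payload: true,
    filter: {
      must: [{ key: "matter_id", match: { any: accessibleMatterIds } }],
    },
  };
}
```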
Every answer includes a confidence signal derived from the cosine similarity scores of the retrieved chunks.
When the highest similarity score is below the configured threshold, the UI displays a warning banner indicating that the answer may be less reliable. This signals to the attorney that the available documents may not contain a strong match for their question.
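The confidence signal can be sketched as a comparison of the highest retrieved similarity score against the configured threshold. The threshold value and function name here are illustrative.

```typescript
// Illustrative confidence check: the maximum cosine similarity among the
// retrieved chunks, compared against a configurable threshold.
function assessConfidence(
  scores: number[],
  threshold = 0.5 // illustrative default, not Rendex's configured value
): { maxScore: number; lowConfidence: boolean } {
  const maxScore = scores.length ? Math.max(...scores) : 0;
  return { maxScore, lowConfidence: maxScore < threshold };
}
```

When `lowConfidence` is true, the UI shows the warning banner described above.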
When a document is deleted, Rendex removes it from three places simultaneously:
1. File system — the original file is deleted from /app/uploads/.
2. Vector database — all Qdrant points with the matching document_id are purged.
3. PostgreSQL — the document record is removed (cascading to related entries).
No residual knowledge. Because the AI model is pre-trained and never fine-tuned on your documents, deleting a document fully removes it from the system. There are no shadow copies, cached embeddings, or residual model weights.
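The vector-database step of the three-way deletion can be sketched via Qdrant's delete-by-filter request (`POST /collections/{name}/points/delete`); the other two steps are noted as comments. The helper name is illustrative.

```typescript
// Illustrative Qdrant delete-by-filter body: purges every point whose
// payload carries the deleted document's ID.
function buildQdrantDeleteBody(documentId: string) {
  return {
    filter: { must: [{ key: "document_id", match: { value: documentId } }] },
  };
}

// The full three-way deletion described above:
// 1. File system  — unlink the stored file under /app/uploads/
// 2. Vector DB    — POST the body above to /collections/{collection}/points/delete
// 3. PostgreSQL   — delete the document row (cascades to related entries)
```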
Every action in the pipeline is logged to an immutable, append-only audit table in PostgreSQL. The table has a database trigger that prevents all UPDATE and DELETE operations.
| Field | Description |
|---|---|
| user_id | Who performed the action |
| action | What happened (upload, query, delete, login, permission change) |
| resource_type | What was affected (document, matter, user) |
| details | Structured JSON with full context (query text, document name, etc.) |
| ip_address | Client IP address |
| created_at | Timestamp (UTC, immutable) |
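An audit row matching the field table above can be sketched as a typed record; the `details` column carries structured JSON. The builder function is illustrative, not Rendex's actual code.

```typescript
// Illustrative audit entry matching the field table above.
interface AuditEntry {
  user_id: string;
  action: "upload" | "query" | "delete" | "login" | "permission_change";
  resource_type: "document" | "matter" | "user";
  details: Record<string, unknown>; // structured JSON with full context
  ip_address: string;
  created_at: string; // UTC ISO timestamp, immutable once written
}

function buildAuditEntry(
  userId: string,
  action: AuditEntry["action"],
  resourceType: AuditEntry["resource_type"],
  details: Record<string, unknown>,
  ip: string
): AuditEntry {
  return {
    user_id: userId,
    action,
    resource_type: resourceType,
    details,
    ip_address: ip,
    created_at: new Date().toISOString(),
  };
}
```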
| Service | Role |
|---|---|
| Ollama | LLM + embeddings (GPU) |
| Qdrant | Vector database |
| PostgreSQL | Auth, RBAC, audit log |
| Chat UI | Express.js web app |
| Nginx | TLS + reverse proxy |
| Network rule | Value |
|---|---|
| Open ports | 80, 443 |
| Outbound egress | None |
| Internal only | 5432, 6333, 11434 |
| PostgreSQL | 127.0.0.1 only |
Questions? Need a formal security questionnaire completed?
Contact us at info@rendex.ai — we'll turn it around in 48 hours.