Documents
A Document represents a file or text you upload to a knowledge base. Once uploaded, Ragex automatically processes it through the ingestion pipeline: parse, chunk, embed, and index.
Supported File Types
Rich Documents
These files are processed through an advanced AI parser that extracts structured text, preserving layout, tables, and headings. The page count is the actual number of pages measured after processing.
| Extension | Format |
|---|---|
| .pdf | PDF documents |
| .docx | Microsoft Word |
| .pptx | Microsoft PowerPoint |
| .xlsx | Microsoft Excel |
| .png | PNG images (OCR) |
| .jpg / .jpeg | JPEG images (OCR) |
| .webp | WebP images (OCR) |
| .tiff | TIFF images (OCR) |
Text-Based Files
These formats are ingested directly: no processing is needed beyond chunking and embedding. Pages are estimated from file size at ~2,400 bytes per page.
| Extension | Format |
|---|---|
| .txt | Plain text |
| .md | Markdown |
| .html / .htm | HTML |
| .csv | Comma-separated values |
| .tsv | Tab-separated values |
| .json | JSON |
All file types count toward your plan’s pages_processed limit.
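As a rough illustration, the ~2,400-bytes-per-page estimate for text-based files can be sketched as below. The exact rounding behavior (rounding up, minimum of one page) is an assumption for illustration, not documented behavior.

```python
import math

BYTES_PER_PAGE = 2_400  # documented estimate for text-based files

def estimate_pages(file_size_bytes: int) -> int:
    """Estimate the page count of a text-based file from its size.

    Assumes the count rounds up and is never below 1 page.
    """
    return max(1, math.ceil(file_size_bytes / BYTES_PER_PAGE))
```

This can help predict how an upload will count against the pages_processed limit before sending it.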
Limits
| Limit | Value |
|---|---|
| Max file size | 50 MB |
| Max pages per document | 500 |
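A client-side pre-upload check against these limits might look like the following sketch. Whether "50 MB" means 50 × 1024² bytes or 50 × 10⁶ bytes is an assumption here (binary megabytes are used); check against the API's actual rejection behavior.

```python
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB (binary interpretation assumed)
MAX_PAGES = 500

def check_upload(file_size_bytes: int, page_count: int) -> list[str]:
    """Return a list of limit violations; empty means the upload is allowed."""
    errors = []
    if file_size_bytes > MAX_FILE_SIZE:
        errors.append("file exceeds 50 MB")
    if page_count > MAX_PAGES:
        errors.append("document exceeds 500 pages")
    return errors
```

Validating locally avoids a round trip for uploads that the API would reject anyway.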
Processing Lifecycle
Every document moves through these statuses:

pending → parsing → chunking → embedding → ready
                                         ↘ failed

| Status | Meaning |
|---|---|
| pending | Queued for processing |
| parsing | File content is being extracted |
| chunking | Text is being split into chunks |
| embedding | Chunks are being embedded into vectors |
| ready | Document is searchable |
| failed | Processing failed — check error_detail |
Poll the document status via GET /v1/knowledge-bases/:kb_id/documents/:doc_id until it reaches ready or failed.
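A minimal polling loop is sketched below. The fetch_status callable is a placeholder for whatever HTTP client you use to GET the document endpoint and pull out its status field; the interval and timeout values are arbitrary choices, not documented recommendations.

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}

def wait_for_document(fetch_status, interval_s: float = 2.0,
                      timeout_s: float = 300.0) -> str:
    """Poll until the document reaches a terminal status.

    fetch_status: callable returning the current status string, e.g. by
    GETting /v1/knowledge-bases/:kb_id/documents/:doc_id with your HTTP
    client and reading the status field of the response.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("document did not reach ready/failed in time")
```

Injecting the fetch function keeps the loop easy to test and independent of any particular HTTP library.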
Text Ingestion
For content that’s already text (scraped pages, generated content, API responses), use the text ingestion endpoint instead of file upload:
POST /v1/knowledge-bases/:kb_id/documents/text

```json
{
  "text": "Your text content here...",
  "name": "optional-filename.txt",
  "metadata": { "source": "scraper" }
}
```

The text is stored as a .txt file and processed through the same chunking → embedding → indexing pipeline. The name and metadata fields are optional.
Document Metadata
Each document supports arbitrary JSON metadata. This metadata is:
- Stored with the document
- Returned in search results as document_metadata
- Filterable at query time using filter operators
```json
{
  "department": "engineering",
  "version": 2,
  "language": "en",
  "author": "jane@example.com"
}
```

Set metadata at upload time via the metadata form field (file upload) or JSON body field (text ingestion). Update it later via PATCH /v1/knowledge-bases/:kb_id/documents/:doc_id.
See Search & Filtering for how to filter search results by metadata.
Document Replacement
To replace a document’s content, use PUT /v1/knowledge-bases/:kb_id/documents/:doc_id. This re-processes the document through the full pipeline, replacing all existing chunks.