Documents

A Document represents a file or text you upload to a knowledge base. Once uploaded, Ragex automatically processes it through the ingestion pipeline: parse, chunk, embed, and index.

These formats are processed by an advanced AI parser that extracts structured text while preserving layout, tables, and headings. Page counts reflect the actual number of pages after parsing.

Extension       Format
.pdf            PDF documents
.docx           Microsoft Word
.pptx           Microsoft PowerPoint
.xlsx           Microsoft Excel
.png            PNG images (OCR)
.jpg / .jpeg    JPEG images (OCR)
.webp           WebP images (OCR)
.tiff           TIFF images (OCR)

These text-based formats are ingested directly — no additional processing is needed beyond chunking and embedding. Pages are estimated from file size at ~2,400 bytes per page.

Extension       Format
.txt            Plain text
.md             Markdown
.html / .htm    HTML
.csv            Comma-separated values
.tsv            Tab-separated values
.json           JSON
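The ~2,400-bytes-per-page estimate for these text formats can be sketched as follows. The exact rounding behavior and the minimum of one page are assumptions, not documented guarantees:

```python
import math

BYTES_PER_PAGE = 2_400  # approximate bytes per estimated page for text formats


def estimate_pages(size_bytes: int) -> int:
    """Estimate the page count of a text-based file from its size in bytes.

    Assumes the service rounds up and counts at least one page.
    """
    return max(1, math.ceil(size_bytes / BYTES_PER_PAGE))
```

For example, a 5,000-byte Markdown file would be counted as roughly 3 pages under this estimate.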

All file types count toward your plan’s pages_processed limit.

Limit                     Value
Max file size             50 MB
Max pages per document    500
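A quick client-side size check before uploading can save a failed request. Whether the 50 MB limit is binary (50 × 2²⁰ bytes) or decimal megabytes is an assumption here; binary is used below:

```python
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB — binary interpretation is an assumption


def within_upload_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is within the documented max file size."""
    return size_bytes <= MAX_FILE_SIZE
```

Pair this with `os.path.getsize(path)` before attempting an upload. The 500-page limit cannot be checked client-side for binary formats, since pages are counted after parsing.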

Every document moves through these statuses:

pending → parsing → chunking → embedding → ready
                                         ↘ failed
Status       Meaning
pending      Queued for processing
parsing      File content is being extracted
chunking     Text is being split into chunks
embedding    Chunks are being embedded into vectors
ready        Document is searchable
failed       Processing failed — check error_detail

Poll the document status via GET /v1/knowledge-bases/:kb_id/documents/:doc_id until it reaches ready or failed.
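A minimal polling loop might look like this. The status fetcher is left abstract (any wrapper around GET /v1/knowledge-bases/:kb_id/documents/:doc_id that returns the status string will do); the interval and timeout values are illustrative:

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}


def poll_document(fetch_status, interval: float = 2.0, timeout: float = 300.0) -> str:
    """Poll until the document reaches a terminal status.

    `fetch_status` is any callable returning the current status string,
    e.g. a thin wrapper around the GET document endpoint.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"document still {status!r} after {timeout}s")
        time.sleep(interval)
```

A fixed interval keeps the sketch simple; exponential backoff is a reasonable refinement for long-running documents.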

For content that’s already text (scraped pages, generated content, API responses), use the text ingestion endpoint instead of file upload:

POST /v1/knowledge-bases/:kb_id/documents/text

{
  "text": "Your text content here...",
  "name": "optional-filename.txt",
  "metadata": { "source": "scraper" }
}

The text is stored as a .txt file and processed through the same chunking → embedding → indexing pipeline. The name and metadata fields are optional.
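Building that request can be sketched with the standard library alone. The base URL and the Bearer auth scheme are assumptions (substitute whatever your deployment uses); the endpoint path and body fields come from the docs above:

```python
import json
from urllib import request

API_BASE = "https://api.example.com"  # placeholder — replace with your API host


def ingest_text(kb_id: str, text: str, name=None, metadata=None, api_key="..."):
    """Build a POST request for the text ingestion endpoint.

    `name` and `metadata` are optional and omitted from the body when unset.
    Pass the returned object to urllib.request.urlopen() to send it.
    """
    body = {"text": text}
    if name is not None:
        body["name"] = name
    if metadata is not None:
        body["metadata"] = metadata
    return request.Request(
        f"{API_BASE}/v1/knowledge-bases/{kb_id}/documents/text",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
        method="POST",
    )
```

Omitting `name` lets the service pick a filename for the stored .txt file.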

Each document supports arbitrary JSON metadata. This metadata is:

  • Stored with the document
  • Returned in search results as document_metadata
  • Filterable at query time using filter operators
{
  "department": "engineering",
  "version": 2,
  "language": "en",
  "author": "jane@example.com"
}

Set metadata at upload time via the metadata form field (file upload) or JSON body field (text ingestion). Update it later via PATCH /v1/knowledge-bases/:kb_id/documents/:doc_id.

See Search & Filtering for how to filter search results by metadata.

To replace a document’s content, use PUT /v1/knowledge-bases/:kb_id/documents/:doc_id. This re-processes the document through the full pipeline, replacing all existing chunks.
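Because a replaced document re-enters the pipeline, its chunks are unavailable for search until it reaches ready again, so a replace is naturally followed by a status wait. A sketch with both calls left abstract (`send_replace` wraps the PUT above, `fetch_status` wraps the GET):

```python
import time


def replace_and_wait(send_replace, fetch_status, interval: float = 2.0) -> str:
    """Replace a document's content, then block until re-processing finishes.

    `send_replace` wraps PUT /v1/knowledge-bases/:kb_id/documents/:doc_id;
    `fetch_status` wraps the corresponding GET. Returns the terminal status.
    """
    send_replace()
    while True:
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        time.sleep(interval)
```

A production version would add a timeout, as in the polling example earlier in this page.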