
MultiAgentEval


📄 Unstructured Data Guide

High-fidelity extraction from non-tabular sources (PDF, Doc, Web) using LLM-gated scrapers and OCR pipelines. This sector lets the engine process raw natural-language source material.
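The page does not spell out how the LLM gate works. One plausible reading, shown here as a minimal sketch under stated assumptions, is a filter that scores each scraped chunk and keeps it only if the score clears a threshold. `Candidate`, `gate_candidates`, `relevance_fn`, `threshold`, and `keyword_scorer` are hypothetical names, and the keyword scorer is a toy stand-in for whatever LLM call the real fleet makes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    """A raw chunk of text pulled from a web page, PDF, or document (hypothetical type)."""
    source_uri: str
    text: str


def gate_candidates(
    candidates: Iterable[Candidate],
    relevance_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Candidate]:
    """Keep only candidates whose relevance score is at or above the threshold.

    Assumption: `relevance_fn` stands in for the LLM judge. A real version would
    prompt a model and parse a score in [0, 1]; here it is any callable.
    """
    kept = []
    for cand in candidates:
        score = relevance_fn(cand.text)
        # Reject malformed judge output instead of silently accepting it.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {cand.source_uri}: {score}")
        if score >= threshold:
            kept.append(cand)
    return kept


def keyword_scorer(text: str) -> float:
    """Toy stand-in for an LLM judge: fraction of target keywords present."""
    keywords = ("revenue", "quarter", "growth")
    lowered = text.lower()
    return sum(k in lowered for k in keywords) / len(keywords)


if __name__ == "__main__":
    docs = [
        Candidate("https://example.com/report", "Revenue grew 12% this quarter."),
        Candidate("https://example.com/cookies", "We use cookies to improve your experience."),
    ]
    # Only the report chunk scores above 0.5 with the toy scorer.
    for c in gate_candidates(docs, keyword_scorer, threshold=0.5):
        print(c.source_uri)
```

Keeping the scorer as a plain callable means a real model client could be swapped in without touching the filtering logic.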

Status: HARDENED

Architecture: LLM-Gated Scraper Fleet.
Verification: 100% Parity.

Data Sources

Web-based Acquisition: Universal scraper fleet (API-compatible).
PDF/OCR: Textual extraction from institutional reports and documentation.
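How a source is assigned to one of these two acquisition paths is not documented. As an illustration only, a routing rule might send `.pdf` files (local or remote) to the PDF/OCR pipeline and other `http(s)` URLs to the web scraper fleet. The function name `route_source` and the pipeline labels `"web"` and `"pdf_ocr"` are hypothetical, and Doc sources are left unhandled because the list above names no pipeline for them.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def route_source(source_uri: str) -> str:
    """Pick an extraction pipeline for a source URI (hypothetical routing rule).

    - anything whose path ends in .pdf (URL or local path) -> "pdf_ocr"
    - any other http(s) URL -> "web"
    - everything else raises, since no pipeline is assumed for it
    """
    parsed = urlparse(source_uri)
    # Bare file paths have no scheme, so fall back to the raw string.
    path = parsed.path if parsed.scheme else source_uri
    if PurePosixPath(path).suffix.lower() == ".pdf":
        return "pdf_ocr"
    if parsed.scheme in ("http", "https"):
        return "web"
    raise ValueError(f"no pipeline registered for {source_uri!r}")


if __name__ == "__main__":
    for uri in ("https://example.com/news", "https://example.com/annual.pdf", "reports/q3.PDF"):
        print(uri, "->", route_source(uri))
```

Checking the file extension before the scheme means a PDF served over HTTPS still goes to the OCR pipeline rather than the HTML scraper.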

🛠️ Schema (`StandardSchema`)

`source_uri`: URL or file path.
`content_type`: `text`, `image`, or `hybrid`.
`metric`: `relevance_score`, `data_density`, or `sentiment`.
`value`: Numerical reading.
`integrity_hash`: SHA-256 hash of the raw source.
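A minimal sketch of one record that follows these five fields is shown here, assuming a Python dataclass representation. The class name `StandardRecord`, the helpers `make_record` and `verify`, and the validation rules are hypothetical; only the field names, the allowed `content_type` and `metric` values, and the use of SHA-256 come from the schema above. The hash is computed with the standard-library `hashlib.sha256` over the raw source bytes, so re-hashing the same bytes later confirms the source has not changed.

```python
import hashlib
from dataclasses import dataclass

# Allowed values taken from the schema description.
CONTENT_TYPES = {"text", "image", "hybrid"}
METRICS = {"relevance_score", "data_density", "sentiment"}
HEX_CHARS = set("0123456789abcdef")


@dataclass(frozen=True)
class StandardRecord:
    """One extracted reading; field names mirror the documented schema (class is hypothetical)."""
    source_uri: str
    content_type: str
    metric: str
    value: float
    integrity_hash: str

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content_type: {self.content_type!r}")
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")
        # A hex SHA-256 digest is exactly 64 lowercase hex characters.
        if len(self.integrity_hash) != 64 or not set(self.integrity_hash) <= HEX_CHARS:
            raise ValueError("integrity_hash must be a hex SHA-256 digest")


def make_record(source_uri: str, raw: bytes, content_type: str,
                metric: str, value: float) -> StandardRecord:
    """Build a record, hashing the raw source bytes with SHA-256."""
    digest = hashlib.sha256(raw).hexdigest()
    return StandardRecord(source_uri, content_type, metric, float(value), digest)


def verify(record: StandardRecord, raw: bytes) -> bool:
    """Re-hash the raw source and check it still matches the stored hash."""
    return hashlib.sha256(raw).hexdigest() == record.integrity_hash


if __name__ == "__main__":
    raw = b"Revenue grew 12% this quarter."
    rec = make_record("https://example.com/report", raw, "text", "sentiment", 0.8)
    print(rec.integrity_hash, verify(rec, raw))
```

Hashing the raw bytes rather than the extracted text ties each reading to the exact source snapshot, independent of how the scraper or OCR step later interprets it.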
