Metadata-Version: 2.4
Name: adx
Version: 0.1.0
Summary: ADX: Agentic Data Layer. Structured, citeable, inspectable document state for AI agents.
Project-URL: Homepage, https://github.com/harsh-nod/adx
Project-URL: Documentation, https://harsh-nod.github.io/adx/
Project-URL: Repository, https://github.com/harsh-nod/adx
Author: ADX Contributors
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent,ai,citation,document,docx,excel,extraction,pdf,rtf
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: aiosqlite<1.0,>=0.20
Requires-Dist: alembic<2.0,>=1.13
Requires-Dist: click<9.0,>=8.1
Requires-Dist: fastapi<1.0,>=0.110
Requires-Dist: httpx<1.0,>=0.27
Requires-Dist: openpyxl<4.0,>=3.1
Requires-Dist: pydantic<3.0,>=2.0
Requires-Dist: pymupdf<2.0,>=1.24
Requires-Dist: python-docx<2.0,>=1.1
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: rich<14.0,>=13.0
Requires-Dist: sqlalchemy<3.0,>=2.0
Requires-Dist: striprtf<1.0,>=0.0.26
Requires-Dist: uvicorn[standard]<1.0,>=0.29
Provides-Extra: dev
Requires-Dist: mypy<2.0,>=1.10; extra == 'dev'
Requires-Dist: pre-commit<4.0,>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio<1.0,>=0.23; extra == 'dev'
Requires-Dist: pytest-cov<6.0,>=5.0; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.0; extra == 'dev'
Requires-Dist: ruff<1.0,>=0.4; extra == 'dev'
Provides-Extra: llm
Requires-Dist: anthropic<1.0,>=0.30; extra == 'llm'
Requires-Dist: openai<2.0,>=1.30; extra == 'llm'
Provides-Extra: postgres
Requires-Dist: asyncpg<1.0,>=0.29; extra == 'postgres'
Requires-Dist: psycopg2-binary<3.0,>=2.9; extra == 'postgres'
Description-Content-Type: text/markdown

# ADX

ADX (Agentic Data Layer) is an agent-native document intelligence layer. It wraps established document parsers and exposes structured, citeable, inspectable document state to AI agents.

## What It Does

ADX is **not** a parser. It is an orchestration layer that:

1. **Wraps** parsers (PyMuPDF, openpyxl, python-docx, striprtf, and the stdlib `csv` module) behind a uniform interface
2. **Builds** a canonical `DocumentGraph` from any supported format
3. **Exposes** 9 read-only agent tools for inspection and search
4. **Extracts** fields using built-in schemas with field-level citations
5. **Validates** results with rule-based checks
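
The "uniform interface" idea in step 1 can be illustrated with a small registry that maps file extensions to parser functions returning one shared result type. This is only a sketch under stated assumptions: `ParsedDocument`, `PARSERS`, `parse_csv`, and `parse` are hypothetical names invented for illustration and are not ADX's actual classes or internals.

```python
import csv
import io
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class ParsedDocument:
    """Hypothetical stand-in for a canonical document representation."""
    source_format: str
    tables: list[list[list[str]]] = field(default_factory=list)


def parse_csv(raw: bytes) -> ParsedDocument:
    # Decode the bytes and read every row with the stdlib csv reader.
    rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
    return ParsedDocument(source_format="csv", tables=[rows])


# Registry: one callable per extension, all sharing the same signature.
PARSERS: dict[str, Callable[[bytes], ParsedDocument]] = {".csv": parse_csv}


def parse(filename: str, raw: bytes) -> ParsedDocument:
    """Dispatch to the parser registered for the file's extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported format: {suffix!r}")
    return PARSERS[suffix](raw)


doc = parse("data.csv", b"a,b\n1,2\n")
```

Callers only ever see `ParsedDocument`, so supporting another format in this sketch means registering one more function rather than changing the calling code.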

## Install

```bash
pip install adx
```

## Quick Start

```python
from adx import ADX

dn = ADX()
doc_id = dn.upload("invoice.pdf")

# Profile the document
profile = dn.profile(doc_id)

# Extract with a built-in schema
extraction = dn.extract(doc_id, schema="invoice")

# Validate
result = dn.validate(doc_id, extraction["id"])
```

## Agent Tools

| Tool | Description |
|---|---|
| `profile_document` | File metadata, type detection, recommended tools |
| `list_structure` | Sections, tables, page outline |
| `search_document` | Full-text search with citations |
| `get_page` | Page text blocks and tables |
| `get_table` | Table rows with headers |
| `list_sheets` | Spreadsheet sheet metadata |
| `read_range` | Cell range with formulas |
| `find_cells` | Search cells by value |
| `inspect_formula` | Trace formula dependencies |

## Built-in Schemas

- `invoice` — vendor, line items, totals, tax
- `contract` — parties, dates, governing law
- `financial_model` — revenue, expenses, assumptions
- `table` — generic table extraction

## REST API

```bash
# Start the API server
adx serve
# Upload a file
curl -X POST http://localhost:8000/v1/files -F "file=@invoice.pdf"
# Fetch its profile (replace {id} with the file's ID)
curl http://localhost:8000/v1/files/{id}/profile
```
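
The profile request can also be issued from Python. The sketch below uses only the standard library. It assumes the server started by `adx serve` is listening on `localhost:8000`, as in the curl example, and that the endpoint returns JSON. `profile_url` and `fetch_profile` are hypothetical helper names, not part of ADX.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl example


def profile_url(file_id: str, base_url: str = BASE_URL) -> str:
    """Build the profile endpoint URL for a given file ID."""
    safe_id = urllib.parse.quote(file_id, safe="")
    return f"{base_url}/v1/files/{safe_id}/profile"


def fetch_profile(file_id: str, base_url: str = BASE_URL) -> dict:
    """GET the profile; assumes the response body is a JSON object."""
    with urllib.request.urlopen(profile_url(file_id, base_url)) as resp:
        return json.load(resp)
```

`fetch_profile` needs a running server, so only call it after `adx serve` is up and a file has been uploaded.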

## CLI

```bash
adx upload invoice.pdf
adx profile <id>
adx extract <id> --schema invoice
adx validate <id> --extraction <eid>
```

## Supported Formats

| Format | Parser | Features |
|---|---|---|
| PDF | PyMuPDF | Text blocks, tables, bounding boxes, sections |
| Excel (.xlsx) | openpyxl | Sheets, formulas, hidden content, named ranges |
| DOCX | python-docx | Paragraphs, headings, tables, images, metadata |
| RTF | striprtf | Text extraction with formatting stripped |
| CSV | stdlib | Dialect sniffing, encoding detection |
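
As an illustration of what "dialect sniffing" means for CSV input, the standard library's `csv.Sniffer` can guess the delimiter from a sample of the text. This is a minimal sketch of the general technique, not necessarily how ADX implements it; `read_csv_rows` is a hypothetical helper name.

```python
import csv
import io


def read_csv_rows(text: str) -> tuple[str, list[list[str]]]:
    """Guess the delimiter from a sample, then parse all rows with it."""
    # Restrict candidates to common delimiters to make the guess more robust.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


delimiter, rows = read_csv_rows("name;amount\nwidget;10\ngadget;20\n")
```

`csv.Sniffer.sniff` raises `csv.Error` when it cannot determine a delimiter, so real code would typically fall back to a default dialect in that case.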

## Documentation

https://harsh-nod.github.io/adx/

## License

Apache-2.0
