Metadata-Version: 2.4
Name: valjson
Version: 2.0.0
Summary: Per-grammar-role loss decomposition for fine-tuned structured JSON output
Author-email: Breck Baldwin <breckbaldwin@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/validjson/valjson
Project-URL: Paper, https://arxiv.org/abs/XXXX.XXXXX
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: transformers>=4.40
Provides-Extra: peft
Requires-Dist: peft>=0.10; extra == "peft"

# valjson

Per-grammar-role loss analysis for structured JSON output from language models.

**Fine-tuning your LLM for JSON? Your aggregate metrics might be hiding per-field regressions.**

```bash
pip install valjson
```

## What it does

`valjson` is the observability layer for structured JSON output. It meets you wherever you are:

| You have | Command | Needs model? |
|----------|---------|:------------:|
| Messy text with JSON | `valjson --extract --data output.txt` | No |
| JSON + schema | `valjson --validate --schema s.json --data output.jsonl` | No |
| Broken JSON + schema | `valjson --fix --schema s.json --data output.jsonl` | No |
| JSON + schema | `valjson --anatomy --schema s.json --data output.jsonl` | No |
| JSON + schema + gold | `valjson --compare --schema s.json --data output.jsonl --gold truth.jsonl` | No |
| Two output sets | `valjson --diff --schema s.json --data a.jsonl --data2 b.jsonl` | No |
| Per-field probabilities | `valjson --gate --data probs.jsonl` | No |
| Model + checkpoint | `valjson --checkpoint lora/ --schema s.json --data test.jsonl` | Yes |
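
For the first row, here is a rough idea of what pulling JSON out of messy text can look like. This is a minimal sketch using only the standard library, not `valjson`'s own extraction logic; the helper name `extract_json_objects` and the sample text are made up for illustration.

```python
import json

def extract_json_objects(text: str) -> list:
    """Return every JSON object or array found embedded in free text."""
    decoder = json.JSONDecoder()
    found = []
    i = 0
    while i < len(text):
        if text[i] in "{[":
            try:
                # raw_decode parses one JSON value starting at index i and
                # returns the value plus the index just past its end.
                value, end = decoder.raw_decode(text, i)
            except json.JSONDecodeError:
                i += 1  # not valid JSON from here, keep scanning
                continue
            found.append(value)
            i = end
        else:
            i += 1
    return found

# Hypothetical model output with chatter around the payload.
messy = 'Sure! Here is the record: {"city": "Rome", "cuisine": "Italian"} Hope that helps.'
print(extract_json_objects(messy))
```

Scanning for `{` or `[` and letting the decoder decide where the value ends avoids hand-written brace counting, which breaks on braces inside strings.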

## The problem

Standard fine-tuning plus grammar-constrained decoding produces valid JSON, and aggregate loss improves. Broken down by grammar role, though, the picture can look different:

```
STRUCTURAL         5.33 -> 0.00     -100%   OK
KEY                0.47 -> 0.00     -100%   OK
BOOLEAN            0.46 -> 1.05     +130%   !! REGRESSION
TOTAL              0.55 -> 0.17      -69%
```

Aggregate loss improved by 69%, yet loss on Boolean values got 130% worse. `valjson` catches this.
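
The arithmetic behind that table can be reproduced in a few lines. The sketch below assumes a role counts as a regression whenever its loss goes up at all; the threshold `valjson` actually applies is not stated here. Percentages are computed from the two-decimal losses shown above, so they can differ slightly from the rounded figures in the table.

```python
# Mean loss per grammar role before and after fine-tuning,
# copied from the table above (two-decimal values).
before = {"STRUCTURAL": 5.33, "KEY": 0.47, "BOOLEAN": 0.46, "TOTAL": 0.55}
after = {"STRUCTURAL": 0.00, "KEY": 0.00, "BOOLEAN": 1.05, "TOTAL": 0.17}

def relative_change(old: float, new: float) -> float:
    """Percent change from old to new; negative means the loss went down."""
    return 100.0 * (new - old) / old

changes = {role: relative_change(before[role], after[role]) for role in before}

# Assumption: any increase in a role's loss is flagged as a regression.
regressions = [
    role for role, pct in changes.items() if role != "TOTAL" and pct > 0
]

for role, pct in changes.items():
    print(f"{role:<12} {before[role]:.2f} -> {after[role]:.2f}  {pct:+.0f}%")
print("regressed roles:", regressions)
```

The point of the decomposition is visible here: `TOTAL` moves in the right direction while one role moves the wrong way.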

## Per-field accuracy vs gold

When you have human-labeled gold JSONs, `--compare` gives you per-field accuracy
without needing a model — role-aware, so a blown free-text field does not drown
the signal on the constrained fields you actually care about:

```bash
valjson --compare \
    --schema schema.json \
    --data generated.jsonl \
    --gold gold.jsonl \
    --ignore-role STRING,ARRAY
```

- **`--match-by <key>`**: pair records by ID (wrapper-first lookup), not line order.
  Unmatched IDs are reported separately.
- **`--ignore-role STRING,ARRAY`**: focus on BOOLEAN / ENUM / NUMBER fields where
  exact equality is meaningful.
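
To make the two options concrete, here is a minimal sketch of role-aware, ID-matched accuracy. It is not `valjson`'s implementation: the function `per_field_accuracy`, the `roles` mapping, and the sample records are all hypothetical, and the wrapper-first lookup is not modeled.

```python
from collections import defaultdict

def per_field_accuracy(generated, gold, roles, match_by="id", ignore_roles=()):
    """Exact-match accuracy per field, pairing records by an ID key."""
    gold_by_id = {rec[match_by]: rec for rec in gold}
    hits, totals, unmatched = defaultdict(int), defaultdict(int), []
    for rec in generated:
        ref = gold_by_id.get(rec[match_by])
        if ref is None:
            unmatched.append(rec[match_by])  # no gold record with this ID
            continue
        for field, role in roles.items():
            if role in ignore_roles:
                continue  # skip roles where exact equality is not meaningful
            totals[field] += 1
            hits[field] += int(rec.get(field) == ref.get(field))
    accuracy = {field: hits[field] / totals[field] for field in totals}
    return accuracy, unmatched

# Hypothetical field-to-role mapping and records, for illustration only.
roles = {"refundable": "BOOLEAN", "status": "ENUM", "notes": "STRING"}
generated = [
    {"id": "rec-001", "refundable": "True", "status": "submitted", "notes": "ok"},
    {"id": "rec-002", "refundable": "False", "status": "pending", "notes": "late"},
    {"id": "rec-999", "refundable": "True", "status": "draft", "notes": ""},
]
gold = [  # deliberately in a different order than `generated`
    {"id": "rec-002", "refundable": "True", "status": "pending", "notes": "Late!"},
    {"id": "rec-001", "refundable": "True", "status": "submitted", "notes": "OK"},
]

accuracy, unmatched = per_field_accuracy(
    generated, gold, roles, ignore_roles=("STRING", "ARRAY")
)
print(accuracy)   # {'refundable': 0.5, 'status': 1.0}
print(unmatched)  # ['rec-999']
```

The free-text `notes` field never matches exactly, but because its role is ignored it does not drag down the numbers for the constrained fields.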

## Evidential gating: abstain when the model is unsure

When the model returns a probability distribution over allowed values for each
constrained field, `--gate` decides per record whether to **commit**, **abstain**,
or **reject** based on the margin between the top two values:

```bash
valjson --gate \
    --data probs.jsonl \
    --margin-threshold 0.30
```

Input format (one record per line):

```json
{"id": "rec-001",
 "probs": {
   "refundable": {"True": 0.553, "False": 0.447},
   "status":     {"submitted": 0.9, "pending": 0.1, "draft": 0.0}
 }}
```

Each field's margin is `top_prob − second_prob`. If `margin ≥ threshold`, the
field is committed; otherwise it abstains. Per-field abstention rates are
reported: fields with persistently high abstention point to underspecified
training targets (the regression pattern the paper documents).
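
Applied to the sample record above with a threshold of 0.30, the margin rule works out as follows. This is a sketch of the rule as described, not the tool's code; `gate_field` is a hypothetical helper, treating a single-value field as having a runner-up of 0.0 is an assumption, and the **reject** outcome is left out because its condition is not specified here.

```python
def gate_field(probs: dict, threshold: float) -> tuple:
    """Return (decision, margin) for one field's value distribution."""
    ranked = sorted(probs.values(), reverse=True)
    # Assumption: a field with a single allowed value has a runner-up of 0.0.
    second = ranked[1] if len(ranked) > 1 else 0.0
    margin = ranked[0] - second
    decision = "commit" if margin >= threshold else "abstain"
    return decision, margin

record = {
    "id": "rec-001",
    "probs": {
        "refundable": {"True": 0.553, "False": 0.447},
        "status": {"submitted": 0.9, "pending": 0.1, "draft": 0.0},
    },
}

decisions = {
    field: gate_field(probs, threshold=0.30)[0]
    for field, probs in record["probs"].items()
}
print(decisions)  # {'refundable': 'abstain', 'status': 'commit'}
```

`refundable` has a margin of about 0.106 and abstains; `status` has a margin of 0.8 and commits.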

## Quick start

See [QUICK_START.md](QUICK_START.md) for a hands-on walkthrough from messy output to full analysis.

## Python API

```python
from valjson import analyze

report = analyze(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    checkpoint="my_lora/",
    schema="schema.json",
    data="test.jsonl",
)
print(report)

if report.regressions:
    print(f"REGRESSIONS: {[r.role for r in report.regressions]}")
```

The `valjson` command exits with code 1 if regressions are detected, so it can gate a CI/CD pipeline.
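
If you drive the check from Python instead of the command line, the same convention is easy to mirror. The sketch below relies only on the `analyze` call and the `report.regressions` / `.role` attributes shown above; `exit_code_for` and `main` are hypothetical names, and the dry run uses a stand-in object so it runs without a model.

```python
import sys
from types import SimpleNamespace

def exit_code_for(report) -> int:
    """Return 1 if the report lists any regressed role, else 0."""
    regressed = [r.role for r in report.regressions]
    if regressed:
        print(f"REGRESSIONS: {regressed}", file=sys.stderr)
        return 1
    return 0

def main() -> int:
    # Imported here so exit_code_for can be exercised without valjson installed.
    from valjson import analyze

    report = analyze(
        model_name="Qwen/Qwen2.5-0.5B-Instruct",
        checkpoint="my_lora/",
        schema="schema.json",
        data="test.jsonl",
    )
    return exit_code_for(report)

# Dry run with a stand-in object that mimics the two attributes used above.
fake_report = SimpleNamespace(regressions=[SimpleNamespace(role="BOOLEAN")])
print(exit_code_for(fake_report))  # 1
```

In a real CI script, end the file with `sys.exit(main())` so the pipeline step fails when a role regresses.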

## Grammar roles

| Role | Description | Examples |
|------|-------------|----------|
| STRUCTURAL | JSON syntax | `{` `}` `[` `]` `:` `,` |
| QUOTE | String delimiters | `"` |
| KEY | Object key characters | `city`, `cuisine` |
| ENUM_VALUE | Categorical values | `Italian`, `Economy` |
| BOOLEAN | Boolean strings | `True`, `False` |
| NUMBER | Numeric characters | `42`, `3.14` |
| FREE_TEXT | Non-categorical content | names, addresses |
| WHITESPACE | Formatting | spaces, newlines |
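
The value roles in this table line up naturally with JSON Schema property definitions. The sketch below shows one plausible mapping; it is an assumption for illustration, not a description of how `valjson` assigns roles. The helper `role_for` and the example schema are hypothetical, and the syntax roles (STRUCTURAL, QUOTE, KEY, WHITESPACE) are not covered.

```python
def role_for(prop: dict) -> str:
    """Map one JSON Schema property definition to a value role."""
    if "enum" in prop:
        return "ENUM_VALUE"  # categorical: fixed set of allowed values
    kind = prop.get("type")
    if kind == "boolean":
        return "BOOLEAN"
    if kind in ("number", "integer"):
        return "NUMBER"
    return "FREE_TEXT"  # anything else is treated as unconstrained content

# Hypothetical schema, loosely based on the examples in the table.
schema = {
    "properties": {
        "city": {"type": "string"},
        "cuisine": {"type": "string", "enum": ["Italian", "Mexican"]},
        "refundable": {"type": "boolean"},
        "price": {"type": "number"},
    }
}

field_roles = {name: role_for(prop) for name, prop in schema["properties"].items()}
print(field_roles)
```

Checking `enum` before `type` matters: an enumerated string is categorical, so it should not fall through to `FREE_TEXT`.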

## Links

- [Quick Start Tutorial](QUICK_START.md)
- [Fixing JSON Fine-Tuning](https://validjson.github.io/valjson/) — mitigations, services
- [Paper](https://arxiv.org/abs/XXXX.XXXXX) — "Valid JSON, Wrong Answer" (2026)

## License

MIT
