Metadata-Version: 2.4
Name: datapruning
Version: 1.0.3
Summary: Intelligent data pruning for ML datasets
License: Proprietary
Project-URL: Homepage, https://www.datapruning.com
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: torch>=2.0.0
Dynamic: license-file

# DataPruning Lite

Local-first dataset pruning. Free for datasets up to 100K rows.

## Overview

DataPruning reduces dataset size before training by selecting a smaller,
more informative subset of data. It runs entirely on your machine through a
compiled SDK, with no data upload and no external processing.

## Installation

```bash
pip install datapruning
```

**Note:** PyTorch (~2 GB) is a required dependency and will be installed
automatically. See [pytorch.org](https://pytorch.org) for GPU/CPU options.

## Requirements

- Python 3.10+
- PyTorch 2.0+
- Pandas 2.0+
- NumPy 1.24+
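
To confirm an environment meets these minimums before installing, a small standard-library check can help. This is a minimal sketch, not part of DataPruning: the names `MINIMUMS`, `release_tuple`, and `check_environment` are hypothetical, and the thresholds are copied from the package metadata (`Requires-Python: >=3.10` and the `Requires-Dist` entries). It compares only the leading numeric part of each version, so a pre-release such as `2.0.0rc1` is treated the same as `2.0.0`.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the package metadata (assumed to be authoritative).
MINIMUMS = {"numpy": (1, 24, 0), "pandas": (2, 0, 0), "torch": (2, 0, 0)}
MIN_PYTHON = (3, 10)


def release_tuple(version_string):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    # Pad with zeros so "1.24" and "1.24.0" compare as equal.
    return tuple((parts + [0, 0, 0])[:3])


def check_environment():
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} is older than 3.10")
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if release_tuple(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Run before `pip install datapruning`, the check reports missing packages as problems, which is expected since pip installs them as dependencies.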

## Usage

```python
import pandas as pd
from datapruning import Pruner

df = pd.read_csv("dataset.csv")
p = Pruner(df)
filtered = p.prune(target_col="label", keep_ratio=0.5)

print(f"Kept {len(filtered)} / {len(df)} rows")
```

## Features

- Local execution, no data upload
- Pre-training dataset optimization
- Pandas DataFrame input
- Compiled processing module
- Free tier: up to 100K rows per dataset

## Limits

Datasets exceeding 100,000 rows require an enterprise license.
Visit https://www.datapruning.com for details.
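
A row-count check before pruning makes the limit explicit in your own code. The sketch below is illustrative only: `FREE_TIER_MAX_ROWS` and `fits_free_tier` are hypothetical names, not part of the DataPruning API, and this README does not specify what the library itself does when the limit is exceeded. The helper relies only on `len()`, which for a pandas DataFrame returns the number of rows.

```python
FREE_TIER_MAX_ROWS = 100_000  # free-tier limit stated in this README


def fits_free_tier(dataset, max_rows=FREE_TIER_MAX_ROWS):
    """Return True if len(dataset) does not exceed the free-tier row limit.

    Hypothetical helper: works with any object that supports len(),
    including a pandas DataFrame (len(df) is its row count).
    """
    return len(dataset) <= max_rows


# Stand-in data so the sketch runs without a CSV file.
rows = list(range(250))
if fits_free_tier(rows):
    print(f"{len(rows)} rows: within the free tier")
else:
    print(f"{len(rows)} rows: exceeds the free tier")
```

With a DataFrame, the same call would be `fits_free_tier(df)` before constructing the `Pruner`.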

## License

Proprietary software. See the LICENSE file.
