Metadata-Version: 2.4
Name: datapruning
Version: 1.0.5
Summary: Intelligent data pruning for ML datasets
License: Proprietary
Project-URL: Homepage, https://www.datapruning.com
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: torch>=2.0.0

# DataPruning

DataPruning reduces the size of an ML dataset by selecting its most informative rows.

## Installation

```bash
pip install datapruning
```

**Note:** PyTorch is a required dependency and adds roughly 2 GB to the install.

## Requirements

- Python 3.10+
- PyTorch 2.0+
- pandas 2.0+
- NumPy 1.24+
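
To confirm an environment meets these minimums before installing, a small standard-library check along the following lines may help. This is a sketch, not part of the package: `release_tuple`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the comparison only looks at the leading numeric release, so pre-release and local tags such as `rc1` or `+cu121` are ignored.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the Requirements list above.
MINIMUMS = {"numpy": "1.24.0", "pandas": "2.0.0", "torch": "2.0.0"}


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All requirements satisfied"]:
        print(line)
```

`pip` resolves these dependencies on its own during installation, so the check is mainly useful for diagnosing an existing environment.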

## Limits

- Minimum: 1,000 rows per dataset
- Maximum: 300,000 rows per dataset
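
A dataset can be checked against these limits before it is handed to the library. The sketch below uses a hypothetical helper, `check_row_limits`, that is not part of the package, and it assumes both bounds are inclusive; how the library itself reacts to out-of-range input is not documented here.

```python
# Row limits copied from the list above; treated as inclusive (an assumption).
MIN_ROWS = 1_000
MAX_ROWS = 300_000


def check_row_limits(n_rows: int) -> None:
    """Raise ValueError if n_rows falls outside the documented limits."""
    if n_rows < MIN_ROWS:
        raise ValueError(
            f"dataset has {n_rows:,} rows; at least {MIN_ROWS:,} are required"
        )
    if n_rows > MAX_ROWS:
        raise ValueError(
            f"dataset has {n_rows:,} rows; at most {MAX_ROWS:,} are supported"
        )
```

With a pandas DataFrame `df`, the call would be `check_row_limits(len(df))`.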

## Usage

```python
import pandas as pd
from datapruning import Pruner

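# Load the dataset; its row count must fall within the limits above.
df = pd.read_csv("dataset.csv")

# target_col names the label column; keep_ratio=0.5 asks to keep about half the rows.
pruner = Pruner(df)
filtered = pruner.prune(target_col="label", keep_ratio=0.5)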

print(f"Kept {len(filtered)} / {len(df)} rows")
```

## Features

- Runs locally, no data upload
- Pre-training dataset optimization
- Pandas DataFrame input / output
- Compiled processing module

## Links

- Website: [datapruning.com](https://www.datapruning.com)

## License

Proprietary. See LICENSE file.
