tokeniser-py
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens in SlimPajama. Uses a novel token generation algorithm and a dynamic programming-based segmentation…
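The dynamic-programming segmentation mentioned above can be illustrated with a minimal sketch: given a fixed vocabulary, find a covering of the input that uses the fewest tokens. The vocabulary, function name, and cost function here are hypothetical illustrations, not tokeniser-py's actual API or algorithm.

```python
# Hypothetical sketch of DP-based segmentation: cover the text with the
# fewest vocabulary tokens. tokeniser-py's real algorithm and vocabulary
# differ; this only illustrates the general technique.

def dp_segment(text, vocab, max_token_len=16):
    n = len(text)
    # best[i] = minimum number of tokens needed to cover text[:i]
    best = [0] + [float("inf")] * n
    back = [0] * (n + 1)  # back[i] = start index of the last token ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_token_len), i):
            if text[j:i] in vocab and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == float("inf"):
        return None  # text cannot be covered by this vocabulary
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]

# Toy vocabulary for demonstration only
vocab = {"to", "ken", "token", "iser", "is", "er",
         "t", "o", "k", "e", "n", "i", "s", "r"}
print(dp_segment("tokeniser", vocab))  # → ['token', 'iser']
```

With a real 131,072-token vocabulary the same recurrence applies; only the lookup structure and the token-length bound change.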
Installation
In a virtualenv:
pip3 install tokeniser-py
Releases
Version | Released | Bullseye Python 3.9 | Bookworm Python 3.11 | Files
---|---|---|---|---
0.1.2 | 2025-03-22 | | |
0.1.1 | 2025-03-22 | | |
0.1.0 | 2025-03-22 | | |
Issues with this package?
- Search issues for this package
- Package or version missing? Open a new issue
- Something else? Open a new issue