tokeniser-py

A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens in SlimPajama. Uses a novel token generation algorithm and a dynamic programming-based segmentation…
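The dynamic programming-based segmentation mentioned above can be sketched in miniature. This is an illustrative toy, not tokeniser-py's actual API: the `segment` function, its signature, and the tiny vocabulary are all assumptions; the real package works with a 131,072-token vocabulary and its own algorithm.

```python
def segment(text, vocab):
    """Split `text` into the fewest vocabulary tokens via dynamic programming.

    Toy illustration of DP segmentation; not tokeniser-py's real interface.
    """
    n = len(text)
    # best[i] holds (token_count, tokens) for the optimal split of text[:i]
    best = [None] * (n + 1)
    best[0] = (0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = text[j:i]
            if piece in vocab and best[j] is not None:
                cand = (best[j][0] + 1, best[j][1] + [piece])
                if best[i] is None or cand[0] < best[i][0]:
                    best[i] = cand
    if best[n] is None:
        raise ValueError("text cannot be segmented with this vocabulary")
    return best[n][1]

# "token" + "iser" (2 tokens) beats "tok" + "en" + "iser" (3 tokens)
print(segment("tokeniser", {"token", "iser", "tok", "en"}))
# → ['token', 'iser']
```

Minimising token count is one common DP objective; a real subword tokeniser might instead maximise the sum of token log-probabilities, but the table-filling structure is the same.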

Installation

In a virtualenv (see these instructions if you need to create one):

pip3 install tokeniser-py

Releases

Version | Released   | Bullseye (Python 3.9) | Bookworm (Python 3.11) | Files
0.1.2   | 2025-03-22 |                       |                        |
0.1.1   | 2025-03-22 |                       |                        |
0.1.0   | 2025-03-22 |                       |                        |

Page last updated 2025-03-22 20:27:18 UTC