
I have been experimenting with tokenization in Rust in https://github.com/rth/vtext, relying mostly on Unicode segmentation plus a few custom rules. The resulting tokenizer was also around 10x faster than spaCy at comparable precision (see the benchmarks section of the README).
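As a rough illustration of the word-boundary approach, here is a minimal std-only sketch that splits text on non-alphanumeric boundaries. This is not vtext's actual implementation, which follows Unicode segmentation (UAX #29) with additional custom rules; it just shows the basic shape of such a tokenizer:

```rust
// Simplified word tokenizer: emit maximal runs of alphanumeric characters.
// Illustrative only; a real UAX #29 tokenizer handles apostrophes, hyphens,
// scripts without spaces, etc.
fn tokenize(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() {
            // Record the byte offset where the current token begins.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            // Non-word character ends the current token.
            tokens.push(&text[s..i]);
        }
    }
    // Flush a trailing token that runs to the end of the string.
    if let Some(s) = start {
        tokens.push(&text[s..]);
    }
    tokens
}

fn main() {
    let toks = tokenize("Tokenization in Rust, 10x faster!");
    println!("{:?}", toks);
}
```

Working at byte offsets over `char_indices` keeps the tokens as zero-copy slices of the input, which is part of why this style of tokenizer can be fast.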

