Minbpe Repository

Date: 2 years ago

Size: 312.27 KB

Publish URL: github.com

This repository contains Andrej Karpathy's minbpe project.

There are two Tokenizers in this repository, both of which can perform the 3 main functions of a Tokenizer:

  • Train the tokenizer vocabulary and merges on a given text
  • Encode from text to tokens
  • Decode from tokens to text
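
The three functions above can be sketched in a few lines of Python. This is a minimal byte-level BPE illustration written for this page, not the repository's code. The function names (`get_stats`, `merge`, `train`, `encode`, `decode`) and their signatures are assumptions chosen for the example.

```python
from collections import Counter

def get_stats(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, idx):
    """Replace every non-overlapping occurrence of `pair` with the new id `idx`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(idx)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train(text, vocab_size):
    """Learn the merges (pair -> new id) and the vocabulary (id -> bytes)."""
    assert vocab_size >= 256  # the first 256 ids are the raw byte values
    ids = list(text.encode("utf-8"))
    merges = {}
    vocab = {i: bytes([i]) for i in range(256)}
    for idx in range(256, vocab_size):
        stats = get_stats(ids)
        if not stats:
            break  # nothing left to merge
        pair = max(stats, key=stats.get)  # most frequent adjacent pair
        ids = merge(ids, pair, idx)
        merges[pair] = idx
        vocab[idx] = vocab[pair[0]] + vocab[pair[1]]
    return merges, vocab

def encode(text, merges):
    """Text -> token ids, applying the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        stats = get_stats(ids)
        # pick the pair that was merged earliest during training
        pair = min(stats, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no remaining pair can be merged
        ids = merge(ids, pair, merges[pair])
    return ids

def decode(ids, vocab):
    """Token ids -> text."""
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

# Usage: 256 byte tokens plus 3 learned merges
text = "aaabdaaabac"
merges, vocab = train(text, 256 + 3)
ids = encode(text, merges)
print(ids)                 # [258, 100, 258, 97, 99]
print(decode(ids, vocab))  # aaabdaaabac
```

Training on `"aaabdaaabac"` with three merges compresses the 11 input bytes into 5 tokens, and decoding restores the original string.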

The minbpe project aims to provide minimal, clean, and educational code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. Its two Tokenizers implement the core functions of tokenizer training, encoding, and decoding. This design keeps the code readable and makes the tokenizers straightforward to use.

Specifically, the minbpe repository contains class-based tokenizer implementations: a base Tokenizer class and BasicTokenizer. These classes provide the basic functions for training, encoding, and decoding, as well as practical functions such as saving and loading. In addition, RegexTokenizer splits the input text with a regex pattern before applying merges, and GPT4Tokenizer is a light wrapper around RegexTokenizer that reproduces GPT-4 tokenization.
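
To illustrate what saving and loading a trained tokenizer involves, here is a minimal sketch that writes the learned merges to a JSON file and rebuilds the vocabulary on load. The helper names `save_merges` and `load_merges` and the JSON layout are hypothetical, chosen for this example; they are not the file format or API used by minbpe.

```python
import json

def save_merges(merges, path):
    """Write merges {(left, right): new_id} to `path` (hypothetical JSON layout)."""
    # JSON object keys must be strings, so store each merge as [left, right, new_id]
    rows = [[a, b, idx] for (a, b), idx in merges.items()]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f)

def load_merges(path):
    """Read the merges back and rebuild the id -> bytes vocabulary from them."""
    with open(path, "r", encoding="utf-8") as f:
        rows = json.load(f)
    merges = {(a, b): idx for a, b, idx in rows}
    vocab = {i: bytes([i]) for i in range(256)}
    # rebuild in ascending id order so both parts of each pair already exist
    for (a, b), idx in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[idx] = vocab[a] + vocab[b]
    return merges, vocab
```

Only the merges need to be stored in this sketch, because the vocabulary can be derived from them deterministically.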

minbpe-master.torrent
Seeding: 2, Downloading: 0, Completed: 123, Total Downloads: 150
  • minbpe-master/
    • README.md (1.65 KB)
    • README.txt (3.3 KB)
    • data/
      • minbpe-master.zip (312.27 KB)
