Notes from Andrej Karpathy's NanoGPT codealong¶
- ChatGPT
- generates text - left to right
- is a probabilistic system
- stands for Chat "Generative Pre-trained Transformer"
- is a language model - it models sequences of characters, words, or tokens. It predicts how characters/words/tokens follow each other in a language
- given a question/prompt, ChatGPT completes the sequence.
- is based on the Transformer architecture (see the 2017 landmark paper, "Attention Is All You Need")
- NanoGPT
- trained on OpenWebText
- reproduces the GPT-2 124M-parameter model
- Codealong: ~NanoGPT
  - is a character-level language model
  - trained on Tiny Shakespeare
  - generates infinite Shakespeare
Tokenization¶
- character level
  - used in the codealong
- word level
- sub-word level
  - Google SentencePiece
  - OpenAI tiktoken (used in GPT)
Tradeoff between codebook size and sequence length: with word-level or sub-word-level tokenization the codebook (vocabulary) is much larger, but the encoding of a given text is much more compact (shorter sequences).
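A minimal sketch of character-level tokenization as used in the codealong; `input.txt` stands in for the Tiny Shakespeare file, and the tiktoken comparison at the end (a real library, shown commented out) illustrates the codebook-size vs. sequence-length tradeoff:

```python
# Character-level tokenizer: the codebook is built from the training text itself.
text = open("input.txt").read()                 # assumed to be the Tiny Shakespeare file
chars = sorted(set(text))                       # unique characters = the codebook
stoi = {ch: i for i, ch in enumerate(chars)}    # char -> integer id
itos = {i: ch for ch, i in stoi.items()}        # integer id -> char

encode = lambda s: [stoi[c] for c in s]             # string -> list of token ids
decode = lambda ids: "".join(itos[i] for i in ids)  # list of token ids -> string

print(len(chars))                   # small codebook (~65 chars for Tiny Shakespeare)
print(encode("hii there"))          # one token per character -> long sequences
print(decode(encode("hii there")))  # round-trips back to the original string

# Sub-word tokenizers trade the other way: a large codebook, short sequences.
# For example, OpenAI's tiktoken with the GPT-2 encoding (~50k-token vocabulary):
#   import tiktoken
#   enc = tiktoken.get_encoding("gpt2")
#   print(enc.n_vocab, enc.encode("hii there"))
```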
Training¶
- Training happens:
  - on chunks of data of a given blocksize (or context length)
    - each block contains 'blocksize' individual "contexts": each prefix predicts the next token
    - the blocksize gives us the 'time' dimension (?)
  - on batches of a given batchsize
    - chunks in a batch are trained independently (see the sketch below)
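A sketch of batch construction under these notes' assumptions; the names `get_batch`, `block_size`, and `batch_size` follow the lecture's conventions, but the random stand-in data and details here are illustrative:

```python
import torch

torch.manual_seed(1337)

block_size = 8   # context length: the 'time' dimension
batch_size = 4   # number of independent chunks processed in parallel

def get_batch(data):
    """Sample batch_size random chunks of block_size tokens, plus shifted targets."""
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])          # inputs
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])  # targets (shifted by one)
    return x, y

# Stand-in for the encoded Tiny Shakespeare text (65-token codebook).
data = torch.randint(0, 65, (1000,))
xb, yb = get_batch(data)

# Each chunk of block_size tokens packs block_size training "contexts":
# the first t tokens are used to predict token t+1.
for t in range(block_size):
    context, target = xb[0, : t + 1], yb[0, t]
    print(f"when input is {context.tolist()} the target is {target.item()}")
```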
Bigram Language Model¶
- See makemore series
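A sketch along the lines of the bigram language model covered in the makemore series and at the start of this lecture; the class and method names mirror the lecture's conventions, but the details are reconstructed from memory rather than copied from the repo:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Each token looks up a row of logits for the next token:
    the embedding table has shape (vocab_size, vocab_size)."""

    def __init__(self, vocab_size):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)      # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        # Autoregressive sampling: append one sampled token at a time.
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)   # distribution over the next token
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, idx_next], dim=1)
        return idx

# Usage (65 = assumed Tiny Shakespeare codebook size):
model = BigramLanguageModel(vocab_size=65)
start = torch.zeros((1, 1), dtype=torch.long)        # begin from token 0
print(model.generate(start, max_new_tokens=20).tolist())  # gibberish until trained
```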
Jupyter Notebooks¶
Links¶
Links (specific to codealong)¶
- ChatGPT Prompt/Response Library
- Attention Is All You Need
- Tiny Shakespeare
- nanoGPT
- OpenWebText
- GPT-2 weights released by OpenAI
- SentencePiece
- tiktoken
To Do¶
- Walk through the makemore playlist
- Revisit this lecture from timestamp 25