Notes from Andrej Karpathy's NanoGPT codealong

  • ChatGPT
    • generates text left to right
    • is a probabilistic system
    • stands for "Chat Generative Pre-trained Transformer"
    • is a language model: it models sequences of characters, words, or tokens, i.e. it predicts how characters/words/tokens follow each other in a language
    • given a question/prompt, ChatGPT is completing the sequence
    • is based on the Transformer architecture (see the 2017 landmark paper "Attention Is All You Need")

  • NanoGPT
    • trained on OpenWebText
    • reproduces the GPT-2 124M-parameter model

Codealong: NanoGPT
  • is a character-level language model
  • trained on Tiny Shakespeare
  • generates infinite Shakespeare

Tokenization

  • character level
    • used in the codealong
  • word level
  • sub-word level
    • Google SentencePiece
    • OpenAI tiktoken (used in GPT)

Trade-off between codebook size and sequence length: with word-level or sub-word tokenization the vocabulary (codebook) is much larger, but the encoding of a given text is much more compact (shorter sequences); character-level tokenization has a tiny vocabulary but very long sequences.
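A minimal sketch of the character-level tokenization used in the codealong (the `input.txt` path and variable names are illustrative):

```python
# Character-level tokenizer sketch: the codebook is the set of unique characters
# in the training corpus (e.g. the Tiny Shakespeare file loaded as a string).
text = open('input.txt', 'r', encoding='utf-8').read()

chars = sorted(set(text))      # every unique character in the corpus
vocab_size = len(chars)

stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}   # integer id -> character

encode = lambda s: [stoi[c] for c in s]              # string -> list of ints
decode = lambda ids: ''.join(itos[i] for i in ids)   # list of ints -> string

print(encode("hii there"))
print(decode(encode("hii there")))  # round-trips back to "hii there"
```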

Training

  • Training happens on:
    • chunks of data of a given block size (also called context length)
      • each chunk of block_size tokens contains block_size individual training examples ("contexts"), from a context of 1 token up to block_size tokens
      • the block size gives the 'time' (T) dimension of the training tensor
    • batches of batch_size chunks (see the sketch after this list)
      • the chunks in a batch are processed independently; batching exists purely for efficiency on the GPU
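A rough sketch of how such batches can be sampled, in the spirit of the lecture (the `get_batch` helper and the constants are illustrative; `data` is assumed to be a 1-D tensor of token ids):

```python
import torch

block_size = 8    # context length (the 'time' dimension T)
batch_size = 4    # number of independent chunks processed in parallel

def get_batch(data):
    """Sample a batch of (input, target) chunks from a 1-D tensor of token ids."""
    # random starting offsets, leaving room for block_size + 1 tokens
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])          # inputs  (B, T)
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # targets (B, T), shifted by one
    return x, y

# Each position t in x predicts y[:, t], so one chunk yields block_size training contexts.
```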

Bigram Language Model

  • See the makemore series for background; a minimal sketch of the model follows below
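A minimal sketch of a bigram language model along the lines of the one built in the lecture, assuming PyTorch and the `vocab_size` from the tokenizer sketch above:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Each token directly looks up the logits for the next token from an embedding table."""

    def __init__(self, vocab_size):
        super().__init__()
        # row i holds the (unnormalized) next-token distribution following token i
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    def generate(self, idx, max_new_tokens):
        # idx is a (B, T) tensor of token ids in the current context
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)        # next-token distribution
            idx_next = torch.multinomial(probs, num_samples=1) # sample one token per sequence
            idx = torch.cat((idx, idx_next), dim=1)            # append and continue
        return idx
```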

Jupyter Notebooks

To Do

  • walk through the makemore playlist
  • revisit this lecture from the 25-minute timestamp