
Project: Building GPT

Build GPT-2 Small from scratch: prepare data, train the model, and generate text.

The earlier chapters built each component in isolation. This project assembles them into a single codebase that prepares data, trains a model, and generates text. Each stage builds directly on the previous one, so work through them in order.

We will train GPT-2 Small, a ~124M parameter model. The training data is FineWeb-Edu, a filtered educational web corpus. Training at this scale requires an NVIDIA GPU. If you are on a CPU or Apple Silicon, the architecture chapter includes a lighter configuration with fewer layers and a narrower embedding; no other code changes are needed.
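To make the size difference concrete, here is a minimal sketch of what swapping configurations might look like. The field names and the lighter-config values are illustrative assumptions, not the chapter's actual code; only the GPT-2 Small numbers (12 layers, 12 heads, 768-dim embeddings, ~124M parameters) come from the standard architecture.

```python
from dataclasses import dataclass

# Illustrative config sketch -- field names and the CPU values are
# assumptions, not the chapter's actual code.
@dataclass
class GPTConfig:
    block_size: int = 1024   # context length in tokens
    vocab_size: int = 50257  # GPT-2 BPE vocabulary
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding width

# GPU: full GPT-2 Small (~124M parameters)
gpu_config = GPTConfig()

# CPU / Apple Silicon: fewer layers, narrower embedding (hypothetical values)
cpu_config = GPTConfig(n_layer=4, n_head=4, n_embd=256)
```

The rest of the model code reads its sizes from the config object, which is why shrinking the model requires no other changes.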

By the End of the Project
  • prepare_data.py tokenizes FineWeb-Edu shards into train.bin and val.bin
  • data.py serves aligned x / y batches from those files
  • model.py wraps the Transformer stack into a trainable GPT
  • train.py runs the training loop and saves checkpoints
  • generate.py loads a checkpoint and samples continuations from a prompt
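The "aligned x / y batches" in data.py refer to next-token prediction: the targets y are the inputs x shifted one position to the right. A minimal sketch of that batching, assuming the `.bin` files hold uint16 token IDs (the function name and parameters here are illustrative, not the project's actual API):

```python
import numpy as np

# Illustrative sketch of aligned x/y batching -- the function signature is
# an assumption, not data.py's actual API. The key idea: y is x shifted by
# one token, so the model learns to predict the next token at every position.
def get_batch(path, batch_size=4, block_size=8, rng=None):
    rng = rng or np.random.default_rng(0)
    data = np.memmap(path, dtype=np.uint16, mode="r")  # tokens on disk
    starts = rng.integers(0, len(data) - block_size - 1, size=batch_size)
    x = np.stack([data[s : s + block_size] for s in starts])          # inputs
    y = np.stack([data[s + 1 : s + 1 + block_size] for s in starts])  # targets
    return x, y
```

Because the file is memory-mapped, batches are sliced straight from disk without loading the whole corpus into RAM.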