Tag: transformers

  • In Part 5, I assembled the complete medium-scale GPT model and validated its architecture with forward passes and text generation. In Part 6, I moved on to the crucial stage of pretraining, setting out to understand its basics by building a complete, reproducible pipeline around a GPT‑2 style model I call SydsGPT. In…
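
The core of a pretraining step is next-token prediction with cross-entropy loss. A minimal, runnable sketch of one such step (illustrative only; a tiny embedding-plus-linear stand-in replaces the full GPT, and the real pipeline adds batching, a learning-rate schedule, and checkpointing):

```python
import torch
import torch.nn as nn

# Tiny stand-in "model" so the sketch runs anywhere: embed tokens, project to logits.
vocab_size, d_model = 50, 16
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randint(0, vocab_size, (4, 8))  # a batch of input token ids (B, T)
y = torch.roll(x, shifts=-1, dims=1)      # targets: the same ids shifted left by one

logits = model(x)                          # (B, T, vocab_size)
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
loss.backward()                            # backprop through the whole model
opt.step()                                 # one optimizer update
print(float(loss))
```

Repeating this step over the tokenized corpus is, at heart, what pretraining is.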

  • In Part 4, I focused on attention and built reusable modules that mirror transformer internals. In Part 5, I assembled the complete GPT architecture at medium scale, validated shapes and memory, and ran a first round of text generation. The outputs are gibberish because the model is untrained. That is expected. The goal here is to make sure…
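
The attention modules the series builds up to can be sketched roughly as follows (a minimal single-head illustration in PyTorch; the class name `CausalSelfAttention` is mine, and the real modules are multi-head with projections and dropout):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    # Single-head causal self-attention: each position may only attend
    # to itself and earlier positions in the sequence.
    def __init__(self, d_model, max_len=128):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.scale = d_model ** -0.5
        # Upper-triangular mask blocks attention to future positions.
        mask = torch.triu(torch.ones(max_len, max_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) * self.scale      # (B, T, T) scores
        att = att.masked_fill(self.mask[:T, :T], float("-inf"))
        return torch.softmax(att, dim=-1) @ v             # weighted sum of values

x = torch.randn(2, 10, 32)
out = CausalSelfAttention(32)(x)
print(out.shape)  # torch.Size([2, 10, 32])
```

The causal mask is exactly why an untrained model can still generate token by token: shapes and information flow are correct even before any learning has happened.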

  • In Part 3 of this series, I focused on preparing a dataset for training a language model, combining multiple books into a corpus, tokenizing with tiktoken, and creating PyTorch datasets. With the data pipeline in place, the next step in building a GPT-style model is to understand attention mechanisms. Attention is the core innovation behind…
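
The dataset step described here typically turns one long stream of token ids into sliding-window (input, target) pairs. A minimal sketch (the class name `SlidingWindowDataset` is mine; in the real pipeline the ids would come from tiktoken's `encode`, while a dummy range stands in below so the example is self-contained):

```python
import torch
from torch.utils.data import Dataset

class SlidingWindowDataset(Dataset):
    # Slices a long token stream into fixed-length windows, where each
    # target is the input window shifted one position to the right.
    def __init__(self, token_ids, context_len, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - context_len, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + context_len]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + context_len + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Dummy ids stand in for a tiktoken-encoded corpus.
ids = list(range(100))
ds = SlidingWindowDataset(ids, context_len=8, stride=8)
x, y = ds[0]
print(len(ds), x.tolist(), y.tolist())
```

A non-overlapping stride equal to the context length keeps windows disjoint; a smaller stride trades more training pairs for overlap between them.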

  • In Part 1 of this series, I built a simple neural network for classification to get comfortable with the basics of deep learning. In Part 2, I created a MiniTokenizer to understand how raw text is transformed into tokens. Now, in Part 3, I am moving one step closer to building a GPT-style model by…

  • In Part 1 of this series, I built a simple neural network for binary and multiclass classification to get comfortable with the fundamentals of deep learning. For Part 2, I shifted focus to something equally important in the world of transformers: tokenization. Transformers do not work directly with raw text. They need text to be…
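
The text-to-ids round trip that tokenization provides can be shown with a toy word-level tokenizer (a deliberately simplified illustration of my own, not the series' MiniTokenizer or GPT-2's byte-pair encoding, which are richer):

```python
class ToyTokenizer:
    # Maps each unique word in a corpus to an integer id and back.
    def __init__(self, corpus):
        vocab = sorted(set(corpus.split()))
        self.stoi = {w: i for i, w in enumerate(vocab)}  # word -> id
        self.itos = {i: w for w, i in self.stoi.items()}  # id -> word

    def encode(self, text):
        return [self.stoi[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

tok = ToyTokenizer("the cat sat on the mat")
ids = tok.encode("the cat sat")
print(ids, "->", tok.decode(ids))  # [4, 0, 3] -> the cat sat
```

Real tokenizers use subword units rather than whole words, so they can encode text containing words never seen during vocabulary construction.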

  • Building My Own Transformer Model. I’ve decided to take on a challenge that’s equal parts exciting and intimidating: building my own open-source transformer model from scratch, something in the spirit of GPT-OSS. Right now, I have basic machine learning skills and a working knowledge of Python. Over the coming weeks and months, I’ll be diving…