Every once in a while, a project comes along that feels like it should already exist, something obvious in hindsight, yet strangely absent from the landscape. Thoth is one of those projects. In a world where AI tools increasingly rely on cloud‑hosted models, opaque data pipelines, and centralized storage, Thoth takes a very different stance:…
Building a Local, Privacy‑First RAG Pipeline with LangChain: From Embeddings to Hybrid Retrieval
As part of my broader project to build a completely local, privacy‑first AI assistant, I’ve been exploring how to design a robust Retrieval‑Augmented Generation (RAG) pipeline using LangChain, LangGraph, and local LLMs. My goal is to create a model‑agnostic system that runs…
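One common way to implement the "hybrid retrieval" the title refers to is to fuse a dense (embedding) ranking with a sparse (keyword) ranking. As a minimal sketch of the idea — the document IDs and rankings below are hypothetical placeholders, not the post's actual code — reciprocal rank fusion combines the two lists:

```python
# Illustrative sketch of hybrid retrieval via reciprocal rank fusion (RRF):
# combine a dense (vector) ranking with a sparse (BM25-style keyword) ranking.
# Doc IDs and hit lists here are hypothetical placeholders.

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one fused ordering.

    Each doc's score is the sum over rankings of 1 / (k + rank), with
    1-based ranks; k dampens the dominance of top positions.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Best fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_c", "doc_b"]   # e.g. from an embedding search
sparse_hits = ["doc_b", "doc_a", "doc_d"]  # e.g. from a keyword search
fused = rrf_fuse([dense_hits, sparse_hits])
print(fused[0])  # doc_a: ranked highly in both lists
```

The same fusion step works regardless of which vector store or keyword index produced the rankings, which keeps the retrieval layer model-agnostic.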
I have started building a completely local, privacy‑first AI assistant: a multimodal system that combines retrieval‑augmented generation (RAG) and tool calling powered by local LLMs. I chose a model‑agnostic framework—LangChain—to keep the architecture flexible and to make it easy to swap or compare models. My first step was to learn LangChain and LangGraph deeply so…
Supervised Instruction Fine-Tuning on Alpaca, Deployment, and Why 164M Isn’t Enough
In Part 7, I pretrained a 164M-parameter GPT-style model (SydsGPTv2) on ~12B tokens using a carefully engineered pipeline on a single NVIDIA 3080 Ti. In this final part of the series, I shift from pure pretraining to instruction fine-tuning (SFT). The goals for…
In this part, I scaled a full pretraining pipeline: a ~10B-token corpus, pre-tokenization and chunking for streaming, a Flash Attention replacement inside the GPT blocks, training-loop features (warmup, cosine decay, gradient accumulation), torch.compile for runtime speedups, and GaloreAdamW as the optimizer. I then ran a long single‑GPU pretraining run (~12B tokens over ~11 days on…
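The warmup-then-cosine-decay schedule mentioned in that list can be written as a small function. This is a generic sketch — the hyperparameter values below are illustrative, not the run's actual settings:

```python
import math

# Sketch of a linear-warmup + cosine-decay learning-rate schedule.
# max_lr, min_lr, and the step counts are illustrative values only.

def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=1000, total_steps=100_000):
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)

print(lr_at(0))        # tiny LR at the first step
print(lr_at(999))      # max_lr at the end of warmup
print(lr_at(100_000))  # min_lr once decay finishes
```

In a real training loop this value would be assigned to each optimizer parameter group every step, before `optimizer.step()`.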
In Part 5, I assembled the complete GPT medium model and validated its architecture with forward passes and text generation. In Part 6, I moved into the crucial stage of pretraining. I set out to understand the basics of pretraining by building a complete, reproducible pipeline around a GPT‑2 style model I call SydsGPT. In…
In Part 4, I focused on attention and built reusable modules that mirror transformer internals. In Part 5, I assembled the complete GPT architecture at medium scale, validated shapes and memory, and ran first text generation. The outputs are gibberish because the model is untrained. That is expected. The goal here is to make sure…
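The computation those attention modules wrap can be shown in a few lines. This is a toy, dependency-free sketch of scaled dot-product attention on plain Python lists — real code would use batched PyTorch tensors:

```python
import math

# Minimal scaled dot-product attention on plain Python lists.
# Q, K, V are toy 2-token, 2-dimensional examples.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Q, K, V: lists of vectors (seq_len x d). Returns seq_len x d outputs."""
    d = len(K[0])
    out = []
    for q in Q:
        # Score each key against the query, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # each query attends mostly to its matching key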
In Part 3 of this series, I focused on preparing a dataset for training a language model, combining multiple books into a corpus, tokenizing with tiktoken, and creating PyTorch datasets. With the data pipeline in place, the next step in building a GPT-style model is to understand attention mechanisms. Attention is the core innovation behind…
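The sliding-window (input, target) pairs such a dataset yields can be sketched without any framework. In real code the IDs would come from tiktoken and the samples would be wrapped in a PyTorch `Dataset`; here plain integer lists keep the example self-contained:

```python
# Sketch of sliding-window (input, target) pairs for next-token prediction.
# Targets are the inputs shifted right by one position.

def make_windows(token_ids, context_len, stride):
    samples = []
    for start in range(0, len(token_ids) - context_len, stride):
        inputs = token_ids[start:start + context_len]
        targets = token_ids[start + 1:start + context_len + 1]
        samples.append((inputs, targets))
    return samples

ids = list(range(10))  # stand-in for tokenized text
pairs = make_windows(ids, context_len=4, stride=4)
print(pairs[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
```

A stride equal to the context length gives non-overlapping chunks; a smaller stride trades more samples for overlap between them.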
In Part 1 of this series, I built a simple neural network for classification to get comfortable with the basics of deep learning. In Part 2, I created a MiniTokenizer to understand how raw text is transformed into tokens. Now, in Part 3, I am moving one step closer to building a GPT-style model by…
In Part 1 of this series, I built a simple neural network for binary and multiclass classification to get comfortable with the fundamentals of deep learning. For Part 2, I shifted focus to something equally important in the world of transformers: tokenization. Transformers do not work directly with raw text. They need text to be…
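The core idea behind any tokenizer — mapping text to integer IDs and back — fits in a few lines. This toy word-level version is in the spirit of the MiniTokenizer described above, but its vocabulary construction and `<unk>` handling are illustrative simplifications, not the post's actual code:

```python
# Toy word-level tokenizer: text -> integer IDs -> text.
# Vocabulary and <unk> handling are deliberately simplified.

class ToyTokenizer:
    def __init__(self, corpus):
        # Build a vocabulary from unique words; reserve id 0 for unknowns.
        words = sorted(set(corpus.split()))
        self.stoi = {"<unk>": 0, **{w: i + 1 for i, w in enumerate(words)}}
        self.itos = {i: w for w, i in self.stoi.items()}

    def encode(self, text):
        return [self.stoi.get(w, 0) for w in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)

tok = ToyTokenizer("the quick brown fox")
ids = tok.encode("the fox jumps")
print(ids)                 # the out-of-vocabulary word maps to 0
print(tok.decode(ids))     # round-trips known words; unknowns become <unk>
```

Production tokenizers like tiktoken use subword (byte-pair) units instead of whole words, which removes most out-of-vocabulary cases.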