Every once in a while, a project comes along that feels like it should already exist, something obvious in hindsight, yet strangely absent from the landscape. Thoth is one of those projects.

In a world where AI tools increasingly rely on cloud-hosted models, opaque data pipelines, and centralized storage, Thoth takes a very different stance: your knowledge, your machine, your control. It's a local-first, privacy-focused knowledge agent that blends Retrieval-Augmented Generation (RAG), multi-source search, and a conversational interface, all powered by a locally running LLM via Ollama.
If you've ever wanted your own personal ChatGPT-style assistant that can read your documents, cite its sources, and run entirely on your hardware, Thoth is exactly that.
And it's just getting started.
Why Thoth?
In Egyptian mythology, Thoth was the god of knowledge, writing, and truth: the divine scribe who recorded everything worth remembering. Naming a private knowledge agent after him feels almost inevitable. Thoth is built to gather, organize, and retrieve knowledge faithfully, without leaking your data to the cloud or relying on external storage.
It's a tool for people who want the power of modern LLMs without sacrificing privacy or control.
What Thoth Can Do Today
Thoth is already a surprisingly capable system. It combines a clean Streamlit interface with a robust LangGraph-powered RAG pipeline under the hood.
Conversational Intelligence
- Multi-turn chat with full history
- Persistent conversation threads stored locally
- Auto-naming of threads
- Seamless switching between conversations
- Thread deletion when you want a clean slate
Smart Context Retrieval
Thoth doesn't just dump documents into a vector store and hope for the best. It uses an LLM-driven decision node to determine whether new context is needed, then retrieves information from four parallel sources:
| Source | Purpose |
| --- | --- |
| Uploaded Documents | FAISS vector search over your indexed files |
| Wikipedia | Real-time article retrieval |
| Arxiv | Academic paper search |
| Web Search | Live results via Tavily |
Retrieved content is compressed intelligently, keeping only what's relevant, preserving citations, and accumulating context across turns.
Document Management
- Upload PDFs, DOCX, DOC, and TXT
- Automatic chunking with recursive splitting
- Embedding via Qwen/Qwen3-Embedding-0.6B
- Persistent FAISS vector store
- Duplicate detection
- One-click "clear all" reset
Cited Answers
Every answer includes explicit citations:
(Source: document.pdf)
(Source: https://en.wikipedia.org/...)
(Source: https://arxiv.org/abs/...)
(Source: https://...)
(Source: Internal Knowledge)
This makes Thoth feel less like a black box and more like a trustworthy research assistant.
GitHub Repo
The full code, with a detailed README.md, is available here:
Thoth-v1
Under the Hood: How Thoth Actually Works
Thoth isn't just a UI wrapped around a vector store. It's a carefully orchestrated system built on LangGraph, Ollama, FAISS, and a set of retrieval backends that work together to produce grounded, cited answers. The architecture is intentionally modular so contributors can extend or swap components without rewriting the entire pipeline.
A LangGraph-Driven State Machine
At the core is a LangGraph StateGraph, which gives Thoth deterministic, inspectable control flow, something traditional RAG chains often lack.
The graph has three nodes:
1. needs_context: LLM-based Retrieval Decisioning
Instead of blindly retrieving on every query, Thoth asks the LLM:
"Given the accumulated context so far, do we need to fetch new information to answer this question?"
This is a lightweight classification step that returns "Yes" or "No".
It prevents unnecessary retrieval calls, reduces latency, and keeps the context window clean.
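To make that concrete, here is a minimal sketch of what such a decision node could look like. The state fields (question, accumulated_context, needs_retrieval, answer) and the prompt wording are illustrative, not the repo's actual code:

```python
from typing import TypedDict

from langchain_ollama import ChatOllama

class AgentState(TypedDict):
    question: str
    accumulated_context: str
    needs_retrieval: bool
    answer: str

llm = ChatOllama(model="qwen3-vl:8")  # the default model mentioned below

def needs_context(state: AgentState) -> dict:
    # Lightweight yes/no classification: is new retrieval needed, or does
    # the accumulated context already cover the question?
    prompt = (
        "Given the accumulated context so far, do we need to fetch new "
        "information to answer this question? Answer 'Yes' or 'No'.\n\n"
        f"Context:\n{state['accumulated_context']}\n\n"
        f"Question: {state['question']}"
    )
    verdict = llm.invoke(prompt).content.strip().lower()
    return {"needs_retrieval": verdict.startswith("yes")}
```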
2. get_context: Parallel Multi-Source Retrieval
If retrieval is needed, Thoth fans out to four backends in parallel:
- FAISS vector store for uploaded documents
- Wikipedia API
- Arxiv API
- Tavily Web Search API
Each backend returns raw text + metadata.
These results are then passed through a context compression LLM step, which:
- extracts only the relevant spans
- preserves citations
- normalizes formats
- removes redundancy
- appends the compressed context to the session state
This "accumulated context" grows over the conversation, enabling multi-turn reasoning grounded in prior retrievals.
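Here is a hedged sketch of what that fan-out could look like, reusing the AgentState and llm from the previous snippet and assuming a FAISS vectorstore and a Tavily tool (tavily_tool) configured elsewhere; all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

from langchain_community.utilities import ArxivAPIWrapper, WikipediaAPIWrapper

def get_context(state: AgentState) -> dict:
    # Query all four backends concurrently.
    query = state["question"]
    backends = {
        "documents": lambda q: "\n".join(
            doc.page_content for doc in vectorstore.similarity_search(q, k=4)
        ),
        "wikipedia": WikipediaAPIWrapper().run,
        "arxiv": ArxivAPIWrapper().run,
        "web": tavily_tool.invoke,  # e.g. TavilySearchResults
    }
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        results = {name: f.result() for name, f in futures.items()}

    # Compression step: keep only relevant spans, preserve (Source: ...) tags.
    compressed = llm.invoke(
        "From the retrieved material below, extract only the passages "
        "relevant to the question, keeping their (Source: ...) citations.\n\n"
        f"Question: {query}\n\n"
        + "\n\n".join(f"[{name}]\n{text}" for name, text in results.items())
    ).content
    return {"accumulated_context": state["accumulated_context"] + "\n" + compressed}
```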
3. generate_answer: Final Prompt Assembly
The final answer is produced by combining:
- a system prompt
- the full accumulated context
- the user's latest question
The LLM (via Ollama) generates a cited answer, ensuring every claim is traceable.
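Putting the three nodes together, the graph wiring could look roughly like this, continuing the illustrative snippets above (the system prompt text is mine, not the repo's):

```python
from langgraph.graph import END, StateGraph

def generate_answer(state: AgentState) -> dict:
    # Assemble system prompt + accumulated context + latest question.
    prompt = (
        "You are a private research assistant. Cite every claim as "
        "(Source: ...), using (Source: Internal Knowledge) when no "
        f"retrieved context applies.\n\nContext:\n{state['accumulated_context']}\n\n"
        f"Question: {state['question']}"
    )
    return {"answer": llm.invoke(prompt).content}

graph = StateGraph(AgentState)
graph.add_node("needs_context", needs_context)
graph.add_node("get_context", get_context)
graph.add_node("generate_answer", generate_answer)
graph.set_entry_point("needs_context")
graph.add_conditional_edges(
    "needs_context",
    lambda s: "get_context" if s["needs_retrieval"] else "generate_answer",
)
graph.add_edge("get_context", "generate_answer")
graph.add_edge("generate_answer", END)
app = graph.compile()
```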
Local LLM Execution via Ollama
Thoth uses Ollama to run the chat model locally.
The default is qwen3-vl:8, but the model is configurable in models.py.
Ollama provides:
- GPU or CPU execution
- model caching
- streaming responses
- simple model switching
This keeps everything private and offline.
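If you want to experiment, swapping models is a one-liner with the langchain-ollama integration. A minimal sketch (the temperature value is my choice, not a project default):

```python
from langchain_ollama import ChatOllama

# Any model you've pulled with `ollama pull <tag>` works here.
llm = ChatOllama(model="qwen3-vl:8", temperature=0.2)

# Streaming keeps the chat UI responsive while tokens arrive.
for chunk in llm.stream("Explain Retrieval-Augmented Generation in two sentences."):
    print(chunk.content, end="", flush=True)
```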
Document Ingestion & Vectorization
Uploaded documents flow through documents.py, which handles:
- File loading (PDF, DOCX, DOC, TXT)
- Text extraction using PyPDF, Unstructured, or TextLoader
- RecursiveCharacterTextSplitter with:
  - chunk size: 4000 chars
  - overlap: 200 chars
- Embedding via Qwen/Qwen3-Embedding-0.6B
- FAISS indexing with persistent storage
- Duplicate detection using a processed-files manifest
This pipeline is optimized for large documents and multi-file ingestion.
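In LangChain terms, the pipeline boils down to a few calls. This sketch uses the chunk sizes quoted above, with HuggingFaceEmbeddings as one plausible way to load the embedding model and an illustrative index path:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk a document with the sizes listed above.
docs = PyPDFLoader("paper.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=4000, chunk_overlap=200
).split_documents(docs)

# Embed and index; save_local gives the persistent store.
embeddings = HuggingFaceEmbeddings(model_name="Qwen/Qwen3-Embedding-0.6B")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")  # path is illustrative
```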
Thread Persistence with SQLite
Conversation threads are stored in a local SQLite database:
- thread metadata
- auto-generated thread names
- timestamps
- LangGraph checkpoints
The checkpointer ensures that conversation state survives restarts, enabling long-running research sessions.
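With LangGraph's SQLite checkpointer (the langgraph-checkpoint-sqlite package), that persistence takes a couple of lines; the filename and thread id below are illustrative:

```python
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

# One local DB file holds the checkpoints; each thread_id is a
# separate conversation that survives restarts.
conn = sqlite3.connect("thoth_threads.db", check_same_thread=False)
app = graph.compile(checkpointer=SqliteSaver(conn))

config = {"configurable": {"thread_id": "research-session-1"}}
result = app.invoke(
    {"question": "What does the uploaded paper claim?", "accumulated_context": ""},
    config,
)
```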
Streamlit Frontend
The UI is intentionally simple:
- Left panel: thread list + creation/deletion
- Center: chat interface
- Right panel: document upload & management
Streamlit handles session state, while all heavy lifting happens in the backend.
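A stripped-down version of that chat loop, assuming the compiled app and checkpointer from the snippets above (the real UI adds the thread and document panels):

```python
import streamlit as st

st.title("Thoth")
if "thread_id" not in st.session_state:
    st.session_state.thread_id = "default"

if question := st.chat_input("Ask your knowledge agent..."):
    with st.chat_message("user"):
        st.write(question)
    config = {"configurable": {"thread_id": st.session_state.thread_id}}
    result = app.invoke(
        {"question": question, "accumulated_context": ""}, config
    )
    with st.chat_message("assistant"):
        st.write(result["answer"])
```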
Why This Matters
Thoth sits at the intersection of three important trends:
1. LocalâFirst AI
People want AI that runs on their hardware, not someone else's servers.
2. Private Knowledge Management
Your documents, notes, and research shouldn't be uploaded to a cloud model just to get useful answers.
3. Agentic Workflows
RAG is evolving into something more dynamic: systems that can reason about when to retrieve, how to compress, and how to build context over time.
Thoth embraces all three.
Try It Out and Help Shape Its Future
The project is open-source and available on GitHub. If you're interested in:
- local AI
- RAG pipelines
- LangGraph
- privacy-first assistants
- or just tinkering with cutting-edge tooling
…you'll feel right at home.
The full code, with a detailed README.md, is available here:
Thoth-v1
Clone it, run it locally, upload some documents, and start chatting with your own private knowledge agent.
And if you find bugs, have ideas, or want to contribute, PRs and discussions are very welcome. This is the kind of project that grows best with community input.
Whatâs Coming Next
Thoth already works well, but the roadmap is ambitious and exciting.
1. vLLM Support
Adding vLLM will unlock:
- dramatically faster inference
- better throughput
- support for larger models
- more efficient batching
This will make Thoth feel snappier and more scalable on local hardware.
2. Tool-Driven Retrieval
Instead of manually orchestrated retrieval, Thoth will evolve toward:
- LLM-invoked tools
- dynamic retrieval strategies
- more agentic behavior
This aligns with the broader shift toward tool-using LLMs.
3. Becoming a Full Private Personal Assistant
The long-term vision is clear:
- calendar access
- local file search
- task management
- email drafting
- voice input
- multimodal reasoning
All running locally, all private, all under your control.
Thoth is the foundation for a truly personal AI. Not a cloud service, not a subscription, but a companion that lives on your machine.
Final Thoughts
Thoth is more than a RAG demo. It's a statement about where AI should be heading: toward systems that empower individuals, respect privacy, and run locally without compromise.
If that vision resonates with you, jump in. Explore the code. Open issues. Suggest features. Build with it. Break it. Improve it.
Projects like this thrive when curious people get involved.
And Thoth, the god of knowledge, would approve.
