Every once in a while, a project comes along that feels like it should already exist, something obvious in hindsight, yet strangely absent from the landscape. Thoth is one of those projects.

In a world where AI tools increasingly rely on cloud‑hosted models, opaque data pipelines, and centralized storage, Thoth takes a very different stance: your knowledge, your machine, your control. It’s a local‑first, privacy‑focused knowledge agent that blends Retrieval‑Augmented Generation (RAG), multi‑source search, and a conversational interface, all powered by a locally running LLM via Ollama.

If you’ve ever wanted your own personal ChatGPT‑style assistant that can read your documents, cite its sources, and run entirely on your hardware, Thoth is exactly that.

And it’s just getting started.

Why Thoth?

In Egyptian mythology, Thoth was the god of knowledge, writing, and truth: the divine scribe who recorded everything worth remembering. Naming a private knowledge agent after him feels almost inevitable. Thoth is built to gather, organize, and retrieve knowledge faithfully, without leaking your data to the cloud or relying on external storage.

It’s a tool for people who want the power of modern LLMs without sacrificing privacy or control.

What Thoth Can Do Today

Thoth is already a surprisingly capable system. It combines a clean Streamlit interface with a robust LangGraph‑powered RAG pipeline under the hood.

Conversational Intelligence
  • Multi‑turn chat with full history
  • Persistent conversation threads stored locally
  • Auto‑naming of threads
  • Seamless switching between conversations
  • Thread deletion when you want a clean slate

Smart Context Retrieval

Thoth doesn’t just dump documents into a vector store and hope for the best. It uses an LLM‑driven decision node to determine whether new context is needed, then retrieves information from four parallel sources:

Source | Purpose
Uploaded Documents | FAISS vector search over your indexed files
Wikipedia | Real‑time article retrieval
Arxiv | Academic paper search
Web Search | Live results via Tavily

Retrieved content is compressed intelligently, keeping only what’s relevant, preserving citations, and accumulating context across turns.

Document Management
  • Upload PDFs, DOCX, DOC, and TXT
  • Automatic chunking with recursive splitting
  • Embedding via Qwen/Qwen3‑Embedding‑0.6B
  • Persistent FAISS vector store
  • Duplicate detection
  • One‑click “clear all” reset

Cited Answers

Every answer includes explicit citations:

  • (Source: document.pdf)
  • (Source: https://en.wikipedia.org/...)
  • (Source: https://arxiv.org/abs/...)
  • (Source: https://...)
  • (Source: Internal Knowledge)

This makes Thoth feel less like a black box and more like a trustworthy research assistant.

GitHub Repo

The full code, along with a detailed README.md, is available here:

Thoth-v1

Under the Hood: How Thoth Actually Works

Thoth isn’t just a UI wrapped around a vector store. It’s a carefully orchestrated system built on LangGraph, Ollama, FAISS, and a set of retrieval backends that work together to produce grounded, cited answers. The architecture is intentionally modular so contributors can extend or swap components without rewriting the entire pipeline.

A LangGraph‑Driven State Machine

At the core is a LangGraph StateGraph, which gives Thoth deterministic, inspectable control flow — something traditional RAG chains often lack.

The graph has three nodes:

1. needs_context: LLM‑based Retrieval Decisioning

Instead of blindly retrieving on every query, Thoth asks the LLM:

“Given the accumulated context so far, do we need to fetch new information to answer this question?”

This is a lightweight classification step that returns "Yes" or "No".
It prevents unnecessary retrieval calls, reduces latency, and keeps the context window clean.
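In code, that decision node can be as small as a single prompt‑and‑parse step. Here is a rough sketch; the prompt wording, state keys, and function name are illustrative rather than Thoth’s exact implementation:

```python
# Illustrative sketch of an LLM-based retrieval-decision node.
from langchain_ollama import ChatOllama

# Same local chat model used elsewhere in the pipeline (default from models.py).
llm = ChatOllama(model="qwen3-vl:8")

def needs_context(state: dict) -> dict:
    """Ask the LLM whether new retrieval is needed for the latest question."""
    prompt = (
        "Given the accumulated context below, answer only 'Yes' or 'No': "
        "do we need to fetch new information to answer the question?\n\n"
        f"Context:\n{state.get('context', '')}\n\n"
        f"Question: {state['question']}"
    )
    decision = llm.invoke(prompt).content.strip().lower()
    return {"needs_retrieval": decision.startswith("yes")}
```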

2. get_context: Parallel Multi‑Source Retrieval

If retrieval is needed, Thoth fans out to four backends in parallel:

  • FAISS vector store for uploaded documents
  • Wikipedia API
  • Arxiv API
  • Tavily Web Search API

Each backend returns raw text + metadata.
These results are then passed through a context compression LLM step, which:

  • extracts only the relevant spans
  • preserves citations
  • normalizes formats
  • removes redundancy
  • appends the compressed context to the session state

This “accumulated context” grows over the conversation, enabling multi‑turn reasoning grounded in prior retrievals.
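Conceptually, the fan‑out and compression step looks something like the sketch below. The retriever classes, parallelism strategy, and compression prompt are assumptions for illustration; the actual backends and prompts in the repo may differ in detail:

```python
# Rough sketch of the parallel fan-out plus context-compression step.
from concurrent.futures import ThreadPoolExecutor
from langchain_community.retrievers import WikipediaRetriever, ArxivRetriever
from langchain_community.tools.tavily_search import TavilySearchResults

# `vector_store` (FAISS) and `llm` (ChatOllama) are assumed to be set up elsewhere.

def get_context(state: dict) -> dict:
    query = state["question"]
    backends = {
        "documents": lambda q: vector_store.similarity_search(q, k=4),
        "wikipedia": lambda q: WikipediaRetriever(top_k_results=2).invoke(q),
        "arxiv":     lambda q: ArxivRetriever(load_max_docs=2).invoke(q),
        "web":       lambda q: TavilySearchResults(max_results=3).invoke(q),
    }
    # Query all four sources in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
    raw = {name: fut.result() for name, fut in futures.items()}

    # Compress: keep only relevant spans and preserve their source labels.
    compressed = llm.invoke(
        "Extract only the passages relevant to the question, keeping their "
        f"source labels.\n\nQuestion: {query}\n\nResults: {raw}"
    ).content
    # Append to the accumulated context rather than replacing it.
    return {"context": state.get("context", "") + "\n" + compressed}
```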

3. generate_answer: Final Prompt Assembly

The final answer is produced by combining:

  • a system prompt
  • the full accumulated context
  • the user’s latest question

The LLM (via Ollama) generates a cited answer, ensuring every claim is traceable.
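Putting the three nodes together, the LangGraph wiring looks roughly like this. The state schema, prompt, and edge logic are a simplified sketch, not the repository’s exact graph definition:

```python
# Sketch of the three-node StateGraph with a conditional retrieval branch.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ThothState(TypedDict, total=False):
    question: str
    context: str
    needs_retrieval: bool
    answer: str

def generate_answer(state: ThothState) -> dict:
    """Combine system prompt, accumulated context, and question into one call."""
    prompt = (
        "You are Thoth, a private research assistant. Cite every source.\n\n"
        f"Context:\n{state.get('context', '')}\n\n"
        f"Question: {state['question']}"
    )
    return {"answer": llm.invoke(prompt).content}  # `llm` as defined earlier

builder = StateGraph(ThothState)
builder.add_node("needs_context", needs_context)
builder.add_node("get_context", get_context)
builder.add_node("generate_answer", generate_answer)

builder.add_edge(START, "needs_context")
# Branch on the yes/no decision: retrieve first, or answer directly.
builder.add_conditional_edges(
    "needs_context",
    lambda s: "get_context" if s["needs_retrieval"] else "generate_answer",
)
builder.add_edge("get_context", "generate_answer")
builder.add_edge("generate_answer", END)

graph = builder.compile()
```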

Local LLM Execution via Ollama

Thoth uses Ollama to run the chat model locally.
The default is qwen3-vl:8, but the model is configurable in models.py.

Ollama provides:
  • GPU or CPU execution
  • model caching
  • streaming responses
  • simple model switching

This keeps everything private and offline.
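A minimal example of talking to the local model through LangChain’s ChatOllama wrapper, with streaming output. It assumes the default model has already been pulled with Ollama; the temperature value is just an illustration:

```python
# Run the chat model locally through Ollama and stream tokens as they arrive.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3-vl:8", temperature=0.2)

# Streaming keeps the UI responsive while the local model generates.
for chunk in llm.stream("Summarize the uploaded paper in three sentences."):
    print(chunk.content, end="", flush=True)
```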

Document Ingestion & Vectorization

Uploaded documents flow through documents.py, which handles:

  • File loading (PDF, DOCX, DOC, TXT)
  • Text extraction using PyPDF, Unstructured, or TextLoader
  • RecursiveCharacterTextSplitter with:
    • chunk size: 4000 chars
    • overlap: 200 chars
  • Embedding via Qwen/Qwen3-Embedding-0.6B
  • FAISS indexing with persistent storage
  • Duplicate detection using a processed‑files manifest

This pipeline is optimized for large documents and multi‑file ingestion.
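A condensed sketch of that pipeline, using the chunking and embedding parameters listed above; the loader choice and function name are illustrative, not a copy of documents.py:

```python
# Sketch of the ingestion path: load, split, embed, and persist to FAISS.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def index_pdf(path: str, index_dir: str = "faiss_index") -> FAISS:
    docs = PyPDFLoader(path).load()

    splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    embeddings = HuggingFaceEmbeddings(model_name="Qwen/Qwen3-Embedding-0.6B")
    store = FAISS.from_documents(chunks, embeddings)
    store.save_local(index_dir)  # persistent vector store on disk
    return store
```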

Thread Persistence with SQLite

Conversation threads are stored in a local SQLite database:

  • thread metadata
  • auto‑generated thread names
  • timestamps
  • LangGraph checkpoints

The checkpointer ensures that conversation state survives restarts, enabling long‑running research sessions.
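Wiring that in takes only a few lines with LangGraph’s SQLite checkpointer. The database path and thread id below are illustrative:

```python
# Sketch of thread persistence using LangGraph's SQLite checkpointer.
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

conn = sqlite3.connect("thoth.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# `builder` is the StateGraph assembled earlier.
graph = builder.compile(checkpointer=checkpointer)

# Each conversation thread gets its own id; state survives app restarts.
config = {"configurable": {"thread_id": "research-session-1"}}
result = graph.invoke({"question": "What does the uploaded paper claim?"}, config)
```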

Streamlit Frontend

The UI is intentionally simple:

  • Left panel: thread list + creation/deletion
  • Center: chat interface
  • Right panel: document upload & management

Streamlit handles session state, while all heavy lifting happens in the backend.
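A stripped‑down sketch of the chat portion shows how little frontend code is involved; the widget layout and session keys are illustrative, not the exact app code:

```python
# Minimal Streamlit chat loop that delegates answering to the LangGraph pipeline.
import streamlit as st

st.title("Thoth")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if question := st.chat_input("Ask your knowledge agent..."):
    st.session_state.messages.append({"role": "user", "content": question})
    st.chat_message("user").write(question)

    # Backend call: run the compiled graph for the current thread.
    result = graph.invoke(
        {"question": question},
        {"configurable": {"thread_id": "default"}},
    )
    answer = result["answer"]

    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```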

Why This Matters

Thoth sits at the intersection of three important trends:

1. Local‑First AI

People want AI that runs on their hardware, not someone else’s servers.

2. Private Knowledge Management

Your documents, notes, and research shouldn’t be uploaded to a cloud model just to get useful answers.

3. Agentic Workflows

RAG is evolving into something more dynamic: systems that can reason about when to retrieve, how to compress, and how to build context over time.

Thoth embraces all three.

Try It Out — And Help Shape Its Future

The project is open‑source and available on GitHub. If you’re interested in:

  • local AI
  • RAG pipelines
  • LangGraph
  • privacy‑first assistants
  • or just tinkering with cutting‑edge tooling

…you’ll feel right at home.

The full code, along with a detailed README.md, is available here:

Thoth-v1

Clone it, run it locally, upload some documents, and start chatting with your own private knowledge agent.

And if you find bugs, have ideas, or want to contribute, PRs and discussions are very welcome. This is the kind of project that grows best with community input.

What’s Coming Next

Thoth already works well, but the roadmap is ambitious and exciting.

1. vLLM Support

Adding vLLM will unlock:

  • dramatically faster inference
  • better throughput
  • support for larger models
  • more efficient batching

This will make Thoth feel snappier and more scalable on local hardware.

2. Tool‑Driven Retrieval

Instead of manually orchestrated retrieval, Thoth will evolve toward:

  • LLM‑invoked tools
  • dynamic retrieval strategies
  • more agentic behavior

This aligns with the broader shift toward tool‑using LLMs.

3. Becoming a Full Private Personal Assistant

The long‑term vision is clear:

  • calendar access
  • local file search
  • task management
  • email drafting
  • voice input
  • multimodal reasoning

All running locally, all private, all under your control.

Thoth is the foundation for a truly personal AI. Not a cloud service, not a subscription, but a companion that lives on your machine.

Final Thoughts

Thoth is more than a RAG demo. It’s a statement about where AI should be heading: toward systems that empower individuals, respect privacy, and run locally without compromise.

If that vision resonates with you, jump in. Explore the code. Open issues. Suggest features. Build with it. Break it. Improve it.

Projects like this thrive when curious people get involved.

And Thoth, the god of knowledge, would approve.

