Every once in a while, a project comes along that feels like it should already exist, something obvious in hindsight, yet strangely absent from the landscape. Thoth is one of those projects.

In a world where AI tools increasingly rely on cloud‑hosted models, opaque data pipelines, and centralized storage, Thoth takes a very different stance: your knowledge, your machine, your control. It’s a local‑first, privacy‑focused knowledge agent that blends Retrieval‑Augmented Generation (RAG), multi‑source search, and a conversational interface, all powered by a locally running LLM via Ollama.

If you’ve ever wanted your own personal ChatGPT‑style assistant that can read your documents, cite its sources, and run entirely on your hardware, Thoth is exactly that.

And it’s just getting started.

Why Thoth?

In Egyptian mythology, Thoth was the god of knowledge, writing, and truth: the divine scribe who recorded everything worth remembering. Naming a private knowledge agent after him feels almost inevitable. Thoth is built to gather, organize, and retrieve knowledge faithfully, without leaking your data to the cloud or relying on external storage.

It’s a tool for people who want the power of modern LLMs without sacrificing privacy or control.

What Thoth Can Do Today

Thoth is already a surprisingly capable system. It combines a clean Streamlit interface with a robust LangGraph‑powered RAG pipeline under the hood.

Conversational Intelligence
  • Multi‑turn chat with full history
  • Persistent conversation threads stored locally
  • Auto‑naming of threads
  • Seamless switching between conversations
  • Thread deletion when you want a clean slate

Smart Context Retrieval

Thoth doesn’t just dump documents into a vector store and hope for the best. It uses an LLM‑driven decision node to determine whether new context is needed, then retrieves information from four parallel sources:

Source | Purpose
Uploaded Documents | FAISS vector search over your indexed files
Wikipedia | Real‑time article retrieval
Arxiv | Academic paper search
Web Search | Live results via Tavily

Retrieved content is compressed intelligently, keeping only what’s relevant, preserving citations, and accumulating context across turns.

Document Management
  • Upload PDFs, DOCX, DOC, and TXT
  • Automatic chunking with recursive splitting
  • Embedding via Qwen/Qwen3‑Embedding‑0.6B
  • Persistent FAISS vector store
  • Duplicate detection
  • One‑click “clear all” reset

Cited Answers

Every answer includes explicit citations:

  • (Source: document.pdf)
  • (Source: https://en.wikipedia.org/...)
  • (Source: https://arxiv.org/abs/...)
  • (Source: https://...)
  • (Source: Internal Knowledge)

This makes Thoth feel less like a black box and more like a trustworthy research assistant.

GitHub Repo

The full code, along with a detailed README.md, is available here:

Thoth-v1

Under the Hood: How Thoth Actually Works

Thoth isn’t just a UI wrapped around a vector store. It’s a carefully orchestrated system built on LangGraph, Ollama, FAISS, and a set of retrieval backends that work together to produce grounded, cited answers. The architecture is intentionally modular so contributors can extend or swap components without rewriting the entire pipeline.

A LangGraph‑Driven State Machine

At the core is a LangGraph StateGraph, which gives Thoth deterministic, inspectable control flow — something traditional RAG chains often lack.

The graph has three nodes:

1. needs_context: LLM‑based Retrieval Decisioning

Instead of blindly retrieving on every query, Thoth asks the LLM:

“Given the accumulated context so far, do we need to fetch new information to answer this question?”

This is a lightweight classification step that returns "Yes" or "No".
It prevents unnecessary retrieval calls, reduces latency, and keeps the context window clean.
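In code, that decision node can be as small as a single prompt‑and‑parse step. Here is a rough sketch; the prompt wording, state keys, and function name are illustrative rather than Thoth’s exact implementation:

```python
# Illustrative sketch of an LLM-based retrieval-decision node.
from langchain_ollama import ChatOllama

# Same local chat model used elsewhere in the pipeline (default from models.py).
llm = ChatOllama(model="qwen3-vl:8")

def needs_context(state: dict) -> dict:
    """Ask the LLM whether new retrieval is needed for the latest question."""
    prompt = (
        "Given the accumulated context below, answer only 'Yes' or 'No': "
        "do we need to fetch new information to answer the question?\n\n"
        f"Context:\n{state.get('context', '')}\n\n"
        f"Question: {state['question']}"
    )
    decision = llm.invoke(prompt).content.strip().lower()
    return {"needs_retrieval": decision.startswith("yes")}
```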

2. get_context: Parallel Multi‑Source Retrieval

If retrieval is needed, Thoth fans out to four backends in parallel:

  • FAISS vector store for uploaded documents
  • Wikipedia API
  • Arxiv API
  • Tavily Web Search API

Each backend returns raw text + metadata.
These results are then passed through a context compression LLM step, which:

  • extracts only the relevant spans
  • preserves citations
  • normalizes formats
  • removes redundancy
  • appends the compressed context to the session state

This “accumulated context” grows over the conversation, enabling multi‑turn reasoning grounded in prior retrievals.
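Conceptually, the fan‑out and compression step looks something like the sketch below. The retriever classes, parallelism strategy, and compression prompt are assumptions for illustration; the actual backends and prompts in the repo may differ in detail:

```python
# Rough sketch of the parallel fan-out plus context-compression step.
from concurrent.futures import ThreadPoolExecutor
from langchain_community.retrievers import WikipediaRetriever, ArxivRetriever
from langchain_community.tools.tavily_search import TavilySearchResults

# `vector_store` (FAISS) and `llm` (ChatOllama) are assumed to be set up elsewhere.

def get_context(state: dict) -> dict:
    query = state["question"]
    backends = {
        "documents": lambda q: vector_store.similarity_search(q, k=4),
        "wikipedia": lambda q: WikipediaRetriever(top_k_results=2).invoke(q),
        "arxiv":     lambda q: ArxivRetriever(load_max_docs=2).invoke(q),
        "web":       lambda q: TavilySearchResults(max_results=3).invoke(q),
    }
    # Query all four sources in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
    raw = {name: fut.result() for name, fut in futures.items()}

    # Compress: keep only relevant spans and preserve their source labels.
    compressed = llm.invoke(
        "Extract only the passages relevant to the question, keeping their "
        f"source labels.\n\nQuestion: {query}\n\nResults: {raw}"
    ).content
    # Append to the accumulated context rather than replacing it.
    return {"context": state.get("context", "") + "\n" + compressed}
```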

3. generate_answer: Final Prompt Assembly

The final answer is produced by combining:

  • a system prompt
  • the full accumulated context
  • the user’s latest question

The LLM (via Ollama) generates a cited answer, ensuring every claim is traceable.
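Putting the three nodes together, the LangGraph wiring looks roughly like this. The state schema, prompt, and edge logic are a simplified sketch, not the repository’s exact graph definition:

```python
# Sketch of the three-node StateGraph with a conditional retrieval branch.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ThothState(TypedDict, total=False):
    question: str
    context: str
    needs_retrieval: bool
    answer: str

def generate_answer(state: ThothState) -> dict:
    """Combine system prompt, accumulated context, and question into one call."""
    prompt = (
        "You are Thoth, a private research assistant. Cite every source.\n\n"
        f"Context:\n{state.get('context', '')}\n\n"
        f"Question: {state['question']}"
    )
    return {"answer": llm.invoke(prompt).content}  # `llm` as defined earlier

builder = StateGraph(ThothState)
builder.add_node("needs_context", needs_context)
builder.add_node("get_context", get_context)
builder.add_node("generate_answer", generate_answer)

builder.add_edge(START, "needs_context")
# Branch on the yes/no decision: retrieve first, or answer directly.
builder.add_conditional_edges(
    "needs_context",
    lambda s: "get_context" if s["needs_retrieval"] else "generate_answer",
)
builder.add_edge("get_context", "generate_answer")
builder.add_edge("generate_answer", END)

graph = builder.compile()
```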

Local LLM Execution via Ollama

Thoth uses Ollama to run the chat model locally.
The default is qwen3-vl:8, but the model is configurable in models.py.

Ollama provides:
  • GPU or CPU execution
  • model caching
  • streaming responses
  • simple model switching

This keeps everything private and offline.
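A minimal example of talking to the local model through LangChain’s ChatOllama wrapper, with streaming output. It assumes the default model has already been pulled with Ollama; the temperature value is just an illustration:

```python
# Run the chat model locally through Ollama and stream tokens as they arrive.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3-vl:8", temperature=0.2)

# Streaming keeps the UI responsive while the local model generates.
for chunk in llm.stream("Summarize the uploaded paper in three sentences."):
    print(chunk.content, end="", flush=True)
```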

Document Ingestion & Vectorization

Uploaded documents flow through documents.py, which handles:

  • File loading (PDF, DOCX, DOC, TXT)
  • Text extraction using PyPDF, Unstructured, or TextLoader
  • RecursiveCharacterTextSplitter with:
    • chunk size: 4000 chars
    • overlap: 200 chars
  • Embedding via Qwen/Qwen3-Embedding-0.6B
  • FAISS indexing with persistent storage
  • Duplicate detection using a processed‑files manifest

This pipeline is optimized for large documents and multi‑file ingestion.
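A condensed sketch of that pipeline, using the chunking and embedding parameters listed above; the loader choice and function name are illustrative, not a copy of documents.py:

```python
# Sketch of the ingestion path: load, split, embed, and persist to FAISS.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def index_pdf(path: str, index_dir: str = "faiss_index") -> FAISS:
    docs = PyPDFLoader(path).load()

    splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    embeddings = HuggingFaceEmbeddings(model_name="Qwen/Qwen3-Embedding-0.6B")
    store = FAISS.from_documents(chunks, embeddings)
    store.save_local(index_dir)  # persistent vector store on disk
    return store
```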

Thread Persistence with SQLite

Conversation threads are stored in a local SQLite database:

  • thread metadata
  • auto‑generated thread names
  • timestamps
  • LangGraph checkpoints

The checkpointer ensures that conversation state survives restarts, enabling long‑running research sessions.
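Wiring that in takes only a few lines with LangGraph’s SQLite checkpointer. The database path and thread id below are illustrative:

```python
# Sketch of thread persistence using LangGraph's SQLite checkpointer.
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

conn = sqlite3.connect("thoth.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# `builder` is the StateGraph assembled earlier.
graph = builder.compile(checkpointer=checkpointer)

# Each conversation thread gets its own id; state survives app restarts.
config = {"configurable": {"thread_id": "research-session-1"}}
result = graph.invoke({"question": "What does the uploaded paper claim?"}, config)
```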

Streamlit Frontend

The UI is intentionally simple:

  • Left panel: thread list + creation/deletion
  • Center: chat interface
  • Right panel: document upload & management

Streamlit handles session state, while all heavy lifting happens in the backend.
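A stripped‑down sketch of the chat portion shows how little frontend code is involved; the widget layout and session keys are illustrative, not the exact app code:

```python
# Minimal Streamlit chat loop that delegates answering to the LangGraph pipeline.
import streamlit as st

st.title("Thoth")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if question := st.chat_input("Ask your knowledge agent..."):
    st.session_state.messages.append({"role": "user", "content": question})
    st.chat_message("user").write(question)

    # Backend call: run the compiled graph for the current thread.
    result = graph.invoke(
        {"question": question},
        {"configurable": {"thread_id": "default"}},
    )
    answer = result["answer"]

    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.chat_message("assistant").write(answer)
```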

Why This Matters

Thoth sits at the intersection of three important trends:

1. Local‑First AI

People want AI that runs on their hardware, not someone else’s servers.

2. Private Knowledge Management

Your documents, notes, and research shouldn’t be uploaded to a cloud model just to get useful answers.

3. Agentic Workflows

RAG is evolving into something more dynamic: systems that can reason about when to retrieve, how to compress, and how to build context over time.

Thoth embraces all three.

Try It Out — And Help Shape Its Future

The project is open‑source and available on GitHub. If you’re interested in:

  • local AI
  • RAG pipelines
  • LangGraph
  • privacy‑first assistants
  • or just tinkering with cutting‑edge tooling

…you’ll feel right at home.

The full code, along with a detailed README.md, is available here:

Thoth-v1

Clone it, run it locally, upload some documents, and start chatting with your own private knowledge agent.

And if you find bugs, have ideas, or want to contribute, PRs and discussions are very welcome. This is the kind of project that grows best with community input.

What’s Coming Next

Thoth already works well, but the roadmap is ambitious and exciting.

1. vLLM Support

Adding vLLM will unlock:

  • dramatically faster inference
  • better throughput
  • support for larger models
  • more efficient batching

This will make Thoth feel snappier and more scalable on local hardware.

2. Tool‑Driven Retrieval

Instead of manually orchestrated retrieval, Thoth will evolve toward:

  • LLM‑invoked tools
  • dynamic retrieval strategies
  • more agentic behavior

This aligns with the broader shift toward tool‑using LLMs.

3. Becoming a Full Private Personal Assistant

The long‑term vision is clear:

  • calendar access
  • local file search
  • task management
  • email drafting
  • voice input
  • multimodal reasoning

All running locally, all private, all under your control.

Thoth is the foundation for a truly personal AI. Not a cloud service, not a subscription, but a companion that lives on your machine.

Final Thoughts

Thoth is more than a RAG demo. It’s a statement about where AI should be heading: toward systems that empower individuals, respect privacy, and run locally without compromise.

If that vision resonates with you, jump in. Explore the code. Open issues. Suggest features. Build with it. Break it. Improve it.

Projects like this thrive when curious people get involved.

And Thoth, the god of knowledge, would approve.

