Knowledge Graph and LLM Wiki Architecture
Patriot University Documentation

Patriot University’s knowledge system uses a three-tier architecture to organize, connect, and retrieve information across 550+ documents, 980+ accountability profiles, and 13,600+ entity relationships. This document explains how each tier works, when the system chooses one retrieval path over another, and how the knowledge graph stays current.

The Three Tiers

Tier 1: The LLM Wiki (Obsidian Vault)

The foundation is an Obsidian vault implementing Andrej Karpathy’s LLM Wiki pattern — knowledge is compiled at ingestion time so it compounds rather than being re-derived with every query.

The vault has four layers:

| Layer | Location | Purpose |
| --- | --- | --- |
| Sources | _sources/ | Raw, immutable input (PDFs, URL captures, articles). Read-only after capture. Never published. |
| Drafts | _drafts/ | Pre-approval documents the AI writes during ingestion. Promoted on human review. |
| Wiki | Top-level category folders | The publishable knowledgebase: cross-linked, frontmatter-tagged, quality-gated. |
| Control | _control-center/ | Index and append-only ingest log tracking all additions. |

The wiki layer is the authoritative source for all downstream systems. Every document carries structured YAML frontmatter with a unique slug, category assignment, status lifecycle (draft → review → published), and cross-reference links via slug wikilinks.
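
The status lifecycle described above can be pictured as a small state machine. The following is a minimal sketch, not the vault's actual schema: the `Frontmatter` class, its field names, and the `promote` method are hypothetical names chosen for illustration, and only the three status values come from the text.

```python
from dataclasses import dataclass, field

# Allowed forward transitions in the draft -> review -> published lifecycle.
NEXT_STATUS = {"draft": "review", "review": "published"}


@dataclass
class Frontmatter:
    """Hypothetical in-memory view of a document's YAML frontmatter."""
    slug: str
    category: str
    status: str = "draft"
    links: list[str] = field(default_factory=list)  # slugs referenced via wikilinks

    def promote(self) -> None:
        """Advance one step in the lifecycle; published documents cannot advance."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"cannot promote from status {self.status!r}")
        self.status = NEXT_STATUS[self.status]
```

Modeling promotion as a single forward step keeps a document from jumping straight from draft to published, which matches the human-review gate described for the drafts layer.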

When a document is published, the pipeline pushes it to WordPress (Echo Knowledge Base CPT) and indexes it in Pinecone for vector retrieval. The content hash in publish-manifest.json ensures only changed documents are re-published.
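
As an illustration of the change detection described above, here is a minimal sketch of comparing content hashes against a manifest. It assumes the manifest is a flat slug-to-hash mapping and that SHA-256 is the hash in use; neither detail is confirmed by this document, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the document text (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_manifest(path: Path) -> dict[str, str]:
    """Load a slug -> hash mapping; a missing file means nothing was published yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def changed_documents(docs: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return slugs whose current hash differs from, or is missing in, the manifest."""
    return sorted(
        slug for slug, text in docs.items()
        if manifest.get(slug) != content_hash(text)
    )
```

A publish step built this way would push only the slugs returned by `changed_documents` and then write their new hashes back to the manifest.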

Tier 2: The Entity Knowledge Graph

Built on top of the wiki, the entity knowledge graph maps relationships between people, organizations, events, and financial connections documented across the knowledge base.

Current statistics (as of June 2026):

| Metric | Value |
| --- | --- |
| Profiles extracted | 1,011 |
| Total nodes | 5,599 |
| Total edges | 13,634 |
| L0 communities (fine-grained) | 27 |
| L1 communities (thematic clusters) | 15 |
| Summarized communities | 31 |

The graph is constructed in three stages:

  1. Entity extraction (scripts/extract_entities.py): Claude processes each accountability profile to identify persons, organizations, events, and financial connections. Extraction is incremental: only documents whose content hash has changed are re-extracted.
  2. Community detection (scripts/build_communities.py): The Leiden algorithm (RBConfigurationVertexPartition) clusters the entity graph into hierarchical communities at two levels, fine-grained clusters (L0) and broader thematic groupings (L1).
  3. Community summarization (scripts/generate_community_summaries.py): Claude generates narrative summaries for each community, identifying key actors, organizations, themes, and the democratic threats each cluster represents. Summaries carry an AI_GENERATED provenance tag.

The resulting community clusters reveal structural patterns such as:

  • Federal courts and election integrity disputes
  • Corporate-government influence networks (America 250, 1776 Commission)
  • Private prison corporate power and detention accountability
  • Federal communications infrastructure control
  • State-level voter suppression coordination

Tier 3: The Codebase Knowledge Graph (Graphify)

Graphify is the developer-facing knowledge graph that maps the codebase — scripts, modules, classes, functions, and their relationships. It enables AI coding assistants to navigate the project’s architecture without brute-force file searches.

The Graphify graph for Patriot University is part of a unified global graph spanning 31 ITI projects (126,000+ nodes total). Each project maintains a local graph at graphify-out/graph.json for project-scoped queries.

Graphify commands:

| Command | Purpose |
| --- | --- |
| graphify query "question" | Scoped subgraph retrieval for architecture questions |
| graphify path "A" "B" | Trace the relationship path between two concepts |
| graphify explain "Symbol" | Understand a specific class, function, or module |
| graphify . --update | Rebuild the local graph after code changes (AST-only, no API cost) |

Query Routing: How the System Decides What to Use

The hybrid query router (scripts/lib/query_router.py) classifies incoming questions using keyword heuristics and routes them to the appropriate retrieval backend — with no additional LLM call needed for classification.

Routing Modes

| Mode | When it fires | Retrieval sources |
| --- | --- | --- |
| local | Entity-specific or factual questions (“Who is Stephen Miller?”, “What are voting rules in Georgia?”) | Pinecone vector search; returns top-5 document chunks |
| global | Thematic or corpus-wide questions (“What patterns of corruption exist?”, “Overview of media capture”) | Community summaries, keyword-matched against cluster labels, themes, and actors |
| graph | Relationship questions (“How are X and Y connected?”, “What links the Federalist Society to judicial nominations?”) | BFS traversal on entity-graph.json; follows typed edges up to 2 hops |
| hybrid | Complex questions with both relationship and theme signals | Combines community summaries + graph path traversal |
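
The graph mode's hop-limited traversal can be sketched as a standard breadth-first search. This is an illustrative sketch only: the `(source, edge_type, target)` edge shape, the undirected treatment of edges, and the `find_path` name are assumptions, since the schema of entity-graph.json is not described here.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, edge_type, target); shape is an assumption


def find_path(edges: list[Edge], start: str, goal: str, max_hops: int = 2):
    """Breadth-first search for a shortest path of at most max_hops edges.

    Returns a list of (from_node, edge_type, to_node) steps, or None if no
    path exists within the hop limit. Edges are treated as undirected here.
    """
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for source, edge_type, target in edges:
        adjacency.setdefault(source, []).append((edge_type, target))
        adjacency.setdefault(target, []).append((edge_type, source))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # hop budget spent; do not expand this node further
        for edge_type, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, edge_type, neighbor)]))
    return None
```

Because breadth-first search visits nodes in order of distance, the first path found is also a shortest one, and the hop cap keeps the result set small enough to pass to the chatbot as context.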

Classification Logic

The router scores each query against two keyword lists:

  • Graph keywords: connect, relationship, between, link, path, network, influence, overlap
  • Global keywords: pattern, trend, theme, overview, systemic, structural, landscape, ecosystem

Scoring rules:

  • Graph score ≥ 2 → graph mode
  • Global score ≥ 2 → global mode
  • Graph score = 1 and Global score ≥ 1 → hybrid mode
  • Starts with “Who is”, “What did”, “What is”, “Where” → local mode
  • Long questions (10 or more words and a question mark) → global mode
  • Default → local mode (Pinecone vector search)
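
The scoring rules above can be sketched as a single function. This is a hedged illustration, not the contents of query_router.py: it assumes keywords match as case-insensitive substrings, that each keyword counts at most once, and that the rules are checked in the order listed. The `classify` name is hypothetical.

```python
GRAPH_KEYWORDS = ("connect", "relationship", "between", "link", "path",
                  "network", "influence", "overlap")
GLOBAL_KEYWORDS = ("pattern", "trend", "theme", "overview", "systemic",
                   "structural", "landscape", "ecosystem")
LOCAL_PREFIXES = ("who is", "what did", "what is", "where")


def classify(query: str) -> str:
    """Return 'graph', 'global', 'hybrid', or 'local' for a query."""
    text = query.strip().lower()
    # Assumption: a keyword counts once if it appears anywhere as a substring.
    graph_score = sum(kw in text for kw in GRAPH_KEYWORDS)
    global_score = sum(kw in text for kw in GLOBAL_KEYWORDS)

    # Assumption: rules are checked in the order the document lists them.
    if graph_score >= 2:
        return "graph"
    if global_score >= 2:
        return "global"
    if graph_score == 1 and global_score >= 1:
        return "hybrid"
    if text.startswith(LOCAL_PREFIXES):
        return "local"
    if len(text.split()) >= 10 and "?" in text:
        return "global"
    return "local"
```

Since classification is plain string matching, it adds no model call and no network latency before retrieval begins. The real router may weigh signals differently than this sketch does.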

When the Codebase Graph (Graphify) is Used Instead

The Graphify knowledge graph serves a different purpose — it is for development navigation, not end-user queries. The decision gate is simple:

  • User query about civic content → Entity knowledge graph + Pinecone (handled by query_router.py)
  • Developer question about code architecture → Graphify (invoked by AI coding agents via cursor rules)
  • Project has fewer than 30 files → Skip graph entirely, use direct grep/read

Graphify is never part of the user-facing chatbot pipeline. It exists solely to help AI agents navigate the codebase efficiently during development sessions.

Keeping the Knowledge Graph Current

Automated Update Triggers

| Trigger | What runs | Frequency |
| --- | --- | --- |
| Code file modified in a session | graphify . --update | Per-session (AST-only, zero cost) |
| Accountability profiles published | make rebuild-graph-full | After bulk profile additions |
| Profile content changes | scripts/extract_entities.py (incremental) | Content-hash change detection |
| Community membership changes | make rebuild-summaries | After extraction changes cluster composition |
| Full knowledge graph rebuild | make rebuild-graph-full | Chains: extract → communities → summaries |
| Publish + graph refresh | make publish-rebuild-graph | Publish to WP then rebuild entity graph |

Per-Profile Ego Graphs

Each accountability profile can have an ego-graph — a subgraph showing its immediate network connections. These are generated and synced to WordPress as structured post metadata:

  • make ego-graph SLUG= — generate for one profile
  • make ego-graphs — generate for all profiles
  • make sync-graphs — push ego-graph data to WordPress
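
To make the ego-graph idea concrete, here is a minimal sketch of extracting a profile's immediate neighborhood from an edge list. The `(source, edge_type, target)` edge shape, the returned dictionary layout, and the `ego_graph` name are all assumptions for illustration; the format actually synced to WordPress is not specified here.

```python
Edge = tuple[str, str, str]  # (source, edge_type, target); shape is an assumption


def ego_graph(edges: list[Edge], center: str) -> dict:
    """Return the center node, its direct neighbors, and the edges touching it."""
    touching = [e for e in edges if center in (e[0], e[2])]
    neighbors = sorted({n for s, _, t in touching for n in (s, t)} - {center})
    return {"center": center, "neighbors": neighbors, "edges": touching}
```

A dictionary of plain strings, lists, and tuples like this one serializes directly with `json.dumps`, which suits storage as structured post metadata.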

Incremental Design

All graph operations are incremental by default:

  • extract_entities.py tracks a content_hash per profile and only re-extracts when the source document changes
  • generate_community_summaries.py tracks a members_hash per community and only regenerates when cluster membership shifts
  • graphify . --update only re-parses modified source files via AST analysis

This means routine operations (adding a single profile, fixing a typo) trigger only the minimum necessary graph work, while --force flags exist for full rebuilds when needed.
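
The membership-hash check can be sketched as follows. This is a hypothetical illustration: it assumes the hash is computed over the sorted set of member identifiers with SHA-256, and the function names are not taken from generate_community_summaries.py.

```python
import hashlib


def members_hash(member_ids: list[str]) -> str:
    """Hash a community's membership so that ordering does not affect the result."""
    canonical = "\n".join(sorted(set(member_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def communities_to_resummarize(communities: dict[str, list[str]],
                               stored: dict[str, str]) -> list[str]:
    """Return community IDs whose membership hash is new or has changed."""
    return sorted(
        cid for cid, members in communities.items()
        if stored.get(cid) != members_hash(members)
    )
```

Sorting the members before hashing matters because community detection may list the same members in a different order between runs, and an order-sensitive hash would trigger needless regeneration.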

Data Files

| File | Contents |
| --- | --- |
| docs/entity-graph.json | 1,011 profile extractions + 13,634 typed edges |
| docs/community-graph.json | 42 Leiden communities (27 L0 + 15 L1) with assignments |
| docs/community-summaries.json | 31 LLM-generated thematic cluster summaries |
| graphify-out/graph.json | Codebase AST graph (local project) |
| ~/.graphify/global-graph.json | Unified graph across 31 ITI projects |

Provenance and Transparency

Community summaries carry an AI_GENERATED provenance tag. When the chatbot uses graph-derived context to answer questions, it notes that connections are based on documented evidence in the knowledge base rather than presenting them as independently verified facts. This follows the ITI Inferred Data Transparency rule — all AI-generated or inferred data must carry a visible provenance indicator.
