Knowledge Graph and LLM Wiki Architecture

Posted June 9, 2026

Updated July 19, 2026

By PU Publish

Patriot University’s knowledge system uses a three-tier architecture to organize, connect, and retrieve information across 550+ documents, 980+ accountability profiles, and 13,600+ entity relationships. This document explains how each tier works, when the system chooses one retrieval path over another, and how the knowledge graph stays current.

Why We Build This

The knowledge graph is a public resource — not an internal tool. We build it for two audiences beyond the general reader.

Independent journalists. Accountability reporting requires tracing relationships: who funded whom, who coordinated with whom, who held which office when a decision was made. The graph surface makes those connections queryable without requiring a journalist to re-read every source document from scratch. Every edge in the graph traces to a sourced, evidence-tiered KB document.

Future truth-and-reconciliation processes. Democratic transitions that hold officials accountable for promoting autocracy and authoritarianism need a pre-assembled factual record. That record must be grounded in documented actions — not protected speech, not political positions — so it can withstand adversarial scrutiny. This graph is designed to be that record: evidence-tiered, relationship-mapped, and publicly accessible now, so it exists before any formal accountability process begins.

The governing principle is the same in both cases: accountability attaches to what people did, not what they said.

The Three Tiers

Tier 1: The LLM Wiki (Obsidian Vault)

The foundation is an Obsidian vault implementing Andrej Karpathy’s LLM Wiki pattern — knowledge is compiled at ingestion time so it compounds rather than being re-derived with every query.

The vault has four layers:

Layer	Location	Purpose
Sources	`_sources/`	Raw, immutable input (PDFs, URL captures, articles). Read-only after capture. Never published.
Drafts	`_drafts/`	Pre-approval documents the AI writes during ingestion. Promoted on human review.
Wiki	Top-level category folders	The publishable knowledgebase — cross-linked, frontmatter-tagged, quality-gated.
Control	`_control-center/`	Index and append-only ingest log tracking all additions.

The wiki layer is the authoritative source for all downstream systems. Every document carries structured YAML frontmatter with a unique slug, category assignment, status lifecycle (draft → review → published), and cross-reference links via slug wikilinks.
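Here is a minimal sketch of reading that frontmatter and enforcing the status lifecycle. It assumes keys named `slug`, `category`, and `status`, flat `key: value` frontmatter, and Obsidian-style `[[slug]]` or `[[slug|label]]` wikilinks. The helper names are hypothetical. A real pipeline would more likely use a full YAML parser such as PyYAML's `yaml.safe_load` than this line splitter.

```python
import re

# Frontmatter is assumed to be a leading block delimited by '---' lines.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Captures the target slug in [[slug]], [[slug|label]], or [[slug#heading]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")
# Assumed lifecycle order: draft -> review -> published.
STATUS_ORDER = ("draft", "review", "published")


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter dict, body). Handles flat key: value pairs only."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("document has no frontmatter block")
    meta: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, text[match.end():]


def can_transition(current: str, new: str) -> bool:
    """Allow only a single forward step in the lifecycle (an assumption; demotion is rejected)."""
    if current not in STATUS_ORDER or new not in STATUS_ORDER:
        return False
    return STATUS_ORDER.index(new) == STATUS_ORDER.index(current) + 1


def wikilinks(body: str) -> list[str]:
    """Return the target slugs of all wikilinks in document order."""
    return [target.strip() for target in WIKILINK_RE.findall(body)]


doc = (
    "---\n"
    "slug: example-profile\n"
    "category: people\n"
    "status: review\n"
    "---\n"
    "Funded by [[example-org]]; see also [[other-slug|Other]].\n"
)
meta, body = parse_frontmatter(doc)
print(meta["slug"], meta["status"], wikilinks(body))
print(can_transition(meta["status"], "published"))
```

The wikilink targets are what turn the vault into a graph of cross-references. Rejecting skipped or backward status steps keeps unreviewed drafts from reaching the publishable layer.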

When a document is published, the pipeline pushes it to WordPress (Echo Knowledge Base CPT) and indexes it in Pinecone for vector retrieval. A content hash recorded for each document in publish-manifest.json ensures that only changed documents are re-published.
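The change detection can be sketched as follows. This assumes `publish-manifest.json` maps each slug to the SHA-256 hex digest of the content last published. The real manifest's schema and hash function are not described here, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """SHA-256 of the document text (the hash function is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_manifest(path: Path) -> dict[str, str]:
    """Read the slug -> hash manifest, or start empty on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_manifest(path: Path, manifest: dict[str, str]) -> None:
    """Write the manifest back with stable key order for clean diffs."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def changed_documents(docs: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Slugs that are new or whose content differs from the last published version."""
    return sorted(slug for slug, text in docs.items() if manifest.get(slug) != content_hash(text))


def record_published(manifest: dict[str, str], docs: dict[str, str], slugs: list[str]) -> None:
    """After a successful push, store the new hashes so the next run skips these documents."""
    for slug in slugs:
        manifest[slug] = content_hash(docs[slug])


docs = {"profile-a": "unchanged text", "profile-b": "edited text", "profile-c": "brand new"}
manifest = {"profile-a": content_hash("unchanged text"), "profile-b": content_hash("old text")}
to_publish = changed_documents(docs, manifest)
print(to_publish)  # ['profile-b', 'profile-c']
record_published(manifest, docs, to_publish)
print(changed_documents(docs, manifest))  # []
```

The sketch updates hashes only after a push succeeds. A failed publish therefore leaves the old hash in place, and the document is retried on the next run.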

Tier 2: The Entity Knowledge Graph

Built on top of the wiki, the entity knowledge graph maps relationships between people, organizations, events, and financial connections documented across the knowledge base.

Current statistics (as of June 2026):

Metric	Value
Profiles extracted	1,011
Total nodes	5,599
Total edges	13,634
L0 communities (fine-grained)	27
L1 communities (thematic clusters)	15
Summarized communities	31

The graph is constructed in three stages:

Entity extraction (scripts/extract_entities.py): Claude processes each accountability profile to identify persons, organizations, events, and financial connections. Incremental — only re-extracts documents whose content hash has changed.

Community detection (scripts/build_communities.py): The Leiden algorithm (RBConfigurationVertexPartition) clusters the entity graph into hierarchical communities at two levels — fine-grained clusters (L0) and broader thematic groupings (L1).

Community summarization (scripts/generate_community_summaries.py): Claude generates narrative summaries for each community, identifying key actors, organizations, themes, and the democratic threats each cluster represents. Summaries carry an AI_GENERATED provenance tag.
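The sketch below makes the summary output and its later use in retrieval concrete. It defines a summary record and runs a naive keyword match over its label, themes, and actors. The field names, community IDs, stopword list, and overlap scoring are all assumptions for illustration. The actual schema of `community-summaries.json` and the router's matching logic may differ.

```python
import re
from dataclasses import dataclass, field

# Tiny illustrative stopword list; a real matcher would need a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "what"}


@dataclass
class CommunitySummary:
    community_id: str
    label: str
    themes: list[str] = field(default_factory=list)
    actors: list[str] = field(default_factory=list)
    summary: str = ""
    provenance: str = "AI_GENERATED"  # every generated summary carries this tag


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def match_communities(query: str, summaries: list[CommunitySummary], top_k: int = 3) -> list[CommunitySummary]:
    """Rank summaries by how many query tokens appear in their label, themes, or actors."""
    query_tokens = tokens(query)
    scored = []
    for s in summaries:
        searchable = tokens(" ".join([s.label, *s.themes, *s.actors]))
        overlap = len(query_tokens & searchable)
        if overlap:
            scored.append((overlap, s.community_id, s))
    # Highest overlap first; community_id breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, _, s in scored[:top_k]]


summaries = [
    CommunitySummary("L1-03", "Federal communications infrastructure control", themes=["media capture"]),
    CommunitySummary("L1-07", "Private prison corporate power", themes=["detention accountability"]),
    CommunitySummary("L0-12", "Media ownership consolidation", themes=["media capture"]),
]
print([s.community_id for s in match_communities("Overview of media capture", summaries)])  # ['L0-12', 'L1-03']
```

Because summaries are precomputed, answering a corpus-wide question costs only a lookup over a few dozen records rather than a scan of every profile.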

The resulting community clusters reveal structural patterns such as:

Federal courts and election integrity disputes
Corporate-government influence networks (America 250, 1776 Commission)
Private prison corporate power and detention accountability
Federal communications infrastructure control
State-level voter suppression coordination

Tier 3: The Codebase Knowledge Graph (Graphify)

Graphify is the developer-facing knowledge graph that maps the codebase — scripts, modules, classes, functions, and their relationships. It enables AI coding assistants to navigate the project’s architecture without brute-force file searches.

The Graphify graph for Patriot University is part of a unified global graph spanning 31 ITI projects (126,000+ nodes total). Each project maintains a local graph at graphify-out/graph.json for project-scoped queries.

Graphify commands:

Command	Purpose
`graphify query "question"`	Scoped subgraph retrieval for architecture questions
`graphify path "A" "B"`	Trace the relationship path between two concepts
`graphify explain "Symbol"`	Understand a specific class, function, or module
`graphify . --update`	Rebuild the local graph after code changes (AST-only, no API cost)

Query Routing: How the System Decides What to Use

The hybrid query router (scripts/lib/query_router.py) classifies incoming questions using keyword heuristics and routes them to the appropriate retrieval backend — with no additional LLM call needed for classification.

Routing Modes

Mode	When it fires	Retrieval sources
local	Entity-specific or factual questions (“Who is Stephen Miller?”, “What are voting rules in Georgia?”)	Pinecone vector search — returns top-5 document chunks
global	Thematic or corpus-wide questions (“What patterns of corruption exist?”, “Overview of media capture”)	Community summaries — keyword-matched against cluster labels, themes, and actors
graph	Relationship questions (“How are X and Y connected?”, “What links the Federalist Society to judicial nominations?”)	BFS traversal on entity-graph.json — follows typed edges up to 2 hops
hybrid	Complex questions with both relationship and theme signals	Combines community summaries + graph path traversal
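Graph mode's bounded traversal can be sketched as a breadth-first search that stops expanding at two hops and returns the path together with its edge types. This assumes edges are stored as `source`/`target`/`type` records and can be followed in both directions. The real layout of `entity-graph.json` may differ, and the node names below are placeholders.

```python
from collections import defaultdict, deque
from typing import Optional


def build_adjacency(edges: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Undirected adjacency list: node -> [(neighbor, edge_type), ...]."""
    adjacency: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for edge in edges:
        adjacency[edge["source"]].append((edge["target"], edge["type"]))
        adjacency[edge["target"]].append((edge["source"], edge["type"]))
    return adjacency


def find_path(edges: list[dict], start: str, goal: str, max_hops: int = 2) -> Optional[list[str]]:
    """Shortest path as [node, edge_type, node, ...], or None if goal is beyond max_hops."""
    if start == goal:
        return [start]
    adjacency = build_adjacency(edges)
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        hops_so_far = (len(path) - 1) // 2  # path alternates node, type, node, ...
        if hops_so_far >= max_hops:
            continue  # do not expand past the hop limit
        for neighbor, edge_type in adjacency.get(node, []):
            if neighbor in visited:
                continue
            new_path = path + [edge_type, neighbor]
            if neighbor == goal:
                return new_path
            visited.add(neighbor)
            queue.append((neighbor, new_path))
    return None


edges = [
    {"source": "Org A", "target": "Person B", "type": "funded"},
    {"source": "Person B", "target": "Agency C", "type": "appointed_to"},
    {"source": "Agency C", "target": "Firm D", "type": "contracted_with"},
]
print(find_path(edges, "Org A", "Agency C"))  # ['Org A', 'funded', 'Person B', 'appointed_to', 'Agency C']
print(find_path(edges, "Org A", "Firm D"))    # None: three hops exceeds the limit
```

Keeping the edge types in the returned path lets the answer say how two entities are linked, not just that they are. The hop cap keeps results to connections a reader can check against the cited documents.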

Classification Logic

The router scores each query against two keyword lists:

Graph keywords: connect, relationship, between, link, path, network, influence, overlap
Global keywords: pattern, trend, theme, overview, systemic, structural, landscape, ecosystem

Scoring rules:

Graph score ≥ 2 → graph mode
Global score ≥ 2 → global mode
Graph score = 1 and Global score ≥ 1 → hybrid mode
Starts with “Who is”, “What did”, “What is”, “Where” → local mode
Long questions (10+ words with ?) → global mode
Default → local mode (Pinecone vector search)
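The scoring rules can be expressed as a short function. This is a sketch, not the contents of `scripts/lib/query_router.py`. It assumes case-insensitive substring matching (so "patterns" counts for "pattern") and checks the rules in the order listed. It reads the hybrid rule as a graph score of exactly 1 together with a global score of at least 1, and it uses only the keywords named above.

```python
GRAPH_KEYWORDS = ("connect", "relationship", "between", "link", "path", "network", "influence", "overlap")
GLOBAL_KEYWORDS = ("pattern", "trend", "theme", "overview", "systemic", "structural", "landscape", "ecosystem")
LOCAL_PREFIXES = ("who is", "what did", "what is", "where")


def keyword_score(text: str, keywords: tuple[str, ...]) -> int:
    """Count keywords appearing anywhere in the lowercased text (substring match)."""
    return sum(1 for keyword in keywords if keyword in text)


def classify(query: str) -> str:
    """Return 'graph', 'global', 'hybrid', or 'local' using keyword heuristics only (no LLM call)."""
    text = query.strip().lower()
    graph = keyword_score(text, GRAPH_KEYWORDS)
    global_ = keyword_score(text, GLOBAL_KEYWORDS)
    if graph >= 2:
        return "graph"
    if global_ >= 2:
        return "global"
    if graph == 1 and global_ >= 1:
        return "hybrid"
    if text.startswith(LOCAL_PREFIXES):
        return "local"
    if len(text.split()) >= 10 and "?" in text:
        return "global"
    return "local"  # default: Pinecone vector search


for q in [
    "What is the relationship between the Federalist Society and judicial nominations?",  # graph
    "Give me an overview of systemic patterns in media capture",  # global
    "How does donor influence fit the broader trend?",  # hybrid
    "Who is Stephen Miller?",  # local
]:
    print(classify(q), "<-", q)
```

With only the keywords listed, a question that contains a single relationship keyword and no theme keyword, such as "How are X and Y connected?", falls through to the default local mode.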

When the Codebase Graph (Graphify) is Used Instead

The Graphify knowledge graph serves a different purpose — it is for development navigation, not end-user queries. The decision gate is simple:

User query about civic content → Entity knowledge graph + Pinecone (handled by query_router.py)
Developer question about code architecture → Graphify (invoked by AI coding agents via Cursor rules)
Project has fewer than 30 files → Skip graph entirely, use direct grep/read

Graphify is never part of the user-facing chatbot pipeline. It exists solely to help AI agents navigate the codebase efficiently during development sessions.

Keeping the Knowledge Graph Current

Automated Update Triggers

Trigger	What runs	Frequency
Code file modified in a session	`graphify . --update`	Per-session (AST-only, zero cost)
Accountability profiles published	`make rebuild-graph-full`	After bulk profile additions
Profile content changes	`scripts/extract_entities.py` (incremental)	Content-hash change detection
Community membership changes	`make rebuild-summaries`	After extraction changes cluster composition
Full knowledge graph rebuild	`make rebuild-graph-full`	Chains: extract → communities → summaries
Publish + graph refresh	`make publish-rebuild-graph`	Publish to WP then rebuild entity graph

Per-Profile Ego Graphs

Each accountability profile can have an ego-graph — a subgraph showing its immediate network connections. These are generated and synced to WordPress as structured post metadata:

make ego-graph SLUG= — generate for one profile
make ego-graphs — generate for all profiles
make sync-graphs — push ego-graph data to WordPress
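A radius-one ego graph can be computed from the typed edge list and serialized as JSON for storage in post metadata. This sketch assumes `source`/`target`/`type` edge records and treats edges as undirected when finding neighbors. It keeps edges between neighbors as well as edges to the center. The payload shape is hypothetical, and the WordPress sync performed by `make sync-graphs` is not shown.

```python
import json
from collections import defaultdict


def ego_graph(edges: list[dict], center: str, radius: int = 1) -> dict:
    """Nodes within `radius` hops of `center`, plus every edge among those nodes."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for edge in edges:
        neighbors[edge["source"]].add(edge["target"])
        neighbors[edge["target"]].add(edge["source"])
    nodes = {center}
    frontier = {center}
    for _ in range(radius):
        # Expand one ring outward, skipping nodes already collected.
        frontier = {n for node in frontier for n in neighbors[node]} - nodes
        nodes |= frontier
    subgraph_edges = [e for e in edges if e["source"] in nodes and e["target"] in nodes]
    return {"center": center, "nodes": sorted(nodes), "edges": subgraph_edges}


edges = [
    {"source": "profile-a", "target": "org-x", "type": "funded"},
    {"source": "org-x", "target": "person-b", "type": "employs"},
    {"source": "profile-a", "target": "person-b", "type": "coordinated_with"},
    {"source": "person-b", "target": "org-y", "type": "board_member_of"},
]
ego = ego_graph(edges, "profile-a")
print(json.dumps(ego, indent=2))  # org-y is two hops away, so it is excluded
```

Including the edges among neighbors, not just the spokes to the center, shows when a profile's contacts are also connected to each other.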

Incremental Design

All graph operations are incremental by default:

extract_entities.py tracks a content_hash per profile and only re-extracts when the source document changes
generate_community_summaries.py tracks a members_hash per community and only regenerates when cluster membership shifts
graphify . --update only re-parses modified source files via AST analysis
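A membership hash only works for change detection if it ignores ordering, because a rebuild may list the same members in a different order. Here is a minimal sketch, assuming membership is a list of node IDs and the hash is SHA-256 over the sorted, de-duplicated IDs. The function names are hypothetical and need not match `generate_community_summaries.py`.

```python
import hashlib


def members_hash(members: list[str]) -> str:
    """Order-independent fingerprint of a community's membership."""
    canonical = "\n".join(sorted(set(members)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def communities_to_regenerate(communities: dict[str, list[str]], stored: dict[str, str]) -> list[str]:
    """IDs whose membership is new or changed since their summary was last generated."""
    return sorted(cid for cid, members in communities.items() if stored.get(cid) != members_hash(members))


stored = {"c1": members_hash(["person-b", "org-x"]), "c2": members_hash(["org-y"])}
current = {"c1": ["org-x", "person-b"], "c2": ["org-y", "org-z"], "c3": ["firm-d"]}
print(communities_to_regenerate(current, stored))  # ['c2', 'c3']
```

Community `c1` is skipped even though its members are listed in a different order. The LLM summarization cost is spent only on clusters that actually changed.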

This means routine operations (adding a single profile, fixing a typo) trigger only the minimum necessary graph work, while --force flags exist for full rebuilds when needed.

Data Files

File	Contents
`docs/entity-graph.json`	1,011 profile extractions + 13,634 typed edges
`docs/community-graph.json`	42 Leiden communities (27 L0 + 15 L1) with assignments
`docs/community-summaries.json`	31 LLM-generated thematic cluster summaries
`graphify-out/graph.json`	Codebase AST graph (local project)
`~/.graphify/global-graph.json`	Unified graph across 31 ITI projects

Provenance and Transparency

Community summaries carry an AI_GENERATED provenance tag. When the chatbot uses graph-derived context to answer questions, it notes that connections are based on documented evidence in the knowledge base rather than presenting them as independently verified facts. This follows the ITI Inferred Data Transparency rule — all AI-generated or inferred data must carry a visible provenance indicator.
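One way to keep the indicator visible is to attach the provenance tag to every graph-derived snippet before it reaches the chatbot prompt. In this sketch the record fields, the bracketed label format, and the `UNTAGGED` fallback are assumptions, not the format required by the ITI rule.

```python
EVIDENCE_NOTE = (
    "Note: these connections are based on documented evidence in the knowledge base, "
    "not independently verified facts."
)


def format_graph_context(snippets: list[dict]) -> str:
    """Prefix each graph-derived snippet with its provenance tag and append the evidence note."""
    lines = []
    for snippet in snippets:
        # A missing tag is rendered visibly rather than silently dropped.
        tag = snippet.get("provenance") or "UNTAGGED"
        lines.append(f"[{tag}] {snippet['label']}: {snippet['text']}")
    if lines:
        lines.append(EVIDENCE_NOTE)
    return "\n".join(lines)


context = format_graph_context([
    {"provenance": "AI_GENERATED", "label": "Community L0-12", "text": "Summary text..."},
])
print(context)
```

Tagging at the point where context is assembled means no downstream prompt template can present a generated summary without its label.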
