Analytical Framework — Claim Tracking & Truth-Density

Purpose

Extract, classify, and track factual claims in the speech corpus to:

  1. Build a database of verifiable claims with truth-status from authoritative fact-checkers
  2. Track claim density (claims per minute, claims per 1,000 words)
  3. Track repetition of false claims (how often does each false claim recur?)
  4. Identify novel claims appearing in the corpus
  5. Compare truth-density across speakers and over time
  6. Provide a source-cited evidence base for civic-education and TRC-documentation use

Claim Types

Following the fact-checking literature, classify claims by:

Verifiability

  • Verifiable factual claims (numerical, historical, attributable, specific): “Crime is up X%”, “I won by Y votes”
  • Causal claims (more difficult to verify but possible): “X caused Y”
  • Counterfactuals: “If X had happened, Y would have happened”
  • Predictions / future claims: “In two months, X will happen” (verify once the stated date has passed)
  • Subjective evaluations: “X is the worst Y” (often not verifiable in a strict sense)
  • Misattributions: “X said Y” (verify quote attribution)
  • Rhetorical questions / non-claims: Often not coded as claims

Domain

  • Economic (jobs, GDP, inflation, deficit, taxes, trade)
  • Crime and immigration
  • Foreign policy and military
  • Election integrity
  • Personal credentials and biography
  • Opponents’ statements and actions
  • Scientific or medical
  • Other

Truth-Status Coding

Use a calibrated scale aligned with major fact-checking organizations:

| Code | Label | Description |
|---|---|---|
| true | True | Fully accurate |
| mostly_true | Mostly true | Accurate with minor missing context |
| half_true | Half true | Partially accurate with significant context missing |
| mostly_false | Mostly false | Inaccurate with some accurate elements |
| false | False | Inaccurate |
| pants_on_fire | Pants on Fire | Inaccurate and absurd |
| unverifiable | Unverifiable | Cannot be verified |
| out_of_context | Out of context | Selectively-presented true statement |
| predicted | Prediction | Future-tense; verify when due |
| subjective | Subjective | Not amenable to fact-check |

For each coded claim, record:

  • Source fact-check organization (PolitiFact, FactCheck.org, Washington Post Fact Checker, AP Fact Check, Snopes)
  • URL of fact-check article
  • Date of fact-check
  • Excerpt of fact-check finding
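
The codes in the table and the four fields recorded per coded claim can be pinned down in code so that typos such as "mostly-false" are rejected at load time. The following is a minimal sketch, not a prescribed implementation: the names `TruthStatus`, `FactCheckRef`, and `parse_status` are hypothetical, and storing dates as ISO 8601 strings is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class TruthStatus(str, Enum):
    """Truth-status codes, copied from the coding table."""
    TRUE = "true"
    MOSTLY_TRUE = "mostly_true"
    HALF_TRUE = "half_true"
    MOSTLY_FALSE = "mostly_false"
    FALSE = "false"
    PANTS_ON_FIRE = "pants_on_fire"
    UNVERIFIABLE = "unverifiable"
    OUT_OF_CONTEXT = "out_of_context"
    PREDICTED = "predicted"
    SUBJECTIVE = "subjective"


@dataclass(frozen=True)
class FactCheckRef:
    """The four fields recorded for each coded claim (hypothetical names)."""
    organization: str  # e.g. "PolitiFact"
    url: str
    date: str          # assumption: ISO 8601 string, e.g. "2025-01-31"
    excerpt: str


def parse_status(code: str) -> TruthStatus:
    """Convert a raw code to a TruthStatus; raises ValueError if unknown."""
    return TruthStatus(code.strip().lower())
```

Calling `parse_status(" False ")` returns `TruthStatus.FALSE`, while an unknown code raises `ValueError` instead of silently entering the database.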

Claim Extraction Approach

Manual Curation

For high-stakes published analysis:

  1. Editor reads speech transcript
  2. Identifies candidate claims using verifiability criteria
  3. Searches fact-check databases for prior coding
  4. If novel, flags for new fact-check
  5. Records claim + truth-status + source

Semi-Automated Extraction

For high-volume tracking:

  1. Use LLM-assisted claim extraction: Prompt Claude with clear coding instructions to identify candidate factual claims from a transcript
  2. Human review: All LLM-extracted claims reviewed by editor before publication
  3. Match to existing fact-checks: Use semantic similarity (embedding-based) to match claims to existing fact-checks; flag near-matches for review
  4. Track novel claims: Claims with no near-match get flagged for manual fact-check
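
The matching and novelty steps above can be sketched as a three-way triage on cosine similarity. This sketch assumes the embeddings have already been computed elsewhere (for example with a sentence-embedding model) and are passed in as plain lists of floats. The function names and both thresholds (`match_at`, `review_at`) are illustrative assumptions that would need tuning against hand-labelled pairs.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triage_claim(
    claim_vec: Sequence[float],
    fact_check_vecs: Dict[str, Sequence[float]],
    match_at: float = 0.85,   # assumption: illustrative threshold
    review_at: float = 0.65,  # assumption: illustrative threshold
) -> Tuple[str, Optional[str], float]:
    """Return (label, best_fact_check_id, best_score).

    label is "match", "near_match" (send to editor), or "novel"
    (send for manual fact-check).
    """
    best_id, best_score = None, -1.0
    for fc_id, vec in fact_check_vecs.items():
        score = cosine(claim_vec, vec)
        if score > best_score:
            best_id, best_score = fc_id, score
    if best_score >= match_at:
        label = "match"
    elif best_score >= review_at:
        label = "near_match"
    else:
        label = "novel"
    return label, best_id, best_score
```

Even a "match" label is only a candidate pairing here; the human review step still applies before anything is published.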

Database Schema



{
  "claim_id": "...",
  "speech_id": "...",
  "speaker": "Donald J. Trump",
  "date": "2025-XX-XX",
  "text": "[direct quote of the claim]",
  "context": "[surrounding context]",
  "domain": "election_integrity",
  "type": "verifiable_factual_claim",
  "truth_status": "false",
  "fact_check_source": "PolitiFact",
  "fact_check_url": "...",
  "fact_check_excerpt": "[summary of fact-check finding]",
  "fact_check_date": "2025-XX-XX",
  "first_appearance_in_corpus": "2020-11-04",
  "repetition_count": 47,
  "audiences_targeted": ["rally", "interview", "social_post"]
}
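
A record in this shape can be checked before it is stored. The sketch below uses the field names from the sample record and the truth-status codes from the coding table; the function name `validate_claim` and the specific checks chosen are assumptions, not part of the schema itself.

```python
REQUIRED_FIELDS = {
    "claim_id", "speech_id", "speaker", "date", "text", "context",
    "domain", "type", "truth_status", "fact_check_source",
    "fact_check_url", "fact_check_excerpt", "fact_check_date",
    "first_appearance_in_corpus", "repetition_count", "audiences_targeted",
}

TRUTH_CODES = {
    "true", "mostly_true", "half_true", "mostly_false", "false",
    "pants_on_fire", "unverifiable", "out_of_context", "predicted",
    "subjective",
}


def validate_claim(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record passed."""
    problems = []
    missing = sorted(REQUIRED_FIELDS - set(record))
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    status = record.get("truth_status")
    if status is not None and status not in TRUTH_CODES:
        problems.append(f"unknown truth_status: {status!r}")
    count = record.get("repetition_count")
    if count is not None:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            problems.append("repetition_count must be a non-negative integer")
    if not isinstance(record.get("audiences_targeted", []), list):
        problems.append("audiences_targeted must be a list")
    return problems
```

Returning a list of problems rather than raising on the first one lets an editor see every issue with a record in a single pass.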

Repetition Tracking

For each false claim, track:

  • First appearance: When did it first appear in the corpus?
  • Total repetition count: How many times has it appeared?
  • Repetition over time: Time-series of appearances
  • Audience pattern: Which audiences receive it?
  • Variant patterns: Does the claim morph over time?

This produces a “false-claim repetition database” useful for:

  • Editorial work: “How often has Trump repeated this specific false claim?”
  • Civic education: Public-facing tools showing which false claims persist
  • TRC documentation: Evidence base for any future TRC documentation work
  • Legal matters: When a false claim is at issue in litigation, the repetition record may be relevant evidence
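
Most of the per-claim fields listed above can be derived from a simple log of appearances. The sketch below assumes each appearance is stored as a `(date, audience)` pair; that layout, the function name `summarize_repetition`, and the monthly bucketing of the time series are illustrative choices. Variant tracking is not covered, since deciding that two wordings are the same claim needs the similarity matching and editorial review described earlier.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple


def summarize_repetition(appearances: List[Tuple[date, str]]) -> Dict:
    """Summarize one claim's appearances, given (date, audience) pairs."""
    if not appearances:
        raise ValueError("a tracked claim needs at least one appearance")
    dates = [d for d, _ in appearances]
    by_month = Counter(d.strftime("%Y-%m") for d in dates)
    by_audience = Counter(audience for _, audience in appearances)
    return {
        "first_appearance_in_corpus": min(dates).isoformat(),
        "repetition_count": len(appearances),          # total appearances
        "appearances_by_month": dict(sorted(by_month.items())),
        "appearances_by_audience": dict(by_audience),
        "audiences_targeted": sorted(by_audience),
    }


# Made-up appearance log, for illustration only
log = [
    (date(2024, 1, 5), "rally"),
    (date(2024, 1, 19), "interview"),
    (date(2024, 3, 2), "rally"),
]
summary = summarize_repetition(log)
```

The `first_appearance_in_corpus`, `repetition_count`, and `audiences_targeted` keys line up with the same-named fields in the database schema.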

Truth-Density Metric

For each speech (or speech segment), compute:

  • Claim count: Total verifiable factual claims
  • Falsity count: Claims coded as false or pants_on_fire
  • Truth-density: (true + mostly_true) / total verifiable claims
  • Falsity-density: (false + pants_on_fire) / total verifiable claims
  • Claims per 1,000 words: Normalized claim density

Track these metrics over time; they provide a quantitative input to the rhetorical and trajectory analyses.
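
The metrics above can be computed from a list of truth-status codes and a word count. One assumption in this sketch is worth stating: "total verifiable claims" is taken to mean claims carrying one of the six ratings from true to pants_on_fire, so unverifiable, out_of_context, predicted, and subjective codes are left out of the denominator. The function name `truth_density` and the `rated` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumption: only these six ratings count as "verifiable" for the denominator
RATED = ("true", "mostly_true", "half_true",
         "mostly_false", "false", "pants_on_fire")


def truth_density(statuses: Iterable[str], word_count: int,
                  rated: Iterable[str] = RATED) -> Dict[str, Optional[float]]:
    """Compute the per-speech metrics from truth-status codes."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    rated_set = set(rated)
    counts = Counter(s for s in statuses if s in rated_set)
    total = sum(counts.values())
    truthful = counts["true"] + counts["mostly_true"]
    falsity = counts["false"] + counts["pants_on_fire"]
    return {
        "claim_count": total,
        "falsity_count": falsity,
        # Densities are undefined when no claim could be rated
        "truth_density": truthful / total if total else None,
        "falsity_density": falsity / total if total else None,
        "claims_per_1000_words": 1000 * total / word_count,
    }


# Made-up codes for a 2,000-word speech, for illustration only
metrics = truth_density(
    ["true", "mostly_true", "false", "false", "pants_on_fire",
     "half_true", "subjective", "predicted"],
    word_count=2000,
)
```

In this example six of the eight claims are rated, giving a truth-density of 2/6, a falsity-density of 3/6, and 3.0 claims per 1,000 words. The two densities need not sum to 1, because half_true and mostly_false fall in neither numerator.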

Cross-Source Verification

For any published analysis citing claim-tracking results:

  • Confirm the fact-check is current and from a reputable source
  • Note any change in fact-check status (rare but happens)
  • Acknowledge fact-checker biases: All fact-checkers have editorial perspectives; no source is neutral. Cite multiple where they agree.
  • Distinguish between “false” and “misleading”: A literally-true statement can be misleading; flag separately
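
The "cite multiple where they agree" check can be made mechanical once each organization's rating has been mapped onto the shared truth-status codes. That mapping is assumed to have been done by an editor and is not shown here. The function name `cross_source_summary` and the rule that agreement needs at least two sources are illustrative assumptions.

```python
from typing import Dict


def cross_source_summary(ratings: Dict[str, str]) -> Dict:
    """Summarize ratings for one claim, keyed by fact-check organization.

    Values are truth-status codes already mapped to the shared scale.
    """
    distinct = sorted(set(ratings.values()))
    unanimous = len(distinct) == 1
    return {
        "sources": sorted(ratings),
        "distinct_ratings": distinct,
        # Assumption: agreement requires two or more independent sources
        "agreed": unanimous and len(ratings) >= 2,
        "status": distinct[0] if unanimous else None,
        "needs_editorial_note": len(distinct) > 1,
    }


# Made-up ratings, for illustration only
same = cross_source_summary({"PolitiFact": "false", "AP Fact Check": "false"})
split = cross_source_summary({"PolitiFact": "false", "Snopes": "mostly_false"})
```

A split result does not resolve the disagreement; it only flags the claim so that the differing ratings are documented alongside it.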

Implementation Stack

  • Anthropic Claude for LLM-assisted claim extraction (with structured prompts; require evidence quotes for each claim)
  • Sentence-transformers for semantic similarity matching to existing fact-checks
  • Fact-check databases: Cached copies of PolitiFact, FactCheck.org, Washington Post Fact Checker, AP Fact Check
  • Human editorial review before publication of any claim-tracking output
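
The "require evidence quotes for each claim" rule can be enforced after extraction by checking that each quote actually occurs in the transcript. The sketch below makes no API calls: it assumes the LLM output has already been parsed into dictionaries with a hypothetical `evidence_quote` key, and compares text after collapsing whitespace and case. Differences in punctuation style (curly versus straight quotes, for example) are not handled.

```python
import re
from typing import Dict, List, Tuple


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def check_evidence_quotes(
    transcript: str, candidates: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split candidates into (grounded, ungrounded).

    A candidate is grounded when its evidence quote is non-empty and
    appears verbatim (after normalization) in the transcript.
    """
    haystack = _normalize(transcript)
    grounded, ungrounded = [], []
    for candidate in candidates:
        quote = _normalize(candidate.get("evidence_quote", ""))
        if quote and quote in haystack:
            grounded.append(candidate)
        else:
            ungrounded.append(candidate)
    return grounded, ungrounded


# Made-up transcript and candidates, for illustration only
transcript = "We built forty new schools last year.  Nobody has\nseen anything like it."
candidates = [
    {"text": "Forty schools were built last year",
     "evidence_quote": "we built forty new schools last year"},
    {"text": "Fifty schools were built",
     "evidence_quote": "we built fifty new schools"},
]
grounded, ungrounded = check_evidence_quotes(transcript, candidates)
```

Ungrounded candidates are better routed to an editor than dropped, since a failed match may reflect a transcription difference and not a fabricated quote.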

Limitations

  • Claim extraction is hard: Claims are often nested in subordinate clauses, quoted speech, or sarcasm. LLM extraction can miss subtle claims and over-extract ambiguous ones.
  • Fact-checker disagreement: Different fact-checkers may rate the same claim differently; flag and document.
  • Claim context matters: A literally-true claim can be misleading; a literally-false claim can be metaphorical. Editorial judgment is required.
  • Predictions and counterfactuals: Difficult to fact-check until verifiable.
  • Volume limits: At Trump-era volume (roughly 25,000 or more rally events, statements, and posts), exhaustive claim tracking is infeasible. A sampling strategy is required.
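
The framework does not specify a sampling strategy. One possible option, shown here purely as an assumption, is a seeded stratified random sample, which keeps low-volume source types (such as interviews) from being swamped by high-volume ones (such as posts) and makes the draw reproducible. The function name `stratified_sample` and the item layout are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List


def stratified_sample(items: List[Dict], key: Callable[[Dict], str],
                      per_stratum: int, seed: int = 0) -> List[Dict]:
    """Draw up to per_stratum items at random from each stratum.

    A fixed seed makes the sample reproducible for later audit.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for name in sorted(strata):  # sorted so the draw order is stable
        members = strata[name]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Made-up corpus index, for illustration only
corpus = (
    [{"id": f"rally-{i}", "kind": "rally"} for i in range(10)]
    + [{"id": f"interview-{i}", "kind": "interview"} for i in range(3)]
    + [{"id": "post-0", "kind": "social_post"}]
)
picked = stratified_sample(corpus, key=lambda item: item["kind"], per_stratum=3)
```

Metrics computed on such a sample describe the sample only; any published figure should state the sampling scheme alongside the result.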

See Also

  • sc-overview.md
  • sc-framework-rhetoric.md (overlapping authoritarianism markers, especially G1-G4)
  • sc-framework-linguistic-features.md
  • sc-framework-topics-frames.md
  • ITI skill: claims-integrity-audit
  • ITI skill: claims-evidence-registry