Purpose: Comprehensive catalog of features, functions, and tools used by investigative journalists and OSINT practitioners — organized as input for the Open Semantic Search major upgrade development plan. Part I covers investigative journalism platforms and workflows. Part II covers the broader Open Source Intelligence (OSINT) resource ecosystem. Part III provides analysis and upgrade planning.
Date: April 2026
Part I — Investigative Journalism Platforms & Workflows
1. Document Ingestion & Processing
The foundation of every investigative project is getting documents into a searchable, analyzable state.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Multi-format ingestion | Accept PDFs, Word, Excel, email (.eml/.pst), images, HTML, archives (.zip/.tar), audio, video | Apache Tika, Datashare, Aleph, DocumentCloud |
| OCR (optical character recognition) | Extract text from scanned documents, images, embedded images in PDFs | Tesseract, Google Cloud Vision, Amazon Textract, DocumentCloud AI OCR |
| Handwriting recognition | Transcribe handwritten notes and annotations | Google Pinpoint, Azure AI |
| Audio/video transcription | Convert speech to searchable text | Google Pinpoint (15 languages, files up to 2 hours), Whisper, AssemblyAI |
| Email parsing | Extract headers, body, attachments, thread structure from email archives | Datashare, Aleph, Open Semantic ETL |
| Table extraction | Convert scanned/PDF tables into structured spreadsheets | Google Pinpoint, Camelot, Tabula, Arkham Mirror (vision-based) |
| Metadata extraction | Pull EXIF, document properties, author info, creation dates, GPS coordinates | ExifTool, FOCA, Metagoofil, Apache Tika |
| Batch processing | Ingest thousands/millions of documents with queue management | Datashare (CLI mode), Aleph, Open Semantic ETL (Celery/RabbitMQ) |
| Incremental crawling | Re-scan sources and ingest only new/changed documents | Open Semantic Search (cron), Aleph crawlers |
| Archive decompression | Recursively unpack nested archives (.zip, .tar.gz, .rar, .7z) | Apache Tika, Aleph |
Reference Implementations
- ICIJ Datashare: Tika + Tesseract + CoreNLP pipeline; local-first; CLI mode for large-scale batch processing
- Open Semantic Search: ETL framework with Celery task queue, RabbitMQ broker, Tika server, Tesseract cache
- OCCRP Aleph: Ingestors for structured (CSV, SQL) and unstructured (documents) data; FtM entity mapping
2. Search & Discovery
How journalists find needles in document haystacks.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Full-text search | Keyword search across all indexed content | Solr, Elasticsearch, Aleph, Datashare |
| Boolean operators | AND, OR, NOT, grouping with parentheses | All major search platforms |
| Wildcard / fuzzy search | Pattern matching, spelling tolerance | Solr (edit distance), Elasticsearch |
| Proximity search | Find terms within N words of each other | Solr, Aleph |
| Phrase search | Exact phrase matching with quotes | All major platforms |
| Faceted search / interactive filters | Navigate by author, date, entity, file type, language, tag | Open Semantic Search, Datashare, Aleph |
| Semantic search | Meaning-based search beyond keyword matching | Trove (RAG), Arkham Mirror (embeddings), IntellyWeave |
| Natural language queries | Ask questions in plain English, get cited answers | Trove, Google Pinpoint, Presswork.ai |
| Search-by-list | Upload a list of names/terms, find all matching documents | Open Semantic Search, Datashare (batch search API) |
| Date range filtering | Restrict results to specific time periods | All major platforms |
| Saved searches / alerts | Save queries and get notified of new matches | Aleph, Datashare |
| Search within results | Progressively narrow result sets | Open Semantic Search (facet drilldown) |
| Cross-collection search | Search across multiple document collections simultaneously | Aleph (multi-dataset), Datashare (server mode) |
| Relevance ranking | Score and sort results by relevance, with tuning controls | Solr (boosts), Elasticsearch, Open Semantic Search |
| Document preview / snippets | Show matching text in context without opening full document | All major platforms |
| Similar document discovery | “More like this”: find related documents | Solr MLT, Elasticsearch |
| Synonym / thesaurus expansion | Automatically include synonyms and related terms | Open Semantic Search (SKOS thesaurus) |
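Several of the query types above (Boolean operators, phrase, proximity, fuzzy, and search-by-list) can be combined in a single Lucene-style query string of the kind Solr's standard query parser accepts. The sketch below only builds the string; the helper names and the example terms are hypothetical, and escaping is limited to quotes and backslashes inside phrases.

```python
def quote_phrase(term: str) -> str:
    """Wrap a term in double quotes, escaping embedded backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_by_list(names: list[str]) -> str:
    """OR together one exact-phrase clause per non-empty name in the list."""
    clauses = [quote_phrase(n.strip()) for n in names if n.strip()]
    return "(" + " OR ".join(clauses) + ")"


def proximity(first: str, second: str, distance: int) -> str:
    """Match both words within `distance` positions of each other."""
    return f'"{first} {second}"~{distance}'


def fuzzy(term: str, max_edits: int = 2) -> str:
    """Match a single term within an edit distance of 0, 1, or 2."""
    if max_edits not in (0, 1, 2):
        raise ValueError("Lucene fuzzy search supports at most 2 edits")
    return f"{term}~{max_edits}"


# Hypothetical watchlist and terms, for illustration only.
query = " AND ".join([
    search_by_list(["Jane Doe", "Acme Holdings Ltd"]),
    proximity("payment", "offshore", 10),
    "NOT " + fuzzy("newsletter", 1),
])
print(query)
```

The resulting string would be passed as the `q` parameter of a search request; field names, boosts, and analyzer behavior depend on the index schema and are not covered here.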
3. Entity Extraction & NLP
Automatically identifying people, organizations, places, and relationships within documents.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Named Entity Recognition (NER) | Extract persons, organizations, locations | spaCy, CoreNLP, GLiNER, Stanza |
| Email address extraction | Find and catalog email addresses in text | Datashare, theHarvester |
| Phone number extraction | Identify phone numbers across formats | Custom regex, NER models |
| Financial entity extraction | Identify monetary amounts, account numbers, transactions | Trove, Arkham Mirror |
| Date/time extraction | Normalize dates across formats and languages | SUTime (CoreNLP), Duckling |
| Address extraction | Parse physical addresses, normalize formatting | Libpostal, custom NER |
| Custom entity types | Define domain-specific entities (laws, case numbers, cryptonyms) | IntellyWeave (GLiNER), spaCy custom models |
| Entity linking / disambiguation | Connect extracted entities to known databases (Wikidata, etc.) | Open Semantic Entity Search API, spaCy entity linker |
| Coreference resolution | Connect “he,” “the company,” “the defendant” to specific entities | CoreNLP, NeuralCoref |
| Multilingual NER | Entity extraction across 40+ languages | spaCy multilingual models, New/s/leak 2.0, Stanza |
| Sentiment / tone analysis | Detect emotional tone in communications | Talkwalker, custom models |
| Topic modeling | Auto-discover themes across document collections | LDA, BERTopic |
| Language detection | Automatically identify document language | langdetect, fastText, Tika |
| Keyword / keyphrase extraction | Identify most significant terms per document | YAKE, KeyBERT, TF-IDF |
| Manual entity annotation | Human correction and addition of entities | Open Semantic Search (tagger), Datashare (star/tag), Aleph |
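The "custom regex" approach listed for email and phone extraction can be sketched with the standard library alone. The patterns below are deliberately simple assumptions, not a complete grammar for either format: they will miss unusual addresses and will accept some digit runs that are not phone numbers, so results should be treated as candidates for review.

```python
import re

# Simplified patterns (assumptions for illustration, not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_emails(text: str) -> list[str]:
    """Return unique email addresses, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


def extract_phones(text: str) -> list[str]:
    """Return unique phone candidates reduced to digits (plus a leading '+')."""
    phones = set()
    for match in PHONE_RE.findall(text):
        digits = re.sub(r"\D", "", match)
        # Keep only plausible lengths; E.164 allows at most 15 digits.
        if 8 <= len(digits) <= 15:
            phones.add(("+" if match.startswith("+") else "") + digits)
    return sorted(phones)


sample = (
    "Contact j.doe@example.org or J.Doe@Example.org, "
    "tel. +44 20 7946 0958 or (555) 010-9999."
)
print(extract_emails(sample))
print(extract_phones(sample))
```

For production use, a dedicated parser such as libphonenumber is usually a better fit for phone validation than a regex alone.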
4. Network Analysis & Link Visualization
Revealing hidden relationships between people, companies, and events.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Entity relationship mapping | Visualize connections between people, orgs, accounts | Maltego, i2 Analyst’s Notebook, Aleph |
| Network graph visualization | Interactive node-link diagrams | Neo4j Browser, Cytoscape.js, Gephi, Sigma.js |
| Link analysis | Identify shortest paths, clusters, central nodes | Maltego, i2, Gephi |
| Timeline visualization | Plot events chronologically to reveal patterns | Aleph, Trove, TimelineJS, i2 |
| Geographic mapping of networks | Overlay network data on maps | Maltego, IntellyWeave (Mapbox), Aleph |
| Automated connection discovery | AI-driven detection of non-obvious links | Maltego transforms, Trove |
| Cluster detection | Identify groups/communities within networks | Gephi (modularity), Neo4j (community detection) |
| Influence / centrality scoring | Rank nodes by importance (betweenness, PageRank) | Gephi, Neo4j, NetworkX |
| Temporal network analysis | Track how networks evolve over time | Gephi (dynamic), custom tools |
| Cross-dataset entity matching | Match entities across different databases | Aleph (cross-referencing), OpenRefine (reconciliation) |
| Export / publish graphs | Share visualizations for publication or collaboration | Maltego, Gephi (SVG/PDF), Aleph (embed) |
Reference Implementations
- Maltego: 200+ data source transforms; 200M+ company records; 1B+ online identities
- i2 Analyst’s Notebook: 30+ year industry standard; drag-and-drop; team collaboration
- OCCRP Aleph: FollowTheMoney (FtM) data model; YAML entity mapping; network diagrams built into platform
- Open Semantic Search: Neo4j graph database integration; Cytoscape.js visualization
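Two of the link-analysis functions in the table above, shortest paths and central nodes, can be illustrated without any graph library. This is a minimal sketch over an undirected, unweighted graph; the entity names are invented, and degree centrality stands in for the betweenness and PageRank measures that dedicated tools provide.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Invented entities, for illustration only.
edges = [
    ("Person A", "Shell Co 1"),
    ("Shell Co 1", "Bank X"),
    ("Bank X", "Person B"),
    ("Person A", "Law Firm"),
    ("Law Firm", "Shell Co 2"),
]
graph = build_graph(edges)
print(shortest_path(graph, "Person A", "Person B"))
print(degree_centrality(graph)["Person A"])
```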
5. Knowledge Graphs & Structured Data
Organizing investigative knowledge into queryable, connected structures.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Ontology / thesaurus management | Define and maintain controlled vocabularies | Open Semantic Search (SKOS/RDF), Protégé |
| RDF / linked data | Represent knowledge as machine-readable triples | Open Semantic Search, Apache Jena |
| Graph database storage | Store entities and relationships as nodes/edges | Neo4j, ArangoDB, JanusGraph |
| SPARQL / Cypher queries | Query knowledge graphs with graph query languages | Neo4j (Cypher), Apache Jena (SPARQL) |
| FollowTheMoney (FtM) | Standardized entity model for investigative data | Aleph, OpenSanctions |
| Schema.org structured data | Standard vocabularies for web-published findings | Schema.org, JSON-LD |
| Entity deduplication | Merge duplicate entity records | Aleph, OpenRefine, Dedupe.io |
| Taxonomy tagging | Classify documents by topic/category hierarchies | Open Semantic Search, custom taxonomies |
| Inference / reasoning | Derive new facts from existing knowledge | OWL reasoners, Neo4j GDS |
| Knowledge graph visualization | Interactive exploration of entity networks | Neo4j Browser, Open Semantic Search (graph explorer) |
6. Corporate & Financial Intelligence
Following the money and mapping corporate structures.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Company registry search | Look up company records across jurisdictions | OpenCorporates (140+ registries), Companies House |
| Beneficial ownership lookup | Identify ultimate owners behind corporate veils | OpenSanctions, ICIJ Offshore Leaks, national BO registries |
| Sanctions screening | Check entities against global sanctions lists | OpenSanctions (2.1M+ entities, 328 sources), OFAC, EU consolidated list |
| PEP (Politically Exposed Person) screening | Identify politically connected individuals | OpenSanctions PEP data, Dow Jones, World-Check |
| Corporate hierarchy mapping | Visualize parent-subsidiary-affiliate structures | OpenCorporates, Aleph, Maltego |
| Director / officer cross-referencing | Find shared directorships across companies | OpenCorporates, Maltego |
| Financial transaction tracing | Follow money flows across accounts and entities | Trove (financial extraction), Chainalysis (crypto) |
| Property / land records | Search real estate ownership databases | County recorder databases, Zillow, Regrid |
| Court records / litigation | Search case filings and court documents | PACER, CourtListener, RECAP |
| Offshore leak databases | Search Panama Papers, Paradise Papers, Pandora Papers | ICIJ Offshore Leaks Database |
| Lobbyist / political donation databases | Track political influence and money | OpenSecrets, FEC, state lobbying databases |
| Tax haven / jurisdiction analysis | Identify structures in low-transparency jurisdictions | Tax Justice Network, ICIJ resources |
Reference Implementations
- OpenSanctions: Free for investigative use; API for bulk matching; reconciliation endpoints
- OCCRP Aleph: Pre-loaded with sanctions, corporate registries, leak databases
- ICIJ Offshore Leaks: 810K+ offshore entities searchable online
7. Geolocation & Mapping
Placing events, people, and evidence in geographic context.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Satellite imagery analysis | Examine high-res imagery for physical evidence | Google Earth Pro, Planet, Maxar, Airbus (via Apollo Mapping) |
| Historical satellite comparison | Compare imagery over time for changes | Google Earth (timelapse), Sentinel Hub |
| Street-level verification | Verify locations using panoramic street imagery | Google Street View, Mapillary, KartaView |
| Sun/shadow analysis | Determine time/date from shadow positions | SunCalc, ShadowMap |
| Geolocation from photos | Identify where a photo was taken from visual clues | GeoHints, Google Lens, Bellingcat geolocation tools |
| WiFi network geolocation | Map WiFi access point locations | WiGLE |
| Social media geolocation | Discover geotagged posts from specific areas | Social Geo Lens, Creepy |
| Custom map creation | Plot investigation data on interactive maps | Mapbox, Leaflet, Google My Maps, QGIS |
| EXIF GPS extraction | Pull GPS coordinates from photo metadata | ExifTool, Jeffrey’s EXIF Viewer |
| Geofencing / area monitoring | Track activity within defined geographic areas | Custom tools, social media monitoring |
| 3D terrain analysis | Analyze topography for line-of-sight, elevation | Google Earth Pro, Cesium |
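EXIF GPS extraction usually ends with a conversion step: EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), while mapping tools expect signed decimal degrees. The sketch below covers only that conversion and assumes the values have already been read from the file by a tool such as ExifTool; the function name and sample coordinates are illustrative.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    ref = ref.upper()
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if ref in ("S", "W") else value


# Illustrative values only.
latitude = dms_to_decimal(48, 51, 29.6, "N")
longitude = dms_to_decimal(2, 17, 40.2, "E")
print(round(latitude, 6), round(longitude, 6))
```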
8. Image & Video Verification
Detecting manipulated, AI-generated, or misattributed visual media.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Reverse image search | Find original source and prior uses of an image | Google Lens, TinEye, Yandex Images, Bing Visual Search |
| Video verification | Analyze video authenticity, extract keyframes | InVID Verification Plugin, YouTube DataViewer (Amnesty) |
| Image forensics | Detect cloning, splicing, JPEG compression artifacts | Forensically, FotoForensics, Error Level Analysis |
| AI-generated image detection | Identify GAN/diffusion-generated fake images | AmIReal, Hive AI Detection, Illuminarty |
| Deepfake detection | Identify AI-manipulated video/audio | Azure AI Video Indexer, Sensity, Reality Defender |
| Metadata analysis | Examine EXIF, IPTC, XMP data for authenticity clues | ExifTool, Jeffrey’s EXIF Viewer |
| Chronolocation | Determine when a photo/video was taken | SunCalc (shadow analysis), weather correlation |
| Logo / object identification | Identify organizations, equipment, weapons in images | Google Lens, CamFind, military identification databases |
| Facial detection | Detect faces in video for indexing (not biometric ID) | Azure AI Video Indexer |
9. Social Media Intelligence
Monitoring, collecting, and analyzing social media for investigative leads.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Platform monitoring | Track mentions, hashtags, accounts across platforms | Talkwalker, Meltwater, CrowdTangle (discontinued August 2024; succeeded by Meta Content Library) |
| Historical data retrieval | Access deleted/archived social media posts | Wayback Machine, Archive.today, cached versions |
| Twitter/X analysis | Historical tweet extraction, network mapping | Various OSINT tools (post-API restrictions) |
| Telegram channel search | Index and search Telegram channels/groups | Telegago, TGStat |
| Facebook/Instagram research | Profile analysis, group monitoring | Meta Content Library (successor to CrowdTangle), Who Posted What |
| Google account profiling | Discover accounts linked to a Google account | GHunt |
| Username enumeration | Find accounts across platforms by username | Sherlock, Namechk, KnowEm |
| Profile archiving | Save social media profiles before deletion | Hunchly, Auto Archiver (Bellingcat) |
| Sentiment / narrative tracking | Track how narratives spread across platforms | Talkwalker, Brandwatch |
| Bot detection | Identify automated/inauthentic accounts | Botometer, custom analysis |
| Influence network mapping | Map follower/following networks and amplification | Maltego, Gephi, custom scrapers |
10. Web Archiving & Evidence Preservation
Capturing and preserving digital evidence with chain of custody.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Webpage snapshots | Save point-in-time copies of web pages | Archive.today, Wayback Machine |
| Automatic capture | Record every page visited during investigation | Hunchly |
| Batch archiving | Archive multiple URLs programmatically | Bellingcat Auto Archiver, ArchiveBox |
| Hash verification | Cryptographic hash of captured content for integrity | Hunchly (SHA-256), Auto Archiver |
| Timestamp certification | Provable timestamp of when content was captured | Hunchly, blockchain timestamping |
| Screenshot capture | Visual preservation of page appearance | Archive.today (PNG), Hunchly (full-page) |
| Social media archiving | Capture posts, profiles, comments before deletion | Auto Archiver, Hunchly, Archive.today |
| PDF generation | Convert web pages to archival PDF format | SingleFile, print-to-PDF, custom tools |
| Perceptual hashing | Detect near-duplicate content across archives | Auto Archiver |
| Chain of custody documentation | Audit trail of who captured what and when | Hunchly, Auto Archiver |
| Evidence packaging | Generate court-ready or publication-ready evidence bundles | Hunchly |
Reference Implementations
- Hunchly: Commercial; automatic capture, tagging, audit trails, court-ready exports
- Bellingcat Auto Archiver: Open-source; 150K+ pages preserved; batch processing; perceptual hashing
- Archive.today: Free; preserves JS-rendered content; persistent URLs
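Hash verification, as listed in the table above, comes down to recording a cryptographic digest at capture time and recomputing it later. The following is a minimal sketch using SHA-256 from the standard library; the record layout and function names are assumptions for illustration and do not reflect how Hunchly or Auto Archiver store their data.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_record(path, source_url):
    """Build a hypothetical chain-of-custody entry for one captured file."""
    return {
        "file": Path(path).name,
        "source_url": source_url,
        "sha256": sha256_file(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(path, record):
    """Return True if the file still matches the digest stored at capture time."""
    return sha256_file(path) == record["sha256"]
```

A locally generated timestamp like `recorded_at` shows only what the capturing machine claimed; the "provable timestamp" row in the table requires an independent timestamping service.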
11. FOIA & Public Records
Filing, tracking, and analyzing government records requests.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| FOIA request filing | Submit public records requests to government agencies | MuckRock (23K+ agencies), iFOIA |
| Request tracking | Monitor status of pending requests | MuckRock, custom tracking |
| Agency database | Directory of government agencies and their FOIA contacts | MuckRock, FOIA.gov |
| FOIA log search | Search existing FOIA logs for prior requests | MuckRock FOIA Log Explorer (170K+ requests) |
| Response management | Track incoming documents and fee payments | MuckRock |
| Appeal templates | Generate appeals for denied or inadequate responses | MuckRock, RCFP resources |
| Document hosting | Publish received documents publicly | DocumentCloud (6.9M+ public documents) |
| Document annotation | Annotate key passages in public records | DocumentCloud, Datashare |
| Bulk embedding | Embed documents in news articles | DocumentCloud embed API |
12. Secure Communication & Whistleblower Intake
Protecting sources and handling sensitive materials.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Anonymous document submission | Allow sources to upload files anonymously | SecureDrop, GlobaLeaks |
| End-to-end encryption | Encrypt communications between source and journalist | Signal, SecureDrop (GPG) |
| Tor-based anonymity | Route communications through onion network | SecureDrop (.onion), Tor Browser |
| Air-gapped viewing | View sensitive documents on non-networked machines | SecureDrop Secure Viewing Station |
| Metadata stripping | Remove identifying metadata from documents | MAT2, ExifTool, SecureDrop |
| Encrypted storage | Store documents with at-rest encryption | VeraCrypt, LUKS, SecureDrop |
| Source communication | Two-way messaging with anonymous sources | SecureDrop, Signal |
| Organization-owned infrastructure | No third-party servers; full organizational control | SecureDrop (on-premises) |
13. Collaboration, Annotation & Redaction
Working in teams on sensitive investigations.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Shared investigation workspaces | Private team spaces for investigative projects | Aleph (investigations), DocumentCloud, Trove |
| Document annotation | Highlight, comment, and mark up documents | DocumentCloud (notes), Aleph, Hypothesis |
| Entity bookmarking | Save and organize entities of interest | Aleph (lists), Datashare (stars/tags) |
| Document tagging | Categorize documents with custom tags | Datashare, DocumentCloud, Aleph |
| Permanent redaction | Irrecoverably remove sensitive information | Redactable, DocumentCloud, Adobe Acrobat |
| AI-powered PII detection | Automatically find personal information for redaction | Redactable, Orson AI |
| Entity anonymization | Replace real names with placeholders during collaboration | Orson AI |
| Redaction audit trail | Log who redacted what and when | Redactable (certificates), Orson AI |
| Access control / permissions | Role-based access to investigation materials | Aleph, Datashare (server mode), DocumentCloud |
| Version control | Track document changes and annotation history | DocumentCloud, Git-based workflows |
| Export / publication | Generate publication-ready document packages | DocumentCloud (embed), Aleph, Hunchly |
14. Data Cleaning & Record Linkage
Preparing messy real-world data for analysis.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Data cleaning | Fix inconsistencies, normalize formats, handle missing values | OpenRefine, pandas, Excel/Sheets |
| Clustering / deduplication | Group and merge similar records | OpenRefine (key collision, nearest neighbor) |
| Reconciliation | Match local data against external databases (Wikidata, etc.) | OpenRefine reconciliation API |
| Data transformation | Convert between formats (CSV, JSON, XML, SQL) | OpenRefine, jq, csvkit |
| Faceting | Explore data distributions by text, numeric, date facets | OpenRefine |
| Regular expression extraction | Pattern-based data extraction from text | OpenRefine, grep, regex tools |
| Spreadsheet analysis | Pivot tables, formulas, statistical analysis | Excel, Google Sheets, Tableau |
| Data visualization | Charts, graphs, dashboards | Tableau, Flourish, Datawrapper, D3.js |
| Geocoding | Convert addresses to coordinates and vice versa | Google Geocoding API, Nominatim |
| Date normalization | Parse and standardize dates across formats | dateutil, Duckling, custom parsers |
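The key-collision clustering mentioned in the table above groups records whose normalized "fingerprint" is identical. The sketch below is modeled on the general idea of OpenRefine's fingerprint keying (trim, lowercase, strip accents and punctuation, then sort and deduplicate tokens); it is a simplified approximation written for illustration, not OpenRefine's implementation, and the sample names are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a string so that trivially different spellings collide."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop accents and any other non-ASCII characters.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sorting and deduplicating tokens makes word order irrelevant.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Return groups of two or more values that share a fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = ["Acme Holdings, Ltd.", "  LTD acme   holdings", "José Núñez", "Jose Nunez", "Bank X"]
print(cluster(names))
```

Because the key ignores word order and punctuation, it can also merge records that are genuinely different, so clusters are best reviewed before merging.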
15. Transportation Tracking
Monitoring movements of aircraft, ships, and vehicles.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Flight tracking (ADS-B) | Real-time and historical aircraft position data | ADS-B Exchange, Flightradar24, FlightAware |
| Aircraft registration lookup | Identify owners of specific aircraft | FAA Registry, national aviation databases |
| Ship tracking (AIS) | Real-time vessel position and voyage data | MarineTraffic, VesselFinder |
| AIS gap detection | Identify vessels turning off transponders (sanctions evasion) | RadianceFleet |
| Maritime anomaly detection | Flag suspicious vessel behavior patterns | RadianceFleet, Phantom Tide |
| Sanctions vessel cross-referencing | Match tracked vessels against sanctions watchlists | RadianceFleet, MarineTraffic |
| Vehicle plate recognition | Track vehicles via license plate databases | Varies by jurisdiction |
| Rail / freight tracking | Monitor cargo shipments | Open-source tools, carrier APIs |
| Cross-domain correlation | Combine air, sea, and ground tracking data | Phantom Tide |
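At its simplest, AIS gap detection means sorting a vessel's position reports by time and flagging intervals longer than some threshold. The sketch below assumes the timestamps have already been obtained for a single vessel; the six-hour default is an arbitrary illustrative value, and nothing here reflects how the listed tools implement the feature.

```python
from datetime import datetime, timedelta


def find_ais_gaps(timestamps, threshold=timedelta(hours=6)):
    """Return (last_seen, next_seen, duration) for every silence longer than threshold."""
    ordered = sorted(timestamps)
    return [
        (earlier, later, later - earlier)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


# Invented position-report times for one vessel.
reports = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 5),
]
for last_seen, next_seen, duration in find_ais_gaps(reports):
    print(last_seen, next_seen, duration)
```

A gap alone is weak evidence: poor receiver coverage produces the same pattern, so flagged intervals are a starting point for cross-checking against position and coverage data.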
16. Cryptocurrency & Dark Web
Investigating digital financial crime and hidden networks.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Blockchain transaction tracing | Follow cryptocurrency flows across wallets | Chainalysis Reactor, Crystal, Elliptic |
| Wallet identification | Link wallets to real-world entities | Chainalysis (134K+ counterparties) |
| Multi-chain analysis | Trace across Bitcoin, Ethereum, and 27+ blockchains | Chainalysis, Arkham Intelligence |
| Mixing / tumbling detection | Identify obfuscated transactions | Chainalysis (demixing), custom analysis |
| DeFi / smart contract analysis | Trace swaps, bridges, and complex DeFi activity | Chainalysis, Etherscan |
| Dark web monitoring | Scan dark web marketplaces and forums | Cloudburst, DarkOwl, Flashpoint |
| Dark web + crypto correlation | Link dark web actors to blockchain activity | Chainalysis + Cloudburst integration |
| Ransomware tracking | Trace ransom payments and money laundering | Chainalysis |
17. AI/LLM-Powered Investigation
Emerging AI capabilities transforming investigative journalism.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| RAG-powered document Q&A | Ask natural language questions across document collections | Trove (Claude/Gemini), Arkham Mirror, Presswork.ai |
| Semantic / embedding search | Find conceptually related content beyond keyword matching | Trove, Arkham Mirror, IntellyWeave (Weaviate) |
| AI-powered summarization | Generate summaries of long documents or document sets | Trove, Presswork.ai, Google Pinpoint |
| Contradiction detection | Flag inconsistencies across documents | Arkham Mirror |
| Multi-agent reasoning | Multiple AI agents collaborating on complex analysis | IntellyWeave (DSPy), multi-agent workflows |
| Automated timeline extraction | Build event timelines from unstructured text | Arkham Mirror, Trove |
| FOIA request generation | AI-drafted public records requests | Presswork.ai |
| Source CRM | Track sources, contacts, and leads across investigations | Presswork.ai |
| Claim extraction & verification | Extract factual claims from text and assess verifiability | Sonar Deep Research (Perplexity) |
| Evidence scoring | Rank evidence by relevance, reliability, and provenance | Sonar Deep Research, Trove |
| Multi-source aggregation & briefing | Synthesize intelligence from diverse real-time feeds | Crucix (27 data sources), GDELT, ACLED |
| Hypothesis-driven investigation | AI-guided exploration starting from investigative hypotheses | IntellyWeave |
| Citation / provenance tracking | Every AI answer linked back to source documents | Trove, Sonar Deep Research |
Part II — Open Source Intelligence (OSINT) Resources
The sections below catalog the broader OSINT tool ecosystem that investigative journalists draw from beyond their core document analysis platforms. These tools are used for identity research, infrastructure reconnaissance, threat intelligence, and domain-specific monitoring.
18. OSINT Frameworks & Aggregators
Master directories and curated collections that organize the OSINT landscape.
Tool Directories
| Resource | Maintainer | Scope | Access |
| --- | --- | --- | --- |
| OSINT Framework | osintframework.com | Interactive tree of OSINT tools by category | Free, web-based |
| Bellingcat Online Investigation Toolkit | Bellingcat | 11 categories; downloadable as CSV | Free |
| Awesome OSINT | GitHub community (220+ contributors) | Curated GitHub list covering all OSINT domains | Free |
| OSINT Directory | osintdirectory.com | 540+ tools across 15 categories with tutorials | Free |
| OSINT Tools Library | OSINT Newsletter | Practitioner-curated, reliability-tested tools | Free |
| Cipher387 OSINT Stuff Tool Collection | cipher387 | 1,000+ tools with descriptions | Free |
| OSINTBench | osintbench.com | Categorized tools with ratings and comparisons | Free |
| Legendary OSINT | K2SOsint (GitHub) | Dark web, malware, phishing, automation focus | Free |
| OSINT Bible | frangelbarrera (GitHub) | 33 specialized categories with methodology guides | Free |
| Worldwide OSINT Tools Map | cybdetective.com | 614+ services organized by country on interactive map | Free |
Key Directory Categories (Bellingcat taxonomy)
| Category | Focus Area |
| --- | --- |
| Maps & Satellites | Google Earth, Planet, Maxar, Sentinel Hub |
| Geolocation | OpenStreetMap, GeoHints, SunCalc |
| Image/Video | Google Lens, InVID, Forensically |
| Social Media | Platform-specific analysis tools |
| People | Sherlock, Maigret, WhatsMyName |
| Websites | Wayback Machine, IntelX, DomainTools |
| Companies & Finance | EDGAR, OpenCorporates, sanctions lists |
| Conflict | ACLED, LiveUAMap, munitions databases |
| Transport | FlightAware, Flightradar24, MarineTraffic |
| Environment & Wildlife | Global Forest Watch, Global Fishing Watch, NASA FIRMS |
| Archiving | Auto Archiver, Archive.today, Hunchly |
19. People & Identity Investigation
Tools for researching individuals — from username enumeration to facial recognition.
Username & Account Discovery
| Tool | Description | Coverage | License |
| --- | --- | --- | --- |
| Sherlock | Search for usernames across social networks | 400+ platforms | Open Source |
| SherlockOSINT | Web-based username search | 700+ platforms | Free |
| Maigret | Advanced username dossier collection with profile parsing | 3,000+ sites (500 default) | Open Source |
| WhatsMyName | Username enumeration with community-maintained site list | 600+ sites | Open Source |
| Namechk | Username and domain availability checker | 100+ platforms | Free |
| KnowEm | Username search across social networks and domains | 500+ platforms | Free |
| Blackbird | Fast username search with OSINT-focused output | 500+ sites | Open Source |
People Search Engines
| Tool | Description | Coverage | License |
| --- | --- | --- | --- |
| Pipl | Deep people search engine (professional/commercial) | Global identity data | Commercial |
| ThatsThem | Reverse lookups by name, email, phone, IP, address | US-focused | Free tier |
| Spokeo | Aggregated people search (public records, social, property) | US | Commercial |
| BeenVerified | Background checks and people search | US | Commercial |
| Max Intel | 72 free OSINT tools including people search, ghost finder | Global | Free |
| Social Catfish | Reverse image, phone, email, and name search | US/Global | Commercial |
Facial Recognition & Image-Based People Search
| Tool | Description | Notes |
| --- | --- | --- |
| PimEyes | Facial recognition search engine; finds face matches across the web | Commercial; controversial privacy implications |
| FaceCheck.ID | Reverse face search engine | Commercial |
| Search4faces | Facial recognition search across VK and Odnoklassniki | Free; Russian social networks |
| Azure AI Video Indexer | Facial detection in video (not identification) | Microsoft; free tier available |
| Google Lens | Visual search that can match faces to web appearances | Free |
Identity Verification & Analysis
| Tool | Description | Use Case |
| --- | --- | --- |
| Epieos | Email and phone OSINT investigation | Identify accounts linked to email/phone |
| GHunt | Google account investigation from email | Discover Google services, reviews, maps contributions |
| Creepy | Geolocation information gathering from social media | Map subject movements from geotagged posts |
| Social Geo Lens | Discover public posts from specific geographic areas | Location-based people discovery |
20. Email Intelligence
Investigating email addresses — ownership, breach exposure, organizational mapping.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Email verification | Check if an email address exists and is deliverable | Hunter.io, Email Hippo, NeverBounce |
| Organizational email discovery | Find email addresses for a domain/organization | Hunter.io, RocketReach, Snov.io |
| Email-to-identity mapping | Link an email to social accounts and real identity | Epieos, Holehe, GHunt |
| Breach exposure check | Check if email appears in known data breaches | Have I Been Pwned, XposedOrNot, DeHashed |
| Email header analysis | Parse email headers for origin, routing, and spoofing indicators | MXToolbox, Google Admin Toolbox |
| Reverse email search | Find social profiles and registrations from an email | Epieos, Holehe, UserSearch |
| Domain email patterns | Discover naming conventions used by an organization | Hunter.io, theHarvester |
| Email infrastructure analysis | Examine MX records, SPF, DKIM, DMARC configuration | MXToolbox, dmarcian |
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| Hunter.io | Email finder and verifier; maps organizational email structures | Freemium (25 searches/mo free) |
| Holehe | Check which services an email is registered on (80+ sites) | Open Source |
| Epieos | Email and phone OSINT: linked accounts, breach data | Free |
| theHarvester | Gather emails, subdomains, IPs from public sources | Open Source (Kali) |
| Have I Been Pwned | Email breach exposure database (12B+ compromised accounts) | Free (API paid) |
| Snov.io | Email finder, verifier, and outreach automation | Freemium |
| RocketReach | Professional email and phone number finder | Commercial |
| MXToolbox | Email server diagnostics, blacklist check, header analysis | Free |
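Email header analysis, listed among the core functions above, can be approximated offline with Python's standard `email` package. The sketch below reads the `Received` chain (each relay prepends its own line, so the list is reversed to get oldest-first order) and compares the `From` and `Return-Path` domains. The sample message is invented, and a domain mismatch is only a weak hint: many legitimate bulk senders use a separate bounce domain.

```python
from email import message_from_string
from email.utils import parseaddr


def summarize_headers(raw: str) -> dict:
    """Extract sender addresses and the relay chain from a raw message."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    from_domain = from_addr.rpartition("@")[2].lower()
    return_domain = return_path.rpartition("@")[2].lower()
    return {
        "from": from_addr,
        "return_path": return_path,
        # Relays prepend Received lines, so reverse to read oldest hop first.
        "hops": list(reversed(msg.get_all("Received", []))),
        "domain_mismatch": bool(
            from_domain and return_domain and from_domain != return_domain
        ),
    }


# Invented message for illustration.
RAW = (
    "Return-Path: <bounce@mailer.example.net>\n"
    "Received: from mx.example.org by inbox.example.com; Tue, 2 Apr 2024 10:00:05 +0000\n"
    "Received: from mailer.example.net by mx.example.org; Tue, 2 Apr 2024 10:00:01 +0000\n"
    "From: Jane Doe <jane@example.org>\n"
    "Subject: Invoice\n"
    "\n"
    "Body text.\n"
)
info = summarize_headers(RAW)
print(info["from"], info["return_path"], info["domain_mismatch"])
```

SPF, DKIM, and DMARC results, when present in an `Authentication-Results` header, are stronger spoofing indicators than the comparison shown here.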
21. Phone Number Intelligence
Tools for investigating phone numbers — carrier, owner, location, and linked accounts.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Carrier lookup | Identify the carrier/operator for a phone number | PhoneInfoga, NumLookup, Twilio Lookup |
| Reverse phone search | Find the owner or identity behind a number | Truecaller, Sync.me, ThatsThem |
| Number type identification | Determine if number is mobile, landline, VoIP | PhoneInfoga, Twilio Lookup API |
| Country/region identification | Parse international number formatting and origin | libphonenumber, PhoneInfoga |
| Caller ID aggregation | Aggregate crowdsourced caller identification data | Truecaller (7B+ numbers), Sync.me |
| Linked account discovery | Find accounts registered with a phone number | Epieos, Signal check, WhatsApp check |
| Spam/scam flagging | Check if a number is reported as spam or fraud | Truecaller, Should I Answer |
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| PhoneInfoga | Advanced phone number scanner — carrier, type, region, linked accounts | Open Source |
| Truecaller | Global caller ID database (7B+ numbers, 400M+ users) | Freemium |
| NumLookup | Free reverse phone lookup | Free |
| Sync.me | Caller ID and reverse phone lookup | Freemium |
| Twilio Lookup API | Carrier and caller name lookup via API | Pay-per-lookup |
| libphonenumber | Google’s phone number parsing/formatting library | Open Source |
22. Domain, DNS & Infrastructure Intelligence
Investigating web infrastructure — domains, IP addresses, hosting, certificates, and network topology.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| WHOIS lookup | Domain registration details — owner, registrar, dates | DomainTools, WHOIS.com, ViewDNS |
| Historical WHOIS | Track domain ownership changes over time | DomainTools, SecurityTrails |
| DNS record lookup | A, AAAA, MX, NS, TXT, CNAME records | DNSDumpster, DNSRecon, ViewDNS |
| Subdomain enumeration | Discover all subdomains for a domain | Subdominator (50+ sources), Amass, Subfinder |
| Certificate transparency search | Find related domains sharing SSL certificates | crt.sh, Censys |
| Reverse IP lookup | Find other domains hosted on the same IP | ViewDNS, Shodan, SecurityTrails |
| Technology profiling | Identify web frameworks, CMS, analytics, hosting | BuiltWith, Wappalyzer, WhatRuns |
| Internet-connected device search | Discover exposed devices, services, vulnerabilities | Shodan, Censys, ZoomEye, Fofa |
| Port scanning / service detection | Identify open ports and running services | Shodan, Censys, Nmap |
| IP geolocation | Map IP addresses to physical locations | MaxMind, ipinfo.io, IPVoid |
| ASN / BGP analysis | Identify network ownership and routing | Hurricane Electric BGP, RIPE Stat |
| Website change detection | Monitor web pages for content changes | Visualping, ChangeTower, Distill.io |
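As an illustration of the certificate transparency row, the sketch below extracts unique hostnames from a crt.sh-style JSON response. It assumes the response is a JSON list of objects whose `name_value` field holds newline-separated names, which is how crt.sh's `output=json` results (for example `https://crt.sh/?q=%25.example.org&output=json`) are commonly observed to look; the format is not a guaranteed contract. The sample payload and the `subdomains_from_crtsh` helper are hypothetical, and no network call is made.

```python
import json


def subdomains_from_crtsh(payload, domain):
    """Return sorted unique hostnames under `domain` from a crt.sh-style JSON payload."""
    names = set()
    for entry in json.loads(payload):
        # One certificate can list several names, separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # Treat a wildcard as its parent name.
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hypothetical response body standing in for a live crt.sh query.
SAMPLE = json.dumps([
    {"name_value": "example.org\nwww.example.org"},
    {"name_value": "*.staging.example.org"},
    {"name_value": "mail.example.org\nWWW.example.org"},
])

print(subdomains_from_crtsh(SAMPLE, "example.org"))
```

Names found this way only show that a certificate was issued at some point; each host still needs to be resolved to confirm it is live.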
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| Shodan | Internet-connected device search engine (3M+ users, 89% of Fortune 100) | Freemium |
| Censys | Internet-wide scan data — hosts, certificates, protocols | Freemium |
| DNSDumpster | Free domain research and DNS reconnaissance | Free |
| crt.sh | Certificate Transparency log search | Free |
| DomainTools | WHOIS, reverse WHOIS, domain history | Commercial |
| SecurityTrails | Historical DNS, WHOIS, subdomain intelligence | Freemium |
| BuiltWith | Technology profiling for websites | Freemium |
| Wappalyzer | Browser extension for technology detection | Free |
| ViewDNS.info | DNS lookup, reverse IP, WHOIS history, firewall detection | Free |
| Amass | In-depth subdomain enumeration and attack surface mapping | Open Source (OWASP) |
| BBOT | Recursive modular OSINT framework with 80+ modules | Open Source |
| Subdominator | Passive subdomain enumeration from 50+ sources | Open Source |
| ZoomEye | Cyberspace search engine (Chinese alternative to Shodan) | Freemium |
| Fofa | Cyberspace search and asset mapping | Freemium |
23. OSINT Search Engines & Dark Web Search
Specialized search engines that index content beyond the surface web.
Surface Web OSINT Search
| Tool | Description | Special Capability |
| --- | --- | --- |
| Intelligence X (IntelX) | OSINT search for emails, domains, IPs, Bitcoin, files | Historical data, leaked datasets, dark web content |
| Google Dorking | Advanced Google operators for targeted search | site:, filetype:, intitle:, inurl: operators |
| Yandex | Russian search engine with strong image search | Often returns results Google misses |
| DuckDuckGo | Privacy-focused search with !bang shortcuts | Useful for non-personalized results |
| Carrot2 | Search results clustering engine | Groups results by topic automatically |
| The OSINT Vault | Multi-Search Launcher (80+ platforms from a single query) | Batch search across multiple engines |
| Cylect.io | AI-powered OSINT search aggregator | Aggregates multiple search engines and tools |
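The operator syntax listed for Google Dorking can be composed programmatically. The snippet below is a small sketch: `build_dork` is a hypothetical helper, the target domain is a placeholder, and only the four operators named in the table are used.

```python
from urllib.parse import quote_plus


def build_dork(terms="", **operators):
    """Compose a search string such as: site:example.org filetype:pdf "phrase"."""
    parts = []
    for op, value in operators.items():
        if " " in value:
            value = f'"{value}"'  # Multi-word operator values need quoting.
        parts.append(f"{op}:{value}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = build_dork(
    '"board minutes"',
    site="example.org",
    filetype="pdf",
    intitle="annual report",
)
print(query)
# URL-encode the query so it can be opened in a browser.
print("https://www.google.com/search?q=" + quote_plus(query))
```

Building queries this way keeps a record of exactly which operators were used, which helps when documenting how a finding was reached.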
Dark Web & Tor Search
| Tool | Description | Access |
| --- | --- | --- |
| Ahmia | Tor hidden services search engine with abuse blacklist | Web (clearnet) + Tor |
| Torch | Long-running Tor search engine | Tor only |
| OnionLand | Tor hidden service search | Tor |
| Haystack | Dark web search engine | Tor |
| DarkSearch | Dark web search with API access | Clearnet interface |
| Kilos | Dark web market search engine | Tor |
| IntelX | Indexes some dark web content alongside clearnet | Clearnet + archived Tor content |
Code & Pastebin Search
| Tool | Description | Useful For |
| --- | --- | --- |
| GitHub Code Search | Search across all public GitHub repositories | Leaked credentials, API keys, internal docs |
| Grep.app | Full-text search across 500K+ public Git repos | Fast code/secret search |
| SearchCode | Multi-platform code search (GitHub, BitBucket, GitLab) | Cross-platform code discovery |
| Pastebin | Public paste monitoring | Leaked data, doxes, breach dumps |
| PasteLert | Alert service for Pastebin mentions | Monitor brand/keyword mentions |
| PublicWWW | Source code search across live websites | Find sites using specific scripts, pixels |
24. Breach Data & Credential Intelligence
Databases and tools for investigating data breaches and leaked credentials.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Email breach check | Check if an email appears in known breaches | Have I Been Pwned, XposedOrNot |
| Password breach check | Check if a specific password has been leaked | HIBP Pwned Passwords (k-anonymity) |
| Domain breach check | Find all breached accounts for an organization | HIBP Domain Search, SpyCloud |
| Stealer log search | Search through infostealer malware dumps | HIBP (stealer logs), Hudson Rock |
| Credential validation | Check if leaked credentials are still active | No tool listed (ethically and legally sensitive; defensive use only) |
| Combolist monitoring | Track distribution of credential dumps | Threat intelligence feeds |
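The k-anonymity model noted for HIBP Pwned Passwords works by sending only the first five hexadecimal characters of the password's SHA-1 hash to the range endpoint (`https://api.pwnedpasswords.com/range/{prefix}`) and matching the remaining 35 characters locally against the returned `SUFFIX:COUNT` lines. The sketch below shows the client-side logic. The function names are illustrative, `fetch_range` is defined but never called here, and the sample response body is fabricated apart from the suffix under test.

```python
import hashlib
import urllib.request


def split_hash(password):
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body, suffix):
    """Look up `suffix` in a range response made of SUFFIX:COUNT lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Query the range endpoint; only the 5-character prefix leaves the machine."""
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


prefix, suffix = split_hash("password")
# Fabricated response body; a live call would be fetch_range(prefix).
sample_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n{suffix}:42\r\n"
print(prefix, count_in_range(sample_body, suffix))
```

Because the full hash never leaves the client, the service cannot tell which password in the returned range was being checked.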
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| Have I Been Pwned (HIBP) | Gold standard breach notification (12B+ compromised accounts, 900+ breaches) | Free (API commercial) |
| XposedOrNot | Alternative breach database with free API | Free |
| DeHashed | Breach database search with credential data | Commercial |
| LeakCheck | Breach data search by email, username, phone, keyword | Freemium |
| SpyCloud | Enterprise breach analytics and credential monitoring | Commercial |
| Hudson Rock | Infostealer intelligence — compromised computers and credentials | Commercial |
| Snusbase | Breach data search engine | Commercial |
| IntelX | Archived leaked datasets and pastes | Freemium |
25. Public Records & Government Databases
Free and paid access to court records, property data, business filings, and other government records.
US Federal Records
| Resource | Description | Access |
| --- | --- | --- |
| PACER | Federal court electronic records (dockets, filings, opinions) | Paid (fees waived at $30 or less per quarter) |
| RECAP | Free archive of PACER documents + browser extension | Free (via Free Law Project) |
| CourtListener | Federal and state court opinions search | Free |
| Federal Judicial Center IDB | Integrated federal court docket database | Free |
| EDGAR | SEC company filings, insider trading, proxy statements | Free |
| FEC.gov | Federal campaign finance data — donations, expenditures | Free |
| OpenSecrets | Money in politics — lobbying, donations, PACs | Free |
| USAspending.gov | Federal spending and contract data | Free |
| SAM.gov | Federal contractor registrations and exclusions | Free |
| FOIA.gov | Federal FOIA request portal | Free |
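Several of these resources expose machine-readable endpoints. As one example, EDGAR publishes per-company filing metadata at `https://data.sec.gov/submissions/CIK##########.json`, where the CIK is zero-padded to ten digits, and the SEC asks automated clients to identify themselves with a descriptive `User-Agent`. The sketch below builds that URL. The function names and the contact string are placeholders, `320193` is just a sample CIK, and `fetch_submissions` is defined but not called.

```python
import json
import urllib.request


def edgar_submissions_url(cik):
    """Build the submissions URL; EDGAR expects the CIK zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def fetch_submissions(cik, contact):
    """Download filing metadata for one company (performs a network call)."""
    request = urllib.request.Request(
        edgar_submissions_url(cik),
        # Placeholder contact string, e.g. "Newsroom Research research@example.org".
        headers={"User-Agent": contact},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


print(edgar_submissions_url(320193))
```

The same pattern (build the URL, identify the client, parse JSON) is a reasonable starting point for other connectors, though each service has its own rate limits and terms.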
US State & Local Records
| Resource | Description | Access |
| --- | --- | --- |
| National Center for State Courts | Directory of all state court websites | Free |
| Black Book Online | County-level court records by state/county | Free |
| County assessor databases | Property ownership, assessed values, tax records | Free (varies by county) |
| Secretary of State databases | Business entity filings, UCC filings, notary records | Free (varies by state) |
| State corporation commission | Corporate registration and annual reports | Free (varies by state) |
International Records & Registries
| Resource | Description | Coverage |
| --- | --- | --- |
| OpenCorporates | World’s largest open database of company data | 140+ jurisdictions |
| ICIJ Offshore Leaks | Panama Papers, Paradise Papers, Pandora Papers entities | 810K+ entities |
| OpenSanctions | Sanctions, PEP, criminal interest entities | 2.1M+ entities, 328 sources |
| UK Companies House | UK company filings, directors, accounts | Free, comprehensive |
| EU Transparency Register | EU lobbyist registrations | Free |
| Worldwide OSINT Map | Country-by-country directory of public records databases | 614+ services, 100+ countries |
26. Conflict Monitoring & Weapons Identification
Tools for tracking armed conflicts, political violence, and munitions.
Conflict Event Databases
| Tool | Description | Coverage |
| --- | --- | --- |
| ACLED | Armed Conflict Location & Event Data Project | Global political violence + protests |
| GDELT | Global Database of Events, Language, and Tone | 300+ categories of events, real-time |
| LiveUAMap | Interactive conflict mapping with real-time events | Ukraine, Middle East, Syria, global |
| Beholder | 80+ OSINT sources, 45+ specialized dashboards | Global threat intelligence |
| Uppsala Conflict Data Program | Academic conflict dataset (1946–present) | Global armed conflicts |
| ICG CrisisWatch | Monthly global conflict tracking | International Crisis Group |
Weapons & Munitions Identification
| Tool | Description | Use Case |
| --- | --- | --- |
| METIS (Fenix Insight) | 6,700+ technical records, 45,000+ munitions images, 500K+ events | Identify weapons/munitions in photos and video |
| iTrace (Conflict Armament Research) | Field investigation + weapons tracking database | Trace weapon supply chains from point of use |
| Open Source Munitions Portal | Searchable verified munitions image library | Visual identification reference |
| Bulletpicker.com | Ammunition guidebooks and armed forces manuals | Reference for ammunition identification |
| Small Arms Survey | Research on weapons, violence, and arms transfers | Data and analysis for policy |
| Janes | Defense and security intelligence (ships, aircraft, weapons) | Commercial; used by governments and media |
27. Environmental & Wildlife Monitoring
Satellite and sensor tools for investigating environmental crimes and natural events.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Active fire detection | Near real-time fire/hotspot detection from satellite | NASA FIRMS (MODIS/VIIRS), Sentinel Hub |
| Deforestation monitoring | Track forest loss and illegal logging | Global Forest Watch, Sentinel Hub |
| Illegal fishing detection | Vessel tracking + fishing activity analysis | Global Fishing Watch |
| Air quality monitoring | Real-time pollution and air quality data | IQAir, PurpleAir, OpenAQ |
| Water quality / oil spill detection | Satellite-based oil spill and water monitoring | Sentinel-1 SAR, SkyTruth |
| Climate data analysis | Historical and projected climate data | NOAA, ERA5, Climate Reanalyzer |
| Wildlife trade monitoring | Track illegal wildlife trade online | TRAFFIC, WWF |
Key Platforms
| Tool | Description | License |
| --- | --- | --- |
| NASA FIRMS | Fire Information for Resource Management System — 3-hour latency global fire data | Free |
| Global Forest Watch | Satellite-based deforestation monitoring and alerts | Free |
| Global Fishing Watch | AIS-based fishing activity tracking and vessel monitoring | Free |
| Sentinel Hub | ESA Copernicus satellite imagery browser and API | Free tier available |
| SkyTruth | Satellite analysis for environmental investigations | Free / nonprofit |
| AllTrails | Trail and outdoor area mapping (useful for geolocation) | Free |
28. Wireless & Signals Intelligence
Tools for investigating wireless networks, cell towers, and radio frequency data.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| WiFi network geolocation | Map physical location of WiFi access points by SSID/BSSID | WiGLE |
| Cell tower mapping | Locate and map cell tower positions | OpenCellID, CellMapper |
| IMSI catcher detection | Detect fake base stations / Stingrays | AIMSICD, SnoopSnitch, Rayhunter |
| Bluetooth device tracking | Discover and track Bluetooth devices in an area | nRF Connect, Bluetana |
| Radio frequency monitoring | Monitor radio traffic (ADS-B, marine VHF, ham) | RTL-SDR, SDR++ |
| Wardriving | Systematic scanning and mapping of wireless networks | WiGLE apps, Kismet, Pwnagotchi |
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| WiGLE | Global WiFi/cell network map and lookup (billions of networks) | Free for non-commercial |
| OpenCellID | Open cell tower location database | Free (community-contributed) |
| CellMapper | Cell tower mapping with carrier identification | Free |
| Kismet | Wireless network detector, sniffer, and IDS | Open Source |
| RTL-SDR | Software-defined radio for monitoring ADS-B, marine, ham frequencies | Open Source (hardware ~$25) |
29. Cyber Threat Intelligence
Tools for investigating threats, malware, indicators of compromise, and adversary infrastructure.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| File/URL reputation check | Scan files and URLs against 70+ antivirus engines | VirusTotal |
| IoC enrichment | Enrich IP, domain, hash indicators with threat context | VirusTotal, AlienVault OTX, IOCLens |
| Threat actor tracking | Monitor known APT groups and campaigns | MITRE ATT&CK, APT Watch |
| Malware sandbox analysis | Detonate files in isolated environments | Any.Run, Joe Sandbox, Hybrid Analysis |
| Threat feed aggregation | Consolidate threat intelligence from multiple sources | MISP, OpenCTI, APT Watch |
| IP/domain reputation | Score IP addresses and domains for malicious activity | AbuseIPDB, GreyNoise, Shodan |
| Phishing detection | Identify phishing domains and campaigns | PhishTank, URLScan.io, CheckPhish |
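Before an indicator can be enriched, it has to be routed to the right lookup (IP, domain, URL, or file hash). The following sketch shows one way to classify raw indicators with the standard library, including "refanging" of defanged values such as `example[.]org`. The `refang` and `classify` helpers and the domain pattern are illustrative simplifications, not how any listed tool does it.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
# Simplified hostname pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def refang(indicator):
    """Undo common defanging such as hxxp:// and [.] so the value can be parsed."""
    return (
        indicator.strip()
        .replace("hxxp", "http")
        .replace("[.]", ".")
        .replace("(.)", ".")
    )


def classify(indicator):
    """Return a coarse indicator type: ipv4, ipv6, md5, sha1, sha256, url, domain, unknown."""
    value = refang(indicator).lower()
    try:
        return "ipv%d" % ipaddress.ip_address(value).version
    except ValueError:
        pass
    if re.fullmatch(r"[0-9a-f]+", value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    if value.startswith(("http://", "https://")):
        return "url"
    if DOMAIN_RE.match(value):
        return "domain"
    return "unknown"


for raw in ["198.51.100[.]7", "hxxps://evil.example[.]com/login",
            "d41d8cd98f00b204e9800998ecf8427e", "example[.]org"]:
    print(raw, "->", classify(raw))
```

Hash types are inferred from length alone, so a 32-character hexadecimal string is labeled `md5` even if it came from some other source.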
Key Tools
| Tool | Description | License |
| --- | --- | --- |
| VirusTotal | File/URL/domain/IP analysis against 70+ AV engines; largest threat reputation database | Freemium (API commercial) |
| AlienVault OTX | Open Threat Exchange — community-contributed threat indicators | Free |
| MISP | Open-source threat intelligence sharing platform with 100+ expansion modules | Open Source |
| OpenCTI | Cyber threat intelligence platform with STIX2 data model | Open Source |
| AbuseIPDB | IP address abuse reporting and reputation database | Freemium |
| GreyNoise | Internet-wide scanner data — distinguish threats from noise | Freemium |
| URLScan.io | URL inspection and phishing detection | Free |
| Any.Run | Interactive malware sandbox | Freemium |
| Hybrid Analysis | Free malware analysis sandbox (by CrowdStrike) | Free |
| APT Watch | Aggregated IoCs from OSINT — 5,700+ IPs, 1.5M+ domains | Open Source |
| IOCLens | Browser extension for instant IoC enrichment from 9+ sources | Free |
| MITRE ATT&CK | Adversary tactics, techniques, and procedures knowledge base | Free |
30. OSINT Automation Frameworks
Platforms that automate multi-source intelligence collection into unified workflows.
Core Functions
| Function | Description | Tools |
| --- | --- | --- |
| Automated reconnaissance | Scan a target (domain, email, IP) across hundreds of sources | SpiderFoot, BBOT, Recon-ng |
| Visual link analysis | Map relationships between entities from multiple data sources | Maltego |
| Module-based architecture | Extensible plugins for different data sources and analysis | All major frameworks |
| Result correlation | Cross-reference findings across multiple sources automatically | SpiderFoot, Maltego |
| Report generation | Generate structured reports from investigation findings | Maltego, Maigret, SpiderFoot |
| API orchestration | Coordinate queries across multiple OSINT APIs | Recon-ng, BBOT |
| Scheduled monitoring | Continuous monitoring for changes or new data | SpiderFoot (HX), custom tools |
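The module-based, recursive pattern shared by these frameworks can be illustrated in a few lines: each module declares which entity type it consumes, emits new entities, and the runner feeds every new finding back into matching modules until nothing new appears. The sketch below is a toy model with three made-up modules returning static data. It does not reproduce the plugin API of SpiderFoot, BBOT, or Recon-ng.

```python
from collections import deque

MODULES = []  # Registry of (consumed entity type, module function) pairs.


def module(consumes):
    """Decorator that registers a function as a module for one entity type."""
    def register(fn):
        MODULES.append((consumes, fn))
        return fn
    return register


@module("domain")
def fake_dns(value):
    # Hypothetical static answer standing in for a DNS lookup.
    return [("ip", "203.0.113.10")] if value == "example.org" else []


@module("domain")
def fake_email_harvest(value):
    # Hypothetical static answer standing in for an email harvester.
    return [("email", "press@" + value)] if value == "example.org" else []


@module("email")
def email_to_domain(value):
    return [("domain", value.split("@", 1)[1])]


def scan(seed_type, seed_value, max_events=100):
    """Breadth-first expansion: every new entity is fed back to matching modules."""
    seen = {(seed_type, seed_value)}
    queue = deque(seen)
    edges = []  # (source entity, module name, found entity) for link analysis.
    while queue and len(seen) < max_events:
        etype, value = queue.popleft()
        for consumes, fn in MODULES:
            if consumes != etype:
                continue
            for found in fn(value):
                edges.append(((etype, value), fn.__name__, found))
                if found not in seen:  # Deduplicate so cycles terminate.
                    seen.add(found)
                    queue.append(found)
    return seen, edges


entities, links = scan("domain", "example.org")
for source, via, found in links:
    print(source, f"--{via}-->", found)
```

The recorded edges double as input for link-analysis views, since each one states which module connected which two entities.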
Key Frameworks
| Framework | Description | Modules | License |
| --- | --- | --- | --- |
| Maltego | Visual link analysis + entity transforms | 200+ transforms; 1B+ identities; 200M+ company records | Commercial (free CE tier) |
| SpiderFoot | Automated OSINT collection and correlation | 200+ modules; active/passive modes; web + CLI | Open Source (HX commercial) |
| Recon-ng | Metasploit-style reconnaissance framework | Modular plugins; API integration; SQLite workspaces | Open Source |
| BBOT | Recursive modular OSINT framework | 80+ modules; subdomain enum, port scan, web scraping | Open Source |
| SpiderFoot HX | Hosted version of SpiderFoot with dashboards and scheduling | 300+ modules; adversarial AI detection | Commercial |
| sn0int | Semi-automatic OSINT framework with Rust-based engine | Package manager for modules; sandbox execution | Open Source |
| Photon | Fast web crawler for OSINT data extraction | URLs, emails, files, social media | Open Source |
2026 Trends in OSINT Automation
- AI-powered transforms — Maltego 2026 uses NLP to infer hidden connections between entities
- Adversarial AI detection — SpiderFoot flags AI-generated content in scraped data
- STIX/TAXII integration — Frameworks increasingly support standardized threat intelligence sharing
- GraphQL APIs — Modern frameworks expose investigation data via GraphQL for custom frontends
- Multi-agent orchestration — Emerging pattern of coordinating multiple specialized agents (IntellyWeave, Crucix)
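To make the STIX/TAXII point concrete, here is a minimal sketch of a STIX 2.1 Indicator object for a suspicious domain, built as a plain dictionary. It includes the properties the STIX 2.1 specification requires for indicators (`type`, `spec_version`, `id`, `created`, `modified`, `pattern`, `pattern_type`, `valid_from`). The helper name, domain, and label are placeholders, and a production pipeline would more likely use a dedicated STIX library with validation.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_domain_indicator(domain, name):
    """Build a STIX 2.1 Indicator dict matching a single domain name."""
    # STIX timestamps are RFC 3339 in UTC; millisecond precision is used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        # The domain is assumed to contain no quote characters.
        "pattern": f"[domain-name:value = '{domain}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


indicator = stix_domain_indicator("example.org", "Domain seen in phishing emails")
print(json.dumps(indicator, indent=2))
```

Objects in this shape can be wrapped in a STIX bundle and exchanged over TAXII, which is what lets findings from one framework be loaded into another.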
31. OSINT Training, Communities & Meta-Resources
Where investigators learn techniques and stay current on tools.
Training & Education
| Resource | Description | Access |
| --- | --- | --- |
| Bellingcat | Open-source investigation methodology, case studies, how-to guides | Free |
| SANS SEC497 | OSINT in the enterprise — formal training course | Commercial |
| OSINT Curious | Free training, webcasts, and community events | Free |
| Trace Labs | CTF-style events for missing persons investigations (ethical OSINT) | Free |
| Sector035 OSINT Newsletter | Weekly OSINT tool and technique roundup | Free |
| OSINT Newsletter | Tools, techniques, and investigations weekly digest | Free |
| The OSINT Vault | Investigation workflows, bookmarklets, multi-search launchers | Free |
| NixIntel | Geolocation challenges and OSINT technique deep-dives | Free |
| Toddington International | Comprehensive OSINT training and resources | Commercial |
Key Communities
| Community | Platform | Focus |
| --- | --- | --- |
| r/OSINT | Reddit | General OSINT discussion and tools |
| OSINT Curious Discord | Discord | Community chat, challenges, events |
| Trace Labs | Multiple | Missing persons CTF competitions |
| Bellingcat Discord | Discord | Investigation collaboration |
| OSINT Team | Medium/Blog | Tool reviews and tutorials |
| IntelTechniques | Web | Michael Bazzell’s OSINT methodology (privacy-focused) |
Bookmarklet & Browser Tools
| Tool | Description | Use Case |
| --- | --- | --- |
| OSINT Vault Multi-Search | Launch 80+ platform searches from a single query | Rapid username/domain/email investigation |
| Hunchly | Auto-capture every page visited with timestamps | Evidence preservation during browsing |
| InVID Verification Plugin | Right-click image/video verification | Quick media authentication |
| Wappalyzer | Browser extension for website technology detection | Identify CMS, frameworks, analytics |
| Shodan Browser Extension | See Shodan data for any website you visit | Quick infrastructure check |
Part III — Analysis & Upgrade Planning
32. Reference Platforms (Full-Stack Investigative)
Major platforms that integrate multiple categories above into unified workflows.
Tier 1: Purpose-Built for Investigative Journalism
| Platform | Developer | License | Key Differentiator |
| --- | --- | --- | --- |
| OCCRP Aleph | OCCRP | Open Source (MIT) | Cross-referencing against hundreds of datasets; FollowTheMoney data model; network/timeline visualization |
| ICIJ Datashare | ICIJ | Open Source | Local-first privacy; multi-NER pipeline; batch search API; team collaboration |
| Google Pinpoint | Google News Initiative | Free (verified journalists) | Audio transcription; handwriting OCR; table-to-spreadsheet; 200K docs/collection |
| DocumentCloud | MuckRock Foundation | Open Source | 6.9M+ public documents; annotation; embedding API; AI OCR add-ons |
| Open Semantic Search | opensemanticsearch.org | Open Source (GPL) | Full-stack ETL + search + knowledge graph; thesaurus/ontology support; faceted exploration |
Tier 2: AI-Native Investigation Platforms (2024-2026)
| Platform | License | Key Differentiator |
| --- | --- | --- |
| Trove | Commercial | RAG + Claude/Gemini; financial extraction; network visualization; natural language Q&A |
| Arkham Mirror | Open Source | Air-gapped/local-first; offline RAG; contradiction detection; knowledge graph |
| IntellyWeave | Open Source | GLiNER entities; Mapbox geo; multi-agent reasoning; hypothesis-driven |
| Presswork.ai | Commercial | FOIA generation; source CRM; deep research engine; fact-check assist |
| Crucix | Open Source | 27 real-time data sources; cross-source correlation; Telegram/Discord alerts |
Tier 3: General OSINT Platforms Used by Journalists
| Platform | License | Key Capability |
| --- | --- | --- |
| Maltego | Commercial (free CE tier) | 200+ data transforms; 1B+ identities; link analysis |
| i2 Analyst’s Notebook | Commercial | 30-year industry standard; visual intelligence analysis |
| SpiderFoot | Open Source | 200+ recon modules; automated scans |
| Hunchly | Commercial | Automatic evidence capture; court-ready packaging |
| OpenSanctions | Open Data (free for journalism) | 2.1M+ entities; sanctions + PEP + corporate data |
33. Feature Gap Matrix: OSS vs. Ecosystem
How Open Semantic Search’s current capabilities compare to the investigative journalism ecosystem, organized for upgrade planning. In this section, “OSS” abbreviates Open Semantic Search, not open-source software.
Legend
- Strong = OSS has robust implementation
- Basic = OSS has partial/dated implementation
- Gap = Feature absent from OSS; present in peer platforms
- Emerging = Category emerging in 2024-2026; no peer has mature implementation
| # | Capability Area | OSS Current State | Peer Benchmark | Priority Signal |
| --- | --- | --- | --- | --- |
| 1 | Multi-format document ingestion | Strong (Tika + ETL) | Aleph, Datashare | Maintain |
| 2 | OCR (printed text) | Strong (Tesseract) | Google Vision, Textract | Maintain; consider cloud OCR option |
| 3 | Handwriting / audio transcription | Gap | Pinpoint, Whisper | High — differentiator for leak analysis |
| 4 | Full-text search | Strong (Solr) | Elasticsearch (Datashare, Aleph) | Maintain |
| 5 | Faceted / exploratory search | Strong | Aleph, Datashare | Maintain; modernize UI |
| 6 | Semantic / embedding search | Gap | Trove, Arkham Mirror, IntellyWeave | Critical — table-stakes for 2026 |
| 7 | Natural language Q&A (RAG) | Gap | Trove, Pinpoint, Presswork | Critical — highest user demand |
| 8 | Named Entity Recognition | Basic (spaCy) | GLiNER, CoreNLP, multilingual models | Upgrade — add more entity types, improve accuracy |
| 9 | Entity linking / disambiguation | Basic (Entity Search API) | Aleph FtM, Wikidata linking | Upgrade |
| 10 | Knowledge graph (Neo4j) | Basic (integration exists) | Aleph FtM, Arkham Mirror | Upgrade — richer graph model, better UI |
| 11 | Network visualization | Basic (Cytoscape.js) | Maltego, i2, Aleph diagrams | Upgrade — interactive, collaborative |
| 12 | Timeline visualization | Gap | Aleph, Trove, TimelineJS | High — standard investigative feature |
| 13 | Thesaurus / ontology management | Strong (SKOS/RDF) | Unique to OSS | Maintain — competitive advantage |
| 14 | Corporate / sanctions data integration | Gap | Aleph + OpenSanctions | High — integrate via API |
| 15 | FOIA / public records integration | Gap | MuckRock, DocumentCloud | Medium — API integration possible |
| 16 | Collaboration / team workspaces | Gap | Aleph investigations, Datashare server | Critical — required for team investigations |
| 17 | Annotation / tagging | Basic (manual tagging) | DocumentCloud, Aleph, Hypothesis | Upgrade |
| 18 | Redaction with audit trail | Gap | Redactable, DocumentCloud | High — essential for source protection |
| 19 | Secure document intake | Gap | SecureDrop, GlobaLeaks | Medium — complementary integration |
| 20 | Web archiving / evidence preservation | Gap | Hunchly, Auto Archiver | Medium — integration with archive APIs |
| 21 | AI-powered summarization | Gap | Trove, Presswork | Critical — paired with RAG |
| 22 | Contradiction detection | Gap | Arkham Mirror | Medium-High — unique differentiator |
| 23 | Automated timeline extraction | Gap | Arkham Mirror, Trove | High |
| 24 | Cross-dataset entity matching | Gap | Aleph cross-referencing | High |
| 25 | Data cleaning / record linkage | Gap | OpenRefine reconciliation | Medium — complementary tool |
| 26 | Transportation tracking integration | Gap | ADS-B, MarineTraffic APIs | Low — niche; API integration |
| 27 | Crypto / blockchain analysis | Gap | Chainalysis | Low — specialized commercial space |
| 28 | Geolocation tools integration | Gap | Bellingcat toolkit | Low-Medium — API integration |
| 29 | Image/video verification | Gap | InVID, Forensically | Low — specialized tools exist |
| 30 | Modern responsive UI | Gap (PHP + Django) | Aleph (React), Trove, Datashare | Critical — modernize frontend |
| 31 | REST API / developer ecosystem | Basic | Datashare API, Aleph API, DocumentCloud API | High — enable integrations |
| 32 | Multi-language support | Basic (spaCy models) | New/s/leak (40 languages), Datashare | Upgrade |
| 33 | Hypothesis-driven investigation | Gap | IntellyWeave | Emerging |
| 34 | Multi-agent AI orchestration | Gap | IntellyWeave, Crucix | Emerging |
| 35 | Real-time data source aggregation | Gap | Crucix (GDELT, ACLED, etc.) | Emerging |
| OSINT Resource Integration (§18-31) | | | | |
| 36 | Username/identity enumeration | Gap | Sherlock, Maigret, WhatsMyName | Low — complementary CLI tools |
| 37 | Email intelligence pipeline | Gap | Hunter.io, Holehe, HIBP API | Medium — entity enrichment via API |
| 38 | Domain/infrastructure reconnaissance | Gap | Shodan, Censys, DNSDumpster | Medium — infrastructure context for entities |
| 39 | Breach data integration | Gap | HIBP API, IntelX | Medium — entity risk scoring from breach exposure |
| 40 | Public records connectors | Gap | PACER/RECAP, CourtListener, EDGAR | Medium-High — structured data import |
| 41 | Conflict event data feeds | Gap | ACLED, GDELT APIs | Medium — real-time event ingestion |
| 42 | Threat intelligence enrichment | Gap | VirusTotal API, AbuseIPDB | Low — niche cybersecurity use case |
| 43 | OSINT automation orchestration | Gap | SpiderFoot, Recon-ng, Maltego | Low — complementary tools; API integration |
Summary: Top Upgrade Priorities for Open Semantic Search
Based on the gap analysis above, the highest-impact upgrade areas are:
Must-Have (Critical Gaps)
- Semantic / Embedding Search — Move beyond keyword matching to meaning-based retrieval using vector embeddings
- RAG-Powered Natural Language Q&A — Let journalists ask questions in plain English and get cited answers from their document collections
- AI-Powered Summarization — Automatically summarize documents and document sets
- Team Collaboration Workspaces — Private investigation spaces with shared access, annotation, and tagging
- Modern Responsive UI — Replace aging PHP/Django frontend with a modern React/Vue application
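The core of meaning-based retrieval is ranking documents by the similarity of their embedding vectors to the query's vector. The toy sketch below uses cosine similarity over hand-written three-dimensional vectors. In a real deployment the vectors would come from an embedding model and be stored in a vector index; both are outside this example, and the document names are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k (score, doc_id) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Hand-made stand-ins for embedding model output.
INDEX = {
    "wire-transfer-memo": [0.9, 0.1, 0.0],
    "shell-company-filing": [0.7, 0.6, 0.1],
    "catering-invoice": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # Imagined embedding of "offshore payments".

for score, doc_id in top_k(query, INDEX):
    print(f"{doc_id}: {score:.3f}")
```

The same ranking step also serves as the retrieval half of a RAG pipeline, where the top-scoring passages are handed to a language model together with the question.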
High Priority
- Enhanced NER — More entity types (financial, legal), better accuracy, multilingual expansion
- Timeline Visualization — Chronological event plotting extracted from documents
- Richer Knowledge Graph — Upgrade Neo4j integration with better graph models and interactive visualization
- Corporate/Sanctions Data APIs — Integrate OpenSanctions, OpenCorporates, ICIJ Offshore Leaks
- Redaction with Audit Trail — Permanent redaction with PII detection and compliance logging
- Cross-Dataset Entity Matching — Match entities across collections like Aleph does
- REST API Modernization — Full CRUD API enabling third-party integrations and automation
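Cross-dataset entity matching usually starts with name normalization followed by fuzzy comparison. The sketch below is a deliberately simple baseline using only the standard library: it strips accents, punctuation, and a small hand-picked list of legal suffixes, then compares names with `difflib.SequenceMatcher`. It is not Aleph's cross-referencing method, and the suffix list, threshold, and sample names are assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Small illustrative list; real matching needs a far larger, per-jurisdiction set.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "gmbh", "sa", "corp", "corporation"}


def normalize(name):
    """Lowercase, strip accents and punctuation, drop legal suffixes, sort tokens."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove dots first so "S.A." collapses to "sa" before tokenizing.
    tokens = re.findall(r"[a-z0-9]+", text.lower().replace(".", ""))
    return " ".join(sorted(t for t in tokens if t not in LEGAL_SUFFIXES))


def match(names_a, names_b, threshold=0.85):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for a in names_a:
        for b in names_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


collection_a = ["Société Générale S.A.", "Acme Holdings Ltd"]
collection_b = ["Societe Generale SA", "ACME Holdings Limited", "Apex Trading LLC"]

for a, b, score in match(collection_a, collection_b):
    print(f"{a!r} ~ {b!r} ({score})")
```

Comparing every pair is quadratic in collection size, so larger datasets typically need a blocking step (for example, grouping by first token) before fuzzy scoring.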
Medium Priority
- Audio/video transcription (Whisper integration)
- Contradiction detection across documents
- Automated timeline extraction from unstructured text
- Web archive integration (Wayback Machine, Archive.today APIs)
- FOIA/public records integration (MuckRock/DocumentCloud APIs)
- Data cleaning/record linkage (OpenRefine-style reconciliation)
Future / Watch
- Hypothesis-driven investigation workflows
- Multi-agent AI orchestration
- Real-time data source aggregation (GDELT, ACLED feeds)
- Blockchain analysis integration
- Transportation tracking integration
OSINT Integration Opportunities (from Part II)
- Public records connectors (PACER/RECAP, CourtListener, EDGAR) — structured data import
- Breach data entity enrichment (HIBP API) — flag entities with known breach exposure
- Email intelligence pipeline (Hunter.io, Holehe) — enrich person entities with email accounts
- Domain infrastructure context (Shodan, Censys) — enrich organization entities with hosting data
- Conflict event data feeds (ACLED, GDELT) — real-time event ingestion for timeline features
This directory was compiled from analysis of: OCCRP Aleph, ICIJ Datashare, Google Pinpoint, DocumentCloud, MuckRock, Bellingcat Online Investigation Toolkit, Maltego, i2 Analyst’s Notebook, OpenSanctions, OpenCorporates, Trove, Arkham Mirror, IntellyWeave, Presswork.ai, Crucix, Hunchly, SecureDrop, GlobaLeaks, OpenRefine, Chainalysis, RadianceFleet, New/s/leak 2.0, Orson AI, Redactable, OSINT Framework, Awesome OSINT, OSINT Tools Library, Sherlock, Maigret, PhoneInfoga, Shodan, Censys, Have I Been Pwned, Intelligence X, WiGLE, NASA FIRMS, ACLED, GDELT, SpiderFoot, Recon-ng, BBOT, VirusTotal, MISP, and the broader OSINT tool ecosystem as of April 2026.