Discover deals from public filings
An open-source engine that scans SEC EDGAR, clusters related filings into M&A transactions, and extracts structured deal facts using LLMs. Every data point is traced back to its source filing.
From 15.5 million filings to structured deal records
The pipeline processes SEC EDGAR at scale, grouping filings into transactions and extracting facts that trace back to their source document.
Discover
Run full-text search across EDGAR to identify filings related to mergers, acquisitions, and tender offers
Cluster
Group filings by entity, counterparty, and time window into discrete deal clusters
Extract
An LLM reads each filing cluster and extracts terms, advisors, timelines, and deal structure
Cite
Every extracted fact is linked to the filing and passage it came from, so it can be checked against the source
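To make the Cluster step concrete, here is a minimal sketch of grouping filings by entity, counterparty, and time window. It is an illustration under stated assumptions, not the project's actual implementation: the `Filing` fields, the `cluster_filings` function, the 180-day window, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Filing:
    accession: str     # hypothetical filing identifier
    entity: str        # the filer
    counterparty: str  # the other party named in the filing
    filed: date


def cluster_filings(filings, window_days=180):
    """Group filings that share the same pair of parties.

    A filing joins the latest cluster for its pair when it was filed
    within window_days of that cluster's most recent filing; otherwise
    it starts a new cluster. The window length is an assumption.
    """
    window = timedelta(days=window_days)
    by_pair = {}
    for f in sorted(filings, key=lambda f: f.filed):
        # Unordered pair, so filings from either side land together.
        pair = frozenset((f.entity, f.counterparty))
        clusters = by_pair.setdefault(pair, [])
        if clusters and f.filed - clusters[-1][-1].filed <= window:
            clusters[-1].append(f)
        else:
            clusters.append([f])
    return [c for clusters in by_pair.values() for c in clusters]


# Made-up sample data for illustration only.
filings = [
    Filing("0001", "Acme Corp", "Globex Inc", date(2023, 1, 10)),
    Filing("0002", "Globex Inc", "Acme Corp", date(2023, 2, 1)),
    Filing("0003", "Acme Corp", "Initech LLC", date(2023, 2, 15)),
    Filing("0004", "Acme Corp", "Globex Inc", date(2024, 6, 1)),
]
clusters = cluster_filings(filings)
```

In this sample, the two early Acme/Globex filings form one cluster, the 2024 filing between the same parties starts a separate one because it falls outside the window, and the Initech filing stands alone.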
What this proves
This project demonstrates that SEC filings contain enough signal to reconstruct deal-level data without proprietary databases or manual collection.
Filing-native discovery
Deals are found by clustering public filings — not by scraping news or relying on announcement databases. The filing is the primary source.
LLM-powered extraction
Large language models extract structured deal terms directly from filing text. No templates, no regex, no manual data entry.
Generalizable approach
The same pipeline architecture — discover, cluster, extract, cite — works for any SEC filing type, not just M&A.
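One way to picture the four-stage architecture is as a function that takes each stage as a pluggable callable. The sketch below is hypothetical: `CitedFact`, `run_pipeline`, and the stub stages are illustrative names, and the `extract` stub stands in for an LLM call. The cite step here simply keeps facts whose supporting passage appears verbatim in the cited filing's text, which is one possible check rather than the project's confirmed method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CitedFact:
    field: str      # e.g. "deal_value"
    value: str
    accession: str  # filing the fact is attributed to
    passage: str    # verbatim text that supports the fact


def run_pipeline(
    discover: Callable[[], Iterable[str]],
    cluster: Callable[[Iterable[str]], List[List[str]]],
    extract: Callable[[List[str]], List[CitedFact]],
    texts: Dict[str, str],
) -> List[List[CitedFact]]:
    """Run discover -> cluster -> extract -> cite.

    A fact is kept only if its passage is non-empty and occurs
    verbatim in the text of the filing it cites.
    """
    records = []
    for group in cluster(discover()):
        facts = extract(group)
        cited = [
            f for f in facts
            if f.passage and f.passage in texts.get(f.accession, "")
        ]
        records.append(cited)
    return records


# Made-up filing text and stub stages for illustration only.
texts = {"0001": "Acme Corp will acquire Globex Inc for $2.0 billion in cash."}


def discover():
    return ["0001"]


def cluster(accessions):
    return [list(accessions)]


def extract(group):
    # Stub standing in for an LLM; the second fact has no support in the text.
    return [
        CitedFact("deal_value", "$2.0 billion", "0001", "for $2.0 billion in cash"),
        CitedFact("advisor", "Example Bank", "0001", "advised by Example Bank"),
    ]


records = run_pipeline(discover, cluster, extract, texts)
```

Because the stages are passed in, targeting a different filing type would mean swapping the `discover` and `extract` callables while the cite check stays the same. In the sample, the unsupported advisor fact is dropped and only the deal value survives.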
See it in action
Browse the extracted dataset, explore deal timelines, or check advisor league tables.