SECluster

SEC-native M&A discovery engine

Discover deals from public filings

An open-source engine that scans SEC EDGAR, clusters related filings into M&A transactions, and extracts structured deal facts using LLMs. Every data point is traced back to its source filing.

From 15.5 million filings to structured deal records

The pipeline processes SEC EDGAR at scale, grouping filings into transactions and extracting facts that trace back to their source document.

01

Discover

Full-text search across EDGAR to identify filings related to mergers, acquisitions, and tender offers

02

Cluster

Group filings by entity, counterparty, and time window into discrete deal clusters

03

Extract

An LLM reads each filing cluster and extracts terms, advisors, timelines, and deal structure

04

Cite

Every extracted fact is linked to the filing and passage it came from — fully traceable
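
To make the Cluster step concrete, here is a minimal sketch of grouping filings by party pair and time window. It is an illustration under stated assumptions, not the project's actual implementation: the `Filing` fields, the `cluster_filings` function, and the 180-day default window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Filing:
    accession: str      # hypothetical identifier for the filing
    filer: str          # entity that submitted the filing
    counterparty: str   # other party named in the filing
    filed_on: date


def cluster_filings(filings, window_days=180):
    """Group filings that share an unordered party pair and fall within
    window_days of the previous filing in the same group.

    The 180-day default is an assumed value chosen for illustration.
    """
    clusters = []
    latest = {}  # party pair -> most recent cluster for that pair
    for f in sorted(filings, key=lambda f: f.filed_on):
        # frozenset makes (Acme, Beta) and (Beta, Acme) the same key
        key = frozenset((f.filer, f.counterparty))
        current = latest.get(key)
        gap_ok = (
            current is not None
            and f.filed_on - current[-1].filed_on <= timedelta(days=window_days)
        )
        if gap_ok:
            current.append(f)
        else:
            # First filing for this pair, or the gap is too long: new deal
            current = [f]
            clusters.append(current)
            latest[key] = current
    return clusters
```

Keying on an unordered pair means a filing by the target and a filing by the acquirer land in the same cluster, while a long gap between filings for the same pair starts a new cluster rather than extending an old one.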

What this proves

This project demonstrates that SEC filings contain enough signal to reconstruct deal-level data without proprietary databases or manual collection.

Filing-native discovery

Deals are found by clustering public filings — not by scraping news or relying on announcement databases. The filing is the primary source.

LLM-powered extraction

Large language models extract structured deal terms directly from filing text. No templates, no regex, no manual data entry.

Generalizable approach

The same pipeline architecture — discover, cluster, extract, cite — works for any SEC filing type, not just M&A.
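
One way to picture how an extracted fact stays tied to its source passage is to accept a fact only when its supporting quote appears verbatim in the filing text. The sketch below assumes that rule for illustration; the `CitedFact` record and the `cite` helper are hypothetical and do not reflect the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedFact:
    field: str       # e.g. "price_per_share" (illustrative field name)
    value: str       # extracted value, kept as text
    accession: str   # identifier of the source filing
    start: int       # character offset where the supporting quote begins
    end: int         # character offset just past the supporting quote


def cite(field, value, accession, filing_text, quote):
    """Return a CitedFact only if quote occurs verbatim in filing_text.

    A fact whose quote cannot be located is rejected (None) instead of
    being stored without a traceable source.
    """
    start = filing_text.find(quote)
    if start == -1:
        return None
    return CitedFact(field, value, accession, start, start + len(quote))
```

Storing character offsets rather than the quote alone lets a viewer highlight the exact passage in the original filing, and the verbatim check filters out model output that cannot be located in the document.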

See it in action

Browse the extracted dataset, explore deal timelines, or check advisor league tables.