The Hidden Blueprint: How Kai Wu Mined Buffett’s “Unknown Knowns” at the NY Public Library (and Why AI Couldn’t)


Unknown knowns - AI's kryptonite

Warren Buffett’s legendary 50+ year track record at Berkshire Hathaway is often seen as unrepeatable – a product of his unique genius, intuition, and once-in-a-lifetime access to market inefficiencies.

But what if the core of his success wasn’t magic, but a system?

That’s the groundbreaking question Dr. Kai Wu, Founder & CIO of Sparkline Capital, set out to answer. His tool? Not complex algorithms or big data, but something far more analog: decades of dusty microfilm and physical archives at the New York Public Library.

Wu’s research unearthed what are called unknown knowns – critical, publicly available information about Buffett’s investments that existed physically but was never digitized, effectively hiding it from modern quantitative analysis and AI. By painstakingly reconstructing Buffett’s historical portfolio decisions from primary sources, Wu identified key repeatable factors that powered the Oracle’s success.

Here’s what he found and why it matters:

  1. The “Unknown Knowns”: Buried Treasure in Plain Sight
  2. Wu’s Findings: The Repeatable Buffett Factors
  3. Why AI Couldn’t Have Made This Discovery (Yet): The Data Chasm
  4. The Lesson: Data Isn’t Dead, It’s Just Sleeping (in Libraries)

Watch the short video: https://youtu.be/q8ccaUWwlXk

The “Unknown Knowns”: Buried Treasure in Plain Sight

  • What They Were: Physical copies of:
    • Berkshire Hathaway Annual Reports (Pre-Internet): Early editions contained granular details later omitted or summarized online.
    • Local Newspapers & Trade Journals: Where Buffett first found ideas (e.g., The Buffalo News for his Buffalo Evening News investment).
    • SEC Filings (Microfiche): Original filings revealing precise timing, prices, and rationale often lost in digital databases.
    • Specialty Investment Publications (Physical Copies): Like Moody’s Manuals – Buffett’s go-to source for deep financials.
  • Why They Were “Unknown”: This data existed but was trapped in physical form. Digitization efforts skipped vast troves of niche publications, local papers, and early report nuances. Search engines and AI couldn’t access it. It was public, yet practically invisible to modern tech.

Wu’s Findings: The Repeatable Buffett Factors

By analyzing this reconstructed history, Wu identified specific, quantifiable characteristics Buffett consistently favored before his investments became famous (and expensive):

  1. High Profitability (ROE/ROA): Buffett consistently targeted companies with sustained high returns on equity and assets, far above industry averages. This signaled durable competitive advantages before they were widely recognized.
  2. Conservative Financing (Low Debt/High Earnings Quality): A strong preference for companies with low financial leverage, high earnings retention, and “clean” accounting (low accruals). This minimized risk and ensured cash flow reliability.
  3. Value (But Not Deep Value): While Buffett is labeled a value investor, Wu found he rarely bought the absolute cheapest (deep value) stocks. Instead, he bought high-quality companies trading at reasonable prices relative to their robust earnings and assets.
  4. Shareholder Alignment (High Insider Ownership/Buybacks): Buffett favored companies where management had significant skin in the game (high insider ownership) and returned capital rationally (consistent buybacks when shares were undervalued). This aligned incentives and boosted per-share value.
  5. Predictability & Stability: Investments exhibited stable earnings, low volatility in returns, and strong market positions within predictable industries. This allowed for confident long-term holding.

Crucially, Wu demonstrated that a portfolio systematically constructed using these specific factors (high profitability, low leverage, reasonable price, alignment, stability) during Buffett’s era would have closely replicated Berkshire’s actual returns. This strongly suggests Buffett’s success stemmed from the disciplined application of a repeatable quality-value framework, not just stock-picking genius.
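
To make the idea of a systematic screen concrete, here is a minimal Python sketch of how the five factors above could be combined into a simple pass/fail filter. Everything specific in it is an assumption made for illustration: the `Company` fields, the cutoff values, and the three fictional companies do not come from Wu’s research, and a real study would more likely rank stocks on each factor than apply hard cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str
    roe: float                  # return on equity (0.20 means 20%)
    debt_to_equity: float       # total debt / shareholder equity
    accruals_ratio: float       # (net income - operating cash flow) / total assets
    earnings_yield: float       # earnings per share / share price
    insider_ownership: float    # fraction of shares held by insiders
    earnings_volatility: float  # std. dev. of year-over-year earnings growth


# Hypothetical cutoffs chosen only for illustration; not taken from Wu's research.
MIN_ROE = 0.15
MAX_DEBT_TO_EQUITY = 0.5
MAX_ACCRUALS_RATIO = 0.05
MIN_EARNINGS_YIELD = 0.06   # "reasonable price", not "cheapest available"
MIN_INSIDER_OWNERSHIP = 0.10
MAX_EARNINGS_VOLATILITY = 0.20


def passes_screen(c: Company) -> bool:
    """Return True only if the company clears all six illustrative cutoffs."""
    return (
        c.roe >= MIN_ROE
        and c.debt_to_equity <= MAX_DEBT_TO_EQUITY
        and c.accruals_ratio <= MAX_ACCRUALS_RATIO
        and c.earnings_yield >= MIN_EARNINGS_YIELD
        and c.insider_ownership >= MIN_INSIDER_OWNERSHIP
        and c.earnings_volatility <= MAX_EARNINGS_VOLATILITY
    )


def screen(universe):
    """Names of the companies that pass, in their original order."""
    return [c.name for c in universe if passes_screen(c)]


# Fictional companies, invented for this example.
universe = [
    Company("QualityCo", 0.22, 0.2, 0.01, 0.08, 0.25, 0.10),
    Company("CheapButLevered", 0.06, 2.5, 0.09, 0.20, 0.02, 0.45),
    Company("PriceyGrower", 0.30, 0.1, 0.02, 0.02, 0.15, 0.12),
]
selected = screen(universe)
print(selected)
```

Note that the statistically cheapest name (CheapButLevered, with the highest earnings yield) is rejected on quality grounds, while the highest-quality name (PriceyGrower) is rejected on price. That mirrors the “value, but not deep value” point in the list above.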

Why AI Couldn’t Have Made This Discovery (Yet): The Data Chasm

Wu’s breakthrough relied entirely on accessing the “unknown knowns” – the physical data graveyard. Here’s why AI, as it exists today, would have failed:

  1. The Digitization Gap: AI requires digital data. The critical local newspapers, early Moody’s manuals, nuanced original annual reports, and microfiche filings Wu used simply aren’t available in machine-readable formats. AI has no access to this foundational layer.
  2. Contextual Understanding: Buffett’s rationale was often buried in subtle phrasing in local news reports or footnotes in physical reports. AI struggles with extracting nuanced meaning, sarcasm, or local context from scanned text or microfilm images, especially with varying print quality.
  3. Connecting Disparate Physical Sources: Wu’s genius was cross-referencing physical sources (e.g., finding a mention of Buffett buying a local company in a small-town paper, then locating the specific SEC filing on microfiche). AI lacks the ability to navigate disparate physical archives intuitively.
  4. Handling “Messy” Physical Data: Microfilm degrades, newspapers tear, handwritten notes exist. AI models trained on clean digital data struggle immensely with the noise, damage, and non-standard formats inherent in decades-old physical archives.
  5. Identifying “What Was Known When”: To truly replicate Buffett’s decision point, you need to know exactly what information he had access to at the time. Only physical archives preserve the exact state of public knowledge on a specific date in the pre-digital era. Digital databases often consolidate or backfill information.
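
The “what was known when” problem in the last point can be illustrated with a small point-in-time lookup. The Python sketch below is an assumption-laden toy, not a description of Wu’s method: the `Observation` record, the `known_as_of` helper, and the ExampleCo figures are all hypothetical. The idea is simply to store each figure with the date it became public and to answer queries using only what had been published by the decision date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    company: str
    fiscal_year: int
    eps: float
    published_on: date  # date this figure became publicly available


def known_as_of(observations, company, fiscal_year, as_of):
    """Latest figure published on or before `as_of`, or None if none existed yet."""
    candidates = [
        o for o in observations
        if o.company == company
        and o.fiscal_year == fiscal_year
        and o.published_on <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.published_on).eps


# Fictional data: an originally reported figure and a later restatement.
observations = [
    Observation("ExampleCo", 1975, 2.10, date(1976, 3, 15)),  # original report
    Observation("ExampleCo", 1975, 1.85, date(1977, 3, 20)),  # restated a year later
]

at_decision = known_as_of(observations, "ExampleCo", 1975, date(1976, 6, 1))
in_hindsight = known_as_of(observations, "ExampleCo", 1975, date(1980, 1, 1))
before_report = known_as_of(observations, "ExampleCo", 1975, date(1976, 1, 1))
print(at_decision, in_hindsight, before_report)
```

A database that keeps only the restated value would report 1.85 for every query date, which is the kind of backfilling that can distort a reconstruction of what an investor actually saw.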

The Lesson: Data Isn’t Dead, It’s Just Sleeping (in Libraries)

Kai Wu’s work is a masterclass in fundamental research rigor. It proves that:

  • Buffett’s edge was systematic: It wasn’t magic; it was the disciplined application of identifiable quality-value factors.
  • “Unknown Knowns” are a major market inefficiency: Vast amounts of valuable historical data remain trapped in physical form, inaccessible to the algorithms dominating modern markets.
  • Human ingenuity still has the edge (for now): When it comes to unearthing deeply buried, context-rich, unstructured historical data, a determined researcher with a library card can still outflank the most powerful AI. The NYPL was Wu’s “moat” in this analysis.

The Future: Wu’s research doesn’t diminish Buffett; it demystifies him and shows his core strategy can be learned and systematized. It also highlights a massive opportunity: the digitization and structuring of these “unknown knowns” could be the next frontier for quantitative investing. Until then, the ghosts of market wisdom past still whisper from the pages stored in libraries, waiting for the next curious mind to listen.

Inspired by: The research of Kai Wu, particularly his paper “Applying the Buffett System,” and related ideas explored in his Sparkline Capital writings.
