American scientist Gary Marcus on Oracle founder Larry Ellison pointing to AI models' 'data training problem'; says: I had warned them 2 years ago that...

American scientist Gary Marcus isn't letting Larry Ellison have his moment. When the Oracle founder recently argued that AI models from OpenAI, Google, Meta, and xAI are all turning into commodities because they train on the same public internet data, Marcus pounced. His message, posted on X, was blunt: he called this exact outcome two years ago, and Silicon Valley simply wasn't listening. "i warned these guys of exactly this problem - no moat because everyone is training on same data - two years ago," the NYU Professor Emeritus wrote. He added that the industry's refusal to hear out anyone beyond its own bubble is "gonna cost them a truly enormous amount of money."

Larry Ellison thinks the next AI goldmine is locked inside private data

Ellison's argument came during Oracle's fiscal Q2 2026 earnings call in December. The CTO laid out a simple thesis: every major large language model is trained on the same scraped internet data, so they all end up roughly the same. "They're all basically the same. And that's why they're becoming commoditized so quickly," he said. His pitch was that the next big wave won't be about building better base models. It'll be about feeding AI proprietary enterprise data while keeping it locked down and secure.

Oracle is putting real money behind that bet. The company is projecting around $50 billion in capital expenditures for the year, up from a $35 billion estimate just three months earlier. Its argument is that most high-value private data already sits inside Oracle databases, and that its AI Data Platform uses techniques like Retrieval-Augmented Generation to let models pull from that data in real time without security trade-offs.

Marcus called the price wars and the 'no moat' problem back in 2024

For Marcus, this is the vindication he's been waiting for. Back around March 2024, he predicted a crowded field of GPT-4-level models, brutal price wars, no real moat, and stubborn reliability problems that scaling alone wouldn't fix. Much of that played out. Claude, Gemini, Llama, Grok, and open-source challengers like DeepSeek now cluster tightly on benchmarks, while API prices have collapsed and profits stay elusive for everyone except chipmakers like Nvidia.

Marcus has long pushed alternatives to brute-force scaling, favouring neurosymbolic systems that blend pattern recognition with logic and structured reasoning. He's also been vocal about what he sees as reckless capital spending, once branding the AI capex spree one of the biggest misallocations of money in history.

He's not fully writing off Ellison's idea, though. Marcus concedes proprietary data helps "in some narrow use cases," just not as the broad rescue the industry hopes for. Critics counter that real moats may lie elsewhere anyway, in distribution, brand, safety, or the user-interaction data flowing through consumer apps rather than static enterprise records.