AI Intelligencer-Why AI's math gold wins matter

Reuters
Jul 24, 2025

By Krystal Hu

July 23 (Reuters) - (Artificial Intelligencer is published every Wednesday. Think your friend or colleague should know about us? Forward this newsletter to them. They can also subscribe here.)

At the Reuters Momentum AI conference in Silicon Valley last week, I heard two phrases over and over from Fortune 500 executives: "human in the loop" and "flat is the new up."

They reflect a cautious but ambitious strategy: while nearly every company still keeps humans working alongside AI, the early results already show companies growing revenue without hiring more people.

What’s changed? The nature of work within organizations. The first cuts are already hitting outsourced labor. Employees are shifting to higher-value work, such as handling complicated tasks and reviewing AI’s output. Revenue per head is on the rise, or as some say, “flat headcount is the new up.”

Despite the narrative that 2025 will be the year of the AI agent, truly agentic workflows still seem distant for complex use cases. In fact, some executives still view AI models as just pattern matchers, not true reasoners.

Researchers at Google and OpenAI would beg to differ, as I learned after speaking with them following both labs’ gold medal wins at this year’s International Mathematical Olympiad. I believe this is an exciting milestone for the reasoning paradigm that AI models are striving to advance. Scroll down to read why this matters.

Email me at krystal.hu@tr.com or follow me on LinkedIn to share any feedback, and what you want to read about next in AI. 

OUR LATEST REPORTING IN TECH & AI

Exclusive-Blackstone drops out of group bid for TikTok US

White House to unveil plan to push US AI abroad, crack down on US AI rules

Trump administration seeks pathway for US companies to export AI chips

Nvidia CEO's China charm offensive underscores rock star status in key market

AI models with systemic risks given pointers on how to comply with EU AI rules

TSMC posts record quarterly profit on AI demand, but wary about tariffs

HOW AI WON MATH GOLD

AI crossed a threshold that even caught the best researchers by surprise. For the first time, an AI from Google DeepMind won a gold medal at the International Mathematical Olympiad, the world’s most elite high school math competition.

OpenAI, which did not officially participate in this year’s IMO, said its model also achieved gold-medal performance, based on solutions graded by external experts using IMO guidelines.

While it’s tempting to see this as just another headline in AI’s relentless march, I spent time speaking with the minds behind these models—some of whom are former IMO medalists themselves—to understand how we got here and what these wins reveal about the frontier of AI.

The main takeaway? The reasoning abilities demonstrated by models like DeepMind’s Gemini Pro and OpenAI’s o1 series have endless possibilities. This win is also a testament to the classic recipe for model improvement: high-quality data and huge amounts of compute.

While neither lab revealed the full details of their methods, both demonstrated the power of thinking for longer. Since last year, top AI labs have shifted focus from scaling up pre-training and increasing model sizes to using test-time compute to give models more “thinking time”.

OpenAI described how its model tackled each problem dozens of times simultaneously, using consensus and multi-agent strategies to aggregate the best solutions. DeepMind, meanwhile, employed its “Deep Think” technique, enabling Gemini to explore many solution paths at the same time, synthesize ideas, and generate rigorous, human-readable proofs.
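Neither lab has published its exact aggregation recipe, but the general pattern of sampling many independent attempts and voting on the result is a well-known test-time technique, often called self-consistency. The sketch below is purely illustrative: generate_candidate is a hypothetical placeholder for a model call, and the sample counts and answer strings are assumptions, not either lab's method.

```python
# Minimal, illustrative sketch of test-time compute via parallel sampling plus
# consensus voting. `generate_candidate` is a hypothetical stand-in for one
# independent model attempt; a real system would call a reasoning model here.
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def generate_candidate(problem: str, seed: int) -> str:
    """Hypothetical placeholder: return one candidate final answer for the problem."""
    rng = random.Random(seed)
    return rng.choice(["answer_A", "answer_A", "answer_B"])  # dummy answers for the sketch


def solve_with_consensus(problem: str, n_samples: int = 32) -> str:
    """Run many attempts in parallel, then keep the answer most attempts agree on."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        candidates = list(pool.map(lambda s: generate_candidate(problem, s), range(n_samples)))
    answer, votes = Counter(candidates).most_common(1)[0]
    print(f"{votes}/{n_samples} attempts agreed on {answer!r}")
    return answer


if __name__ == "__main__":
    solve_with_consensus("Toy IMO-style problem statement goes here")
```

The appeal of this design is that extra compute buys accuracy without retraining: raising n_samples simply gives the vote more attempts to draw on, which is one reason "thinking for longer" translates directly into bigger compute bills.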

In what researchers dubbed a “paradigm shift,” DeepMind’s AI has gone from needing expert human translation just a year ago to solving five of six IMO problems in natural language this week.

This breakthrough directly challenges the long-held skepticism that AI models are just clever mimics, predicting the next word. Math, requiring multi-step, creative proofs, has become the ultimate test of true reasoning, and AI just passed.

We don’t know exactly how much parallel computation went into solving each question, but OpenAI told us it was “very expensive.” After all, the models were given about 4.5 hours—just like human contestants—to work through each set.

This highlights how today’s most intelligent models demand vast compute resources, helping explain AI labs’ insatiable appetite for chips like Nvidia’s GPUs. And as these methods expand into other domains, such as coding, science and creative writing, the computational demands will continue to grow.
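OpenAI did not put a figure on “very expensive,” so the back-of-the-envelope calculation below is purely illustrative; every number in it is an assumption chosen only to show how the costs multiply, not a disclosed detail.

```python
# Back-of-the-envelope only: all figures below are assumptions, not disclosed numbers.
HOURS_PER_PAPER = 4.5      # wall-clock limit per IMO paper, same as human contestants
PROBLEMS_PER_PAPER = 3     # each IMO paper contains three problems
PARALLEL_ATTEMPTS = 48     # "dozens" of simultaneous attempts per problem (assumed)
GPUS_PER_ATTEMPT = 8       # accelerators serving one attempt (assumed)

# If the problems are worked sequentially, each gets roughly a third of the window.
per_problem_hours = HOURS_PER_PAPER / PROBLEMS_PER_PAPER
gpu_hours = per_problem_hours * PROBLEMS_PER_PAPER * PARALLEL_ATTEMPTS * GPUS_PER_ATTEMPT
print(f"~{gpu_hours:,.0f} GPU-hours per paper under these assumptions")
```

Under these assumed figures the answer comes out around 1,700 GPU-hours for a single exam paper, and the point of the exercise is only that each extra attempt or larger model multiplies the bill.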

Both labs also credit their breakthroughs to high-quality data: step-by-step, annotated proofs, not just final answers. DeepMind, in particular, pointed to new reinforcement learning techniques that reward not just correctness, but the elegance and clarity of a proof.
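DeepMind has not detailed that reward design, so the snippet below is only a sketch of the general idea: combine a hard correctness signal with a softer quality score. The weights, the clarity heuristic and the checker interface are all assumptions for illustration, not the lab’s actual method.

```python
# Illustrative sketch of a composite proof reward for RL fine-tuning.
# The weights and the clarity heuristic are assumptions, not DeepMind's method.
from typing import Callable


def correctness_score(proof: str, checker: Callable[[str], bool]) -> float:
    """1.0 if the supplied (hypothetical) proof checker accepts the proof, else 0.0."""
    return 1.0 if checker(proof) else 0.0


def clarity_score(proof: str) -> float:
    """Crude readability proxy: reward proofs whose steps state their justification."""
    steps = [line for line in proof.splitlines() if line.strip()]
    if not steps:
        return 0.0
    justified = sum(1 for line in steps if "because" in line.lower() or " by " in line.lower())
    return justified / len(steps)


def proof_reward(proof: str, checker: Callable[[str], bool],
                 w_correct: float = 0.8, w_clarity: float = 0.2) -> float:
    """Scalar reward: mostly correctness, with a bonus for clear, well-justified steps."""
    return w_correct * correctness_score(proof, checker) + w_clarity * clarity_score(proof)


if __name__ == "__main__":
    toy_proof = "The sum is even because each term is even.\nHence the claim holds by induction."
    print(proof_reward(toy_proof, checker=lambda p: True))
```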

So what does this mean for the future? The “can AI reason?” debate may be settled, at least for domains as challenging as Olympiad mathematics. The emergence of genuine reasoning capabilities inside AI models has the potential to transform many domains as researchers crack the code on math and move on to new frontiers.

DeepMind is already working to put its system in the hands of mathematicians and, soon, the wider public. OpenAI says it’s using what it’s learned from this model to train others, but this particular capability won’t be included in the upcoming GPT-5 release this summer.

CHART OF THE WEEK

You’re probably reading this AI newsletter because you’re already an AI user, which puts you among the 61% of Americans who have welcomed AI into their lives. The rest, a solid 39%, remain unconvinced, according to a report from Menlo Ventures.

The top blocker? Good old-fashioned human connection. About 80% of non-adopters say they’d rather deal with a person than a machine, especially for important decisions. In fact, 53% say they want accountability and oversight from another human, not just a digital assistant that always gives instant responses.

Other top hurdles include data privacy worries (71%), skepticism about AI’s usefulness (63%), and a healthy distrust of the information AI serves up (58%). So, while the bots may be ready, the humans are holding out for more trust, transparency, and—let’s face it—a bit more humanity.

Reasons consumers are not using AI https://www.reuters.com/graphics/AI-INTELLIGENCER/mopadjaonva/chart.png

(Reporting by Krystal Hu; Editing by Lisa Shumaker)

((krystal.hu@thomsonreuters.com, +1 917-691-1815))
