AI Programming Battle Ignites as OpenAI and Anthropic Simultaneously Launch New Models

Deep News
Feb 06

OpenAI released GPT-5.3-Codex on Thursday, describing it as the most capable programming agent to date. Notably, the launch was timed to coincide precisely with Anthropic's release of its flagship model upgrade, Claude Opus 4.6. Industry observers view the simultaneous unveiling as the opening salvo in an "AI programming war"—a high-stakes battle for the enterprise software development market.

Minutes after the model's release, OpenAI CEO Sam Altman wrote on X:

"I love developing with this model; the feeling of progress far exceeds what benchmark scores show." "It's stunning to see how we used 5.3-Codex to build 5.3-Codex, accelerating our release pace so dramatically. This undoubtedly points to the future direction."

The model's involvement in its own construction is seen as a significant milestone in AI development. According to OpenAI's announcement, the Codex team used an early version of GPT-5.3-Codex to debug its own training process, manage deployment infrastructure, and diagnose test results and evaluations. OpenAI described it as "our first model that played a key role in its own creation process."

GPT-5.3-Codex posted a double-digit lead over Claude on Terminal-Bench 2.0, and OpenAI said the new model improved significantly across several industry benchmarks. It scored 57% on SWE-Bench Pro, a rigorous real-world software engineering evaluation that covers four programming languages and emphasizes resistance to data contamination and industry-relevant challenges.

The model scored 77.3% on Terminal-Bench 2.0, which measures essential terminal operation capabilities for programming agents, and 64% on OSWorld, a test requiring models to complete productivity tasks in a visual desktop environment, emphasizing "agentic" computer usage.

The Terminal-Bench 2.0 results are particularly striking. According to performance data released Thursday, GPT-5.3-Codex scored 77.3%, compared to 64.0% for GPT-5.2-Codex and 62.2% for the base GPT-5.2 model, a gain of about 13 percentage points in a single generation. A user on X noted that this result "completely crushes" Anthropic's Opus 4.6, which reportedly scored 65.4% on the same benchmark.
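The gaps quoted above are plain percentage-point differences. As a quick arithmetic check, the following Python sketch recomputes them from the scores reported in this article. The dictionary and function names are illustrative only and do not come from OpenAI or Anthropic.

```python
# Terminal-Bench 2.0 scores (percent) as quoted in this article.
scores = {
    "GPT-5.3-Codex": 77.3,
    "GPT-5.2-Codex": 64.0,
    "GPT-5.2": 62.2,
    "Claude Opus 4.6": 65.4,  # reported by an X user, per the article
}


def point_gap(model_a: str, model_b: str) -> float:
    """Percentage-point difference between two quoted scores."""
    return round(scores[model_a] - scores[model_b], 1)


# Gain over the previous Codex generation.
generation_gain = point_gap("GPT-5.3-Codex", "GPT-5.2-Codex")

# Lead over the reported Opus 4.6 score.
lead_over_opus = point_gap("GPT-5.3-Codex", "Claude Opus 4.6")

print(f"Gain over GPT-5.2-Codex: {generation_gain} points")
print(f"Lead over Claude Opus 4.6: {lead_over_opus} points")
```

On these figures the generational gain is 13.3 points and the lead over the reported Opus 4.6 score is 11.9 points, which matches the "13-percentage-point" and "double-digit" descriptions.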

OpenAI also highlighted that these achievements were realized with significantly improved efficiency: the new model uses less than half the tokens required by the previous generation for equivalent tasks, while inference speed per token increased by over 25%.

The announcement stated:

"It is noteworthy that GPT-5.3-Codex uses fewer tokens than any prior model, enabling users to accomplish more."

More significant than the benchmark gains is how OpenAI is positioning GPT-5.3-Codex. The company stated explicitly:

"Codex is evolving from an agent that can only write and review code into one that can perform almost any task developers and professionals do on a computer."

This expanded capability includes debugging, deployment, monitoring, writing product requirement documents, editing copy, conducting user research, creating presentations, and analyzing data in spreadsheet applications. The model performed exceptionally on GDPval, an evaluation OpenAI released in 2025 to assess a model's ability to complete well-defined knowledge work tasks across 44 professions.

This expansion signals that OpenAI's target extends beyond the developer tools market to the broader enterprise productivity software sector, where established players like Microsoft, Salesforce, and ServiceNow are rapidly embedding AI agents into their platforms.

The shift toward general computing capability also introduces new security considerations. OpenAI stated that GPT-5.3-Codex is the first model it has classified as "High capability" for cybersecurity-related tasks under its Preparedness Framework, and the first it has directly trained to identify software vulnerabilities.

OpenAI indicated: "While we have not found conclusive evidence that it can automate cyber attacks end-to-end, we adopted a cautious strategy, deploying the most comprehensive cybersecurity protection system to date." Measures include dual-use safety training, automated monitoring, trusted access mechanisms for advanced capabilities, and enforcement pipelines that incorporate threat intelligence.

The CEO also emphasized this progress on X:

"This is our first model to reach a 'high' cybersecurity capability level in the Preparedness Framework. We are piloting a trusted access framework and committing $10 million in API credits to accelerate cyber defense."

Additionally, OpenAI is expanding private testing of its security research agent Aardvark and collaborating with open-source maintainers to provide free code repository scans for widely used projects. As an example, OpenAI cited Next.js, where a security researcher used Codex last week to discover and disclose a vulnerability.

However, the cybersecurity announcements were soon overshadowed by the intensifying rivalry between OpenAI and Anthropic. The significance of Thursday's simultaneous release timing is difficult to appreciate without context.

Anthropic, an AI safety-focused startup founded in 2021 by several former OpenAI researchers, scheduled its major product launch for the same day at 10 AM Pacific Time. It released Claude Opus 4.6, describing it as the "smartest model" with "more cautious planning, longer-lasting agentic task execution, reliable operation in massive codebases, and the ability to find and correct its own errors."

This direct confrontation follows a week of escalating tensions. Anthropic announced it would air a Super Bowl ad mocking OpenAI's recent decision to test ads with free ChatGPT users.

In a lengthy X post, the CEO subsequently issued a rare direct response, calling the ads "amusing" but "clearly dishonest."

He wrote:

"We would obviously never run ads the way Anthropic's ad depicts. We're not stupid, and we know users would never accept that." "I suppose this fits Anthropic's typical 'double-speak' style—using a misleading ad to criticize a theoretical, non-existent misleading ad—but the Super Bowl isn't where I expected to see this."

He further characterized Anthropic as an "authoritarian company" that "wants to control how people use AI."

He added:

"Anthropic offers expensive products to the wealthy. More Texans use the free version of ChatGPT than the total number of Claude users in the U.S., so we're dealing with entirely different scales of problems."

Behind the public spat lies a serious business competition. This clash occurs against a backdrop of explosive growth in enterprise AI applications, with both companies vying for a rapidly expanding market.

According to a survey released this week by Andreessen Horowitz, enterprise spending on large language models has far exceeded even the most optimistic previous forecasts. In 2025, average enterprise LLM spending reached $7 million, a 180% increase from the $2.5 million actually spent in 2024 and 56% higher than enterprises' own predictions for 2025 made a year ago. Spending per enterprise is projected to reach $11.6 million in 2026, a further 65% increase.
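The growth rates in the survey figures above can be reproduced with a one-line percentage formula. The Python sketch below is only an arithmetic check on the rounded dollar amounts quoted in this article. The variable names are illustrative, and the "implied forecast" is a value derived here from the 56% figure, not a number the survey is reported to have published.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


# Average enterprise LLM spending in millions of USD, as quoted in the article.
spend_2024 = 2.5
spend_2025 = 7.0
spend_2026 = 11.6  # projected

growth_2025 = pct_increase(spend_2024, spend_2025)
growth_2026 = pct_increase(spend_2025, spend_2026)

# Derived assumption: if actual 2025 spending was 56% above what enterprises
# had predicted, the prediction was roughly actual / 1.56.
implied_forecast_2025 = spend_2025 / 1.56

print(f"2024 to 2025 growth: {growth_2025:.1f}%")
print(f"2025 to 2026 projected growth: {growth_2026:.1f}%")
print(f"Implied 2025 forecast: ${implied_forecast_2025:.2f} million")
```

With these rounded inputs the first figure comes out at 180% and the second at about 65.7%. The small difference from the quoted 65% is likely due to rounding in the published dollar amounts.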

The data also reveals shifting market dynamics. OpenAI still holds the largest share of enterprise AI spending, but its share is shrinking—from 62% in 2024 to a projected 53% in 2026. Meanwhile, Anthropic's share grew from 14% to a projected 18%, with Google showing similar growth trends.

Usage patterns are more nuanced. While OpenAI leads in overall usage, only 46% of surveyed OpenAI customers use its most powerful model in production environments, compared to 75% for Anthropic and 76% for Google. Including testing environments, 89% of Anthropic's customers are testing or using its top model, the highest rate among major providers.

In software development—the core application scenario for both companies' programming agents—the survey indicates OpenAI holds about 35% market share, while Anthropic commands a substantial and growing portion of the remaining market.

OpenAI stated that GPT-5.3-Codex is available immediately to paying ChatGPT users across all Codex use cases, including desktop apps, command-line interfaces, IDE extensions, and web access, with API access expected to follow.

The model also introduces a new interactive feature: users can choose between a "pragmatic" or "friendly" personality. The CEO noted strong user preference for this option. More substantively, the model frequently provides progress updates during task execution, allowing users to interact in real time, ask questions, discuss ideas, and steer solutions without losing context.

OpenAI stated: "You no longer need to wait for a final outcome; you can interact in real time. GPT-5.3-Codex explains what it's doing, responds to feedback, and keeps you informed throughout."

The company promised more capabilities in the coming weeks. The CEO stated bluntly: "I believe Codex will win."

Responding to Anthropic, he framed the competition with a philosophical remark:

"This era belongs to builders, not to those who want to control them."

