OpenAI Unveils GPT-5.4, Its Most Advanced Professional Model with Native Computer Control

Deep News
Yesterday

OpenAI has launched its new flagship foundation model, GPT-5.4, which is now available in ChatGPT, its API, and the Codex developer tools. The announcement comes just one day after the release of the faster GPT-5.3 Instant model. OpenAI describes GPT-5.4 as its "most capable and efficient professional frontier model to date," specifically targeting enterprise office and complex knowledge work scenarios.

The most significant enhancement in GPT-5.4 is to its AI agent capabilities. Through the API and Codex, GPT-5.4 introduces native "computer control" functionality for the first time, enabling agents to execute complex workflows across different software applications. Beyond generating text and code, the model can now directly operate computer software, browse the web, and control a mouse and keyboard to complete tasks. It also integrates deeply with enterprise spreadsheet and financial analysis tools, including Microsoft Excel and Google Sheets.

Within ChatGPT, GPT-5.4 supports a "think-ahead" feature that allows users to adjust the task direction while the model is generating a response. It also shows improvements in deep web search and maintaining context over long, logical conversations.

Industry observers believe these upgrades signal a shift for AI models from conversational tools toward automated digital agents that reach further into enterprise productivity software and professional knowledge work.

OpenAI released two versions simultaneously: GPT-5.4 Thinking, which excels at complex reasoning, and the high-performance GPT-5.4 Pro, catering to paying users and high-end enterprise clients, respectively.

In the OSWorld-Verified benchmark for computer control, GPT-5.4 achieved a success rate of 75.0%, surpassing the human baseline of 72.4% and marking a substantial jump from the 47.3% rate of its predecessor, GPT-5.2. Results released alongside the financial services suite showed GPT-5.4's score on an internal OpenAI investment banking benchmark rising from GPT-5's 43.7% to 88.0%.

Early testers provided positive feedback. Daniel Swiecki, Head of AI Solutions at investment firm Walleye Capital, reported a 30-percentage-point increase in accuracy on internal finance and Excel evaluations with GPT-5.4. Brendan Foody, CEO of AI talent platform Mercor, called it the "best model we have tried to date," noting it ranked first on their APEX-Agents benchmark for professional services work.

**Native Computer Control Breaks New Ground**

The most groundbreaking capability of GPT-5.4 is its native computer control, a first for a general-purpose model. Via the API and Codex, the model can operate a computer like a human, performing multi-step workflows across applications. It can control a computer by writing code using libraries like Playwright or by directly responding to screenshots with mouse and keyboard commands. Developers can also configure custom confirmation strategies for different risk tolerance scenarios.
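The custom confirmation strategies mentioned above can be pictured as a policy check that sits between an action the model proposes and its execution. The following is a minimal sketch of that idea only, not OpenAI's API: the `Action` type, the `Risk` levels, and the `needs_confirmation` and `run` helpers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1      # e.g. scrolling or reading a page
    MEDIUM = 2   # e.g. typing into a form
    HIGH = 3     # e.g. submitting a payment or deleting a file


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action name such as "click" or "type"
    target: str  # what the action operates on
    risk: Risk   # risk level assigned by the developer's own rules


def needs_confirmation(action: Action, tolerance: Risk) -> bool:
    """Require a human check for any action riskier than the tolerance."""
    return action.risk > tolerance


def run(actions, tolerance, confirm):
    """Run actions in order, skipping risky ones that the human declines."""
    executed = []
    for action in actions:
        if needs_confirmation(action, tolerance) and not confirm(action):
            continue  # declined: this step is not performed
        executed.append(action.kind)  # stand-in for real mouse/keyboard I/O
    return executed
```

With a tolerance of `Risk.MEDIUM`, a plan of scroll, type, and submit-payment steps would pause only on the payment step; raising the tolerance to `Risk.HIGH` would let the whole plan run unattended.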

Benchmark data substantiates this progress: in the WebArena-Verified browser control test, GPT-5.4 scored 67.3%, up from GPT-5.2's 65.4%; in the Online-Mind2Web test using only screenshots, it achieved a 92.8% success rate. For web search capability, the BrowseComp test showed a 17-percentage-point improvement over GPT-5.2, with GPT-5.4 Pro setting a new high of 89.3%.

Dod Fraser, CEO of real estate tech company Mainstay, reported that in tests involving approximately 30,000 property tax portals, GPT-5.4 achieved a 95% success rate on the first attempt and 100% within three tries. This is a significant improvement over previous computer control models (which had success rates around 73-79%), with tasks completed about three times faster and token consumption reduced by approximately 70%.

**Redesigned Tool Search Mechanism Cuts Token Use**

As the tool ecosystem grows, efficiently managing tool calls has become a bottleneck. GPT-5.4 introduces a "Tool Search" mechanism in the API that changes how tool definitions are handled. Previously, the entire definition of every available tool had to be pre-loaded into the prompt for each request, which often added thousands or tens of thousands of tokens, raised cost and latency, and diluted the context. The new mechanism provides the model with a lightweight list of tools and retrieves the full definition of a tool only when it is actually needed.
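The difference between pre-loading every definition and fetching one on demand can be illustrated with a small sketch. This is not the actual Tool Search API; the registry contents, the function names, and the keyword matching are assumptions made for illustration, and serialized character count is used as a rough stand-in for tokens.

```python
import json

# Hypothetical tool registry: full definitions keyed by tool name.
FULL_DEFINITIONS = {
    "create_invoice": {
        "description": "Create an invoice for a customer.",
        "parameters": {"customer_id": "string", "amount_cents": "integer",
                       "currency": "string", "due_date": "string"},
    },
    "send_email": {
        "description": "Send an email to a recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "lookup_weather": {
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string", "units": "string"},
    },
}


def lightweight_index(defs):
    """What is sent up front: each tool's name and one-line description."""
    return [{"name": n, "description": d["description"]} for n, d in defs.items()]


def search_tools(defs, query):
    """Naive keyword match, standing in for whatever retrieval is really used."""
    words = query.lower().split()
    return [n for n, d in defs.items()
            if any(w in f"{n} {d['description']}".lower() for w in words)]


def load_definition(defs, name):
    """Fetch the full definition only once the model has picked a tool."""
    return {"name": name, **defs[name]}


def size(obj):
    """Rough size proxy: characters of serialized JSON, not real tokens."""
    return len(json.dumps(obj))
```

In this toy setup, `size(lightweight_index(FULL_DEFINITIONS))` is smaller than the size of all full definitions combined, and the gap widens as parameter schemas grow, which is the effect the mechanism relies on.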

OpenAI provided data showing that on a set of 250 tasks using Scale's MCP Atlas benchmark with 36 MCP servers enabled, the Tool Search mode reduced total token usage by 47% while maintaining the same accuracy level compared to exposing all MCP functions directly in the context.

Wade Foster, CEO of Zapier, stated that GPT-5.4 performed excellently across hundreds of advanced, real-world workflow benchmarks at the company, calling it "the most persistent model to date."

**Finance and Enterprise: Deep Excel Integration, Investment Banking Scores Double**

Released alongside GPT-5.4 is an "OpenAI Financial Services" suite for businesses and financial institutions. The core product is ChatGPT for Excel and Google Sheets (beta), which embeds ChatGPT directly into spreadsheet cells to support building, analyzing, and updating complex financial models. The suite also integrates data partners like FactSet, MSCI, Third Bridge, and Moody's, and introduces reusable "Skills" for frequent financial tasks such as earnings previews, comparable company analysis, DCF valuation, and investment memo writing.

On the internal investment banking benchmark, GPT-5.4 Thinking's score jumped to 88.0% from GPT-5's 43.7%. In a test simulating a junior investment banking analyst's spreadsheet modeling tasks, GPT-5.4 scored an average of 87.3%, far exceeding GPT-5.2's 68.4%.

Niko Grupen, Head of Applied Research at legal AI platform Harvey, reported that GPT-5.4 scored 91% on their BigLaw Bench evaluation, stating it "currently outperforms other models in structured complex deal analysis, maintaining accuracy across long documents, and providing the high level of detail required by legal practitioners."

**Knowledge Work and Hallucination Reduction: Matching Professionals**

OpenAI demonstrated GPT-5.4's capabilities on benchmarks measuring real-world workplace output. On the GDPval test, which covers knowledge work tasks across 44 professions, GPT-5.4 met or exceeded the level of industry professionals in 83.0% of comparisons, up from 71.0% for GPT-5.2.

In presentation quality assessments, human reviewers preferred GPT-5.4's output 68.0% of the time, citing superior visual aesthetics, greater visual diversity, and more effective use of generated images.

Regarding hallucination and factual error control, OpenAI stated GPT-5.4 is its "most factually accurate model to date." On a de-identified test set of prompts previously flagged for factual errors, the per-statement error rate was reduced by 33% compared to GPT-5.2, and the probability of any error appearing in a full response decreased by 18%.

In programming, GPT-5.4 performed on par with or better than GPT-5.3-Codex on SWE-Bench Pro, with lower latency across reasoning effort settings. Codex's /fast mode can increase token generation speed for GPT-5.4 by up to 1.5x. Mario Rodriguez, Chief Product Officer at GitHub, highlighted GPT-5.4's strength in logical reasoning and executing complex, multi-step, tool-dependent workflows, calling it "a model enterprises should adopt on day one."

**Two Versions for Different Needs, Context Window Up to 1 Million Tokens**

GPT-5.4 Thinking is designed for general professional scenarios requiring deep reasoning, while GPT-5.4 Pro targets the most complex tasks demanding peak performance.

In ChatGPT, GPT-5.4 Thinking is available starting now for Plus, Team, and Pro users, replacing GPT-5.2 Thinking, which will be retired on June 5, 2026. GPT-5.4 Pro is exclusive to Pro and Enterprise plan users. Free users may have limited access via automatic system routing. Enterprise and Education plan administrators can enable early access.

In the API, GPT-5.4 is available under the `gpt-5.4` identifier, and GPT-5.4 Pro under `gpt-5.4-pro`. Both are accessible in the Codex platform. The maximum API output remains 128,000 tokens. The API and Codex now support a context window of up to 1 million tokens, the largest offered by OpenAI, suitable for planning, executing, and verifying long-chain, multi-step tasks.

**Pricing Higher Than Predecessor, Efficiency Gains Partly Offset Cost**

API pricing for GPT-5.4 is higher than for GPT-5.2. Specifically:

- GPT-5.4: Input: $2.50 per million tokens; Output: $15.00 per million tokens (GPT-5.2 was Input: $1.75; Output: $14.00)
- GPT-5.4 Pro: Input: $30.00 per million tokens; Output: $180.00 per million tokens (GPT-5.2 Pro was Input: $21.00; Output: $168.00)

Batch and Flex pricing enjoys a 50% discount, while Priority processing costs twice the standard rate.
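To make the price difference concrete, the quoted rates can be plugged into a small cost estimator. This is a back-of-the-envelope sketch using only the figures above; the dictionary keys for the GPT-5.2 models are labels chosen here for illustration, not confirmed API identifiers, and the function ignores any other billing rules.

```python
# USD per million tokens as (input, output), taken from the figures quoted above.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
    "gpt-5.2": (1.75, 14.00),        # label assumed for illustration
    "gpt-5.2-pro": (21.00, 168.00),  # label assumed for illustration
}

# Batch and Flex are half price; Priority is double the standard rate.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def request_cost(model, input_tokens, output_tokens, tier="standard"):
    """Estimated USD cost of one request at the quoted per-token rates."""
    in_rate, out_rate = PRICES[model]
    base = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return base * TIER_MULTIPLIER[tier]
```

For example, a request with 100,000 input tokens and 10,000 output tokens comes to about $0.40 on GPT-5.4 at the standard tier, versus about $0.315 at GPT-5.2's rates.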

Notably, input exceeding 272,000 tokens in a single request is billed at twice the standard rate. In Codex, the default compression limit is 272,000 tokens, but developers can manually increase this limit, with the higher billing rate applying only to the portion exceeding the limit.
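The long-input surcharge can be sketched as a two-band calculation. The wording above is ambiguous, so this example assumes the marginal reading, in which only the input tokens beyond 272,000 are billed at twice the rate; under the other reading, the doubled rate would apply to the whole request. The function name and parameters are hypothetical.

```python
LONG_INPUT_THRESHOLD = 272_000  # tokens, as stated above
SURCHARGE_MULTIPLIER = 2.0      # "twice the standard rate"


def input_cost(input_tokens, rate_per_million):
    """Input cost in USD, assuming only tokens past the threshold cost 2x."""
    base_tokens = min(input_tokens, LONG_INPUT_THRESHOLD)
    extra_tokens = max(input_tokens - LONG_INPUT_THRESHOLD, 0)
    weighted_tokens = base_tokens + SURCHARGE_MULTIPLIER * extra_tokens
    return weighted_tokens * rate_per_million / 1_000_000
```

Under this assumption, a 500,000-token input at a $2.50 rate would cost about $1.82 rather than the $1.25 it would cost without the surcharge.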

OpenAI justified the higher pricing with three points: 1) superior capabilities on complex tasks like programming, computer control, deep research, advanced document generation, and tool use; 2) significant technological advances from its research roadmap; and 3) a more efficient inference mechanism that consumes fewer reasoning tokens for the same task, partially offsetting the per-token price increase. OpenAI also stated that even with the increase, GPT-5.4's pricing remains lower than that of competing frontier models with comparable capabilities.

