Recently, OpenAI co-founder and president Greg Brockman shared his latest thinking on key AI topics, including bottlenecks in AI technology development and the relationship between research and engineering, at the World AI Engineers Conference. An industry veteran who entered AI in 2015, Brockman made a crucial observation when asked about the challenges of developing GPT-6: as computing power and data continue to scale rapidly, fundamental research is making a comeback, with algorithms once again emerging as a key bottleneck for future AI development.
For Brockman, this isn't necessarily a bad thing. He feels that constantly revolving around the classic "Attention is All You Need" paper and the Transformer has become somewhat tedious and intellectually "unsatisfying." Reinforcement learning has become one of the new directions in algorithmic research, though he also acknowledges that many obvious capability gaps remain.
▲Greg Brockman (right) with the host (left)
Engineering and research are the two major engines driving AI development. Coming from an engineering background, Brockman believes that engineers' contributions are on par with researchers, and even more important in some aspects. Without research innovation, there would be nothing to do; without engineering capabilities, those ideas cannot be realized.
OpenAI has insisted from the beginning on treating engineering and research equally, though the two have different thinking approaches. For new engineers joining OpenAI, Brockman's first lesson is: maintain technical humility, because methods that work effectively at traditional internet giants may not necessarily apply at OpenAI.
Resource coordination between products and research is also a problem OpenAI frequently faces. Brockman admitted in the interview that to support the massive computing power demands of product launches, OpenAI has had to borrow some computing power that should have been used for research in what amounts to almost "mortgaging the future." However, he believes this trade-off is worthwhile.
Brockman also reviewed his childhood interest in mathematics, his transition to programming, transferring from Harvard to MIT, and eventually dropping out to join fintech startup Stripe. Due to space limitations, the transcript doesn't include this portion of content.
At the end of the interview, Brockman answered two questions from NVIDIA founder and CEO Jensen Huang, addressing the future form of AI infrastructure and the evolution of development processes.
Greg Brockman's interview was recorded in June this year. The following is an edited selection of highlights (condensed and lightly edited while preserving the original meaning):
**1. Engineers and Researchers Are Equally Important, First Lesson at OpenAI Is Technical Humility**
Host: In 2022, you said it's time to become a machine learning engineer, that great engineers can contribute to future progress at the same level as great researchers. Does this still hold true today?
Greg Brockman: I think engineers make contributions comparable to researchers', or even greater. Initially, OpenAI was a group of PhD-trained research scientists proposing ideas and testing them, and engineering was essential to that research. AlexNet was essentially the engineering feat of "implementing fast convolutional kernels on GPUs." Interestingly, people in the lab where Alex Krizhevsky worked at the time actually dismissed this work, thinking AlexNet was just fast kernels for some image dataset and wasn't important. But Ilya said: "We can apply this to ImageNet. It will definitely work well." That decision combined great engineering with theoretical innovation.
I think my previous view still holds today. The engineering needed by the industry now isn't just building specific kernels, but constructing complete systems, scaling them to 100,000 GPUs, building reinforcement learning systems, and coordinating relationships between various components. Without innovative ideas, there's nothing to do; without engineering capabilities, those ideas cannot be realized. What we need to do is harmoniously combine both aspects.
The relationship between Ilya and Alex symbolizes research-engineering collaboration, and this collaboration is now OpenAI's philosophy. OpenAI has believed from the beginning that engineering and research are equally important, and both teams need to work closely together.
The relationship between research and engineering is also a problem that can never be completely solved. After solving problems at the current level, we face more complex problems. I notice that the problems we encounter are basically the same as those faced by other labs, except we might go further or encounter some different variants. I think there are some fundamental reasons behind this.
Initially, I clearly felt that people with engineering backgrounds and research backgrounds have very different understandings of system constraints. As an engineer, you think: "If the interface is determined, I don't need to care about the implementation behind it. I can implement it any way I want." But as a researcher, you think: "If any part of the system goes wrong, all I see is slightly degraded performance, no error prompts, and I don't know where the error is. I must be responsible for the entire code."
Unless the interface is very solid and completely trustworthy—which is a high standard—researchers must be responsible for this code. This difference often creates friction.
I once saw in an early project that after engineers wrote code, researchers would extensively discuss every line, making progress extremely slow. Later, we changed our approach. I directly participated in the project, proposing five ideas at once. Researchers would say four of them wouldn't work, and I felt this was exactly the feedback I wanted.
The greatest value we realized, and what I often emphasize to new OpenAI colleagues from the engineering world, is technical humility. You bring valuable skills here, but this is an environment completely different from traditional internet startups. Learning to distinguish when you can rely on original intuition and when you need to set it aside isn't easy.
Most importantly, maintain humility, listen carefully, and assume there are still things you don't understand until you truly understand the reasons. Only then should you change the architecture and adjust abstraction layers. Truly understanding and doing things with this humility is the key factor determining success or failure.
**2. Some Research Computing Power Diverted to Products, OpenAI Sometimes Must "Mortgage the Future"**
Host: Let's talk about some of OpenAI's recent major releases and share some interesting stories. One particularly noteworthy aspect is scalability—at different orders of magnitude, everything can break down. When ChatGPT was released, it attracted 1 million users in just five days; when GPT-4o image generation (ImageGen) was released this year, it similarly reached 100 million users within five days. How do these two phases compare?
Greg Brockman: They're similar in many ways. ChatGPT was originally just a low-key research preview that we quietly released, but system crashes quickly occurred. We expected it to be popular, but thought we'd need to wait until GPT-4 to truly achieve this level of enthusiasm. Internal colleagues had already been exposed to it, so they weren't amazed. This is also a characteristic of this field—the update pace is very fast. You might have just seen "this is the most magical thing I've ever seen," and the next moment think: "Why can't it merge 10 PRs (pull requests) at once?"
ImageGen's situation was similar. After release, it was extremely popular, with incredible spread speed and user growth. To support these two releases, we even broke convention by diverting some computational resources from research for product launches. This amounts to "mortgaging the future" to make systems work properly, but if we can deliver on time and meet demand, letting more people experience the magic of technology, this trade-off is worthwhile.
We consistently adhere to the same philosophy—providing the best user experience, advancing technology, creating unprecedented results, and doing everything possible to bring them to the world and achieve success.
**3. AI Programming Beyond "Showing Off," Moving Toward Serious Software Engineering**
Host: "Vibe coding" has now become a phenomenon. What's your view on it?
Greg Brockman: Vibe coding as an empowerment mechanism is very magical and represents future development trends. Its specific forms will continue changing over time. Even with technologies like Codex, our vision is: when these agents are truly deployed, it won't be just one or ten copies, but potentially hundreds, thousands, or even 100,000 agents running simultaneously.
You'll want to collaborate with them like colleagues—they run in the cloud and can connect to various systems. Even while you're sleeping or your laptop is off, they can keep working. Right now, people generally treat vibe coding as an interactive loop, but that form will change. Interaction will increase, while agentic AI will step in and move beyond this model, driving more system building.
An interesting phenomenon is that many vibe coding demonstrations focus on creating interesting applications or prank websites and other "cool" projects, but what's truly novel and transformative is that AI has begun to be able to transform and deeply penetrate existing applications. Many companies dealing with legacy codebases need migration, library updates, converting old languages like COBOL to modern languages—this is both difficult and tedious, and AI is gradually solving these problems.
Vibe coding starts with "making cool applications," but it's evolving toward serious software engineering—especially in the ability to deeply penetrate existing systems and make improvements. This will enable companies to develop faster, and this is the direction we're heading.
Host: I heard Codex is somewhat like a "child you raised yourself" for you. From the beginning, you emphasized making it modular and well-documented. How do you think Codex will change the way we program?
Greg Brockman: Saying it's my "child" is a bit of an overstatement. I have an excellent team that I've been working hard to support along with their vision. This direction is both fascinating and full of potential.
The most interesting point is that the structure of codebases determines how much value you can get from Codex. Existing codebases are mostly designed to leverage human strengths, while models are better at handling diverse tasks and don't connect concepts as deeply as humans do. If systems could better match model characteristics, results would be better.
The ideal approach is: break code into smaller modules, write fast, runnable, high-quality tests, then have models fill in the details. Models will run tests themselves and complete implementation. Connections between components (architecture diagrams) are relatively easy to build, while detail filling is often most difficult.
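To make this concrete, here is a minimal sketch of what such a module-plus-tests setup might look like (the module, function name, and tests are hypothetical examples, not anything from Codex itself): the human writes a small interface and fast tests, and an agent would be asked to fill in the implementation and verify it by running the tests.

```python
# Hypothetical module: the interface and tests are what a human might write up
# front; the function body is the "detail filling" an agent would be asked to
# complete and then check by re-running the tests itself.

def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores so they sum to 1.0; an all-zero input maps to a uniform split."""
    if not scores:
        return []
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

# Fast, self-contained tests the agent can re-run on every attempt.
def test_normalize_scores():
    assert normalize_scores([1.0, 1.0]) == [0.5, 0.5]
    assert normalize_scores([0.0, 0.0]) == [0.5, 0.5]
    assert normalize_scores([]) == []
    assert abs(sum(normalize_scores([3.0, 2.0, 5.0])) - 1.0) < 1e-9

if __name__ == "__main__":
    test_normalize_scores()
    print("all tests pass")
```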
This approach sounds like good software engineering practice, but in reality, because humans can handle more complex conceptual abstractions in their minds, they often skip this step. Writing and perfecting tests is a heavy task, while models can run 100 or even 1000 times more tests than humans, thereby taking on more work.
In a sense, we want to build codebases more like those designed for junior developers to maximize model value. Of course, as model capabilities improve, whether this structure remains optimal will be an interesting question.
The benefit of this approach is that it aligns with practices humans should follow for maintainability. The future of software engineering may need to reintroduce practices we abandoned to take shortcuts, thereby allowing systems to deliver maximum value.
**4. Training Systems Increasingly Complex, Checkpoint Design Needs Simultaneous Updates**
Q: The tasks we're now executing often take longer, occupy more GPUs, and have low reliability, frequently failing and causing training interruptions. This is well known. However, you mentioned that you can restart a run, which is fine. But when you need to train agents with long-term trajectories, how do you handle this? Because if trajectories themselves are non-deterministic and already halfway through, it's difficult to truly restart from the beginning.
Greg Brockman: As model capabilities improve, you constantly encounter new problems, solve them, then face new challenges. When runtime is short, these problems aren't significant; but if tasks need to run for days, you must seriously consider details like how to save state.
Simply put, as training system complexity increases, these types of problems must be taken seriously. A few years ago, we mainly focused on traditional unsupervised training, where saving checkpoints was relatively simple, but even so, it wasn't easy. If you want to move from "occasionally saving checkpoints" to "saving at every step," you must seriously consider how to avoid data duplication, blocking, and other problems.
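A rough sketch of the kind of mechanism this implies is below; the file layout, threading scheme, and step-keyed naming are assumptions made for illustration, not a description of OpenAI's training stack. The idea is simply to snapshot state without blocking the training step, and to write checkpoints atomically under step-numbered names so retries don't leave duplicate or half-written files.

```python
import copy
import os
import pickle
import threading

CKPT_DIR = "checkpoints"  # hypothetical location

def save_checkpoint_async(state: dict, step: int) -> threading.Thread:
    """Snapshot state in memory, then write it on a background thread so the
    training step is not blocked on disk I/O."""
    snapshot = copy.deepcopy(state)  # freeze the state before training mutates it

    def _write():
        os.makedirs(CKPT_DIR, exist_ok=True)
        path = os.path.join(CKPT_DIR, f"step_{step:08d}.pkl")
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(snapshot, f)
        os.replace(tmp, path)  # atomic rename avoids half-written or duplicate files

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t

# Usage sketch inside a training loop (train_one_step is hypothetical):
# for step in range(start_step, total_steps):
#     state = train_one_step(state)
#     save_checkpoint_async(state, step)   # "saving at every step"
```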
In more complex reinforcement learning systems, checkpoints remain important, such as saving caches to avoid redundant calculations. Our system has an advantage: language model states are relatively clear and easy to store and process. But if connected external tools themselves have states, smooth recovery after interruption may not be possible. Therefore, checkpoint mechanisms for the entire system need to be planned end-to-end.
Perhaps in some situations, interrupting and restarting systems, allowing some fluctuation in result curves, is acceptable because models are smart enough to handle such situations. A new feature we plan to launch allows users to take over virtual machines, save their states, and then resume operation.
**5. Building AGI Isn't Just Software, Requires Simultaneously Building Supercomputers**
Jensen Huang: I wish I could ask you questions in person. In this new world, data center workloads and AI infrastructure will become extremely diverse. On one hand, some agents conduct deep research, responsible for thinking, reasoning, and planning, requiring substantial memory; on the other hand, some agents need to respond as quickly as possible. How do you build AI infrastructure that can efficiently handle large amounts of prefill tasks, large amounts of decode tasks, and workloads in between, while also meeting the needs of those requiring low-latency, high-performance multimodal vision and speech AI? These AIs are like your R2-D2 (robot from Star Wars), or your always-available companion. These two types of workloads are completely different: one is super compute-intensive and may run for a long time; the other requires low latency. What would ideal future AI infrastructure look like?
Greg Brockman: Of course, this requires a lot of GPUs. If I were to summarize, Jensen wants me to tell him what kind of hardware to build. There are two types of needs: one is long-term, large-scale computing needs, the other is real-time, instant computing needs. This is indeed difficult because it's a complex co-design problem.
I come from a software background. We initially thought we were just developing AGI (Artificial General Intelligence) software, but quickly realized that to achieve these goals, we must build large-scale infrastructure. If we want to create systems that truly change the world, we may need to build the largest computer in human history, which is reasonable to some extent.
A simple answer is that we do indeed need two types of accelerators: one pursuing maximum compute performance, the other pursuing extremely low latency. Stack lots of high-bandwidth memory (HBM) on one and lots of compute units on the other, and that basically solves the problem.
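A back-of-envelope calculation shows why the two profiles pull hardware in different directions. All numbers below are illustrative assumptions (a hypothetical 70B-parameter dense model, 3 TB/s of HBM bandwidth, 1 PFLOP/s of compute), not any specific accelerator or OpenAI model.

```python
# Why decode leans on memory bandwidth while prefill leans on raw compute.
params = 70e9          # hypothetical 70B-parameter dense model
bytes_per_param = 2    # fp16/bf16 weights
hbm_bandwidth = 3e12   # 3 TB/s of HBM bandwidth (illustrative)
peak_flops = 1e15      # 1 PFLOP/s of dense compute (illustrative)

# Decode: each new token reads every weight once -> bandwidth-bound.
weight_bytes = params * bytes_per_param
max_decode_tokens_per_s = hbm_bandwidth / weight_bytes
print(f"decode ceiling: ~{max_decode_tokens_per_s:.0f} tokens/s per model replica")

# Prefill: a long prompt pushes many tokens through the same weights at once,
# so arithmetic (~2 FLOPs per parameter per token) dominates -> compute-bound.
prompt_tokens = 8192
prefill_flops = 2 * params * prompt_tokens
print(f"prefill time at peak compute: ~{prefill_flops / peak_flops:.2f} s")
```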
What's really difficult is predicting the proportion of the two types of demand. If the balance is wrong, parts of the cluster may become useless, which sounds scary. However, since this field has no fixed rules and constraints and is mainly a set of optimization problems, if the resource allocation we engineered turns out to be skewed, we can usually find ways to put those resources to use, though possibly at significant cost.
For example, the entire industry is moving toward Mixture-of-Experts models. To some extent, this is because some DRAM was idle, so we use these idle resources to increase model parameters, thereby improving machine learning computational efficiency without adding extra computational costs. So even if resource balance is wrong, it won't cause disaster.
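The mechanism he alludes to can be sketched in a few lines. This is an illustrative toy, not any production architecture: the expert weight matrices sit in otherwise idle memory, while each token only pays the compute cost of its top-k experts.

```python
import numpy as np

# Toy mixture-of-experts layer: parameters scale with num_experts, but each
# token only touches top_k of them, so compute stays roughly constant.
rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 8, 2

experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]  # lives in (otherwise idle) memory
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                       # indices of the top_k experts
    weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
    # Only top_k of the num_experts weight matrices are used for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,) -- compute ~ top_k/num_experts of a dense layer of the same total size
```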
Accelerator homogenization is a good starting point, but I think ultimately customizing accelerators for specific purposes is also reasonable. As infrastructure capital expenditure reaches staggering scales, highly optimizing workloads also becomes reasonable. But the industry hasn't reached consensus because research development speed is very fast, and this largely dominates the entire direction.
**6. Fundamental Research Is Returning, Algorithms Replace Data and Computing Power as Key Bottleneck**
Q: I wasn't planning to ask this question, but you mentioned research. Can you rank the bottlenecks faced during GPT-6 scaling? Compute, data, algorithms, power, funding. Which are first and second? Which one constrains OpenAI the most?
Greg Brockman: I think we're now in an era where fundamental research is returning, which is very exciting. There was a time when the focus was: we have Transformers, so let's keep scaling them. With the problems so clearly defined, the main task was just improving metrics—interesting, but somewhat intellectually unchallenging and unsatisfying.
Life shouldn't only revolve around the original "Attention is All You Need" paper's thinking. Now, what we're seeing is that as computing power and data scale rapidly expand, the importance of algorithms is once again prominent, almost becoming the key bottleneck for future progress.
These problems are all fundamental and critical links. While they may seem unbalanced in daily operations, fundamentally, these balances must be maintained. Seeing progress in paradigms like reinforcement learning is very exciting, and this is also an area we've consciously invested in for years.
When we trained GPT-4, the first time we interacted with it, everyone would think: "Is this AGI?" Obviously it's not AGI yet, but it's hard to clearly explain why not. It performs very smoothly but sometimes goes in wrong directions. This shows that reliability is still a core issue: it has never truly experienced this world, more like someone who has only read all books or only learned about the world through observation, separated from the world by a glass window.
Therefore, we realized we need different paradigms and continue pushing improvements until systems truly possess practical capabilities. I think this situation still exists today, with many obvious capability gaps that need filling. As long as we keep pushing forward, we'll eventually reach our goal.
**7. "Diversified Model Library" Gradually Taking Shape, Future Economy Will Be AI-Driven**
Jensen Huang: For AI-native engineers in the audience, they might be thinking that in the coming years, OpenAI will have AGI (Artificial General Intelligence), and they'll build domain-specific agents on top of OpenAI's AGI. As OpenAI's AGI becomes increasingly powerful, how will their development processes change?
Greg Brockman: I think this is a very interesting question. You can look at it from a very broad perspective, with firm but different viewpoints. My view is: first, everything is possible. Maybe future AI will be so powerful that we only need them to write all code; maybe there will be AI running in the cloud; maybe there will be many domain-specific agents requiring lots of customization work to achieve.
I think the trend is moving toward this "diversified model library" direction, which is very exciting because different models have different reasoning costs, and from a system perspective, distillation technology works well. Actually, much capability comes from one model's ability to call other models. This will create numerous opportunities, and we're moving toward an AI-driven economy.
Although we haven't fully arrived, the signs are already apparent. The people here today are building all of this. The economy is very large, diverse, and dynamic. When people envision AI's potential, it's easy to focus only on what we're doing now and on the ratio of AI to humans. But the real question is: how do we increase economic output 10-fold so that everyone gains greater benefits?
In the future, models will be more powerful and the foundational technology more complete; we'll do more with it, and barriers to entry will be lower. In healthcare, for example, you can't simply drop it in; you need to think responsibly about the right approach. Education involves parents, teachers, and students, and every link requires domain expertise and substantial work.
Therefore, there will be numerous opportunities to build these systems, and every engineer here has the ability to help achieve this goal.