On March 17, Jensen Huang took the stage at NVIDIA GTC 2026, speaking for over two hours in his signature leather jacket. Following the event, discussions across the internet centered on NVIDIA's ambition to become the "Token King." However, a closer listen to the keynote reveals that Huang's repeated emphasis was not on tokens themselves, but on "Tokens per Watt." While presenting inference performance charts, he explicitly introduced this concept, stating that every data center and every AI factory is fundamentally constrained by power. A 1GW facility can never become 2GW; this is dictated by the laws of physics. Under a fixed power budget, whoever achieves the highest token output per watt will have the lowest production costs and the steepest revenue growth curve. This statement encapsulates the core message of GTC 2026.
Public discourse has focused on comparisons like how much more powerful Vera Rubin is than Blackwell, Groq LPX's 35-fold increase in inference speed, or NVIDIA's plans for space-based data centers. While these are significant, they are essentially different expressions of the same underlying logic: maximizing intelligent output from every watt of electricity within energy constraints. When Huang positions Token/W as the core metric of AI factory output, it carries deeper industrial significance. The framework of computing competition is shifting from individual chips to entire systems, from peak performance parameters to end-to-end energy efficiency, and from "which chip is faster" to "which system converts energy into intelligence more efficiently."
Judging from the current product and technology portfolio, NVIDIA and Jensen Huang are still working through the challenge of tokens per watt, and many steps remain before anyone can claim the status of "Token King." What GTC 2026 really marks is a shift in the language used to measure intelligence, and the industrial perspective opened by that shift is far more worthy of in-depth discussion than any single new chip.
Coincidentally, just one day before GTC officially began, Alibaba announced the establishment of the Alibaba Token Hub, to be led personally by Wu Yongming. Significantly, Alibaba named its AI nerve center after the "Token" rather than after "AI," elevating the token to the center of its AI strategy. This, too, shows that viewing AI through a systemic lens is gradually becoming a new industry consensus. That systemic lens is the concept this article aims to emphasize, and it is the article's core purpose.
**01 The Most Significant Change at GTC 2026 Lies Beyond the Chips Themselves**
At GTC 2026, the focus remained on new products and terms like Vera Rubin, Rubin POD, LPX, and the DSX AI Factory. Viewed collectively, however, these announcements reveal a shift in the narrative of computing competition from individual chips to the level of computing infrastructure, specifically the entire AI factory comprising computing, networking, storage, power, cooling, control systems, and software. Rubin is described as a POD-scale platform in which multiple racks form a large, coherent system. DSX is defined as a reference design for AI factories, aiming to maximize tokens per watt. This indicates that the real industry competition is shifting from the raw power of a single chip to the strength of the entire computing system; more precisely, to whether the whole system can organize limited power, cooling, and network resources into stable AI output. The concrete unit of measurement for that capability is tokens per watt (Token/W). This article uses the Token/W metric to examine the implications of these announcements and the opportunities they open up for the AI infrastructure industry.
**02 As Competition Shifts to Systems, the Measurement Framework Cannot Remain at the Chip Level**
The measurement system of the chip era is familiar: peak FLOPS, memory bandwidth, FLOPS/W, TOPS/W, bit/J. These metrics matter because they describe the capability boundaries of individual components. In practice, however, an awkward situation persists: there is no objective, unified, universal unit of measurement for intelligent computing centers. Typically, data centers are measured in megawatts (MW) of power, while in China's intelligent computing center construction, capacity is quoted in PFLOPS (FP16). Yet clusters with the same nominal computing power or power capacity can have vastly different efficiencies depending on their internal chips, networking, and cooling systems. The reason is straightforward: the existing metrics each measure only one dimension. Peak FLOPS describes a chip's theoretical computational capacity, bit/J describes the energy efficiency of local data movement, and bandwidth describes the information pathway capacity of a single subsystem. These are all component-level metrics. A complete AI system, however, must ultimately answer this question: under fixed power budgets, cooling conditions, and data center constraints, how much valid AI output can it produce? Chip-level metrics alone cannot answer it. In NVIDIA's discourse at this event, terms like token cost, throughput per watt, token performance per watt, and tokens per watt are prominent; the language of measurement is transitioning from component language to system language. Therefore, if the common chip-level metrics are peak FLOPS, bandwidth, and bit/J, then the more appropriate system-level metric is Token/W. The former measures component capability; the latter measures overall output. The former corresponds to local optimization; the latter corresponds to system-wide optimization.
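To make the contrast between the two measurement languages concrete, the system-level metric can be written out explicitly. The notation below is illustrative shorthand of this article's own choosing, not an official NVIDIA definition:

$$
\text{Token/W} \;=\; \frac{R_{\text{tokens}}\ [\text{tokens/s}]}{P_{\text{facility}}\ [\text{W}]} \;=\; \text{tokens per joule}
$$

The key difference from FLOPS/W lies in the denominator: $P_{\text{facility}}$ covers everything the facility draws from the grid, including networking, storage, cooling, and power conversion losses, not just the accelerators themselves.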
**03 Token/W Connects the Chain from Energy to Intelligent Output**
In the official GTC 2026 transcript, NVIDIA refers to the token as the "basic unit" of modern AI. This characterization is apt. For large language models, inference services, and Agent systems, what users ultimately pay for is the system's ability to generate and process tokens. From a business operations perspective, the token has three advantages: 1) It is directly coupled with the model inference process. 2) It is directly coupled with the revenue model. 3) It is suitable for covering the new workloads of the inference era, such as Agents, multi-turn conversations, long context, retrieval-augmented generation, tool use, and chain-of-thought reasoning. These new workloads are difficult to describe with a single FLOPS metric but can all be measured in terms of tokens, latency, and goodput.

More importantly, the fundamental constraint on AI infrastructure is increasingly energy itself. The IEA's "Energy and AI" report predicts global data center electricity use will rise to approximately 945 TWh by 2030, a significant increase from current levels, with AI a major driver and the US accounting for a large share of the growth. In other words, many challenges facing the AI industry may look like chip problems on the surface but are, in essence, problems of power, cooling, and infrastructure organization.

The Token/W concept is valuable because it connects the most critical chain in the AI industry: electrical power enters, is processed through computing, networking, storage, scheduling, and cooling, and ultimately emerges as token output. In this sense, Token/W does not simply replace FLOPS/W or bit/J. It adds a previously overlooked perspective: how efficiently does the AI system convert energy into intelligent output?

The most noteworthy aspect of this GTC is precisely this: we can no longer view chips in isolation but must place them within systems, and view those systems within industrial constraints. This is the perspective the author has consistently advocated. Evaluating AI chips requires looking not only at peak FLOPS, memory bandwidth and capacity, or interface specifications, but also at how the chips collaborate within networks, how they are deployed in racks, how they access power in campuses, how they form cost structures for customers, and ultimately how they translate into real output for businesses. To some extent, GTC 2026 publicly validated this systemic perspective. When NVIDIA itself shifts its narrative center to the AI factory, the industry is already moving from chip-centrism to computing-system-centrism. This is crucial. Many industries fixate on component parameters in their early stages because those are the easiest to measure and promote; once an industry enters large-scale deployment, the deciding factor is usually system organization capability. Today's AI infrastructure has reached that stage.
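A back-of-envelope sketch makes this chain tangible. All numbers below are illustrative assumptions: the 1 GW figure echoes the keynote's framing, and the tokens-per-joule values are invented for the example, not vendor data.

```python
# Back-of-envelope sketch: token output of a power-constrained AI factory.
# All constants are illustrative assumptions, not NVIDIA or IEA figures.

SECONDS_PER_DAY = 86_400

def daily_tokens(facility_power_w: float, tokens_per_joule: float) -> float:
    """Tokens a facility can produce per day at a given system efficiency.

    tokens/J is dimensionally the same as (tokens/s) per watt, i.e. Token/W.
    """
    joules_per_day = facility_power_w * SECONDS_PER_DAY
    return joules_per_day * tokens_per_joule

POWER_W = 1e9  # hypothetical 1 GW facility; fixed by the grid, not by chips

for eff in (0.5, 1.0):  # assumed efficiencies: 2 J/token vs. 1 J/token
    print(f"{eff:.1f} tokens/J -> {daily_tokens(POWER_W, eff):.2e} tokens/day")

# Doubling Token/W doubles daily output and halves the energy cost per
# token, while the facility's power draw (and power bill) stays fixed.
```

Under this framing, the only lever on output, and on cost per token, is the conversion rate from joules to tokens, which is exactly what Token/W measures.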
**04 Extending from Token/W, the Importance of Optical Interconnect Will Rise Significantly**
Once the measurement system migrates to the system level, the importance of many elements previously considered auxiliary rises. Optical interconnect is a prime example. Previously it was discussed from the perspective of optical modules, communications, or devices (higher bandwidth, longer transmission distance, lower pJ/bit, better bandwidth density, lower insertion loss), and its value was understood at the component or subsystem level. Within the Token/W framework, its value becomes more intuitive: it reduces the energy cost of data movement, enhancing a large-scale AI computing system's ability to convert electricity into tokens. When NVIDIA discusses its optical networking products, the emphasis on co-packaged optics (CPO) achieving up to 5x higher energy efficiency than pluggable modules, along with lower latency and support for larger AI factory scale, shifts the focus from merely more advanced links to larger system scale and higher system efficiency. From an industrial-logic perspective, this is easy to understand. As models grow larger, contexts lengthen, and clusters expand, a significant portion of system energy is consumed not in arithmetic units but in data movement across chips, boards, racks, and PODs. At this stage, improving Token/W can no longer rely solely on more powerful GPUs; it also requires more efficient interconnects. From the Token/W viewpoint, then, developing optical interconnect is not just about being cutting-edge; it is becoming a necessary energy-saving method for large-scale AI systems.
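To see why pJ/bit shows up at the token level, consider a hedged sketch. The traffic volume and the pJ/bit figures below are assumptions chosen for illustration; the 5x ratio echoes the CPO claim above, but the absolute values are invented, not NVIDIA specifications.

```python
# Hedged sketch: how interconnect energy (pJ/bit) enters the per-token
# energy budget. All constants are illustrative assumptions.

def interconnect_joules_per_token(bytes_moved: float, pj_per_bit: float) -> float:
    """Energy spent moving the data behind one generated token."""
    return bytes_moved * 8 * pj_per_bit * 1e-12  # bits x pJ/bit -> joules

# Assumption: ~1 GB crosses the fabric per output token (activations,
# KV-cache traffic, collectives); real traffic varies enormously by model.
BYTES_PER_TOKEN = 1e9

pluggable = interconnect_joules_per_token(BYTES_PER_TOKEN, pj_per_bit=15.0)
cpo = interconnect_joules_per_token(BYTES_PER_TOKEN, pj_per_bit=3.0)  # ~5x better

print(f"pluggable optics:   {pluggable:.3f} J/token on data movement")
print(f"co-packaged optics: {cpo:.3f} J/token on data movement")
# Under a fixed power budget, every joule saved on movement is a joule
# freed for computation, which raises system Token/W directly.
```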
**05 Optical Computing is More Nascent Than Optical Interconnect, But Its Logic is Starting to Hold**
It must be acknowledged that optical computing is at an earlier stage of development than optical interconnect. Challenges related to generality, precision, compilers, manufacturing consistency, and system integration are still being worked through. However, if the observation boundary is expanded to the system level, its industrial significance becomes easier to articulate. The reason is that Token/W is concerned with end-to-end energy efficiency: any technology that significantly reduces energy consumption on specific high-frequency, high-density, repeatably mappable computational paths has the potential to raise token output efficiency at the system level. This logic does not require optical computing to replace the GPU outright or to immediately become a general-purpose computing foundation. It requires only one thing: that for certain key workloads it lower the whole system's joules per token, and thereby raise token output under a fixed power budget. This is also why the narrative around optical computing needs to shift from single-device efficiency to its contribution to system-level energy savings. If the industry looks only at TOPS/W or MAC/J, optical computing remains more of a laboratory story; once the industry begins to think in Token/W, it has a chance to enter infrastructure discussions. This change is particularly important for optical computing because it finally provides a higher-level language with which to engage customers, campuses, power providers, and capital expenditure decisions.
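The arithmetic behind that claim can be sketched in a few lines. Both parameters below are assumptions chosen for illustration; they do not describe any real photonic product.

```python
# Hedged sketch: Amdahl-style gain in Token/W from offloading part of the
# per-token energy to a more efficient path. Parameters are assumptions.

def token_output_gain(offloaded_fraction: float, energy_ratio: float) -> float:
    """Factor by which tokens/J (and hence Token/W) improves.

    offloaded_fraction: share of today's per-token energy spent on the
                        workload the optical path can take over.
    energy_ratio:       optical energy / electronic energy for that share.
    """
    remaining = (1 - offloaded_fraction) + offloaded_fraction * energy_ratio
    return 1 / remaining

# Assumption: 30% of per-token energy sits in a mappable kernel, and the
# optical path runs it at one tenth of the electronic energy cost.
print(f"Token/W gain: {token_output_gain(0.30, 0.10):.2f}x")  # -> 1.37x
```

As the sketch suggests, the whole-system gain is bounded by the offloaded share, which is why the argument depends on targeting high-frequency, high-density paths rather than arbitrary workloads.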
**06 As Computing Metrics Shift from Chips to Systems, Optical Interconnect and Computing Move to the Industry Mainstream**
When computing competition resided mainly at the chip level, optical interconnect was viewed as an I/O technology and optical computing as exploratory research into advanced devices. As competition migrates to large-scale, system-level AI infrastructure, the situation changes. System efficiency increasingly depends on how dense power consumption, data movement, context management, cross-node coordination, and power and thermal management are organized, and these are precisely the areas where optics has the greatest potential to make an impact. From the Token/W perspective, optical interconnect addresses the "shipping cost" in electricity behind each generated token, while optical computing attempts to rewrite part of the "computation cost" in electricity behind each token. Together they influence the token output efficiency of the entire system, and this is the fundamental reason they are entering the industry mainstream. More pragmatically, beyond chip capacity and supply, the future constraints on data centers and AI factories will include grid access, data center cooling, campus energy consumption, rack power density, and deployment speed. The International Energy Agency's assessments of AI's energy consumption and NVIDIA's current emphasis on AI factories point in the same direction: AI infrastructure is becoming a systems engineering problem measured in energy. Looking forward along this direction, optical interconnect and optical computing address the parts of the AI era that are becoming increasingly expensive and hard to optimize along traditional electrical paths: the energy cost of data movement and the unit energy consumption of high-density computation. This is systems thinking in its more complete form. It is also why GTC 2026 once again highlighted photonic and silicon photonics products: as the metrics for computing power shift from chips to systems, optics will gradually move from an advanced technology option to industrial infrastructure worth building. From this perspective, the future of CPO and optical computing systems looks very promising.
**Final Thoughts: The Driving Axis of AGI Development**
In daily work, the author has consistently advocated establishing objective, measurable standards for computing power and uses the Token/W method when evaluating and testing different computing chips. Looking back at technological history, the automobile, the airplane, and the rocket became possible only when engines achieved a sufficiently high ratio of output power to weight. In the AI era, as the ratio of an AI system's output (currently tokens) to its energy consumption rises, intelligence will grow smarter, and AGI may emerge from within this progression. What is truly worth remembering from GTC 2026 is not the fortunes of a single company like NVIDIA, or whether Jensen Huang becomes the "Token King," but the clarification of a new metric for the AI era. Furthermore, NVIDIA, Alibaba, and likely many other industry giants have begun to realize that the development of the AI industry must be viewed with systems thinking. This, in fact, aligns with the main axis of human civilization's development: using less energy to collect, transmit, and process more information. AGI will be no exception.