The first team to use large language model capabilities for "systematic transformation of recommendation systems."
Before GPT-3.5 emerged, "recommendation" might have been the most profound technology in internet history. Without recommendations, people would lose their channel to communicate with the vast ocean of internet information. It was recommendation technology that wove a massive network, connecting people, content, products, and currency, creating the vibrant online world and remarkable economic miracles we see today.
As Jiang Yuning told AI Tech Review, "The past decade has been a decade dominated by recommendation systems for traffic flow."
As the lead algorithm specialist for Taobao's recommendation algorithms and a distinguished algorithm expert at Alibaba's China E-commerce Business Group, Jiang Yuning leads the team that steers the recommendation system of this globally leading e-commerce platform. His algorithm team facilitates hundreds of billions of yuan in transaction volume every year, and each move it makes affects millions of merchants and billions of orders.
In early July, Taobao's recommendation algorithm technology team launched a 100-billion parameter recommendation large model called RecGPT, implementing Artificial Intelligence Generative Recommendation (AIGR) technology upgrades for the "Guess What You Like" feature. In Jiang Yuning's words, they are the first team to use large language model capabilities for "systematic transformation of recommendation algorithms."
To some degree, RecGPT has probably been underestimated by the outside world since its release. While various vendors compete on benchmark leaderboards with general-purpose foundation models, RecGPT focuses on transforming the specific task of recommendation, so it initially drew little media attention. Yet it marks a turning point for e-commerce recommendation systems, which have a history of over twenty years.
According to Jiang Yuning, starting from this point, it will gradually grow into a more mature, completely new e-commerce recommendation system with AI large models as the central command brain.
In August, on the occasion of Jiang Yuning's team releasing the RecGPT technical report, AI Tech Review conducted an exclusive interview with him. We sought to explore why Taobao was able to pioneer the full-scale deployment of AIGR in recommendation systems, what Taobao's approach and strategy for combining recommendation systems with large language models are, what impact RecGPT's emergence will have on Taobao's ecosystem, and why the system-level large-scale implementation of AIGR occurred exactly 35 months after ChatGPT-3.5 went online.
You can simply attribute some of these answers to Taobao's emphasis on AI, which is common knowledge. Alibaba is one of the most aggressive investors in large model technology. As early as 2024, Zheng Bo, Chief Scientist and Technology President of Alibaba's China E-commerce Business Group, proposed the AIGX technology system, essentially a generative AI technology roadmap covering every scenario that e-commerce business operations require.
But unlike all other AI players, Taobao itself is a special ecosystem where every move affects massive transaction volumes, and it has accumulated what may be China's most comprehensive, richest, and longest-standing e-commerce data. Especially with the "food delivery wars" in full swing, the surge in daily active users brought by flash shopping has posed more challenges to this algorithm system.
Jiang Yuning told AI Tech Review that AI large models actually endow traditional recommendation systems with some entirely new capabilities.
First, he said, traditional recommendation systems are black boxes: users sometimes cannot control the recommendation results, and even the algorithm engineers who build the systems cannot fully explain them. Large language models have excellent language compliance capabilities (that is, they follow natural-language instructions well), which can make traditional recommendation systems more "white-box."
With large language model enhancement, recommendation systems can better follow user instructions and execute platform policy intentions.
Second, unlike traditional systems that focus more on users' short-term behavior, large language models have the ability to understand longer context windows, allowing massive long-term historical user data to be fed in, enabling the system to understand user behavior over longer time dimensions.
The reasoning capability of language models enables them to predict the evolution of user needs, allowing the system to break out of the "information cocoon" phenomenon that relies solely on users' short-term behavior.
In RecGPT, the large model is more like adding a modular capability outside the traditional prediction model. It doesn't replace the prediction model but acts as a more efficient filtering and sorting device placed before the prediction model, making the system more flexible.
Jiang Yuning believes this flexibility will further drive changes in Taobao's recommendation ecosystem—new users, long-tail products, and creative products with high emotional value will benefit, such as "the categories with the most exposure growth are actually trendy fashion and novelty toys."
Jiang Yuning thinks the AI progress of recommendation systems is closely related to platform attributes and goals. Taobao's advantage lies in rich product supply and user data resources, while Taobao's strategy is "Omnipotent Taobao," meaning users need more immersive consumption experiences in the mobile app, enabling high-quality connections between massive products and highly personalized users—all of which are exactly what large language models excel at.
"Technical architecture design must serve business strategy," Jiang Yuning told AI Tech Review. Conversely, if the approach is "aggressively pushing top bestsellers" or "focusing on low-price products," there's actually no need for a large model to assist the recommendation model.
Jiang Yuning revealed that the current version of RecGPT assists recommendation systems at various stages, while the next step is to build a "large model commander" across all stages to coordinate and direct all recommendation processes, giving the recommendation system better consistency.
Regarding the industry's hotly discussed "end-to-end" solutions, Jiang Yuning believes they might be the optimal future solution but currently require a cautious exploratory approach. He told AI Tech Review that current "end-to-end" solutions merely borrow the scaling-up approach from large models while wasting the rich world knowledge and powerful reasoning capabilities of large language models, essentially "buying the box and returning the pearl."
Everything ultimately comes down to ROI. Whenever Jiang Yuning mentions "end-to-end," he always pairs it with ROI, maintaining a non-dismissive but also non-excited observational stance.
As he said at the end, his dozen years of AI algorithm experience taught him that "AI must create business value, must land in business scenarios and form positive business loops for AI to take root and flourish."
This is probably the biggest difference between Taobao's AI team and other teams.
**01 Recommendation Systems Are "Black Boxes," Large Models Can Make Them "White-Box"**
**AI Tech Review**: We heard you're the first team to implement large model capabilities in recommendation systems.
**Jiang Yuning**: Actually, there's been quite a bit of research combining large models with recommendation algorithms both domestically and internationally over the past two years. However, recommendation is a system-level capability divided into many stages and modules. We've transformed every stage and module, so we're the first to systematically transform recommendation systems (using large models) and fully deploy them in production environments.
**AI Tech Review**: Different players seem to use different logic for recommendations. For example, Kuaishou and Amazon have proposed end-to-end concepts, but you use a segmented approach.
**Jiang Yuning**: Yes, these are two completely different approaches. End-to-end essentially doesn't use large model capabilities—it's mimicking large models' successful experience in NLP, borrowing the scaling law methodology. We're also trying similar end-to-end approaches. But its ROI might not be very high, potentially requiring massive resources for marginal improvements. So at the current stage, segmentation combined with existing recommendation systems is something large models can achieve returns on relatively quickly.
**AI Tech Review**: Can we understand that our current segmentation is temporary, and we'll eventually do end-to-end?
**Jiang Yuning**: (End-to-end) will definitely be done. But large models currently have strong capabilities, and if you use only their modeling methods without those capabilities, it's somewhat like "buying the box and returning the pearl." Second, past recommendation models were black boxes. Why something is recommended to you is completely unknown, with very poor controllability and no interpretability. It's like how people browse TikTok now: they need to "train" their accounts, liking many things before getting the content they want to see. But large models can promote system "white-boxing." Because large models have language compliance capabilities, they can steer the system toward what platforms or users want.
**AI Tech Review**: Very interesting. Previously, everyone said large models were black boxes, but now they can actually make recommendation systems white-box?
**Jiang Yuning**: Large models themselves are certainly black boxes. But when used, they already have more interpretability than original algorithms. Past NLP or CV problems had unexplainable results. But now large models have a "thinking" process. Although why large models think this way is ultimately unexplainable, if you use them as plugins connected to original algorithms, the original algorithms gain some interpretability.
**AI Tech Review**: Why are recommendation systems black boxes? Can you explain?
**Jiang Yuning**: The essence of recommendation systems is having a user on one side and billions of products on the other, needing to find the 20-30 most matching products. Matching scores come from a model with a dual-tower structure—user features on one side, product features on the other—which calculates and produces a number, say "0.9." But what does "0.9" represent? It lacks interpretability—to what extent it matches your interests or characteristics is actually unknown. Like when you encounter a blogger while browsing short videos, platforms have so many bloggers, but why A instead of B? The system says it thinks you prefer A, but as for why, the system can't actually answer that question.
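The dual-tower scoring described above can be illustrated with a minimal sketch. It assumes each tower has already turned raw features into an embedding and that the score is a sigmoid of their inner product; the vectors and function names are hypothetical and are not taken from Taobao's system.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def match_score(user_vec, item_vec):
    """Two-tower style score: one embedding per side, combined into a
    single number that carries no explanation of why it is high or low."""
    return sigmoid(dot(user_vec, item_vec))

# Hypothetical embeddings (assumed values, not real model output).
user = [0.9, -0.2, 1.1]
item_a = [1.2, 0.1, 0.8]
item_b = [-0.5, 0.7, 0.3]

score_a = match_score(user, item_a)
score_b = match_score(user, item_b)
print(round(score_a, 2), round(score_b, 2))
```

The output is just a pair of numbers. Nothing in it says which interest or trait produced the higher score, which is the interpretability gap being described.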
**AI Tech Review**: Can't traditional recommendation models restore how their weights are distributed? Can we try to deconstruct these weights to see what major categories of influence they have?
**Jiang Yuning**: This involves a basic principle of recommendation systems—"collaborative filtering." You can understand it this way: recommendation model weights aren't learned just from your individual behavior, but from hundreds of millions of users' behaviors. Which users with similar behaviors to yours clicked on which products—we construct billions of such behavioral pairs, and model weights are statistical values learned from this data foundation. Therefore, they cannot be simply reduced to individual behaviors: what characteristics you have, thus what results are recommended. Of course we can try to explain, but this is more like post-hoc rationalization—scores have already been generated, then we play Monday morning quarterback.
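To make the collaborative filtering point concrete, here is a toy sketch of item co-occurrence counting. It is a simplified illustration with made-up click logs, not the production method: the recommendation for one user is driven by statistics over what other users clicked together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical click logs: user -> set of clicked items.
clicks = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lantern"},
    "u4": {"stove", "kettle"},
}

def co_click_counts(logs):
    """Count how often each ordered pair of items was clicked by the same user."""
    counts = Counter()
    for items in logs.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(user, logs, counts, k=2):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = logs[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [item for item, _ in scores.most_common(k)]

counts = co_click_counts(clicks)
top_for_u2 = recommend("u2", clicks, counts)
print(top_for_u2)
```

The score for "lantern" comes from u1 and u3, not from anything u2 did with lanterns, which is why such weights cannot be reduced to a single user's traits.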
**AI Tech Review**: Since it's all matching, why are traditional recommendation models hard to explain, but large models can be?
**Jiang Yuning**: First, traditional scoring models haven't been replaced—large models are essentially helping them find candidate sets. The advantage of large models is that during matching, I can directly ask the system to provide rough matching reasons. Then let the system follow your language instructions, matching products through different dimensional breakdowns of users. For example, given a user profile and a recommendation large model with semantic compliance capabilities, you can tell it, "Please recommend products based on the user's purchasing behavior in the past 3 days," or "Please recommend products based on the weather where the user lives." This way, candidate products actually follow different dimensions.
Second, large models' thinking capabilities give them extended reasoning possibilities. For example, the system discovers I bought Ultraman items, then infers I might have a child who likes Japanese anime at home, thus deducing I might need children's books. It can essentially jump beyond past historical behaviors for further extended analysis, and its extension dimensions follow your prompt instructions.
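A hedged sketch of how instruction-steered candidate generation might be assembled: the prompt template, field names, and `build_prompt` helper below are hypothetical and only illustrate the idea of pairing a user profile with a natural-language instruction. No model call is made, and the wording is not RecGPT's actual prompt.

```python
def build_prompt(profile, instruction, max_items=5):
    """Combine a user profile and a steering instruction into one prompt.
    The model would be asked to return candidates together with a reason."""
    profile_lines = "\n".join(f"- {key}: {value}" for key, value in profile.items())
    return (
        "User profile:\n"
        f"{profile_lines}\n\n"
        f"Instruction: {instruction}\n"
        f"Return up to {max_items} candidate products, each with a one-line reason."
    )

# Hypothetical profile; the two instructions are the ones quoted in the interview.
profile = {"city": "Hangzhou", "recent_purchases": "Ultraman figures"}
instructions = [
    "Please recommend products based on the user's purchasing behavior in the past 3 days.",
    "Please recommend products based on the weather where the user lives.",
]
prompts = [build_prompt(profile, text) for text in instructions]
print(prompts[0])
```

The same profile yields different candidate sets depending on the instruction, and the requested reason gives a rough explanation alongside each candidate.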
**AI Tech Review**: So if large models are used, can the system actually recommend things that couldn't be recommended before?
**Jiang Yuning**: Yes, I believe the biggest benefits large language models bring to recommendation systems come from two points: reasoning capabilities and language compliance capabilities. As mentioned above, large models' reasoning capabilities enable recommendation systems to have more "interpretable discoverability," making recommendation results both surprising yet reasonable. Language compliance capabilities give large models "schedulability"—we can directly command large models on what dimensions to recommend: "don't recommend viewed items," "don't recommend items with too long history," "want fresh content." This truly achieves "human in the loop."
**AI Tech Review**: Does this mean operational talent can also participate?
**Jiang Yuning**: Yes, its benefit to recommendation systems is opening a gateway for many people beyond algorithm engineers to participate and contribute ideas. Large internet platforms have very practical business problems—recommendation systems in high-traffic platforms undertake many scheduling functions, not purely efficiency-oriented. You can think of recommendation systems as a power grid, needing to schedule traffic and distribute it to different industries and content based on demand—besides satisfying certain efficiency constraints, they must also consider how to be responsive to direction. Past methods might require algorithm colleagues to "schedule" systems or do weighting/deweighting to achieve this goal. But now I can directly tell the system about today's hot topics and emphasize recommending these things. One sentence gets it done.
**AI Tech Review**: So schedulability is actually very important.
**Jiang Yuning**: I come from an algorithm background—algorithms themselves pursue efficiency maximization. But in practice, no algorithm is omniscient and omnipotent; it needs strategic intervention. For example, if there's a sudden hot topic today, like military coats becoming popular after some product launch, purely algorithm-driven efficiency will definitely have lag. How to quickly and efficiently schedule traffic distribution tests the overall design of recommendation systems. I believe in the combination of large models + recommendation systems, we must pursue efficiency improvements while also considering schedulability. Recent papers published by competitors haven't truly considered schedulable recommendation scenarios. But we must know that recommendation systems actually have platform intentions behind them.
**AI Tech Review**: Do different route choices relate to platform characteristics? Because Taobao is actually a relatively operation-heavy company.
**Jiang Yuning**: Not necessarily. Actually, many platforms have strong operational attributes. This ultimately becomes a balance problem between optimal efficiency and schedulability. Like a child who can always score high but doesn't communicate well with people and can't follow your instructions—sometimes that's also frustrating. Large models now provide a feasible path to have both.
**02 Large Models Are Late to Recommendation Systems Because Baselines Are Too High**
**AI Tech Review**: Actually, GPT-3.5 has been out for over two years, and your technical report mentions that attempts to use large models to transform recommendation systems have been relatively few. Why is this?
**Jiang Yuning**: It depends on what you're comparing with. After large models emerged, everyone's first reaction was to transform search, not recommendation, because large models are naturally suited to dialogue, while recommendation has no dialogue entry point. Recommendation is like a restaurant where the server brings you dishes without letting you order from the menu, whereas search lets you order. So the transformation of recommendation lagged behind search; this is determined by large language models' inherent characteristics.
**AI Tech Review**: But recommendation is a high-value scenario. So logically, once new technology appears, everyone should explore it. What do you think are the technical difficulties behind its relatively late implementation?
**Jiang Yuning**: I think the biggest difficulty is that the original system's baseline is already too high. Recommendation algorithms are actually system science, developed for over ten years. Whether based on collaborative filtering or other methods, they already recommend very accurately. Especially for deep users' behavior, having accumulated lots of data, the system understands your historical behavior sequences very well. Although it calculates in a black-box manner, it can definitely find very good matches and pull your overall user metrics very high.
But recommendation systems sometimes have "toxic" stickiness, creating very strong cocoon effects. If recommendations are based on historical information, they become increasingly similar. On the flip side, if users are new entrants, the original system actually finds it very difficult to recommend accurately. These are actually two sides of the same problem.
**AI Tech Review**: What exactly is the relationship between traditional models and large models? Why can't they be replaced by large models?
**Jiang Yuning**: Actually, some current end-to-end concepts still connect a traditional deep learning model at the end. Suppose we have products A and B—recommendation systems don't simply do ranking, knowing A is better than B. They need to score them: how much better is A than B, 20%, 50%, or 100%? Because recommendation systems, especially in e-commerce, need to relate to transaction amounts, advertising revenue, commissions, and other numerical values, so you need to quantify recommendation scores.
AI is indeed suitable for many tasks, like reasoning based on long contexts, but it's just not good at precise numerical calculations. So AI currently does initial screening, leaving precise numerical calculation parts to traditional scoring models.
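A small worked example shows why an ordering alone is not enough when scores feed into monetary quantities. The probabilities and prices are invented for illustration, and expected value is computed simply as purchase probability times price, which is an assumption rather than the platform's actual objective.

```python
def expected_value(p_purchase, price):
    """Expected transaction amount for one exposure (assumed objective)."""
    return p_purchase * price

# Hypothetical candidates with calibrated purchase probabilities.
candidates = [
    {"item": "A", "p": 0.10, "price": 50.0},
    {"item": "B", "p": 0.08, "price": 80.0},
]

by_probability = sorted(candidates, key=lambda c: c["p"], reverse=True)
by_value = sorted(
    candidates,
    key=lambda c: expected_value(c["p"], c["price"]),
    reverse=True,
)
print(by_probability[0]["item"], by_value[0]["item"])
```

A is more likely to be bought, yet B has the higher expected amount. Knowing only that "A ranks above B" cannot settle this, so the magnitude of each score has to be numerically meaningful.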
**AI Tech Review**: Is this long-context input technically efficient to implement?
**Jiang Yuning**: I think this is one of our core advances this time. Original recommendation systems preferred focusing on recent behavior, especially what was bought or viewed in the past week or two, pushing these aggressively while often forgetting users' long-term interests. RecGPT can start from long-term interests and complete more exploration.
Not only that, we've gradually achieved some reasoning capabilities based on user data. For example, if you bought pregnancy-related items a year ago, you can't be recommended pregnancy products a year later—instead, you should be recommended baby products. So labels themselves have gained evolution and reasoning capabilities.
**AI Tech Review**: So past labels didn't evolve?
**Jiang Yuning**: Previous labels could only evolve after you had certain behaviors. For example, when do labels evolve from pregnant woman to mother? When you start buying baby products. So traditional recommendation systems' learning was post-hoc: you first have some behavior, the model learns it, then recommends to you. But large models' reasoning capabilities enable recommendation model updates to occur before user behaviors.
**AI Tech Review**: You mentioned earlier that large model development in search was earlier than recommendations. But in e-commerce scenarios, it seems different—recommendation implementation seems ahead of search?
**Jiang Yuning**: Actually not really. E-commerce has many implementations that haven't been seen by everyone. But one point is: e-commerce is a consumption decision-oriented scenario, not an information gathering scenario. So large model applications in e-commerce search aren't about bringing fancy information interaction forms, but more about how to more accurately understand user intent, generate higher-quality data, and subtly influence user decisions. These backend improvements just aren't easily noticed.
**AI Tech Review**: People have actually tried using natural language interaction for e-commerce search before.
**Jiang Yuning**: As mentioned earlier, when people search for specific things, they've mostly already made decisions and don't need large language models to write long paragraphs telling them what to buy. Real large model applications in e-commerce search should be: when users search "tennis racket," you need to know what characteristics this user has. For example, price-sensitive or service-sensitive? Beginner or advanced player? Then based on user characteristics, recommend merchants with the best service or cheapest prices, beginner equipment or advanced equipment. This is where user experience can truly be improved.
**03 All Recommendation Systems Are EE Problems**
**AI Tech Review**: You once mentioned 70% content is based on recommendation engines, 30% content is trial and error, aimed at preventing systems from entering cocoon effects while ensuring efficiency.
**Jiang Yuning**: Yes, we need to find ways to improve the efficiency of the 30% while ensuring the 70% doesn't drop in efficiency. Actually, all recommendation systems are EE problems (Exploitation and Exploration), seeking balance between the two Es. Previously, the Exploration part used almost random strategies, like rolling dice. But now with RecGPT's large model assistance, this part's efficiency will significantly improve. It can also follow instructions, like having users explore snack categories, clothing categories, etc., no longer completely blind exploration.
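The 70/30 split can be sketched as a simple slot allocation. This is an assumed toy policy, not Taobao's implementation: exploitation slots are taken from the top of a ranked list, and exploration slots are sampled at random, optionally restricted to categories named in an instruction.

```python
import random

def fill_feed(ranked, pool, n_slots=10, explore_share=0.3,
              explore_categories=None, seed=0):
    """Split a feed into exploitation slots and exploration slots."""
    rng = random.Random(seed)
    n_explore = round(n_slots * explore_share)
    n_exploit = n_slots - n_explore
    exploit = ranked[:n_exploit]
    # Instruction-guided exploration: narrow the pool before sampling.
    if explore_categories is not None:
        pool = [p for p in pool if p["category"] in explore_categories]
    explore = rng.sample(pool, min(n_explore, len(pool)))
    return exploit, explore

# Hypothetical items.
ranked = [{"id": f"hot{i}", "category": "phones"} for i in range(20)]
pool = (
    [{"id": f"snack{i}", "category": "snacks"} for i in range(10)]
    + [{"id": f"tool{i}", "category": "tools"} for i in range(10)]
)

exploit, explore = fill_feed(ranked, pool, explore_categories={"snacks"})
print(len(exploit), len(explore))
```

Without `explore_categories` the exploration slots are drawn blindly from the whole pool, which corresponds to the "rolling dice" strategy; with it, exploration follows a stated direction.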
**AI Tech Review**: Exploration provides more data for Exploitation. If the former is more efficient with higher data accumulation efficiency, then the latter would correspondingly be more efficient, forming a cycle between AI models and traditional models?
**Jiang Yuning**: Yes. We can divide the system into "efficiency circles" and "exploration circles." Only when "exploration circles" grow fast can data supplement "efficiency circles," and products that connect with users will become increasingly numerous.
**AI Tech Review**: Can we understand it this way: efficiency circles rely on traditional recommendation models, exploration circles rely on large language models?
**Jiang Yuning**: That's not accurate. Actually, both circles or tasks have been upgraded to the new mode of large language models plus traditional recommendation models. It's just that this mode helps Exploration more than Exploitation.
**AI Tech Review**: How much difference can there be in efficiency improvements between the two?
**Jiang Yuning**: Efficiency circles see single-digit increases, while exploration circles can improve by over fifty percent.
Going back to the hard problem of recommendation systems: when large models were applied to improve "efficiency circle" results, the gains were limited despite massive resource investment, because the system was already doing very well. The "exploration circle," by contrast, is the part traditional methods find very difficult to improve, so large models have much more room to contribute there.
**AI Tech Review**: Does this mean large model applications will be very friendly to new users?
**Jiang Yuning**: Yes, it is friendly to both new users and long-tail products. This system alleviates the Matthew effect among products. If our efficiency circle takes 70% of exposure, that 70% actually goes to only 10% of products, while the remaining 30% of exposure in the exploration circle is spread across 90% of products. This is very uneven. For a product to move from the exploration circle to the efficiency circle, it must first generate clicks between itself and users; only then can accurate scores be calculated. But because long-tail products are very hard to score accurately, click efficiency in the exploration circle is very low. If 300 exploration exposures yield only 6 clicks, then the efficiency circle has only 6 effective data points to learn from. With large models in place, I might get 10 clicks, so more products will be activated and able to enter the efficiency circle.
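The figures in this answer can be worked through directly. The short calculation below uses only the numbers quoted above (70%/10%, 30%/90%, and 6 versus 10 clicks per 300 exposures); the variable names are illustrative.

```python
# Exposure share and product share for each circle, as quoted above.
efficiency_exposure, efficiency_products = 70, 10    # percent
exploration_exposure, exploration_products = 30, 90  # percent

# Average exposure per unit of product share in each circle.
efficiency_density = efficiency_exposure / efficiency_products
exploration_density = exploration_exposure / exploration_products
density_ratio = efficiency_density / exploration_density

# Click yield on 300 exploration exposures, before and after.
exposures = 300
clicks_before, clicks_after = 6, 10
ctr_before = clicks_before / exposures
ctr_after = clicks_after / exposures
relative_gain = clicks_after / clicks_before - 1

print(f"per-product exposure ratio: {density_ratio:.0f}x")
print(f"CTR: {ctr_before:.1%} -> {ctr_after:.1%} (+{relative_gain:.0%})")
```

On these numbers an average efficiency-circle product receives about 21 times the exposure of an average exploration-circle product, and moving from 6 to 10 clicks is roughly a 67% gain in usable training signals.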
**AI Tech Review**: So Taobao had lots of data before, but much of it wasn't actually activated.
**Jiang Yuning**: Taobao is also called "Omnipotent Taobao." Among all e-commerce platforms, Taobao has the richest product diversity. The Taobao we usually see is only a small part of Taobao's product library—it has many interesting products that haven't been pushed out. So we need to use large models to improve this point.
**AI Tech Review**: Actually, many users complain about why big data recommends lots of content they've already purchased. Why can't this be avoided? Like making a simple rule.
**Jiang Yuning**: Because any rule has more or less loopholes. For example, I bought a pack of pistachios, thought they were great, and I really want to repurchase, but the system never recommends them again. Or how should this rule's time range be set? Don't show for 3 days, or 3 months, 9 months? How should this cycle be determined? What if today I'm repurchasing laundry detergent—does this cycle still apply?
So I now prefer putting things in the front chain, letting large models learn and judge whether this product actually has repurchase attributes, how the cycle should be set, rather than simply setting a rule.
**AI Tech Review**: You mentioned earlier that RecGPT is friendly to new users. So when businesses like flash shopping and food delivery come in, platform daily active users increase a lot with new users—is this pressure for recommendation systems? Can RecGPT just play a role?
**Jiang Yuning**: We certainly hope to see such growth. Whether new users convert well depends largely on how well we recommend, so this is both a big challenge and a big opportunity for us. What we know about these users so far is all non-traditional e-commerce behavior, such as food delivery and milk tea orders, plus some identity and location information.
**AI Tech Review**: What kind of product recommendations are suitable for retaining flash shopping users?
**Jiang Yuning**: Snacks are a very natural conversion category. What takeout users have eaten, what flavors they like—if they love spicy food, the homepage can recommend spicy strips. I recently discovered we often recommend Chongqing rice noodles to Sichuan colleagues, with pretty good results.
**04 If You Only Push Low Prices, You Don't Need Large Language Models**
**AI Tech Review**: We seem to use both manual evaluation and a small model for assessment. What considerations led you to do this?
**Jiang Yuning**: If using a large model for chatbots, one of the most difficult things might be defining standards for good conversations. E-commerce is the same—when large models summarize user profiles, whether these profiles are good, bad, or comprehensive, we actually do extensive manual verification—not annotation, just verification. For example, if my label is "geek," is it reasonable for the large model to recommend a pure titanium water cup? Does such a pure titanium water cup actually exist in the product library? If not, then hallucination occurred.
But manual verification is very costly, so we record the results and train another model to learn from these human judgments.
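One of the checks mentioned above, whether a suggested product actually exists in the product library, can be sketched as a lookup. This is a deliberately simplified illustration with hypothetical data: it uses exact title matching after normalization, whereas a real system would presumably need fuzzier retrieval.

```python
def split_grounded(suggestions, catalog_titles):
    """Separate suggestions found in the catalog from those that are not."""
    known = {title.strip().lower() for title in catalog_titles}
    grounded, missing = [], []
    for suggestion in suggestions:
        if suggestion.strip().lower() in known:
            grounded.append(suggestion)
        else:
            missing.append(suggestion)
    return grounded, missing

# Hypothetical catalog and model output for a user labeled "geek".
catalog = ["Mechanical Keyboard", "Stainless Steel Water Cup"]
model_output = ["mechanical keyboard", "Pure Titanium Water Cup"]

grounded, missing = split_grounded(model_output, catalog)
print(grounded, missing)
```

Anything in `missing` is a candidate hallucination that a human verifier, or a smaller model trained on verified results, would need to look at.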
**AI Tech Review**: Do you think recommendations have data flywheels? If a platform has more data, more supply, more behavioral dimensions, will such platforms have more opportunities in the future AI era?
**Jiang Yuning**: Of course, without doubt. Doing AI definitely requires sufficient cash flow, good business cycles, and good data cycles—all indispensable.
**AI Tech Review**: Do we involve multimodal content? Will future large models learn human aesthetics for recommendations?
**Jiang Yuning**: Aesthetics are still decided by users; platforms just do the matching. But our next step is to create a new multimodal-based ID system. The number of product IDs will be greatly reduced, because IDs will no longer depend on the original one-product-one-link-one-ID system. Even if a product changes its product link, its semantic ID won't change.
**AI Tech Review**: This sounds like a very significant change.
**Jiang Yuning**: It will change many merchants' operational habits. Previously, many merchants liked "nurturing links," hanging different products on one link to inherit traffic, causing some "goods not matching descriptions" situations, but this will have no benefits in the future.
**AI Tech Review**: What differences do you think different platform ecosystems have in their needs for recommendation strategies and technologies?
**Jiang Yuning**: Of course there are differences. For example, if you're doing an extreme low-price strategy, then you're creating price competition within the same products, only showing the lowest-priced product for each ID. So the entire recommendation architecture necessarily serves this business strategy.
Our RecGPT essentially serves "Omnipotent Taobao." We neither aggressively push top bestsellers nor focus on low-price products, so we need algorithm designs like RecGPT. Actually, product richness is a big advantage for Taobao's AI development—our recommendation system can answer more questions than others.
Honestly, if you're just pushing one low-price item in the same category, you don't actually need AI to recommend.
**AI Tech Review**: If we're entering the era of AI large model recommendations, what suggestions do you have for merchants?
**Jiang Yuning**: Study platform policies and rules more, do more creative work. You know, after the system went online, which category grew fastest? Toys. New, novel, special products with higher emotional value will definitely get better traffic returns.
**05 Recommendation Systems Are Still Far from "Shocking"**
**AI Tech Review**: Have you considered that if users feel the system understands them too well, they might feel offended?
**Jiang Yuning**: At current technical levels, such situations aren't common. I encountered a coincidence recently: I was eating some pork floss a colleague had bought while browsing Taobao, and suddenly found it recommending exactly the same product, which startled me. Later I looked into this case specifically, and it was just a simple coincidence. The pork floss merchant was advertising heavily, so my colleague had been influenced by the ads, and what I encountered was simply the merchant's advertisement.
If such "precise" situations really appear in the future, I really don't know whether there would be more surprises or more shocks. But overall, recommendation systems are still very far from making people feel "monitored." Currently, the most complained-about issue is still "information cocoons."
**AI Tech Review**: But suppose I buy diapers and the platform immediately knows I'm having a baby, I might feel offended.
**Jiang Yuning**: Take the "recommendation reasons" we recently added below products on the "Guess What You Like" homepage. Internally there is clear review and risk control: any evaluation involving a user's age, height, or appearance won't appear.
Actually, privacy isn't just the users' concern; it's also a big risk for platforms. One example is mistakenly recommending sensitive products to culturally sensitive populations at sensitive times. Before, we could only write hard rules, and the system itself had no real understanding. With large models, such risks will be easier to avoid.
**06 Future: Let Large Models Be "Recommendation Commanders"**
**AI Tech Review**: In your view, what's the future direction of recommendation system technology evolution?
**Jiang Yuning**: Three paths. The first I call "plugin-style," which is RecGPT's current approach: using large models to transform and enhance every stage of the existing recommendation system, drawing on their reasoning and long-cycle memory capabilities to enrich what the system can do.
The second path is to let a large model be the recommendation system's "commander," building a brain that controls every stage. Recommendation systems still have a great many stages, and if each stage is optimized and iterated separately, the system as a whole becomes inconsistent. With a commander, I can deploy different strategies at different times. For example, during Double 11 we maximize transaction efficiency, so every recommendation stage targets transactions; in regular periods we focus mainly on seeding, so every stage adjusts and aligns to seeding goals. This large-model brain schedules the entire recommendation system through hyperparameters, improving consistency.
The third path is what we call "end-to-end": reducing intermediate stages while applying scaling laws to scoring models. If we believe scale can create miracles, then since it succeeded on NLP and CV problems, it might also succeed in recommendation systems.
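The "commander" idea in the second path can be sketched in a few lines. This is only a toy illustration under stated assumptions, not Taobao's design: the stage names, the `StageConfig` fields, the period labels, and the weight values are all hypothetical, and a plain function stands in for the large-model brain. The point it shows is that a single controller hands every stage the same objective weights, so the stages cannot drift apart.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stage names; the interview does not list the actual stages.
STAGES = ("recall", "pre_ranking", "ranking", "re_ranking")


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters the commander hands to a stage (illustrative only)."""
    transaction_weight: float
    seeding_weight: float


def commander_plan(period: str) -> Dict[str, StageConfig]:
    """Stand-in for the large-model brain: choose one objective per period
    and give every stage the same configuration."""
    if period == "double_11":
        shared = StageConfig(transaction_weight=0.9, seeding_weight=0.1)
    else:
        shared = StageConfig(transaction_weight=0.3, seeding_weight=0.7)
    return {stage: shared for stage in STAGES}


def stage_score(config: StageConfig, p_purchase: float, p_interest: float) -> float:
    """Blend a purchase estimate and an interest estimate with the shared weights."""
    return config.transaction_weight * p_purchase + config.seeding_weight * p_interest
```

Calling `commander_plan("double_11")` returns identical, transaction-heavy weights for all four stages, while any other period label returns seeding-heavy weights. The contrast with per-stage tuning is that no stage chooses its own objective.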
**AI Tech Review**: Do you think these are different evolutionary stages? From 1 to 2, then to 3?
**Jiang Yuning**: In terms of implementation difficulty, yes.
**AI Tech Review**: Actually, everyone is still quite obsessed with the third path.
**Jiang Yuning**: Ten years ago, recommendation systems were just simple regression models. Deep learning first proved itself on CV and NLP problems before it was applied to recommendation, where deep recommendation models like DIN became mainstream. Why do people believe in a single end-to-end model? Because the situation looks very similar to ten years ago: once again there is a new model structure, larger than before (deep learning was also much larger than logistic regression), that has already proved successful on CV and NLP problems. So it's easy to fall into a kind of cognitive inertia.
I have never denied that this direction is possible, but the nature of the recommendation task differs greatly from natural language. For this task, how far must recommendation models scale up before they undergo a qualitative change, and what cost must be paid to reach that inflection point? We need to calculate the ROI.
**AI Tech Review**: Does this relate to current large models' intelligence ceiling?
**Jiang Yuning**: Applying scaling laws in recommendation systems has nothing to do with that, because it doesn't really use large models' intelligence. It just makes scoring models dozens or even hundreds of times larger in parameter count. Such a model has no world knowledge; it is just a stronger examiner.
**AI Tech Review**: What stage do you think Taobao is at now?
**Jiang Yuning**: Between 1 and 2. Next we'll move toward direction 2, and I believe the approach of large models as commanders will be realized quickly. Meanwhile, some people will explore the end-to-end work of direction 3.
**AI Tech Review**: You were employee number five at Megvii and have lived through more than ten years of AI, from computer vision to large language models. What do you think is the most valuable experience from your past for you now?
**Jiang Yuning**: AI must create business value. It has to find business scenarios with a positive business cycle in order to take root and flourish there.