Establishing Standards for AI in Credit: Industry and Academia Discuss First Multimodal Benchmark

Feb 06

On February 5, Qifu Technology hosted a live industry-academia discussion on the theme "How to Set Standards for Multimodal AI in Credit." The discussion centered on FCMBench-V1.0, the first multimodal evaluation benchmark for credit scenarios, which Qifu Technology recently released in collaboration with researchers from Fudan University and South China University of Technology. The benchmark is derived from real-world credit business scenarios, and its evaluation tasks cover key areas such as multimodal perception, reasoning, and decision-making. The accompanying open-source dataset and evaluation tools aim to establish a widely recognized "measuring stick" for financial AI.

In this dialogue, three guests from industry and academia highlighted the same issue from different perspectives: without unified standards, financial AI is difficult to put into practice. Dr. Yang Yehui, Head of Multimodal AI at Qifu Technology, began with industrial practice. He used the metaphor of "the hoe and the land" to describe the relationship between AI and application scenarios: AI is the tool, while high-barrier industries such as finance and healthcare are the sufficiently "fertile" land. Because financial services place inherently high demands on privacy, security, and compliance, a model's reliability cannot be established from self-reported results alone.

"Prioritizing evaluation is essentially about creating a standard measure," Dr. Yang pointed out. He noted that financial institutions often face confusion when selecting models and solutions, encountering situations where "different models claim scores of 95 and 98 respectively, but which one is actually better?" Without a unified, fair, and open evaluation system, decision-making can easily lose focus. The value of FCMBench lies in bringing models to the same starting line, allowing their capabilities to be tested under real business conditions.

To this end, FCMBench emphasizes "practical applicability" in its design. It reconstructs the data system under compliance prerequisites, maps tasks to real business processes, and simulates more than ten types of real-world interference, such as lighting, angles, and reflections, so that the evaluation directly addresses the most challenging reasoning problems in financial risk control. For instance, identifying contradictions between occupational information and anomalous transaction records is a key test of a large model's financial reasoning ability. Dr. Yang acknowledged that building an evaluation benchmark does not pay off in the short term, but said that in the long run, industry consensus and open-source collaboration will ultimately benefit the business itself.

Professor Xu Yanwu from South China University of Technology provided another reference point for the development of financial AI based on cross-industry experience. He pointed out that the common intuition that AI's "presence is not strongly felt" in finance is actually inaccurate. AI is already deeply involved in areas like insurance pricing, asset valuation, and quantitative trading; however, this value is not directly visible in consumer-facing products, hence it remains "unseen."

Comparing this with the decade-long research, development, and approval cycles in medical AI, Professor Xu suggested that the shorter business iteration cycles in the financial industry provide a more practical foundation for model evaluation and updates. He described dataset development in three stages: first, solidifying data quality; second, building influence through academic work and competitions; and finally, gaining official recognition at the industry level and becoming a "gatekeeping standard" similar to TOEFL or IELTS. In his view, FCMBench is at a starting point with significant potential.

From a broader perspective, Professor Chen Tao from Fudan University brought the discussion back to the history of AI development itself. He noted that the true watershed moment for deep learning was not just algorithmic breakthroughs, but the emergence of ImageNet, which for the first time enabled an order-of-magnitude leap in evaluation scale, ending the era of disparate claims based on small datasets.

"Financial AI is currently at a similar stage," Professor Chen emphasized. He stated that, judging by data scale, task coverage, and the systematic nature of its evaluation design, FCMBench is already the largest and most authoritative unified evaluation benchmark in the domestic financial sector, and even in international financial AI research. Furthermore, it is designed not to serve a single institution but to build industry consensus and define the boundaries of truly valuable problems. In Professor Chen's view, a good dataset itself defines "good questions." Crucially, financial AI should not stop at pre-training and fine-tuning general-purpose models; it should build an intrinsic financial chain of thought that lets models naturally understand interest rates, rules, and risks, and thereby reason safely and reliably. This is a problem that academia and industry need to solve together.

In the concluding segment, the host stated that Qifu Technology has taken a crucial first step. However, for financial AI to truly move toward large-scale, standardized development, continuous collaboration among industry, academia, and research institutions remains essential. The host also extended an invitation during the live stream, encouraging more partners to participate in dataset testing, evaluation, and competitions, so that this potential "ImageNet for Finance" can be continuously calibrated through collaboration and truly take shape through consensus.

