Baidu releases next-generation text recognition AI model PP-OCRv5: only 0.07B parameters, beats GPT-4o on some tests

IT之家

Sep 13, 2025

IT之家, September 13: Baidu released PP-OCRv5, its next-generation text recognition solution, on Hugging Face on September 10.

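According to Baidu, PP-OCRv5 is a specialized OCR model designed to mitigate the limitations of large vision-language models (VLMs), offering an efficient, accurate, and lightweight solution.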

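By retaining a modular, two-stage pipeline built specifically for fast, precise text detection and recognition, PP-OCRv5 addresses the limitations of large VLMs in precise text localization and bounding-box accuracy.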

Highlights of PP-OCRv5:

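Efficiency: with only 0.07B parameters, the model delivers higher performance on CPUs and edge devices; its mobile version processes more than 370 characters per second on an Intel Xeon Gold 6271C CPU.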

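Performance: on OCR-specific benchmarks covering handwritten and printed Chinese and English text as well as Pinyin, PP-OCRv5 outperforms general-purpose VLMs such as Gemini 2.5 Pro, Qwen2.5-VL, and GPT-4o.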

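Localization: PP-OCRv5 is designed to provide precise bounding-box coordinates for text lines, a key requirement for structured data extraction and content analysis.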

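Multilingual support: the model supports five text types (Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin) and can recognize more than 40 languages.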
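To put the efficiency figures in concrete terms, the short Python sketch below converts "0.07B" into a raw parameter count and turns the quoted throughput into a rough processing time. The 4-bytes-per-parameter figure (FP32 weights) and the 1,850-character page are assumptions made purely for illustration; the article states neither the weight precision nor any document size. Because 370 characters per second is quoted as a lower bound, the computed time is an upper bound.

```python
# Figures quoted in the article.
PARAMS_BILLION = 0.07      # model size in billions of parameters
CHARS_PER_SECOND = 370     # lower bound, mobile version on a Xeon Gold 6271C


def param_count(billions: float) -> int:
    """Convert a size given in billions of parameters to a raw count."""
    return round(billions * 1_000_000_000)


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in MB for a given numeric precision.

    bytes_per_param is an assumption (4 for FP32, 2 for FP16, 1 for INT8);
    the article does not say which precision is used.
    """
    return n_params * bytes_per_param / 1_000_000


def seconds_for(chars: int, chars_per_second: float = CHARS_PER_SECOND) -> float:
    """Time to process `chars` characters at the quoted throughput."""
    return chars / chars_per_second


if __name__ == "__main__":
    n = param_count(PARAMS_BILLION)
    print(n)                        # 70000000
    print(weight_megabytes(n, 4))   # 280.0 (MB, assuming FP32)
    print(seconds_for(1850))        # 5.0 (seconds for a hypothetical page)
```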

PP-OCRv5 consists of four core components:

Image preprocessing: corrects image rotation and distortion to standardize the input.

Text detection: locates the precise position of text lines in the image.

Text line orientation: classifies the orientation of each detected text line so that it is correctly aligned for recognition.

Text recognition: decodes the characters in each text line into a text string.
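The way the four components hand data to one another can be sketched in a few lines of Python. This is only an illustration of the modular flow described above, not Baidu's implementation or the PaddleOCR API: every name here (`run_pipeline`, `TextLine`, the `toy_*` stages) is hypothetical, and the "image" is a toy list of dictionaries standing in for real pixel data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max)
Image = List[Dict]                # toy stand-in for pixel data (hypothetical)


@dataclass
class TextLine:
    box: Box      # where the line sits in the image
    angle: int    # orientation class in degrees (0 or 180 in this toy)
    text: str     # decoded string


def run_pipeline(
    image: Image,
    preprocess: Callable[[Image], Image],
    detect: Callable[[Image], List[Box]],
    classify_orientation: Callable[[Image, Box], int],
    recognize: Callable[[Image, Box, int], str],
) -> List[TextLine]:
    """Chain the four stages: preprocess, detect, orient, recognize."""
    normalized = preprocess(image)
    results = []
    for box in detect(normalized):
        angle = classify_orientation(normalized, box)
        text = recognize(normalized, box, angle)
        results.append(TextLine(box, angle, text))
    return results


# Toy stages below; a real system would run a model at each step.
def _region(image: Image, box: Box) -> Dict:
    return next(r for r in image if r["box"] == box)


def toy_preprocess(image: Image) -> Image:
    return list(image)  # nothing to correct in the toy input


def toy_detect(image: Image) -> List[Box]:
    return [r["box"] for r in image]


def toy_classify_orientation(image: Image, box: Box) -> int:
    return 180 if _region(image, box)["flipped"] else 0


def toy_recognize(image: Image, box: Box, angle: int) -> str:
    glyphs = _region(image, box)["glyphs"]
    # A flipped line is read back-to-front once its orientation is known.
    return glyphs[::-1] if angle == 180 else glyphs


def demo() -> List[TextLine]:
    fake_image = [
        {"box": (10, 10, 200, 40), "flipped": False, "glyphs": "PP-OCRv5"},
        {"box": (10, 60, 200, 90), "flipped": True, "glyphs": "olleh"},
    ]
    return run_pipeline(fake_image, toy_preprocess, toy_detect,
                        toy_classify_orientation, toy_recognize)


if __name__ == "__main__":
    for line in demo():
        print(line.box, line.angle, line.text)
```

Each result carries its bounding box alongside the decoded text, which mirrors the article's point that a staged pipeline yields explicit line coordinates as well as the recognized string.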


