Mistral OCR 3

Original link: https://mistral.ai/news/mistral-ocr-3

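## Mistral OCR 3: A Breakthrough in Document Understanding

Mistral AI has released OCR 3, a new optical character recognition model that achieves state-of-the-art accuracy across a wide range of documents, including forms, scanned files, complex tables, and handwriting, with **a 74% win rate over Mistral OCR 2**. It extracts both text *and* images, and reconstructs table structure with HTML tags so that layout is preserved accurately.

Unlike specialized OCR solutions, Mistral OCR 3 handles many document types and performs better on challenging inputs such as low-quality scans and handwriting. It is much smaller than competing models and competitively priced at **$2 per 1,000 pages (or $1 with the Batch-API discount)**.

Developers can integrate it via the API, while users can try the new **Document AI Playground**, a drag-and-drop interface that parses PDFs and images into text or structured JSON. Early adopters are using it for invoice processing, archive digitization, and improved enterprise search. Industry analysts stress the key role of OCR in enabling generative and agentic AI, and position Mistral OCR 3 as a key tool for unlocking the value of data.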

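## Mistral OCR 3 Discussion Summary

A Hacker News discussion centers on Mistral AI's new OCR model, Mistral OCR 3. Pricing is simple ($1 per 1,000 pages with the batch discount), but performance is disputed. Some users find it weaker than open-source alternatives such as PaddleOCR, MinerU, and Chandra, especially on challenging documents like 18th-century Portuguese birth registers. Others report that combining PaddleOCR with Gemini 3 reaches near-human accuracy on financial statements.

One key point is that Mistral compares itself against non-VLM (vision-language model) computer vision services, which provide precise bounding boxes but may lag VLM-based extraction in "document understanding".

Several users point out the lack of a comprehensive OCR leaderboard and the difficulty of comparing prices when some systems bill by token. One user shared a project, [ocrarena.ai](https://www.ocrarena.ai/leaderboard), that offers direct comparisons between models. The conversation also covers the challenges of installing and running open-source OCR tools, and the need for in-place OCR translation of foreign-language art books. Finally, the founder of Chandra OCR (Datalab) offered help with setup.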

Original text

Highlights

  • Breakthrough performance: 74% overall win rate over Mistral OCR 2 on forms, scanned documents, complex tables, and handwriting.

  • State-of-the-art accuracy, outperforming both enterprise document processing solutions and AI-native OCR solutions

  • Now powers Document AI Playground in Mistral AI Studio, a simple drag-and-drop interface for parsing PDFs/images into clean text or structured JSON

  • Major upgrade over Mistral OCR 2 in forms, handwritten content, low-quality scans, and tables

Overview

Mistral OCR 3 is designed to extract text and embedded images from a wide range of documents with exceptional fidelity. It supports markdown output enriched with HTML-based table reconstruction, enabling downstream systems to understand not just document content, but also structure. As a much smaller model than most competitive solutions, it is available at an industry-leading price of $2 per 1,000 pages, with a 50% Batch-API discount, reducing the cost to $1 per 1,000 pages.
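The pricing arithmetic above can be sketched in a few lines. This is only an illustration: the function name `ocr_cost_usd` is hypothetical, and it assumes cost scales linearly with page count, which the announcement does not spell out.

```python
# Hypothetical helper: estimate OCR cost from the prices quoted in the article.
# Assumption: billing is strictly linear in the number of pages.

PRICE_PER_1000_PAGES_USD = 2.00   # standard price quoted in the article
BATCH_DISCOUNT = 0.50             # 50% Batch-API discount quoted in the article


def ocr_cost_usd(pages: int, batch: bool = False) -> float:
    """Return the estimated cost in USD for processing `pages` pages."""
    price = PRICE_PER_1000_PAGES_USD
    if batch:
        price *= 1 - BATCH_DISCOUNT  # $2.00 becomes $1.00 per 1,000 pages
    return pages / 1000 * price


standard = ocr_cost_usd(250_000)             # 250 blocks of 1,000 pages at $2
batched = ocr_cost_usd(250_000, batch=True)  # same volume at $1 per 1,000
```

Under these assumptions, a 250,000-page archive would cost about $500 at the standard rate and about $250 through the Batch API.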

Developers can integrate the model (mistral-ocr-2512) via API, and users can leverage Document AI, a UI that parses documents into text or structured JSON instantly.
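As a rough sketch of what an API integration might look like, the snippet below builds a request for the model name given above and posts it with the Python standard library. The endpoint path, the request field names, and the `pages`/`markdown` response shape are assumptions that are not stated in this article and should be checked against mistral.ai/docs; `build_ocr_request` and `run_ocr` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; verify against the official documentation before use.
OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"


def build_ocr_request(document_url: str, model: str = "mistral-ocr-2512") -> dict:
    """Build the JSON body for an OCR call (field names are assumptions)."""
    return {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
        "include_image_base64": True,  # ask for embedded images as well as text
    }


def run_ocr(document_url: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response."""
    body = json.dumps(build_ocr_request(document_url)).encode("utf-8")
    request = urllib.request.Request(
        OCR_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would pass a document URL and an API key to `run_ocr` and then, if the response has the assumed shape, read the markdown of each entry in `result["pages"]`.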

TABLE 21. Doctoral degrees awarded to men, by major field group: 1966-2012

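| Academic year ending | All fields | S&E: Total | S&E: Biological and agricultural sciences | S&E: Earth, atmospheric, and ocean sciences | S&E: Mathematics and computer sciences | S&E: Physical sciences | S&E: Psychology | S&E: Social sciences | S&E: Engineering | Non-S&E fields |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1966 | 15,863 | 10,646 | 2,386 | 392 | 722 | 2,535 | 894 | 1,424 | 2,293 | 5,217 |
| 1967 | 17,961 | 12,013 | 2,565 | 412 | 782 | 2,930 | 1,030 | 1,699 | 2,595 | 5,948 |
| 1968 | 20,005 | 13,328 | 3,028 | 432 | 924 | 3,064 | 1,131 | 1,906 | 2,843 | 6,677 |
| 1969 | 22,355 | 14,781 | 3,278 | 487 | 1,013 | 3,241 | 1,350 | 2,156 | 3,256 | 7,574 |
| 1970 | 25,527 | 16,404 | 3,627 | 493 | 1,148 | 3,666 | 1,446 | 2,604 | 3,420 | 9,123 |
| 1971 | 27,271 | 17,385 | 3,897 | 538 | 1,142 | 3,718 | 1,615 | 2,992 | 3,483 | 9,886 |
| 1972 | 27,754 | 17,191 | 3,802 | 561 | 1,185 | 3,404 | 1,670 | 3,088 | 3,481 | 10,563 |
| 1973 | 27,670 | 16,853 | 3,764 | 557 | 1,113 | 3,209 | 1,741 | 3,151 | 3,318 | 10,817 |
| 1974 | 26,594 | 16,043 | 3,571 | 547 | 1,096 | 2,902 | 1,797 | 3,016 | 3,114 | 10,551 |
| 1975 | 25,751 | 15,870 | 3,623 | 535 | 1,038 | 2,811 | 1,878 | 3,035 | 2,950 | 9,881 |
| 1976 | 25,262 | 15,375 | 3,559 | 531 | 890 | 2,617 | 1,937 | 3,061 | 2,780 | 9,887 |
| 1977 | 23,858 | 14,775 | 3,470 | 588 | 837 | 2,477 | 1,902 | 2,932 | 2,569 | 9,083 |
| 1978 | 22,553 | 14,199 | 3,449 | 524 | 828 | 2,364 | 1,928 | 2,736 | 2,370 | 8,354 |
| 1979 | 22,301 | 14,128 | 3,516 | 542 | 833 | 2,381 | 1,831 | 2,596 | 2,429 | 8,173 |
| 1980 | 21,612 | 13,814 | 3,599 | 530 | 846 | 2,199 | 1,787 | 2,464 | 2,389 | 7,798 |
| 1981 | 21,463 | 14,056 | 3,607 | 484 | 822 | 2,318 | 1,885 | 2,511 | 2,429 | 7,407 |
| 1982 | 21,016 | 13,924 | 3,594 | 512 | 824 | 2,337 | 1,720 | 2,415 | 2,522 | 7,092 |
| 1983 | 20,748 | 13,920 | 3,429 | 501 | 838 | 2,430 | 1,750 | 2,315 | 2,657 | 6,828 |
| 1984 | 20,636 | 13,954 | 3,565 | 472 | 841 | 2,446 | 1,625 | 2,244 | 2,761 | 6,682 |
| 1985 | 20,552 | 14,043 | 3,530 | 470 | 859 | 2,452 | 1,576 | 2,188 | 2,968 | 6,509 |
| 1986 | 20,592 | 14,268 | 3,378 | 462 | 959 | 2,585 | 1,527 | 2,207 | 3,150 | 6,324 |
| 1987 | 20,934 | 14,580 | 3,307 | 491 | 999 | 2,686 | 1,474 | 2,153 | 3,470 | 6,354 |
| 1988 | 21,677 | 15,267 | 3,477 | 541 | 1,087 | 2,759 | 1,392 | 2,111 | 3,900 | 6,410 |
| 1989 | 21,811 | 15,622 | 3,481 | 542 | 1,209 | 2,627 | 1,408 | 2,188 | 4,167 | 6,189 |
| 1990 | 22,960 | 16,498 | 3,680 | 581 | 1,329 | 2,840 | 1,368 | 2,221 | 4,479 | 6,462 |
| 1991 | 23,521 | 16,982 | 3,743 | 626 | 1,514 | 2,919 | 1,249 | 2,229 | 4,702 | 6,539 |
| 1992 | 24,235 | 17,423 | 3,816 | 585 | 1,588 | 2,961 | 1,327 | 2,285 | 4,861 | 6,812 |
| 1993 | 24,387 | 17,571 | 3,800 | 566 | 1,602 | 2,868 | 1,322 | 2,315 | 5,098 | 6,816 |
| 1994 | 25,061 | 18,167 | 3,940 | 622 | 1,641 | 3,104 | 1,273 | 2,435 | 5,152 | 6,894 |
| 1995 | 25,162 | 18,119 | 3,989 | 577 | 1,727 | 2,922 | 1,245 | 2,388 | 5,271 | 7,043 |
| 1996 | 25,293 | 18,461 | 4,101 | 565 | 1,656 | 2,961 | 1,163 | 2,523 | 5,492 | 6,832 |
| 1997 | 24,944 | 18,084 | 4,046 | 608 | 1,594 | 2,880 | 1,162 | 2,478 | 5,316 | 6,860 |
| 1998 | 24,630 | 17,810 | 4,075 | 564 | 1,638 | 2,865 | 1,205 | 2,352 | 5,111 | 6,820 |
| 1999 | 23,439 | 16,735 | 3,919 | 535 | 1,495 | 2,722 | 1,209 | 2,350 | 4,505 | 6,704 |
| 2000 | 23,166 | 16,518 | 3,943 | 495 | 1,507 | 2,546 | 1,203 | 2,365 | 4,459 | 6,648 |
| 2001 | 22,782 | 16,189 | 3,766 | 461 | 1,407 | 2,531 | 1,128 | 2,324 | 4,572 | 6,593 |
| 2002 | 21,812 | 15,392 | 3,830 | 477 | 1,291 | 2,335 | 1,065 | 2,217 | 4,177 | 6,420 |
| 2003 | 22,257 | 15,761 | 3,777 | 470 | 1,419 | 2,396 | 1,043 | 2,286 | 4,370 | 6,496 |
| 2004 | 22,965 | 16,417 | 3,830 | 448 | 1,520 | 2,470 | 1,081 | 2,313 | 4,755 | 6,548 |
| 2005 | 23,736 | 17,405 | 3,913 | 470 | 1,782 | 2,670 | 1,058 | 2,285 | 5,227 | 6,331 |
| 2006 | 25,023 | 18,375 | 3,998 | 490 | 2,074 | 2,831 | 936 | 2,317 | 5,729 | 6,648 |
| 2007 | 26,203 | 19,542 | 4,388 | 542 | 2,308 | 2,905 | 941 | 2,317 | 6,141 | 6,661 |
| 2008 | 26,272 | 19,857 | 4,489 | 548 | 2,353 | 2,957 | 999 | 2,343 | 6,168 | 6,415 |
| 2009 | 26,332 | 19,840 | 4,495 | 539 | 2,327 | 2,995 | 991 | 2,487 | 6,006 | 6,492 |
| 2010 | 25,527 | 19,570 | 4,355 | 496 | 2,435 | 2,924 | 1,033 | 2,523 | 5,804 | 5,957 |
| 2011 | 26,192 | 20,380 | 4,464 | 522 | 2,494 | 3,167 | 1,003 | 2,527 | 6,203 | 5,812 |
| 2012 | 27,390 | 21,233 | 4,578 | 496 | 2,683 | 3,214 | 1,047 | 2,688 | 6,527 | 6,157 |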

S&E = science and engineering. NOTE: See appendix B for specific fields that are included in each category. SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Earned Doctorates.

Benchmarks

To raise the bar, we introduced more challenging internal benchmarks based on real business use-case examples from customers. We then evaluated several models across the domains highlighted below, comparing their outputs to ground truth using a fuzzy-match metric for accuracy.
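The article does not say which fuzzy-match metric was used. As one plausible stand-in, the sketch below scores an OCR output against ground truth with the similarity ratio from Python's standard `difflib` module after light normalization; the function names and the normalization rule are assumptions made for illustration, not Mistral's method.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so layout noise is ignored."""
    return " ".join(text.lower().split())


def fuzzy_match_score(predicted: str, ground_truth: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means identical after normalization."""
    matcher = SequenceMatcher(None, normalize(predicted), normalize(ground_truth))
    return matcher.ratio()


# A misread character ("O" instead of "0") lowers the score below 1.0.
exact = fuzzy_match_score("Total:   42", "total: 42")
near = fuzzy_match_score("Invoice 1O23", "Invoice 1023")
```

A score like this tolerates small character-level slips while still penalizing them, which is the general idea behind fuzzy matching as opposed to exact string comparison.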

[Figure: OCR multilingual benchmark]

[Figure: OCR 3 benchmark]

Upgrades over previous generations of OCR models

Whereas most OCR solutions today specialize in specific document types, Mistral OCR 3 is designed to excel at processing the vast majority of document types in organizations and everyday settings.

  • Handwriting: Mistral OCR accurately interprets cursive, mixed-content annotations, and handwritten text layered over printed forms.

  • Forms: Improved detection of boxes, labels, handwritten entries, and dense layouts. Works well on invoices, receipts, compliance forms, government documents, and similar paperwork.

  • Scanned & complex documents: Significantly more robust to compression artifacts, skew, distortion, low DPI, and background noise.

  • Complex tables: Reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies. Outputs HTML table tags with colspan/rowspan to fully preserve layout.
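To show why HTML output with colspan/rowspan matters to downstream code, here is a minimal sketch that expands such a table into a rectangular grid using only the Python standard library. The sample markup is hypothetical (it mirrors the header hierarchy of Table 21 above and is not actual model output), and the column-placement logic is simplified: it picks the first free column and does not handle every edge case of the HTML table model.

```python
from html.parser import HTMLParser

# Hypothetical sample mirroring a two-level header; not real OCR output.
SAMPLE_HTML = """
<table>
<tr><th rowspan="2">Year</th><th rowspan="2">All fields</th><th colspan="2">S&amp;E fields</th></tr>
<tr><th>Total</th><th>Engineering</th></tr>
<tr><td>1966</td><td>15,863</td><td>10,646</td><td>2,293</td></tr>
</table>
"""


class TableGridParser(HTMLParser):
    """Copy each cell's text into every grid position its spans cover."""

    def __init__(self):
        super().__init__()
        self.grid = []      # one dict per row, mapping column index to text
        self._row = -1      # index of the row currently being parsed
        self._cell = None   # (column, colspan, rowspan) of the open cell
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            while len(self.grid) <= self._row:
                self.grid.append({})
        elif tag in ("td", "th") and self._row >= 0:
            spans = dict(attrs)
            col = 0
            while col in self.grid[self._row]:  # skip slots filled by rowspans
                col += 1
            self._cell = (col, int(spans.get("colspan") or 1),
                          int(spans.get("rowspan") or 1))
            self._text = []

    def handle_data(self, data):
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            col, colspan, rowspan = self._cell
            text = "".join(self._text).strip()
            for r in range(self._row, self._row + rowspan):
                while len(self.grid) <= r:
                    self.grid.append({})
                for c in range(col, col + colspan):
                    self.grid[r][c] = text
            self._cell = None


def table_to_grid(html: str) -> list[list[str]]:
    """Parse one HTML table and return it as a list of equal-length rows."""
    parser = TableGridParser()
    parser.feed(html)
    width = max((max(row) + 1 for row in parser.grid if row), default=0)
    return [[row.get(c, "") for c in range(width)] for row in parser.grid]


grid = table_to_grid(SAMPLE_HTML)
```

Repeating the spanning header text in each covered position is one design choice among several; it keeps every data column traceable to its full header path ("S&E fields" then "Total") without a separate tree structure.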

Mistral OCR 3 is a significant upgrade across all languages and document form factors compared to Mistral OCR 2. 

[Figure: Win rates, Mistral OCR 3 vs. Mistral OCR 2]

Recommended use cases and applications

Mistral OCR 3 is ideal for both high-volume enterprise pipelines and interactive document workflows. Developers can use it for:

  • Extracting text and images into markdown for downstream agents and knowledge systems

  • Automated parsing of forms, invoices, and operational documents

  • End-to-end document understanding pipelines

  • Digitization of handwritten or historical documents

  • Any other document-to-knowledge transformation applications

Our early customers are using Mistral OCR 3 to process invoices into structured fields, digitize company archives, extract clean text from technical and scientific reports, and improve enterprise search. 

“OCR remains foundational for enabling generative AI and agentic AI,” said Tim Law, IDC Director of Research for AI and Automation. “Those organizations that can efficiently and cost-effectively extract text and embedded images with high fidelity will unlock value and will gain a competitive advantage from their data by providing richer context.”

Available today 

Access the model through the API or via the new Document AI Playground interface, both in Mistral AI Studio. Mistral OCR 3 is fully backward compatible with Mistral OCR 2. For more details, head over to mistral.ai/docs.
