

PaddlePaddle/PaddleOCR-VL-0.9B

$0.14 in / $0.80 out
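The listing gives two prices, $0.14 for input ("in") and $0.80 for output ("out"), but does not state the billing unit. As a rough sketch, assuming both are USD per one million tokens (a common convention for hosted models, not confirmed by this page), the cost of a request could be estimated as follows. The function name and constants are hypothetical and exist only for illustration.

```python
from decimal import Decimal

# Prices copied from the listing. The unit (USD per 1,000,000 tokens)
# is an ASSUMPTION; the page itself does not state it.
PRICE_IN = Decimal("0.14")
PRICE_OUT = Decimal("0.80")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = Decimal(input_tokens) * PRICE_IN + Decimal(output_tokens) * PRICE_OUT
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Example: a page image encoded as 2,000 input tokens that yields
    # 500 output tokens of parsed text (both counts are made up).
    print(estimate_cost(2_000, 500))
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding drift when summed over many documents.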

PaddleOCR-VL is a state-of-the-art (SOTA), resource-efficient model for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model for accurate element recognition. The model supports 109 languages and recognizes complex elements such as text, tables, formulas, and charts while keeping resource consumption low. In evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, is competitive with top-tier VLMs, and delivers fast inference, which makes it well suited to practical deployment.


6e98c1c058cf17da3130bb8f8330a6b91ed520bf

2025-10-17T21:19:37+00:00