diff --git a/README.md b/README.md
index b1ab46b..bc96110 100755
--- a/README.md
+++ b/README.md
@@ -5,11 +5,10 @@
 <p>
 
 <h1 align="center">
-dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
+dots.ocr
 </h1>
 
-[![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
-[![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr)
+[![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr-1.5)
 [![Arxiv](https://img.shields.io/badge/arXiv-Paper-B31B1B.svg?logo=arxiv)](https://arxiv.org/abs/2512.02498)
 
 
@@ -26,974 +25,533 @@ dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 
 ## Introduction
 
-**dots.ocr** is a powerful, multilingual document parser that unifies layout detection and content recognition within a single vision-language model while maintaining good reading order. Despite its compact 1.7B-parameter LLM foundation, it achieves state-of-the-art(SOTA) performance.
-
-1. **Powerful Performance:** **dots.ocr** achieves SOTA performance for text, tables, and reading order on [OmniDocBench](https://github.com/opendatalab/OmniDocBench), while delivering formula recognition results comparable to much larger models like Doubao-1.5 and gemini2.5-pro.
-2. **Multilingual Support:** **dots.ocr** demonstrates robust parsing capabilities for low-resource languages, achieving decisive advantages across both layout detection and content recognition on our in-house multilingual documents benchmark.
-3. **Unified and Simple Architecture:** By leveraging a single vision-language model, **dots.ocr** offers a significantly more streamlined architecture than conventional methods that rely on complex, multi-model pipelines. Switching between tasks is accomplished simply by altering the input prompt, proving that a VLM can achieve competitive detection results compared to traditional detection models like DocLayout-YOLO.
-4.  **Efficient and Fast Performance:** Built upon a compact 1.7B LLM, **dots.ocr** provides faster inference speeds than many other high-performing models based on larger foundations.
-
-
-### Performance Comparison: dots.ocr vs. Competing Models
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/chart.png" border="0" />
-
-> **Notes:** 
-> - The EN, ZH metrics are the end2end evaluation results of [OmniDocBench](https://github.com/opendatalab/OmniDocBench), and Multilingual metric is the end2end evaluation results of dots.ocr-bench.
-
+**dots.ocr** is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screens, and spotting scene text.
 
 ## News 
-* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), foundation VLM focus on OCR tasks, also the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
-* ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), — a multilingual documents parsing model based on 1.7b llm, with SOTA performance.
+* ```2026.02.16 ``` 🚀 We release [dots.ocr-1.5](https://huggingface.co/rednote-hilab/dots.ocr-1.5), which aims to recognize any human script or symbol and covers image parsing as well as document parsing. We are simultaneously releasing [dots.ocr-1.5-svg](https://huggingface.co/rednote-hilab/dots.ocr-1.5-svg), which performs more robustly on image parsing.
+* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr). Try it out!
+* ```2025.07.30 ``` 🚀 We release [dots.ocr](https://huggingface.co/rednote-hilab/dots.ocr), a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.
 
 
 
-## Benchmark Results
 
-### 1. OmniDocBench
+## Evaluation
 
-#### The end-to-end evaluation results of different tasks.
+### 1. Document Parsing
+
+#### 1.1 Elo scores of recent models across benchmarks
+
+<table style="border-collapse: collapse; width: 100%; font-family: Arial, sans-serif;">
+  <thead>
+    <tr style="background-color: #f2f2f2; text-align: left;">
+      <th style="border: 1px solid #ddd; padding: 8px;">models</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">olmOCR-Bench</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">OmniDocBench (v1.5)</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">XDocParse</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">GLM-OCR</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">859.9</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">937.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">742.1</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">PaddleOCR-VL-1.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">873.6</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">965.6</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">797.6</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">HunyuanOCR</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">978.9</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">974.4</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">895.9</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">dots.ocr</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1027.4</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">994.7</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1133.4</td>
+    </tr>
+    <!-- Highlighting dots.ocr-1.5 row -->
+    <tr style="background-color: #e6f7ff; font-weight: bold;">
+      <td style="border: 1px solid #ddd; padding: 8px;">dots.ocr-1.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1089.0</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1025.8</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1157.1</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">Gemini 3 Pro</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1171.2</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1102.1</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1273.9</td>
+    </tr>
+
+  </tbody>
+</table>
+
+> **Notes:** 
+> - Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
+> - The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: [Elo Score Prompt](https://github.com/rednote-hilab/dots.ocr/blob/master/tools/elo_score_prompt.py). These results are consistent with the findings on [ocrarena](https://www.ocrarena.ai/battle).
+
+
+#### 1.2 olmOCR-bench
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<style>
+    table {
+        width: 100%;
+        border-collapse: collapse;
+        font-family: Arial, sans-serif;
+        font-size: 14px;
+        color: #333;
+    }
+    th, td {
+        border: 1px solid #e0e0e0;
+        padding: 12px 8px;
+        text-align: left;
+    }
+    th {
+        background-color: #fafafa;
+        font-weight: normal;
+        vertical-align: top;
+        line-height: 1.4;
+    }
+    tr:nth-child(even) {
+        background-color: #ffffff;
+    }
+    tr:hover {
+        background-color: #f5f5f5;
+    }
+    .bold-row {
+        font-weight: bold;
+    }
+</style>
+</head>
+<body>
 
 <table>
-<thead>
-<tr>
-<th rowspan="2"><strong>Model<br>Type</strong></th>
-<th rowspan="2"><strong>Methods</strong></th>
-<th colspan="2"><strong>Overall<sup>Edit</sup>↓</strong></th>
-<th colspan="2"><strong>Text<sup>Edit</sup>↓</strong></th>
-<th colspan="2"><strong>Formula<sup>Edit</sup>↓</strong></th>
-<th colspan="2"><strong>Table<sup>TEDS</sup>↑</strong></th>
-<th colspan="2"><strong>Table<sup>Edit</sup>↓</strong></th>
-<th colspan="2"><strong>Read Order<sup>Edit</sup>↓</strong></th>
-</tr>
-<tr>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-<th><em>EN</em></th>
-<th><em>ZH</em></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td rowspan="8"><strong>Pipeline<br>Tools</strong></td>
-<td>MinerU</td>
-<td>0.150</td>
-<td>0.357</td>
-<td>0.061</td>
-<td>0.215</td>
-<td>0.278</td>
-<td>0.577</td>
-<td>78.6</td>
-<td>62.1</td>
-<td>0.180</td>
-<td>0.344</td>
-<td>0.079</td>
-<td>0.292</td>
-</tr>
-<tr>
-<td>Marker</td>
-<td>0.336</td>
-<td>0.556</td>
-<td>0.080</td>
-<td>0.315</td>
-<td>0.530</td>
-<td>0.883</td>
-<td>67.6</td>
-<td>49.2</td>
-<td>0.619</td>
-<td>0.685</td>
-<td>0.114</td>
-<td>0.340</td>
-</tr>
-<tr>
-<td>Mathpix</td>
-<td>0.191</td>
-<td>0.365</td>
-<td>0.105</td>
-<td>0.384</td>
-<td>0.306</td>
-<td>0.454</td>
-<td>77.0</td>
-<td>67.1</td>
-<td>0.243</td>
-<td>0.320</td>
-<td>0.108</td>
-<td>0.304</td>
-</tr>
-<tr>
-<td>Docling</td>
-<td>0.589</td>
-<td>0.909</td>
-<td>0.416</td>
-<td>0.987</td>
-<td>0.999</td>
-<td>1</td>
-<td>61.3</td>
-<td>25.0</td>
-<td>0.627</td>
-<td>0.810</td>
-<td>0.313</td>
-<td>0.837</td>
-</tr>
-<tr>
-<td>Pix2Text</td>
-<td>0.320</td>
-<td>0.528</td>
-<td>0.138</td>
-<td>0.356</td>
-<td>0.276</td>
-<td>0.611</td>
-<td>73.6</td>
-<td>66.2</td>
-<td>0.584</td>
-<td>0.645</td>
-<td>0.281</td>
-<td>0.499</td>
-</tr>
-<tr>
-<td>Unstructured</td>
-<td>0.586</td>
-<td>0.716</td>
-<td>0.198</td>
-<td>0.481</td>
-<td>0.999</td>
-<td>1</td>
-<td>0</td>
-<td>0.06</td>
-<td>1</td>
-<td>0.998</td>
-<td>0.145</td>
-<td>0.387</td>
-</tr>
-<tr>
-<td>OpenParse</td>
-<td>0.646</td>
-<td>0.814</td>
-<td>0.681</td>
-<td>0.974</td>
-<td>0.996</td>
-<td>1</td>
-<td>64.8</td>
-<td>27.5</td>
-<td>0.284</td>
-<td>0.639</td>
-<td>0.595</td>
-<td>0.641</td>
-</tr>
-<tr>
-<td>PPStruct-V3</td>
-<td>0.145</td>
-<td>0.206</td>
-<td>0.058</td>
-<td>0.088</td>
-<td>0.295</td>
-<td>0.535</td>
-<td>-</td>
-<td>-</td>
-<td>0.159</td>
-<td>0.109</td>
-<td>0.069</td>
-<td>0.091</td>
-</tr>
-<tr>
-<td rowspan="9"><strong>Expert<br>VLMs</strong></td>
-<td>GOT-OCR</td>
-<td>0.287</td>
-<td>0.411</td>
-<td>0.189</td>
-<td>0.315</td>
-<td>0.360</td>
-<td>0.528</td>
-<td>53.2</td>
-<td>47.2</td>
-<td>0.459</td>
-<td>0.520</td>
-<td>0.141</td>
-<td>0.280</td>
-</tr>
-<tr>
-<td>Nougat</td>
-<td>0.452</td>
-<td>0.973</td>
-<td>0.365</td>
-<td>0.998</td>
-<td>0.488</td>
-<td>0.941</td>
-<td>39.9</td>
-<td>0</td>
-<td>0.572</td>
-<td>1.000</td>
-<td>0.382</td>
-<td>0.954</td>
-</tr>
-<tr>
-<td>Mistral OCR</td>
-<td>0.268</td>
-<td>0.439</td>
-<td>0.072</td>
-<td>0.325</td>
-<td>0.318</td>
-<td>0.495</td>
-<td>75.8</td>
-<td>63.6</td>
-<td>0.600</td>
-<td>0.650</td>
-<td>0.083</td>
-<td>0.284</td>
-</tr>
-<tr>
-<td>OLMOCR-sglang</td>
-<td>0.326</td>
-<td>0.469</td>
-<td>0.097</td>
-<td>0.293</td>
-<td>0.455</td>
-<td>0.655</td>
-<td>68.1</td>
-<td>61.3</td>
-<td>0.608</td>
-<td>0.652</td>
-<td>0.145</td>
-<td>0.277</td>
-</tr>
-<tr>
-<td>SmolDocling-256M</td>
-<td>0.493</td>
-<td>0.816</td>
-<td>0.262</td>
-<td>0.838</td>
-<td>0.753</td>
-<td>0.997</td>
-<td>44.9</td>
-<td>16.5</td>
-<td>0.729</td>
-<td>0.907</td>
-<td>0.227</td>
-<td>0.522</td>
-</tr>
-<tr>
-<td>Dolphin</td>
-<td>0.206</td>
-<td>0.306</td>
-<td>0.107</td>
-<td>0.197</td>
-<td>0.447</td>
-<td>0.580</td>
-<td>77.3</td>
-<td>67.2</td>
-<td>0.180</td>
-<td>0.285</td>
-<td>0.091</td>
-<td>0.162</td>
-</tr>
-<tr>
-<td>MinerU 2</td>
-<td>0.139</td>
-<td>0.240</td>
-<td>0.047</td>
-<td>0.109</td>
-<td>0.297</td>
-<td>0.536</td>
-<td>82.5</td>
-<td>79.0</td>
-<td>0.141</td>
-<td>0.195</td>
-<td>0.069<</td>
-<td>0.118</td>
-</tr>
-<tr>
-<td>OCRFlux</td>
-<td>0.195</td>
-<td>0.281</td>
-<td>0.064</td>
-<td>0.183</td>
-<td>0.379</td>
-<td>0.613</td>
-<td>71.6</td>
-<td>81.3</td>
-<td>0.253</td>
-<td>0.139</td>
-<td>0.086</td>
-<td>0.187</td>
-</tr>
-<tr>
-<td>MonkeyOCR-pro-3B</td>
-<td>0.138</td>
-<td>0.206</td>
-<td>0.067</td>
-<td>0.107</td>
-<td><strong>0.246</strong></td>
-<td>0.421</td>
-<td>81.5</td>
-<td>87.5</td>
-<td>0.139</td>
-<td>0.111</td>
-<td>0.100</td>
-<td>0.185</td>
-</tr>
-<tr>
+    <thead>
+        <tr>
+            <th></th>
+            <th>ArXiv</th>
+            <th>Old scans math</th>
+            <th>Tables</th>
+            <th>Old scans</th>
+            <th>Headers &amp; footers</th>
+            <th>Multi column</th>
+            <th>Long tiny text</th>
+            <th>Base</th>
+            <th>Overall</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Mistral OCR API</td>
+            <td>77.2</td>
+            <td>67.5</td>
+            <td>60.6</td>
+            <td>29.3</td>
+            <td>93.6</td>
+            <td>71.3</td>
+            <td>77.1</td>
+            <td>99.4</td>
+            <td>72.0±1.1</td>
+        </tr>
+        <tr>
+            <td>Marker 1.10.1</td>
+            <td>83.8</td>
+            <td>66.8</td>
+            <td>72.9</td>
+            <td>33.5</td>
+            <td>86.6</td>
+            <td>80.0</td>
+            <td>85.7</td>
+            <td>99.3</td>
+            <td>76.1±1.1</td>
+        </tr>
+        <tr>
+            <td>MinerU 2.5.4*</td>
+            <td>76.6</td>
+            <td>54.6</td>
+            <td>84.9</td>
+            <td>33.7</td>
+            <td>96.6</td>
+            <td>78.2</td>
+            <td>83.5</td>
+            <td>93.7</td>
+            <td>75.2±1.1</td>
+        </tr>
+        <tr>
+            <td>DeepSeek-OCR</td>
+            <td>77.2</td>
+            <td>73.6</td>
+            <td>80.2</td>
+            <td>33.3</td>
+            <td>96.1</td>
+            <td>66.4</td>
+            <td>79.4</td>
+            <td>99.8</td>
+            <td>75.7±1.0</td>
+        </tr>
+        <tr>
+            <td>Nanonets-OCR2-3B</td>
+            <td>75.4</td>
+            <td>46.1</td>
+            <td>86.8</td>
+            <td>40.9</td>
+            <td>32.1</td>
+            <td>81.9</td>
+            <td>93.0</td>
+            <td>99.6</td>
+            <td>69.5±1.1</td>
+        </tr>
+        <tr>
+            <td>PaddleOCR-VL*</td>
+            <td>85.7</td>
+            <td>71.0</td>
+            <td>84.1</td>
+            <td>37.8</td>
+            <td>97.0</td>
+            <td>79.9</td>
+            <td>85.7</td>
+            <td>98.5</td>
+            <td>80.0±1.0</td>
+        </tr>
+        <tr>
+            <td>Infinity-Parser 7B*</td>
+            <td>84.4</td>
+            <td>83.8</td>
+            <td>85.0</td>
+            <td>47.9</td>
+            <td>88.7</td>
+            <td>84.2</td>
+            <td>86.4</td>
+            <td>99.8</td>
+            <td>82.5±?</td>
+        </tr>
+        <tr>
+            <td>olmOCR v0.4.0</td>
+            <td>83.0</td>
+            <td>82.3</td>
+            <td>84.9</td>
+            <td>47.7</td>
+            <td>96.1</td>
+            <td>83.7</td>
+            <td>81.9</td>
+            <td>99.7</td>
+            <td>82.4±1.1</td>
+        </tr>
+        <tr>
+            <td>Chandra OCR 0.1.0*</td>
+            <td>82.2</td>
+            <td>80.3</td>
+            <td>88.0</td>
+            <td>50.4</td>
+            <td>90.8</td>
+            <td>81.2</td>
+            <td>92.3</td>
+            <td>99.9</td>
+            <td>83.1±0.9</td>
+        </tr>
+        <tr>
+            <td>dots.ocr</td>
+            <td>82.1</td>
+            <td>64.2</td>
+            <td>88.3</td>
+            <td>40.9</td>
+            <td>94.1</td>
+            <td>82.4</td>
+            <td>81.2</td>
+            <td>99.5</td>
+            <td>79.1±1.0</td>
+        </tr>
+        <tr>
+            <td class="bold-row">dots.ocr-1.5</td>
+            <td><strong>85.9</strong></td>
+            <td><strong>85.5</strong></td>
+            <td><strong>90.7</strong></td>
+            <td>48.2</td>
+            <td>94.0</td>
+            <td><strong>85.3</strong></td>
+            <td>81.6</td>
+            <td>99.7</td>
+            <td><strong>83.9±0.9</strong></td>
+        </tr>
+    </tbody>
+</table>
 
-<td rowspan="5"><strong>General<br>VLMs</strong></td>
-<td>GPT4o</td>
-<td>0.233</td>
-<td>0.399</td>
-<td>0.144</td>
-<td>0.409</td>
-<td>0.425</td>
-<td>0.606</td>
-<td>72.0</td>
-<td>62.9</td>
-<td>0.234</td>
-<td>0.329</td>
-<td>0.128</td>
-<td>0.251</td>
-</tr>
+</body>
+</html>
+
+> **Note:**
+> - The metrics are from [olmocr](https://github.com/allenai/olmocr) and our own internal evaluations.
+> - We delete the Page-header and Page-footer cells in the result markdown.
+
+
+#### 1.3 Other Benchmarks
+
+<table>
+  <thead>
     <tr>
-      <td>Qwen2-VL-72B</td>
-      <td>0.252</td>
-      <td>0.327</td>
-      <td>0.096</td>
-      <td>0.218</td>
-      <td>0.404</td>
-      <td>0.487</td>
-      <td>76.8</td>
-      <td>76.4</td>
-      <td>0.387</td>
-      <td>0.408</td>
-      <td>0.119</td>
-      <td>0.193</td>
+      <th>Model Type</th>
+      <th>Methods</th>
+      <th>Size</th>
+      <th>OmniDocBench (v1.5)<br>Text<sup>Edit</sup>↓</th>
+      <th>OmniDocBench (v1.5)<br>Read Order<sup>Edit</sup>↓</th>
+      <th>pdf-parse-bench</th>
+    </tr>
+  </thead>
+  <tbody>
+    <!-- GeneralVLMs Group (Reversed Order, 3 rows) -->
+    <tr>
+      <td rowspan="3"><strong>General VLMs</strong></td>
+      <td>Gemini-2.5 Pro</td>
+      <td>-</td>
+      <td>0.075</td>
+      <td>0.097</td>
+      <td>9.06</td>
     </tr>
     <tr>
-      <td>Qwen2.5-VL-72B</td>
-      <td>0.214</td>
-      <td>0.261</td>
-      <td>0.092</td>
-      <td>0.18</td>
-      <td>0.315</td>
-      <td>0.434</td>
-      <td>82.9</td>
-      <td>83.9</td>
-      <td>0.341</td>
-      <td>0.262</td>
-      <td>0.106</td>
-      <td>0.168</td>
+      <td>Qwen3-VL-235B-A22B-Instruct</td>
+      <td>235B</td>
+      <td>0.069</td>
+      <td>0.068</td>
+      <td><strong>9.71</strong></td>
     </tr>
     <tr>
-      <td>Gemini2.5-Pro</td>
-      <td>0.148</td>
-      <td>0.212</td>
-      <td>0.055</td>
-      <td>0.168</td>
-      <td>0.356</td>
-      <td>0.439</td>
-      <td>85.8</td>
-      <td>86.4</td>
-      <td>0.13</td>
-      <td>0.119</td>
-      <td>0.049</td>
-      <td>0.121</td>
+      <td>Gemini 3 Pro</td>
+      <td>-</td>
+      <td>0.066</td>
+      <td>0.079</td>
+      <td>9.68</td>
+    </tr>
+    <!-- SpecializedVLMs Group (Reversed Order, 12 rows) -->
+    <tr>
+      <td rowspan="12"><strong>Specialized VLMs</strong></td>
+      <td>Mistral OCR</td>
+      <td>-</td>
+      <td>0.164</td>
+      <td>0.144</td>
+      <td>8.84</td>
     </tr>
     <tr>
-      <td>doubao-1-5-thinking-vision-pro-250428</td>
-      <td>0.140</td>
-      <td>0.162</td>
-      <td>0.043</td>
-      <td>0.085</td>
-      <td>0.295</td>
-      <td><strong>0.384</strong></td>
-      <td>83.3</td>
-      <td><strong>89.3</strong></td>
-      <td>0.165</td>
-      <td><strong>0.085</strong></td>
+      <td>DeepSeek-OCR</td>
+      <td>3B</td>
+      <td>0.073</td>
+      <td>0.086</td>
+      <td>8.26</td>
+    </tr>
+    <tr>
+      <td>MonkeyOCR-3B</td>
+      <td>3B</td>
+      <td>0.075</td>
+      <td>0.129</td>
+      <td>9.27</td>
+    </tr>
+    <tr>
+      <td>OCRVerse</td>
+      <td>4B</td>
       <td>0.058</td>
-      <td>0.094</td>
+      <td>0.071</td>
+      <td>-</td>
     </tr>
-<tr>
-<td rowspan="1"><strong>Expert VLMs</strong></td>
-<td><strong>dots.ocr</strong></td>
-<td><strong>0.125</strong></td>
-<td><strong>0.160</strong></td>
-<td><strong>0.032</strong></td>
-<td><strong>0.066</strong></td>
-<td>0.329</td>
-<td>0.416</td>
-<td><strong>88.6</strong></td>
-<td>89.0</td>
-<td><strong>0.099</strong></td>
-<td>0.092</td>
-<td><strong>0.040</strong></td>
-<td><strong>0.067</strong></td>
-</tr>
-<tr>
-</tbody>
+    <tr>
+      <td>MonkeyOCR-pro-3B</td>
+      <td>3B</td>
+      <td>0.075</td>
+      <td>0.128</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>MinerU2.5</td>
+      <td>1.2B</td>
+      <td>0.047</td>
+      <td>0.044</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>PaddleOCR-VL</td>
+      <td>0.9B</td>
+      <td>0.035</td>
+      <td>0.043</td>
+      <td>9.51</td>
+    </tr>
+    <tr>
+      <td>HunyuanOCR</td>
+      <td>0.9B</td>
+      <td>0.042</td>
+      <td>-</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>PaddleOCR-VL-1.5</td>
+      <td>0.9B</td>
+      <td>0.035</td>
+      <td>0.042</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>GLM-OCR</td>
+      <td>0.9B</td>
+      <td>0.04</td>
+      <td>0.043</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>dots.ocr</td>
+      <td>3B</td>
+      <td>0.048</td>
+      <td>0.053</td>
+      <td>9.29</td>
+    </tr>
+    <tr>
+      <td><u><strong>dots.ocr-1.5</strong></u></td>
+      <td>3B</td>
+      <td><strong>0.031</strong></td>
+      <td><strong>0.029</strong></td>
+      <td>9.54</td>
+    </tr>
+  </tbody>
 </table>
 
+> **Note:**
+> - Metrics are sourced from [OmniDocBench](https://github.com/opendatalab/OmniDocBench) and other model publications. [pdf-parse-bench](https://github.com/phorn1/pdf-parse-bench) results are reproduced by Qwen3-VL-235B-A22B-Instruct.
+> - Formula and Table metrics for OmniDocBench (v1.5) are omitted due to their high sensitivity to detection and matching protocols.
 
-#### The end-to-end text recognition performance across 9 PDF page types.
+
+### 2. Vision-Language Parsing
+Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. **dots.ocr-1.5** unifies the interpretation of these elements by parsing them directly into **SVG code**.
 
 <table>
-<thead>
-<tr>
-<th><strong>Model<br>Type</strong></th>
-<th><strong>Models</strong></th>
-<th><strong>Book</strong></th>
-<th><strong>Slides</strong></th>
-<th><strong>Financial<br>Report</strong></th>
-<th><strong>Textbook</strong></th>
-<th><strong>Exam<br>Paper</strong></th>
-<th><strong>Magazine</strong></th>
-<th><strong>Academic<br>Papers</strong></th>
-<th><strong>Notes</strong></th>
-<th><strong>Newspaper</strong></th>
-<th><strong>Overall</strong></th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td rowspan="3"><strong>Pipeline<br>Tools</strong></td>
-<td>MinerU</td>
-<td>0.055</td>
-<td>0.124</td>
-<td><u>0.033</u></td>
-<td>0.102</td>
-<td>0.159</td>
-<td><strong>0.072</strong></td>
-<td><u>0.025</u></td>
-<td>0.984</td>
-<td>0.171</td>
-<td>0.206</td>
-</tr>
-<tr>
-<td>Marker</td>
-<td>0.074</td>
-<td>0.340</td>
-<td>0.089</td>
-<td>0.319</td>
-<td>0.452</td>
-<td>0.153</td>
-<td>0.059</td>
-<td>0.651</td>
-<td>0.192</td>
-<td>0.274</td>
-</tr>
-<tr>
-<td>Mathpix</td>
-<td>0.131</td>
-<td>0.220</td>
-<td>0.202</td>
-<td>0.216</td>
-<td>0.278</td>
-<td>0.147</td>
-<td>0.091</td>
-<td>0.634</td>
-<td>0.690</td>
-<td>0.300</td>
-</tr>
-<tr>
-<td rowspan="5"><strong>Expert<br>VLMs</strong></td>
-<td>GOT-OCR</td>
-<td>0.111</td>
-<td>0.222</td>
-<td>0.067</td>
-<td>0.132</td>
-<td>0.204</td>
-<td>0.198</td>
-<td>0.179</td>
-<td>0.388</td>
-<td>0.771</td>
-<td>0.267</td>
-</tr>
-<tr>
-<td>Nougat</td>
-<td>0.734</td>
-<td>0.958</td>
-<td>1.000</td>
-<td>0.820</td>
-<td>0.930</td>
-<td>0.830</td>
-<td>0.214</td>
-<td>0.991</td>
-<td>0.871</td>
-<td>0.806</td>
-</tr>
-<tr>
-<td>Dolphin</td>
-<td>0.091</td>
-<td>0.131</td>
-<td>0.057</td>
-<td>0.146</td>
-<td>0.231</td>
-<td>0.121</td>
-<td>0.074</td>
-<td>0.363</td>
-<td>0.307</td>
-<td>0.177</td>
-</tr>
-<tr>
-<td>OCRFlux</td>
-<td>0.068</td>
-<td>0.125</td>
-<td>0.092</td>
-<td>0.102</td>
-<td>0.119</td>
-<td>0.083</td>
-<td>0.047</td>
-<td>0.223</td>
-<td>0.536</td>
-<td>0.149</td>
-</tr>
-<tr>
-<td>MonkeyOCR-pro-3B</td>
-<td>0.084</td>
-<td>0.129</td>
-<td>0.060</td>
-<td>0.090</td>
-<td>0.107</td>
-<td>0.073</td>
-<td>0.050</td>
-<td>0.171</td>
-<td>0.107</td>
-<td>0.100</td>
-</tr>
-<tr>
-<td rowspan="4"><strong>General<br>VLMs</strong></td>
-<td>GPT4o</td>
-<td>0.157</td>
-<td>0.163</td>
-<td>0.348</td>
-<td>0.187</td>
-<td>0.281</td>
-<td>0.173</td>
-<td>0.146</td>
-<td>0.607</td>
-<td>0.751</td>
-<td>0.316</td>
-</tr>
-<tr>
-<td>Qwen2.5-VL-7B</td>
-<td>0.148</td>
-<td>0.053</td>
-<td>0.111</td>
-<td>0.137</td>
-<td>0.189</td>
-<td>0.117</td>
-<td>0.134</td>
-<td>0.204</td>
-<td>0.706</td>
-<td>0.205</td>
-</tr>
-<tr>
-<td>InternVL3-8B</td>
-<td>0.163</td>
-<td>0.056</td>
-<td>0.107</td>
-<td>0.109</td>
-<td>0.129</td>
-<td>0.100</td>
-<td>0.159</td>
-<td>0.150</td>
-<td>0.681</td>
-<td>0.188</td>
-</tr>
-<tr>
-<td>doubao-1-5-thinking-vision-pro-250428</td>
-<td>0.048</td>
-<td>0.048</td>
-<td>0.024</td>
-<td><strong>0.062</strong></td>
-<td>0.085</td>
-<td>0.051</td>
-<td>0.039</td>
-<td><strong>0.096</strong></td>
-<td>0.181</td>
-<td>0.073</td>
-</tr>
-<tr>
-<td rowspan="1"><strong>Expert VLMs</strong></td>
-<td><strong>dots.ocr</strong></td>
-<td><strong>0.031</strong></td>
-<td><strong>0.047</strong></td>
-<td><strong>0.011</strong></td>
-<td>0.082</td>
-<td><strong>0.079</strong></td>
-<td><strong>0.028</strong></td>
-<td><strong>0.029</strong></td>
-<td>0.109</td>
-<td><strong>0.056</strong></td>
-<td><strong>0.055</strong></td>
-</tr>
-
-</tbody>
-</table>
-
-> **Notes:** 
-> - The metrics are from [MonkeyOCR](https://github.com/Yuliang-Liu/MonkeyOCR), [OmniDocBench](https://github.com/opendatalab/OmniDocBench), and our own internal evaluations.
-> - We delete the Page-header and Page-footer cells in the result markdown.
-> - We use tikz_preprocess pipeline to upsample the images to dpi 200.
-
-
-### 2. **dots.ocr-bench**
-
-This is an inhouse benchmark which contain 1493 pdf images with 100 languages.
-
-#### The end-to-end evaluation results of different tasks.
-
-<table>
-<thead>
-<tr>
-<th rowspan="1"><strong>Methods</strong></th>
-<th colspan="1"><strong>Overall<sup>Edit</sup>↓</strong></th>
-<th colspan="1"><strong>Text<sup>Edit</sup>↓</strong></th>
-<th colspan="1"><strong>Formula<sup>Edit</sup>↓</strong></th>
-<th colspan="1"><strong>Table<sup>TEDS</sup>↑</strong></th>
-<th colspan="1"><strong>Table<sup>Edit</sup>↓</strong></th>
-<th colspan="1"><strong>Read Order<sup>Edit</sup>↓</strong></th>
-</tr>
-</thead>
-<tbody>
-<td>MonkeyOCR-3B</td>
-<td>0.483</td>
-<td>0.445</td>
-<td>0.627</td>
-<td>50.93</td>
-<td>0.452</td>
-<td>0.409</td>
-</tr>
-<tr>
-<td>doubao-1-5-thinking-vision-pro-250428</td>
-<td>0.291</td>
-<td>0.226</td>
-<td>0.440</td>
-<td>71.2</td>
-<td>0.260</td>
-<td>0.238</td>
-</tr>
-<tr>
-<td>doubao-1-6</td>
-<td>0.299</td>
-<td>0.270</td>
-<td>0.417</td>
-<td>71.0</td>
-<td>0.258</td>
-<td>0.253</td>
-</tr>
-<tr>
-<td>Gemini2.5-Pro</td>
-<td>0.251</td>
-<td>0.163</td>
-<td>0.402</td>
-<td>77.1</td>
-<td>0.236</td>
-<td>0.202</td>
-</tr>
-<tr>
-<td><strong>dots.ocr</strong> </td>
-<td><strong>0.177</strong></td>
-<td><strong>0.075</strong></td>
-<td><strong>0.297</strong></td>
-<td><strong>79.2</strong></td>
-<td><strong>0.186</strong></td>
-<td><strong>0.152</strong></td>
-</tr>
-
-</tbody>
-</table>
-
-> **Notes:** 
-> - We use the same metric calculation pipeline of [OmniDocBench](https://github.com/opendatalab/OmniDocBench).
-> - We delete the Page-header and Page-footer cells in the result markdown.
-
-#### Layout Detection
-
-<table>
-<thead>
-<tr>
-<th rowspan="2"><strong>Method</strong></th>
-<th colspan="5" style="text-align: center;"><strong>F1@IoU=.50:.05:.95↑</strong></th>
-<th colspan="5" style="text-align: center;"><strong>F1@IoU=.50↑</strong></th>
-</tr>
-<tr>
-<th>Overall</th>
-<th>Text</th>
-<th>Formula</th>
-<th>Table</th>
-<th>Picture</th>
-<th>Overall</th>
-<th>Text</th>
-<th>Formula</th>
-<th>Table</th>
-<th>Picture</th>
-</tr>
-</thead>
-
-<tbody>
-<td>DocLayout-YOLO-DocStructBench</td>
-<td>0.733</td>
-<td>0.694</td>
-<td>0.480</td>
-<td>0.803</td>
-<td>0.619</td>
-<td>0.806</td>
-<td>0.779</td>
-<td>0.620</td>
-<td>0.858</td>
-<td>0.678</td>
-</tr>
-
-<tr>
-<td>dots.ocr-parse all</td>
-<td>0.831</td>
-<td>0.801</td>
-<td>0.654</td>
-<td>0.838</td>
-<td>0.748</td>
-<td>0.922</td>
-<td>0.909</td>
-<td>0.770</td>
-<td>0.888</td>
-<td>0.831</td>
-</tr>
-
-<tr>
-<td> <strong>dots.ocr-detection only</strong> </td>
-<td><strong>0.845</strong></td>
-<td><strong>0.816</strong></td>
-<td><strong>0.716</strong></td>
-<td><strong>0.875</strong></td>
-<td><strong>0.765</strong></td>
-<td><strong>0.930</strong></td>
-<td><strong>0.917</strong></td>
-<td><strong>0.832</strong></td>
-<td><strong>0.918</strong></td>
-<td><strong>0.843</strong></td>
-</tr>
-
-</tbody>
-</table>
-
-> **Notes:**  
-> - prompt_layout_all_en for **parse all**, prompt_layout_only_en for **detection only**, please refer to [prompts](https://github.com/rednote-hilab/dots.ocr/blob/master/dots_ocr/utils/prompts.py)
-
-
-### 3. olmOCR-bench.
-
-<table>
-<thead>
-<tr>
-<th>Model</th>
-<th>ArXiv</th>
-<th>Old Scans<br>Math</th>
-<th>Tables</th>
-<th>Old Scans</th>
-<th>Headers and<br>Footers</th>
-<th>Multi<br>column</th>
-<th>Long Tiny<br>Text</th>
-<th>Base</th>
-<th>Overall</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td>GOT OCR</td>
-<td>52.7</td>
-<td>52.0</td>
-<td>0.2</td>
-<td>22.1</td>
-<td>93.6</td>
-<td>42.0</td>
-<td>29.9</td>
-<td>94.0</td>
-<td>48.3 ± 1.1</td>
-</tr>
-<tr>
-<td>Marker</td>
-<td>76.0</td>
-<td>57.9</td>
-<td>57.6</td>
-<td>27.8</td>
-<td>84.9</td>
-<td>72.9</td>
-<td>84.6</td>
-<td>99.1</td>
-<td>70.1 ± 1.1</td>
-</tr>
-<tr>
-<td>MinerU</td>
-<td>75.4</td>
-<td>47.4</td>
-<td>60.9</td>
-<td>17.3</td>
-<td><strong>96.6</strong></td>
-<td>59.0</td>
-<td>39.1</td>
-<td>96.6</td>
-<td>61.5 ± 1.1</td>
-</tr>
-<tr>
-<td>Mistral OCR</td>
-<td>77.2</td>
-<td>67.5</td>
-<td>60.6</td>
-<td>29.3</td>
-<td>93.6</td>
-<td>71.3</td>
-<td>77.1</td>
-<td>99.4</td>
-<td>72.0 ± 1.1</td>
-</tr>
-<tr>
-<td>Nanonets OCR</td>
-<td>67.0</td>
-<td>68.6</td>
-<td>77.7</td>
-<td>39.5</td>
-<td>40.7</td>
-<td>69.9</td>
-<td>53.4</td>
-<td>99.3</td>
-<td>64.5 ± 1.1</td>
-</tr>
-<tr>
-<td>GPT-4o<br>(No Anchor)</td>
-<td>51.5</td>
-<td><strong>75.5</strong></td>
-<td>69.1</td>
-<td>40.9</td>
-<td>94.2</td>
-<td>68.9</td>
-<td>54.1</td>
-<td>96.7</td>
-<td>68.9 ± 1.1</td>
-</tr>
-<tr>
-<td>GPT-4o<br>(Anchored)</td>
-<td>53.5</td>
-<td>74.5</td>
-<td>70.0</td>
-<td>40.7</td>
-<td>93.8</td>
-<td>69.3</td>
-<td>60.6</td>
-<td>96.8</td>
-<td>69.9 ± 1.1</td>
-</tr>
-<tr>
-<td>Gemini Flash 2<br>(No Anchor)</td>
-<td>32.1</td>
-<td>56.3</td>
-<td>61.4</td>
-<td>27.8</td>
-<td>48.0</td>
-<td>58.7</td>
-<td><strong>84.4</strong></td>
-<td>94.0</td>
-<td>57.8 ± 1.1</td>
-</tr>
-<tr>
-<td>Gemini Flash 2<br>(Anchored)</td>
-<td>54.5</td>
-<td>56.1</td>
-<td>72.1</td>
-<td>34.2</td>
-<td>64.7</td>
-<td>61.5</td>
-<td>71.5</td>
-<td>95.6</td>
-<td>63.8 ± 1.2</td>
-</tr>
-<tr>
-<td>Qwen 2 VL<br>(No Anchor)</td>
-<td>19.7</td>
-<td>31.7</td>
-<td>24.2</td>
-<td>17.1</td>
-<td>88.9</td>
-<td>8.3</td>
-<td>6.8</td>
-<td>55.5</td>
-<td>31.5 ± 0.9</td>
-</tr>
-<tr>
-<td>Qwen 2.5 VL<br>(No Anchor)</td>
-<td>63.1</td>
-<td>65.7</td>
-<td>67.3</td>
-<td>38.6</td>
-<td>73.6</td>
-<td>68.3</td>
-<td>49.1</td>
-<td>98.3</td>
-<td>65.5 ± 1.2</td>
-</tr>
-<tr>
-<td>olmOCR v0.1.75<br>(No Anchor)</td>
-<td>71.5</td>
-<td>71.4</td>
-<td>71.4</td>
-<td><strong>42.8</strong></td>
-<td>94.1</td>
-<td>77.7</td>
-<td>71.0</td>
-<td>97.8</td>
-<td>74.7 ± 1.1</td>
-</tr>
-<tr>
-<td>olmOCR v0.1.75<br>(Anchored)</td>
-<td>74.9</td>
-<td>71.2</td>
-<td>71.0</td>
-<td>42.2</td>
-<td>94.5</td>
-<td>78.3</td>
-<td>73.3</td>
-<td>98.3</td>
-<td>75.5 ± 1.0</td>
-</tr>
-<tr>
-<td>MonkeyOCR-pro-3B</td>
-<td><strong>83.8</strong></td>
-<td>68.8</td>
-<td>74.6</td>
-<td>36.1</td>
-<td>91.2</td>
-<td>76.6</td>
-<td>80.1</td>
-<td>95.3</td>
-<td>75.8 ± 1.0</td>
-</tr>
-<tr>
-<td><strong>dots.ocr</strong></td>
-<td>82.1</td>
-<td>64.2</td>
-<td><strong>88.3</strong></td>
-<td>40.9</td>
-<td>94.1</td>
-<td><strong>82.4</strong></td>
-<td>81.2</td>
-<td><strong>99.5</strong></td>
-<td><strong>79.1 ± 1.0</strong></td>
-</tr>
-</tbody>
+  <thead>
+    <tr>
+      <th rowspan="2" style="text-align: left;">Methods</th>
+      <th colspan="3">UniSVG</th>
+      <th rowspan="2">ChartMimic</th>
+      <th rowspan="2">Design2Code</th>
+      <th rowspan="2">GenExam</th>
+      <th rowspan="2">SciGen</th>
+      <th rowspan="2">ChemDraw</th>
+    </tr>
+    <tr>
+      <th>Low-Level</th>
+      <th>High-Level</th>
+      <th>Score</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left;">OCRVerse</td>
+      <td>0.632</td>
+      <td>0.852</td>
+      <td>0.763</td>
+      <td>0.799</td>
+      <td>-</td>
+      <td>-</td>
+      <td>-</td>
+      <td>0.881</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;">Gemini 3 Pro</td>
+      <td>0.563</td>
+      <td>0.850</td>
+      <td>0.735</td>
+      <td>0.788</td>
+      <td>0.760</td>
+      <td>0.756</td>
+      <td>0.783</td>
+      <td>0.839</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;">dots.ocr-1.5</td>
+      <td>0.850</td>
+      <td>0.923</td>
+      <td>0.894</td>
+      <td>0.772</td>
+      <td>0.801</td>
+      <td>0.664</td>
+      <td>0.660</td>
+      <td>0.790</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;"><strong>dots.ocr-1.5-svg</strong></td>
+      <td><strong>0.860</strong></td>
+      <td><strong>0.931</strong></td>
+      <td><strong>0.902</strong></td>
+      <td><strong>0.905</strong></td>
+      <td><strong>0.834</strong></td>
+      <td><strong>0.8</strong></td>
+      <td><strong>0.797</strong></td>
+      <td><strong>0.901</strong></td>
+    </tr>
+  </tbody>
 </table>
 
 
 > **Note:**
-> - The metrics are from [MonkeyOCR](https://github.com/Yuliang-Liu/MonkeyOCR), 
-[olmocr](https://github.com/allenai/olmocr), and our own internal evaluations.
-> - We delete the Page-header and Page-footer cells in the result markdown.
+> - We use the ISVGEN metric from [UniSVG](https://ryanlijinke.github.io/) to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input, and calculate the ISVGEN score between the rendered output and the original image. 
+> - [OCRVerse](https://github.com/DocTron-hub/OCRVerse) results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.ocr-1.5 are based specifically on SVG code.
+> - Due to the capacity constraints of a 3B-parameter VLM, dots.ocr-1.5 does not yet excel at every task, SVG parsing among them. To complement it, we are simultaneously releasing dots.ocr-1.5-svg. We plan to address these limitations in future updates.
+
+
+### 3. General Vision Tasks
+
+<table>
+    <thead>
+        <tr>
+            <th>Model</th>
+            <th>CharXiv_descriptive</th>
+            <th>CharXiv_reasoning</th>
+            <th>OCR_Reasoning</th>
+            <th>InfoVQA</th>
+            <th>DocVQA</th>
+            <th>ChartQA</th>
+            <th>OCRBench</th>
+            <th>AI2D</th>
+            <th>CountBenchQA</th>
+            <th>RefCOCO</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Qwen3-VL-2B-Instruct</td>
+            <td>62.3</td>
+            <td>26.8</td>
+            <td>-</td>
+            <td>72.4</td>
+            <td>93.3</td>
+            <td>-</td>
+            <td>85.8</td>
+            <td>76.9</td>
+            <td>88.4</td>
+            <td>-</td>
+        </tr>
+        <tr>
+            <td><strong>dots.ocr-1.5</strong></td>
+            <td>77.4</td>
+            <td>55.3</td>
+            <td>22.85</td>
+            <td>73.76</td>
+            <td>91.85</td>
+            <td>83.2</td>
+            <td>86.0</td>
+            <td>82.16</td>
+            <td>94.46</td>
+            <td>80.03</td>
+        </tr>
+    </tbody>
+</table>
 
 
 
 # Quick Start
 ## 1. Installation
-### Install dots.ocr
+### Install dots.ocr-1.5
 ```shell
 conda create -n dots_ocr python=3.12
 conda activate dots_ocr
@@ -1015,7 +573,7 @@ pip install -e .
 
 
 ### Download Model Weights
-> 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
+> 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR_1_5` instead of `dots.ocr-1.5`) for the model save path. This is a temporary workaround pending our integration with Transformers.
 ```shell
 python3 tools/download_model.py
 
@@ -1028,14 +586,30 @@ python3 tools/download_model.py --type modelscope
 ### vLLM inference
 We highly recommend using vLLM for deployment and inference. All of our evaluations results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, Dots OCR has been officially integrated into vLLM with verified performance** and you can use vLLM docker image directly (e.g, `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
+> **Note:**
+> - We observed a slight performance drop when using vLLM 0.11.0 and are working on a fix.
+
 ```shell
 # Launch vLLM model server
-vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
+## dots.ocr-1.5
+CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5 --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
+
+## dots.ocr-1.5-svg
+CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5-svg --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
 
 # vLLM API Demo
 # See dots_ocr/model/inference.py for details on parameter and prompt settings 
 # that help achieve the best output quality.
+## document parsing
 python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
+## web parsing 
+
+## scene spotting
+
+## image parsing with svg code
+
+## general qa
+
 ```
 
 ### Hugginface inference
@@ -1052,7 +626,7 @@ from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
 from qwen_vl_utils import process_vision_info
 from dots_ocr.utils import dict_promptmode_to_prompt
 
-model_path = "./weights/DotsOCR"
+model_path = "./weights/DotsOCR_1_5"
 model = AutoModelForCausalLM.from_pretrained(
     model_path,
     attn_implementation="flash_attention_2",
@@ -1146,8 +720,6 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
 # Parse text only, except Page-header and Page-footer
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
 
-# Parse layout info by bbox
-python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
 
 ```
 **Based on Transformers**, you can parse an image or a pdf file using the same commands above, just add `--use_hf true`. 
@@ -1164,62 +736,48 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --
 
 </details>
 
+
 ## 4. Demo
-You can run the demo with the following command, or try directly at [live demo](https://dotsocr.xiaohongshu.com/)
-```bash
-python demo/demo_gradio.py
-```
-
-We also provide a demo for grounding ocr:
-```bash
-python demo/demo_gradio_annotion.py
-```
+Have fun with the [live demo](https://dotsocr.xiaohongshu.com/).
 
 
-### Example for formula document
+### Examples for document parsing
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula1.png" alt="formula1.png" border="0" />
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula2.png" alt="formula2.png" border="0" />
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula3.png" alt="formula3.png" border="0" />
-
-### Example for table document
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table1.png" alt="table1.png" border="0" />
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table2.png" alt="table2.png" border="0" />
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table3.png" alt="table3.png" border="0" />
-
-### Example for multilingual document
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/Tibetan.png" alt="Tibetan.png" border="0" />
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/tradition_zh.png" alt="tradition_zh.png" border="0" />
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/nl.png" alt="nl.png" border="0" />
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/kannada.png" alt="kannada.png" border="0" />
 <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/russian.png" alt="russian.png" border="0" />
 
-### Example for reading order
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/reading_order.png" alt="reading_order.png" border="0" />
 
-### Example for grounding ocr
-<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/grounding.png" alt="grounding.png" border="0" />
+### Examples for image parsing
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_1.png" alt="svg_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_2.png" alt="svg_2.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_4.png" alt="svg_4.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_5.png" alt="svg_5.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_6.png" alt="svg_6.png" border="0" />
 
+> **Note:**
+> - These results were generated by dots.ocr-1.5-svg.
 
-# Acknowledgments
-We would like to thank [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [aimv2](https://github.com/apple/ml-aim), [MonkeyOCR](https://github.com/Yuliang-Liu/MonkeyOCR), 
-[OmniDocBench](https://github.com/opendatalab/OmniDocBench), [PyMuPDF](https://github.com/pymupdf/PyMuPDF), for providing code and models. 
+### Examples for web parsing
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/webpage_1.png" alt="webpage_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/webpage_2.png" alt="webpage_2.png" border="0" />
+
+### Examples for scene spotting
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/scene_1.png" alt="scene_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/scene_2.png" alt="scene_2.png" border="0" />
 
-We also thank [DocLayNet](https://github.com/DS4SD/DocLayNet), [M6Doc](https://github.com/HCIILAB/M6Doc), [CDLA](https://github.com/buptlihang/CDLA), [D4LA](https://github.com/AlibabaResearch/AdvancedLiterateMachinery) for providing valuable datasets. 
 
 # Limitation & Future Work
 
 - **Complex Document Elements:**
-  - **Table&Formula**: dots.ocr is not yet perfect for high-complexity tables and formula extraction.
-  - **Picture**: Pictures in documents are currently not parsed.
+  - **Table&Formula**: Extracting complex tables and mathematical formulas remains difficult given the model's compact size.
+  - **Picture**: We have adopted an SVG code representation for parsing structured graphics; however, the performance has yet to achieve the desired level of robustness.
 
-- **Parsing Failures:** The model may fail to parse under certain conditions:
-  - When the character-to-pixel ratio is excessively high. Try enlarging the image or increasing the PDF parsing DPI (a setting of 200 is recommended). However, please note that the model performs optimally on images with a resolution under 11289600 pixels.
-  - Continuous special characters, such as ellipses (`...`) and underscores (`_`), may cause the prediction output to repeat endlessly. In such scenarios, consider using alternative prompts like `prompt_layout_only_en`, `prompt_ocr`, or `prompt_grounding_ocr` ([details here](https://github.com/rednote-hilab/dots.ocr/blob/master/dots_ocr/utils/prompts.py)).
-    
-- **Performance Bottleneck:** Despite its 1.7B parameter LLM foundation, **dots.ocr** is not yet optimized for high-throughput processing of large PDF volumes. 
+- **Parsing Failures:** While we have reduced the rate of parsing failures compared to the previous version, these issues may still occur occasionally. We remain committed to further resolving these edge cases in future updates. 
 
-We are committed to achieving more accurate table and formula parsing, as well as enhancing the model's OCR capabilities for broader generalization, all while aiming for **a more powerful, more efficient model**. Furthermore, we are actively considering the development of **a more general-purpose perception model** based on Vision-Language Models (VLMs), which would integrate general detection, image captioning, and OCR tasks into a unified framework. **Parsing the content of the pictures in the documents** is also a key priority for our future work.
-We believe that collaboration is the key to tackling these exciting challenges. If you are passionate about advancing the frontiers of document intelligence and are interested in contributing to these future endeavors, we would love to hear from you. Please reach out to us via email at: [yanqing4@xiaohongshu.com].
 
 # Citation
 
diff --git a/README_hf.md b/README_hf.md
new file mode 100755
index 0000000..a43b508
--- /dev/null
+++ b/README_hf.md
@@ -0,0 +1,778 @@
+---
+license: mit
+library_name: dots_ocr_1_5
+pipeline_tag: image-text-to-text
+tags:
+- image-to-text
+- ocr
+- document-parse
+- layout
+- table
+- formula
+- transformers
+- custom_code
+language:
+- en
+- zh
+- multilingual
+---
+
+<div align="center">
+
+<p align="center">
+    <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/>
+<p>
+
+<h1 align="center">
+dots.ocr-1.5: Recognize Any Human Scripts and Symbols
+</h1>
+
+[![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr-1.5)
+
+
+<div align="center">
+  <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> | 
+  <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> | 
+  <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+</div>
+
+</div>
+
+
+
+## Introduction
+
+We present **dots.ocr-1.5**, a 3B-parameter multimodal model composed of a 1.2B vision encoder and a 1.7B language model. Designed for universal accessibility, it possesses the capability to recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screens and spotting scene text. Furthermore, the model demonstrates competitive performance in general OCR, object grounding & counting tasks.
+
+1. **Stronger Document Parsing Performance:** dots.ocr-1.5 maintains SOTA performance among the latest OCR models, particularly on **multilingual documents**. Because the detection & matching rules of certain benchmarks introduce significant bias and often fail to reflect a model's true capabilities, we adopted an **Elo score** evaluation system. Under this metric, the performance landscape shifts significantly, revealing a robustness of our model that conventional rankings understate.
+2. **Unified Vision-Language Parsing**: Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge, akin to natural language. dots.ocr-1.5 unifies the interpretation of these elements by parsing them directly into SVG code. We have validated the effectiveness of this approach, demonstrating impressive results in structural and semantic recognition.
+3. **Broader and More General Capabilities**: Compared to dots.ocr, dots.ocr-1.5 supports a significantly wider array of tasks. It extends beyond standard OCR to handle web screen parsing, scene text spotting, object grounding & counting, and other general OCR QA tasks.
+
+
+## Evaluation
+
+### 1. Document Parsing
+
+#### 1.1 Elo scores of the latest models on different benchmarks
+
+<table style="border-collapse: collapse; width: 100%; font-family: Arial, sans-serif;">
+  <thead>
+    <tr style="background-color: #f2f2f2; text-align: left;">
+      <th style="border: 1px solid #ddd; padding: 8px;">models</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">olmOCR-Bench</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">OmniDocBench (v1.5)</th>
+      <th style="border: 1px solid #ddd; padding: 8px;">XDocParse</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">GLM-OCR</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">859.9</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">937.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">742.1</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">PaddleOCR-VL-1.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">873.6</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">965.6</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">797.6</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">HunyuanOCR</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">978.9</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">974.4</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">895.9</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">dots.ocr</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1027.4</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">994.7</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1133.4</td>
+    </tr>
+    <!-- Highlighting dots.ocr-1.5 row -->
+    <tr style="background-color: #e6f7ff; font-weight: bold;">
+      <td style="border: 1px solid #ddd; padding: 8px;">dots.ocr-1.5</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1089.0</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1025.8</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1157.1</td>
+    </tr>
+    <tr>
+      <td style="border: 1px solid #ddd; padding: 8px;">Gemini 3 Pro</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1171.2</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1102.1</td>
+      <td style="border: 1px solid #ddd; padding: 8px;">1273.9</td>
+    </tr>
+
+  </tbody>
+</table>
+
+> **Notes:** 
+> - Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
+> - The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: [Elo Score Prompt](https://github.com/rednote-hilab/dots.ocr/blob/master/tools/elo_score_prompt.py). These results are consistent with the findings on [ocrarena](https://www.ocrarena.ai/battle).
+
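+For readers unfamiliar with Elo ratings, the textbook formulation is sketched below. It is the standard update rule, not necessarily the exact procedure used to produce the table above; the symbols $R_A$, $R_B$, $S_A$, and $K$ are notation introduced here for illustration, and the scale constant 400 is the conventional choice.
+
+$$
+E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A' = R_A + K\,(S_A - E_A)
+$$
+
+Here $E_A$ is the expected score of model A against model B, $S_A \in \{0, 0.5, 1\}$ is the judged outcome of one pairwise comparison (loss, tie, win), and $K$ sets the step size of each update. Under this formula, a 100-point rating gap corresponds to an expected score of about 0.64 for the higher-rated model.
+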
+
+#### 1.2 olmOCR-bench
+<!DOCTYPE html>
+<html lang="zh">
+<head>
+<meta charset="UTF-8">
+<style>
+    table {
+        width: 100%;
+        border-collapse: collapse;
+        font-family: Arial, sans-serif;
+        font-size: 14px;
+        color: #333;
+    }
+    th, td {
+        border: 1px solid #e0e0e0;
+        padding: 12px 8px;
+        text-align: left;
+    }
+    th {
+        background-color: #fafafa;
+        font-weight: normal;
+        vertical-align: top;
+        line-height: 1.4;
+    }
+    tr:nth-child(even) {
+        background-color: #ffffff;
+    }
+    tr:hover {
+        background-color: #f5f5f5;
+    }
+    .bold-row {
+        font-weight: bold;
+    }
+</style>
+</head>
+<body>
+
+<table>
+    <thead>
+        <tr>
+            <th></th>
+            <th>ArXiv</th>
+            <th>Old scans math</th>
+            <th>Tables</th>
+            <th>Old scans</th>
+            <th>Headers & footers</th>
+            <th>Multi column</th>
+            <th>Long tiny text</th>
+            <th>Base</th>
+            <th>Overall</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Mistral OCR API</td>
+            <td>77.2</td>
+            <td>67.5</td>
+            <td>60.6</td>
+            <td>29.3</td>
+            <td>93.6</td>
+            <td>71.3</td>
+            <td>77.1</td>
+            <td>99.4</td>
+            <td>72.0±1.1</td>
+        </tr>
+        <tr>
+            <td>Marker 1.10.1</td>
+            <td>83.8</td>
+            <td>66.8</td>
+            <td>72.9</td>
+            <td>33.5</td>
+            <td>86.6</td>
+            <td>80.0</td>
+            <td>85.7</td>
+            <td>99.3</td>
+            <td>76.1±1.1</td>
+        </tr>
+        <tr>
+            <td>MinerU 2.5.4*</td>
+            <td>76.6</td>
+            <td>54.6</td>
+            <td>84.9</td>
+            <td>33.7</td>
+            <td>96.6</td>
+            <td>78.2</td>
+            <td>83.5</td>
+            <td>93.7</td>
+            <td>75.2±1.1</td>
+        </tr>
+        <tr>
+            <td>DeepSeek-OCR</td>
+            <td>77.2</td>
+            <td>73.6</td>
+            <td>80.2</td>
+            <td>33.3</td>
+            <td>96.1</td>
+            <td>66.4</td>
+            <td>79.4</td>
+            <td>99.8</td>
+            <td>75.7±1.0</td>
+        </tr>
+        <tr>
+            <td>Nanonets-OCR2-3B</td>
+            <td>75.4</td>
+            <td>46.1</td>
+            <td>86.8</td>
+            <td>40.9</td>
+            <td>32.1</td>
+            <td>81.9</td>
+            <td>93.0</td>
+            <td>99.6</td>
+            <td>69.5±1.1</td>
+        </tr>
+        <tr>
+            <td>PaddleOCR-VL*</td>
+            <td>85.7</td>
+            <td>71.0</td>
+            <td>84.1</td>
+            <td>37.8</td>
+            <td>97.0</td>
+            <td>79.9</td>
+            <td>85.7</td>
+            <td>98.5</td>
+            <td>80.0±1.0</td>
+        </tr>
+        <tr>
+            <td>Infinity-Parser 7B*</td>
+            <td>84.4</td>
+            <td>83.8</td>
+            <td>85.0</td>
+            <td>47.9</td>
+            <td>88.7</td>
+            <td>84.2</td>
+            <td>86.4</td>
+            <td>99.8</td>
+            <td>82.5±?</td>
+        </tr>
+        <tr>
+            <td>olmOCR v0.4.0</td>
+            <td>83.0</td>
+            <td>82.3</td>
+            <td>84.9</td>
+            <td>47.7</td>
+            <td>96.1</td>
+            <td>83.7</td>
+            <td>81.9</td>
+            <td>99.7</td>
+            <td>82.4±1.1</td>
+        </tr>
+        <tr>
+            <td>Chandra OCR 0.1.0*</td>
+            <td>82.2</td>
+            <td>80.3</td>
+            <td>88.0</td>
+            <td>50.4</td>
+            <td>90.8</td>
+            <td>81.2</td>
+            <td>92.3</td>
+            <td>99.9</td>
+            <td>83.1±0.9</td>
+        </tr>
+        <tr>
+            <td>dots.ocr</td>
+            <td>82.1</td>
+            <td>64.2</td>
+            <td>88.3</td>
+            <td>40.9</td>
+            <td>94.1</td>
+            <td>82.4</td>
+            <td>81.2</td>
+            <td>99.5</td>
+            <td>79.1±1.0</td>
+        </tr>
+        <tr>
+            <td class="bold-row">dots.ocr-1.5</td>
+            <td><strong>85.9</strong></td>
+            <td><strong>85.5</strong></td>
+            <td><strong>90.7</strong></td>
+            <td>48.2</td>
+            <td>94.0</td>
+            <td><strong>85.3</strong></td>
+            <td>81.6</td>
+            <td>99.7</td>
+            <td><strong>83.9±0.9</strong></td>
+        </tr>
+    </tbody>
+</table>
+
+</body>
+</html>
+
+> **Note:**
+> - The metrics are from [olmocr](https://github.com/allenai/olmocr) and our own internal evaluations.
+> - We delete the Page-header and Page-footer cells in the result markdown.
+
+
+#### 1.3 Other Benchmarks
+
+<table>
+  <thead>
+    <tr>
+      <th>Model Type</th>
+      <th>Methods</th>
+      <th>Size</th>
+      <th>OmniDocBench(v1.5)<br>TextEdit↓</th>
+      <th>OmniDocBench(v1.5)<br>Read OrderEdit↓</th>
+      <th>pdf-parse-bench</th>
+    </tr>
+  </thead>
+  <tbody>
+    <!-- GeneralVLMs Group (Reversed Order, 3 rows) -->
+    <tr>
+      <td rowspan="3"><strong>GeneralVLMs</strong></td>
+      <td>Gemini-2.5 Pro</td>
+      <td>-</td>
+      <td>0.075</td>
+      <td>0.097</td>
+      <td>9.06</td>
+    </tr>
+    <tr>
+      <td>Qwen3-VL-235B-A22B-Instruct</td>
+      <td>235B</td>
+      <td>0.069</td>
+      <td>0.068</td>
+      <td><strong>9.71</strong></td>
+    </tr>
+    <tr>
+      <td>Gemini 3 Pro</td>
+      <td>-</td>
+      <td>0.066</td>
+      <td>0.079</td>
+      <td>9.68</td>
+    </tr>
+    <!-- SpecializedVLMs Group (Reversed Order, 12 rows) -->
+    <tr>
+      <td rowspan="12"><strong>SpecializedVLMs</strong></td>
+      <td>Mistral OCR</td>
+      <td>-</td>
+      <td>0.164</td>
+      <td>0.144</td>
+      <td>8.84</td>
+    </tr>
+    <tr>
+      <td>DeepSeek-OCR</td>
+      <td>3B</td>
+      <td>0.073</td>
+      <td>0.086</td>
+      <td>8.26</td>
+    </tr>
+    <tr>
+      <td>MonkeyOCR-3B</td>
+      <td>3B</td>
+      <td>0.075</td>
+      <td>0.129</td>
+      <td>9.27</td>
+    </tr>
+    <tr>
+      <td>OCRVerse</td>
+      <td>4B</td>
+      <td>0.058</td>
+      <td>0.071</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>MonkeyOCR-pro-3B</td>
+      <td>3B</td>
+      <td>0.075</td>
+      <td>0.128</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>MinerU2.5</td>
+      <td>1.2B</td>
+      <td>0.047</td>
+      <td>0.044</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>PaddleOCR-VL</td>
+      <td>0.9B</td>
+      <td>0.035</td>
+      <td>0.043</td>
+      <td>9.51</td>
+    </tr>
+    <tr>
+      <td>HunyuanOCR</td>
+      <td>0.9B</td>
+      <td>0.042</td>
+      <td>-</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>PaddleOCR-VL-1.5</td>
+      <td>0.9B</td>
+      <td>0.035</td>
+      <td>0.042</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>GLM-OCR</td>
+      <td>0.9B</td>
+      <td>0.04</td>
+      <td>0.043</td>
+      <td>-</td>
+    </tr>
+    <tr>
+      <td>dots.ocr</td>
+      <td>3B</td>
+      <td>0.048</td>
+      <td>0.053</td>
+      <td>9.29</td>
+    </tr>
+    <tr>
+      <td><u><strong>dots.ocr-1.5</strong></u></td>
+      <td>3B</td>
+      <td><strong>0.031</strong></td>
+      <td><strong>0.029</strong></td>
+      <td>9.54</td>
+    </tr>
+  </tbody>
+</table>
+
+> **Note:**
+> - Metrics are sourced from [OmniDocBench](https://github.com/opendatalab/OmniDocBench) and other model publications. [pdf-parse-bench](https://github.com/phorn1/pdf-parse-bench) results are reproduced by Qwen3-VL-235B-A22B-Instruct.
+> - Formula and Table metrics for OmniDocBench (v1.5) are omitted due to their high sensitivity to detection and matching protocols.
+
+
+### 2. Vision-Language Parsing
+Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. **dots.ocr-1.5** unifies the interpretation of these elements by parsing them directly into **SVG code**.
+
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2" style="text-align: left;">Methods</th>
+      <th colspan="3">UniSVG</th>
+      <th rowspan="2">ChartMimic</th>
+      <th rowspan="2">Design2Code</th>
+      <th rowspan="2">GenExam</th>
+      <th rowspan="2">SciGen</th>
+      <th rowspan="2">ChemDraw</th>
+    </tr>
+    <tr>
+      <th>Low-Level</th>
+      <th>High-Level</th>
+      <th>Score</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td style="text-align: left;">OCRVerse</td>
+      <td>0.632</td>
+      <td>0.852</td>
+      <td>0.763</td>
+      <td>0.799</td>
+      <td>-</td>
+      <td>-</td>
+      <td>-</td>
+      <td>0.881</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;">Gemini 3 Pro</td>
+      <td>0.563</td>
+      <td>0.850</td>
+      <td>0.735</td>
+      <td>0.788</td>
+      <td>0.760</td>
+      <td>0.756</td>
+      <td>0.783</td>
+      <td>0.839</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;">dots.ocr-1.5</td>
+      <td>0.850</td>
+      <td>0.923</td>
+      <td>0.894</td>
+      <td>0.772</td>
+      <td>0.801</td>
+      <td>0.664</td>
+      <td>0.660</td>
+      <td>0.790</td>
+    </tr>
+    <tr>
+      <td style="text-align: left;"><strong>dots.ocr-1.5-svg</strong></td>
+      <td><strong>0.860</strong></td>
+      <td><strong>0.931</strong></td>
+      <td><strong>0.902</strong></td>
+      <td><strong>0.905</strong></td>
+      <td><strong>0.834</strong></td>
+      <td><strong>0.8</strong></td>
+      <td><strong>0.797</strong></td>
+      <td><strong>0.901</strong></td>
+    </tr>
+  </tbody>
+</table>
+
+
+> **Note:**
+> - We use the ISVGEN metric from [UniSVG](https://ryanlijinke.github.io/) to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input, and calculate the ISVGEN score between the rendered output and the original image. 
+> - [OCRVerse](https://github.com/DocTron-hub/OCRVerse) results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.ocr-1.5 are based specifically on SVG code.
+> - Due to the capacity constraints of a 3B-parameter VLM, dots.ocr-1.5 does not yet excel at every task, SVG parsing among them. To complement it, we are simultaneously releasing dots.ocr-1.5-svg. We plan to address these limitations in future updates.
+
+
+### 3. General Vision Tasks
+
+<table>
+    <thead>
+        <tr>
+            <th>Model</th>
+            <th>CharXiv_descriptive</th>
+            <th>CharXiv_reasoning</th>
+            <th>OCR_Reasoning</th>
+            <th>InfoVQA</th>
+            <th>DocVQA</th>
+            <th>ChartQA</th>
+            <th>OCRBench</th>
+            <th>AI2D</th>
+            <th>CountBenchQA</th>
+            <th>RefCOCO</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Qwen3-VL-2B-Instruct</td>
+            <td>62.3</td>
+            <td>26.8</td>
+            <td>-</td>
+            <td>72.4</td>
+            <td>93.3</td>
+            <td>-</td>
+            <td>85.8</td>
+            <td>76.9</td>
+            <td>88.4</td>
+            <td>-</td>
+        </tr>
+        <tr>
+            <td><strong>dots.ocr-1.5</strong></td>
+            <td>77.4</td>
+            <td>55.3</td>
+            <td>22.85</td>
+            <td>73.76</td>
+            <td>91.85</td>
+            <td>83.2</td>
+            <td>86.0</td>
+            <td>82.16</td>
+            <td>94.46</td>
+            <td>80.03</td>
+        </tr>
+    </tbody>
+</table>
+
+
+
+# Quick Start
+## 1. Installation
+### Install dots.ocr-1.5
+```shell
+conda create -n dots_ocr python=3.12
+conda activate dots_ocr
+
+git clone https://github.com/rednote-hilab/dots.ocr.git
+cd dots.ocr
+
+# Install PyTorch; see https://pytorch.org/get-started/previous-versions/ for the build matching your CUDA version
+pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
+pip install -e .
+```
+
+If you have trouble with the installation, try our [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) for an easier setup, and follow these steps:
+```shell
+git clone https://github.com/rednote-hilab/dots.ocr.git
+cd dots.ocr
+pip install -e .
+```
+
+
+### Download Model Weights
+> 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR_1_5` instead of `dots.ocr-1.5`) for the model save path. This is a temporary workaround pending our integration with Transformers.
+```shell
+python3 tools/download_model.py
+```
+
+
+## 2. Deployment
+### vLLM inference
+We highly recommend using vLLM for deployment and inference.
+
+```shell
+# launch vllm server
+## dots.ocr-1.5
+CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5 --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
+
+## dots.ocr-1.5-svg
+CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5-svg --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
+
+# vllm api demo
+## document parsing
+python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
+## web parsing 
+
+## scene spotting
+
+## image parsing with svg code
+
+## general qa
+
+```
+
+### Hugging Face inference
+```shell
+python3 demo/demo_hf.py
+```
+
+<details>
+<summary><b>Hugging Face inference details</b></summary>
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
+from qwen_vl_utils import process_vision_info
+from dots_ocr.utils import dict_promptmode_to_prompt
+
+model_path = "./weights/DotsOCR_1_5"
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    attn_implementation="flash_attention_2",  # needs the flash-attn package, which requirements.txt no longer installs by default
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+image_path = "demo/demo_image1.jpg"
+prompt = """Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
+
+1. Bbox format: [x1, y1, x2, y2]
+
+2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
+
+3. Text Extraction & Formatting Rules:
+    - Picture: For the 'Picture' category, the text field should be omitted.
+    - Formula: Format its text as LaTeX.
+    - Table: Format its text as HTML.
+    - All Others (Text, Title, etc.): Format their text as Markdown.
+
+4. Constraints:
+    - The output text must be the original text from the image, with no translation.
+    - All layout elements must be sorted according to human reading order.
+
+5. Final Output: The entire output must be a single JSON object.
+"""
+
+messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "image",
+                    "image": image_path
+                },
+                {"type": "text", "text": prompt}
+            ]
+        }
+    ]
+
+# Preparation for inference
+text = processor.apply_chat_template(
+    messages, 
+    tokenize=False, 
+    add_generation_prompt=True
+)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt",
+)
+
+inputs = inputs.to("cuda")
+
+# Inference: Generation of the output
+generated_ids = model.generate(**inputs, max_new_tokens=24000)
+generated_ids_trimmed = [
+    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)
+print(output_text)
+
+```
+
+</details>
+
+## 3. Document Parse
+**With the vLLM server running**, you can parse an image or a PDF file using the following commands:
+```bash
+
+# Parse all layout info, both detection and recognition
+# Parse a single image
+python3 dots_ocr/parser.py demo/demo_image1.jpg
+# Parse a single PDF
+python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64  # try a larger --num_thread value for PDFs with many pages
+
+# Layout detection only
+python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
+
+# Parse text only, excluding Page-header and Page-footer
+python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
+
+
+```
+
+<details>
+<summary><b>Output Results</b></summary>
+
+1.  **Structured Layout Data** (`demo_image1.json`): A JSON file containing the detected layout elements, including their bounding boxes, categories, and extracted text.
+2.  **Processed Markdown File** (`demo_image1.md`): A Markdown file generated from the concatenated text of all detected cells.
+    *   An additional version, `demo_image1_nohf.md`, is also provided, which excludes page headers and footers for compatibility with benchmarks like OmniDocBench and olmOCR-bench.
+3.  **Layout Visualization** (`demo_image1.jpg`): The original image with the detected layout bounding boxes drawn on it.
+
+</details>
+
+## 4. Demo
+Have fun with the [live demo](https://dotsocr.xiaohongshu.com/).
+
+
+### Examples for document parsing
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula1.png" alt="formula1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table3.png" alt="table3.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/Tibetan.png" alt="Tibetan.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/tradition_zh.png" alt="tradition_zh.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/nl.png" alt="nl.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/kannada.png" alt="kannada.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/russian.png" alt="russian.png" border="0" />
+
+
+### Examples for image parsing
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_1.png" alt="svg_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_2.png" alt="svg_2.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_4.png" alt="svg_4.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_5.png" alt="svg_5.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/svg_6.png" alt="svg_6.png" border="0" />
+
+> **Note:**
+> - Results generated by dots.ocr-1.5-svg.
+
+### Examples for web parsing
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/webpage_1.png" alt="webpage_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/webpage_2.png" alt="webpage_2.png" border="0" />
+
+### Examples for scene spotting
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/scene_1.png" alt="scene_1.png" border="0" />
+<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/scene_2.png" alt="scene_2.png" border="0" />
+
+
+
+## Limitations & Future Work
+
+- **Complex Document Elements:**
+  - **Table & Formula**: Extracting complex tables and mathematical formulas remains difficult given the model's compact architecture.
+  - **Picture**: We have adopted an SVG code representation for parsing structured graphics; however, the performance has yet to achieve the desired level of robustness.
+
+- **Parsing Failures:** While we have reduced the rate of parsing failures compared to the previous version, these issues may still occur occasionally. We remain committed to further resolving these edge cases in future updates. 
\ No newline at end of file
diff --git a/assets/showcase_dots_ocr_1_5/result/scene_1.png b/assets/showcase_dots_ocr_1_5/result/scene_1.png
new file mode 100644
index 0000000..f5f3a57
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/scene_1.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/scene_2.png b/assets/showcase_dots_ocr_1_5/result/scene_2.png
new file mode 100644
index 0000000..a8031c6
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/scene_2.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/svg_1.png b/assets/showcase_dots_ocr_1_5/result/svg_1.png
new file mode 100644
index 0000000..5b8c1f3
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/svg_1.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/svg_2.png b/assets/showcase_dots_ocr_1_5/result/svg_2.png
new file mode 100644
index 0000000..65b64a7
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/svg_2.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/svg_4.png b/assets/showcase_dots_ocr_1_5/result/svg_4.png
new file mode 100644
index 0000000..c458456
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/svg_4.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/svg_5.png b/assets/showcase_dots_ocr_1_5/result/svg_5.png
new file mode 100644
index 0000000..fe8d103
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/svg_5.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/svg_6.png b/assets/showcase_dots_ocr_1_5/result/svg_6.png
new file mode 100644
index 0000000..803fcd4
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/svg_6.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/webpage_1.png b/assets/showcase_dots_ocr_1_5/result/webpage_1.png
new file mode 100644
index 0000000..d2c9042
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/webpage_1.png differ
diff --git a/assets/showcase_dots_ocr_1_5/result/webpage_2.png b/assets/showcase_dots_ocr_1_5/result/webpage_2.png
new file mode 100644
index 0000000..a7da568
Binary files /dev/null and b/assets/showcase_dots_ocr_1_5/result/webpage_2.png differ
diff --git a/requirements.txt b/requirements.txt
index 7eed6f1..15852ca 100755
--- a/requirements.txt
+++ b/requirements.txt
@@ -7,5 +7,6 @@ qwen_vl_utils
 transformers==4.51.3
 huggingface_hub
 modelscope
-flash-attn==2.8.0.post2
+# flash-attn==2.8.0.post2  # optional: install flash-attn to speed up inference
 accelerate
+cairosvg
\ No newline at end of file
diff --git a/tools/download_model.py b/tools/download_model.py
index 32d7087..d5db841 100755
--- a/tools/download_model.py
+++ b/tools/download_model.py
@@ -5,11 +5,11 @@ import os
 if __name__ == '__main__':
     parser = ArgumentParser()
     parser.add_argument('--type', '-t', type=str, default="huggingface")
-    parser.add_argument('--name', '-n', type=str, default="rednote-hilab/dots.ocr")
+    parser.add_argument('--name', '-n', type=str, default="rednote-hilab/dots.ocr-1.5")
     args = parser.parse_args()
     script_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
     print(f"Attention: The model save dir dots.ocr should be replace by a name without `.` like DotsOCR, util we merge our code to transformers.")
-    model_dir = os.path.join(script_dir, "weights/DotsOCR")
+    model_dir = os.path.join(script_dir, "weights/DotsOCR_1_5")
     if not os.path.exists(model_dir):
         os.makedirs(model_dir)
     if args.type == "huggingface":
diff --git a/tools/elo_score_prompt.py b/tools/elo_score_prompt.py
new file mode 100644
index 0000000..c8b5e06
--- /dev/null
+++ b/tools/elo_score_prompt.py
@@ -0,0 +1,89 @@
+def construct_prompt(c1_text, c2_text):
+    """
+    Constructs the complete Prompt sent to Gemini (English Version).
+    c1_text: Markdown text from Model 1
+    c2_text: Markdown text from Model 2
+    """
+    
+    prompt = f"""You are an expert in evaluating OCR content accuracy. Please compare the model outputs with the original image, focusing heavily on **content accuracy** while ignoring formatting and layout differences.
+
+【Evaluation Focus - Focus ONLY on Content Accuracy】
+1. **Text Accuracy**:
+   - Typos: Character recognition errors (e.g., "test" recognized as "tost").
+   - Omissions: Missing characters or words present in the original text.
+   - Hallucinations: Adding characters that do not exist in the original text.
+
+2. **Table Accuracy**:
+   - Correctness of data and text within the table.
+   - Completeness of cell content.
+   - Correct row/column alignment.
+
+3. **Formula Accuracy** (Evaluate based on):
+   - **Correctness**: Are mathematical symbols, variables, and operators preserved accurately?
+   - **Completeness**: Are all parts of the formula present without omission?
+   - **Semantic Equivalence**: Does the extracted formula convey the exact same mathematical meaning?
+
+【Tie Judgment Criteria - Important】
+You must judge as a **tie** in the following cases:
+- Text content is identical, differing only in Markdown formatting.
+- Table data is identical, differing only in Markdown table syntax.
+- Formula content is semantically equivalent, differing only in LaTeX representation.
+- Both models correctly identified the core content; minor differences do not affect information retrieval.
+- Both models share the same minor errors or are both perfect.
+- **Image/Figure processing differs** (one extracts text, one gives bbox, one ignores it), but the main text is accurate.
+
+【Items to Ignore - Do NOT factor into scoring】
+- Markdown formatting differences (e.g., `# Header` vs `## Header`, `*` vs `-` for lists).
+- Layout and typesetting differences (newlines, indentation, alignment).
+- Recognition differences in non-body text like Headers, Footers, and Page Numbers.
+- Text wrapping and paragraph segmentation nuances.
+- Table border styles (e.g., `|---|---|` vs `|:--|--:|`).
+- Different but equivalent LaTeX representations for formulas.
+- **Image/Figure Processing Differences (ABSOLUTELY IGNORE)**: 
+  - How the model parses image/figure regions is **completely excluded** from the scoring standard.
+  - Whether it parses as a `figure` field, outputs bbox coordinates, extracts text inside the image, provides a caption, describes the image content, or **completely ignores/skips the image**, these are all considered equivalent.
+  - Do NOT declare a winner based on image handling.
+
+【Model 1 Output】:
+```markdown
+{c1_text} 
+```
+
+【Model 2 Output】:
+```markdown
+{c2_text}
+```
+
+【Evaluation Process】
+1. Carefully compare the text content against the original image.
+2. Identify errors, omissions, or additions in text recognition for both models.
+3. Check the accuracy of table data.
+4. Evaluate the correctness, completeness, and semantic equivalence of mathematical formulas.
+5. **Ignore image regions**: Confirm that differences in image/figure parsing are not used for scoring.
+6. Important: If the substance is the same and only the format differs, judge as a tie.
+7. Only declare a winner if there is a significant difference in **content accuracy**.
+
+【Examples of Ties】
+- Model 1: "# Title", Model 2: "## Title" (Same content, different level).
+- Model 1: "* Item", Model 2: "- Item" (Same content, different bullet).
+- Formula: Model 1 "$x^2$", Model 2 "$x*x$" (Different LaTeX, same meaning).
+- Table data is identical, but column alignment syntax differs.
+- Identification is identical, but one model parsed the footer while the other didn't (Judge as Tie).
+- **Image handling**: One model outputs an image bbox, while the other outputs an image description or ignores the image. As long as the main text is accurate, this is a **Tie**.
+
+【Output Requirement】 Please strictly return the result in the following JSON format:
+
+{{"winner": "tie", "reason": "Detailed explanation of the judgment, specifically noting the logic for a tie"}}
+
+The value of "winner" must be one of:
+- "1": Model 1 is clearly better in content accuracy.
+- "2": Model 2 is clearly better in content accuracy.
+- "tie": Both models perform equally in content accuracy (including cases of identical content but different formatting/image handling).
+
+In the "reason" field, specifically explain:
+- If a tie: Explain the consistency of the content and explicitly mention which formatting or image handling differences were ignored.
+- If a winner: Specifically point out the accuracy differences (typos, missing words, table/formula errors).
+- **Note**: It is better to judge a tie than to incorrectly determine a winner based on minor formatting or image parsing differences. **Content accuracy of the main text is the ONLY standard.**
+"""
+    
+    return prompt
\ No newline at end of file
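
The prompt in `tools/elo_score_prompt.py` asks the judge to return a JSON verdict whose `winner` field is `"1"`, `"2"`, or `"tie"`. The patch does not show how those verdicts are aggregated into Elo ratings, so the following is only a minimal sketch using the standard Elo update. The K-factor of 16, the initial rating of 1000, the helper names `expected_score` and `update_elo`, and the model names are all assumptions for illustration, not values taken from this repository.

```python
import json


def expected_score(r_a, r_b):
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, model_1, model_2, verdict_json, k=16.0, initial=1000.0):
    """Apply one judged comparison to `ratings` in place and return it.

    `verdict_json` is the judge's reply in the format the prompt requests:
    {"winner": "1" | "2" | "tie", "reason": "..."}.
    `k` and `initial` are assumed values, not taken from the repository.
    """
    winner = json.loads(verdict_json)["winner"]
    # Score from Model 1's point of view: win = 1, loss = 0, tie = 0.5.
    score_1 = {"1": 1.0, "2": 0.0, "tie": 0.5}[winner]
    r1 = ratings.setdefault(model_1, initial)
    r2 = ratings.setdefault(model_2, initial)
    e1 = expected_score(r1, r2)
    ratings[model_1] = r1 + k * (score_1 - e1)
    # Model 2's actual and expected scores are the complements of Model 1's.
    ratings[model_2] = r2 + k * ((1.0 - score_1) - (1.0 - e1))
    return ratings


# Hypothetical model names and verdicts, for illustration only.
ratings = {}
update_elo(ratings, "model_a", "model_b", '{"winner": "1", "reason": "fewer typos"}')
update_elo(ratings, "model_a", "model_b", '{"winner": "tie", "reason": "format only"}')
print(ratings)
```

Because each update moves the two ratings by equal and opposite amounts, the sum of all ratings stays constant. In practice the result depends on the order of comparisons, so a real evaluation would likely shuffle or average over many orderings.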
