align
@@ -420,16 +420,6 @@ Please see above.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR MonkeyOCR CODE --------------
|
|
||||||
|
|
||||||
Copyright notice:No copyright info provided
|
|
||||||
|
|
||||||
License:apache2.0
|
|
||||||
|
|
||||||
Please see above.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR OmniDocbench CODE --------------
|
------------- LICENSE FOR OmniDocbench CODE --------------
|
||||||
|
|
||||||
Copyright notice:No copyright info provided
|
Copyright notice:No copyright info provided
|
||||||
@@ -450,31 +440,6 @@ Please see above.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR aimv2 CODE --------------
|
|
||||||
|
|
||||||
Copyright notice: Copyright (C) 2024 Apple Inc. All Rights Reserved.
|
|
||||||
|
|
||||||
License:
|
|
||||||
|
|
||||||
IMPORTANT: This Apple software is supplied to you by Apple Inc. ("Apple") in consideration of your agreement to the following terms, and your use, installation, modification or redistribution of this Apple software constitutes acceptance of these terms. If you do not agree with these terms, please do not use, install, modify or
|
|
||||||
redistribute this Apple software.
|
|
||||||
|
|
||||||
In consideration of your agreement to abide by the following terms, and subject to these terms, Apple grants you a personal, non-exclusive license, under Apple's copyrights in this original Apple software (the "Apple Software"), to use, reproduce, modify and redistribute the Apple Software, with or without modifications, in source and/or binary forms; provided that if you redistribute the Apple Software in its entirety and without modifications, you must retain this notice and the following text and disclaimers in all such redistributions of the Apple Software. Neither the name, trademarks, service marks or logos of Apple Inc. May be used to endorse or promote products derived from the Apple Software without specific prior written permission from Apple. Except as expressly stated in this notice, no other rights or licenses, express or implied, are granted by Apple herein, including but not limited to any patent rights that may be infringed by your derivative works or by other works in which the Apple Software may be incorporated.
|
|
||||||
|
|
||||||
The Apple Software is provided by Apple on an "AS IS" basis. APPLE MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS
|
|
||||||
FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS. IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL
|
|
||||||
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION, MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
||||||
|
|
||||||
SOFTWARE DISTRIBUTED WITH AUTOREGRESSIVE IMAGE MODELS:
|
|
||||||
|
|
||||||
The Autoregressive Image Models software includes a number of subcomponents with
|
|
||||||
separate copyright notices and license terms - please see the file ACKNOWLEDGEMENTS.
|
|
||||||
|
|
||||||
Acknowledgements:
|
|
||||||
|
|
||||||
Portions of the Autoregressive Image Models project may utilize the following copyrighted material, the use of which is hereby acknowledged.
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR Hugging Face CODE --------------
|
------------- LICENSE FOR Hugging Face CODE --------------
|
||||||
|
|
||||||
Copyright notice:Copyright 2019 Ross Wightman
|
Copyright notice:Copyright 2019 Ross Wightman
|
||||||
@@ -587,193 +552,3 @@ Section 7. Miscellaneous
|
|||||||
|
|
||||||
7.5 The Community Data License Agreement workgroup under The Linux Foundation is the steward of this Agreement (“Steward”). No one other than the Steward has the right to modify or publish new versions of this Agreement. Each version will be given a distinguishing version number. You may Use and Publish Data Received hereunder under the terms of the version of the Agreement under which You originally Received the Data, or under the terms of any subsequent version published by the Steward.
|
7.5 The Community Data License Agreement workgroup under The Linux Foundation is the steward of this Agreement (“Steward”). No one other than the Steward has the right to modify or publish new versions of this Agreement. Each version will be given a distinguishing version number. You may Use and Publish Data Received hereunder under the terms of the version of the Agreement under which You originally Received the Data, or under the terms of any subsequent version published by the Steward.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR M6Doc --------------
|
|
||||||
|
|
||||||
Copyright notice:No copyright info provided
|
|
||||||
|
|
||||||
License:Attribution-NonCommercial-NoDerivatives 4.0
|
|
||||||
|
|
||||||
=======Attribution-NonCommercial-NoDerivatives 4.0 International==========
|
|
||||||
|
|
||||||
Creative Commons Corporation ("Creative Commons") is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an "as-is" basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.
|
|
||||||
|
|
||||||
Using Creative Commons Public Licenses
|
|
||||||
|
|
||||||
Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
|
|
||||||
|
|
||||||
Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC- licensed material, or material used under an exception or limitation to copyright. More considerations for licensors: wiki.creativecommons.org/Considerations_for_licensors
|
|
||||||
|
|
||||||
Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor's permission is not necessary for any reason--for example, because of any applicable exception or limitation to copyright--then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. More considerations for the public: wiki.creativecommons.org/Considerations_for_licensees
|
|
||||||
|
|
||||||
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
|
|
||||||
|
|
||||||
By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
|
|
||||||
|
|
||||||
|
|
||||||
Section 1 -- Definitions.
|
|
||||||
|
|
||||||
a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
|
|
||||||
|
|
||||||
b. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
|
|
||||||
|
|
||||||
c. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
|
|
||||||
|
|
||||||
d. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
|
|
||||||
|
|
||||||
e. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
|
|
||||||
|
|
||||||
f. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
|
|
||||||
|
|
||||||
g. Licensor means the individual(s) or entity(ies) granting rights under this Public License.
|
|
||||||
|
|
||||||
h. NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.
|
|
||||||
|
|
||||||
i. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
|
|
||||||
|
|
||||||
j. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
|
|
||||||
|
|
||||||
k. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
|
|
||||||
|
|
||||||
|
|
||||||
Section 2 -- Scope.
|
|
||||||
|
|
||||||
a. License granted.
|
|
||||||
|
|
||||||
1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
|
|
||||||
|
|
||||||
a. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and
|
|
||||||
|
|
||||||
b. produce and reproduce, but not Share, Adapted Material for NonCommercial purposes only.
|
|
||||||
|
|
||||||
2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
|
|
||||||
|
|
||||||
3. Term. The term of this Public License is specified in Section 6(a).
|
|
||||||
|
|
||||||
4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a) (4) never produces Adapted Material.
|
|
||||||
|
|
||||||
5. Downstream recipients.
|
|
||||||
|
|
||||||
a. Offer from the Licensor -- Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
|
|
||||||
|
|
||||||
b. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
|
|
||||||
|
|
||||||
6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that you are, or that your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
|
|
||||||
|
|
||||||
b. Other rights.
|
|
||||||
|
|
||||||
1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
|
|
||||||
|
|
||||||
2. Patent and trademark rights are not licensed under this Public License.
|
|
||||||
|
|
||||||
3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.
|
|
||||||
|
|
||||||
|
|
||||||
Section 3 -- License Conditions.
|
|
||||||
|
|
||||||
Your exercise of the Licensed Rights is expressly made subject to the
|
|
||||||
following conditions.
|
|
||||||
|
|
||||||
a. Attribution.
|
|
||||||
|
|
||||||
1. If You Share the Licensed Material, You must:
|
|
||||||
|
|
||||||
a. retain the following if it is supplied by the Licensor with the Licensed Material:
|
|
||||||
|
|
||||||
i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
|
|
||||||
|
|
||||||
ii. a copyright notice;
|
|
||||||
|
|
||||||
iii. a notice that refers to this Public License;
|
|
||||||
|
|
||||||
iv. a notice that refers to the disclaimer of warranties;
|
|
||||||
|
|
||||||
v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
|
|
||||||
|
|
||||||
b. indicate if you modified the Licensed Material and retain an indication of any previous modifications; and
|
|
||||||
|
|
||||||
c. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. For the avoidance of doubt, you do not have permission under this Public License to Share Adapted Material.
|
|
||||||
|
|
||||||
2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which you share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
|
|
||||||
|
|
||||||
3. If requested by the Licensor, you must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
|
|
||||||
|
|
||||||
|
|
||||||
Section 4 -- Sui Generis Database Rights.
|
|
||||||
|
|
||||||
Where the Licensed Rights include Sui Generis Database Rights that
|
|
||||||
apply to Your use of the Licensed Material:
|
|
||||||
|
|
||||||
a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only and provided You do not Share Adapted Material;
|
|
||||||
|
|
||||||
b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
|
|
||||||
|
|
||||||
c. You must comply with the conditions in Section 3(a) if you share all or a substantial portion of the contents of the database.
|
|
||||||
|
|
||||||
For the avoidance of doubt, this Section 4 supplements and does not replace your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
|
|
||||||
|
|
||||||
|
|
||||||
Section 5 -- Disclaimer of Warranties and Limitation of Liability.
|
|
||||||
|
|
||||||
a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
|
|
||||||
|
|
||||||
b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
|
|
||||||
|
|
||||||
c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer andwaiver of all liability.
|
|
||||||
|
|
||||||
|
|
||||||
Section 6 -- Term and Termination.
|
|
||||||
|
|
||||||
a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
|
|
||||||
|
|
||||||
b. Where your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
|
|
||||||
|
|
||||||
1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
|
|
||||||
|
|
||||||
2. upon express reinstatement by the Licensor.
|
|
||||||
|
|
||||||
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
|
|
||||||
|
|
||||||
c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
|
|
||||||
|
|
||||||
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
|
|
||||||
|
|
||||||
|
|
||||||
Section 7 -- Other Terms and Conditions.
|
|
||||||
|
|
||||||
a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
|
|
||||||
|
|
||||||
b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
|
|
||||||
|
|
||||||
|
|
||||||
Section 8 -- Interpretation.
|
|
||||||
|
|
||||||
a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
|
|
||||||
|
|
||||||
b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
|
|
||||||
|
|
||||||
c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
|
|
||||||
|
|
||||||
d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
|
|
||||||
|
|
||||||
Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the "Licensor." The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public
|
|
||||||
Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at creativecommons.org/policies, Creative Commons does not authorize the use of the trademark "Creative Commons" or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
|
|
||||||
|
|
||||||
Creative Commons may be contacted at creativecommons.org.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR CDLA --------------
|
|
||||||
|
|
||||||
Copyright notice:No copyright info provided
|
|
||||||
|
|
||||||
License:No License info provided
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------- LICENSE FOR D4LA --------------
|
|
||||||
|
|
||||||
Copyright notice:No copyright info provided
|
|
||||||
|
|
||||||
License:No License info provided
|
|
||||||
@@ -8,13 +8,13 @@
|
|||||||
dots.ocr
|
dots.ocr
|
||||||
</h1>
|
</h1>
|
||||||
|
|
||||||
[](https://huggingface.co/rednote-hilab/dots.ocr-1.5)
|
[](https://huggingface.co/rednote-hilab/dots.mocr)
|
||||||
[](https://arxiv.org/abs/2512.02498)
|
[](https://arxiv.org/abs/2512.02498)
|
||||||
|
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
|
<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
|
||||||
<a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
|
<a href="assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
|
||||||
<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
|
<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
|
||||||
<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
|
<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
|
||||||
</div>
|
</div>
|
||||||
@@ -48,45 +48,59 @@ dots.ocr
|
|||||||
<th>olmOCR-Bench</th>
|
<th>olmOCR-Bench</th>
|
||||||
<th>OmniDocBench (v1.5)</th>
|
<th>OmniDocBench (v1.5)</th>
|
||||||
<th>XDocParse</th>
|
<th>XDocParse</th>
|
||||||
|
<th>Average</th>
|
||||||
</tr>
|
</tr>
|
||||||
</thead>
|
</thead>
|
||||||
<tbody>
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td>MonkeyOCR-pro-3B</td>
|
||||||
|
<td>895.0</td>
|
||||||
|
<td>811.3</td>
|
||||||
|
<td>637.1</td>
|
||||||
|
<td>781.1</td>
|
||||||
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>GLM-OCR</td>
|
<td>GLM-OCR</td>
|
||||||
<td>859.9</td>
|
<td>884.2</td>
|
||||||
<td>937.5</td>
|
<td>972.6</td>
|
||||||
<td>742.1</td>
|
<td>820.7</td>
|
||||||
|
<td>892.5</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>PaddleOCR-VL-1.5</td>
|
<td>PaddleOCR-VL-1.5</td>
|
||||||
<td>873.6</td>
|
<td>897.3</td>
|
||||||
<td>965.6</td>
|
<td>997.9</td>
|
||||||
<td>797.6</td>
|
<td>866.4</td>
|
||||||
|
<td>920.5</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>HuanyuanOCR</td>
|
<td>HuanyuanOCR</td>
|
||||||
<td>978.9</td>
|
<td>997.6</td>
|
||||||
<td>974.4</td>
|
<td>1003.9</td>
|
||||||
<td>895.9</td>
|
<td>951.1</td>
|
||||||
|
<td>984.2</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>dots.ocr</td>
|
<td>dots.ocr</td>
|
||||||
<td>1027.4</td>
|
<td>1041.1</td>
|
||||||
<td>994.7</td>
|
<td>1027.2</td>
|
||||||
<td>1133.4</td>
|
<td>1190.3</td>
|
||||||
|
<td>1086.2</td>
|
||||||
</tr>
|
</tr>
|
||||||
<!-- Highlighting dots.ocr-1.5 row with bold tags -->
|
<!-- Highlighting dots.mocr row with bold tags -->
|
||||||
<tr>
|
<tr>
|
||||||
<td><strong>dots.ocr-1.5</strong></td>
|
<td><strong>dots.mocr</strong></td>
|
||||||
<td><strong>1089.0</strong></td>
|
<td><strong>1104.4</strong></td>
|
||||||
<td><strong>1025.8</strong></td>
|
<td><strong>1059.0</strong></td>
|
||||||
<td><strong>1157.1</strong></td>
|
<td><strong>1210.7</strong></td>
|
||||||
|
<td><strong>1124.7</strong></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>Gemini 3 Pro</td>
|
<td>Gemini 3 Pro</td>
|
||||||
<td>1171.2</td>
|
<td>1180.4</td>
|
||||||
<td>1102.1</td>
|
<td>1128.0</td>
|
||||||
<td>1273.9</td>
|
<td>1323.7</td>
|
||||||
|
<td>1210.7</td>
|
||||||
</tr>
|
</tr>
|
||||||
</tbody>
|
</tbody>
|
||||||
</table>
|
</table>
|
||||||
@@ -94,7 +108,7 @@ dots.ocr
|
|||||||
|
|
||||||
> **Notes:**
|
> **Notes:**
|
||||||
> - Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HuanyuanOCR results were generated using local inference.
|
> - Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HuanyuanOCR results were generated using local inference.
|
||||||
> - The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: [Elo Score Prompt](https://github.com/rednote-hilab/dots.ocr/blob/master/tools/elo_score_prompt.py). These results are consistent with the findings on [ocrarena](https://www.ocrarena.ai/battle).
|
> - The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: [Elo Score Prompt](tools/elo_score_prompt.py). These results are consistent with the findings on [ocrarena](https://www.ocrarena.ai/battle).
|
||||||
|
|
||||||
|
|
||||||
#### 1.2 olmOCR-bench
|
#### 1.2 olmOCR-bench
|
||||||
@@ -235,7 +249,7 @@ dots.ocr
|
|||||||
<td>79.1±1.0</td>
|
<td>79.1±1.0</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><strong>dots.ocr-1.5</strong></td>
|
<td><strong>dots.mocr</strong></td>
|
||||||
<td><strong>85.9</strong></td>
|
<td><strong>85.9</strong></td>
|
||||||
<td><strong>85.5</strong></td>
|
<td><strong>85.5</strong></td>
|
||||||
<td><strong>90.7</strong></td>
|
<td><strong>90.7</strong></td>
|
||||||
@@ -372,7 +386,7 @@ dots.ocr
|
|||||||
<td>9.29</td>
|
<td>9.29</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><u><strong>dots.ocr-1.5</strong></u></td>
|
<td><u><strong>dots.mocr</strong></u></td>
|
||||||
<td>3B</td>
|
<td>3B</td>
|
||||||
<td><strong>0.031</strong></td>
|
<td><strong>0.031</strong></td>
|
||||||
<td><strong>0.029</strong></td>
|
<td><strong>0.029</strong></td>
|
||||||
@@ -386,8 +400,8 @@ dots.ocr
|
|||||||
> - Formula and Table metrics for OmniDocBench1.5 are omitted due to their high sensitivity to detection and matching protocols.
|
> - Formula and Table metrics for OmniDocBench1.5 are omitted due to their high sensitivity to detection and matching protocols.
|
||||||
|
|
||||||
|
|
||||||
### 2. Vision-Language Parsing
|
### 2. Structured Graphics Parsing
|
||||||
Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. **dots.ocr-1.5** unifies the interpretation of these elements by parsing them directly into **SVG code**.
|
Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate dense human knowledge. **dots.mocr** unifies the interpretation of these elements by parsing them directly into **SVG code**.
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<thead>
|
<thead>
|
||||||
@@ -430,7 +444,7 @@ Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate
|
|||||||
<td>0.839</td>
|
<td>0.839</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td style="text-align: left;">dots.ocr-1.5</td>
|
<td style="text-align: left;">dots.mocr</td>
|
||||||
<td>0.850</td>
|
<td>0.850</td>
|
||||||
<td>0.923</td>
|
<td>0.923</td>
|
||||||
<td>0.894</td>
|
<td>0.894</td>
|
||||||
@@ -441,7 +455,7 @@ Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate
|
|||||||
<td>0.790</td>
|
<td>0.790</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td style="text-align: left;"><strong>dots.ocr-1.5-svg</strong></td>
|
<td style="text-align: left;"><strong>dots.mocr-svg</strong></td>
|
||||||
<td><strong>0.860</strong></td>
|
<td><strong>0.860</strong></td>
|
||||||
<td><strong>0.931</strong></td>
|
<td><strong>0.931</strong></td>
|
||||||
<td><strong>0.902</strong></td>
|
<td><strong>0.902</strong></td>
|
||||||
@@ -457,8 +471,8 @@ Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate
|
|||||||
|
|
||||||
> **Note:**
|
> **Note:**
|
||||||
> - We use the ISVGEN metric from [UniSVG](https://ryanlijinke.github.io/) to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input, and calculate the ISVGEN score between the rendered output and the original image.
|
> - We use the ISVGEN metric from [UniSVG](https://ryanlijinke.github.io/) to evaluate the parsing result. For benchmarks that do not natively support image parsing, we use the original images as input, and calculate the ISVGEN score between the rendered output and the original image.
|
||||||
> - [OCRVerse](https://github.com/DocTron-hub/OCRVerse) results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.ocr-1.5 are based specifically on SVG code.
|
> - [OCRVerse](https://github.com/DocTron-hub/OCRVerse) results are derived from various code formats (e.g., SVG, Python), whereas results for Gemini 3 Pro and dots.mocr are based specifically on SVG code.
|
||||||
> - Due to the capacity constraints of a 3B-parameter VLM, dots.ocr-1.5 may not excel in all tasks yet like svg. To complement this, we are simultaneously releasing dots.ocr-1.5-svg. We plan to further address these limitations in future updates.
|
> - Due to the capacity constraints of a 3B-parameter VLM, dots.mocr may not excel in all tasks yet like svg. To complement this, we are simultaneously releasing dots.mocr-svg. We plan to further address these limitations in future updates.
|
||||||
|
|
||||||
|
|
||||||
### 3. General Vision Tasks
|
### 3. General Vision Tasks
|
||||||
@@ -494,7 +508,20 @@ Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate
|
|||||||
<td>-</td>
|
<td>-</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><strong>dots.ocr-1.5</strong></td>
|
<td>Qwen3vl-4b-instruct</td>
|
||||||
|
<td>76.2</td>
|
||||||
|
<td>39.7</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>80.3</td>
|
||||||
|
<td>95.3</td>
|
||||||
|
<td>-</td>
|
||||||
|
<td>88.1</td>
|
||||||
|
<td>84.1</td>
|
||||||
|
<td>84.9</td>
|
||||||
|
<td>-</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><strong>dots.mocr</strong></td>
|
||||||
<td>77.4</td>
|
<td>77.4</td>
|
||||||
<td>55.3</td>
|
<td>55.3</td>
|
||||||
<td>22.85</td>
|
<td>22.85</td>
|
||||||
@@ -513,29 +540,25 @@ Visual languages (e.g., charts, graphics, chemical formulas, logos) encapsulate
|
|||||||
|
|
||||||
# Quick Start
|
# Quick Start
|
||||||
## 1. Installation
|
## 1. Installation
|
||||||
### Install dots.ocr-1.5
|
### Install dots.mocr
|
||||||
```shell
|
```shell
|
||||||
conda create -n dots_ocr python=3.12
|
conda create -n dots_mocr python=3.12
|
||||||
conda activate dots_ocr
|
conda activate dots_mocr
|
||||||
|
|
||||||
git clone https://github.com/rednote-hilab/dots.ocr.git
|
git clone https://github.com/rednote-hilab/dots.mocr.git
|
||||||
cd dots.ocr
|
cd dots.mocr
|
||||||
|
|
||||||
# Install pytorch, see https://pytorch.org/get-started/previous-versions/ for your cuda version
|
# Install pytorch, see https://pytorch.org/get-started/previous-versions/ for your cuda version
|
||||||
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
|
# pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
|
||||||
|
# install flash-attn==2.8.0.post2 for faster inference
|
||||||
pip install -e .
|
pip install -e .
|
||||||
```
|
```
|
||||||
|
|
||||||
If you have trouble with the installation, try our [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) for an easier setup, and follow these steps:
|
If you have trouble with the installation, try our [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) for an easier setup, and follow these steps:
|
||||||
```shell
|
|
||||||
git clone https://github.com/rednote-hilab/dots.ocr.git
|
|
||||||
cd dots.ocr
|
|
||||||
pip install -e .
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Download Model Weights
|
### Download Model Weights
|
||||||
> 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR_1_5` instead of `dots.ocr-1.5`) for the model save path. This is a temporary workaround pending our integration with Transformers.
|
> 💡**Note:** Please use a directory name without periods (e.g., `DotsMOCR` instead of `dots.mocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
|
||||||
```shell
|
```shell
|
||||||
python3 tools/download_model.py
|
python3 tools/download_model.py
|
||||||
|
|
||||||
@@ -546,28 +569,25 @@ python3 tools/download_model.py --type modelscope
|
|||||||
|
|
||||||
## 2. Deployment
|
## 2. Deployment
|
||||||
### vLLM inference
|
### vLLM inference
|
||||||
We highly recommend using vLLM for deployment and inference. All of our evaluations results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, Dots OCR has been officially integrated into vLLM with verified performance** and you can use vLLM docker image directly (e.g, `vllm/vllm-openai:v0.11.0`) to deploy the model server.
|
We highly recommend using vLLM for deployment and inference. **Since vLLM version 0.11.0, Dots OCR has been officially integrated into vLLM with verified performance** and you can use vLLM docker image directly (e.g, `vllm/vllm-openai:v0.11.0`) to deploy the model server.
|
||||||
|
|
||||||
> **Note:**
|
|
||||||
> - We found a little bit performance drop when using vLLM 0.11.0. We are working on a fix.
|
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
# Launch vLLM model server
|
# Launch vLLM model server
|
||||||
## dots.ocr-1.5
|
## dots.mocr
|
||||||
CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5 --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
|
CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.mocr --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
|
||||||
|
|
||||||
## dots.ocr-1.5-svg
|
## dots.mocr-svg
|
||||||
CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.ocr-1.5-svg --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
|
CUDA_VISIBLE_DEVICES=0 vllm serve rednote-hilab/dots.mocr-svg --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name model --trust-remote-code
|
||||||
|
|
||||||
# vLLM API Demo
|
# vLLM API Demo
|
||||||
# See dots_ocr/model/inference.py and dots_ocr/utils/prompts.py for details on parameter and prompt settings
|
# See dots_mocr/model/inference.py and dots_mocr/utils/prompts.py for details on parameter and prompt settings
|
||||||
# that help achieve the best output quality.
|
# that help achieve the best output quality.
|
||||||
## document parsing
|
## document parsing
|
||||||
python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
|
python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
|
||||||
## web parsing
|
## web parsing
|
||||||
python3 ./demo/demo_vllm.py --prompt_mode prompt_web_parsing --image_path ./assets/showcase_dots_ocr_1_5/origin/webpage_1.png
|
python3 ./demo/demo_vllm.py --prompt_mode prompt_web_parsing --image_path ./assets/showcase/origin/webpage_1.png
|
||||||
## scene spoting
|
## scene spoting
|
||||||
python3 ./demo/demo_vllm.py --prompt_mode prompt_scene_spotting --image_path ./assets/showcase_dots_ocr_1_5/origin/scene_1.jpg
|
python3 ./demo/demo_vllm.py --prompt_mode prompt_scene_spotting --image_path ./assets/showcase/origin/scene_1.jpg
|
||||||
## image parsing with svg code
|
## image parsing with svg code
|
||||||
python3 ./demo/demo_vllm_svg.py --prompt_mode prompt_image_to_svg
|
python3 ./demo/demo_vllm_svg.py --prompt_mode prompt_image_to_svg
|
||||||
## general qa
|
## general qa
|
||||||
@@ -586,9 +606,9 @@ python3 demo/demo_hf.py
|
|||||||
import torch
|
import torch
|
||||||
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
|
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
|
||||||
from qwen_vl_utils import process_vision_info
|
from qwen_vl_utils import process_vision_info
|
||||||
from dots_ocr.utils import dict_promptmode_to_prompt
|
from dots_mocr.utils import dict_promptmode_to_prompt
|
||||||
|
|
||||||
model_path = "./weights/DotsOCR_1_5"
|
model_path = "./weights/DotsMOCR"
|
||||||
model = AutoModelForCausalLM.from_pretrained(
|
model = AutoModelForCausalLM.from_pretrained(
|
||||||
model_path,
|
model_path,
|
||||||
attn_implementation="flash_attention_2",
|
attn_implementation="flash_attention_2",
|
||||||
@@ -672,21 +692,21 @@ Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues
|
|||||||
|
|
||||||
# Parse all layout info, both detection and recognition
|
# Parse all layout info, both detection and recognition
|
||||||
# Parse a single image
|
# Parse a single image
|
||||||
python3 dots_ocr/parser.py demo/demo_image1.jpg
|
python3 dots_mocr/parser.py demo/demo_image1.jpg
|
||||||
# Parse a single PDF
|
# Parse a single PDF
|
||||||
python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64 # try bigger num_threads for pdf with a large number of pages
|
python3 dots_mocr/parser.py demo/demo_pdf1.pdf --num_thread 64 # try bigger num_threads for pdf with a large number of pages
|
||||||
|
|
||||||
# Layout detection only
|
# Layout detection only
|
||||||
python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
|
python3 dots_mocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
|
||||||
|
|
||||||
# Parse text only, except Page-header and Page-footer
|
# Parse text only, except Page-header and Page-footer
|
||||||
python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
|
python3 dots_mocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
|
||||||
|
|
||||||
|
|
||||||
```
|
```
|
||||||
**Based on Transformers**, you can parse an image or a pdf file using the same commands above, just add `--use_hf true`.
|
**Based on Transformers**, you can parse an image or a pdf file using the same commands above, just add `--use_hf true`.
|
||||||
|
|
||||||
> Notice: transformers is slower than vllm, if you want to use demo/* with transformers,just add `use_hf=True` in `DotsOCRParser(..,use_hf=True)`
|
> Notice: transformers is slower than vllm, if you want to use demo/* with transformers,just add `use_hf=True` in `DotsMOCRParser(..,use_hf=True)`
|
||||||
|
|
||||||
<details>
|
<details>
|
||||||
<summary><b>Output Results</b></summary>
|
<summary><b>Output Results</b></summary>
|
||||||
@@ -704,32 +724,32 @@ Have fun with the [live demo](https://dotsocr.xiaohongshu.com/).
|
|||||||
|
|
||||||
|
|
||||||
### Examples for document parsing
|
### Examples for document parsing
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/formula1.png" alt="formula1.png" border="0" />
|
<img src="assets/showcase/result/formula1.png" alt="formula1.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/table3.png" alt="table3.png" border="0" />
|
<img src="assets/showcase/result/table3.png" alt="table3.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/Tibetan.png" alt="Tibetan.png" border="0" />
|
<img src="assets/showcase/result/Tibetan.png" alt="Tibetan.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/tradition_zh.png" alt="tradition_zh.png" border="0" />
|
<img src="assets/showcase/result/tradition_zh.png" alt="tradition_zh.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/nl.png" alt="nl.png" border="0" />
|
<img src="assets/showcase/result/nl.png" alt="nl.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/kannada.png" alt="kannada.png" border="0" />
|
<img src="assets/showcase/result/kannada.png" alt="kannada.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase/russian.png" alt="russian.png" border="0" />
|
<img src="assets/showcase/result/russian.png" alt="russian.png" border="0" />
|
||||||
|
|
||||||
|
|
||||||
### Examples for image parsing
|
### Examples for image parsing
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/svg_1.png" alt="svg_1.png" border="0" />
|
<img src="assets/showcase/result/svg_1.png" alt="svg_1.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/svg_2.png" alt="svg_2.png" border="0" />
|
<img src="assets/showcase/result/svg_2.png" alt="svg_2.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/svg_4.png" alt="svg_4.png" border="0" />
|
<img src="assets/showcase/result/svg_4.png" alt="svg_4.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/svg_5.png" alt="svg_5.png" border="0" />
|
<img src="assets/showcase/result/svg_5.png" alt="svg_5.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/svg_6.png" alt="svg_6.png" border="0" />
|
<img src="assets/showcase/result/svg_6.png" alt="svg_6.png" border="0" />
|
||||||
|
|
||||||
> **Note:**
|
> **Note:**
|
||||||
> - Inferenced by dots.ocr-1.5-svg
|
> - Inferenced by dots.mocr-svg
|
||||||
|
|
||||||
### Example for web parsing
|
### Example for web parsing
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/webpage_1.png" alt="webpage_1.png" border="0" />
|
<img src="assets/showcase/result/webpage_1.png" alt="webpage_1.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/webpage_2.png" alt="webpage_2.png" border="0" />
|
<img src="assets/showcase/result/webpage_2.png" alt="webpage_2.png" border="0" />
|
||||||
|
|
||||||
### Examples for scene spotting
|
### Examples for scene spotting
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/scene_1.png" alt="scene_1.png" border="0" />
|
<img src="assets/showcase/result/scene_1.png" alt="scene_1.png" border="0" />
|
||||||
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/showcase_dots_ocr_1_5/result/scene_2.png" alt="scene_2.png" border="0" />
|
<img src="assets/showcase/result/scene_2.png" alt="scene_2.png" border="0" />
|
||||||
|
|
||||||
|
|
||||||
# Limitation & Future Work
|
# Limitation & Future Work
|
||||||
@@ -743,6 +763,18 @@ Have fun with the [live demo](https://dotsocr.xiaohongshu.com/).
|
|||||||
|
|
||||||
# Citation
|
# Citation
|
||||||
|
|
||||||
|
```BibTeX
|
||||||
|
@misc{zheng2026multimodalocrparsedocuments,
|
||||||
|
title={Multimodal OCR: Parse Anything from Documents},
|
||||||
|
author={Handong Zheng and Yumeng Li and Kaile Zhang and Liang Xin and Guangwei Zhao and Hao Liu and Jiayu Chen and Jie Lou and Jiyu Qiu and Qi Fu and Rui Yang and Shuo Jiang and Weijian Luo and Weijie Su and Weijun Zhang and Xingyu Zhu and Yabin Li and Yiwei ma and Yu Chen and Zhaohui Yu and Guang Yang and Colin Zhang and Lei Zhang and Yuliang Liu and Xiang Bai},
|
||||||
|
year={2026},
|
||||||
|
eprint={2603.13032},
|
||||||
|
archivePrefix={arXiv},
|
||||||
|
primaryClass={cs.CV},
|
||||||
|
url={https://arxiv.org/abs/2603.13032},
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
```BibTeX
|
```BibTeX
|
||||||
@misc{li2025dotsocrmultilingualdocumentlayout,
|
@misc{li2025dotsocrmultilingualdocumentlayout,
|
||||||
title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
|
title={dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model},
|
||||||
|
|||||||
|
Before Width: | Height: | Size: 943 KiB After Width: | Height: | Size: 943 KiB |
|
Before Width: | Height: | Size: 662 KiB After Width: | Height: | Size: 662 KiB |
|
Before Width: | Height: | Size: 292 KiB After Width: | Height: | Size: 292 KiB |
|
Before Width: | Height: | Size: 263 KiB After Width: | Height: | Size: 263 KiB |
|
After Width: | Height: | Size: 145 KiB |
|
Before Width: | Height: | Size: 445 KiB After Width: | Height: | Size: 445 KiB |
|
Before Width: | Height: | Size: 1.1 MiB After Width: | Height: | Size: 1.1 MiB |
|
Before Width: | Height: | Size: 673 KiB After Width: | Height: | Size: 673 KiB |
|
Before Width: | Height: | Size: 1.7 MiB After Width: | Height: | Size: 1.7 MiB |
|
Before Width: | Height: | Size: 164 KiB After Width: | Height: | Size: 164 KiB |
|
Before Width: | Height: | Size: 129 KiB After Width: | Height: | Size: 129 KiB |
|
After Width: | Height: | Size: 102 KiB |
|
Before Width: | Height: | Size: 15 KiB After Width: | Height: | Size: 15 KiB |
|
Before Width: | Height: | Size: 42 KiB After Width: | Height: | Size: 42 KiB |
|
After Width: | Height: | Size: 62 KiB |
|
Before Width: | Height: | Size: 112 KiB After Width: | Height: | Size: 112 KiB |
|
Before Width: | Height: | Size: 54 KiB After Width: | Height: | Size: 54 KiB |
|
Before Width: | Height: | Size: 53 KiB After Width: | Height: | Size: 53 KiB |
|
After Width: | Height: | Size: 57 KiB |
|
Before Width: | Height: | Size: 755 KiB After Width: | Height: | Size: 755 KiB |
|
Before Width: | Height: | Size: 920 KiB After Width: | Height: | Size: 920 KiB |
|
Before Width: | Height: | Size: 2.0 MiB After Width: | Height: | Size: 2.0 MiB |
|
Before Width: | Height: | Size: 937 KiB After Width: | Height: | Size: 937 KiB |
|
Before Width: | Height: | Size: 4.1 MiB After Width: | Height: | Size: 4.1 MiB |
|
Before Width: | Height: | Size: 374 KiB After Width: | Height: | Size: 374 KiB |
|
After Width: | Height: | Size: 4.7 MiB |
|
Before Width: | Height: | Size: 2.8 MiB After Width: | Height: | Size: 2.8 MiB |
|
Before Width: | Height: | Size: 1.2 MiB After Width: | Height: | Size: 1.2 MiB |
|
Before Width: | Height: | Size: 1.7 MiB After Width: | Height: | Size: 1.7 MiB |
|
Before Width: | Height: | Size: 1.0 MiB After Width: | Height: | Size: 1.0 MiB |
|
Before Width: | Height: | Size: 1013 KiB After Width: | Height: | Size: 1013 KiB |
|
Before Width: | Height: | Size: 1.8 MiB After Width: | Height: | Size: 1.8 MiB |
|
Before Width: | Height: | Size: 3.7 MiB After Width: | Height: | Size: 3.7 MiB |
|
Before Width: | Height: | Size: 2.8 MiB After Width: | Height: | Size: 2.8 MiB |
|
Before Width: | Height: | Size: 2.9 MiB After Width: | Height: | Size: 2.9 MiB |
|
Before Width: | Height: | Size: 1.9 MiB After Width: | Height: | Size: 1.9 MiB |
|
Before Width: | Height: | Size: 1.5 MiB After Width: | Height: | Size: 1.5 MiB |
|
Before Width: | Height: | Size: 783 KiB After Width: | Height: | Size: 783 KiB |
|
Before Width: | Height: | Size: 721 KiB After Width: | Height: | Size: 721 KiB |
|
Before Width: | Height: | Size: 753 KiB After Width: | Height: | Size: 753 KiB |
|
Before Width: | Height: | Size: 887 KiB After Width: | Height: | Size: 887 KiB |
|
Before Width: | Height: | Size: 946 KiB After Width: | Height: | Size: 946 KiB |
|
Before Width: | Height: | Size: 1.4 MiB After Width: | Height: | Size: 1.4 MiB |
|
Before Width: | Height: | Size: 1.7 MiB After Width: | Height: | Size: 1.7 MiB |
|
Before Width: | Height: | Size: 1.4 MiB After Width: | Height: | Size: 1.4 MiB |
|
Before Width: | Height: | Size: 1.8 MiB After Width: | Height: | Size: 1.8 MiB |
|
Before Width: | Height: | Size: 1.0 MiB After Width: | Height: | Size: 1.0 MiB |
|
Before Width: | Height: | Size: 921 KiB After Width: | Height: | Size: 921 KiB |
@@ -35,21 +35,143 @@ DEFAULT_CONFIG = {
|
|||||||
'port_vllm': 8000,
|
'port_vllm': 8000,
|
||||||
'min_pixels': MIN_PIXELS,
|
'min_pixels': MIN_PIXELS,
|
||||||
'max_pixels': MAX_PIXELS,
|
'max_pixels': MAX_PIXELS,
|
||||||
'test_images_dir': "./assets/showcase_origin",
|
'test_images_dir': "./assets/showcase/origin",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# ==================== Multi-Model Server Configuration ====================
|
||||||
|
MODEL_SERVERS = {
|
||||||
|
"dots.mocr": {
|
||||||
|
'ip': "127.0.0.1",
|
||||||
|
'port_vllm': 8000,
|
||||||
|
'description': "dots.mocr"
|
||||||
|
},
|
||||||
|
"dots.mocr-svg": {
|
||||||
|
'ip': "127.0.0.1",
|
||||||
|
'port_vllm': 8000, # 请根据实际情况修改端口
|
||||||
|
'description': "dots.mocr-svg"
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#每个prompt的预处理写死
|
||||||
|
PROMPT_TO_FITZ_PREPROCESS = {
|
||||||
|
"prompt_layout_all_en": True, # 文档布局分析 - 启用预处理
|
||||||
|
"prompt_layout_only_en": True, # 仅布局检测 - 启用预处理
|
||||||
|
"prompt_ocr": True, # 仅文字识别 - 启用预处理
|
||||||
|
"prompt_web_parsing": False, # 网页解析 - 禁用预处理
|
||||||
|
"prompt_scene_spotting": False, # 场景检测 - 禁用预处理
|
||||||
|
"prompt_image_to_svg": False, # SVG 转换 - 禁用预处理
|
||||||
|
"prompt_general": False, # 自由问答 - 禁用预处理
|
||||||
|
}
|
||||||
|
|
||||||
|
#不同任务需要不同temperature
|
||||||
|
PROMPT_TO_TEMPERATURE = {
|
||||||
|
"prompt_layout_all_en": 0.1, # 文档布局分析 - 低温度,更确定性
|
||||||
|
"prompt_layout_only_en": 0.1, # 仅布局检测 - 低温度
|
||||||
|
"prompt_ocr": 0.1, # OCR 识别 - 低温度
|
||||||
|
"prompt_web_parsing": 0.1, # 网页解析 - 稍高一点
|
||||||
|
"prompt_scene_spotting": 0.1, # 场景检测 - 中等温度
|
||||||
|
"prompt_image_to_svg": 0.9, # SVG 转换 - 较低温度
|
||||||
|
"prompt_general": 0.1, # 自由问答 - 高温度,更有创造性
|
||||||
|
}
|
||||||
|
|
||||||
|
# 不同prompt_mode对应的模型
|
||||||
|
PROMPT_TO_MODEL = {
|
||||||
|
"prompt_image_to_svg": "dots.mocr-svg", # SVG任务使用SVG模型
|
||||||
|
}
|
||||||
|
|
||||||
|
# ==================== Demo Case Configuration ====================
|
||||||
|
# 根据文件名自动选择 prompt_mode 和预设的 custom_prompt
|
||||||
|
DEMO_CASE_CONFIG = {
|
||||||
|
# 格式: "文件名关键字": {"prompt_mode": "xxx", "custom_prompt": "xxx"}
|
||||||
|
|
||||||
|
# 布局分析类
|
||||||
|
"doc": {"prompt_mode": "prompt_layout_all_en"},
|
||||||
|
"formula": {"prompt_mode": "prompt_layout_all_en"},
|
||||||
|
"table": {"prompt_mode": "prompt_layout_all_en"},
|
||||||
|
|
||||||
|
# 仅布局检测
|
||||||
|
"detect": {"prompt_mode": "prompt_layout_only_en"},
|
||||||
|
# OCR 识别
|
||||||
|
"ocr": {"prompt_mode": "prompt_ocr"},
|
||||||
|
|
||||||
|
# 网页解析
|
||||||
|
"webpage": {"prompt_mode": "prompt_web_parsing"},
|
||||||
|
|
||||||
|
# 场景文字检测
|
||||||
|
"scene": {"prompt_mode": "prompt_scene_spotting"},
|
||||||
|
|
||||||
|
# SVG 转换
|
||||||
|
"svg": {"prompt_mode": "prompt_image_to_svg"},
|
||||||
|
|
||||||
|
# QA 任务(带预设 prompt)
|
||||||
|
"general_qa": {
|
||||||
|
"prompt_mode": "prompt_general",
|
||||||
|
"custom_prompt": "Across panels 1-12 plotting against clean accuracy, which variable appears most positively correlated with clean accuracy?"
|
||||||
|
},
|
||||||
|
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
# 默认配置(找不到匹配时使用)
|
||||||
|
DEFAULT_DEMO_CONFIG = {"prompt_mode": "prompt_layout_all_en"}
|
||||||
|
|
||||||
|
def get_config_for_file(file_path):
|
||||||
|
"""
|
||||||
|
根据文件名自动匹配 prompt_mode 和 custom_prompt
|
||||||
|
支持部分匹配(文件名包含关键字即可)
|
||||||
|
"""
|
||||||
|
if not file_path:
|
||||||
|
return DEFAULT_DEMO_CONFIG.copy()
|
||||||
|
|
||||||
|
filename = os.path.basename(file_path).lower()
|
||||||
|
|
||||||
|
# 遍历配置字典,查找匹配的关键字
|
||||||
|
for keyword, config in DEMO_CASE_CONFIG.items():
|
||||||
|
if keyword.lower() in filename:
|
||||||
|
return config.copy()
|
||||||
|
|
||||||
|
# 没有匹配则返回默认配置
|
||||||
|
return DEFAULT_DEMO_CONFIG.copy()
|
||||||
|
|
||||||
# ==================== Global Variables ====================
|
# ==================== Global Variables ====================
|
||||||
# Store current configuration
|
# Store current configuration
|
||||||
current_config = DEFAULT_CONFIG.copy()
|
current_config = DEFAULT_CONFIG.copy()
|
||||||
|
|
||||||
# Create DotsOCRParser instance
|
# Parser cache for multiple models
|
||||||
dots_parser = DotsOCRParser(
|
_parser_cache = {}
|
||||||
ip=DEFAULT_CONFIG['ip'],
|
|
||||||
port=DEFAULT_CONFIG['port_vllm'],
|
def get_parser(model_name: str, min_pixels: int = None, max_pixels: int = None) -> DotsMOCRParser:
|
||||||
|
"""
|
||||||
|
Get or create a parser instance for the specified model.
|
||||||
|
Uses cache to avoid recreating parsers for the same model.
|
||||||
|
"""
|
||||||
|
if model_name not in MODEL_SERVERS:
|
||||||
|
raise ValueError(f"Unknown model: {model_name}")
|
||||||
|
|
||||||
|
model_config = MODEL_SERVERS[model_name]
|
||||||
|
|
||||||
|
# Create cache key based on model and pixel settings
|
||||||
|
cache_key = model_name
|
||||||
|
|
||||||
|
# If parser exists in cache, update its settings and return
|
||||||
|
if cache_key in _parser_cache:
|
||||||
|
parser = _parser_cache[cache_key]
|
||||||
|
parser.min_pixels = min_pixels or DEFAULT_CONFIG['min_pixels']
|
||||||
|
parser.max_pixels = max_pixels or DEFAULT_CONFIG['max_pixels']
|
||||||
|
return parser
|
||||||
|
|
||||||
|
# Create new parser instance
|
||||||
|
parser = DotsMOCRParser(
|
||||||
|
ip=model_config['ip'],
|
||||||
|
port=model_config['port_vllm'],
|
||||||
dpi=200,
|
dpi=200,
|
||||||
min_pixels=DEFAULT_CONFIG['min_pixels'],
|
min_pixels=min_pixels or DEFAULT_CONFIG['min_pixels'],
|
||||||
max_pixels=DEFAULT_CONFIG['max_pixels']
|
max_pixels=max_pixels or DEFAULT_CONFIG['max_pixels']
|
||||||
)
|
)
|
||||||
|
_parser_cache[cache_key] = parser
|
||||||
|
return parser
|
||||||
|
|
||||||
def get_initial_session_state():
|
def get_initial_session_state():
|
||||||
return {
|
return {
|
||||||
@@ -71,7 +193,8 @@ def get_initial_session_state():
|
|||||||
"file_type": None,
|
"file_type": None,
|
||||||
"is_parsed": False,
|
"is_parsed": False,
|
||||||
"results": []
|
"results": []
|
||||||
}
|
},
|
||||||
|
'auto_custom_prompt': None,
|
||||||
}
|
}
|
||||||
|
|
||||||
def read_image_v2(img):
|
def read_image_v2(img):
|
||||||
@@ -118,6 +241,46 @@ def load_file_for_preview(file_path, session_state):
|
|||||||
|
|
||||||
return pages[0], f"<div id='page_info_box'>1 / {len(pages)}</div>", session_state
|
return pages[0], f"<div id='page_info_box'>1 / {len(pages)}</div>", session_state
|
||||||
|
|
||||||
|
def on_test_image_select(file_path, session_state):
|
||||||
|
"""选择测试图片时的回调:加载预览 + 自动设置 prompt_mode + 自动切换模型"""
|
||||||
|
preview_image, page_info, session_state = load_file_for_preview(file_path, session_state)
|
||||||
|
|
||||||
|
if not file_path:
|
||||||
|
return (
|
||||||
|
preview_image,
|
||||||
|
page_info,
|
||||||
|
session_state,
|
||||||
|
gr.update(),
|
||||||
|
gr.update(),
|
||||||
|
gr.update()
|
||||||
|
)
|
||||||
|
|
||||||
|
auto_config = get_config_for_file(file_path)
|
||||||
|
prompt_mode_value = auto_config["prompt_mode"]
|
||||||
|
custom_prompt_value = auto_config.get("custom_prompt", "")
|
||||||
|
|
||||||
|
session_state['auto_custom_prompt'] = custom_prompt_value if custom_prompt_value else None
|
||||||
|
|
||||||
|
is_free_qa = prompt_mode_value == 'prompt_general'
|
||||||
|
if is_free_qa and custom_prompt_value:
|
||||||
|
prompt_text = custom_prompt_value
|
||||||
|
else:
|
||||||
|
prompt_text = update_prompt_display(prompt_mode_value)
|
||||||
|
|
||||||
|
# 根据prompt_mode自动选择模型
|
||||||
|
auto_model = PROMPT_TO_MODEL.get(prompt_mode_value, list(MODEL_SERVERS.keys())[0])
|
||||||
|
|
||||||
|
return (
|
||||||
|
preview_image,
|
||||||
|
page_info,
|
||||||
|
session_state,
|
||||||
|
gr.update(value=prompt_mode_value),
|
||||||
|
gr.update(value=prompt_text, interactive=is_free_qa),
|
||||||
|
gr.update(value=auto_model),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
def turn_page(direction, session_state):
|
def turn_page(direction, session_state):
|
||||||
"""Page turning function"""
|
"""Page turning function"""
|
||||||
pdf_cache = session_state['pdf_cache']
|
pdf_cache = session_state['pdf_cache']
|
||||||
@@ -152,20 +315,23 @@ def get_test_images():
|
|||||||
test_images = []
|
test_images = []
|
||||||
test_dir = current_config['test_images_dir']
|
test_dir = current_config['test_images_dir']
|
||||||
if os.path.exists(test_dir):
|
if os.path.exists(test_dir):
|
||||||
test_images = [os.path.join(test_dir, name) for name in os.listdir(test_dir)
|
test_images = sorted([
|
||||||
if name.lower().endswith(('.png', '.jpg', '.jpeg', '.pdf'))]
|
os.path.join(test_dir, name)
|
||||||
|
for name in os.listdir(test_dir)
|
||||||
|
if name.lower().endswith(('.png', '.jpg', '.jpeg', '.pdf'))
|
||||||
|
])
|
||||||
return test_images
|
return test_images
|
||||||
|
|
||||||
def create_temp_session_dir():
|
def create_temp_session_dir():
|
||||||
"""Creates a unique temporary directory for each processing request"""
|
"""Creates a unique temporary directory for each processing request"""
|
||||||
session_id = uuid.uuid4().hex[:8]
|
session_id = uuid.uuid4().hex[:8]
|
||||||
temp_dir = os.path.join(tempfile.gettempdir(), f"dots_ocr_demo_{session_id}")
|
temp_dir = os.path.join(tempfile.gettempdir(), f"dots_mocr_demo_{session_id}")
|
||||||
os.makedirs(temp_dir, exist_ok=True)
|
os.makedirs(temp_dir, exist_ok=True)
|
||||||
return temp_dir, session_id
|
return temp_dir, session_id
|
||||||
|
|
||||||
def parse_image_with_high_level_api(parser, image, prompt_mode, fitz_preprocess=False):
|
def parse_image_with_high_level_api(parser, image, prompt_mode, fitz_preprocess=False, custom_prompt=None, temperature=None):
|
||||||
"""
|
"""
|
||||||
Processes using the high-level API parse_image from DotsOCRParser
|
Processes using the high-level API parse_image from DotsMOCRParser
|
||||||
"""
|
"""
|
||||||
# Create a temporary session directory
|
# Create a temporary session directory
|
||||||
temp_dir, session_id = create_temp_session_dir()
|
temp_dir, session_id = create_temp_session_dir()
|
||||||
@@ -182,7 +348,9 @@ def parse_image_with_high_level_api(parser, image, prompt_mode, fitz_preprocess=
|
|||||||
filename=filename,
|
filename=filename,
|
||||||
prompt_mode=prompt_mode,
|
prompt_mode=prompt_mode,
|
||||||
save_dir=temp_dir,
|
save_dir=temp_dir,
|
||||||
fitz_preprocess=fitz_preprocess
|
fitz_preprocess=fitz_preprocess,
|
||||||
|
custom_prompt=custom_prompt,
|
||||||
|
temperature=temperature,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Parse the results
|
# Parse the results
|
||||||
@@ -223,7 +391,7 @@ def parse_image_with_high_level_api(parser, image, prompt_mode, fitz_preprocess=
|
|||||||
|
|
||||||
def parse_pdf_with_high_level_api(parser, pdf_path, prompt_mode):
|
def parse_pdf_with_high_level_api(parser, pdf_path, prompt_mode):
|
||||||
"""
|
"""
|
||||||
Processes using the high-level API parse_pdf from DotsOCRParser
|
Processes using the high-level API parse_pdf from DotsMOCRParser
|
||||||
"""
|
"""
|
||||||
# Create a temporary session directory
|
# Create a temporary session directory
|
||||||
temp_dir, session_id = create_temp_session_dir()
|
temp_dir, session_id = create_temp_session_dir()
|
||||||
@@ -292,8 +460,9 @@ def parse_pdf_with_high_level_api(parser, pdf_path, prompt_mode):
|
|||||||
|
|
||||||
# ==================== Core Processing Function ====================
|
# ==================== Core Processing Function ====================
|
||||||
def process_image_inference(session_state, test_image_input, file_input,
|
def process_image_inference(session_state, test_image_input, file_input,
|
||||||
prompt_mode, server_ip, server_port, min_pixels, max_pixels,
|
prompt_mode, model_selector, # Changed: use model_selector instead of server_ip/port
|
||||||
fitz_preprocess=False
|
min_pixels, max_pixels,
|
||||||
|
fitz_preprocess=False, custom_prompt=""
|
||||||
):
|
):
|
||||||
"""Core function to handle image/PDF inference"""
|
"""Core function to handle image/PDF inference"""
|
||||||
# Use session_state instead of global variables
|
# Use session_state instead of global variables
|
||||||
@@ -310,18 +479,23 @@ def process_image_inference(session_state, test_image_input, file_input,
|
|||||||
session_state['processing_results'] = get_initial_session_state()['processing_results']
|
session_state['processing_results'] = get_initial_session_state()['processing_results']
|
||||||
processing_results = session_state['processing_results']
|
processing_results = session_state['processing_results']
|
||||||
|
|
||||||
|
fitz_preprocess = PROMPT_TO_FITZ_PREPROCESS.get(prompt_mode, True)
|
||||||
|
temperature = PROMPT_TO_TEMPERATURE.get(prompt_mode, 0.1)
|
||||||
|
print(temperature)
|
||||||
|
# Get the selected model configuration
|
||||||
|
model_config = MODEL_SERVERS[model_selector]
|
||||||
current_config.update({
|
current_config.update({
|
||||||
'ip': server_ip,
|
'ip': model_config['ip'],
|
||||||
'port_vllm': server_port,
|
'port_vllm': model_config['port_vllm'],
|
||||||
'min_pixels': min_pixels,
|
'min_pixels': min_pixels,
|
||||||
'max_pixels': max_pixels
|
'max_pixels': max_pixels
|
||||||
})
|
})
|
||||||
|
|
||||||
# Update parser configuration
|
# Get parser for the selected model
|
||||||
dots_parser.ip = server_ip
|
try:
|
||||||
dots_parser.port = server_port
|
dots_parser = get_parser(model_selector, min_pixels, max_pixels)
|
||||||
dots_parser.min_pixels = min_pixels
|
except ValueError as e:
|
||||||
dots_parser.max_pixels = max_pixels
|
return None, f"Error: {str(e)}", "", "", gr.update(value=None), None, "", session_state
|
||||||
|
|
||||||
input_file_path = file_input if file_input else test_image_input
|
input_file_path = file_input if file_input else test_image_input
|
||||||
|
|
||||||
@@ -348,7 +522,7 @@ def process_image_inference(session_state, test_image_input, file_input,
|
|||||||
})
|
})
|
||||||
|
|
||||||
total_elements = len(pdf_result['combined_cells_data'])
|
total_elements = len(pdf_result['combined_cells_data'])
|
||||||
info_text = f"**PDF Information:**\n- Total Pages: {pdf_result['total_pages']}\n- Server: {current_config['ip']}:{current_config['port_vllm']}\n- Total Detected Elements: {total_elements}\n- Session ID: {pdf_result['session_id']}"
|
info_text = f"**PDF Information:**\n- Total Pages: {pdf_result['total_pages']}\n- Model: {model_selector}\n- Server: {model_config['ip']}:{model_config['port_vllm']}\n- Total Detected Elements: {total_elements}\n- Session ID: {pdf_result['session_id']}"
|
||||||
|
|
||||||
current_page_layout_image = preview_image
|
current_page_layout_image = preview_image
|
||||||
current_page_json = ""
|
current_page_json = ""
|
||||||
@@ -381,10 +555,11 @@ def process_image_inference(session_state, test_image_input, file_input,
|
|||||||
session_state['pdf_cache'] = get_initial_session_state()['pdf_cache']
|
session_state['pdf_cache'] = get_initial_session_state()['pdf_cache']
|
||||||
|
|
||||||
original_image = image
|
original_image = image
|
||||||
parse_result = parse_image_with_high_level_api(dots_parser, image, prompt_mode, fitz_preprocess)
|
effective_custom_prompt = custom_prompt if prompt_mode == 'prompt_general' else None
|
||||||
|
parse_result = parse_image_with_high_level_api(dots_parser, image, prompt_mode, fitz_preprocess, effective_custom_prompt, temperature)
|
||||||
|
|
||||||
if parse_result['filtered']:
|
if parse_result['filtered']:
|
||||||
info_text = f"**Image Information:**\n- Original Size: {original_image.width} x {original_image.height}\n- Processing: JSON parsing failed, using cleaned text output\n- Server: {current_config['ip']}:{current_config['port_vllm']}\n- Session ID: {parse_result['session_id']}"
|
info_text = f"**Image Information:**\n- Original Size: {original_image.width} x {original_image.height}\n- Model: {model_selector}\n- Processing: JSON parsing failed, using cleaned text output\n- Server: {model_config['ip']}:{model_config['port_vllm']}\n- Session ID: {parse_result['session_id']}"
|
||||||
processing_results.update({
|
processing_results.update({
|
||||||
'original_image': original_image, 'markdown_content': parse_result['md_content'],
|
'original_image': original_image, 'markdown_content': parse_result['md_content'],
|
||||||
'temp_dir': parse_result['temp_dir'], 'session_id': parse_result['session_id'],
|
'temp_dir': parse_result['temp_dir'], 'session_id': parse_result['session_id'],
|
||||||
@@ -401,7 +576,7 @@ def process_image_inference(session_state, test_image_input, file_input,
|
|||||||
})
|
})
|
||||||
|
|
||||||
num_elements = len(parse_result['cells_data']) if parse_result['cells_data'] else 0
|
num_elements = len(parse_result['cells_data']) if parse_result['cells_data'] else 0
|
||||||
info_text = f"**Image Information:**\n- Original Size: {original_image.width} x {original_image.height}\n- Model Input Size: {parse_result['input_width']} x {parse_result['input_height']}\n- Server: {current_config['ip']}:{current_config['port_vllm']}\n- Detected {num_elements} layout elements\n- Session ID: {parse_result['session_id']}"
|
info_text = f"**Image Information:**\n- Original Size: {original_image.width} x {original_image.height}\n- Model Input Size: {parse_result['input_width']} x {parse_result['input_height']}\n- Model: {model_selector}\n- Server: {model_config['ip']}:{model_config['port_vllm']}\n- Detected {num_elements} layout elements\n- Session ID: {parse_result['session_id']}"
|
||||||
|
|
||||||
current_json = json.dumps(parse_result['cells_data'], ensure_ascii=False, indent=2) if parse_result['cells_data'] else ""
|
current_json = json.dumps(parse_result['cells_data'], ensure_ascii=False, indent=2) if parse_result['cells_data'] else ""
|
||||||
|
|
||||||
@@ -452,6 +627,8 @@ def clear_all_data(session_state):
|
|||||||
|
|
||||||
def update_prompt_display(prompt_mode):
|
def update_prompt_display(prompt_mode):
|
||||||
"""Updates the prompt display content"""
|
"""Updates the prompt display content"""
|
||||||
|
if prompt_mode == 'prompt_general':
|
||||||
|
return "" # free_qa 模式下清空,让用户输入
|
||||||
return dict_promptmode_to_prompt[prompt_mode]
|
return dict_promptmode_to_prompt[prompt_mode]
|
||||||
|
|
||||||
# ==================== Gradio Interface ====================
|
# ==================== Gradio Interface ====================
|
||||||
@@ -515,18 +692,22 @@ def create_gradio_interface():
|
|||||||
#markdown_tabs {
|
#markdown_tabs {
|
||||||
height: 100%;
|
height: 100%;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#model_selector_box {
|
||||||
|
margin-bottom: 8px;
|
||||||
|
}
|
||||||
"""
|
"""
|
||||||
|
|
||||||
with gr.Blocks(theme="ocean", css=css, title='dots.ocr') as demo:
|
with gr.Blocks(theme="ocean", css=css, title='dots.mocr') as demo:
|
||||||
session_state = gr.State(value=get_initial_session_state())
|
session_state = gr.State(value=get_initial_session_state())
|
||||||
|
|
||||||
# Title
|
# Title
|
||||||
gr.HTML("""
|
gr.HTML("""
|
||||||
<div style="display: flex; align-items: center; justify-content: center; margin-bottom: 20px;">
|
<div style="display: flex; align-items: center; justify-content: center; margin-bottom: 20px;">
|
||||||
<h1 style="margin: 0; font-size: 2em;">🔍 dots.ocr</h1>
|
<h1 style="margin: 0; font-size: 2em;">🔍 dots.mocr</h1>
|
||||||
</div>
|
</div>
|
||||||
<div style="text-align: center; margin-bottom: 10px;">
|
<div style="text-align: center; margin-bottom: 10px;">
|
||||||
<em>Supports image/PDF layout analysis and structured output</em>
|
<em>Recognize Any Human Scripts and Symbols</em>
|
||||||
</div>
|
</div>
|
||||||
""")
|
""")
|
||||||
|
|
||||||
@@ -540,6 +721,15 @@ def create_gradio_interface():
|
|||||||
file_types=[".pdf", ".jpg", ".jpeg", ".png"],
|
file_types=[".pdf", ".jpg", ".jpeg", ".png"],
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# ============ NEW: Model Selector ============
|
||||||
|
model_selector = gr.Dropdown(
|
||||||
|
label="🤖 Select Model",
|
||||||
|
choices=list(MODEL_SERVERS.keys()),
|
||||||
|
value=list(MODEL_SERVERS.keys())[0],
|
||||||
|
elem_id="model_selector_box",
|
||||||
|
info="Switch between different model servers"
|
||||||
|
)
|
||||||
|
|
||||||
test_images = get_test_images()
|
test_images = get_test_images()
|
||||||
test_image_input = gr.Dropdown(
|
test_image_input = gr.Dropdown(
|
||||||
label="Or Select an Example",
|
label="Or Select an Example",
|
||||||
@@ -550,7 +740,15 @@ def create_gradio_interface():
|
|||||||
gr.Markdown("### ⚙️ Prompt & Actions")
|
gr.Markdown("### ⚙️ Prompt & Actions")
|
||||||
prompt_mode = gr.Dropdown(
|
prompt_mode = gr.Dropdown(
|
||||||
label="Select Prompt",
|
label="Select Prompt",
|
||||||
choices=["prompt_layout_all_en", "prompt_layout_only_en", "prompt_ocr"],
|
choices=[
|
||||||
|
"prompt_layout_all_en",
|
||||||
|
"prompt_web_parsing",
|
||||||
|
"prompt_scene_spotting",
|
||||||
|
"prompt_image_to_svg",
|
||||||
|
"prompt_general",
|
||||||
|
"prompt_layout_only_en",
|
||||||
|
"prompt_ocr",
|
||||||
|
],
|
||||||
value="prompt_layout_all_en",
|
value="prompt_layout_all_en",
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -560,8 +758,7 @@ def create_gradio_interface():
|
|||||||
value=dict_promptmode_to_prompt[list(dict_promptmode_to_prompt.keys())[0]],
|
value=dict_promptmode_to_prompt[list(dict_promptmode_to_prompt.keys())[0]],
|
||||||
lines=4,
|
lines=4,
|
||||||
max_lines=8,
|
max_lines=8,
|
||||||
interactive=False,
|
interactive=False, # 默认不可编辑,free_qa 模式下改为可编辑
|
||||||
show_copy_button=True
|
|
||||||
)
|
)
|
||||||
|
|
||||||
with gr.Row():
|
with gr.Row():
|
||||||
@@ -572,11 +769,9 @@ def create_gradio_interface():
|
|||||||
fitz_preprocess = gr.Checkbox(
|
fitz_preprocess = gr.Checkbox(
|
||||||
label="Enable fitz_preprocess for images",
|
label="Enable fitz_preprocess for images",
|
||||||
value=True,
|
value=True,
|
||||||
info="Processes image via a PDF-like pipeline (image->pdf->200dpi image). Recommended if your image DPI is low."
|
info="Processes image via a PDF-like pipeline (image->pdf->200dpi image). Recommended if your image DPI is low.",
|
||||||
|
visible=False, ###直接隐藏,调用模型前根据prompt mode 写死
|
||||||
)
|
)
|
||||||
with gr.Row():
|
|
||||||
server_ip = gr.Textbox(label="Server IP", value=DEFAULT_CONFIG['ip'])
|
|
||||||
server_port = gr.Number(label="Port", value=DEFAULT_CONFIG['port_vllm'], precision=0)
|
|
||||||
with gr.Row():
|
with gr.Row():
|
||||||
min_pixels = gr.Number(label="Min Pixels", value=DEFAULT_CONFIG['min_pixels'], precision=0)
|
min_pixels = gr.Number(label="Min Pixels", value=DEFAULT_CONFIG['min_pixels'], precision=0)
|
||||||
max_pixels = gr.Number(label="Max Pixels", value=DEFAULT_CONFIG['max_pixels'], precision=0)
|
max_pixels = gr.Number(label="Max Pixels", value=DEFAULT_CONFIG['max_pixels'], precision=0)
|
||||||
@@ -590,7 +785,7 @@ def create_gradio_interface():
|
|||||||
label="Layout Preview",
|
label="Layout Preview",
|
||||||
visible=True,
|
visible=True,
|
||||||
height=800,
|
height=800,
|
||||||
show_label=False
|
show_label=False,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Page navigation (shown during PDF preview)
|
# Page navigation (shown during PDF preview)
|
||||||
@@ -621,7 +816,6 @@ def create_gradio_interface():
|
|||||||
{"left": "$$", "right": "$$", "display": True},
|
{"left": "$$", "right": "$$", "display": True},
|
||||||
{"left": "$", "right": "$", "display": False}
|
{"left": "$", "right": "$", "display": False}
|
||||||
],
|
],
|
||||||
show_copy_button=False,
|
|
||||||
elem_id="markdown_output"
|
elem_id="markdown_output"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -631,7 +825,6 @@ def create_gradio_interface():
|
|||||||
label="Markdown Raw Text",
|
label="Markdown Raw Text",
|
||||||
max_lines=100,
|
max_lines=100,
|
||||||
lines=38,
|
lines=38,
|
||||||
show_copy_button=True,
|
|
||||||
elem_id="markdown_output",
|
elem_id="markdown_output",
|
||||||
show_label=False
|
show_label=False
|
||||||
)
|
)
|
||||||
@@ -642,7 +835,6 @@ def create_gradio_interface():
|
|||||||
label="Current Page JSON",
|
label="Current Page JSON",
|
||||||
max_lines=100,
|
max_lines=100,
|
||||||
lines=38,
|
lines=38,
|
||||||
show_copy_button=True,
|
|
||||||
elem_id="markdown_output",
|
elem_id="markdown_output",
|
||||||
show_label=False
|
show_label=False
|
||||||
)
|
)
|
||||||
@@ -654,11 +846,32 @@ def create_gradio_interface():
|
|||||||
visible=False
|
visible=False
|
||||||
)
|
)
|
||||||
|
|
||||||
# When the prompt mode changes, update the display content
|
def update_prompt_and_interactive(prompt_mode, session_state):
|
||||||
|
"""更新 prompt_display 并自动切换模型"""
|
||||||
|
is_free_qa = prompt_mode == 'prompt_general'
|
||||||
|
auto_custom_prompt = session_state.get('auto_custom_prompt')
|
||||||
|
|
||||||
|
if is_free_qa and auto_custom_prompt:
|
||||||
|
prompt_text = auto_custom_prompt
|
||||||
|
interactive = True
|
||||||
|
else:
|
||||||
|
prompt_text = update_prompt_display(prompt_mode)
|
||||||
|
interactive = is_free_qa
|
||||||
|
|
||||||
|
# 根据prompt_mode自动选择模型
|
||||||
|
auto_model = PROMPT_TO_MODEL.get(prompt_mode, list(MODEL_SERVERS.keys())[0])
|
||||||
|
|
||||||
|
return (
|
||||||
|
gr.update(value=prompt_text, interactive=interactive),
|
||||||
|
session_state,
|
||||||
|
gr.update(value=auto_model),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
prompt_mode.change(
|
prompt_mode.change(
|
||||||
fn=update_prompt_display,
|
fn=update_prompt_and_interactive,
|
||||||
inputs=prompt_mode,
|
inputs=[prompt_mode, session_state],
|
||||||
outputs=prompt_display,
|
outputs=[prompt_display, session_state, model_selector],
|
||||||
)
|
)
|
||||||
|
|
||||||
# Show preview on file upload
|
# Show preview on file upload
|
||||||
@@ -671,10 +884,9 @@ def create_gradio_interface():
|
|||||||
|
|
||||||
# Also handle test image selection
|
# Also handle test image selection
|
||||||
test_image_input.change(
|
test_image_input.change(
|
||||||
# fn=lambda path, state: load_file_for_preview(path, state),
|
fn=on_test_image_select,
|
||||||
fn=load_file_for_preview,
|
|
||||||
inputs=[test_image_input, session_state],
|
inputs=[test_image_input, session_state],
|
||||||
outputs=[result_image, page_info, session_state]
|
outputs=[result_image, page_info, session_state, prompt_mode, prompt_display, model_selector],
|
||||||
)
|
)
|
||||||
|
|
||||||
prev_btn.click(
|
prev_btn.click(
|
||||||
@@ -689,12 +901,14 @@ def create_gradio_interface():
|
|||||||
outputs=[result_image, page_info, current_page_json, session_state]
|
outputs=[result_image, page_info, current_page_json, session_state]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# ============ MODIFIED: process_btn.click with model_selector ============
|
||||||
process_btn.click(
|
process_btn.click(
|
||||||
fn=process_image_inference,
|
fn=process_image_inference,
|
||||||
inputs=[
|
inputs=[
|
||||||
session_state, test_image_input, file_input,
|
session_state, test_image_input, file_input,
|
||||||
prompt_mode, server_ip, server_port, min_pixels, max_pixels,
|
prompt_mode, model_selector, # Changed: model_selector instead of server_ip/port
|
||||||
fitz_preprocess
|
min_pixels, max_pixels,
|
||||||
|
fitz_preprocess, prompt_display
|
||||||
],
|
],
|
||||||
outputs=[
|
outputs=[
|
||||||
result_image, info_display, md_output, md_raw_output,
|
result_image, info_display, md_output, md_raw_output,
|
||||||
|
|||||||
@@ -53,13 +53,14 @@ def inference(image_path, prompt, model, processor):
|
|||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
|
# We recommend enabling flash_attention_2 or flash_attention_3 for better acceleration and memory saving, especially in multi-image and video scenarios.
|
||||||
model_path = "./weights/DotsOCR"
|
model_path = "./weights/DotsMOCR"
|
||||||
model = AutoModelForCausalLM.from_pretrained(
|
model = AutoModelForCausalLM.from_pretrained(
|
||||||
model_path,
|
model_path,
|
||||||
attn_implementation="flash_attention_2",
|
attn_implementation="flash_attention_2",
|
||||||
torch_dtype=torch.bfloat16,
|
torch_dtype=torch.bfloat16,
|
||||||
device_map="auto",
|
device_map="auto",
|
||||||
|
# device_map="cpu", # ve里默认使用flash-attn,无法直接运行
|
||||||
trust_remote_code=True
|
trust_remote_code=True
|
||||||
)
|
)
|
||||||
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
|
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
|
||||||
|
|||||||
@@ -10,7 +10,7 @@ from dots_ocr.model.inference import inference_with_vllm
|
|||||||
parser = argparse.ArgumentParser()
|
parser = argparse.ArgumentParser()
|
||||||
parser.add_argument("--ip", type=str, default="localhost")
|
parser.add_argument("--ip", type=str, default="localhost")
|
||||||
parser.add_argument("--port", type=str, default="8000")
|
parser.add_argument("--port", type=str, default="8000")
|
||||||
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.ocr-1.5")
|
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.mocr")
|
||||||
parser.add_argument("--image_path", type=str, default="demo/demo_image1.jpg")
|
parser.add_argument("--image_path", type=str, default="demo/demo_image1.jpg")
|
||||||
parser.add_argument("--prompt_mode", type=str, default="prompt_layout_all_en",help=(
|
parser.add_argument("--prompt_mode", type=str, default="prompt_layout_all_en",help=(
|
||||||
"Choose a task prompt: "
|
"Choose a task prompt: "
|
||||||
|
|||||||
@@ -10,7 +10,7 @@ from dots_ocr.model.inference import inference_with_vllm
|
|||||||
parser = argparse.ArgumentParser()
|
parser = argparse.ArgumentParser()
|
||||||
parser.add_argument("--ip", type=str, default="localhost")
|
parser.add_argument("--ip", type=str, default="localhost")
|
||||||
parser.add_argument("--port", type=str, default="8000")
|
parser.add_argument("--port", type=str, default="8000")
|
||||||
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.ocr-1.5")
|
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.mocr")
|
||||||
parser.add_argument("--custom_prompt", type=str, default="Please describe the content of this image.")
|
parser.add_argument("--custom_prompt", type=str, default="Please describe the content of this image.")
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -10,7 +10,7 @@ from dots_ocr.model.inference import inference_with_vllm
|
|||||||
parser = argparse.ArgumentParser()
|
parser = argparse.ArgumentParser()
|
||||||
parser.add_argument("--ip", type=str, default="localhost")
|
parser.add_argument("--ip", type=str, default="localhost")
|
||||||
parser.add_argument("--port", type=str, default="8000")
|
parser.add_argument("--port", type=str, default="8000")
|
||||||
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.ocr-1.5-svg")
|
parser.add_argument("--model_name", type=str, default="rednote-hilab/dots.mocr")
|
||||||
parser.add_argument("--prompt_mode", type=str, default="prompt_image_to_svg")
|
parser.add_argument("--prompt_mode", type=str, default="prompt_image_to_svg")
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -1,17 +1 @@
|
|||||||
# download model to /path/to/model
|
CUDA_VISIBLE_DEVICES=0 nohup vllm serve dots.mocr --tensor-parallel-size 1 --gpu-memory-utilization 0.9 --chat-template-content-format string --served-model-name ${model_name} --trust-remote-code
|
||||||
if [ -z "$NODOWNLOAD" ]; then
|
|
||||||
python3 tools/download_model.py
|
|
||||||
fi
|
|
||||||
|
|
||||||
# register model to vllm
|
|
||||||
hf_model_path=./weights/DotsOCR # Path to your downloaded model weights
|
|
||||||
export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
|
|
||||||
sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
|
|
||||||
from DotsOCR import modeling_dots_ocr_vllm' `which vllm`
|
|
||||||
|
|
||||||
# launch vllm server
|
|
||||||
model_name=model
|
|
||||||
CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name ${model_name} --trust-remote-code
|
|
||||||
|
|
||||||
# # run python demo after launch vllm server
|
|
||||||
# python demo/demo_vllm.py
|
|
||||||
@@ -1,12 +1,12 @@
|
|||||||
dots.ocr LICENSE AGREEMENT
|
dots.mocr LICENSE AGREEMENT
|
||||||
|
|
||||||
Effective Date: [ August 8, 2025]
|
Effective Date: [ August 8, 2025]
|
||||||
|
|
||||||
Copyright Holder: [Xingyin Information Technology (Shanghai) Co., Ltd]
|
Copyright Holder: [Xingyin Information Technology (Shanghai) Co., Ltd]
|
||||||
|
|
||||||
This License Agreement (“Agreement”) governs Your use, reproduction, modification, and distribution of dots.ocr (the "Model Materials"). This Agreement is designed to maximize the openness and use of the Model Materials while addressing the unique legal, ethical, and technical challenges posed by large language models.
|
This License Agreement (“Agreement”) governs Your use, reproduction, modification, and distribution of dots.mocr (the "Model Materials"). This Agreement is designed to maximize the openness and use of the Model Materials while addressing the unique legal, ethical, and technical challenges posed by large language models.
|
||||||
|
|
||||||
WHEREAS, Licensor has developed the dots.ocr document parsing model and intends to distribute the Model Materials under an open‑source framework;
|
WHEREAS, Licensor has developed the dots.mocr document parsing model and intends to distribute the Model Materials under an open‑source framework;
|
||||||
WHEREAS, traditional open-source licenses (e.g., the MIT License) may not fully address the complexity inherent complexities of document parsing models, namely their multiple components (code, weights, training data), potential ethical risks, data‑governance issues, and intellectual‑property and liability questions regarding AI‑generated content;
|
WHEREAS, traditional open-source licenses (e.g., the MIT License) may not fully address the complexity inherent complexities of document parsing models, namely their multiple components (code, weights, training data), potential ethical risks, data‑governance issues, and intellectual‑property and liability questions regarding AI‑generated content;
|
||||||
WHEREAS, Licensor seeks to provide a legal framework that ensures maximum access to and use of the Model Materials while clearly defining the rights, obligations, and liabilities of Licensee;
|
WHEREAS, Licensor seeks to provide a legal framework that ensures maximum access to and use of the Model Materials while clearly defining the rights, obligations, and liabilities of Licensee;
|
||||||
|
|
||||||
@@ -24,7 +24,7 @@ Purpose: To define key terms used in this Agreement, particularly "Model Materia
|
|||||||
(b) all associated preprocessing, training, inference, and fine‑tuning code;
|
(b) all associated preprocessing, training, inference, and fine‑tuning code;
|
||||||
(c) training datasets and evaluation scripts (or their detailed descriptions and access mechanisms); and
|
(c) training datasets and evaluation scripts (or their detailed descriptions and access mechanisms); and
|
||||||
(d) any accompanying documentation, metadata, and tools.
|
(d) any accompanying documentation, metadata, and tools.
|
||||||
The above Model Materials shall be subject to the content published on the Licensor’s website or GitHub repository at https://github.com/rednote-hilab/dots.ocr.
|
The above Model Materials shall be subject to the content published on the Licensor’s website or GitHub repository at https://github.com/rednote-hilab/dots.mocr.
|
||||||
|
|
||||||
1.4 “Outputs” shall mean any content generated through the use of the Model Materials, such as text, tables, code,layout information, and formulas extracted from documents.
|
1.4 “Outputs” shall mean any content generated through the use of the Model Materials, such as text, tables, code,layout information, and formulas extracted from documents.
|
||||||
|
|
||||||
@@ -63,7 +63,7 @@ Purpose: To grant broad, permissive rights to the Licensee for the Model Materia
|
|||||||
If Licensee institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model Materials constitute direct or contributory patent infringement, then any patent licenses granted under this License for the Model Materials shall terminate as of the date such litigation is asserted or filed.
|
If Licensee institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model Materials constitute direct or contributory patent infringement, then any patent licenses granted under this License for the Model Materials shall terminate as of the date such litigation is asserted or filed.
|
||||||
|
|
||||||
4.3 Outputs: The Outputs generated through the use of the Model Materials generally refer to text, tables, layouts, and other content extracted from documents or images. The extracted content itself does not generate new intellectual property rights, and all intellectual property remains with the original authors or copyright holders. The Licensee is responsible for due diligence regarding the legality of the Outputs, particularly where the content extracted by the OCR model may be substantially similar to existing copyrighted works, which could present intellectual property infringement risks. The Licensor assumes no liability for such infringements.
|
4.3 Outputs: The Outputs generated through the use of the Model Materials generally refer to text, tables, layouts, and other content extracted from documents or images. The extracted content itself does not generate new intellectual property rights, and all intellectual property remains with the original authors or copyright holders. The Licensee is responsible for due diligence regarding the legality of the Outputs, particularly where the content extracted by the OCR model may be substantially similar to existing copyrighted works, which could present intellectual property infringement risks. The Licensor assumes no liability for such infringements.
|
||||||
4.4 Trademarks. Nothing in this License permits Licensee to make use of Licensor’s trademarks, trade names, logos (e.g., “rednote,” “Xiaohongshu,” “dots.ocr”) or to otherwise suggest endorsement or misrepresent the relationship between the parties, unless Licensor’s prior written approval is granted.
|
4.4 Trademarks. Nothing in this License permits Licensee to make use of Licensor’s trademarks, trade names, logos (e.g., “rednote,” “Xiaohongshu,” “dots.mocr”) or to otherwise suggest endorsement or misrepresent the relationship between the parties, unless Licensor’s prior written approval is granted.
|
||||||
|
|
||||||
5. Data Governance, Privacy, and Security
|
5. Data Governance, Privacy, and Security
|
||||||
|
|
||||||
@@ -94,7 +94,7 @@ If Licensee institutes patent litigation against any entity (including a cross-c
|
|||||||
|
|
||||||
7.2 Copyright and Notices. When distributing any part of the Model Materials, Licensee must retain all copyright, patent, trademark, and attribution notices included in the Model Materials.
|
7.2 Copyright and Notices. When distributing any part of the Model Materials, Licensee must retain all copyright, patent, trademark, and attribution notices included in the Model Materials.
|
||||||
|
|
||||||
7.3 Attribution. Licensee is encouraged to prominently display the name of Licensor and the Model Materials in any public statements, products, or services that contain the Model Materials (or any derivative works thereof), to promote transparency and community trust. If Licensee distributes modified weights or fine‑tuned models based on the Model Materials, Licensee must prominently display the following statement in the related website or documentation: “Built with dots.ocr.”
|
7.3 Attribution. Licensee is encouraged to prominently display the name of Licensor and the Model Materials in any public statements, products, or services that contain the Model Materials (or any derivative works thereof), to promote transparency and community trust. If Licensee distributes modified weights or fine‑tuned models based on the Model Materials, Licensee must prominently display the following statement in the related website or documentation: “Built with dots.mocr.”
|
||||||
|
|
||||||
8. Governing Law and Dispute Resolution
|
8. Governing Law and Dispute Resolution
|
||||||
|
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ def inference_with_vllm(
|
|||||||
temperature=0.1,
|
temperature=0.1,
|
||||||
top_p=0.9,
|
top_p=0.9,
|
||||||
max_completion_tokens=32768,
|
max_completion_tokens=32768,
|
||||||
model_name='rednote-hilab/dots.ocr',
|
model_name='rednote-hilab/dots.mocr',
|
||||||
system_prompt=None,
|
system_prompt=None,
|
||||||
):
|
):
|
||||||
|
|
||||||
|
|||||||
@@ -5,11 +5,11 @@ import os
|
|||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
parser = ArgumentParser()
|
parser = ArgumentParser()
|
||||||
parser.add_argument('--type', '-t', type=str, default="huggingface")
|
parser.add_argument('--type', '-t', type=str, default="huggingface")
|
||||||
parser.add_argument('--name', '-n', type=str, default="rednote-hilab/dots.ocr-1.5")
|
parser.add_argument('--name', '-n', type=str, default="rednote-hilab/dots.mocr")
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
script_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
script_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||||
print(f"Attention: The model save dir dots.ocr should be replace by a name without `.` like DotsOCR, util we merge our code to transformers.")
|
print(f"Attention: The model save dir dots.mocr should be replace by a name without `.` like DotsMOCR, util we merge our code to transformers.")
|
||||||
model_dir = os.path.join(script_dir, "weights/DotsOCR_1_5")
|
model_dir = os.path.join(script_dir, "weights/DotsMOCR")
|
||||||
if not os.path.exists(model_dir):
|
if not os.path.exists(model_dir):
|
||||||
os.makedirs(model_dir)
|
os.makedirs(model_dir)
|
||||||
if args.type == "huggingface":
|
if args.type == "huggingface":
|
||||||
|
|||||||