|
</div>

|
## 🔥 News
- **[2026/04/08]** 🎉 Two of our papers, on document parsing and text-image machine translation, have been accepted to the CVPR 2026 Main Conference! Check them out: [Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training](https://arxiv.org/abs/2603.23885) and [MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation](https://arxiv.org/abs/2603.23896).
- **[2026/01/13]** ⭐ We have released a stable official [online demo](https://hunyuan.tencent.com/chat/HunyuanDefault?modelId=HY-OCR-1.0&mid=308&from=vision-zh); feel free to try it out!
- **[2025/11/28]** 🛠️ We fixed vLLM inference bugs and hyperparameter configuration issues (e.g., the system prompt). We recommend using the latest vLLM installation steps and the [inference script](https://github.com/Tencent-Hunyuan/HunyuanOCR/blob/main/Hunyuan-OCR-master/Hunyuan-OCR-vllm/run_hy_ocr.py) for performance testing. There is still some accuracy difference between the Transformers and vLLM frameworks, which we are working to fix.
- **[2025/11/25]** 📝 Inference code and model weights are publicly available.
  journal={arXiv preprint arXiv:2511.19575},
  url={https://arxiv.org/abs/2511.19575},
}

@misc{li2026mmtitbench,
  title={MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation},
  author={Gengluo Li and Chengquan Zhang and Yupu Liang and Huawen Shen and Yaping Zhang and Pengyuan Lyu and Weinong Wang and Xingyu Wan and Gangyan Zeng and Han Hu and Can Ma and Yu Zhou},
  year={2026},
  journal={arXiv preprint arXiv:2603.23896},
  url={https://arxiv.org/abs/2603.23896},
}

@misc{li2026towardsrealworlddocument,
  title={Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training},
  author={Gengluo Li and Pengyuan Lyu and Chengquan Zhang and Huawen Shen and Liang Wu and Xingyu Wan and Gangyan Zeng and Han Hu and Can Ma and Yu Zhou},
  year={2026},
  journal={arXiv preprint arXiv:2603.23885},
  url={https://arxiv.org/abs/2603.23885},
}
```

|
## 🙏 Acknowledgements
|