Hello,
Thank you for open-sourcing the HunyuanOCR repository. This is a great project.
I am working on invoice document processing and would like to fine-tune HunyuanOCR not only for OCR text recognition, but also as an information extraction model that returns text along with bounding box (BBox) coordinates.
From the repository, I understand that training data is provided in train.jsonl / test.jsonl format with special placeholder tokens (e.g. <hy_place_holder_no_112>, <hy_place_holder_no_110>) to encode text and bounding boxes.
Could you please confirm:
1. Whether HunyuanOCR can be fine-tuned to output OCR results with bounding boxes for downstream information extraction tasks (such as invoices or forms)?
2. If yes, could you kindly share a minimal training sample (one image + corresponding train.jsonl entry) that demonstrates the recommended format for text + BBox supervision?
This would help us ensure that we are following the correct data format and training approach.
Thank you very much for your time and support.