Question: Training HunyuanOCR for Information Extraction with BBox output

Hello,

Thank you for open-sourcing the HunyuanOCR repository. This is a great project.

I am working on invoice document processing and would like to fine-tune HunyuanOCR not only for OCR text recognition, but also as an information extraction model that returns text along with bounding box (BBox) coordinates.

From the repository, I understand that training data is provided in train.jsonl / test.jsonl format with special placeholder tokens (e.g. <hy_place_holder_no_112>, <hy_place_holder_no_110>) to encode text and bounding boxes.
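
To make the question concrete, here is a minimal sketch of how the placeholder tokens in such a file could be tallied. It assumes only that each line of train.jsonl is a standalone JSON object; the function name `count_placeholders` is hypothetical, and nothing here reflects the repository's actual schema or the meaning of any particular token id.

```python
import json
import re
from collections import Counter

# Pattern based on the token names quoted above; the numeric suffix varies.
PLACEHOLDER = re.compile(r"<hy_place_holder_no_(\d+)>")


def count_placeholders(jsonl_text):
    """Count placeholder token ids across all lines of JSONL text.

    Hypothetical helper: it makes no assumption about field names.
    """
    counts = Counter()
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
        # Re-serialize so tokens are found in any field, whatever the schema is.
        flat = json.dumps(record, ensure_ascii=False)
        counts.update(int(token_id) for token_id in PLACEHOLDER.findall(flat))
    return counts
```

Calling `count_placeholders(open("train.jsonl", encoding="utf-8").read())` would return a `Counter` keyed by token id, which makes it easier to see which placeholders appear and how often before asking what each one encodes.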

Could you please help with the following:

1. Can HunyuanOCR be fine-tuned to output OCR results with bounding boxes for downstream information extraction tasks (such as invoices or forms)?

2. If yes, could you kindly share a minimal training sample (one image + corresponding train.jsonl entry) that demonstrates the recommended format for text + BBox supervision?

This would help us ensure that we are following the correct data format and training approach.

Thank you very much for your time and support.

Question: Training HunyuanOCR for Information Extraction with BBox output #96
