#12 [enhance] ICDAR 테이블 파싱 TEDS 병렬 계산 최적화 by ryangkyung-thaki · Pull Request #13 · ThakiCloud/lmms-eval

ryangkyung-thaki · 2026-03-05T12:15:11Z

Issue?

#12

Changes?

process_results에서 개별 TEDS/OCR 점수 계산을 제거하고, aggregation 단계에서 전체 샘플을 일괄 병렬 처리로 변경
ProcessPoolExecutor + spawn 컨텍스트를 사용하여 vLLM 등 GPU 프로세스의 CUDA fork 문제 방지
picklable worker 함수(compute_teds_full, compute_teds_struct, compute_ocr)를 teds.py 모듈에 분리 정의
YAML 설정에서 aggregation: mean을 커스텀 aggregation 함수(teds_aggregate, teds_struct_aggregate, ocr_aggregate)로 교체
TEDS_NUM_WORKERS 환경변수로 워커 수 조절 가능 (기본값: CPU 코어 수, 최대 64)

Why we need?

TEDS 계산은 Tree Edit Distance 기반의 CPU-intensive 작업으로, 순차 실행 시 대규모 데이터셋에서 심각한 병목 발생
vLLM 등 GPU 모델 프로세스에서 fork 시 CUDA 충돌이 발생하므로, spawn 컨텍스트를 통한 안전한 멀티프로세싱 필요

Test?

python -m lmms_eval --model qwen2_5_vl --tasks icdar_table_parsing --limit 8 --batch_size 1로 정상 동작 확인

CC (Optional)

Anything else? (Optional)

- Defer TEDS/OCR scoring from process_results to aggregation stage - Use ProcessPoolExecutor with spawn context to avoid CUDA fork issues - Add picklable worker functions in teds.py for multiprocessing - Replace YAML aggregation: mean with custom parallel aggregation functions - Support TEDS_NUM_WORKERS env var for worker count control

coderabbitai · 2026-03-05T12:15:35Z

Summary by CodeRabbit

릴리스 노트

최적화
- 테이블 파싱 평가의 메트릭 계산(TEDS, OCR) 성능이 병렬 처리를 통해 개선되었습니다.
- 평가 중 오류 처리가 강화되어 더 안정적인 실행을 보장합니다.
- 멀티프로세싱을 활용한 효율적인 리소스 활용으로 평가 시간이 단축됩니다.

개요

ICDAR 테이블 파싱 작업의 메트릭 집계를 순차 처리에서 멀티프로세싱 기반 병렬 처리로 리팩토링했습니다. YAML 설정에서 정적 "mean" 값을 동적 함수 참조로 변경하고, 병렬 워커 함수 및 집계 함수를 추가하여 평가 계산을 지연시킵니다.

변경 사항

집단 / 파일(들)	요약
YAML 설정 `lmms_eval/tasks/icdar_table_parsing/icdar_table_parsing.yaml`	세 개 메트릭(teds, teds_struct, ocr)의 집계 방식을 정적 "mean"에서 동적 함수 참조(`!function utils.teds_aggregate`, `!function utils.teds_struct_aggregate`, `!function utils.ocr_aggregate`)로 변경
병렬 워커 함수 `lmms_eval/tasks/icdar_table_parsing/teds.py`	TEDS 및 OCR 메트릭용 병렬 처리 워커 함수 3개 추가: `compute_teds_full()`, `compute_teds_struct()`, `compute_ocr()`. 각 함수는 예외 처리를 포함하고 안전한 기본값 반환
집계 및 병렬화 구현 `lmms_eval/tasks/icdar_table_parsing/utils.py`	멀티프로세싱 기반 병렬 계산을 위한 `_parallel_compute()` 헬퍼 추가. 메트릭 계산을 결과 처리 단계에서 집계 단계로 이동. `teds_aggregate()`, `teds_struct_aggregate()`, `ocr_aggregate()` 공개 함수 추가. 워커 개수를 환경 변수 `TEDS_NUM_WORKERS`로 제어

시퀀스 다이어그램

sequenceDiagram
    participant Config as YAML Configuration
    participant Process as Results Processing
    participant Agg as Aggregation Functions
    participant Worker as Parallel Workers
    participant Metric as Metric Engines<br/>(TEDS, OCR)

    Config->>Process: References aggregation functions
    Process->>Agg: Defers metric computation,<br/>returns data tuples
    Agg->>Agg: _parallel_compute()
    Agg->>Worker: Spawn worker processes<br/>with compute_* functions
    Worker->>Metric: Execute TEDS/OCR evaluation
    Metric-->>Worker: Metric scores
    Worker-->>Agg: Results with error handling
    Agg-->>Agg: Collect & compute mean
    Agg-->>Config: Return aggregated score

예상 코드 리뷰 노력

🎯 3 (중간 난이도) | ⏱️ ~25분

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	PR 설명이 변경사항, 필요성, 테스트 내용을 포함하고 있지만, 저장소의 템플릿 구조를 따르지 않고 있습니다.	저장소의 표준 템플릿을 따라 Description, Type of Change, Changes Made, Testing 섹션을 명시적으로 구조화하여 작성하시기 바랍니다.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	PR 제목이 주요 변경사항(TEDS 병렬 계산 최적화)을 명확하게 요약하고 있으며, 간결하고 구체적입니다.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch issue/#12-icdar-teds-parallel

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lmms_eval/tasks/icdar_table_parsing/teds.py`:
- Around line 217-218: The tuple unpacking (e.g., pred_html, gt_html, _) is done
outside the try block in compute_teds_full (and the sibling functions
compute_teds_structure and compute_teds_content), so malformed inputs raise
before the exception handler and a worker can't fallback to the 0.0 return; move
the unpacking into the try block and catch Exception around it, and on any
exception return 0.0 to preserve the worker fallback path while keeping the
existing scoring logic unchanged.

In `@lmms_eval/tasks/icdar_table_parsing/utils.py`:
- Line 18: The module-level _NUM_WORKERS assignment currently calls
int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64))) which
raises ValueError for non-integer env values (e.g., "auto") and bypasses the
intended 64 max; change it to read TEDS_NUM_WORKERS into a variable, try to
parse it to int in a try/except (or validate numeric and positive), and on
failure fall back to default = min(os.cpu_count() or 1, 64); finally clamp the
resulting number to a minimum of 1 and a maximum of 64 and assign that to
_NUM_WORKERS (referencing the TEDS_NUM_WORKERS env var and the _NUM_WORKERS
symbol).
- Line 8: Replace the direct Loguru import ("from loguru import logger") with
the project standard logger by importing eval_logger from lmms_eval.utils and
aliasing it to logger (e.g., from lmms_eval.utils import eval_logger as logger),
then update all existing logger.* calls in this module (the places currently
using logger at the top-level import and the logger.info/error/debug calls
throughout the file) to use that eval_logger alias so the module uses the
centralized eval_logger instead of loguru directly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8aff134-d3ce-4bf1-8085-2f94ed5375b8

📥 Commits

Reviewing files that changed from the base of the PR and between 4c83ef9 and d8a8dc0.

📒 Files selected for processing (3)

lmms_eval/tasks/icdar_table_parsing/icdar_table_parsing.yaml
lmms_eval/tasks/icdar_table_parsing/teds.py
lmms_eval/tasks/icdar_table_parsing/utils.py

coderabbitai · 2026-03-05T12:26:58Z

lmms_eval/tasks/icdar_table_parsing/teds.py

+def compute_teds_full(args: tuple) -> float:
+    pred_html, gt_html, _ = args


⚠️ Potential issue | 🟡 Minor

인자 언패킹을 try 내부로 옮겨 워커 fallback을 보장해 주세요

Line 218, Line 229, Line 240에서 언패킹이 try 밖에 있어 입력 형식이 깨지면 0.0 반환 경로를 타지 못합니다. 단일 워커 경로에서는 집계가 중단될 수 있습니다.

제안 패치

-def compute_teds_full(args: tuple) -> float: - pred_html, gt_html, _ = args - try: +def compute_teds_full(args: tuple) -> float: + try: + pred_html, gt_html, _ = args return TEDS(structure_only=False).evaluate(pred_html, gt_html) except Exception: import traceback traceback.print_exc() return 0.0 def compute_teds_struct(args: tuple) -> float: - pred_html, gt_html, _ = args try: + pred_html, gt_html, _ = args return TEDS(structure_only=True).evaluate(pred_html, gt_html) except Exception: import traceback traceback.print_exc() return 0.0 def compute_ocr(args: tuple) -> float: - pred_html, gt_html, _ = args try: + pred_html, gt_html, _ = args return OCRMetric().evaluate(pred_html, gt_html) except Exception: import traceback traceback.print_exc() return 0.0

Also applies to: 228-229, 239-240

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lmms_eval/tasks/icdar_table_parsing/teds.py` around lines 217 - 218, The tuple unpacking (e.g., pred_html, gt_html, _) is done outside the try block in compute_teds_full (and the sibling functions compute_teds_structure and compute_teds_content), so malformed inputs raise before the exception handler and a worker can't fallback to the 0.0 return; move the unpacking into the try block and catch Exception around it, and on any exception return 0.0 to preserve the worker fallback path while keeping the existing scoring logic unchanged.

coderabbitai · 2026-03-05T12:26:59Z

lmms_eval/tasks/icdar_table_parsing/utils.py


-from lmms_eval.tasks._task_utils.file_utils import generate_submission_file
-from lmms_eval.tasks.icdar_table_parsing.teds import TEDS, OCRMetric
+from loguru import logger


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# 파일 존재 확인 및 내용 확인 find . -name "utils.py" -path "*/icdar_table_parsing/*" -type f

Repository: ThakiCloud/lmms-eval

Length of output: 110

🏁 Script executed:

# 파일 내용 전체 확인 (라인 수 먼저 확인) if [ -f "lmms_eval/tasks/icdar_table_parsing/utils.py" ]; then wc -l "lmms_eval/tasks/icdar_table_parsing/utils.py" fi

Repository: ThakiCloud/lmms-eval

Length of output: 112

🏁 Script executed:

# eval_logger 정의 확인 if [ -f "lmms_eval/utils.py" ]; then rg -n "eval_logger" "lmms_eval/utils.py" | head -20 fi

Repository: ThakiCloud/lmms-eval

Length of output: 308

🏁 Script executed:

# 저장소 내 eval_logger 사용 패턴 확인 rg -n "from lmms_eval\.utils import eval_logger" --type=py | head -20

Repository: ThakiCloud/lmms-eval

Length of output: 301

🏁 Script executed:

# icdar_table_parsing/utils.py 파일 내용 확인 cat -n "lmms_eval/tasks/icdar_table_parsing/utils.py" | head -60

Repository: ThakiCloud/lmms-eval

Length of output: 2543

🏁 Script executed:

# 로깅 사용 부분 확인 (라인 40-50, 140-170) cat -n "lmms_eval/tasks/icdar_table_parsing/utils.py" | sed -n '1,20p; 35,50p; 140,166p'

Repository: ThakiCloud/lmms-eval

Length of output: 3140

프로젝트 표준 eval_logger로 로깅 통합

loguru를 직접 import하지 말고, lmms_eval.utils에서 제공하는 eval_logger를 사용해 주세요. 라인 8, 43, 148, 156, 164에서 로깅 코드를 수정해야 합니다.

제안 패치

-from loguru import logger +from lmms_eval.utils import eval_logger @@ - logger.warning("Worker failed for item {}: {}", idx, e) + eval_logger.warning(f"Worker failed for item {idx}: {e}") @@ - logger.info("Computing TEDS for {} samples with {} workers", len(items), workers) + eval_logger.info(f"Computing TEDS for {len(items)} samples with {workers} workers") @@ - logger.info("Computing TEDS-struct for {} samples with {} workers", len(items), workers) + eval_logger.info(f"Computing TEDS-struct for {len(items)} samples with {workers} workers") @@ - logger.info("Computing OCR metric for {} samples with {} workers", len(items), workers) + eval_logger.info(f"Computing OCR metric for {len(items)} samples with {workers} workers")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lmms_eval/tasks/icdar_table_parsing/utils.py` at line 8, Replace the direct Loguru import ("from loguru import logger") with the project standard logger by importing eval_logger from lmms_eval.utils and aliasing it to logger (e.g., from lmms_eval.utils import eval_logger as logger), then update all existing logger.* calls in this module (the places currently using logger at the top-level import and the logger.info/error/debug calls throughout the file) to use that eval_logger alias so the module uses the centralized eval_logger instead of loguru directly.

coderabbitai · 2026-03-05T12:26:59Z

lmms_eval/tasks/icdar_table_parsing/utils.py


 TABLE_PARSING_PROMPT = "Convert the table in the image to HTML format. Output only the HTML table code, starting with <table> and ending with </table>. Do not include any explanation."

+_NUM_WORKERS = int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64)))


⚠️ Potential issue | 🟠 Major

TEDS_NUM_WORKERS 파싱이 import 실패와 상한 우회를 유발할 수 있습니다

Line 18은 비정상 값(예: TEDS_NUM_WORKERS=auto)에서 ValueError로 모듈 import가 실패하고, 환경변수 값에는 최대 64 제한도 적용되지 않습니다. PR 목표(최대 64)와 동작이 어긋납니다.

제안 패치

-_NUM_WORKERS = int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64))) +def _resolve_num_workers() -> int: + default_workers = min(os.cpu_count() or 1, 64) + raw = os.environ.get("TEDS_NUM_WORKERS") + if raw is None: + return default_workers + try: + parsed = int(raw) + except ValueError: + return default_workers + return max(1, min(parsed, 64)) + + +_NUM_WORKERS = _resolve_num_workers()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@lmms_eval/tasks/icdar_table_parsing/utils.py` at line 18, The module-level _NUM_WORKERS assignment currently calls int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64))) which raises ValueError for non-integer env values (e.g., "auto") and bypasses the intended 64 max; change it to read TEDS_NUM_WORKERS into a variable, try to parse it to int in a try/except (or validate numeric and positive), and on failure fall back to default = min(os.cpu_count() or 1, 64); finally clamp the resulting number to a minimum of 1 and a maximum of 64 and assign that to _NUM_WORKERS (referencing the TEDS_NUM_WORKERS env var and the _NUM_WORKERS symbol).

ryangkyung-thaki self-assigned this Mar 5, 2026

style: auto-fix lint (black + isort)

d8a8dc0

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#12 [enhance] ICDAR 테이블 파싱 TEDS 병렬 계산 최적화#13

#12 [enhance] ICDAR 테이블 파싱 TEDS 병렬 계산 최적화#13
ryangkyung-thaki wants to merge 2 commits intomainfrom
issue/#12-icdar-teds-parallel

ryangkyung-thaki commented Mar 5, 2026

Uh oh!

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 5, 2026

Uh oh!

coderabbitai bot Mar 5, 2026

Uh oh!

coderabbitai bot Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		def compute_teds_full(args: tuple) -> float:
		pred_html, gt_html, _ = args


		TABLE_PARSING_PROMPT = "Convert the table in the image to HTML format. Output only the HTML table code, starting with <table> and ending with </table>. Do not include any explanation."

		_NUM_WORKERS = int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64)))

Conversation

ryangkyung-thaki commented Mar 5, 2026

Issue?

Changes?

Why we need?

Test?

CC (Optional)

Anything else? (Optional)

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

릴리스 노트

개요

변경 사항

시퀀스 다이어그램

예상 코드 리뷰 노력

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 5, 2026 •

edited

Loading