Skip to content

#12 [enhance] ICDAR 테이블 파싱 TEDS 병렬 계산 최적화#13

Open
ryangkyung-thaki wants to merge 2 commits intomainfrom
issue/#12-icdar-teds-parallel
Open

#12 [enhance] ICDAR 테이블 파싱 TEDS 병렬 계산 최적화#13
ryangkyung-thaki wants to merge 2 commits intomainfrom
issue/#12-icdar-teds-parallel

Conversation

@ryangkyung-thaki
Copy link
Copy Markdown
Collaborator

Issue?

#12

Changes?

  • process_results에서 개별 TEDS/OCR 점수 계산을 제거하고, aggregation 단계에서 전체 샘플을 일괄 병렬 처리로 변경
  • ProcessPoolExecutor + spawn 컨텍스트를 사용하여 vLLM 등 GPU 프로세스의 CUDA fork 문제 방지
  • picklable worker 함수(compute_teds_full, compute_teds_struct, compute_ocr)를 teds.py 모듈에 분리 정의
  • YAML 설정에서 aggregation: mean을 커스텀 aggregation 함수(teds_aggregate, teds_struct_aggregate, ocr_aggregate)로 교체
  • TEDS_NUM_WORKERS 환경변수로 워커 수 조절 가능 (기본값: CPU 코어 수, 최대 64)

Why we need?

  • TEDS 계산은 Tree Edit Distance 기반의 CPU-intensive 작업으로, 순차 실행 시 대규모 데이터셋에서 심각한 병목 발생
  • vLLM 등 GPU 모델 프로세스에서 fork 시 CUDA 충돌이 발생하므로, spawn 컨텍스트를 통한 안전한 멀티프로세싱 필요

Test?

  • python -m lmms_eval --model qwen2_5_vl --tasks icdar_table_parsing --limit 8 --batch_size 1로 정상 동작 확인

CC (Optional)

Anything else? (Optional)

- Defer TEDS/OCR scoring from process_results to aggregation stage
- Use ProcessPoolExecutor with spawn context to avoid CUDA fork issues
- Add picklable worker functions in teds.py for multiprocessing
- Replace YAML aggregation: mean with custom parallel aggregation functions
- Support TEDS_NUM_WORKERS env var for worker count control
@ryangkyung-thaki ryangkyung-thaki self-assigned this Mar 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 5, 2026

Summary by CodeRabbit

릴리스 노트

  • 최적화
    • 테이블 파싱 평가의 메트릭 계산(TEDS, OCR) 성능이 병렬 처리를 통해 개선되었습니다.
    • 평가 중 오류 처리가 강화되어 더 안정적인 실행을 보장합니다.
    • 멀티프로세싱을 활용한 효율적인 리소스 활용으로 평가 시간이 단축됩니다.

개요

ICDAR 테이블 파싱 작업의 메트릭 집계를 순차 처리에서 멀티프로세싱 기반 병렬 처리로 리팩토링했습니다. YAML 설정에서 정적 "mean" 값을 동적 함수 참조로 변경하고, 병렬 워커 함수 및 집계 함수를 추가하여 평가 계산을 지연시킵니다.

변경 사항

집단 / 파일(들) 요약
YAML 설정
lmms_eval/tasks/icdar_table_parsing/icdar_table_parsing.yaml
세 개 메트릭(teds, teds_struct, ocr)의 집계 방식을 정적 "mean"에서 동적 함수 참조(!function utils.teds_aggregate, !function utils.teds_struct_aggregate, !function utils.ocr_aggregate)로 변경
병렬 워커 함수
lmms_eval/tasks/icdar_table_parsing/teds.py
TEDS 및 OCR 메트릭용 병렬 처리 워커 함수 3개 추가: compute_teds_full(), compute_teds_struct(), compute_ocr(). 각 함수는 예외 처리를 포함하고 안전한 기본값 반환
집계 및 병렬화 구현
lmms_eval/tasks/icdar_table_parsing/utils.py
멀티프로세싱 기반 병렬 계산을 위한 _parallel_compute() 헬퍼 추가. 메트릭 계산을 결과 처리 단계에서 집계 단계로 이동. teds_aggregate(), teds_struct_aggregate(), ocr_aggregate() 공개 함수 추가. 워커 개수를 환경 변수 TEDS_NUM_WORKERS로 제어

시퀀스 다이어그램

sequenceDiagram
    participant Config as YAML Configuration
    participant Process as Results Processing
    participant Agg as Aggregation Functions
    participant Worker as Parallel Workers
    participant Metric as Metric Engines<br/>(TEDS, OCR)

    Config->>Process: References aggregation functions
    Process->>Agg: Defers metric computation,<br/>returns data tuples
    Agg->>Agg: _parallel_compute()
    Agg->>Worker: Spawn worker processes<br/>with compute_* functions
    Worker->>Metric: Execute TEDS/OCR evaluation
    Metric-->>Worker: Metric scores
    Worker-->>Agg: Results with error handling
    Agg-->>Agg: Collect & compute mean
    Agg-->>Config: Return aggregated score
Loading

예상 코드 리뷰 노력

🎯 3 (중간 난이도) | ⏱️ ~25분

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive PR 설명이 변경사항, 필요성, 테스트 내용을 포함하고 있지만, 저장소의 템플릿 구조를 따르지 않고 있습니다. 저장소의 표준 템플릿을 따라 Description, Type of Change, Changes Made, Testing 섹션을 명시적으로 구조화하여 작성하시기 바랍니다.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed PR 제목이 주요 변경사항(TEDS 병렬 계산 최적화)을 명확하게 요약하고 있으며, 간결하고 구체적입니다.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch issue/#12-icdar-teds-parallel

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lmms_eval/tasks/icdar_table_parsing/teds.py`:
- Around line 217-218: The tuple unpacking (e.g., pred_html, gt_html, _) is done
outside the try block in compute_teds_full (and the sibling functions
compute_teds_structure and compute_teds_content), so malformed inputs raise
before the exception handler and a worker can't fallback to the 0.0 return; move
the unpacking into the try block and catch Exception around it, and on any
exception return 0.0 to preserve the worker fallback path while keeping the
existing scoring logic unchanged.

In `@lmms_eval/tasks/icdar_table_parsing/utils.py`:
- Line 18: The module-level _NUM_WORKERS assignment currently calls
int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64))) which
raises ValueError for non-integer env values (e.g., "auto") and bypasses the
intended 64 max; change it to read TEDS_NUM_WORKERS into a variable, try to
parse it to int in a try/except (or validate numeric and positive), and on
failure fall back to default = min(os.cpu_count() or 1, 64); finally clamp the
resulting number to a minimum of 1 and a maximum of 64 and assign that to
_NUM_WORKERS (referencing the TEDS_NUM_WORKERS env var and the _NUM_WORKERS
symbol).
- Line 8: Replace the direct Loguru import ("from loguru import logger") with
the project standard logger by importing eval_logger from lmms_eval.utils and
aliasing it to logger (e.g., from lmms_eval.utils import eval_logger as logger),
then update all existing logger.* calls in this module (the places currently
using logger at the top-level import and the logger.info/error/debug calls
throughout the file) to use that eval_logger alias so the module uses the
centralized eval_logger instead of loguru directly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8aff134-d3ce-4bf1-8085-2f94ed5375b8

📥 Commits

Reviewing files that changed from the base of the PR and between 4c83ef9 and d8a8dc0.

📒 Files selected for processing (3)
  • lmms_eval/tasks/icdar_table_parsing/icdar_table_parsing.yaml
  • lmms_eval/tasks/icdar_table_parsing/teds.py
  • lmms_eval/tasks/icdar_table_parsing/utils.py

Comment on lines +217 to +218
def compute_teds_full(args: tuple) -> float:
pred_html, gt_html, _ = args
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

인자 언패킹을 try 내부로 옮겨 워커 fallback을 보장해 주세요

Line 218, Line 229, Line 240에서 언패킹이 try 밖에 있어 입력 형식이 깨지면 0.0 반환 경로를 타지 못합니다. 단일 워커 경로에서는 집계가 중단될 수 있습니다.

제안 패치
-def compute_teds_full(args: tuple) -> float:
-    pred_html, gt_html, _ = args
-    try:
+def compute_teds_full(args: tuple) -> float:
+    try:
+        pred_html, gt_html, _ = args
         return TEDS(structure_only=False).evaluate(pred_html, gt_html)
     except Exception:
         import traceback
 
         traceback.print_exc()
         return 0.0
 
 
 def compute_teds_struct(args: tuple) -> float:
-    pred_html, gt_html, _ = args
     try:
+        pred_html, gt_html, _ = args
         return TEDS(structure_only=True).evaluate(pred_html, gt_html)
     except Exception:
         import traceback
 
         traceback.print_exc()
         return 0.0
 
 
 def compute_ocr(args: tuple) -> float:
-    pred_html, gt_html, _ = args
     try:
+        pred_html, gt_html, _ = args
         return OCRMetric().evaluate(pred_html, gt_html)
     except Exception:
         import traceback
 
         traceback.print_exc()
         return 0.0

Also applies to: 228-229, 239-240

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lmms_eval/tasks/icdar_table_parsing/teds.py` around lines 217 - 218, The
tuple unpacking (e.g., pred_html, gt_html, _) is done outside the try block in
compute_teds_full (and the sibling functions compute_teds_structure and
compute_teds_content), so malformed inputs raise before the exception handler
and a worker can't fallback to the 0.0 return; move the unpacking into the try
block and catch Exception around it, and on any exception return 0.0 to preserve
the worker fallback path while keeping the existing scoring logic unchanged.


from lmms_eval.tasks._task_utils.file_utils import generate_submission_file
from lmms_eval.tasks.icdar_table_parsing.teds import TEDS, OCRMetric
from loguru import logger
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# 파일 존재 확인 및 내용 확인
find . -name "utils.py" -path "*/icdar_table_parsing/*" -type f

Repository: ThakiCloud/lmms-eval

Length of output: 110


🏁 Script executed:

# 파일 내용 전체 확인 (라인 수 먼저 확인)
if [ -f "lmms_eval/tasks/icdar_table_parsing/utils.py" ]; then
  wc -l "lmms_eval/tasks/icdar_table_parsing/utils.py"
fi

Repository: ThakiCloud/lmms-eval

Length of output: 112


🏁 Script executed:

# eval_logger 정의 확인
if [ -f "lmms_eval/utils.py" ]; then
  rg -n "eval_logger" "lmms_eval/utils.py" | head -20
fi

Repository: ThakiCloud/lmms-eval

Length of output: 308


🏁 Script executed:

# 저장소 내 eval_logger 사용 패턴 확인
rg -n "from lmms_eval\.utils import eval_logger" --type=py | head -20

Repository: ThakiCloud/lmms-eval

Length of output: 301


🏁 Script executed:

# icdar_table_parsing/utils.py 파일 내용 확인
cat -n "lmms_eval/tasks/icdar_table_parsing/utils.py" | head -60

Repository: ThakiCloud/lmms-eval

Length of output: 2543


🏁 Script executed:

# 로깅 사용 부분 확인 (라인 40-50, 140-170)
cat -n "lmms_eval/tasks/icdar_table_parsing/utils.py" | sed -n '1,20p; 35,50p; 140,166p'

Repository: ThakiCloud/lmms-eval

Length of output: 3140


프로젝트 표준 eval_logger로 로깅 통합

loguru를 직접 import하지 말고, lmms_eval.utils에서 제공하는 eval_logger를 사용해 주세요. 라인 8, 43, 148, 156, 164에서 로깅 코드를 수정해야 합니다.

제안 패치
-from loguru import logger
+from lmms_eval.utils import eval_logger
@@
-                logger.warning("Worker failed for item {}: {}", idx, e)
+                eval_logger.warning(f"Worker failed for item {idx}: {e}")
@@
-    logger.info("Computing TEDS for {} samples with {} workers", len(items), workers)
+    eval_logger.info(f"Computing TEDS for {len(items)} samples with {workers} workers")
@@
-    logger.info("Computing TEDS-struct for {} samples with {} workers", len(items), workers)
+    eval_logger.info(f"Computing TEDS-struct for {len(items)} samples with {workers} workers")
@@
-    logger.info("Computing OCR metric for {} samples with {} workers", len(items), workers)
+    eval_logger.info(f"Computing OCR metric for {len(items)} samples with {workers} workers")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lmms_eval/tasks/icdar_table_parsing/utils.py` at line 8, Replace the direct
Loguru import ("from loguru import logger") with the project standard logger by
importing eval_logger from lmms_eval.utils and aliasing it to logger (e.g., from
lmms_eval.utils import eval_logger as logger), then update all existing logger.*
calls in this module (the places currently using logger at the top-level import
and the logger.info/error/debug calls throughout the file) to use that
eval_logger alias so the module uses the centralized eval_logger instead of
loguru directly.


TABLE_PARSING_PROMPT = "Convert the table in the image to HTML format. Output only the HTML table code, starting with <table> and ending with </table>. Do not include any explanation."

_NUM_WORKERS = int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

TEDS_NUM_WORKERS 파싱이 import 실패와 상한 우회를 유발할 수 있습니다

Line 18은 비정상 값(예: TEDS_NUM_WORKERS=auto)에서 ValueError로 모듈 import가 실패하고, 환경변수 값에는 최대 64 제한도 적용되지 않습니다. PR 목표(최대 64)와 동작이 어긋납니다.

제안 패치
-_NUM_WORKERS = int(os.environ.get("TEDS_NUM_WORKERS", min(os.cpu_count() or 1, 64)))
+def _resolve_num_workers() -> int:
+    default_workers = min(os.cpu_count() or 1, 64)
+    raw = os.environ.get("TEDS_NUM_WORKERS")
+    if raw is None:
+        return default_workers
+    try:
+        parsed = int(raw)
+    except ValueError:
+        return default_workers
+    return max(1, min(parsed, 64))
+
+
+_NUM_WORKERS = _resolve_num_workers()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lmms_eval/tasks/icdar_table_parsing/utils.py` at line 18, The module-level
_NUM_WORKERS assignment currently calls int(os.environ.get("TEDS_NUM_WORKERS",
min(os.cpu_count() or 1, 64))) which raises ValueError for non-integer env
values (e.g., "auto") and bypasses the intended 64 max; change it to read
TEDS_NUM_WORKERS into a variable, try to parse it to int in a try/except (or
validate numeric and positive), and on failure fall back to default =
min(os.cpu_count() or 1, 64); finally clamp the resulting number to a minimum of
1 and a maximum of 64 and assign that to _NUM_WORKERS (referencing the
TEDS_NUM_WORKERS env var and the _NUM_WORKERS symbol).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant