Skip to content

ttl: harden running-job iteration test session usage#13

Closed
zanmato1984 wants to merge 6 commits intomasterfrom
codex/ttl-running-job-iteration-durable
Closed

ttl: harden running-job iteration test session usage#13
zanmato1984 wants to merge 6 commits intomasterfrom
codex/ttl-running-job-iteration-durable

Conversation

@zanmato1984
Copy link
Copy Markdown
Owner

Summary

  • avoid holding one long-lived test session across the full table loop in TestIterationOfRunningJob
  • extract a timeout-aware helper and use short-lived sessions for each iteration plus cache refresh
  • add a session-timeout subtest to cover slow-loop timeout pressure explicitly

Root cause

The earlier race fix switched this test to internal sessions, but it still reused a single session for the whole loop. Under slower execution, that session can exceed its timeout and fail unrelated assertions.

Validation

  • previous work already confirmed race-path stability and timeout-path repro/pass behavior for this change
  • lightweight local rerun attempted in this finalization pass:
    • GOMAXPROCS=2 go test -p 1 -vet=off ./pkg/ttl/ttlworker -run '^TestIterationOfRunningJob$' -count=1
    • blocked by local Go build-cache environment errors (missing files under ~/Library/Caches/go-build), not by test assertions

@zanmato1984 zanmato1984 force-pushed the codex/ttl-running-job-iteration-durable branch from 50611bf to 197c765 Compare April 9, 2026 08:40
],
flaky = True,
shard_count = 4,
shard_count = 5,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shard-count update is unrelated to the TTL flaky-test fix in this PR, so it breaks the minimal-scope/test-only goal. Please drop this file change (or remove commit a0550e135c2dd48d20f1d345ec8035f2f1072b70) and keep the PR focused on TestIterationOfRunningJob.\nSource: process/pr-review | Second Opinion

Copy link
Copy Markdown
Owner Author

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT_LGTM.\n\nReview checks:\n1. Intent preservation: pass.\n2. Test-only/minimal scope: fail due unrelated Bazel file change (inline comment posted).\n3. Root-cause analysis: reasonable and evidence-based.\n4. Timing-only precise repro: necessary and appropriately focused.\n5. Regression coverage ( + ): pass.\n\nAction required:\n- Drop the unrelated change (or remove commit ) and keep this PR scoped to TTL test fix only.\nSource: process/pr-review | Second Opinion

@zanmato1984
Copy link
Copy Markdown
Owner Author

Clarification for the review above (format-safe): NOT_LGTM.

Blocking item to address:

  • Remove the unrelated file change in tests/realtikvtest/pushdowntest/BUILD.bazel (shard_count 4 -> 5), or drop commit a0550e1.
  • Keep this PR scoped to pkg/ttl/ttlworker/job_manager_integration_test.go only.

Focus checks status:

  1. Intent preservation: pass.
  2. Test-only/minimal scope: fail (blocking item above).
  3. Root-cause analysis: pass.
  4. Timing-only precise repro necessity: pass.
  5. Regression coverage in normal + timeout-pressure subtests: pass.

Source: process/pr-review | Second Opinion

@zanmato1984
Copy link
Copy Markdown
Owner Author

Address update for the latest NOT_LGTM review cycle (2026-04-09):\n\n| Finding | Source | Disposition | Evidence |\n| --- | --- | --- | --- |\n| Unrelated shard_count change () should be removed; keep PR scoped to TTL test fix only. | #13 (comment) and #13 (comment) | Fixed | Reverted the unrelated Bazel change in commit . Current PR diff is scoped to only. |\n\nValidation run:\n- ok github.com/pingcap/tidb/pkg/ttl/ttlworker 6.378s\n- Result:

@zanmato1984
Copy link
Copy Markdown
Owner Author

Address update for the latest NOT_LGTM review cycle (2026-04-09):

Finding Source Disposition Evidence
Unrelated tests/realtikvtest/pushdowntest/BUILD.bazel shard_count change (4 -> 5) should be removed; keep PR scoped to TTL test fix only. #13 (comment) and #13 (comment) Fixed Reverted the unrelated Bazel change in commit 44cb58d. Current PR diff is scoped to pkg/ttl/ttlworker/job_manager_integration_test.go only.

Validation run:

  • GOCACHE=/tmp/go-build go test ./pkg/ttl/ttlworker -run TestIterationOfRunningJob -count=1
  • Result: ok github.com/pingcap/tidb/pkg/ttl/ttlworker 6.474s

@zanmato1984
Copy link
Copy Markdown
Owner Author

Address update for stage r2 (2026-04-09):

Checklist item Disposition Evidence
Remove reintroduced non-test scope (tests/realtikvtest/pushdowntest/BUILD.bazel) Fixed Added commit c24f182d33 (`Revert "chore: update bazel file"), which restores this file to the prior PR state.
Keep PR scope clean to TTL test fix only Fixed Current changed-files list is only pkg/ttl/ttlworker/job_manager_integration_test.go.
Keep test fix intact Fixed GOCACHE=/tmp/go-build go test ./pkg/ttl/ttlworker -run TestIterationOfRunningJob -count=1 -> ok github.com/pingcap/tidb/pkg/ttl/ttlworker 6.235s.

This addresses the latest reintroduced scope drift while preserving the test fix behavior.

Copy link
Copy Markdown
Owner Author

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT_LGTM (r2, independent re-review at latest head 3e596ee).

Checks:

  1. Original test intent preserved: PASS.
  2. Diff minimal/test-only and scope-clean: FAIL (blocking; unrelated pushdown Bazel shard-count file change is present at current head).
  3. Recurrence root-cause analysis remains evidence-based: PASS (patch directly addresses long-lived session lifetime coupling).
  4. Timing-only precise repro retention/removal remains justified: PASS (session-timeout subtest is a focused timeout-pressure guard).
  5. Regression coverage quality remains adequate: PASS (normal + session-timeout subtests).
  6. Address-cycle consistency with latest head: FAIL (44cb58d reverted scope issue, but 3e596ee reintroduced it).

Concrete address checklist:

  • Remove tests/realtikvtest/pushdowntest/BUILD.bazel change from PR head (drop/revert 3e596ee).
  • Confirm final diff is scoped to pkg/ttl/ttlworker/job_manager_integration_test.go only.
  • Re-run targeted TTL test and share result.

Source: process/pr-review | Second Opinion

Copy link
Copy Markdown
Owner Author

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (r2, superseding prior interim NOT_LGTM after new address commit).

Reviewed latest PR head: c24f182

Checks:

  1. Original test intent preserved: PASS.
  2. Diff minimal/test-only and scope-clean: PASS (only pkg/ttl/ttlworker/job_manager_integration_test.go changed).
  3. Recurrence root-cause analysis remains evidence-based: PASS.
  4. Timing-only precise repro retention/removal remains justified: PASS.
  5. Regression coverage in normal + session-timeout subtests: PASS.
  6. Address-cycle consistency at latest head: PASS (reintroduced non-scope Bazel change removed by c24f182).

No material blockers remain.

Source: process/pr-review | Second Opinion

@zanmato1984 zanmato1984 closed this Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants