Skip to content

fix(nf-tower): ensure final log checkpoint runs after thread interrupt#7097

Open
rnaidu-seqera wants to merge 2 commits intonextflow-io:masterfrom
rnaidu-seqera:fix/logs-checkpoint-interrupt
Open

fix(nf-tower): ensure final log checkpoint runs after thread interrupt#7097
rnaidu-seqera wants to merge 2 commits intonextflow-io:masterfrom
rnaidu-seqera:fix/logs-checkpoint-interrupt

Conversation

@rnaidu-seqera
Copy link
Copy Markdown
Contributor

@rnaidu-seqera rnaidu-seqera commented May 4, 2026

Summary

  • LogsCheckpoint.await() was re-setting the thread interrupt flag after catching InterruptedException, causing subsequent NIO writes in saveFiles() to
    throw ClosedByInterruptException and leave empty nf-*.txt, nf-*.log, and timeline-*.html files in the GCS work directory
  • Fixed by changing await() to return a boolean instead of re-setting the flag, and calling Thread.interrupted() in run() to clear the flag before
    saveFiles() executes — ensuring the final upload always succeeds regardless of interrupt source
  • Added synchronized(lock) guard so onFlowComplete() cannot interrupt the thread mid-upload

Test plan

  • Existing LogsCheckpointTest tests pass (interval configuration via default, env var, and config file)
  • New unit test 'should perform final saveFiles when interrupted mid-sleep' passes — verifies handler.saveFiles() is called exactly once whenonFlowComplete() interrupts the thread mid-sleep, and that the thread interrupt flag is clear at the time of the call
  • Manually verified fix on Google Batch CE with GCS work directory and cloudcache.enabled = true: nf-*.txt file uploaded to GCS with full content after workflow completion (previously 0 bytes)

@netlify
Copy link
Copy Markdown

netlify Bot commented May 4, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 5fa9b4e
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69f926bbcc6d9100081cc685

Signed-off-by: Rashmi Naidu <rashmi.naidu@seqera.io>
Signed-off-by: Rashmi Naidu <rashmi.naidu@seqera.io>
@rnaidu-seqera rnaidu-seqera force-pushed the fix/logs-checkpoint-interrupt branch from ea5c083 to 5fa9b4e Compare May 4, 2026 23:07
@rnaidu-seqera rnaidu-seqera marked this pull request as ready for review May 5, 2026 00:59
if( Thread.currentThread().isInterrupted() )
break
final interrupted = await(interval)
Thread.interrupted() // clear flag so NIO writes in saveFiles() succeed
Copy link
Copy Markdown
Contributor

@jorgee jorgee May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the description of the PR I have not clear what was causing the issue and why you set Thread.interrupted() in all the cases. Is it not causing issues in the normal execution?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main mistake is to use interrupt to signal the termination that was introduced in previous iteration.

synchronized(lock) {
thread.interrupt()
}

interrupt should be use to force the thread interruption (that's not the case).

I'd suggest going in the change history, revert that implementation and proper exiting the thread

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was added in #6787. I see it is in stable 25.10.x. But this branch doesn't include #6939. So, maybe the reported problem is caused by the race condition and interrupt is happening during upload. Because, if the thread is interrupted, it should break without calling saveFiles.

@rnaidu-seqera, Is the issue happening in 25.10.4?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants