[Auto-Recovery] Add crash recovery script for unrecoverable CUDA errors #1923

Open
yf225 wants to merge 1 commit into yf225/stack/90 from yf225/stack/93

Conversation

@yf225
Contributor

@yf225 yf225 commented Apr 2, 2026

yf225 added a commit that referenced this pull request Apr 2, 2026
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 2, 2026
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 08:11
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 08:11
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 09:16
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the title [Autotuner] Add crash recovery bash script for unrecoverable CUDA errors [Autotuner] Add crash recovery script for unrecoverable CUDA errors Apr 2, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 09:16
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 09:18
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 09:18
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 09:20
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 09:20
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 20:00
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 20:00
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 20:18
yf225 added a commit that referenced this pull request Apr 2, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 20:18
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 2, 2026 20:22
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 2, 2026 20:22
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 3, 2026 21:36
yf225 added a commit that referenced this pull request Apr 3, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 3, 2026 21:36
yf225 added a commit that referenced this pull request Apr 3, 2026
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 3, 2026 21:40
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 3, 2026 21:41
yf225 added a commit that referenced this pull request Apr 3, 2026
yf225 added a commit that referenced this pull request Apr 3, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026
…e CUDA errors

stack-info: PR: #1923, branch: yf225/stack/93
yf225 added a commit that referenced this pull request Apr 4, 2026
@yf225 yf225 changed the base branch from yf225/stack/90 to main April 4, 2026 04:24
@yf225 yf225 force-pushed the yf225/stack/93 branch 2 times, most recently from 45157d8 to b65c346 Compare April 4, 2026 04:25
yf225 added a commit that referenced this pull request Apr 4, 2026
@yf225 yf225 changed the title [Autotuner] Add crash recovery script for unrecoverable CUDA errors [Auto-Recovery] Add crash recovery script for unrecoverable CUDA errors Apr 4, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/90 April 4, 2026 04:25
yf225 added a commit that referenced this pull request Apr 4, 2026
yf225 added a commit that referenced this pull request Apr 4, 2026

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

1 participant