We do not have reliable reproduction steps; the problem is intermittent and appears to happen at random.
We've been noticing runners randomly getting stuck, which prevents workflows from running. I'm unsure of the root cause, but the longest-running runners seem to have this message in their logs:
[RUNNER 2025-11-07 14:43:16Z INFO Runner] Skipping message Job. Job message not found 'e420db7b-c780-5ae4-8145-4eea09b87aea'. job was canceled
As best I can tell, the runner hangs at this point and never gets cleaned up. I assume the job was interrupted or manually canceled by the controller or a user. We can get runners rolling again by deleting any ephemeralrunners that have this message in their logs.
We are also seeing a second problem at the same time, and it is unclear to us whether the first one causes it. Some pods log the following errors, though these may just be a result of the deletion described above:
[RUNNER 2025-11-07 14:43:07Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-11-07 14:43:22Z WARN GitHubActionsService] Attempt 1 of POST request to https://run-actions-1-azure-eastus.actions.githubusercontent.com/119/acquirejob failed (HTTP Status: GatewayTimeout). The operation will be retried in 11.082 seconds.
[RUNNER 2025-11-07 14:43:38Z ERR GitHubActionsService] POST request to https://run-actions-1-azure-eastus.actions.githubusercontent.com/119/acquirejob failed. HTTP Status: Conflict
[RUNNER 2025-11-07 14:43:38Z INFO Runner] Skipping message Job. Job message already acquired '269c5bb1-9d88-5672-9f70-5522ca917969'. job assignment is invalid: MissingKey
[RUNNER 2025-11-07 15:33:40Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-11-07 15:33:40Z ERR GitHubActionsService] POST request to https://pipelinesghubeus25.actions.githubusercontent.com/<token>/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2025-11-07 15:33:40Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] Catch exception during request
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration 470d6c87-f8c4-411f-87d0-6b069233ccc7 was not found.
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Services.Common.RawHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpRequestMessage message, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Uri requestUri, HttpContent content, IEnumerable`1 queryParameters, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion, Nullable`1 status, String os, String architecture, Nullable`1 disableUpdate, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] --- End of stack trace from previous location ---
[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxAttempts, Func`2 shouldRetry)
[RUNNER 2025-11-07 15:33:40Z WARN BrokerServer] Back off 6.292 seconds before next retry. 4 attempt left.
[RUNNER 2025-11-07 15:33:46Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-11-07 15:33:46Z ERR GitHubActionsService] POST request to https://pipelinesghubeus25.actions.githubusercontent.com/<token>/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2025-11-07 15:33:46Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] Catch exception during request
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration 470d6c87-f8c4-411f-87d0-6b069233ccc7 was not found.
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Services.Common.RawHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpRequestMessage message, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Uri requestUri, HttpContent content, IEnumerable`1 queryParameters, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion, Nullable`1 status, String os, String architecture, Nullable`1 disableUpdate, CancellationToken cancellationToken)
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] --- End of stack trace from previous location ---
[RUNNER 2025-11-07 15:33:46Z ERR BrokerServer] at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxAttempts, Func`2 shouldRetry)
[RUNNER 2025-11-07 15:33:46Z WARN BrokerServer] Back off 11.706 seconds before next retry. 3 attempt left.
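To help tell the two symptoms apart, it can be useful to count log lines per level and component. The following is a minimal sketch, assuming every line follows the `[RUNNER <date> <time> <LEVEL> <Component>]` prefix seen in the excerpts above; `summarize_runner_log` is a hypothetical helper name, not part of the runner or the controller.

```shell
#!/bin/sh
# Hypothetical helper: reads runner log text on stdin and prints
# "<count> <LEVEL> <Component>" for each level/component pair.
# Assumes the "[RUNNER <date> <time> <LEVEL> <Component>]" prefix shown above.
summarize_runner_log() {
  awk '$1 == "[RUNNER" {
         sub(/\]$/, "", $5)        # strip the closing bracket from the component
         count[$4 " " $5]++
       }
       END { for (k in count) print count[k], k }' | sort -k2
}

# Example: summarize a single saved log line.
printf '%s\n' "[RUNNER 2025-11-07 15:33:40Z ERR BrokerServer] Catch exception during request" \
  | summarize_runner_log
```

In practice the input would come from something like `kubectl -n "$NS" logs <pod>` piped into the function.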
Right now we work around this with a cron job that runs the following cleanup:
# List every EphemeralRunner in the namespace ($NS must be set by the caller).
resources="$(kubectl get ephemeralrunner --namespace "$NS" -l app.kubernetes.io/component=runner -o yaml | yq -r '.items[] | .metadata.name')"
echo "$resources" | while IFS= read -r resource; do
  if [ -z "$resource" ]; then continue; fi
  echo "Checking resource: $resource"
  # Fetch the pod logs; this assumes the runner pod has the same name as the EphemeralRunner.
  logs=$(kubectl -n "$NS" logs "$resource" 2>/dev/null)
  # Delete any runner whose logs contain one of the stuck-state signatures.
  if echo "$logs" | grep -qE "MissingKey|job was canceled|token request: Unknown"; then
    echo "Deleting resource: $resource"
    kubectl -n "$NS" delete ephemeralrunner "$resource"
  fi
done
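The matching step of the workaround can be tried on saved logs before anything is deleted. This is a minimal sketch, assuming the same three signatures as the cron job above; `is_stuck_runner_log` is a hypothetical helper name, and the sample line is copied from the log excerpt earlier in this report.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the log text on stdin contains one of the
# signatures that the cron job treats as a stuck runner, non-zero otherwise.
is_stuck_runner_log() {
  grep -qE 'MissingKey|job was canceled|token request: Unknown'
}

# Dry run against a saved log line; no kubectl call is made here.
sample="[RUNNER 2025-11-07 14:43:16Z INFO Runner] Skipping message Job. Job message not found 'e420db7b-c780-5ae4-8145-4eea09b87aea'. job was canceled"
if printf '%s\n' "$sample" | is_stuck_runner_log; then
  echo "would delete"
else
  echo "would keep"
fi
```

Keeping the check in its own function makes it easier to see which signatures actually trigger a deletion, which may help when deciding whether all three patterns are reliable signals.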
Runners should handle this and clean themselves up.
Checks
Controller Version
0.13.0
Deployment Method
Helm
Additional Context
No values that I believe are relevant.
Controller Logs
Runner Pod Logs