Checks
Controller Version
0.13.1
Deployment Method
Helm
Checks
To Reproduce
This is not directly reproducible, as it is the byproduct of an incident; see https://www.githubstatus.com/incidents/g9j4tmfqdd09
Describe the bug
During the GitHub Actions degraded-availability incident on 2026-03-05 (https://www.githubstatus.com/incidents/g9j4tmfqdd09), our ARC deployment had runners become stuck in a bad state, and they were not automatically recovered after GitHub's services came back online.
These runners were registered before the incident, lost their registration during GitHub's degradation, and ARC never reconciled them back to a healthy state.
This was resolved by manually deleting all of the stuck ARC runner pods.
Describe the expected behavior
ARC has no garbage-collection loop that reconciles GitHub-side runner registrations against actual Kubernetes pod state. The `EphemeralRunnerReconciler` only handles the forward path (create secret -> create pod -> monitor pod). It does not:
- Periodically verify that a runner's GitHub registration is still valid
- Detect runners whose registration was invalidated by a GitHub-side incident
- Clean up and re-provision runners that are running but no longer recognized by GitHub
I suggest adding a periodic health check to the `EphemeralRunnerReconciler` that verifies the GitHub-side registration is still valid for running `EphemeralRunner`s.
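A minimal sketch of the decision logic such a health check could use, in Go (ARC's language). Everything here is hypothetical and not ARC's actual API: `RegistrationChecker`, `checkRunnerHealth`, and `ErrRegistrationNotFound` are stand-in names; the real implementation would call the GitHub Actions client from inside the reconcile loop and requeue with a fixed interval.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRegistrationNotFound models the "Registration <uuid> was not found"
// response observed during the incident. (Hypothetical name.)
var ErrRegistrationNotFound = errors.New("registration not found")

// RegistrationChecker is a hypothetical interface over whatever GitHub
// API call would verify that a runner's registration still exists.
type RegistrationChecker interface {
	GetRunnerRegistration(runnerID string) error
}

// checkRunnerHealth decides what the reconciler should do with a running
// EphemeralRunner: delete its pod (so ARC re-provisions it), keep it, or
// surface a transient error to be retried on the next requeue.
func checkRunnerHealth(c RegistrationChecker, runnerID string) (deletePod bool, err error) {
	err = c.GetRunnerRegistration(runnerID)
	switch {
	case err == nil:
		return false, nil // registration still valid: keep the pod
	case errors.Is(err, ErrRegistrationNotFound):
		return true, nil // stale runner: delete pod, let ARC recreate it
	default:
		return false, err // transient API error: retry later, don't delete
	}
}

// fakeAPI is an in-memory stub standing in for the GitHub client.
type fakeAPI struct{ registered map[string]bool }

func (f fakeAPI) GetRunnerRegistration(id string) error {
	if f.registered[id] {
		return nil
	}
	return ErrRegistrationNotFound
}

func main() {
	api := fakeAPI{registered: map[string]bool{"runner-a": true}}
	for _, id := range []string{"runner-a", "runner-b"} {
		del, _ := checkRunnerHealth(api, id)
		fmt.Printf("%s delete=%v\n", id, del)
	}
}
```

The key design point is the three-way split: only a definitive "registration not found" triggers pod deletion, while transient API errors (e.g. during an outage like this one) leave the pod alone, so the check cannot make an incident worse by mass-deleting healthy runners.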
Additional Context
1. GitHub's `api.github.com/actions/runner-registration` endpoint became unavailable.
2. The ARC `EphemeralRunnerReconciler` failed to generate JIT configs for new runners, retrying 5 times with backoff before giving up.
3. Already-running runners lost their registrations on the GitHub side (`Registration <uuid> was not found`).
4. Runner pods entered `BrokerServer` backoff loops, unable to communicate with `broker.actions.githubusercontent.com`.
5. **After GitHub recovered at ~23:55 UTC, errors and backoff warnings continued until at least 00:17 UTC** — over 20 minutes of lingering failures. Runners that were mid-registration during the incident remained in a bad state with no automatic recovery.
Controller Logs
Related logs:
https://gist.github.com/rob-howie-depop/27b15fd387ffc5f8f36e838614ffefc0
Runner Pod Logs
Related logs:
https://gist.github.com/rob-howie-depop/27b15fd387ffc5f8f36e838614ffefc0