perf(github): cache check-run lookups with retry#2669
perf(github): cache check-run lookups with retry#2669theakshaypant wants to merge 2 commits intotektoncd:mainfrom
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2669 +/- ##
==========================================
+ Coverage 58.85% 58.96% +0.10%
==========================================
Files 204 204
Lines 20149 20227 +78
==========================================
+ Hits 11859 11926 +67
- Misses 7525 7534 +9
- Partials 765 767 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request introduces a caching mechanism for GitHub check runs to reduce the number of API calls made by the provider. It adds a checkRunsCache struct and utilizes sync.Once within getExistingCheckRunID to fetch and store all check run pages. Feedback on the implementation highlights a potential issue where sync.Once will cache transient errors from the initial API call, preventing any future successful attempts for that provider instance. It is recommended to use a sync.Mutex or a similar pattern to avoid caching failed operations.
| v.checkRunsCache.once.Do(func() { | ||
| v.checkRunsCache.runs, v.checkRunsCache.err = v.fetchAllCheckRunPages(ctx, runevent) | ||
| }) |
There was a problem hiding this comment.
Using sync.Once to cache the result of a fallible operation like an API call can be problematic. If the first call to fetchAllCheckRunPages fails due to a transient error (e.g., network timeout, context cancellation, or temporary GitHub API issue), the error is cached in v.checkRunsCache.err. All subsequent concurrent or future calls to getExistingCheckRunID using this Provider instance will immediately return the cached error without attempting a retry.
Consider using a sync.Mutex to protect the cache and only storing the result if the error is nil, or implementing a mechanism to allow retries for transient errors.
There was a problem hiding this comment.
The Provider is created fresh per webhook event and is short-lived. If fetchAllCheckRunPages fails, that would likely affect any immediate retry from the next goroutine too. The mutex just adds complexity to retry into the same failure IMO
There was a problem hiding this comment.
I think the main issue is a bit different from provider lifetime across webhooks. Even if the provider is short-lived, we still retry status reporting on the same Provider instance inside the same webhook flow.
So if fetchAllCheckRunPages fails once because of a transient GitHub/API/network issue, sync.Once ends up caching that error and the later retries just return the same cached error without actually calling GitHub again.
Perhaps a mutex-protected cache with a loaded flag that only flips to true after a successful fetch. That keeps the perf win for repeated lookups, but still allows retries after transient GitHub/API failures.
|
Added API calls duplication data in the description showing no duplicate calls to fetch check_runs |
| return searchCheckRuns(v.checkRunsCache.runs, status, v), nil | ||
| } | ||
|
|
||
| func searchCheckRuns(runs []*github.CheckRun, status providerstatus.StatusOpts, v *Provider) *int64 { |
There was a problem hiding this comment.
why not just attach to Provider struct?
func (v *Provider) searchCheckRuns(runs []*github.CheckRun, status providerstatus.StatusOpts) *int64 {|
setting draft, to give you time to address the comments.. |
919500a to
b421ba1
Compare
Cache GitHub check-run API responses to avoid repeated paginated API calls during status updates. Use a mutex+channel pattern so concurrent goroutines share a single in-flight fetch. Failed fetches are not cached, allowing retries with exponential backoff on transient API errors. Signed-off-by: Akshay Pant <akpant@redhat.com> Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Provider struct literals bypass the New() constructor, leaving the checkRunsCache map nil. Use New() consistently across test files to ensure proper initialization. Signed-off-by: Akshay Pant <akpant@redhat.com>
b421ba1 to
f078824
Compare
|
/test go-testing |
📝 Description of the Change
Cache GitHub check-run API responses to avoid repeated
paginated API calls during status updates. Use a mutex+channel
pattern so concurrent goroutines share a single in-flight fetch.
Failed fetches are not cached, allowing retries with exponential
backoff on transient API errors.
🔗 Linked GitHub Issue
Partial #2379
🧪 Testing Strategy
Doing an analysis of the ghe e2e test suite, no duplicate list_check_runs call remains
All duplicate API calls remaining
🤖 AI Assistance
AI assistance can be used for various tasks, such as code generation,
documentation, or testing.
Please indicate whether you have used AI assistance
for this PR and provide details if applicable.
Important
Slop will be simply rejected, if you are using AI assistance you need to make sure you
understand the code generated and that it meets the project's standards. you
need at least know how to run the code and deploy it (if needed). See
startpaac to make it easy
to deploy and test your code changes.
If the majority of the code in this PR was generated by an AI, please add a
Co-authored-bytrailer to your commit message.For example:
Co-authored-by: Claude noreply@anthropic.com
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testandmake lintlocally to check for and fix anyissues. For an efficient workflow, I have considered installing
pre-commit and running
pre-commit installtoautomate these checks.