Add tree caching to GitHub provider to reduce duplicate API calls
Summary
The GetTektonDir() function fetches repository tree structures from the GitHub API without caching. When called multiple times for the same revision (e.g., checking both source and default_branch provenance), each call results in duplicate get_root_tree and get_tekton_tree API calls.
Problem
Location: pkg/provider/github/github.go:327-376
func (v *Provider) GetTektonDir(ctx context.Context, runevent *info.Event, path, provenance string) (string, error) {
// ...
revision := runevent.SHA
if provenance == "default_branch" {
revision = runevent.DefaultBranch
}
// API call 1: Get root tree
rootobjects, _, err := wrapAPI(v, "get_root_tree", func() (*github.Tree, *github.Response, error) {
return v.Client().Git.GetTree(ctx, runevent.Organization, runevent.Repository, revision, false)
})
// ...
// API call 2: Get tekton directory tree
tektonDirObjects, _, err := wrapAPI(v, "get_tekton_tree", func() (*github.Tree, *github.Response, error) {
return v.Client().Git.GetTree(ctx, runevent.Organization, runevent.Repository, tektonDirSha, true)
})
// ...
}
Issue: No caching. Each GetTektonDir() call makes 2 API requests, even if called with the same revision.
Evidence from E2E Instrumentation
In a single "Github PullRequest" test run with 21 API calls:
2x /repos/.../git/trees/ad4a001317e33f7716be880426e0159e77ad305e # get_root_tree
2x /repos/.../git/trees/079173c49d895c8b7df093106b2c5bb5b2d5d559 # get_tekton_tree
The same tree SHAs are fetched twice within a single test.
Total impact:
get_root_tree: 4 duplicate calls out of 50 (8% waste)
get_tekton_tree: 8 duplicate calls out of 54 (14.8% waste)
- Combined: 12 wasted calls
Root Cause
GetTektonDir() is called multiple times during event processing:
- Source provenance check: Fetches
.tekton/ from the PR's head SHA
- Default branch provenance check: Fetches
.tekton/ from the default branch
If both provenances point to the same revision (common in push events or when .tekton/ hasn't changed), the same trees are fetched twice.
Additionally, different code paths may call GetTektonDir() independently without sharing results.
Proposed Solution
Add tree caching to the Provider struct:
type Provider struct {
// ... existing fields
treeCache map[string]*github.Tree
treeCacheMutex sync.RWMutex
}
func New() *Provider {
return &Provider{
// ... existing initialization
treeCache: make(map[string]*github.Tree),
}
}
// Helper to get cached tree or fetch from API
func (v *Provider) getTree(ctx context.Context, org, repo, sha string, recursive bool) (*github.Tree, error) {
cacheKey := fmt.Sprintf("%s/%s/%s/%t", org, repo, sha, recursive)
v.treeCacheMutex.RLock()
if cached, ok := v.treeCache[cacheKey]; ok {
v.treeCacheMutex.RUnlock()
return cached, nil
}
v.treeCacheMutex.RUnlock()
opName := "get_tree"
if !recursive {
opName = "get_root_tree"
} else {
opName = "get_tekton_tree"
}
tree, _, err := wrapAPI(v, opName, func() (*github.Tree, *github.Response, error) {
return v.Client().Git.GetTree(ctx, org, repo, sha, recursive)
})
if err != nil {
return nil, err
}
v.treeCacheMutex.Lock()
v.treeCache[cacheKey] = tree
v.treeCacheMutex.Unlock()
return tree, nil
}
Then update GetTektonDir() to use the helper:
func (v *Provider) GetTektonDir(ctx context.Context, runevent *info.Event, path, provenance string) (string, error) {
// ...
rootobjects, err := v.getTree(ctx, runevent.Organization, runevent.Repository, revision, false)
if err != nil {
return "", err
}
// ... find tektonDirSha ...
tektonDirObjects, err := v.getTree(ctx, runevent.Organization, runevent.Repository, tektonDirSha, true)
if err != nil {
return "", err
}
// ...
}
Why This is Safe
-
Tree SHAs are immutable: A git tree SHA uniquely identifies the exact directory structure and contents.
-
Cache lifetime is bounded: The Provider instance lives only for the duration of webhook event processing.
-
Thread-safe: Using sync.RWMutex for concurrent access.
-
Cache key includes all parameters: org/repo/sha/recursive ensures different queries don't collide.
Files to Modify
pkg/provider/github/github.go
- Add
treeCache and treeCacheMutex fields to Provider struct
- Initialize
treeCache in New()
- Add
getTree() helper method
- Update
GetTektonDir() to use getTree()
Testing
- Run existing unit tests
- Run e2e tests with
PAC_API_INSTRUMENTATION_DIR set
- Verify no duplicate
get_root_tree or get_tekton_tree calls for the same SHA
Expected Impact
- Calls saved: ~12 per full e2e test suite
- Per-event savings: 2-4 calls for events that check both source and default_branch
- Complexity: Low - isolated changes to the provider
Acceptance Criteria
Alternative Approaches Considered
Option A: Cache at GetTektonDir level (simpler but less reusable)
Cache the entire GetTektonDir() result by (revision, path) tuple. Simpler but doesn't help if tree data is needed elsewhere.
Option B: LRU cache with size limit (more complex)
Use an LRU cache to limit memory usage. Probably overkill since Provider instances are short-lived.
Recommendation: Start with the simple per-SHA cache (proposed solution). Optimize further only if needed.
/label ~enhancement ~performance ~github-provider
Add tree caching to GitHub provider to reduce duplicate API calls
Summary
The
GetTektonDir()function fetches repository tree structures from the GitHub API without caching. When called multiple times for the same revision (e.g., checking both source and default_branch provenance), each call results in duplicateget_root_treeandget_tekton_treeAPI calls.Problem
Location:
pkg/provider/github/github.go:327-376Issue: No caching. Each
GetTektonDir()call makes 2 API requests, even if called with the same revision.Evidence from E2E Instrumentation
In a single "Github PullRequest" test run with 21 API calls:
The same tree SHAs are fetched twice within a single test.
Total impact:
get_root_tree: 4 duplicate calls out of 50 (8% waste)get_tekton_tree: 8 duplicate calls out of 54 (14.8% waste)Root Cause
GetTektonDir()is called multiple times during event processing:.tekton/from the PR's head SHA.tekton/from the default branchIf both provenances point to the same revision (common in push events or when
.tekton/hasn't changed), the same trees are fetched twice.Additionally, different code paths may call
GetTektonDir()independently without sharing results.Proposed Solution
Add tree caching to the
Providerstruct:Then update
GetTektonDir()to use the helper:Why This is Safe
Tree SHAs are immutable: A git tree SHA uniquely identifies the exact directory structure and contents.
Cache lifetime is bounded: The Provider instance lives only for the duration of webhook event processing.
Thread-safe: Using
sync.RWMutexfor concurrent access.Cache key includes all parameters:
org/repo/sha/recursiveensures different queries don't collide.Files to Modify
pkg/provider/github/github.gotreeCacheandtreeCacheMutexfields toProviderstructtreeCacheinNew()getTree()helper methodGetTektonDir()to usegetTree()Testing
PAC_API_INSTRUMENTATION_DIRsetget_root_treeorget_tekton_treecalls for the same SHAExpected Impact
Acceptance Criteria
GetTektonDir()uses cached trees when availableAlternative Approaches Considered
Option A: Cache at GetTektonDir level (simpler but less reusable)
Cache the entire
GetTektonDir()result by(revision, path)tuple. Simpler but doesn't help if tree data is needed elsewhere.Option B: LRU cache with size limit (more complex)
Use an LRU cache to limit memory usage. Probably overkill since Provider instances are short-lived.
Recommendation: Start with the simple per-SHA cache (proposed solution). Optimize further only if needed.
/label ~enhancement ~performance ~github-provider