Skip to content

Add tree caching to GitHub provider to reduce duplicate API calls #2377

@chmouel

Description

@chmouel

Add tree caching to GitHub provider to reduce duplicate API calls

Summary

The GetTektonDir() function fetches repository tree structures from the GitHub API without caching. When called multiple times for the same revision (e.g., checking both source and default_branch provenance), each call results in duplicate get_root_tree and get_tekton_tree API calls.

Problem

Location: pkg/provider/github/github.go:327-376

func (v *Provider) GetTektonDir(ctx context.Context, runevent *info.Event, path, provenance string) (string, error) {
    // ...
    revision := runevent.SHA
    if provenance == "default_branch" {
        revision = runevent.DefaultBranch
    }

    // API call 1: Get root tree
    rootobjects, _, err := wrapAPI(v, "get_root_tree", func() (*github.Tree, *github.Response, error) {
        return v.Client().Git.GetTree(ctx, runevent.Organization, runevent.Repository, revision, false)
    })
    // ...

    // API call 2: Get tekton directory tree
    tektonDirObjects, _, err := wrapAPI(v, "get_tekton_tree", func() (*github.Tree, *github.Response, error) {
        return v.Client().Git.GetTree(ctx, runevent.Organization, runevent.Repository, tektonDirSha, true)
    })
    // ...
}

Issue: No caching. Each GetTektonDir() call makes 2 API requests, even if called with the same revision.

Evidence from E2E Instrumentation

In a single "Github PullRequest" test run with 21 API calls:

2x /repos/.../git/trees/ad4a001317e33f7716be880426e0159e77ad305e  # get_root_tree
2x /repos/.../git/trees/079173c49d895c8b7df093106b2c5bb5b2d5d559  # get_tekton_tree

The same tree SHAs are fetched twice within a single test.

Total impact:

  • get_root_tree: 4 duplicate calls out of 50 (8% waste)
  • get_tekton_tree: 8 duplicate calls out of 54 (14.8% waste)
  • Combined: 12 wasted calls

Root Cause

GetTektonDir() is called multiple times during event processing:

  1. Source provenance check: Fetches .tekton/ from the PR's head SHA
  2. Default branch provenance check: Fetches .tekton/ from the default branch

If both provenances point to the same revision (common in push events or when .tekton/ hasn't changed), the same trees are fetched twice.

Additionally, different code paths may call GetTektonDir() independently without sharing results.

Proposed Solution

Add tree caching to the Provider struct:

type Provider struct {
    // ... existing fields
    treeCache      map[string]*github.Tree
    treeCacheMutex sync.RWMutex
}

func New() *Provider {
    return &Provider{
        // ... existing initialization
        treeCache: make(map[string]*github.Tree),
    }
}

// Helper to get cached tree or fetch from API
func (v *Provider) getTree(ctx context.Context, org, repo, sha string, recursive bool) (*github.Tree, error) {
    cacheKey := fmt.Sprintf("%s/%s/%s/%t", org, repo, sha, recursive)

    v.treeCacheMutex.RLock()
    if cached, ok := v.treeCache[cacheKey]; ok {
        v.treeCacheMutex.RUnlock()
        return cached, nil
    }
    v.treeCacheMutex.RUnlock()

    opName := "get_tree"
    if !recursive {
        opName = "get_root_tree"
    } else {
        opName = "get_tekton_tree"
    }

    tree, _, err := wrapAPI(v, opName, func() (*github.Tree, *github.Response, error) {
        return v.Client().Git.GetTree(ctx, org, repo, sha, recursive)
    })
    if err != nil {
        return nil, err
    }

    v.treeCacheMutex.Lock()
    v.treeCache[cacheKey] = tree
    v.treeCacheMutex.Unlock()

    return tree, nil
}

Then update GetTektonDir() to use the helper:

func (v *Provider) GetTektonDir(ctx context.Context, runevent *info.Event, path, provenance string) (string, error) {
    // ...

    rootobjects, err := v.getTree(ctx, runevent.Organization, runevent.Repository, revision, false)
    if err != nil {
        return "", err
    }

    // ... find tektonDirSha ...

    tektonDirObjects, err := v.getTree(ctx, runevent.Organization, runevent.Repository, tektonDirSha, true)
    if err != nil {
        return "", err
    }

    // ...
}

Why This is Safe

  1. Tree SHAs are immutable: A git tree SHA uniquely identifies the exact directory structure and contents.

  2. Cache lifetime is bounded: The Provider instance lives only for the duration of webhook event processing.

  3. Thread-safe: Using sync.RWMutex for concurrent access.

  4. Cache key includes all parameters: org/repo/sha/recursive ensures different queries don't collide.

Files to Modify

  • pkg/provider/github/github.go
    • Add treeCache and treeCacheMutex fields to Provider struct
    • Initialize treeCache in New()
    • Add getTree() helper method
    • Update GetTektonDir() to use getTree()

Testing

  1. Run existing unit tests
  2. Run e2e tests with PAC_API_INSTRUMENTATION_DIR set
  3. Verify no duplicate get_root_tree or get_tekton_tree calls for the same SHA

Expected Impact

  • Calls saved: ~12 per full e2e test suite
  • Per-event savings: 2-4 calls for events that check both source and default_branch
  • Complexity: Low - isolated changes to the provider

Acceptance Criteria

  • Tree responses are cached by SHA
  • GetTektonDir() uses cached trees when available
  • Cache is thread-safe
  • No duplicate tree API calls for the same SHA in instrumentation output
  • All existing tests pass

Alternative Approaches Considered

Option A: Cache at GetTektonDir level (simpler but less reusable)

Cache the entire GetTektonDir() result by (revision, path) tuple. Simpler but doesn't help if tree data is needed elsewhere.

Option B: LRU cache with size limit (more complex)

Use an LRU cache to limit memory usage. Probably overkill since Provider instances are short-lived.

Recommendation: Start with the simple per-SHA cache (proposed solution). Optimize further only if needed.

/label ~enhancement ~performance ~github-provider

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions