oss: add client/logging/retry/credential-refresh related logic #65630

Merged
ti-chi-bot[bot] merged 9 commits into pingcap:master from D3Hunter:oss-client
Jan 20, 2026

@D3Hunter D3Hunter commented Jan 19, 2026

What problem does this PR solve?

Issue Number: ref #65461

Problem Summary:

What changed and how does it work?

As the title says. This is a preparatory PR for the OSS store; the store itself will be added in a later PR.

Summary

  • Adds a new OSS-backed object store client with utilities in pkg/objstore/ossstore/client.go (permission checks, get/put/delete/list/copy, multipart upload).
  • Implements a concurrent-safe credential refresher in pkg/objstore/ossstore/credential.go.
  • Adds OSS SDK log adapter and level mapping in pkg/objstore/ossstore/logger.go.
  • Adds retryer configuration plus ECS metadata error detection in pkg/objstore/ossstore/retry.go.
  • Introduces OSS/KS3 provider constants in pkg/objstore/s3like/interface.go.
  • Updates storage comment to include OSS in pkg/objstore/storeapi/storage.go.
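
The "concurrent-safe credential refresher" mentioned in the summary can be pictured with a minimal sketch. This is not the code from pkg/objstore/ossstore/credential.go; the `credentials` struct, the `refresher` type, and `counterFetch` are hypothetical names used only for illustration. The sketch assumes a design where readers take a read lock and a refresh swaps the cached value under a write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// credentials is a hypothetical stand-in for an SDK credential value.
type credentials struct {
	AccessKeyID string
	Version     int
}

// refresher caches the most recent credentials behind a RWMutex.
type refresher struct {
	mu    sync.RWMutex
	cur   credentials
	fetch func() (credentials, error)
}

// newRefresher performs one initial fetch so that get never returns a zero value.
func newRefresher(fetch func() (credentials, error)) (*refresher, error) {
	r := &refresher{fetch: fetch}
	if err := r.refreshOnce(); err != nil {
		return nil, err
	}
	return r, nil
}

// refreshOnce fetches outside the lock and swaps the value under the write lock.
// On failure the previous credentials keep being served.
func (r *refresher) refreshOnce() error {
	c, err := r.fetch()
	if err != nil {
		return err
	}
	r.mu.Lock()
	r.cur = c
	r.mu.Unlock()
	return nil
}

// get returns a copy of the cached credentials and is safe to call concurrently.
func (r *refresher) get() credentials {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.cur
}

// counterFetch returns a fetch function whose results carry an increasing version.
func counterFetch() func() (credentials, error) {
	version := 0
	return func() (credentials, error) {
		version++
		return credentials{AccessKeyID: "example-ak", Version: version}, nil
	}
}

func main() {
	r, err := newRefresher(counterFetch())
	if err != nil {
		fmt.Println("initial fetch failed:", err)
		return
	}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = r.get() // concurrent readers
			}
		}()
	}
	for i := 0; i < 10; i++ {
		_ = r.refreshOnce() // single writer
	}
	wg.Wait()
	fmt.Println("final version:", r.get().Version) // prints "final version: 11"
}
```

Fetching before taking the write lock keeps the critical section short, so a slow credential endpoint does not block readers.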

Build/Test Wiring

  • Bazel targets and deps in pkg/objstore/ossstore/BUILD.bazel and pkg/objstore/ossstore/mock/BUILD.bazel.
  • Mockgen entries in Makefile.
  • New tests: pkg/objstore/ossstore/client_test.go, pkg/objstore/ossstore/credential_test.go.
  • New mocks: pkg/objstore/ossstore/mock/api_mock.go, pkg/objstore/ossstore/mock/provider_mock.go.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copilot AI review requested due to automatic review settings January 19, 2026 04:53
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 19, 2026

tiprow bot commented Jan 19, 2026

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copilot AI left a comment

Pull request overview

This PR adds foundational infrastructure for Alibaba Cloud OSS (Object Storage Service) support in TiDB's object storage layer. It introduces client implementation, logging integration, retry logic, and credential refresh mechanisms, preparing the codebase for full OSS store functionality in subsequent PRs.

Changes:

  • Added OSS client implementation with support for basic object operations (get, put, delete, list, copy)
  • Implemented credential refresher for handling temporary credentials with automatic refresh
  • Integrated OSS SDK logging with TiDB's logging framework
  • Added retry logic and mock interfaces for testing

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/objstore/ossstore/client.go | Implements OSS client with object operations and multipart upload support |
| pkg/objstore/ossstore/client_test.go | Unit tests for client operations |
| pkg/objstore/ossstore/credential.go | Credential refresher for safe concurrent credential access |
| pkg/objstore/ossstore/credential_test.go | Tests for credential refresh logic |
| pkg/objstore/ossstore/interface.go | API interface definition for OSS operations |
| pkg/objstore/ossstore/logger.go | Logger integration between OSS SDK and TiDB logging |
| pkg/objstore/ossstore/retry.go | Retry configuration for OSS operations |
| pkg/objstore/ossstore/mock/*.go | Generated mock files for testing |
| pkg/objstore/ossstore/BUILD.bazel | Build configuration for ossstore package |
| pkg/objstore/storeapi/storage.go | Minor comment improvements mentioning OSS alongside S3 and GCS |
| pkg/objstore/s3like/interface.go | Added OSS and KS3 provider constants |
| Makefile | Added mock generation commands for OSS interfaces |

require.NoError(t, err)
require.GreaterOrEqual(t, getAKTimeFn(cred), start)
require.NoError(t, refersher.startRefresh())
time.Sleep(time.Minute + 5*time.Second)

Copilot AI Jan 19, 2026

The test uses time.Sleep(time.Minute + 5*time.Second) which would cause tests to run for over a minute. While this is wrapped in synctest.Test which should handle simulated time, consider using a smaller time increment or ticker-based approach with controlled time advancement for faster test execution.

Suggested change
- time.Sleep(time.Minute + 5*time.Second)
+ step := 100 * time.Millisecond
+ steps := int((time.Minute + 5*time.Second) / step)
+ for i := 0; i <= steps; i++ {
+ 	time.Sleep(step)
+ }

Copilot uses AI. Check for mistakes.

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 52.84281% with 282 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.7006%. Comparing base (13dd959) to head (9a5b4bc).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65630        +/-   ##
================================================
- Coverage   77.7563%   77.7006%   -0.0558%     
================================================
  Files          1984       1913        -71     
  Lines        543041     533257      -9784     
================================================
- Hits         422249     414344      -7905     
+ Misses       119133     118897       -236     
+ Partials       1659         16      -1643     
| Flag | Coverage Δ |
| --- | --- |
| integration | 41.5366% <0.0000%> (-6.6528%) ⬇️ |
| unit | 76.7838% <52.8428%> (+0.3006%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 56.7974% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.7387% <ø> (-12.3943%) ⬇️ |

Copilot AI review requested due to automatic review settings January 19, 2026 05:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings January 19, 2026 05:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

Comment on lines +57 to +59
time.Sleep(time.Minute + 5*time.Second)
cred2, err := refresher.GetCredentials(ctx)
require.NoError(t, err)

Copilot AI Jan 19, 2026

The test uses time.Sleep with a hardcoded value of 1 minute and 5 seconds to wait for credential refresh. This makes the test unnecessarily slow and fragile. Consider using a mock time source or a much shorter refresh interval for testing purposes to make the test faster and more reliable.

Suggested change
- time.Sleep(time.Minute + 5*time.Second)
- cred2, err := refresher.GetCredentials(ctx)
- require.NoError(t, err)
+ deadline := time.Now().Add(5 * time.Second)
+ var cred2 credentials.Credentials
+ for {
+ 	if time.Now().After(deadline) {
+ 		t.Fatalf("credential was not refreshed within the expected timeout")
+ 	}
+ 	c, err := refresher.GetCredentials(ctx)
+ 	require.NoError(t, err)
+ 	cred2 = c
+ 	if getAKTimeFn(cred2) >= start+time.Minute.Nanoseconds() {
+ 		break
+ 	}
+ 	time.Sleep(10 * time.Millisecond)
+ }

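The deadline-and-poll idea in the suggestion above can be factored into a small helper. The following is a minimal sketch, not code from this PR: `waitFor` and `refreshedWithin` are hypothetical names, and the background goroutine merely simulates a refresh finishing after a delay. Note that an earlier review comment points out that the test runs inside synctest.Test, where simulated time may already make the long sleep cheap; the polling helper is mainly relevant for tests that run on the real clock.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor polls cond every poll interval until it returns true or the
// timeout elapses. It reports whether cond became true in time.
func waitFor(timeout, poll time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(poll)
	}
}

// refreshedWithin simulates a refresh that completes after delayMs and
// reports whether waitFor observed it within timeoutMs.
func refreshedWithin(delayMs, timeoutMs int) bool {
	var refreshed atomic.Bool
	go func() {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		refreshed.Store(true)
	}()
	return waitFor(time.Duration(timeoutMs)*time.Millisecond, time.Millisecond, refreshed.Load)
}

func main() {
	fmt.Println("fast refresh observed:", refreshedWithin(10, 2000))
	fmt.Println("slow refresh observed:", refreshedWithin(500, 20))
}
```

The helper returns as soon as the condition holds, so the common case costs only as long as the refresh itself rather than a fixed worst-case sleep.
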
Comment on lines +48 to +58
func (*retryer) GetRetryToken(context.Context, error) (releaseToken func(error) error, err error) {
	return func(err error) error {
		return nil
	}, nil
}

func (*retryer) GetInitialToken() (releaseToken func(error) error) {
	return func(err error) error {
		return nil
	}
}

Copilot AI Jan 19, 2026

The GetRetryToken and GetInitialToken methods return no-op functions that always return nil. While this may be intentional to satisfy an interface, there's no documentation explaining why these are no-ops or what implications this has for retry behavior. Consider adding a comment explaining that these are intentionally no-ops and why.
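
One way to act on this comment is to share a single documented no-op release function. The sketch below is an illustration only: `tokenSource` is a hypothetical local interface that mirrors the two method signatures quoted above, and the doc comments state an assumed rationale (no client-side retry quota), which the PR itself does not confirm.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// tokenSource is a hypothetical local interface mirroring the two methods
// quoted in the review comment.
type tokenSource interface {
	GetRetryToken(ctx context.Context, opErr error) (releaseToken func(error) error, err error)
	GetInitialToken() (releaseToken func(error) error)
}

// noopRelease is shared by both token methods. Assumed rationale: this
// retryer applies no retry quota, so there is nothing to give back.
func noopRelease(error) error { return nil }

type retryer struct{}

// GetRetryToken intentionally never refuses a retry attempt.
func (*retryer) GetRetryToken(context.Context, error) (func(error) error, error) {
	return noopRelease, nil
}

// GetInitialToken intentionally hands out a token that costs nothing.
func (*retryer) GetInitialToken() func(error) error { return noopRelease }

// Compile-time check that retryer satisfies the interface.
var _ tokenSource = (*retryer)(nil)

// tokensAlwaysGranted exercises both methods and reports whether every
// token was granted and released without error.
func tokensAlwaysGranted(r tokenSource) bool {
	release := r.GetInitialToken()
	if release(nil) != nil {
		return false
	}
	release, err := r.GetRetryToken(context.Background(), errors.New("transient failure"))
	if err != nil {
		return false
	}
	return release(nil) == nil
}

func main() {
	fmt.Println(tokensAlwaysGranted(&retryer{})) // prints "true"
}
```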

return &multipartWriter{
	svc:           c.svc,
	createOutput:  resp,
	completeParts: make([]oss.UploadPart, 0, 128),

Copilot AI Jan 19, 2026

Magic number 128 used for initial capacity of completeParts slice without explanation. Consider defining this as a named constant with documentation explaining the rationale for this initial capacity (e.g., estimated typical number of parts in a multipart upload).
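
A minimal sketch of the named-constant variant follows. The constant name `initialPartCapacity`, the local `uploadPart` type, and the stated rationale are assumptions made for illustration; only the value 128 comes from the snippet above.

```go
package main

import "fmt"

// initialPartCapacity is the starting capacity of the completed-parts slice.
// It is only a preallocation hint (assumed here to cover typical uploads);
// append grows the slice when an upload has more parts.
const initialPartCapacity = 128

// uploadPart is a hypothetical stand-in for the SDK part descriptor.
type uploadPart struct {
	ETag       string
	PartNumber int32
}

// newPartList returns an empty slice with the documented initial capacity.
func newPartList() []uploadPart {
	return make([]uploadPart, 0, initialPartCapacity)
}

func main() {
	parts := newPartList()
	fmt.Println(len(parts), cap(parts)) // prints "0 128"
	for i := 0; i < 200; i++ {
		parts = append(parts, uploadPart{PartNumber: int32(i + 1)})
	}
	fmt.Println(len(parts)) // prints "200"
}
```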

Comment on lines +71 to +72
logFn(msg)
}

Copilot AI Jan 19, 2026

When the OSS SDK log level string doesn't match any of the expected cases (ERROR, WARNING, INFO, DEBUG), the function silently does nothing. This could hide logging issues. Consider logging an unexpected level string warning when logFn is nil after the switch statement to help debug cases where the OSS SDK changes its log level format.

Suggested change
- logFn(msg)
- }
+ logFn(msg)
+ return
+ }
+ // Log unexpected log level strings from OSS SDK to aid debugging.
+ l.logger.Warn("unexpected OSS log level", zap.String("level", levelStr), zap.String("msg", msg))

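The fallback behaviour proposed above can be shown in isolation. This is a minimal sketch and not the adapter from pkg/objstore/ossstore/logger.go: `sink` and `forward` are hypothetical names, and the sink just collects lines instead of calling a real logger. The four level strings are the ones named in the review comment.

```go
package main

import "fmt"

// sink is a hypothetical stand-in for the destination logger; it records
// each line with the level it was logged at.
type sink struct{ lines []string }

func (s *sink) logf(level, msg string) {
	s.lines = append(s.lines, level+": "+msg)
}

// forward maps an SDK level string to a sink level. Unknown level strings
// are not dropped: they are logged as warnings that include the raw level.
func forward(s *sink, levelStr, msg string) {
	switch levelStr {
	case "ERROR":
		s.logf("error", msg)
	case "WARNING":
		s.logf("warn", msg)
	case "INFO":
		s.logf("info", msg)
	case "DEBUG":
		s.logf("debug", msg)
	default:
		s.logf("warn", fmt.Sprintf("unexpected OSS log level %q: %s", levelStr, msg))
	}
}

func main() {
	s := &sink{}
	forward(s, "INFO", "request sent")
	forward(s, "TRACE", "wire dump")
	for _, line := range s.lines {
		fmt.Println(line)
	}
}
```

Putting the fallback in the `default` branch keeps every message visible even if the SDK introduces a level string the adapter does not know.
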
Comment on lines +272 to +315
// multipartWriter does multi-part upload.
type multipartWriter struct {
	svc           API
	createOutput  *oss.InitiateMultipartUploadResult
	completeParts []oss.UploadPart
}

// UploadPart updates partial data.
// the size of each part except the last part must >= oss.MinPartSize, and <=
// oss.MaxPartSize
func (u *multipartWriter) Write(ctx context.Context, data []byte) (int, error) {
	req := &oss.UploadPartRequest{
		Body:          bytes.NewReader(data),
		Bucket:        u.createOutput.Bucket,
		Key:           u.createOutput.Key,
		PartNumber:    int32(len(u.completeParts) + 1),
		UploadId:      u.createOutput.UploadId,
		ContentLength: oss.Ptr(int64(len(data))),
	}

	uploadResult, err := u.svc.UploadPart(ctx, req)
	if err != nil {
		return 0, errors.Trace(err)
	}
	u.completeParts = append(u.completeParts, oss.UploadPart{
		ETag:       uploadResult.ETag,
		PartNumber: req.PartNumber,
	})
	return len(data), nil
}

// Close completes the multi-part upload request.
func (u *multipartWriter) Close(ctx context.Context) error {
	req := &oss.CompleteMultipartUploadRequest{
		Bucket:   u.createOutput.Bucket,
		Key:      u.createOutput.Key,
		UploadId: u.createOutput.UploadId,
		CompleteMultipartUpload: &oss.CompleteMultipartUpload{
			Parts: u.completeParts,
		},
	}
	_, err := u.svc.CompleteMultipartUpload(ctx, req)
	return errors.Trace(err)
}

Copilot AI Jan 19, 2026

The multipartWriter lacks an abort mechanism for failed uploads. While the API interface includes AbortMultipartUpload, the multipartWriter struct doesn't implement a way to abort/cleanup incomplete multipart uploads when errors occur. This could lead to orphaned incomplete uploads accumulating in OSS. Consider adding an abort method or cleanup mechanism to handle error cases.
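
A minimal sketch of the abort-on-failure idea, under stated assumptions: `multipartAPI`, `fakeAPI`, `writer`, `uploadAll`, and `runUpload` are hypothetical names, and the interface is a trimmed-down stand-in rather than the PR's API interface or the OSS SDK. The sketch places the cleanup in a caller-side helper; adding it inside the writer would be another option.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// multipartAPI is a hypothetical, trimmed-down stand-in for the store API.
type multipartAPI interface {
	UploadPart(ctx context.Context, uploadID string, partNumber int, data []byte) (string, error)
	Complete(ctx context.Context, uploadID string, etags []string) error
	Abort(ctx context.Context, uploadID string) error
}

// fakeAPI fails the part numbered failOnPart (0 disables the failure).
type fakeAPI struct {
	failOnPart         int
	aborted, completed bool
}

func (f *fakeAPI) UploadPart(_ context.Context, _ string, n int, _ []byte) (string, error) {
	if n == f.failOnPart {
		return "", errors.New("injected upload failure")
	}
	return fmt.Sprintf("etag-%d", n), nil
}

func (f *fakeAPI) Complete(context.Context, string, []string) error {
	f.completed = true
	return nil
}

func (f *fakeAPI) Abort(context.Context, string) error {
	f.aborted = true
	return nil
}

type writer struct {
	api      multipartAPI
	uploadID string
	etags    []string
}

// Write uploads data as the next part; part numbers start at 1.
func (w *writer) Write(ctx context.Context, data []byte) (int, error) {
	etag, err := w.api.UploadPart(ctx, w.uploadID, len(w.etags)+1, data)
	if err != nil {
		return 0, err
	}
	w.etags = append(w.etags, etag)
	return len(data), nil
}

func (w *writer) Close(ctx context.Context) error { return w.api.Complete(ctx, w.uploadID, w.etags) }
func (w *writer) Abort(ctx context.Context) error { return w.api.Abort(ctx, w.uploadID) }

// uploadAll writes every chunk and completes the upload. On any failure it
// aborts the upload so that no incomplete parts are left behind.
func uploadAll(ctx context.Context, w *writer, chunks [][]byte) (err error) {
	defer func() {
		if err != nil {
			err = errors.Join(err, w.Abort(ctx))
		}
	}()
	for _, c := range chunks {
		if _, err = w.Write(ctx, c); err != nil {
			return err
		}
	}
	return w.Close(ctx)
}

// runUpload uploads three one-byte chunks against a fake that may fail.
func runUpload(failOnPart int) (aborted, completed bool, err error) {
	api := &fakeAPI{failOnPart: failOnPart}
	w := &writer{api: api, uploadID: "example-upload"}
	err = uploadAll(context.Background(), w, [][]byte{[]byte("a"), []byte("b"), []byte("c")})
	return api.aborted, api.completed, err
}

func main() {
	aborted, completed, err := runUpload(2)
	fmt.Println(aborted, completed, err != nil) // prints "true false true"
	aborted, completed, err = runUpload(0)
	fmt.Println(aborted, completed, err != nil) // prints "false true false"
}
```

In this sketch a failed Close also triggers Abort; whether that is appropriate after an ambiguous completion failure depends on the store's semantics.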

Comment on lines +233 to +330
func (c *client) MultipartWriter(ctx context.Context, name string) (objectio.Writer, error) {
	key := c.ObjectKey(name)
	input := &oss.InitiateMultipartUploadRequest{
		Bucket: oss.Ptr(c.Bucket),
		Key:    oss.Ptr(key),
	}
	if c.options.Sse != "" {
		input.ServerSideEncryption = oss.Ptr(c.options.Sse)
	}
	if c.options.SseKmsKeyId != "" {
		input.ServerSideEncryptionKeyId = oss.Ptr(c.options.SseKmsKeyId)
	}
	if c.options.StorageClass != "" {
		input.StorageClass = oss.StorageClassType(c.options.StorageClass)
	}

	resp, err := c.svc.InitiateMultipartUpload(ctx, input)
	if err != nil {
		return nil, errors.Trace(err)
	}
	return &multipartWriter{
		svc:           c.svc,
		createOutput:  resp,
		completeParts: make([]oss.UploadPart, 0, 128),
	}, nil
}

func (c *client) MultipartUploader(name string, partSize int64, concurrency int) s3like.Uploader {
	up := oss.NewUploader(c.svc, func(u *oss.UploaderOptions) {
		u.PartSize = partSize
		u.ParallelNum = concurrency
	})
	return &multipartUploader{
		uploader:     up,
		BucketPrefix: c.BucketPrefix,
		key:          c.ObjectKey(name),
	}
}

// multipartWriter does multi-part upload.
type multipartWriter struct {
	svc           API
	createOutput  *oss.InitiateMultipartUploadResult
	completeParts []oss.UploadPart
}

// UploadPart updates partial data.
// the size of each part except the last part must >= oss.MinPartSize, and <=
// oss.MaxPartSize
func (u *multipartWriter) Write(ctx context.Context, data []byte) (int, error) {
	req := &oss.UploadPartRequest{
		Body:          bytes.NewReader(data),
		Bucket:        u.createOutput.Bucket,
		Key:           u.createOutput.Key,
		PartNumber:    int32(len(u.completeParts) + 1),
		UploadId:      u.createOutput.UploadId,
		ContentLength: oss.Ptr(int64(len(data))),
	}

	uploadResult, err := u.svc.UploadPart(ctx, req)
	if err != nil {
		return 0, errors.Trace(err)
	}
	u.completeParts = append(u.completeParts, oss.UploadPart{
		ETag:       uploadResult.ETag,
		PartNumber: req.PartNumber,
	})
	return len(data), nil
}

// Close completes the multi-part upload request.
func (u *multipartWriter) Close(ctx context.Context) error {
	req := &oss.CompleteMultipartUploadRequest{
		Bucket:   u.createOutput.Bucket,
		Key:      u.createOutput.Key,
		UploadId: u.createOutput.UploadId,
		CompleteMultipartUpload: &oss.CompleteMultipartUpload{
			Parts: u.completeParts,
		},
	}
	_, err := u.svc.CompleteMultipartUpload(ctx, req)
	return errors.Trace(err)
}

type multipartUploader struct {
	uploader *oss.Uploader
	storeapi.BucketPrefix
	key string
}

func (u *multipartUploader) Upload(ctx context.Context, rd io.Reader) error {
	upParams := &oss.PutObjectRequest{
		Bucket: oss.Ptr(u.Bucket),
		Key:    oss.Ptr(u.key),
	}
	_, err := u.uploader.UploadFrom(ctx, upParams, rd)
	return errors.Trace(err)
}

Copilot AI Jan 19, 2026

The multipart upload functionality (MultipartWriter and MultipartUploader methods) lacks test coverage. Given that these are critical methods for handling large file uploads, they should have comprehensive unit tests to ensure correct behavior, especially for error cases and multipart upload completion/abort scenarios.
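
A rough sketch of what such coverage could check, assuming a hand-written recording fake instead of the generated mocks in pkg/objstore/ossstore/mock. All names here (`part`, `recordingAPI`, `writer`, `verifyThreeParts`) are hypothetical, and the check is written as a plain function so that it stays self-contained; in the repository it would more likely live in a `testing` test.

```go
package main

import (
	"context"
	"fmt"
)

// part is a hypothetical stand-in for the SDK part descriptor.
type part struct {
	Number int32
	ETag   string
}

// recordingAPI is a hypothetical fake that records what the writer sends.
type recordingAPI struct {
	uploaded  []int32
	completed []part
}

func (r *recordingAPI) UploadPart(_ context.Context, number int32, data []byte) (string, error) {
	r.uploaded = append(r.uploaded, number)
	return fmt.Sprintf("etag-%d-%d", number, len(data)), nil
}

func (r *recordingAPI) Complete(_ context.Context, parts []part) error {
	r.completed = append([]part(nil), parts...)
	return nil
}

type writer struct {
	api   *recordingAPI
	parts []part
}

// Write uploads data as the next part, numbering parts from 1.
func (w *writer) Write(ctx context.Context, data []byte) (int, error) {
	n := int32(len(w.parts) + 1)
	etag, err := w.api.UploadPart(ctx, n, data)
	if err != nil {
		return 0, err
	}
	w.parts = append(w.parts, part{Number: n, ETag: etag})
	return len(data), nil
}

func (w *writer) Close(ctx context.Context) error { return w.api.Complete(ctx, w.parts) }

// verifyThreeParts writes three chunks and checks that completion receives
// the parts in order with consecutive numbers and the returned ETags.
func verifyThreeParts() error {
	api := &recordingAPI{}
	w := &writer{api: api}
	ctx := context.Background()
	for _, chunk := range []string{"aa", "bbb", "c"} {
		n, err := w.Write(ctx, []byte(chunk))
		if err != nil {
			return err
		}
		if n != len(chunk) {
			return fmt.Errorf("short write: got %d, want %d", n, len(chunk))
		}
	}
	if err := w.Close(ctx); err != nil {
		return err
	}
	want := []part{{1, "etag-1-2"}, {2, "etag-2-3"}, {3, "etag-3-1"}}
	if len(api.completed) != len(want) {
		return fmt.Errorf("completed %d parts, want %d", len(api.completed), len(want))
	}
	for i := range want {
		if api.completed[i] != want[i] {
			return fmt.Errorf("part %d: got %+v, want %+v", i, api.completed[i], want[i])
		}
	}
	return nil
}

func main() {
	if err := verifyThreeParts(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```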

Copilot AI review requested due to automatic review settings January 19, 2026 06:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comment on lines +82 to +90
for {
	select {
	case <-r.ctx.Done():
		return
	case <-time.After(refreshInterval):
	}
	if err := r.refreshOnce(); err != nil {
		r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
	}

Copilot AI Jan 19, 2026

Using time.After in a loop can cause a memory leak because it creates a new timer on each iteration that won't be garbage collected until it fires. Consider using time.NewTicker instead, or time.NewTimer with Reset. For example, replace the select statement with a ticker pattern that can be properly stopped.

Suggested change
- for {
- 	select {
- 	case <-r.ctx.Done():
- 		return
- 	case <-time.After(refreshInterval):
- 	}
- 	if err := r.refreshOnce(); err != nil {
- 		r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
- 	}
+ ticker := time.NewTicker(refreshInterval)
+ defer ticker.Stop()
+ for {
+ 	select {
+ 	case <-r.ctx.Done():
+ 		return
+ 	case <-ticker.C:
+ 		if err := r.refreshOnce(); err != nil {
+ 			r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
+ 		}
+ 	}

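The ticker pattern from the suggestion can be run on its own as follows. This is a minimal sketch rather than the PR's refresher: `runRefreshLoop` and `countRefreshes` are hypothetical names, and the refresh callback only counts invocations.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runRefreshLoop calls refresh on every tick until ctx is cancelled.
// A single ticker is created once and stopped when the loop exits.
func runRefreshLoop(ctx context.Context, interval time.Duration, refresh func() error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := refresh(); err != nil {
				fmt.Println("refresh failed:", err)
			}
		}
	}
}

// countRefreshes runs the loop for about totalMs milliseconds and returns
// how many times refresh was called.
func countRefreshes(intervalMs, totalMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(totalMs)*time.Millisecond)
	defer cancel()
	calls := 0
	runRefreshLoop(ctx, time.Duration(intervalMs)*time.Millisecond, func() error {
		calls++
		return nil
	})
	return calls
}

func main() {
	fmt.Println("refresh calls:", countRefreshes(10, 105))
}
```

For context on the leak concern: the Go 1.23 release notes state that timers and tickers that are no longer referenced become eligible for garbage collection even if they have not fired or been stopped, so the cost of time.After in a loop depends on the Go version in use. A ticker still avoids allocating a new timer per iteration.
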
@D3Hunter

/retest


tiprow bot commented Jan 19, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter

/retest


tiprow bot commented Jan 19, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter

/retest


tiprow bot commented Jan 19, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 19, 2026

ti-chi-bot bot commented Jan 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joechenrh, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 20, 2026

ti-chi-bot bot commented Jan 20, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-19 08:41:06.508000244 +0000 UTC m=+396894.121957090: ☑️ agreed by joechenrh.
  • 2026-01-20 01:52:11.128060481 +0000 UTC m=+458758.742017337: ☑️ agreed by Leavrth.

@ti-chi-bot ti-chi-bot bot merged commit 85aa740 into pingcap:master Jan 20, 2026
31 checks passed
@D3Hunter D3Hunter deleted the oss-client branch January 20, 2026 03:29
Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants