oss: add client/logging/retry/credential-refresh related logic by D3Hunter · Pull Request #65630 · pingcap/tidb

D3Hunter · 2026-01-19T04:53:51Z

What problem does this PR solve?

Issue Number: ref #65461

Problem Summary:

What changed and how does it work?

As the title says, this is the preparatory PR for the OSS store; the store itself will be implemented in a later PR.

Summary

Adds a new OSS-backed object store client with utilities in pkg/objstore/ossstore/client.go (permission checks, get/put/delete/list/copy, multipart upload).
Implements a concurrent-safe credential refresher in pkg/objstore/ossstore/credential.go.
Adds OSS SDK log adapter and level mapping in pkg/objstore/ossstore/logger.go.
Adds retryer configuration plus ECS metadata error detection in pkg/objstore/ossstore/retry.go.
Introduces OSS/KS3 provider constants in pkg/objstore/s3like/interface.go.
Updates storage comment to include OSS in pkg/objstore/storeapi/storage.go.

Build/Test Wiring

Bazel targets and deps in pkg/objstore/ossstore/BUILD.bazel and pkg/objstore/ossstore/mock/BUILD.bazel.
Mockgen entries in Makefile.
New tests: pkg/objstore/ossstore/client_test.go, pkg/objstore/ossstore/credential_test.go.
New mocks: pkg/objstore/ossstore/mock/api_mock.go, pkg/objstore/ossstore/mock/provider_mock.go.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

tiprow · 2026-01-19T04:58:23Z

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copilot

Pull request overview

This PR adds foundational infrastructure for Alibaba Cloud OSS (Object Storage Service) support in TiDB's object storage layer. It introduces client implementation, logging integration, retry logic, and credential refresh mechanisms, preparing the codebase for full OSS store functionality in subsequent PRs.

Changes:

Added OSS client implementation with support for basic object operations (get, put, delete, list, copy)
Implemented credential refresher for handling temporary credentials with automatic refresh
Integrated OSS SDK logging with TiDB's logging framework
Added retry logic and mock interfaces for testing

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.


File	Description
pkg/objstore/ossstore/client.go	Implements OSS client with object operations and multipart upload support
pkg/objstore/ossstore/client_test.go	Unit tests for client operations
pkg/objstore/ossstore/credential.go	Credential refresher for safe concurrent credential access
pkg/objstore/ossstore/credential_test.go	Tests for credential refresh logic
pkg/objstore/ossstore/interface.go	API interface definition for OSS operations
pkg/objstore/ossstore/logger.go	Logger integration between OSS SDK and TiDB logging
pkg/objstore/ossstore/retry.go	Retry configuration for OSS operations
pkg/objstore/ossstore/mock/*.go	Generated mock files for testing
pkg/objstore/ossstore/BUILD.bazel	Build configuration for ossstore package
pkg/objstore/storeapi/storage.go	Minor comment improvements mentioning OSS alongside S3 and GCS
pkg/objstore/s3like/interface.go	Added OSS and KS3 provider constants
Makefile	Added mock generation commands for OSS interfaces

pkg/objstore/ossstore/client.go

Copilot · 2026-01-19T05:04:07Z

pkg/objstore/ossstore/credential_test.go

+		require.NoError(t, err)
+		require.GreaterOrEqual(t, getAKTimeFn(cred), start)
+		require.NoError(t, refersher.startRefresh())
+		time.Sleep(time.Minute + 5*time.Second)


The test uses time.Sleep(time.Minute + 5*time.Second), which would make it run for over a minute. Although it is wrapped in synctest.Test, which should simulate time, consider using a smaller time increment or a ticker-based approach with controlled time advancement for faster test execution.

Suggested change

-		time.Sleep(time.Minute + 5*time.Second)
+		step := 100 * time.Millisecond
+		steps := int((time.Minute + 5*time.Second) / step)
+		for i := 0; i <= steps; i++ {
+			time.Sleep(step)
+		}

pkg/objstore/ossstore/client_test.go

pkg/objstore/ossstore/credential_test.go

pkg/objstore/ossstore/logger.go

pkg/objstore/ossstore/client.go

codecov · 2026-01-19T05:16:18Z

Codecov Report

❌ Patch coverage is 52.84281% with 282 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.7006%. Comparing base (13dd959) to head (9a5b4bc).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #65630        +/-   ##
================================================
- Coverage   77.7563%   77.7006%   -0.0558%     
================================================
  Files          1984       1913        -71     
  Lines        543041     533257      -9784     
================================================
- Hits         422249     414344      -7905     
+ Misses       119133     118897       -236     
+ Partials       1659         16      -1643

Flag	Coverage Δ
integration	`41.5366% <0.0000%> (-6.6528%)`	⬇️
unit	`76.7838% <52.8428%> (+0.3006%)`	⬆️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.7387% <ø> (-12.3943%)`	⬇️


Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

pkg/objstore/ossstore/credential.go

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

Copilot · 2026-01-19T06:03:24Z

pkg/objstore/ossstore/credential_test.go

+		time.Sleep(time.Minute + 5*time.Second)
+		cred2, err := refresher.GetCredentials(ctx)
+		require.NoError(t, err)


The test uses time.Sleep with a hardcoded value of 1 minute and 5 seconds to wait for credential refresh. This makes the test unnecessarily slow and fragile. Consider using a mock time source or a much shorter refresh interval for testing purposes to make the test faster and more reliable.

Suggested change

-		time.Sleep(time.Minute + 5*time.Second)
-		cred2, err := refresher.GetCredentials(ctx)
-		require.NoError(t, err)
+		deadline := time.Now().Add(5 * time.Second)
+		var cred2 credentials.Credentials
+		for {
+			if time.Now().After(deadline) {
+				t.Fatalf("credential was not refreshed within the expected timeout")
+			}
+			c, err := refresher.GetCredentials(ctx)
+			require.NoError(t, err)
+			cred2 = c
+			if getAKTimeFn(cred2) >= start+time.Minute.Nanoseconds() {
+				break
+			}
+			time.Sleep(10 * time.Millisecond)
+		}

pkg/objstore/ossstore/credential.go

Copilot · 2026-01-19T06:03:25Z

pkg/objstore/ossstore/retry.go

+func (*retryer) GetRetryToken(context.Context, error) (releaseToken func(error) error, err error) {
+	return func(err error) error {
+		return nil
+	}, nil
+}
+
+func (*retryer) GetInitialToken() (releaseToken func(error) error) {
+	return func(err error) error {
+		return nil
+	}
+}


The GetRetryToken and GetInitialToken methods return no-op functions that always return nil. While this may be intentional to satisfy an interface, there's no documentation explaining why these are no-ops or what implications this has for retry behavior. Consider adding a comment explaining that these are intentionally no-ops and why.

Copilot · 2026-01-19T06:03:25Z

pkg/objstore/ossstore/client.go

+	return &multipartWriter{
+		svc:           c.svc,
+		createOutput:  resp,
+		completeParts: make([]oss.UploadPart, 0, 128),


Magic number 128 used for initial capacity of completeParts slice without explanation. Consider defining this as a named constant with documentation explaining the rationale for this initial capacity (e.g., estimated typical number of parts in a multipart upload).
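A sketch of the named-constant suggestion. `uploadPart` is a stand-in for `oss.UploadPart`, and the rationale in the comment is an assumption: the capacity is only an initial allocation, so its value affects performance, not correctness.

```go
package main

// uploadPart is a hypothetical stand-in for oss.UploadPart.
type uploadPart struct {
	ETag       string
	PartNumber int32
}

// initialPartsCapacity is only a preallocation hint for the list of
// completed parts; append grows the slice past it when an upload has more
// parts. The value 128 is taken from the PR; matching it to the part count
// of a typical upload avoids repeated slice growth in common cases.
const initialPartsCapacity = 128

// newCompletedParts returns an empty part list with the preallocated capacity.
func newCompletedParts() []uploadPart {
	return make([]uploadPart, 0, initialPartsCapacity)
}
```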

Copilot · 2026-01-19T06:03:25Z

pkg/objstore/ossstore/logger.go

+		logFn(msg)
+	}


When the OSS SDK log level string doesn't match any of the expected cases (ERROR, WARNING, INFO, DEBUG), the function silently does nothing. This could hide logging issues. Consider logging an unexpected level string warning when logFn is nil after the switch statement to help debug cases where the OSS SDK changes its log level format.

Suggested change

-		logFn(msg)
-	}
+		logFn(msg)
+		return
+	}
+	// Log unexpected log level strings from OSS SDK to aid debugging.
+	l.logger.Warn("unexpected OSS log level", zap.String("level", levelStr), zap.String("msg", msg))

pkg/objstore/storeapi/storage.go

Copilot · 2026-01-19T06:03:26Z

pkg/objstore/ossstore/client.go

+// multipartWriter does multi-part upload.
+type multipartWriter struct {
+	svc           API
+	createOutput  *oss.InitiateMultipartUploadResult
+	completeParts []oss.UploadPart
+}
+
+// UploadPart updates partial data.
+// the size of each part except the last part must >= oss.MinPartSize, and <=
+// oss.MaxPartSize
+func (u *multipartWriter) Write(ctx context.Context, data []byte) (int, error) {
+	req := &oss.UploadPartRequest{
+		Body:          bytes.NewReader(data),
+		Bucket:        u.createOutput.Bucket,
+		Key:           u.createOutput.Key,
+		PartNumber:    int32(len(u.completeParts) + 1),
+		UploadId:      u.createOutput.UploadId,
+		ContentLength: oss.Ptr(int64(len(data))),
+	}
+
+	uploadResult, err := u.svc.UploadPart(ctx, req)
+	if err != nil {
+		return 0, errors.Trace(err)
+	}
+	u.completeParts = append(u.completeParts, oss.UploadPart{
+		ETag:       uploadResult.ETag,
+		PartNumber: req.PartNumber,
+	})
+	return len(data), nil
+}
+
+// Close completes the multi-part upload request.
+func (u *multipartWriter) Close(ctx context.Context) error {
+	req := &oss.CompleteMultipartUploadRequest{
+		Bucket:   u.createOutput.Bucket,
+		Key:      u.createOutput.Key,
+		UploadId: u.createOutput.UploadId,
+		CompleteMultipartUpload: &oss.CompleteMultipartUpload{
+			Parts: u.completeParts,
+		},
+	}
+	_, err := u.svc.CompleteMultipartUpload(ctx, req)
+	return errors.Trace(err)
+}


The multipartWriter lacks an abort mechanism for failed uploads. While the API interface includes AbortMultipartUpload, the multipartWriter struct doesn't implement a way to abort/cleanup incomplete multipart uploads when errors occur. This could lead to orphaned incomplete uploads accumulating in OSS. Consider adding an abort method or cleanup mechanism to handle error cases.

Copilot · 2026-01-19T06:03:26Z

pkg/objstore/ossstore/client.go

+func (c *client) MultipartWriter(ctx context.Context, name string) (objectio.Writer, error) {
+	key := c.ObjectKey(name)
+	input := &oss.InitiateMultipartUploadRequest{
+		Bucket: oss.Ptr(c.Bucket),
+		Key:    oss.Ptr(key),
+	}
+	if c.options.Sse != "" {
+		input.ServerSideEncryption = oss.Ptr(c.options.Sse)
+	}
+	if c.options.SseKmsKeyId != "" {
+		input.ServerSideEncryptionKeyId = oss.Ptr(c.options.SseKmsKeyId)
+	}
+	if c.options.StorageClass != "" {
+		input.StorageClass = oss.StorageClassType(c.options.StorageClass)
+	}
+
+	resp, err := c.svc.InitiateMultipartUpload(ctx, input)
+	if err != nil {
+		return nil, errors.Trace(err)
+	}
+	return &multipartWriter{
+		svc:           c.svc,
+		createOutput:  resp,
+		completeParts: make([]oss.UploadPart, 0, 128),
+	}, nil
+}
+
+func (c *client) MultipartUploader(name string, partSize int64, concurrency int) s3like.Uploader {
+	up := oss.NewUploader(c.svc, func(u *oss.UploaderOptions) {
+		u.PartSize = partSize
+		u.ParallelNum = concurrency
+	})
+	return &multipartUploader{
+		uploader:     up,
+		BucketPrefix: c.BucketPrefix,
+		key:          c.ObjectKey(name),
+	}
+}
+
+// multipartWriter does multi-part upload.
+type multipartWriter struct {
+	svc           API
+	createOutput  *oss.InitiateMultipartUploadResult
+	completeParts []oss.UploadPart
+}
+
+// UploadPart updates partial data.
+// the size of each part except the last part must >= oss.MinPartSize, and <=
+// oss.MaxPartSize
+func (u *multipartWriter) Write(ctx context.Context, data []byte) (int, error) {
+	req := &oss.UploadPartRequest{
+		Body:          bytes.NewReader(data),
+		Bucket:        u.createOutput.Bucket,
+		Key:           u.createOutput.Key,
+		PartNumber:    int32(len(u.completeParts) + 1),
+		UploadId:      u.createOutput.UploadId,
+		ContentLength: oss.Ptr(int64(len(data))),
+	}
+
+	uploadResult, err := u.svc.UploadPart(ctx, req)
+	if err != nil {
+		return 0, errors.Trace(err)
+	}
+	u.completeParts = append(u.completeParts, oss.UploadPart{
+		ETag:       uploadResult.ETag,
+		PartNumber: req.PartNumber,
+	})
+	return len(data), nil
+}
+
+// Close completes the multi-part upload request.
+func (u *multipartWriter) Close(ctx context.Context) error {
+	req := &oss.CompleteMultipartUploadRequest{
+		Bucket:   u.createOutput.Bucket,
+		Key:      u.createOutput.Key,
+		UploadId: u.createOutput.UploadId,
+		CompleteMultipartUpload: &oss.CompleteMultipartUpload{
+			Parts: u.completeParts,
+		},
+	}
+	_, err := u.svc.CompleteMultipartUpload(ctx, req)
+	return errors.Trace(err)
+}
+
+type multipartUploader struct {
+	uploader *oss.Uploader
+	storeapi.BucketPrefix
+	key string
+}
+
+func (u *multipartUploader) Upload(ctx context.Context, rd io.Reader) error {
+	upParams := &oss.PutObjectRequest{
+		Bucket: oss.Ptr(u.Bucket),
+		Key:    oss.Ptr(u.key),
+	}
+	_, err := u.uploader.UploadFrom(ctx, upParams, rd)
+	return errors.Trace(err)
+}


The multipart upload functionality (MultipartWriter and MultipartUploader methods) lacks test coverage. Given that these are critical methods for handling large file uploads, they should have comprehensive unit tests to ensure correct behavior, especially for error cases and multipart upload completion/abort scenarios.

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Copilot · 2026-01-19T06:29:07Z

pkg/objstore/ossstore/credential.go

+		for {
+			select {
+			case <-r.ctx.Done():
+				return
+			case <-time.After(refreshInterval):
+			}
+			if err := r.refreshOnce(); err != nil {
+				r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
+			}


Using time.After in a loop can cause a memory leak because it creates a new timer on each iteration that won't be garbage collected until it fires. Consider using time.NewTicker instead, or time.NewTimer with Reset. For example, replace the select statement with a ticker pattern that can be properly stopped.

Suggested change

-		for {
-			select {
-			case <-r.ctx.Done():
-				return
-			case <-time.After(refreshInterval):
-			}
-			if err := r.refreshOnce(); err != nil {
-				r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
-			}
+		ticker := time.NewTicker(refreshInterval)
+		defer ticker.Stop()
+		for {
+			select {
+			case <-r.ctx.Done():
+				return
+			case <-ticker.C:
+				if err := r.refreshOnce(); err != nil {
+					r.logger.Warn("failed to refresh OSS credentials", zap.Error(err))
+				}
+			}

D3Hunter · 2026-01-19T06:55:39Z

/retest

tiprow · 2026-01-19T06:56:04Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter · 2026-01-19T07:06:03Z

/retest

tiprow · 2026-01-19T07:06:26Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

D3Hunter · 2026-01-19T08:26:51Z

/retest

tiprow · 2026-01-19T08:27:17Z

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-01-20T01:52:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joechenrh, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [Leavrth,joechenrh]
~~pkg/objstore/OWNERS~~ [Leavrth]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-01-20T01:52:12Z

[LGTM Timeline notifier]

Timeline:

2026-01-19 08:41:06.508000244 +0000 UTC: ☑️ agreed by joechenrh.
2026-01-20 01:52:11.128060481 +0000 UTC: ☑️ agreed by Leavrth.

D3Hunter added 2 commits January 19, 2026 12:43

oss client

ccd0735

bazel

0a8db28

Copilot AI review requested due to automatic review settings January 19, 2026 04:53

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 19, 2026

Copilot started reviewing on behalf of D3Hunter January 19, 2026 04:54

Copilot AI reviewed Jan 19, 2026

D3Hunter added 2 commits January 19, 2026 13:20

change

d86d010

lint

05afeeb

Copilot AI review requested due to automatic review settings January 19, 2026 05:31

Copilot started reviewing on behalf of D3Hunter January 19, 2026 05:31

Copilot AI reviewed Jan 19, 2026

pkg/objstore/ossstore/credential.go (outdated, resolved)

D3Hunter added 2 commits January 19, 2026 13:44

fix comments

b7fa2d2

change

281c618

Copilot AI review requested due to automatic review settings January 19, 2026 05:52

Copilot started reviewing on behalf of D3Hunter January 19, 2026 05:52

Copilot AI reviewed Jan 19, 2026

D3Hunter added 2 commits January 19, 2026 14:17

fix comments

96439ce

fix comments

6420970

Copilot AI review requested due to automatic review settings January 19, 2026 06:18

fix comments

9a5b4bc

Copilot started reviewing on behalf of D3Hunter January 19, 2026 06:19

Copilot AI reviewed Jan 19, 2026

joechenrh approved these changes Jan 19, 2026


ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 19, 2026

Leavrth approved these changes Jan 20, 2026


ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 20, 2026

ti-chi-bot bot merged commit 85aa740 into pingcap:master Jan 20, 2026
31 checks passed

D3Hunter deleted the oss-client branch January 20, 2026 03:29

D3Hunter mentioned this pull request Jan 20, 2026

objstore: support access OSS using native SDK #65461

Closed

11 tasks

