
oss: detect ECS region and use internal endpoint automatically #65687

Merged
ti-chi-bot[bot] merged 8 commits into pingcap:master from D3Hunter:oss-autodetect-use-inner-endpoint
Jan 21, 2026

@D3Hunter
Contributor

What problem does this PR solve?

Issue Number: ref #65461

Problem Summary:

What changed and how does it work?

With S3, traffic becomes free once a gateway endpoint is enabled on the console, and the change is transparent to the client. OSS is different: the client must explicitly use the internal endpoint to avoid traffic charges.

This PR detects the ECS region when possible and uses the internal endpoint automatically.

It also moves and refactors the HTTP request helper APIs that lived in br and pkg/util.
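
The endpoint choice described above can be sketched as follows. This is a minimal illustration, not the PR's code: `ossEndpoint` and `useInternalEndpoint` are hypothetical names, the host patterns are taken from the request endpoints in the logs below, and the "regions must match" rule follows the review summary further down.

```go
package main

import "fmt"

// ossEndpoint returns the OSS endpoint host for a region.
// The "-internal" host form mirrors the endpoints visible in the PR's logs;
// the function itself is a hypothetical helper, not part of the PR.
func ossEndpoint(region string, internal bool) string {
	if internal {
		return fmt.Sprintf("oss-%s-internal.aliyuncs.com", region)
	}
	return fmt.Sprintf("oss-%s.aliyuncs.com", region)
}

// useInternalEndpoint reports whether the internal endpoint is usable:
// the ECS region must be known and equal to the bucket region.
// An empty ecsRegion stands for "not running on ECS or detection failed".
func useInternalEndpoint(ecsRegion, bucketRegion string) bool {
	return ecsRegion != "" && ecsRegion == bucketRegion
}

func main() {
	bucketRegion := "ap-southeast-1"

	// Local run: no ECS region detected, so the public endpoint is chosen.
	fmt.Println(ossEndpoint(bucketRegion, useInternalEndpoint("", bucketRegion)))

	// On ECS in the same region as the bucket: the internal endpoint is chosen.
	fmt.Println(ossEndpoint(bucketRegion, useInternalEndpoint("ap-southeast-1", bucketRegion)))
}
```

Keeping the region comparison separate from the host formatting makes the decision easy to unit test without any network access.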

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

When TestInternalEndpoint runs locally, the public endpoint is used:

[2026/01/20 18:17:49.283 +08:00] [INFO] [store.go:190] ["succeed to get bucket region"] ... [context=oss] [bucketRegion=ap-southeast-1] [useInternalEndpoint=false]
[2026/01/20 18:17:51.765 +08:00] [WARN] [retry.go:71] ["failed to request s3, checking whether we can retry"] [error="Error returned by Service. \nHttp Status Code: 404. \nError Code: NoSuchKey. \nRequest Id: 696F564F6B4B133833276D50. \nMessage: The specified key does not exist..\nEC: 0026-00000001.\nTimestamp: 2026-01-20 10:17:51 +0000 UTC.\nRequest Endpoint: GET https://bucket-name.oss-ap-southeast-1.aliyuncs.com/test-prefix/perm-check/2df67aa9-4642-4bf5-82ab-0f1e8e77fca6."] [retry=false]

When it runs on an Aliyun ECS instance that has a RAM role bound, the internal endpoint is used:

[2026/01/20 19:23:56.643 +08:00] [INFO] [store.go:190] ["succeed to get bucket region"] ... [context=oss] [bucketRegion=ap-southeast-1] [useInternalEndpoint=true]
[2026/01/20 19:23:56.716 +08:00] [WARN] [retry.go:71] ["failed to request s3, checking whether we can retry"] [error="Error returned by Service. \nHttp Status Code: 404. \nError Code: NoSuchKey. \nRequest Id: 696F65CCD53C6E34307465FB. \nMessage: The specified key does not exist..\nEC: 0026-00000001.\nTimestamp: 2026-01-20 11:23:56 +0000 UTC.\nRequest Endpoint: GET https://bucket-name.oss-ap-southeast-1-internal.aliyuncs.com/test-prefix/perm-check/97834a4c-8272-4009-a764-b7146959c72e."] [retry=false]
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copilot AI review requested due to automatic review settings January 21, 2026 03:13
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 21, 2026
@tiprow

tiprow bot commented Jan 21, 2026

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor Author

moved to pkg/util/httputil/

// return errors.Trace(err)
// }
// fmt.Println(resp.IP)
func GetJSON(ctx context.Context, client *http.Client, url string, v any) error {
Contributor Author

there are 2 such methods; they are merged into one in pkg/util/httputil

require.False(t, common.IsDirExists("not-exists"))
}

func TestGetJSON(t *testing.T) {
Contributor Author

moved to pkg/util/httputil

// fmt.Println(resp.IP)
//
// nolint:unused
func GetJSON(client *http.Client, url string, v any) error {
Contributor Author

unused

Copilot AI left a comment

Pull request overview

This PR implements automatic detection of ECS region and uses internal OSS endpoints to avoid traffic charges when running on Aliyun ECS instances. It also refactors HTTP utility code by moving it from br/pkg/httputil to pkg/util/httputil for better code organization and reusability.

Changes:

  • Adds automatic ECS region detection using metadata service when using ECS RAM role credentials
  • Implements logic to use internal OSS endpoints when ECS and bucket regions match
  • Moves HTTP utility functions (NewClient, GetJSON) from br/pkg/httputil to pkg/util/httputil
  • Adds GetText function to httputil package for fetching plain text responses
  • Updates all import paths across the codebase to use the new httputil location

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pkg/objstore/ossstore/store.go | Adds ECS region detection and internal endpoint logic; refactors logger initialization |
| pkg/objstore/ossstore/logger.go | Refactors newLogPrinter to accept logger directly with improved caller skip logic |
| pkg/objstore/ossstore/store_test.go | Adds tests for internal endpoint detection logic |
| pkg/objstore/ossstore/BUILD.bazel | Updates dependencies and shard count |
| pkg/util/httputil/http.go | New package with HTTP utility functions moved from br/pkg/httputil |
| pkg/util/httputil/http_test.go | Tests for HTTP utility functions |
| pkg/util/httputil/BUILD.bazel | Build configuration for new httputil package |
| pkg/util/util.go | Removes GetJSON function (moved to httputil) |
| pkg/lightning/common/util.go | Removes GetJSON function (moved to httputil) |
| pkg/lightning/common/util_test.go | Removes GetJSON test (moved to httputil) |
| pkg/lightning/common/security.go | Updates import path to use new httputil location |
| pkg/lightning/common/BUILD.bazel | Updates dependencies and shard count |
| br/pkg/httputil/http.go | Deleted - functionality moved to pkg/util/httputil |
| br/pkg/httputil/BUILD.bazel | Deleted - package moved |
| br/pkg/task/*.go | Updates import paths to use new httputil location |
| br/pkg/stream/*.go | Updates import paths to use new httputil location |
| br/pkg/pdutil/*.go | Updates import paths to use new httputil location |
| br/tests/br_key_locked/*.go | Updates import paths to use new httputil location |

Comments suppressed due to low confidence (1)

pkg/objstore/ossstore/store.go:112

  • This TODO comment is now outdated since this PR implements the automatic internal endpoint detection. The TODO should be removed or updated to reflect that the feature has been implemented.
	// TODO OSS charges for traffic, consider auto use internal endpoint when
	// not specified explicitly and the bucket is in the same region with the
	// client.

Comment on lines +52 to +53
func GetText(client *http.Client, url string) (string, error) {
	body, err := doGet(context.Background(), client, url)

Copilot AI Jan 21, 2026

The GetText function doesn't accept a context parameter, unlike GetJSON. This is inconsistent and prevents callers from controlling timeouts or cancellation for GetText operations. Consider adding a context parameter for consistency and to allow proper timeout/cancellation control.

Suggested change
- func GetText(client *http.Client, url string) (string, error) {
- 	body, err := doGet(context.Background(), client, url)
+ func GetText(ctx context.Context, client *http.Client, url string) (string, error) {
+ 	body, err := doGet(ctx, client, url)

Comment on lines +68 to +84
func TestGetText(t *testing.T) {
	// Mock success response
	handle := func(res http.ResponseWriter, _ *http.Request) {
		res.WriteHeader(http.StatusOK)
		_, err := res.Write([]byte("test-content"))
		require.NoError(t, err)
	}
	testServer := httptest.NewServer(http.HandlerFunc(func(res http.ResponseWriter, req *http.Request) {
		handle(res, req)
	}))
	defer testServer.Close()

	client := &http.Client{}
	text, err := GetText(client, testServer.URL+"/test")
	require.NoError(t, err)
	require.Equal(t, "test-content", text)
}

Copilot AI Jan 21, 2026

The TestGetText test only covers the success case. It should also test error cases similar to TestGetJSON, such as connection errors and non-200 status codes, to ensure comprehensive test coverage.

Contributor Author

That part of the code is shared, so the GetJSON tests already cover it.

Copilot AI review requested due to automatic review settings January 21, 2026 03:33
Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pkg/objstore/ossstore/store.go:112

  • This TODO comment should be removed or updated since the feature it describes is now implemented in this PR. The code below already detects the ECS region and automatically uses the internal endpoint when the bucket is in the same region as the ECS instance.
	// TODO OSS charges for traffic, consider auto use internal endpoint when
	// not specified explicitly and the bucket is in the same region with the
	// client.

Comment on lines +64 to +65
func GetText(client *http.Client, url string) (string, error) {
	body, err := doGet(context.Background(), client, url)

Copilot AI Jan 21, 2026

The GetText function should accept a context parameter for proper cancellation support. The function currently uses context.Background() internally, which prevents callers from being able to cancel or timeout the HTTP request. This is inconsistent with GetJSON which accepts a context parameter. Consider changing the signature to accept a context parameter similar to GetJSON.

Suggested change
- func GetText(client *http.Client, url string) (string, error) {
- 	body, err := doGet(context.Background(), client, url)
+ func GetText(ctx context.Context, client *http.Client, url string) (string, error) {
+ 	body, err := doGet(ctx, client, url)

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 21, 2026
@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 34.06593% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.5799%. Comparing base (e9c1bbb) to head (6b17662).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65687        +/-   ##
================================================
+ Coverage   77.7971%   79.5799%   +1.7827%     
================================================
  Files          1992       1942        -50     
  Lines        544075     531894     -12181     
================================================
+ Hits         423275     423281         +6     
+ Misses       119141     107153     -11988     
+ Partials       1659       1460       -199     
| Flag | Coverage Δ |
| --- | --- |
| integration | 47.8390% <1.1111%> (-0.3483%) ⬇️ |
| unit | 76.7231% <34.0659%> (+0.3041%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 56.7974% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 66.4125% <ø> (+5.3850%) ⬆️ |

@D3Hunter
Contributor Author

/retest

@tiprow

tiprow bot commented Jan 21, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Contributor Author

/retest

@tiprow

tiprow bot commented Jan 21, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 21, 2026
@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 21, 2026
@ti-chi-bot

ti-chi-bot bot commented Jan 21, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-21 03:45:15.136714197 +0000 UTC m=+551942.750671052: ☑️ agreed by Leavrth.
  • 2026-01-21 06:07:30.777048319 +0000 UTC m=+560478.391005175: ☑️ agreed by joechenrh.

@D3Hunter
Contributor Author

/approve

@ti-chi-bot

ti-chi-bot bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, joechenrh, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jan 21, 2026
@ti-chi-bot ti-chi-bot bot merged commit 93c65e2 into pingcap:master Jan 21, 2026
44 of 47 checks passed
@D3Hunter D3Hunter deleted the oss-autodetect-use-inner-endpoint branch January 21, 2026 07:28