MDEV-39318 MTR: fix test failures when running with --ssl#4929
Open
FaramosCZ wants to merge 4 commits intoMariaDB:10.11from
Open
MDEV-39318 MTR: fix test failures when running with --ssl#4929FaramosCZ wants to merge 4 commits intoMariaDB:10.11from
FaramosCZ wants to merge 4 commits intoMariaDB:10.11from
Conversation
When MTR's '--ssl' flag is active, all client connections use TLS. When a connection is killed or rejected, the TLS layer intercepts the error before the MySQL protocol can report it: - Error 2026 (CR_SSL_CONNECTION_ERROR) replaces error 2013 (CR_SERVER_LOST) or 0 on killed connections. - Error 2002 (CR_CONNECTION_ERROR) replaces error 1040 (ER_CON_COUNT_ERROR) when max_connections is exceeded. Upstream CI does not use '--ssl', so these go unnoticed. Distros like Fedora run the test suite with '--ssl' enabled, causing spurious failures. Add the TLS error codes to each affected test's '--error' directive. This is backward-compatible (tests still pass without '--ssl') and follows the pattern already used elsewhere in the tree. The pattern was introduced in two waves, both incomplete: - f9315b3 "CC 3.1 update" (CONC-603): Connector/C 3.1 fixed TLS read/write error handling so the client no longer always returns 2013 on connection loss -- it may now correctly return 2026 or 5014. Updated ssl_timeout.test, wait_until_disconnected, kill.test, and openssl tests. - e3e7264 "MDEV-30452": OpenSSL 3.x and recent GnuTLS changed error indication when the peer omits close_notify. Updated several galera, innodb, and sys_vars tests. Both waves only fixed tests that were actively failing in their CI at the time, leaving the remaining tests unfixed. main.kill was also evaluated but excluded: it already tolerates error 5014 (write error) which covers the TLS case, so it passes with --ssl without changes. Each test was verified in the upstream BuildBot Fedora 42 container (both RelWithDebInfo and Debug builds): - without --ssl, before patch: pass (baseline) - with --ssl, before patch: fail (got 2026) - without --ssl, after patch: pass (no regression) - with --ssl, after patch: pass (fix works) Co-Authored-By: Claude AI <noreply@anthropic.com>
When MTR's '--ssl' flag is active, connection_type in performance_schema.threads shows 'SSL/TLS' instead of 'Socket', and server_audit logs include the TLS version (e.g. 'TLSv1.3') in CONNECT/DISCONNECT events. Fix with --replace_result and --replace_regex to normalize the output. Without '--ssl' the replacements are no-ops. Same patterns already in the tree: - --replace_result for TLS version normalization: main/chained_ssl_certificates.test:42,55 main/ssl_crl.test:6 - The server_audit --replace_regex already normalizes timestamps and session IDs; the TLS version pattern is appended to the same directive. Co-Authored-By: Claude AI <noreply@anthropic.com>
Under TLS, data flows through SSL_read/SSL_write instead of direct recv/send syscalls, completely bypassing the performance_schema socket I/O instrumentation hooks. Socket wait events are never recorded. Skip the tests when the session is actually encrypted, detected via a non-empty Ssl_cipher status variable. The existing not_ssl.inc checks @@have_ssl (server capability), which is always 'YES' on OpenSSL builds and would skip too aggressively. Same Ssl_cipher session-level check already in the tree: - main/ssl_timeout.test:11 - main/userstat.test:41 Co-Authored-By: Claude AI <noreply@anthropic.com>
Under TLS, Ssl_cipher_list contains the full list of supported ciphers (~2047 chars on Fedora OpenSSL 3.x), exceeding the VARCHAR(1024) VARIABLE_VALUE column in PFS and information_schema tables. In show_aggregate.inc, multi-table UPDATEs joining PFS status tables fail with ER_DATA_TOO_LONG when STRICT_TRANS_TABLES is active. Fix: temporarily relax sql_mode for PFS-joining UPDATEs. In feedback_plugin_load.test, every SELECT from information_schema.feedback triggers Warning 1265. Fix: wrap SELECTs with --disable_warnings. --disable_warnings/--enable_warnings for known benign warnings is the standard MTR idiom, used extensively throughout the test suite. Co-Authored-By: Claude AI <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When MTR runs with
--ssl(as Fedora's%checkdoes), all client connections use TLS. This causes four categories of test failure. All fixes are test-side, backward-compatible, and produce identical results without--ssl.Upstream CI doesn't use
--ssl, so these failures go unnoticed.1. TLS error codes replace MySQL protocol error codes
The connector's TLS layer intercepts server-side errors before the MySQL protocol can report them. For example, error 2026 (
CR_SSL_CONNECTION_ERROR) replaces 2013 (CR_SERVER_LOST) on killed/crashed connections, and error 2002 (CR_CONNECTION_ERROR) replaces 1040 (ER_CON_COUNT_ERROR) when max_connections is exceeded.This was partially addressed in two earlier commits (CONC-603 in 2022, MDEV-30452 in 2023), but only for tests actively failing upstream at the time. Thirteen tests remained unfixed.
This is arguably a connector bug — the connector should propagate the server's error code through the TLS layer rather than replacing it with a generic TLS error. The test-side fix (adding the TLS error codes to
--errordirectives) is needed until the connector is fixed, and becomes a harmless no-op afterward.2. Value differences in test output
connection_typeinperformance_schema.threadsshowsSSL/TLSinstead ofSocket. The server_audit plugin logs the TLS version (e.g.TLSv1.3) in CONNECT/DISCONNECT events.Fix:
--replace_resultand--replace_regexto normalize the output. Without--sslthe replacements are no-ops.3. Missing PFS socket instrumentation
Under TLS, data flows through SSL_read/SSL_write instead of direct recv/send syscalls, completely bypassing the performance_schema socket I/O instrumentation hooks. Tests that verify socket wait events or byte counters get zero values.
Fix: detect active TLS session via
Ssl_cipherstatus variable and skip the affected tests. The existingnot_ssl.incchecks@@have_ssl(server capability), which is alwaysYESon OpenSSL builds and would skip too aggressively.4. Ssl_cipher_list truncation
Ssl_cipher_listcontains the full list of supported ciphers (~2047 chars on Fedora's OpenSSL 3.x), exceeding the VARCHAR(1024) VARIABLE_VALUE column in PFS and information_schema tables. This causes ER_DATA_TOO_LONG in multi-table UPDATEs with STRICT_TRANS_TABLES, and Warning 1265 in information_schema SELECTs.Fix: temporarily relax sql_mode for PFS-joining UPDATEs; wrap information_schema SELECTs with
--disable_warnings.Each test was verified in the upstream BuildBot Fedora 42 container (both RelWithDebInfo and Debug builds):
--ssl, before patch: pass (baseline)--ssl, before patch: fail--ssl, after patch: pass (no regression)--ssl, after patch: pass (fix works)