Skip to content

MDEV-39263 innodb_snapshot_isolation fails to prevent lost updates under contention#4936

Open
dr-m wants to merge 1 commit into10.6from
MDEV-39263
Open

MDEV-39263 innodb_snapshot_isolation fails to prevent lost updates under contention#4936
dr-m wants to merge 1 commit into10.6from
MDEV-39263

Conversation

@dr-m
Copy link
Copy Markdown
Contributor

@dr-m dr-m commented Apr 14, 2026

  • The Jira issue number for this PR is: MDEV-39263

Description

An UPDATE could fail to return the error ER_CHECKREAD when innodb_snapshot_isolation=ON.

Thanks to @vadimtk of Percona for reporting this.

lock_clust_rec_read_check_and_lock(): Recheck the read view if trx_sys.is_registered() indicates that the lock owner is active.

How can this PR be tested?

mysql-test/mtr --parallel=auto --repeat=1000 innodb.snapshot_isolation_race{,,,}{,,,,,}

The failure of this test case, as simplified by me, was analyzed with https://rr-project.org and an additional patch that allows the race conditions to occur during the rr record tracing. Most likely, the patch to trx_t::commit_persist() is the essential one:

diff --git a/storage/innobase/btr/btr0pcur.cc b/storage/innobase/btr/btr0pcur.cc index c7fcfd205fb..b549b6e12df 100644
--- a/storage/innobase/btr/btr0pcur.cc
+++ b/storage/innobase/btr/btr0pcur.cc
@@ -317,7 +317,7 @@ btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr)
 	page_cur_mode_t	mode;
 	page_cur_mode_t	old_mode;
 	mem_heap_t*	heap;
-
+	std::this_thread::yield(); ut_ad(mtr->is_active()); ut_ad(pos_state == BTR_PCUR_WAS_POSITIONED || pos_state == BTR_PCUR_IS_POSITIONED); diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc index 5d854ebd5e5..d0c442a0f1f 100644
--- a/storage/innobase/trx/trx0rec.cc
+++ b/storage/innobase/trx/trx0rec.cc
@@ -1841,7 +1841,7 @@ trx_undo_report_row_operation(
 #ifdef UNIV_DEBUG
 	int		loop_count	= 0;
 #endif /* UNIV_DEBUG */
-
+	std::this_thread::yield(); ut_a(dict_index_is_clust(index)); ut_ad(!update || rec); ut_ad(!rec || rec_offs_validate(rec, index, offsets)); diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc index e2adc085c53..adc807e5943 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1578,7 +1578,7 @@ void trx_t::commit_persist() noexcept
 {
   mtr_t *mtr= nullptr;
   mtr_t local_mtr;
-
+  std::this_thread::yield(); if (has_logged_persistent()) { mtr= &local_mtr;

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

@dr-m dr-m requested review from Thirunarayanan and temeo April 14, 2026 06:49
@dr-m dr-m self-assigned this Apr 14, 2026
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines +6349 to +6352
if (UNIV_UNLIKELY(trx_id != 0)
&& !trx_sys.is_registered(trx, trx_id)) {
err = DB_RECORD_CHANGED;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that we will need an additional condition err <= DB_SUCCESS_LOCKED_REC here. Otherwise, the mtr test in MDEV-39264 would crash a CMAKE_BUILD_TYPE=RelWithDebInfo build on shutdown:

2026-04-14 10:01:23 0 [Note] InnoDB: Buffer pool(s) dump completed at 260414 10:01:23
2026-04-14 10:01:23 0x7fc6340517c0  InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/trx/trx0trx.cc line 219
InnoDB: Failing assertion: trx->lock.wait_lock == NULL

I will revise this shortly.

…der contention

lock_clust_rec_read_check_and_lock(): Recheck the read view if
trx_sys.is_registered() indicates that the lock owner is active.

Thanks to Vadim Tkachenko of Percona for reporting this.
His test case was simplified by me and the root cause analyzed
with https://rr-project.org and an additional patch that allows
the race condition to be repeated under "rr record". Most likely,
the patch to trx_t::commit_persist() is the essential one:

diff --git a/storage/innobase/btr/btr0pcur.cc b/storage/innobase/btr/btr0pcur.cc
index c7fcfd2..b549b6e12df 100644
--- a/storage/innobase/btr/btr0pcur.cc
+++ b/storage/innobase/btr/btr0pcur.cc
@@ -317,7 +317,7 @@ btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr)
 	page_cur_mode_t	mode;
 	page_cur_mode_t	old_mode;
 	mem_heap_t*	heap;
-
+	std::this_thread::yield();
 	ut_ad(mtr->is_active());
 	ut_ad(pos_state == BTR_PCUR_WAS_POSITIONED
 	      || pos_state == BTR_PCUR_IS_POSITIONED);
diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc
index 5d854eb..d0c442a0f1f 100644
--- a/storage/innobase/trx/trx0rec.cc
+++ b/storage/innobase/trx/trx0rec.cc
@@ -1841,7 +1841,7 @@ trx_undo_report_row_operation(
 #ifdef UNIV_DEBUG
 	int		loop_count	= 0;
 #endif /* UNIV_DEBUG */
-
+	std::this_thread::yield();
 	ut_a(dict_index_is_clust(index));
 	ut_ad(!update || rec);
 	ut_ad(!rec || rec_offs_validate(rec, index, offsets));
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index e2adc08..adc807e5943 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1578,7 +1578,7 @@ void trx_t::commit_persist() noexcept
 {
   mtr_t *mtr= nullptr;
   mtr_t local_mtr;
-
+  std::this_thread::yield();
   if (has_logged_persistent())
   {
     mtr= &local_mtr;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants