Background
The arm64 assembler supports the LSE compare-and-swap-pair instructions only in their relaxed form: CASPD and CASPW. The ARMv8.1 ISA defines four ordering variants of CASP — plain (relaxed), CASPA (acquire), CASPL (release), and CASPAL (acquire-release) — and for the single-register CAS family the assembler already supports the full set (CASD, CASAD, CASLD, CASALD, plus B/H/W sizes). The pair instructions are the only members of the CAS family limited to the relaxed variant.
Motivation
128-bit CAS is the natural primitive for lock-free data structures that need a pointer plus an ABA counter. Adding native 128-bit CAS support to sync/atomic has been discussed before (#61236), but progress there has been slow.
While waiting for proper 128-bit CAS support to land in sync/atomic, the Vitess project implemented a lock-free Treiber stack for our optimized connection pool, hand-rolling 128-bit CAS in assembly. Unfortunately, that implementation used the CASPD mnemonic available in arm64 assembly. We did not realize that this instruction has relaxed memory ordering, so its behavior deviates from Go's own atomic operations, which are always sequentially consistent.
This caused a subtle race condition which, once found, was "easily" fixed by switching from CASPD to a WORD literal that encodes the correct instruction and registers.
Proposed change
I propose adding the six missing mnemonics, following the existing naming convention: CASPAD, CASPAW, CASPLD, CASPLW, CASPALD, CASPALW.
The implementation should be small because it can reuse the existing CASPD plumbing.
I'm happy to send a CL.