- Feature Name: `rustc_scalable_vector`
- Start Date: 2025-07-07
- RFC PR: rust-lang/rfcs#3838
- Rust Issue: rust-lang/rust#0000
Introduces a new attribute, `#[rustc_scalable_vector(N)]`, which can be used to
define new scalable vector types, such as those in Arm's Scalable Vector
Extension (SVE) or RISC-V's Vector Extension (RVV).

`rustc_scalable_vector(N)` is internal compiler infrastructure that will be used
only in the standard library to introduce scalable vector types, which can then
be stabilised. Only the infrastructure to define these types is introduced in
this RFC, not the types or intrinsics that use it.
This RFC depends on rfcs#3729: Hierarchy of Sized traits.
SVE is used in examples throughout this RFC, but the proposed features should be sufficient to enable support for similar extensions in other architectures, such as RISC-V's V Extension.
SIMD types and instructions are a crucial element of high-performance Rust applications and allow for operating on multiple values in a single instruction. Many processors have SIMD registers of a known fixed length and provide intrinsics which operate on these registers. For example, Arm's Neon extension is well-supported by Rust and provides 128-bit registers and a wide range of intrinsics.
Instead of releasing more extensions with ever increasing register bit widths, AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has a Vector Extension (RVV). These extensions have vector registers whose width depends on the CPU implementation and bit-width-agnostic intrinsics for operating on these registers. By using scalable vectors, code won't need to be re-written using new architecture extensions with larger registers, new types and intrinsics, but instead will work on newer processors with different vector register lengths and performance characteristics.
Scalable vectors have interesting and challenging implications for Rust, introducing value types with sizes that can only be known at runtime and requiring significant changes to the language's notion of sizedness - this support is proposed in rfcs#3729.
Hardware is generally available with SVE, and key Rust stakeholders want to be able to use these architecture features from Rust. In a recent discussion on SVE, Amanieu, co-lead of the library team, said:
I've talked with several people in Google, Huawei and Microsoft, all of whom have expressed a rather urgent desire for the ability to use SVE intrinsics in Rust code, especially now that SVE hardware is generally available.
Without support in the compiler, building on the Hierarchy of Sized traits proposal, it is not possible to introduce the intrinsics and types that expose the hardware's scalable vector support.
None of the infrastructure proposed in this RFC is intended to be used directly by Rust users.
`rustc_scalable_vector`, as described later in the
Reference-level explanation, is perma-unstable
and exists only to enable scalable vector types to be defined in the standard
library. The specific vector types are intended to eventually be stabilised, but
none are proposed in this RFC.
From a user's perspective, writing code for scalable vectors isn't too different from writing code with fixed-length vectors. To illustrate how the types and intrinsics that this infrastructure will enable could be used, consider the following example that sums two input vectors:
```rust
use std::arch::aarch64::{
    // These intrinsics and types are not proposed by this RFC
    svcntw, svwhilelt_b32, svld1_f32, svadd_f32_m, svst1_f32
};

fn sve_add(in_a: Vec<f32>, in_b: Vec<f32>, out_c: &mut Vec<f32>) {
    assert_eq!(in_a.len(), in_b.len());
    assert_eq!(in_a.len(), out_c.len());
    let len = in_a.len();
    unsafe {
        // `svcntw` returns the actual number of elements that are in a 32-bit
        // element vector
        let step = svcntw() as usize;
        for i in (0..len).step_by(step) {
            let a = in_a.as_ptr().add(i);
            let b = in_b.as_ptr().add(i);
            let c = out_c.as_mut_ptr().add(i);
            // `svwhilelt_b32` generates a predicate vector that deals with
            // the tail of the iteration - it enables the operations which
            // follow for the first `len` elements overall, but disables
            // the last `len % step` elements in the last iteration
            let pred = svwhilelt_b32(i as _, len as _);
            // `svld1_f32` loads a vector register with the data from address
            // `a`, zeroing any elements in the vector that are masked out
            //
            // Does not access memory for inactive elements
            let sva = svld1_f32(pred, a);
            let svb = svld1_f32(pred, b);
            // `svadd_f32_m` adds `a` and `b`, any lanes that are masked out
            // keep the value of `a`
            let svc = svadd_f32_m(pred, sva, svb);
            // `svst1_f32` will store the result without accessing any memory
            // locations that are masked out
            svst1_f32(svc, pred, c);
        }
    }
}
```

Scalable vectors are similar to the fixed-length vectors already supported by Rust, enabling operations to be performed on multiple values at once in a single instruction. Unlike fixed-length vectors, the length of scalable vectors is not fixed, and intrinsics which operate on scalable vectors are length-agnostic.
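For contrast, the same element-wise sum can be written with a fixed lane count in plain Rust. This sketch (not part of the RFC's proposed API) highlights the two places where scalable vectors differ: the step would be read from the hardware at runtime rather than fixed at compile time, and the tail would be handled by predication rather than a remainder loop.

```rust
fn fixed_add(in_a: &[f32], in_b: &[f32], out_c: &mut [f32]) {
    assert_eq!(in_a.len(), in_b.len());
    assert_eq!(in_a.len(), out_c.len());
    // Fixed lane count, e.g. a 128-bit Neon register holds four `f32`s.
    // With scalable vectors this would instead come from `svcntw()`.
    const STEP: usize = 4;
    let mut i = 0;
    while i + STEP <= in_a.len() {
        for lane in 0..STEP {
            out_c[i + lane] = in_a[i + lane] + in_b[i + lane];
        }
        i += STEP;
    }
    // Remainder loop for the tail - with SVE, the `svwhilelt_b32` predicate
    // makes this separate loop unnecessary
    for j in i..in_a.len() {
        out_c[j] = in_a[j] + in_b[j];
    }
}

fn main() {
    let a = vec![1.0f32; 10];
    let b = vec![2.0f32; 10];
    let mut c = vec![0.0f32; 10];
    fixed_add(&a, &b, &mut c);
    assert!(c.iter().all(|&x| x == 3.0));
}
```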
Scalable vector types supported by the `rustc_scalable_vector(N)` attribute can
be thought of as having the form `vscale × N × ty`, where `vscale` is a single,
global, fixed-for-the-runtime-of-the-program "scaling factor" of the CPU.
Vector registers are of the length `vscale × vunit`, with `vunit` being an
architecture-specific value:

- Arm SVE: `vunit` is the minimum length of the vector register in the architecture - 128 bits.
- RISC-V V: `vunit` is the least common multiple of the supported element widths - 64 bits.

> [!TIP]
> While the `vscale` terminology is borrowed from LLVM, `vunit` is invented for
> the purposes of aiding this explanation.
`N` in `rustc_scalable_vector(N)` defines the value of `N` in a scalable vector
type. Any value of `N` is accepted by the attribute and it is the responsibility
of whoever is defining the type to provide a valid value. A correct value for
`N` depends on the purpose of the specific scalable vector type and the
architecture. See
Manually-chosen or compiler-calculated element count
for rationale.
In the simplest case, a scalable vector register could be depicted as follows:
◁────── vscale x vunit ──────▷ ◁─── vunit ───▷
◁────── vscale x ty x N ─────▷ ◁─── ty x N ──▷
┌──────────────────────────────┬───────────────┐
│ ... │ ty │ ty │ ... │ ← a vector register
└──────────────────────────────┴───────────────┘
Scalable vector types contain a single field which is used to determine `ty`:

```rust
#[rustc_scalable_vector(4)]
pub struct svfloat32_t(f32);
```

In the example above, `svfloat32_t` is a scalable vector with a minimum of four
`f32` elements when `vscale = 1` and more when `vscale > 1`. `svfloat32_t` could
be depicted as..
◁───────────── vscale x f32 x 4 ─────────────▷ ◁────── f32 x 4 ──────▷
┌──────────────────────────────────────────────┬───────────────────────┐
│ ... │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└──────────────────────────────────────────────┴───────────────────────┘
..and when running on hardware with vscale=2..
◁──── 2 x f32 x 4 ────▷ ◁────── f32 x 4 ──────▷
┌───────────────────────┬───────────────────────┐
│ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└───────────────────────┴───────────────────────┘
..or vscale=3:
◁──────────────── 3 x f32 x 4 ────────────────▷ ◁────── f32 x 4 ──────▷
┌───────────────────────┬───────────────────────┬───────────────────────┐
│ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└───────────────────────┴───────────────────────┴───────────────────────┘
The type marker field is solely used to get the element type for the codegen backend. It must be one of the following types:

- `u8`, `u16`, `u32` or `u64`
- `i8`, `i16`, `i32` or `i64`
- `f16`, `f32` or `f64`
- `bool`

It is not permitted to project into scalable vector types and access the type marker field.
Structs of scalable vectors are supported, but every element of the struct must
have the same scalable vector type. This will enable definition of "tuple of
vector" types, such as svfloat32x2_t below, that are used in some load and
store intrinsics.
```rust
#[rustc_scalable_vector]
pub struct svfloat32x2_t(svfloat32_t, svfloat32_t);
```

◁───────────── vscale x f32 x 4 ─────────────▷ ◁────── f32 x 4 ──────▷
┌──────────────────────────────────────────────┬───────────────────────┐ ┐
│ ... │ f32 │ f32 │ f32 │ f32 │ │
└──────────────────────────────────────────────┴───────────────────────┘ ├─ svfloat32x2_t
┌──────────────────────────────────────────────┬───────────────────────┐ │
│ ... │ f32 │ f32 │ f32 │ f32 │ │
└──────────────────────────────────────────────┴───────────────────────┘ ┘
Structs must still be annotated with `#[rustc_scalable_vector]`, so end-users
cannot define their own structs of scalable vectors. It is not permitted to
project into these structs and access the individual vectors.
Scalable vectors are necessarily non-const Sized (from rfcs#3729) as they
behave like value types but the exact size cannot be known at compilation time.
rfcs#3729 allows these types to implement Clone (and consequently Copy) as
Clone only requires an implementation of Sized, irrespective of constness.
Scalable vector types have some further restrictions due to limitations of the codegen backend:

- Can only be in the signature of a function if it is annotated with the appropriate target feature (see ABI)
- Cannot be stored in compound types (structs, enums, etc)
  - Including coroutines, so these types cannot be held across an await boundary in async functions
  - `repr(transparent)` newtypes could be permitted with scalable vectors
  - Exception: scalable vectors can be stored in arrays
  - Exception: scalable vectors can be stored in structs with every element of the same type (but only if that struct is annotated with `#[rustc_scalable_vector]`)
- Cannot be the type of a static variable
- Cannot be instantiated into generic functions (see Target features)
- Cannot have trait implementations (see Target features)
  - Including blanket implementations (i.e. `impl<T> Foo for T` is not a valid candidate for a scalable vector)

Some of these limitations may be able to be lifted in future depending on what is supported by rustc's codegen backends or with evolution of the language.
Rust currently always passes SIMD vectors on the stack to avoid ABI mismatches
between functions annotated with target_feature - where the relevant vector
register is guaranteed to be present - and those without - where the relevant
vector register might not be present.
However, this approach will not work for scalable vector types, as the relevant target feature must be present to use the instructions that can allocate the correct size on the stack for the scalable vector.
Therefore, there is an additional restriction that these types cannot be used in the argument or return types of functions unless those functions are annotated with the relevant target feature.
Any such functions would make a trait containing them dyn-incompatible.
It is permitted to create pointers to function that have scalable vector types
in their arguments or return types, even though function pointers themselves
cannot be annotated as having the target feature. When the function pointer was
created, the user must have made the unsafe promise that it was okay to call the
#[target_feature]-annotated function, so it is sound to permit function
pointers.
As scalable vectors will always be passed as immediates, they will therefore have the same ABI as in C, so should be considered FFI-safe.
Similarly to the challenges with the ABI of scalable vectors, without the relevant target features, few operations can actually be performed on scalable vectors - causing issues for the use of scalable vectors in generic code and with trait implementations.
For example, implementations of traits like `Clone` would not be able to
actually perform a clone, and generic functions that are instantiated with
scalable vectors would fail during instruction selection in the codegen backend.
Without a mechanism for a generic function to inherit target features from the types it is instantiated with, or for trait methods to have target features, it is not possible for these types to be used with generic functions or traits.
See Trait implementations and generic instantiation.
It is possible to change the vector length at runtime using a
`prctl()` call to the Linux kernel, or via similar mechanisms in other
operating systems.
Doing so would require that `vscale` change, which Rust will not support.
`prctl` or similar must only be used to set up the vector length for child
processes, not to change the vector length of the current process. As Rust
cannot prevent users from doing this, it will be documented as undefined
behaviour, consistent with C and C++.
Implementing rustc_scalable_vector largely involves lowering scalable vectors
to the appropriate type in the codegen backend. LLVM has robust support for
scalable vectors and is the default backend, so this section will focus on
implementation in the LLVM codegen backend. Other backends should be able to
support scalable vectors in Rust once they support scalable vectors in general.
Most of the complexity of scalable vectors is handled by LLVM: Rust's scalable
vectors need only be lowered to the correct type in LLVM, with the `vscale`
modifier applied to LLVM's vector types.
LLVM's scalable vector type is of the form `<vscale x element_count x type>`.
`vscale` is the scaling factor determined by the hardware at runtime; it can be
any value providing it gives a legal vector register size for the architecture.
For example, a `<vscale x 4 x f32>` is a scalable vector with a minimum of four
`f32` elements. With SVE, `vscale` could be any power of two, which would
result in register sizes of 128, 256, 512, 1024 or 2048 bits and 4, 8, 16, 32 or 64
`f32` elements respectively.
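The relationship between `vscale` and register width is simple arithmetic; the following sketch (illustrative only, not compiler code) reproduces the figures above for `<vscale x 4 x f32>`:

```rust
// Register width in bits for `<vscale x 4 x f32>`:
// vscale x element_count x element_bits.
fn register_bits(vscale: u32) -> u32 {
    vscale * 4 * 32
}

fn main() {
    // SVE permits any power-of-two `vscale` from 1 to 16.
    let widths: Vec<u32> = [1u32, 2, 4, 8, 16]
        .iter()
        .map(|&v| register_bits(v))
        .collect();
    assert_eq!(widths, vec![128, 256, 512, 1024, 2048]);
}
```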
The N in the #[rustc_scalable_vector(N)] determines the element_count used
in the LLVM type for a scalable vector.
Structs of vectors are lowered to LLVM as struct types containing scalable vector types. This is supported since the Permit load/store/alloca for struct of the same scalable vector type LLVM RFC.
Arrays of vectors are lowered to LLVM as array types containing scalable vector types. Arrays of vectors are also supported by LLVM since the Enable arrays of scalable vector types LLVM RFC.
Tuples in RISC-V's V Extension lower to target-specific types in LLVM rather
than generic scalable vector types, so rustc_scalable_vector will not
initially support RVV tuples (see
RISC-V Vector Extension's tuple types).
`rustc_scalable_vector(N)` is inherently additional complexity in the language, despite being largely hidden from users.
Without support for scalable vectors in the language and compiler, it is not possible to leverage hardware with scalable vectors from Rust. As extensions with scalable vectors are available in architectures as the recommended way to do SIMD, lack of support in Rust would limit Rust's suitability on these architectures compared to other systems programming languages.
rustc_scalable_vector is preferred over a repr(scalable) attribute as there
is existing dissatisfaction with fixed-length vectors being defined using the
repr(simd) attribute (rust#63633).
By aligning with the approach taken by C (discussed in the Prior art below), most of the documentation that already exists for scalable vector intrinsics in C should still be applicable to Rust.
`rustc_scalable_vector(N)` expects `N` to be provided rather than calculating
it. Calculating `N` would make this attribute more robust and decrease the
likelihood of it being used incorrectly - even for permanently unstable internal
attributes like `rustc_scalable_vector`, this would be worthwhile if feasible.

In the simplest case, calculating `N` is a simple division: `vunit` (as
defined above) divided by the `element_size`. For example, with Arm SVE,
`vunit = 128`, so with an `f32` element, `N = 128 / 32 = 4`; and with RISC-V RVV,
`vunit = 64`, so with an `f32` element, `N = 64 / 32 = 2` (assuming `LMUL = 1`, but
more on that later).
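The simple-division case can be sketched directly (an illustration of the arithmetic above, not a proposed compiler implementation):

```rust
// `N` for the simple case: `vunit` in bits divided by the element size in
// bits. `vunit` is 128 for Arm SVE and 64 for RISC-V RVV (with LMUL=1).
fn element_count(vunit_bits: u32, element_bits: u32) -> u32 {
    assert_eq!(vunit_bits % element_bits, 0);
    vunit_bits / element_bits
}

fn main() {
    assert_eq!(element_count(128, 32), 4); // SVE with `f32`: N = 4
    assert_eq!(element_count(64, 32), 2);  // RVV with `f32`, LMUL=1: N = 2
}
```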
There are more complicated scalable vector definitions than those presented in
the Reference-level explanation, which
rustc_scalable_vector can support, but that would require more complicated
calculations for N or architecture-specific knowledge:
- With Arm SVE, each intrinsic that takes a predicate takes a `svbool_t`
  (`vscale x i1 x 16`). `svbool_t` could have its `N` calculated as above with
  simple division.

  `svbool_t` is used even when the data arguments have fewer elements, e.g. an
  `svfloat32_t` (`vscale x f32 x 4`). `svbool_t` has predicates for sixteen
  lanes but there are only four lanes in the `svfloat32_t` arguments to enable
  or disable. This is slightly unintuitive but matches the definitions of the
  intrinsics in the Arm ACLE.

  Within the definition of those intrinsics, the `svbool_t` is cast to a
  private `svboolN_t` type which has a number of lanes matching the data
  argument (e.g. a `svbool4_t`/`vscale x i1 x 4` for
  `svfloat32_t`/`vscale x f32 x 4`).

  ```
  ├──────── vscale x i32 x 4 ───────┤ ├──────────── i32 x 4 ────────────┤
  ┌───────────────────────────────────┬───────────────────────────────────┐
  │                ...                │ 0x0000 │ 0x0000 │ 0x0000 │ 0x0000 │
  └───────────────────────────────────┴───────────────────────────────────┘
                                           △        △        △        △
                                           │        │        │        │
  ┌───────────────────────┬───────────────────────┐
  │          ...          │ 0x0 │ 0x0 │ 0x0 │ 0x0 │ + unused space for 12x `i1`s
  └───────────────────────┴───────────────────────┘
  ├── vscale x i1 x 4 ──┤ ├─────── i1 x 4 ──────┤
  ```

  Defining a `svboolN_t` is more complicated than trivial division, requiring
  the attribute accept either arbitrary specification of `N` or a type to
  calculate `N` with, for example:

  ```rust
  // alternative: user-provided arbitrary `N`
  #[rustc_scalable_vector(4)]
  struct svbool4_t(bool);

  // alternative: add `predicate_of` to attribute
  #[rustc_scalable_vector(predicate_of = "u32")]
  struct svbool4_t(bool);

  // alternative: use another field to separate element type and size to use for `N`
  #[rustc_scalable_vector]
  struct svbool4_t(bool, u32);
  ```

- Similarly, with Arm SVE, the sign-extending intrinsics will internally use
  LLVM intrinsics which return vectors with fewer elements than
  `vunit / element_size` (similar to `svboolN_t` but for other types).

  For example, the `svldnt1sb_gather_s64offset_s64` intrinsic wraps the
  `llvm.aarch64.sve.ldnt1.gather.nxv2i8` intrinsic in LLVM. It returns
  `nxv2i8`, which is a `vscale x i8 x 2` that is then cast to `svint64_t`.

  Like in the previous case, `vscale x i8 x 2` cannot be defined without the
  attribute accepting arbitrary specification of `N` or a type to calculate
  `N` with.

- RISC-V RVV's scalable vectors are quite different from Arm's SVE, while
  sharing the same underlying infrastructure in LLVM.

  SVE's scalable vector types map directly onto LLVM scalable vector types,
  and all of the dynamic parts of the vectors are abstracted by `vscale`:

  ```
  ├───────── vscale x 128 ────────┤ ├── 128 ──┤
  ┌─────────────────────────────────┬───────────┐
  │               ...               │           │
  └─────────────────────────────────┴───────────┘
  ```

  RVV's scalable vector types have extra dimensions of flexibility, the
  "register grouping factor" or `LMUL`, and `SEW`:

  - `SEW` is the "selected element width", and corresponds to the size of the
    element type of the vector (`element_size`).

    RVV uses the least common multiple of the supported element types as
    `vunit` so that the overall vector length is `VLEN = vscale * vunit`,
    which is a constant, rather than `VLEN = vscale * element_size`, which is
    not a constant.

  - `LMUL` configures how many vector registers are grouped together to form
    a larger logical vector register. `LMUL` can be 1/8, 1/4, 1/2, 1, 2, 4,
    or 8. Not all `LMUL` values are valid for each type.

  `LMUL` and `SEW` are part of the processor state and are changed by
  compiler-inserted `vsetvli` instructions depending on the vector types being
  used.

  `LMUL` is distinct from tuple types, which are a separate variable named
  `NFIELD` (which is not part of the processor state, as with SVE tuples).
  `NFIELD` can be 1, 2, 3, 4, 5, 6, 7 or 8. Not all `NFIELD` values are valid
  for each type.

  Scalable vector types which vary in both `LMUL` and `NFIELD` could be
  exposed to the user (see RVV Type System Documentation). For example,
  consider the following types:

  - `vint8mf2_t` has `NFIELD=1`, `LMUL=1/2` and `ty=i8`
  - `vuint32m4_t` has `NFIELD=1`, `LMUL=4` and `ty=i32`
  - `vint16mf4x6_t` has `NFIELD=6`, `LMUL=1/4` and `ty=i16`
  - `vint64m2x3_t` has `NFIELD=3`, `LMUL=2` and `ty=i64`

  This can include types which have different representations but the same
  `N`:

  - `vint32m1x2_t` has `NFIELD=2`, `LMUL=1` and `ty=i32` (`N=4` elements)
  - `vint32m2_t` has `NFIELD=1`, `LMUL=2` and `ty=i32` (`N=4` elements)

  When `NFIELD=1`, `LMUL=4` and `ty=i64`, four registers are grouped together
  to form a logical vector register, and this has the type
  `<vscale x 4 x i64>`:

  ```
  ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
  ┌──────────────────────┬───────────────────────┐ ┐
  │          ...         │          i64          │ │
  └──────────────────────┴───────────────────────┘ │
  ┌──────────────────────┬───────────────────────┐ │
  │          ...         │          i64          │ ├─ LMUL=4 (vint64m4_t)
  └──────────────────────┴───────────────────────┘ │  vscale x 4 x i64
  ┌──────────────────────┬───────────────────────┐ │
  │          ...         │          i64          │ │
  └──────────────────────┴───────────────────────┘ │
  ┌──────────────────────┬───────────────────────┐ │
  │          ...         │          i64          │ │
  └──────────────────────┴───────────────────────┘ ┘
  ```

  Similarly, when `NFIELD=1`, `LMUL=4` and `ty=i32`, the smaller element type
  results in each register containing more elements to add up to `vunit`, and
  this is repeated across all four registers:

  ```
  ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
  ┌──────────────────────┬───────────┬───────────┐ ┐
  │          ...         │    i32    │    i32    │ │
  └──────────────────────┴───────────┴───────────┘ │
  ┌──────────────────────┬───────────┬───────────┐ │
  │          ...         │    i32    │    i32    │ ├─ LMUL=4 (vint32m4_t)
  └──────────────────────┴───────────┴───────────┘ │  vscale x 8 x i32
  ┌──────────────────────┬───────────┬───────────┐ │
  │          ...         │    i32    │    i32    │ │
  └──────────────────────┴───────────┴───────────┘ │
  ┌──────────────────────┬───────────┬───────────┐ │
  │          ...         │    i32    │    i32    │ │
  └──────────────────────┴───────────┴───────────┘ ┘
  ```

  It is possible for different scalable vector types with RVV to have the same
  value of `N`, consider `NFIELD=1`, `LMUL=2` and `ty=i32`..

  ```
  ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
  ┌──────────────────────┬───────────┬───────────┐ ┐
  │          ...         │    i32    │    i32    │ ├─ LMUL=2 (vint32m2_t)
  └──────────────────────┴───────────┴───────────┘ │  vscale x 4 x i32
  ┌──────────────────────┬───────────┬───────────┐ │
  │          ...         │    i32    │    i32    │ │
  └──────────────────────┴───────────┴───────────┘ ┘
  ```

  ..and `NFIELD=2`, `LMUL=1` and `ty=i32`:

  ```
  ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
  ┌──────────────────────┬───────────┬───────────┐ ┐
  │          ...         │    i32    │    i32    │ │ LMUL=1
  └──────────────────────┴───────────┴───────────┘ ├─ vint32m1x2_t
  ┌──────────────────────┬───────────┬───────────┐ │ vscale x 4 x i32
  │          ...         │    i32    │    i32    │ │ LMUL=1
  └──────────────────────┴───────────┴───────────┘ ┘
  ```

  Despite `vint32m2_t` and `vint32m1x2_t` having the same value of `N`, and
  hence the same LLVM type, `vscale x 4 x i32`, the value of `LMUL` in the
  processor state will be different when each is used. In practice, RVV tuples
  lower to target-specific types in LLVM rather than generic scalable vector
  types, so `rustc_scalable_vector` will not initially support RVV tuples (see
  RISC-V Vector Extension's tuple types).

  RVV's scalable vectors cannot be defined without the attribute accepting
  arbitrary specification of `N` or an argument to the attribute to specify
  the `LMUL` used:

  ```rust
  // alternative: user-provided arbitrary `N`
  #[rustc_scalable_vector(4)]
  struct vint32m2_t(i32);

  #[rustc_scalable_vector(1)]
  struct vint16mf4_t(i16);

  // alternative: user-provided `LMUL`
  #[rustc_scalable_vector(lmul = "2")]
  struct vint32m2_t(i32);

  #[rustc_scalable_vector(lmul = "1/4")]
  struct vint16mf4_t(i16);
  ```
It is technically possible to calculate `N`, but doing so would require lots of
additional machinery in the `rustc_scalable_vector` attribute, much of which
would be mutually exclusive or produce invalid types with many values of the
parameters.
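To make the arithmetic concrete, the RVV case can be sketched as an `N` that also depends on `LMUL` and `NFIELD`. This is a hedged illustration derived from the example types above (assuming `vunit = 64` for RVV), not a proposed implementation of the attribute:

```rust
// N = NFIELD * (vunit * LMUL) / element_size, with LMUL expressed as a
// fraction (numerator, denominator). A tuple type's `N` is the total across
// its NFIELD constituent vectors.
fn rvv_element_count(nfield: u32, lmul_num: u32, lmul_den: u32, element_bits: u32) -> u32 {
    const VUNIT_BITS: u32 = 64; // least common multiple of RVV element widths
    nfield * (VUNIT_BITS * lmul_num) / (lmul_den * element_bits)
}

fn main() {
    assert_eq!(rvv_element_count(1, 2, 1, 32), 4); // vint32m2_t:   vscale x 4 x i32
    assert_eq!(rvv_element_count(2, 1, 1, 32), 4); // vint32m1x2_t: vscale x 4 x i32
    assert_eq!(rvv_element_count(1, 1, 4, 16), 1); // vint16mf4_t:  vscale x 1 x i16
}
```

Note how `vint32m2_t` and `vint32m1x2_t` arrive at the same `N` through different parameters, matching the overlap discussed above.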
There will be a fixed number of scalable vector types that will be defined in the standard library alongside their intrinsics and well-tested. It is very likely that their implementations will be automatically generated. For example, while not being proposed in this RFC, it is expected Arm SVE will define 55 scalable vector types, exhaustively covering the possible vector types enabled by the architecture extension (it is assumed that the same will be true for RISC-V RVV):
- `svbool_t`
- `sv{int8,uint8}{,x2,x3,x4}_t`
- `sv{int16,uint16}{,x2,x3,x4}_t`
- `sv{float32,int32,uint32}{,x2,x3,x4}_t`
- `sv{float64,int64,uint64}{,x2,x3,x4}_t`
- `svbool{2,4,8}_t` for internal use
- `nxv{2,4,8}{i8,u8}` for internal use
- `nxv{2,4}{i16,u16}` for internal use
- `nxv2{i32,u32}` for internal use
Given the complexity required in rustc_scalable_vector to be able to calculate
N, that it would still be possible to use the attribute incorrectly, that the
attribute is permanently unstable, and the low risk of misuse given the intended
use, this proposal argues that allowing arbitrary specification of N is
reasonable.
rfcs#3268 was a previous iteration of this RFC.
There are not many languages with support for scalable vectors:
- SVE in C takes a similar approach to this proposal by using sizeless incomplete types to represent scalable vectors. However, sizeless types are not part of the C specification and Arm's C Language Extensions (ACLE) provide an edit to the C standard which formally defines "sizeless types".
- .NET 9 has experimental support for SVE, but as a managed language, the design and implementation considerations in .NET are quite different to Rust.
Both the `repr(simd)` and `target_feature` attributes were initially proposed in
RFCs:

- rfcs#2045: `target_feature`
  - Original accepted RFC for `#[target_feature(enable = "..")]`
  - Of relevance to ABI-affecting target features, there was various discussion
    around the RFC which led to discussion of ABI issues being
    relegated to an unresolved question
    - At the time of writing the RFC, Portable SIMD types were the only types
      that were considered as potentially having ABI issues, rather than types
      to be used with vendor intrinsics

      > However, there are types that we might want to add to the language at some point, like portable vector types, for which this [a lack of ABI changes] is not the case. The behaviour of `#[target_feature]` for those types should be specified in the RFC that proposes to stabilize those types, and this RFC should be amended as necessary.

    - It does not appear that this has been considered for any intrinsic types that were later stabilised (e.g. Neon intrinsics on AArch64) and as such that these types can exist in featureless functions is an accident of history
- rust#44839: Tracking issue for RFC 2045: improving `#[target_feature]`
  - Tracking issue for the `#[target_feature]` parts of RFC 2045
  - This issue hasn't been well-maintained and the description is out-of-date.
    It aims to track both the addition of intrinsics for various architectures
    which use `#[target_feature]` as well as improvements to the
    `#[target_feature]` attribute itself
  - It was last triaged by the language team in Mar 2022, concluding that the issue needed an owner
- rfcs#2396: `#[target_feature]` 1.1
  - Allows specifying `#[target_feature]` functions without making them unsafe, still requiring calls to be in unsafe blocks unless the calling function also has the target features enabled
- rfcs#1199: `repr_simd`
  - Proposed the `repr(simd)` attribute, applied to structs with multiple
    fields, one for each element in the corresponding vector
    - `repr(simd)` has changed since it was proposed in this RFC
    - It is used for both portable SIMD types and non-portable types, and now
      contains an array (i.e. `[f32; 4]` instead of `(f32, f32, f32, f32)`)
  - Largely focused on portable SIMD, rather than non-portable intrinsics
  - Proposed intrinsics be declared in `extern "platform-intrinsic"` blocks
    and that platform detection be available (though this part was later
    subsumed by rfcs#2045)
- rust#27731: Tracking issue for SIMD support
  - Initially tracked implementation of rfcs#1199, eventually ended up
    tracking `simd_ffi` (rust#53346), `repr(simd)` and `core::arch` intrinsics
  - It was later closed and split up into tracking issues for each architecture's intrinsics
There are many existing issues and RFCs related to repr(simd); interactions
between SIMD types and target features; and ABI incompatibilities with SIMD
types, surveyed in the sections below.
Many of these issues were related to specific intrinsics on specific platforms (adding, stabilising or fixing bugs with them); these have been omitted and only issues that affect generic infrastructure are included.
A handful of issues are related to projections into repr(simd) types being
initially permitted..
- rust#105439: ICE due to generating LLVM bitcast vec -> array
  - Accessing the field of a `repr(simd)` type causes an ICE
  - Fixed by rust#105583, changing codegen to remove the illegal operation
  - Later addressed holistically by compiler-team#838 which will ban projecting
    into `repr(simd)` types
    - Landed in rust#143833
- rust#137108: Projecting into non-power-of-two-lanes `repr(simd)` types does the wrong thing
  - Repeat of rust#105439: accessing the field of a `repr(simd)` type misbehaves
  - Similarly addressed by compiler-team#838
- rust#113465: transmute + tuple access + eq on `repr(simd)`'s inner value seems UB to valgrind
  - Repeat of rust#105439, except with a Portable SIMD type (portable-simd#339):
    accessing the field of a `repr(simd)` type misbehaves
  - Similarly addressed by compiler-team#838
These issues are informative for scalable vectors and projection into scalable vectors will not be supported, as described in Reference-level explanation.
Other issues discussed confusion related to inheritance of target_feature to
nested functions and closures...
- rust#58729: target_feature doesn't trickle down to closures and internal fns
  - `target_feature` attribute doesn't apply to nested functions and closures
    - Interaction with nested functions is expected, these never inherit from their parent
    - Interaction with closures was a bug
      - Prior to rfcs#2396, closures would have required the ability to be
        marked as unsafe to support `target_feature`
      - After rfcs#2396, closures inheriting target features was accepted in rust#73631 (then implemented in rust#78231)
      - Interactions with `inline(always)` fixed in rust#111836
        - `target_feature` attributes are ignored from `inline(always)`-annotated closures
- rust#108338: closure doesn't seem to inherit the target attributes for codegen purposes
  - Basically a dupe of rust#58729 with the same resolution
- rust#111836: Fix #[inline(always)] on closures with target feature 1.1
  - Allows `#[inline(always)]` to be used with `#[target_feature]` on
    closures, assuming that target features only affect codegen
Scalable vectors will inherit the behaviour described above.
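The inheritance behaviour described above can be sketched as follows. This is a hedged illustration using x86-64 SSE2, which is part of that target's baseline, so the call in `demo` is sound there; `add_and_extract` and `demo` are illustrative names.

```rust
// Hedged sketch of target-feature inheritance: closures inherit, nested
// functions do not.
#[cfg(target_arch = "x86_64")]
pub mod example {
    use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_cvtsi128_si32, _mm_set1_epi32};

    #[target_feature(enable = "sse2")]
    pub unsafe fn add_and_extract(a: __m128i, b: __m128i) -> i32 {
        // Closures inherit the enclosing function's target features
        // (accepted in rust#73631, implemented in rust#78231)...
        let add = || unsafe { _mm_add_epi32(a, b) };
        // ...whereas a nested `fn` declared here would not inherit `sse2` and
        // would need its own `#[target_feature(enable = "sse2")]` attribute.
        unsafe { _mm_cvtsi128_si32(add()) }
    }

    pub fn demo() -> i32 {
        // Sound without a runtime check: SSE2 is always available on x86-64.
        unsafe { add_and_extract(_mm_set1_epi32(20), _mm_set1_epi32(22)) }
    }
}
```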
There are issues related to how well the `repr(simd)` syntax works with other representation hints and whether a language item would be better:
- rust#47103: What to do about repr(C, simd)?
  - Unclear what the behaviour of `repr(C, simd)` should be
    - When submitted, a warning of incompatible representation hints was emitted
    - When omitted, a FFI unsafety warning was emitted when SIMD types were used in FFI
  - Passing vectors as immediates is trickier, later resolved in rfcs#2574, so discussion focused on passing vectors indirectly over the FFI boundary
  - Discussion fizzled out, but with rust#116558, it may be possible to allow `repr(C, simd)`
- rust#130402: When a type is `#[repr(simd)]`, `#[repr(align(N))]` annotations are ignored
  - `repr(align)` ignored when `repr(simd)` is present
  - Intended to be fixed after rust#137256, which refactored layout logic within the compiler
  - Unclear if the fix happened, but the code from the bug report still has the unexpected alignment
- rust#63633: Remove repr(simd) attribute and use a lang-item instead
  - Suggests using a language item for the `Simd` type (part of Portable SIMD) instead of using `repr(simd)`
    - Doesn't address what would happen for non-portable intrinsics that also use this infrastructure
  - Various other issues cited as motivation:
    - rust#18147 used Portable SIMD `f64x2` and found that constant initialisers weren't optimised with `-Copt-level=0`
      - Not clear that this applies to types intended for use with architecture-specific intrinsics
    - rust#47103
      - See above
    - rust#53346
      - See below
    - rust#77529
      - See below
    - rust#77866 defines its own `repr(simd)` type and then passes it to an LLVM intrinsic binding that has been declared incorrectly
    - rust#81931 defines its own `repr(simd)` type and finds that it is misaligned according to recommendations for achieving best performance
On account of these concerns, scalable vectors use `rustc_scalable_vector` instead.
A handful of architecture-agnostic issues only relate to Portable SIMD:
- rust#126217: What should SIMD bitmasks look like?
  - Design discussion related to the Portable SIMD `simd_bitmask`/`simd_bitmask_select` intrinsics
  - Not relevant to architecture-specific scalable vector intrinsics
- rust#99211: fn where clause "constrains trait impls" or something
  - Writing extension traits with const generics which apply to Portable SIMD types can run into tricky compiler errors related to the type system
  - Only applies to Portable SIMD
- rust#77529: Invalid monomorphisation when `-Clink-dead-code` is used
  - `repr(simd)` types with generics (i.e. Portable SIMD or hand-rolled equivalents) can have invalid instantiations with `-Clink-dead-code`
These issues don't apply to scalable vectors.
There was a single issue related to const-initialisation of non-portable vector types:

- rust#48745: Provide a way to const-initialise vendor-specific vector types
  - Initialisation of fixed-length vectors was not possible in a const context for non-portable SIMD
  - `mem::transmute` being made constant has addressed this issue
This issue doesn't apply to scalable vectors as they are inherently non-const.
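The resolution for fixed-length vendor types can be sketched as follows, assuming an x86-64 target purely as an example; `LANES` and `lane0` are illustrative names.

```rust
// Hedged sketch of the resolution noted above: with `mem::transmute` usable
// in const contexts, a vendor vector type can be const-initialised.
#[cfg(target_arch = "x86_64")]
pub mod example {
    use std::arch::x86_64::__m128i;
    use std::mem::transmute;

    // A fixed-length vendor vector built entirely at compile time.
    pub const LANES: __m128i = unsafe { transmute([1i32, 2, 3, 4]) };

    pub fn lane0() -> i32 {
        // Transmute back to an array to inspect the constant.
        unsafe { transmute::<__m128i, [i32; 4]>(LANES)[0] }
    }
}
```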
There have been well-documented issues with the ABI of fixed-length SIMD vectors, many of which apply to scalable vectors too, but are harder to resolve:
- rust#44367: `repr(simd)` is unsound
  - `repr(simd)` types in functions with different target features enabled can have different ABIs
  - Fixed by passing SIMD types indirectly in rust#47743
- rust#53346: `repr(simd)` is unsound in C FFI
  - Same issue as in rust#44367 but only with `extern "C"` functions, where the Rust ABI does not apply
  - Later fixed by rust#116558
- rust#87438: future-incompat: use of SIMD types aren't gated properly
  - Calling an `extern "C"` function with a SIMD vector type in a `repr(C)` or `repr(transparent)` struct doesn't error
  - Later fixed by rust#116558
- rfcs#2574: simd_ffi
  - Permits calls to `extern "C"` functions with SIMD types so long as those functions have the appropriate `target_feature` attribute
  - Never fully implemented until rust#116558 effectively did so
- rust#131800: Figure out which target features are required for which SIMD size
  - As part of rust#116558, solicited input in determining which target features were required for a given vector length so that the lint could check for those
- rust#133146: How should we handle dynamic vector ABIs?
  - Follow-up to rust#131800
  - Existing ABI compatibility checks rely on the length of the vector and the architecture to identify an appropriate target feature that must be enabled, but this approach does not scale to scalable vectors
- rust#133144: How should we handle matrix ABIs?
  - Follow-up to rust#131800
  - Same as rust#133146 but for matrix extensions
  - Out-of-scope for this RFC
- rust#116558: The `extern "C"` ABI of SIMD vector types depends on target features (tracking issue for the `abi_unsupported_vector_types` future-incompatibility lint)
  - Identified ABI incompatibility when calling `extern "C"` functions that used SIMD types
    - Rust passes SIMD types indirectly in its own ABI, for functions with and without `target_feature` annotations, whereas `extern "C"` functions take SIMD types as immediates
    - Calls from annotated functions to `extern "C"` could use immediates, but calls from non-annotated functions could not, and Rust did not prevent calls from non-annotated functions
  - An `abi_unsupported_vector_types` future-incompatibility lint was introduced to enforce that `extern "C"` functions could not have SIMD types in their signatures without the appropriate target feature being enabled
    - The lint has since been removed and replaced with a hard error
    - It only triggers when such a function is called
- rust#132865: Support calling functions with SIMD vectors that couldn't be used in the caller
  - Follow-up to rust#116558
  - There are valid calls to `extern "C"` functions which take SIMD types that are not currently accepted, such as checking for the presence of the target feature and then calling the `extern "C"` function with a newly created vector
  - It is hard to support this as it is not possible to generate a call with a specific ABI without annotating the entire containing function as having the target feature (llvm#70563)
    - This limitation also causes similar issues with inlining (rust#116573)
- Pre-RFC: Fixing ABI for SIMD types
  - Proposes requiring appropriate target features be enabled when an x86 SIMD type is used in a function signature
    - Written primarily considering x86 SIMD
    - Considers both globally-enabled target features (e.g. `-Ctarget-feature` or default features from the target specification) and per-function-enabled target features (`#[target_feature]`)
    - Proposes generating shims to translate between ABIs when calling annotated functions with SIMD type arguments from non-annotated functions
      - Avoids breakage in cases similar to rust#132865 but between annotated and non-annotated functions, rather than just Rust ABI to non-Rust ABI
    - Errors will be emitted for function pointers based on the target features of the caller
  - Prompted by discussion in rust#116558
  - Never progressed to being a submitted RFC
  - Discussed on Zulip
    - How does the pre-RFC interact with Portable SIMD efforts?
      - The inherent portability of these types means that they will need matching featureful and featureless ABIs. It is suggested that this be the current indirect ABI, but this isn't seen as desirable - ABI shims or per-target-feature monomorphisation are to be explored
    - Should there be a difference in codegen for calls to function items vs function pointers (e.g. use of a shim)?
      - Suggestion that an ABI shim be used for function pointers rather than requiring the target feature on functions making the call
    - Is there a proper featureless ABI for x86 SIMD types?
      - Yes, details in thread
    - Should these changes also apply to `extern "Rust"`?
      - Mixed opinions - enables use of a performant ABI, but is a larger breaking change
      - Concern that doing this jeopardizes the entire proposal and that Rust is stuck with the current behaviour:

        > I still don't like the idea I'm forced to use an FFI calling convention in pure rust code because the default is fundamentally too slow

        > it's a tradeoff. should the default be portable or fast. I dont think there is an obvious right answer here. might be worth digging out the history that led to the current situation -- possibly this decision has been made in the past, in favor of "portable", and that's why the ABI works the way it does?
      - Could use the performant ABI when global target features have the feature enabled
      - References later design meeting (lang-team#235)
- lang-team#235: Design meeting: resolve ABI issues around target-feature (meeting notes)
  - Proposes the property that functions with the same signature will always have the same ABI (i.e. that target features will not be considered in the ABI) and possible fixes:
    - Track target features as part of function signatures, which is hard to do without changing function pointer syntax
      - Discussed briefly but not proposed due to concerns regarding the breaking change and the expectation that function pointer syntax would need to be changed/extended, and that it introduces a new semver hazard (adding a SIMD type field)
      - Suggested that if this route were taken then allowing target features in extern blocks would be desirable and passing SIMD types using registers could be considered
      - Did not discuss challenges related to trait methods and generic functions
      - References rust#111836
    - Define an ABI which does not depend on target features
      - i.e. as the Rust ABI today, with indirect passing of SIMD types
    - Reject declaring/calling functions with target-feature-requiring types when the ABI target feature is not available/enabled
      - References Pre-RFC: Fixing ABI for SIMD types, proposing a variant of the RFC:
        - Instead of applying to all ABIs (as in the pre-RFC) and using the performant calling convention, it would apply only to non-Rust ABIs (e.g. `extern "C"`) and would be based on a mapping from vector size to target feature (rather than annotating types with the required features)
        - This narrowly fixes the soundness issue and is the basis for the currently implemented future incompatibility warning
    - Reject enabling/disabling certain ABI-affecting features
      - Discusses this as a solution for ABI issues relating to floats
  - During the meeting, various points were discussed:
    - Should it be possible to declare (non-Rust ABI) functions which take SIMD vectors as long as there aren't calls?
      - Intended to reduce potential opportunities for breakage
    - How plausible is it to include ABI-affecting target features in function pointer types?
      - It is suggested that this information could be smuggled through the ABI name, e.g. `extern "Rust+avx"`
      - Concern that this is a wider breaking change and would require treating ABIs specially or would require shims
      - Feeling that this should be possible, but it was not discussed further as an immediate solution was desired to resolve the unsoundness
    - There was consensus that crater runs were needed to gather data
  - After the meeting, the language team concluded:

    > We discussed this question in the T-lang design meeting on 2023-12-20. The consensus was generally in favor of fixing this, and we were interested in seeing more work in this direction. While we considered various ways that this could be fixed, we were supportive of finding the most minimal, simplest way to fix this as the first step, assuming that such an approach proves to be feasible. We'll need to see at least the results of a crater run and further analysis to confirm that feasibility.
  - It was explicitly noted in the notes that the language team didn't want to rule out considering target features as part of the ABI:

    > Track the target features in the function signature. This would basically mean that function pointers now have to also list the set of ABI-relevant target features that were enabled. This would be a rather fundamental change requiring new function pointer syntax, and hard to do without breaking code, so we mention it only for completeness' sake.

    > I don't think we should rule this out for the future. We've already talked about being able to track calling-convention ABI (extern "C" vs other ABIs) in function pointers somehow, so that we can safely track which kind of function we have. We've also talked about having this work in generics somehow, so that the Fn traits have a (defaulted) parameter for ABI or similar, and the monomorphization of a call will call the right ABI.
See ABI for discussion of these challenges as they apply to scalable vectors.
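As a small illustration of the fixed-length rule the issues above converged on, an `extern "C"` signature may use a SIMD vector type only when the matching target feature is enabled. This hedged sketch uses x86-64, where `sse2` is in the target baseline so the definition is accepted; `first_lane` and `demo` are illustrative names.

```rust
// Hedged sketch of the fixed-length ABI rule from rust#116558: using an AVX
// `__m256` in this signature without `avx` enabled would now be rejected,
// but the baseline `sse2` makes `__m128i` fine here.
#[cfg(target_arch = "x86_64")]
pub mod example {
    use std::arch::x86_64::{__m128i, _mm_cvtsi128_si32, _mm_set1_epi32};

    // With the C ABI the vector is passed as an immediate (in an XMM
    // register) rather than indirectly as in the Rust ABI.
    pub extern "C" fn first_lane(v: __m128i) -> i32 {
        unsafe { _mm_cvtsi128_si32(v) }
    }

    pub fn demo() -> i32 {
        first_lane(unsafe { _mm_set1_epi32(7) })
    }
}
```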
There are ongoing efforts related to improving Rust's SIMD support:
- rust-project-goals#261: Nightly support for ergonomic SIMD multiversioning
  - Generating efficient code for specific SIMD ISAs requires `target_feature` attributes on functions, which isn't particularly ergonomic
    - Need to do runtime checks then dispatch to functions with target features
      - Must be repeated when leaving and entering these functions
    - Intermediate functions use `inline(always)` to avoid having to have different versions for each target feature, which impacts code size
  - Various solutions have been proposed - witness types carrying target feature information, target features inherited from callers, features as const generic arguments
  - Goal aims to explore the design space and experiment
- lang-team#309: SIMD multiversioning (pre-read)
  - Design meeting proposal, has not yet taken place
  - Compares two related proposals that address some of the problems with multiversioning
    - Ideas similar to rfcs#3525
      - In brief: attach target features to types and introduce traits that abstract over common operations; functions are generic over those traits and, when instantiated with a feature-carrying type, inherit the target feature and use the trait methods to do SIMD operations
    - Ideas similar to the unopened contextual target features RFC
      - In brief: `#[target_features(caller)]`, which causes a function to inherit the features of the caller
      - Expands on this with a `#[target_features(generic)]` attribute which takes target features from the first const generic argument of the function (i.e. a `const FEATURES: str`)
  - Both ideas have many open design questions
- rust#143352: Tracking issue for Effective Target Features
  - Initial proposal aims to experiment with SIMD multiversioning based on the effect model used with const traits
  - In brief: traits can be defined as having a target feature effect, implementations of those traits define the target feature that is enabled by the effect, and bounds on the trait in functions will enable the target feature from the impl
- lang-team#317: Design meeting: "Marker effects" (meeting notes)
  - Discusses the findings from investigations into keyword generics and the implementation of const traits, proposing a categorisation of effects and a subset to focus on initially
  - Briefly discusses that effects overlap with the SIMD multiversioning efforts
These efforts are followed with interest as they may synergise well with resolving the similar challenges that scalable vectors face. See Trait implementations and generic instantiation.
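The unergonomic pattern that these multiversioning efforts aim to replace can be sketched as follows. This is a hedged x86-64 illustration with a scalar stand-in for the AVX2 body; `sum_avx2`, `sum_scalar`, and `sum` are illustrative names.

```rust
// Hedged sketch of today's multiversioning pattern: runtime detection plus
// per-feature function duplicates, repeated at every dispatch point.
#[cfg(target_arch = "x86_64")]
pub mod example {
    #[target_feature(enable = "avx2")]
    pub unsafe fn sum_avx2(xs: &[f32]) -> f32 {
        // Real code would use AVX2 intrinsics here; a scalar body keeps the
        // sketch short.
        xs.iter().sum()
    }

    pub fn sum_scalar(xs: &[f32]) -> f32 {
        xs.iter().sum()
    }

    pub fn sum(xs: &[f32]) -> f32 {
        // The runtime check and dispatch must be repeated at every entry
        // point into the feature-specific functions.
        if std::is_x86_feature_detected!("avx2") {
            unsafe { sum_avx2(xs) }
        } else {
            sum_scalar(xs)
        }
    }
}
```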
There is one outstanding unresolved question for scalable vectors:

- How to support trait implementations and generic instantiation for scalable vectors?
There are a handful of future possibilities enabled by this RFC - relaxing restrictions, architecture-agnostic use or extending the feature to support more features of the architecture extensions:
Improvements to the language's `target_feature` infrastructure could enable the restrictions on trait implementations and generic instantiation to be lifted:

- Some variety of rfcs#3820: `target_feature_traits` could help traits be implemented on scalable vectors
- Efforts to integrate target features with the effect system (rust#143352) may help enable generic instantiation of scalable vectors
  - Any mechanism that could be applied to scalable vector types could also be used to enforce that existing SIMD types are only used in `target_feature`-annotated functions, which would enable fixed-length vectors to be passed as immediates, improving performance
- It may be possible to support scalable vector types without the target feature being enabled by using an indirect ABI, similarly to fixed-length vectors
  - This would enable these restrictions to be lifted and for scalable vector types to behave the same as fixed-length vectors with respect to interactions with the `target_feature` attribute
  - As with fixed-length vectors, it would still be desirable for them to avoid needing to be passed indirectly between annotated functions, but this could be addressed in a follow-up
  - Experimentation is required to determine if this is feasible
The restriction that scalable vectors cannot be used in compound types could be relaxed at a later time, either by extending rustc's codegen or by leveraging newly added support in LLVM. However, as C also has this restriction and scalable vectors are nevertheless used in production code, it is unlikely there will be much demand for this restriction to be relaxed in LLVM.
As explained in Manually-chosen or compiler-calculated element count, there is a distinction in RVV between vectors which would have the same `N` in a scalable vector type, but which vary in `LMUL` and `NFIELD`. For example, `vint32m2_t` and `vint32m1x2_t`, if lowered to scalable vector types in LLVM, would both be `<vscale x 4 x i32>`. RVV's tuple types need to be lowered to target-specific types in the backend, which is out of scope for this general infrastructure for scalable vectors.
Given that there are significant differences between scalable vectors and
fixed-length vectors, and that std::simd is unstable, it is worth
experimenting with architecture-specific support and implementation initially.
Later, there are a variety of approaches that could be taken to incorporate
support for scalable vectors into Portable SIMD.