Skip to content

Latest commit

 

History

History
1269 lines (1038 loc) · 67.9 KB

File metadata and controls

1269 lines (1038 loc) · 67.9 KB

Summary

Introduces a new attribute, #[rustc_scalable_vector(N)], which can be used to define new scalable vector types, such as those in Arm's Scalable Vector Extension (SVE), or RISC-V's Vector Extension (RVV).

rustc_scalable_vector(N) is internal compiler infrastructure that will be used only in the standard library to introduce scalable vector types which can then be stabilised. Only the infrastructure to define these types are introduced in this RFC, not the types or intrinsics that use it.

This RFC depends on rfcs#3729: Hierarchy of Sized traits.

SVE is used in examples throughout this RFC, but the proposed features should be sufficient to enable support for similar extensions in other architectures, such as RISC-V's V Extension.

Motivation

SIMD types and instructions are a crucial element of high-performance Rust applications and allow for operating on multiple values in a single instruction. Many processors have SIMD registers of a known fixed length and provide intrinsics which operate on these registers. For example, Arm's Neon extension is well-supported by Rust and provides 128-bit registers and a wide range of intrinsics.

Instead of releasing more extensions with ever increasing register bit widths, AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has a Vector Extension (RVV). These extensions have vector registers whose width depends on the CPU implementation and bit-width-agnostic intrinsics for operating on these registers. By using scalable vectors, code won't need to be re-written using new architecture extensions with larger registers, new types and intrinsics, but instead will work on newer processors with different vector register lengths and performance characteristics.

Scalable vectors have interesting and challenging implications for Rust, introducing value types with sizes that can only be known at runtime, requiring significant changes to the language's notion of sizedness - this support is being proposed in the rfcs#3729.

Hardware is generally available with SVE, and key Rust stakeholders want to be able to use these architecture features from Rust. In a recent discussion on SVE, Amanieu, co-lead of the library team, said:

I've talked with several people in Google, Huawei and Microsoft, all of whom have expressed a rather urgent desire for the ability to use SVE intrinsics in Rust code, especially now that SVE hardware is generally available.

Without support in the compiler, leveraging the Hierarchy of Sized traits proposal, it is not possible to introduce intrinsics and types exposing the scalable vector support in hardware.

Guide-level explanation

None of the infrastructure proposed in this RFC is intended to be used directly by Rust users.

rustc_scalable_vector as described later in Reference-level explanation is perma-unstable and exists only to enable scalable vector types to be defined in the standard library. The specific vector types are intended to eventually be stabilised, but none are proposed in this RFC.

Using scalable vectors

From a user's perspective, writing code for scalable vectors isn't too different from when writing code with a fixed sized vector. To illustrate how the types and intrinsics that this infrastructure will enable could be used, consider the following example that sums two input vectors:

use std::arch::aarch64::{
    // These intrinsics and types are not proposed by this RFC
    svcntw, svwhilelt_b32, svld1_f32, svadd_f32_m, svst1_f32
};

fn sve_add(in_a: Vec<f32>, in_b: Vec<f32>, out_c: &mut Vec<f32>) {
    assert_eq!(in_a.len(), in_b.len());
    assert_eq!(in_a.len(), out_c.len());
    let len = in_a.len();
    unsafe {
        // `svcntw` returns the actual number of elements that are in a 32-bit
        // element vector
        let step = svcntw() as usize;
        for i in (0..len).step_by(step) {
            let a = in_a.as_ptr().add(i);
            let b = in_b.as_ptr().add(i);
            let c = out_c as *mut f32;
            let c = c.add(i);

            // `svwhilelt_b32` generates a predicate vector that deals with
            // the tail of the iteration - it enables the operations which
            // follow for the first `len` elements overall, but disables
            // the last `len % step` elements in the last iteration
            let pred = svwhilelt_b32(i as _, len as _);

            // `svld1_f32` loads a vector register with the data from address
            // `a`, zeroing any elements in the vector that are masked out
            //
            // Does not access memory for inactive elements
            let sva = svld1_f32(pred, a);
            let svb = svld1_f32(pred, b);

            // `svadd_f32_m` adds `a` and `b`, any lanes that are masked out will
            // take the keep value of `a`
            let svc = svadd_f32_m(pred, sva, svb);

            // `svst1_f32` will store the result without accessing any memory
            // locations that are masked out
            svst1_f32(svc, pred, c);
        }
    }
}

Reference-level explanation

Scalable vectors are similar to fixed length vectors already supported by Rust, enabling operations to be performed on multiple values at once in a single instruction. Unlike fixed length vectors, the length of scalable vectors is not fixed, and intrinsics which operate on scalable vectors are length-agnostic.

Scalable vectors types supported by the rustc_scalable_vector(N) attribute can be thought of as having the form vscale × N × ty, where vscale is a single, global, fixed-for-the-runtime-of-the-program "scaling factor" of the CPU.

Vector registers are of the length vscale × vunit, with vunit being an architecture-specific value:

  • ARM SVE: vunit is the minimum length of the vector register in the architecture - 128 bits.
  • RISC-V V: vunit is the least common multiple of the supported element widths - 64 bits.

Tip

While the vscale terminology is borrowed from LLVM, vunit is invented for the purposes of aiding this explanation.

N in rustc_scalable_vector(N) defines the value of N in a scalable vector type. Any value of N is accepted by the attribute and it is the responsibility of whomever is defining the type to provide a valid value. A correct value for N depends on the purpose of the specific scalable vector type and the architecture. See Manually-chosen or compiler-calculated element count for rationale.

In the simplest case, a scalable vector register could be depicted as follows:

 ◁────── vscale x vunit ──────▷ ◁─── vunit ───▷
 ◁────── vscale x ty x N ─────▷ ◁─── ty x N ──▷
┌──────────────────────────────┬───────────────┐
│              ...             │ ty │ ty │ ... │ ← a vector register
└──────────────────────────────┴───────────────┘

Scalable vector types contain a single field which is used to determine ty:

#[rustc_scalable_vector(4)]
pub struct svfloat32_t(f32);

In the example above, svfloat32_t is a scalable vector with a minimum of four f32 elements when vscale = 1 and more when vscale > 1. svfloat32_t could be depicted as..

 ◁───────────── vscale x f32 x 4 ─────────────▷ ◁────── f32 x 4 ──────▷
┌──────────────────────────────────────────────┬───────────────────────┐
│                      ...                     │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└──────────────────────────────────────────────┴───────────────────────┘

..and when running on hardware with vscale=2..

 ◁──── 2 x f32 x 4 ────▷ ◁────── f32 x 4 ──────▷
┌───────────────────────┬───────────────────────┐
│ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└───────────────────────┴───────────────────────┘

..or vscale=3:

 ◁──────────────── 3 x f32 x 4 ────────────────▷ ◁────── f32 x 4 ──────▷
┌───────────────────────┬───────────────────────┬───────────────────────┐
│ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ f32 │ ← `svfloat32_t`
└───────────────────────┴───────────────────────┴───────────────────────┘

The type marker field is solely used to get the element type for the codegen backend. It must one of the following types:

  • u8, u16, u32 or u64
  • i8, i16, i32 or i64
  • f16, f32 or f64
  • bool

It is not permitted to project into scalable vector types and access the type marker field.

Tuples of vectors

Structs of scalable vectors are supported, but every element of the struct must have the same scalable vector type. This will enable definition of "tuple of vector" types, such as svfloat32x2_t below, that are used in some load and store intrinsics.

#[rustc_scalable_vector]
pub struct svfloat32x2_t(svfloat32_t, svfloat32_t);
◁───────────── vscale x f32 x 4 ─────────────▷ ◁────── f32 x 4 ──────▷
┌──────────────────────────────────────────────┬───────────────────────┐  ┐
│                      ...                     │ f32 │ f32 │ f32 │ f32 │  │
└──────────────────────────────────────────────┴───────────────────────┘  ├─ svfloat32x2_t
┌──────────────────────────────────────────────┬───────────────────────┐  │
│                      ...                     │ f32 │ f32 │ f32 │ f32 │  │
└──────────────────────────────────────────────┴───────────────────────┘  ┘

Structs must be still be annotated with #[rustc_scalable_vector], so end-users cannot define their own structs of scalable vectors. It is not permitted to project into structs and access the individual vectors.

Properties of scalable vectors

Scalable vectors are necessarily non-const Sized (from rfcs#3729) as they behave like value types but the exact size cannot be known at compilation time.

rfcs#3729 allows these types to implement Clone (and consequently Copy) as Clone only requires an implementation of Sized, irrespective of constness.

Scalable vector types have some further restrictions due to limitations of the codegen backend:

  • Can only be in the signature of a function if it is annotated with the appropriate target feature (see ABI)

  • Cannot be stored in compound types (structs, enums, etc)

    • Including coroutines, so these types cannot be held across an await boundary in async functions

    • repr(transparent) newtypes could be permitted with scalable vectors

    • Exception: Scalable vectors can be stored in arrays

    • Exception: Scalable vectors can be stored in structs with every element of the same type (but only if that struct is annotated with #[rustc_scalable_vector])

  • Cannot be the type of a static variable

  • Cannot be instantiated into generic functions (see Target features)

  • Cannot have trait implementations (see Target features)

    • Including blanket implementations (i.e. impl<T> Foo for T is not a valid candidate for a scalable vector)

Some of these limitations may be able to be lifted in future depending on what is supported by rustc's codegen backends or with evolution of the language.

ABI

Rust currently always passes SIMD vectors on the stack to avoid ABI mismatches between functions annotated with target_feature - where the relevant vector register is guaranteed to be present - and those without - where the relevant vector register might not be present.

However, this approach will not work for scalable vector types as the relevant target feature must to be present to use the instruction that can allocate the correct size on the stack for the scalable vector.

Therefore, there is an additional restriction that these types cannot be used in the argument or return types of functions unless those functions are annotated with the relevant target feature.

Any such functions would make a trait containing them dyn-incompatible.

It is permitted to create pointers to function that have scalable vector types in their arguments or return types, even though function pointers themselves cannot be annotated as having the target feature. When the function pointer was created, the user must have made the unsafe promise that it was okay to call the #[target_feature]-annotated function, so it is sound to permit function pointers.

As scalable vectors will always be passed as immediates, they will therefore have the same ABI as in C, so should be considered FFI-safe.

Target features

Similarly to the challenges with the ABI of scalable vectors, without the relevant target features, few operations can actually be performed on scalable vectors - causing issues for the use of scalable vectors in generic code and with traits implementations.

For example, implementations of traits like Clone would not be able to actually perform a clone, and generic functions that are instantiated with scalable vectors would during instruction selection in the codegen backend.

Without a mechanism for a generic function to be able to inherit target features from its instantiated types or for trait methods to have target features, it is not possible for these types to be used with generic functions or traits.

See Trait implementations and generic instantiation.

Changing vector lengths at runtime

It is possible to change the vector length at runtime using a prctl() call to the Linux kernel, or via similar mechanisms in other operating systems.

Doing so would require that vscale change, which Rust will not supported.

prctl or similar must only be used to set up the vector length for child processes, not to change the vector length of the current process. As Rust cannot prevent users from doing this, it will be documented as undefined behaviour, consistent with C and C++.

Implementing rustc_scalable_vector

Implementing rustc_scalable_vector largely involves lowering scalable vectors to the appropriate type in the codegen backend. LLVM has robust support for scalable vectors and is the default backend, so this section will focus on implementation in the LLVM codegen backend. Other backends should be able to support scalable vectors in Rust once they support scalable vectors in general.

Most of the complexity of scalable vectors are handled by LLVM: lowering Rust's scalable vectors to the correct type in LLVM and the vscale modifier that is applied to LLVM's vector types.

LLVM's scalable vector type is of the form <vscale × element_count × type>. vscale is the scaling factor determined by the hardware at runtime, it can be any value providing it gives a legal vector register size for the architecture.

For example, a <vscale × 4 × f32> is a scalable vector with a minimum of four f32 elements and with SVE, vscale could then be any power of two which would result in register sizes of 128, 256, 512, 1024 or 2048 and 4, 8, 16, 32, or 64 f32 elements respectively.

The N in the #[rustc_scalable_vector(N)] determines the element_count used in the LLVM type for a scalable vector.

Structs of vectors are lowered to LLVM as struct types containing scalable vector types. This is supported since the Permit load/store/alloca for struct of the same scalable vector type LLVM RFC.

Arrays of vectors are lowered to LLVM as array types containing scalable vector types. Arrays of vectors are also supported by LLVM since the Enable arrays of scalable vector types LLVM RFC.

Tuples in RISC-V's V Extension lower to target-specific types in LLVM rather than generic scalable vector types, so rustc_scalable_vector will not initially support RVV tuples (see RISC-V Vector Extension's tuple types).

Drawbacks

  • rustc_scalable_vector(N) is inherently additional complexity to the language, despite being largely hidden from users.

Rationale and alternatives

Without support for scalable vectors in the language and compiler, it is not possible to leverage hardware with scalable vectors from Rust. As extensions with scalable vectors are available in architectures as the recommended way to do SIMD, lack of support in Rust would limit Rust's suitability on these architectures compared to other systems programming languages.

rustc_scalable_vector is preferred over a repr(scalable) attribute as there is existing dissatisfaction with fixed-length vectors being defined using the repr(simd) attribute (rust#63633).

By aligning with the approach taken by C (discussed in the Prior art below), most of the documentation that already exists for scalable vector intrinsics in C should still be applicable to Rust.

Manually-chosen or compiler-calculated element count

rustc_scalable_vector(N) expects N to be provided rather than calculating it. Calculating N would make this attribute more robust and decrease the likelihood of it being used incorrectly - even for permanently unstable internal attributes like rustc_scalable_vector, this would be worthwhile if feasible.

In the simplest case, calculating N is a simple division: vunit (as previously above) divided by the element_size. For example, with ARM SVE, vunit=128 so with an f32 element, N = 128/32 = 4; and with RISC-V RVV, vunit=64 so with an f32 element, N = 64/32 = 2 (assuming LMUL=1, but more on that later).

There are more complicated scalable vector definitions than those presented in the Reference-level explanation, which rustc_scalable_vector can support, but that would require more complicated calculations for N or architecture-specific knowledge:

  1. With Arm SVE, each intrinsic that takes a predicate takes a svbool_t (vscale x i1 x 16). svbool_t could have its N calculated as above with simple division.

    svbool_t is used even when the data arguments have fewer elements, e.g. an svfloat32_t (vscale x f32 x 4). svbool_t has predicates for sixteen lanes but there are only four lanes in the svfloat32_t arguments to enable or disable. This is slightly unintuitive but matches the definitions of the intrinsics in the Arm ACLE.

    Within the definition of those intrinsics, the svbool_t is cast to a private svboolN_t type which has a number of lanes matching the data argument (e.g. a svbool4_t/vscale x i1 x 4 for svfloat32_t/vscale x f32 x 4).

     ├──────── vscale x i32 x 4 ───────┤ ├──────────── i32 x 4 ────────────┤
    ┌───────────────────────────────────┬───────────────────────────────────┐
    │                ...                │ 0x0000 │ 0x0000 │ 0x0000 │ 0x0000 │
    └───────────────────────────────────┴───────────────────────────────────┘
                      △                   △        △        △        △
                      │       ┌───────────┘        │        │        │
                      │       │     ┌──────────────┘        │        │
                      │       │     │     ┌─────────────────┘        │
                ┌─────┘       │     │     │    ┌─────────────────────┘
    ┌───────────────────────┬───────────────────────┐
    │          ...          │ 0x0 │ 0x0 │ 0x0 │ 0x0 │ + unused space for 12x `i1`s
    └───────────────────────┴───────────────────────┘
    ├── vscale x i1 x 4 ──┤ ├─────── i1 x 4 ──────┤
    

    Defining a svboolN_t is more complicated than trivial division, requiring the attribute accept either arbitrary specification of N or a type to calculate N with, for example:

    // alternative: user-provided arbitrary `N`
    #[rustc_scalable_vector(4)]
    struct svbool4_t(bool);
    
    // alternative: add `predicate_of` to attribute
    #[rustc_scalable_vector(predicate_of = "u32")]
    struct svbool4_t(bool);
    
    // alternative: use another field to separate element type and size to use for `N`
    #[rustc_scalable_vector]
    struct svbool4_t(bool, u32);
  2. Similarly, with Arm SVE, the sign extending intrinsics will internally use LLVM intrinsics which return vectors with fewer elements than vunit / element_size (similar to svboolN_t but for other types).

    For example, the svldnt1sb_gather_s64offset_s64 intrinsic wraps the llvm.aarch64.sve.ldnt1.gather.nxv2i8 intrinsic in LLVM. It returns nxv2i8, which is a vscale x i8 x 2 that is then cast to svint64_t.

    Like in the previous case, vscale x i8 x 2 cannot be defined without the attribute accepting arbitrary specification of N or a type to calculate N with.

  3. RISC-V RVV's scalable vectors are quite different from Arm's SVE, while sharing the same underlying infrastructure in LLVM.

    SVE's scalable vector types map directly onto LLVM scalable vector types, and all of the dynamic parts of the vectors are abstracted by vscale:

     ├───────── vscale x 128 ────────┤ ├── 128 ──┤
    ┌─────────────────────────────────┬───────────┐
    │               ...               │           │
    └─────────────────────────────────┴───────────┘
    

    RVV's scalable vector types have an extra dimensions of flexibility, the "register grouping factor" or LMUL, and SEW:

    • SEW is the "selected element width", and corresponds to the size of the element type of the vector (element_size).

      RVV uses the least common multiple of the supported element types as vunit so that the overall vector length is VLEN = vscale * vunit, which is a constant, rather than VLEN = vscale * element_size, which is not a constant.

    • LMUL configures how many vector registers are grouped together to form a larger logical vector register. LMUL can be 1/8, 1/4, 1/2, 1, 2, 4, or 8. Not all LMUL values are valid for each type.

    LMUL and SEW are part of the processor state and are changed by compiler-inserted vsetli instructions depending on the vector types being used.

    LMUL is distinct from tuple types, which are a separate variable named NFIELD (which is not part of the processor state, as with SVE tuples). NFIELD can be 1, 2, 3, 4, 5, 6, 7 or 8. Not all NFIELD values are valid for each type.

    Scalable vector types which vary in both LMUL and NFIELD could be exposed to the user (see RVV Type System Documentation). For example, consider the following types:

    • vint8mf2_t has NFIELD=1, LMUL=1/2 and ty=i8
    • vuint32m4_t has NFIELD=1, LMUL=4 and ty=i32
    • vint16mf4x6_t has NFIELD=6, LMUL=1/4 and ty=i16
    • vint64m2x3_t has NFIELD=3, LMUL=2 and ty=i64

    This can include types which have different representation but have the same N:

    • vint32m1x2_t has NFIELD=2, LMUL=1 and ty=i32 (N=4 elements)
    • vint32m2_t has NFIELD=1, LMUL=2 and ty=i32 (N=4 elements)

    When NFIELD=1, LMUL=4 and ty=i64, four registers are grouped together to form a logical vector register, and this has the type <vscale x 4 x i64>:

    ├────────────────────── VLEN ────────────────────┤
     ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
                           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┌──────────────────────╋─┬───────────────────────┐ ┃  ┬
    │          ...         ┃ │          i64          │ ┃  │
    └──────────────────────╋─┴───────────────────────┘ ┃  │
    ┌──────────────────────╋─┬───────────────────────┐ ┃  │
    │          ...         ┃ │          i64          │ ┃  │
    └──────────────────────╋─┴───────────────────────┘ ┃  │ LMUL=4 (vint64m4_t)
    ┌──────────────────────╋─┬───────────────────────┐ ┃  │ vscale x 4 x i64
    │          ...         ┃ │          i64          │ ┃  │
    └──────────────────────╋─┴───────────────────────┘ ┃  │
    ┌──────────────────────╋─┬───────────────────────┐ ┃  │
    │          ...         ┃ │          i64          │ ┃  │
    └──────────────────────╋─┴───────────────────────┘ ┃  ┴
                           ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
    

    Similarly, when NFIELD=1, LMUL=4 and ty=i32, the smaller element type results in each register containing more elements to add up to vunit and this is repeated across all four registers:

    ├────────────────────── VLEN ────────────────────┤
     ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
                           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  ┬
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  │
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  │
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  │ LMUL=4 (vint32m4_t)
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  │ vscale x 8 x i32
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  │
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  │
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  ┴
                           ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
    

    It is possible for different scalable vector types with RVV to have the same value of N, consider NFIELD=1, LMUL=2 and ty=i32..

    ├────────────────────── VLEN ────────────────────┤
     ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
                           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  ┬
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  │ LMUL=2 (vint32m2_t)
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  │ vscale x 4 x i32
    │          ...         ┃ │    i32    │    i32    │ ┃  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  ┴
                           ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
    

    ..and NFIELD=2, LMUL=1 and ty=i32:

    ├────────────────────── VLEN ────────────────────┤
     ├── vscale x 64 bits ──┤ ├────── 64 bits ──────┤
                           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓            ┐
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  ┬         │
    │          ...         ┃ │    i32    │    i32    │ ┃  │ LMUL=1  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  ┴         │
                           ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛            ├─ vint32m1x2_t
                           ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓            │  vscale x 4 x i32
    ┌──────────────────────╋─┬───────────┬───────────┐ ┃  ┬         │
    │          ...         ┃ │    i32    │    i32    │ ┃  │ LMUL=1  │
    └──────────────────────╋─┴───────────┴───────────┘ ┃  ┴         │
                           ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━┛            ┘
    

    Despite vint32m2_t and vint32m1x2_t having the same value of N, and hence same LLVM type, vscale x 4 x i32, the value of LMUL in the processor state will be different when each are used. In practice, RVV tuples lower to target-specific types in LLVM rather than generic scalable vector types, so rustc_scalable_vector will not initially support RVV tuples (see RISC-V Vector Extension's tuple types).

    RVV's scalable vectors cannot be defined without the attribute accepting arbitrary specification of N or an argument to the attribute to specify the lmul used:

    // alternative: user-provided arbitrary `N`
    #[rustc_scalable_vector(4)]
    struct vint32m2_t(i32);
    
    #[rustc_scalable_vector(1)]
    struct vint16mf4_t(i16);
    
    // alternative: user-provided `LMUL`
    #[rustc_scalable_vector(lmul = "2")]
    struct vint32m2_t(i32);
    
    #[rustc_scalable_vector(lmul = "1/4")]
    struct vint16mf4_t(i16);

It is technically possible to calculate N, requiring lots of additional machinery in the rustc_scalable_vector attribute, much of which would be mutually exclusive or produce invalid types with many values of the parameters.

There will be a fixed number of scalable vector types that will be defined in the standard library alongside their intrinsics and well-tested. It is very likely that their implementations will be automatically generated. For example, while not being proposed in this RFC, it is expected Arm SVE will define 55 scalable vector types, exhaustively covering the possible vector types enabled by the architecture extension (it is assumed that the same will be true for RISC-V RVV):

  • svbool_t
  • sv{int8,uint8}{,x2,x3,x4}_t
  • sv{int16,uint16}{,x2,x3,x4}_t
  • sv{float32,int32,uint32}{,x2,x3,x4}_t
  • sv{float64,int64,uint64}{,x2,x3,x4}_t
  • svbool{2,4,8}_t for internal use
  • nxv{2,4,8}{i8,u8} for internal use
  • nxv{2,4}{i16,u16} for internal use
  • nxv2{i32,u32} for internal use

Given the complexity required in rustc_scalable_vector to be able to calculate N, that it would still be possible to use the attribute incorrectly, that the attribute is permanently unstable, and the low risk of misuse given the intended use, this proposal argues that allowing arbitrary specification of N is reasonable.

Prior art

rfcs#3268 was a previous iteration of this RFC.

Other languages

There are not many languages with support for scalable vectors:

  • SVE in C takes a similar approach as this proposal by using sizeless incomplete types to represent scalable vectors. However, sizeless types are not part of the C specification and Arm's C Language Extensions (ACLE) provide an edit to the C standard which formally define "sizeless types".
  • .NET 9 has experimental support for SVE, but as a managed language, the design and implementation considerations in .NET are quite different to Rust.

repr(simd) and target_feature

Both repr(simd) and target_feature attributes were initially proposed in RFCs:

  • rfcs#2045: target_feature

    • Original accepted RFC for #[target_feature(enable = "..")].
    • Of relevance to ABI-affecting target features, there was various discussion around the RFC which led to discussion of ABI issues being relegated to an unresolved question
      • At the time of writing the RFC, Portable SIMD types were the only types that were considered as potentially having ABI issues, rather than types to be used with vendor intrinsics
        • However, there are types that we might want to add to the language at some point, like portable vector types, for which this [a lack of ABI changes] is not the case.

          The behaviour of #[target_feature] for those types should be specified in the RFC that proposes to stabilize those types, and this RFC should be amended as necessary.

        • It does not appear that this has been considered for any intrinsic types that were later stabilised (e.g. Neon intrinsics on AArch64) and as such that these types can exist in featureless functions is an accident of history
  • rust#44839: Tracking issue for RFC 2045: improving #[target_feature]

    • Tracking issue for the #[target_feature] parts of RFC 2045
    • This issue hasn't been well-maintained and the description is out-of-date. It aims to track both the addition of intrinsics for various architectures which use #[target_feature] as well as improvements to the #[target_feature] attribute itself
    • It was last triaged by the language team in Mar 2022, concluding that the issue needed an owner
  • rfcs#2396: #[target_feature] 1.1

    • Allows specifying #[target_feature] functions without making them unsafe, still requiring calls to be in unsafe blocks unless the calling function also has the target features enabled
  • rfcs#1199: repr_simd

    • Proposed repr(simd) attribute, applied to structs with multiple fields, one for each element in the corresponding vector
      • repr(simd) has since changed it was proposed in this RFC
        • It is used for both portable SIMD types and non-portable types, and now contains an array (i.e. [f32; 4] instead of (f32, f32, f32, f32))
    • Largely focused on portable SIMD, rather than non-portable intrinsics
    • Proposed intrinsics be declared in extern "platform-intrinsic" blocks and that platform detection be available (though this part was later subsumed by rfcs#2045)
  • rust#27731: Tracking issue for SIMD support

    • Initially tracked implementation of rfcs#1199, eventually ended up tracking simd_ffi (rust#53346), repr(simd) and core::arch intrinsics
    • It was later closed and split up into tracking issues for each architecture's intrinsics

There are many existing issues and RFCs related to repr(simd); interactions between SIMD types and target features; and ABI incompatibilities with SIMD types, surveyed in the sections below.

Many of these issues were related to specific intrinsics on specific platforms (adding, stabilising or fixing bugs with them), these have been omitted and only issues that affect generic infrastructure are included.

Projections into repr(simd)

A handful of issues are related to projections into repr(simd) types being initially permitted..

  • rust#105439: ICE due to generating LLVM bitcast vec -> array

    • Accessing the field of a repr(simd) type causes an ICE
    • Fixed by rust#105583, changing codegen to remove the illegal operation
    • Later addressed holistically by compiler-team#838 which will ban projecting into repr(simd) types
  • rust#137108: Projecting into non-power-of-two-lanes repr(simd) types does the wrong thing

  • rust#113465: transmute + tuple access + eq on repr(simd)'s inner value seems UB to valgrind

These issues are informative for scalable vectors and projection into scalable vectors will not be supported, as described in Reference-level explanation.

Inheritance of target_feature

Other issues discussed confusion related to inheritance of target_feature to nested functions and closures...

  • rust#58729: target_feature doesn't trickle down to closures and internal fns

    • target_feature attribute doesn't apply to nested functions and closures
      • Interaction with nested functions is expected, these never inherit from their parent
      • Interaction with closures was a bug
        • Prior to rfcs#2396, closures would have required the ability to be marked as unsafe to support target_feature
        • After rfcs#2396, closures inheriting target features was accepted in rust#73631 (then implemented in rust#78231)
          • Interactions with inline(always) fixed in rust#111836
            • target_feature attributes are ignored from inline(always)-annotated closures
  • rust#108338: closure doesn't seem to inherit the target attributes for codegen purposes

    • Basically a dupe of rust#58729 with same resolution
  • rust#111836: Fix #[inline(always)] on closures with target feature 1.1

    • Allows #[inline(always)] to be used with #[target_feature] on closures, assuming that target features only affect codegen

Scalable vectors will inherit the behaviour described above.

repr(simd) syntax

There are issues related to how well repr(simd) syntax works with other representation hints and whether a language item would be better:

  • rust#47103: What to do about repr(C, simd)?

    • Unclear what the behaviour of repr(C, simd) should be
      • When submitted, a warning of incompatible representation hints was emitted
      • When omitted, a FFI unsafety warning was emitted when SIMD types used in FFI
    • Passing vectors as immediates is trickier, later resolved in rfcs#2574, so discussion focused on passing vectors indirectly over the FFI boundary
    • Discussion fizzled out, but with rust#116558, it may be possible to allow repr(C, simd)
  • rust#130402: When a type is #[repr(simd)], #[repr(align(N))] annotations are ignored

    • repr(align) ignored when repr(simd) is present
    • Intended to be fixed after rust#137256 which refactored layout logic within the compiler
      • Unclear if the fix happened, but the code from the bug report still has the unexpected alignment
  • rust#63633: Remove repr(simd) attribute and use a lang-item instead

    • Suggests using a language item for the Simd type (part of Portable SIMD) instead of using repr(simd)
    • Doesn't address what would happen for non-portable intrinsics that also use this infrastructure
    • Various other issues cited as motivation:
      • rust#18147 used Portable SIMD f64x2 and found that constant initialisers weren't optimised with -Copt-level=0
        • Not clear that this applies to types intended for use with architecture-specific intrinsics
      • rust#47103
        • See above
      • rust#53346
        • See below
      • rust#77529
        • See below
      • rust#77866 defines its own repr(simd) type and then passes it to an LLVM intrinsic binding that has been declared incorrectly
      • rust#81931 defines its own repr(simd) type and finds that it is misaligned according to recommendations for achieving best performance

On account of these concerns, scalable vectors use rustc_scalable_vector instead.

Portable SIMD-specific

A handful of architecture-agnostic issues only relate to Portable SIMD:

  • rust#126217: What should SIMD bitmasks look like?

    • Design discussion related to Portable SIMD simd_bitmask/simd_bitmask_select intrinsics
    • Not relevant to architecture-specific scalable vector intrinsics
  • rust#99211: fn where clause "constrains trait impls" or something

    • Writing extension traits with const generics which. apply to Portable SIMD types can run into tricky compiler errors related to the type system
    • Only applies to Portable SIMD
  • rust#77529: Invalid monomorphisation when -Clink-dead-code is used

    • repr(simd) types w/ generics (i.e. Portable SIMD or hand-rolled equivalents) can have invalid instantiations with -Clink-dead-code

These issues don't apply to scalable vectors.

Const-initialisation of vectors

There was a single issue related to const-initialisation of non-portable vector types:

  • rust#48745: Provide a way to const-initialise vendor-specific vector types
    • Initialisation of fixed length vectors was not possible in a const context for non-portable SIMD
    • mem::transmute being made constant has addressed this issue

This issue doesn't apply to scalable vectors as they are inherently non-const.

target_feature ABI

There have been well-documented issues with the ABI of fixed-length SIMD vectors, many of which apply to scalable vectors too, but are harder to resolve:

  • rust#44367: repr(simd) is unsound

    • repr(simd) types in functions with different target features enabled can have different ABIs
    • Fixed by passing SIMD types indirectly in rust#47743
  • rust#53346: repr(simd) is unsound in C FFI

    • Same issue as in rust#44367 but only with extern "C" functions where the Rust ABI does not apply
    • Later fixed by rust#116558
  • rust#87438: future-incompat: use of SIMD types aren't gated properly

    • Calling a extern "C" function with an SIMD vector type in a repr(C) or repr(transparent) struct doesn't error
    • Later fixed by rust#116558
  • rfcs#2574: simd_ffi

    • Permits calls to extern "C" functions with SIMD types so long as those functions have the appropriate target_feature attribute
    • Never fully implemented until rust#116558 effectively did so
  • rust#131800: Figure out which target features are required for which SIMD size

    • As part of rust#116558, solicited input in determining which target features were required for a given vector length so that the lint could check for those
  • rust#133146: How should we handle dynamic vector ABIs?

    • Follow-up to rust#131800:
      • Existing ABI compatibility checks rely on the length of the vector and the architecture to identify an appropriate target feature that must be enabled, but this approach does not scale to scalable vectors
  • rust#133144: How should we handle matrix ABIs?

  • rust#116558: The extern "C" ABI of SIMD vector types depends on target features (tracking issue for abi_unsupported_vector_types future-incompatibility lint)

    • Identified ABI incompatibility when calling extern "C" functions that used SIMD types
      • Rust passes SIMD types indirectly for functions with and without target_feature annotations in its ABI. extern "C" functions take SIMD types as immediates
      • Calls from annotated functions to extern "C" could use immediates, but calls from non-annotated functions could not. Rust did not prevent calls from non-annotated functions.
    • A abi_unsupported_vector_types future-incompatibility lint was introduced to enforce that extern "C" functions could not have SIMD types in their signatures without the appropriate target feature being enabled
      • The lint has since been removed and replaced with a hard error
      • It only triggers when such a function is called
  • rust#132865: Support calling functions with SIMD vectors that couldn't be used in the caller

    • Follow-up to rust#116558
    • There are valid calls to extern "C" functions which take SIMD types that are not currently accepted, such as checking for the presence of the target feature and then calling the extern "C" function with a newly created vector
    • It is hard to support this as it is not possible to generate a call with a specific ABI without annotating the entire containing function as having the target feature (llvm#70563)
      • This limitation also causes similar issues with inlining (rust#116573)
  • Pre-RFC: Fixing ABI for SIMD types

    • Proposes requiring appropriate target features be enabled when a x86 SIMD type is used in a function signature
      • Written primarily considering x86 SIMD
      • Considers both globally-enabled target features (e.g. -Ctarget-feature or default features from target specification) and per-function-enabled target features (#[target_feature])
      • Proposes generating shims to translate between ABIs when calling annotated functions with SIMD type arguments from non-annotated functions
        • Avoids breakage in cases similar to rust#132865 but between annotated and non-annotated functions, rather than just Rust ABI to non-Rust ABI
      • Errors will be emitted for function pointers based on the target features of the caller
    • Prompted by discussion in rust#116558
    • Never progressed to being a submitted RFC
    • Discussed on Zulip
      • How does the pre-RFC interact with Portable SIMD efforts?
        • The inherent portability of these types means that they will need a matching featureful and featureless ABI. It is suggested that this be the current indirect ABI, but this isn't seen as desirable - ABI shims or per-target-feature monomorphisation is to be explored
      • Should there be a difference in codegen for calls to function items vs function pointers (e.g. use of a shim)?
        • Suggestion that an ABI shim be used for function pointers rather than requiring target feature on functions with the call
      • Is there a proper featureless ABI for x86 SIMD types?
        • Yes, details in thread
      • Should these changes also apply to extern "Rust"?
        • Mixed opinions - enables use of performant ABI, larger breaking change

        • Concern that doing this jeopardizes the entire proposal and that Rust is stuck with the current behaviour

          I still don't like the idea I'm forced to use an FFI calling convention in pure rust code because the default is fundamentally too slow

          it's a tradeoff. should the default be portable or fast. I dont think there is an obvious right answer here. might be worth digging out the history that led to the current situation -- possibly this decision has been made in the past, in favor of "portable", and that's why the ABI works the way it does?

        • Could use the performant ABI when global target features have feature enabled

        • References later design meeting (lang-team#235)

  • lang-team#235: Design meeting: resolve ABI issues around target-feature (meeting notes)

    • Proposes property that functions with the same signature will always have the same ABI (i.e. that target features will not be considered in the ABI) and three possible fixes:
      • Track target features as part of function signatures, which is hard to do without changing function pointer syntax
        • Discussed briefly but not proposed due to concerns regarding breaking change and expectation that function pointer syntax would need changed/extended, and that it introduces a new semver hazard (adding a SIMD type field)
        • Suggested that if this route were taken then allowing target features in extern blocks would be desirable and passing SIMD types using registers could be considered
        • Did not discuss challenges related to trait methods and generic functions
        • References rust#111836
      • Define an ABI which does not depend on target features
        • i.e. as the Rust ABI today with indirect passing of SIMD types
      • Reject declaring/calling functions with target-feature-requiring types when the ABI target feature is not available/enabled
        • References Pre-RFC: Fixing ABI for SIMD types, proposing a variant of the RFC:
          • Instead of applying to all ABIs (as in the pre-RFC) and using the performant calling convention, it would only to non-Rust ABIs (e.g. extern "C") and would be based on a size of the vector to feature mapping (rather than annotating types w/ the required features)
            • This narrowly fixes the soundness issue and is the basis for the currently implemented future incompatibility warning
      • Reject enabling/disabling certain ABI-affecting features
        • Discusses this as a solution for ABI issues relating to floats
    • During the meeting, various points were discussed:
      • Should it be possible to declare (non-Rust ABI) functions which take SIMD vectors as long as there aren't calls?
        • Intended to reduce potential opportunities breakage.
      • How plausible is it to include ABI-affecting target features in function pointer types?
        • It is suggested that this information could be smuggled through the ABI name: e.g. extern "Rust+avx"
        • Concern that this is a wider breaking change and would require treating ABIs specially or would require shims
        • Feeling that this should be possible but not discussed further as an immediate solution was desired to resolve unsoundness
      • There was consensus that crater runs were needed to gather data
    • After the meeting, the language team concluded:

      We discussed this question in the T-lang design meeting on 2023-12-20. The consensus was generally in favor of fixing this, and we were interested in seeing more work in this direction. While we considered various ways that this could be fixed, we were supportive of finding the most minimal, simplest way to fix this as the first step, assuming that such an approach proves to be feasible. We'll need to see at least the results of a crater run and further analysis to confirm that feasibility.

      • It was explicitly noted in the notes that the language team didn't want to rule out considering target features as part of the ABI:

        Track the target features in the function signature. This would basically mean that function pointers now have to also list the set of ABI-relevant target features that were enabled. This would be a rather fundamental change requiring new function pointer syntax, and hard to do without breaking code, so we mention it only for completeness' sake.

        I don't think we should rule this out for the future. We've already talked about being able to track calling-convention ABI (extern "C" vs other ABIs) in function pointers somehow, so that we can safely track which kind of function we have. We've also talked about having this work in generics somehow, so that the Fn traits have a (defaulted) parameter for ABI or similar, and the monomorphization of a call will call the right ABI.

See ABI for discussion of these challenges as they apply to scalable vectors.

Multiversioning and effects

There are ongoing efforts related to improving Rust's SIMD support:

  • rust-project-goals#261: Nightly support for ergonomic SIMD multiversioning

    • Generating efficient code for specific SIMD ISAs requires target_feature attributes on functions, which isn't particularly ergonomic
      • Need to do runtime checks then dispatch to functions with target features
        • Must be repeated when leaving and entering these functions
      • Intermediate functions use inline(always) to avoid having to have different versions for each target feature, which impacts code size
    • Various solutions have been proposed - witness types carrying target feature information, inherited target features from callers, features being const generic arguments
    • Goal aims to explore design space and experiment
  • lang-team#309: SIMD multiversioning (pre-read)

    • Design meeting proposal, has not yet taken place
    • Compares two related proposals that address some of the problems with multiversioning
      • Ideas similar to rfcs#3525
        • In brief: attaching target features to types, introduce traits that abstract over common operations, functions are generic over those traits and when instantiated with a function-carrying type, inherit the target feature and use the trait methods to do SIMD operations
      • Ideas similar to unopened contextual target features RFC
        • In brief: #[target_features(caller)] which causes a function to inherit the features of the caller
        • Expands on this with a #[target_features(generic)] attribute which takes target features from the first const generic argument of the function (i.e. a const FEATURES: str)
    • Both ideas have many open design questions
  • rust#143352: Tracking issue for Effective Target Features

    • Initial proposal aims to experiment with SIMD multiversioning based on the effect model used with const traits
    • In brief: traits can be defined as having a target feature effect, implementations of those traits define the target feature that is enabled by the effect, bounds on the trait in functions will enable the target feature from the impl
  • lang-team#317: Design meeting: "Marker effects" (meeting notes)

    • Discusses the findings from investigations into keyword generics and the implementation of const traits, proposing a categorisation of effects and a subset to focus on initially
    • Briefly discusses that effects overlap with the SIMD multiversioning efforts

These efforts are followed with interest as they may synergise well with resolving the similar challenges that scalable vectors face. See Trait implementations and generic instantiation.

Unresolved questions

There is one outstanding unresolved question for scalable vectors:

Future possibilities

There are a handful of future possibilities enabled by this RFC - relaxing restrictions, architecture-agnostic use or extending the feature to support more features of the architecture extensions:

Trait implementations and generic instantiation

Improvements to the language's target_feature infrastructure could enable the restrictions on trait implementations and generic instantiation to be lifted:

  • Some variety of rfcs#3820: target_feature_traits could help traits be implemented on scalable vectors

  • Efforts to integrate target features with the effect system (rust#143352) may help enable generic instantiation of scalable vectors

    • Any mechanism that could be applied to scalable vector types could also be used to enforce that existing SIMD types are only used in target_feature-annotated functions, which would enable fixed-length vectors to be passed as immediates, improving performance
  • It may be possible to support scalable vector types without the target feature being enabled by using an indirect ABI similarly to fixed length vectors.

    • This would enable these restrictions to be lifted and for scalable vector types to be the same as fixed length vectors with respect to interactions with the target_feature attribute.

      • As with fixed length vectors, it would still be desirable for them to avoid needing to be passed indirectly between annotated functions, but this could be addressed in a follow-up.
    • Experimentation is required to determine if this is feasible.

Compound types

The restriction that scalable vectors cannot be used in compound types could be relaxed at a later time either by extending rustc's codegen or leveraging newly added support in LLVM.

However, as C also has this restriction and scalable vectors are nevertheless used in production code, it is unlikely there will be much demand for those restrictions to be relaxed in LLVM.

RISC-V Vector Extension's tuple types

As explained in Manually-chosen or compiler-calculated element count, there is a distinction in RVV between vectors which would have the same N in a scalable vector type, but which vary in LMUL and NFIELD.

For example, vint32m2_t and vint32m1x2_t, if lowered to scalable vector types in LLVM, would both be <vscale x 4 x i32>.

RVV's tuple types need to be lowered to target-specific types in the backend which is out-of-scope of this general infrastructure for scalable vectors.

Portable SIMD

Given that there are significant differences between scalable vectors and fixed-length vectors, and that std::simd is unstable, it is worth experimenting with architecture-specific support and implementation initially. Later, there are a variety of approaches that could be taken to incorporate support for scalable vectors into Portable SIMD.