Implement all x86 vendor intrinsics

This is intended to be a tracking issue for implementing all vendor intrinsics in this repository.
This issue is also intended to be a guide for documenting the process of adding new vendor intrinsics to this crate.

If you decide to implement a set of vendor intrinsics, please check the list below to make sure somebody else isn't already working on them. If it's not checked off or has a name next to it, feel free to comment that you'd like to implement it!

At a high level, each vendor intrinsic should correspond to a single exported Rust function with an appropriate `target_feature` attribute. Here's an example for `_mm_adds_epi16`:

```rust
/// Add packed 16-bit integers in `a` and `b` using saturation.
#[inline]
#[target_feature(enable = "sse2")]
#[cfg_attr(test, assert_instr(paddsw))]
pub unsafe fn _mm_adds_epi16(a: __m128i, b: __m128i) -> __m128i {
 unsafe { paddsw(a, b) }
}
```

Let's break this down:

* The `#[inline]` is added because vendor intrinsic functions generally should always be inlined because the intent of a vendor intrinsic is to correspond to a single particular CPU instruction. A vendor intrinsic that is compiled into an actual function call could be quite disastrous for performance.
* The `#[target_feature(enable = "sse2")]` attribute intructs the compiler to generate code with the `sse2` target feature enabled, *regardless* of the target platform. That is, even if you're compiling for a platform that doesn't support `sse2`, the compiler will still generate code for `_mm_adds_epi16` *as if* `sse2` support existed. Without this attribute, the compiler might not generate the intended CPU instruction.
* The `#[cfg_attr(test, assert_instr(paddsw))]` attribute indicates that when we're testing the crate we'll assert that the `paddsw` instruction is generated inside this function, ensuring that the SIMD intrinsic truly is an intrinsic for the instruction!
* The types of the vectors given to the intrinsic should match exactly the types as provided in the vendor interface. (with things like `int64_t` translated to `i64` in Rust)
* The implementation of the vendor intrinsic is generally very simple. Remember, the goal is to compile a call to `_mm_adds_epi16` down to a single particular CPU instruction. As such, the implementation typically defers to a compiler intrinsic (in this case, `paddsw`) when one is available. More on this below as well.
* The intrinsic itself is `unsafe` due to the usage of `#[target_feature]`

Once a function has been added, you should also add at least one test for basic functionality. Here's an example for `_mm_adds_epi16`:

```rust
#[simd_test = "sse2"]
unsafe fn test_mm_adds_epi16() {
 let a = _mm_set_epi16(0, 1, 2, 3, 4, 5, 6, 7);
 let b = _mm_set_epi16(8, 9, 10, 11, 12, 13, 14, 15);
 let r = _mm_adds_epi16(a, b);
 let e = _mm_set_epi16(8, 10, 12, 14, 16, 18, 20, 22);
 assert_eq_m128i(r, e);
}
```

Note that `#[simd_test]` is the same as `#[test]`, it's just a custom macro to enable the target feature in the test and generate a wrapper for ensuring the feature is available on the local cpu as well.

Finally, once that's done, send a PR!

## Writing the implementation

An implementation of an intrinsic (so far) generally has one of three shapes:

1. The vendor intrinsic does not have any corresponding compiler intrinsic, so you must write the implementation in such a way that the compiler will recognize it and produce the desired codegen. For example, the `_mm_add_epi16` intrinsic (note the missing `s` in `add`) is implemented via `simd_add(a, b)`, which compiles down to LLVM's cross platform SIMD vector API.
2. The vendor intrinsic *does* have a corresponding compiler intrinsic, so you must write an `extern` block to bring that intrinsic into scope and then call it. The example above (`_mm_adds_epi16`) uses this approach.
3. The vendor intrinsic has a parameter that must be a *constant* value when given to the CPU instruction, where that constant is often a parameter that impacts the operation of the intrinsic. This means the implementation of the vendor intrinsic must guarantee that a particular parameter be a constant. This is tricky because Rust doesn't (yet) have a stable way of doing this, so we have to do it ourselves. How you do it can vary, but one particularly gnarly example is [`_mm_cmpestri`](https://github.com/BurntSushi/stdsimd/blob/ff6021b72e8cc1e7db942847d99278fe0056c245/src/x86/sse42.rs#L286) (make sure to look at the `constify_imm8!` macro).

## References

All intel intrinsics can be found here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5236

The compiler intrinsics available to us through LLVM can be found here: https://gist.github.com/anonymous/a25d3e3b4c14ee68d63bd1dcb0e1223c

The Intel vendor intrinsic API can be found here: https://gist.github.com/anonymous/25d752fda8521d29699a826b980218fc

The Clang header files for vendor intrinsics can also be incredibly useful. When in doubt, Do What Clang Does:
https://github.com/llvm-mirror/clang/tree/master/lib/Headers


## TODO


<details><summary>["AVX2"]</summary>

 * [ ] [`_mm256_stream_load_si256`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_stream_load_si256&expand=5236)
 * [ ] [`_mm_broadcastsi128_si256`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_broadcastsi128_si256&expand=5236)
</details>

<details><summary>["MMX"]</summary>

 * [ ] [`_mm_srli_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_pi16&expand=5236)
 * [ ] [`_mm_srl_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_pi16&expand=5236)
 * [ ] [`_mm_mullo_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mullo_pi16&expand=5236)
 * [ ] [`_mm_slli_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_si64&expand=5236)
 * [ ] [`_mm_mulhi_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mulhi_pi16&expand=5236)
 * [ ] [`_mm_srai_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srai_pi16&expand=5236)
 * [ ] [`_mm_srli_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_si64&expand=5236)
 * [ ] [`_mm_and_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_and_si64&expand=5236)
 * [ ] [`_mm_cvtsi32_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi32_si64&expand=5236)
 * [ ] [`_mm_cvtm64_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtm64_si64&expand=5236)
 * [ ] [`_mm_andnot_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_andnot_si64&expand=5236)
 * [ ] [`_mm_packs_pu16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_packs_pu16&expand=5236)
 * [ ] [`_mm_madd_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_madd_pi16&expand=5236)
 * [ ] [`_mm_cvtsi64_m64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi64_m64&expand=5236)
 * [ ] [`_mm_cmpeq_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi16&expand=5236)
 * [ ] [`_mm_sra_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sra_pi32&expand=5236)
 * [ ] [`_mm_cvtsi64_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi64_si32&expand=5236)
 * [ ] [`_mm_cmpeq_pi8`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi8&expand=5236)
 * [ ] [`_mm_srai_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srai_pi32&expand=5236)
 * [ ] [`_mm_sll_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_pi16&expand=5236)
 * [ ] [`_mm_srli_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_pi32&expand=5236)
 * [ ] [`_mm_slli_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_pi16&expand=5236)
 * [ ] [`_mm_srl_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_si64&expand=5236)
 * [ ] [`_mm_empty`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_empty&expand=5236)
 * [ ] [`_mm_srl_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_pi32&expand=5236)
 * [ ] [`_mm_slli_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_pi32&expand=5236)
 * [ ] [`_mm_or_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_or_si64&expand=5236)
 * [ ] [`_mm_sll_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_si64&expand=5236)
 * [ ] [`_mm_sra_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sra_pi16&expand=5236)
 * [ ] [`_mm_sll_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_pi32&expand=5236)
 * [ ] [`_mm_xor_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_xor_si64&expand=5236)
 * [ ] [`_mm_cmpeq_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi32&expand=5236)
</details>

<details><summary>["SSE"]</summary>

 * [ ] [`_mm_free`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_free&expand=5236)
 * [ ] [`_mm_storeu_si16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si16&expand=5236)
 * [ ] [`_mm_loadu_si16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si16&expand=5236)
 * [x] [`_mm_loadu_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si64&expand=5236)
 * [ ] [`_mm_malloc`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_malloc&expand=5236)
 * [ ] [`_mm_storeu_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si64&expand=5236)
</details>


<details><summary>["SSE2"]</summary>

 * [ ] [`_mm_loadu_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si32&expand=5236)
 * [ ] [`_mm_storeu_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si32&expand=5236)
</details>


<details><summary>["SSE4.1"]</summary>

 * [ ] [`_mm_stream_load_si128`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_stream_load_si128&expand=5236)
</details>


---

[previous description of this issue](https://gist.github.com/alexcrichton/58838cc127838da9d9584446b95aa1b4)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement all x86 vendor intrinsics #40

Writing the implementation

References

TODO

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Implement all x86 vendor intrinsics #40

Description

Writing the implementation

References

TODO

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions