cpu: rv64: support RVV f16 k3s1 and k3s2 nhwc dwconv by xinghai-zh · Pull Request #5345 · uxlfoundation/oneDNN

xinghai-zh · 2026-06-17T06:46:47Z

Description

This PR adds an RV64 RVV implementation for f16 depthwise convolution.

The new primitive covers forward 2D depthwise convolution with:

f16 source, weights, and destination
optional f16 or f32 bias
grouped depthwise layout where IC == G
3x3 kernel, stride 1 or stride 2, dilation 1
NHWC source and destination tensors
GOIHW weights

The implementation adds JIT RVV kernels for the 3x3 f16 depthwise cases, packs NHWC input and GOIHW weights into the layout expected by the kernels, and registers the primitive before the reference f16 convolution implementation. The stride-1 and stride-2 kernels share the common epilogue path for bias, f32-to-f16 narrowing, and output stores; stride-specific input loading and FMA scheduling are generated separately.

Performance

Benchmarks were run on Spacemit X100 (K3).

Performance was measured with MobileNet depthwise f16 benchdnn cases. The baseline is the latest upstream repository version before this change.

Case	Impl	Ref avg time	New avg time	Speedup
mobilenet:conv2_1/dw	dw_k3s1s2:rvv	21.8489	2.16469	10.09x
mobilenet:conv2_2/dw	dw_k3s1s2:rvv	11.5791	2.97556	3.89x
mobilenet:conv3_1/dw	dw_k3s1s2:rvv	24.4496	1.62424	15.05x
mobilenet:conv3_2/dw	dw_k3s1s2:rvv	8.19451	1.26879	6.46x
mobilenet:conv4_1/dw	dw_k3s1s2:rvv	10.9909	0.869389	12.64x
mobilenet:conv4_2/dw	dw_k3s1s2:rvv	2.98457	0.685749	4.35x
mobilenet:conv5_1/dw	dw_k3s1s2:rvv	5.46905	0.671989	8.14x
mobilenet:conv5_6/dw	dw_k3s1s2:rvv	1.53196	0.515312	2.97x
mobilenet:conv6/dw	dw_k3s1s2:rvv	2.78519	0.711036	3.92x

The MobileNet depthwise cases are now covered by the RVV primitive and no longer fall back to ref:any.

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

Performance improvements

Have you submitted performance data that demonstrates performance improvements?

zhangfeiv0 · 2026-06-17T07:26:13Z

Regarding the development plan for RV64, I think file naming should be unified under the prefix jit_uni*. Including rvv in the names is confusing, since the code actually also references the Zvfh extension and may introduce more extensions in the future. In addition, the CPU_INSTANCE_RV64GCV macro should evolve toward CPU_INSTANCE_RV64; you can check #5343 for details.

zhangjian29 · 2026-06-17T08:29:31Z

+    const dim_t padded_w = iw + l_pad + r_pad;
+    const dim_t stride_h = cd->strides[0];
+
+    std::vector<float16_t> packed_weights;


Recommend to use scratchpad instead of heap allocation in oneDNN.

Thanks for your recommendation. We have updated the temporary buffers in jit_uni_dwconv to use oneDNN scratchpad instead of heap allocations.

zhangjian29 · 2026-06-17T08:33:54Z

+            VDISPATCH_CONV(dilate_h == 0 && dilate_w == 0, VERBOSE_BAD_DIM,
+                    "dilates", 0);
+


Do we need a check on whether padding matches the kernel's zero-fill assumption? If padding is asymmetric (e.g., t_pad != l_pad), the packing code would produce incorrect results silently.

Thanks for the reminder. We re-examined the implementation and performed additional verification. Based on our current understanding, additional checks related to padding does not appear to be necessary.

The implementation does not assume t_pad == l_pad or otherwise require symmetric padding. In the packing path, a buffer of size (ih + t_pad + b_pad) x (iw + l_pad + r_pad) is allocated and zero-initialized, and the source tensor is copied into the region starting at (h + t_pad, w + l_pad). The JIT kernel then operates on this zero-padded buffer.

We also validated this behavior with benchdnn correctness tests. All of the following cases selected jit_dw:uni and passed successfully:

Scenario Descriptor Result

No padding g8mb1ic8ih8oc8oh6kh3ph0iw8ow6kw3pw0 Passed

Symmetric padding g8mb1ic8ih8oc8oh8kh3ph1iw8ow8kw3pw1 Passed

Top-only padding g8mb1ic8ih8oc8oh7kh3ph1iw8ow6kw3pw0 Passed

Bottom-only padding g8mb1ic8ih8oc8oh7kh3ph0iw8ow6kw3pw0 Passed

Left-only padding g8mb1ic8ih8oc8oh6kh3ph0iw8ow7kw3pw1 Passed

Right-only padding g8mb1ic8ih8oc8oh6kh3ph0iw8ow7kw3pw0 Passed

Small asymmetric padding, with bottom/right padding inferred by benchdnn g8mb1ic8ih8oc8oh9kh3ph1iw8ow9kw3pw1 Passed

Stride-2 asymmetric padding g8mb1ic8ih8oc8oh5kh3sh2ph1iw8ow5kw3sw2pw1 Passed

Large top/left padding g8mb1ic8ih64oc8oh92kh3ph30iw64ow102kw3pw40 Passed

Large bottom/right padding, inferred from the output shape g8mb1ic8ih64oc8oh92kh3ph0iw64ow102kw3pw0 Passed

Large asymmetric padding on both sides g8mb1ic8ih64oc8oh132kh3ph30iw64ow132kw3pw30 Passed

Stride-2 with large asymmetric padding g8mb1ic8ih64oc8oh42kh3sh2ph10iw64ow44kw3sw2pw12 Passed

Thanks for the comment!

xinghai-zh · 2026-06-17T08:37:21Z

Regarding the development plan for RV64, I think file naming should be unified under the prefix jit_uni*. Including rvv in the names is confusing, since the code actually also references the Zvfh extension and may introduce more extensions in the future. In addition, the CPU_INSTANCE_RV64GCV macro should evolve toward CPU_INSTANCE_RV64; you can check #5343 for details.

Thanks for the suggestion. We have already consolidated the current k3 implementation into jit_uni_dwconv.hpp/cpp following the gnorm style.
Looking ahead, if we add k5/k7 depthwise kernels later, would it still be preferable to keep all kernel implementations in jit_uni_dwconv.cpp? Or would it also be acceptable to split them into separate files such as jit_uni_dwconv_k{3,5,7}_kernel.cpp, as long as the naming remains under the jit_uni_dwconv* prefix?
cc @zhangfeiv0

Ga1axy0 added 3 commits June 17, 2026 13:44

cpu: rv64: add RVV f16 nhwc k3s1 depthwise convolution

e15a7bc

cpu: rv64: fix dwconv verbose checks

db7fc78

cpu: rv64: support RVV f16 k3s2 nhwc dwconv

0d97c32

xinghai-zh requested review from a team as code owners June 17, 2026 06:46

github-actions Bot added platform:cpu-rv64 RISC-V component:common labels Jun 17, 2026

cpu: rv64: fix clang format error in cpu_convolution_list.cpp

9c5f086

zhangjian29 reviewed Jun 17, 2026

View reviewed changes

cpu: rv64: refactor dwconv into unify style

4617a9c

xinghai-zh closed this Jun 17, 2026

xinghai-zh reopened this Jun 17, 2026

cpu: rv64: support scatchpad in dwconv

26d1cf2

zhangjian29 approved these changes Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cpu: rv64: support RVV f16 k3s1 and k3s2 nhwc dwconv#5345

cpu: rv64: support RVV f16 k3s1 and k3s2 nhwc dwconv#5345
xinghai-zh wants to merge 6 commits into
uxlfoundation:mainfrom
spacemit-com:upstream-spacemit-ops-parallel

xinghai-zh commented Jun 17, 2026

Uh oh!

zhangfeiv0 commented Jun 17, 2026

Uh oh!

zhangjian29 Jun 17, 2026

Uh oh!

xinghai-zh Jun 17, 2026

Uh oh!

zhangjian29 Jun 17, 2026

Uh oh!

xinghai-zh Jun 17, 2026

Uh oh!

zhangjian29 Jun 17, 2026

Uh oh!

xinghai-zh commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		VDISPATCH_CONV(dilate_h == 0 && dilate_w == 0, VERBOSE_BAD_DIM,
		"dilates", 0);

Scenario	Descriptor	Result
No padding	`g8mb1ic8ih8oc8oh6kh3ph0iw8ow6kw3pw0`	Passed
Symmetric padding	`g8mb1ic8ih8oc8oh8kh3ph1iw8ow8kw3pw1`	Passed
Top-only padding	`g8mb1ic8ih8oc8oh7kh3ph1iw8ow6kw3pw0`	Passed
Bottom-only padding	`g8mb1ic8ih8oc8oh7kh3ph0iw8ow6kw3pw0`	Passed
Left-only padding	`g8mb1ic8ih8oc8oh6kh3ph0iw8ow7kw3pw1`	Passed
Right-only padding	`g8mb1ic8ih8oc8oh6kh3ph0iw8ow7kw3pw0`	Passed
Small asymmetric padding, with bottom/right padding inferred by `benchdnn`	`g8mb1ic8ih8oc8oh9kh3ph1iw8ow9kw3pw1`	Passed
Stride-2 asymmetric padding	`g8mb1ic8ih8oc8oh5kh3sh2ph1iw8ow5kw3sw2pw1`	Passed
Large top/left padding	`g8mb1ic8ih64oc8oh92kh3ph30iw64ow102kw3pw40`	Passed
Large bottom/right padding, inferred from the output shape	`g8mb1ic8ih64oc8oh92kh3ph0iw64ow102kw3pw0`	Passed
Large asymmetric padding on both sides	`g8mb1ic8ih64oc8oh132kh3ph30iw64ow132kw3pw30`	Passed
Stride-2 with large asymmetric padding	`g8mb1ic8ih64oc8oh42kh3sh2ph10iw64ow44kw3sw2pw12`	Passed

Conversation

xinghai-zh commented Jun 17, 2026

Description

Performance

Checklist

General

Performance improvements

Uh oh!

zhangfeiv0 commented Jun 17, 2026

Uh oh!

zhangjian29 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

xinghai-zh Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

zhangjian29 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

xinghai-zh Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

zhangjian29 Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

xinghai-zh commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants