Skip to content

[BPF] Consider alignment when selecting memcpy/memmove/memset op type#204042

Open
yonghong-song wants to merge 1 commit into
llvm:mainfrom
yonghong-song:fix-selection-dag
Open

[BPF] Consider alignment when selecting memcpy/memmove/memset op type#204042
yonghong-song wants to merge 1 commit into
llvm:mainfrom
yonghong-song:fix-selection-dag

Conversation

@yonghong-song

Copy link
Copy Markdown
Contributor

Currently getOptimalMemOpType() forces i64 for size >= 8 and i32 otherwise without considering the operation alignment. In [1], memcpy and memmove destination and source alignments are passed separately instead of being collapsed to their minimum. As a result, poor code may be generated for a copy with a well aligned destination but a poorly aligned source: a wide store ends up fed by a wide load that the legalizer has to synthesize from byte loads via a chain of shifts and ORs.

Return MVT::Other from getOptimalMemOpType() so that findOptimalMemOpLowering() in SelectionDAG/TargetLowering.cpp selects a proper value type and alignment.

Newly added memcpy-align.ll uses u8 type for load and stores. Without BPFISelLowering.h change, a chain of shifts and ORs will happen.

The rodata_1.ll change is only instruction scheduling / register allocation churn caused by the different but equivalent lowering.

[1] #201119

Currently getOptimalMemOpType() forces i64 for size >= 8 and i32 otherwise
without considering the operation alignment. In [1], memcpy and memmove
destination and source alignments are passed separately instead of being
collapsed to their minimum. As a result, poor code may be generated for a
copy with a well aligned destination but a poorly aligned source: a wide
store ends up fed by a wide load that the legalizer has to synthesize from
byte loads via a chain of shifts and ORs.

Return MVT::Other from getOptimalMemOpType() so that findOptimalMemOpLowering()
in SelectionDAG/TargetLowering.cpp selects a proper value type and alignment.

Newly added memcpy-align.ll uses u8 type for load and stores. Without
BPFISelLowering.h change, a chain of shifts and ORs will happen.

The rodata_1.ll change is only instruction scheduling / register allocation
churn caused by the different but equivalent lowering.

  [1] llvm#201119
@yonghong-song

yonghong-song commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

In BPFSelectionDAGInfo.cpp, we have

static cl::opt<unsigned> BPFMaxStoresPerMemFunc(
    "bpf-max-stores-per-memfunc", cl::Hidden, cl::init(128),
    cl::desc("Set the maximum number of stores for inlined BPF memory "
             "intrinsics"));

So the default value for MaxStoresPerMemcpy/MaxStoresPerMemset etc. is 128.
The '128' is for the number of actual copy.

    // inline memcpy() for kernel to see explicit copy
    unsigned CommonMaxStores =
      STI.getSelectionDAGInfo()->getCommonMaxStoresPerMemFunc();

    MaxStoresPerMemset = MaxStoresPerMemsetOptSize = CommonMaxStores;
    MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = CommonMaxStores;
    MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = CommonMaxStores;
    MaxLoadsPerMemcmp = MaxLoadsPerMemcmpOptSize = CommonMaxStores;

For example, for alignment 8, the maximum to-be-copied memory size is 1024.
But for alignment 1, the maximum to-be-copied memory size is 128.

I attempt to think whether we should implement findOptimalMemOpLowering() in bpf backend so for alignment 1, we can set MaxStoresPerMemcpy to be 1024, so the same number of maximum memory is used for alignment 1 and alignment 8. But I checked
other architectures (AArch64 and SystemZ) and their usage didn't change 'limit' (count).

But if we increase alignment 1 to say alignment 8 for bpf source code, it won't be a problem. So probably findOptimalMemOpLowering() is not really needed. Your opinion is appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants