[BPF] Consider alignment when selecting memcpy/memmove/memset op type#204042
[BPF] Consider alignment when selecting memcpy/memmove/memset op type#204042yonghong-song wants to merge 1 commit into
Conversation
Currently getOptimalMemOpType() forces i64 for size >= 8 and i32 otherwise without considering the operation alignment. In [1], memcpy and memmove destination and source alignments are passed separately instead of being collapsed to their minimum. As a result, poor code may be generated for a copy with a well aligned destination but a poorly aligned source: a wide store ends up fed by a wide load that the legalizer has to synthesize from byte loads via a chain of shifts and ORs. Return MVT::Other from getOptimalMemOpType() so that findOptimalMemOpLowering() in SelectionDAG/TargetLowering.cpp selects a proper value type and alignment. Newly added memcpy-align.ll uses u8 type for load and stores. Without BPFISelLowering.h change, a chain of shifts and ORs will happen. The rodata_1.ll change is only instruction scheduling / register allocation churn caused by the different but equivalent lowering. [1] llvm#201119
|
In BPFSelectionDAGInfo.cpp, we have So the default value for MaxStoresPerMemcpy/MaxStoresPerMemset etc. is 128. For example, for alignment 8, the maximum to-be-copied memory size is 1024. I attempt to think whether we should implement findOptimalMemOpLowering() in bpf backend so for alignment 1, we can set MaxStoresPerMemcpy to be 1024, so the same number of maximum memory is used for alignment 1 and alignment 8. But I checked But if we increase alignment 1 to say alignment 8 for bpf source code, it won't be a problem. So probably findOptimalMemOpLowering() is not really needed. Your opinion is appreciated. |
Currently getOptimalMemOpType() forces i64 for size >= 8 and i32 otherwise without considering the operation alignment. In [1], memcpy and memmove destination and source alignments are passed separately instead of being collapsed to their minimum. As a result, poor code may be generated for a copy with a well aligned destination but a poorly aligned source: a wide store ends up fed by a wide load that the legalizer has to synthesize from byte loads via a chain of shifts and ORs.
Return MVT::Other from getOptimalMemOpType() so that findOptimalMemOpLowering() in SelectionDAG/TargetLowering.cpp selects a proper value type and alignment.
Newly added memcpy-align.ll uses u8 type for load and stores. Without BPFISelLowering.h change, a chain of shifts and ORs will happen.
The rodata_1.ll change is only instruction scheduling / register allocation churn caused by the different but equivalent lowering.
[1] #201119