Commit f3481df

Add comparison results of DFT performance on macbook pro. (#668)
1 parent 5533114 commit f3481df

5 files changed: +32 -10 lines changed

Jenkinsfile

Lines changed: 5 additions & 5 deletions
@@ -151,19 +151,19 @@ pipeline {
 }
 }
 
-stage('aarch64 linux clang-18') {
+stage('aarch64 linux clang-19') {
 agent { label 'aarch64 && ubuntu24 && apple' }
 options { skipDefaultCheckout() }
 steps {
 cleanWs()
 checkout scm
 sh '''
-echo "aarch64 clang-18 on" `hostname`
-export CC=clang-18
-export CXX=clang++-18
+echo "aarch64 clang-19 on" `hostname`
+export CC=clang-19
+export CXX=clang++-19
 mkdir build
 cd build
-cmake .. -GNinja -DCMAKE_INSTALL_PREFIX=../../install -DSLEEF_SHOW_CONFIG=1 -DSLEEF_BUILD_DFT=TRUE -DSLEEF_ENFORCE_DFT=TRUE -DSLEEF_BUILD_QUAD=TRUE -DSLEEF_BUILD_INLINE_HEADERS=TRUE -DSLEEF_ENFORCE_SVE=TRUE -DEMULATOR=qemu-aarch64-static -DSLEEF_ENFORCE_TESTER4=True -DSLEEF_ENABLE_TESTER=False
+cmake .. -GNinja -DCMAKE_INSTALL_PREFIX=../../install -DSLEEF_SHOW_CONFIG=1 -DSLEEF_BUILD_DFT=TRUE -DSLEEF_ENFORCE_DFT=TRUE -DSLEEF_BUILD_QUAD=TRUE -DSLEEF_BUILD_INLINE_HEADERS=TRUE -DSLEEF_ENFORCE_SVE=TRUE -DEMULATOR=qemu-aarch64-static -DSLEEF_ENFORCE_TESTER4=True -DSLEEF_ENABLE_TESTER=False -DSLEEF_ENABLE_LTO=True -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld-19"
 cmake -E time oomstaller ninja -j `nproc`
 export CTEST_OUTPUT_ON_FAILURE=TRUE
 ctest -j `nproc`
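
Reviewer note: the Jenkinsfile change touches the compiler version in five places (stage name, echo line, `CC`, `CXX`, and the `lld` suffix) and enables LTO together with lld. Presumably the linker is switched because clang's LTO object files contain LLVM bitcode and need an LTO-aware linker such as lld. A minimal sketch of deriving those settings from a single version number is shown below. The names `toolchain_settings` and `CLANG_VER` are hypothetical and not part of the SLEEF repository.

```shell
# Hypothetical helper, not part of the SLEEF repository.
# Derives the version-dependent settings used in the CI stage
# from one clang major version number.
CLANG_VER=${CLANG_VER:-19}

toolchain_settings() {
  ver="$1"
  # Environment variables exported before running cmake.
  echo "CC=clang-${ver}"
  echo "CXX=clang++-${ver}"
  # cmake arguments added by this commit: LTO plus the matching lld.
  echo "-DSLEEF_ENABLE_LTO=True"
  echo "-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld-${ver}"
}

toolchain_settings "$CLANG_VER"
```

With such a helper, a future bump to another clang release would only need the value of `CLANG_VER` to change, which keeps the compiler and linker versions in step.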

docs/5-performance/README.md

Lines changed: 25 additions & 2 deletions
@@ -96,16 +96,39 @@ FFTW_MEASURE mode, respectively.
 
 <p style="text-align:center; margin-bottom:2cm;">
 <a class="nothing" href="../img/dftzen4dp.png">
-<img src="../img/dftzen4dp.png" alt="Performance graph for DP DFT"/>
+<img src="../img/dftzen4dp.png" alt="Performance graph for DP DFT on Ryzen 9 7950X"/>
 </a>
 <br />
 Fig. 6.5: Performance of transform in double precision
 </p>
 
 <p style="text-align:center; margin-bottom:2cm;">
 <a class="nothing" href="../img/dftzen4sp.png">
-<img src="../img/dftzen4sp.png" alt="Performance graph for SP DFT"/>
+<img src="../img/dftzen4sp.png" alt="Performance graph for SP DFT on Ryzen 9 7950X"/>
 </a>
 <br />
 Fig. 6.6: Performance of transform in single precision
 </p>
+
+Below are the results of the comparison on an M1 MacBook Pro with the
+following settings.
+
+* OS : Ubuntu 24.04, Linux 6.8.0-1011-asahi-arm
+* Compiler : Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2)
+* FFTW version 3.3.10-1ubuntu3
+
+<p style="text-align:center; margin-bottom:2cm;">
+<a class="nothing" href="../img/dftm1dp.png">
+<img src="../img/dftm1dp.png" alt="Performance graph for DP DFT on M1"/>
+</a>
+<br />
+Fig. 6.7: Performance of transform in double precision
+</p>
+
+<p style="text-align:center; margin-bottom:2cm;">
+<a class="nothing" href="../img/dftm1sp.png">
+<img src="../img/dftm1sp.png" alt="Performance graph for SP DFT on M1"/>
+</a>
+<br />
+Fig. 6.8: Performance of transform in single precision
+</p>
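
Reviewer note: the settings list added to the performance README (OS, compiler, FFTW version) could be collected on the benchmark machine with a short script, which makes the numbers easier to reproduce. The sketch below is only an illustration under stated assumptions: the compiler binary name `clang-19` and the Debian/Ubuntu package name `libfftw3-dev` are guesses, the helper names are hypothetical, and any value that cannot be queried is reported as `unknown`.

```shell
# Hypothetical helpers, not part of the SLEEF repository.

first_line_or_unknown() {
  # Run the given command; print the first line of its output,
  # or "unknown" if the command is missing or prints nothing.
  out=$("$@" 2>/dev/null | head -n 1)
  if [ -n "$out" ]; then
    echo "$out"
  else
    echo "unknown"
  fi
}

describe_env() {
  # Same bullet layout as the settings list in the README.
  echo "* OS : $(first_line_or_unknown uname -sr)"
  # Assumption: the compiler is installed as clang-19.
  echo "* Compiler : $(first_line_or_unknown clang-19 --version)"
  # Assumption: FFTW comes from the libfftw3-dev package (dpkg-based systems).
  echo "* FFTW version $(first_line_or_unknown dpkg-query -W -f '${Version}\n' libfftw3-dev)"
}

describe_env
```

Note that `uname -sr` reports only the kernel name and release, so the distribution name in the README's OS line would still need to be added by hand.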

docs/README.md

Lines changed: 2 additions & 3 deletions
@@ -77,9 +77,8 @@ and multiple cores can be utilized for efficient computation. It has an API
 similar to that of FFTW for easy migration. The subroutines can utilize long
 vectors up to 2048 bits. The helper files for abstracting SIMD intrinsics are
 shared with SLEEF libm, and thus it is easy to port the DFT subroutines to other
-architectures. [Preliminary results of
-benchmark](https://github.com/shibatch/sleef/wiki/DFT-Performance) are now
-available.
+architectures. [Benchmark results of the DFT subroutines](5-performance) are
+now available.
 
 <h2 id="environment">Supported environments</h2>

docs/img/dftm1dp.png

36.2 KB

docs/img/dftm1sp.png

34.6 KB
