Intel Alder Lake i7-1280P
For a single Golden Cove (big) core:
$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI | INT8 | 590.31 GOPS |
| AVX_VNNI | INT16 | 295.06 GOPS |
| FMA | FP32 | 149.87 GFLOPS |
| FMA | FP64 | 74.931 GFLOPS |
| AVX | FP32 | 112.39 GFLOPS |
| AVX | FP64 | 56.203 GFLOPS |
| SSE | FP32 | 56.054 GFLOPS |
| SSE | FP64 | 28.001 GFLOPS |
--------------------------------------------------
For multiple big cores:
$ ./cpufp --thread_pool=[0,2,4,6,8,10]
Number Threads: 6
Thread Pool Binding: 0 2 4 6 8 10
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI | INT8 | 2636.8 GOPS |
| AVX_VNNI | INT16 | 1319.1 GOPS |
| FMA | FP32 | 670.05 GFLOPS |
| FMA | FP64 | 335 GFLOPS |
| AVX | FP32 | 502.4 GFLOPS |
| AVX | FP64 | 251.2 GFLOPS |
| SSE | FP32 | 250.42 GFLOPS |
| SSE | FP64 | 125.16 GFLOPS |
--------------------------------------------------
For a single Gracemont (little) core:
$ ./cpufp --thread_pool=[12]
Number Threads: 1
Thread Pool Binding: 12
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI | INT8 | 114.89 GOPS |
| AVX_VNNI | INT16 | 57.445 GOPS |
| FMA | FP32 | 57.444 GFLOPS |
| FMA | FP64 | 28.723 GFLOPS |
| AVX | FP32 | 28.723 GFLOPS |
| AVX | FP64 | 14.362 GFLOPS |
| SSE | FP32 | 28.312 GFLOPS |
| SSE | FP64 | 14.361 GFLOPS |
--------------------------------------------------
For multiple little cores:
$ ./cpufp --thread_pool=[12-19]
Number Threads: 8
Thread Pool Binding: 12 13 14 15 16 17 18 19
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI | INT8 | 867.99 GOPS |
| AVX_VNNI | INT16 | 434 GOPS |
| FMA | FP32 | 434 GFLOPS |
| FMA | FP64 | 217 GFLOPS |
| AVX | FP32 | 217.01 GFLOPS |
| AVX | FP64 | 108.5 GFLOPS |
| SSE | FP32 | 216.39 GFLOPS |
| SSE | FP64 | 108.5 GFLOPS |
--------------------------------------------------
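The single-core and multi-core figures above can be compared to see how close each cluster gets to linear scaling. The following is a minimal sketch, not part of cpufp: the names `RESULTS`, `scaling_efficiency`, and `report` are hypothetical, and the FMA FP32 numbers are copied from the tables above.

```python
# Peak FMA FP32 figures (GFLOPS) copied from the cpufp tables above.
# The dictionary layout and all names here are illustrative assumptions.
RESULTS = {
    "Golden Cove (big)": {"single": 149.87, "multi": 670.05, "threads": 6},
    "Gracemont (little)": {"single": 57.444, "multi": 434.0, "threads": 8},
}


def scaling_efficiency(single, multi, threads):
    """Measured multi-core throughput divided by ideal linear scaling."""
    return multi / (single * threads)


def report(results=RESULTS):
    """Return the scaling efficiency for each core type as a fraction."""
    return {name: scaling_efficiency(**r) for name, r in results.items()}


if __name__ == "__main__":
    for name, eff in report().items():
        print(f"{name}: {eff:.1%} of linear scaling")
```

With these inputs the big cores reach roughly 75% of linear scaling and the little cores roughly 94%. The tables do not report clock frequency, so the cause of the gap (for example, a lower all-core clock on the big cores) remains an assumption.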
AMD Ryzen 9 6900HX (Zen 3+)
For a single core:
$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| FMA | FP32 | 156.18 GFLOPS |
| FMA | FP64 | 78.371 GFLOPS |
| AVX | FP32 | 156.55 GFLOPS |
| AVX | FP64 | 78.256 GFLOPS |
| SSE | FP32 | 78.219 GFLOPS |
| SSE | FP64 | 38.99 GFLOPS |
--------------------------------------------------
For multiple cores:
$ ./cpufp --thread_pool=[0,2,4,6,8,10,12,14]
Number Threads: 8
Thread Pool Binding: 0 2 4 6 8 10 12 14
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| FMA | FP32 | 1151.2 GFLOPS |
| FMA | FP64 | 569.89 GFLOPS |
| AVX | FP32 | 1088.7 GFLOPS |
| AVX | FP64 | 536.37 GFLOPS |
| SSE | FP32 | 541.35 GFLOPS |
| SSE | FP64 | 269.56 GFLOPS |
--------------------------------------------------
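As a sanity check, a peak figure can be turned back into an implied clock frequency by dividing it by the FLOPs a core can retire per cycle. The sketch below assumes two 256-bit FMA pipes per Zen 3+ core (2 pipes x 8 FP32 lanes x 2 FLOPs per FMA = 32 FLOPs per cycle). That per-cycle figure is an assumption about the microarchitecture, not something cpufp reports, and the function name is hypothetical.

```python
# ASSUMPTION: two 256-bit FMA pipes per core, so per cycle:
# 2 pipes * 8 FP32 lanes * 2 FLOPs per fused multiply-add = 32 FLOPs.
FP32_FMA_FLOPS_PER_CYCLE = 2 * 8 * 2


def implied_clock_ghz(gflops, flops_per_cycle=FP32_FMA_FLOPS_PER_CYCLE):
    """Clock frequency (GHz) implied by a per-core peak GFLOPS figure."""
    return gflops / flops_per_cycle


if __name__ == "__main__":
    # FMA FP32 figures copied from the Ryzen 9 6900HX tables above.
    single_core = implied_clock_ghz(156.18)
    per_core_in_multi = implied_clock_ghz(1151.2 / 8)  # 8 threads
    print(f"single core: {single_core:.2f} GHz")
    print(f"per core, 8 threads: {per_core_in_multi:.2f} GHz")
```

Under that assumption the single-core result implies a clock near 4.9 GHz and the 8-thread result about 4.5 GHz per core, which fits the usual pattern of a lower all-core clock.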
Intel Celeron N5105 (Jasper Lake)
For a single core:
$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| SSE | FP32 | 23.102 GFLOPS |
| SSE | FP64 | 11.564 GFLOPS |
--------------------------------------------------
For multiple cores:
$ ./cpufp --thread_pool=[0-3]
Number Threads: 4
Thread Pool Binding: 0 1 2 3
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| SSE | FP32 | 89.327 GFLOPS |
| SSE | FP64 | 44.664 GFLOPS |
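--------------------------------------------------
There is no ARM version yet. It would be worth testing other chips in more depth, and covering caches, memory access latency, and bandwidth as well; the more comprehensive the tests, the better.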