猿代码: Research / AI Models / High-Performance Computing

cpufp: a tool for measuring the peak performance of x86 CPUs

https://github.com/pigirons/cpufp

Intel Alder Lake i7-1280P

For a single Golden Cove (big) core:

$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI        | INT8      | 590.31 GOPS      |
| AVX_VNNI        | INT16     | 295.06 GOPS      |
| FMA             | FP32      | 149.87 GFLOPS    |
| FMA             | FP64      | 74.931 GFLOPS    |
| AVX             | FP32      | 112.39 GFLOPS    |
| AVX             | FP64      | 56.203 GFLOPS    |
| SSE             | FP32      | 56.054 GFLOPS    |
| SSE             | FP64      | 28.001 GFLOPS    |
--------------------------------------------------
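As a sanity check, the FMA figures above can be turned into an implied clock frequency. The sketch below assumes the core sustains two 256-bit FMA instructions per cycle, each FMA counting as two floating-point operations per lane (32 FP32 FLOPs per cycle). That throughput is an assumption, not something the cpufp output states, and `implied_clock_ghz` is an illustrative name.

```python
def implied_clock_ghz(peak_gflops, fma_per_cycle=2, vector_bits=256, elem_bits=32):
    """Clock (GHz) implied by a measured FMA peak.

    fma_per_cycle and vector_bits are assumptions about the core,
    not values reported by cpufp.
    """
    lanes = vector_bits // elem_bits             # elements per vector register
    flops_per_cycle = fma_per_cycle * lanes * 2  # one FMA = multiply + add
    return peak_gflops / flops_per_cycle


if __name__ == "__main__":
    # FMA FP32 and FP64 peaks copied from the single big-core table above.
    print(f"FP32: {implied_clock_ghz(149.87):.2f} GHz")
    print(f"FP64: {implied_clock_ghz(74.931, elem_bits=64):.2f} GHz")
```

Under these assumptions both rows imply roughly the same clock, which is what one would expect if the measurement is compute-bound rather than limited by something else.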

For multiple big cores:

$ ./cpufp --thread_pool=[0,2,4,6,8,10]
Number Threads: 6
Thread Pool Binding: 0 2 4 6 8 10
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI        | INT8      | 2636.8 GOPS      |
| AVX_VNNI        | INT16     | 1319.1 GOPS      |
| FMA             | FP32      | 670.05 GFLOPS    |
| FMA             | FP64      | 335 GFLOPS       |
| AVX             | FP32      | 502.4 GFLOPS     |
| AVX             | FP64      | 251.2 GFLOPS     |
| SSE             | FP32      | 250.42 GFLOPS    |
| SSE             | FP64      | 125.16 GFLOPS    |
--------------------------------------------------
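Comparing the two tables above, six big cores deliver less than six times the single-core peak. A short sketch of that comparison follows; the numbers are copied from the tables, and `scaling_efficiency` is an illustrative helper. The source does not say why scaling is sublinear; a lower all-core clock under a shared power budget would be one plausible explanation.

```python
def scaling_efficiency(multi_peak, single_peak, n_threads):
    """Fraction of ideal linear scaling achieved by the multi-thread run."""
    return multi_peak / (single_peak * n_threads)


# (single-core peak, six-core peak) copied from the two tables above.
BIG_CORE_PEAKS = {
    ("AVX_VNNI", "INT8"): (590.31, 2636.8),
    ("FMA", "FP32"): (149.87, 670.05),
    ("FMA", "FP64"): (74.931, 335.0),
    ("SSE", "FP32"): (56.054, 250.42),
}

if __name__ == "__main__":
    for (isa, dtype), (single, multi) in BIG_CORE_PEAKS.items():
        eff = scaling_efficiency(multi, single, 6)
        print(f"{isa:9s} {dtype:5s} {eff:.1%}")
```

Every row lands at roughly 74 to 75 percent of linear scaling, so the shortfall looks independent of the instruction set used.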

For a single Gracemont (little) core:

$ ./cpufp --thread_pool=[12]
Number Threads: 1
Thread Pool Binding: 12
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI        | INT8      | 114.89 GOPS      |
| AVX_VNNI        | INT16     | 57.445 GOPS      |
| FMA             | FP32      | 57.444 GFLOPS    |
| FMA             | FP64      | 28.723 GFLOPS    |
| AVX             | FP32      | 28.723 GFLOPS    |
| AVX             | FP64      | 14.362 GFLOPS    |
| SSE             | FP32      | 28.312 GFLOPS    |
| SSE             | FP64      | 14.361 GFLOPS    |
--------------------------------------------------

For multiple little cores:

$ ./cpufp --thread_pool=[12-19]
Number Threads: 8
Thread Pool Binding: 12 13 14 15 16 17 18 19
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| AVX_VNNI        | INT8      | 867.99 GOPS      |
| AVX_VNNI        | INT16     | 434 GOPS         |
| FMA             | FP32      | 434 GFLOPS       |
| FMA             | FP64      | 217 GFLOPS       |
| AVX             | FP32      | 217.01 GFLOPS    |
| AVX             | FP64      | 108.5 GFLOPS     |
| SSE             | FP32      | 216.39 GFLOPS    |
| SSE             | FP64      | 108.5 GFLOPS     |
--------------------------------------------------
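The `--thread_pool` argument in these runs is a bracketed list of CPU ids, and the `Thread Pool Binding` lines show that `a-b` expands to an inclusive range. The sketch below reproduces that expansion. It is inferred only from the examples in this post, not taken from cpufp's own parser, and `expand_thread_pool` is a hypothetical name.

```python
def expand_thread_pool(spec):
    """Expand a spec such as "[0,2,4]" or "[12-19]" into a list of CPU ids."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"expected a bracketed list, got {spec!r}")
    cpus = []
    for item in body[1:-1].split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # "a-b" is treated as an inclusive range, matching the output above.
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                raise ValueError(f"descending range in {spec!r}")
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(item))
    return cpus


if __name__ == "__main__":
    print(expand_thread_pool("[0,2,4,6,8,10]"))
    print(expand_thread_pool("[12-19]"))
```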

AMD Ryzen 9 6900HX (Zen 3+)

For a single core:

$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| FMA             | FP32      | 156.18 GFLOPS    |
| FMA             | FP64      | 78.371 GFLOPS    |
| AVX             | FP32      | 156.55 GFLOPS    |
| AVX             | FP64      | 78.256 GFLOPS    |
| SSE             | FP32      | 78.219 GFLOPS    |
| SSE             | FP64      | 38.99 GFLOPS     |
--------------------------------------------------

For multiple cores:

$ ./cpufp --thread_pool=[0,2,4,6,8,10,12,14]
Number Threads: 8
Thread Pool Binding: 0 2 4 6 8 10 12 14
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| FMA             | FP32      | 1151.2 GFLOPS    |
| FMA             | FP64      | 569.89 GFLOPS    |
| AVX             | FP32      | 1088.7 GFLOPS    |
| AVX             | FP64      | 536.37 GFLOPS    |
| SSE             | FP32      | 541.35 GFLOPS    |
| SSE             | FP64      | 269.56 GFLOPS    |
--------------------------------------------------
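To compare runs across machines programmatically, the printed table can be parsed into a dictionary. This is a minimal sketch that assumes the row layout shown in this post (`| set | type | value unit |`); `parse_cpufp_table` is a hypothetical helper and not part of cpufp.

```python
import re

# One data row: "| FMA | FP32 | 156.18 GFLOPS |". The header row does not
# match because "Instruction Set" contains a space.
ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s+(GFLOPS|GOPS)\s*\|$")


def parse_cpufp_table(text):
    """Map (instruction set, data type) -> (peak value, unit)."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            isa, dtype, value, unit = match.groups()
            results[(isa, dtype)] = (float(value), unit)
    return results


# Rows copied from the single-core Ryzen 9 6900HX table above.
SAMPLE = """\
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| FMA             | FP32      | 156.18 GFLOPS    |
| AVX             | FP32      | 156.55 GFLOPS    |
--------------------------------------------------
"""

if __name__ == "__main__":
    table = parse_cpufp_table(SAMPLE)
    fma, _ = table[("FMA", "FP32")]
    avx, _ = table[("AVX", "FP32")]
    print(f"AVX/FMA FP32 ratio: {avx / fma:.3f}")
```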

Intel Celeron N5105 (Jasper Lake)

For a single core:

$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| SSE             | FP32      | 23.102 GFLOPS    |
| SSE             | FP64      | 11.564 GFLOPS    |
--------------------------------------------------

For multiple cores:

$ ./cpufp --thread_pool=[0-3]
Number Threads: 4
Thread Pool Binding: 0 1 2 3
--------------------------------------------------
| Instruction Set | Data Type | Peak Performance |
| SSE             | FP32      | 89.327 GFLOPS    |
| SSE             | FP64      | 44.664 GFLOPS    |
--------------------------------------------------
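There is no ARM version yet. It would be worth extending the tests to other chips and adding cache, memory-latency, and bandwidth measurements; broader coverage would make the tool even more useful.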

2023-6-1 16:49