In August, we published a blog comparing the power-performance characteristics of NUVIA’s CPU design against shipping CPUs from Intel, AMD, ARM (Qualcomm) and Apple. That blog highlighted the importance of achieving maximum performance within a limited power envelope in the general-purpose server market. It also introduced NUVIA’s first-generation CPU, codenamed Phoenix, and its projected performance and power based on internal simulations. The power-performance data presented in that blog were based on Geekbench 5. The blog explained the rationale for this choice as follows:

“We believe Geekbench 5 is a good starting point, as it consists of a series of modern real-world kernels that include both integer and floating-point workloads. It runs on multiple platforms, including Linux, iOS, Android, and Windows. It also gives us the ability to conduct these tests on commercially available products. … You may be wondering how we can make the extrapolation from smartphone and client CPU cores to a server core. In our view, there is no meaningful difference. If anything, these environments now deliver incredibly similar characteristics in how they operate.”

Geekbench 5’s execution is well contained within the CPU complex, making the idle-normalized power measurement technique more closely reflect the actual CPU power.

After we posted this blog, we saw some industry discussion that Geekbench 5, while a popular CPU benchmark in the client space (laptop, phone, tablet), is not relevant in the server CPU space. Instead, they noted that CPU2006 and CPU2017 are what should be used. In this blog we explore that proposition.

What is an ideal benchmark? Answer: one that is most representative of the customer’s workload. However, quantifying representativeness is challenging (and often controversial) due to the richness and diversity of the workloads that customers of general-purpose CPUs run. Against this nebulous backdrop, SPEC CPU2006 and later the CPU2017 benchmark suite have emerged as the de facto standard for measuring server performance. These benchmarks have a variety of tests (from compilers to AI to weather forecasting) that exercise various aspects of the CPU and the memory hierarchy. These suites measure both CPU speed and throughput. There are several other server benchmarks, such as TPC-C, SpecJBB, and PerfKitBenchmarker, that cover areas in which SPEC CPU is deficient (JITed code, data-sharing workloads, etc.).

So, where does Geekbench fit in? Geekbench is a tremendously popular benchmark in the mobile and client space. It is a cross-platform, easy-to-run benchmark that exercises different aspects of the CPU. The question we are trying to explore is whether benchmarking CPU performance using Geekbench leads us to a different conclusion about the relative speeds of different CPUs than one might arrive at from using SPEC CPU.

Our hypothesis is that the performance of different benchmark suites will correlate well with each other so long as the suites comprise a diverse set of tests that exercise the different aspects of the micro-architecture, utilize the same instruction-set features, and are of a similar type (integer, floating-point, database, etc.). We believe this to be the case for Geekbench and SPEC CPU. To test this hypothesis, we measured the INT single-core (or Rate x 1) and multi-core (or Rate) SPEC CPU2006, CPU2017 and Geekbench 5 performance for the systems shown in Table 1.

Figure 3: Relative mispredict and miss rate for CPU2006 and CPU2017 baselined to Geekbench 5

It is important to note that the observed correlation is not a fundamental property and can break under several scenarios. Geekbench typically runs quickly (in minutes), and especially so in our testing where the default workload gaps are removed, whereas SPEC CPU typically runs for hours. The net effect of this is that Geekbench 5 may achieve a higher average frequency because it is able to exploit the system’s thermal mass due to its short runtime. SPEC CPU, however, will be governed by the long-term power dissipation capability of the system due to its long run-time. This is something to watch out for when applying such correlation techniques to systems that see significant thermal throttling or power-capping while running these benchmarks. Another scenario where the correlation can break is non-linear jumps in performance that one benchmark suite sees but not the other.
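The thermal-mass effect discussed in this post can be illustrated with a toy model, assuming a core that boosts until a fixed thermal-energy budget is exhausted and then falls back to its sustained frequency. All of the numbers below are illustrative placeholders, not measurements from our systems.

```python
# Toy model of the thermal-mass effect: a core runs at its boost clock until a
# thermal-energy budget is spent, then drops to the sustained clock.
# All figures here are hypothetical, chosen only to illustrate the idea.

def average_frequency(runtime_s, boost_ghz, sustained_ghz,
                      excess_power_w, thermal_budget_j):
    """Average clock over a run, given how long the boost budget lasts."""
    boost_time = min(runtime_s, thermal_budget_j / excess_power_w)
    sustained_time = runtime_s - boost_time
    return (boost_ghz * boost_time + sustained_ghz * sustained_time) / runtime_s

# A minutes-long run (Geekbench-like) can fit almost entirely in the boost window...
short = average_frequency(runtime_s=120, boost_ghz=3.5, sustained_ghz=2.8,
                          excess_power_w=20.0, thermal_budget_j=2400.0)
# ...while an hours-long run (SPEC-CPU-like) is dominated by the sustained clock.
long_ = average_frequency(runtime_s=3600 * 4, boost_ghz=3.5, sustained_ghz=2.8,
                          excess_power_w=20.0, thermal_budget_j=2400.0)
print(f"short run avg ≈ {short:.2f} GHz, long run avg ≈ {long_:.2f} GHz")
```

Under these made-up parameters the short run averages the full boost clock while the long run settles close to the sustained clock, which is the mechanism by which a short benchmark can report a higher average frequency than a long one on the same part.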
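The suite-to-suite correlation check at the heart of this comparison can be sketched as follows. The scores below are made-up placeholders, not our measured Table 1 data, and the helper function is ours, not part of either benchmark suite.

```python
# Sketch of a suite-to-suite correlation check: do two benchmark suites rank a
# set of systems the same way? The scores below are illustrative placeholders.
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical single-core scores for five systems (NOT measured data).
geekbench5 = [1100, 1250, 900, 1600, 1400]
spec_int   = [8.2, 9.1, 6.8, 11.9, 10.4]

r = pearson(geekbench5, spec_int)
print(f"Pearson r = {r:.3f}")  # values near 1.0 mean the suites rank systems similarly
```

A system that falls well off the fitted trend in such a check is exactly the kind of outlier the thermal-throttling and non-linear-jump caveats above are warning about.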
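The idle-normalized power measurement mentioned earlier can be sketched as a simple subtraction. The wattages below are hypothetical and the helper name is ours, used only to make the technique concrete.

```python
# Sketch of idle-normalized power: subtract the system's idle draw from the
# power measured under load, attributing the difference to the CPU.
# All wattages here are hypothetical placeholders.

def idle_normalized_power(load_watts, idle_watts):
    """Approximate CPU power as load power minus idle power."""
    return load_watts - idle_watts

idle = 35.0        # whole-system power at idle (W)
under_load = 98.0  # whole-system power while the benchmark runs (W)

cpu_power = idle_normalized_power(under_load, idle)
print(f"Idle-normalized CPU power ≈ {cpu_power:.1f} W")  # 63.0 W
```

This attribution is only as good as the workload's containment in the CPU complex: a benchmark that also stresses DRAM or I/O shifts power into components the subtraction cannot separate, which is why a CPU-contained workload suits the technique.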