Performance Impacts of Hybrid Cores
All cores are equal, but some are more equal than others.
I was reading about how modern Intel CPUs ship with a mix of core types: performance cores (P-cores) and efficiency cores (E-cores) with different maximum clock speeds, trading peak speed for better power efficiency. For example, on my desktop (Intel i9-14900K):
[I] (base) aneesh@earth:~/a/_/2025$ lscpu --all --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 5700.0000 800.0000 972.4320
1 0 0 0 0:0:0:0 yes 5700.0000 800.0000 1062.0959
2 0 0 1 4:4:1:0 yes 5700.0000 800.0000 1100.0110
3 0 0 1 4:4:1:0 yes 5700.0000 800.0000 800.0000
4 0 0 2 8:8:2:0 yes 5700.0000 800.0000 800.0000
5 0 0 2 8:8:2:0 yes 5700.0000 800.0000 800.0000
6 0 0 3 12:12:3:0 yes 5700.0000 800.0000 889.0060
7 0 0 3 12:12:3:0 yes 5700.0000 800.0000 800.0000
8 0 0 4 16:16:4:0 yes 6000.0000 800.0000 1053.1350
9 0 0 4 16:16:4:0 yes 6000.0000 800.0000 1065.4250
10 0 0 5 20:20:5:0 yes 6000.0000 800.0000 1100.0000
11 0 0 5 20:20:5:0 yes 6000.0000 800.0000 800.0000
12 0 0 6 24:24:6:0 yes 5700.0000 800.0000 1048.7610
13 0 0 6 24:24:6:0 yes 5700.0000 800.0000 800.0000
14 0 0 7 28:28:7:0 yes 5700.0000 800.0000 800.0000
15 0 0 7 28:28:7:0 yes 5700.0000 800.0000 800.0000
16 0 0 8 32:32:8:0 yes 4400.0000 800.0000 800.0000
17 0 0 9 33:33:8:0 yes 4400.0000 800.0000 800.0000
18 0 0 10 34:34:8:0 yes 4400.0000 800.0000 800.1590
19 0 0 11 35:35:8:0 yes 4400.0000 800.0000 800.0000
20 0 0 12 36:36:9:0 yes 4400.0000 800.0000 800.1810
21 0 0 13 37:37:9:0 yes 4400.0000 800.0000 800.0000
22 0 0 14 38:38:9:0 yes 4400.0000 800.0000 800.0000
23 0 0 15 39:39:9:0 yes 4400.0000 800.0000 799.5310
24 0 0 16 40:40:10:0 yes 4400.0000 800.0000 800.0000
25 0 0 17 41:41:10:0 yes 4400.0000 800.0000 800.0000
26 0 0 18 42:42:10:0 yes 4400.0000 800.0000 800.0000
27 0 0 19 43:43:10:0 yes 4400.0000 800.0000 800.2030
28 0 0 20 44:44:11:0 yes 4400.0000 800.0000 800.0000
29 0 0 21 45:45:11:0 yes 4400.0000 800.0000 800.0000
30 0 0 22 46:46:11:0 yes 4400.0000 800.0000 800.0000
31 0 0 23 47:47:11:0 yes 4400.0000 800.0000 800.0000
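The MAXMHZ column makes the split visible: logical CPUs 0-15 are eight hyperthreaded P-cores (the CORE column pairs them up, and CPUs 8-11 sit on the two "favored" cores rated for 6000 MHz), while CPUs 16-31 are sixteen single-threaded E-cores capped at 4400 MHz. These values come from cpufreq, which is also exposed under /sys/devices/system/cpu. As a sanity check, here's a minimal Rust sketch (assuming a Linux box with cpufreq available) that groups logical CPUs by their advertised max frequency:
use std::collections::BTreeMap;
use std::fs;

fn main() {
    // Group logical CPUs by the max frequency (in kHz) cpufreq advertises for them.
    let mut groups: BTreeMap<u64, Vec<usize>> = BTreeMap::new();
    for cpu in 0.. {
        // Sysfs exposes one directory per logical CPU; stop at the first missing one.
        let path = format!("/sys/devices/system/cpu/cpu{cpu}/cpufreq/cpuinfo_max_freq");
        let Ok(raw) = fs::read_to_string(&path) else { break };
        let khz: u64 = raw.trim().parse().expect("cpuinfo_max_freq is an integer");
        groups.entry(khz).or_default().push(cpu);
    }
    for (khz, cpus) in &groups {
        println!("{:4} MHz: CPUs {:?}", khz / 1000, cpus);
    }
}
On this machine that should print three groups: 4400 MHz for the E-cores, 5700 MHz for most of the P-cores, and 6000 MHz for the favored pair.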
This is useful info, but not very fun or empirical. I wanted to see if I could observe the difference in performance between the different core types, so I wrote the following benchmark:
// A deliberately CPU-bound workload: naive recursive Fibonacci.
fn fib(n: usize) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        _ => fib(n - 1) + fib(n - 2),
    }
}

fn main() {
    // Time a fixed amount of work; printing the result keeps it from being optimized away.
    let timer = std::time::Instant::now();
    println!("{}", fib(42));
    println!("elapsed: {:?}", timer.elapsed());
}
On Linux, a process can be pinned to a specific core with taskset. Using hyperfine, I ran the benchmark (built in release mode) on every core. Here are the results:
[I] (base) aneesh@earth:~/t/cputest$ hyperfine -P cpu 0 31 "taskset -c {cpu} ./target/release/cputest"
Benchmark 1: taskset -c 0 ./target/release/cputest
Time (mean ± σ): 417.4 ms ± 2.9 ms [User: 418.0 ms, System: 0.3 ms]
Range (min … max): 416.1 ms … 425.7 ms 10 runs
...
Benchmark 8: taskset -c 7 ./target/release/cputest
Time (mean ± σ): 418.1 ms ± 1.2 ms [User: 418.7 ms, System: 0.0 ms]
Range (min … max): 416.5 ms … 420.1 ms 10 runs
...
Benchmark 17: taskset -c 16 ./target/release/cputest
Time (mean ± σ): 738.4 ms ± 0.2 ms [User: 738.9 ms, System: 0.0 ms]
Range (min … max): 738.1 ms … 738.5 ms 10 runs
...
Summary
taskset -c 9 ./target/release/cputest ran
1.00 ± 0.00 times faster than taskset -c 8 ./target/release/cputest
1.01 ± 0.00 times faster than taskset -c 10 ./target/release/cputest
1.01 ± 0.00 times faster than taskset -c 11 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 1 ./target/release/cputest
1.05 ± 0.01 times faster than taskset -c 0 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 13 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 7 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 3 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 5 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 15 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 12 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 6 ./target/release/cputest
1.05 ± 0.00 times faster than taskset -c 14 ./target/release/cputest
1.06 ± 0.00 times faster than taskset -c 4 ./target/release/cputest
1.06 ± 0.01 times faster than taskset -c 2 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 20 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 18 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 16 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 21 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 27 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 22 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 19 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 30 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 17 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 29 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 25 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 28 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 26 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 31 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 24 ./target/release/cputest
1.86 ± 0.00 times faster than taskset -c 23 ./target/release/cputest
This is in line with what we saw from lscpu's output: cores 8-11 are the fastest, about 5% faster than the other 12 cores in 0-15. The remaining cores (16-31, the E-cores) are much slower, taking about 1.86x as long. Notably, that gap is larger than the ~1.36x ratio of max clocks (6000 MHz vs. 4400 MHz), which suggests the E-cores also do less work per clock on this workload.
As far as I know, the default Linux scheduler doesn't take each core's properties into account during scheduling, and many parallel libraries don't either. I'd like to try playing around with explicit core affinity sometime: it might be interesting to move lower-priority or I/O-bound tasks to the slower cores and save the fastest cores for the bulk of the compute. Regardless, this was an interesting experiment, and I gained a deeper understanding of the hardware I use daily.
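If I do, the natural starting point is setting affinity from inside the program rather than wrapping it in taskset. Here's a minimal sketch of the idea using the core_affinity crate (an extra dependency I'm assuming here; calling libc's sched_setaffinity directly would also work): pin the current thread to a chosen core before starting the work.
// Sketch: pin this thread to one core, then run the same fib workload.
// Assumes the core_affinity crate (e.g. core_affinity = "0.8") in Cargo.toml.
fn fib(n: usize) -> u64 {
    match n {
        0 | 1 => 1,
        _ => fib(n - 1) + fib(n - 2),
    }
}

fn main() {
    // Ask the OS which cores this process may run on.
    let core_ids = core_affinity::get_core_ids().expect("failed to enumerate cores");
    // Pick the last core; on my machine that would be an E-core.
    let chosen = *core_ids.last().expect("no cores reported");
    core_affinity::set_for_current(chosen);

    let timer = std::time::Instant::now();
    println!("{}", fib(42));
    println!("elapsed on {:?}: {:?}", chosen, timer.elapsed());
}
A thread pool built on top of this could hand latency-sensitive work to CPUs 8-11 and route background tasks to the E-cores, which is roughly what I'd want a core-aware scheduler to do for me.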