Rust vs Go vs C on Apple Silicon and Linux — benchmarks on real hardware (2026)

Four benchmarks on two machines: Apple M1 Pro (macOS/clang) and Intel i5-13400F (Linux/gcc). Rust ≈ C on the M1; on x86 Linux, Rust is 1.5× slower than gcc on recursive fib and Go is 2.9× slower. Go's matrix multiply gap is 4.7× on Linux (vs 3.2× on arm64). The surprise: Rust's concurrent sum on x86 is 2.8× faster than C pthreads, likely because LLVM folds the range into a closed-form sum.

TL;DR: Four benchmarks on two machines: Apple M1 Pro (arm64, macOS 26.6, clang 21) and Intel i5-13400F (x86_64, Ubuntu 24.04, gcc 13). On the M1, Rust stays within about 13% of C on every test (and is slightly ahead on two). On Linux x86, gcc beats rustc on recursive fib (1.5×) and Go’s fib is 2.9× slower than C, a much bigger gap than on arm64. The biggest surprise: Rust’s concurrent sum is 2.8× faster than C pthreads on x86, likely because LLVM folds the integer range into a closed-form sum, so the loop never actually runs.

Setup

| | M1 Pro (arm64) | i5-13400F (x86_64) |
|---|---|---|
| Machine | Apple MacBook Pro | Desktop PC |
| RAM | 16 GB | 125 GB |
| OS | macOS 26.6 (Tahoe) | Ubuntu 24.04 |
| C compiler | Apple clang 21.0, -O3 -march=native | GCC 13.3, -O3 -march=native |
| Go | 1.26.4 darwin/arm64 | 1.22.4 linux/amd64 |
| Rust | 1.96.0 | 1.96.0 |
| Rust flags | -C opt-level=3 -C target-cpu=native | -C opt-level=3 -C target-cpu=native |

Each benchmark was run three times; the tables below show the median of runs 2–3.
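
The article does not show its timing harness, so the following is only a minimal sketch of how a median-of-runs measurement could look in Rust. The helper name `median_time` and the stand-in workload are assumptions for illustration, not the code used for the tables.

```rust
use std::time::{Duration, Instant};

/// Run `work` `runs` times and return the median wall-clock duration.
/// For an even number of runs this returns the upper of the two middle values.
fn median_time<F: FnMut()>(runs: usize, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

fn main() {
    // Hypothetical workload standing in for one of the four benchmarks.
    let mut sink = 0u64;
    let t = median_time(3, || {
        // black_box keeps the optimiser from deleting the work entirely.
        sink = std::hint::black_box((0..1_000_000u64).map(|x| x ^ (x >> 3)).sum::<u64>());
    });
    println!("median of 3 runs: {:?} (checksum {sink})", t);
}
```

Timing inside the process like this excludes process start-up; timing the whole binary from the shell would include it, which matters for the millisecond-scale benchmarks.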

The four benchmarks

1 — Sieve of Eratosthenes (primes up to 10,000,000)

Flat bool-array sieve: write-heavy sequential memory access, branch-light inner loop. Good proxy for memory-bandwidth-bound code.
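
As a rough illustration of the access pattern described above, here is a minimal flat-array sieve in Rust. It is a sketch of the technique, not necessarily the exact program that was benchmarked; the function name `count_primes` is an assumption.

```rust
/// Count primes <= limit with a flat bool-array sieve of Eratosthenes.
fn count_primes(limit: usize) -> usize {
    if limit < 2 {
        return 0;
    }
    // is_composite[i] == true means i has been crossed out.
    let mut is_composite = vec![false; limit + 1];
    let mut i = 2;
    while i * i <= limit {
        if !is_composite[i] {
            // Start at i*i: smaller multiples were crossed out by smaller primes.
            let mut j = i * i;
            while j <= limit {
                is_composite[j] = true;
                j += i;
            }
        }
        i += 1;
    }
    // Indices 0 and 1 are not prime, so count from 2 onwards.
    is_composite[2..].iter().filter(|&&c| !c).count()
}

fn main() {
    println!("{}", count_primes(10_000_000));
}
```

The inner `while j <= limit` loop is the write-heavy part: it strides through the array setting one byte per step, with no data-dependent branch.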

2 — Recursive Fibonacci (n = 47)

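No heap allocation, no loops. fib(47) evaluates to 2,971,215,073, and with the usual n < 2 base case the naive recursion makes 2·fib(48) − 1 ≈ 9.6 billion calls, although the stack never gets deeper than about 47 frames. Measures raw call dispatch, branch prediction, and how aggressively each compiler inlines/optimises deep recursion.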
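
To make the amount of work concrete, the sketch below counts invocations of a naive recursive Fibonacci. It assumes the common `n < 2` base case (the article does not show its source), and the `calls` counter is added purely for illustration; a benchmarked version would not carry it.

```rust
/// Naive recursive Fibonacci; `calls` counts every invocation.
fn fib(n: u32, calls: &mut u64) -> u64 {
    *calls += 1;
    if n < 2 {
        return n as u64;
    }
    fib(n - 1, calls) + fib(n - 2, calls)
}

fn main() {
    let n = 30; // small enough to finish quickly; the article uses 47
    let mut calls = 0;
    let value = fib(n, &mut calls);
    // With this base case, calls == 2 * fib(n + 1) - 1.
    println!("fib({n}) = {value}, calls = {calls}");
}
```

With this base case the call count grows as 2·fib(n + 1) − 1, so each increment of n multiplies the work by roughly 1.618.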

3 — Matrix multiply (512 × 512, f64, i-k-j loop order)

Cache-friendly loop order but still a tight FP accumulate in the inner loop. Where auto-vectorisation and bounds-check elimination matter most.
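
The i-k-j order can be sketched as follows. This is an illustrative version using flat row-major slices; the storage layout and the name `matmul_ikj` are assumptions rather than details taken from the benchmarked source.

```rust
/// C = A * B for square n x n matrices stored row-major in flat slices.
fn matmul_ikj(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        let c_row = &mut c[i * n..(i + 1) * n];
        for k in 0..n {
            let a_ik = a[i * n + k];
            let b_row = &b[k * n..(k + 1) * n];
            // The inner loop walks one row of B and one row of C contiguously.
            // Iterating zipped slices avoids per-element index bounds checks.
            for (c_ij, b_kj) in c_row.iter_mut().zip(b_row) {
                *c_ij += a_ik * b_kj;
            }
        }
    }
    c
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    println!("{:?}", matmul_ikj(&a, &b, 2));
}
```

Hoisting `a_ik` out of the inner loop leaves a multiply-add over two contiguous rows, which is the shape auto-vectorisers handle best.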

4 — Concurrent sum (100,000,000 integers, 8 threads)

8 POSIX threads (C), goroutines (Go), or std::thread (Rust), each summing a partition of 0..100_000_000. Tests thread spawn overhead and how cleverly each compiler handles the inner sum.
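
A minimal sketch of the Rust variant, assuming each thread sums its own sub-range with `(start..end).sum::<u64>()` and the main thread adds the partial results. The function name and the way the remainder is handled are illustrative choices.

```rust
use std::thread;

/// Sum 0..n by splitting the range across `threads` OS threads.
fn concurrent_sum(n: u64, threads: u64) -> u64 {
    assert!(threads > 0);
    let chunk = n / threads;
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let start = t * chunk;
            // The last thread also takes the remainder when n % threads != 0.
            let end = if t == threads - 1 { n } else { start + chunk };
            thread::spawn(move || (start..end).sum::<u64>())
        })
        .collect();
    // Join every thread and add up the partial sums.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", concurrent_sum(100_000_000, 8));
}
```

Because the per-thread work is a pure function of `start` and `end`, an optimiser is free to replace the loop entirely, which is why this benchmark ends up measuring thread overhead as much as arithmetic.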

Results: Apple M1 Pro (arm64, clang 21, macOS)

| Benchmark | C | Go 1.26 | Rust 1.96 |
|---|---|---|---|
| Sieve (primes up to 10M) | 13.3 ms | 19.3 ms | 12.3 ms |
| Fib(47) recursive | 8.86 s | 9.84 s | 8.91 s |
| Matrix multiply 512² | 24.7 ms | 80.2 ms | 24.1 ms |
| Concurrent sum 100M | 3.0 ms | 10.7 ms | 3.4 ms |

Results: Intel i5-13400F (x86_64, gcc 13, Linux)

| Benchmark | C | Go 1.22 | Rust 1.96 |
|---|---|---|---|
| Sieve (primes up to 10M) | 25.7 ms | 34.3 ms | 27.6 ms |
| Fib(47) recursive | 4.06 s | 11.86 s | 6.02 s |
| Matrix multiply 512² | 22.5 ms | 106.7 ms | 28.3 ms |
| Concurrent sum 100M | 3.4 ms | 8.7 ms | 1.2 ms |

Median of 3 runs, run 2 used. 2026-06-22.

Runtime relative to C, both machines (lower is faster)

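| Benchmark | Go/C (M1) | Go/C (i5) | Rust/C (M1) | Rust/C (i5) |
|---|---|---|---|---|
| Sieve | 1.45× | 1.33× | 0.92× | 1.07× |
| Fib(47) | 1.11× | 2.92× | 1.01× | 1.48× |
| Matrix multiply | 3.25× | 4.74× | 0.98× | 1.26× |
| Concurrent sum | 3.57× | 2.56× | 1.13× | 0.35× |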
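
Each ratio is the language's runtime divided by C's runtime on the same machine, so values above 1 mean slower than C. As a small worked example, the sketch below recomputes the i5 columns from the raw timings in the results table; the helper name `ratio_vs_c` is made up for illustration.

```rust
/// Runtime of a language divided by the C runtime on the same machine.
/// Values above 1.0 mean slower than C; below 1.0 mean faster.
fn ratio_vs_c(lang_time: f64, c_time: f64) -> f64 {
    lang_time / c_time
}

fn main() {
    // (benchmark, C, Go, Rust) taken from the i5-13400F results table.
    // Fib is in seconds, the rest in milliseconds; units cancel in the ratio.
    let rows = [
        ("Sieve", 25.7, 34.3, 27.6),
        ("Fib(47)", 4.06, 11.86, 6.02),
        ("Matrix multiply", 22.5, 106.7, 28.3),
        ("Concurrent sum", 3.4, 8.7, 1.2),
    ];
    for (name, c, go, rust) in rows {
        println!(
            "{name:<16} Go/C = {:.2}x  Rust/C = {:.2}x",
            ratio_vs_c(go, c),
            ratio_vs_c(rust, c)
        );
    }
}
```

The 2.8× headline for the concurrent sum is the inverse of the Rust/C entry: 3.4 ms / 1.2 ms ≈ 2.8.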

Three things worth explaining

1 — Go’s fib collapses on x86

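On the M1, Go’s fib(47) was 11% slower than C, close to parity. On the i5, it’s 2.9× slower. The algorithm is the same, but the toolchain is not: the M1 ran Go 1.26.4 and the i5 ran Go 1.22.4, so this comparison mixes architecture with compiler version. Part of the gap may come from differences between Go’s arm64 and amd64 code generation for deep recursive call chains, and part from optimiser changes between 1.22 and 1.26. The C baseline also moves: gcc’s fib on the i5 is unusually fast (see the next point), which inflates every ratio measured against it. Re-running with the same Go release on both machines would separate these effects.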

2 — gcc beats rustc on recursive fib

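On the M1 (clang vs rustc, both using LLVM), fib took ~8.9 s in both languages, which is what a shared backend would lead you to expect. On Linux, gcc produced 4.06 s while rustc produced 6.02 s: gcc is 1.5× faster. Naive fib is not tail-recursive, so tail-call elimination alone cannot explain this. The more likely cause is GCC’s recursive inlining, which unrolls several levels of the call tree into one function body and cuts the number of actual calls. That explanation is a hypothesis; comparing the disassembly of the two binaries would confirm it. It’s a reminder that “Rust uses LLVM” doesn’t mean it always wins vs gcc; GCC has 30+ years of x86 optimisation work behind it.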

3 — Rust’s concurrent sum is 2.8× faster than C on x86

The Rust thread body is (start..end).sum::<u64>(). LLVM can recognise a sum over a contiguous integer range as an arithmetic series and replace the loop with a closed-form calculation: (end − start) × (start + end − 1) / 2 for the half-open range start..end, which reduces to n(n − 1)/2 when start is 0. That is O(1), no matter the range size. GCC, given the manual for (i = start; i < end; i++) s += i; loop, appears to vectorise it with SIMD but not to collapse it to a closed form. If so, Rust’s eight threads each do a handful of arithmetic operations while C’s eight threads each sum 12.5M integers, and the 1.2 ms Rust figure is mostly thread spawn and join overhead. The 2.8× gap is a compiler optimisation difference (LLVM vs GCC), not a language one. On arm64 the gap disappears (3.0 ms for C vs 3.4 ms for Rust), presumably because Apple clang is also LLVM-based and applies the same transformation to the C loop.
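
The arithmetic-series identity behind this can be checked directly. The sketch below compares a looped sum against the closed form for a half-open range; it illustrates the mathematics only and is not a claim about the exact instructions LLVM emits. Both function names are made up for the example.

```rust
/// Sum of the half-open range start..end computed by iterating.
fn loop_sum(start: u64, end: u64) -> u64 {
    (start..end).sum()
}

/// Same sum via the arithmetic-series formula: count * (first + last) / 2.
fn closed_form_sum(start: u64, end: u64) -> u64 {
    if start >= end {
        return 0;
    }
    let count = end - start;
    let first_plus_last = start + (end - 1);
    // Widen to u128 so the intermediate product cannot overflow u64.
    // count * (first + last) is always even, so the division is exact.
    ((count as u128 * first_plus_last as u128) / 2) as u64
}

fn main() {
    // Second of eight partitions of 0..100_000_000.
    let (start, end) = (12_500_000, 25_000_000);
    println!("loop:        {}", loop_sum(start, end));
    println!("closed form: {}", closed_form_sum(start, end));
}
```

Both functions return the same value for any range, which is what allows an optimiser to substitute one for the other.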

Binary size and compile time

| | C (macOS) | C (Linux) | Go (macOS) | Go (Linux) | Rust (macOS) | Rust (Linux) |
|---|---|---|---|---|---|---|
| Binary size (sieve) | 33 KB | 16 KB | 2.5 MB | 1.9 MB | 431 KB | 4.3 MB |
| Cold compile (sieve) | 96 ms | 54 ms | 319 ms | 193 ms | 660 ms | 134 ms |

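The Rust binary being 4.3 MB on Linux vs 431 KB on macOS is the most striking size difference. Static linking is probably not the explanation: Rust statically links its own standard library on both platforms, and on the default x86_64-unknown-linux-gnu target the binary still links glibc dynamically. A more likely cause is that a plain rustc build on Linux leaves the standard library’s symbols and debug info inside the ELF file, while on macOS debug info is kept outside the executable. Stripping the Linux binary (for example with -C strip=symbols) should bring the two much closer together. A fully self-contained Linux binary with no system library dependency, which is a real advantage for containerised deployments, requires a musl target instead.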

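Compile times are lower on the Linux machine for every compiler, and Rust shows the widest spread: 134 ms on Linux vs 660 ms on macOS for a single file. These are single cold measurements on different CPUs, operating systems, and linkers, so they cannot isolate a cause. One possible factor is that LLVM’s x86_64 backend emits code faster than its arm64 backend; linker and filesystem differences are equally plausible candidates.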

Anno 2026: where does each language live?

C is still the reference point and still the fastest on raw recursive work when gcc’s optimiser gets involved. In 2026 it’s not a first choice for new projects unless you’re writing kernels, embedded firmware, or FFI glue, but the optimiser craft accumulated since 1972 is real.

Go hit 1.0 in 2012 with a clear thesis: make networked services as fast to develop as Python, while running with the efficiency of a compiled language. It largely succeeded. The costs: the GC adds latency variance, and the performance ceiling is lower than Rust or C for compute-intensive loops. The fib result on x86 also suggests that Go’s optimiser has real quality gaps on certain patterns, bearing in mind that the i5 ran the older Go 1.22. It is tuned for what Go code typically looks like (interfaces, channels, HTTP handlers), not tree recursion.

Rust reached 1.0 in 2015 and has spent a decade proving memory safety without a GC is achievable at scale. In 2026 the Linux kernel accepts Rust for drivers (since 6.1), Firefox and Chromium are incrementally replacing C++ with it, and ISRG rewrites critical network infrastructure in it. The borrow checker is real friction, but async/await, Arc<Mutex<T>>, and rayon have made modern Rust significantly more ergonomic than 2018 Rust. The LLVM closed-form sum optimisation above is a good example of what “zero-cost abstractions” means in practice: the iterator expression (start..end).sum() compiles down to the same code a hand-written loop would, so the optimiser can collapse it just as readily. The win over C on x86 comes from LLVM versus GCC, not from the Rust idiom itself.

Pick-one guide

| You want to… | Reach for |
|---|---|
| Write a Linux driver or kernel module | C (or Rust if the subsystem accepts it) |
| Build a microservice or CLI tool fast | Go: tooling, goroutines, fast iteration |
| Write safety-critical systems code | Rust: borrow checker, no GC, no UB in safe code |
| Optimise a hot numerical inner loop | C or Rust: SIMD, no GC, bounds checks absent or optimised away |
| Ship a static binary to a container | Go (static by default) or Rust (musl) |
| Target arm64 performance | Rust ≈ C, better than Go |
| Target x86 recursive code | C with gcc wins; Rust competitive; Go lags |

Benchmarks run on personal hardware; your numbers will vary with CPU, OS tuning, and compiler flags. Code is four single-file programs, no external dependencies. All compilers invoked with full optimisation flags as shown.