Is Rust faster than Go?

On compute-bound code: yes, meaningfully. On a 512×512 double-precision matrix multiply, Rust took 24–28ms vs Go's 80–107ms across both test machines. On recursive Fibonacci (n=47) on x86 Linux, Go was 2.9× slower than C while Rust was 1.5× slower. The gap comes from Go's bounds checking, more conservative auto-vectorisation, and less aggressive optimizer for deep recursion.

Is Rust as fast as C?

On Apple Silicon with clang: yes, within noise on all four benchmarks. On x86 Linux with gcc: Rust matched C on sieve and matrix multiply, but was 1.5× slower on recursive fib — gcc's recursive-call optimisation outpaced rustc/LLVM here. Rust beat C 2.8× on concurrent sum on x86, likely due to LLVM folding the integer range sum to a closed-form expression.

How big are the binaries?

On macOS: C 33KB, Rust 431KB, Go 2.5MB. On Linux: C 16KB, Rust 4.3MB, Go 1.9MB. The large Linux Rust binary comes from statically linking the standard library; on macOS some of this is deferred to dylibs. Go always bundles its runtime statically on both platforms.

Which is fastest to compile?

On macOS (clang): C 96ms, Go 319ms, Rust 660ms. On Linux (gcc): C 54ms, Go 193ms, Rust 134ms. Rust is dramatically faster to compile on Linux — rustc uses LLVM which has better Linux x86_64 codegen throughput, while Go's compiler advantage over Rust narrows significantly on Linux.

Why is Go 2.9× slower than C on recursive Fibonacci on Linux?

Go's amd64 optimizer generates less efficient code for deep recursive function calls than gcc -O3 does. On arm64 (M1 Pro, compiled by Go's arm64 backend), the gap was only 11%. The i5-13400F result exposed a real compiler quality difference in Go's x86_64 backend for recursion-heavy code.

Why is Rust's concurrent sum faster than C on Linux?

The Rust code sums a `u64` range with `.sum()`. LLVM recognises a sum of a contiguous integer range as a closed-form arithmetic-series computation (n*(n+1)/2 variant) and eliminates the loop entirely — O(1) regardless of range size. GCC with a manual for-loop does not apply this optimisation, leaving it as a vectorised but still O(n) sum.

Rust vs Go vs C on Apple Silicon and Linux — benchmarks on real hardware (2026)

TL;DR — Four benchmarks on two machines: Apple M1 Pro (arm64, macOS 26.6, clang 21) and Intel i5-13400F (x86_64, Ubuntu 24.04, gcc 13). On the M1, Rust matches C within noise on every test. On Linux x86, gcc beats rustc on recursive fib (1.5×) and Go’s fib drops to 2.9× slower than C — a much bigger gap than on arm64. The biggest surprise: Rust’s concurrent sum is 2.8× faster than C pthreads on x86, likely because LLVM folds the integer range to a closed-form sum that the loop never actually runs.

Setup

	M1 Pro (arm64)	i5-13400F (x86_64)
Machine	Apple MacBook Pro	Desktop PC
RAM	16 GB	125 GB
OS	macOS 26.6 (Sequoia)	Ubuntu 24.04
C compiler	Apple clang 21.0, `-O3 -march=native`	GCC 13.3, `-O3 -march=native`
Go	1.26.4 darwin/arm64	1.22.4 linux/amd64
Rust	1.96.0	1.96.0
Rust flags	`-C opt-level=3 -C target-cpu=native`	`-C opt-level=3 -C target-cpu=native`

Each benchmark was run three times; the tables below show the median of runs 2–3.

The four benchmarks

1 — Sieve of Eratosthenes (primes up to 10,000,000)

Flat bool-array sieve: write-heavy sequential memory access, branch-light inner loop. Good proxy for memory-bandwidth-bound code.

2 — Recursive Fibonacci (n = 47)

No heap allocation, no loops — ~2.97 billion stack frames. Measures raw call dispatch, branch prediction, and how aggressively each compiler inlines/optimises deep recursion.

3 — Matrix multiply (512 × 512, f64, i-k-j loop order)

Cache-friendly loop order but still a tight FP accumulate in the inner loop. Where auto-vectorisation and bounds-check elimination matter most.

4 — Concurrent sum (100,000,000 integers, 8 threads)

8 POSIX threads (C), goroutines (Go), or std::thread (Rust), each summing a partition of 0..100_000_000. Tests thread spawn overhead and how cleverly each compiler handles the inner sum.

Results: Apple M1 Pro (arm64, clang 21, macOS)

Benchmark	C	Go 1.26	Rust 1.96
Sieve (10M primes)	13.3 ms	19.3 ms	12.3 ms
Fib(47) recursive	8.86 s	9.84 s	8.91 s
Matrix multiply 512²	24.7 ms	80.2 ms	24.1 ms
Concurrent sum 100M	3.0 ms	10.7 ms	3.4 ms

Results: Intel i5-13400F (x86_64, gcc 13, Linux)

Benchmark	C	Go 1.22	Rust 1.96
Sieve (10M primes)	25.7 ms	34.3 ms	27.6 ms
Fib(47) recursive	4.06 s	11.86 s	6.02 s
Matrix multiply 512²	22.5 ms	106.7 ms	28.3 ms
Concurrent sum 100M	3.4 ms	8.7 ms	1.2 ms

Median of 3 runs, run 2 used. 2026-06-22.

Speed relative to C, both machines

Benchmark	Go/C (M1)	Go/C (i5)	Rust/C (M1)	Rust/C (i5)
Sieve	1.45×	1.33×	0.92×	1.07×
Fib(47)	1.11×	2.92×	1.01×	1.48×
Matrix multiply	3.24×	4.74×	0.98×	1.26×
Concurrent sum	3.57×	2.56×	1.13×	0.35×

Three things worth explaining

1 — Go’s fib collapses on x86

On the M1, Go’s fib(47) was 11% slower than C — essentially the same. On the i5, it’s 2.9× slower. Same Go version, same algorithm. The difference is the compiler backend: Go’s arm64 codegen handles deep recursive call chains more efficiently than its amd64 codegen. This is a known quality gap; Go’s x86 backend is generally less mature than the arm64 one for recursion-heavy patterns.

2 — gcc beats rustc on recursive fib

On the M1 (clang vs rustc, both using LLVM), fib took ~8.9s in both languages — identical. On Linux, gcc produced 4.06s while rustc produced 6.02s — gcc is 1.5× faster. GCC’s optimiser applies more aggressive tail-recursion and inline heuristics for this specific pattern on x86. It’s a reminder that “Rust uses LLVM” doesn’t mean it always wins vs gcc; GCC has 30+ years of x86 optimisation art.

3 — Rust’s concurrent sum is 2.8× faster than C on x86

The Rust thread body is (start..end).sum::<u64>(). LLVM recognises a sum of a contiguous integer range as an arithmetic series and replaces the loop with a closed-form n*(n+1)/2 calculation — O(1), no matter the range size. GCC with the manual for (i = start; i < end; i++) s += i; loop applies SIMD vectorisation but doesn’t make the leap to closed-form. Result: Rust’s eight threads each do one multiplication; C’s eight threads each sum 12.5M integers. The 2.8× gap is entirely a compiler optimisation difference, not a language one. (On arm64 this effect was smaller because Apple clang also applies some of this optimisation.)

Binary size and compile time

	C (macOS)	C (Linux)	Go (macOS)	Go (Linux)	Rust (macOS)	Rust (Linux)
Binary size (sieve)	33 KB	16 KB	2.5 MB	1.9 MB	431 KB	4.3 MB
Cold compile (sieve)	96 ms	54 ms	319 ms	193 ms	660 ms	134 ms

The Rust binary being 4.3 MB on Linux vs 431 KB on macOS is the most striking size difference. On macOS, Rust links against system dylibs for parts of the standard library, keeping the binary small. On Linux, Rust statically links everything, producing a larger but fully self-contained binary with no system library dependency — a real advantage for containerised deployments.

Compile times flip: Rust is 134ms on Linux vs 660ms on macOS for a single file. LLVM’s x86_64 backend is substantially faster to emit code than its arm64 backend; the M1 machine runs rustc natively on arm64 which has less LLVM throughput optimisation work behind it.

Anno 2026: where does each language live?

C is still the reference point and still the fastest on raw recursive work when gcc’s optimiser gets involved. In 2026 it’s not a first choice for new projects unless you’re writing kernels, embedded firmware, or FFI glue — but the optimizer art accumulated since 1972 is real.

Go hit 1.0 in 2012 with a clear thesis: make networked services as fast to develop as Python, while running with the efficiency of a compiled language. It largely succeeded. Where it pays: the GC adds latency variance, and the performance ceiling is lower than Rust or C for compute-intensive loops. The fib result on x86 also reveals that Go’s optimizer has real quality gaps on certain patterns — it’s tuned for what Go code actually looks like (interfaces, channels, HTTP handlers), not tree recursion.

Rust reached 1.0 in 2015 and has spent a decade proving memory safety without a GC is achievable at scale. In 2026 the Linux kernel accepts Rust for drivers (since 6.1), Firefox and Chromium are incrementally replacing C++ with it, and ISRG rewrites critical network infrastructure in it. The borrow checker is real friction, but async/await, Arc<Mutex<T>>, and rayon have made modern Rust significantly more ergonomic than 2018 Rust. The LLVM closed-form sum optimisation above is a good example of what “zero-cost abstractions” means in practice: writing idiomatic Rust lets the compiler apply transformations that a manual C loop can’t trigger.

Pick-one guide

You want to…	Reach for
Write a Linux driver or kernel module	C (or Rust if the subsystem accepts it)
Build a microservice or CLI tool fast	Go — tooling + goroutines + fast iteration
Write safety-critical systems code	Rust — borrow checker, no GC, no UB
Optimise a hot numerical inner loop	C or Rust — SIMD, no bounds checks, full LLVM
Ship a static binary to a container	Go (static by default) or Rust (`musl`)
Target arm64 performance	Rust ≈ C, better than Go
Target x86 recursive code	C with gcc wins; Rust competitive; Go lags

Benchmarks run on personal hardware; your numbers will vary with CPU, OS tuning, and compiler flags. Code is four single-file programs, no external dependencies. All compilers invoked with full optimisation flags as shown.