#gpu

2

Posts on X2Q tagged gpu. Updated as new posts are published.

Jun 22, 2026
Apple M1 Pro vs RTX 4070 Ti for local AI — GPU matmul and LLM benchmarks (2026)
RTX 4070 Ti hits 75 TFLOPS at fp16 via tensor cores; M1 Pro MPS reaches 4.2 TFLOPS, an 18× compute gap. But on real LLM inference (minicpm-v4.6, qwen3.5, gemma4:12b via ollama), the RTX is only 3–4× faster. LLM inference at batch=1 is memory-bandwidth-bound, not compute-bound, and the M1 Pro's unified-memory architecture closes most of the gap. Apple's AMX extensions also give the M1 Pro's CPU 2.7× better matmul throughput than an Intel i5 with the same thread count.
Jun 10, 2026
Whisper backend shootout: faster-whisper vs HF Transformers vs whisper.cpp (2026)
Three ways to run Whisper — faster-whisper (CTranslate2), Hugging Face Transformers (batched, SDPA), and whisper.cpp (ggml) — benchmarked on real Danish telephone audio on an RTX 4070 Ti. faster-whisper + large-v3-turbo is the fastest overall (~56× real-time); whisper.cpp CUDA is close at a fraction of the VRAM and even matches GPU speed on CPU for short files. Full speed (RTF), VRAM, and consensus-WER quality tables, plus a recommendation per use case.