Whisper backend shootout: faster-whisper vs HF Transformers vs whisper.cpp (2026)
Three ways to run Whisper are benchmarked on real Danish telephone audio on an RTX 4070 Ti: faster-whisper (CTranslate2), Hugging Face Transformers (batched, SDPA), and whisper.cpp (ggml). faster-whisper + large-v3-turbo is the fastest overall (~56× real-time); whisper.cpp with CUDA comes close at a fraction of the VRAM, and on short files whisper.cpp on CPU even matches GPU speed. Includes full speed (RTF), VRAM, and consensus-WER quality tables, plus a recommendation per use case.
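
For readers unfamiliar with the speed metric, here is a minimal sketch of how a real-time factor and the matching "× real-time" speed-up are commonly related. It assumes the usual convention RTF = processing time / audio duration, which may differ from how a particular benchmark reports it. The function names and the 600 s / 10 s figures are hypothetical illustrations, not measurements from this benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed convention: processing time / audio duration.

    Lower is faster; an RTF below 1.0 means faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(processing_seconds: float, audio_seconds: float) -> float:
    """Speed-up over real time (the "N× real-time" figure), i.e. 1 / RTF."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Hypothetical example: a 10-minute recording transcribed in 10 seconds.
rtf = real_time_factor(processing_seconds=10.0, audio_seconds=600.0)
factor = speedup(processing_seconds=10.0, audio_seconds=600.0)
print(f"RTF = {rtf:.4f}, speed-up = {factor:.0f}x real-time")
```

Under this convention, a result of roughly 56× real-time corresponds to an RTF of about 1/56, or roughly 0.018.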