We thought the model was the problem. It wasn't.
A speaker-identification experiment on phone calls went through five rounds. It began with the conclusion that 8 kHz phone audio "drowns speaker identity in noise" (EER 35–45%), moved to a methodology breakthrough (82–92% on a real gold set), and then hit a reality check that crashed to 59.5% on blind calls. The fix wasn't a better model. It was the discovery that 68% of the "single-speaker" clips actually contained two voices. With those clips removed, the system reached 87.2% accuracy and an AUC of 0.856.
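For readers less familiar with the two metrics quoted above, here is a minimal sketch of how EER and AUC are commonly computed for speaker verification. It is not the experiment's own evaluation code. It assumes each trial yields one similarity score, labelled "genuine" (same speaker) or "impostor" (different speakers), with higher meaning "more likely the same speaker". The function names `auc` and `eer` and the toy scores are hypothetical. An EER near 50% means the system is close to guessing, which puts the 35–45% range in context.

```python
from typing import Sequence, Tuple


def auc(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """ROC AUC in its Mann-Whitney form: the probability that a random
    genuine trial outscores a random impostor trial (ties count half)."""
    wins = 0.0
    for g in genuine:
        for i in impostor:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine) * len(impostor))


def eer(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Equal error rate by brute-force threshold sweep.

    Every observed score is tried as a threshold t (accept if score >= t).
    Returns (eer, t) at the threshold where the false-accept rate (impostors
    accepted) and false-reject rate (genuine trials rejected) are closest,
    averaging the two rates there.
    """
    best = None  # (gap between FAR and FRR, averaged rate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, purely illustrative (not data from the experiment).
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.5, 0.3, 0.2]
    print(auc(genuine, impostor))  # 0.875
    print(eer(genuine, impostor))  # (0.25, 0.6)
```

The quadratic pair loop and the exhaustive sweep are chosen for readability. With many trials, a sort-based AUC or a library ROC routine is the usual choice.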