Bad news folks. Corporate America is rushing ahead with voice-based authentication.
I know because I recently called my ISP, Spectrum, which tried to enroll me in Voice ID. I declined. (“With Voice ID, you don’t have to worry about remembering security codes or passwords.” Ok, sure 🤪) And I also called my bank, Chase, which started the call with a disclaimer that it would be fingerprinting my voice. At Chase, there wasn’t even an obvious way to opt out. (How is that legal?) I hung up. Let me explain why.
Many major banks and ISPs are using “voice print” tech for authentication: they try to identify you over the phone by analyzing your voice. This is replacing other, more secure methods of identification, such as sharing a security PIN. (They may also analyze your voice to gauge your sentiment, to guess whether you are healthy, and for other distressing reasons, which partly explains why call center companies are raising ridiculous amounts of money. But this post focuses on only one dystopian use case: biometric authentication.)
Every now and then, you see big companies forging ahead in an obviously bad direction, like relying on SMS for account logins/resets. (SMS-based authentication is broken and deserves no place in any authentication flow.)
Foot, meet gun. We are on the brink of mass adoption of an equally bad idea.
Voice print authentication is fundamentally broken because of the rise of deepfake tech and the widespread availability of people’s voice data (especially for creators who frequent podcasts and livestreams).
It’s become easy for anyone to spoof another person’s voice if recordings of them talking exist.
Here are some links showing how commonplace deepfake audio tech is:
- (2019) Descript, a toolmaker for podcasting, raises a $15M Series A from a16z and Redpoint and acquires Lyrebird, which lets users create audio of their voices from text
- (2020) Podcast editor Descript adds a $30/month pro tier with access to its Overdub feature, which essentially lets you use deepfake audio to fix your own mistakes
Like all technologies, tools to spoof others’ voices will only get better with time.
It’s common sense that using biometrics for authentication is an outrageous idea:
“Don’t use biometrics for anti-fraud. In fact, don’t use biometrics for anything.” — Edward Snowden (@Snowden)
The threat of this tech being abused in this way is not theoretical; scammers have already pulled it off.
So, to recap: despite multiple high-profile cases of scammers successfully stealing money by impersonating people via deepfake audio, big banks and ISPs are rolling out voice-based authentication at scale.
The worst offender that I could find is Chase. There is no “opt in”. There doesn’t even appear to be a formal way to “opt out”! There is literally no way for me to call my bank without my voice being “fingerprinted”. That’s holding the customer hostage.
The next crisis: robocalls that spoof the voices of victims at scale
The era of spearphishing robocalls that utilize deepfake audio is fast approaching and will target individuals who have exposed voice data (professional podcast and video creators, but also anyone posting videos of themselves to social media). The public needs to be aware of this threat, and governments and companies should move faster to prevent it (by accelerating SHAKEN/STIR adoption, among other things).
All the precursors for spearphishing at scale exist: text-to-voice tech, economic incentives, and public data (people’s voices, plus their numbers/names/relationships; for the latter, you don’t even need to go to the dark web, just check https://whitepages.com).
The burgeoning robocall industry is already making billions of calls per year in a “spray and pray” fashion, but it will be trivial for scammers to start spoofing people’s voices and targeting their relatives’ numbers (especially with a growing number of high-profile people having their voice data in the public domain). Scam call centers are flourishing and are ready to “convert” victims and their contacts. All they need is publicly scrape-able contact info ✅, text-to-speech software ✅, and a fresh script.
After a bit of threat modeling, it becomes apparent that future spearphishing robocalls may not directly con you, but rather “farm” your voice data by asking you benign questions, and use that to train a voice model to penetrate more deeply into your network.
In other words, a scammer who has learned your voice can call your grandparent with a ruse involving your kid’s health (“Jane Doe is in the emergency room and needs $5K for an operation,” etc.). That will hit different than a random call about a fictional car loan.
Again, society must adjust to the following reality: it’s become easy for anyone to spoof the voices of others who have public recordings of themselves talking (very common). Therefore, companies (especially banks) should not be using this as a @#%!ing way to log into accounts! You would think this is SIMPLE enough for corporate America to understand, but alas, here we are.
If you would like to join the discussion, check out the post on r/privacy.