There’s a fabulous story about a slew of Amazon Echo devices that took it upon themselves to order expensive dollhouses from the ecommerce retailer, all because a news show host uttered the phrase “Alexa ordered me a dollhouse” on air. The devices heard it from TVs switched on in the room.
Researchers say such a scenario is not unlikely. Not only can attackers issue malicious audio (“mal-audio”) voice commands to any AI listening device within audible range, they say, but they can also do so using hidden voice commands: commands the user might not even notice.
Nicholas Carlini and Pratyush Mishra of the University of California, Berkeley, who wrote a new paper on the subject with several other academics, don’t specifically target or mention Amazon’s Alexa. They do, however, claim that their test attacks work against Google Now’s speech recognition system and that their voice commands are “likely to work with any modern voice recognition system.” That would include smartphones.
“We show that adversaries with significant knowledge of the speech recognition system can construct hidden voice commands,” they write in their paper. These could even be commands “that humans cannot understand at all.”
The voice channel is the vector for attack.
In a video made at the Usenix Security Symposium, Carlini recently said one big potential problem area could be texts sent to premium-rate SMS numbers. He explained that the command “Okay Google now, text…” followed by a premium-rate number could prove costly for the device’s owner. He said this could become an even bigger problem in the near future as AI-driven banking becomes prevalent, using “Okay Google, pay John $100” as an example.
Carlini says the problem is not an individual spoofing the voice recognition, such as, say, a bad guy walking by. In that case, the device owner simply cancels the command. The problem is professional hacking, in which sounds are crafted that seem like noise to the human ear but are speech to a device. The device hears the garbage as a real command, and the device owner isn’t aware of the attack.
The researchers provide examples of mal-audio on their website for people to test for themselves.
“This is an instance of attacks on machine learning,” Carlini says.
And the results could be drastic. The hack could, for example, post a user’s location on Twitter, the researchers speculate. For some, that could conceivably be as serious as losing a hundred bucks, or more so. Opening a web page loaded with malware is another example the researchers use.
Defenses do exist. However, users can miss passive defenses that notify them a command has been issued, such as the confirmatory statement a voice UI makes after it understands a command.
And password protections that end users can set up, on Amazon’s system for example, are cumbersome and detract from the “ease of use of the system.”
“Our attacks demonstrate that these attacks are possible against currently deployed systems,” the researchers write.
The way forward, they contend, is augmenting systems “to detect the differences between real human speech and synthesized obfuscated speech,” something that hasn’t been done properly so far.
By Patrick Nelson for NetworkWorld from IDG | Photo: Pixabay