Secrets can fall prey to eavesdropping smart devices, not just nosy siblings or classmates. A new audio system could help better protect privacy.
You may know them as Siri or Alexa. Dubbed personal assistants, these smart devices are attentive listeners. Say just a few words, and they’ll play a favorite song or lead the way to the nearest gas station. But all that listening poses a privacy risk. To help people protect themselves against eavesdropping devices, a new system plays soft, calculated sounds. This masks conversations to confuse the devices.
The smart devices use automatic speech recognition, or ASR, to translate sound waves into text, explains Mia Chiquier. She studies computer science at Columbia University in New York City. The new program fools the ASR by playing sound waves that vary with your speech. Those added waves jumble a sound signal to make it hard for the ASR to pick out the sounds of your speech. It “completely confuses this transcribing system,” Chiquier says.
She and her colleagues describe their new system as “voice camouflage.”
The volume of the masking sounds is not what’s key. In fact, those sounds are quiet. Chiquier likens them to the sound of a small air conditioner in the background. The trick to making them effective, she says, is having these so-called “attack” sound waves fit in with what someone says. To work, the system predicts the sounds that someone will say a short time in the future. Then it quietly broadcasts sounds chosen to confuse the smart speaker’s interpretation of those words.
Chiquier described the system on April 25 at the virtual International Conference on Learning Representations.
Step one in creating great voice camo: Get to know the speaker.
If you text a lot, your smartphone will start to anticipate what the next few letters or words in a message will be. It also gets used to what types of messages you send and the words you use. The new algorithm works in much the same way.
“Our system listens to the last two seconds of your speech,” explains Chiquier. “Based on that speech, it anticipates the sounds you might make in the future.” And not just sometime in the future, but half a second later. That prediction is based on the characteristics of your voice and your language patterns. These data help the algorithm learn and calculate what the team calls a predictive attack.
That attack amounts to the sound that the system plays alongside the speaker’s words. And it keeps changing with each sound someone speaks. When the attack plays along with the words predicted by the algorithm, the combined sound waves turn into an acoustic mishmash that confuses any ASR system within earshot.
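The timing described above can be sketched in a few lines of Python. This is only an illustration of the listen-then-aim-ahead loop, not the researchers’ code. The sample rate, the function names and the `placeholder_predictor` (which just makes quiet random noise) are assumptions made for this example. The real system uses a trained neural network in that spot.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed samples per second (not stated in the article)
CONTEXT_S = 2.0        # the system listens to the last two seconds of speech
LOOKAHEAD_S = 0.5      # and aims its masking sound half a second ahead

_rng = np.random.default_rng(0)

def placeholder_predictor(context, n_out):
    """Hypothetical stand-in for the trained model: quiet random noise,
    scaled to how loud the recent speech was."""
    loudness = np.sqrt(np.mean(context ** 2)) if len(context) else 0.0
    return 0.1 * loudness * _rng.standard_normal(n_out)

def schedule_attack(speech, predictor=placeholder_predictor):
    """Return a masking signal the same length as `speech`.

    At each step, only audio heard so far is used, and the resulting
    sound is placed half a second into the future.
    """
    ctx = int(CONTEXT_S * SAMPLE_RATE)
    hop = int(LOOKAHEAD_S * SAMPLE_RATE)
    attack = np.zeros_like(speech)
    for now in range(hop, len(speech) - hop, hop):
        context = speech[max(0, now - ctx):now]   # up to 2 s of past audio
        start = now + hop                         # 0.5 s in the future
        end = min(start + hop, len(speech))
        attack[start:end] = predictor(context, end - start)
    return attack

# Toy "speech": three seconds of a 220-hertz tone.
t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
speech = np.sin(2 * np.pi * 220 * t)
attack = schedule_attack(speech)
```

In this sketch, the first second has no masking sound because the loop needs some speech to listen to before it can aim ahead.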
The predictive attacks also are hard for an ASR system to outsmart, says Chiquier. For instance, if someone tried to disrupt an ASR by playing a single sound in the background, the device could subtract that noise from the speech sounds. That’s true even if the masking sound periodically changed over time.
The new system instead generates sound waves based on what a speaker has just said. So its attack sounds are constantly changing — and in an unpredictable way. According to Chiquier, that makes it “very difficult for [an ASR device] to defend against.”
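A toy example can show why a fixed background sound is easy to remove while a changing one is not. The signals below are made up for illustration: a pure tone stands in for a voice, and random noise stands in for a masking sound that keeps changing. Real ASR systems use more advanced noise removal than simple subtraction, so treat this as a rough sketch of the idea only.

```python
import numpy as np

def rms(x):
    """Root-mean-square size of a signal."""
    return float(np.sqrt(np.mean(x ** 2)))

rng = np.random.default_rng(0)
t = np.arange(16_000) / 16_000              # one second at an assumed 16 kHz
speech = np.sin(2 * np.pi * 220 * t)        # toy stand-in for a voice

# Case 1: a fixed masking tone that the listening device has figured out.
fixed_mask = 0.3 * np.sin(2 * np.pi * 1000 * t)
heard = speech + fixed_mask
recovered = heard - fixed_mask              # subtract the known tone
fixed_residual = rms(recovered - speech)    # almost nothing is left over

# Case 2: the masking sound changes unpredictably, so the device's
# old estimate (the fixed tone) no longer matches what is playing.
changing_mask = 0.3 * rng.standard_normal(t.size)
heard = speech + changing_mask
recovered = heard - fixed_mask
changing_residual = rms(recovered - speech)  # a lot of noise remains

print(fixed_residual, changing_residual)
```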
To test their algorithm, the researchers simulated a real-life situation. They played a recording of someone speaking English in a room with an average level of background noise. An ASR device listened in and transcribed what it heard. The team then repeated this test after they added white noise to the background. Finally, the team ran the test again with their voice-masking system turned on.
The voice-camouflage algorithm kept ASR from correctly hearing words 80 percent of the time. Common words such as “the” and “our” were the hardest to mask. But those words don’t carry a lot of information, the researchers add. Their system was much more effective than white noise. It even performed well against ASR systems designed to subtract background noise.
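One common way to score a transcript is word error rate: the number of word swaps, insertions and deletions needed to turn the transcript into what was really said, divided by the number of words spoken. The sketch below computes it. The article does not say exactly how the 80 percent figure was calculated, so this is an illustration of the general idea, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped
                            curr[j - 1] + 1,      # extra word added
                            prev[j - 1] + cost))  # word kept or swapped
        prev = curr
    return prev[-1] / len(ref)

# Invented example: only "the" survives the masking.
spoken = "please turn off the kitchen lights"
transcribed = "peas burn of the chicken flights"
wer = word_error_rate(spoken, transcribed)
```

Here five of the six words come out wrong, and the one that survives is a common word, much like the “the” and “our” mentioned above.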
The algorithm could someday be embedded into an app for use in the real world, Chiquier says. To ensure that an ASR system couldn’t reliably listen in, “you would just open the app,” she says. “That’s about it.” The system could be added to any device that emits sound.
That’s getting a bit ahead of things, though. Next comes more testing.
The scientists tested their system in different rooms to mimic real environments. As their results show, the ASR almost always transcribed speech correctly when there was no attack. It was slightly confused by white noise, and much more confused by the new attack system. Here, sounds that were spoken and correctly transcribed by ASR appear green. Sounds that were not spoken and mistakenly transcribed by the ASR appear red. The white noise was played twice as loud as the new attack sounds.
This is “good work,” says Bhiksha Raj. He’s an electrical and computer engineer at Carnegie Mellon University in Pittsburgh, Pa. He wasn’t involved in this research. But he, too, studies how people can use technology to protect their speech and voice privacy.
Smart devices currently control how a user’s voice and conversations are protected, Raj says. But he thinks control instead should be left to whoever is speaking.
“There are so many aspects to voice,” Raj explains. Words are one aspect. But a voice may also contain other personal information, such as someone’s accent, gender, health, emotional state or physical size. Companies could potentially exploit those features by targeting users with different content, ads or pricing. They could even sell voice information to others, he says.
When it comes to voice, “it’s a challenge to find out how exactly we can obscure it,” Raj says. “But we need to have some control over at least parts of it.”
ad: Short for advertisement. It may appear in any medium (print, online or broadcast) and has been prepared to sell someone on a product, idea or point of view.
algorithm: A group of rules or procedures for solving a problem in a series of steps. Algorithms are used in mathematics and in computer programs for figuring out solutions.
camouflage: Hiding people or objects from an enemy by making them appear to be part of the natural surroundings. Animals can also use camouflage patterns on their skin, hide or fur to hide from predators.
computer science: The scientific study of the principles and use of computers. Scientists who work in this field are known as computer scientists.
disrupt: (n. disruption) To break apart something; interrupt the normal operation of something; or to throw the normal organization (or order) of something into disorder.
engineer: A person who uses science to solve problems. (v.) To design a device, material or process that will solve some problem or unmet need.
gender: The attitudes, feelings, and behaviors that a given culture associates with a person’s biological sex. Behavior that is compatible with cultural expectations is referred to as being the norm. Behaviors that are incompatible with these expectations are described as non-conforming.
physical: (adj.) A term for things that exist in the real world, as opposed to in memories or the imagination. It can also refer to properties of materials that are due to their size and non-chemical interactions (such as when one block slams with force into another). (in biology and medicine) The term can refer to the body, as in a physical exam or physical activity.
real time: A term that connotes immediacy; something is being studied, recorded and/or reported at the very time it is happening.
smart device: Some product or machine that can send information to and retrieve information from the internet, or that can be controlled via the internet, such as by using an app on a smartphone.
sound wave: A wave that transmits sound. Sound waves have alternating swaths of high and low pressure.
system: A network of parts that together work to achieve some function. For instance, the blood, vessels and heart are primary components of the human body's circulatory system. Similarly, trains, platforms, tracks, roadway signals and overpasses are among the potential components of a nation's railway system. System can even be applied to the processes or ideas that are part of some method or ordered set of procedures for getting a task done.
technology: The application of scientific knowledge for practical purposes, especially in industry — or the devices, processes and systems that result from those efforts.
virtual: Being almost like something. An object or concept that is virtually real would be almost true or real, but not quite. The term often is used to refer to something that has been modeled by (or accomplished by) a computer using numbers, not by using real-world parts. So a virtual motor would be one that could be seen on a computer screen and tested by computer programming (but it wouldn’t be a three-dimensional device made from metal). (in computing) Things that are performed in or through digital processing and/or the internet. For instance, a virtual conference may be one that people attend by watching it over the internet.
wave: A disturbance or variation that travels through space and matter in a regular, oscillating fashion.
Meeting: M. Chiquier, C. Mao and C. Vondrick. Real-time neural voice camouflage. International Conference on Learning Representations 2022. April 25, 2022. Virtual. ICLR 2022 conference paper 284. https://openreview.net/forum?id=qj1IZ-6TInc.
Founded in 2003, Science News for Students is a free, award-winning online publication dedicated to providing age-appropriate science news to learners, parents and educators. The publication, as well as Science News magazine, is published by the Society for Science, a nonprofit 501(c)(3) membership organization dedicated to public engagement in scientific research and education.
© Society for Science & the Public 2000–2022. All rights reserved.