Sound in Voice User Interfaces – a Sound Architect’s perspective

The rise of voice assistants feels like a dream come true from a Sound Architect's perspective: a model of interaction in which sound is the primary communication modality between humans and machines is becoming mainstream. Finally, I thought, sound would be one of the first things brand, product, and project managers think about when delving into the voice world. Yet sound often still feels like an afterthought rather than an integral part of many voice experiences.

I have some hypotheses about why this is the case, and I would like to use them to touch upon some aspects of working as a Sound Architect in the voice domain.

Hypothesis 1: The primary or, better put, "most conscious" way of perceiving the world is through sight.

Sound often works subconsciously and is not the first aspect to think about when designing a product or a brand.

Ironically, the very reason sound may so often be an afterthought is also one of its key strengths, and one that movie producers and game designers are very aware of. When thoughtfully designed, sound becomes an integral part of the experience, can strongly affect the audience's emotions, and adds value. Think about the iconic James Bond brand sound or R2-D2's sounds, which give the little robot from Star Wars a human-like character, not to mention Darth Vader's voice.

The positive effects that sound has in movies and games can also be capitalized on in branding, product design, and especially voice user interfaces. 

When brands think about developing or integrating voice technology into their products, they inevitably come to a point where they need to define what they want to sound like. That question is answered within the realm of Sonic Branding, which has seen an incredible boost in recent years due to developments in the voice industry, and rightfully so.

Don't get me wrong, Sonic Branding was important before smart voice assistants became such an integral part of people's lives, but it is now an absolute necessity if you want to be heard as a brand. 

There is a ton of information out there about how to get started with Sonic Branding and how a sonic strategy can save your company money, build brand equity, and boost recognition or trust.

For example, the “Best Audio Brands” Ranking released by amp sound branding is a great resource and overview of how brands utilize Sonic Branding across touchpoints.

In summary, the message is that brands relying on stock synthetic voices, with no other sonic assets and no sonic strategy, will disappear on a primarily auditory interface. Furthermore, a brand needs to develop and implement a cohesive sonic strategy to be heard and recognized across all touchpoints, not only in the voice domain.

Only then can sound help a brand become top of mind. The voice domain, however, is a perfect place to start a brand's sonic branding endeavor.

Hypothesis 2: There are more pressing technological challenges when developing voice products

that leave fewer resources for thinking about sound to support the implementation of a great user experience.

The way product managers and designers are becoming aware of the importance of sound in voice user interfaces can be compared to the development of web pages and other visual interfaces when they emerged as a new technology: from text-heavy and incredibly appalling designs to the seamless, lean experience users have nowadays when browsing the web.

I expect to see a similar development in the voice domain, from unbranded, information-heavy, menu-inspired voice interactions towards customized, emotive, and effortless conversations, or what we at VUI.agency would describe as assistants with auditive charisma. Technological advances partly drive this transition, but using sound and exploiting the strengths of this modality plays a vital part, too.

Hypothesis 3: There is a lack of knowledge in the industry about how powerful sound can be

when it comes to supporting the emotive and functional side of voice assistants.

Functional sounds are probably the most popular form of sonic feedback used to guide the user through a conversation or product interaction:

Functional sounds

You can use functional sounds to shorten or even replace the voice output (there is a brief mode on Alexa and Google Home for a reason).

Information and attention

They can prime the user for further information and grab their attention, just like the notification sounds of messengers.

Reward

They can be rewarding when the user accomplishes something. Think about video games.

Transparency

They can help make your product more transparent about internal processes, for example, indicating when the interface starts listening and when it stops or when the interface is loading resources.

These are just a few examples. Another blog post dedicated to functional sounds in voice user interfaces will follow and cover all of the above aspects. For the interested reader who can't wait until then, "Designing with Sound" by Amber Case and Aaron Day has great sections about functional sounds and provides an excellent general overview of how sound can improve your product design.
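To make the functional-sound roles above a little more concrete, here is a minimal sketch of how an interface could map interaction events to short sounds and let a sound replace the spoken confirmation in a brief mode. Everything in it is an assumption for illustration: the event names, the placeholder URLs, and the `build_ssml` helper are hypothetical, and it only relies on the standard SSML `<speak>` and `<audio>` elements, not on any particular platform's API.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sound catalogue: event names and URLs are placeholders.
EARCONS = {
    "listening_started": "https://example.com/sounds/listen-start.mp3",
    "listening_stopped": "https://example.com/sounds/listen-stop.mp3",
    "task_completed": "https://example.com/sounds/reward.mp3",
    "notification": "https://example.com/sounds/notify.mp3",
}


def build_ssml(event, text="", brief=False):
    """Return SSML that plays the sound for `event`, then optional speech.

    In brief mode the speech is dropped whenever a sound exists for the
    event, so the sound replaces the spoken confirmation.
    """
    parts = []
    src = EARCONS.get(event)
    if src is not None:
        parts.append(f"<audio src={quoteattr(src)}/>")
    if text and not (brief and src is not None):
        # Escape &, < and > so the text stays valid inside the SSML.
        parts.append(escape(text))
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(build_ssml("task_completed", "Your order is placed."))
    print(build_ssml("task_completed", "Your order is placed.", brief=True))
```

An event without a catalogue entry falls back to speech only, so the user is never left without feedback.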

Soundscapes

Another way to use sound in voice experiences is soundscapes, which can significantly increase immersion. They might set the tone for a whole conversation segment or emphasize parts of a conversation where, in a real-world scenario, we might have used gestures or facial expressions to convey our message.

A whole field called Sonification also explores how we can translate data into an auditory form. So next time you have to show specific data to your users through the auditory channel, you can look at the "Sonification Handbook" or talk with us. 
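As a small illustration of translating data into sound, the sketch below uses one simple approach, mapping each data value to a pitch so that higher values sound higher. It is only an assumed example, not a method taken from the handbook: the function names and the default note range are hypothetical, and it uses the standard equal-temperament formula with A4 (MIDI note 69) at 440 Hz.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def values_to_pitches(values, low_note=60, high_note=84):
    """Map data values linearly onto a note range and return frequencies.

    The smallest value gets `low_note`, the largest gets `high_note`.
    If all values are equal, every value maps to the middle of the range.
    """
    if not values:
        return []
    lowest, highest = min(values), max(values)
    span = highest - lowest
    pitches = []
    for value in values:
        position = 0.5 if span == 0 else (value - lowest) / span
        note = round(low_note + position * (high_note - low_note))
        pitches.append(midi_to_hz(note))
    return pitches


if __name__ == "__main__":
    # Hypothetical data series, e.g. temperatures over five days.
    for frequency in values_to_pitches([12, 15, 19, 14, 9]):
        print(f"{frequency:.1f} Hz")
```

The resulting frequencies still need to be rendered as audio, and whether pitch is the right dimension for a given dataset is exactly the kind of design question the field deals with.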

I want to point out that for all these forms and use cases of sonic feedback, it is crucial that you make sure your sound conveys the intended information to the user, on a subconscious or conscious level, from a cognitive or affective perspective. A danger sign that does not look like a danger sign most likely fails its purpose. A danger sign that looks like a danger sign but does not indicate danger is creating a bad user experience. The same holds for sound. An alarm that does not sound like an alarm most likely fails its purpose. An alarm that sounds like an alarm but does not indicate something alarming creates a bad user experience. Designing meaningful sounds is something for another blog post, which leads me to the last hypothesis. 

Hypothesis 4: Companies know about the power of sound

and would like to harness it, but they do not understand how and when to include it.

I am sure at least some of you (if not the majority) thought, "Sound Architect? What is that, and what can you do with that?". To be fair, I have had to explain what I do for a living not only to my grandmother but also to friends my age, so I will try to explain in a few words what I understand the responsibilities of a sound architect to be. A sound architect, much like a real architect, takes a multidisciplinary approach towards a specific goal.

While an architect combines design, engineering, and communication skills (among others) to build something, a sound architect considers aspects of Sound Design, Human-Computer Interaction, Sonic Branding, and Sound Engineering to come up with a sonic concept and to accompany its implementation towards the best possible product experience. So, when you aim to develop a voice assistant with auditive charisma, bring sound into the loop and consider it an integral part of the project right from the beginning. It will fit seamlessly into the process.

I hope you can take away some insights about how sound can boost your voice experience and why it is a good idea to consider it an essential element of every voice user interface from the get-go. If you would like to know more, we are more than happy to answer your questions.