By Hammad Khan, head of experience imagination, Accenture Interactive
Like many others, I have used the recent lockdown period due to Covid-19 as an opportunity to reinvest in other forms of media – specifically by upgrading my podcasting studio. We are often guilty of unconscious bias towards tactile screen technology (such as visual UI, graphics, and video) when we think of digital channels. However, there is far more to audio than just podcasting, even if that medium has seen yet another boom in production and consumption of late.
In traditional audio conference calls, we have an established pattern: mundane small talk, repeating yourself after sporadic joiners/leavers, and the inevitable silences when somebody gives a monologue on mute. With Zoom and Teams, we’ve seen new behaviours become commonplace, with even the classic disruption apology of “sorry, that’s my kid in the background” becoming more acceptable than it used to be. It could also be argued that in some cases it even helps to create more empathy with your attendees, who realise you are just as human as they are.
The human voice, both spoken and heard, remains the most potent form of communication we have at our disposal. Thinking this through, it’s surprising just how many voice-based channels we already have in everyday use. However, they don’t form as large a part of a digital strategy, transformation or campaign as they could.
On reflection, in the last few days alone, I’ve used digital voice interactions more than I would have predicted:
- Changed the AC setting using my car’s voice controls;
- Spoke to the virtual assistant of my local bank about a transaction;
- Followed yes/no prompts on an interactive voice response (IVR) when switching home services;
- Sent voice notes back and forth on WhatsApp with a mechanic who can’t write English;
- Had countless calls with colleagues on Microsoft Teams;
- Monitored my kids’ live school lessons on Zoom;
- Listened to ARN news on the radio in my car and through an app at home;
- Changed the playlist and volume on the Google Home speaker;
- Searched for a movie on Amazon Prime using my Fire TV voice control.
That is a long list of channels that are already fully voice-enabled, and it feels like we’ve barely scratched the surface. We cannot continue to convince ourselves that a screen or touch-based interface is the default interaction to put at the centre of a strategy, product, service, or campaign. And if there was ever any doubt that we shouldn’t focus only on the latest and greatest technologies, just look at the overnight disruption brought by Covid-19, which has made previously pioneering technologies such as thumbprint and facial recognition almost unusable for frequent authentications, transactions and selections.
I’m old enough to remember the early versions of Dragon Dictate, which promised that natural language dictation would become the norm. Twenty years later, it’s still not commonplace even for short statements, let alone more substantial content creation. I believe that as well as the technical challenges presented by voice (such as language, accent and noise), there is also a more human barrier to overcome before we see adoption, advocacy and effectiveness.
To give an example of this, it wasn’t until the interface for spoken dialogue was given a personified name that we found it natural enough to go from a Nokia Communicator to the Star Trek Communicator (aka the modern smartphone). Alexa, Siri and Cortana are almost sentient beings, and even our parents and children are talking with them in increasingly sophisticated ways. “Hey Google” is the notable odd one out, with the brand choosing instead to flex its iconic status as a verb by retaining the moniker in its audio branding. The result, for me at least, is the feeling that I am still interacting with Google as a technology and not as a trusted confidant. The alternative isn’t without challenges, though: who knows whether Alexa, Siri and Cortana are listening out for signs that we are two-timing them with our requests for the weather update or a Spotify playlist? We are creating relationships with technology through voice. As this is set to expand, we need to ask ourselves a whole new set of questions about why we do this, how we do it and what it involves.
Verbal communication is one of our strongest skills. Yet we seldom use it to create a connection with brands and organisations in the same way we use sight (video/advertising/content), touch (malls, products, spaces), or even taste (food and beverage outlets) and smell (perfumery). Our under-utilised sonic sense is an opportunity for innovation beyond a podcast or smart assistant.
In the Middle East, we have such diversity and cultural specifics, and it feels as if we could do more to co-create from within. I’ll leave you with some thought-starters to consider:
- What would the voice of your brand sound like?
- Is it viable or advisable to have multiple voices for multiple audiences, in the same way visuals, copy and creative are adapted in campaign content?
- How do you sustain relevant messaging in these voices? Do we need to retain voice talent in the same way we have art directors curating our brands visually or through motion?
- What is the measurement framework for vocal channels, and how do we establish the metrics to monitor them while remaining cognisant of the conflict between more listening and privacy?
- How do we safely store, search, and share voice media?