Siri, when will I be able to talk to you naturally?

Ask your digital assistant a question and, if you are direct, concise and in possession of the right kind of accent, you may get a reply. Voice-activated assistants are still worlds away from the full-scale virtual companions Artificial Intelligence (AI) developers want them to be. Smart as they are, these devices still struggle with the complexity of human speech: so far, they remain limited to simple requests, such as performing mundane tasks or providing encyclopaedic facts.

Lost in translation

Voice-controlled artificially intelligent devices (still) lack any appreciation of verbal improvisation, emotion and the richness of spoken language in general; their approach to speech is purely mathematical, based on clarity and simplicity. Until digital assistants learn to talk like humans, humans must adapt their conversational habits to compensate for their virtual helpers’ shortcomings. Be prepared to rephrase your query as many times as it takes to be understood! And if you crave deep, open-ended conversations that go beyond the mundane, Siri, Cortana, Alexa and the rest won’t live up to your expectations, at least for now.

Another major setback all digital assistants share is the accent bias inherited from their human creators. For example, newspapers including the Washington Post have reported on tests of various digital assistants that demonstrate how they can fail to understand the accents of people from ethnic minorities in the USA. (One company carrying out such tests is Globalme.) The reason is that the AI has been trained to talk and listen like its American-English-speaking developers. Various tests and surveys reveal that voice-controlled AI is less friendly and more discouraging to people whose accents don’t fit the mainstream. Some users report delays in a digital assistant’s responses when a question is delivered in an unfamiliar accent; others are left even more humiliated by the “Sorry, I don’t understand that” reply.

Digital alienation

Sophisticated natural speech, which digital assistants will eventually be able to deliver, is expected to be of immense importance to many sectors and industries. Already, interactive voice response systems work quite well in customer service call centres for banks, ticket booking companies and other consumer-facing businesses. AI-powered assistants may also make a big difference to the lives of bed-ridden patients in the future, allowing them to be more independent. At present, however, it is still hard to predict when machines will start chatting to humans in a friendlier manner.

“Listen like a machine, speak like a human” is considered the winning formula for teaching AI natural speech. Until this is achieved in practice, people with strong regional accents and non-native English speakers will suffer digital alienation: they complain that their accents are perfectly understandable to other people, but not to the machines.

One of the principles of machine learning is that the bigger the data set, the better the performance of the system. AI assistants may not be so different from humans in this respect: the more accents they hear, the better they become at understanding them. Amazon, whose digital assistant Alexa is among the most popular in the world, said in a statement: “As more people speak to Alexa, and with various accents, Alexa’s understanding will improve”. Training AI on diverse accent databases is therefore the key to resolving the issue, and AI software companies have a financial interest in eliminating accent bias: a more open-minded AI will open the way to wider markets.
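The data-set principle above can be illustrated with a deliberately simplified sketch. This is a toy lookup model, not a real speech recogniser, and every word, accent and number in it is invented for illustration; the point is only that a system trained on one accent fails on the rest, while one trained on all of them succeeds:

```python
# Toy illustration of "bigger (more diverse) data set, better performance".
# A "recogniser" that has only memorised certain (word, accent) pairs:
# the more accent variants it is trained on, the more utterances it recognises.

WORDS = ["weather", "timer", "music", "lights"]
ACCENTS = ["us", "uk", "indian", "scottish", "welsh"]  # invented labels

def utterance(word, accent):
    # Stand-in for an acoustic pattern: each accent realises a word differently.
    return f"{word}::{accent}"

def train(accents_seen):
    """'Training' here is just memorising each word/accent realisation."""
    return {utterance(w, a): w for w in WORDS for a in accents_seen}

def accuracy(model, test_accents):
    hits = total = 0
    for w in WORDS:
        for a in test_accents:
            total += 1
            hits += model.get(utterance(w, a)) == w
    return hits / total

small = train(ACCENTS[:1])   # trained on US English only
large = train(ACCENTS)       # trained on all five accents

print(accuracy(small, ACCENTS))  # 0.2 — only 1 accent in 5 is understood
print(accuracy(large, ACCENTS))  # 1.0
```

Real systems do not memorise utterances, of course, but the direction of the effect is the same: coverage of the training data bounds what the model can understand.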

Too human to be true?

Another question facing AI-powered assistants is how human their speech should become before it crosses the line into being indistinguishable from a person’s. A Google experiment called Duplex caused controversy. The virtual platform was demoed during the Google I/O 2018 event, making appointments over the phone on a user’s behalf in an unprecedentedly human-like tone of voice. According to Google, people contacted by the AI assistant were unaware that they were conversing with a machine. The press described the demonstration as “impressive”, “unbelievably human-like”, “potentially faked” and “creepy”. While little is known about the technology behind Duplex, as it is still in its testing phase, Google said it was working with a number of businesses on testing and early adoption of what has been called the most human-like AI platform to date.

Google’s Duplex and similar AI-powered assistants may be tailor-made for call centres; some see it as the perfect business assistant, whose only flaw will be its inability to make coffee. Others, however, have voiced concern over its extreme similarity to the human voice and the distress this may cause. They believe AI should aim for a middle ground between a cold, robotic way of speaking and the genuine complexity of the human voice.

A recent competition between the most popular voice assistants on the market ranked Google’s Assistant as the fastest and most accurate at answering questions. While replying in the blink of an eye to the question “what is the square root of 128998?” is certainly impressive, no voice-activated AI can answer a seemingly simpler-sounding question: “When will humans be able to talk to you naturally?”
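For what it’s worth, the arithmetic in that benchmark question is the trivial part; the hard part is understanding the speech. The answer is a one-liner in Python:

```python
import math

# The square root from the article's benchmark question,
# rounded to three decimal places.
print(round(math.sqrt(128998), 3))  # 359.163
```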


[Image licensed to Ingram Image.]
