How to add sound to your chatbot

The seven most important rules of thumb if you want to get started with voice technology

Voice technology has taken some years to arrive in Danish. We have looked enviously at English-speaking businesses, which have many good platforms to choose from and have therefore plunged into cool voice solutions more quickly. In Denmark, a large share of businesses now have chatbots. So, now that voice technology is ready in Danish, it is a very good idea to make a "voiceification" of your chatbot. Precisely because it is obvious and technically "easy", you can easily overlook the things that should be changed if you do not want to go from a cool chatbot to a half-dead voice bot.

By "half-dead" I mean a voice bot that users have difficulty listening to, for example because of overly long monologues or monotonous listings. The consequence is that all the information is lost. If the information is lost, the voice bot does not live up to its purpose: to help quickly, 24/7. We must therefore ensure that users listen to, understand, and remember what the voice bot is saying.

So, here come the seven most important rules of thumb if you want to get started with voice technology.

1. From writing to speech

Most often, I recommend that chatbots use language that resembles spoken language rather than written language. The reason is that spoken language is shorter and easier to read in the small graphic window where the chatbot usually "lives". For voice bots, brevity is a prerequisite: a user cannot listen to a long monologue from a voice bot without ending up in cognitive overload. If a response takes 20-30 seconds for the voice bot to say, you should split the dialogue into smaller parts with more turns between user and bot.
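As a rough illustration of the 20-30 second rule, here is a minimal Python sketch that estimates how long a response takes to say and splits it at sentence boundaries when it runs too long. The speaking rate of 150 words per minute and the function names are assumptions made for this example; measure the actual pace of your own voice before relying on the numbers.

```python
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate, not a measured value
MAX_SECONDS = 20        # lower end of the 20-30 second limit


def estimated_seconds(text: str) -> float:
    """Estimate how long the text takes to say at the assumed rate."""
    return len(text.split()) / WORDS_PER_MINUTE * 60


def split_into_turns(text: str, max_seconds: float = MAX_SECONDS) -> list[str]:
    """Group whole sentences into turns that stay under max_seconds."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    turns, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimated_seconds(candidate) > max_seconds:
            # Adding this sentence would make the turn too long: start a new one.
            turns.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        turns.append(current)
    return turns
```

Each turn should then end in a way that lets the user respond before the bot continues. A single sentence that is longer than the limit is kept whole here and still needs to be rewritten by hand.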

You should also remove bullets, emojis, GIFs, pictures, links, and buttons. These are all things I normally recommend to make sure the chatbot is easy to decode, has personality, and can communicate simply and easily in writing. In speech, by contrast, we have only the voice and the words to work with.
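A first pass at removing the visual elements can be automated. The sketch below is one possible approach in Python, not a complete solution: the regular expressions are assumptions that cover common bullets, `http(s)` links, and the most widely used emoji ranges, and the function name `to_speakable` is made up for this example.

```python
import re

URL = re.compile(r"https?://\S+")
# Leading list markers such as "-", "*", "•", "1." or "2)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
# Common emoji and symbol ranges (not exhaustive), plus any space before them.
EMOJI = re.compile("[ \t]*[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]+")


def to_speakable(text: str) -> str:
    """Strip links, emojis, and bullet markers; return one line of text."""
    text = URL.sub("", text)
    text = EMOJI.sub("", text)
    text = BULLET.sub("", text)
    sentences = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse leftover whitespace
        if not line:
            continue
        if line[-1] not in ".!?:,;":
            line += "."  # let each former bullet end as its own sentence
        sentences.append(line)
    return " ".join(sentences)


print(to_speakable("Here is what I can do 😀\n- Register vacation\n- Register illness"))
```

The output of such a filter is only a starting point. A stripped link usually leaves a dangling phrase behind, and a former bullet list still reads like a list, so the text should be rewritten for the ear afterwards.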

2. Short, shorter, shortest

There are also other elements we should remove from the chatbot's language if it is to become a well-spoken voice bot. One example is hyphens in listings (which we use in Danish, but not in English): "I can help you with vacation-, illness-, and overtime registration". Reformulate so it is clearer that you are talking about three different things: "I can help you with registering vacation, illness, and work overtime". The latter is also the better choice in a chatbot because it is easier to read and understand. Nonetheless, many authorities write like the first example.
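If your listings are generated from data, one way to avoid the hyphenated form is to store each item as a full word or phrase and join the items in code. This is a small Python sketch; the helper name `spoken_list` is hypothetical.

```python
def spoken_list(items: list[str], conjunction: str = "and") -> str:
    """Join full words or phrases into a listing that reads well aloud."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {conjunction} {items[1]}"
    return f"{', '.join(items[:-1])}, {conjunction} {items[-1]}"


topics = ["vacation", "illness", "work overtime"]
print(f"I can help you with registering {spoken_list(topics)}.")
```

For Danish output you would pass `conjunction="og"` and decide whether the comma before the conjunction fits your style.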

3. The voice bot's personality

Occasionally, I meet businesses that believe the way a voice bot talks is insignificant. One might think so, just as one might think it does not matter how the employees in customer service talk. But if you have called customer service just once, I think you will agree that how an employee talks to you is very relevant, and the same goes for voice bots.

Personality is difficult to express through the voice bot's vocabulary: the responses become too long and feel inefficient to the user. It is therefore critical that you remove superfluous words and instead move the bot's personality into the voice and the choice of voice. SSML (Speech Synthesis Markup Language) is used to adjust and customise the voice (like HTML, just for voices). With SSML you can adjust the pace of the voice, pitch, pronunciation, and so forth. In this way, you get your voice to sound efficient, serious, patient, and willing to help.
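To give an idea of what SSML looks like, here is a minimal Python sketch that wraps a response in the standard `<speak>`, `<prosody>`, and `<break>` elements. The function name and its defaults are assumptions for this example. Which `rate` and `pitch` values a given speech engine and voice actually support varies, so check your platform's documentation; some platforms also expect `version` and `xmlns` attributes on `<speak>`.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "medium", pitch: str = "medium",
            pause_ms: int = 0) -> str:
    """Wrap text in SSML that sets pace and pitch, with an optional pause."""
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > so the markup stays valid
        "</prosody>"
    )
    if pause_ms:
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"


print(to_ssml("I can help you with that.", rate="slow", pitch="low", pause_ms=300))
```

A slower rate and lower pitch may come across as calmer and more serious, but that is exactly the kind of judgement that should be tested by listening rather than assumed.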

Those of us who do not normally work with voices as a subject area may find it difficult to assess what needs to be adjusted so that a digital voice objectively signals what it should. This is why we at KPMG have initiated a collaboration with the business Ministry of Music, which works with performance training and voices, to ensure that the digital voices we create get the expression and sound we want, whether they should sound friendly, trustworthy, or funny, or signal anything else your brand needs.

4. Create progress in the conversation

Just as it takes two people to create a good conversation at a dinner party, I like to say that we share the responsibility for moving the conversation forward when we talk with bots. This means that the voice bot must take on part of that responsibility, so the recipient knows what the next turn should be about. In writing, we often have buttons that help the recipient along the way, but in a voice bot, the next step must be clear without any visual help.

In many places you will read that voice bots have to end their answers with a question. It is a good technique, but it quickly becomes humdrum and tiring for the recipient. It feels more like a cross-examination than an actual conversation. Therefore, it is all about being thorough when you design your answers: what would the recipient be likely to say or ask in different situations? If the voice bot is designed well, it can handle the different outcomes. To get the full picture of those outcomes, the voice bot will demand more user tests than the chatbot.

5. Goodbye and thank you

When a chat conversation ends, users occasionally write "Thank you". More often, however, the user simply leaves the conversation once she has received the answer she needed. It does not feel impolite to "ghost" a chatbot, but it is different with a voice bot. Most of the time it will feel uncomfortable to hang up as soon as you have got the information you needed. We should therefore take into account how users end the conversation: will they say "Thank you for helping", or could the voice bot ask if there is anything else it can help with? The latter of course assumes that the user has experienced that the bot was able to help with the first thing the user asked for; otherwise it is a bit of a bother.

So spend more energy on how the conversation should end. It is a sad user experience to have a successful conversation that ultimately ends with the voice bot going in I-do-not-understand-you circles.

6. Handle the errors - lifelines are guarantees

Your bot can appear smart in the way it handles errors. It sounds wrong, but awkward error handling frustrates many users and thereby maintains the perception of "stupid bots", even though the bot has perhaps mastered most of the conversation close to perfectly. There are two types of mistakes in particular:

Those mistakes where the bot does not hear anything. Either the user is caught off guard and does not say anything, they think for longer than usual, or a technical error prevents the voice bot from listening.
Those mistakes where the voice bot mishears. If so, it is paramount that the user gets back on track. As a voice bot designer, you must consider which answers should be possible to interrupt, and how users either re-enter the conversation or get help from a human being.

Uniform error messages make the majority of people furious. We have other expectations of conversations, which is why we often hear people grumble at Siri or tell her that she is stupid when she repeatedly says "I am not sure I understand".
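One way to avoid uniform error messages is to keep separate, escalating prompts for the two types of mistakes and to hand over to a human when they run out. The Python sketch below illustrates the idea; the prompt texts, the error type names `no_input` and `no_match`, and the function name are all made up for this example and are not tied to any particular voice platform.

```python
# Prompts for when the bot hears nothing at all.
NO_INPUT = [
    "Take your time. What would you like help with?",
    "I still can't hear you. You can say, for example, 'register vacation'.",
]
# Prompts for when the bot hears something but cannot match it.
NO_MATCH = [
    "Sorry, I didn't quite get that. Could you say it in another way?",
    "I'm still not with you. Is it about vacation, illness, or overtime?",
]
HANDOFF = "Let me put you through to one of my human colleagues."


def reprompt(error_type: str, attempt: int) -> str:
    """Return a new prompt per failed attempt, then fall back to a human."""
    prompts = NO_INPUT if error_type == "no_input" else NO_MATCH
    if attempt >= len(prompts):
        return HANDOFF  # the lifeline: never loop on the same message
    return prompts[attempt]


for attempt in range(3):
    print(reprompt("no_match", attempt))
```

The attempt counter should be reset as soon as the bot understands the user again, so that one early mishearing does not trigger a handoff later in the conversation.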

7. But does it work? Find out - test it again!

You get the best result if you test your "voiceified" chatbot the same way as if you had built a voice bot from scratch, with, for example, back-to-back or Wizard of Oz testing. A simple way to test it is to read your answers aloud. That way, you catch a large share of the complicated words and long answers. If you are a little more ambitious, you should go through all variations of all flows. That way, you ensure that the pronunciation of all words is appropriate, so the bot does not confuse the user. You will only discover that if you listen to everything, carefully.

Other than that, all it takes is to throw yourself into building a good voice bot. They are not difficult to build, and they can do much more than most people realise. Have a good time!

Written by Linea Svendsen