Voice assistants have been gaining popularity for several years. Almost every major tech company has developed one or is in the process of doing so. Among the best known are Siri, which was launched in 2011, Google’s Ok Google, Microsoft’s Cortana, Amazon’s Alexa and Samsung’s Bixby.
The vendors pursue different specializations. Ok Google aims to present the user with situation-specific information that is as personalized as possible. Alexa leads the field in smart home control and in extending its voice commands through skills.
Voice control has established itself as a new interface, the Voice User Interface (VUI). This brings new challenges for the user experience. Right now, you quickly realize that voice assistants are still in their infancy: for example, when we hear the sentence “Sorry, I did not understand you correctly” for the tenth time, or when we wonder why the music stops when we use Alexa as a night light.
This is a good reason to examine what matters when developing a voice assistant and how voice differs from conventional interaction via mouse, keyboard, touch screen or buttons.
Low Entry Barriers
I can talk to that?
We are used to writing on our PC with a keyboard. Even typing on the virtual keyboard of our smartphones has now become second nature for us. What we are not yet familiar with is a conversation with our devices. Which speed and volume are optimal? It feels strange.
Voice assistants are increasingly becoming part of our everyday lives. Nevertheless, many people feel a certain shyness in dealing with them. Of course, it takes some time to get used to these talkative helpers. As concept designers, we can certainly help, for example with good onboarding that eases the user’s reluctance and gives them an early success.
Does everyone know what a burger icon is?
These and similar questions are familiar to us as User Experience Designers in the context of screen interfaces. With a VUI, there are no such problems. Every user knows how to speak. The user does not have to acquire new knowledge or familiarize themselves with an unknown platform. At least, that’s the best case.
But we cannot take this ideal case for granted!
If you install an ordinary iOS app and start it for the first time, a startup screen appears. You usually discover some buttons, a menu and some text to find your way around. If you have a screenless voice assistant like the Echo Dot, you will not see anything. Matty Mariansky and his team created an assistant that manages calendar entries and tested it with users who did not know what it was. The users’ reactions were telling:
“This thing can do whatever I ask him, so I’m going to ask him to make me a sandwich.”
“I have no idea what I’m supposed to do now, so I’m just going to freeze and stare at the screen.”
Users who do not know what the voice assistant can do will initially be disappointed. The solution is simple: the assistant introduces itself with one or two sentences and then prompts the user to speak a first command. The user experiences success when a voice command immediately leads to the desired reaction. In general, it is advisable to test the onboarding several times and to keep the hurdles as low as possible.
A study by Comscore produced surprising results: users of voice assistants mostly ask for very simple tasks. For example, 57 percent of users ask about the weather. It is therefore important that users can learn, on request, how to extend and personalize their commands so that they remain satisfied and unlock the full value of their assistant.
It becomes problematic when voice assistants read out long lists, for example timetables or recipes. A common solution is to recite the first few titles and ask whether the user wants to hear more. If you are looking for just any recipe, this can work well. If, on the other hand, you are looking for something more specific, it becomes exhausting and tedious. With a screen, you can scan the results within seconds and get a much more detailed overview. Every display, even a small one, has a clear advantage here.
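The "recite a few, then offer more" pattern described above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `read_out_in_chunks`, the chunk size of three and the placeholder recipe titles are made up and do not come from any real skill.

```python
from typing import List


def read_out_in_chunks(titles: List[str], start: int = 0, chunk_size: int = 3) -> str:
    """Build one spoken response that covers a small slice of a result list.

    Hypothetical helper: name, chunk size and wording are assumptions.
    """
    chunk = titles[start:start + chunk_size]
    if not chunk:
        return "There are no more results."
    # Number the entries so the user can say "open recipe 2".
    numbered = [f"{start + i + 1}: {title}" for i, title in enumerate(chunk)]
    response = "I found " + ", ".join(numbered) + "."
    # Only offer more if there is actually something left to read.
    if start + chunk_size < len(titles):
        response += " Would you like to hear more?"
    return response


# The last two titles are placeholders, not real search results.
recipes = [
    "Sivi's Semmelknödel",
    "Semmelknödel excellent",
    "Placeholder recipe 3",
    "Placeholder recipe 4",
]
print(read_out_in_chunks(recipes))
print(read_out_in_chunks(recipes, start=3))
```

Even in this tiny sketch the weakness is visible: the user has to keep the numbered titles in their head, which is exactly what a screen would spare them.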
Example: Recipe search – once with a screen and once by voice:
I: Alexa, open Chefkoch.
Alexa: Welcome to Chefkoch, what would you like today?
I: Alexa, search for Semmelknödel.
Alexa: I didn’t find anything about Hochmehlknödel.
I: (louder) Alexa, search for Semmelknödel.
Alexa: Here is the result, I found Sivi’s Semmelknödel, Semmelknödel excellent, (..). For more details, for example, open recipe 1.
I: Open recipe 1.
Alexa: For the recipe you chose, you will need 20 minutes and it has 4.7 of 5 stars.
Next, I’m asked if Alexa should send me the recipe. This will immediately appear in my Alexa app. However, I do not see any picture of the recipe there.
For recipes, I prefer to go directly to the website. What works perfectly by voice are one-shots, that is, tasks that are completed with a single instruction: “Alexa, set the timer to 5 minutes.” Commands that start a task work equally well: “Alexa, play music.” Or “Alexa, how high is the Zugspitze?”
The less the user has to choose, the better.
Overall, the experience with Alexa shows that voice alone is not sufficient as a means of interaction. That’s why more and more products like the Echo Show are being developed: a combination of voice assistant and touchscreen. If you want to buy handkerchiefs, for example, you say: “Alexa, I need handkerchiefs.” You then see a selection on the screen and can pick a product. During product development, I think it is essential to decide early on whether the assistant will have screen support or not.
Follow the Course of the Conversation
A good Conversational User Interface (CUI) should be able to hold an understandable and consistent conversation. If you ask, for example, “Who was the 16th President of the United States?”, Ok Google reliably answers “Abraham Lincoln”. I would like more information about him and ask “How old was he?” and then “Where was he born?”. The answers are correct and refer to the 16th President. It’s great not to have to start all over again.
However, in the event of an abrupt change of topic, the assistant may not be sure whether the question relates to a new topic. It is better for the assistant to ask for clarification than to give the wrong answer.
Asking back is especially important for sensitive commands such as “delete this”. The assistant must either understand the user’s intention or ask what it should delete. Is it an email or your own Facebook profile?
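How such asking back might look can be sketched as a small decision function. Everything here is hypothetical: the function `respond`, the set `DESTRUCTIVE_ACTIONS` and the wording of the replies are my own assumptions, not the behavior of any real assistant.

```python
from typing import List, Optional

# Hypothetical set of actions that should never run without confirmation.
DESTRUCTIVE_ACTIONS = {"delete"}


def respond(action: str, target: Optional[str], candidates: List[str]) -> str:
    """Decide whether to ask back, ask for confirmation, or just execute."""
    if target is None:
        # "Delete this" without a clear referent: ask instead of guessing.
        if not candidates:
            return f"What should I {action}?"
        return f"What should I {action}: " + " or ".join(candidates) + "?"
    if action in DESTRUCTIVE_ACTIONS:
        # The target is known, but the action cannot be undone: confirm first.
        return f"Do you really want to {action} {target}?"
    return f"OK, I will {action} {target}."


print(respond("delete", None, ["this email", "your Facebook profile"]))
print(respond("delete", "this email", []))
print(respond("open", "this email", []))
```

The point of the sketch is the order of the checks: an unclear target is resolved first, and a destructive action is confirmed even when the target is clear.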
Care for Personality
Chatbots need a personality – a tip from Bettina from her blog article Good UX for Chatbots. The same applies to voice assistants. A detailed persona with a name and a defined behavior is needed, and the two can be linked. Microsoft’s Cortana got its name and personality from the futuristic video game “Halo”. Alexa’s name, on the other hand, has a historical reference: it pays tribute to the Library of Alexandria and therefore stands for large amounts of knowledge.
One thing that stands out: at the moment, voice assistants are predominantly female. Although there are reasons for this, it is by no means a given. What matters is that the voice is pleasant and easy to understand. Another interesting consideration is the behavior of the digital helper: should our assistant react differently when a 20-year-old orders a pizza than when an elderly lady has questions about a medication? In principle, these two people expect a different tone in the conversation. With the first, the assistant could be fun and casual, while the elderly lady is better approached with more care and precision. Finding suitable behavior deserves a lot of time.
This topic also influences how the profession of UX designer will evolve. In the future, we will design fewer visual elements but many more artificial personalities. Exciting, I think!
Our Language is Complex
Scrolling through the Alexa skills is still quite sobering. There is an awful lot on offer, but most skills are not popular, and ratings of 4 or 5 stars are rare. This is not really surprising: after all, the first smartphone apps were not hits either. It took a while for the technology to mature and for developers to become familiar with the possibilities.
It is similar with language assistants. Here too, UX designers and developers have to approach a completely new use case. In addition, there is another challenge: the understanding of human language.
We are used to clicking on buttons or pictures and having text displayed. If you look at highly rated skills, you’ll notice that they do not really require natural language and mostly fall into the smart home category:
“Alexa, open Sleep Sounds.”
“Alexa, reduce the temperature by 5 degrees.”
Although these commands work, there is room for improvement. Each command is like a special syntax that you have to remember, which is far from free speech. More natural phrasings would be, for example:
“Alexa, please adjust the heating so that it is not cold anymore.”
“Alexa, turn off the stove before the milk boils over.”
We have to learn that language works differently than a website or voice search.
In a skill description you can find commands like “Timer on 5 minutes”. Even if it seems trivial, the description should always explain how the user can turn the timer off again or change it. In the best case, the command is so simple that the user can just start talking. The timer then gives a short confirmation and changes as desired.
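To make the timer example concrete, here is a minimal sketch of a timer that accepts several phrasings for setting, changing and cancelling, and confirms each step briefly. The class `Timer`, the keyword patterns and the replies are purely illustrative assumptions; a real skill would rely on the platform’s own intent recognition rather than on regular expressions.

```python
import re
from typing import Optional


class Timer:
    """Hypothetical timer skill; class, patterns and replies are assumptions."""

    def __init__(self) -> None:
        self.minutes: Optional[int] = None  # None means no timer is running

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        # Cancelling should work with several everyday phrasings.
        if re.search(r"\b(stop|cancel|turn off)\b", text):
            if self.minutes is None:
                return "There is no timer running."
            self.minutes = None
            return "Timer cancelled."
        # Any mention of "<number> minutes" sets or changes the timer.
        match = re.search(r"(\d+)\s*minutes?", text)
        if match:
            changed = self.minutes is not None
            self.minutes = int(match.group(1))
            verb = "changed to" if changed else "set for"
            unit = "minute" if self.minutes == 1 else "minutes"
            return f"Timer {verb} {self.minutes} {unit}."
        # Fallback: tell the user what they can say instead of just failing.
        return "Sorry, you can say 'timer on 5 minutes' or 'stop the timer'."


timer = Timer()
print(timer.handle("Timer on 5 minutes"))
print(timer.handle("Make it 10 minutes"))
print(timer.handle("Turn off the timer"))
```

Note that the fallback reply doubles as the explanation the skill description should contain: it names both the way to set the timer and the way to stop it.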
Conclusion: Integration Into Everyday Life?
How voice assistants will integrate into our lives will become apparent in the next few years. Many topics are still unclear. One example is how to deal with difficult questions, which is often not handled well today:
I: “I want to quit smoking.”
Siri: “Ok, here is what I found [Search results for nearby tobacco shops]”
One thing becomes clear here: rules such as ethical design guidelines and ethical health criteria need to be integrated more firmly.
In any case, I am looking forward to exciting projects related to voice assistants and to the chance to dive deeper into the topic.