MI01:P1 = Microsoft PDF 01, Page 1
G01:P1 = Google PDF 01, Page 1
AG01:P01 = Alibaba PDF 01, Page 1
AM01:P1 = Amazon PDF 01, Page 1
Make Conversation Human
Model conversation after human speech
MI01:P3: All interactions with the user should use a conversational tone, whether spoken or written.
MI01:P3: Be conversational. Interact how people speak.
AM01:P1: As you design your skill, create scripts for the dialog between the user and Alexa. Scripts show the conversation between the user and Alexa, like in a movie or play, and are a great way to determine how conversation will flow. Use scripts to help identify situations that you may not have already accounted for.
AM01:P2: Make sure that Alexa speaks like a person, for example using contractions and avoiding jargon.
G02:P1: Conversation design is a design language based on human conversation (similar to how material design is a design language based on pen and paper). The more an interface leverages human conversation, the less users have to be taught how to use it.
G02:P02: persona: It should be easy to answer the question,
“What would this persona say or do in this situation?”.
G02:P4:Conversations with a computer should not feel awkward or break patterns that have evolved over the past hundred thousand
years. Instead, computers should adapt to the communication system users learned first and know best. This helps create an
intuitive and frictionless experience.
G03:P2:Research has shown that people respond to technology as they would to another human. This means users rely on their existing
model of human-to-human conversation and follow the Cooperative Principle even when interacting with the persona of a conversational user interface, and they expect your persona to follow it, too.
G03:P08: People naturally avoid ambiguity and obscurity of expression in a conversation. Using words and phrases that are familiar helps reduce cognitive load. When it comes to word choice, if you wouldn't say it, neither should your persona.
G08:P01: The goal of creating a persona is not to trick the user into thinking they're talking to a human being, but simply to leverage the communication system users learned first and know best: conversation. Your persona doesn't have to be a person. It could also be an anthropomorphized animal, an alien, an artificial intelligence, a cartoon character, etc.
G09:P01:When starting, we recommend focusing on just the spoken conversation—that is, designing for a screenless device like Google
Home. Getting the flow right is easier if everything is in one place—the spoken prompts.
MI01:P3: Use contractions for more natural interactions and to save space on Cortana's canvas. For example, "I can't find that movie" instead of "I was unable to find that movie." Write for the ear, not the eye.
AM03:P2: Make sure that Alexa speaks like a person, for example using contractions and avoiding jargon. This will help the user more easily understand Alexa and encourages the user to speak naturally in return.
G15:P06: Use contractions. Spelling out words like "cannot" and "do not" can sound punishing and harsh.
G9:P03: Find a partner and role-play the conversation, with one person pretending they're the user and the other pretending they're the system persona. Record the conversation.
Aim for Natural Conversation
AM03:P1: Alexa responds, informs, and asks questions in a natural and conversational way.
AM01:P2: Be sure to listen to how your prompts sound when spoken by Alexa. Sometimes, a written phrase doesn't sound natural and needs to be reworded.
G01:P1: Our goal is to help you craft conversations that are natural and intuitive for users.
MI01:P3:Don’t emphasize grammatical accuracy over sounding natural. For example, ear-friendly verbal shortcuts like "wanna" or "gotta" are fine for text-to-speech (TTS).

MI01:P3: Avoid ambiguity. Use everyday language instead of technical jargon.
G10:P02: Pay attention to the way users naturally ask for things. Do they feel like they can only speak in short keyword-like phrases, or do they sound more conversational? Do they sound hesitant or confident when speaking to your persona? Does the flow make users feel like they
AG01:P01: For developers and designers of voice-interactive applications, a new challenge has been added: how to create a user experience based on natural conversations. The advantage of voice interaction over graphical interaction is that users already know how to communicate in language, so when designing language and sound interactions, take advantage of this and let users communicate in everyday conversation. When writing conversational content, you may unwittingly slip into written language, which can sound unnatural and be hard to understand in voice interaction. It is therefore recommended that you use TTS, or read your dialogue aloud, to test what you have designed. It is better to sound natural, even at the cost of grammatical errors.
G02:P02: persona: It should be easy to answer the question, "What would this persona say or do in this situation?"
AM03:P2:Make sure that Alexa speaks like a person, for example using contractions and avoiding jargon. This will help the user more easily understand Alexa and encourages the user to speak naturally in return.
Enact a Persona
G08:P01: A good persona evokes a distinct tone and personality, and it's simple enough to keep top-of-mind when writing dialog.
G08:P02:Your persona can help provide users with a mental model for what your Action can do and how it works by starting with what
users already know. For example, in a banking application, the persona could be modeled after an idealized bank teller—
trustworthy with customers’ money and personal information. The metaphor of the bank teller makes this new experience feel
familiar, since users’ real-world banking knowledge can guide them.
G08:P02: Steps for persona creation
Brainstorm a list of adjectives (e.g., friendly, technologically competent). Focus on the qualities you want users to perceive when talking to your persona.
Narrow your list down to 4-6 key adjectives that describe your persona's core personality traits.
Choose one character that best embodies your Action and write a short description, no more than a paragraph. This description should provide a clear sense of what this persona is like, especially what it would say, write, or do. Focus on personality traits.
Find, or create, an image or two that visually represents your persona. Pictures are a great memory aid and can help you keep the persona in mind when writing dialog.
When people hear a voice, they instantly make assumptions about the speaker’s gender, age, social status, emotional state, and
place of origin, as well as personality traits like warmth, confidence, intelligence, etc. People can’t help but do this with virtual
assistants, too—so
Voice: synthesized: The Actions on Google platform provides a variety of text-to-speech (TTS) voices that speak different languages. Go to Languages and Locales to hear them. Note that you can adjust the way the synthesized speech sounds by using Speech Synthesis Markup Language (SSML). For example, you may want to add silence or pauses, specify how numbers should be read, or adjust the intonation.
Voice: recorded: You can hire a professional voice actor, or even try using your own voice. Either way, you'll need to record all the audio that will be used in your Action.
AG03:P01: Before designing the conversation, it is recommended to set a persona for your voice skill. This persona determines the brand image users perceive through voice, and the dialogue content you design, such as sentence structure, grammar, and catchphrases, must be consistent with the persona you set. The persona you develop therefore needs to match your voice skill's everyday use scenarios, target users, and skill objectives. For example, if the voice skill is to tell users a joke, the persona may be young and humorous; for a skill that reads daily news, you need a more mature and steady persona.
G02:P3: Defining a clear system persona is vital to ensuring a consistent user experience. Otherwise, each designer will follow their own personal conversational style and the overall experience will feel disjointed.
At Google, we’ve created the Google Assistant. Everything the Assistant does (e.g., says, writes, displays, suggests) and
everywhere the Google Assistant appears (e.g., the look and feel of the software and hardware) were designed to evoke a
consistent persona.
AP04:P02: Don’t attempt to mimic or manipulate Siri. Your app should never impersonate Siri, attempt to reproduce the functionality Siri provides, or provide a response that appears to come from Apple.
Make Conversation Personal
Set smart defaults
MI01:P1: Does the design use default values when the user is not specific?
AP04:P02: On Apple Watch, design a streamlined workflow that requires minimal interaction. Your app can't provide a custom user interface for Siri to display on Apple Watch. As a result, your app's Siri experience should be streamlined and minimal. Whenever possible, use intelligent defaults rather than asking for input. For example, a ride sharing app might automatically default to the last requested ride type, or a fitness app might default to a favorite workout.
AP04:P02: When a request has a financial impact, default to the safest and least expensive option. Never deceive the user or misrepresent information. For a purchase with multiple pricing levels, don't default to the most expensive. At the point where a user is making a payment, don't charge extra fees without informing them.
MI01:P6: Use default values when the user is not specific. For example, if the user says, "Make my room warmer," Cortana should say, "I've raised your room temperature to 72 degrees" instead of "Sure, what temperature?"
AM02:P05: Use built-in slot values whenever possible to help save time and improve accuracy. As appropriate for your skill, you can also extend some of the built-in values. For example, for a local region, you might extend AMAZON.US_CITY to include all of the local cities and towns.
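The Cortana thermostat example above can be sketched as a tiny request handler. This is a minimal illustration, not any vendor's API: the function name, the 2-degree default, and the response wording are all assumptions made for the sketch.

```python
# Illustrative sketch of "smart defaults": apply a sensible default when
# the user's request omits a specific value ("Make my room warmer").
# DEFAULT_DELTA_F is an assumed house-style default, not from any SDK.
DEFAULT_DELTA_F = 2

def handle_warmer_request(current_temp_f, requested_delta_f=None):
    """Use the user's delta if given; otherwise fall back to the default."""
    delta = requested_delta_f if requested_delta_f is not None else DEFAULT_DELTA_F
    new_temp = current_temp_f + delta
    # Respond with the result, not a follow-up question ("Sure, what temperature?").
    return f"I've raised your room temperature to {new_temp} degrees."
```

With the room at 70 degrees, a non-specific request yields "I've raised your room temperature to 72 degrees", while an explicit value still wins over the default.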
G13:P04: a) Implicit invocations
Users tell the Assistant they want to accomplish a certain task (by saying an invocation phrase), and the Assistant may suggest your Action to complete that task.
"Ok Google. Tell me about the I/O 18 keynote"
"Ok Google. Find a session at I/O 18"
"Ok Google. How do I watch I/O 18 remotely?"
"Ok Google. When is I/O?"
G03:P07:Knowing what someone said is not the same as knowing
what they meant. People often suggest things rather than
state them explicitly. Our ability to “listen between the lines”
is known as “conversational implicature”.
AP04:P02: Whenever possible, complete tasks automatically. Messaging watchOS apps, for example, automatically send messages unless a Don’t Send button is tapped.
Intelligently interpret users’ varied request phrasing
MI01:P2: Because there are many ways to express the same intent, you need to develop as many variations of the intent as possible when you model your intents.
AM02:P1: Expressing and extracting meaning is not as simple as it may seem, and you'll need to design conversations between Alexa and your customers carefully and intentionally. A great voice experience allows for the many ways people might express meaning and intent.
AM02:P2: Avoid assuming that people will say precisely the words that you anticipate for an intent. While the user might say "plan a trip," he or she could just as easily say "plan a vacation to Hawaii."
AM02:P03:One of the most important aspects of designing a voice experience is defining the
range of what people might say.
To help ensure a good experience, provide examples from complete commands all the way through incomplete and ambiguous
fragments. To make sure you have coverage, include subtle variations and even mispronunciations. For example, include
“arrangement” and “bouquet” when talking about flowers even though they have similar meanings.
AM02:P4: To make sure your skill performs well, a good benchmark is 30 or more utterances per intent, even for simpler intents. You don't need 100% coverage, but more examples are better. Also, plan to continue adding utterances over time to improve the skill.
AM02:P4: Think about ways that the user might say all of the slots in one utterance.
G10:P02: Users might say something you didn't expect. Take note of it and add handling for it in your design.
G11:P05:By anticipating variations in user responses, you can create
robust intents and avoid errors.
G27:P01:Before asking a question, think about what responses you can reasonably support. Don’t ask the user a question if you’re not
prepared to handle their answer.
AG01:P03: People are complex and changeable creatures. They may use various words and methods to describe the same thing depending on the current time, environment, and even their ongoing actions. Therefore, for developers and designers, speech design must handle a large number of diverse voice input commands.
G03:P07:Knowing what someone said is not the same as knowing
what they meant. People often suggest things rather than
state them explicitly. Our ability to “listen between the lines”
is known as “conversational implicature”.
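To illustrate why broad utterance coverage matters, here is a deliberately naive matcher for a hypothetical "plan a trip" intent. Real platforms (Alexa, Dialogflow) use trained NLU models rather than substring lists; every name below is made up for the sketch, which only shows how varied phrasings funnel into one intent.

```python
# Sample phrasings users might say for one hypothetical intent.
# Per the Amazon guidance above, a real skill would list 30+ of these.
PLAN_TRIP_UTTERANCES = [
    "plan a trip",
    "plan a vacation to hawaii",
    "i want to plan a vacation",
    "help me plan a trip",
]

def matches_plan_trip(utterance: str) -> bool:
    """Toy matcher: map several phrasings onto the same intent."""
    u = utterance.lower()
    return any(phrase in u for phrase in ("plan a trip", "plan a vacation"))
```

Even this toy version shows the design point: "plan a trip" and "plan a vacation to Hawaii" must resolve to the same intent, while unrelated requests must not.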
Intelligently respond to users with varied phrasing
MI01:P3: Use variations. Use variation to help make the app sound more natural. When repeating a question, ask it differently the second time. For example, a variation of "What movie do you wanna see?" might be "What movie would you like to watch?"
MI01:P3: Use phrases like "OK" and "Alright" in responses with restraint. While they provide acknowledgment and a sense of progress, they can also get repetitive if used too often and without variation.
AM03:P4: Use variety to inject a natural and less robotic feel into a conversation and make repeat interactions sound less rote or memorized, for example by randomly selecting from reasonable synonyms of the same prompt.
AM03:P3: Introduce variety if the user will hear the same prompt frequently, for example in your opening and closing prompts. This kind of variety is a good way to add personality.
G03:P13: Variety is the spice of life. Users pay more attention when there's more of it. Variety can also keep the interaction from feeling monotonous or robotic.
So randomize. For any given prompt, there are usually a few
conversational alternatives that’ll work. Focus your efforts on
prompts that users hear frequently, so these phrases don’t
become tiresome.
G14:P04:Randomize prompts when appropriate.
Craft a variety of responses just like a person would. This
makes the conversation feel more natural and keeps the
experience from getting stale.
G17:P01: Acknowledgment: Avoid overuse by adding randomization and by skipping some acknowledgments in dialogs. The experience will quickly become monotonous and robotic if your persona starts every utterance with "Okay". For example, after your persona has completed a task, it's appropriate to randomize among synonyms like "Done," "Got it," "Alright," "There," "You got it," and "Sure."
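The randomization advice above can be sketched in a few lines. The prompt lists reuse the examples from the excerpts; the helper names are hypothetical.

```python
import random

# Synonyms from the Google excerpt: randomize so acknowledgments
# don't become monotonous.
ACKNOWLEDGEMENTS = ["Done.", "Got it.", "Alright.", "There.", "You got it.", "Sure."]

# Variations from the Cortana excerpt: reword a repeated question
# instead of repeating it verbatim.
MOVIE_PROMPTS = [
    "What movie do you wanna see?",
    "What movie would you like to watch?",
]

def acknowledgement() -> str:
    """Pick a random acknowledgment for a completed task."""
    return random.choice(ACKNOWLEDGEMENTS)

def movie_prompt(attempt: int) -> str:
    """Cycle through wordings so the re-prompt differs from the first ask."""
    return MOVIE_PROMPTS[attempt % len(MOVIE_PROMPTS)]
```

Random choice suits prompts heard often (acknowledgments), while deterministic cycling guarantees the second ask is worded differently from the first.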
Intelligently respond to requests of varied completeness
MI01:P2: When the user provides a complete request in their first utterance, you should respond directly to their request and either propose further interaction, if required, or end the conversation.
MI01:P2: The conversational design process should identify intents, entities, and utterances.
MI01:P2: Your skill should detect the missing element and automatically provide a follow-up prompt that asks for the missing entity.
MI01:P3: Incomplete intents. When this happens, you need to tell the user how to interact with your skill. It is critical that the skill consider the first-time user interaction and help them to get started. You should present a list of three potential options to choose from.
AM03:P9: When Alexa receives no answer from the user, use a re-prompt with a slight rewording. This is an opportunity to add detail in case the customer did not understand.
AM02:P4: Occasionally, users offer more than one answer even when Alexa requests only one. If Alexa prompts for a departure date, the user may answer by providing the date and the departure city. The user might even provide other information that is needed, like arrival city and activity, and not provide the date that Alexa requested.
G03:P02:Because users are cooperative, they often offer more
information than is literally required of them.
AG01:P05: In the process of speech interaction, there may be several different conversation categories, such as clear intention, unclear intention, and no intention, depending on whether the key information given by the user is sufficient. In order to create a speech application with a good user experience, when a user initiates an intent through voice, you need to design the dialog appropriately for each type of intention.
AG02:P02: A user-initiated dialogue can be directly decomposed into the corresponding wake-up words and the key words of the intention. AliGenie can immediately find the conversation content and interface data that you have set to directly answer the user's current question.
AG02:P02: On the AliGenie developer platform, an unclear intention is defined as a user-initiated conversation that contains the corresponding wake-up words, but whose intended key words are incomplete, so the user must be asked for more information.
AG02:P02: When users don't understand the app's speech settings, they may say something without a specific intent. On the AliGenie developer platform, you can use wake-up words to write instructions that let users quickly understand the skill and help them learn how to use your app. You can also, after collecting enough user data, prioritize options for users so that they can intuitively feel how to use your speech settings.
AM02:P03: One of the most important aspects of designing a voice experience is defining the range of what people might say. To help ensure a good experience, provide examples from complete commands all the way through incomplete and ambiguous fragments.
Use contextual data
G26:P03: Spoken prompts should lead with an implicit confirmation of the information that was said or implied, followed by the new information. This is because spoken English places the most important information (e.g., the answer) at the end of the sentence; this is known as the End-Focus Principle. For some answers, a simple informational statement is sufficient.
G27:P03:Language is filled with ambiguity, though most of the time it
is resolved through context. When context is not enough, it’s
okay to ask the user for more information.
AG01:P02: A dialogue process with context may arise from the complexity, time, place, or efficiency of the event being expressed, with all the dialogue elements woven into one coherent linear dialogue. Therefore, when building your voice application, you need to consider whether the user will be in a contextual dialogue situation when using it.
AG01:P02: Beyond correcting keywords to keep both sides' understanding of the dialogue synchronized, comprehension more often depends on the context of the current dialogue. Therefore, when building voice applications, developers and designers need to make some plausible assumptions and speculations in advance, so that the user's voice interaction is not confined to a narrow usage scene.
G13:P06: Use appropriately varied greetings. Greetings should differ based on the invocation—in this case, the link's destination. For instance, a link without intents should lead to an initial greeting (like the example above), whereas deep links should lead to more specific greetings (depending on the promised function).
“This,” “that,” “here’s,” and “it” help to identify subjects that have been previously referenced
AM03:P5: Similar to conversing with a friend, users appreciate when Alexa remembers what happened recently and what was said, especially for frequent actions and static information. For example, you could be in the middle of a game, walk away for an hour or two, and pick up right where you left off.
AM03:P19: When responding to a request for help, provide additional prompts to give more context to the immediate conversation. For example, if a user asks for help in the middle of confirming a pizza order, focus on completing the confirmation and avoid including information about selecting a topping. Design the conversation to ensure that help is not needed very often.
G03:P10: Utterances often can't be understood in isolation; they can only be understood in context. Pronouns or generic references: your persona needs to keep track of context in order to understand the user's utterances.
G03:P12: Your persona needs to keep track of context in order to understand follow-up intents. Unless the user changes the subject, we can assume that the thread of conversation continues. Therefore, it's likely that ambiguities in the current utterance can be resolved by referring to previous utterances.
G24:P01: Good error handling is context-specific. Even though you're asking for the same information, the conversational context is different on the second or third attempt.
AG02:P07: Consider the various situations users may encounter from waking your voice skill to terminating the interaction. You need to speculate in advance about how users might behave in these situations, and provide corresponding instructions when users are not familiar with the application.
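One common way to keep track of context, as the excerpts above describe, is a per-session store that remembers the last-mentioned entity so pronouns like "it" or "that" can resolve to it. This toy class is an illustration of the idea under that assumption, not any platform's API.

```python
class SessionContext:
    """Toy session store: remembers the most recently mentioned entity."""

    def __init__(self):
        self.last_entity = None

    def mention(self, entity: str) -> None:
        """Record an entity the dialog just referred to."""
        self.last_entity = entity

    def resolve(self, phrase: str) -> str:
        """Resolve pronouns/generic references to the last-mentioned entity."""
        if phrase.strip().lower() in ("it", "that", "this") and self.last_entity:
            return self.last_entity
        return phrase
```

After the user asks about "House of Cards", a follow-up like "play it" can resolve "it" to the show, while concrete phrases pass through unchanged.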
Make Conversations Efficient
Reduce time and effort as compared to screens
MI01:P1: If the user needs to share content or set a reminder, it's easier to say, "Share that with John" or "Remind me to pick up Jessie".
G05:P5: Interaction is brief with minimal back and forth dialog.
MI01:P1: Will the design solve the user's problem with the minimum number of steps?
MI01:P: Which scenarios are relatively quick to complete with minimal steps?
MI01:P1: For example, saying "Play the latest House of Cards" is much easier than opening up an app, searching for "House of Cards", finding the latest episode, and pressing play.
G05:P03: Users will have to navigate multiple apps and screens to perform a task.
G05:P03: When users will have to tap multiple times on the screen to perform a task within a screen.
G05:P03: Conversation saves the user more time and effort than a screen-based UI. Conversation can be the ultimate shortcut. It reduces friction by quickly getting the user what they want.
Support multi-tasking
MI01:P1: When the user's hands are busy. If the user is cooking, they can ask, "Was that 1 tablespoon or 1 teaspoon of salt?"
AP04:P02: Strive for a voice-driven experience that doesn't require touching or looking at the screen. People often interact with Siri through a headset, through their car, with HomePod, or from across the room. To the extent possible, let people complete tasks without unlocking their phone. If you must present options, offer focused choices that reduce the possibility of additional prompting.
MI01:P1: If the user is lying down, it's easier to use your voice than a keyboard.
AP04:P04: Craft great voice responses. Remember that people may perform your app's actions from their HomePod or through CarPlay without looking at a screen. Strive for voice responses that are just as engaging as your custom Siri interface. For example, if the user runs an "Order my favorite soup" shortcut on a day their favorite soup isn't on the menu, the soup app could respond with a spoken message instead of a failure, such as, "Sorry, we're out of your favorite today."
MI01:P1: You can use your voice to do things like read and reply to incoming messages or control music while you're writing a document.
G05:P03: Conversation lets users multitask. It helps them when they're busy, especially in situations when their hands or eyes are occupied, or when they're on the move.
MI01:P1:When the user is driving, walking, or is otherwise distracted
If the user is driving or walking, it is easier to use voice than to navigate the world safely while using a device.
Keep conversations concise
AM03:P1: Users need Alexa to speak concisely without extra words. This helps them understand what Alexa is saying and feel confident about what is happening. Longer responses tend to be more difficult to follow and remember.
MI01:P6: For most people, the human brain can only remember a small amount of information when listening to instructions. Limit voice interactions to only what is absolutely required. For example, present only three items of a list at a time.
MI01:P3: Efficient. Use as few words as possible and put the most important information up front.
AM03:P1: When writing what Alexa will say, read aloud what you've written. If you can say the words at a conversational pace with one breath, the length is probably good. If you need to take a breath, consider reducing the length.
AM03:P4: As a person uses a skill more and more, he or she becomes increasingly comfortable and remembers what will happen. Consider making the prompts shorter and more direct, and even acknowledge the frequency of use.
G03:P05: In conversation, saying too much is as uncooperative as saying too little. Facilitate comprehension by keeping turns brief and optimally relevant from the user's point of view.
G14:P02:Be informative, but keep responses concise. Let users take
their turn. Don’t go into heavy-handed details unless the user
will clearly benefit from it.
G25: P07:Tapering
Consider both first-time and repeat users of your Action. A novice user might need more detailed descriptions of your Action’s
options and features. This same information can become frustrating to more experienced users (it violates the Cooperative
Principle). On the other hand, an expert user might benefit from a well-placed tip about an advanced feature (information which
might overwhelm a novice user).
AB01:P02: It is necessary to analyze the user's intention and make a correct response within a limited dialogue.
G10:P05: Signs of frustration: this is typically a sign that the interaction is too long-winded. Review your prompts to see if you can be more concise. Are there details that can be
AG02:P06: Try to avoid collecting more than 2 key words across multiple rounds of conversation, to keep users from talking too broadly. When a user interacts by voice, the amount of information that can be carried each time is much lower than in interface interaction. If a single interaction carries multiple pieces of information, users cannot select options by browsing them as they would on screen. At the same time, users' memory for voice information is shorter than for images; they can only rely on limited voice prompts and short-term memory to complete the operation. It is therefore recommended to reduce the amount of information in each conversation to help users complete the task. When designing voice interaction, aim to express the necessary information as briefly as possible and avoid outputting irrelevant information. When designing a voice dialogue, aim to keep the current status synchronized with the user, avoiding strongly subjective or arbitrary phrasing and content that is not relevant to the current context.
Make lists efficient
AM03:P8: Pacing: Use pacing to help the listener distinguish where one list item ends and the next begins. Specify a comma plus a 350-ms pause using SSML after each item instead of a period or question mark. This makes the final item sound more similar to other items in the list. Avoid adding an additional pause to list introductions that end with a period or question mark. For lengthy list items or those that require the user to think more deeply, consider replacing the 350-ms pause with a 400-ms pause.
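The pacing rule above can be sketched as a small prompt builder. `ssml_list` is a hypothetical helper written for this sketch; the `<break>` element with a `time` attribute is standard SSML, supported by the major TTS platforms.

```python
def ssml_list(items, pause_ms=350):
    """Join list items with a comma plus an SSML break, per the pacing
    guidance: a 350-ms pause after each item instead of a period,
    bumped to 400 ms for items that need more thought."""
    parts = ['{}, <break time="{}ms"/>'.format(item, pause_ms) for item in items]
    return "<speak>" + " ".join(parts) + "</speak>"
```

For example, `ssml_list(["Gouda", "Cheddar", "Brie"])` produces one `<break time="350ms"/>` after each item, so the final item is paced like the others.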
AM03:P7: Brevity: It shouldn't take more than 20 seconds to read the first few items in the list.
AM03:P8: List handling: Start with reading between two and five items, and adjust based on the following:
How familiar the user is with the list items.
How long and voice-friendly the item names are.
The total number of elements read and displayed per item; for example, Alexa might read the item name while displaying elements for image, ratings, and distance.
Whether the count of items sounds like enough without sounding too long.
AM03:P8: To improve comprehension when reading a list, try to cluster items into sets of two or three. Also, don't try to pack everything into the list items. Allow the user to tap the item to learn more.
MI01:P3:If you present more than three options, the user might be overwhelmed and frustrated, which leads to a rejection of the voice experience.
MI01:P4:Limit the list of options to three entries, and state all options clearly. Because users can't quickly scan and skip content like they can in a visual interface, it is important to keep your questions simple and concise.
AP04:P02: If you must present options to the user, offer efficient, focused choices that reduce the possibility of additional prompting. Whenever possible, complete tasks automatically. Messaging watchOS apps, for example, automatically send messages unless a Don't Send button is tapped.
MI01:P4: Good uses of a directed prompt are when there are never more than three options.
MI01:P6: For most people, the human brain can only remember a small amount of information when listening to instructions. Limit voice interactions to only what is absolutely required. For example, present only three items of a list at a time.
AM03:P8: Brevity: Start with reading between two and five items, and adjust based on the following:
How familiar the user is with the list items.
How long and voice-friendly the item names are.
The total number of elements read and displayed per item; for example, Alexa might read the item name while displaying elements for image, ratings, and distance.
Whether the count of items sounds like enough without sounding too long.
MI01:P3: Incomplete intents. When this happens, you need to tell the user how to interact with your skill. It is critical that the skill consider the first-time user interaction and help them to get started. You should present a list of three potential options to choose from.
AG02:P06: When users have more than 3 options, sort by "popularity" and "importance".
AG03:P01: It is recommended that when creating a voice skill, you provide no more than three options for the user to choose from, including the option to expand the selection.
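A sketch of the "three items at a time" guidance above: read one short page of options, then ask whether to continue so the turn-taking cue stays clear. The function name, page size, and prompt wording are illustrative assumptions, not any platform's API.

```python
def next_list_prompt(items, offset=0, page_size=3):
    """Read at most page_size options, then offer to continue the list."""
    page = items[offset:offset + page_size]
    prompt = "Here are some options: " + ", ".join(page) + "."
    if offset + page_size < len(items):
        # More items remain: end with a question so the user knows to speak.
        prompt += " Would you like to hear more?"
    return prompt
```

A follow-up turn would call the same helper with `offset` advanced by `page_size`, so the user never hears more than three options per turn.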
Scaffold turn-taking
MI01:P4: Your design needs to make sure that the user clearly understands what you are asking and that a response is expected. Just presenting the options is not sufficient. Follow a list of options with a question so the user knows that they are expected to say something.
AM01:P3: Generally, end with a question before having the user respond. The question provides a cue to begin speaking and coaches the user on what to say next. End the prompt right after the question so that people don't try to answer while Alexa is speaking.
AM03:P9: Don't ask a question before presenting the list. For example, asking "Which cheese do you want? Gouda, Cheddar, Brie, ..." confuses the user about when to speak, and the user may try to answer the question while Alexa reads the options.
AM03:P9:Don’t use prompts that encourage the user to barge in, for example “When you hear the option you want, just
say it.” Barge-ins are also discouraged because the user has to use the wake word to interrupt Alexa’s response.
G03:P15: Your persona should give clear signals when it's the user's turn. Make the call to action clear by asking a question.
G03:p16:Your persona should not monopolize the conversation or try
to present all options/questions in a single turn.
G26:P08:Menus can be used to present options before asking a question. A question can include a menu as long as the options are short and few. This narrows the question's focus and makes it easier for the user to understand.
G25:P05:Ask questions to let the user know it's their turn to speak. Users should find it easy to respond to a narrow-focus question, for example by saying "a number from 1 to 100".
G27:P01:One of the most effective ways to get the user to continue the conversation (e.g., make a choice) is to ask a question. When the call to action isn't clear, the user won't know when, or how, to respond.
AG01:P02:In a multi-turn conversation, if there is no key word that effectively connects the two sides, the turn-taking easily falls out of sync.
AG02:P06:In a multi-round conversation, your interaction strategy should include dialogue guidance, dialogue compression, dialogue repair, and similar techniques, to help users converse more fluently.
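The options-first, question-last pattern above can be sketched as a small helper. This is a hypothetical illustration in Python; the function name, the three-option cap, and the "more options" escape hatch are my assumptions, not part of any cited guide.

```python
def build_menu_prompt(options: list[str], question: str) -> str:
    """Compose a spoken menu: options first, question last.

    Ending with the question cues the user that it's their turn to speak;
    asking before listing makes users answer while options are being read.
    """
    if len(options) > 3:
        # Keep spoken menus short; put the rest behind an expansion option.
        options = options[:3] + ["more options"]
    return f"You can choose {', '.join(options)}. {question}"
```

For example, `build_menu_prompt(["Gouda", "Cheddar", "Brie"], "Which would you like?")` ends on the question, giving the user a clear cue to respond.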

Provide the best type of confirmation
G20:p04:Implicit confirmation of parameters (common)
Use most of the time, not to confirm the user’s input per se,
but to confirm the parameters that were said or implied.
Users require this context to understand the response.
G20:p04:Implicit confirmation of actions (common)
Acknowledge that an action has been completed (unless the action or response itself makes it instantly clear).
G20:p05:No confirmation of actions (uncommon)
Use when the action/response itself makes it instantly clear
that you understood the user. This is true for global
commands like “stop” or “cancel”.
G20:p05:No confirmation of parameters (rare)
Don’t confirm if the input is simple and typically recognized
with high confidence, for example, yes/no grammars.
G20:p05:Explicit confirmation of actions (rare)
Double-check with the user prior to performing an action that
would be difficult to undo, for example, deleting user data,
completing a transaction, etc.
G20:p06:Explicit confirmation of parameters (rare)
Use sparingly, only when the cost of misunderstanding the
user is high, for example, names, addresses, texts to be
shared on the user’s behalf.
MI01:P05:Explicit confirmation is the most basic form of confirmation. It also slows the conversation flow because it introduces an extra prompt to explicitly confirm
information that the user provided. Use explicit confirmation for situations where the cost of a misunderstanding is high. For example, in a flight booking
application, the application must understand the cities that the user wishes to fly between. The following shows an explicit confirmation interaction.
MI01:P05:Implicit confirmation combines the confirmation with your next question. This method uses fewer prompts than explicit confirmation. Consider a flight
booking scenario where the skill obtains the city that the user is flying from, followed by the date.
The grammar for an implicit confirmation interaction is subtly different from the grammar for explicit confirmation. The grammar for implicit confirmation combines acceptance or denial of the previous prompt (in this case, the city) with supplying information for the next prompt.
MI01:P4:Think about where in the conversation flow users need confirmations. A good voice-based skill uses a variety of techniques for confirmation and correction. The techniques depend on the style of the skill, the importance of the action being performed, the cost of misunderstanding, and the need for a natural dialog.
For example, a dialog that follows each question with a confirmation, such as "Did you say X?", is slow and potentially very frustrating. Conversely, a dialog that employs no confirmation and, based on a misrecognized command, deletes data without first checking with the user, is equally frustrating. A developer must strike a balance between efficient interaction with the skill and protection from wasted time or lost data.
In many cases, the cost of misrecognition is so low that confirmation is not warranted. In other cases, explicit confirmation is always required, regardless of the skill's confidence in the user's utterance.
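The cost-versus-confidence trade-off described above can be condensed into a small decision rule. This is a hypothetical Python sketch; the thresholds (0.8, 0.95), the parameter names, and the string labels are illustrative assumptions, not values from any of the cited guides.

```python
def choose_confirmation(cost_of_error: str, confidence: float,
                        hard_to_undo: bool = False) -> str:
    """Pick a confirmation style per the cost/confidence guidance.

    cost_of_error: "low" or "high" -- cost of misunderstanding the user
    confidence:    recognizer confidence in [0, 1]
    hard_to_undo:  True for destructive actions (deleting data, paying, ...)
    """
    if hard_to_undo or (cost_of_error == "high" and confidence < 0.8):
        return "explicit"   # double-check with the user before acting
    if cost_of_error == "low" and confidence >= 0.95:
        return "none"       # e.g. yes/no grammars, "stop", "cancel"
    return "implicit"       # fold the confirmation into the next question
```

The default branch is implicit confirmation, matching the guidance that it is the common case, while explicit confirmation is reserved for hard-to-undo actions and low-confidence, high-cost input.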
Limit use of earcons
G22:P1:Earcons impose cognitive load because users have to learn what they mean; i.e., they're not intuitive.
Limit use to just a few sounds that are easily distinguishable so that users don’t have to learn too many
Use them in moderation, or they can quickly become overwhelming
Use them consistently so that users learn to associate the sound with its context
G22:P1:There's a risk that users might make the wrong association and assume an earcon means something different than intended.
G22:P1:All sounds should align with your brand, complement your persona, and feel like a coherent set. Generally, earcons are as brief as possible; however, when used as greetings, they can be a bit longer.
G22:P02:If you feel like you have to teach users what an earcon means, don't use an earcon.
Conversation is intuitive and efficient on its own. Earcons often don't add value, but instead add cognitive load. Users have to process the instructions and remember them for later.
Make Conversations Relational
Maintain Friendly Conversation
G14:P06:Niceties make responses feel distant and formal. Ditch them to keep the conversation friendly and informal.
G15:P02:Exclamation points. Avoid exclamation points, as they can be perceived as shouting.
AM03:P19:Users can say anything in a voice interface, and it is important to gracefully handle errors and
guide users back to the skill. For use cases that aren’t yet supported, say something like “The Trivia Mania challenge can’t
help you with that yet.”
Maintain transparency and take responsibility for errors
AM03:P19: While errors are uncommon, they can be a source of confusion. When possible, let the user know what the error was and
avoid using technical jargon.
AM03:P19:If the error is likely to be present for only a few seconds, tell the user to try again. Otherwise, avoid encouraging the user, because the user may encounter the same error. Consider a specific message like "Your smart lock isn't responding right now."
G24:P15:System error: Evaluate every system your Action depends on and account
for all possible errors that could be encountered. Where
possible, provide the reason and possible next steps in a way
that’s transparent, honest, and helpful.
Try to be transparent without being overly technical. See if
there are any next steps you might offer.
G18:P01:Furthermore, your persona should take responsibility, never
blame the user, and never blame another party. People think less of individuals who blame others for failure.
G18:P06:Never blame the user. Provide clear motivation for any actions you want the user to take. Tell the user why they might want to do something before telling them how to do it.
G18:P06:Never blame another party. Your persona should take responsibility for not being able to fulfill the user's request, even when it's out of your control.
G18:P01:It's okay to use "sorry" when it serves a transitional social or phatic function and is not a full-fledged, heartfelt apology. If you can remove "sorry" without changing the meaning, then the function is transitional. For example, the reprompts "Sorry, for how many people?" and "For how many people?" convey the same meaning.
G18:P01“Sorry” is most helpful in no match prompts to make it clear to the user that your persona couldn't understand or interpret their
response in context.
G18:P01:Avoid overusing "sorry". For system errors, avoid saying "sorry" when it's not your persona's fault.
G18:P05:Acknowledge instead of apologizing. Simply make the correction and move on without focusing on the error.
Give Users a Sense of Control
G19:P01:Teaching commands discourages experimentation and undermines trust. The implied message is that users have to say these exact phrases or they won't be understood. In other words, the interface is not intuitive and the grammar is limited.
G05:p03:Check box: Users are comfortable talking and typing about this topic. Conversation lets users speak freely. Spoken conversations are best in private spaces or familiar shared spaces. Written conversations are best for personal devices.
AM03:P2:Inspire users to say what they want naturally. Don't prompt with a menu of options. Instead, let the user know what's possible and guide the user toward productive input.
AG01:P05:A good voice conversation design should help users complete their intentions without following a rigid dialogue script.
MI01:P1:A great user experience does not require users to talk too much, repeat themselves, or explain things that the skill should automatically know.
G27:P01:The magic and art of good conversation design is that users feel like they're in control and that they can say anything at any time, but in reality, the dialog directs them along pre-scripted paths.
G27:P02:Wide-focus question: Best for questions about domains that are familiar to the user and therefore are easy to answer.
Narrow-focus question: Best for questions about complex or unfamiliar domains, or when options are limited or unclear.
G19:P01:Your persona should leverage the power
of natural language understanding to adapt to the user’s word choices, instead of forcing the user to memorize a set of
commands. It’s easier, and more natural, for users to respond to a narrow-focus question (e.g., “Do you want to hear some more
options?”) than to be taught what to say (e.g., “To hear more options, say ‘continue’.”).
MI01:P4:A directed prompt lists specific choices for the user. For example, "Please select cheese, pepperoni, or sausage." Directed prompts often minimize user confusion.
MI01:P4:Good uses of a directed prompt are when:
A wide variety of users use the skill, or they use it on an infrequent basis. For example, a call center application is best suited to directed prompts.
There are never more than three options.
MI01:P4:For directed prompts, use the form, "Please select X, Y, or Z." Don't use the form, "Would you like X or do you want Y or Z," because it may lead to a Yes or No response instead of an X, Y, or Z response.
AM03:P5:Organize your responses and prompts so that the user has a clear choice to make. Open-ended questions can confuse the user or cause the user to answer in ways that you're not expecting or supporting. For example, asking "What would you like?" is too open-ended. Even something like "Would you like Brie or Gouda?" opens up a likely response of "Yes."
G27:P02:When designing a question, think about where it should fall on the continuum from wide to narrow focus. Consider the pros and cons in the table below.
MI01:P4:If the list of options is long (for example, a list of stock investments) or variable (for example, movie titles), using a directed prompt is impractical. In this case, use an open prompt.
AM02:p4:Sometimes people make corrections when they know that Alexa got something wrong or when they change their minds. For example, a user might say something like "no," or "I said," followed by a valid utterance. Be prepared to handle these properly.
G25:P01:Your Action has to introduce itself and make a good first impression by showing value. The goal is to
make the user feel confident and in control as quickly as possible, so it’s important to help users
discover what they can do with your Action without making it feel like a tutorial.
G23:P04:Let users quit before finishing a task.
Don’t double-check unless significant progress will be lost.
Note that "exit", "cancel", "stop", "nevermind", and "goodbye"
are supported by default, so if users say them, your Action
will end. See App Exits for more information.
G19:P02:Say what the user can do. Use verb phrases to indicate actions the user can take. Users will be cooperative and echo them.
G14:P1:Focus on the user.
Make the user the center of attention, not your persona. User-focused text keeps the conversation on track. It's more crisp and to-the-point.