1 of 16

Voice HTML

Challenges and Solutions

Mozilla Voice Recognition Hacks, 2017-01-28...29

<VOICE>

</VOICE>

2 of 16

Component View

3 of 16

Problem facets

We focus on the aspect of free and open discoverability of voice services

There will be solution providers for trust / user rating of voice services

There can be different transports for the service metadata. Embedding data with a microformat in web pages is a standard approach
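A minimal sketch of how a user agent might read service metadata embedded in a web page. It assumes the metadata is carried as JSON-LD inside a `<script type="application/ld+json">` element, which is one common embedding and only one of the possible transports; the `JsonLdExtractor` class, the `extract_jsonld` helper, and the sample page are hypothetical and use only the Python standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script element opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Parse the buffered text once the script element closes.
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Hypothetical page of a voice service provider (assumption, for illustration only).
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Restaurant", "name": "Mario's Pizzaria"}
</script>
</head><body><p>Welcome</p></body></html>"""

services = extract_jsonld(PAGE)
```

The same extraction would work for a registry page or a provider page, since both are plain web pages carrying the metadata.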

4 of 16

Sample Workflow

Service discovery by assistant via registries

User Interaction with voice service

Handover

5 of 16

Registries are an open and free approach

6 of 16

Use schema.org (or other RDF vocabularies)

To describe services:

[
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Mario's Pizzaria",
    "potentialAction": {
      "@type": "QuestionAction",
      "about": "Pizza"
    }
  },
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Mexico Beach",
      "addressRegion": "FL",
      "streetAddress": "3102 Highway 98"
    },
    "potentialAction": {
      "@type": "QuestionAction",
      "about": "Pizza"
    }
  },
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Luigi's Pizzaria",
    "potentialAction": [
      {
        "@type": "QuestionAction",
        "about": "Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Pepperoni Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Margherita Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Calabrese Pizza"
      }
    ]
  }
]
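As a rough illustration of how such descriptions could be used for discovery, the sketch below filters a small in-memory registry by the `about` topic of each `potentialAction`. The helper names `topics` and `find_services` and the case-insensitive exact match are assumptions made for this example, not part of schema.org or of any proposed Voice HTML format.

```python
def topics(service):
    """Return the set of 'about' values from a service's potentialAction entries."""
    actions = service.get("potentialAction", [])
    if isinstance(actions, dict):  # a single action may be given without a list
        actions = [actions]
    return {action["about"] for action in actions if "about" in action}


def find_services(registry, topic):
    """Return names of services offering the topic (case-insensitive exact match)."""
    wanted = topic.lower()
    return [
        service["name"]
        for service in registry
        if wanted in {t.lower() for t in topics(service)}
    ]


# Abbreviated copies of the three sample descriptions from the slide.
REGISTRY = [
    {"@type": "Restaurant", "name": "Mario's Pizzaria",
     "potentialAction": {"@type": "QuestionAction", "about": "Pizza"}},
    {"@type": "Restaurant", "name": "Maria's Pizzaria",
     "potentialAction": {"@type": "QuestionAction", "about": "Pizza"}},
    {"@type": "Restaurant", "name": "Luigi's Pizzaria",
     "potentialAction": [
         {"@type": "QuestionAction", "about": "Pizza"},
         {"@type": "QuestionAction", "about": "Pepperoni Pizza"},
     ]},
]

pepperoni = find_services(REGISTRY, "Pepperoni Pizza")
pizza = find_services(REGISTRY, "pizza")
```

Here `pepperoni` contains only Luigi's entry, while `pizza` contains all three, because every sample service declares the general "Pizza" topic.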

7 of 16

Two Levels of Voice HTML

Voice Service Metadata in a registry

Service discovery information used for search, filtering, discovery

Properties, Topics, Tags, Intents, Description, …
URI of voice service site

RDF/Schema.org (in Microformat in HTML or JSON-LD, can be parsed / used by different user agents)

Voice Service Detail Data at Site of Voice Service Provider

Full Service Details

Properties, Topics, Tags, Intents
actual service URI

RDF/Schema.org (in Microformat in HTML or JSON-LD, can be parsed / used by different user agents)

8 of 16

Voice Service metadata in a registry

allows filtering and searching for services

kind of service
short description/documentation for service

for different locales, for different ages, for different user capabilities or impairments, …

properties & restrictions of service

especially location/area of service
authentication and authorization options of service
payment options of service

All of this data should remain valid for a long time, because it may be held in several registries whose update process and interval are not under the service provider's control
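A minimal sketch of what filtering over such registry metadata could look like. The `RegistryEntry` fields (`uri`, `kind`, `descriptions`, `service_area`, `payment_options`) and the `search` function are hypothetical names chosen for this example; a real registry would more likely express them with schema.org properties.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    uri: str                      # URI of the voice service site
    kind: str                     # kind of service
    descriptions: dict            # locale -> short description
    service_area: str             # location/area of service
    payment_options: list = field(default_factory=list)


def search(entries, kind=None, locale=None, area=None):
    """Return entries matching every criterion that is not None."""
    results = []
    for entry in entries:
        if kind is not None and entry.kind != kind:
            continue
        if locale is not None and locale not in entry.descriptions:
            continue
        if area is not None and entry.service_area != area:
            continue
        results.append(entry)
    return results


# Hypothetical registry content (assumption, for illustration only).
ENTRIES = [
    RegistryEntry("https://example.org/pizza", "Restaurant",
                  {"en": "Order pizza by voice"}, "Mexico Beach", ["cash"]),
    RegistryEntry("https://example.org/taxi", "TaxiService",
                  {"en": "Book a taxi", "de": "Taxi bestellen"}, "Berlin"),
]

german_services = search(ENTRIES, locale="de")
local_restaurants = search(ENTRIES, kind="Restaurant", area="Mexico Beach")
```

Only slow-changing fields appear in the entry, in line with the point that registry data should stay valid for a long time.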

9 of 16

Voice Service detail data

allows retrieving detailed information about the service

kind of service
description/documentation for service

for different locales, for different ages, for different user capabilities or impairments, …
links to documentation (PDF, text, video; formal descriptions: Interaction Flow Modeling Language, state machine description, VoiceXML, UML diagrams, …). Discuss: a formal description of a voice interaction flow may be difficult or impossible to generate, or unwanted, for an AI-based, multi-tier, complex, game interaction, and may be difficult to present to users, especially via voice.

central properties and restrictions of service

especially location/area of service
authentication and authorization options of service
payment options of service

Data may be updated frequently at service provider site

10 of 16

How are the voice service registries filled?

Look back at the history of the web and expect parallels

Device Local registry

Preset favorites on devices: manufacturer-provided / built-in lists of voice services.
User adds own lists of preferred voice services.
Syncing of voice service lists between different devices of users
Sharing of lists between users

Remote registries

Manually curated lists of voice services
Manually curated, structured lists of voice services: like DMOZ for websites
Service providers ask registries and search engines to include their voice services
Search engines crawl the web and provide lists of voice services

Other options for voice service discovery may include DNS, peer-to-peer exchange of lists, …

11 of 16

Relation / difference between Voice Service description in Registry and at service provider

static parts shall not differ
service description in registry is subset of description at service provider
use established (semantic web) techniques to write the service description

schema.org

descriptions should both be easily parsable and usable for different kinds of user agents

pure voice user agents with assistant software or browser
small screen user agents (watches) with assistant software or browser
TV browser or assistant software
Mobile phone browser or assistant software
Laptop browser or e.g. powerful assistant software
Desktop browser or e.g. really powerful assistant software
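The rule that the registry description is a subset of the description at the service provider can be checked mechanically. The sketch below is one possible check, assuming both descriptions are available as parsed JSON objects; the function name `is_subset` and the sample data are hypothetical.

```python
def is_subset(registry_desc, provider_desc):
    """True if every key/value in registry_desc also appears in provider_desc.

    Nested objects are compared recursively; all other values must be equal.
    """
    for key, value in registry_desc.items():
        if key not in provider_desc:
            return False
        other = provider_desc[key]
        if isinstance(value, dict) and isinstance(other, dict):
            if not is_subset(value, other):
                return False
        elif value != other:
            return False
    return True


# Hypothetical descriptions (assumption, for illustration only).
PROVIDER = {
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "address": {"addressLocality": "Mexico Beach", "addressRegion": "FL"},
    "potentialAction": {"@type": "QuestionAction", "about": "Pizza"},
}
REGISTRY_COPY = {
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "address": {"addressRegion": "FL"},
}
STALE_COPY = {"@type": "Restaurant", "name": "Maria's Pizza Place"}

consistent = is_subset(REGISTRY_COPY, PROVIDER)
stale = not is_subset(STALE_COPY, PROVIDER)
```

A registry or an assistant could run such a check after fetching the provider's detail data to detect static parts that have drifted apart.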

12 of 16

Further requirements for Voice HTML

Has to be accepted by service provider

focus on relevant information
easy to provide and maintain
able to integrate existing solutions

linking, use of established file formats and transports, locale handling, …

extendable for special use cases

use extendable standard format to provide needed information

must allow voice service handover

voice service providers will provide sets of services
Voice HTML for additional services has to be integrated into the voice service output/interaction

13 of 16

Further requirements for Voice HTML

Has to be accepted by service users / user agent providers

focus on relevant information
easy to parse and use
able to integrate existing solutions

linking, use of established file formats and transports, locale handling, …

capable to handle special use cases

14 of 16

Voice Service access: assistant or browser as basic user agent - I

Voice HTML must allow linking Web Services from Web pages

Use a Browser to start interaction with a Voice Service
Registries can be implemented as Webpages

Voice HTML must be usable by assistant software

Parse Registries
Parse Websites

Users can interact directly with a voice service; a browser with TTS and STT is usable as a basic user agent

Browser with add-ons can act as assistant

15 of 16

Voice Service access: assistant or browser as basic user agent - II

Enable handover between assisted and direct user interaction with voice service

Interaction with a Voice Service does not need an assistant software
Interaction with a Voice Service may be enhanced by an assistant software

Standard Pattern

Assistant Software uses registries to select voice service based on personal preferences and service properties
Assistant Software hands over voice interaction with service to user
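The standard pattern above can be sketched in a few lines. This is only an illustration under simple assumptions: registry entries are flat dictionaries, preferences are matched by equality, and "handover" means returning the service URI for the user agent to open. The names `select_service` and `hand_over` are hypothetical.

```python
def select_service(entries, preferences):
    """Pick the entry matching the most preferences; ties keep registry order.

    Returns None if there are no entries or none matches any preference.
    """
    def score(entry):
        return sum(1 for key, value in preferences.items() if entry.get(key) == value)

    best = max(entries, key=score, default=None)
    if best is None or score(best) == 0:
        return None
    return best


def hand_over(entry):
    """Return the URI at which the user continues the interaction directly."""
    return entry["uri"]


# Hypothetical registry entries and user preferences (assumptions).
ENTRIES = [
    {"uri": "https://example.org/mario", "kind": "Restaurant", "area": "Berlin"},
    {"uri": "https://example.org/maria", "kind": "Restaurant", "area": "Mexico Beach"},
]
PREFERENCES = {"kind": "Restaurant", "area": "Mexico Beach"}

chosen = select_service(ENTRIES, PREFERENCES)
target = hand_over(chosen) if chosen is not None else None
```

After the handover the assistant is no longer required: the user talks to the service at `target` directly, and an assistant may optionally stay involved to enhance the interaction.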

16 of 16

Questions

Find this presentation at: https://goo.gl/ttzdwW
Find more information at: https://public.etherpad-mozilla.org/p/voice-hackz
