1 of 16

Voice HTML

Challenges and Solutions

Mozilla Voice Recognition Hacks, 2017-01-28...29

<VOICE>

</VOICE>

2 of 16

Component View


3 of 16

Problem facets

We focus on the aspect of free and open discoverability of voice services

There will be solution providers for trust / user rating of voice services

There can be different transports for the service metadata. Embedding data with a microformat in web pages is a standard approach
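As a rough illustration of the web page transport, the sketch below pulls service metadata out of a page. It assumes the metadata is embedded as a JSON-LD `<script type="application/ld+json">` block, which is only one possible embedding; the page content, the class name `JsonLdExtractor` and the function `extract_service_metadata` are made up for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_service_metadata(html):
    """Return all JSON-LD objects found in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page of a voice service provider (assumption for this sketch).
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Restaurant",
 "name": "Mario's Pizzaria",
 "potentialAction": {"@type": "QuestionAction", "about": "Pizza"}}
</script>
</head><body><h1>Mario's Pizzaria</h1></body></html>
"""

services = extract_service_metadata(PAGE)
print(services[0]["name"])
```

An assistant or a registry crawler could run the same extraction on any page that embeds metadata this way; other transports would need their own reader.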


4 of 16

Sample Workflow

Service discovery by assistant via registries

User Interaction with voice service

Handover


5 of 16

Registries are an open and free approach


6 of 16

Use schema.org (or other RDF vocabularies)

To describe services:

[
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Mario's Pizzaria",
    "potentialAction": {
      "@type": "QuestionAction",
      "about": "Pizza"
    }
  },
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Mexico Beach",
      "addressRegion": "FL",
      "streetAddress": "3102 Highway 98"
    },
    "potentialAction": {
      "@type": "QuestionAction",
      "about": "Pizza"
    }
  },
  {
    "@context": "http://schema.org",
    "@type": "Restaurant",
    "name": "Luigi's Pizzaria",
    "potentialAction": [
      {
        "@type": "QuestionAction",
        "about": "Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Pepperoni Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Margherita Pizza"
      },
      {
        "@type": "QuestionAction",
        "about": "Calabrese Pizza"
      }
    ]
  }
]
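A minimal sketch of how a user agent might query such descriptions by topic. It assumes the three descriptions have already been parsed into Python dictionaries; the helper names `action_topics` and `find_services` are illustrative only. Note that `potentialAction` can be a single object or a list, so the code normalises both cases.

```python
def action_topics(service):
    """Return the 'about' topics of a service's potentialAction entries."""
    actions = service.get("potentialAction", [])
    if isinstance(actions, dict):  # a single action is not wrapped in a list
        actions = [actions]
    return [a["about"] for a in actions if "about" in a]


def find_services(services, topic):
    """Names of services offering an action about the topic (case-insensitive)."""
    wanted = topic.lower()
    return [
        s["name"]
        for s in services
        if any(t.lower() == wanted for t in action_topics(s))
    ]


def question(about):
    # Shorthand for the QuestionAction objects used in the slide.
    return {"@type": "QuestionAction", "about": about}


SERVICES = [
    {"@type": "Restaurant", "name": "Mario's Pizzaria",
     "potentialAction": question("Pizza")},
    {"@type": "Restaurant", "name": "Maria's Pizzaria",
     "potentialAction": question("Pizza")},
    {"@type": "Restaurant", "name": "Luigi's Pizzaria",
     "potentialAction": [question("Pizza"), question("Pepperoni Pizza"),
                         question("Margherita Pizza"), question("Calabrese Pizza")]},
]

print(find_services(SERVICES, "Calabrese Pizza"))
```

Exact string matching on `about` is a simplification; a real agent would probably need some form of topic matching that tolerates synonyms and locales.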


7 of 16

Two Levels of Voice HTML

Voice Service Metadata in a registry

  • Service discovery information used for search, filtering, discovery
    • Properties, Topics, Tags, Intents, Description, …
    • URI of voice service site
  • RDF/Schema.org (in Microformat in HTML or JSON-LD, can be parsed / used by different user agents)

Voice Service Detail Data at Site of Voice Service Provider

  • Full Service Details
    • Properties, Topics, Tags, Intents
    • actual service URI
  • RDF/Schema.org (in Microformat in HTML or JSON-LD, can be parsed / used by different user agents)


8 of 16

Voice Service metadata in a registry

allows filtering and searching for services

  • kind of service
  • short description/documentation for service
    • for different locales, for different ages, for different user capabilities or impairments, …
  • properties & restrictions of service
    • especially location/area of service
    • authentication and authorization options of service
    • payment options of service

All of this data should remain valid for a long time, because it may be held in several registries whose update process and interval are not under the service provider's control
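The filtering described above could look roughly like the following sketch. The record layout is an assumption made for illustration: `RegistryEntry` and all of its field names are hypothetical and not a defined schema, and the `.example` URIs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    # All field names are illustrative assumptions, not a defined schema.
    name: str
    kind: str
    uri: str
    locales: set = field(default_factory=set)
    service_areas: set = field(default_factory=set)
    payment_options: set = field(default_factory=set)


def search(entries, kind=None, locale=None, area=None, payment=None):
    """Return the entries matching every criterion that was given."""
    def matches(e):
        return (
            (kind is None or e.kind == kind)
            and (locale is None or locale in e.locales)
            and (area is None or area in e.service_areas)
            and (payment is None or payment in e.payment_options)
        )
    return [e for e in entries if matches(e)]


ENTRIES = [
    RegistryEntry("Mario's Pizzaria", "Restaurant", "https://mario.example/voice",
                  locales={"en-US", "it-IT"}, service_areas={"Mexico Beach"},
                  payment_options={"cash", "card"}),
    RegistryEntry("Luigi's Pizzaria", "Restaurant", "https://luigi.example/voice",
                  locales={"en-US"}, service_areas={"Panama City"},
                  payment_options={"card"}),
]

hits = search(ENTRIES, kind="Restaurant", locale="en-US", area="Mexico Beach")
print([e.name for e in hits])
```

Only slowly changing properties appear in the record, in line with the long-validity requirement; anything volatile would stay at the provider's site.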


9 of 16

Voice Service detail data

provides detailed information about the service

  • kind of service
  • description/documentation for service
    • for different locales, for different ages, for different user capabilities or impairments, …
    • links to documentation (PDF, text, video; formal: Interaction Flow Modeling Language, state machine description, VoiceXML, UML diagrams, …). Discuss: a formal description of a voice interaction flow may be difficult or impossible to generate, or unwanted, for an AI-based, multi-tier, complex game interaction, and may be difficult to present to users, especially via voice.
  • central properties and restrictions of service
    • especially location/area of service
    • authentication and authorization options of service
    • payment options of service

Data may be updated frequently at the service provider's site


10 of 16

How are the voice service registries filled?

Look back at the history of the web and expect parallels

  • Device Local registry
    • Preset favorites on devices: manufacturer-provided / built-in lists of voice services.
    • User adds own lists of preferred voice services.
    • Syncing of voice service lists between different devices of users
    • Sharing of lists between users
  • Remote registries
    • Manually curated lists of voice services
    • Manually curated, structured lists of voice services: like DMOZ for websites
    • Service providers ask registries and search engines to include their voice services
    • Search engines crawl the web and provide lists of voice services

Other options for voice service discovery may include: DNS, peer-to-peer exchange of lists, …
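A user agent that consults both a device-local registry and remote registries has to combine their lists somehow. The sketch below shows one possible policy, stated here as an assumption rather than something the slides prescribe: entries are keyed by service URI, and registries passed earlier (for example the user's own favorites) win over later ones. The function name, the dictionary keys and the `.example` URIs are hypothetical.

```python
def merge_registries(*registries):
    """Merge service lists; earlier registries win when a URI repeats."""
    merged = {}
    for registry in registries:
        for entry in registry:
            # setdefault keeps the first entry seen for each URI.
            merged.setdefault(entry["uri"], entry)
    return list(merged.values())


device_local = [
    {"name": "Mario's Pizzaria", "uri": "https://mario.example/voice",
     "source": "user favorite"},
]
remote = [
    {"name": "Mario's Pizzaria", "uri": "https://mario.example/voice",
     "source": "curated list"},
    {"name": "Luigi's Pizzaria", "uri": "https://luigi.example/voice",
     "source": "crawler"},
]

merged = merge_registries(device_local, remote)
print([(e["name"], e["source"]) for e in merged])
```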


11 of 16

Relation / difference between Voice Service description in Registry and at service provider

  • static parts shall not differ
  • service description in registry is subset of description at service provider
  • use established (semantic web) techniques to write the service description
    • schema.org
  • descriptions should both be easily parsable and usable for different kinds of user agents
    • pure voice user agents with assistant software or browser
    • small screen user agents (watches) with assistant software or browser
    • TV browser or assistant software
    • Mobile phone browser or assistant software
    • Laptop browser or e.g. powerful assistant software
    • Desktop browser or e.g. really powerful assistant software
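The first two points (static parts must not differ, and the registry description is a subset of the provider's description) can be checked mechanically. The following sketch assumes both descriptions are flat dictionaries; `registry_mismatches` and the sample property values are invented for illustration.

```python
def registry_mismatches(registry_entry, provider_description):
    """Keys whose registry value is missing or different at the provider."""
    return sorted(
        key
        for key, value in registry_entry.items()
        if key not in provider_description or provider_description[key] != value
    )


# Hypothetical descriptions; the provider holds more detail than the registry.
provider = {
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "addressLocality": "Mexico Beach",
    "openingHours": "Mo-Su 11:00-22:00",
}
registry_ok = {"@type": "Restaurant", "name": "Maria's Pizzaria"}
registry_stale = {
    "@type": "Restaurant",
    "name": "Maria's Pizzaria",
    "addressLocality": "Panama City",
}

print(registry_mismatches(registry_ok, provider))
print(registry_mismatches(registry_stale, provider))
```

An empty result means the registry entry is a consistent subset; any listed key points to a stale or conflicting registry value. Nested descriptions would need a recursive comparison, which this sketch leaves out.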


12 of 16

Further requirements for Voice HTML

Has to be accepted by service providers

  • focus on relevant information
  • easy to provide and maintain
  • able to integrate existing solutions
    • linking, use of established file formats and transports, locale handling, …
  • extensible for special use cases
    • use an extensible standard format to provide the needed information
  • must allow voice service handover
    • voice service providers will provide sets of services
    • Voice HTML for additional services has to be integrated into the voice service output/interaction


13 of 16

Further requirements for Voice HTML

Has to be accepted by service users / user agent providers

  • focus on relevant information
  • easy to parse and use
  • able to integrate existing solutions
    • linking, use of established file formats and transports, locale handling, …
  • able to handle special use cases


14 of 16

Voice Service access: assistant or browser as basic user agent - I

  • Voice HTML must allow linking Web Services from Web pages
    • Use a Browser to start interaction with a Voice Service
    • Registries can be implemented as Webpages
  • Voice HTML must be usable by assistant software
    • Parse Registries
    • Parse Websites
  • Users can interact directly with a voice service; a browser with TTS and STT is usable as a basic user agent
    • Browser with add-ons can act as assistant


15 of 16

Voice Service access: assistant or browser as basic user agent - II

  • Enable handover between assisted and direct user interaction with voice service
    • Interaction with a Voice Service does not need assistant software
    • Interaction with a Voice Service may be enhanced by assistant software

  • Standard Pattern
    • Assistant Software uses registries to select voice service based on personal preferences and service properties
    • Assistant Software hands over voice interaction with service to user
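The standard pattern above might be sketched as two steps: select, then hand over. Everything concrete here is an assumption made for illustration: the entry layout, the split into required properties and preferred tags, the scoring by tag overlap, and the idea that handover simply means passing the service URI on to the user's browser or voice client.

```python
def select_service(entries, required, preferred_tags):
    """Pick the entry that meets all required properties and shares the
    most tags with the user's preferences; None if nothing qualifies."""
    candidates = [
        e for e in entries
        if all(e.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None
    return max(candidates,
               key=lambda e: len(preferred_tags & set(e.get("tags", []))))


def hand_over(entry):
    """Return the URI at which the user continues the interaction directly."""
    return entry["uri"]


# Hypothetical registry content and user preferences.
REGISTRY = [
    {"name": "Mario's Pizzaria", "area": "Mexico Beach",
     "tags": ["pizza", "delivery"], "uri": "https://mario.example/voice"},
    {"name": "Maria's Pizzaria", "area": "Mexico Beach",
     "tags": ["pizza", "vegan", "delivery"], "uri": "https://maria.example/voice"},
    {"name": "Luigi's Pizzaria", "area": "Panama City",
     "tags": ["pizza", "vegan"], "uri": "https://luigi.example/voice"},
]

choice = select_service(REGISTRY, required={"area": "Mexico Beach"},
                        preferred_tags={"vegan", "delivery"})
print(hand_over(choice))
```

After the handover the assistant is out of the loop unless the user asks for it again, which matches the point that interaction with a voice service does not need assistant software.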


16 of 16

Questions

Find this presentation at: https://goo.gl/ttzdwW

Find more information at: https://public.etherpad-mozilla.org/p/voice-hackz
