1 of 44

WEARABLE LIVE CAPTIONS

with Azure Cognitive Services and Ably Realtime

#1c4966

2 of 44

Hello Build Stuff!

Jo Franchetti

DevRel at Ably

@thisisjofrank

@ablyrealtime

3 of 44

MDD

Momma Franchetti

Mum Driven Design

4 of 44

INSPIRATION STRUCK!

At Cyberdog in Camden of all places

7 of 44

LET’S BUILD IT!

But what will we need?

8 of 44

DISPLAY

A small, flexible array of addressable LEDs

MICROPHONE

A device to pick up speech

SPEECH TO TEXT

Software to process the audio and convert it into text

IOT PROTOCOL

Data transfer between the software and the display

9 of 44

THE DISPLAY

  • A 28 x 8 array of neopixels (very small addressable LEDs)
  • Powered by a rechargeable USB battery
  • Controlled by an ESP8266 microcontroller

10 of 44

THE ESP8266

  • Measures 5cm x 2cm
  • 80MHz processor with 3.3V logic/power
  • USB Power
  • WiFi on board

11 of 44

THE MICROPHONE

Everyone has one

Why not use the one on our phones?

We need to process the audio data, which is a little intensive for the microcontroller

12 of 44

SPEECH TO TEXT

WEB APP + AZURE COGNITIVE SERVICES

Build a web app with HTML, CSS and JS which will run on a phone or computer

Requires an internet connection and a microphone

Use Web APIs to get data from the mic

Send to Azure Cognitive Services to get text back

https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/

13 of 44

THE WEB APP

NODE JS APP

https://github.com/ably-labs/live-caption-demo

LED MATRIX DRIVER

https://npmjs.com/package/@snakemode/matrix-driver

ACS SPEECH SDK

https://npmjs.com/package/microsoft-cognitiveservices-speech-sdk

REALTIME MESSAGING

https://npmjs.com/package/ably

15 of 44

USING ACS SPEECH SDK

public async streamSpeechFromBrowser() {
    // The speech config is fetched asynchronously before recognition starts
    const speechConfig = await this.getConfig();
    const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
    const recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

    // Called each time the service returns a recognised phrase
    recognizer.recognized = async (_, speechResult) => {
        // Read the text through the public result.text property rather than private fields
        const text = speechResult.result.text ?? "";
        this._callback(text);
    };

    recognizer.startContinuousRecognitionAsync(
        function () { },
        function (error) { console.log(error); }
    );
}

17 of 44

USING ACS SPEECH SDK

const speech = new AzureCognitiveSpeech("/api/createAzureTokenRequest");

speech.onTextRecognised((text) => {
    ledDriver.text.scroll(text, colorValue, speedValue);
});

18 of 44

SENDING DATA TO THE BOARD

  • MQTT is a protocol designed for IoT devices
  • It is lightweight and built for constrained devices on low-bandwidth, high-latency networks
  • To use it from the web we need an MQTT broker
  • For this project we’ll use Ably’s MQTT broker

20 of 44

AN EXAMPLE COMMAND

const textMessage = {
    value: "My line of text",
    mode: 1,
    scrollSpeedMs: 25,
    color: { r: 255, g: 255, b: 255 }
};

21 of 44

AN EXAMPLE COMMAND

Byte  Value      Meaning
0     0x11       ASCII DC1 - Device Control 1
1     0x54       The ASCII character T - “text mode”
2     0x01       The value 1 - “scrolling text”
3     0x00-0xFF  The scroll interval in ms, a value from 0 to 255
4     0x00-0xFF  The R component of the RGB text colour
5     0x00-0xFF  The G component of the RGB text colour
6     0x00-0xFF  The B component of the RGB text colour
7     0x02       ASCII STX - “start of text”
8     0x20-0x7E  The ASCII code for each character, one byte per character
9     0x03       ASCII ETX - “end of text”
10    0x04       ASCII EOT - “end of transmission”
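
The byte layout above can be sketched as a small encoder. This is an illustration only, not the code from the matrix driver package: `TextCommand`, `toByte` and `encodeTextCommand` are hypothetical names, the field names follow the example command shown earlier, and the sketch assumes the text is limited to printable ASCII (0x20-0x7E).

```typescript
// Hypothetical shape, mirroring the example command from the earlier slide.
interface TextCommand {
  value: string;
  mode: number;          // 1 = scrolling text
  scrollSpeedMs: number; // sent as a single byte, so 0-255
  color: { r: number; g: number; b: number };
}

const DC1 = 0x11; // ASCII Device Control 1
const STX = 0x02; // ASCII start of text
const ETX = 0x03; // ASCII end of text
const EOT = 0x04; // ASCII end of transmission

// Clamp a number into the range of a single byte.
function toByte(n: number): number {
  return Math.max(0, Math.min(255, Math.round(n)));
}

function encodeTextCommand(cmd: TextCommand): Uint8Array {
  const out: number[] = [
    DC1,
    "T".charCodeAt(0), // 0x54, text mode
    toByte(cmd.mode),
    toByte(cmd.scrollSpeedMs),
    toByte(cmd.color.r),
    toByte(cmd.color.g),
    toByte(cmd.color.b),
    STX,
  ];
  // Assumption: anything outside printable ASCII is replaced with "?".
  for (let i = 0; i < cmd.value.length; i++) {
    const code = cmd.value.charCodeAt(i);
    out.push(code >= 0x20 && code <= 0x7e ? code : "?".charCodeAt(0));
  }
  out.push(ETX, EOT);
  return new Uint8Array(out);
}

const bytes = encodeTextCommand({
  value: "Hi",
  mode: 1,
  scrollSpeedMs: 25,
  color: { r: 255, g: 255, b: 255 },
});
```

For the two-character message "Hi" this yields twelve bytes: eight of header, two of text and two of footer.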

23 of 44

THE ABLY MQTT BROKER

const ablyClient = new Ably.Realtime.Promise({ authUrl: "/api/createTokenRequest" });

const ledDriver = new RemoteMatrixLedDriver({
    displayConfig: { width: 28, height: 8 },
    deviceAdapter: new ArduinoDeviceAdapter([new AblyTransport(ablyClient)])
});

25 of 44

THE ENTIRE SYSTEM

26 of 44

THE DISPLAY DRIVER

ledDriver.pixel.set({ x: 5, y: 5, color: "#FFFFFF" });

ledDriver.pixel.set([{ x: 1, color: "#ff0000" }, { x: 2 }]);

ledDriver.pixel.setAll(
    [
        " ",
        " ",
        " # ",
        " ",
        " # ",
        " ",
        " ",
        " "
    ], "#FFFFFF", true
);

27 of 44

THE DISPLAY DRIVER

ledDriver.image.set("./test.png");

ledDriver.text.scroll("The quick brown fox jumped over the lazy dog");

ledDriver.text.scroll("The quick brown fox jumped over the lazy dog", "#FF0000", 25);
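
The driver calls above take colours as hex strings, while the command format shown earlier carries separate R, G and B bytes. A conversion along these lines would bridge the two. `Rgb` and `hexToRgb` are hypothetical names written for illustration; the slides do not show how the driver package does this internally.

```typescript
interface Rgb {
  r: number;
  g: number;
  b: number;
}

// Hypothetical helper: parse "#RRGGBB" into three 0-255 components.
function hexToRgb(hex: string): Rgb {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error("expected a colour in #RRGGBB form, got: " + hex);
  }
  return {
    r: parseInt(hex.slice(1, 3), 16),
    g: parseInt(hex.slice(3, 5), 16),
    b: parseInt(hex.slice(5, 7), 16),
  };
}

const red = hexToRgb("#FF0000"); // { r: 255, g: 0, b: 0 }
```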

29 of 44

CREATING A PIXEL ‘FONT’

Convert the pixels to binary values: filled = 1, empty = 0

[0, 0, 0, 0,
 0, 0, 0, 0,
 1, 0, 1, 0,
 1, 1, 0, 1,
 1, 0, 0, 1,
 1, 0, 0, 1,
 1, 0, 0, 1,
 0, 0, 0, 0];

30 of 44

CREATING A PIXEL ‘FONT’

[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0];

JIMP to the rescue

https://www.npmjs.com/package/jimp
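
The conversion from a drawn glyph to the flat array can be sketched as follows. This is a hand-written stand-in for the image-reading step, which the talk does with JIMP. The `glyphToBits` helper and the `#` / `.` drawing convention are assumptions made for illustration.

```typescript
// Hypothetical helper: turn rows of "#" (filled) and "." (empty)
// into 1s and 0s, flattened row by row.
function glyphToBits(rows: string[]): number[] {
  const width = rows.length > 0 ? rows[0].length : 0;
  const bits: number[] = [];
  for (const row of rows) {
    if (row.length !== width) {
      throw new Error("every row of the glyph must be the same width");
    }
    for (let x = 0; x < row.length; x++) {
      bits.push(row.charAt(x) === "#" ? 1 : 0);
    }
  }
  return bits;
}

// The 4-wide, 8-tall glyph from the slides.
const glyph = glyphToBits([
  "....",
  "....",
  "#.#.",
  "##.#",
  "#..#",
  "#..#",
  "#..#",
  "....",
]);
```

The result has 32 entries and matches the flattened array shown on the slide.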

32 of 44

HARDWARE DIFFERENCES

33 of 44

CONFIGURING THE ARDUINO

// Display config
const int display_gpio_pin = 4;
const int display_width = 32;
const int display_height = 8;
const index_mode display_connector_location = index_mode::TOP_LEFT;
const carriage_return_mode line_wrap = carriage_return_mode::SNAKED_VERTICALLY;
const neoPixelType neopixel_type = NEO_GRB + NEO_KHZ800;
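
Because the LEDs are wired as one long strip, an (x, y) coordinate has to be turned into a strip index. The sketch below assumes that TOP_LEFT combined with SNAKED_VERTICALLY means the strip starts at the top-left pixel, runs down the first column and back up the next. That reading of the enum names is an assumption, and `snakedVerticalIndex` is a hypothetical name rather than the firmware's own function.

```typescript
// Map (x, y) to a strip index for a column-by-column serpentine layout.
// Assumption: even columns run top to bottom, odd columns bottom to top.
function snakedVerticalIndex(x: number, y: number, height: number): number {
  const columnStart = x * height;
  const runsDownwards = x % 2 === 0;
  return columnStart + (runsDownwards ? y : height - 1 - y);
}

const displayHeight = 8;
const firstPixel = snakedVerticalIndex(0, 0, displayHeight);    // 0
const bottomOfCol0 = snakedVerticalIndex(0, 7, displayHeight);  // 7
const bottomOfCol1 = snakedVerticalIndex(1, 7, displayHeight);  // 8
const topOfCol1 = snakedVerticalIndex(1, 0, displayHeight);     // 15
```

If the panel is wired differently (for example starting at another corner), only this mapping needs to change, which is presumably why the firmware exposes it as configuration.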

34 of 44

PROCESSING THE MESSAGES

auto controlCode = (int)bytes[0];

if (controlCode != DeviceControl1) {
    Serial.println("Not a device control command.");
    return;
}

auto controlMode = bytes[1];

switch (controlMode) {
    case P_PixelMode:
        processSetPixelsMessage(bytes, length);
        return;
    case C_ControlMessage:
        processControlMessage(bytes, length);
        return;
    case T_TextMode:
        processSetTextMessage(bytes, length);
        return;
}

35 of 44

PROCESSING THE MESSAGES

void message_processor::processSetTextMessage(const char *bytes, int length)
{
    auto messageModeByte = (int)bytes[2]; // 1 = scrolling text
    auto scrollSpeedMs = (int)bytes[3];   // scroll interval in ms

    // RGB components of the text colour
    auto r = (int)bytes[4];
    auto g = (int)bytes[5];
    auto b = (int)bytes[6];

    auto shouldScroll = messageModeByte == 1;

    drawEntireString(bytes, length, 0, r, g, b);
}
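
The header parsing above can be mirrored on the sending side for testing. The following is a sketch under assumptions, not a port of the firmware: `DecodedTextMessage` and `decodeTextMessage` are hypothetical names. It looks for STX starting at the end of the fixed seven-byte header, so that a header value that happens to equal 0x02 (a colour component, say) is not mistaken for the marker. Whether the firmware's own search does the same is not shown on the slides.

```typescript
interface DecodedTextMessage {
  shouldScroll: boolean;
  scrollSpeedMs: number;
  color: { r: number; g: number; b: number };
  text: string;
}

const DEVICE_CONTROL_1 = 0x11;
const TEXT_MODE = 0x54;       // ASCII "T"
const START_OF_TEXT = 0x02;
const HEADER_LENGTH = 7;      // bytes 0-6 come before STX
const FOOTER_LENGTH = 2;      // ETX followed by EOT

function decodeTextMessage(message: number[]): DecodedTextMessage {
  if (message[0] !== DEVICE_CONTROL_1 || message[1] !== TEXT_MODE) {
    throw new Error("not a text-mode device control command");
  }
  // Search from the end of the fixed header, not from byte 0.
  const stxIndex = message.indexOf(START_OF_TEXT, HEADER_LENGTH);
  if (stxIndex === -1) {
    throw new Error("missing STX marker");
  }
  const textBytes = message.slice(stxIndex + 1, message.length - FOOTER_LENGTH);
  return {
    shouldScroll: message[2] === 1,
    scrollSpeedMs: message[3],
    color: { r: message[4], g: message[5], b: message[6] },
    text: textBytes.map((code) => String.fromCharCode(code)).join(""),
  };
}

// Note the blue component is 2, the same value as STX.
const decoded = decodeTextMessage(
  [0x11, 0x54, 0x01, 25, 255, 0, 2, 0x02, 0x48, 0x69, 0x03, 0x04]
);
```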

36 of 44

WRITING TEXT TO THE DISPLAY

void message_processor::drawEntireString(const char *bytes, int length, int shiftXposBy, int r, int g, int b)
{
    auto textStart = search_utils::indexOf(bytes, length, STX_StartOfText) + 1;
    auto footerLength = 2;
    auto spaceWidth = 1;
    auto xPosition = shiftXposBy;

    for (auto i = textStart; i < length - footerLength; i++) {
        auto asciiCode = bytes[i];
        auto spriteAndWidth = sprites.spriteDataFor(asciiCode);

        drawSpriteAtPosition(spriteAndWidth.spriteData, spriteAndWidth.width, xPosition, r, g, b);
        xPosition += spriteAndWidth.width + spaceWidth;
    }

    lights_->flush();
}
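
The positioning logic in the loop above, where each glyph is drawn at the running x position and then advanced by its width plus one pixel of spacing, can be isolated like this. `layoutGlyphs` is a hypothetical helper and the glyph widths in the example are made up for illustration.

```typescript
// Compute where each glyph starts, given the glyph widths in pixels.
function layoutGlyphs(
  widths: number[],
  startX: number,
  spaceWidth: number = 1
): { positions: number[]; totalWidth: number } {
  const positions: number[] = [];
  let x = startX;
  for (const width of widths) {
    positions.push(x);
    x += width + spaceWidth; // same advance as the C++ loop
  }
  return { positions, totalWidth: x - startX };
}

// Three glyphs that are 4, 3 and 5 pixels wide, starting at the left edge.
const glyphLayout = layoutGlyphs([4, 3, 5], 0);
// glyphLayout.positions → [0, 5, 9], glyphLayout.totalWidth → 15
```

Passing a negative `startX` places the first glyphs off the left edge of the display, which is what makes scrolling possible with the same drawing routine.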

37 of 44

SCROLLING THE TEXT ON THE DISPLAY

38 of 44

SCROLLING THE TEXT

if (shouldScroll) {
    auto maxWidthOffLeft = sprites.getTotalWidthOfText(bytes, length) * -1;
    auto scrollPosition = cfg_->display.width;

    while (scrollPosition > maxWidthOffLeft) {
        lights_->clearWithoutFlush();
        drawEntireString(bytes, length, scrollPosition, r, g, b);
        scrollPosition--;
        delay(scrollSpeedMs);
    }
} else {
    drawEntireString(bytes, length, 0, r, g, b);
}
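
To see how many frames the loop above draws, the sequence of scroll positions can be reproduced on its own. This is a sketch with hypothetical names (`scrollOffsets`, `scrollFrames`), and the text width of 15 pixels is an example value. The duration estimate ignores the time spent drawing each frame.

```typescript
// List every x offset the text is drawn at: it starts just off the right
// edge and moves left one pixel per frame until it has left the display.
function scrollOffsets(displayWidth: number, textWidth: number): number[] {
  const offsets: number[] = [];
  const maxWidthOffLeft = -textWidth;
  for (let position = displayWidth; position > maxWidthOffLeft; position--) {
    offsets.push(position);
  }
  return offsets;
}

const scrollFrames = scrollOffsets(28, 15);
// One delay(scrollSpeedMs) per frame, here with the 25 ms from the example command.
const approxDurationMs = scrollFrames.length * 25;
```

With a 28-pixel-wide display and 15 pixels of text this gives 43 frames, so a single pass takes a little over a second at 25 ms per frame.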

40 of 44

IT WORKS!

41 of 44

NO DISPLAY?

NO PROBLEM!

https://live-caption.ably.dev/

43 of 44

MAKE IT YOUR OWN

github.com/ably-labs/live-caption-demo

44 of 44

THANK YOU FOR LISTENING

Jo Franchetti

DevRel at Ably

@thisisjofrank

@ablyrealtime
