WEARABLE LIVE CAPTIONS
with Azure Cognitive Services and Ably Realtime
Hello Build Stuff!
Jo Franchetti
DevRel at Ably
@thisisjofrank
@ablyrealtime
MDD
Momma Franchetti
Mum Driven Design
INSPIRATION STRUCK!
At Cyberdog in Camden of all places
LET’S BUILD IT!
But what will we need?
DISPLAY: A small, flexible array of addressable LEDs
MICROPHONE: A device to pick up speech
SPEECH TO TEXT: Software to process the audio and convert it into text
IOT PROTOCOL: Data transfer between the software and the display
THE DISPLAY
THE ESP8266
THE MICROPHONE
Everyone has one
We need to process the audio, which is too intensive for the microcontroller
Why not use the one on our phones?
SPEECH TO TEXT
WEB APP + AZURE COGNITIVE SERVICES
Build a web app with HTML, CSS and JS which will run on a phone or computer
Requires an internet connection and a microphone
Use Web APIs to get data from the mic
Send to Azure Cognitive Services to get text back
https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/
THE WEB APP
NODE JS APP: https://github.com/ably-labs/live-caption-demo
LED MATRIX DRIVER: https://npmjs.com/package/@snakemode/matrix-driver
ACS SPEECH SDK: https://npmjs.com/package/microsoft-cognitiveservices-speech-sdk
REALTIME MESSAGING: https://npmjs.com/package/ably
USING ACS SPEECH SDK
public async streamSpeechFromBrowser() {
    const speechConfig = await this.getConfig();
    const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
    const recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

    // Fires each time the service finalises a phrase.
    recognizer.recognized = async (_, speechResult) => {
        // Read the public result.text rather than the SDK's private fields.
        const text = speechResult.result.text ?? "";
        this._callback(text);
    };

    recognizer.startContinuousRecognitionAsync(
        function () { },
        function (error) { console.log(error); }
    );
}
USING ACS SPEECH SDK
const speech = new AzureCognitiveSpeech("/api/createAzureTokenRequest");
speech.onTextRecognised((text) => {
    ledDriver.text.scroll(text, colorValue, speedValue);
});
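The callback wiring above could be sketched roughly as follows. This is an illustration only: `SpeechToText`, `PhraseSource` and `fakeSource` are hypothetical names, and the real `AzureCognitiveSpeech` class in the demo may be structured differently. A stand-in source replays fixed phrases so the sketch runs without a microphone or an Azure key.

```typescript
// Hypothetical wrapper: the method name mirrors the slide, the internals are assumed.
type TextCallback = (text: string) => void;

// Minimal shape of a recogniser: something that reports finalised phrases.
interface PhraseSource {
  start(onPhrase: (text: string | undefined) => void): void;
}

class SpeechToText {
  private source: PhraseSource;
  private callback: TextCallback = () => {};

  constructor(source: PhraseSource) {
    this.source = source;
  }

  // Register the handler that receives each recognised phrase.
  onTextRecognised(callback: TextCallback): void {
    this.callback = callback;
  }

  // Begin listening; empty or missing phrases are dropped.
  start(): void {
    this.source.start((text) => {
      const cleaned = (text ?? "").trim();
      if (cleaned.length > 0) this.callback(cleaned);
    });
  }
}

// A stand-in source that replays fixed phrases instead of using a microphone.
const fakeSource: PhraseSource = {
  start(onPhrase) {
    const phrases: (string | undefined)[] = ["Hello Build Stuff!", undefined, "  ", "Captions on a shirt"];
    phrases.forEach((phrase) => onPhrase(phrase));
  },
};

const received: string[] = [];
const speechToText = new SpeechToText(fakeSource);
speechToText.onTextRecognised((text) => { received.push(text); });
speechToText.start();
```

Filtering out empty phrases before calling the handler means silence never reaches the display.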
SENDING DATA TO THE BOARD
THE ESP8266
AN EXAMPLE COMMAND
const textMessage = {
    value: "My line of text",
    mode: 1,
    scrollSpeedMs: 25,
    color: { r: 255, g: 255, b: 255 }
};
Byte | Value | Meaning |
0 | 0x11 | ASCII DC1 - Device Control 1 |
1 | 0x54 | The ASCII character T - “text mode” |
2 | 0x01 | The value 1 - “scrolling text” |
3 | 0x00-0xFF | The scroll interval in ms, a value between 0 and 255 |
4 | 0x00-0xFF | The R component of the RGB text colour |
5 | 0x00-0xFF | The G component of the RGB text colour |
6 | 0x00-0xFF | The B component of the RGB text colour |
7 | 0x02 | ASCII STX - “start of text” |
8..n | 0x20-0x7E | The ASCII code for each character, one byte per character |
n+1 | 0x03 | ASCII ETX - “end of text” |
n+2 | 0x04 | ASCII EOT - “end of transmission” |
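To make the byte layout concrete, here is a minimal sketch of how a text message object could be serialised into that command. `encodeTextMessage` and `toByte` are hypothetical helpers, not functions from the matrix driver package, and the sketch assumes the text is limited to printable ASCII (0x20 to 0x7E) so it can never collide with the control bytes.

```typescript
// Control bytes taken from the command layout.
const DC1 = 0x11, TEXT_MODE = 0x54, STX = 0x02, ETX = 0x03, EOT = 0x04;

interface TextMessage {
  value: string;
  mode: number;          // 1 = scrolling text
  scrollSpeedMs: number; // must fit in one byte
  color: { r: number; g: number; b: number };
}

// Clamp a number into a single unsigned byte (assumed behaviour for out-of-range input).
const toByte = (n: number): number => Math.max(0, Math.min(255, Math.round(n)));

function encodeTextMessage(msg: TextMessage): Uint8Array {
  // Keep printable ASCII only so the text never contains STX, ETX or EOT.
  const chars = Array.from(msg.value)
    .map((c) => c.charCodeAt(0))
    .filter((code) => code >= 0x20 && code <= 0x7e);

  return Uint8Array.from([
    DC1, TEXT_MODE, toByte(msg.mode), toByte(msg.scrollSpeedMs),
    toByte(msg.color.r), toByte(msg.color.g), toByte(msg.color.b),
    STX, ...chars, ETX, EOT,
  ]);
}

const encoded = encodeTextMessage({
  value: "Hi",
  mode: 1,
  scrollSpeedMs: 25,
  color: { r: 255, g: 255, b: 255 },
});
// encoded, in hex: 11 54 01 19 ff ff ff 02 48 69 03 04
```

A two-character message costs twelve bytes in total: a seven-byte header, STX, the two characters, and the two-byte footer.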
THE ABLY MQTT BROKER
const ablyClient = new Ably.Realtime.Promise({ authUrl: "/api/createTokenRequest" });
const ledDriver = new RemoteMatrixLedDriver({
    displayConfig: { width: 28, height: 8 },
    deviceAdapter: new ArduinoDeviceAdapter([new AblyTransport(ablyClient)])
});
THE ENTIRE SYSTEM
THE DISPLAY DRIVER
ledDriver.pixel.set({ x: 5, y: 5, color: "#FFFFFF" });
ledDriver.pixel.set([{ x: 1, color: "#ff0000" }, { x: 2 }]);
ledDriver.pixel.setAll(
[
" ",
" ",
" # ",
" ",
" # ",
" ",
" ",
" "
], "#FFFFFF", true
);
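One way to read the `setAll` call is that each string is a row of the display and each non-space character is a lit pixel. The sketch below shows that conversion under that assumption. `rowsToPixels` is a hypothetical helper, not part of the driver, and the driver's own handling may differ.

```typescript
interface Pixel {
  x: number;
  y: number;
  color: string;
}

// Turn rows of text into pixels: any non-space character counts as "lit".
function rowsToPixels(rows: string[], color: string): Pixel[] {
  const pixels: Pixel[] = [];
  rows.forEach((row, y) => {
    Array.from(row).forEach((ch, x) => {
      if (ch !== " ") pixels.push({ x, y, color });
    });
  });
  return pixels;
}

// A small smiley, 8 pixels wide and 5 rows tall.
const lit = rowsToPixels(
  [
    "        ",
    "  #  #  ",
    "        ",
    " #    # ",
    "  ####  ",
  ],
  "#FFFFFF"
);
```

Drawing with strings keeps the artwork readable in source code, at the cost of a single colour per call.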
THE DISPLAY DRIVER
ledDriver.image.set("./test.png");
ledDriver.text.scroll("The quick brown fox jumped over the lazy dog");
ledDriver.text.scroll("The quick brown fox jumped over the lazy dog", "#FF0000", 25);
CREATING A PIXEL ‘FONT’
Convert the pixels to binary values: filled = 1, empty = 0
[0, 0, 0, 0,
0, 0, 0, 0,
1, 0, 1, 0,
1, 1, 0, 1,
1, 0, 0, 1,
1, 0, 0, 1,
1, 0, 0, 1,
0, 0, 0, 0];
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0];
JIMP to the rescue
https://www.npmjs.com/package/jimp
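The pixel-to-binary step can be sketched without loading a real image. The `Bitmap` shape below is modelled on the RGBA data that Jimp exposes as `image.bitmap` (width, height, and four bytes per pixel). `bitmapToBinary` is a hypothetical helper, and the rule "opaque and darker than mid-grey means filled" is an assumption, not necessarily the rule the demo uses.

```typescript
// RGBA bytes in row-major order, four bytes per pixel.
interface Bitmap {
  width: number;
  height: number;
  data: Uint8Array;
}

// Assumed rule: a pixel is "filled" when it is opaque and darker than mid-grey.
function bitmapToBinary(bitmap: Bitmap): number[] {
  const bits: number[] = [];
  for (let i = 0; i < bitmap.width * bitmap.height; i++) {
    const offset = i * 4;
    const r = bitmap.data[offset];
    const g = bitmap.data[offset + 1];
    const b = bitmap.data[offset + 2];
    const a = bitmap.data[offset + 3];
    const brightness = (r + g + b) / 3;
    bits.push(a > 127 && brightness < 128 ? 1 : 0);
  }
  return bits;
}

// A 2x2 test image: black, white, transparent, black.
const glyph = bitmapToBinary({
  width: 2,
  height: 2,
  data: Uint8Array.from([
    0, 0, 0, 255,       255, 255, 255, 255,
    0, 0, 0, 0,         0, 0, 0, 255,
  ]),
});
// glyph is [1, 0, 0, 1]
```

With Jimp, the same function could be fed the `bitmap` of a loaded glyph image, since a Node Buffer is a Uint8Array.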
HARDWARE DIFFERENCES
CONFIGURING THE ARDUINO
// Display config
const int display_gpio_pin = 4;
const int display_width = 32;
const int display_height = 8;
const index_mode display_connector_location = index_mode::TOP_LEFT;
const carriage_return_mode line_wrap = carriage_return_mode::SNAKED_VERTICALLY;
const neoPixelType neopixel_type = NEO_GRB + NEO_KHZ800;
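The connector location and line wrap settings exist because LED matrices are a single strip folded back and forth, so an (x, y) coordinate has to be translated into a position along the strip. The sketch below shows one plausible mapping for a top-left connector with vertical snaking. `ledIndex` is a hypothetical function, and the wiring order it assumes (down the first column, up the second, and so on) may not match every panel.

```typescript
// Assumed wiring: the strip starts top-left, runs down column 0,
// then back up column 1, alternating direction for each column.
function ledIndex(x: number, y: number, height: number): number {
  const columnStart = x * height;
  const goingDown = x % 2 === 0;
  return columnStart + (goingDown ? y : height - 1 - y);
}

const height = 8;
const topOfFirstColumn = ledIndex(0, 0, height);    // 0
const bottomOfFirstColumn = ledIndex(0, 7, height); // 7
const bottomOfSecondColumn = ledIndex(1, 7, height); // 8, the strip turns around here
const topOfSecondColumn = ledIndex(1, 0, height);   // 15
```

Keeping this mapping in configuration means the drawing code can work in plain (x, y) coordinates regardless of how a particular panel is wired.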
PROCESSING THE MESSAGES
auto controlCode = (int)bytes[0];

// Every command must start with DC1.
if (controlCode != DeviceControl1) {
    Serial.println("Not a device control command.");
    return;
}

// The second byte selects the message type.
auto controlMode = bytes[1];
switch (controlMode) {
    case P_PixelMode:
        processSetPixelsMessage(bytes, length);
        return;
    case C_ControlMessage:
        processControlMessage(bytes, length);
        return;
    case T_TextMode:
        processSetTextMessage(bytes, length);
        return;
}
PROCESSING THE MESSAGES
void message_processor::processSetTextMessage(const char *bytes, int length)
{
    auto messageModeByte = (int)bytes[2];
    auto scrollSpeedMs = (int)bytes[3];
    auto r = (int)bytes[4];
    auto g = (int)bytes[5];
    auto b = (int)bytes[6];
    auto shouldScroll = messageModeByte == 1;

    drawEntireString(bytes, length, 0, r, g, b);
}
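The same header parsing can be mirrored in TypeScript, which is handy for checking messages on the sending side before they reach the board. This is a sketch under assumptions: `parseTextMessage` is a hypothetical function, and it looks for STX only after the seven header bytes so that a header value that happens to equal 0x02 is not mistaken for the start of the text.

```typescript
const DC1 = 0x11, TEXT_MODE = 0x54, STX = 0x02;
const HEADER_LENGTH = 7; // DC1, mode, sub-mode, speed, R, G, B
const FOOTER_LENGTH = 2; // ETX + EOT

interface ParsedText {
  scroll: boolean;
  scrollSpeedMs: number;
  r: number;
  g: number;
  b: number;
  text: string;
}

function parseTextMessage(bytes: Uint8Array): ParsedText | null {
  // Shortest valid message: header + STX + footer, with no text.
  if (bytes.length < HEADER_LENGTH + 1 + FOOTER_LENGTH) return null;
  if (bytes[0] !== DC1 || bytes[1] !== TEXT_MODE) return null;

  const stx = bytes.indexOf(STX, HEADER_LENGTH);
  if (stx === -1) return null;

  const textBytes = bytes.subarray(stx + 1, bytes.length - FOOTER_LENGTH);
  return {
    scroll: bytes[2] === 1,
    scrollSpeedMs: bytes[3],
    r: bytes[4],
    g: bytes[5],
    b: bytes[6],
    text: Array.from(textBytes, (code) => String.fromCharCode(code)).join(""),
  };
}

const parsed = parseTextMessage(
  Uint8Array.from([0x11, 0x54, 1, 25, 255, 0, 0, 0x02, 0x48, 0x69, 0x03, 0x04])
);
const rejected = parseTextMessage(Uint8Array.from([0x00]));
```

Here `parsed` describes red scrolling text reading "Hi" with a 25 ms interval, and `rejected` is `null` because the message does not start with DC1.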
WRITING TEXT TO THE DISPLAY
void message_processor::drawEntireString(const char *bytes, int length, int shiftXposBy, int r, int g, int b)
{
    // The text starts on the byte after STX and stops before the ETX + EOT footer.
    auto textStart = search_utils::indexOf(bytes, length, STX_StartOfText) + 1;
    auto footerLength = 2;
    auto spaceWidth = 1;
    auto xPosition = shiftXposBy;

    for (auto i = textStart; i < length - footerLength; i++) {
        auto asciiCode = bytes[i];
        auto spriteAndWidth = sprites.spriteDataFor(asciiCode);
        drawSpriteAtPosition(spriteAndWidth.spriteData, spriteAndWidth.width, xPosition, r, g, b);
        // Advance by the glyph width plus a one-pixel gap.
        xPosition += spriteAndWidth.width + spaceWidth;
    }

    lights_->flush();
}
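The x position bookkeeping in that loop can be tried out in isolation. The sketch below computes where each glyph would start for a variable-width font. The glyph widths are made-up values for illustration, and `layoutText` is a hypothetical helper rather than part of the firmware.

```typescript
const SPACE_WIDTH = 1; // gap between glyphs, matching spaceWidth in the firmware

// Hypothetical glyph widths in pixels; a real font table would cover all of ASCII.
const glyphWidths: Record<string, number> = { H: 4, i: 1, "!": 1 };
const DEFAULT_WIDTH = 4;

// Returns the x position of every glyph plus the total width of the string.
function layoutText(text: string, startX: number): { positions: number[]; totalWidth: number } {
  const positions: number[] = [];
  let x = startX;
  for (let i = 0; i < text.length; i++) {
    positions.push(x);
    const width = glyphWidths[text.charAt(i)] ?? DEFAULT_WIDTH;
    x += width + SPACE_WIDTH;
  }
  return { positions, totalWidth: x - startX };
}

const layout = layoutText("Hi!", 0);
// layout.positions is [0, 5, 7] and layout.totalWidth is 9
```

Passing a negative `startX` shifts the whole string left, which is the basis for scrolling.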
SCROLLING THE TEXT ON THE DISPLAY
SCROLLING THE TEXT
if (shouldScroll) {
    // Start just off the right edge and move left until the text has fully left the display.
    auto maxWidthOffLeft = sprites.getTotalWidthOfText(bytes, length) * -1;
    auto scrollPosition = cfg_->display.width;

    while (scrollPosition > maxWidthOffLeft) {
        lights_->clearWithoutFlush();
        drawEntireString(bytes, length, scrollPosition, r, g, b);
        scrollPosition--;
        delay(scrollSpeedMs);
    }
} else {
    drawEntireString(bytes, length, 0, r, g, b);
}
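The scroll loop draws one frame per pixel of travel, so the number of frames and the minimum scroll time follow directly from the display and text widths. A small sketch of that arithmetic is below. `scrollOffsets` is a hypothetical helper, the display width of 28 comes from the driver configuration shown earlier, and the text width of 9 pixels is an arbitrary example.

```typescript
// Lists the x offset of each frame: start at the right edge, move one pixel
// left per frame, and stop once the text has fully left the display.
function scrollOffsets(displayWidth: number, textWidth: number): number[] {
  const offsets: number[] = [];
  for (let x = displayWidth; x > -textWidth; x--) {
    offsets.push(x);
  }
  return offsets;
}

const offsets = scrollOffsets(28, 9);
const frameCount = offsets.length;    // 28 + 9 = 37 frames
const minDurationMs = frameCount * 25; // 925 ms at a 25 ms interval, ignoring draw time
```

Because the firmware loop uses `delay`, the board is busy for at least that long per message, which matters when captions arrive faster than they can scroll.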
IT WORKS!
NO DISPLAY?
NO PROBLEM!
https://live-caption.ably.dev/
MAKE IT YOUR OWN
github.com/ably-labs/live-caption-demo
THANK YOU FOR LISTENING
Jo Franchetti
DevRel at Ably
@thisisjofrank
@ablyrealtime