2 of 50

Agenda

Salutations (5 minutes)
Introduction (5 minutes)
Element Capture Follow-up (20 minutes)
Remote-Control API (20 minutes)
Excluding picture-in-picture from screen-capture (20 minutes)
CapturedMouseEvent listener addition after getDisplayMedia() (20 minutes)
Dynamically switching between sources (20 minutes)
Administrative matters (5 minutes)

3 of 50

Introduction

Welcome!
As per the poll during the last meeting, this meeting will be recorded.
Participation in respective GitHub repos encouraged.

General: https://github.com/screen-share/discuss
Element Capture: https://github.com/screen-share/element-capture
CapturedMouseEvents: https://github.com/screen-share/mouse-events

4 of 50

Element Capture Follow-up

Jordan Bayles

Google

jophba@google.com

Mark Foltz

Google

mfoltz@google.com

5 of 50

What is Element Capture?

Recall region capture: it allows a video track captured from a tab to be cropped according to the bounds of some element on the page.

Proprietary + Confidential

6 of 50

Region Capture

Region Capture is a three-step process:

1. Captured document creates a cropTarget for the content of interest.

const cropTarget = await CropTarget.fromElement(mainContentArea);

7 of 50

Region Capture

2. Application captures the tab with the embedded cropTarget.

const stream = await navigator.mediaDevices.getDisplayMedia({
  preferCurrentTab: true,
});
const [track] = stream.getVideoTracks();

8 of 50

Region Capture

3. Application crops the track with the cropTarget, which will capture only the main content area.

await track.cropTo(cropTarget);
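
Taken together, the three steps can be sketched as a single helper. This is a minimal sketch, not the presenters' code: `captureCroppedToElement` and its `env` parameter are hypothetical names, introduced so the flow can be exercised with fakes outside a browser. In a page, `env` would be `{ CropTarget, mediaDevices: navigator.mediaDevices }`.

```javascript
// Hypothetical helper combining the three Region Capture steps.
async function captureCroppedToElement(env, element) {
  // Step 1: create a CropTarget for the content of interest.
  const cropTarget = await env.CropTarget.fromElement(element);
  // Step 2: capture the current tab.
  const stream = await env.mediaDevices.getDisplayMedia({ preferCurrentTab: true });
  const [track] = stream.getVideoTracks();
  // Step 3: crop the video track to the target.
  await track.cropTo(cropTarget);
  return { stream, track };
}

// Fakes standing in for the browser objects (assumptions, for illustration only).
const regionCalls = [];
const fakeTrack = {
  cropTo: async (target) => { regionCalls.push(['cropTo', target]); },
};
const fakeEnv = {
  CropTarget: { fromElement: async (el) => ({ forElement: el }) },
  mediaDevices: {
    getDisplayMedia: async (options) => {
      regionCalls.push(['getDisplayMedia', options]);
      return { getVideoTracks: () => [fakeTrack] };
    },
  },
};

const regionCaptureDemo = captureCroppedToElement(fakeEnv, 'mainContentArea');
```

Passing the browser objects in, rather than reading globals, keeps the ordering of the steps easy to verify in isolation.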

9 of 50

What is Element Capture?

10 of 50

Element Capture

3. Application restricts the track with the cropTarget, which captures only the main content area, without any occluding content.

await track.restrictTo(cropTarget);

11 of 50

Element Capture - What can be captured?

Element should form a "Stacking Context" - a new context to resolve z-index values

Without a stacking context on the parent:

<div class="one">
  <div class="child" style="z-index:-1"></div>
</div>

With a stacking context on the parent:

<div class="one" style="isolation:isolate">
  <div class="child" style="z-index:-1"></div>
</div>

12 of 50

Element Capture - What can be captured?

https://codepen.io/robinrendle/pen/LmzLEL

Element should form a "Backdrop Root" - ancestor elements can't mask or filter it

13 of 50

Element Capture - What can be captured?

https://lab.hakim.se/domtree/

Element ancestor should not use 3D transforms

14 of 50

Element Capture - Open questions

Applying these constraints to existing elements may change the rendering
We could force these constraints when a crop target is applied
We could force them when capture is started
We could choose not to force them, and pause capture when they are not met

15 of 50

Element Capture - Next Steps

Documenting constraints that make an element capturable (PR in review)
Refinements to API shape, feedback from developers
Understanding performance and rendering impacts
Targeting Origin Trial in Q4 2023

16 of 50

Element Capture - Possible future work

Signaling back to the capturing application when elements become non-capturable.
Direct capture from elements, i.e. Element.captureStream().
Offscreen element capture.
Element-restricted audio capture.

17 of 50

Remote-Control API

Elad Alon

Google

eladalon@google.com

18 of 50

Problem description

A user is in a video call and shares a tab.

How does the user…

…scroll the captured tab?
…change the zoom level?

If the user focused the captured tab…

…how would the user see remote participants?
…how would the user see annotations and additional content (e.g. a timer)?
…how would the user interact with the VC app’s controls?

19 of 50

20 of 50

Proposed new solution

After a permissions prompt, allow the capturing application limited control over the captured surface.

Scrolling (over coordinates of choice)
Zooming (of the captured tab)
Page-up / page-down (on last keyboard-focused element) (shelved)

Before we delve into the exact shape, let’s have a quick run through an example.

21 of 50

Sample usage 1:

Initiate capture and obtain permission

const controller = new CaptureController();
const stream = await navigator.mediaDevices.getDisplayMedia({ controller });
const video = document.getElementById('myVideoElement');
video.srcObject = stream;

// Perform a null-action so as to prompt the user for permission.
try {
  await controller.sendMouseWheel({});
} catch (e) {
  return; // Permission denied. Bail.
}

Permission prompts are only displayed when attempting to use the API they gate access to.
A null event is intentionally used to avoid performing an action before the user requests one.
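
The null-action pattern can be wrapped in a small helper that reports the outcome as a boolean. A sketch under assumptions: `requestControlPermission` is a hypothetical name, the two fake controllers stand in for a real CaptureController, and the slides do not say which exception a denial raises, so any rejection is treated as a denial here.

```javascript
// Hypothetical helper: resolves to true if control was granted, false otherwise.
async function requestControlPermission(controller) {
  try {
    // An empty action uses the default deltas of 0, so nothing scrolls;
    // the call only serves to surface the permission prompt.
    await controller.sendMouseWheel({});
    return true;
  } catch (e) {
    // Assumption: any rejection means the user (or the UA) said no.
    return false;
  }
}

// Fakes standing in for a CaptureController (assumptions, for illustration only).
const grantingController = { sendMouseWheel: async () => {} };
const denyingController = {
  sendMouseWheel: async () => { throw new Error('permission denied'); },
};

const permissionDemo = Promise.all([
  requestControlPermission(grantingController),
  requestControlPermission(denyingController),
]);
```

An app could call this once after capture starts and only then reveal its scroll and zoom controls.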

22 of 50

Sample usage 2:

Relay scroll events to captured surface

// Having obtained the user’s permission, we can now relay subsequent
// wheel events to the captured tab.
video.addEventListener("wheel", event => {
  const [x, y] = translateCoordinates(event.offsetX, event.offsetY);
  controller.sendMouseWheel({
    x,
    y,
    wheelDeltaX: event.wheelDeltaX,
    wheelDeltaY: event.wheelDeltaY,
  });
});

translateCoordinates() scales the coordinates in the video-element to those of the captured surface. Its implementation is left as an exercise for the reader.
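
One possible take on that exercise is sketched below. It is not the presenters' implementation and rests on stated assumptions: the preview is drawn with `object-fit: contain` (centered, possibly letterboxed), and the frame size (for example `video.videoWidth` by `video.videoHeight`) matches the coordinate space of the captured surface. Unlike the two-argument call on the slide, this version takes the sizes explicitly so that it can run without a DOM.

```javascript
// Maps a point inside the preview element to frame coordinates.
// Returns null when the point falls on the letterbox bars or sizes are invalid.
function translateCoordinates(offsetX, offsetY, elementSize, frameSize) {
  if (elementSize.width <= 0 || elementSize.height <= 0 ||
      frameSize.width <= 0 || frameSize.height <= 0) {
    return null;
  }
  // Frame pixels per element pixel; "contain" scaling uses the larger ratio.
  const ratio = Math.max(
    frameSize.width / elementSize.width,
    frameSize.height / elementSize.height
  );
  // Size of the picture as actually drawn, and the bars around it.
  const renderedWidth = frameSize.width / ratio;
  const renderedHeight = frameSize.height / ratio;
  const padX = (elementSize.width - renderedWidth) / 2;
  const padY = (elementSize.height - renderedHeight) / 2;

  const x = Math.floor((offsetX - padX) * ratio);
  const y = Math.floor((offsetY - padY) * ratio);
  if (x < 0 || y < 0 || x >= frameSize.width || y >= frameSize.height) {
    return null;
  }
  return [x, y];
}

const centerPoint = translateCoordinates(
  320, 180, { width: 640, height: 360 }, { width: 1920, height: 1080 });
const letterboxTopLeft = translateCoordinates(
  0, 60, { width: 640, height: 480 }, { width: 1920, height: 1080 });
const onLetterboxBar = translateCoordinates(
  10, 30, { width: 640, height: 480 }, { width: 1920, height: 1080 });
```

A wheel handler using this version would skip the `sendMouseWheel()` call when the result is null, since the pointer is then over a bar rather than over captured content.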

23 of 50

Sample usage 3:

Control zoom-level of captured tabs

const zoomInButton = document.getElementById('zoomInButton');
zoomInButton.addEventListener('click', async (event) => {
  const oldZoomLevel = await controller.getZoomLevel();
  const newZoomLevel = Math.min(oldZoomLevel + 10, controller.getMaxZoomLevel());
  await controller.setZoomLevel(newZoomLevel);
});

[Figures: possible capturing-app UX; effect in captured tab]
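
The same idea extends to a zoom-out button by clamping against both bounds. A minimal sketch, assuming the zoom methods behave as used on this slide: `nextZoomLevel` and `changeZoom` are hypothetical helper names, and the fake controller's limits (25 and 500) and starting level (100) are invented values for illustration.

```javascript
// Clamp a proposed zoom level into [min, max].
function nextZoomLevel(current, step, min, max) {
  return Math.min(max, Math.max(min, current + step));
}

// Apply a positive (zoom in) or negative (zoom out) step through the controller.
async function changeZoom(controller, step) {
  const oldZoomLevel = await controller.getZoomLevel();
  const newZoomLevel = nextZoomLevel(
    oldZoomLevel, step,
    controller.getMinZoomLevel(), controller.getMaxZoomLevel());
  // Skip the call when the level is already at a bound.
  if (newZoomLevel !== oldZoomLevel) {
    await controller.setZoomLevel(newZoomLevel);
  }
  return newZoomLevel;
}

// Fake controller with made-up limits, standing in for a CaptureController.
const fakeZoomController = {
  level: 100,
  getMinZoomLevel() { return 25; },
  getMaxZoomLevel() { return 500; },
  async getZoomLevel() { return this.level; },
  async setZoomLevel(zoomLevel) { this.level = zoomLevel; },
};

const zoomDemo = changeZoom(fakeZoomController, 10);
```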

24 of 50

Proposed API shape

dictionary CapturedMouseWheelAction {
  int x = 0;
  int y = 0;
  int wheelDeltaX = 0;
  int wheelDeltaY = 0;
};

partial interface CaptureController {
  // 0. Pre-existing and irrelevant methods omitted.
  ...

  // 1. Scrolling
  Promise<undefined> sendMouseWheel(CapturedMouseWheelAction action);

  // 2. Zoom-level
  int getMinZoomLevel();
  int getMaxZoomLevel();
  Promise<int> getZoomLevel();
  Promise<undefined> setZoomLevel(int zoomLevel);
};

25 of 50

Finer points 1: Permission prompt

Set of possible operations limited to stay safe with a permissions prompt. (Sending mouse-clicks and keyboard-presses intentionally avoided.)
Permissions prompt likely still necessary, or else any capturer could scan the entire tab shared with it before a user has time to react.
User agents only expose permission prompts on first API invocation, so we intentionally include a no-op operation, allowing apps to prompt the user to obtain permission before they expose relevant controls.
Stickiness of permission an interesting topic. Let’s start conservatively?

26 of 50

Finer points 2a: Scrolling and paging

27 of 50

Finer points 2b: MouseWheel vs. Paging

The API allows MouseWheel events at arbitrary coordinates.
This implicitly assumes apps educate the user to scroll over a preview tile.
Alternative - expose an app-level widget to the user.

Problem: Where is the scroll event delivered?

Refined alternative - also support page-up / page-down.

Refined problem: Same question, really. Where is the event delivered?
Possible Answers:

Last keyboard-focused element.
Top-level documents.

These are all imperfect; hence the preference for MouseWheel.
Let’s keep paging for later discussion.

28 of 50

Excluding PiP from screen-capture

Arnaud Budkiewicz

Dialpad

29 of 50

Problem description

Picture-in-Picture is a great feature that keeps on top of the screen:

a playing video,
the main speaker of a video meeting along with its controls,
and goes even beyond with the new Document Picture-in-Picture API.

30 of 50

Problem description

In all these cases, as soon as the user shares a screen, the PiP window is shared too, as it is part of what is on the screen.

31 of 50

Problem description

When the user is sharing a tab or a window, the PiP window is NOT shared, but sharing a screen exposes it IMMEDIATELY.

32 of 50

Problem description

The size of the PiP window is initially small, hiding a minimal fraction of what is underneath, but can be resized up to almost the size of the entire screen.

33 of 50

Problem description

Removing the PiP window from what getDisplayMedia is sharing with the far end could potentially expose unwanted content.

34 of 50

Proposal

Add an icon to the PiP window allowing the user to hide/unhide it from screen share: https://bugs.chromium.org/p/chromium/issues/detail?id=1442585
The state is non-persistent, reverting to the default every time PiP is started.

The default state should be discussed:

Hidden (relatively safe when the PiP window is small, like its default size). That is Zoom’s behavior, hiding video windows and meeting controls from what is shared.
Not hidden (current behavior).

35 of 50

One more thing…

While in a meeting, the state of the mic and camera is very important information that should be visible at all times. However, when using PiP, the icons disappear as soon as the cursor moves away from the PiP window, and the user no longer knows the state of the mic and camera.

As a user, I'd rather have the 3 icons visible at all times.
https://bugs.chromium.org/p/chromium/issues/detail?id=1442389

For developers, this could be a parameter that changes the default behavior:

Controls are always visible
Auto-hide the controls (current behavior)

36 of 50

CapturedMouseEvent listener addition after getDisplayMedia()

Frédéric Wang

Igalia

fwang@igalia.com

37 of 50

Quick recap

Screen-Capture Mouse Events exposes mouse events over a captured surface:

let controller = new CaptureController();
controller.oncapturedmousechange = (event) => {
  console.log(`surfaceX=${event.surfaceX}, surfaceY=${event.surfaceY}`);
};
let stream = await navigator.mediaDevices.getDisplayMedia({
  controller: controller
});

Issues #1 and #9: Make CaptureController an EventTarget

👉🏼 to be proposed at the WebRTC WG tomorrow

controller.addEventListener("capturedmousechange", (event) => { ... });
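
To illustrate what becoming an EventTarget buys, here is a minimal stand-in written against the standard `EventTarget` and `Event` classes. It is a sketch, not the prototype's code: `SketchCaptureController` and `simulateMouseMove()` are hypothetical names, and the event is a plain `Event` with `surfaceX` and `surfaceY` attached rather than a real CapturedMouseEvent.

```javascript
// Stand-in showing an on-handler attribute and addEventListener() side by side.
class SketchCaptureController extends EventTarget {
  #handler = null;

  get oncapturedmousechange() {
    return this.#handler;
  }

  set oncapturedmousechange(fn) {
    // Replace any previous attribute handler with the new one.
    if (this.#handler) {
      this.removeEventListener('capturedmousechange', this.#handler);
    }
    this.#handler = typeof fn === 'function' ? fn : null;
    if (this.#handler) {
      this.addEventListener('capturedmousechange', this.#handler);
    }
  }

  // Hypothetical hook standing in for the browser reporting a mouse move.
  simulateMouseMove(surfaceX, surfaceY) {
    const event = new Event('capturedmousechange');
    event.surfaceX = surfaceX;
    event.surfaceY = surfaceY;
    this.dispatchEvent(event);
  }
}

const seenMouseEvents = [];
const sketchController = new SketchCaptureController();
sketchController.oncapturedmousechange = (event) => {
  seenMouseEvents.push(`attribute:${event.surfaceX},${event.surfaceY}`);
};
sketchController.addEventListener('capturedmousechange', (event) => {
  seenMouseEvents.push(`listener:${event.surfaceX},${event.surfaceY}`);
});
sketchController.simulateMouseMove(10, 20);
```

With an EventTarget, several independent parts of an application can observe the same controller, whereas an attribute handler alone allows only one.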

38 of 50

Quick recap

Since mid-May, Tella has funded Igalia’s work on a prototype for Chromium!

39 of 50

Problem description

Issue #14: What happens if we register event listeners after getDisplayMedia()?

let controller = new CaptureController();
controller.oncapturedmousechange = (event) => { ... };
let stream = await navigator.mediaDevices.getDisplayMedia({
  controller: controller
});
controller.oncapturedmousechange = (event) => { ... };
controller.addEventListener("capturedmousechange", ...);

Current prototype: these listeners will receive events for mouse moves.

40 of 50

Problem description

Modern browser engines have multiple components:

Classes/Objects (~20 involved in prototype)
Threads
Processes

Implementing the prototype means crossing all these components:

When starting capture, establish a communication channel from CaptureController to the part tracking mouse moves.
For each mouse move, send the coordinates back to CaptureController.

Current prototype: This extra work is always done as long as we provide a CaptureController, even when listeners will never be registered!

41 of 50

Alternatives

1. Improve behavior of current prototype (if important use case)

👉🏼 Implementations can postpone establishing the communication channel until the first event handler is registered.

👉🏼 Implementations can also stop sending events through the communication channel when all event handlers are removed.

2. Require registration before getDisplayMedia() (implementers’ preference)

Explicit disabled-by-default "will receive mouse events" option on CaptureController vs infer based on whether any listener is registered.
Make addEventListener throw an exception vs accept handler that will never receive events.

👉🏼 Can do something similar to (1) when all event handlers are removed.
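
The lazy approach in the first alternative can be sketched with a subclass of the standard `EventTarget` that opens its channel on the first listener and closes it when the last one goes away. This is an illustration only: `LazyChannelController`, `channelOpen` and `channelOpenCount` are hypothetical names, the "channel" is just a flag, and the bookkeeping is simplified (it ignores `once` listeners and treats capture and non-capture registrations of the same function as one).

```javascript
// Opens the (simulated) channel only while at least one listener is registered.
class LazyChannelController extends EventTarget {
  #listeners = new Set();
  channelOpen = false;
  channelOpenCount = 0; // How many times the channel had to be established.

  addEventListener(type, listener, options) {
    super.addEventListener(type, listener, options);
    if (type === 'capturedmousechange' && listener) {
      this.#listeners.add(listener);
      if (!this.channelOpen) {
        this.channelOpen = true;
        this.channelOpenCount++;
      }
    }
  }

  removeEventListener(type, listener, options) {
    super.removeEventListener(type, listener, options);
    if (type === 'capturedmousechange') {
      this.#listeners.delete(listener);
      if (this.#listeners.size === 0) {
        this.channelOpen = false;
      }
    }
  }
}

const lazyController = new LazyChannelController();
const firstListener = () => {};
const secondListener = () => {};
const lazyStates = [lazyController.channelOpen];      // Before any listener.
lazyController.addEventListener('capturedmousechange', firstListener);
lazyController.addEventListener('capturedmousechange', secondListener);
lazyStates.push(lazyController.channelOpen);           // Two listeners.
lazyController.removeEventListener('capturedmousechange', firstListener);
lazyStates.push(lazyController.channelOpen);           // One listener left.
lazyController.removeEventListener('capturedmousechange', secondListener);
lazyStates.push(lazyController.channelOpen);           // No listeners left.
```

The cost the slides describe, crossing objects, threads and processes for every mouse move, is then only paid while someone is listening.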

42 of 50

Dynamically switching between sources

Elad Alon

Google

eladalon@google.com

43 of 50

Dynamic-switching

Several browsers let users change what they’re sharing “on the fly.”

Chrome - change from one tab to another.
Safari - change between different windows, screens.

Often easier for users than starting a new capture session.
Managed by the surfaceSwitching option.
Some limitations remain.

Chrome (any platform)

Safari (macOS)

44 of 50

Challenge

Can we keep extending UA capabilities without new spec changes?

Possibly, but some undesirable results would follow.

Examine the example of MediaStreamTrack.cropTo():

Only callable on tab-capture tracks.
The set of valid inputs changes depending on which tab is captured.

This is a general problem.

The user changing the target is an asynchronous event, outside the control of the app, and not easily observable by the app. And it can mean the difference between:

Method invocation works as expected
Method invocation has unexpected results
An exception is thrown

45 of 50

Alternative (with known issues)

Currently, dynamic-switching involves switching out the source.

But what if it didn’t? What if instead it:

Terminated the old tracks.
Fired a new event with a new stream.

Then:

const controller = new CaptureController();
controller.addEventListener('switch', (event) => {
  const videoElement = document.getElementById('myVideoElement');
  videoElement.srcObject = event.mediaStream;
});
…
navigator.mediaDevices.getDisplayMedia({ controller });
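
The event flow can be tried outside a browser with a stand-in built on the standard `EventTarget`. A sketch under assumptions: `SwitchingController` and `simulateUserSwitch()` are hypothetical, the `switch` event and its `mediaStream` field follow the slide's proposal rather than any shipped API, and plain objects play the roles of the video element and the streams.

```javascript
// Stand-in controller that delivers a new stream through a 'switch' event.
class SwitchingController extends EventTarget {
  // Hypothetical hook: the user picked a different surface in the browser UI.
  simulateUserSwitch(newStream) {
    const event = new Event('switch');
    event.mediaStream = newStream;
    this.dispatchEvent(event);
  }
}

const switchController = new SwitchingController();
const fakeVideoElement = { srcObject: { id: 'stream-for-old-tab' } };
const switchLog = [];

switchController.addEventListener('switch', (event) => {
  // The application sees every switch and can react before using the stream.
  switchLog.push(`${fakeVideoElement.srcObject.id} -> ${event.mediaStream.id}`);
  fakeVideoElement.srcObject = event.mediaStream;
});

switchController.simulateUserSwitch({ id: 'stream-for-new-tab' });
```

Because each switch arrives as an explicit event, the application always knows which source a given track belongs to.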

46 of 50

Proposal analysis

Solves the aforementioned issue.
Solves additional issues (mediacapture-screen-share #255).

(Briefly - when the target changes, the application might wish to also change the way it processes the stream. For example, it might not even wish to continue transmitting it remotely. This decision should block.)

One possibility is for UAs to only allow dynamic switching between different surface types (tabs/windows/screens) if the event listener is registered. Discussed in next slide.

47 of 50

Cross-surface switching based on event-handler

Should we specify that the browser only allows cross-surface switching if the app registers an event handler to handle the new stream?

Registering event handlers should not have side effects, so possibly a slightly different API shape is required.

Maybe: CaptureController.enableDynamicSwitching(handler)

Allowing applications to influence what options browsers expose to the user is risky business.

But we already have surfaceSwitching, so not a big change.
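
To make the question concrete, the gating could look roughly like the sketch below. Everything in it is hypothetical and only mirrors the discussion: `enableDynamicSwitching(handler)` is the shape floated on this slide, while `canOfferSwitch()` and `simulateUserSwitch()` are invented stand-ins for decisions the browser would make internally. The surface names reuse the existing `displaySurface` values (`browser`, `window`, `monitor`).

```javascript
// Stand-in controller: cross-surface switching is offered only after opt-in.
class DynamicSwitchingController {
  #handler = null;

  enableDynamicSwitching(handler) {
    if (typeof handler !== 'function') {
      throw new TypeError('handler must be a function');
    }
    this.#handler = handler;
  }

  // Hypothetical: would the browser offer this switch in its UI?
  canOfferSwitch(fromSurface, toSurface) {
    return fromSurface === toSurface || this.#handler !== null;
  }

  // Hypothetical: the user attempts a switch; returns whether it happened.
  simulateUserSwitch(fromSurface, toSurface, newStream) {
    if (!this.canOfferSwitch(fromSurface, toSurface)) {
      return false;
    }
    if (this.#handler) {
      this.#handler(newStream); // Opted-in apps receive the new stream.
    }
    return true; // Without a handler, the source is swapped in place.
  }
}

const gatedController = new DynamicSwitchingController();
const offeredBeforeOptIn = gatedController.canOfferSwitch('browser', 'window');
const receivedStreams = [];
gatedController.enableDynamicSwitching((stream) => receivedStreams.push(stream));
const switchedAfterOptIn =
  gatedController.simulateUserSwitch('browser', 'window', { id: 'window-stream' });
```

An explicit method keeps the opt-in visible at the call site, which avoids attaching side effects to plain event-handler registration.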
