1 of 50

Screen Capture Community Group
2023-06-26

2 of 50

Agenda

  • Salutations (5 minutes)
  • Introduction (5 minutes)
  • Element Capture Follow-up (20 minutes)
  • Remote-Control API (20 minutes)
  • Excluding picture-in-picture from screen-capture (20 minutes)
  • CapturedMouseEvent listener addition after getDisplayMedia() (20 minutes)
  • Dynamically switching between sources (20 minutes)
  • Administrative matters (5 minutes)

3 of 50

Introduction

4 of 50

Element Capture Follow-up

Jordan Bayles

Google
jophba@google.com

Mark Foltz

Google
mfoltz@google.com

5 of 50

What is Element Capture?

Recall region capture: it allows a video track captured from a tab to be cropped according to the bounds of some element on the page.

Proprietary + Confidential

6 of 50

Region Capture

Region Capture is a three-step process:

1. Captured document creates a cropTarget for the content of interest.

const cropTarget = await CropTarget.fromElement(mainContentArea);


7 of 50

Region Capture

2. Application captures the tab with the embedded cropTarget.

const stream = await navigator.mediaDevices.getDisplayMedia({
  preferCurrentTab: true,
});
const [track] = stream.getVideoTracks();


8 of 50

Region Capture

3. Application crops the track with the cropTarget, which will capture only the main content area.

await track.cropTo(cropTarget);


9 of 50

What is Element Capture?


10 of 50

Element Capture

3. Application restricts the track to the cropTarget, which will capture only the main content area without any occluding content.

await track.restrictTo(cropTarget);
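Because restrictTo() is still a proposal while cropTo() is the existing Region Capture method, an application might feature-detect and fall back. A minimal sketch, assuming both methods accept the same cropTarget as in the slide; restrictionMethodFor() and restrictOrCrop() are hypothetical helper names:

```javascript
// Hypothetical helper: pick the strongest restriction the track supports.
// Element Capture (restrictTo) also drops occluding content;
// Region Capture (cropTo) only crops to the element's bounds.
function restrictionMethodFor(track) {
  if (typeof track.restrictTo === "function") return "restrictTo";
  if (typeof track.cropTo === "function") return "cropTo";
  return null; // Neither API is available on this track.
}

// Hypothetical wrapper. Assumes `target` is valid for both methods.
async function restrictOrCrop(track, target) {
  const method = restrictionMethodFor(track);
  if (method === null) {
    throw new Error("Neither restrictTo() nor cropTo() is supported");
  }
  await track[method](target);
  return method; // Lets the caller know whether occluders are removed.
}
```

Returning the method name lets the UI warn the user when only cropping (and thus possible occlusion) is available.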


11 of 50

Element Capture - What can be captured?

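Element should form a "stacking context": a new context in which z-index values are resolved.

In the (omitted) figure, isolation:isolate turns the second div "one" into a stacking context, so its child's negative z-index is resolved within it:

<div class="one">
  <div class="child" style="z-index:-1"></div>
</div>

<div class="one" style="isolation:isolate">
  <div class="child" style="z-index:-1"></div>
</div>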
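To illustrate the stacking-context constraint, here is a rough check over a plain object of computed style values (for example, what getComputedStyle(element) returns in a browser). This is a hypothetical helper that covers only a subset of the CSS triggers; it ignores the root element, flex/grid items with a z-index, will-change, contain and container-type, among others.

```javascript
// Hypothetical helper: does an element with these computed styles
// establish a stacking context? Covers a subset of CSS triggers only.
function formsStackingContext(style) {
  const position = style.position || "static";
  const zIndex = style.zIndex || "auto";
  // Relatively/absolutely positioned elements with an explicit z-index.
  if ((position === "relative" || position === "absolute") && zIndex !== "auto") {
    return true;
  }
  // Fixed and sticky elements always form one.
  if (position === "fixed" || position === "sticky") return true;
  // The property used in the slide's example.
  if (style.isolation === "isolate") return true;
  if (style.opacity != null && style.opacity !== "" && Number(style.opacity) < 1) {
    return true;
  }
  if (style.mixBlendMode && style.mixBlendMode !== "normal") return true;
  // Any value other than "none" for these properties.
  for (const prop of ["transform", "filter", "perspective", "clipPath", "maskImage"]) {
    if (style[prop] && style[prop] !== "none") return true;
  }
  return false;
}
```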


12 of 50

Element Capture - What can be captured?

Element should form a "Backdrop Root" - ancestor elements can't mask or filter it


13 of 50

Element Capture - What can be captured?

The element's ancestors should not use 3D transforms


14 of 50

Element Capture - Open questions

  • Applying these constraints to existing elements may change the rendering
  • We could force these constraints when a crop target is applied
  • We could force them when capture is started
  • We could choose not to force them, and instead pause capture when they are not met


15 of 50

Element Capture - Next Steps

  • Documenting constraints that make an element capturable (PR in review)
  • Refinements to API shape, feedback from developers
  • Understanding performance and rendering impacts
  • Targeting Origin Trial in Q4 2023


16 of 50

Element Capture - Possible future work

  • Signaling back to the capturing application when elements become non-capturable.
  • Direct capture from elements, i.e. Element.captureStream().
  • Offscreen element capture.
  • Element-restricted audio capture.


17 of 50

Remote-Control API

Elad Alon

Google

eladalon@google.com

18 of 50

Problem description

A user is in a video call and shares a tab.

How does the user…

  • …scroll the captured tab?
  • …change the zoom level?

If the user focused the captured tab…

  • …how would the user see remote participants?
  • …see annotations and additional content (e.g. a timer)?
  • …interact with the VC app’s controls?

19 of 50

Other solutions

Let’s briefly explore alternative solutions and see why we might wish to explore yet more possibilities, such as the one proposed in later slides.

  • PiP (Picture-in-Picture)
    • Size: Shared content in its natural size, remote participants likely small.
    • Layout: The user has to manually move PiP to a reasonable location.
    • Additional content: Limited space to expose more content such as annotations or a chat.
    • Additional controls: Limited space to expose controls.
  • Video Portal (presented last time)
    • Still theoretical
    • Challenging to standardize, implement and deploy
    • Does not allow delegating any control to remote users or app itself
    • Limits flexibility of the video-conferencing app (requires a preview and prevents drawing over it)
    • Great when you need it, but overkill when you don’t.

20 of 50

Proposed new solution

After a permissions prompt, allow the capturing application limited control over the captured surface.

  • Scrolling (over coordinates of choice)
  • Zooming (of the captured tab)
  • Page-up / page-down (on last keyboard-focused element) (shelved)

Before we delve into the exact shape, let’s have a quick run through an example.

21 of 50

Sample usage 1:

Initiate capture and obtain permission

const controller = new CaptureController();
const stream =
    await navigator.mediaDevices.getDisplayMedia({ controller });
const video = document.getElementById('myVideoElement');
video.srcObject = stream;

// Perform a null-action so as to prompt the user for permission.
try {
  await controller.sendMouseWheel({});
} catch (e) {
  return;  // Permission denied. Bail.
}

Permission prompts are only displayed when attempting to use the API they gate access to.
A null event is intentionally used to avoid performing an action before the user requests one.

22 of 50

Sample usage 2:

Relay scroll events to captured surface

// Having obtained the user’s permission, we can now relay subsequent
// wheel events to the captured tab.
video.addEventListener("wheel", event => {
  const [x, y] = translateCoordinates(event.offsetX, event.offsetY);
  controller.sendMouseWheel({
    x,
    y,
    wheelDeltaX: event.wheelDeltaX,
    wheelDeltaY: event.wheelDeltaY,
  });
});

translateCoordinates() scales the coordinates in the video-element to those of the captured surface. Its implementation is left as an exercise for the reader.
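One possible answer to the exercise, as a sketch under stated assumptions: the video element shows the whole captured frame with object-fit: contain (uniformly scaled, centered, letterboxed), and the target coordinate space is the captured frame's pixels. The slides do not specify which coordinate space sendMouseWheel() expects, and the helper name and its extra box argument are hypothetical.

```javascript
// Hypothetical helper: map a point in the <video> element's box (CSS px,
// e.g. event.offsetX/offsetY) to pixel coordinates in the captured frame.
// Assumes object-fit: contain, i.e. uniform scaling plus letterbox bars.
function scaleToCapturedSurface(offsetX, offsetY, box) {
  const { elementWidth, elementHeight, frameWidth, frameHeight } = box;
  if (!(elementWidth > 0 && elementHeight > 0 && frameWidth > 0 && frameHeight > 0)) {
    return null; // Nothing rendered yet.
  }
  // Largest uniform scale at which the whole frame fits the element.
  const scale = Math.min(elementWidth / frameWidth, elementHeight / frameHeight);
  // Width/height of the letterbox bar on each side.
  const padX = (elementWidth - frameWidth * scale) / 2;
  const padY = (elementHeight - frameHeight * scale) / 2;
  const x = (offsetX - padX) / scale;
  const y = (offsetY - padY) / scale;
  // Points over the bars do not correspond to the captured surface.
  if (x < 0 || y < 0 || x >= frameWidth || y >= frameHeight) return null;
  return [Math.floor(x), Math.floor(y)];
}
```

In the wheel listener, the box could be filled from video.clientWidth, video.clientHeight, video.videoWidth and video.videoHeight, and events over the letterbox (a null result) simply not forwarded.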

23 of 50

Sample usage 3:

Control zoom-level of captured tabs

const zoomInButton = document.getElementById('zoomInButton');
zoomInButton.addEventListener('click', async (event) => {
  const oldZoomLevel = await controller.getZoomLevel();
  const newZoomLevel =
      Math.min(oldZoomLevel + 10, controller.getMaxZoomLevel());
  await controller.setZoomLevel(newZoomLevel);
});

Possible capturing-app UX

Effect in captured tab
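The sample only clamps against the maximum; a zoom-out button would also need getMinZoomLevel(). A minimal sketch against the proposed API shape, in which nextZoomLevel() and wireZoomButtons() are hypothetical helpers and the step of 10 simply mirrors the sample (the unit of a zoom level is not specified in the slides):

```javascript
// Hypothetical helper: next zoom level for a zoom-in (+step) or
// zoom-out (-step) action, clamped to [min, max].
function nextZoomLevel(current, step, min, max) {
  if (min > max) throw new RangeError("min must not exceed max");
  return Math.min(max, Math.max(min, current + step));
}

// Hypothetical wiring for both buttons, assuming the proposed
// getZoomLevel/getMinZoomLevel/getMaxZoomLevel/setZoomLevel methods.
function wireZoomButtons(controller, zoomInButton, zoomOutButton, step = 10) {
  const apply = async (delta) => {
    const current = await controller.getZoomLevel();
    const next = nextZoomLevel(current, delta,
        controller.getMinZoomLevel(), controller.getMaxZoomLevel());
    // Skip the call when already at a limit.
    if (next !== current) await controller.setZoomLevel(next);
  };
  zoomInButton.addEventListener("click", () => apply(step).catch(console.error));
  zoomOutButton.addEventListener("click", () => apply(-step).catch(console.error));
}
```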

24 of 50

Proposed API shape

dictionary CapturedMouseWheelAction {
  long x = 0;
  long y = 0;
  long wheelDeltaX = 0;
  long wheelDeltaY = 0;
};

partial interface CaptureController {
  // 0. Pre-existing and irrelevant methods omitted.
  ...

  // 1. Scrolling
  Promise<undefined> sendMouseWheel(optional CapturedMouseWheelAction action = {});

  // 2. Zoom-level
  long getMinZoomLevel();
  long getMaxZoomLevel();
  Promise<long> getZoomLevel();
  Promise<undefined> setZoomLevel(long zoomLevel);
};
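To unit-test capturing-app UI without a browser, one could stand in for the proposed methods with an in-memory fake. This is a hypothetical sketch: the class name, the default zoom range of 25 to 500 and the starting level of 100 are assumptions, and the proposal does not say what setZoomLevel() does with out-of-range values, so rejecting with a RangeError here is also an assumption.

```javascript
// Hypothetical in-memory stand-in for the proposed CaptureController
// additions. Zoom range, default level and rejection behavior are assumed.
class FakeCaptureController {
  constructor({ minZoom = 25, maxZoom = 500, zoom = 100 } = {}) {
    this.minZoom = minZoom;
    this.maxZoom = maxZoom;
    this.zoom = zoom;
    this.wheelActions = []; // Every action passed to sendMouseWheel().
  }

  // Applies the CapturedMouseWheelAction defaults (all members 0).
  async sendMouseWheel({ x = 0, y = 0, wheelDeltaX = 0, wheelDeltaY = 0 } = {}) {
    this.wheelActions.push({ x, y, wheelDeltaX, wheelDeltaY });
  }

  getMinZoomLevel() { return this.minZoom; }
  getMaxZoomLevel() { return this.maxZoom; }
  async getZoomLevel() { return this.zoom; }

  async setZoomLevel(zoomLevel) {
    if (!Number.isInteger(zoomLevel) ||
        zoomLevel < this.minZoom || zoomLevel > this.maxZoom) {
      throw new RangeError(`Zoom level ${zoomLevel} is out of range`);
    }
    this.zoom = zoomLevel;
  }
}
```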

25 of 50

Finer points 1: Permission prompt

  • Set of possible operations limited to stay safe with a permissions prompt. (Sending mouse-clicks and keyboard-presses intentionally avoided.)
  • Permissions prompt likely still necessary, or else any capturer could scan the entire tab shared with it before a user has time to react.
  • User agents only expose permission prompts on first API invocation, so we intentionally include a no-op operation, allowing apps to prompt the user to obtain permission before they expose relevant controls.
  • Stickiness of the permission is an interesting topic. Let’s start conservatively?

26 of 50

Finer points 2a: Scrolling and paging

27 of 50

Finer points 2b: MouseWheel vs. Paging

  • The API allows MouseWheel events at arbitrary coordinates.
  • This implicitly assumes apps educate the user to scroll over a preview tile.
  • Alternative: expose an app-level widget to the user.
    • Problem: Where is the scroll event delivered?
  • Refined alternative: also support page-up / page-down.
    • Refined problem: Same question, really. Where is the event delivered?
    • Possible answers:
      • Last keyboard-focused element.
      • Top-level document.
    • These are all imperfect; hence the preference for MouseWheel.
    • Let’s keep paging for later discussion.

28 of 50

Excluding PiP from screen-capture

Arnaud Budkiewicz

Dialpad

29 of 50

Problem description

Picture-in-Picture is a great feature that keeps on top of the screen:

  • a playing video,
  • the main speaker of a video meeting along with the meeting controls,
  • and even more with the new Document Picture-in-Picture API.

30 of 50

Problem description

In all these cases, as soon as the user shares a screen, the PiP window is shared too, since it is part of what is on the screen.

31 of 50

Problem description

When the user shares a tab or a window, the PiP window is NOT shared, but sharing a screen exposes it IMMEDIATELY.

32 of 50

Problem description

The PiP window is initially small, hiding only a small fraction of what is underneath, but it can be resized to almost the size of the entire screen.

33 of 50

Problem description

Removing the PiP window from what getDisplayMedia() shares with the far end could potentially expose unwanted content that the PiP window was covering.

34 of 50

Proposal

  • Add an icon to the PiP window allowing the user to hide/unhide it from the screen share: https://bugs.chromium.org/p/chromium/issues/detail?id=1442585
  • The state is non-persistent, returning to the default state every time PiP is started.

  • The default state should be discussed:
    • Hidden (relatively safe when the PiP window is small, as at its default size). This is Zoom’s behavior: video windows and meeting controls are hidden from what is shared.
    • Not hidden (current behavior).

35 of 50

One more thing…

While in a meeting, the mic and camera states are very important information that should be visible at all times. But in a meeting using PiP, as soon as the cursor moves away from the PiP window, the icons disappear, and the user doesn't know the state of the mic and camera.

As a user, I'd rather have the 3 icons visible at all times.
https://bugs.chromium.org/p/chromium/issues/detail?id=1442389

As a developer, I'd like a parameter that changes the default behavior:

  • Controls are always visible
  • Auto-hide the controls (current behavior)

36 of 50

CapturedMouseEvent listener addition after getDisplayMedia()

Frédéric Wang

Igalia

fwang@igalia.com

37 of 50

Quick recap

  • Screen-Capture Mouse Events exposes mouse events over a captured surface:

let controller = new CaptureController();
controller.oncapturedmousechange = (event) => {
  console.log(`surfaceX=${event.surfaceX}, surfaceY=${event.surfaceY}`);
};
let stream = await navigator.mediaDevices.getDisplayMedia({
  controller: controller
});

  • Issues #1 and #9: Make CaptureController an EventTarget

👉🏼 to be proposed at the WebRTC WG tomorrow

controller.addEventListener("capturedmousechange", (event) => { ... });
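To make the EventTarget change concrete, here is a sketch of how a controller that extends EventTarget could support both the oncapturedmousechange attribute and addEventListener(). The classes below are hypothetical illustrations, not the proposed spec text; the event carries only the surfaceX/surfaceY fields shown in the recap.

```javascript
// Hypothetical event carrying the pointer position over the surface.
class CapturedMouseEventSketch extends Event {
  constructor(type, { surfaceX = 0, surfaceY = 0 } = {}) {
    super(type);
    this.surfaceX = surfaceX;
    this.surfaceY = surfaceY;
  }
}

// Hypothetical controller: an EventTarget with an on* attribute.
class CaptureControllerSketch extends EventTarget {
  #handler = null;
  #forwarderRegistered = false;

  get oncapturedmousechange() {
    return this.#handler;
  }

  // Like other on* attributes: assigning replaces the previous handler,
  // and assigning null (or a non-function) disables it.
  set oncapturedmousechange(fn) {
    this.#handler = typeof fn === "function" ? fn : null;
    if (this.#handler && !this.#forwarderRegistered) {
      // A single internal listener forwards to whatever handler is current.
      this.addEventListener("capturedmousechange",
          (event) => this.#handler?.call(this, event));
      this.#forwarderRegistered = true;
    }
  }
}
```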

38 of 50

Quick recap

  • Since mid-May, Tella has funded Igalia’s work on a prototype for Chromium!

39 of 50

Problem description

  • Issue #14: What happens if we register event listeners after getDisplayMedia()?

let controller = new CaptureController();
controller.oncapturedmousechange = (event) => { ... };
let stream = await navigator.mediaDevices.getDisplayMedia({
  controller: controller
});
controller.oncapturedmousechange = (event) => { ... };
controller.addEventListener("capturedmousechange", ...);

  • Current prototype: these listeners will receive events for mouse moves.

40 of 50

Problem description

  • Modern browser engines have multiple components:
    • Classes/Objects (~20 involved in prototype)
    • Threads
    • Processes

  • Implementing the prototype means crossing all these components:
    • When starting capture, establish a communication channel from CaptureController to the part tracking mouse moves.
    • For each mouse move, send the coordinates back to CaptureController.

  • Current prototype: This extra work is always done as long as we provide a CaptureController, even when listeners will never be registered!

41 of 50

Alternatives

  1. Improve the behavior of the current prototype (if this is an important use case)

👉🏼 Implementations can postpone establishing the communication channel until the first event handler is registered.

👉🏼 Implementations can also stop sending events through the communication channel when all event handlers are removed.

  2. Require registration before getDisplayMedia() (implementers’ preference)
    1. Explicit disabled-by-default "will receive mouse events" option on CaptureController vs. inferring it from whether any listener is registered.
    2. Make addEventListener throw an exception vs. accept a handler that will never receive events.

👉🏼 Can do something similar to (1) when all event handlers are removed.
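Alternative 1 amounts to reference-counting listeners. A minimal sketch, assuming hypothetical openChannel/closeChannel callbacks that stand in for the engine's cross-thread and cross-process plumbing; for brevity it ignores the capture flag and the once/signal listener options.

```javascript
// Hypothetical sketch: keep the mouse-tracking channel open only while
// at least one "capturedmousechange" listener is registered.
class LazyMouseEventTarget extends EventTarget {
  #listeners = new Set();

  constructor({ openChannel, closeChannel }) {
    super();
    this.openChannel = openChannel;
    this.closeChannel = closeChannel;
  }

  addEventListener(type, listener, options) {
    super.addEventListener(type, listener, options);
    if (type !== "capturedmousechange" || !listener) return;
    const wasEmpty = this.#listeners.size === 0;
    this.#listeners.add(listener); // A Set ignores duplicates, like the DOM.
    if (wasEmpty) this.openChannel(); // First listener: start tracking.
  }

  removeEventListener(type, listener, options) {
    super.removeEventListener(type, listener, options);
    if (type !== "capturedmousechange" || !this.#listeners.has(listener)) return;
    this.#listeners.delete(listener);
    if (this.#listeners.size === 0) this.closeChannel(); // Last one gone.
  }
}
```

The same hook points are where alternative 2 would instead throw if a listener is added after getDisplayMedia().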

42 of 50

Dynamically switching between sources

Elad Alon

Google

eladalon@google.com

43 of 50

Dynamic-switching

  • Several browsers give users the ability to change what they’re sharing “on the fly.”
    • Chrome: change from one tab to another.
    • Safari: change between different windows and screens.
  • Often easier for users than starting a new capture session.
  • Managed by the surfaceSwitching option.
  • Some limitations remain.

Chrome (any platform)

Safari (macOS)

44 of 50

Challenge

Can we keep extending UA capabilities without new spec changes?

Possibly, but some undesirable results would follow.

Examine the example of MediaStreamTrack.cropTo():

  • Only callable on tab-capture tracks.
  • The set of valid inputs changes depending on which tab is captured.

This is a general problem.

The user changing the target is an asynchronous event, outside the control of the app, and not easily observable by the app. And it can mean the difference between:

  1. Method invocation works as expected
  2. Method invocation has unexpected results
  3. An exception is thrown

45 of 50

Alternative (with known issues)

Currently, dynamic-switching involves switching out the source.

But what if it didn’t? What if instead it:

  1. Terminated the old tracks.
  2. Fired a new event with a new stream.

Then:

const controller = new CaptureController();
controller.addEventListener('switch', (event) => {
  const videoElement = document.getElementById('myVideoElement');
  videoElement.srcObject = event.mediaStream;
});
…
navigator.mediaDevices.getDisplayMedia({ controller });
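The two steps of this alternative (terminate old tracks, then fire an event with a new stream) can be modeled outside the browser. A hypothetical sketch: only the 'switch' event name and the mediaStream field come from the slide; attach() and simulateUserSwitch() are illustrative stand-ins for what the user agent would do.

```javascript
// Hypothetical event carrying the stream for the newly chosen surface.
class SwitchEventSketch extends Event {
  constructor(mediaStream) {
    super("switch");
    this.mediaStream = mediaStream;
  }
}

class SwitchingControllerSketch extends EventTarget {
  #currentStream = null;

  // Called with the stream returned by getDisplayMedia().
  attach(stream) {
    this.#currentStream = stream;
  }

  // Stand-in for the UA reacting to the user picking a new surface.
  simulateUserSwitch(newStream) {
    // 1. Terminate the old tracks.
    for (const track of this.#currentStream?.getTracks() ?? []) track.stop();
    this.#currentStream = newStream;
    // 2. Fire a new event with the new stream.
    this.dispatchEvent(new SwitchEventSketch(newStream));
  }
}
```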

46 of 50

Proposal analysis

  • Solves the aforementioned issue.
  • Solves additional issues (mediacapture-screen-share #255).
    • (Briefly - when the target changes, the application might wish to also change the way it processes the stream. For example, it might not even wish to continue transmitting it remotely. This decision should block.)
  • One possibility is for UAs to only allow dynamic switching between different surface types (tabs/windows/screens) if the event listener is registered. Discussed in next slide.

47 of 50

Cross-surface switching based on event-handler

Should we specify that the browser only allows cross-surface switching if the app registers an event handler to handle the new stream?

  • Registering event handlers should not have side effects, so possibly a slightly different API shape is required.
    • Maybe: CaptureController.enableDynamicSwitching(handler)
  • Allowing applications to influence what options browsers expose to the user is risky business.
    • But we already have surfaceSwitching, so not a big change.
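The enableDynamicSwitching(handler) idea can be sketched as an explicit opt-in method, so that merely registering a listener has no side effects. Everything below is hypothetical: the class, the crossSurfaceSwitchingAllowed flag the user agent would consult, and deliverNewStream() are illustrative names only.

```javascript
// Hypothetical sketch of CaptureController.enableDynamicSwitching(handler).
class DynamicSwitchingControllerSketch {
  #switchHandler = null;

  enableDynamicSwitching(handler) {
    if (typeof handler !== "function") {
      throw new TypeError("enableDynamicSwitching() requires a function");
    }
    this.#switchHandler = handler;
  }

  // What the UA would check before offering cross-surface switching
  // (e.g. tab to window) in its capture UI.
  get crossSurfaceSwitchingAllowed() {
    return this.#switchHandler !== null;
  }

  // Stand-in for the UA handing the app the new stream after a switch.
  deliverNewStream(stream) {
    if (!this.#switchHandler) {
      throw new Error("Cross-surface switching was not enabled");
    }
    this.#switchHandler(stream);
  }
}
```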

48 of 50

Administrative matters

Image by Sear Greyson.

49 of 50

Administrative matters

  • Thank you for joining!
  • Reminder to join the mailing list
  • Tentative time for next meeting (Late August? Early September?)
  • Solicitation of presentations in our next meetings
    • Email eladalon@google.com

50 of 50

Until next time!

Image by Josh Nezon.