Picture-in-Picture for arbitrary content
This Document is Public
Authors: beccahughes@chromium.org
We are adding support for requesting a Picture-in-Picture window that can contain arbitrary, optionally interactive HTML content instead of a video layer.
Desktop
Android (non-interactive only)
https://crbug.com/959867 (internal only)
Picture in Picture API
Picture-in-Picture V2 will include a new web API that is still under discussion. An explainer for the new API can be found here.
partial interface HTMLElement {
  [NewObject] Promise<PictureInPictureWindow> requestPictureInPicture(
      PictureInPictureOptions options);
  attribute EventHandler onenterpictureinpicture;
  attribute EventHandler onleavepictureinpicture;
};

partial interface PictureInPictureWindow {
  readonly attribute Document? document;
};

dictionary PictureInPictureOptions {
  required double aspectRatio;
  boolean interactive = false;
};
The aspectRatio will be used to determine the size of the Picture-in-Picture window.
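One possible reading of this is that the browser picks the largest window with the requested ratio that fits inside a bounding box of its choosing. The sketch below illustrates that idea only; `Size` is a stand-in for gfx::Size, and `FitToAspectRatio` and the bounding box are hypothetical names that do not come from this design.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for gfx::Size.
struct Size {
  int width;
  int height;
};

// Returns the largest size with the given aspect ratio (width / height)
// that fits inside `max_bounds`. Returns {0, 0} for invalid input.
inline Size FitToAspectRatio(double aspect_ratio, Size max_bounds) {
  if (!std::isfinite(aspect_ratio) || aspect_ratio <= 0.0 ||
      max_bounds.width <= 0 || max_bounds.height <= 0) {
    return Size{0, 0};
  }
  // Start from the full width and derive the height from the ratio.
  int width = max_bounds.width;
  int height = static_cast<int>(std::lround(width / aspect_ratio));
  // If that is too tall, clamp the height and derive the width instead.
  if (height > max_bounds.height) {
    height = max_bounds.height;
    width = static_cast<int>(std::lround(height * aspect_ratio));
  }
  // Never return a zero dimension for a valid request.
  return Size{std::max(width, 1), std::max(height, 1)};
}
```

An invalid ratio yields an empty size here so that a caller can treat it as a failed request rather than guessing at a default.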
Returning a promise gives a clearer async API and offers a way to expose that interactive isn’t supported by the platform.
When Picture-in-Picture is requested and the window is displayed we will copy the requested element to the body of the new window. We should use the deep version of the Document.importNode algorithm for copying.
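As a rough illustration of what a deep import means here, the sketch below copies a node and all of its descendants into a target document and leaves the original tree untouched. `Document`, `Node` and `ImportNodeDeep` are simplified, hypothetical stand-ins and not Blink's classes.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for Blink's Document and Node.
struct Document {
  std::string url;
};

struct Node {
  std::string tag;
  const Document* owner = nullptr;
  std::vector<std::unique_ptr<Node>> children;
};

// Copies `source` and all of its descendants into `target`, in the spirit
// of importNode(node, /*deep=*/true). The source tree is not modified.
inline std::unique_ptr<Node> ImportNodeDeep(const Node& source,
                                            const Document& target) {
  std::unique_ptr<Node> copy(new Node());
  copy->tag = source.tag;
  copy->owner = &target;  // Every copied node is owned by the new document.
  for (const auto& child : source.children)
    copy->children.push_back(ImportNodeDeep(*child, target));
  return copy;
}
```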
We will need to add support to HTMLMediaElement to move to a new document without resetting the underlying media player. There is a FIXME in the code to do this which we will fix.
We will add a MoveToNewFrame method on WebMediaPlayer that will move a WebMediaPlayer to a new frame and recreate any frame specific objects (e.g. delegate, audio sink).
For WebMediaPlayerImpl we will add a WebMediaPlayerImplRendererFactory that can be used to recreate frame specific objects. This is already available in WebMediaPlayerMS as WebMediaStreamRendererFactory.
In the case of the delegate we will notify the delegate on the old frame that the player is gone and then update the player with the delegate and delegate id for the new frame. We will then update the new delegate with the current state of the media element (e.g. playing).
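The delegate handoff described above can be sketched as follows. `FakeDelegate` and `FakePlayer` are hypothetical test doubles written for this illustration; their method names are loosely modelled on a per-frame media delegate and are assumptions, not the real WebMediaPlayer interfaces.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the per-frame media player delegate.
class FakeDelegate {
 public:
  int AddObserver() { return next_id_++; }
  void PlayerGone(int id) { log.push_back("gone:" + std::to_string(id)); }
  void DidPlay(int id) { log.push_back("play:" + std::to_string(id)); }
  void DidPause(int id) { log.push_back("pause:" + std::to_string(id)); }

  std::vector<std::string> log;  // Records calls for inspection.

 private:
  int next_id_ = 1;
};

// Hypothetical stand-in for a media player that can change frames.
class FakePlayer {
 public:
  explicit FakePlayer(FakeDelegate* delegate)
      : delegate_(delegate), delegate_id_(delegate->AddObserver()) {}

  void Play() {
    playing_ = true;
    delegate_->DidPlay(delegate_id_);
  }

  void MoveToNewFrame(FakeDelegate* new_delegate) {
    // 1. Tell the old frame's delegate that the player is gone.
    delegate_->PlayerGone(delegate_id_);
    // 2. Adopt the new frame's delegate and obtain a new delegate id.
    delegate_ = new_delegate;
    delegate_id_ = new_delegate->AddObserver();
    // 3. Bring the new delegate up to date with the current state.
    if (playing_)
      delegate_->DidPlay(delegate_id_);
    else
      delegate_->DidPause(delegate_id_);
  }

  int delegate_id() const { return delegate_id_; }

 private:
  FakeDelegate* delegate_;
  int delegate_id_;
  bool playing_ = false;
};
```

The ordering matters in this sketch: the old delegate is notified before the id is replaced, so it is never addressed with an id it did not issue.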
There is currently a PictureInPictureService and PictureInPictureSession called from Blink over mojo. We will replace the existing player_id and surface_id parameters with a new PictureInPictureSource type that will contain the parameters required to create a Picture-in-Picture window for each type of source.
struct PictureInPictureWebSource {
  // The id of the web source. This is generated by Blink and if this
  // changes then Chrome will reset the web contents of the
  // Picture-in-Picture window.
  mojo_base.mojom.UnguessableToken id;

  // Whether the web Picture-in-Picture window is interactable. If it
  // is not interactable then it will not receive any input events.
  bool interactable;

  // The aspect ratio of the web content.
  double aspect_ratio;
};

struct PictureInPictureVideoSource {
  // WebMediaPlayer id used to control playback via the Picture-in-Picture
  // window.
  uint32 player_id;

  // SurfaceId to be shown in the Picture-in-Picture window.
  viz.mojom.SurfaceId? surface_id;

  // Size of the video frames.
  gfx.mojom.Size natural_size;
};

union PictureInPictureSource {
  PictureInPictureWebSource web;
  PictureInPictureVideoSource video;
};

// Representation of a Picture-in-Picture session. The caller receives this
// object back after a call to `StartSession()` on the main service. It can
// be used to interact with the session after its creation.
interface PictureInPictureSession {
  // Update the information used to create the session when it was created.
  // This call cannot happen after `Stop()`.
  Update(PictureInPictureSource source,
         bool show_play_pause_button,
         bool show_mute_button);
};

// PictureInPictureService is a Document-associated interface that lives in
// the browser process. It is responsible for implementing the browser-side
// support for https://wicg.github.io/picture-in-picture and manages all
// picture-in-picture interactions for Document. Picture-in-picture allows
// a Document to display a video in an always-on-top floating window with
// basic playback functionality (e.g. play and pause).
interface PictureInPictureService {
  // Notify the service that it should start a Picture-in-Picture session.
  // source: Source of the Picture-in-Picture window.
  // show_play_pause_button: Whether a play/pause control should be offered
  //     in the Picture-in-Picture window.
  // show_mute_button: Whether a mute control should be offered in the
  //     Picture-in-Picture window.
  StartSession(
      PictureInPictureSource source,
      bool show_play_pause_button,
      bool show_mute_button)
      => (PictureInPictureSession session, gfx.mojom.Size size);
};
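To illustrate how the browser side might consume the union, the sketch below dispatches on the source type and decides whether the Picture-in-Picture web contents need to be reset, based on the rule that a changed web source id resets them. All types here are simplified, hypothetical stand-ins (a string replaces the unguessable token, a tagged struct replaces the mojom union), and `FakeSession` and `UpdateAction` are assumptions made for the example.

```cpp
#include <cstdint>
#include <string>

// Simplified stand-ins for the mojom structs.
struct WebSource {
  std::string id;  // Stands in for the unguessable token.
  bool interactable;
  double aspect_ratio;
};

struct VideoSource {
  std::uint32_t player_id;
  int width;
  int height;
};

// Tagged struct standing in for the mojom union.
struct Source {
  enum class Type { kWeb, kVideo };
  Type type;
  WebSource web;
  VideoSource video;
};

enum class UpdateAction {
  kShowVideoSurface,
  kKeepWebContents,
  kResetWebContents,
};

// Hypothetical browser-side session that tracks the current web source id.
class FakeSession {
 public:
  UpdateAction Update(const Source& source) {
    if (source.type == Source::Type::kVideo) {
      has_web_source_ = false;
      return UpdateAction::kShowVideoSurface;
    }
    // Same id as before: the existing web contents can be kept.
    if (has_web_source_ && source.web.id == current_web_id_)
      return UpdateAction::kKeepWebContents;
    // First web source, or the id changed: (re)create the web contents.
    has_web_source_ = true;
    current_web_id_ = source.web.id;
    return UpdateAction::kResetWebContents;
  }

 private:
  bool has_web_source_ = false;
  std::string current_web_id_;
};
```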
When creating a web contents for the Picture-in-Picture window we will set the opener and SiteInstance based on the frame that requested Picture-in-Picture.
The new web contents will be navigated to about:blank. This will allow the Picture-in-Picture window to communicate with the host frame using window.opener.
We will add an unguessable token picture_in_picture_token to CreateViewParams which will be passed to the renderer. This token will be used to identify the view as a Picture-in-Picture window.
When we call the PictureInPictureService from Blink to start the web Picture-in-Picture session we will block the callback until we have finished loading the blank page. We will then call out to ChromeClient to find the frame with the matching token that we generated for the PictureInPictureWebSource. We can then use that frame's document both to populate the document attribute on the PictureInPictureWindow object and as the destination for the copied element.
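The two steps above (hold the callback until the blank page has loaded, then look the frame up by token) can be sketched with a small registry. This collapses the browser and renderer sides into one class for brevity; `Frame`, `PipFrameRegistry` and their methods are hypothetical names, and a string again stands in for the unguessable token.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the frame hosting the Picture-in-Picture page.
struct Frame {
  std::string token;
  bool loaded;
};

class PipFrameRegistry {
 public:
  void RegisterFrame(Frame* frame) { frames_[frame->token] = frame; }

  // Returns the frame created with `token`, or nullptr if there is none.
  Frame* FindByToken(const std::string& token) const {
    auto it = frames_.find(token);
    return it == frames_.end() ? nullptr : it->second;
  }

  // Runs `callback` once the frame for `token` has finished loading.
  void WhenLoaded(const std::string& token,
                  std::function<void(Frame*)> callback) {
    Frame* frame = FindByToken(token);
    if (frame && frame->loaded) {
      callback(frame);
      return;
    }
    pending_[token] = std::move(callback);  // Block until the load finishes.
  }

  // Called when the blank page in the frame for `token` finishes loading.
  void DidFinishLoad(const std::string& token) {
    Frame* frame = FindByToken(token);
    if (!frame)
      return;
    frame->loaded = true;
    auto it = pending_.find(token);
    if (it == pending_.end())
      return;
    std::function<void(Frame*)> callback = std::move(it->second);
    pending_.erase(it);
    callback(frame);
  }

 private:
  std::unordered_map<std::string, Frame*> frames_;
  std::unordered_map<std::string, std::function<void(Frame*)>> pending_;
};
```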
OverlayWindowViews is a views::Widget in chrome that contains the UI of the Picture-in-Picture window. It currently has all the controls as child views and the logic for the controls and the Picture-in-Picture window in this class. There is also a views::View called video_layer which hosts the video surface.
We will add a new OverlayWindowWebSourceView, a views::View that contains a views::WebView displaying the Picture-in-Picture web contents, along with any additional UI that is specific to a web Picture-in-Picture window. We will also set a custom WebContentsDelegate on the web contents which can be used to disable any functionality that we do not want the Picture-in-Picture window to have (e.g. Picture-in-Picture support).
As a P2 we should move the controls out of OverlayWindowViews into a generic OverlayWindowMediaControlsView that will encapsulate all the logic around the controls.
The EnterPictureInPicture sequence is currently highly dependent on a Picture-in-Picture window being tied to a surface id. The current sequence is the following:
In this diagram PictureInPicture has been shortened to PiP.
The content public OverlayWindow interface will have a SetSource method added which will take a PictureInPictureSource object. If the WebContentsDelegate returns an empty size we can use this to indicate the embedder does not support Picture-in-Picture and we will reject the promise.
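The empty-size convention can be sketched as below. The delegate interface, the two embedder classes and `StartSession` are hypothetical and exist only to show the check; `Size` stands in for gfx::Size.

```cpp
// Hypothetical stand-in for gfx::Size.
struct Size {
  int width;
  int height;
  bool IsEmpty() const { return width <= 0 || height <= 0; }
};

// Hypothetical slice of the embedder-facing delegate.
class EmbedderDelegate {
 public:
  virtual ~EmbedderDelegate() = default;
  // Returns the window size, or an empty size if unsupported.
  virtual Size EnterPictureInPicture() = 0;
};

class UnsupportedEmbedder : public EmbedderDelegate {
 public:
  Size EnterPictureInPicture() override { return Size{0, 0}; }
};

class SupportingEmbedder : public EmbedderDelegate {
 public:
  Size EnterPictureInPicture() override { return Size{320, 180}; }
};

enum class PromiseState { kResolved, kRejected };

// Rejects when the embedder reports an empty size; otherwise resolves and
// writes the window size to `out_size`.
inline PromiseState StartSession(EmbedderDelegate& delegate, Size* out_size) {
  Size size = delegate.EnterPictureInPicture();
  if (size.IsEmpty())
    return PromiseState::kRejected;
  *out_size = size;
  return PromiseState::kResolved;
}
```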
This is the resulting flow:
class PictureInPictureSource {
 public:
  enum class Type { kVideo, kWeb };

  virtual Type type();
};

class PictureInPictureVideoSource : public PictureInPictureSource {
 public:
  const viz::SurfaceId& surface_id() const;
  const gfx::Size& size() const;
};

class PictureInPictureWebSource : public PictureInPictureSource {
 public:
  // Returns the web contents to be embedded in the picture in picture
  // window. If this is the first call the web contents will be created.
  WebContents* GetOrCreateWebContents();

  // Returns whether the web contents should be interactable.
  bool interactable() const;

  // Returns the aspect ratio of the web content.
  double aspect_ratio() const;
};
If the PictureInPictureWebSource is not interactable then we will disable interaction on the web contents by calling SetIgnoreInputEvents(true).
If the PictureInPictureWebSource is interactable then we will still limit keyboard events to a whitelist. We will implement this by overriding PreHandleKeyboardEvent in our WebContentsDelegate. The following DOMKeys will be allowed:
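The filtering logic itself is small and can be sketched independently of the key list. In the sketch below the allowed keys are passed in as a parameter rather than hard-coded, since this example does not try to reproduce the actual list; `PipInputFilter` and `KeyDecision` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <utility>

enum class KeyDecision { kDropped, kForwarded };

// Hypothetical filter combining the two rules: a non-interactable window
// receives no input at all, and an interactable one only receives
// keyboard events whose DOM key is on the allowlist.
class PipInputFilter {
 public:
  PipInputFilter(bool interactable, std::set<std::string> allowed_keys)
      : interactable_(interactable), allowed_keys_(std::move(allowed_keys)) {}

  // Mirrors the effect of ignoring all input events when not interactable.
  bool AcceptsInputEvents() const { return interactable_; }

  KeyDecision OnKeyEvent(const std::string& dom_key) const {
    if (!interactable_)
      return KeyDecision::kDropped;
    return allowed_keys_.count(dom_key) > 0 ? KeyDecision::kForwarded
                                            : KeyDecision::kDropped;
  }

 private:
  bool interactable_;
  std::set<std::string> allowed_keys_;
};
```

Keeping the allowlist as data means the set of permitted keys can be reviewed and changed without touching the filtering code.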
We will add a PictureInPictureWindow.SourceType metric to track the usage of the different types of Picture-in-Picture window sources.
Waterfall
For interactive Picture-in-Picture there are concerns around impersonating system UI. Therefore, we will ensure the UX of the Picture-in-Picture window is distinct enough by adding a border (and maybe an indicator of the origin).
We will disable trusted UI (e.g. permission prompts, autofill) and also remove regular keyboard events to reduce the attack surface of the Picture-in-Picture window.
The following sandbox flags will be applied to the web contents: