1 of 33

Accelerate graphics performance with ozone-gbm on Intel-based Linux systems

BlinkOn 9 (bit.ly/blinkon9-info), Apr 18-19, 2018

Joone Hur (joone.hur@intel.com, joone@chromium.org)

Intel Open Source Technology Center

Intel Confidential — Do Not Forward

2 of 33

Agenda

    • Overview
    • New graphics architecture in CrOS
    • Ozone & Ozone-gbm
    • History of running ozone-gbm on Linux systems
    • Why ozone-gbm for Linux systems?
    • Why ozone-gbm on Intel GPU?
    • Intel Graphics features: zero-copy texture upload, video decoding, and hardware overlay
    • Upstream status
    • Enabling ozone-gbm in Yocto Linux and Arch Linux
    • Demo

3 of 33

Overview

  • Intel-based Chromebooks have enabled graphics acceleration features through ozone-gbm:
    • zero-copy texture upload, hardware overlay/atomic page flip, and video encoding/decoding.
    • These features let Intel Chromebooks achieve better performance, save system memory, and extend battery life.
  • Make other Linux-based systems run on ozone-gbm:
    • DTVs, digital signage devices, IVI systems, etc.

4 of 33

New graphics architecture in CrOS (2015)

  • Intel-based Chromebooks were updated to the new Freon graphics stack (ozone-gbm) in 2015

[Figure: graphics stack before and after. Before: ChromeOS UI, View, Aura, Xlib, X-Server, DRM/KMS, i915 device driver. After: ChromeOS UI, View, Aura, Ozone-gbm, DRM/KMS, i915 device driver.]

5 of 33

Ozone

  • Platform abstraction layer for:
    • Accelerated surface
    • Low-level input and event handling.
  • Allows Chromium to run on embedded SoC targets and on new windowing systems such as Wayland or Mir
  • Ozone backends:
    • X11, Wayland, and GBM.

[Figure: layer diagram. Chrome/Content, Aura/View, ozone, ozone implementations (X11, Wayland, GBM), and below them X11, Wayland, and DRM/KMS.]

6 of 33

Ozone-gbm

  • It lets Chromium talk directly to DRM/KMS and evdev:
    • Allocate a buffer in GPU memory that can be accessed by a render process.
    • Change the display mode (mode-setting).
    • Align events from evdev and painting with the vsync signal from the GPU.
  • It enables Intel graphics features:
    • Zero-copy texture upload, hardware overlay/atomic page flip, and video encoding/decoding.
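
As a rough illustration of aligning input events with vsync, the sketch below groups event timestamps by the vsync interval they fall into, so each frame handles its events and then paints once. It is a toy model only, not Chromium's implementation: the function name `batch_events_by_vsync`, the 60 Hz refresh rate, and the sample timestamps are all assumptions made for the example.

```python
from collections import defaultdict

VSYNC_HZ = 60  # assumed display refresh rate


def batch_events_by_vsync(event_times_ms, vsync_hz=VSYNC_HZ):
    """Group input-event timestamps (ms) by the vsync interval they fall into."""
    interval_ms = 1000.0 / vsync_hz
    frames = defaultdict(list)
    for t in event_times_ms:
        frames[int(t // interval_ms)].append(t)
    return dict(frames)


if __name__ == "__main__":
    # Hypothetical touch-event timestamps in milliseconds.
    touch_events_ms = [1.0, 5.0, 12.0, 18.0, 30.0, 34.0]
    batches = batch_events_by_vsync(touch_events_ms)
    for frame, events in sorted(batches.items()):
        print(f"frame {frame}: handle {len(events)} event(s), then paint once")
```

Handling events once per frame keeps painting in step with the display instead of repainting for every event.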

[Figure: component diagram. Chromium: GpuMemoryBuffer, ozone-gbm. Libraries: MESA, mini-gbm, libdrm. Linux Kernel: evdev (events), DRM/KMS, GEM, i915 Driver. Hardware: Intel GPU.]

7 of 33

History

8 of 33

History

9 of 33

Why ozone-gbm for Linux systems?

  • Chromium is a de facto standard for Linux-based devices.
    • However, those devices are limited in their use of CSS animations, video playback, and WebGL.
  • Ozone-gbm is highly optimized for Intel-based Chromebooks.
    • It could also work on regular Linux. o/

10 of 33

Why ozone-gbm on Intel GPU?

  • Ozone-gbm makes it possible to use modern graphics features:
    • Hardware compositor, atomic page flip, hardware overlay, and video encode/decode
  • Chromium has enabled the following graphics features:
    • Zero-copy texture uploads for 2D rendering
    • Video encoding/decoding for HTML5 video and WebRTC
    • Hardware overlay/atomic page flip for video, ARC++, and WebGL.
  • These features have been commercially verified on Intel-based Chromebooks.

ARC++ is the Android app runtime for ChromeOS

Google Pixel (7th i5m $999)

Samsung Chromebook Pro (Core M, $509)

ASUS C202SA (Celeron, $218)

HP Chromebook 14 (Celeron, $219)

11 of 33

Intel graphics features for Chromium

  • Zero-copy texture uploads for 2D rendering
  • Video encoding/decoding for HTML5 video and WebRTC
  • Hardware overlay for video, ARC++, and WebGL.

12 of 33

Intel graphics features: zero-copy texture upload

  • Normally, uploading a bitmap to GPU memory as a texture requires one copy.
  • On Intel SoCs, the CPU and GPU share the same physical memory.
  • This lets the render process paint content directly into an imported GPU buffer, which avoids the copy.
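
The difference between the two upload paths can be sketched with ordinary Python buffers. This is only an analogy under stated assumptions: a `bytearray` stands in for the shared GPU buffer, and `paint`, `upload_with_copy`, and `upload_zero_copy` are hypothetical names, not Chromium or GBM APIs.

```python
WIDTH, HEIGHT, BPP = 4, 2, 4  # tiny RGBA tile; sizes chosen only for illustration


def paint(pixels):
    """Fill a writable buffer with opaque red RGBA pixels."""
    for i in range(0, len(pixels), BPP):
        pixels[i:i + BPP] = b"\xff\x00\x00\xff"


def upload_with_copy(gpu_buffer):
    """Paint into private memory, then copy the result into the 'GPU' buffer."""
    staging = bytearray(len(gpu_buffer))
    paint(staging)
    gpu_buffer[:] = staging  # the extra copy that zero-copy avoids
    return len(staging)      # number of bytes copied


def upload_zero_copy(gpu_buffer):
    """Paint directly through a view of the shared buffer; nothing is copied."""
    paint(memoryview(gpu_buffer))
    return 0


if __name__ == "__main__":
    size = WIDTH * HEIGHT * BPP
    copied, shared = bytearray(size), bytearray(size)
    print("bytes copied (copy path):", upload_with_copy(copied))
    print("bytes copied (zero-copy path):", upload_zero_copy(shared))
    print("same pixels:", copied == shared)
```

Both paths end with identical pixels in the buffer; the zero-copy path simply never touches a staging area.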

13 of 33

Workload test (zero-copy texture upload)

  • Zero-copy texture upload is almost 4 times faster than the software fallback in certain cases. Usually, it is 30~40% faster (link)

Test page   Software fallback   Zero-copy texture upload   Speedup
1           5.3 fps             22.3 fps                   4.2x faster
2           12.2 fps            48.2 fps                   3.95x faster

Intel Compute stick X5-Z8300 (Cherry-trail, Gen8 graphics, GPU RAM 512MB)

Disclaimer: Estimates only. Not official benchmark results.
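
The speedup figures on this slide follow directly from the measured frame rates. A minimal check, where the function name `speedup` is just a label chosen for this example:

```python
def speedup(fallback_fps, zero_copy_fps):
    """Ratio of the zero-copy frame rate to the software-fallback frame rate."""
    return zero_copy_fps / fallback_fps


if __name__ == "__main__":
    # (software fallback fps, zero-copy fps) pairs taken from the slide.
    measurements = [(5.3, 22.3), (12.2, 48.2)]
    for fallback, zero_copy in measurements:
        print(f"{fallback} fps -> {zero_copy} fps: {speedup(fallback, zero_copy):.2f}x")
```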

14 of 33

Memory consumption (zero-copy texture upload)

  • Zero-copy texture upload also reduces memory consumption and power use.
  • On ChromeOS, GPU process memory use is about 65% lower with zero-copy than with the software fallback, and Renderer process memory use is about 20% lower.

            Software fallback   Zero-copy texture upload   Difference
Test page   80 MB               48 MB                      40% lower

Intel Compute stick X5-Z8300 (Cherry-trail, Gen8 graphics, GPU RAM 512MB)

Disclaimer: Estimates only. Not official benchmark results.
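
The "40% lower" figure for the test page can be reproduced from the two memory readings. A small sketch, with `percent_lower` as a hypothetical helper name:

```python
def percent_lower(fallback_mb, zero_copy_mb):
    """Memory saved by zero-copy, as a percentage of the fallback figure."""
    return 100.0 * (fallback_mb - zero_copy_mb) / fallback_mb


if __name__ == "__main__":
    # 80 MB (software fallback) and 48 MB (zero-copy) are the slide's readings.
    print(f"{percent_lower(80, 48):.0f}% lower")
```

This covers only the Compute stick test page; the 65% and 20% figures quoted for ChromeOS are separate measurements.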

15 of 33

Intel graphics features: video/image acceleration

Columns: (1) and (2) headers not recovered; (3) Skylake (6th, Gen9); (4) Kaby Lake (7th) / Gemini Lake / Coffee Lake (8th) (Gen9.5)

Codec         (1)           (2)           (3)                (4)
H.264         Yes           Yes           Yes                Yes
(no label)    Decode only   Decode only   Decode only        Decode only
JPEG          Yes           Yes           Yes                Yes
VP8           Yes           Yes           Yes                Yes
(no label)    Decode only   Yes           Yes                Yes
HEVC 10-bit   No            No            Decode only (8K)   Yes
VP9           No            No            Decode only (4K)   Yes

16 of 33

Workload test: video decoding, Intel Compute stick (Atom)

  • Hardware-accelerated video decoding is 1.7~3.6x faster, which allows playback of higher-resolution videos (2K, 4K).
  • Zero-copy texture upload also helps software decoding by removing a copy between the CPU and the GPU.

Hardware accelerated = video decoding plus zero-copy texture upload.

Test page                      Software fallback     Hardware accelerated       Ratio
H.264 video 2K (Vimeo video)   13.6 fps              24 fps                     1.7x higher
H.264 video 4K (Vimeo video)   5.7 fps               20.7 fps                   3.6x higher
VP9 1K (YouTube)               30 fps                30 fps
VP9 2K (YouTube)               video keeps pausing   25 fps, slightly pausing

* Intel Compute stick X5-Z8300 (Cherry-trail, Gen8 graphics, GPU RAM 512MB)

* VP9 uses software decoding

Disclaimer: Estimates only. Not official benchmark results.

17 of 33

Workload test: video decoding, Skylake (Core 6th gen)

  • Hardware-accelerated video decoding is 1.1~1.7x faster, which allows playback of higher-resolution videos (4K).
  • Zero-copy texture upload also helps software decoding by removing a copy between the CPU and the GPU.

Hardware accelerated = video decoding plus zero-copy texture upload.

Test page                      Software fallback              Hardware accelerated         Ratio
H.264 video 2K (Vimeo video)   25 fps                         39.2 fps                     1.5x higher
H.264 video 4K (Vimeo video)   24 fps                         27.3 fps                     1.1x higher
VP9 1K (YouTube)               58.4 fps                       58.4 fps
VP9 2K (YouTube)               23.8 fps, frequently pausing   42.6 fps, slightly pausing   1.7x

* Intel Compute stick STK2m364CC (Skylake, Gen9 graphics, GPU RAM 512MB)

* VP9 uses software decoding

Disclaimer: Estimates only. Not official benchmark results.
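
The ratios quoted for the Atom and Skylake video tests appear to be truncated to one decimal place rather than rounded (for example, 24 / 13.6 is about 1.76, quoted as 1.7x). The sketch below recomputes them that way; `ratio_truncated` is a hypothetical helper, and the truncation rule is an inference from the numbers, not something the slides state.

```python
import math


def ratio_truncated(hw_fps, sw_fps, digits=1):
    """Hardware/software frame-rate ratio, truncated (not rounded) to `digits`."""
    scale = 10 ** digits
    return math.floor(hw_fps / sw_fps * scale) / scale


if __name__ == "__main__":
    # (software fallback fps, hardware accelerated fps) from the two slides.
    rows = {
        "Atom, H.264 2K": (13.6, 24.0),
        "Atom, H.264 4K": (5.7, 20.7),
        "Skylake, H.264 2K": (25.0, 39.2),
        "Skylake, H.264 4K": (24.0, 27.3),
        "Skylake, VP9 2K": (23.8, 42.6),
    }
    for name, (sw, hw) in rows.items():
        print(f"{name}: {ratio_truncated(hw, sw)}x")
```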

18 of 33

Workload test: video decoding, Kaby Lake (Core 7th gen)

  • Only hardware-accelerated video decoding can play 4K 60fps YouTube videos.
  • The software fallback uses about 3 times as much memory as hardware decoding.

Test page                Software fallback     Hardware accelerated
VP9 4K 60fps (YouTube)   video keeps pausing   50 fps
VP9 2K 60fps (YouTube)   60 fps (14.9 MB)      60 fps (4.5 MB)

Intel Kaby Lake i7 NUC NUC7i7BNH (Gen9 graphics, GPU RAM 512MB)

Disclaimer: Estimates only. Not official benchmark results.

19 of 33

Intel graphics features: hardware overlay

The display controller can scale and composite overlay planes.

Source: pictures from this doc.

20 of 33

Legacy: GPU Composition

  • The GPU composites all planes
  • Composition bandwidth: 2.028 GB/s @ 60fps

[Figure: OpenGL GPU composition. Boxes: Video Layer, Web contents, Browser UI, Render Target, Primary plane. Buffer labels: RGBA read (x4), RGBA write (x1). Picture by Dongseong Hwang.]

21 of 33

Brand-new: Hardware Overlay

  • Saves bandwidth (2.028 GB/s ⇒ 0.949 GB/s @ 60fps) and power (75 mW)
  • The GPU is idle, which saves more power.

[Figure: hardware overlay. Boxes: Video Layer, Web contents, Browser UI, Render Target, Primary plane, Overlay Plane, OpenGL GPU Composition. Buffer labels: RGBA read (x4), RGBA write (x1). Picture by Dongseong Hwang.]
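
The bandwidth saving can be restated as a fraction of the GPU-composition figure. A minimal sketch using only the two numbers quoted on the slides; `bandwidth_saving` is a name chosen for the example:

```python
GPU_COMPOSITION_GBPS = 2.028  # composition bandwidth with GPU composition
HW_OVERLAY_GBPS = 0.949       # composition bandwidth with hardware overlay


def bandwidth_saving(before_gbps, after_gbps):
    """Return (absolute saving in GB/s, saving as a percentage of `before`)."""
    saved = before_gbps - after_gbps
    return saved, 100.0 * saved / before_gbps


if __name__ == "__main__":
    saved, percent = bandwidth_saving(GPU_COMPOSITION_GBPS, HW_OVERLAY_GBPS)
    print(f"saved {saved:.3f} GB/s ({percent:.1f}% less composition bandwidth)")
```

So the overlay path roughly halves the composition bandwidth at 60fps.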

22 of 33

Hardware overlay support in CrOS

  • Video overlay
    • Enabled in M61.
  • ARC++ overlay
    • Enabled in M61.
    • Android applications run as Wayland clients in CrOS
  • WebGL overlay
    • It will be supported in Gen10 (bug).

23 of 33

Benefits of Ozone-gbm

  • Performance
    • Zero-copy texture upload is typically 30~40% faster.
  • Memory consumption
    • The GPU process uses 65% less memory.
    • The render process also uses 20% less memory.
  • UI Responsiveness
    • Ozone-gbm provides a better frame rate by aligning events from evdev and painting with the vsync signal from the GPU.
    • It significantly reduces touch and stylus latency.
  • Power Saving
    • Hardware features such as hardware overlay, hardware encode/decode, and zero-copy texture upload save battery.

24 of 33

Upstream : Allow to run ozone_demo on a Linux/Intel desktop (Landed)

commit 63612d32bdf79162b3aadf5b80ee61e36624fd14

Author: Joone Hur <joone.hur@intel.com>

Date: Fri Feb 2 22:09:35 2018 +0000

Allow to run ozone_demo on a Linux/Intel desktop

First, we need to enable Intel driver by building

Mesa 17.0.2 with --with-egl-platform=surfaceless

--with-dri-drivers=i965

BUG=733450

TEST=ozone_demo

$ cd ~/git/chromium/src

$ gn gen out/Release "--args=use_ozone=true ozone_platform_gbm=true use_intel_minigbm=true"

$ ninja -C out/Release ozone_demo

$ export EGL_PLATFORM=surfaceless

$ out/Release/ozone_demo

Change-Id: Ic560b2b4f36701f3c159fd35e771d04c2e1ec97e

Reviewed-on: https://chromium-review.googlesource.com/886836

Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

Reviewed-by: Robert Kroeger <rjkroege@chromium.org>

Commit-Queue: Joone Hur <joone.hur@intel.com>

Cr-Commit-Position: refs/heads/master@{#534168}

25 of 33

Upstream : Separate display configurator from CrOS build (Landed o/)

commit d3ae8737f7186154d2fdc3ccc5fe43bf290f91b5

Author: Joone Hur <joone.hur@intel.com>

Date: Tue Apr 17 18:05:09 2018 +0000

Separate display configurator from CrOS build

This CL moves all files in ui/display/manager/chromeos to

ui/display/manager and adds build_display_configuration GN arg

so that the display configurator could be used in Linux

desktop.

BUG=733450

Change-Id: Idd790cfe6f9e5daf6ccad23353573028ebe5d7ee

Reviewed-on: https://chromium-review.googlesource.com/969416

Reviewed-by: Yusuke Sato <yusukes@chromium.org>

Reviewed-by: James Cook <jamescook@chromium.org>

Reviewed-by: Devlin <rdevlin.cronin@chromium.org>

Reviewed-by: Dale Curtis <dalecurtis@chromium.org>

Reviewed-by: Steven Bennetts <stevenjb@chromium.org>

Reviewed-by: Nico Weber <thakis@chromium.org>

Reviewed-by: Ahmed Fakhry <afakhry@chromium.org>

Reviewed-by: kylechar <kylechar@chromium.org>

Reviewed-by: Malay Keshav <malaykeshav@chromium.org>

Commit-Queue: Joone Hur <joone.hur@intel.com>

Cr-Commit-Position: refs/heads/master@{#551390}

26 of 33

Upstream : [WIP] Make mus_demo work on a desktop Linux

[WIP] Make mus_demo work on a desktop Linux

BUG=733450

TEST=mash --service=mus_demo --enable-features=Mash

$ cd ~/git/chromium/src

$ gn gen out/Release "--args=use_ozone=true ozone_platform_gbm=true use_intel_minigbm=true"

$ ninja -C out/Release mash:all mus_demo

$ export EGL_PLATFORM=surfaceless

$ out/Release/mash --service=mus_demo --enable-features=Mash

Change-Id: I7ab7db87d74f1e1440f63a8a10118c8394597ad4

27 of 33

Upstream : Enable VAVDA, VAVEA and VAJDA on linux with VAAPI only

For more details, see

Enable VAVDA, VAVEA and VAJDA on linux with VAAPI only

This patch contains all the changes necessary to use VA-API along with

vaapi-driver to run all media use cases supported with hardware acceleration.

It is intended to remain as experimental accessible from chrome://flags on linux.

It requires libva/intel-vaapi-driver to be installed on the system path where

chrome is executed. Other drivers could be tested if available. Flags are

kept independent for linux, where this feature has to be enabled before

actually using it. This should not change how other OSes use the flags

already, the new flags will show at the bottom on the section of unavailable

experiments

The changes cover a range of compiler pre-processor flags to enable the stack.

It moves the presandbox operations to the vaapi_wrapper class as the hook function

is available there. vaInit will open driver on the correct installed folder.

chrome flags consolidation into only two flags for linux. Mjpeg and accelerated

video are used. The other flags are kept for ChromeOS and other OSes.

Developer testing was made on skylake hardware, ChromeOS and Ubuntu.

28 of 33

How to enable ozone-gbm on Yocto Linux

  • Prepare the same Linux kernel and Mesa, with their patches, as used for CrOS.
    • https://github.com/joone/poky/tree/pyro_gbm
  • Several Chromium patches (link) are needed in M58:
    • Enable the i915 driver in DRM for mini-gbm.
    • Add the CrOS display configurator to Chromium, etc.
  • The Chromium browser and mash demo work.
    • Zero-copy texture upload and video decoding are supported.
  • Build Chromium ozone-gbm, the Linux kernel, and Mesa using Yocto recipes:

29 of 33

How to enable ozone-gbm on Arch Linux

  • ozone_demo and mus_demo work fine in upstream (r550918_0415_2018)

$ cd ~/git/chromium/src

$ gn gen out/Release "--args=use_ozone=true ozone_platform_gbm=true use_intel_minigbm=true"

$ ninja -C out/Release ozone_demo mash:all mus_demo

$ export EGL_PLATFORM=surfaceless

$ out/Release/ozone_demo --enable-drm-atomic --enable-overlay

$ out/Release/mash --service=mus_demo --enable-features=Mash

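
For scripting the build, the gn and ninja invocations above can be assembled programmatically. The sketch below only prints the command lines and does not run them; the helper `build_commands` and the `GN_ARGS` dictionary are hypothetical, while the GN args, targets, and `EGL_PLATFORM` value are the ones shown on this slide.

```python
import shlex

# GN args taken from the slide.
GN_ARGS = {
    "use_ozone": "true",
    "ozone_platform_gbm": "true",
    "use_intel_minigbm": "true",
}


def build_commands(out_dir="out/Release",
                   targets=("ozone_demo", "mash:all", "mus_demo")):
    """Return the gn and ninja command lines as argument lists."""
    gn_args = " ".join(f"{key}={value}" for key, value in GN_ARGS.items())
    return [
        ["gn", "gen", out_dir, f"--args={gn_args}"],
        ["ninja", "-C", out_dir, *targets],
    ]


if __name__ == "__main__":
    for command in build_commands():
        print(shlex.join(command))  # shell-quoted, ready to paste
    print("export EGL_PLATFORM=surfaceless")
```

Keeping the commands as argument lists avoids quoting mistakes around the space-separated `--args` value.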

30 of 33

Demo

  • Demo on Intel Atom processor (Cherry-trail)

31 of 33

Future Plan

  • Run Chromium browser or content_shell with ozone-gbm on Intel SoCs.
  • Update Yocto recipe.
  • Keep it working at ToT.
  • Enable more graphics acceleration features.

32 of 33

Thanks!

Contact us:

Joone Hur, joone.hur@intel.com

Intel Open Source Technology Center

http://01.org
