name/surname | affiliation | email | short bio | title | abstract | slot (min)
---|---|---|---|---|---|---
Rafael Wysocki | Intel | rafael@kernel.org | | Deadline scheduler and CPU idle states | Is it viable to use the deadline scheduler and allow CPUs to use idle states at the same time? If "yes", what needs to be done to arrange for that, and if "no", why? [Discussion topic] | 30
Parth Shah | IBM | parth@linux.ibm.com | | Introducing Latency Nice for Scheduler Hints and Optimizing Scheduler Task Wakeup | The concept of latency nice has been proposed to allow userspace to provide hints to the kernel scheduler about the latency requirements of certain tasks. The concept was originally introduced by Paul T at OSPM 2019, and since then many kernel developers have provided their requirements and possible use cases for such a per-task attribute [1]. In this talk we will summarize the interface and the use cases covered by the patches posted on the mailing list introducing latency_nice [2]; an illustrative user-space sketch of the proposed interface appears after the table. In addition, the talk will focus on potential scheduler wakeup-path optimizations enabled by such a latency_nice per-task attribute, among which are: - Use Peter Zijlstra's select_idle_sibling rework [3] and Subhra Mazumdar's task latency-nice [4] patches together to reduce idle-core scan time. - Use sched_idle_cpu() everywhere in place of available_idle_cpu(); there are still a few places, such as select_idle_core(), where a CPU running a SCHED_IDLE task is not considered idle during task wakeup. - Prioritize based on the current idle type of an idle CPU (busy, sched_idle, preempted_idle and non_preempted_idle) when selecting a CPU for task wakeup. - Use an appropriate strategy (packing vs. spreading) depending on the latency_nice attribute of the tasks. References: [1] https://lkml.org/lkml/2019/9/30/215 [2] https://lkml.org/lkml/2020/1/16/319 [3] https://lkml.org/lkml/2018/5/30/632 [4] https://lkml.org/lkml/2019/8/30/829 [5] https://lkml.org/lkml/2019/7/8/13 | 30
Pratik Rajesh Sampat | IBM | psampat@linux.ibm.com | | Alternate approach to gather and use history in the TEO governor | Currently the TEO governor, apart from the timer event and the hit/miss/early-hit buckets, also gathers a history of 8 idle intervals; if a significant number of those idle durations are shorter than the current candidate, it decides whether a shallower state should be chosen. The current sliding history window does a fair job at prediction; however, the hardcoded window size can limit prediction accuracy, and increasing the window size linearly affects both the space and the time complexity of the prediction. To complement the current moving-window history, we present an approach wherein each idle state separately maintains a weight for itself and its counterpart idle states, forming a probability distribution that helps choose the next probable idle state (a toy sketch of one possible reading of this scheme appears after the table). The weights are dynamic and can shift based on the behaviour of the workloads, tipping the odds in favour of or against an idle state. Another advantage of using dynamic weights as the heuristic for gathering history and predicting is that lifetime idle-state history can be maintained in constant space. The session aims to discuss the approach, along with initial benchmarks, with the community. | 30
Dario Faggioli | SUSE | dfaggioli@suse.com | Dario is a Virtualization Software Engineer at SUSE. He has been active in the Open Source virtualization space for ~8 years, mostly on the Xen project, and he is currently a maintainer of the Xen hypervisor scheduler. He is also working on KVM, Libvirt, QEMU and other things. His primary focus is on scheduling and on performance evaluation and improvement. During his Ph.D., he worked on real-time scheduling on Linux, and he is one of the original authors of what today is the SCHED_DEADLINE scheduling class. Since 2010, he has spoken and presented his work at several events and conferences, such as Linux Kernel Summit, Linux Plumbers, Xen Project Developers Summit, FOSDEM, LinuxLab and previous editions of OSPM. | MMTests for Benchmarking the Scheduler. And Virtualization. And Power Management. And... | MMTests is a testing and benchmarking suite for Linux. "MM" in the name stands for "Memory Management": the tool was born to stress and benchmark changes to the MM code of the Linux kernel. So how come other subsystems are mentioned in the title? The fact is that the tool has evolved a lot over the years, and it is now possible to use it for performance measurement in a much broader sense. It can run a lot of different benchmarks, and it can even run them inside multiple Virtual Machines at the same time while keeping their iterations synchronized. In this presentation, we will show how MMTests can be very useful for scheduler development and benchmarking: not only because it can run several typical scheduler benchmarks, but also because it integrates very well with monitoring and tracing tools such as ftrace and perf. | 30
Dhaval Giani / Dario Faggioli | Oracle/SUSE | dhaval.giani@oracle.com; dfaggioli@suse.com | | Core Scheduling: Current State | Core Scheduling is a feature requested by multiple users looking to tackle the issues caused by hardware bugs. Let's get the group updated on the effort. | 50
Dhaval Giani / Dario Faggioli | Oracle/SUSE | dhaval.giani@oracle.com; dfaggioli@suse.com | | Core Scheduling: Performance results | Lots of us have been running core scheduling patches through various test suites, in various environments and configurations. This is an opportunity to share some of those results. | 50
Dhaval Giani / Dario Faggioli | Oracle/SUSE | dhaval.giani@oracle.com; dfaggioli@suse.com | | Core Scheduling: How do we upstream? | What is the roadmap to getting core scheduling upstreamed? What are the blockers, and who can assist with what? | 50
Ionela Voinescu | Arm | ionela.voinescu@arm.com | | Improve Frequency Invariant Engine (FIE) interfaces in the context of multiple data sources on multiple architectures | Since the introduction of frequency invariance, its use has extended to multiple architectures (arm, arm64, riscv, x86) using multiple data sources (cpufreq and counters; in the future, firmware notifications), but the implementation and interfaces have not evolved. This results in improper support for asymmetric systems, difficulty in data-source selection, and duplication of what could be generic code. The talk will explore each of these shortcomings and potential ways to fix them. | 30
Qais Yousef / Valentin Schneider | Arm | qais.yousef@arm.com; valentin.schneider@arm.com | | Capacity inversion / utilization inheritance | Suppose a scenario with two tasks: task A (util_avg=800) running on CPU0 (capacity=1024) and task B (util_avg=200) running on CPU1 (capacity=512). If task B acquires a lock that task A then contends on, the progress of task A will be dictated by the progress of task B, which is running at a much lower performance level (slower CPU, potentially lower frequency). This has been called "capacity inversion". The talk shall explore the scope of the problem and potential ways to fix it. | 30
Douglas Raillard / Valentin Schneider | Arm | douglas.raillard@arm.com; valentin.schneider@arm.com | | Dealing with new tasks in CFS | Figuring out how to initialize the PELT signal of newly forked tasks is not a straightforward affair. We currently settle on half of the spare capacity of the CPU on which the new task is created (a simplified sketch of this heuristic appears after the table); this is a completely arbitrary value, but arguably there is no sane value to choose at this point in time. However, this arbitrary initial value has a long-lasting impact on the PELT signal of the task, as it takes ~300ms to decay completely. The talk will present the impact of this initial value and potential tweaks to it. | 30
Abhishek Goel | IBM | abhisgoe@in.ibm.com | | Pseudo cpuidle driver for evaluating future platform idle states and cpuidle governors | In this talk we would like to discuss the implementation of a pseudo cpuidle driver that allows the user to define custom idle states with desired properties such as latency and residency. The concept was suggested by Rafael Wysocki on the mailing list [1]. This pseudo cpuidle driver is aimed at the following use cases: 1) To test the behaviour of cpuidle governors for different combinations of cpuidle states. This is useful for validating the behaviour of new enhancements to cpuidle governors across different platforms, since the idle states of those platforms can be modelled using the pseudo cpuidle driver. 2) To evaluate the efficacy of platform idle states before the processor is fabricated. For example, this would be useful to understand how the introduction of new platform idle states with certain latencies and residencies will impact existing workloads. The focus of this talk will be to discuss the interfaces through which the pseudo idle states should be defined by the user, and the manner in which the latencies and residencies should be modelled. It will also solicit any additional features that could be included in this driver to help debug the behaviour of governors/platform idle states better. [1] https://lkml.org/lkml/2019/12/17/655 | 30
Giovanni Gherdovich | SUSE | ggherdovich@suse.cz | | Frequency scaling in the datacenter: The case for a more aggressive intel_pstate/powersave | The server space is, in large part, populated by Intel x86 deployments. In this landscape, frequency scaling is synonymous with the "powersave" governor from the intel_pstate driver. When intel_pstate/powersave fails to meet customers' expectations for a workload, the entire concept of frequency scaling takes a hit: experience shows that instead of asking vendors and upstream for more tuning effort, "just set it to 'performance'" begins to appear in tuning recommendations and technical blog posts, distributions may start shipping with the 'performance' governor set by default, and so forth. In most server applications it is anti-economical (and environmentally irresponsible) to always run CPUs at top speed. Yet, for server users to find frequency scaling appealing, intel_pstate/powersave must compromise a little on the side of power efficiency, or otherwise it will be confined to mobile consumer electronics; that would be such a missed opportunity for server deployments. We'll present a number of out-of-tree intel_pstate patches we've been carrying in SLES and openSUSE, among which is a boost mechanism for CPUs coming out of idle, without which our customers won't ever consider the 'powersave' governor a viable option, and discuss a possible way forward to send them upstream. | 30
Lukasz Luba | Arm | lukasz.luba@arm.com | | Energy Model for devices | Currently the Energy Model (EM) framework supports CPUs only. It is used in the Energy-Aware Scheduler (EAS) and, from v5.4, in the thermal CPUFreq cooling device. There is a patch set which aims to add devfreq devices to the EM. Thanks to this, the devfreq cooling device code can be simplified and a lot of duplication removed. An example power model for a GPU driver, with a driver-specific em_callback function providing a simple power model, will be presented (a standalone illustration of such a simple power model appears after the table). In addition, the changes in devfreq_cooling and an example of using the power model in the power-budget calculation will be shown. Patch set: https://lkml.org/lkml/2020/2/6/377 | 30
Juri Lelli | Red Hat | juri.lelli@redhat.com | | RT and BPF: friend or foe? | Patches fixing BPF on RT have been proposed and reasonably well accepted (even if not merged yet). We will report on the current status and on some experiments showing how BPF can potentially be useful on RT systems as well, and discuss what works and what doesn't yet, highlighting next steps. | 30
Vincent Guittot | Linaro | vincent.guittot@linaro.org | | A better detection of overloaded groups and waiting tasks | Replace runnable_load_avg, which tracks the load of runnable tasks, with a simpler runnable_avg that tracks the waiting time of tasks, and use it to improve the classification of rqs and tasks. | 30
Ambroise Vincent | Arm | ambroise.vincent@arm.com | | Idle state selection and active load balance | The scheduler currently has no way to communicate with the idle framework. This is specifically a problem during the active load balancing process: in a situation where CPU A tries to pull a task currently running on CPU B, CPU A will send an IPI to CPU B to kick the stopper. When CPU A doesn't have other tasks to execute, the idle governor will be called to choose which idle state to enter while waiting for CPU B's answer. The idle governor often makes a wrong decision, since it doesn't know that an IPI is expected very soon from CPU B, resulting in lost energy and performance. Presentation of the problem and potential ways to solve it. | 30
Andrea Righi | Canonical | andrea.righi@canonical.com | | Power management and hibernation in the cloud | Hibernation has always been considered the "sleep feature" for laptops, but recently this feature has also been showing up in cloud computing environments, as a fast way to warm up instances as needed and add them quickly to production, preventing over-provisioning. This talk aims to analyze the current PM/hibernation implementation in Linux, with a special focus on how to trace and highlight potential issues in terms of reliability and performance. | 30
Qais Yousef | Arm | qais.yousef@arm.com | | RT Capacity Awareness | Using util_clamp we can bias the performance level requested by an RT task, and hence influence frequency and CPU selection. This brings up new challenges, like dealing with conflicts between priority and capacity requests when the system is overloaded, and managing the default RT behavior of boosting all tasks to max, which will cause the big CPUs to be overloaded. | 30
Vincent Guittot | Linaro | vincent.guittot@linaro.org | | Better NUMA and normal load-balance collaboration | The current collaboration is quite minimal: NUMA balancing prevents the normal load balancer from migrating tasks to the "wrong" node, and the normal load balancer allows a small degree of imbalance between nodes. The collaboration could be tighter, especially when nodes are not overloaded. | 
Chris Redpath / Lukasz Luba / Vincent Donnefort | Arm | chris.redpath@arm.com; lukasz.luba@arm.com; vincent.donnefort@arm.com | | SoC vendor PM changes in the kernel | SoC vendors normally modify the mainline Linux kernel heavily in order to meet their targets in terms of performance and power consumption. This talk intends to present some of those modified areas, with the aim of potentially replacing the proprietary implementations behind them with existing or new Linux kernel mainline functionality. | 50
Dietmar Eggemann | Arm | dietmar.eggemann@arm.com | | The latency nice use case for Energy-Aware Scheduling (EAS) in the Android Common Kernel (ACK) | An ACK-specific implementation of the latency nice concept is used within EAS to prefer returning an idle CPU instead of the most energy-efficient CPU for certain tasks. The classification of those tasks is done via task groups. This talk will discuss how this proprietary solution can be replaced by the proposed latency nice mainline solution, with a focus on the missing per-task-group interface. | 30
Michal Sojka / Cláudio Maia | CVUT, ISEP | michal.sojka@cvut.cz; CLRRM@isep.ipp.pt | | Testing thermal-aware scheduling for avionics applications | Many applications have stringent thermal requirements but cannot use fans or massive heat sinks; mobile phones are a typical example. Embedded avionics applications have similar constraints, dictated by size, weight and power, but also by the additional potential failure sources that such components may introduce into the system, which is hardly compatible with avionics systems' dependability requirements. As avionics systems must enforce critical timing and safety guarantees, it is crucial to implement thermal/power management techniques that are compatible with those. Although avionics applications cannot run Linux for safety reasons, Linux is often used for prototyping and HW platform evaluation. We present our test bench for analysing thermal properties and how they are influenced by platform configuration, task scheduling and task-to-core mapping. The test-bench environment consists of popular multi-core platforms (such as NXP i.MX8 and NVIDIA Tegra X2) and a thermal camera. Together with our software tooling and a set of CPU and GPU benchmarks, we can efficiently measure the effect of various workloads, scheduling, and power-management decisions. We will present preliminary results obtained with our test bench. The goal is to gather feedback from the audience and learn how our tooling can be useful to others. | 30
José Martins / Sandro Pinto | Universidade do Minho, Portugal | josemartins90@gmail.com | | Bao - a lightweight static partitioning hypervisor | Virtualization is already a key enabling technology for mixed-criticality embedded systems. Open-source hypervisors such as KVM or Xen were not originally tailored for embedded constraints and real-time requirements, and they depend on Linux, resulting in large TCBs and wide attack surfaces. Furthermore, they do not address the numerous microarchitectural contention points and side channels that have been shown to break true VM isolation. Bao is a lightweight, open-source embedded hypervisor that aims at providing strong isolation and real-time guarantees. Similarly to Jailhouse, it is a partitioning hypervisor leveraging hardware virtualization support; unlike Jailhouse, it is completely self-sufficient and does not depend on Linux. Currently supporting ARMv8 and RISC-V, Bao was developed from scratch to provide a minimal, clean-slate, industry-grade solution and to engage both academia and industry in tackling the challenges of modern automotive and industrial systems. | 30
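
For Parth Shah's latency nice entry above: a minimal user-space sketch of how the proposed per-task attribute could be set through the sched_setattr() syscall. The sched_latency_nice field, the SCHED_FLAG_LATENCY_NICE flag and its 0x80 value follow the posted patch series and are not in mainline; they are assumptions that may change before merging.

```c
/*
 * Illustrative only: set the proposed per-task latency_nice attribute via
 * the sched_setattr() syscall. The sched_latency_nice field and the
 * SCHED_FLAG_LATENCY_NICE flag come from the posted patch series and are
 * NOT part of mainline; their layout/values are assumptions.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define SCHED_FLAG_KEEP_ALL	0x18	/* keep current policy and params */
#define SCHED_FLAG_LATENCY_NICE	0x80	/* assumed value from the series */

struct sched_attr_ln {			/* sched_attr extended as proposed */
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* SCHED_DEADLINE fields */
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;	/* uclamp fields (v5.3+) */
	uint32_t sched_util_max;
	int32_t  sched_latency_nice;	/* proposed: -20 (latency sensitive) .. 19 */
};

static int set_latency_nice(pid_t pid, int latency_nice)
{
	struct sched_attr_ln attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	/* keep the current policy/params, only update latency_nice */
	attr.sched_flags = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_LATENCY_NICE;
	attr.sched_latency_nice = latency_nice;

	return syscall(SYS_sched_setattr, pid, &attr, 0);
}

int main(void)
{
	/* Mark the calling task as highly latency sensitive. */
	if (set_latency_nice(0, -20))
		perror("sched_setattr");
	return 0;
}
```

On an unpatched kernel the call simply fails (unknown flag / oversized attr), so the sketch is safe to experiment with.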
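For the TEO-governor entry above (Pratik Rajesh Sampat): a standalone toy sketch of one possible reading of the described weight scheme, where each idle state keeps a small weight vector over all idle states that is nudged by observed wakeups and used to pick the next state. This is not the actual patch; the names, sizes and update rule are made up for illustration.

```c
/*
 * Toy sketch of a weight-based idle-state predictor in the spirit of the
 * proposal above (not the actual patch). Each idle state keeps a row of
 * weights over all idle states; prediction picks the heaviest entry in the
 * row of the last chosen state, and each observed wakeup shifts weight
 * toward the state that would have matched the measured idle duration.
 * History is therefore kept in constant space.
 */
#include <stdio.h>

#define NR_STATES 4

static unsigned int weights[NR_STATES][NR_STATES];

static void init_weights(void)
{
	for (int i = 0; i < NR_STATES; i++)
		for (int j = 0; j < NR_STATES; j++)
			weights[i][j] = 100 / NR_STATES;	/* uniform start */
}

/* Predict the next idle state, given the state chosen last time. */
static int predict_next(int last_state)
{
	int best = 0;

	for (int j = 1; j < NR_STATES; j++)
		if (weights[last_state][j] > weights[last_state][best])
			best = j;
	return best;
}

/*
 * After a wakeup, shift weight toward the state that would have been the
 * right choice for the measured idle duration, and away from the others.
 */
static void update_weights(int last_state, int correct_state)
{
	for (int j = 0; j < NR_STATES; j++) {
		if (j == correct_state)
			weights[last_state][j] += NR_STATES - 1;
		else if (weights[last_state][j] > 0)
			weights[last_state][j] -= 1;
	}
}

int main(void)
{
	init_weights();

	/* A workload that keeps waking up early: state 1 is the right answer. */
	for (int i = 0; i < 50; i++)
		update_weights(3, 1);

	printf("prediction after training: state %d\n", predict_next(3));
	return 0;
}
```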
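For the "Dealing with new tasks in CFS" entry above: a simplified, standalone sketch of the heuristic the talk starts from, i.e. initializing a new task's util_avg to half of the spare capacity of the CPU it is created on. This only mirrors the shape of the kernel calculation; the function name and the example numbers are made up.

```c
/*
 * Simplified sketch of the current heuristic: a newly forked task's
 * util_avg starts at (roughly) half of the spare capacity of the CPU it
 * is created on. Not the verbatim kernel code.
 */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

static unsigned long init_util_avg(unsigned long cpu_capacity,
				   unsigned long cfs_rq_util_avg)
{
	long spare = (long)(cpu_capacity - cfs_rq_util_avg);

	if (spare <= 0)
		return 0;			/* CPU already fully utilized */

	return (unsigned long)spare / 2;	/* half the spare capacity */
}

int main(void)
{
	/*
	 * A big CPU (capacity 1024) already running ~400 worth of utilization:
	 * the new task starts with util_avg = (1024 - 400) / 2 = 312, and that
	 * value then takes ~300ms of PELT decay to disappear.
	 */
	printf("initial util_avg = %lu\n",
	       init_util_avg(SCHED_CAPACITY_SCALE, 400));
	return 0;
}
```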
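For the "Energy Model for devices" entry above: a standalone illustration of the kind of "simple power model" a driver's EM callback typically provides, estimating dynamic power per OPP as P ~ C * V^2 * f. It deliberately avoids the in-flux kernel registration API from the patch set; the OPP table and the coefficient are made-up numbers.

```c
/*
 * Standalone illustration of a "simple power model" of the kind a driver's
 * EM callback would provide per performance state: P ~ C * V^2 * f.
 * The OPP table and the "capacitance" coefficient are made up; a real
 * driver would return such numbers for each OPP when registering the
 * device's Energy Model.
 */
#include <stdio.h>

struct opp {
	unsigned long freq_khz;
	unsigned long volt_mv;
};

/* Hypothetical GPU OPP table. */
static const struct opp opps[] = {
	{ 200000,  800 },
	{ 400000,  900 },
	{ 600000, 1000 },
	{ 800000, 1125 },
};

/* Dynamic power estimate in milliwatts for one OPP. */
static unsigned long active_power_mw(const struct opp *o)
{
	const unsigned long cap = 600;	/* made-up coefficient */

	/* P ~ C * V^2 * f, with ad-hoc scaling to keep the result in mW. */
	return cap * (o->volt_mv * o->volt_mv / 1000) *
	       (o->freq_khz / 1000) / 1000000;
}

int main(void)
{
	for (unsigned int i = 0; i < sizeof(opps) / sizeof(opps[0]); i++)
		printf("%lu kHz @ %lu mV -> ~%lu mW\n",
		       opps[i].freq_khz, opps[i].volt_mv,
		       active_power_mw(&opps[i]));
	return 0;
}
```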