Part 1: Multi-Task Learning
Iddo Drori Joaquin Vanschoren
MIT TU Eindhoven
AAAI 2021
https://sites.google.com/mit.edu/aaai2021metalearningtutorial
Meta Learning Tutorial
Multi-Task Learning (MTL) Agenda
Multi-Task Learning (MTL) Progress and Motivation
Learning 57 Atari Games
Source: Human-level control through deep reinforcement learning, Mnih et al., Nature 2015
Progress in Atari Games
[Figure: per-game performance in 2015 vs. 2018.]
Montezuma's Revenge and Pitfall! were at random-level performance in 2015 and superhuman in 2018; by 2020, all 57 games are at superhuman performance.
Learning 57 Fields
Source: Measuring Massive Multitask Language Understanding, Hendrycks et al., 2020
Expected Progress in Learning 57 Fields
[Figure: per-field accuracy in 2020 vs. expected accuracy in 2023.]
In 2020, GPT-3 reaches about 70% accuracy on US Foreign Policy, the best-performing field. College Chemistry and College Physics are the hardest, only slightly above random performance; Machine Learning performs only slightly better.
Expected progress: College Chemistry and Physics will be superhuman by 2023, and all fields superhuman by 2025. Learning to learn courses is already happening.
Example: Electrical Engineering (EE) questions
1. In an SR latch built from NOR gates, which condition is not allowed? (A) S=0, R=0 (B) S=0, R=1 (C) S=1, R=0 (D) S=1, R=1. Answer: D
2. In a 2-pole lap-winding DC machine, the resistance of one conductor is 2 Ω and the total number of conductors is 100. Find the total resistance. (A) 200 Ω (B) 100 Ω (C) 50 Ω (D) 10 Ω. Answer: C
3. The coil of a moving-coil meter has 100 turns, is 40 mm long and 30 mm wide. The control torque is 240×10⁻⁶ N·m on full scale. If the magnetic flux density is 1 Wb/m², the range of the meter is (A) 1 mA (B) 2 mA (C) 3 mA (D) 4 mA. Answer: B
4. Two long parallel conductors carry 100 A. If the conductors are separated by 20 mm, the force per meter of length of each conductor will be (A) 100 N (B) 0.1 N (C) 1 N (D) 0.01 N. Answer: B
5. A point pole has a strength of 4π×10⁻⁴ Wb. The force in newtons on a point pole of 4π×1.5×10⁻⁴ Wb placed at a distance of 10 cm from it will be (A) 15 N (B) 20 N (C) 7.5 N (D) 3.75 N. Answer: A
Source: Measuring Massive Multitask Language Understanding, Hendrycks et al., 2020
Multi-Task Learning (MTL)
Multi-Task Learning
[Figure: (task, data) pairs for Tasks A, B, and C feed a single multi-task learning algorithm, which outputs one predictor per task.]
Multi-Task Learning: Self-Driving Cars
Source: Tesla AutoPilot
Multi-Task Learning: Edge Devices
Multi-Task Learning (MTL) Architectures
Multi-Task Learning (MTL) Questions
Multi-Task Learning
Shared backbone for multiple tasks with multiple heads
[Figure: a shared backbone network feeds task-specific layers (one head per task) for Tasks A, B, and C.]
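A minimal PyTorch sketch of this architecture; the layer sizes and the three task output dimensions are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

class SharedBackboneMTL(nn.Module):
    """Hard parameter sharing: one shared backbone, one head per task."""
    def __init__(self, in_dim=128, hidden=256, task_out_dims=(10, 5, 1)):
        super().__init__()
        # Shared backbone network: parameters used by every task.
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Task-specific layers (heads): one per task A, B, C.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, d) for d in task_out_dims]
        )

    def forward(self, x):
        z = self.backbone(x)                      # shared representation
        return [head(z) for head in self.heads]   # one output per task

model = SharedBackboneMTL()
outputs = model(torch.randn(32, 128))             # list of 3 task outputs
```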
Linear Scalarization for MTL
$\min_{\theta} \mathcal{L}(\theta) = \sum_t \alpha_t \mathcal{L}_t(\theta)$
Limitation: there can be two solutions $\theta$ and $\theta'$ with $\mathcal{L}_{t_1}(\theta_s, \theta_{t_1}) < \mathcal{L}_{t_1}(\theta'_s, \theta'_{t_1})$ and $\mathcal{L}_{t_2}(\theta_s, \theta_{t_2}) > \mathcal{L}_{t_2}(\theta'_s, \theta'_{t_2})$ for tasks $t_1$ and $t_2$; neither dominates the other, so no single fixed weighting $\alpha$ captures all such trade-offs.
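A hedged sketch of one linear-scalarization training step, reusing the SharedBackboneMTL model above; the fixed weights alpha and the choice of per-task losses (classification for A and B, regression for C) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

alpha = [1.0, 1.0, 0.5]   # fixed task weights: a modeling choice
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, targets):
    """One step of min_theta sum_t alpha_t * L_t(theta)."""
    outputs = model(x)
    losses = [
        F.cross_entropy(outputs[0], targets[0]),            # task A
        F.cross_entropy(outputs[1], targets[1]),            # task B
        F.mse_loss(outputs[2].squeeze(-1), targets[2]),     # task C
    ]
    total = sum(a * l for a, l in zip(alpha, losses))       # scalarized objective
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return [l.item() for l in losses]

# Example batch: class indices for tasks A and B, scalar targets for C.
x = torch.randn(32, 128)
targets = (torch.randint(0, 10, (32,)), torch.randint(0, 5, (32,)), torch.randn(32))
train_step(x, targets)
```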
Shared backbone for multiple tasks with multiple heads
[Figure: shared backbone network with task-specific layers for Tasks A, B, and C.]
Individual network for each task
[Figure: a separate network per task for Tasks A, B, and C, with no shared parameters.]
Negative Transfer
Sharing parameters across poorly related tasks can hurt: a jointly trained model may perform worse on a task than an individual network trained on that task alone.
Multi-Task Learning and Adversarial Attacks
MTL Architectures
Hard Parameter Sharing
[Figure: hard-shared backbone layers feed task-specific layers for Tasks A, B, and C.]
Multi-Objective Optimization
$\min_{\theta_s, \theta_1, \dots, \theta_T} \mathcal{L}(\theta_s, \theta_1, \dots, \theta_T) = \min_{\theta_s, \theta_1, \dots, \theta_T} \big( \mathcal{L}_1(\theta_s, \theta_1), \dots, \mathcal{L}_T(\theta_s, \theta_T) \big)$
where $\theta_s$ are the shared parameters, $\theta_t$ the task-specific parameters, and the minimum is over the vector of task losses.
Multi-Objective Optimization
$f(x) = (f_1(x), \dots, f_T(x))$, $f: \mathbb{R}^n \to \mathbb{R}^T$, with each $f_t: \mathbb{R}^n \to \mathbb{R}$
Pareto Optimal
A point $x \in \mathbb{R}^n$ is Pareto optimal if no other point decreases one objective without increasing another.
Pareto Frontier: the set of all Pareto-optimal points.
Source: Wikipedia
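A toy NumPy illustration (not from the slides) of Pareto dominance and a sampled frontier for a two-objective minimization problem:

```python
import numpy as np

def dominates(fx, fy):
    """fx Pareto-dominates fy (minimization): no worse on every
    objective and strictly better on at least one."""
    return bool(np.all(fx <= fy) and np.any(fx < fy))

# Toy bi-objective problem: f(x) = (x^2, (x - 2)^2) for scalar x.
xs = np.linspace(-1.0, 3.0, 201)
F = np.stack([xs**2, (xs - 2.0)**2], axis=1)

# A sampled point is (approximately) Pareto optimal if nothing dominates it.
pareto = np.array([
    not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)
    for i in range(len(F))
])
print(xs[pareto].min(), xs[pareto].max())  # frontier is x in [0, 2]
```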
Pareto Stationary
A point $x$ is Pareto stationary if there exist $\alpha_t \ge 0$ with $\sum_{t=1}^{T} \alpha_t = 1$ such that $\sum_{t=1}^{T} \alpha_t \nabla f_t(x) = 0$.
MTL Algorithm
$\min_{\alpha_1, \dots, \alpha_T} \left\{ \left\| \sum_t \alpha_t \nabla f_t(x) \right\| \,\middle|\, \sum_t \alpha_t = 1,\ \alpha_t \ge 0 \text{ for all } t \right\}$
Applied to hard parameter sharing, with gradients taken with respect to the shared parameters $\theta_s$:
$\min_{\alpha_1, \dots, \alpha_T} \left\{ \left\| \sum_t \alpha_t \nabla_{\theta_s} \mathcal{L}_t(\theta_s, \theta_t) \right\| \,\middle|\, \sum_t \alpha_t = 1,\ \alpha_t \ge 0 \text{ for all } t \right\}$
Source: Multi-task learning as multiobjective optimization, Sener and Koltun, 2018
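For two tasks this min-norm problem has a closed-form solution (Sener and Koltun, 2018, solve the general case with a Frank-Wolfe solver). A NumPy sketch of the two-task case, where g1 and g2 are toy vectors standing in for the per-task gradients of the shared parameters:

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Closed form of min_{a in [0,1]} ||a*g1 + (1-a)*g2|| for two tasks;
    the combined vector is a common descent direction for the shared
    parameters (zero norm means the point is Pareto stationary)."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        a = 0.5                      # identical gradients: any weighting works
    else:
        a = float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    return a, a * g1 + (1.0 - a) * g2

# Toy example: conflicting task gradients.
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
a, d = min_norm_direction(g1, g2)
print(a, d)   # a = 0.5, d = [0.5, 0.5]: the min-norm convex combination
```

The shared parameters are then updated along the combined direction, while each task's head is updated with its own task gradient.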
Soft Parameter Sharing
[Figure: each task keeps its own backbone; soft sharing ties the backbones' parameters together, with task-specific layers for Tasks A, B, and C.]
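A minimal sketch of one common form of soft sharing, assuming two per-task backbones of identical shape: each task trains its own parameters, and an L2 penalty between corresponding parameters keeps the backbones close.

```python
import torch
import torch.nn as nn

def make_backbone():
    return nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))

backbone_a, backbone_b = make_backbone(), make_backbone()

def soft_sharing_penalty(net_a, net_b):
    """L2 distance between corresponding parameters of the two backbones."""
    return sum(
        (pa - pb).pow(2).sum()
        for pa, pb in zip(net_a.parameters(), net_b.parameters())
    )

# Added to the task losses with a strength hyperparameter lam:
# total_loss = loss_a + loss_b + lam * soft_sharing_penalty(backbone_a, backbone_b)
```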
Ad-hoc Sharing
[Figure: a partially shared backbone branches into task-specific layers for Tasks A, B, and C.]
Learn Shared Architecture
Source: Learning to branch for multi-task learning, Guo et al., 2020
Layer Routing
Source: AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning, Sun et al., 2019
Taskonomy Dataset
Source: Taskonomy: Disentangling Task Transfer Learning, Zamir et al., 2018
Transfer Relationships Between Tasks
Source: Taskonomy: Disentangling Task Transfer Learning, Zamir et al., 2018
Multi-Task Learning
[Figure: transfer learning affinities vs. MTL affinities between tasks.]
Source: Which Tasks Should Be Learned Together in Multi-task Learning?, Standley et al., 2020
MTL: Combinatorial Optimization Problem
[Figure: assigning tasks t1, ..., t5 to networks 1, 2, 3; each candidate assignment groups tasks into jointly trained networks.]
Source: Which Tasks Should Be Learned Together in Multi-task Learning?, Standley et al., 2020
MTL: Combinatorial Optimization Problem
$\mathcal{L}(S, t_i) = \min_{n \in S} \mathcal{L}(n, t_i)$, taken to be infinite if no network in $S$ solves task $t_i$
$\mathcal{L}(S) = \sum_{t_i \in T} \mathcal{L}(S, t_i)$
$S^* = \operatorname{argmin}_{S:\, \mathrm{cost}(S) < b} \mathcal{L}(S)$
Source: Which Tasks Should Be Learned Together in Multi-task Learning?, Standley et al., 2020
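A brute-force sketch of this grouping search. The loss function below is made up, and cost is assumed to be the number of networks; in the paper these per-group losses come from actually training candidate networks:

```python
from itertools import combinations

tasks = ["t1", "t2", "t3", "t4", "t5"]
budget = 3   # at most 3 networks in a solution S (toy cost model)

def network_loss(group, task):
    """Hypothetical per-task loss of a network jointly trained on `group`."""
    if task not in group:
        return float("inf")          # infinity if the network does not solve the task
    return 1.0 + 0.1 * len(group)    # toy assumption: bigger groups do slightly worse

def candidate_networks():
    # Every nonempty subset of tasks is a candidate network.
    for r in range(1, len(tasks) + 1):
        yield from map(frozenset, combinations(tasks, r))

def total_loss(S):
    # L(S) = sum_i min_{n in S} L(n, t_i)
    return sum(min(network_loss(n, t) for n in S) for t in tasks)

# Enumerate every solution with at most `budget` networks and keep the best.
best = min(
    (S for r in range(1, budget + 1)
       for S in combinations(list(candidate_networks()), r)),
    key=total_loss,
)
print([sorted(n) for n in best], total_loss(best))
```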
Multi-Task Learning
Source: Which Tasks Should Be Learned Together in Multi-task Learning?, Standley et al., 2020
Meta Learning Tutorial
Iddo Drori Joaquin Vanschoren
MIT TU Eindhoven
AAAI 2021
https://sites.google.com/mit.edu/aaai2021metalearningtutorial