STRUCTURED KERNELS
A better (more structured) way to write kernels!
Hackathon: The Plan
WHAT ARE THEY?
Benefit: Shape Checking
* Exact meta tensor user API is subject to change
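A rough illustration of what shape checking buys you: if a tensor type stores sizes but no data, an operator's shape rule can run on its own, and bad shapes fail fast before anything is allocated. The sketch below is a toy model in plain C++, not the meta tensor API (which is subject to change). MetaTensor, matmul_shape and shape_check_demo are hypothetical names, and matmul was picked only because its shape rule is short.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a meta tensor: it records sizes but owns no data.
struct MetaTensor {
    std::vector<std::int64_t> sizes;
};

// Shape rule for a 2-D matrix product: (M, K) x (K, N) -> (M, N).
// Only sizes are read and produced; no element is ever touched.
inline MetaTensor matmul_shape(const MetaTensor& a, const MetaTensor& b) {
    if (a.sizes.size() != 2 || b.sizes.size() != 2) {
        throw std::invalid_argument("matmul_shape expects two 2-D tensors");
    }
    if (a.sizes[1] != b.sizes[0]) {
        throw std::invalid_argument("inner dimensions do not match");
    }
    return MetaTensor{{a.sizes[0], b.sizes[1]}};
}

// Usage: validate a call before allocating any real memory.
inline void shape_check_demo() {
    const MetaTensor x{{1024, 4096}};
    const MetaTensor w{{4096, 10}};
    const MetaTensor y = matmul_shape(x, w);
    assert((y.sizes == std::vector<std::int64_t>{1024, 10}));
}
```

The same rule doubles as the error check: swapping x and w throws before any kernel would have run.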
Benefit: Shape Checking – Under the Hood
Benefit: Shape Checking – Potential Use Cases
HOW TO WRITE A NEW OPERATOR: RECAP
A hand-written kernel has to handle many things itself.
We could call at::native::empty_cpu here for better perf (but that would be different for CUDA!)
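To make the recap concrete, here is a toy sketch of what writing the variants by hand tends to look like: the functional and out= versions each repeat the argument checks, and each decides for itself how the output is obtained. Tensor is just a std::vector<float> and every function name is hypothetical. This is not the ATen code from the slide, only the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy 1-D "tensor": a buffer of floats (hypothetical, not at::Tensor).
using Tensor = std::vector<float>;

// Shared compute loop: nearest-neighbour upsample of `in` into `out`.
static void upsample_kernel(const Tensor& in, Tensor& out) {
    assert(!in.empty() && !out.empty());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = in[i * in.size() / out.size()];
    }
}

// out= variant: checks arguments, resizes the caller's output, then computes.
Tensor& upsample_nearest1d_out(const Tensor& in, std::size_t out_w, Tensor& out) {
    if (in.empty() || out_w == 0) {
        throw std::invalid_argument("empty input or zero output width");
    }
    out.resize(out_w);
    upsample_kernel(in, out);
    return out;
}

// Functional variant: repeats the same checks, but allocates a fresh output.
Tensor upsample_nearest1d(const Tensor& in, std::size_t out_w) {
    if (in.empty() || out_w == 0) {
        throw std::invalid_argument("empty input or zero output width");
    }
    Tensor out(out_w);
    upsample_kernel(in, out);
    return out;
}
```

Only the compute loop is shared; the checks and the output handling are duplicated per variant, and that duplicated part is what varies by backend.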
MAKING IT STRUCTURED
WHAT’S GOING ON UNDER THE HOOD
Class hierarchy (each class derives from the one above it; meta(), impl() and set_output() are class methods):
at::impl::MetaBase (aten/src/ATen/TensorMeta.h)
at::meta::upsample_nearest1d (build/aten/src/ATen/MetaFunctions.h, codegen’d): meta() defined per op!
at::native::structured_upsample_nearest1d_out_cpu, at::native::structured_upsample_nearest1d_out_cuda (build/aten/src/ATen/NativeFunctions.h, codegen’d): impl() defined per backend!
structured_upsample_nearest1d_out_cpu_out, structured_upsample_nearest1d_out_cpu_functional (build/aten/src/ATen/RegisterCPU.cpp, codegen’d): set_output() defined for functional/inplace/out!
structured_upsample_nearest1d_out_cuda_out, structured_upsample_nearest1d_out_cuda_functional: set_output() defined for functional/inplace/out!
set_output() is defined in codegen. It handles empty_cpu / empty_cuda / resize_output.
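The class chain can be mimicked in a few lines of plain C++. This is a simplified model, not the generated code: Tensor is a toy struct, the class names are shortened stand-ins for the ones listed here, and a 1-D input is assumed so that the shape is a single number. What it tries to show is the split of responsibilities: meta() once per op, impl() once per backend, set_output() once per variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy tensor: sizes plus a flat float buffer (hypothetical, not at::Tensor).
struct Tensor {
    std::vector<std::int64_t> sizes;
    std::vector<float> data;
};

// Plays the role of at::impl::MetaBase: declares the set_output() hook.
struct MetaBase {
    virtual void set_output(const std::vector<std::int64_t>& sizes) = 0;
    virtual ~MetaBase() = default;
};

// Plays the role of at::meta::upsample_nearest1d: meta() is written once per
// op. It checks arguments and reports the output shape via set_output().
struct upsample_nearest1d_meta : MetaBase {
    void meta(const Tensor& input, std::int64_t out_w) {
        assert(input.sizes.size() == 1 && input.sizes[0] > 0 && out_w > 0);
        set_output({out_w});
    }
};

// Plays the role of structured_upsample_nearest1d_out_cpu: impl() is written
// once per backend and assumes `out` already has the right size.
struct upsample_nearest1d_out_cpu : upsample_nearest1d_meta {
    void impl(const Tensor& input, Tensor& out) {
        const std::int64_t in_w = input.sizes[0];
        const std::int64_t out_w = out.sizes[0];
        for (std::int64_t i = 0; i < out_w; ++i) {
            out.data[static_cast<std::size_t>(i)] =
                input.data[static_cast<std::size_t>(i * in_w / out_w)];
        }
    }
};

// Plays the role of the codegen'd *_functional class: set_output() allocates.
struct upsample_nearest1d_out_cpu_functional final : upsample_nearest1d_out_cpu {
    Tensor output;
    void set_output(const std::vector<std::int64_t>& sizes) override {
        output.sizes = sizes;
        output.data.assign(static_cast<std::size_t>(sizes[0]), 0.0f);
    }
};

// Plays the role of the codegen'd *_out class: set_output() resizes the
// tensor supplied by the caller instead of allocating a new one.
struct upsample_nearest1d_out_cpu_out final : upsample_nearest1d_out_cpu {
    explicit upsample_nearest1d_out_cpu_out(Tensor& out) : output(out) {}
    Tensor& output;
    void set_output(const std::vector<std::int64_t>& sizes) override {
        output.sizes = sizes;
        output.data.resize(static_cast<std::size_t>(sizes[0]));
    }
};
```

Calling meta() and then impl() on either leaf class gives the same values; only where the output memory comes from differs.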
Under the hood: Dispatcher Registration
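Before (unstructured kernel), in build/aten/src/ATen/RegisterCPU.cpp: the registered wrapper forwards to the hand-written kernel.
After (structured kernel), in build/aten/src/ATen/RegisterCPU.cpp: the registered wrapper calls op.meta(), then op.impl().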
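A hedged sketch of the before/after difference, using a toy dispatch table rather than the real dispatcher. The registry, the key string and all type and function names are hypothetical. The assumption being illustrated is only that a structured wrapper creates the op object, runs meta(), runs impl() on the output that meta() prepared, and returns it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;  // toy stand-in for at::Tensor

// Toy structured kernel: meta() sizes the output through set_output(),
// impl() fills it in.
struct upsample_functional {
    Tensor output;
    void set_output(std::size_t n) { output.assign(n, 0.0f); }
    void meta(const Tensor& in, std::size_t out_w) {
        assert(!in.empty() && out_w > 0);
        set_output(out_w);
    }
    void impl(const Tensor& in, Tensor& out) {
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = in[i * in.size() / out.size()];
        }
    }
};

// "Before": the unstructured kernel is one function doing everything, and
// the registered wrapper would simply forward to it.
Tensor upsample_unstructured(const Tensor& in, std::size_t out_w) {
    assert(!in.empty() && out_w > 0);
    Tensor out(out_w);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = in[i * in.size() / out.size()];
    }
    return out;
}

// "After": the generated wrapper strings meta() and impl() together.
Tensor wrapper_upsample(const Tensor& in, std::size_t out_w) {
    upsample_functional op;
    op.meta(in, out_w);
    op.impl(in, op.output);
    return std::move(op.output);
}

// Toy dispatch table keyed by "op/backend" (not the real dispatcher).
using Kernel = std::function<Tensor(const Tensor&, std::size_t)>;
inline std::map<std::string, Kernel>& registry() {
    static std::map<std::string, Kernel> table;
    return table;
}
```

Either function can be stored under the same key, so callers see no difference; what changes is who writes the glue.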
Under the hood: TensorIterator Integration
Diagram labels:
at::impl::MetaBase
at::meta::add_Tensor: implements meta() (you write it)
at::native::structured_add_out: implements impl() (you write it)
cpu (build/aten/src/ATen/RegisterCPU.cpp, codegen’d): structured_add_out_out, structured_add_out_functional, structured_add_out_inplace; implements set_output() (codegen’d)
cuda (build/aten/src/ATen/RegisterCUDA.cpp, codegen’d): structured_add_out_out, structured_add_out_functional, structured_add_out_inplace; implements set_output() (codegen’d)
at::TensorIteratorBase
at::TensorIterator
Method on TensorIteratorBase
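One way to picture the integration, assuming (as the class list suggests) that the op's meta class sits on top of TensorIteratorBase and that the iterator reports its computed shape through set_output(). The sketch is a toy: tensors are 1-D float vectors, broadcasting is reduced to "equal sizes or size 1", and build_binary_op here is a simplified stand-in rather than the real method.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Tensor = std::vector<float>;  // toy 1-D tensor

struct MetaBase {
    virtual void set_output(std::size_t numel) = 0;
    virtual ~MetaBase() = default;
};

// Toy TensorIteratorBase: owns the broadcasting logic and reports the
// result size through set_output(), so meta() can stay a one-liner.
struct TensorIteratorBase : MetaBase {
    const Tensor* a = nullptr;
    const Tensor* b = nullptr;
    std::size_t numel = 0;

    void build_binary_op(const Tensor& lhs, const Tensor& rhs) {
        if (lhs.empty() || rhs.empty()) {
            throw std::invalid_argument("empty operand");
        }
        if (lhs.size() != rhs.size() && lhs.size() != 1 && rhs.size() != 1) {
            throw std::invalid_argument("sizes are not broadcastable");
        }
        a = &lhs;
        b = &rhs;
        numel = lhs.size() > rhs.size() ? lhs.size() : rhs.size();
        set_output(numel);
    }
};

// meta(): written by hand, once per operator.
struct add_meta : TensorIteratorBase {
    void meta(const Tensor& lhs, const Tensor& rhs) { build_binary_op(lhs, rhs); }
};

// impl(): written by hand; walks the operands that meta() configured.
struct add_out : add_meta {
    void impl(Tensor& out) {
        assert(out.size() == numel);
        for (std::size_t i = 0; i < numel; ++i) {
            out[i] = (*a)[a->size() == 1 ? 0 : i] + (*b)[b->size() == 1 ? 0 : i];
        }
    }
};

// set_output(): generated per variant; the functional one allocates.
struct add_out_functional final : add_out {
    Tensor output;
    void set_output(std::size_t n) override { output.assign(n, 0.0f); }
};
```

Because the iterator already knows the broadcast shape, the op author writes no shape code at all in this model.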
Under the hood: Shape Computation
In build/aten/src/ATen/RegisterMeta.cpp
Skipped call to op.impl()
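A small model of why skipping op.impl() is enough for shape computation: meta() already calls set_output() with the final sizes, so a wrapper that stops there returns a correctly shaped result without computing a single element. MetaTensor and the class and wrapper names are hypothetical, and a (N, C, W) input is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy meta tensor: carries sizes only, never any data (hypothetical type).
struct MetaTensor {
    std::vector<std::int64_t> sizes;
};

// meta() is the same shape function the CPU and CUDA kernels would reuse.
struct upsample_nearest1d_meta {
    virtual void set_output(const std::vector<std::int64_t>& sizes) = 0;
    virtual ~upsample_nearest1d_meta() = default;

    void meta(const MetaTensor& input, std::int64_t out_w) {
        assert(input.sizes.size() == 3 && out_w > 0);
        set_output({input.sizes[0], input.sizes[1], out_w});
    }
};

// For the meta variant, set_output() only records the shape.
struct upsample_nearest1d_meta_functional final : upsample_nearest1d_meta {
    MetaTensor output;
    void set_output(const std::vector<std::int64_t>& sizes) override {
        output.sizes = sizes;
    }
};

// Wrapper in the style of the Meta registration: meta() runs, and the call
// to impl() is skipped, so nothing is computed.
MetaTensor wrapper_upsample_nearest1d_meta(const MetaTensor& input, std::int64_t out_w) {
    upsample_nearest1d_meta_functional op;
    op.meta(input, out_w);
    // No op.impl(...) here: the shape is all the caller asked for.
    return op.output;
}
```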
CURRENT STATE
QUESTIONS?