Embrace... Extend... Extinguish...
Valentin Churavy
vchuravy@mit.edu
Yet another high-level language?
julia> function mandel(z)
           c = z
           maxiter = 80
           for n = 1:maxiter
               if abs(z) > 2
                   return n-1
               end
               z = z^2 + c
           end
           return maxiter
       end
julia> mandel(complex(.3, -.6))
14
Typical features
Dynamically typed, high-level syntax
Open-source, permissive license
Built-in package manager
Interactive development
Unusual features
Great performance!
JIT compilation (AOT-style code generation)
Most of Julia is written in Julia
Reflection and metaprogramming (quick example below)
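For example, reflection shows that even integer addition is an ordinary Julia method (a REPL sketch; output abbreviated):

julia> @which 1 + 2
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, ...} in Base at int.jl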
Embrace...
Julia is built on the shoulders of giants.
Example: UCX.jl
# Thin wrapper around the C function ucp_put_nb from libucp
function ucp_put_nb(ep, buffer, length, remote_addr, rkey, cb)
    ccall(
        (:ucp_put_nb, libucp),
        ucs_status_ptr_t,
        (ucp_ep_h, Ptr{Cvoid}, Csize_t, UInt64, ucp_rkey_h, ucp_send_callback_t),
        ep, buffer, length, remote_addr, rkey, cb)
end
# Callback invoked by UCX when the operation completes
function send_callback(req::Ptr{Cvoid}, status::API.ucs_status_t, user_data::Ptr{Cvoid})
    @assert user_data !== C_NULL
    request = UCXRequest(user_data)
    request.status = status
    notify(request)
    API.ucp_request_free(req)
    nothing
end
function put!(ep::UCXEndpoint, request, data::Ptr, nbytes, remote_addr, rkey)
    # Turn the Julia function into a C-callable function pointer
    cb = @cfunction(send_callback, Cvoid, (Ptr{Cvoid}, API.ucs_status_t, Ptr{Cvoid}))
    ptr = ucp_put_nb(ep, data, nbytes, remote_addr, rkey, cb)
    return handle_request(request, ptr)
end
function put!(ep::UCXEndpoint, buffer, nbytes, remote_addr, rkey)
    request = UCXRequest(ep, buffer) # rooted through ep.worker
    # Keep buffer alive while UCX may still access its memory
    GC.@preserve buffer begin
        data = pointer(buffer)
        put!(ep, request, data, nbytes, remote_addr, rkey)
    end
end
Introspection and staged metaprogramming
Stage     | Introspection                                | Metaprogramming
Code      | @edit, @which                                | Macros, string macros
AST       | @code_lowered                                | Generated functions
Typed IR  | @code_warntype, @code_typed [optimize=false] | Cassette.jl passes
LLVM IR   | @code_llvm [optimize=false]                  | llvmcall, LLVM.jl
Assembly  | @code_native                                 | LLVM.jl @asmcall
Julia: my favorite LLVM frontend
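Each stage can be inspected straight from the REPL; a small illustrative example (the function and argument are hypothetical):

f(x) = 2x + 1

@code_lowered f(1)   # lowered form (CodeInfo)
@code_typed f(1)     # type-inferred and optimized Julia IR
@code_llvm f(1)      # LLVM IR
@code_native f(1)    # native assembly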
Extend: Adding capabilities
The PSAAP-III Center (cesmix.mit.edu) is an example of using Julia for the outer loop while developing new capabilities.
Automatic gradient synthesis (language independent)
"Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients" (Moses and Churavy, NeurIPS 2020)
[Benchmark figure: % slowdown vs. Enzyme]
https://enzyme.mit.edu/
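As a quick illustration, a minimal sketch of calling Enzyme through Enzyme.jl (the function f and values here are hypothetical, and the exact autodiff signature varies across Enzyme.jl versions):

using Enzyme

f(x) = x^2 + 3x

# Reverse-mode AD: a derivative is returned for each Active argument
grads = autodiff(Reverse, f, Active, Active(2.0))
grads[1][1]   # 7.0, since f'(x) = 2x + 3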
LAMMPS.jl — prototype for extensibility
using LAMMPS
# ... (LAMMPS setup elided)
command(lmp, "fix julia_lj all external pf/callback 1 1")

# Pairwise Lennard-Jones force, called back from LAMMPS
function compute_force(rsq, itype, jtype)
    coeff = coefficients[itype][jtype]
    r2inv = 1.0/rsq
    r6inv = r2inv^3
    lj1 = coeff[1]
    lj2 = coeff[2]
    return (r6inv * (lj1*r6inv - lj2))*r2inv
end

# Pairwise Lennard-Jones energy
# (elided on the slide; reconstructed here to mirror compute_force)
function compute_energy(rsq, itype, jtype)
    coeff = coefficients[itype][jtype]
    r2inv = 1.0/rsq
    r6inv = r2inv^3
    lj3 = coeff[3]
    lj4 = coeff[4]
    return (r6inv * (lj3*r6inv - lj4))
end

# Register external fix
lj = LAMMPS.PairExternal(lmp, "julia_lj", "zero", compute_force, compute_energy, cutoff)
Extinguish
eth-cscs/ImplicitGlobalGrid.jl
Julia gets its power from extensible compiler design
Language design → efficient execution
"Julia: Dynamism and Performance Reconciled by Design" (doi:10.1145/3276490)
"Effective Extensible Programming: Unleashing Julia on GPUs" (doi:10.1109/TPDS.2018.2872064)
[Pipeline figure: AST → IR → xPU back ends (CPU, GPU)]
Why Julia for HPC?
Walks like Python, talks like Lisp, runs like Fortran
[Pipeline figure: AST → IR → xPU back ends]
Rich GPU ecosystem: CUDA.jl, AMDGPU.jl, oneAPI.jl, Metal.jl
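These back ends can also be targeted generically. A minimal sketch assuming KernelAbstractions.jl (not named on this slide), where a single kernel definition runs on any of the back ends above:

using KernelAbstractions

# One portable kernel definition: y .+= a .* x
@kernel function axpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = rand(Float32, 1024); y = zeros(Float32, 1024)
backend = get_backend(y)   # CPU() here; CUDABackend() for CuArrays, etc.
axpy!(backend)(y, 2f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)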
HPC suffers from the many-language problem: domain experts and performance engineers use different programming languages, creating a communication and collaboration bottleneck.
Magic of Julia
Abstraction, Specialization, and Multiple Dispatch
rand(N, M) * rand(K, M)'
Matrix * Transpose{Matrix}
function mul!(C::Matrix{T}, A::Matrix{T}, tB::Transpose{<:Any,<:Matrix{T}}, a, b) where {T<:BlasFloat}
    B = tB.parent  # unwrap the lazy transpose; no copy is made
    gemm_wrapper!(C, 'N', 'T', A, B, MulAddMul(a, b))
end
Did I really need to move memory for that transpose?
No, I did not! Each entry of ABᵀ is the dot product of a row of A with a row of B, so the product compiles down to a single gemm call ('N', 'T') with no transposed copy.
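An illustrative REPL check (not on the slide; transpose is used explicitly since ' on a real matrix yields the analogous lazy Adjoint wrapper):

julia> using LinearAlgebra

julia> B = rand(5, 3);

julia> typeof(transpose(B))   # lazy wrapper; no data is moved
Transpose{Float64, Matrix{Float64}}

julia> size(rand(4, 3) * transpose(B))   # hits the gemm path above
(4, 5)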
Parallel programming with 3-character changes: the array-type programming model
Array types — where memory resides and how code is executed
A = Matrix{Float64}(undef, 64, 32)    | CPU (Intel, IBM, Apple)
A = CuMatrix{Float64}(undef, 64, 32)  | NVIDIA (CUDA) GPU
A = ROCMatrix{Float64}(undef, 64, 32) | AMD (ROCm) GPU
Distribute(Blocks(16, 4), A)          | across nodes of a cluster; orthogonal to the types above
composes with
Array programming
using LinearAlgebra
loss(w, b, x, y) = sum(abs2, y - (w*x .+ b)) / size(y, 2)
loss∇w(w, b, x, y) = ...
loss∇b(w, b, x, y) = ...
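The elided gradients could be written out by hand; a hypothetical reconstruction from the loss definition above (not shown on the slide):

# r = y - (w*x .+ b); loss = sum(abs2, r)/n  =>  ∂loss/∂w = -2*r*x'/n
loss∇w(w, b, x, y) = -2 * (y .- (w*x .+ b)) * x' / size(y, 2)
loss∇b(w, b, x, y) = -2 * sum(y .- (w*x .+ b)) / size(y, 2)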
function train(w, b, x, y; lr=.1)
    w -= lmul!(lr, loss∇w(w, b, x, y))
    b -= lr * loss∇b(w, b, x, y)
    return w, b
end
n = 100; p = 10
x = randn(n, p)'
y = sum(x[1:5, :]; dims=1) .+ randn(n)' * 0.1
w = 0.0001 * randn(1, p)
b = 0.0
for i = 1:50
    w, b = train(w, b, x, y)
end
# The 3-character changes: move the data to the GPU
x = CuArray(x)
y = CuArray(y)
w = CuArray(w)
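After these three changes, the same training loop runs unchanged on the GPU (illustrative sketch):

for i = 1:50
    w, b = train(w, b, x, y)   # identical code, now executing on the GPU
end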
Sustain
Julia's development cycle:
https://julialang.org/blog/2023/04/julia-1.9-highlights
https://julialang.org/blog/2019/08/release-process
Examples: Celeste.jl, CLIMA, CUDA, CESMIX, AMDGPU, oneAPI, Metal.jl
Ecosystem verification
CI and benchmarking: packages all share a similar structure.
95% of Julia packages in the registry had some form of CI (youtube.com/watch?v=9YWwiFbaRx8)
Packages often test against Julia nightly.
Who develops Julia?
https://julialang.org/blog/2019/02/julia-entities/
Julia Lab @ MIT: mixed CS/applied-mathematics research lab
JuliaHub (née Julia Computing)
Community of individuals, research groups, and companies
Industry, academic, and national lab partners have brought these improved compiler tools to production
Automating parallelism and performance is transforming all collaborations
Faster drug development
More efficient batteries
Energy-efficient buildings
Climate modeling for improved agriculture
JuliaCon: Yearly user and developer meetup
25th to 29th of July, 2023
Cambridge, MA
https://juliacon.org/2023
Monthly HPC user group call
4th Tuesday of the month at 2pm EST (next: May 23rd)
Details: https://julialang.org/community/ — Events