Agenda

New v3.8.3 patch release
Progress on the oneAPI backend
Update documentation style
Fast Math
Misc internal updates

ArrayFire v3.8.3

Support for CUDA 12

Minimum supported CUDA Toolkit raised to 10.2

Improved performance in JIT, memcpy, and join
OpenCL 3.0 migration

ArrayFire oneAPI Backend

Many standalone kernels ported
JIT not ported

JIT will need to be implemented for PTX and OpenCL
Other backends may require SPIR-V
Will reuse a lot of existing infrastructure

Memory manager not ported; memory is currently allocated directly
Many changes already in the master branch
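
To illustrate what is missing while the memory manager is not ported, here is a minimal sketch of a caching allocator. It is not ArrayFire's memory manager: `CachingAllocator` and its methods are hypothetical names, and `std::malloc`/`std::free` stand in for whatever device allocation calls the backend would use. The only point is the contrast with direct allocation: released buffers are kept and handed back for later requests of the same size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical caching allocator (illustration only, not ArrayFire code).
// std::malloc / std::free stand in for device allocation calls.
class CachingAllocator {
public:
    ~CachingAllocator() {
        // Free every cached buffer. Buffers still held by callers are
        // not tracked for cleanup in this sketch.
        for (auto& entry : free_) {
            for (void* p : entry.second) std::free(p);
        }
    }

    // Return a cached buffer of exactly `bytes` if one exists,
    // otherwise fall back to a direct allocation.
    void* alloc(std::size_t bytes) {
        auto it = free_.find(bytes);
        if (it != free_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            live_[p] = bytes;
            ++hits_;
            return p;
        }
        void* p = std::malloc(bytes);
        if (p) live_[p] = bytes;
        return p;
    }

    // Keep the buffer for reuse instead of freeing it immediately.
    void release(void* p) {
        auto it = live_.find(p);
        if (it == live_.end()) return;  // unknown pointer: ignore
        free_[it->second].push_back(p);
        live_.erase(it);
    }

    std::size_t cacheHits() const { return hits_; }

private:
    std::map<std::size_t, std::vector<void*>> free_;  // size -> cached buffers
    std::map<void*, std::size_t> live_;               // buffer -> size
    std::size_t hits_ = 0;
};
```

With direct allocation, every `alloc` would reach the underlying allocator; here a release followed by a same-size request returns the cached buffer.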

ArrayFire oneAPI Backend

JIT reuse doesn’t seem workable

Lots of minor differences between backends prevent this
Could be revisited in the future, but dropped for now
Only targeting OpenCL-based kernels for now; expand to CUDA afterwards

ArrayFire Fast Math

A CMake flag to enable fast-math optimizations
Enables compiler flags that use less precise but faster hardware units
Performs the following operations:

Passes --use_fast_math to NVCC and NVRTC
Enables TF32 Ops and Atomics in cuBLAS
Passes -cl-fast-relaxed-math in OpenCL
Enables less precise flags in all compilers
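
The per-backend options above can be sketched as a small lookup. This is an illustration under assumptions, not ArrayFire's build logic: the `Backend` enum and `fastMathFlags` function are hypothetical, and only the two flag strings come from the slide.

```cpp
#include <string>
#include <vector>

// Hypothetical backend tag for this sketch; not an ArrayFire type.
enum class Backend { CudaNvcc, CudaNvrtc, OpenCL };

// Extra compile options a fast-math build might add for each backend.
// Returns an empty list when fast math is disabled.
std::vector<std::string> fastMathFlags(Backend backend, bool fastMath) {
    if (!fastMath) return {};
    switch (backend) {
        case Backend::CudaNvcc:
        case Backend::CudaNvrtc:
            return {"--use_fast_math"};        // NVCC and NVRTC option
        case Backend::OpenCL:
            return {"-cl-fast-relaxed-math"};  // OpenCL build option
    }
    return {};
}
```

For example, `fastMathFlags(Backend::OpenCL, true)` yields the single option `-cl-fast-relaxed-math`, which would be appended to the OpenCL program build options.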

Addresses several issues caused by missing inf and NaN handling in some fast-math operations
Should be done at runtime

Misc Internal Changes

CMake refactored; minimum version is now 3.10.2

Uses CUDA language support in CMake

Improved vcpkg and Spack support
Updated GTest, which now shows skipped tests
Added support for Intel compilers
Moved all internal symbols into the arrayfire namespace to address issues with symbols in newer CUDA and oneAPI releases

Documentation Improvements