1 of 8

2 of 8

Agenda

  • New v3.8.3 patch release
  • Progress on the oneAPI backend
  • Update documentation style
  • Fast Math
  • Misc internal updates

3 of 8

ArrayFire v3.8.3

  • Support for CUDA 12
    • Raised the minimum supported CUDA Toolkit to 10.2
  • Improved performance in JIT, memcpy, and join
  • OpenCL 3.0 migration

4 of 8

ArrayFire oneAPI Backend

  • Many standalone kernels ported
  • JIT not ported
    • JIT will need to be implemented for PTX and OpenCL
    • Other backends may require SPIR-V
    • Will reuse a lot of existing infrastructure
  • Memory manager not ported; currently allocating memory directly
  • Many changes already in the master branch
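
For context on what "directly allocating" gives up, here is a minimal sketch of a caching memory manager. It is not ArrayFire's memory manager: `CachingAllocator` and its members are hypothetical names, and `std::malloc` stands in for a device allocation call. The point is only that released buffers are kept in size-keyed free lists and reused, so repeated same-size requests avoid a native allocation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical caching allocator: released buffers are kept in per-size
// free lists and handed back out on the next request of the same size.
class CachingAllocator {
public:
    ~CachingAllocator() {
        // Free only the cached buffers; buffers still held by callers
        // are the callers' responsibility in this sketch.
        for (auto& entry : free_)
            for (void* p : entry.second) std::free(p);
    }

    void* alloc(std::size_t bytes) {
        auto it = free_.find(bytes);
        if (it != free_.end() && !it->second.empty()) {
            void* p = it->second.back();  // reuse a cached buffer
            it->second.pop_back();
            ++cache_hits_;
            return p;
        }
        ++native_allocs_;                 // fall back to a "direct" allocation
        return std::malloc(bytes);
    }

    // Return a buffer to the cache instead of freeing it immediately.
    void release(void* p, std::size_t bytes) { free_[bytes].push_back(p); }

    std::size_t native_allocs() const { return native_allocs_; }
    std::size_t cache_hits() const { return cache_hits_; }

private:
    std::unordered_map<std::size_t, std::vector<void*>> free_;
    std::size_t native_allocs_ = 0;
    std::size_t cache_hits_ = 0;
};
```

Allocating 1024 bytes, releasing them, and allocating 1024 bytes again returns the same pointer with only one native allocation. A direct-allocation backend would pay for two.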

5 of 8

ArrayFire oneAPI Backend

  • JIT reuse doesn’t seem workable
    • Lots of minor differences between backends prevent this
    • Could be addressed in the future but dropping for now
    • Only targeting OpenCL-based kernels for now; expand to CUDA afterward
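
To illustrate the kind of work a JIT backend does, here is a toy sketch that turns an expression tree into the body of an element-wise kernel as source text. It is an assumption-laden illustration, not ArrayFire's JIT: `Node`, `Buffer`, `Binary`, and `kernel_body` are hypothetical names, and the emitted string only resembles OpenCL C. Small per-backend differences in the emitted text (qualifiers, built-in names, indexing) are the sort of thing that makes sharing one generator across backends hard.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression-tree node that can print itself as kernel source.
struct Node {
    virtual ~Node() = default;
    virtual std::string emit() const = 0;
};

// Leaf: reads element i of a named input buffer.
struct Buffer : Node {
    std::string name;
    explicit Buffer(std::string n) : name(std::move(n)) {}
    std::string emit() const override { return name + "[i]"; }
};

// Interior node: a binary arithmetic operator applied element-wise.
struct Binary : Node {
    char op;
    std::shared_ptr<Node> lhs, rhs;
    Binary(char o, std::shared_ptr<Node> l, std::shared_ptr<Node> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string emit() const override {
        return "(" + lhs->emit() + " " + op + " " + rhs->emit() + ")";
    }
};

// Fuse the whole tree into a single statement, so one kernel evaluates
// the full expression without intermediate buffers.
inline std::string kernel_body(const Node& root) {
    return "out[i] = " + root.emit() + ";";
}
```

For the tree `a * b + c`, `kernel_body` returns `out[i] = ((a[i] * b[i]) + c[i]);`.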

6 of 8

ArrayFire Fast Math

  • A CMake flag to enable fast-math optimizations
  • Enables compiler flags that use less precise but faster hardware units
  • Performs the following operations:
    • Passes --use_fast_math to NVCC and NVRTC
    • Enables TF32 Ops and Atomics in cuBLAS
    • Passes -cl-fast-relaxed-math in OpenCL
    • Enables less precise flags in all compilers
  • Addresses several issues caused by the lack of inf and nan support in some fast-math operations
  • Ideally, fast math should be selectable at runtime rather than at build time

7 of 8

Misc Internal Changes

  • CMake build refactored for version 3.10.2
    • Uses CUDA language support in CMake
  • Improved vcpkg and Spack support
  • Updated GTest, which now shows skipped tests
  • Added support for Intel compilers
  • Put all internal symbols in the arrayfire namespace to address conflicts with new CUDA and oneAPI symbols
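
The namespace change can be illustrated with a small sketch. The global `half` below is a stand-in for a symbol that a vendor header might introduce, and the nested `arrayfire::common` layout is used here for illustration only. Once the library's own type lives inside a namespace, both declarations can coexist in one translation unit.

```cpp
#include <cstdint>

// Stand-in for a global symbol introduced by a vendor toolkit header.
struct half {
    std::uint16_t bits;
};
inline int kind(const half&) { return 1; }  // "vendor" overload

namespace arrayfire {
namespace common {

// The library's internal type with the same unqualified name. Inside the
// namespace it no longer collides with the global declaration.
struct half {
    std::uint16_t data;
};
inline int kind(const half&) { return 2; }  // "library" overload

}  // namespace common
}  // namespace arrayfire
```

Calling `kind` on a `::half` selects the global overload and returns 1, while calling it on an `arrayfire::common::half` finds the namespaced overload through argument-dependent lookup and returns 2.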

8 of 8

Documentation Improvements