Batch-efficient EigenDecomposition for Small and Medium Matrices
Yue Song, Nicu Sebe, Wei Wang
University of Trento
Background
EigenDecomposition (ED) is at the heart of many computer vision algorithms. One crucial bottleneck limiting its usage is the expensive computational cost, particularly for a mini-batch of matrices in deep networks.
In this paper, we propose a QR-based batch-efficient ED method dedicated to computer vision application scenarios, which fully utilizes the parallel computing power of GPUs.
Methodology
Our batch-efficient ED solver dedicated to small matrices consists of two steps:
1. Householder tridiagonalization: $T = Q^{\top} A Q$, where $T$ is a tri-diagonal matrix and $Q$ is the product of Householder reflectors;
2. QR iteration: $T_k = Q_k R_k$, $T_{k+1} = R_k Q_k$, where successive iterations are performed until $T_k$ converges to a diagonal matrix.
Batched Householder Reflection
By using the Householder reflectors $H_i$, $i = 1, \dots, n-2$, the matrix $A$ is reduced to tri-diagonal form as: $T = H_{n-2} \cdots H_1 \, A \, H_1 \cdots H_{n-2}$.
Each Householder reflection can be calculated as: $H = I - 2\,\frac{vv^{\top}}{v^{\top}v}$, where $v$ is chosen to annihilate the entries below the sub-diagonal of the current column.
The whole process can be implemented via matrix-vector products, which are GPU- and batch-efficient.
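To make the batched step concrete, here is a minimal PyTorch sketch of standard Householder tridiagonalization (our own illustration, not the authors' released implementation; the name `batch_householder_tridiag` is ours):

```python
import torch

def batch_householder_tridiag(A: torch.Tensor) -> torch.Tensor:
    """Reduce a batch of symmetric matrices (B, n, n) to tri-diagonal form.

    Uses n-2 reflections H = I - 2 v v^T (v unit-norm), expanded into
    batched rank-1/rank-2 updates, so the Python loop length depends on
    n only, never on the batch size B.
    """
    T = A.clone()
    B, n, _ = T.shape
    for i in range(n - 2):
        x = T[:, i + 1:, i]                                    # column below the diagonal
        alpha = -torch.copysign(x.norm(dim=1, keepdim=True), x[:, :1])
        v = x.clone()
        v[:, :1] -= alpha                                      # v = x - alpha * e1
        v = v / v.norm(dim=1, keepdim=True).clamp_min(1e-20)
        S = T[:, i + 1:, i + 1:]                               # trailing block (a view of T)
        w = torch.einsum('bij,bj->bi', S, v)                   # S v
        k = (v * w).sum(dim=1, keepdim=True)                   # v^T S v
        # H S H = S - 2 v w^T - 2 w v^T + 4 (v^T S v) v v^T
        S -= 2 * (v.unsqueeze(2) * w.unsqueeze(1) + w.unsqueeze(2) * v.unsqueeze(1))
        S += 4 * k.unsqueeze(2) * (v.unsqueeze(2) * v.unsqueeze(1))
        # The bordering column becomes (alpha, 0, ..., 0); mirror it by symmetry.
        T[:, i + 1:, i] = 0.0
        T[:, i + 1, i] = alpha[:, 0]
        T[:, i, i + 1:] = T[:, i + 1:, i]
    return T

# Quick self-check: the similarity transform preserves eigenvalues.
A = torch.randn(8, 16, 16); A = A + A.transpose(-1, -2)
T = batch_householder_tridiag(A)
print(torch.allclose(torch.linalg.eigvalsh(A), torch.linalg.eigvalsh(T), atol=1e-4))
```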
Batched QR Iteration
The QR iteration can be efficiently performed by Givens rotations as: $T_k = Q_k R_k$, $T_{k+1} = R_k Q_k = Q_k^{\top} T_k Q_k$, where $Q_k$ is the product of Givens rotations $G(i, i+1, \theta)$ that annihilate the sub-diagonal entries of $T_k$.
The diagonal eigenvalue matrix is calculated as the limit: $\Lambda = \lim_{k \to \infty} T_k$.
The complexity of the basic QR algorithm is $O(n^5)$, which limits its application to tiny matrices only.
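A minimal sketch of one such batched sweep (unshifted, with dense indexing for readability; `batch_qr_sweep` is our name, not an API from the paper):

```python
import torch

def batch_qr_sweep(T: torch.Tensor) -> torch.Tensor:
    """One unshifted QR iteration T_{k+1} = R_k Q_k on a batch of
    tri-diagonal matrices T of shape (B, n, n), using n-1 Givens
    rotations G(i, i+1) shared across the batch dimension."""
    B, n, _ = T.shape
    R = T.clone()
    rotations = []
    for i in range(n - 1):
        a, b = R[:, i, i], R[:, i + 1, i]
        r = torch.sqrt(a * a + b * b).clamp_min(1e-20)
        c, s = a / r, b / r
        rotations.append((c, s))
        # Apply G^T from the left: mixes rows i and i+1, zeroing R[i+1, i].
        ri, rj = R[:, i, :].clone(), R[:, i + 1, :].clone()
        R[:, i, :] = c.unsqueeze(1) * ri + s.unsqueeze(1) * rj
        R[:, i + 1, :] = -s.unsqueeze(1) * ri + c.unsqueeze(1) * rj
    # T_{k+1} = R Q: apply each G from the right, mixing columns i and i+1.
    for i, (c, s) in enumerate(rotations):
        ci, cj = R[:, :, i].clone(), R[:, :, i + 1].clone()
        R[:, :, i] = c.unsqueeze(1) * ci + s.unsqueeze(1) * cj
        R[:, :, i + 1] = -s.unsqueeze(1) * ci + c.unsqueeze(1) * cj
    return R
```

In exact arithmetic the result stays symmetric tri-diagonal, so repeated sweeps drive $T_k$ toward the diagonal eigenvalue matrix.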
Acceleration: Double Wilkinson Shifts
Since the convergence of QR depends on the eigenvalue ratio $|\lambda_{k+1}/\lambda_{k}|$, it is natural to shift the matrix by $sI$ such that the new convergence rate is accelerated to $|(\lambda_{k+1}-s)/(\lambda_{k}-s)|$. We extract the shift coefficients from the 2×2 block at the bottom-right corner: the two shifts $s_1$ and $s_2$ are its eigenvalues.
After attaining the shifts, we can reformulate the QR iterations with double shifts: $T_k - s_1 I = Q_k R_k$, $T_{k+1} = R_k Q_k + s_1 I$, followed by $T_{k+1} - s_2 I = Q_{k+1} R_{k+1}$, $T_{k+2} = R_{k+1} Q_{k+1} + s_2 I$.
The shifting operation drives $|(\lambda_{k+1}-s)/(\lambda_{k}-s)|$ towards zero regardless of the original eigenvalue gaps, which makes the convergence speed of a mini-batch of matrices consistent.
Each shifted QR iteration can deflate at least one column and row.
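A sketch of extracting the two shifts in closed form from the trailing $2\times 2$ block of every matrix in the batch (standard textbook formula; function names are ours, and `batch_shifted_qr_step` reuses `batch_qr_sweep` from the sketch above):

```python
import torch

def batch_double_shifts(T: torch.Tensor):
    """Return (s1, s2): the eigenvalues of each trailing 2x2 block
    [[a, b], [b, d]] of a batch of symmetric tri-diagonal matrices,
    computed as (a+d)/2 -/+ sqrt((a-d)^2/4 + b^2)."""
    a, d = T[:, -2, -2], T[:, -1, -1]
    b = T[:, -1, -2]
    mid = 0.5 * (a + d)
    rad = torch.sqrt(0.25 * (a - d) ** 2 + b * b)
    return mid - rad, mid + rad

def batch_shifted_qr_step(T: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """One explicit single-shift step: T - sI = QR, T <- RQ + sI.
    Applying it with s1 and then s2 realizes one double-shift iteration."""
    eye = torch.eye(T.shape[-1], dtype=T.dtype, device=T.device)
    shift = s.view(-1, 1, 1) * eye
    return batch_qr_sweep(T - shift) + shift
```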
Acceleration: Progressive Dimension Shrinkage
We check the convergence of the bottom-right corner by testing whether the last off-diagonal entry vanishes: $|t_{n,n-1}| \leq \epsilon\,(|t_{n-1,n-1}| + |t_{n,n}|)$.
If the value approaches zero, the trailing eigenvalue has converged to $t_{n,n}$ and we can shrink the working matrix to its leading block: $T \leftarrow T_{1:n-1,\,1:n-1}$.
This is a direct benefit of Wilkinson shifts.
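A sketch of the per-matrix convergence test (the tolerance value is our own choice):

```python
import torch

def batch_tail_converged(T: torch.Tensor, tol: float = 1e-7) -> torch.Tensor:
    """Boolean mask (B,): True where the last off-diagonal entry is
    negligible relative to its diagonal neighbours, i.e. the trailing
    eigenvalue has converged and the active block can shrink by one."""
    off = T[:, -1, -2].abs()
    scale = T[:, -2, -2].abs() + T[:, -1, -1].abs()
    return off <= tol * scale
```

In a batched setting the matrices share one tensor, so one natural realization of shrinkage is to narrow the working slice once every matrix in the batch has converged at that position; subsequent iterations then operate on a strictly smaller active block.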
Acceleration: Economic Eigenvector Computation
In the basic QR iteration, the eigenvectors are accumulated over all columns as: $V_{k+1} = V_k Q_k$, starting from $V_0 = H_1 \cdots H_{n-2}$.
The theorem implies that the calculation can be simplified by involving only part of the columns: after $r$ trailing eigenvalues have been deflated, the remaining Givens rotations act only on the leading block, so only the first $n-r$ columns of $V_k$ need to be updated.
The computational complexity is greatly reduced accordingly, saving on the order of $(2r+1)n^4$ operations (see the summary below).
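A hedged sketch of the economic accumulation: because deflated trailing columns are never rotated again, each Givens rotation touches exactly two active columns of the eigenvector matrix (the function and its name are ours, matching the rotation convention of `batch_qr_sweep` above):

```python
import torch

def accumulate_givens(V: torch.Tensor, i: int,
                      c: torch.Tensor, s: torch.Tensor) -> None:
    """In-place update V <- V @ G(i, i+1) for a batch V of shape (B, n, n).
    During later sweeps, i only ranges over the active (non-deflated)
    leading block, so converged columns are skipped entirely."""
    vi, vj = V[:, :, i].clone(), V[:, :, i + 1].clone()
    V[:, :, i] = c.unsqueeze(1) * vi + s.unsqueeze(1) * vj
    V[:, :, i + 1] = -s.unsqueeze(1) * vi + c.unsqueeze(1) * vj
```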
Computational Complexity Summary
Our ED solver reduces the time complexity from $O(4n^5)$ to $O(n^3)$ for computing the eigenvalues, and from $O(2n^5)$ to $O(\frac{2}{3}n^5)$ for computing the eigenvectors, with an additional reduction term of $-(2r+1)n^4$ from the economic eigenvector computation.
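For a rough sense of scale (our arithmetic on the stated bounds, not a measured result), at $n = 32$:
\[
\frac{4n^5}{n^3} = 4n^2 = 4096 \quad \text{(eigenvalues)}, \qquad
\frac{2n^5}{\tfrac{2}{3}n^5} = 3 \quad \text{(eigenvectors, before the } -(2r+1)n^4 \text{ term)}
\]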
Experiments: Numerical Tests
For matrices whose dimensions are smaller than 24, our Batched ED is consistently faster than the SVD at any batch size. When the matrix dimension is 32, our method is faster than the SVD for batch sizes of 256 and above.
Experiments: Decorrelated Batch Normalization
Depending on the number of groups, our method can be 2×, 10×, and even 28× faster.
Experiments: Second-order Vision Transformer
Our method is about 44% and 27% faster than the SVD for covariance matrices of different sizes.
Experiments: Neural Style Transfer
Thanks!