Batch-efficient EigenDecomposition for Small and Medium Matrices
Yue Song, Nicu Sebe, Wei Wang
University of Trento
Background
EigenDecomposition (ED) is at the heart of many computer vision algorithms. One crucial bottleneck limiting its usage is the expensive computational cost, particularly for a mini-batch of matrices in deep networks.
In this paper, we propose a QR-based batch-efficient ED method dedicated to computer vision application scenarios, which fully utilizes the parallel computing power of GPUs.
Methodology
Our batch-efficient ED solver dedicated to small matrices consists of two steps:
1. Householder tridiagonalization: $T = Q^{\top} A Q$, where $T$ is a tri-diagonal matrix and $Q$ is the product of Householder reflectors;
2. QR iteration: $T_k = Q_k R_k$, $T_{k+1} = R_k Q_k$, where successive iterations are performed until $T_k$ converges to a diagonal matrix.
Batched Householder Reflection
By using the Householder reflectors $H_i$, $i = 1, \dots, n-2$, the matrix $A$ is reduced to tri-diagonal form as: $T = H_{n-2} \cdots H_1 \, A \, H_1 \cdots H_{n-2}$.
Each Householder reflection can be calculated as: $H = I - 2\,\frac{vv^{\top}}{v^{\top}v}$, where $v$ is chosen to annihilate the entries below the sub-diagonal of the current column.
The whole process can be implemented via matrix-vector products, which are GPU- and batch-efficient.
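To make the batched step concrete, here is a minimal PyTorch sketch of standard Householder tridiagonalization (our own illustration, not the authors' released implementation; the name `batch_householder_tridiag` is ours):

```python
import torch

def batch_householder_tridiag(A: torch.Tensor) -> torch.Tensor:
    """Reduce a batch of symmetric matrices (B, n, n) to tri-diagonal form.

    Uses n-2 reflections H = I - 2 v v^T (v unit-norm), expanded into
    batched rank-1/rank-2 updates, so the Python loop length depends on
    n only, never on the batch size B.
    """
    T = A.clone()
    B, n, _ = T.shape
    for i in range(n - 2):
        x = T[:, i + 1:, i]                                    # column below the diagonal
        alpha = -torch.copysign(x.norm(dim=1, keepdim=True), x[:, :1])
        v = x.clone()
        v[:, :1] -= alpha                                      # v = x - alpha * e1
        v = v / v.norm(dim=1, keepdim=True).clamp_min(1e-20)
        S = T[:, i + 1:, i + 1:]                               # trailing block (a view of T)
        w = torch.einsum('bij,bj->bi', S, v)                   # S v
        k = (v * w).sum(dim=1, keepdim=True)                   # v^T S v
        # H S H = S - 2 v w^T - 2 w v^T + 4 (v^T S v) v v^T
        S -= 2 * (v.unsqueeze(2) * w.unsqueeze(1) + w.unsqueeze(2) * v.unsqueeze(1))
        S += 4 * k.unsqueeze(2) * (v.unsqueeze(2) * v.unsqueeze(1))
        # The bordering column becomes (alpha, 0, ..., 0); mirror it by symmetry.
        T[:, i + 1:, i] = 0.0
        T[:, i + 1, i] = alpha[:, 0]
        T[:, i, i + 1:] = T[:, i + 1:, i]
    return T

# Quick self-check: the similarity transform preserves eigenvalues.
A = torch.randn(8, 16, 16); A = A + A.transpose(-1, -2)
T = batch_householder_tridiag(A)
print(torch.allclose(torch.linalg.eigvalsh(A), torch.linalg.eigvalsh(T), atol=1e-4))
```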
Batched QR Iteration
The QR iteration can be efficiently performed by Givens rotations as: $T_k = Q_k R_k$, $T_{k+1} = R_k Q_k = Q_k^{\top} T_k Q_k$, where $Q_k$ is the product of Givens rotations $G(i, i+1, \theta)$ that annihilate the sub-diagonal entries of $T_k$.
The diagonal eigenvalue matrix is calculated as the limit: $\Lambda = \lim_{k \to \infty} T_k$.
The complexity of the basic QR algorithm is $O(n^5)$, which limits its application to tiny matrices only.
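A minimal sketch of one such batched sweep (unshifted, with dense indexing for readability; `batch_qr_sweep` is our name, not an API from the paper):

```python
import torch

def batch_qr_sweep(T: torch.Tensor) -> torch.Tensor:
    """One unshifted QR iteration T_{k+1} = R_k Q_k on a batch of
    tri-diagonal matrices T of shape (B, n, n), using n-1 Givens
    rotations G(i, i+1) shared across the batch dimension."""
    B, n, _ = T.shape
    R = T.clone()
    rotations = []
    for i in range(n - 1):
        a, b = R[:, i, i], R[:, i + 1, i]
        r = torch.sqrt(a * a + b * b).clamp_min(1e-20)
        c, s = a / r, b / r
        rotations.append((c, s))
        # Apply G^T from the left: mixes rows i and i+1, zeroing R[i+1, i].
        ri, rj = R[:, i, :].clone(), R[:, i + 1, :].clone()
        R[:, i, :] = c.unsqueeze(1) * ri + s.unsqueeze(1) * rj
        R[:, i + 1, :] = -s.unsqueeze(1) * ri + c.unsqueeze(1) * rj
    # T_{k+1} = R Q: apply each G from the right, mixing columns i and i+1.
    for i, (c, s) in enumerate(rotations):
        ci, cj = R[:, :, i].clone(), R[:, :, i + 1].clone()
        R[:, :, i] = c.unsqueeze(1) * ci + s.unsqueeze(1) * cj
        R[:, :, i + 1] = -s.unsqueeze(1) * ci + c.unsqueeze(1) * cj
    return R
```

In exact arithmetic the result stays symmetric tri-diagonal, so repeated sweeps drive $T_k$ toward the diagonal eigenvalue matrix.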
Acceleration: Double Wilkinson Shifts
Since the convergence of QR depends on the eigenvalue ratio $|\lambda_{k+1}/\lambda_{k}|$, it is natural to shift the matrix by $sI$ such that the new convergence rate is accelerated to $|(\lambda_{k+1}-s)/(\lambda_{k}-s)|$. We extract the shift coefficients from the 2×2 block at the bottom-right corner: the two shifts $s_1$ and $s_2$ are its eigenvalues.
After attaining the shifts, we can reformulate the QR iterations with double shifts: $T_k - s_1 I = Q_k R_k$, $T_{k+1} = R_k Q_k + s_1 I$, followed by $T_{k+1} - s_2 I = Q_{k+1} R_{k+1}$, $T_{k+2} = R_{k+1} Q_{k+1} + s_2 I$.
The shifting operation drives $|(\lambda_{k+1}-s)/(\lambda_{k}-s)|$ towards zero regardless of the original eigenvalue gaps, which makes the convergence speed of a mini-batch of matrices consistent.
Each shifted QR iteration can deflate at least one column and row.
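A sketch of extracting the two shifts in closed form from the trailing $2\times 2$ block of every matrix in the batch (standard textbook formula; function names are ours, and `batch_shifted_qr_step` reuses `batch_qr_sweep` from the sketch above):

```python
import torch

def batch_double_shifts(T: torch.Tensor):
    """Return (s1, s2): the eigenvalues of each trailing 2x2 block
    [[a, b], [b, d]] of a batch of symmetric tri-diagonal matrices,
    computed as (a+d)/2 -/+ sqrt((a-d)^2/4 + b^2)."""
    a, d = T[:, -2, -2], T[:, -1, -1]
    b = T[:, -1, -2]
    mid = 0.5 * (a + d)
    rad = torch.sqrt(0.25 * (a - d) ** 2 + b * b)
    return mid - rad, mid + rad

def batch_shifted_qr_step(T: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """One explicit single-shift step: T - sI = QR, T <- RQ + sI.
    Applying it with s1 and then s2 realizes one double-shift iteration."""
    eye = torch.eye(T.shape[-1], dtype=T.dtype, device=T.device)
    shift = s.view(-1, 1, 1) * eye
    return batch_qr_sweep(T - shift) + shift
```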
Acceleration: Progressive Dimension Shrinkage
We check the convergence of the bottom-right corner by testing whether the last off-diagonal entry vanishes: $|t_{n,n-1}| \leq \epsilon\,(|t_{n-1,n-1}| + |t_{n,n}|)$.
If the value approaches zero, the trailing eigenvalue has converged to $t_{n,n}$ and we can shrink the working matrix to its leading block: $T \leftarrow T_{1:n-1,\,1:n-1}$.
This is a direct benefit of Wilkinson shifts.
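A sketch of the per-matrix convergence test (the tolerance value is our own choice):

```python
import torch

def batch_tail_converged(T: torch.Tensor, tol: float = 1e-7) -> torch.Tensor:
    """Boolean mask (B,): True where the last off-diagonal entry is
    negligible relative to its diagonal neighbours, i.e. the trailing
    eigenvalue has converged and the active block can shrink by one."""
    off = T[:, -1, -2].abs()
    scale = T[:, -2, -2].abs() + T[:, -1, -1].abs()
    return off <= tol * scale
```

In a batched setting the matrices share one tensor, so one natural realization of shrinkage is to narrow the working slice once every matrix in the batch has converged at that position; subsequent iterations then operate on a strictly smaller active block.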
Acceleration: Economic Eigenvector Computation
In the basic QR iteration, the eigenvectors are accumulated over all columns as: $V_{k+1} = V_k Q_k$, starting from $V_0 = H_1 \cdots H_{n-2}$.
The theorem implies that the calculation can be simplified by involving only part of the columns: after $r$ trailing eigenvalues have been deflated, the remaining Givens rotations act only on the leading block, so only the first $n-r$ columns of $V_k$ need to be updated.
The computational complexity is greatly reduced accordingly, saving on the order of $(2r+1)n^4$ operations (see the summary below).
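A hedged sketch of the economic accumulation: because deflated trailing columns are never rotated again, each Givens rotation touches exactly two active columns of the eigenvector matrix (the function and its name are ours, matching the rotation convention of `batch_qr_sweep` above):

```python
import torch

def accumulate_givens(V: torch.Tensor, i: int,
                      c: torch.Tensor, s: torch.Tensor) -> None:
    """In-place update V <- V @ G(i, i+1) for a batch V of shape (B, n, n).
    During later sweeps, i only ranges over the active (non-deflated)
    leading block, so converged columns are skipped entirely."""
    vi, vj = V[:, :, i].clone(), V[:, :, i + 1].clone()
    V[:, :, i] = c.unsqueeze(1) * vi + s.unsqueeze(1) * vj
    V[:, :, i + 1] = -s.unsqueeze(1) * vi + c.unsqueeze(1) * vj
```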
Computational Complexity Summary
Our ED solver reduces the time complexity from $O(4n^5)$ to $O(n^3)$ for computing the eigenvalues, and from $O(2n^5)$ to $O(\frac{2}{3}n^5)$ for computing the eigenvectors, with an additional reduction term of $-(2r+1)n^4$ from the economic eigenvector computation.
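For a rough sense of scale (our arithmetic on the stated bounds, not a measured result), at $n = 32$:
\[
\frac{4n^5}{n^3} = 4n^2 = 4096 \quad \text{(eigenvalues)}, \qquad
\frac{2n^5}{\tfrac{2}{3}n^5} = 3 \quad \text{(eigenvectors, before the } -(2r+1)n^4 \text{ term)}
\]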
Experiments: Numerical Tests
For matrices whose dimensions are smaller than 24, our Batched ED is consistently faster than the SVD at any batch size. When the matrix dimension is 32, our method is faster than the SVD for batch sizes of 256 and above.
Experiments: Decorrelated Batch Normalization
Depending on the number of groups, our method can be 2×, 10×, and even 28× faster.
Experiments: Second-order Vision Transformer
Our method is about 44% and 27% faster than the SVD for covariance matrices of different sizes.
Experiments: Neural Style Transfer
Thanks!