
Error Estimation for Random Fourier Features

Scientific Achievement

Our work develops a bootstrap approach for estimating the errors of Random Fourier Features (RFF) approximations. Its advantages are: (1) the error estimates are specific to the problem at hand; and (2) the approach enables adaptive computation: the user can quickly inspect the error of a rough kernel approximation and then predict how much additional work is needed.
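The "predict how much additional work is needed" step can be sketched as a small extrapolation rule. This is a minimal sketch under an assumption made here for illustration: the error decays at the Monte Carlo rate of order 1/sqrt(s), where s is the number of features. The helper names `extrapolate_error` and `features_needed` are hypothetical, and the input `err_s0` stands for an error estimate already computed at a small number of features `s0` (for example, by the bootstrap); consult the paper for the exact rule the authors use.

```python
import math

def extrapolate_error(err_s0, s0, s):
    # Assumed 1/sqrt(s) decay: err(s) is approximately err(s0) * sqrt(s0 / s)
    return err_s0 * math.sqrt(s0 / s)

def features_needed(err_s0, s0, tol):
    # Smallest s whose extrapolated error is at most tol
    # (never fewer than the s0 features already sampled)
    if err_s0 <= tol:
        return s0
    return math.ceil(s0 * (err_s0 / tol) ** 2)

# Example: an estimated error of 0.2 at s0 = 100 features, target tolerance 0.05
print(features_needed(0.2, 100, 0.05))  # 1600
```

Under this assumption, halving the tolerance roughly quadruples the number of features required, which is why a cheap early estimate is useful before committing to a large s.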

Significance and Impact

Developing and improving methods for error estimation and uncertainty quantification for machine learning (ML) models is highly relevant to the DOE's mission and to society. Our proposed error estimation method will be applicable to a broad range of problems that involve kernel methods.

Figure 1 illustrates a hypothetical scenario in which it is possible to track the random variable representing the RFF approximation error as the number of features, s, increases over a grid from 1 to 1600. The result is shown by the red curve. Repeating this experiment many times generates a large collection of such random curves, shown in blue. The black curve is the 90% quantile of these curves at each value of s. Hence, a user with access to the black curve could determine whether a given number of features is adequate. Although none of the curves in Figure 1 are available to the user in practice, our work shows that a close approximation to the black curve can be computed.
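The hypothetical experiment behind the red, blue, and black curves can be simulated when the exact kernel matrix is available. The sketch below assumes a Gaussian kernel with bandwidth `sigma` and measures the error as the largest entrywise deviation between the RFF approximation and the exact kernel matrix; both choices, and all function names, are illustrative assumptions rather than the paper's exact setup. Each "red curve" adds features one draw at a time, and the "black curve" is the (1 - alpha)-quantile across many repetitions.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Exact kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def rff_features(X, s, sigma=1.0, rng=None):
    # n x s random Fourier feature matrix Z, so that Z @ Z.T / s estimates K
    rng = np.random.default_rng(rng)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return np.sqrt(2.0) * np.cos(X @ W + b)

def error_curve(X, K, s_grid, sigma=1.0, rng=None):
    # One "red curve": max-entry error using the first s features of a single draw
    Z = rff_features(X, max(s_grid), sigma, rng)
    return np.array([np.max(np.abs(Z[:, :s] @ Z[:, :s].T / s - K)) for s in s_grid])

def oracle_quantile_curve(X, s_grid, n_trials=200, alpha=0.1, sigma=1.0, seed=0):
    # Many repetitions ("blue curves"), then the (1 - alpha)-quantile at each s ("black curve")
    rng = np.random.default_rng(seed)
    K = gaussian_kernel(X, sigma)
    curves = np.stack([error_curve(X, K, s_grid, sigma, rng) for _ in range(n_trials)])
    return np.quantile(curves, 1.0 - alpha, axis=0)
```

This oracle needs the exact kernel matrix and many independent feature draws, which is exactly what a user cannot afford in practice; it only serves to define the target that an error estimate should approximate.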

Technical Approach

Conceptually, the proposed bootstrap method for error estimation generates a collection of "pseudo error variables" that behave approximately like i.i.d. samples of the true, unknown error variable.

Once the pseudo error variables have been generated, their empirical (1-α)-quantile serves as the error estimate; for example, α = 0.1 targets the 90% quantile shown by the black curve in Figure 1.
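The two steps above can be sketched with a generic feature-resampling bootstrap. This is a minimal sketch, not the authors' implementation: it assumes a Gaussian kernel, measures error as the largest entrywise deviation of the kernel approximation, and forms each pseudo error variable by resampling the s sampled feature columns with replacement and comparing the resampled approximation with the original one. The function names are hypothetical; see the paper and its repository for the exact procedure.

```python
import numpy as np

def rff_features(X, s, sigma=1.0, rng=None):
    # n x s random Fourier features for the Gaussian kernel; K_hat = Z @ Z.T / s
    rng = np.random.default_rng(rng)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return np.sqrt(2.0) * np.cos(X @ W + b)

def bootstrap_error_estimate(Z, alpha=0.1, n_boot=200, rng=None):
    # Estimate the (1 - alpha)-quantile of max_ij |K_hat - K| using only the features in Z
    rng = np.random.default_rng(rng)
    s = Z.shape[1]
    K_hat = Z @ Z.T / s
    pseudo_errors = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, s, size=s)        # resample feature columns with replacement
        Z_star = Z[:, idx]
        K_star = Z_star @ Z_star.T / s          # bootstrap version of the approximation
        pseudo_errors[i] = np.max(np.abs(K_star - K_hat))  # one pseudo error variable
    return np.quantile(pseudo_errors, 1.0 - alpha)  # empirical (1 - alpha)-quantile
```

The exact kernel matrix never appears: the pseudo errors are built from the already-sampled features, so the estimate costs a modest number of extra matrix products rather than a full kernel computation.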

Yao, Junwen, N. Benjamin Erichson, and Miles E. Lopes. "Error Estimation for Random Fourier Features." In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.

[Figure 1: approximation error versus number of features, s]

TALKING POINTS:

Kernel methods are widely used in Scientific Machine Learning. However, kernel methods have limited scalability when applied directly to large datasets.
Random Fourier Features is among the most popular and broadly applicable approaches for scaling up kernel methods.
The core idea of RFF is to avoid direct computations on a large kernel matrix by working more efficiently with an approximation built from “randomly sampled features”.
As a result of this approximation, RFF involves an inherent tradeoff between computational cost and accuracy.
However, managing this tradeoff in practice is complicated by the fact that the user does not know the actual error of the approximation, or how this error may jeopardize downstream results.
In addition, this uncertainty about error can lead the user to sample far more features than are really necessary, which erodes the computational gains of RFF.
To overcome this challenge, we develop a systematic way to numerically estimate the errors of RFF approximations.
Our error estimates are fully data-driven and hence tailored to the inputs of a given problem. This bypasses the practical limitations of worst-case error bounds.
The error estimates enhance the computational efficiency of RFF by guiding the user to choose a number of features that is just enough to meet a desired error tolerance.
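The core idea of RFF summarized in these talking points, replacing computations on a large kernel matrix with computations on randomly sampled features, can be illustrated with a minimal NumPy sketch. It assumes a Gaussian kernel with bandwidth `sigma` and the standard cosine feature map; the function names are hypothetical, and a kernel-vector product stands in for a generic downstream computation. The number of features s controls the tradeoff: cost grows like O(n s), while the error shrinks as s grows.

```python
import numpy as np

def rff_features(X, s, sigma=1.0, rng=None):
    # Z[i, j] = sqrt(2) * cos(w_j . x_i + b_j), w_j ~ N(0, I / sigma^2), b_j ~ Uniform[0, 2*pi]
    rng = np.random.default_rng(rng)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return np.sqrt(2.0) * np.cos(X @ W + b)

def rff_matvec(Z, v):
    # Approximate K @ v by Z @ (Z.T @ v) / s: O(n s) work, no n x n matrix is formed
    return Z @ (Z.T @ v) / Z.shape[1]

def exact_matvec(X, v, sigma=1.0):
    # Reference computation that forms the full n x n Gaussian kernel matrix (O(n^2) memory)
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2)) @ v
```

Comparing `rff_matvec` against `exact_matvec` for increasing s makes the cost/accuracy tradeoff visible on small problems; on large problems the exact reference is unavailable, which is the gap that data-driven error estimates are meant to fill.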

METADATA:

Name of the associated awarded project:

PI name(s):

Name of the program manager:

CITATIONS:

Yao, Junwen, N. Benjamin Erichson, and Miles E. Lopes. "Error Estimation for Random Fourier Features." In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
GitHub repository: https://github.com/jwyyy/bootstrappedRFF

AWARDS:

Selected for oral presentation at AISTATS 2023 (only 32 out of 1,689 submissions were selected as orals).

REPRODUCIBILITY:

Our work is fully reproducible; the code is publicly available in the GitHub repository listed under CITATIONS.

BACKGROUND: