(PhD01) A Fast Multipole Method for Training Neural Networks
Event Type: PhD Forum
Topics: AI/Machine Learning/Deep Learning, Math Library Design, Parallel Algorithms
Time: Monday, June 17th, 1pm - 6pm CEST
Description: We propose a fast multipole method for the fast approximation and factorization of the Gauss-Newton Hessian for fully-connected multilayer perceptrons and convolutional neural networks. We use a block low-rank approximation scheme inspired by methods for N-body problems in computational physics. In addition, we propose precomputation and sampling algorithms that reduce the complexity of the overall scheme. For a net with N weights, an average layer dimension d, and batch size n, the Gauss-Newton Hessian requires O(N^2 n) work and O(N^2) memory. By introducing a controllable approximation error, our method requires only O(dnN) work and O(N(n + r_o)) memory, where r_o is the rank of the off-diagonal blocks of the Gauss-Newton Hessian. After construction, the cost of factorizing the regularized Gauss-Newton Hessian is O(N r_o^2), an improvement of several orders of magnitude over the standard O(N^3) complexity for factorization of dense matrices of this size. If a global low-rank approximation of the Gauss-Newton Hessian is used, then we can construct an approximation and a factorization in O(N r^2) work and O(N r) memory. Our method becomes preferable when r_o
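To illustrate the global low-rank case discussed above, here is a minimal sketch (not the authors' code; all variable names and the test matrix are illustrative assumptions) of factorizing a regularized low-rank Gauss-Newton Hessian H = J^T J + lam*I via the Sherman-Morrison-Woodbury identity. The work is dominated by an r x r solve plus matrix-vector products with the low-rank factor, giving O(N r^2) work and O(N r) memory instead of the O(N^3) / O(N^2) cost of a dense factorization:

```python
import numpy as np

# Illustrative sketch: J is an assumed rank-r factor of the
# Gauss-Newton Hessian, g is an assumed gradient vector.
rng = np.random.default_rng(0)
N, r, lam = 500, 10, 1e-2
J = rng.standard_normal((r, N))   # low-rank factor, shape (r, N)
g = rng.standard_normal(N)        # right-hand side to solve against

# Woodbury identity:
#   (lam*I + J^T J)^{-1} g = (g - J^T (lam*I_r + J J^T)^{-1} J g) / lam
# Only an r x r system is ever formed and solved.
core = lam * np.eye(r) + J @ J.T           # r x r, built in O(N r^2)
x = (g - J.T @ np.linalg.solve(core, J @ g)) / lam

# Sanity check against the dense O(N^3) solve.
H = lam * np.eye(N) + J.T @ J
x_dense = np.linalg.solve(H, g)
print(np.allclose(x, x_dense))  # True
```

The same algebraic trick underlies why a low-rank (or block low-rank) structure makes the regularized Hessian cheap to apply and invert once the factors are constructed.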