FBGEMM: High-Performance Low-Precision Library for Deep-Learning Inference
AI/Machine Learning/Deep Learning
Performance Analysis and Optimization
Time: Wednesday, June 19th, 2:52pm - 3:15pm
Description: Deep learning models typically use single-precision (FP32) floating-point data types to represent activations and weights, but a growing body of recent research has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers, or even 4- or 2-bit integers) are enough to achieve the same accuracy as FP32 and are much more efficient. We therefore designed FBGEMM, a high-performance kernel library, from the ground up to perform quantized inference on current-generation CPUs. FBGEMM achieves efficiency by fusing common quantization operations with a high-performance GEMM implementation and by generating shape- and size-specific kernel code at runtime. The library has been deployed at Facebook, where it delivers greater than 2× performance gains with respect to our current production baseline.
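As a rough illustration of what fusing quantization operations with GEMM can mean, the pure-Python sketch below multiplies two quantized matrices and requantizes each output element inside the same loop that produces it, so no separate pass over a 32-bit intermediate matrix is needed. This is not FBGEMM's API or its kernel design: the function names (`quantize`, `quantized_gemm_fused`), the affine quantization scheme, and the 8-bit unsigned output range are all assumptions chosen for illustration.

```python
def quantize(values, scale, zero_point, qmin, qmax):
    """Affine quantization (assumed scheme): q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def quantized_gemm_fused(a_q, a_zp, b_q, b_zp, m, k, n, multiplier, c_zp):
    """Multiply row-major quantized A (m x k) by B (k x n).

    `multiplier` is assumed to be a_scale * b_scale / c_scale. The
    requantization (scale, add zero point, clamp to 0..255) happens
    right after each dot product instead of in a second pass.
    """
    c_q = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0  # integer accumulator; a real kernel would typically use 32 bits
            for p in range(k):
                acc += (a_q[i * k + p] - a_zp) * (b_q[p * n + j] - b_zp)
            # Fused output step: requantize while the accumulator is still at hand.
            q = round(acc * multiplier) + c_zp
            c_q[i * n + j] = max(0, min(255, q))
    return c_q


def dequantize(values_q, scale, zero_point):
    """Map quantized integers back to floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values_q]
```

A production kernel would vectorize the inner loops and specialize them for the matrix shape. The sketch only shows where the requantization step sits relative to the accumulation.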