Machine Learning for Systems
Machine Learning Day
AI/Machine Learning/Deep Learning
Time: Wednesday, June 19th, 9:15am - 10am CEST
Description: I will present some of our recent work at the intersection of machine learning and systems. First, I discuss our work on the sparsely-gated mixture of experts, a neural architecture that allows training models with 130B+ parameters (10x larger than any previous model) on datasets with 100B+ examples. This architecture uses a learned gating mechanism that routes each input example to a small subset of the modules ("experts") within the larger model. The resulting model runs 2-3x faster than top-performing baselines and sets a new state of the art in machine translation and language modeling. Next, I discuss our work on deep reinforcement learning models that learn to do resource allocation, a combinatorial optimization problem that appears repeatedly in computer systems. Our method is end-to-end and abstracts away the complexity of the underlying optimization space; the RL agent learns the implicit tradeoffs between computation and communication across the underlying resources and optimizes the allocation using only the true reward function (e.g., the runtime of the generated allocation). The complexity of our search space is on the order of 9^80000, compared to 10^360 states for Go (solved by AlphaGo). Finally, I discuss our work on deep models that learn to find solutions to the classic problem of balanced graph partitioning with minimum edge cuts. Our method generalizes: we can train models that produce performant partitions at inference time on unseen graphs. This generalization significantly speeds up partitioning compared with all existing baselines, which solve the problem from scratch for each new graph.
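To make the routing idea in the description concrete, here is a minimal sketch of top-k gating in Python with NumPy. It is an illustration under stated assumptions, not the implementation from the talk: the names `top_k_gate` and `moe_forward`, the linear experts, the plain softmax over the selected scores, and all sizes are hypothetical choices made for this example, and details such as noise in the gate or load balancing are left out.

```python
import numpy as np

def top_k_gate(x, w_gate, k):
    """Pick k experts for input x and return (expert indices, mixing weights)."""
    logits = x @ w_gate                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest scores
    z = np.exp(logits[top] - logits[top].max())  # stable softmax over the k scores
    return top, z / z.sum()

def moe_forward(x, w_gate, experts, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    top, weights = top_k_gate(x, w_gate, k)
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Hypothetical toy setup: 8 linear experts over 4-dimensional inputs.
rng = np.random.default_rng(0)
d, n_experts = 4, 8
w_gate = rng.normal(size=(d, n_experts))
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, m=m: v @ m for m in expert_mats]

x = rng.normal(size=d)
y = moe_forward(x, w_gate, experts, k=2)
```

With k=2, only two of the eight experts are evaluated for this input, which is how a sparsely gated model can hold many more parameters than it uses for any single example.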
Senior Research Scientist