Abstract
A major challenge in training large-scale machine learning models is
configuring the training process to maximize model performance, i.e., finding
the best training setup from a vast design space. In this work, we unlock a
gradient-based approach to this problem. We first introduce an algorithm for
efficiently calculating metagradients -- gradients through model training -- at
scale. We then develop a "smooth model training" framework that enables
effective optimization using metagradients. With metagradient descent (MGD), we
greatly improve on existing dataset selection methods, outperform
accuracy-degrading data poisoning attacks by an order of magnitude, and
automatically find competitive learning rate schedules.
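To make the notion of a metagradient concrete, here is a minimal, hypothetical sketch, not the paper's algorithm (which is designed for scale): it uses JAX to differentiate a post-training validation loss through an unrolled SGD loop with respect to the log learning rate, then takes one metagradient descent step on that hyperparameter. The toy data, function names, and step counts are all illustrative assumptions.

```python
# A minimal sketch of a metagradient: the gradient of a post-training
# metric with respect to a training hyperparameter, obtained by
# differentiating through the (unrolled) training loop itself.
# This is an illustrative toy, not the paper's scalable algorithm.
import jax
import jax.numpy as jnp

def train_loss(w, X, y):
    # Ordinary training objective (least squares on the training set).
    return jnp.mean((X @ w - y) ** 2)

def train_then_evaluate(log_lr, w0, X_tr, y_tr, X_val, y_val, steps=20):
    # Train with SGD at learning rate exp(log_lr), then report the
    # validation loss of the *trained* model. Every step below is
    # differentiable, so JAX can backpropagate through all of training.
    lr = jnp.exp(log_lr)  # log-parameterization keeps lr positive
    w = w0
    for _ in range(steps):
        w = w - lr * jax.grad(train_loss)(w, X_tr, y_tr)
    return jnp.mean((X_val @ w - y_val) ** 2)

# The metagradient: d(val loss after training) / d(log lr).
metagrad_fn = jax.grad(train_then_evaluate)

# Toy regression problem (illustrative data only).
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
w_true = jnp.arange(1.0, 6.0)
X_tr = jax.random.normal(k1, (64, 5))
y_tr = X_tr @ w_true
X_val = jax.random.normal(k2, (32, 5))
y_val = X_val @ w_true
w0 = jnp.zeros(5)

log_lr = jnp.log(0.05)
g = metagrad_fn(log_lr, w0, X_tr, y_tr, X_val, y_val)
# One step of metagradient descent on the hyperparameter itself:
log_lr = log_lr - 0.1 * g
```

The same pattern extends, in principle, to the abstract's other targets: making the argument a vector of per-example data weights recovers a dataset-selection (or data-poisoning) objective, and making it a vector of per-step learning rates recovers learning rate schedule search.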