These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The adaptive momentum method (AdaMM), which uses past gradients to update
descent directions and learning rates simultaneously, has become one of the
most popular first-order optimization methods for solving machine learning
problems. However, AdaMM is not suited for solving black-box optimization
problems, where explicit gradient forms are difficult or infeasible to obtain.
In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm, that
generalizes AdaMM to the gradient-free regime. We show that the convergence
rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor
of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$
is problem size. In particular, we provide a deep understanding on why
Mahalanobis distance matters in convergence of ZO-AdaMM and other AdaMM-type
methods. As a byproduct, our analysis makes the first step toward understanding
adaptive learning rate methods for nonconvex constrained optimization.
Furthermore, we demonstrate two applications, designing per-image and universal
adversarial attacks from black-box neural networks, respectively. We perform
extensive experiments on ImageNet and empirically show that ZO-AdaMM converges
much faster to a solution of high accuracy compared with $6$ state-of-the-art
ZO optimization methods.