Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-start

arxiv

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2202.03397

PDF

https://arxiv.org/pdf/2202.03397

Paper Information

Author: Riccardo Grazzi;Massimiliano Pontil;Saverio Salzo
Published: 2-8-2022
Updated: 11-16-2023
Affiliation: Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia
Country: Italy
Conference: J. Mach. Learn. Res.

Labels Estimated by AI

Convergence Analysis; Algorithm Design; Weight Update Method

These labels were automatically added by AI and may be inaccurate.

Abstract

We analyse a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problem includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e., they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise (near) optimal sample complexity. In particular, we propose a simple method which uses (stochastic) fixed point iterations at the lower-level and projected inexact gradient descent at the upper-level, that reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.
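The scheme described in the abstract (fixed-point iterations at the lower level without warm-start, projected inexact gradient descent at the upper level) can be illustrated with a minimal deterministic sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear contraction $\Phi(w, \lambda) = Aw + B\lambda$, the quadratic upper-level objective $E(w) = \frac{1}{2}\|w - \text{target}\|^2$, the box constraint, the step size, and all function names. The hypergradient is approximated through the standard implicit-differentiation identity $\nabla f(\lambda) = \partial_\lambda \Phi^\top v$ with $v = \partial_w \Phi^\top v + \nabla_w E$, itself solved by fixed-point iteration; the paper's actual algorithm and its stochastic variant may differ in the details.

```python
import numpy as np

# Hypothetical toy problem (not from the paper):
# lower level:  w(lam) is the fixed point of Phi(w, lam) = A @ w + B @ lam
# upper level:  minimize E(w(lam)) = 0.5 * ||w(lam) - TARGET||^2 over a box
A = np.array([[0.5, 0.1], [0.0, 0.3]])   # spectral norm < 1, so Phi contracts in w
B = np.array([[1.0, 0.0], [0.5, 1.0]])
TARGET = np.array([1.0, -0.5])


def phi(w, lam):
    """Lower-level contraction map."""
    return A @ w + B @ lam


def upper_grad_w(w):
    """Gradient of E with respect to w (E has no direct dependence on lam)."""
    return w - TARGET


def approx_hypergradient(lam, t_steps=50, k_steps=50):
    """Inexact hypergradient via two fixed-point loops, both cold-started."""
    # 1) Approximate w(lam), always starting from zero (no warm-start).
    w = np.zeros(2)
    for _ in range(t_steps):
        w = phi(w, lam)
    # 2) Approximate v = (I - A^T)^{-1} grad_w E by iterating v <- A^T v + g.
    g = upper_grad_w(w)
    v = np.zeros(2)
    for _ in range(k_steps):
        v = A.T @ v + g
    # 3) Hypergradient = (d Phi / d lam)^T v, since grad_lam E = 0 here.
    return B.T @ v


def project(lam, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^2."""
    return np.clip(lam, lo, hi)


def solve(lam0, step=0.1, n_iters=200):
    """Projected gradient descent using the inexact hypergradient."""
    lam = project(np.asarray(lam0, dtype=float))
    for _ in range(n_iters):
        lam = project(lam - step * approx_hypergradient(lam))
    return lam


if __name__ == "__main__":
    print(solve([0.0, 0.0]))
```

For this toy instance the lower-level solution has the closed form $w(\lambda) = (I - A)^{-1} B \lambda$, so the returned point can be checked against the exact minimizer, which is approximately $(0.55, -0.625)$ and lies inside the box. Restarting `w` and `v` from zero at every upper-level step is what "without warm-start" means in this sketch; a warm-started variant would instead carry them over between calls.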