TY - JOUR
T1 - On The Complexity of First-Order Methods in Stochastic Bilevel Optimization
AU - Kwon, Jeongyeol
AU - Kwon, Dohyun
AU - Lyu, Hanbaek
N1 - Publisher Copyright:
Copyright 2024 by the author(s)
PY - 2024
Y1 - 2024
N2 - We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions y∗(x) in response to the changes in the upper-level variables x. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call y∗-aware, that returns an O(ϵ)estimate of the lower-level solution, in addition to first-order gradient estimators locally unbiased within the Θ(ϵ)-ball around y∗(x). We study the complexity of finding stationary points with such an y∗-aware oracle: we propose a simple first-order method that converges to an ϵ stationary point using O(ϵ-6), O(ϵ-4) access to first-order y∗-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by O(ϵ) with minimal assumptions. We then provide the matching Ω(ϵ-6), Ω(ϵ-4) lower bounds without and with an additional smoothness assumption on y∗-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with an y∗-aware oracle must suffer the same lower bounds.
AB - We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions y∗(x) in response to the changes in the upper-level variables x. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call y∗-aware, that returns an O(ϵ)estimate of the lower-level solution, in addition to first-order gradient estimators locally unbiased within the Θ(ϵ)-ball around y∗(x). We study the complexity of finding stationary points with such an y∗-aware oracle: we propose a simple first-order method that converges to an ϵ stationary point using O(ϵ-6), O(ϵ-4) access to first-order y∗-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by O(ϵ) with minimal assumptions. We then provide the matching Ω(ϵ-6), Ω(ϵ-4) lower bounds without and with an additional smoothness assumption on y∗-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with an y∗-aware oracle must suffer the same lower bounds.
UR - http://www.scopus.com/inward/record.url?scp=85203802937&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85203802937
SN - 2640-3498
VL - 235
SP - 25784
EP - 25811
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 41st International Conference on Machine Learning, ICML 2024
Y2 - 21 July 2024 through 27 July 2024
ER -