Publications
Research contributions towards a principled understanding of optimization dynamics in deep learning.
Stochastic optimization in deep learning
We study the long-run behavior of stochastic gradient descent (SGD) on non-convex objectives, providing the first characterization of SGD’s invariant measures and global convergence times.
- What is the long-run distribution of stochastic gradient descent? (ICML 2024, poster)
- The global convergence time of stochastic gradient descent in non-convex landscapes (ICML 2025, poster)
Talks: Thoth seminar (slides), LPSM Paris (slides), Université Côte d’Azur (slides), Morgan Stanley ML Research (slides), Inria Argo team (slides)
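To make the object of study concrete, here is a purely illustrative sketch. The double-well objective, the Gaussian gradient-noise model, and the step size below are assumptions chosen for the demo, not the setting of the papers. Constant-step-size SGD then defines a Markov chain, and the histogram of its iterates after a burn-in gives an empirical view of its long-run (invariant) distribution.

```python
# Illustrative toy example; the double-well objective, Gaussian gradient noise,
# and constant step size are demo assumptions, not the setting of the papers above.
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    """Gradient of the non-convex double-well objective f(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x**2 - 1.0)

step, noise_std = 0.05, 3.0           # illustrative step size and gradient-noise level
n_iters, burn_in = 200_000, 20_000
x = 2.0                               # start away from both minimizers
samples = np.empty(n_iters - burn_in)

for t in range(n_iters):
    g = grad_f(x) + noise_std * rng.standard_normal()  # noisy gradient estimate
    x -= step * g                                      # SGD update
    if t >= burn_in:
        samples[t - burn_in] = x

# With these values the chain hops between the two wells, so the empirical
# long-run distribution is bimodal with most of its mass near x = +1 and x = -1.
hist, edges = np.histogram(samples, bins=60, density=True)
mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
print("mode of the empirical long-run distribution:", mode)
```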
Internal mechanisms of large language models
Understanding the robustness of uncertainty quantification methods and the in-context learning capabilities of large language models through targeted experiments.
- The geometries of truth are orthogonal across tasks (R2FM Workshop@ICML 2025) — Work at Apple ML Research
- How does the pretraining distribution shape in-context learning? (arXiv) — Work at Morgan Stanley ML Research
Wasserstein distributionally robust optimization
Regularization schemes and generalization guarantees for Wasserstein DRO models.
- Regularization for Wasserstein distributionally robust optimization (ESAIM COCV)
- Exact generalization guarantees for (regularized) Wasserstein distributionally robust models (NeurIPS 2023, slides)
Talks: Erice 2022 (slides), FOCM 2023 (poster), NeurIPS@Paris 2023 (slides)
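For context, a common way to write the Wasserstein DRO problem, in generic notation that is not necessarily the one used in the papers:

$$
\min_{\theta}\ \sup_{Q \,:\, W_c(Q,\widehat{P}_n)\le \rho}\ \mathbb{E}_{\xi\sim Q}\big[\ell(\theta,\xi)\big],
$$

where $\widehat{P}_n$ is the empirical distribution of the training sample, $W_c$ the Wasserstein distance induced by a ground cost $c$, and $\rho$ the radius of the ambiguity ball. The regularized variants studied above smooth the inner supremum, for instance with an entropic penalty on the adversarial distribution.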
Last-iterate convergence of mirror methods
Determining how Bregman geometry impacts last-iterate guarantees in variational inequalities.
- The last-iterate convergence rate of optimistic mirror descent in stochastic variational inequalities (COLT 2021, slides, poster)
- The rate of convergence of Bregman proximal methods (to be published in SIOPT)
Talks: COLT 2021, ICCOPT 2022 (slides), SMAI MODE 2024 (slides)
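For reference, one common way to write (stochastic) optimistic mirror descent in generic notation; conventions differ slightly across papers, so this is a template rather than the exact scheme analyzed above. Here $h$ is the mirror map and $D_h(x,y)=h(x)-h(y)-\langle\nabla h(y),\,x-y\rangle$ its Bregman divergence:

$$
X_{t+1/2}=\arg\min_{x\in\mathcal{X}}\big\{\gamma_t\langle g_{t-1/2},\,x\rangle+D_h(x,X_t)\big\},
\qquad
X_{t+1}=\arg\min_{x\in\mathcal{X}}\big\{\gamma_t\langle g_{t+1/2},\,x\rangle+D_h(x,X_t)\big\},
$$

where $g_{t+1/2}$ is a (possibly noisy) evaluation of the operator of the variational inequality at $X_{t+1/2}$, and the leading half-step reuses the previous gradient $g_{t-1/2}$ (the "optimistic" trick). The choice of mirror map $h$, that is, the Bregman geometry, is what drives the last-iterate rates studied above.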
Graph neural networks
- Expressive power of invariant and equivariant graph neural networks (ICLR 2021, slides) — with Marc Lelarge
Smooth game optimization for machine learning
Unified analyses and accelerated methods for differentiable games.
- A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games (AISTATS 2020, slides)
- Accelerating smooth games by manipulating spectral shapes (AISTATS 2020)
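As a generic template (not the exact setting of the papers), a smooth two-player differentiable game and the vector field that gradient-based methods iterate on can be written as

$$
\min_{\theta_1}\ \ell_1(\theta_1,\theta_2),\qquad
\min_{\theta_2}\ \ell_2(\theta_1,\theta_2),\qquad
v(\theta)=\big(\nabla_{\theta_1}\ell_1(\theta),\,\nabla_{\theta_2}\ell_2(\theta)\big),
$$

with updates of the form $\theta_{t+1}=\theta_t-\gamma\,v(\theta_t)$ (or extrapolated and momentum variants). The eigenvalues of the Jacobian $\nabla v(\theta)$ near a stationary point form the spectral shape that the unified analysis covers and that the acceleration paper manipulates.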
Full bibliography
2025
- How does the pretraining distribution shape in-context learning? Task selection, generalization, and robustness. arXiv: 2510.01163, 2025
- The geometries of truth are orthogonal across tasks. In ICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025
- Almost sure convergence of stochastic gradient methods under gradient domination. Transactions on Machine Learning Research, 2025
2024
- skwdro: a library for Wasserstein distributionally robust machine learning. arXiv: 2410.21231, 2024
2023
- Regularization for Wasserstein distributionally robust optimization. ESAIM: Control, Optimisation and Calculus of Variations, 2023
- Automatic Rao-Blackwellization for sequential Monte Carlo with belief propagation. In ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023