Stats & Machine Learning Seminar
2024-2025
(ibrahim.kaddouri@universite-paris-saclay.fr, romain.perier@universite-paris-saclay.fr)
Thursdays, from 1:15 pm to 2:30 pm, in room 0A1, Institut de Maths d'Orsay [access], or remotely on BBB.
NB: When the speaker is present in person, we also stream the talk on BBB for students who cannot attend on site.
The abstracts, as well as the speakers' slides and/or internship/PhD offers, can be found below the programme.
Date | Speaker | Title |
03/10 | Gilles Stoltz | |
10/10 | Nicolas Chopin (cancelled, postponed to 16/01) | A connection between tempering Sequential Monte Carlo and entropic mirror descent |
17/10 | Vianney Perchet | Learning learning-augmented algorithms |
24/10 | Claire Lacour | An introduction to differential privacy in statistics |
31/10 | No seminar | |
07/11 | Guillaume Lecué | A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator |
14/11 | Anne Sabourin | Statistical learning for extreme value analysis: learning on extreme covariates |
21/11 | Aurélie Fischer | |
28/11 | Zacharie Naulet | |
05/12 | Sylvain Le Corff | Generative models and conditional simulation. Applications to inverse problems. |
12/12 | Mohamed Ndaoud (CANCELLED) | |
19/12 | Erwan Scornet | |
09/01 | Ismaël Castillo (remote, BBB link above) | Understanding Variational Bayes in high dimensional models |
16/01 | Nicolas Chopin | A connection between tempering Sequential Monte Carlo and entropic mirror descent |
Abstracts
03/10:
Finding a good M2 internship and, above all, a PhD
Gilles Stoltz, CNRS / Laboratoire de mathématiques d'Orsay
Email: gilles.stoltz@universite-paris-saclay.fr
I would like to present two PhD topics (in stochastic bandits and in causal inference), but also, and above all, to share my view of what is at stake during the M2 year: i) finding a thesis topic, and especially funding, with advisors who have time for you; and ii) treating the M2 courses as so many statements of, and opportunities for, PhD topics, which calls for the right attitude in class, moving away from purely academic concerns towards genuine research. In particular, I will share my very positive impressions of CIFRE theses, as well as some points to watch out for.
10/10:
A connection between tempering Sequential Monte Carlo and entropic mirror descent
Nicolas Chopin, ENSAE
Email: n.chopin.mcmc@gmail.com / nicolas.chopin@ensae.fr
In this talk, I will make connections between tempering SMC (Sequential Monte Carlo) and entropic mirror descent to sample from a target probability distribution whose unnormalized density is known. My co-authors and I have established that tempering SMC corresponds to entropic mirror descent applied to the reverse Kullback-Leibler (KL) divergence, and we obtain convergence rates for the tempering iterates. Our result motivates the tempering iterates from an optimization point of view, showing that tempering can be seen as a descent scheme on the KL divergence with respect to the Fisher-Rao geometry, in contrast to Langevin dynamics, which perform descent on the KL with respect to the Wasserstein-2 geometry. We use the connection between tempering and mirror descent iterates to justify common practices in SMC and derive adaptive tempering rules that improve over alternative benchmarks in the literature. Based on https://arxiv.org/abs/2310.11914, accepted at ICML 2024.
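For readers who want the correspondence spelled out, a rough sketch (our notation, not necessarily the paper's): tempering SMC samples along the geometric path between an initial distribution $\pi_0$ and the target $\pi$,
$$\pi_\lambda(x) \propto \pi_0(x)^{1-\lambda}\,\pi(x)^{\lambda}, \qquad 0 = \lambda_0 < \lambda_1 < \dots < \lambda_K = 1.$$
Entropic mirror descent on $F(q) = \mathrm{KL}(q\,\|\,\pi)$ with step size $\gamma_k$ updates
$$q_{k+1}(x) \;\propto\; q_k(x)\,\exp\!\Big(-\gamma_k\,\tfrac{\delta F}{\delta q}(x)\Big) \;\propto\; q_k(x)^{1-\gamma_k}\,\pi(x)^{\gamma_k},$$
so, starting from $q_0 = \pi_0$, the iterates stay on the tempering path with $1-\lambda_{k+1} = (1-\gamma_k)(1-\lambda_k)$; choosing the step sizes amounts to choosing the temperature ladder.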
Learning learning-augmented algorithms
Vianney Perchet, ENSAE
Email : vianney.perchet@gmail.com
We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution. We start by analyzing the scenario where the type characteristics are known and then move to two learning scenarios where the types are unknown: non-preemptive problems, where each started job must be completed before moving to another job; and preemptive problems, where job execution can be paused in favor of moving to a different job. In both cases, we design algorithms that achieve sublinear excess cost, compared to the performance with known types, and prove lower bounds for the non-preemptive case. Notably, we demonstrate, both theoretically and through simulations, how preemptive algorithms can greatly outperform non-preemptive ones when the durations of different job types are far from one another, a phenomenon that does not occur when the type durations are known.
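As a toy illustration of the known-types baseline mentioned above (all parameters below are made up, and this is not the paper's algorithm): on a single machine with known duration distributions, running jobs in increasing order of expected duration (the SEPT rule) minimizes the expected sum of completion times.

```python
import random

def total_completion_time(durations):
    """Sum of completion times when jobs are run in the given order."""
    clock, total = 0.0, 0.0
    for d in durations:
        clock += d
        total += clock
    return total

random.seed(0)
type_means = {"short": 1.0, "long": 5.0}          # made-up job types
job_types = [random.choice(["short", "long"]) for _ in range(20)]
durations = [random.expovariate(1.0 / type_means[t]) for t in job_types]

# Known-types baseline: shortest expected processing time first (SEPT).
sept_order = sorted(range(len(job_types)), key=lambda i: type_means[job_types[i]])
arrival_order = range(len(job_types))             # naive policy ignoring the types

print("SEPT (types known):", total_completion_time([durations[i] for i in sept_order]))
print("arrival order     :", total_completion_time([durations[i] for i in arrival_order]))
```

The learning scenarios of the talk concern what to do when `type_means` is unknown and must be estimated on the fly, with or without preemption.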
An introduction to differential privacy in statistics
Claire Lacour, Université Gustave Eiffel
Email : claire.lacour@univ-eiffel.fr
The first part of this talk will be devoted to the motivations behind, and the first attempts at, privacy-preserving analysis of personal data. I will then give the definition of differential privacy and a few of its properties, and present the Laplace mechanism, the best-known procedure for making an algorithm private. From it, I will derive privatized methods for density and parameter estimation. Finally, we will discuss local privacy and some minimax rate results in this setting.
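To make the Laplace mechanism concrete, here is a minimal sketch (our own example; it assumes the data lie in $[0, 1]$, so that the empirical mean has global sensitivity $1/n$):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(sensitivity / epsilon) noise: epsilon-differentially private."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def private_mean(data, epsilon, rng):
    """epsilon-DP estimate of the mean of data assumed to lie in [0, 1].

    Changing one individual's record moves the mean by at most 1/n,
    so the global sensitivity of the mean is 1/n.
    """
    n = len(data)
    return laplace_mechanism(np.mean(data), sensitivity=1.0 / n, epsilon=epsilon, rng=rng)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)   # synthetic data in [0, 1]
print("non-private mean:", data.mean())
print("eps=1.0 private :", private_mean(data, epsilon=1.0, rng=rng))
print("eps=0.1 private :", private_mean(data, epsilon=0.1, rng=rng))
```

Smaller epsilon means stronger privacy and larger noise; the talk discusses how such privatized releases translate into estimation rates.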
A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator.
Joint work with Zong Shang.
Guillaume Lecué, ESSEC
Email : lecue@essec.edu
Practitioners have observed that some deep learning models generalize well even with a perfect fit to noisy training data [1,2]. Since then, many theoretical works have revealed some facets of this phenomenon [3,4,5], known as benign overfitting. In particular, in the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat\beta$ has received a lot of attention [3,4,6] since it was proved to be consistent even though it perfectly fits noisy data, under some conditions on the covariance matrix $\Sigma$ of the input vector. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [6]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [4]: $\hat\beta$ can be written as the sum of a ridge estimator $\hat\beta_{1:k}$ and an overfitting component $\hat\beta_{k+1:p}$, following a decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^\perp V_{k+1:p}$ into the space $V_{1:k}$ spanned by the top $k$ eigenvectors of $\Sigma$ and the space $V_{k+1:p}$ spanned by the $p-k$ last ones. We also prove a matching lower bound for the expected prediction risk. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from [3,6] and is the key tool to handle the behavior of the design matrix restricted to the subspace $V_{k+1:p}$ where overfitting happens.
[1] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. USA, 116(32):15849–15854, 2019.
[2] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64(3):107–115, 2021.
[3] Peter L. Bartlett, Philip M. Long, Gabor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117(48):30063–30070, 2020.
[4] Peter L. Bartlett, Andreas Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. To appear in Acta Numerica, 2021.
[5] Mikhail Belkin. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. To appear in Acta Numerica, 2021.
[6] Alexander Tsigler and Peter L. Bartlett. Benign overfitting in ridge regression. 2021.
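For context, a small numerical sketch of the estimator discussed above (dimensions, covariance spectrum and noise level are arbitrary choices of ours): in the overparametrized regime $p > n$, the minimum $\ell_2$-norm interpolant is the least-norm solution of $X\beta = y$, namely $\hat\beta = X^\top (X X^\top)^{-1} y$, and it fits the noisy training data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                                         # overparametrized: p >> n
spectrum = 1.0 / np.arange(1, p + 1)                   # decaying eigenvalues of Sigma (our choice)
X = rng.standard_normal((n, p)) * np.sqrt(spectrum)    # rows ~ N(0, diag(spectrum))
beta_star = np.zeros(p)
beta_star[0] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)       # noisy labels

# Minimum l2-norm interpolant: least-norm solution of X beta = y.
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)

print("training residual        :", np.linalg.norm(X @ beta_hat - y))   # ~ 0: perfect fit
print("||beta_hat - beta_star|| :", np.linalg.norm(beta_hat - beta_star))
```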
Statistical learning for extreme value analysis: learning on extreme covariates
Anne Sabourin, Université Paris Cité
Email : anne.sabourin@math.cnrs.fr
Extreme Value Analysis is a branch of probability and statistics focused on the tail behavior of random processes, namely on limiting distributions of rescaled excesses above high thresholds. In applications (typically, finance or environmental sciences), such tail processes encapsulate crucial features for risk management, e.g. the dependence relationships between tail events such as an excess over a high threshold in different components.
This talk will provide an overview of a recent line of work aiming at establishing non-asymptotic guarantees on estimators and machine learning algorithms dedicated to learning key features of tail behaviors. We shall focus in particular on the problem of 'learning on extreme covariates' and review recent advances and open research directions.
References:
Open PhD position: https://helios2.mi.parisdescartes.fr/~asabouri/offreStage2024.pdf
Jalalzai, H., Clémençon, S., & Sabourin, A. (2018). On binary classification in extreme regions. Advances in Neural Information Processing Systems, 31.
Huet, N., Clémençon, S., & Sabourin, A. (2023). On Regression in Extreme Regions. arXiv preprint arXiv:2303.03084.
Clémençon, S., Jalalzai, H., Lhaut, S., Sabourin, A., & Segers, J. (2023). Concentration bounds for the empirical angular measure with statistical learning applications. Bernoulli, 29(4), 2797-2827.
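As a schematic illustration of 'learning on extreme covariates' (the data-generating process and threshold below are entirely our own choices, not taken from the references): one keeps only the observations with the largest norms and trains a classifier on their angular components $X/\|X\|$, which carry the relevant information in the tail.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 5000, 2
X = rng.pareto(a=2.0, size=(n, d)) + 1.0               # heavy-tailed synthetic covariates
angles = X / np.linalg.norm(X, axis=1, keepdims=True)  # angular components on the sphere
y = (angles[:, 0] > angles[:, 1]).astype(int)          # label driven by the direction only
y = np.where(rng.random(n) < 0.1, 1 - y, y)            # 10% label noise

k = 250                                                # number of extreme points kept
extreme = np.argsort(np.linalg.norm(X, axis=1))[-k:]   # indices of the k largest norms
clf = LogisticRegression().fit(angles[extreme], y[extreme])
print("training accuracy on the extreme region:", clf.score(angles[extreme], y[extreme]))
```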
Curve estimation & statistics for climate
Aurélie Fischer, LPSM, Université Paris Cité
Email : aurelie.fischer@u-paris.fr
In this talk, I will present two research themes in statistical estimation and learning: a theoretical line of work with a geometric flavor, and an applied project in climate science.
The first theme concerns curve estimation by means of objects called principal curves, which are one-dimensional summaries of a probability distribution or of a point cloud.
In the second project, the goal is to improve, using statistical learning and measurements collected by superpressure balloons, our understanding and description of gravity waves: these small-scale physical processes are not explicitly resolved in climate models, yet they play a crucial role in atmospheric circulation.
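For reference, one classical formalization of principal curves (due to Hastie and Stuetzle; the notation below is ours): a smooth curve $f: \Lambda \subset \mathbb{R} \to \mathbb{R}^d$ is a principal curve of a random vector $X$ if it is self-consistent,
$$f(\lambda) = \mathbb{E}\big[X \mid \lambda_f(X) = \lambda\big] \quad \text{for all } \lambda,$$
where $\lambda_f(x)$ is the projection index, i.e. the parameter of the point of the curve closest to $x$. Constrained variants instead minimize $\mathbb{E}\,\|X - f(\lambda_f(X))\|^2$ over curves of bounded length, a formulation often used for estimation from a point cloud.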
Frontiers to the learning of Hidden Markov Models
Zacharie Naulet, INRAE
Email : zacharie.naulet@inrae.fr
Hidden Markov models (HMMs) are flexible tools for clustering dependent data coming from unknown populations, allowing nonparametric identifiability of the population densities when the data is "truly" dependent. In the first part of the talk, I will present our result characterizing the frontier between learnable and unlearnable two-state nonparametric HMMs in terms of a suitable notion of "distance" to independence. I will present surprising new phenomena emerging in the nonparametric setting. In particular, it is possible to "borrow strength" from the estimator of the smoothest density to improve the estimation of the other. We conduct a precise analysis of minimax rates, showing a transition depending on the relative smoothness of the emission densities.
Joint work with Kweku Abraham and Elisabeth Gassiat.
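For readers less familiar with the setting, a brief reminder (our notation): in a two-state nonparametric HMM, one observes $Y_1, \dots, Y_n$ where $(X_t)$ is an unobserved stationary Markov chain on $\{1, 2\}$ with transition matrix $Q$, and, conditionally on $(X_t)$, the observations are independent with $Y_t \mid X_t = k \sim f_k$ for unknown emission densities $f_1, f_2$. Each $Y_t$ is marginally distributed as the mixture $\pi_1 f_1 + \pi_2 f_2$; it is the dependence induced by $Q$ (how far its two rows are from being equal, i.e. from independence of the chain) that makes $f_1$ and $f_2$ identifiable without parametric assumptions, which is why a notion of distance to independence governs learnability.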
Generative models and conditional simulation. Applications to inverse problems.
Sylvain Le Corff, LPSM
Email : sylvain.le_corff@sorbonne-universite.fr
Score-based generative models (SGMs), also known as diffusion models, aim to estimate a distribution by estimating score functions from perturbed samples of the target distribution. These methods have produced very impressive empirical results in a variety of challenging domains (image processing, time series, etc.), with performance beyond state-of-the-art methods. In this talk, we will introduce diffusion models as well as new conditional simulation methods based on these approaches. Score-based generative models have recently been applied successfully to various inverse problems, with applications for instance in medical imaging. In this setting, we can exploit the particular structure of the prior defined by the SGM to define a sequence of intermediate inverse problems. As the noise level decreases, the posterior distributions of these inverse problems get closer to the target posterior of the original inverse problem. To sample from this sequence of distributions, we propose to use Sequential Monte Carlo (SMC) methods. The resulting algorithm, MCGDiff, enjoys theoretical guarantees for the reconstruction of the target distributions, and various numerical experiments illustrate that it outperforms competing methods on ill-posed inverse problems.
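As background (a standard formulation of score-based generative models, not specific to MCGDiff): data $X_0 \sim p_{\mathrm{data}}$ are perturbed through a forward noising process, for instance the Ornstein-Uhlenbeck dynamics
$$\mathrm{d}X_t = -\tfrac{1}{2} X_t\,\mathrm{d}t + \mathrm{d}W_t,$$
and a neural network $s_\theta(t, x)$ is trained by denoising score matching to approximate the scores $\nabla_x \log p_t(x)$ of the perturbed marginals. Samples are then generated by simulating the time-reversed dynamics
$$\mathrm{d}X_t = \big[-\tfrac{1}{2} X_t - \nabla_x \log p_t(X_t)\big]\,\mathrm{d}t + \mathrm{d}\bar{W}_t$$
backwards from (approximately) Gaussian noise, with $s_\theta$ plugged in for the unknown score. Conditional simulation for an inverse problem replaces $p_{\mathrm{data}}$ by a posterior distribution, which is where the SMC machinery of the talk comes in.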
Going beyond the fear of emptiness to gain consistency.
Erwan Scornet, LPSM
Email : erwan.scornet@polytechnique.edu
Missing data are ubiquitous in many real-world datasets, as they naturally arise when gathering information from various sources in different formats. Most statistical analyses have focused on estimation in parametric models despite missing values. However, accurate estimation is not sufficient to make predictions on a test set that contains missing data: a way to handle missing entries must be designed. In this talk, we will analyze two different approaches to prediction in the presence of missing data: imputation and pattern-by-pattern strategies. We will show the consistency of such approaches and study their performance in the context of linear models.
Related papers:
- On the consistency of supervised learning with missing values https://arxiv.org/abs/1902.06931
- What is a good imputation to predict with missing values https://arxiv.org/abs/2106.00311
- Near-optimal rate of consistency for linear models with missing values https://proceedings.mlr.press/v162/ayme22a/ayme22a.pdf
- Naive imputation implicitly regularizes high-dimensional linear models https://arxiv.org/abs/2301.13585
- Harnessing pattern-by-pattern linear classifiers for prediction with missing data https://arxiv.org/pdf/2405.09196
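To make the two strategies of the abstract concrete, a toy sketch (the synthetic data, MCAR missingness and dimensions are our choices): mean imputation followed by a single linear fit, versus one linear model per missingness pattern, fitted on the observed coordinates only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
mask = rng.random((n, d)) < 0.2                     # MCAR: each entry missing w.p. 0.2
X_obs = np.where(mask, np.nan, X)

def fit_ols(A, b):
    """Ordinary least squares with an intercept."""
    A1 = np.column_stack([np.ones(len(A)), A])
    return np.linalg.lstsq(A1, b, rcond=None)[0]

# Strategy 1: impute missing entries by column means, then fit one linear model.
X_imp = np.where(np.isnan(X_obs), np.nanmean(X_obs, axis=0), X_obs)
w_imputed = fit_ols(X_imp, y)

# Strategy 2: pattern-by-pattern -- one linear model per missingness pattern,
# fitted on the observed coordinates of the rows sharing that pattern.
models = {}
for pattern in {tuple(row) for row in np.isnan(X_obs)}:
    rows = np.all(np.isnan(X_obs) == np.array(pattern), axis=1)
    observed_cols = [j for j, missing in enumerate(pattern) if not missing]
    models[pattern] = (observed_cols, fit_ols(X_obs[rows][:, observed_cols], y[rows]))

print("imputation coefficients          :", np.round(w_imputed, 2))
print("number of pattern-specific models:", len(models))
```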
Understanding Variational Bayes in high dimensional models.
Ismael Castillo, LPSM
Email : ismael.castillo@upmc.fr
Variational Bayes (VB) methods are a family of algorithms that are particularly popular in statistics and machine learning. Given a Bayesian posterior distribution that can be high-dimensional and possibly non-trivial to sample from, VB methods approximate it (often, in a Kullback-Leibler-type sense) within a simpler family of distributions. This boils down to an optimization problem for which efficient computational methods are often available. While there is much empirical success, theoretical understanding of VB is currently quite partial, in particular in high-dimensional models involving sparsity constraints, or when the approximation classes are high- or infinite-dimensional.
I will first give a quick introduction to VB methods. Then I will illustrate recent advances for high-dimensional sparse linear regression: in a preprint with Alice l'Huillier, Kolyan Ray and Luke Travis we derive results for estimating a fixed number of coordinates of the regression vector using a "de-biased" VB approach. We will also discuss a number of open questions in this setting and more generally for VB methods.
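To fix notation (a standard formulation, not specific to the talk): given a prior $\Pi$ and data $Y$, the variational posterior over a tractable family $\mathcal{Q}$ (for instance mean-field product distributions) is
$$\hat q \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \ \mathrm{KL}\big(q \,\|\, \Pi(\cdot \mid Y)\big) \;=\; \operatorname*{arg\,max}_{q \in \mathcal{Q}} \ \Big\{ \mathbb{E}_{q}\big[\log p(Y \mid \theta)\big] - \mathrm{KL}(q \,\|\, \Pi) \Big\},$$
where the right-hand side is the evidence lower bound (ELBO), which can be optimized without knowing the posterior's normalizing constant. The theoretical question is then how close $\hat q$ stays to the true posterior, and to the truth, in high-dimensional or sparse models.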
Slides and internship/PhD offers from the speakers
Internship/PhD offers will be added to this folder as they become available: Internship offers