Paula Cordero-Encinar, Francesca Crucinio and Deniz Akyildiz
Some motivation: Latent Variable Models in Biology
Objectives
Perform inference and learning in latent variable models with a joint probability density $p_\theta(x, y)$ that is non-differentiable, where
$\theta$: set of static parameters,
$x$: latent (unobserved, hidden, or missing) variables,
$y$: (fixed) observed data.
The statistical estimation tasks we focus on are maximum marginal likelihood estimation (MMLE) of $\theta$ and sampling from the posterior $p_\theta(x \mid y)$.
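In this notation, the two tasks can be written compactly (a standard formulation, stated here for reference):
$$
\theta^\star \in \operatorname*{arg\,max}_{\theta} \log p_\theta(y), \qquad p_\theta(y) = \int p_\theta(x, y)\,\mathrm{d}x,
$$
and, given $\theta^\star$, sampling from the posterior $p_{\theta^\star}(x \mid y) \propto p_{\theta^\star}(x, y)$.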
Some motivation: Latent Variable Models and EM algorithm
The MMLE task in LVMs is classically solved via the Expectation-Maximisation (EM) algorithm.
Challenges
Background: Langevin Algorithms
Langevin algorithms are used to draw samples from a probability distribution $\pi$ by running the Langevin SDE $\mathrm{d}Z_t = \nabla \log \pi(Z_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t$, where $(B_t)_{t \ge 0}$ is a Brownian motion.
Langevin algorithms can also be viewed as solving a minimisation problem in the space of probability distributions: the Langevin diffusion is the Wasserstein-2 gradient flow of the KL divergence $\mathrm{KL}(\mu \,\|\, \pi)$, whose unique minimiser is the target $\pi$.
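In practice the SDE is simulated with an Euler-Maruyama discretisation, the unadjusted Langevin algorithm. A minimal, generic sketch (not specific to the poster; `grad_log_pi` and the step size `gamma` are placeholders):

```python
import numpy as np

def ula_step(z, grad_log_pi, gamma, rng):
    """One Euler-Maruyama step of the Langevin SDE (unadjusted Langevin algorithm).

    z            : current state, shape (d,)
    grad_log_pi  : callable returning the gradient of log pi at z
    gamma        : discretisation step size
    """
    noise = rng.standard_normal(z.shape)
    return z + gamma * grad_log_pi(z) + np.sqrt(2.0 * gamma) * noise

# Example: sampling from a standard Gaussian, where grad log pi(z) = -z.
rng = np.random.default_rng(0)
z = np.zeros(2)
for _ in range(1000):
    z = ula_step(z, lambda v: -v, gamma=0.1, rng=rng)
```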
Background: Reformulating MMLE via Particle Systems
The EM algorithm is equivalent to performing coordinate descent on a free-energy functional [2], whose minimiser is the maximum likelihood estimate of the latent variable model together with the corresponding optimal posterior over the latent variables.
Based on this observation, we can construct an extended stochastic dynamical system [1,2] that runs in the joint space of parameters and particles, with the aim of jointly solving latent-variable sampling and parameter optimisation. In particular, IPLA [1] couples a Langevin diffusion for $N$ latent-variable particles with a parameter dynamics driven by the particle average of the gradient (see the system below).
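For reference, the IPLA system of SDEs takes the following form (the standard formulation, up to notational differences), with $N$ particles $X_t^i$ and independent Brownian motions $B_t^0, \dots, B_t^N$:
$$
\mathrm{d}\theta_t = \frac{1}{N}\sum_{i=1}^N \nabla_\theta \log p_{\theta_t}(X_t^i, y)\,\mathrm{d}t + \sqrt{\frac{2}{N}}\,\mathrm{d}B_t^0,
\qquad
\mathrm{d}X_t^i = \nabla_x \log p_{\theta_t}(X_t^i, y)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t^i .
$$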
[1] Akyildiz et al. (2025) Interacting particle Langevin algorithm for maximum marginal likelihood estimation
[2] Kuntz et al. (2023) Particle algorithms for maximum likelihood training of latent variable models
Background: Proximal map and Moreau-Yosida approximation
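For a (possibly non-differentiable) convex function $g$ and smoothing parameter $\lambda > 0$, the two standard objects this slide refers to are
$$
\operatorname{prox}_{\lambda g}(z) = \operatorname*{arg\,min}_{u}\Big\{ g(u) + \tfrac{1}{2\lambda}\|u - z\|^2 \Big\},
\qquad
g^\lambda(z) = \min_{u}\Big\{ g(u) + \tfrac{1}{2\lambda}\|u - z\|^2 \Big\}.
$$
The Moreau-Yosida envelope $g^\lambda$ is differentiable with $\nabla g^\lambda(z) = \big(z - \operatorname{prox}_{\lambda g}(z)\big)/\lambda$, and $g^\lambda \to g$ pointwise as $\lambda \to 0$.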
Algorithms
Our goal is to extend interacting particle algorithms for the MMLE problem to cases where the joint density $p_\theta(x, y)$ may be non-differentiable.
Moreau-Yosida Interacting Particle Langevin Algorithm
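A sketch of the Moreau-Yosida smoothing idea behind this algorithm (stated generically; the exact construction in the poster may differ): write $U(\theta, x) = -\log p_\theta(x, y)$, replace its non-differentiable part by the Moreau-Yosida envelope, whose gradient is available in closed form through the proximal map, $\nabla g^\lambda(z) = (z - \operatorname{prox}_{\lambda g}(z))/\lambda$, and run IPLA on the smoothed potential $U^\lambda$. An Euler-Maruyama discretisation of the smoothed dynamics then reads
$$
\theta_{k+1} = \theta_k - \frac{\gamma}{N} \sum_{i=1}^N \nabla_\theta U^\lambda(\theta_k, X_k^i) + \sqrt{\frac{2\gamma}{N}}\, \xi_k^0,
\qquad
X_{k+1}^i = X_k^i - \gamma\, \nabla_x U^\lambda(\theta_k, X_k^i) + \sqrt{2\gamma}\, \xi_k^i,
$$
with step size $\gamma$ and independent standard Gaussian vectors $\xi_k^i$.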
Proximal Interacting Particle Gradient Langevin Algorithm
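A minimal sketch of a proximal-gradient-style particle update in the spirit of this algorithm, assuming the potential splits as $U(\theta, x) = f(\theta, x) + g(x)$ with $f$ smooth and $g$ non-differentiable, and with the prox applied after the Langevin step (this is a generic proximal gradient Langevin step per particle, not necessarily the poster's exact update):

```python
import numpy as np

def proximal_particle_step(theta, X, grad_f_theta, grad_f_x, prox_g, gamma, rng):
    """One hypothetical proximal interacting-particle update (sketch only).

    theta        : parameter vector, shape (p,)
    X            : particle cloud, shape (N, d)
    grad_f_theta : callable (theta, x) -> gradient of the smooth part w.r.t. theta
    grad_f_x     : callable (theta, x) -> gradient of the smooth part w.r.t. x
    prox_g       : callable (x, gamma) -> proximal map of the non-smooth part
    gamma        : step size
    """
    N = X.shape[0]
    # Parameter update: descend the particle average of the theta-gradient, with 1/N-scaled noise.
    grad_theta = np.mean([grad_f_theta(theta, x) for x in X], axis=0)
    theta_new = theta - gamma * grad_theta + np.sqrt(2 * gamma / N) * rng.standard_normal(theta.shape)
    # Particle update: gradient step on the smooth part, inject noise, then apply the prox of g.
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        x_half = x - gamma * grad_f_x(theta, x) + np.sqrt(2 * gamma) * rng.standard_normal(x.shape)
        X_new[i] = prox_g(x_half, gamma)
    return theta_new, X_new
```

For an $\ell_1$-type penalty, `prox_g` would be the soft-thresholding operator shown in Example I below.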
Example I: Bayesian Neural Network with Sparse Prior
Apply a Bayesian 2-layer neural network to classify MNIST digits.
We consider a Laplace prior on the weights $x$, which is sparsity-inducing.
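A Laplace prior contributes an $\ell_1$ penalty to the negative log-density, whose proximal map is the soft-thresholding operator (a standard fact; `lam` below stands for the product of the prior scale and the prox step size):

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal map of lam * ||w||_1, i.e. of the negative log Laplace prior (up to constants).

    Shrinks every weight towards zero and sets weights with |w| <= lam exactly to zero,
    which is what induces sparsity in the network weights.
    """
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Example: small weights are zeroed out, larger ones are shrunk.
print(soft_threshold(np.array([-0.03, 0.2, -1.5]), lam=0.1))  # approx. [-0.  0.1 -1.4]
```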
Example I: Bayesian Neural Network with Sparse Prior
The sparse representation obtained in our experiment has the potential advantage of producing models with a smaller memory footprint once small weights are zeroed out.
Figure: Histogram and density estimation of the weights of a BNN for a randomly chosen particle from the final cloud of particles.
Example II: Image Deblurring with Total Variation Prior
Recover a high-quality image $x$ from a blurred and noisy observation $y = Hx + w$,
where $H$ is a circulant blurring matrix and $w$ is additive Gaussian noise.
The inverse problem is ill-conditioned, so we incorporate prior knowledge.
We use a total variation prior, proportional to $\exp(-\theta\,\mathrm{TV}(x))$, which promotes smoothness while preserving edges.
The strength of this prior depends on the hyperparameter $\theta$, which typically requires manual tuning (expert knowledge). Instead of fixing this parameter manually, we estimate its optimal value.
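For reference, a minimal implementation of the anisotropic discrete total variation; the isotropic variant instead sums the Euclidean norms of the pixel-wise gradients (which variant is used in the poster is not specified here):

```python
import numpy as np

def total_variation(img):
    """Anisotropic discrete TV: sum of absolute horizontal and vertical pixel differences."""
    dh = np.abs(np.diff(img, axis=1))  # horizontal differences
    dv = np.abs(np.diff(img, axis=0))  # vertical differences
    return dh.sum() + dv.sum()

# Example: a piecewise-constant image has small TV relative to a noisy version of it.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[:, 16:] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print(total_variation(clean), total_variation(noisy))
```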
Example II: Image Deblurring with Total Variation Prior
The strength of this prior depends on a hyperparameter that usually requires manual tuning. Instead, we estimate its optimal value.
Conclusions
Our algorithms provide a novel approach for handling Bayesian models arising from different types of non-differentiable regularisation, including the Lasso, elastic net, nuclear norm, and total variation penalties.
We establish theoretical guarantees under strong convexity assumptions; in practice, however, our methods perform well under more general conditions and are robust and stable across a range of regularisation parameter values.
See you at the poster presentation!
Thank you!
Happy to chat more at the poster presentation this afternoon!