1 of 16

Paula Cordero-Encinar, Francesca Crucinio and Deniz Akyildiz

2 of 16

Some motivation: Latent Variable Models in Biology

3 of 16

Objectives

Perform inference and learning in latent variable models $p_\theta(x, y)$ whose joint probability distribution is non-differentiable, where

  • $\theta$: set of static parameters

  • $x$: latent (unobserved, hidden, or missing) variables

  • $y$: (fixed) observed data

The statistical estimation tasks we focus on are:

  • Inference: estimating the latent variables given the observed data and the model parameters through the computation of the posterior distribution

  • Learning: estimating the model parameters given the observed data through the computation and maximisation of the marginal likelihood (often intractable)
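In symbols, and assuming only the notation introduced above for the joint density $p_\theta(x, y)$, the two tasks read:

$$p_\theta(x \mid y) = \frac{p_\theta(x, y)}{p_\theta(y)},
\qquad
\theta^\star \in \operatorname*{arg\,max}_{\theta}\; p_\theta(y)
= \operatorname*{arg\,max}_{\theta} \int p_\theta(x, y)\,\mathrm{d}x.$$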

4 of 16

Some motivation: Latent Variable Models and EM algorithm

The MMLE task in LVMs is classically solved via the Expectation-Maximisation (EM) algorithm.

  • E-step: given the current estimate $\theta_k$, we estimate the latent variables by computing the posterior $p_{\theta_k}(x \mid y)$ and form
    $$Q(\theta, \theta_k) = \mathbb{E}_{p_{\theta_k}(x \mid y)}\left[\log p_\theta(x, y)\right].$$
  • M-step: maximises the expectation of the E-step to provide a new estimate of $\theta$ (a toy worked example is sketched below):
    $$\theta_{k+1} \in \operatorname*{arg\,max}_{\theta}\; Q(\theta, \theta_k).$$
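A minimal sketch of the two steps for a toy linear-Gaussian model, $x_j \sim \mathcal{N}(\theta, 1)$ and $y_j \mid x_j \sim \mathcal{N}(x_j, s^2)$, in which both steps are available in closed form; the model and all names are illustrative assumptions, not the models considered in this work.

```python
import numpy as np

def em_toy(y, s2=0.5, theta=0.0, iters=100):
    """EM for the toy model x_j ~ N(theta, 1), y_j | x_j ~ N(x_j, s2)."""
    for _ in range(iters):
        # E-step: posterior mean of each latent x_j given y_j and the current theta
        m = (s2 * theta + y) / (s2 + 1.0)
        # M-step: maximise E[log p_theta(x, y)] over theta (closed form here)
        theta = m.mean()
    return theta

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=np.sqrt(1.0 + 0.5), size=1000)  # marginal law of y
print(em_toy(y))  # converges towards the MMLE, which equals y.mean() in this model
```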

Challenges

  • The E and M steps are typically intractable and require approximations, which can degrade performance.
  • The inherently sequential nature of EM's iterative steps limits opportunities for parallelism, making it computationally inefficient for large-scale problems.

5 of 16

Background: Langevin Algorithms

Langevin algorithms are used to draw samples from a probability distribution $\pi$ by running the following SDE, whose invariant distribution is $\pi$:
$$\mathrm{d}X_t = \nabla \log \pi(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,$$
where $(B_t)_{t \ge 0}$ is a Brownian motion.

Langevin algorithms can also be reformulated as a minimisation problem in the space of probability distributions: the law of $X_t$ follows the Wasserstein-2 gradient flow of $\mu \mapsto \mathrm{KL}(\mu \,\|\, \pi)$, so sampling amounts to minimising this functional over probability distributions.
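A minimal sketch of the Euler–Maruyama discretisation of this SDE (the unadjusted Langevin algorithm), assuming a standard Gaussian target purely for illustration; the step size and all names are illustrative.

```python
import numpy as np

def ula(grad_log_pi, x0, step=0.01, n_steps=10_000, rng=None):
    """Unadjusted Langevin: X_{k+1} = X_k + step * grad log pi(X_k) + sqrt(2*step) * xi."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Example: standard Gaussian target, grad log pi(x) = -x
samples = ula(lambda x: -x, x0=np.zeros(2))
print(samples[2000:].mean(axis=0), samples[2000:].var(axis=0))  # roughly 0 and 1
```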

6 of 16

Background: Reformulating MMLE via Particle Systems

The EM algorithm is equivalent to performing coordinate descent on a free energy functional [2],
$$\mathcal{F}(\theta, q) = -\mathbb{E}_{q}\left[\log p_\theta(x, y)\right] - \mathcal{H}(q),$$
with $\mathcal{H}$ the differential entropy, whose minimiser is the maximum likelihood estimate of the latent variable model together with the corresponding optimal posterior.

Based on this observation, we can construct an extended stochastic dynamical system [1,2] which can be run in the joint space of parameters and latent variables, with the aim of jointly solving the problems of latent-variable sampling and parameter optimisation. In particular, IPLA [1] couples a single parameter trajectory with an interacting cloud of $N$ latent-variable particles, as sketched below.
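Up to the exact constants chosen in [1], the IPLA dynamics take the following form, with one Brownian motion driving the parameter and one per particle:

$$\mathrm{d}\theta_t = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \log p_{\theta_t}\!\left(X_t^i, y\right) \mathrm{d}t + \sqrt{\tfrac{2}{N}}\, \mathrm{d}B_t^0,$$
$$\mathrm{d}X_t^i = \nabla_x \log p_{\theta_t}\!\left(X_t^i, y\right) \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t^i, \qquad i = 1, \dots, N.$$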

[1] Akyildiz et al. (2025) Interacting particle Langevin algorithm for maximum marginal likelihood estimation

[2] Kuntz et al. (2023) Particle algorithms for maximum likelihood training of latent variable models

7 of 16

Background: Proximal map and Moreau-Yosida approximation
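As a reminder, for a proper convex lower semicontinuous function $g$ and $\lambda > 0$, the standard definitions are

$$\operatorname{prox}_{\lambda g}(x) = \operatorname*{arg\,min}_{u} \left\{ g(u) + \frac{1}{2\lambda}\|u - x\|^2 \right\},
\qquad
g^{\lambda}(x) = \min_{u} \left\{ g(u) + \frac{1}{2\lambda}\|u - x\|^2 \right\}.$$

The Moreau–Yosida envelope $g^{\lambda}$ is differentiable even when $g$ is not, with
$$\nabla g^{\lambda}(x) = \frac{x - \operatorname{prox}_{\lambda g}(x)}{\lambda},$$
and $g^{\lambda} \to g$ pointwise as $\lambda \to 0$.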

8 of 16

Algorithms

Our goal is to extend interacting particle algorithms for the MMLE problem to cases where the joint distribution $p_\theta(x, y)$ may be non-differentiable.

9 of 16

Moreau-Yosida Interacting Particle Langevin Algorithm
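A minimal sketch of the general recipe suggested by the name, assuming the negative log joint splits as $U(\theta, x) = f(\theta, x) + g(x)$ with $f$ smooth and $g$ non-differentiable: the non-smooth part is replaced by its Moreau–Yosida envelope, whose gradient $(x - \operatorname{prox}_{\lambda g}(x))/\lambda$ is plugged into a discretised IPLA update. Function names, step sizes and noise scalings below are illustrative assumptions, not the exact scheme from the paper.

```python
import numpy as np

def myipla_step(theta, X, grad_f_theta, grad_f_x, prox_g, gamma, lam, rng):
    """One (illustrative) Moreau-Yosida smoothed IPLA update.

    theta : parameter vector, shape (d_theta,)
    X     : latent particles, shape (N, d_x)
    """
    N = X.shape[0]
    # Parameter update: average of particle-wise gradients, noise scaled by 1/N
    drift_theta = np.mean([grad_f_theta(theta, x) for x in X], axis=0)
    theta_new = theta - gamma * drift_theta \
        + np.sqrt(2.0 * gamma / N) * rng.standard_normal(theta.shape)
    # Particle updates: smooth part via its gradient, non-smooth part via the
    # Moreau-Yosida envelope gradient (x - prox_{lam g}(x)) / lam
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        grad_env = (x - prox_g(x, lam)) / lam
        X_new[i] = x - gamma * (grad_f_x(theta, x) + grad_env) \
            + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
    return theta_new, X_new
```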

10 of 16

Proximal Interacting Particle Gradient Langevin Algorithm
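As a rough contrast with the Moreau–Yosida variant above, and only as an assumption about the general proximal gradient Langevin template rather than the paper's exact update, a proximal gradient Langevin step applies the proximal map after the gradient-plus-noise move of each particle:

$$X^i_{k+1} = \operatorname{prox}_{\gamma g}\!\left( X^i_k - \gamma \nabla_x f(\theta_k, X^i_k) + \sqrt{2\gamma}\, \xi^i_{k+1} \right).$$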

11 of 16

Example I: Bayesian Neural Network with Sparse Prior

Apply a Bayesian 2-layer neural network to classify MNIST digits.

We consider a Laplace prior on the weights $x$, which is a sparsity-inducing prior.
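This is exactly the kind of non-differentiable term the proximal machinery handles: writing the prior as $p(x) \propto \exp(-\|x\|_1 / b)$ for some scale $b > 0$ (notation assumed here for illustration), the negative log-prior is the $\ell_1$ norm, which is non-differentiable at $0$ but has the closed-form proximal map (soft-thresholding)

$$\operatorname{prox}_{\lambda \|\cdot\|_1}(x)_i = \operatorname{sign}(x_i)\, \max\!\left(|x_i| - \lambda,\, 0\right).$$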

12 of 16

Example I: Bayesian Neural Network with Sparse Prior

The sparse representation learned in our experiment has the potential advantage of producing models with a smaller memory footprint once small weights are zeroed out.

Figure: Histogram and density estimation of the weights of a BNN for a randomly chosen particle from the final cloud of particles.

13 of 16

Example II: Image Deblurring with Total Variation Prior

Recover a high-quality image $x$ from a blurred and noisy observation
$$y = Hx + w,$$
where $H$ is a circulant blurring matrix and $w$ is additive noise.

The inverse problem is ill-conditioned, so we incorporate prior knowledge.

We use a total variation prior, $p_\theta(x) \propto \exp(-\theta\, \mathrm{TV}(x))$, which promotes smoothness while preserving edges.

The strength of this prior depends on a hyperparameter $\theta$ that typically requires manual tuning (expert knowledge). Instead of fixing this parameter manually, we estimate its optimal value via marginal maximum likelihood.
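Assuming, purely for illustration, Gaussian observation noise $w \sim \mathcal{N}(0, \sigma^2 \mathrm{Id})$ (an assumption made here to write the densities explicitly), the model above yields the non-differentiable posterior targeted by the proximal algorithms, with $\theta$ estimated by MMLE:

$$p_\theta(x \mid y) \propto \exp\!\left( -\frac{\|y - Hx\|^2}{2\sigma^2} - \theta\, \mathrm{TV}(x) \right),
\qquad
\theta^\star \in \operatorname*{arg\,max}_{\theta} \int p_\theta(x, y)\, \mathrm{d}x.$$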

14 of 16

Example II: Image Deblurring with Total Variation Prior

The strength of this prior depends on a hyperparameter that usually requires manual tuning. Instead, we estimate its optimal value.

15 of 16

Conclusions

Our algorithms provide a novel approach for handling Bayesian models arising from different types of non-differentiable regularisation, including the Lasso, elastic net, nuclear norm and total variation penalties.

We establish theoretical guarantees under strong convexity assumptions; in practice, however, our methods perform well under more general conditions and demonstrate robustness and stability across a range of regularisation parameter values.

See you at the poster presentation!

16 of 16

Thank you

HAPPY TO CHAT MORE AT THE POSTER PRESENTATION THIS AFTERNOON!