1 of 35

Unleashing the Potential of Machine Learning for Efficient Analysis of Solar Observations

Carlos José Díaz Baso

Rosseland Centre for Solar Physics, Institute of Theoretical Astrophysics, University of Oslo, N-0315 Oslo, Norway

carlos.diaz@astro.uio.no

2 of 35

Big Data?

  • IRIS (2013-now): ~61 TB (Level 2)
  • Hinode/SOT (2006-now): ~35 TB (Level 1 FG + Levels 1 & 2 SP)
  • DKIST, EST, SST: ~TB per hour per instrument

Numbers courtesy of Bart De Pontieu, Marc DeRosa, Ryan Timmons (10/2022)

3 of 35

Exploration and dimensionality reduction

Typical questions from someone who has just obtained a big dataset:

  • How diverse, rather than merely large, is my dataset?
  • How many of its features are actually informative, and how many carry only redundant information?

4 of 35

Dimensionality reduction: Principal Component Analysis

What is the basic idea behind PCA? Why is it useful in solar observations?

1.- Find the “principal components” basis.

2.- Project the data onto a truncated version of that basis.

The compressibility of solar spectra is what makes this truncation so effective: Martínez González et al. (2008a) (e.g. Asensio Ramos et al. 2007; Asensio Ramos & López Ariste 2010). A minimal code sketch follows below.
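A minimal sketch (not from the talk; array shapes and names are illustrative) of PCA compression of spectral profiles with scikit-learn:

```python
# Illustrative only: compress Stokes I profiles with PCA and reconstruct them
# from a truncated basis of "eigenprofiles".
import numpy as np
from sklearn.decomposition import PCA

# Assume `spectra` holds one profile per pixel: shape (n_pixels, n_wavelengths)
spectra = np.random.default_rng(0).normal(size=(10_000, 200))

pca = PCA(n_components=20)                     # keep only the first 20 components
coeffs = pca.fit_transform(spectra)            # compressed representation (n_pixels, 20)
reconstructed = pca.inverse_transform(coeffs)  # denoised/compressed profiles

print("explained variance:", pca.explained_variance_ratio_.sum())
```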

5 of 35

Dimensionality reduction: PCA applications

  • Denoising: Martínez González et al. (2008a,b)
  • Removal of fringes: Casini et al. (2012, 2021)

(e.g. Asensio Ramos et al. 2007; Asensio Ramos & López Ariste 2010; Paletou 2012; Pastor Yabar et al. 2018; Trelles Arjona et al. 2021)

6 of 35

Dimensionality reduction: PCA applications

  • PCA inversion: Socas-Navarro et al. (2001) (e.g. Rees et al. 2000; López Ariste & Casini 2002; Skumanich & López Ariste 2002; Casini et al. 2005, 2009, 2013; Sainz Dalda et al. 2019)
  • PCA deconvolution: Ruiz Cobo & Asensio Ramos (2013) (e.g. Quintero Noda et al. 2015, 2016; Felipe et al. 2016; Griñón-Marín 2021)

7 of 35

Finding patterns

Once we have processed our dataset …

  • Can I find a way to distinguish “groups” with similar properties?
  • The conclusions will then hold for every example within a group!
  • Where are these “groups” located on the solar surface?

8 of 35

Clustering: the K-means algorithm

1.- Choose the number of clusters K and initialize the centroids.

2.- Assign each point to the closest centroid (Euclidean distance).

3.- Update each centroid as the average of the points in its cluster.

Steps 2 and 3 are repeated until the assignments stop changing. A minimal code sketch follows below.

(e.g., Pietarila et al. 2007; Viticchié & Sánchez Almeida 2011; Panos et al. 2018; Sainz Dalda et al. 2019; Bose et al. 2019; Rouppe van der Voort et al. 2021; Robustini et al. 2019; Joshi & Rouppe van der Voort 2020b; Kuckein et al. 2020; Bose et al. 2021a,b; Barczynski et al. 2021; Nóbrega-Siverio et al. 2021; Kleint & Panos 2022; Joshi et al. 2022; Thoen Faber 2022)
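A minimal sketch (not from the talk; shapes and the choice K=8 are illustrative) of clustering spectral profiles with scikit-learn:

```python
# Illustrative only: group spectral profiles into K clusters and keep one
# representative profile (the centroid) per cluster.
import numpy as np
from sklearn.cluster import KMeans

# Assume `profiles` holds one spectrum per pixel: shape (n_pixels, n_wavelengths)
profiles = np.random.default_rng(1).normal(size=(50_000, 60))

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(profiles)
labels = km.labels_              # cluster index per pixel, can be reshaped into a map
centroids = km.cluster_centers_  # one representative profile per cluster
```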

9 of 35

K-means applications

  • QS magnetic field: Viticchié & Sánchez Almeida (2011)
  • Type-II spicules: Bose et al. (2019, 2021a,b)

10 of 35

K-means applications

  • Panos et al. (2018) (e.g. Woods et al. 2021)
  • Nóbrega-Siverio et al. (2021)

11 of 35

PCA and k-means are not the only options (see the sketch after these lists).

Clustering techniques

  • Affinity Propagation
  • Agglomerative Hierarchical Clustering
  • BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  • Gaussian Mixture Models (GMM)
  • K-Means
  • Mean Shift Clustering
  • Mini-Batch K-Means
  • OPTICS
  • Spectral Clustering

Dimensionality reduction

  • Feature selection
  • Principal Component Analysis (PCA)
  • Non-negative Matrix Factorization (NMF)
  • Linear Discriminant Analysis (LDA)
  • Generalized Discriminant Analysis (GDA)
  • Missing Values Ratio
  • Low Variance Filter
  • High Correlation Filter
  • Backward Feature Elimination
  • Forward Feature Construction
  • t-SNE (t-distributed Stochastic Neighbour Embedding)
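Many of these are available with a uniform API in standard libraries. A minimal sketch (scikit-learn assumed, as mentioned at the end of the talk; data are illustrative):

```python
# Illustrative only: a 2-D embedding plus two alternative clusterings of the same data.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture
from sklearn.cluster import DBSCAN

X = np.random.default_rng(2).normal(size=(2_000, 50))   # toy feature matrix

embedding = TSNE(n_components=2, perplexity=30).fit_transform(X)  # for visualization
gmm_labels = GaussianMixture(n_components=5).fit_predict(X)       # probabilistic clustering
db_labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(X)        # density-based clustering
```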

12 of 35

Classification and prediction

Once we know which part of our dataset is interesting …

  • Can we find a recipe that links our data to some properties?
    - the properties may come from different sources,
    - or be very computationally expensive to obtain,
    - or be very difficult to derive manually.
  • Can it be general enough to be applied to future data?
  • Even more important: can we learn something from it?

13 of 35

Classification and prediction: Support Vector Machines

[Illustration: an SVM decision boundary separating two classes with the widest possible margin.]

  • Flare prediction: Bobra et al. (2015, 2016) (e.g. Yuan et al. 2010; Nishizuka et al. 2017; Florios et al. 2018)

A minimal code sketch follows below.
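A minimal sketch (not the setup of Bobra et al.; features and labels are toy placeholders) of a flare/no-flare classifier with a support vector machine in scikit-learn:

```python
# Illustrative only: RBF-kernel SVM on active-region features, evaluated by cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(1_000, 25))     # e.g. magnetic-field summary features per active region
y = rng.integers(0, 2, size=1_000)   # 1 = flaring, 0 = quiet (random toy labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, class_weight="balanced"))
print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
```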

14 of 35

Nonlinear modeling → neural networks

What happens if the relation we want to model is highly non-linear?

[Diagram: a neural network mapping an input A to an output B through layers of free parameters.]

A minimal code sketch follows below.
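A minimal sketch (illustrative toy data) of fitting a non-linear relation with a small fully connected network in scikit-learn:

```python
# Illustrative only: regress y = f(x) for a strongly non-linear f with an MLP.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, size=(5_000, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=5_000)   # toy non-linear mapping plus noise

net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="tanh",
                   max_iter=2000, random_state=0).fit(x, y)
print(net.predict([[0.5]]))   # should be close to sin(1.5)
```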

15 of 35

NLTE radiative transfer calculations with neural networks

[Diagram: the full calculation goes solar model → non-LTE populations → intensity + polarization via radiative transfer; a neural network learns part of this mapping directly from the solar model.]

  • Chappell & Pereira (2021): 1D, predicting departure coefficients
  • Vicente Arévalo et al. (2021): 3D, mapping LTE → non-LTE populations

16 of 35

Spectropolarimetric inversions

[Diagram: forward modeling maps a solar model to intensity + polarization through radiative transfer; a neural network learns the inverse mapping, from intensity + polarization back to the solar model.]

  • Sainz Dalda et al. (2019): synthesis + inversions ~10³–10⁶ times faster
  • (e.g. Carroll et al. 2001, 2008; Socas-Navarro 2003, 2005; Sainz Dalda et al. 2019; Milić et al. 2020; Gafeira et al. 2021; Centeno et al. 2022)

17 of 35

Accelerating the inference …

  • … in large FOVs: Kianfar et al. (2019)
  • … in long time-series: Morosin et al. (2022), net radiative losses from Ca II 8542 Å observations with CRISP@SST

18 of 35

Autoencoders (the non-linear PCA)

Autoencoders (the non-linear PCA)

[Diagram: input data → encoder → encoded (latent) data → decoder → reconstructed data.]

  • Sparse representation: Flint & Milić (2021)
  • (e.g. Skumanich & López Ariste 2002; Sadykov et al. 2021; Ivanov et al. 2021)

A minimal code sketch follows below.
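A minimal sketch (PyTorch assumed; sizes are illustrative) of a small fully connected autoencoder compressing spectral profiles into a low-dimensional code:

```python
# Illustrative only: the encoder squeezes a profile into 8 numbers, the decoder reconstructs it.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_wav=200, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_wav, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_wav))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(512, 200)                   # a batch of profiles
loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
loss.backward()
```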

19 of 35

Convolutional Neural Networks

If I want to analyze a megapixel image (10⁶ pixels), do I need a neural network with O(>10⁶) learnable parameters?

  • Translational equivariance: f(g(x)) = g(f(x))
  • Efficient parametrization
  • Uses the information of neighbouring pixels

(Animations: github.com/vdumoulin/ ; visual.cs.ucl.ac.uk/pubs/harmonicNets)

A minimal code sketch follows below.
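A minimal sketch (PyTorch assumed; architecture is illustrative) of why a CNN needs far fewer parameters than one weight per pixel:

```python
# Illustrative only: a few thousand shared convolutional weights process an image of
# any size, producing a per-pixel output map.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)

image = torch.randn(1, 1, 1024, 1024)            # a one-megapixel image
output = cnn(image)                              # same spatial size as the input
print(sum(p.numel() for p in cnn.parameters()))  # only a few thousand parameters
```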

20 of 35

The image as a “whole”

  • Automatic catalogues: Armstrong & Fletcher (2019)
  • Automatic segmentation: Guo et al. (2022) (e.g. Ahmadzadeh et al. 2019; Zhu et al. 2019; G. Zhu et al. 2021; Illarionov & Tlatov 2018; Diercke et al. 2022)

21 of 35

Using the information from nearby pixels

  • Image deconvolution: Díaz Baso et al. (2018) (e.g. Asensio Ramos et al. 2018, 2021; Armstrong et al. 2021; Wang et al. 2021; Deng et al. 2021)
  • Hinode PSF-compensated Stokes inversions: Asensio Ramos & Díaz Baso (2019), replacing a pixel-by-pixel 1D inversion code with a convolutional network

22 of 35

CNN applications

  • Solar image denoising (Ca II 8542 Å): Díaz Baso et al. (2019) (e.g. Park et al. 2020)
  • Horizontal velocity fields: Tremblay et al. (2020, 2021) (e.g. Asensio Ramos et al. 2017; Ishikawa et al. 2022)

23 of 35

Other applications

  • Fibril orientation: Jiang et al. (2021)
  • (Gravitational) wave classification: Krastev (2020)

(e.g. Xia et al. 2020; Qiu et al. 2022)

24 of 35

Enhancing CNNs with temporal information

  • Far-side activity detection: Broock et al. (2022) (e.g. Felipe et al. 2019; Broock et al. 2021; Sun et al. 2022)

25 of 35

All that glitters is not gold: challenges and future directions

Interpretability

  • Why is this active region classified as a flare-producer? Sun et al. (2022) (e.g. Yi et al. 2021; Upendran et al. 2020)

To name a few methods (a sketch of one of them, pixel attribution, follows below):

  • Learned features
  • Pixel attribution (saliency maps)
  • Testing concepts
  • Adversarial examples
  • Influential instances
  • Symbolic regression
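A minimal sketch (PyTorch assumed; the model and input are toy placeholders) of a vanilla saliency map, i.e. the gradient of a class score with respect to the input pixels:

```python
# Illustrative only: pixels with a large input gradient are the ones that most
# influenced the classifier's decision.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

image = torch.randn(1, 1, 128, 128, requires_grad=True)  # e.g. a magnetogram patch
score = model(image)[0, 1]                                # score of the "flare-producer" class
score.backward()
saliency = image.grad.abs().squeeze()                     # pixel-attribution map
```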

26 of 35

Uncertainty quantification

Does your method know when it doesn’t know? A minimal code sketch follows below.
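A minimal sketch (PyTorch assumed; toy network) of Monte-Carlo dropout, one simple way to let a network report how uncertain it is:

```python
# Illustrative only: keep dropout active at prediction time and treat the spread of
# repeated forward passes as a rough model (epistemic) uncertainty.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2),
                    nn.Linear(64, 1))

x = torch.randn(1, 10)
net.train()                                           # keeps the dropout mask stochastic
samples = torch.stack([net(x) for _ in range(100)])
mean, std = samples.mean(dim=0), samples.std(dim=0)   # std as an uncertainty estimate
```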

27 of 35

A probabilistic perspective: conditional information

  • Mutual information: Panos et al. (2021a,b) (e.g. Snelling et al. 2020)
  • Designing observing sampling: Díaz Baso et al. (in prep) (e.g. Szenicer et al. 2019; Lim et al. 2021; Salvatelli et al. 2022)

[Illustration: pairs of observables with low, medium and high correlation.]

A minimal code sketch follows below.
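A minimal sketch (illustrative data) of estimating mutual information with scikit-learn's k-nearest-neighbour estimator:

```python
# Illustrative only: mutual information between one spectral point and all the others.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

profiles = np.random.default_rng(5).normal(size=(20_000, 60))  # (pixels, wavelengths)
reference = profiles[:, 30]                                    # one chosen spectral point

mi = mutual_info_regression(profiles, reference)
# High MI -> that wavelength is largely redundant with the reference point;
# low MI  -> it carries complementary information worth sampling.
print(mi.round(2))
```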

28 of 35

A probabilistic perspective: inverse problems

[Diagram: forward modeling maps physical parameters to observables; the inverse problem maps observables back to parameters, and is often ambiguous because different parameter sets can reproduce the same observation.]

29 of 35

This ambiguity is inherent in many problems:

  • Pose estimation (input image → 3D pose): Wehrbein et al. (2021)
  • Super-resolution: Lugmayr et al. (2020)

30 of 35

Normalizing flows

Normalizing flows

[Diagram: a standard neural network returns a single estimate of a parameter λ, whereas a normalizing flow returns its full (possibly multimodal) distribution.]

Rezende & Mohamed (2015), Dinh et al. (2016)

A minimal code sketch follows below.
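A minimal sketch (PyTorch assumed) of the change-of-variables idea behind normalizing flows: a simple base distribution is pushed through an invertible transform, and exact densities follow from the transform's log-det Jacobian:

```python
# Illustrative only: a fixed invertible transform on a Gaussian base. In practice the
# transforms are learnable (e.g. affine coupling layers, Dinh et al. 2016) and are
# conditioned on the observation to model p(parameters | observation).
import torch
from torch import distributions as D

base = D.Normal(torch.zeros(1), torch.ones(1))
flow = D.TransformedDistribution(base, [D.SigmoidTransform()])

x = flow.sample((5,))     # samples now live in (0, 1)
logp = flow.log_prob(x)   # exact log-density via the change-of-variables formula
print(x.squeeze(), logp.squeeze())
```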

31 of 35

Normalizing flows

  • Non-LTE inversion: Díaz Baso et al. (2021) (e.g. Osborne et al. 2019; Asensio Ramos et al. 2021)
  • Posterior distributions of the atmosphere as a function of height:
    - only using the Fe I 6301 Å line
    - also using the Ca II 8542 Å profile

32 of 35

From normalizing flows to diffusion models

Sohl-Dickstein et al. (2015), Song & Ermon (2019), Ho et al. (2020)

[Examples from Ramesh et al. (2022): images generated from text prompts such as “Grizzly bear taking a selfie on the Golden Gate bridge on a windy day”, “Irish Terrier riding a horse in Patagonia and playing the harmonica”, “Cat with a yellow hat going down the stairs under water”, “Panda mad scientist mixing sparkling chemicals, artstation”.]

→ a promising direction for complex inverse problems.

33 of 35

Summary and conclusions

  • Machine learning can help in many different ways: pattern exploration, image reconstruction, compression, denoising, parameter inference, classification, tracking, etc. Inference is extremely fast.
  • The question you want to address is as important as the method. Depending on the goal, there is no need to always "reinvent the wheel" (existing literature vs. your own design).
  • There are still many ways to improve these methods: incorporating physical constraints (e.g. symmetries, conservation laws), making them interpretable, quantifying uncertainty, enabling multimodal solutions, etc.

34 of 35

Summary and conclusions

Many of the techniques discussed here are available off the shelf (e.g. scikit-learn in Python).

35 of 35

Summary and conclusions

What a time to be alive!