1 of 25

Emille E. O. Ishida

Laboratoire de Physique de Clermont - Université Clermont-Auvergne

Clermont Ferrand, France

SNAD: Machine learning assisted discovery in astronomy

https://snad.space/

IN2P3/IRFU Machine Learning workshop

17 March 2021 - Zoom

2 of 25

SuperNova Anomaly Detection … historically

What is SNAD?

An international collaboration that aims to develop machine learning tools to optimize astronomical discovery in the era of big data.

arXiv:1905.11516

arXiv:1909.13260

arXiv:2012.01419

France - Russia - USA

3 of 25

In algorithmic terms ...

Anomaly Detection

“An anomaly is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism”

Hawkins, 1980
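
One simple way to make this definition operational is to measure how far each observation sits from the bulk of the data. The sketch below is an illustration only, not a SNAD method: it flags values whose robust z-score (built from the median and the median absolute deviation) exceeds a threshold. The function name `robust_outliers`, the threshold of 3.5 and the toy magnitudes are assumptions chosen for the example.

```python
from statistics import median

def robust_outliers(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds `threshold`."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    # 0.6745 rescales the MAD so the score is comparable to a Gaussian z-score.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Toy brightness measurements (hypothetical): one value departs from the rest.
magnitudes = [18.1, 18.3, 18.2, 18.0, 18.4, 18.2, 14.9, 18.3]
print(robust_outliers(magnitudes))  # prints [6]
```

A one-dimensional threshold like this only captures the "deviates so much" part of the definition; deciding whether a different mechanism is at work remains a question for the expert.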

4 of 25

Philosophically it is about discovery ...

5 of 25

SNAD work philosophy:

Machine Learning only produces recommendations

Observed data set

Anomaly detection algorithm

Potentially interesting anomalies:

  • Candidate 1
  • Candidate 2
  • Candidate 3
  • ...
  • ...

Expert analysis

Not interesting

Interesting

Very interesting!

Get more data

or

Publication

6 of 25

Experiment

First try: the Open Supernova Catalog

Pruzhinskaya et al., 2019 - MNRAS - https://arxiv.org/abs/1905.11516

Public catalog of supernovae, known to have some contamination

… after selection and pre-processing, ~2000 objects

7 of 25

Many trees make a forest …

Isolation Forest
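
The idea behind the title can be sketched in a few lines: each tree splits the data with random axis-aligned cuts, and objects that end up alone after only a few cuts are likely anomalies. The code below is a from-scratch, standard-library illustration of the general Isolation Forest idea, not the SNAD pipeline. All function names, the toy two-dimensional data and the parameter values (100 trees, samples of 64 points) are assumptions made for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split the points with random axis-aligned cuts."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(point, node, depth=0):
    """Number of cuts needed to reach a leaf, corrected for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    branch = "left" if point[node["dim"]] < node["cut"] else "right"
    return path_length(point, node[branch], depth + 1)

def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1) for each point; higher means easier to isolate."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in data]

# Toy data (hypothetical): a Gaussian cloud plus one far-away object.
toy_rng = random.Random(42)
data = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

scores = isolation_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

Averaging over many trees is what makes the score stable: a single random tree can isolate an ordinary object early by chance, but a forest rarely does so consistently.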

8 of 25

Experiment

First try: the Open Supernova Catalog

Pruzhinskaya et al., 2019 - MNRAS - https://arxiv.org/abs/1905.11516

  • Anomaly detection via Isolation Forest
  • Visually inspected 2% in each set (~100 objects)
  • Results:
    • 81 identified anomalies
    • SLSN, peculiar SNe, misclassified stars
    • 1 AGN and 1 binary microlensing event

Active Galactic Nuclei

Binary microlensing

9 of 25

Anomaly detection:

Second Try: Zwicky Transient Facility DR3

Figure by Maria Pruzhinskaya

  • Survey currently in operation, telescope in California
  • 3 fields from Data Release 3 (DR3)
  • After selection cuts and feature extraction, 2.25 million objects

Malanchev et al., 2020 - MNRAS - https://arxiv.org/abs/2012.01419

10 of 25

Experiment

Second Try: Zwicky Transient Facility DR3

Visualization generated with the SNAD ZTF viewer: https://ztf.snad.space/

ZTF Data Release 3 was expected to contain stars and periodic variables (no transients)

11 of 25

Experiment

Second Try: Zwicky Transient Facility DR3

  • Feature extraction

  • Anomaly detection algorithms:

    • Isolation Forest
    • Local Outlier Factor
    • Gaussian Mixture Model
    • One-Class Support Vector Machine

  • Initial data: 2.25 million objects

  • Expert analysis: 277 objects

Results:

  • 68 % (188) - artifacts, bogus
  • 24 % (66) - previously cataloged
  • 8 % (23) - discoveries

Malanchev et al., 2020 - MNRAS - https://arxiv.org/abs/2012.01419
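
As a quick consistency check on the numbers above, the three categories should add up to the 277 inspected objects and reproduce the quoted percentages. The snippet below is only that arithmetic; the dictionary layout is an assumption made for the example.

```python
# Counts quoted on the slide for the 277 expert-inspected objects.
counts = {"artifacts, bogus": 188, "previously cataloged": 66, "discoveries": 23}

total = sum(counts.values())
percentages = {label: round(100 * n / total) for label, n in counts.items()}

print(total)        # prints 277
print(percentages)  # prints {'artifacts, bogus': 68, 'previously cataloged': 24, 'discoveries': 8}
```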

12 of 25

The artifact fraction (68 %) is still super high!

13 of 25

Philosophical question:

What is a scientifically interesting anomaly?

Problem: Still high incidence of “non-important” anomalies (68 % for ZTF DR3)

Goal: Maximize the number of scientifically interesting anomalies shown to the expert

Strategy:

Incorporate human knowledge in the machine learning model

a.k.a. adaptive learning ...

14 of 25

The recommendation system can get better with time ...

15 of 25

Human in the loop:

Active Anomaly Detection

Data

Object with highest anomaly score

Anomaly Detection Algorithm

Show to the expert:

Is this an anomaly?

Yes/No

Das, S., et al., 2017, in Workshop on Interactive Data Exploration and Analytics (IDEA’17), KDD workshop, arXiv:1708.09441
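
The loop in this diagram can be sketched as follows. This is a deliberately simplified toy and not the algorithm of Das et al.: here the anomaly score is a weighted sum over a few hypothetical ensemble members, and each Yes/No answer nudges the weights with a plain additive update. The function name `active_anomaly_loop`, the learning rate and the six toy objects are assumptions made for the example.

```python
def active_anomaly_loop(member_scores, ask_expert, budget, learning_rate=0.5):
    """Toy human-in-the-loop ranking.

    member_scores: one list per object with the score from each ensemble member.
    ask_expert: callable taking an object index, returning True for an anomaly.
    """
    n_members = len(member_scores[0])
    weights = [1.0 / n_members] * n_members
    labels = {}
    for _ in range(min(budget, len(member_scores))):
        unlabeled = [i for i in range(len(member_scores)) if i not in labels]
        # Show the expert the unlabeled object with the highest weighted score.
        query = max(unlabeled, key=lambda i: sum(
            w * s for w, s in zip(weights, member_scores[i])))
        labels[query] = ask_expert(query)
        # Reward members that scored a confirmed anomaly highly, penalize otherwise.
        sign = 1.0 if labels[query] else -1.0
        weights = [max(w + sign * learning_rate * s, 0.0)
                   for w, s in zip(weights, member_scores[query])]
        norm = sum(weights) or 1.0
        weights = [w / norm for w in weights]
    return labels, weights

# Hypothetical scores: member 0 reacts to artifacts, member 1 to real anomalies.
member_scores = [
    [0.9, 0.2],  # 0: artifact
    [0.8, 0.1],  # 1: artifact
    [0.3, 0.7],  # 2: true anomaly
    [0.1, 0.6],  # 3: true anomaly
    [0.1, 0.1],  # 4: ordinary object
    [0.7, 0.1],  # 5: artifact
]
true_anomalies = {2, 3}

labels, weights = active_anomaly_loop(
    member_scores, lambda i: i in true_anomalies, budget=3)
print(sorted(labels.items()))  # prints [(0, False), (2, True), (3, True)]
```

With fixed equal weights the three top-ranked objects would be 0, 2 and 1, so only one true anomaly would reach the expert. After the first "No", the toy loop shifts weight toward the member that tracks real anomalies and the next two queries are both true anomalies.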

16 of 25

Fraction of true anomalies

AAD was able to increase the incidence of true anomalies presented to the expert by 80%

Then make it more complicated ...

AAD on real data: The Open Supernova Catalog

17 of 25

  • It requires some time for changes to be effectively incorporated

  • Late queries:
    • Objects which were not found in the static case
    • Higher concentration of true anomalies

Then make it more complicated ...

AAD on real data: The Open Supernova Catalog

18 of 25

Fast identification of the binary microlensing event

Then make it more complicated ...

AAD on real data: The Open Supernova Catalog

19 of 25

Summary

  • Detection is merely one item in the process of scientific discovery

  • A coherent combination of expert knowledge and machine learning algorithms can speed up the interpretation of detected anomalies

  • In this context, expert feedback is irreplaceable and the bias it introduces is a feature

We are currently applying AAD to ZTF DR4 … news should be out soon! Stay tuned!

https://snad.space/

20 of 25

Thank you, Merci, Спасибо

From the SИAD team!

21 of 25

Extra slides

22 of 25

Experiment

Second Try: Zwicky Transient Facility DR3

  • Feature extraction

  • Anomaly detection algorithms:

    • Isolation Forest
    • Local Outlier Factor
    • Gaussian Mixture Model
    • One-Class Support Vector Machine

  • Initial data: 2.25 million objects

  • Expert analysis: 277 objects

Results:

  • 68 % (188) - artifacts, bogus
  • 24 % (66) - previously cataloged
  • 8 % (23) - discoveries
    • 1 RS Canum Venaticorum star
    • 1 red dwarf flare
    • 4 supernova candidates

There are no SN in ZTF Data releases ...

Malanchev et al., 2020 - MNRAS - https://arxiv.org/abs/2012.01419

23 of 25

Curiosities

From ZTF DR3: Examples of artifacts

Malanchev et al., 2020 - MNRAS - https://arxiv.org/abs/2012.01419

24 of 25

Curiosities

From ZTF DR3: IW Dra and its echoes

Malanchev et al., 2020 - MNRAS - https://arxiv.org/abs/2012.01419

25 of 25

Curiosities

From ZTF DR3: The Barcelona asteroid

Malanchev et al., 2021 - MNRAS - https://arxiv.org/abs/2012.01419