1 of 52

Selection Function Basics*

H.-W. Rix, D. Hogg, D. Boubert, A. Brown, R. Drimmel, A. Everall, M. Fouesneau, A. Price-Whelan
2021, AJ, 162, 142

  • What is a selection function? (SF)
  • When do I need it?
  • How do I construct it (well)?
  • What do I do with it?
    • … a worked example ..
  • When is it simple, when tricky?

* in the context of Gaia data

2 of 52

What is a selection function, S(dobs)?

  • S(dobs) is the probability that an object with properties dobs is in the catalog

  • S(dobs) is the multiplicative link between the model predictions for the quantities dobs and the catalog incidence of objects with these dobs
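
A minimal sketch of this multiplicative link, assuming a one-dimensional case where dobs is just the apparent magnitude. Both `selection_function` and `model_counts` are hypothetical stand-ins chosen for illustration, not forms taken from the paper:

```python
import numpy as np

def selection_function(m_g, m_lim=20.0, width=0.3):
    """Hypothetical S(d_obs): probability that a source of apparent
    magnitude m_g is in the catalog (smooth cutoff near m_lim)."""
    return 1.0 / (1.0 + np.exp((m_g - m_lim) / width))

def model_counts(m_g):
    """Hypothetical model prediction: sources per magnitude bin,
    rising toward faint magnitudes (arbitrary normalisation)."""
    return 10.0 ** (0.6 * (m_g - 15.0))

m_grid = np.arange(15.0, 23.0, 0.5)

# The catalog incidence is the model prediction times S(d_obs).
predicted_catalog_counts = selection_function(m_grid) * model_counts(m_grid)
```

Comparing `predicted_catalog_counts` (not `model_counts`) with the observed histogram is what makes the fit account for the selection.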

inferred WD Spatial Density

density of WD Catalog Entries

3 of 52

When do I need an SF?

  • Fit a model to an ensemble of catalog data …

  • Understand the incidence of even one single object …

  • Make predictions for a specific experiment …

4 of 52

A selection function should be a function of …?

  • the minimal set of dobs needed to predict sample membership probability
    • for any object with actual (or counter-factual) properties q!!
    • if the selection depends on (many) other objects‘ properties: it gets complicated!!
  • dobs‘s for which
    • model predictions can be made (cf. selection of YSOs/QSOs by variability?)
    • S(dobs) can be practically quantified
  • dobs‘s should usually be „observables“
    • position, magnitude, parallax, Teff, [X/H], .. (observed vs “modeled“ quantities is a blurry division)
    • ..what about selecting on S/N cuts and error flags?

5 of 52

Fundamental Catalog SF vs Sample Cut

6 of 52

Basic & Common Issue:

The selection function depends on dobs that are not immediately relevant to the model‘s physics
  e.g. dobs = (actual sky position, parallax, etc.)

⇒ integrate/marginalize out these dobs

7 of 52

… for the case of modelling spatial densities …

8 of 52

Example: space density of WD‘s, given a catalog

  •  

this calculates an effective survey volume!
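
A sketch of how an effective survey volume can be computed, assuming an isotropic selection with hard cuts in apparent magnitude and parallax, and no extinction. The cut values and the function names are assumptions for illustration; the integral is V_eff(M) = ∫ S(m(M,d), 𝝕(d)) 4π d² dd, evaluated with a simple midpoint rule:

```python
import numpy as np

def selection(m_g, parallax_mas, m_lim=20.0, plx_min=3.0):
    """Hypothetical S(d_obs): 1 inside the hard cuts, 0 outside."""
    return np.where((m_g < m_lim) & (parallax_mas > plx_min), 1.0, 0.0)

def effective_volume(abs_mag, d_max_pc=500.0, n_steps=200_000):
    """Effective volume in pc^3 for objects of absolute magnitude abs_mag.
    d_max_pc must lie beyond the distance where S drops to zero."""
    step = d_max_pc / n_steps
    d = (np.arange(n_steps) + 0.5) * step        # midpoints of distance bins, pc
    m_g = abs_mag + 5.0 * np.log10(d / 10.0)     # apparent magnitude at distance d
    parallax = 1000.0 / d                        # parallax in mas
    integrand = selection(m_g, parallax) * 4.0 * np.pi * d**2
    return np.sum(integrand) * step
```

For these cuts, faint objects are limited by the magnitude cut and bright ones by the parallax cut, so `effective_volume` flattens out for small `abs_mag`.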

9 of 52

That‘s the upshot of this worked example

Density of WD Catalog Entries

inferred WD spatial density

… a 10,000-parameter fit

the selection function

10 of 52

To recapitulate: simple operative SF procedure

  •  

* traditionally often done by Monte-Carlo simulations (=integration)
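
The Monte-Carlo route mentioned in the footnote can be sketched as follows: draw positions uniformly in a sphere, evaluate S for each draw, and scale the sphere volume by the mean of S. The cuts, the exclusion band in Galactic latitude, and all names are assumptions made for this example:

```python
import numpy as np

def selection(m_g, parallax_mas, sin_b, m_lim=20.0, plx_min=3.0, sin_b_min=0.1):
    """Hypothetical S(d_obs): hard cuts in magnitude and parallax, plus an
    illustrative exclusion of a band around the Galactic plane."""
    keep = (m_g < m_lim) & (parallax_mas > plx_min) & (np.abs(sin_b) > sin_b_min)
    return np.where(keep, 1.0, 0.0)

def effective_volume_mc(abs_mag, r_max_pc=400.0, n_draws=1_000_000, seed=42):
    """Monte-Carlo estimate of the effective volume in pc^3.
    r_max_pc must enclose the whole region where S can be non-zero."""
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random(n_draws)                # uniform in (0, 1], avoids d = 0
    d = r_max_pc * u ** (1.0 / 3.0)              # uniform density in the sphere
    sin_b = rng.uniform(-1.0, 1.0, n_draws)      # isotropic in latitude
    m_g = abs_mag + 5.0 * np.log10(d / 10.0)
    parallax = 1000.0 / d
    sphere_volume = 4.0 / 3.0 * np.pi * r_max_pc**3
    return sphere_volume * selection(m_g, parallax, sin_b).mean()
```

The same recipe carries over to selection functions that depend on sky position in a complicated way, where a direct quadrature becomes awkward.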

11 of 52

Some Selection Function Discussion Topics

  • How to combine different selection functions? (incl. different catalogs)
    • a boolean AND becomes a multiplication of SF‘s

  • What if the SF of an object is a function of other objects‘ properties?

  • What are the consequences of making sample cuts on noisy data?
    • (When) is it sensible to make S/N sample cuts?

  • How well do we have to know the selection function?

  • What about „contamination“? Is that a selection function issue?
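
On the first discussion topic: a boolean AND of cuts turning into a product of SFs can be sketched as below. The product form assumes the individual selections are independent of each other once the object's properties are given. The three component functions, including the probabilistic follow-up term, are hypothetical:

```python
import numpy as np

def s_magnitude(m_g, m_lim=20.0):
    """Hard apparent-magnitude cut (hypothetical value)."""
    return np.where(np.asarray(m_g) < m_lim, 1.0, 0.0)

def s_parallax(parallax_mas, plx_min=3.0):
    """Hard parallax cut (hypothetical value)."""
    return np.where(np.asarray(parallax_mas) > plx_min, 1.0, 0.0)

def s_followup(m_g):
    """Hypothetical probabilistic SF of a second catalog: the chance of
    having a spectrum falls linearly from 1 at m_g = 15 to 0 at m_g = 19."""
    return np.clip((19.0 - np.asarray(m_g)) / 4.0, 0.0, 1.0)

def s_combined(m_g, parallax_mas):
    """In catalog A AND in catalog B: multiply the selection functions."""
    return s_magnitude(m_g) * s_parallax(parallax_mas) * s_followup(m_g)
```

For hard cuts the product reproduces the boolean AND exactly; for probabilistic terms it gives the joint membership probability under the independence assumption.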

12 of 52

13 of 52

14 of 52

15 of 52

16 of 52

17 of 52

18 of 52

Selection function (in SDSS-V)

  • Basically all SDSS-V (MOS) studies are population studies (incl. discovery!)

  • Population studies require a selection function, S(q) (0 ≤ S(q) ≤ 1)
    • minimize # of q needed to quantify sample membership
    • q should be „observables“
    • the model you want to fit must be able to predict the q !

  • A simple and stringent S(q)
    • may give you fewer of the objects you love
    • but will allow you to do more with them

  • For „but what if…“, have a look at Rix, Hogg et al. 2021, arXiv:2106.07653

19 of 52

20 of 52

21 of 52

22 of 52

[Figure panels: scale length = f(𝛕, [Fe/H]); scale height = f(𝛕, [Fe/H]); 𝛒*(𝛕, [Fe/H]) at Sun; 𝛒*(𝛕) at Sun („SFR“). Annotations: linear scale; density increases with R]

23 of 52

24 of 52

25 of 52

26 of 52

[Figure: multi-epoch observed data (epochs 1, 2, 3); velocity shift matrix for XA and XB; disentangled spectra XA, XB]

27 of 52

28 of 52

29 of 52

  • Selection „cuts“, say in m and 𝜛, do not need to be rectangular!
    • E.g. you can mimic S/N cuts

30 of 52

31 of 52

Initial candidate sample

  • „simple“ cuts:
    • 𝝕 > 3 mas; G < 20 mag; G + 5 log10(𝝕/100) „well below MS“
    • Gaia query yields

mostly „garbage“
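
The „simple“ cuts above can be sketched as a boolean mask. The parallax and magnitude limits come from the slide. The „well below MS“ boundary is not specified there, so `ms_line` is a hypothetical user-supplied function of color, and the column names are assumptions:

```python
import numpy as np

def absolute_magnitude(g_mag, parallax_mas):
    """M_G = G + 5 log10(parallax / 100), with the parallax in mas.
    Non-positive parallaxes yield NaN and fail any later comparison."""
    plx = np.where(np.asarray(parallax_mas) > 0.0, parallax_mas, np.nan)
    return np.asarray(g_mag) + 5.0 * np.log10(plx / 100.0)

def candidate_mask(g_mag, parallax_mas, color, ms_line):
    """True for sources passing all three initial cuts.
    ms_line(color) returns the absolute magnitude a source must be fainter
    than to count as 'well below the main sequence' (hypothetical)."""
    m_abs = absolute_magnitude(g_mag, parallax_mas)
    return (
        (np.asarray(parallax_mas) > 3.0)
        & (np.asarray(g_mag) < 20.0)
        & (m_abs > ms_line(np.asarray(color)))
    )
```

A call could look like `candidate_mask(g, plx, bp_rp, lambda c: 5.0 + 4.0 * c)`, where the linear boundary is purely illustrative.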

32 of 52

A simple model for the luminosity-color function of WD‘s: their space density = f(MG,c) near the Sun

model parameters (about 10,000)

For the total number of WD‘s in the catalog at (M,c) the actual spatial positions are nuisance parameters, to be marginalized out.
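
A sketch of what such a model can look like in practice, assuming a piecewise-constant density on a 100 × 100 grid in (M, c), uniform spatial density, Poisson counts, and negligible errors in M and c. Under those assumptions the maximum-likelihood density in each cell is the catalog count divided by the effective volume. The grid ranges, cut values and function names are assumptions for illustration:

```python
import numpy as np

def effective_volume(abs_mag, m_lim=20.0, plx_min=3.0):
    """Volume in pc^3 within which an object of absolute magnitude abs_mag
    passes hard cuts m_G < m_lim and parallax > plx_min (isotropic case)."""
    d_mag = 10.0 ** ((m_lim - abs_mag + 5.0) / 5.0)   # magnitude-limited distance
    d_plx = 1000.0 / plx_min                          # parallax-limited distance
    return 4.0 / 3.0 * np.pi * np.minimum(d_mag, d_plx) ** 3

def density_grid(abs_mag, color, mag_edges, color_edges):
    """Space density per grid cell (pc^-3): counts / effective volume."""
    counts, _, _ = np.histogram2d(abs_mag, color, bins=[mag_edges, color_edges])
    mag_centers = 0.5 * (mag_edges[:-1] + mag_edges[1:])
    v_eff = effective_volume(mag_centers)[:, None]    # depends on M only here
    return counts / v_eff

# 100 x 100 cells, i.e. about 10,000 free parameters
mag_edges = np.linspace(8.0, 18.0, 101)
color_edges = np.linspace(-1.0, 2.0, 101)
```

The spatial positions never appear as parameters: they are integrated out inside `effective_volume`.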

33 of 52

  • This becomes: for uniform density

34 of 52

S/N cuts: e.g. 𝝕/𝜹𝝕 > 20

  • Why? E.g. to get good `resolution‘ in MG
  • How does that affect the SF? … a lot!
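
Why the effect is large can be seen in a small sketch. It assumes Gaussian parallax noise whose width depends only on apparent magnitude, with the uncertainty treated as known. The noise model (0.1 mas at G = 17, scaling as flux^-0.5) and the function names are hypothetical:

```python
import math

def parallax_sigma(m_g, sigma0_mas=0.1, m0=17.0):
    """Hypothetical parallax uncertainty in mas, scaling as flux^-0.5."""
    return sigma0_mas * 10.0 ** (0.2 * (m_g - m0))

def sf_parallax_snr(parallax_true_mas, m_g, snr_min=20.0):
    """Probability that the *observed* parallax S/N exceeds snr_min, for an
    object with the given true parallax and apparent magnitude."""
    sigma = parallax_sigma(m_g)
    z = parallax_true_mas / sigma - snr_min      # distance from the cut, in sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # Gaussian tail probability
```

An object sitting exactly on the cut is selected half the time, and the SF moves from nearly 0 to nearly 1 over a narrow range in true parallax that shifts strongly with magnitude.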

35 of 52

Initial data-quality cut:

  • Rybizki, Green et al 2021
    • astrometric fidelity > 0.9 🡪 75% of sources were bad measurements
  • How does that affect the SF? Basically not at all!
    • eliminates only 2% of spectroscopically confirmed WD G<19.5

36 of 52

before after

37 of 52

Eliminate astrophysical interlopers

  • Binaries, Galaxies, ???
  • Exploit that ‚objects of interest‘, WDs, have very tight color-color locus
  • How does that affect the SF? … not at all, as Model(q)≡0 there

38 of 52

A closer look at S/N cuts:

  • If we choose SF(q), we must ‚model‘ q‘s 🡪 should be physical observables
  • Can we avoid making the SF an explicit function of the uncertainties?

SF to apply to the sample

39 of 52

40 of 52

Note: the SF is (can be) defined everywhere, also for counter-factual objects 🡪 „upper limits“
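
A sketch of how such an upper limit can follow, assuming Poisson statistics and a known effective volume for the empty region of parameter space. If no objects are found where the SF is non-zero, the expected count is bounded from above, and dividing by the effective volume bounds the density. The function names are illustrative:

```python
import math

def poisson_upper_limit_counts(confidence=0.95):
    """Upper limit on the expected number of objects when zero are
    observed: solves exp(-N) = 1 - confidence."""
    return -math.log(1.0 - confidence)

def density_upper_limit(v_eff_pc3, confidence=0.95):
    """Upper limit on the space density (pc^-3) for an empty bin with
    effective volume v_eff_pc3."""
    if v_eff_pc3 <= 0.0:
        raise ValueError("no limit possible where the SF vanishes everywhere")
    return poisson_upper_limit_counts(confidence) / v_eff_pc3
```

The limit is only meaningful because S was evaluated for objects that are not in the catalog.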

41 of 52

42 of 52

Issues to be mentioned, but not pursued

  • How do I actually determine the selection function?
    • Gaia Unlimited!!

  • What if I can‘t determine the selection function?
    • or How well do I need to know the selection function?

  • What if my selection is on very noisy q?

  • How to combine surveys

  • By what mechanism do we all come to a mutually agreeable scope definition & terminology/formulation?

43 of 52

Quality cuts?

  • Cuts designed to eliminate contamination/garbage
    • If they don‘t cut „objects of desire“ (OoD):
      • No change of the selection function
      • But the model got simpler/better (don‘t need to model CV, binary contamination)
    • If they cut out a fraction of OoDs, this has to be incorporated in S(o)

  • What about 𝝕/𝜹𝝕 > X ?
    • signal to noise cuts can often be expressed in terms of observables
      • thereby becoming model-predictable
    • Here: 𝝕/𝜹𝝕 ∼ 𝝕 · flux^0.5 (+position etc.), since 𝜹𝝕 ∼ flux^-0.5 🡪 S(mG, 𝝕 | M)
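
A sketch of such a pseudo-S/N cut, written purely in terms of the observables mG and 𝝕. It assumes a photon-noise-limited parallax uncertainty that scales as flux^-0.5, anchored at a hypothetical 0.1 mas at mG = 17; these values and the function names are assumptions, and any dependence on sky position is ignored:

```python
import numpy as np

def pseudo_snr(m_g, parallax_mas, sigma0_mas=0.1, m0=17.0):
    """Model-predictable stand-in for parallax / parallax_error."""
    sigma = sigma0_mas * 10.0 ** (0.2 * (np.asarray(m_g) - m0))
    return np.asarray(parallax_mas) / sigma

def selection_pseudo_snr(m_g, parallax_mas, snr_min=20.0):
    """S(m_G, parallax): 1 where the pseudo-S/N exceeds snr_min, else 0."""
    return np.where(pseudo_snr(m_g, parallax_mas) > snr_min, 1.0, 0.0)

def min_parallax(m_g, snr_min=20.0, sigma0_mas=0.1, m0=17.0):
    """The cut boundary in the (m_G, parallax) plane, in mas."""
    return snr_min * sigma0_mas * 10.0 ** (0.2 * (np.asarray(m_g) - m0))
```

The boundary returned by `min_parallax` is a sloped curve rather than a rectangle, and a model can evaluate it for any hypothetical object without knowing a measured uncertainty.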

44 of 52

What does one get for different 𝝕/𝜹𝝕 cuts?

𝝕 cut

pseudo-𝝕/𝜹𝝕 cut

mG cut

45 of 52

A worked example: a stellar census of the solar neighbourhood

  • Result: the space density of objects, as a function of M and color

with units:

46 of 52

How to devise a „good“ selection function?

  • Minimal set of „observables“ that
    • allow us to determine membership
    • yields sample good for constraining „model“, here φ0(M,c)
  • Conjecture: S(mG,𝟉), i.e. „magnitude and parallax“
    • Why mG? … a maximum. Obvious reasons: „spectral S/N“, catalog availability
    • Why 𝟉? … a minimum. Obvious reason: „solar neighbourhood“
    • What about „parallax S/N“?
      • A minimum parallax S/N yields data that are good to model (small MG uncertainties etc.)
      • The expected parallax S/N can be expressed as f(mG,𝟉)
      • Avoids taking many spectra at needlessly low S/Nspec, just to be volume-complete

47 of 52

A closer look at S/N cuts:

  • If we choose SF(q), we must ‚model‘ q‘s 🡪 should be physical observables
  • Can we avoid making the SF an explicit function of the uncertainties?

SF to apply to the sample

apply in modelling

48 of 52

A simple model for the luminosity-color function of objects: their space density = f(MG, B-R) near the Sun

model parameters (about 10,000)

49 of 52

  • This becomes
    • for the isotropic case and n(x)=constant

50 of 52

That‘s what happens for WDs

51 of 52

What about the SDSS-V 100pc or 200pc sample?

52 of 52

What would that mean for the 200pc sample?