1 of 26

DRP data flow

Yusra AlSayyad

DF meeting

Feb 11 2025

Vera C. Rubin Observatory | DF Workshop | 11 February 2025

2 of 26

Introduction

I know you just want me to give you a list of all the datasetType names that will be generated during DR1 with a categorization of which will be final, which will be intermediate.

But the DR1 pipeline freeze won’t be until after April 2026.

We don’t know what datasetTypes will exist in April 2026

So, I’m going to focus on the parts that won’t change as much

And show you how to find out the parts that will (the timeline for automating this is LSSTCam commissioning)

And we can check that we’re all on the same page with respect to assumptions


3 of 26

As of April 9 2024

4 of 26

Assuming that the categories of data to be transferred include:

  • Quality Monitoring:
    • Metrics: All datasets with MetricMeasurementBundle storage class.
    • Plots: All datasets with Plot storage class.
  • Final data products
  • Inter-stage* (I’ll clarify what I mean by stage in a couple minutes)
    • Inputs for global steps -> USDF
    • Outputs from global calibration → UK, Fr

  • On demand inputs for reproducing errors.

Assumption: Data products are assumed to be transferred as soon as they are produced.
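To make the storage-class rules above concrete, here is a minimal Python sketch that sorts dataset types into these transfer categories. DatasetTypeInfo, the is_final flag, and the category labels are hypothetical; only the MetricMeasurementBundle and Plot storage class names come from this slide, and the other storage class names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetTypeInfo:
    """Hypothetical summary of one dataset type (not a butler class)."""

    name: str
    storage_class: str
    is_final: bool = False  # hypothetical flag recording a "final data product" decision


def transfer_category(dt: DatasetTypeInfo) -> str:
    """Map a dataset type onto the transfer categories listed on this slide."""
    if dt.storage_class == "MetricMeasurementBundle":
        return "quality-metrics"
    if dt.storage_class == "Plot":
        return "quality-plots"
    if dt.is_final:
        return "final"
    # Everything else is either an inter-stage product or on-demand only;
    # telling those apart needs pipeline-graph information, not storage class.
    return "inter-stage-or-on-demand"


examples = [
    DatasetTypeInfo("hypothetical_metrics", "MetricMeasurementBundle"),
    DatasetTypeInfo("hypothetical_plot", "Plot"),
    DatasetTypeInfo("hypothetical_final_catalog", "DataFrame", is_final=True),
    DatasetTypeInfo("visitSummary", "ExposureCatalog"),
]
for dt in examples:
    print(f"{dt.name}: {transfer_category(dt)}")
```

Storage class alone cannot separate inter-stage products from on-demand ones, which is why the sketch lumps them together.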


5 of 26

As of Feb 10 2025

Assumes all DF assignment will be by contiguous regions of tracts
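A minimal Python sketch of tract-based assignment, assuming (hypothetically) that sorting tract IDs is a good enough stand-in for sky contiguity and that each facility takes a fixed fraction of the tracts. The shares shown are invented for illustration and are not the actual DF allocation.

```python
def assign_contiguous_tracts(tracts, shares):
    """Split tracts into one contiguous block per data facility.

    Sorting by tract ID stands in for "contiguous on the sky" here, which is
    only a rough assumption. `shares` maps a facility name to its fraction of
    the tracts; any rounding remainder goes to the last facility listed.
    """
    ordered = sorted(tracts)
    total = len(ordered)
    sites = list(shares)
    assignment = {}
    start = 0
    for i, site in enumerate(sites):
        if i == len(sites) - 1:
            end = total  # last facility absorbs the rounding remainder
        else:
            end = min(total, start + round(shares[site] * total))
        assignment[site] = ordered[start:end]
        start = end
    return assignment


# Hypothetical shares and tract IDs, for illustration only.
blocks = assign_contiguous_tracts(range(10), {"USDF": 0.4, "FrDF": 0.4, "UKDF": 0.2})
for site, block in blocks.items():
    print(site, block)
```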


6 of 26

a sharding step

A “Stage” a.k.a. “checkpoint step”

7 of 26

“Subset” is the middleware name for a collection of Tasks. “Stages” are a conceptual collection of “steps”.

Stages for humans:

  • The DRP pipeline will split into four high-level "checkpoint" steps after which we expect to perform significant analysis, before proceeding with processing.
  • These stages will change slowly
  • => Focusing on these today

Steps for robots

  • For the contexts when sharding or partial-coverage tract avoidance is necessary, each checkpoint step is split into multiple "sharding" steps.
  • Change quickly.
  • Anything I tell you today will be out of date in a few weeks


8 of 26

1-initial

2-recalibration

3-coadds

4-revisit

9 of 26

The plan for implementing this (DM-47320) lives on this Slack Canvas:

Each “stage” a.k.a. “checkpoint step” also includes any analysis tasks that are necessary to validate that step's outputs. A draft through stage 3 is on the ticket branch.

Steps will proliferate. Stages will consolidate.

Steps will be re-named as components of their stage. For example:

  • 1a-initial-detectors
  • 1b-initial-consolidate-visits
  • 1c-initial-consolidate-all-tracts
  • 1d-initial-consolidate-global

  • 2a-recalibration-global
  • 2b-recalibration-tracts
  • 2c-recalibration-visits
  • 2d-recalibration-tracts
  • 2e-recalibration-global
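The naming convention above can be parsed mechanically. This Python sketch assumes names always follow the pattern <stage number><letter>-<stage name>-<component>, which is inferred from the examples here and may not survive the step renames still to come.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred from names such as "1c-initial-consolidate-all-tracts".
STEP_PATTERN = re.compile(
    r"^(?P<stage>\d+)(?P<letter>[a-z])-(?P<stage_name>[a-z]+)-(?P<component>[a-z-]+)$"
)


class StepName(NamedTuple):
    stage: int
    letter: str
    stage_name: str
    component: str


def parse_step(name: str) -> StepName:
    """Split a sharding-step name into its stage and component parts."""
    m = STEP_PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a sharding-step name: {name!r}")
    return StepName(int(m["stage"]), m["letter"], m["stage_name"], m["component"])


def group_by_stage(step_names):
    """Group step names under their stage label, ordered by step letter."""
    stages = {}
    for name in step_names:
        step = parse_step(name)
        stages.setdefault(f"{step.stage}-{step.stage_name}", []).append(name)
    for steps in stages.values():
        steps.sort(key=lambda n: parse_step(n).letter)
    return stages


print(group_by_stage(["2b-recalibration-tracts", "1a-initial-detectors", "2a-recalibration-global"]))
```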

For a small dataset, the middleware will handle launching the whole stage.

At full-DRP scale, which requires sharding (“groups”), cm-service will handle launching the whole stage.
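A sketch of the launch decision, assuming for illustration that "small" simply means the stage's data IDs fit in one group. The group size, the data IDs, and plan_launch itself are hypothetical; neither the middleware nor cm-service exposes this function.

```python
def plan_launch(data_ids, max_group_size):
    """Return (launcher, groups) for one stage.

    Inputs that fit in a single group run as one submission handled by the
    middleware; anything larger is split into fixed-size groups for cm-service.
    """
    ids = sorted(data_ids)
    if len(ids) <= max_group_size:
        return "middleware", [ids]
    groups = [ids[i:i + max_group_size] for i in range(0, len(ids), max_group_size)]
    return "cm-service", groups


launcher, groups = plan_launch(range(25), max_group_size=10)
print(launcher, [len(g) for g in groups])
```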


10 of 26

Stage 0: Calibration Production

Inputs: flats/darks/biases

Output: combined flats/darks/biases

Calibration collections are going to evolve rapidly while we’re on sky

Should be possible to process a final set for DRPs at the USDF in < 2 weeks

Example transfers today: outputs (along with skymaps and refcats) are distributed to UKDF and FrDF for stage 1. These also include things like skyFrames (for SkyCorr), fgcmLookUpTable, real-bogus models, and photo-z models.


11 of 26

Stage 1-initial: Single Frame Processing

  • Input: raws, calibrations, refcats
  • Output: calexps (initial_pvi), preSources, associated (tract-matched) presources
  • Example transfers today: isolated_star_presources, isolated_star_presource_association, visitSummary

  • QA:
    • Matched Visit Metrics/Plots on preSources
    • Visit-level metrics

  • Before moving on:
    • Do we understand every failure?
    • Are our criteria for excluding bad visits from coadds OK?

Short-term sharding steps:

1-initial:

  • 1a-initial-detectors
  • 1b-initial-consolidate-visits
  • 1c-initial-consolidate-all-tracts
  • 1d-initial-consolidate-global
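One way to keep the bad-visit question reviewable is to record why each visit was excluded. The Python sketch below uses hypothetical per-visit fields and thresholds; real visitSummary columns and the actual coadd selection criteria differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisitQuality:
    # Hypothetical per-visit summary values; real visitSummary columns differ.
    visit: int
    psf_fwhm_arcsec: float
    n_detectors_ok: int


def select_visits(rows, max_fwhm=1.5, min_detectors=8):
    """Split visits into kept IDs and excluded IDs with reasons (hypothetical cuts)."""
    kept, excluded = [], {}
    for row in rows:
        reasons = []
        if row.psf_fwhm_arcsec > max_fwhm:
            reasons.append("psf too broad")
        if row.n_detectors_ok < min_detectors:
            reasons.append("too few good detectors")
        if reasons:
            excluded[row.visit] = reasons
        else:
            kept.append(row.visit)
    return kept, excluded


rows = [VisitQuality(1, 1.0, 9), VisitQuality(2, 2.0, 9), VisitQuality(3, 1.2, 3)]
print(select_visits(rows))
```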


12 of 26


13 of 26

Stage 2-recalibration: Global Recalibration

  • Input: associated (tract-matched) presources, visitSummary tables
  • Output: photometric calibration, astrometric calibration, global backgrounds, updated Visit Summaries.

The last step in stage 1 is a global makeCcdVisitTable/makeVisitTable (which takes visitSummary as input today).

The first step in stage 2 is a global FGCM step, which will be run at the USDF.

Example Transfers Today:

  • To USDF: isolated_star_presources, isolated_star_presource_associations, visitSummary
  • From USDF: fgcmPhotoCalibCatalog

Short-term sharding steps:

2-recalibration:

  • 2a-recalibration-global
  • 2b-recalibration-tracts
  • 2c-recalibration-visits
  • 2d-recalibration-tracts
  • 2e-recalibration-global
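Because DF assignment is by tract while global steps such as FGCM run at the USDF, every switch between a global step and a sharded step implies a hand-off between sites. This Python sketch lists those hand-offs, assuming (hypothetically) that global steps run only at the USDF and sharded steps run at every site; the sharding labels are read off the step names by hand.

```python
def boundary_transfers(steps, sites, global_site="USDF"):
    """List (from_step, to_step, direction) for every cross-site hand-off.

    `steps` is an ordered list of (step_name, sharding) pairs, where sharding is
    "global" or a sharded dimension such as "tract" or "visit". Global steps are
    assumed to run only at `global_site`; sharded steps run at every site.
    """
    transfers = []
    others = [s for s in sites if s != global_site]
    for (prev_name, prev_shard), (next_name, next_shard) in zip(steps, steps[1:]):
        if prev_shard != "global" and next_shard == "global":
            # Sharded outputs are gathered at the global site.
            for site in others:
                transfers.append((prev_name, next_name, f"{site} -> {global_site}"))
        elif prev_shard == "global" and next_shard != "global":
            # Global outputs are distributed back to every other site.
            for site in others:
                transfers.append((prev_name, next_name, f"{global_site} -> {site}"))
    return transfers


STAGE2 = [
    ("2a-recalibration-global", "global"),
    ("2b-recalibration-tracts", "tract"),
    ("2c-recalibration-visits", "visit"),
    ("2d-recalibration-tracts", "tract"),
    ("2e-recalibration-global", "global"),
]
for t in boundary_transfers(STAGE2, ["USDF", "FrDF", "UKDF"]):
    print(t)
```

The sketch ignores traffic between sharded steps (for example tracts to visits), which may also cross sites when a visit overlaps tracts assigned to different facilities.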


14 of 26

Pause between stage 2 and stage 3 to check on the quality of the calibrations using the recalibrated preSources (matchedVisit and maybe visit-level metrics)

FGCM plots are already at USDF

matchedVisit plots/metrics need to be copied back.

Use the astrometric/photometric calibrations and final PSFs to do a pilot run of stage 3 and beyond with these calibrations and the new pipeline candidate.

We expect to be fixing bugs in the pipelines that affect stage 3 while e.g. stage 1 is running.


15 of 26

Stage 3-coadds: Coaddition and Coadd Processing

Input:

  • calexp/initial_pvi images to be coadded
  • finalVisitSummary to decide what should be coadded
  • finalized_src_table, preSourceTable_visit: flags get propagated to the Object Table

Output: Coadds, Diffim Templates, Object Tables (today those are spelled deepCoadd, goodSeeingCoadd, objectTable_tract)


16 of 26

Pause to assess Objects, Coadds, and Templates

Example transfers today:

Send data back to USDF for global aggregation:

  • Healsparse property maps
  • Object table metrics
  • Objects to run global rho statistics


17 of 26

Stage 4-revisit: Difference Imaging Analysis (DIA)

Input: images, finalVisitSummaries, diffIm templates, Objects

Output: DIASource, DIAObject, and ForcedSource Tables, PVIs, image differences.

Categories of Tasks include:

  • reprocessVisitImage: to make final PVIs and final Source Tables
  • DIA: e.g. getTemplate, subtractImage, detectAndMeasureSource, real-bogus
  • Multi-epoch forced Photometry: forcedPhotDiffim, forcedPhotCcd


18 of 26

“Stages” are a conceptual collection of “steps”

Stages for humans:

  • The DRP pipeline will split into four high-level "checkpoint" steps after which we expect to perform significant analysis, often before proceeding with processing.
  • Change slowly
  • Focusing on these today

Steps for robots

  • For the contexts when sharding or partial-coverage tract avoidance is necessary, each checkpoint step is split into multiple "sharding" steps.
  • Change quickly.
  • Anything I tell you today will be out of date in a few weeks


19 of 26

Some information about the inputs and outputs of steps is available now

Docs on working with pipeline graphs

$ pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP.yaml#step3a --show pipeline-graph

$ pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP.yaml#step3a --pipeline-mermaid
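Given per-task inputs and outputs (for example, as read off the pipeline graphs that the commands above display), the dataset types that cross a step boundary are the candidates for inter-stage transfer. In this Python sketch the task labels are hypothetical and the task-to-stage assignment is a simplification; the dataset type names and the stage 1 to stage 2 hand-off come from the Stage 2 slide.

```python
from collections import defaultdict


def cross_step_datasets(tasks):
    """Return {dataset_type: (producing_step, sorted consuming steps)} for
    dataset types consumed in a step other than the one that produces them.

    `tasks` maps a task label to (step, inputs, outputs).
    """
    producer = {}
    consumers = defaultdict(set)
    for step, inputs, outputs in tasks.values():
        for dt in outputs:
            producer[dt] = step
        for dt in inputs:
            consumers[dt].add(step)
    result = {}
    for dt, step in producer.items():
        others = consumers[dt] - {step}
        if others:
            result[dt] = (step, sorted(others))
    return result


# Hypothetical task labels; inputs of the stage 1 tasks are illustrative.
TASKS = {
    "hypothetical_associate": (
        "1-initial",
        {"preSourceTable_visit"},
        {"isolated_star_presources", "isolated_star_presource_associations"},
    ),
    "hypothetical_summarize": ("1-initial", {"initial_pvi"}, {"visitSummary"}),
    "hypothetical_fgcm": (
        "2-recalibration",
        {"isolated_star_presources", "isolated_star_presource_associations", "visitSummary"},
        {"fgcmPhotoCalibCatalog"},
    ),
}
for dt, (src, dst) in sorted(cross_step_datasets(TASKS).items()):
    print(dt, src, "->", dst)
```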


20 of 26

Pre-rendered, full versions of the pipeline graphs are available at tigress-web.princeton.edu/~lkelvin/pipelines

For example: HSC DRP-RC2, step1

21 of 26

Currently, we manually track dataset type retention info: which datasetTypes fulfill which final data products

For DP1 this mapping is in progress on:

DP1 dataset planning

DM-47725 - needed for LSSTCam scale, so expect March.
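A sketch of what a machine-readable version of this retention mapping might look like, in Python. The structure is hypothetical; the three entries for deepCoadd, goodSeeingCoadd, and objectTable_tract follow the Stage 3 slide's "today those are spelled" list, and hypothetical_scratch_dataset stands in for a dataset type marked intermediate.

```python
# Dataset type -> final data product it fulfills, or None if marked intermediate.
RETENTION = {
    "deepCoadd": "Coadds",
    "goodSeeingCoadd": "Diffim Templates",
    "objectTable_tract": "Object Table",
    "hypothetical_scratch_dataset": None,
}


def untracked_dataset_types(produced, retention):
    """Return produced dataset types with no retention decision recorded yet."""
    return sorted(dt for dt in produced if dt not in retention)


def products_by_type(retention):
    """Invert the mapping: final data product -> dataset types that fulfill it."""
    out = {}
    for dt, product in retention.items():
        if product is not None:
            out.setdefault(product, []).append(dt)
    return {p: sorted(dts) for p, dts in out.items()}


print(untracked_dataset_types(["deepCoadd", "visitSummary"], RETENTION))
print(products_by_type(RETENTION))
```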


22 of 26

Sharding-dimension information can be stored with pipelines now, but hasn’t been yet


23 of 26

In summary

Whether a dataset type is designated as intermediate or final is information that changes quickly.

Much of that information (what inputs are needed for a step) is machine-readable now. Well before DR1, all of it (including whether objectTable_tract is the Object Table) will be machine-readable.

Implementation is still underway and requests are welcome.


24 of 26

1-initial

2-recalibration

3-coadds

4-revisit

25 of 26

Appendix


26 of 26

DR3 Processing

DR2 Processing

DR2 Release

DR3 Preview
