RootInteractive expert tool for multidimensional statistical analysis, machine learning and analytical model validation.
Marian Ivanov (GSI Darmstadt), Marian Ivanov (UK Bratislava)
On behalf of ALICE collaboration
Marian Ivanov GSI, Marian Ivanov jr (UK Bratislava) CHEP 2023 Norfolk
ALICE Run 3 - goals and challenges
A high interaction rate environment, pile-up, distortion fluctuations, etc. necessitate the use of advanced methods of data analysis. Experts and highly customisable tools are needed.
Record large pp and Pb-Pb minimum bias sample
Tracking challenge: space charge in TPC detector distorting trajectories
PID challenge: Significant baseline bias and fluctuation
RootInteractive project
Multi-dimensional interactive analysis: ML, fits, histogramming and data aggregation on the server (Jupyter notebook, Python scripts) and on the client (browser), for O(10^6-10^7) rows and O(10^8) entries (rows x columns)
Seeing is believing
Querying/Iterative Interacting/predicting is understanding
Reconstruction/distortion monitoring example
10^7 points x 50 attributes (space points, tracks, MC predictions)
RootInteractive - current ALICE expert projects
Multi-dimensional analysis vs shadow projections
Objects and reference objects (models/reference models, MC/data, data/reference data) should be compared optimally in the full relevant multi-dimensional space.
Track DCA bias due to the space charge distortion contribution, before and after correction
Reference-ML prediction at low rate without SC
Without correction
With analytical correction
RootInteractive general purpose tool for multi-dimensional statistical analysis
Oversimplifying at the analysis level tends to make the explanations more complex or simply wrong.
Our goal is to provide a tool for multidimensional problems that simplifies data analysis in many dimensions.
A detailed differential understanding of the detector system, MC and reconstruction/calibration performance is a prerequisite for the successful application of Machine learning in physical analysis
Consideration: symmetries, alarms and invariants
Aggregation/projections of normalized data, e.g. (data-model), (MC-data), (data-symmetry), in multiple dimensions.
RootInteractive mainly supports comparison of data with a reference, via “symmetric regression” and “template support” for automatic comparison to reference data.
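The idea of aggregating normalized data (data-model) in several dimensions and raising an alarm where an invariant is broken can be sketched in plain Python. This is a minimal illustration only, not the RootInteractive implementation: the helper names `binned_residuals` and `alarms`, the toy rows and the tolerance are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

def binned_residuals(rows, model, bin_of):
    """Mean of the (data - model) residual per multi-dimensional bin key."""
    groups = defaultdict(list)
    for row in rows:
        groups[bin_of(row)].append(row["y"] - model(row))
    return {key: mean(vals) for key, vals in groups.items()}

def alarms(bin_means, tolerance):
    """Sorted list of bins whose mean residual exceeds the tolerance."""
    return sorted(key for key, m in bin_means.items() if abs(m) > tolerance)

# Hypothetical toy data: the model is y = 2*x, but side "C" carries an
# extra offset of +0.5 for x >= 2, which the alarm should localize.
rows = []
for side in ("A", "C"):
    for i in range(40):
        x = i / 10.0                                  # x in [0, 4)
        shift = 0.5 if (side == "C" and x >= 2.0) else 0.0
        rows.append({"x": x, "side": side, "y": 2.0 * x + shift})

model = lambda row: 2.0 * row["x"]
bin_of = lambda row: (row["side"], int(row["x"]))     # 2D bin: (side, x bin)

bin_means = binned_residuals(rows, model, bin_of)
flagged = alarms(bin_means, tolerance=0.1)
print(flagged)
```

Binning only in x and averaging over both sides would dilute the offset to half its size; keeping the side as a second dimension localizes it to the two affected bins.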
Multidimensional parameter optimization example - ALICE digital signal processing
Digital signal processing (13 parameters in example) needed for particle identification and data volume optimization. O(200000) parameter settings simulated/generated on server
Simulation and visualization/aggregation (NDPipeline + RootInteractive) were done by a bachelor student and fully solved the DSP optimization problem (several earlier attempts had failed)
Presentation, notebook, interactive dashboard and movie in RootInteractive tutorial:
Machine learning in RootInteractive - differential validation of MC/data and ML models
Using external models:
RootInteractive extensions: wrappers to scikit-learn and xgboost
Interactive validation in RootInteractive on the client with O(10^6-10^7) points
Generalized linear (kernel) regression in RootInteractive - client side
Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables
Example of a cubic polynomial regression, which is a type of linear regression. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
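The statement that polynomial regression is linear in the unknown parameters can be made concrete with a short, dependency-free sketch. It builds the design matrix with columns 1, x, x^2, x^3 and solves the weighted normal equations. The function names `solve_linear` and `fit_polynomial` and the toy data are assumptions for illustration; this is not the RootInteractive client-side code.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree, weights=None):
    """Weighted least squares for y ~ sum_k beta_k * x**k.

    The basis functions x**k are nonlinear in x, but the model is linear
    in beta, so the fit reduces to (X^T W X) beta = X^T W y.
    """
    if weights is None:
        weights = [1.0] * len(xs)
    design = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(w * row[i] * row[j] for row, w in zip(design, weights))
            for j in range(p)] for i in range(p)]
    xty = [sum(w * row[i] * y for row, w, y in zip(design, weights, ys))
           for i in range(p)]
    return solve_linear(xtx, xty)

# Noise-free toy data generated from a known cubic.
xs = [i / 10.0 - 1.0 for i in range(21)]              # 21 points in [-1, 1]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 + 3.0 * x ** 3 for x in xs]
beta = fit_polynomial(xs, ys, degree=3)
print([round(b, 6) for b in beta])
```

Because the data are noise-free, the fitted coefficients reproduce the generating values (1, -2, 0.5, 3) up to rounding. Replacing the monomials by any other fixed basis functions leaves the solver unchanged.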
regressionArray=[
    {"name":"regre1", "varX":["x1","x2",...,"xn"], "varY":"y1", "weights":"w"},
    {"name":"regreAgg1", "varX":["x1","x2",...,"xn"], "varY":"y1", "weights":"w",
     "varAgg":["xagg1","xagg2",...,"xaggn"], "nbinsAgg":[...], "rollingAgg":[...]}
]
Example: declaring a generalized linear (kernel) regression
Data preparation - RDataFrame <-> awkward (new interface)
dEdx optimization example
Defining RDataFrame
Loading awkward array
Significant performance increase with the parallel "RDataFrame ↔ awkward" interface compared with the previously used direct Tree queries interface. Used extensively, e.g. in fastMCKalman (distortion simulation/correction) and in trackCombinator (V0, cascade, cosmic, loop finder) prototyping use-case studies
RootInteractive/Multi-Interactive project preparation and presentation
Expert data preparation
Data presentation:
The data is presented in a multidimensional way. The aim is to answer all questions within one meeting/session. If the information is not sufficient, new data sources are to be agreed on.
RootInteractive pad map dashboard declarations
User-defined RootInteractive properties are required to get the HTML output (explained in the next slides)
The interface is simplified by a set of predefined, parameterizable templates that define standard layouts; the user extends only the user-defined widget controls.
Templates are focused mostly on the comparison of data and reference data, or of their distributions, for a user-defined selection
aliasArray, variables, parameterArray, widgetParams, widgetLayoutDesc, histoArray, figureArray, figureLayoutDesc = getDefaultVarsDiff()
Functions on client - derived variables and functional composition
Many different ways to define derived variables and functional composition. Dependency trees to resolve functional and data source dependencies.
Custom javascript function (javascript function as a text)
Predefined parametric javascript function
Anonymous function (used for example in ND histograms as weights or variable)
Figure axis transformation
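How a dependency tree can resolve the evaluation order of derived variables is sketched below with the Python standard library. In RootInteractive the functions run as JavaScript on the client; this sketch only illustrates the ordering idea, and the variable names (`r`, `phi`, `rNorm`, `weight`, `rMax`) and the table layout are hypothetical.

```python
from graphlib import TopologicalSorter
import math

# Hypothetical derived-variable table: name -> (inputs, function).
derived = {
    "r":      (("x", "y"), lambda x, y: math.hypot(x, y)),
    "phi":    (("x", "y"), lambda x, y: math.atan2(y, x)),
    "rNorm":  (("r", "rMax"), lambda r, r_max: r / r_max),
    "weight": (("rNorm",), lambda r_norm: 1.0 - r_norm),
}

def evaluation_order(derived):
    """Order derived variables so that each one follows its inputs."""
    graph = {name: set(inputs) for name, (inputs, _) in derived.items()}
    order = TopologicalSorter(graph).static_order()
    # Source columns (x, y, rMax) also appear in the order; keep derived only.
    return [name for name in order if name in derived]

def evaluate(row, derived):
    """Return a copy of row extended with all derived variables."""
    out = dict(row)
    for name in evaluation_order(derived):
        inputs, func = derived[name]
        out[name] = func(*(out[i] for i in inputs))
    return out

result = evaluate({"x": 3.0, "y": 4.0, "rMax": 10.0}, derived)
print(result["r"], result["rNorm"], result["weight"])
```

The composition `weight(rNorm(r(x, y), rMax))` is never written out explicitly; it follows from the declared inputs, and a cyclic declaration would make `TopologicalSorter` raise `CycleError`.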
Histogram declaration - calibration QA browser
Customizable Ndimensional histograms and projection. Example:
Set of 2D, 3D (ND) histograms declared
Parameterized histograms:
Anonymous function (used for example in ND histograms as weights or variables)
QA example, mean charge: left - raw values (varZ), right - normalized to “expectation” (varZNorm)
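The difference between a raw map and a map normalized to the expectation can be illustrated with a small 2D profile histogram whose value is given by an anonymous function. This is a sketch under assumptions: `profile_2d`, the column names and the toy gain defect are invented for this example and are not the RootInteractive histogram declaration.

```python
from collections import defaultdict

def profile_2d(rows, var_x, var_y, value, bin_width):
    """Mean of value(row) in 2D bins of (var_x, var_y)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        key = (int(row[var_x] // bin_width), int(row[var_y] // bin_width))
        sums[key] += value(row)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical toy pad map: the expected charge rises with x, and one
# region (x >= 2, y >= 2) has a 20% gain drop hidden under that trend.
rows = []
for ix in range(4):
    for iy in range(4):
        x, y = ix + 0.5, iy + 0.5
        expected = 10.0 + x
        gain = 0.8 if (x >= 2.0 and y >= 2.0) else 1.0
        rows.append({"x": x, "y": y, "charge": expected * gain,
                     "expected": expected})

# Raw values (varZ in the slide) and values normalized to the expectation
# (varZNorm), both passed as anonymous functions.
raw = profile_2d(rows, "x", "y", lambda r: r["charge"], bin_width=2.0)
norm = profile_2d(rows, "x", "y",
                  lambda r: r["charge"] / r["expected"], bin_width=2.0)

for key in sorted(raw):
    print(key, round(raw[key], 3), round(norm[key], 3))
```

In the raw map the defective bin (10.4) is hard to tell apart from the geometric trend (11.0 to 13.0), while in the normalized map every healthy bin sits at 1.0 and the defect stands out at 0.8.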
WebAssembly (wasm) interface - under development
New functions/transformations/data sources using wasm:
RootInteractive - conclusion
RootInteractive is used extensively and successfully in many ALICE use cases for multidimensional analysis
Current use cases are mainly related to detector (calibration, simulation, QA) and global reconstruction (Run 3, Run 2 as reference, ALICE 3)
Pilot N-dimensional physical analysis with sampled/skimmed data is in the queue
Backup
DCA-DCA0 bias - rate evolution (4,330 kHz, 660 kHz)
DCA bias in the phi direction is largely eliminated: residuals of O(2 mm) are comparable with the intrinsic resolution of the tracks at the vertex, O(0.2 cm). New analytical fits also fit the density profile
DCA bias in the theta direction is largely eliminated. The remaining bias is due to charge-up on the C side, to be added in the analytical fit version (IFC and OFC fit). Charging up is rate and time dependent (see Run 1, Run 2 studies)
Without correction
With analytical correction
Without correction
With analytical correction
RootInteractive usage in ALICE
In the following slides, code snippets with user code declarations are shown for illustration without further discussion
Machine learning - derived variables - RF regression - per channel QA example
Defining models:
Global (varListG) and local (varListLocal) regression, extracting basic properties of the ALICE TPC calibration and QA variables
Robust local statistics - median and local std estimator for the outlier tagging and PDF description
statDictionary={"mean":None,"median":None, "std":None}
varListG=["lx","ly","GainMap","A_Side"]
varListLocal=["lx","ly","GainMap","roc"]
vars=[
"NClusters_Clusters_Mean",'NClusters_Digits_Mean',
'QMax_Clusters_Mean', 'QMax_Digits_Mean',
'IDC0_Mean','SAC0_Mean'
]
statOut=miErrPDF.predictStat(dfK0[variableX],statDictionary)
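The robust statistics used for outlier tagging can be illustrated independently of the regression code above. The sketch below swaps in a simple global median and a std estimate derived from the median absolute deviation (MAD). This is a named substitute chosen for brevity, not the `predictStat` implementation, and the function names, threshold and toy values are assumptions.

```python
from statistics import median

def robust_stats(values):
    """Median and a robust std estimate from the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # The factor 1.4826 scales the MAD to sigma for Gaussian-distributed data.
    return med, 1.4826 * mad

def tag_outliers(values, n_sigma=5.0):
    """Flag values further than n_sigma robust sigmas from the median."""
    med, sigma = robust_stats(values)
    return [abs(v - med) > n_sigma * sigma for v in values]

# Hypothetical per-channel values with one misbehaving channel.
values = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1, 9.9]
med, sigma = robust_stats(values)
flags = tag_outliers(values)
print(med, round(sigma, 4), flags)
```

With a plain mean and standard deviation, the single value of 25.0 would inflate the spread to about 5 and no longer stand out at five sigma; the median and MAD are barely affected by it.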
Per channel QA and example derived QA variables for NClusters_Clusters: