| Exit Code | Issue Type | Problem | Solution / Procedure | Comments |
| 50664 | workflow | time/event issue | If the workflow has a primary input, run ACDC with 10x splitting. Otherwise, if 50664 is the only exit code, report to PPD (Paolo) on JIRA and mark it as blocker. If other exit codes appear for the same task, focus on those instead of 50664, unless too many jobs are failing. If it is a ReReco, make a recovery with the Python script, giving it lumisperjob 1. | When reporting to PPD, always mark the ticket as blocker. |
| 50664 | workflow | time/event issue | If it is a ReReco, you can also run ACDC with 10x splitting. | |
| 71305 | agent | Wallclock time | Run ACDC. | |
| 8016 | workflow | EventCorruption - Geant4 error | Create a JIRA, mark it as blocker and report to PPD (Paolo Gunnelli). | |
| 71104 | site | Site was in drain when workflows were running | Check whether the site is out of drain (https://dashb-ssb.cern.ch/dashboard/request.py/siteviewhistory?columnid=237), check GGUS tickets, and then create an ACDC. Otherwise, create or update a JIRA ticket. | |
| 8021 | workflow/site/transfer | file read error | First time: create ACDC0. If there are still issues, check DAS for the MC file and run xrootd commands to check for unmerged files. If the file is still not found and the issues persist (do not create ACDC2 for the same file as in ACDC1), tag the transfers team (Donata/Pradeep) on the JIRA to have a look and open a GGUS ticket for the site. | |
| 8028 | workflow/site/transfer | file read error | First time: create ACDC0. Second time: check DAS for the MC file and run xrootd commands for unmerged files. If there are still issues (do not create ACDC2 for the same file as in ACDC1), tag the transfers team (Donata/Pradeep) on the JIRA to check the file and open a GGUS ticket for the site. | |
| 50664 | site | stage out error | Check the site error and then run an ACDC. | |
| 60450 | agent/site | skipped file | If ReReco, run a recovery with the Python script. If MC workflow, focus on other tasks. 1. If the wf has requested number of events >= 300k and stats > 80%, bypass, stating the reason and the stats, tag PPD (Paolo Gunnelli) saying "please submit an extension if needed", and mark it as blocker. 2. If the wf has events >= 300k and stats < 80%, create a JIRA and ask PPD whether the stats are OK or the wf should be bypassed. 3. If events < 300k and the wf has primary input with stats < 90%, kill & clone. 4. If events < 300k, no primary input, and stats > 80%, bypass; otherwise kill & clone. | |
| 8001 | workflow | EventGenerationFailure | Create a JIRA, mark it as blocker and report to PPD (Paolo Gunnelli). Also tag the component as GEN-OPS. If the stats are ~85%, bypass. | |
| 8001 | workflow | ExternalLHEProducer error, error with child exit code 1 | Create a JIRA, mark it as blocker and report to PPD (Paolo Gunnelli). Also tag the component as GEN-OPS. If the stats are ~85%, bypass. | |
| 99109 | site | stage out error | Check the site error and then run an ACDC. | |
| 50115 | site | stage out error | Check the site error and then run an ACDC. | |
| 50115 | workflow | segmentation violation | Check the stats and the other job failures. If there are no other failures, or this one dominates them, create a JIRA, mark it as blocker and report to PPD (Paolo Gunnelli). | |
| 8012 | workflow | InvalidReference | Create a JIRA, mark it as blocker. | |
| 8006 | workflow | ProductNotFound | Bypass this wf. | |
| 8501 | workflow | EventGenerationFailure | Create a JIRA, mark it as blocker. | |
| 50660 | Gen or Config issue | MaxPSS | Create a JIRA, mark it as blocker and report to PPD (Paolo Gunnelli & Jordan Martin). Also tag GEN-OPS. | |
| 8009 | agent | Illegal parameter found in configuration. The parameter is named: 'enforceGUIDInFileName' | | |
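
The numbered rules for exit code 60450 (skipped file) can be read as a small decision procedure. The sketch below is only an illustration of that reading, not an official tool: the function name `triage_60450`, its parameters, and the `Action` labels are hypothetical, and stats are assumed to be given as a fraction between 0 and 1. Combinations the rules do not cover (for example, stats of exactly 80% with 300k or more events, or a primary input with stats of 90% or more) are returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the outcomes named in the 60450 row."""
    RECOVERY = "run a recovery with the python script"
    BYPASS_TAG_PPD = "bypass with reason and stats, tag PPD, mark as blocker"
    ASK_PPD = "create a JIRA and ask PPD whether to bypass"
    KILL_AND_CLONE = "kill & clone"
    BYPASS = "bypass"
    UNSPECIFIED = "not covered by the written rules"


def triage_60450(is_rereco, requested_events, stats, has_primary_input):
    """Map a workflow failing with 60450 to the action the table suggests.

    stats is the completed fraction (0.0 to 1.0); this is an assumption,
    the table itself only quotes percentages.
    """
    if is_rereco:
        return Action.RECOVERY

    if requested_events >= 300_000:
        if stats > 0.80:   # rule 1
            return Action.BYPASS_TAG_PPD
        if stats < 0.80:   # rule 2
            return Action.ASK_PPD
        return Action.UNSPECIFIED  # exactly 80% is not covered

    if has_primary_input:
        if stats < 0.90:   # rule 3
            return Action.KILL_AND_CLONE
        return Action.UNSPECIFIED  # primary input with stats >= 90% is not covered

    # rule 4: no primary input
    return Action.BYPASS if stats > 0.80 else Action.KILL_AND_CLONE
```

For instance, `triage_60450(False, 500_000, 0.85, False)` yields `Action.BYPASS_TAG_PPD`, while a 100k-event workflow with primary input at 70% stats yields `Action.KILL_AND_CLONE`.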