OSG-PATh Staff Meeting
HTCSS Update
09/14/2022
Update since April 2022
HTCSS mechanisms to help with "Submit file challenges"
Submit file challenges #1: Confusing Syntax for Custom Attributes
Why do some entries need a plus sign and others not? Why do I sometimes need quotation marks and sometimes not?
executable = foo.exe
+JobDurationCategory = "Long"
transfer_input_files = my_data
queue
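A minimal annotated sketch of where the two syntaxes come from, reusing the submit file above. It assumes standard condor_submit behavior: built-in commands take plain text, while a custom attribute is stored as a ClassAd expression, so a string literal needs its own quotes. The commented-out MY. line is shown only as an equivalent spelling.

```
executable = foo.exe

# Built-in submit command: condor_submit knows its type, so the value is plain text.
transfer_input_files = my_data

# Custom attribute: the leading "+" copies the right-hand side into the job ClassAd
# as a ClassAd expression, so a string literal must carry its own double quotes.
+JobDurationCategory = "Long"

# Equivalent spelling of the line above, using the MY. prefix instead of "+".
# MY.JobDurationCategory = "Long"

queue
```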
EXTENDED_SUBMIT_COMMANDS
submit file:
# Use these just like normal submit keywords; each value is converted to the correct data type
executable = foo.exe
JobDurationCategory = Long
transfer_input_files = my_data
queue

condor_config file on AP:
EXTENDED_SUBMIT_COMMANDS @=end
   WantGlidein = true
   JobDurationCategory = "Long"
   SingularityImage = "/whatever"
@end
EXTENDED_SUBMIT_HELPFILE
# return the contents of this file to the user
EXTENDED_SUBMIT_HELPFILE = $(LOCAL_DIR)/submit_help.txt
# or return the URL to the user
EXTENDED_SUBMIT_HELPFILE = http://example.com/submit_help
> condor_submit -capabilities
Schedd ap0.chtc.wisc.edu
Has Late Materialization enabled
Has Extended submit commands:
accounting_group_user value is forbidden
LongJob value is Boolean true/false
ProjectName value is string
RetryIfTransferFails value is string
WantFlocking value is boolean true/false
WantGlidein value is boolean true/false
Has Extended help:
http://example.com/submit_help
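A sketch of an AP configuration that could plausibly produce the capabilities listing above. It is not the actual configuration of ap0.chtc.wisc.edu. It assumes the type-by-example convention of EXTENDED_SUBMIT_COMMANDS: a boolean literal declares a boolean command, a quoted string declares a string command, and the literal error marks a command as forbidden.

```
EXTENDED_SUBMIT_COMMANDS @=end
   # the literal "error" forbids use of an existing submit command
   accounting_group_user = error
   # boolean literal: the command accepts true/false
   LongJob = true
   WantFlocking = true
   WantGlidein = true
   # quoted string: the command accepts a string, quoted or unquoted in the submit file
   ProjectName = "string"
   RetryIfTransferFails = "string"
@end
EXTENDED_SUBMIT_HELPFILE = http://example.com/submit_help
```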
Submit file challenges #2: Job Policy
Users specify poor job policies – examples:
periodic_release = True
New! A job explicitly placed on hold with condor_hold can now only be released via an explicit condor_release command.
Submit file challenges #2: Job Policy
Users specify poor job policies – examples:
periodic_release = HoldReasonCode == 13
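One reading of why these policies are poor is that they release the job every time, with no limit and no delay. A hand-written bounded variant might look like the sketch below. The thresholds (5 holds, 600 seconds) are illustrative assumptions, and the expression relies on the standard job attributes NumHolds and EnteredCurrentStatus. It also shows how much ClassAd detail a user has to get right to write such a policy.

```
# Hypothetical bounded variant of the policy above (thresholds are illustrative):
# release only holds with code 13, only after 10 minutes on hold,
# and stop releasing once the job has been held 5 times.
periodic_release = (HoldReasonCode == 13) && (NumHolds < 5) && ((time() - EnteredCurrentStatus) > 600)
```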
File Transfer Error Propagation
Example of "condor_q -hold" (output directory missing on the Access Point)
Old Message:
Error from slot1@TODDS480S: STARTER at 127.0.0.1 failed to send file(s) to <127.0.0.1:50288>; SHADOW at 127.0.0.1 failed to write to file C:\condor\test\not_there\blah: (errno 2) No such file or directory
New Message:
Transfer output files failure at access point TODDS480S while receiving files from execution point slot3@TODDS480S. Details: writing to file C:\condor\test\not_there\blah: (errno 2) No such file or directory
Submit file challenges #2: Job Policy
But why make users craft their own policy? Let them pick from a menu!
So instead of:
periodic_release = HoldReasonCode == 13
Have users do:
RetryIfTransferFails = Syracuse
Submit File Challenge #3: Changes since user "cut-n-paste"
config file:
SUBMIT_TEMPLATE_NAMES = $(SUBMIT_TEMPLATE_NAMES) TensorFlow
SUBMIT_TEMPLATE_TensorFlow @=end
   if ! $(1?)
      error : Template:TensorFlow requires at least 1 argument - TensorFlow(flavor)
   endif
   Universe = container
   container_image = TensorFlow$(1).sif
   MY.wantCVMFS = true
@end

submit file:
use Template : TensorFlow(95)
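A sketch of how a user's submit file might look with the template above. The executable name train.sh is a hypothetical placeholder. The trailing comments show what the template body would contribute once $(1) is replaced by the argument 95, based only on plain substitution into the template text shown above.

```
# Hypothetical submit file; train.sh is an assumed script name
use Template : TensorFlow(95)
executable = train.sh
queue

# With $(1) replaced by 95, the template body would contribute:
#   Universe = container
#   container_image = TensorFlow95.sif
#   MY.wantCVMFS = true
```

Because the image name lives in the AP configuration, an administrator could change the template body without users having to edit the submit files they copied earlier.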