

USING RMCS, RCOMMANDS AND AGENTX

References [globus, globus2].

This chapter describes the input file format for RMCS, which is based on the format used by Condor, DAGMan and Condor-G. The information is based on the MCS developers' release notes for v1.4.0 and v1.4.1, last updated September 2007.

The format of the my_condor_submit input file is heavily based on the Condor input file format; in fact, many of the input lines come directly from condor_submit input files. The only difference between a condor_submit input file and a my_condor_submit input file is that the latter accepts a few extra input lines. All lines recognised by my_condor_submit are listed below, and the following sections describe the context in which each can be used. Any line that my_condor_submit does not recognise will cause a warning to be given.
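As a rough illustration of the parsing rule above, the following Python sketch reads "Tag = value" lines and warns on tags it does not recognise. It is not the my_condor_submit parser. The tag set is simply copied from this chapter; matching is assumed to be case-insensitive (the chapter itself writes both globusRSL and GlobusRSL); "#" comments are assumed to work as in Condor; and a line without "=" that is not a bare tag such as Queue is assumed to continue the previous value, as in the machine list of the pathToExe example.

```python
import warnings

# Tag names copied from this chapter, lower-cased (assumed case-insensitive).
KNOWN_TAGS = {
    "executabletype", "arguments", "extraprescript", "extrapostscript",
    "preferredmachinelist", "preferredgridlist", "excludedmachinelist",
    "excludedgridlist", "pathtoexe", "sdir", "sdirect", "sforce", "sget",
    "shome", "sput", "srecurse", "perarch", "agentx", "agentxdefault",
    "agentxhome", "agentxlibs", "getenvmetadata", "metadatastring",
    "rdatasetid", "rdatasetname", "rdesc", "rhome", "rstudyid", "jobtype",
    "numofprocs", "error", "executable", "globusrsl", "globusscheduler",
    "input", "log", "notification", "output", "queue", "transfer_error",
    "transfer_executable", "transfer_input_files", "transfer_output",
    "universe", "x509_user_proxy",
}

def parse_submit_file(text):
    """Return a list of (tag, value) pairs, warning on unrecognised tags."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) Condor-style comments
        if "=" not in line and line.lower() not in KNOWN_TAGS and entries:
            # No "=" and not a bare tag such as "Queue": treat the line as a
            # continuation of the previous value (e.g. a machine list).
            tag, value = entries[-1]
            entries[-1] = (tag, f"{value} {line}")
            continue
        tag, _, value = line.partition("=")
        tag, value = tag.strip(), value.strip()
        if tag.lower() not in KNOWN_TAGS:
            warnings.warn(f"unrecognised line: {tag}")
        entries.append((tag, value))
    return entries
```

Unrecognised lines are kept in the result rather than dropped, mirroring the statement that they only cause a warning.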

Job Definition Commands

ExecutableType

Specifies whether the job is an MPI or a serial executable. Note that some resource job managers are likely to contain bugs when used for single-processor MPI jobs or for multi-processor "serial" jobs (used to maximise memory, or for executables that manage their own inter-thread communication).

Arguments

This specifies any command line arguments needed to run the job.

Notes: Arguments given on this line replace any arguments from the GlobusRSL line. The value is currently parsed as a single string; this should be fixed when the parser is reworked.
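A minimal Python sketch of the replacement rule, assuming GlobusRSL values are simple runs of "(name=value)" relations like the example given under GlobusRSL; real RSL quoting and nesting are not handled, and effective_rsl is a hypothetical helper name. The Arguments value is kept as one string, matching the note above.

```python
import re

# One RSL relation such as "(stdin=file.in)"; values may contain spaces.
RELATION = re.compile(r"\(\s*([^=()\s]+)\s*=\s*([^()]*)\)")

def effective_rsl(globus_rsl, arguments=None):
    """Return the RSL relations, with an Arguments line overriding them."""
    relations = {name: value.strip() for name, value in RELATION.findall(globus_rsl)}
    if arguments is not None:
        # Replace, rather than append to, any arguments given in the RSL.
        relations["arguments"] = arguments
    return relations
```

Other relations such as stdin and stdout pass through untouched; only the arguments relation is overridden.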

ExtraPreScript

Runs a user-specified command or script at the end of the preScript.

ExtraPostScript

Runs a user-specified command or script at the start of the postScript.

Resource Selection Commands

These four tags allow the user to select the subset of machines in the resource database to submit to. Three of the tags were new in version 1.4. Their interaction may not be obvious: the four tags are combined with a logical AND, with the proviso that by default nothing is excluded and all machines and all Grids are included. Note that, in addition to these commands, Grid administrators can disable machines for all users (for example, to prevent failing machines from being used). MCS will exit with an error if no machines satisfy the preferences expressed using the tags below.

preferredMachineList

This line is used to specify a list of resources to metaschedule to.

preferredGridList

This restricts scheduling to machines belonging to the listed Grids.

excludedMachineList

Lists machines not to be submitted to.

excludedGridList

This excludes machines belonging to the listed Grids from consideration for scheduling.
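The combination rule described at the start of this subsection can be sketched in Python as a single filter. The Machine record, its enabled flag (standing in for administrator-disabled machines) and the use of None for "tag not given" are assumptions made for illustration, not MCS data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Machine:
    name: str
    grid: str
    enabled: bool = True  # False models a machine disabled by administrators

def select_machines(machines, preferred_machines=None, preferred_grids=None,
                    excluded_machines=(), excluded_grids=()):
    """Logical AND of the four tags; None means "include everything"."""
    selected = [
        m for m in machines
        if m.enabled
        and (preferred_machines is None or m.name in preferred_machines)
        and (preferred_grids is None or m.grid in preferred_grids)
        and m.name not in excluded_machines
        and m.grid not in excluded_grids
    ]
    if not selected:
        # Mirrors MCS exiting with an error when nothing matches.
        raise RuntimeError("no machines match the resource selection tags")
    return selected
```

With no tags given, every enabled machine is eligible, which matches the stated defaults.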

Data Staging and Retrieval

The following lines all relate to staging of data and executables to the execute machine or retrieval of program output from the execute machine.

pathToExe

This specifies the SRB path to the "Executable". It is not the full SRB path to the executable: the 'architecture' string of the machine the job runs on is appended to pathToExe at job run time. For example, if

 Executable = ossia.x
 pathToExe = /ngs/home/joe-bloggs.ngs/test
 preferredMachineList = vidar.ngs.manchester.ac.uk-serial
                        ngs.rl.ac.uk-serial

and the job runs on ngs.rl.ac.uk-serial, then RMCS will fetch the executable from the SRB at /ngs/home/joe-bloggs.ngs/test/linux-64-serial/ossia.x and run it.

The mapping from machine name (e.g. ngs.rl.ac.uk-serial) to architecture can be found in the 'architecture' column of the MCS Grid Hosts table.
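The path construction in the example above amounts to joining three parts. This Python sketch assumes a dictionary standing in for the 'architecture' column of the MCS Grid Hosts table; srb_executable_path is a hypothetical name.

```python
import posixpath

def srb_executable_path(path_to_exe, executable, machine, architectures):
    """Return pathToExe/<architecture>/Executable for the chosen machine.

    `architectures` is a stand-in for the 'architecture' column of the
    MCS Grid Hosts table (machine name -> architecture string).
    """
    try:
        arch = architectures[machine]
    except KeyError:
        raise ValueError(f"no architecture recorded for {machine}") from None
    return posixpath.join(path_to_exe, arch, executable)
```

For example, srb_executable_path("/ngs/home/joe-bloggs.ngs/test", "ossia.x", "ngs.rl.ac.uk-serial", {"ngs.rl.ac.uk-serial": "linux-64-serial"}) gives the path quoted above. posixpath is used because SRB paths are "/"-separated regardless of the client platform.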

Sdir

This specifies a collection to get files from or to upload files to within the SRB as part of the job submission.

Suggested change: remove the "S" from the command name, retaining Sdir as a synonym that gives a warning.

Sdirect

This specifies whether data transfer to / from the SRB should be directly between the execute machine and the SRB vault. Direct transfer leads to much improved performance but requires extra firewall holes. Set this to false if you are unable to transfer directly between your chosen execute resource and all of the SRB vaults. All e-Minerals vaults and execution resources allowed direct transfers.

Sforce

This line specifies whether to overwrite local / SRB files when getting / putting files. A value of "true" allows overwriting and "false" does not. Note that a value of "false" will cause my_condor_submit to fail with an error if files being retrieved / uploaded already exist.
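A Python sketch of the Sforce check, assuming the names already present at the destination are known and that the check happens before anything is transferred (the chapter does not say whether the failure occurs before or part-way through a transfer). check_overwrite is a hypothetical helper.

```python
def check_overwrite(existing, requested, force):
    """Return the files to transfer; refuse clashes unless force is true.

    existing:  names already present at the destination (local or SRB)
    requested: names about to be retrieved / uploaded
    force:     the Sforce value
    """
    clashes = sorted(set(existing) & set(requested))
    if clashes and not force:
        raise FileExistsError(
            "Sforce is false and these files already exist: " + ", ".join(clashes))
    return list(requested)
```

Reporting every clash at once, rather than stopping at the first, is a design choice of this sketch.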

Sget

This specifies a list of files to retrieve from the previously specified collection within the SRB at the start of the submitted job. Wildcards (*) are now properly supported and can be used as they would be with any Linux (or similar) command-line command. Recursion is also allowed (i.e. subdirectories are downloaded), but files will only be retrieved recursively if an SRecurse line, described below, is specified for the containing Sdir block.

Suggested changes are to: allow any number per Dir block; remove the "S" from the command name, retaining Sget as a synonym that gives a warning; specify wildcard arguments better; and expand wildcards at submit time.

Shome

This line is used to specify the location of the Scommands on the machine on which the Sput / Sget commands are called. You need not specify this line if the Scommands are in /home/srbusr/SRB3_3_1/utilities/bin.

Sput

This specifies a list of files to put into the previously specified collection within the SRB at the end of the submitted job. Wildcards (*) are now properly supported and can be used in the same manner as with normal Linux (or similar) command-line commands. Directories can be uploaded recursively when used in conjunction with the SRecurse line described below.

Suggested changes are to: allow any number per Dir block; remove the "S" from the command name, retaining Sput as a synonym that gives a warning; and specify wildcard arguments better.

SRecurse

This line specifies whether to recursively upload / download files to / from the SRB. It is used in conjunction with wildcards in Sget / Sput lines.

PerArch

Turns on architecture-specific download / upload for this dir block.

Metadata management

The following lines all relate to obtaining metadata and uploading it to the e-Minerals metadata database. It is worth noting that metadata parameters are limited in length (currently to 50 characters for the value and 30 for the name). MCS will detect cases where these limits would be exceeded and attempt to warn the user, to minimise the risk of losing data integrity. The warning is given in two ways: "***TRUNCATED DATA***" is inserted at the end of the stored string in the database, and a warning that includes the original (un-truncated) string is written to out.err.
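The truncation behaviour can be sketched in Python as below. The chapter does not say whether the marker counts towards the limit; this sketch assumes it does, so a truncated value keeps its first 30 characters and a truncated name its first 10. Writing the warning to a stream (stderr by default) stands in for out.err, and fit_metadata is a hypothetical name.

```python
import sys

MARKER = "***TRUNCATED DATA***"  # 20 characters
NAME_LIMIT, VALUE_LIMIT = 30, 50

def fit_metadata(text, limit, log=None):
    """Return text unchanged if it fits; otherwise truncate and mark it."""
    log = sys.stderr if log is None else log
    if len(text) <= limit:
        return text
    # Assumption: the marker counts towards the limit, so keep only the
    # leading characters that leave room for it.
    stored = text[: limit - len(MARKER)] + MARKER
    print(f"WARNING: metadata truncated; original was: {text}", file=log)
    return stored
```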

AgentX

This line is used to instruct my_condor_submit to collect other data values from within a CML file and store them as metadata. The annotation is given the name specified in the nameForAnnotation part of the line, and its value is retrieved from the file specified in the filename part of the line by evaluating the path given in the rest of the line. A full description of this evaluation is given below.

AgentXDefault

This line is used to instruct my_condor_submit to extract metadata from a specified CML file. The metadata extracted will consist of all of the parameters within the first parameterList element in the CML file; these will typically be simulation input parameters. All of the metadata elements within the first metadataList will also be extracted. All extracted metadata will be stored as annotations on the created data object. In addition, an attempt is made to locate a UUID stored in the file. If one is found and it passes (partial) validation, it is stored in the database; otherwise a null UUID (00000000-0000-0000-0000-000000000000) is stored.
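What the "(partial) validation" involves is not specified here. The Python sketch below assumes it is a shape check only (8-4-4-4-12 hexadecimal digits, with no version or variant check) and falls back to the null UUID otherwise; uuid_to_store is a hypothetical name.

```python
import re

NULL_UUID = "00000000-0000-0000-0000-000000000000"
# Assumed "partial" validation: check the 8-4-4-4-12 hex layout only.
UUID_SHAPE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")

def uuid_to_store(candidate):
    """Return the UUID to record for a CML file, or the null UUID."""
    if candidate is not None and UUID_SHAPE.fullmatch(candidate.strip()):
        return candidate.strip()
    return NULL_UUID
```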

AgentXHome

This line is used to instruct my_condor_submit as to where AgentX should look for its mappings and ontology if the default location is not to be used. This location must have the same directory structure as that seen at the default location.

AgentXLibs

This line is used to instruct my_condor_submit as to where AgentX is installed on the execute machine if not in the default location or in a location that my_condor_submit does not know about.

GetEnvMetadata

This line is used to instruct my_condor_submit as to whether or not it should collect metadata regarding the submission and execution environments of the jobs which will then be stored within the metadata database. All metadata collected will be stored as annotations on the created data object.

MetadataString

This line is used to instruct my_condor_submit to store a specified string of metadata with a specified name within the created metadata data object. The string will be given the name as specified by name and will have value as specified by value.

RDatasetID

This line is used to specify the ID of a dataset to contain the created data object, which will in turn contain all of the collected metadata. This line must be used instead of the RStudyID and RDatasetName lines.

RDatasetName

This line is used to specify a string to be used as the name of a created dataset to contain the created data object which will in turn contain all of the collected metadata. This line must be used in conjunction with the RStudyID line instead of the RDatasetID line.

RDesc

This line is used to specify the name to be given to the created data object within the metadata database. A data object with this name, and with a URL equal to the value of the preceding Sdir line, will be created to contain all harvested metadata.

RHome

This line is used to instruct my_condor_submit as to where the RCommand binaries are installed if they are not in the default location or a location that my_condor_submit already knows about.

RStudyID

This line is used to specify the ID of a study in which to create a dataset to contain the created data object which will in turn contain all of the collected metadata. This line must be used in conjunction with the RDatasetName line instead of the RDatasetID line.
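The rule for where the created data object is placed can be checked as follows. This Python sketch assumes the tags have already been read into a dictionary keyed by the spellings used in this chapter, and that one of the two forms is required whenever metadata is being stored; dataset_target is a hypothetical helper.

```python
def dataset_target(tags):
    """Return where the created data object should go.

    Valid forms: RDatasetID alone (an existing dataset), or RStudyID
    together with RDatasetName (a new dataset created in that study).
    """
    has_id = "RDatasetID" in tags
    has_study = "RStudyID" in tags
    has_name = "RDatasetName" in tags
    if has_id and not (has_study or has_name):
        return {"dataset_id": tags["RDatasetID"]}
    if has_study and has_name and not has_id:
        return {"study_id": tags["RStudyID"], "new_dataset_name": tags["RDatasetName"]}
    raise ValueError("give either RDatasetID, or RStudyID together with RDatasetName")
```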

Meta-scheduling

The following lines all relate to meta-scheduling across the e-Minerals minigrid resources.

jobType

This line is used to specify the type of job being submitted, which must be either "performance" or "throughput". Choosing "performance" results in the job being submitted to a cluster machine, while choosing "throughput" submits it to a Condor pool.
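A small Python sketch of the jobType check; the returned descriptions simply restate the text above, and target_for_job_type is a hypothetical name. Whether my_condor_submit accepts other capitalisations is not stated, so this sketch accepts only the two spellings given.

```python
TARGETS = {"performance": "cluster machine", "throughput": "Condor pool"}

def target_for_job_type(job_type):
    """Map a jobType value to the kind of resource it is submitted to."""
    if job_type not in TARGETS:
        raise ValueError(
            f"jobType must be 'performance' or 'throughput', not {job_type!r}")
    return TARGETS[job_type]
```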

numOfProcs

This line is used to specify the number of processors to be used on the remote machine.

pathToExe

See above.

Standard Condor Tags

The following lines are all standard Condor input file tags that my_condor_submit understands and will accept as part of its input file.

Error

This line is used to specify the name of the file to which stderr should be redirected for the main part of the submitted job i.e. the stderr from the actual job execution rather than data-staging sections of the submission.

Executable

This line is used to specify the name of the executable to be run for the main part of the submitted job, i.e. the actual job execution rather than the data-staging sections of the submission.

GlobusRSL

This line is used to specify additional arguments, etc., to the main part of the submitted job. It can be used to specify stdin, stdout and stderr for the main section of the job if desired.

Example:

 GlobusRSL = (stdin=file.in)(stdout=file.out)(arguments=-f example_argument)

GlobusScheduler

This line is used to specify a particular machine and jobmanager to submit to and can only be used when not meta-scheduling. This line can be used to submit to a machine that my_condor_submit does not know about as long as the specified jobmanager is one which my_condor_submit supports.

Input

This line is used to specify the name of a file to be used for stdin for the main part of the submitted job. It can be used instead of the (stdin) section of the GlobusRSL line.

Log

If specified, this line is ignored by my_condor_submit, which uses its default value instead.

Notification

This line is not currently supported by the NGS RMCS server pending debug.

This line is used to specify whether you want Condor to notify you by email of the status of the main part of the submitted job once it finishes. Possible values are 'always', 'complete', 'error' or 'never'.

Output

This line is used to specify a name for the file to be used for stdout for the main part of the submitted job. This file will be left on the remote machine to be uploaded using a relevant Sput line if desired.

Queue

Within Condor, this line tells condor_submit to submit the job, and my_condor_submit accepts it for the same purpose. However, my_condor_submit does not actually need it and simply ignores it if specified.

Transfer_Error

This line is used to specify whether to return the stderr from the execution machine to the local machine (a value of "true") or leave it on the execution machine (a value of "false") to be uploaded using an appropriate Sput line.

Transfer_Executable

This line is used to specify whether my_condor_submit should transfer the executable from the local machine to the execute machine rather than using the SRB. This does not make sense for meta-scheduled jobs. A value of "true" will transfer the file from the local machine, while "false" will not.

Transfer_Input_Files

This line is used to specify a set of files that should be sent with the executable to the execution node within a condor pool. This line does not make sense when submitting to anything other than a condor jobmanager and so will be ignored by my_condor_submit in this case. The files will be transferred from the condor pool's submit node (to which my_condor_submit submits its job) to the relevant execution machine after they have been downloaded using the pre stage of the my_condor_submit job.

Transfer_Output

This line would be used to specify whether to return the main job's stdout file to the submission machine. However this does not make sense within the my_condor_submit context and so is ignored and the output file is always left on the remote machine to be uploaded with a relevant Sput line.

Universe

This line is included to provide backward compatibility with older versions of my_condor_submit and is used to tell Condor that it should use Globus to submit to the remote execution machine. The only permissible value is "Globus".

x509_user_proxy

This line is used to specify the location of the user's X.509 proxy certificate, should it not be in the location specified by the X509_USER_PROXY environment variable. The value specified here overrides the value retrieved from grid-proxy-info and from the environment variable. This line is designed to allow my_condor_submit to be used when the user has logged into a submit machine with gsissh.
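The precedence described above can be sketched in Python. Instead of calling grid-proxy-info, this sketch falls back to the X509_USER_PROXY environment variable and then to /tmp/x509up_u<uid>, the usual Globus default proxy location; proxy_location is a hypothetical name.

```python
import os

def proxy_location(submit_value=None, environ=None, uid=None):
    """Pick the proxy file: an x509_user_proxy line overrides everything else."""
    environ = os.environ if environ is None else environ
    if submit_value:
        return submit_value
    if environ.get("X509_USER_PROXY"):
        return environ["X509_USER_PROXY"]
    # Usual Globus default when nothing else is set (os.getuid is POSIX only).
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"
```

Passing environ and uid explicitly keeps the function testable without touching the real environment.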

Rob Allan 2009-11-10