pMatlab Job Errors
All standard error output from pMatlab jobs is written to the file MatMPI/pRUN_Parallel_Wrapper.err.
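As a quick first check, a minimal shell sketch like the one below reports whether a job wrote anything to its error file. It assumes bash (for `local`) and that it is run from the directory the job was launched from, so the relative path MatMPI/pRUN_Parallel_Wrapper.err resolves. The function name check_pmatlab_errors is hypothetical and not part of pMatlab.

```shell
# Hypothetical helper: check whether a pMatlab job wrote anything to its error file.
# Returns 0 if the file exists and is empty, 1 if it has content, 2 if it is missing.
check_pmatlab_errors() {
  # Default path assumes we are in the directory the job was launched from.
  local err_file="${1:-MatMPI/pRUN_Parallel_Wrapper.err}"
  if [ ! -e "$err_file" ]; then
    echo "error file not found: $err_file"
    return 2
  fi
  if [ -s "$err_file" ]; then   # -s: file exists and has size greater than zero
    echo "errors recorded in $err_file"
    return 1
  fi
  echo "no errors recorded in $err_file"
  return 0
}
```

Running `check_pmatlab_errors` with no argument checks the default path; pass a different path as the first argument if your job ran elsewhere.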
Finding the output file associated with the error
Each line in the file is prefixed with the Pid of the process in which the error occurred, as in the following example:
$ cat MatMPI/pRUN_Parallel_Wrapper.err
Pid=1: Undefined function or variable 'dis'.
Pid=1: Error in pSUCCESS (line 21)
Pid=1: dis(['Pid = ' num2str(Pid)]);
Pid=1: Error in pRUN_Parallel_Wrapper (line 25)
Pid=1: eval(m_file);
. . .
Use the Pid to identify which .out file to examine when investigating the reported error.
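When a job has many processes, it can help to collect every Pid that reported an error before opening any .out files. The sketch below assumes bash and that every error line starts with `Pid=<n>:` exactly as in the example above; list_error_pids is a hypothetical helper name, not a pMatlab command.

```shell
# Hypothetical helper: print each distinct Pid found in the pMatlab error file,
# one per line, in ascending numeric order.
list_error_pids() {
  local err_file="${1:-MatMPI/pRUN_Parallel_Wrapper.err}"
  # Keep only lines of the form "Pid=<digits>:..." and print just the digits,
  # then de-duplicate numerically (a single Pid usually reports several lines).
  sed -n 's/^Pid=\([0-9][0-9]*\):.*/\1/p' "$err_file" | sort -n -u
}
```

For the example above, `list_error_pids` would print only `1`, since every error line came from Pid 1.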
Where to find the .out files
For jobs submitted in triples mode ([Nnode Nppn Ntpp]), MATLAB® and Octave *.out files are written to one or more subdirectories:
where:
p<start-pid> is the id of the first process running on the compute node named <compute-node-name>
p<end-pid> is the id of the last process running on the compute node named <compute-node-name>
If your job ran on multiple compute nodes, the .out files are spread across multiple subdirectories.
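To see which subdirectories actually hold output for your script, you can search by file name instead of relying on the subdirectory naming scheme. This is a minimal sketch, assuming bash and that it is run from (or given) the directory the job was launched from; list_out_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: list each directory that contains .out files for a given
# pMatlab script, searching recursively from a root directory (default: .).
list_out_dirs() {
  local script="$1" root="${2:-.}"
  # Match <script>.<anything>.out, print each file's directory, de-duplicate.
  find "$root" -type f -name "${script}.*.out" -exec dirname {} \; | sort -u
}
```

For instance, `list_out_dirs myScript` (where myScript stands in for your script name) prints one line per subdirectory, so a multi-node job shows several entries.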
In the example above, the reported Pid is 1, so you would look in the file <pMatlab_script_name>.1.out to investigate the source of the errors. That file is in the subdirectory whose range from p<start-pid> to p<end-pid> includes Pid 1.
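Rather than working out the subdirectory name by hand, you can search for the .out file of a given Pid directly. This is a minimal sketch, assuming bash and that the .out files sit somewhere below the directory the job was launched from; find_pid_out is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of <script>.<pid>.out, searching
# recursively from a root directory (default: .) so the exact naming of the
# per-node subdirectories does not need to be known.
find_pid_out() {
  local script="$1" pid="$2" root="${3:-.}"
  # In -name patterns "." is literal, so pid 1 does not match <script>.11.out.
  find "$root" -type f -name "${script}.${pid}.out"
}
```

For the example above, `find_pid_out myScript 1` (with myScript standing in for your script name) prints the path of myScript.1.out.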
For more information on locating pMatlab output, see the Finding pMatlab Output page.
Finding the compute node associated with a process
A compute node can occasionally enter a failed state, so it can be useful to know which compute node a failed process ran on. To find the name of the compute node where a process executed, open the file <pMatlab_script_name>.<Pid>.out and look for the line of output that begins with MANYCORE JOB BEGIN.
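This lookup can also be scripted. The hypothetical helper show_job_begin below finds the .out file for a given Pid and prints any line that begins with MANYCORE JOB BEGIN, prefixed by the file path. It assumes bash and does not try to extract the node name from the line, since it makes no assumption about the rest of that line's format.

```shell
# Hypothetical helper: show the "MANYCORE JOB BEGIN" line(s) from the .out file
# of one Pid; that line identifies the compute node the process ran on.
show_job_begin() {
  local script="$1" pid="$2" root="${3:-.}"
  # grep -H prefixes each matching line with the file it came from.
  find "$root" -type f -name "${script}.${pid}.out" \
    -exec grep -H '^MANYCORE JOB BEGIN' {} +
}
```

For example, `show_job_begin myScript 1` (myScript standing in for your script name) prints the matching line from myScript.1.out.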
If you suspect that your process was running on a node that was in a failed state, contact supercloud@mit.edu and provide the job id and the name of the compute node.