pMatlab Job Errors

All standard error output from pMatlab jobs is written to the file MatMPI/pRUN_Parallel_Wrapper.err.

Finding the output file associated with the error

Each error line in the file is prefixed with the Pid where the error occurred (see example below):

$ cat MatMPI/pRUN_Parallel_Wrapper.err
Pid=1: Undefined function or variable 'dis'.
Pid=1: Error in pSUCCESS (line 21)
Pid=1:    dis(['Pid = ' num2str(Pid)]);
Pid=1: Error in pRUN_Parallel_Wrapper (line 25)
Pid=1: eval(m_file);
. . .

You can use the Pid to identify which .out file to look in to investigate the reported error.
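When several processes fail, it can help to list each Pid that appears in the error file before opening any .out files. The following is a minimal sketch, assuming every error line starts with the Pid=<n>: prefix shown in the example above. The function name list_error_pids is illustrative and is not part of pMatlab.

```shell
# Print each distinct Pid found in a pMatlab error file, one per line,
# in ascending numeric order.
# list_error_pids is a hypothetical helper name, not a pMatlab command.
list_error_pids() {
    # Default to the error file location described above.
    err_file="${1:-MatMPI/pRUN_Parallel_Wrapper.err}"
    # Keep only the number from lines that start with "Pid=<n>:",
    # then sort numerically and drop duplicates.
    sed -n 's/^Pid=\([0-9][0-9]*\):.*/\1/p' "$err_file" | sort -n -u
}
```

Calling list_error_pids with no argument from the job's working directory reads MatMPI/pRUN_Parallel_Wrapper.err; a different path can be passed as the first argument.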

Where to find the .out files

For jobs submitted using triples mode [Nnode Nppn Ntpp], MATLAB® and Octave *.out files are written to one or more subdirectories of the form:

./MatMPI/p<start-pid>-p<end-pid>_<compute-node-name>/<pMatlab-script-name>.<pid>.out

where:

p<start-pid> is the id of the first process running on the compute node whose name is <compute-node-name>
p<end-pid> is the id of the last process running on the compute node whose name is <compute-node-name>

If your job ran on multiple compute nodes, the log files will be spread across multiple subdirectories, one per compute node.

In the example above, the reported Pid is 1, so you would look in the file <pMatlab-script-name>.1.out to investigate the source of the errors.
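Because the subdirectory name depends on the compute node and the Pid range, searching for the file by name can be quicker than browsing. The following is a sketch, assuming the .out files sit somewhere beneath the MatMPI directory and follow the <pMatlab-script-name>.<pid>.out naming described above. The function name find_pid_out is illustrative only.

```shell
# Print the path of every .out file under a MatMPI directory that
# belongs to the given Pid.
# find_pid_out is a hypothetical helper name, not a pMatlab command.
find_pid_out() {
    pid="$1"
    # Default to the MatMPI directory in the current working directory.
    matmpi_dir="${2:-MatMPI}"
    # Match names ending in ".<pid>.out"; the leading dot keeps Pid 1
    # from also matching Pid 11, 21, and so on.
    find "$matmpi_dir" -type f -name "*.${pid}.out"
}
```

For the example error above, find_pid_out 1 would print the full path of the matching file, and the subdirectory in that path also shows the compute node name.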

More information on where to find pMatlab output can be found on the Finding pMatlab Output page.

Finding the compute node associated with a process

A compute node can occasionally get into a failure state, so it may be useful to know which compute node a failed process ran on. To find the name of that compute node, open the file <pMatlab-script-name>.<pid>.out for the process and look for the line of output that begins with MANYCORE JOB BEGIN.

For example:

>> >> >> >>
MANYCORE JOB BEGIN: on b-6-18-4
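The node name can also be pulled out of a .out file without opening it. The following is a sketch, assuming the line has exactly the form shown in the example (MANYCORE JOB BEGIN: on <compute-node-name>, starting at the beginning of the line); that format is inferred from the single example here and may differ on your system. The function name node_for_out_file is illustrative only.

```shell
# Print the compute node name recorded in a pMatlab .out file.
# node_for_out_file is a hypothetical helper name, not a pMatlab command.
node_for_out_file() {
    out_file="$1"
    # Strip the assumed "MANYCORE JOB BEGIN: on " prefix and print what
    # remains; keep only the first matching line.
    sed -n 's/^MANYCORE JOB BEGIN: on //p' "$out_file" | head -n 1
}
```

If the function prints nothing, the file has no line in the assumed format, and the .out file should be inspected by hand.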

If you suspect that your process was running on a node that was in a failed state, contact supercloud@mit.edu and provide the job id and the name of the compute node.