Getting Started with pMatlab
pMatlab was created at MIT Lincoln Laboratory to provide easy access to parallel computing for engineers and scientists using the MATLAB® language. pMatlab provides the interfaces to the communication libraries necessary for distributed computation. In addition to MATLAB®, pMatlab works with Octave, an open source language that is largely compatible with MATLAB®. This page provides an overview of how to create pMatlab code.
This page focuses on converting your serial code to pMatlab code. Once you have pMatlab code, review the guide to launching pMatlab jobs on the SuperCloud systems, described under How to Run a pMatlab Job at the end of this page.
Creating pMatlab Codes: The Basics
To get started converting serial MATLAB® or Octave codes into parallel code using pMatlab, we recommend our online course, Practical HPC. The PGAS Example: pMatlab Implementation section, within the Distributed Applications module, provides a detailed introduction, working step-by-step through the Parameter Sweep application that is in the examples directory in your home directory. It should take approximately 30-45 minutes to work through the section.
Creating pMatlab Codes: Single Program Multiple Data Considerations
The programming model used by pMatlab is "Single Program, Multiple Data" (SPMD), which means that every process of your application executes the same commands (program) but on different data. While the use of multiple processes speeds up the computation, it has a significant impact on any I/O that your program executes. Some common concerns and the appropriate parallel programming techniques include:
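As a rough illustration of the SPMD idea, the sketch below shows how a single program can select a different share of the work on each process. In a pMatlab job the number of processes and the process ID would come from Np and Pid. Here they are replaced by the stand-in variables nProcs and myPid so the snippet also runs serially, and Pid is assumed to count from 0. The paramList values and the squaring step are hypothetical placeholders.

```matlab
% Stand-ins so this sketch runs serially; in a pMatlab job use Np and Pid.
nProcs = 4;   % assumed number of processes (Np)
myPid  = 2;   % assumed ID of this process (Pid), counted from 0

paramList = linspace(0, 1, 10);   % hypothetical parameter sweep values

% Every process runs this same line but obtains a different,
% non-overlapping set of indices (a cyclic distribution).
myIdx = (myPid + 1):nProcs:numel(paramList);

% Same commands, different data: work only on the local share.
result_local = zeros(1, numel(myIdx));
for k = 1:numel(myIdx)
    result_local(k) = paramList(myIdx(k))^2;   % placeholder computation
end
```

A cyclic split is only one possible choice. A block split, where each process takes a contiguous range, works the same way as long as the ranges are derived from the process ID.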
I/O Considerations: Saving a MATLAB® workspace or variable
When saving your workspace, remember that each processor has its own copy of each variable and a distinct workspace. To ensure that you save all the data and don't overwrite the data from one processor with that from another, you need to:
- Save the workspace of each processor into a distinct file
- Differentiate between the values of variables on separate processors
The easiest way to do this is to tag variables with the label "local", e.g. myVar_local, and tag the files with the processor ID (Pid), which is unique. If the data is uniquely tagged, it will be possible to reconstruct the complete workspace and data structures in a post-processing step.
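The post-processing step could look something like the sketch below. It is an illustration only: the first loop stands in for the parallel run by writing one file per process ID, and the second loop reassembles the pieces. The values of nProcs and outDir, and the placeholder data, are assumptions made for the example. In a real job each process would write its own file using Pid.

```matlab
nProcs = 3;          % assumed number of processes (Np)
outDir = tempdir;    % hypothetical output location

% Stand-in for the parallel run: each pass mimics one process
% saving its local piece into a file tagged with its process ID.
for myPid = 0:nProcs-1
    myVar_local = myPid * 10 + (1:2);    % placeholder local data
    filename = fullfile(outDir, ['output.' num2str(myPid) '.mat']);
    save(filename, 'myVar_local');
end

% Post-processing: load each file and concatenate in process-ID order.
myVar = [];
for p = 0:nProcs-1
    filename = fullfile(outDir, ['output.' num2str(p) '.mat']);
    piece = load(filename);              % struct with field myVar_local
    myVar = [myVar, piece.myVar_local];
end
```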
Saving Distributed Workspaces
To save the entire workspace of a processor, write it into a file called output.<pid>.mat, where <pid> is the processor ID (Pid).
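A minimal sketch of saving the whole workspace, following the same naming pattern as the template for individual variables, might look like this. The stand-in variable myPid takes the place of Pid so the snippet runs outside a pMatlab job, and a_local is placeholder data.

```matlab
myPid = 0;             % stand-in; in a pMatlab job use Pid directly
a_local = magic(3);    % placeholder data living in this workspace

% File name unique to this process, e.g. output.0.mat
filename = ['output.' num2str(myPid) '.mat'];

% With no variable names given, save writes every variable in the workspace.
save(filename);
```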
Saving a Distributed Variable
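To save individual variables within the workspace, follow the template code block shown below.
% Build a file name unique to this process from its Pid, e.g. output.3.mat
filename = ['output.' num2str(Pid) '.mat'];
% Template only: replace the variable names and the "..." placeholder with the names of your own variables
save(filename, 'variable1', 'variable2', ..., 'variableN');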
If you do not specify a path for your output data, it will be saved in the directory where you ran the code, generally the current working directory.
Note: the same rules apply when writing out data files from all processes. Each file must have a distinct file name, and the easiest way to accomplish this is to use the process ID, Pid, as a unique identifier.
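For a plain data file, the same idea can be sketched with fopen and fprintf. The file name pattern results.<pid>.txt and the values written are hypothetical, and myPid is a stand-in for Pid so the snippet runs serially.

```matlab
myPid = 1;    % stand-in; in a pMatlab job use Pid directly

% Hypothetical naming pattern: one text file per process, e.g. results.1.txt
filename = sprintf('results.%d.txt', myPid);

fid = fopen(filename, 'w');
if fid == -1
    error('Could not open %s for writing', filename);
end
fprintf(fid, '%d %.4f\n', myPid, pi);   % placeholder record
fclose(fid);
```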
Creating Unique Random Numbers
The random number generator in MATLAB® is designed to start from the same set value every time MATLAB® is restarted. Since a pMatlab job starts a new MATLAB® process on each remote processor, every MATLAB® process that is part of your job will start with the same value and therefore produce the same sequence of random numbers. In general, to get good statistics you want each processor to start from a different value. To accomplish this, seed the random number generator so that each process has a different set of random numbers. MathWorks has changed the recommended methods for generating different random numbers over time, so we recommend checking the MATLAB® documentation when modifying your random number generation routines (see the MathWorks document "Generate Random Numbers That Are Different").
For example, a pMatlab code can generate a different seed on each processor by deriving the seed from a value that is unique to the processor, such as Pid.
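One possible way to do this is sketched below. It is an assumption-laden illustration rather than the page's original code segment: the seed is derived from the process ID, with myPid standing in for Pid, and baseSeed is a hypothetical offset. The sketch uses rng where it is available and falls back to the older rand('state', ...) syntax otherwise.

```matlab
myPid = 3;          % stand-in; in a pMatlab job use Pid directly
baseSeed = 1000;    % hypothetical offset shared by all processes

% A different non-negative integer seed on every process.
seed = baseSeed + myPid;

if exist('rng') > 0
    rng(seed);                 % current recommended interface
else
    rand('state', seed);       % fallback for environments without rng
    randn('state', seed);
end

x_local = rand(1, 3);          % differs from process to process
```

Seeding from the process ID keeps runs reproducible, since the same job layout yields the same streams. If a different result is wanted on every run, the seed would need an extra varying component.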
More details are available at the following web page:
http://www.mathworks.com/help/matlab/examples/controlling-random-number-generation.html
Using Global Variables
Global variables are permitted in pMatlab code, but remember that in this distributed memory model a variable that is global to the code is local to the processor, so each instance of the global variable can have a different value on each processor that is running your code. Take great care when using global variables in parallel. You may want to explicitly set the value somewhere or read it in from a file to ensure consistency.
Also, note that when you clear global variables in your code, through the use of "clear all" or "clear global" commands, all of the pMatlab library commands will also be cleared, causing unpredictable behavior in your parallel environment.
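The two points above can be sketched as follows. The global name SCALE_FACTOR and its value are hypothetical. The idea is that every process executes the same explicit assignment, so all copies agree, and that temporary variables are cleared by name rather than with "clear all" or "clear global".

```matlab
% Hypothetical global; every process runs this same assignment,
% so each processor's copy holds the same value.
global SCALE_FACTOR
SCALE_FACTOR = 2.5;

result_local = SCALE_FACTOR * (1:4);

% Remove a temporary by name; this leaves all other variables,
% including any globals, untouched.
tmp = rand(1, 10);
clear tmp
```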
How to Run a pMatlab Job
For information on how to launch your pMatlab job to run on the SuperCloud system, see the Launching pMatlab Jobs page.
Troubleshooting pMatlab Problems
If you run into problems launching your pMatlab jobs, or you are getting errors during job execution, see the Troubleshooting pMatlab Jobs page.