...
delete(gcp);
exit;
Expected Output
The batch script used for this example is very similar to the one used in Running in Parallel on One Node, except that ntasks is 4 instead of 20. When the job completes, the output file will contain the MATLAB version information, followed by information about the cluster and the pool, and then the results of the jobs:
...
As you can see, each worker knows its own rank. A simple way to use this is to name your data sets something like dataset_1.ascii, dataset_2.ascii, and so on, and have each worker load its own file with
load(['dataset_' num2str(labindex) '.ascii']).
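A minimal sketch of this pattern follows. It assumes hypothetical files dataset_1.ascii, dataset_2.ascii, ... containing plain numeric data, exactly one per worker, and uses a placeholder computation (summing each file) in place of your real analysis. After the spmd block, localSum is a Composite, so the client reads each worker's value by indexing it with the worker's rank.

```matlab
% Sketch: each spmd worker loads the data file matching its rank.
% Assumes hypothetical files dataset_1.ascii ... dataset_N.ascii, one per worker.
parpool('local', 4);

spmd
    fname = ['dataset_' num2str(labindex) '.ascii'];
    data = load(fname);            % non-.mat extension: read as numeric ASCII
    localSum = sum(data(:));       % placeholder per-worker computation
end

% Back on the client, localSum is a Composite with one value per worker
p = gcp;
total = 0;
for k = 1:p.NumWorkers
    total = total + localSum{k};   % value computed by worker k
end
fprintf('Total over all data sets: %g\n', total);

delete(gcp);
```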
parfeval
parfeval is useful when you want to run a loop that you may want to stop early. For example, if you are analyzing a very large data set, you may want to stop once the results are 'good enough' instead of waiting for the entire set to be processed. parfeval is also useful for running functions in the background, because it does not block MATLAB from continuing with other work. parfeval distributes the evaluations among the workers in the pool itself.
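To illustrate the non-blocking behavior, here is a small sketch that assumes a parallel pool is already open. The background function (summing a random matrix) and the client-side work are arbitrary placeholders; the point is that parfeval returns a future immediately, and the client only waits when it calls fetchOutputs.

```matlab
% Sketch: run a function in the background while the client keeps working.
% Assumes a parallel pool is already open (for example, created with parpool).
f = parfeval(@(n) sum(rand(n), 'all'), 1, 2000);   % returns a future immediately

% The client session is not blocked and can do other work meanwhile
disp('Submitted background job; continuing on the client...');
clientResult = max(magic(5), [], 'all');            % placeholder client work

% fetchOutputs waits for the future to finish, then returns its output
bgResult = fetchOutputs(f);
fprintf('Background sum: %g, client max: %d\n', bgResult, clientResult);
```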
Syntax
f = parfeval(fcn,numout,in1,in2,...)
fcn: a handle to the function to execute
numout: the expected number of outputs from the function
in1, in2, ...: the input arguments for the function
f: a future object. By itself it does not hold your results; extract the outputs once they are ready with fetchNext(f).
To stop early, for example to break out of a loop that uses parfeval, call cancel(f) to cancel the evaluations that have not yet finished.
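The early-stop idea can be sketched as follows, assuming a parallel pool is already open. scoreChunk and threshold are hypothetical stand-ins for your own analysis step and your own definition of 'good enough'. All evaluations are submitted up front, results are collected as they finish, and cancel stops whatever is still queued or running once the cutoff is reached.

```matlab
% Sketch: stop early once a result is "good enough".
% Assumes a parallel pool is already open (for example, created with parpool).
scoreChunk = @(k) k * rand();   % hypothetical stand-in for a real analysis step
threshold  = 8;                 % hypothetical "good enough" cutoff
nChunks    = 10;

f(1:nChunks) = parallel.FevalFuture;    % preallocate the array of futures
for k = 1:nChunks
    f(k) = parfeval(scoreChunk, 1, k);  % submit; returns immediately
end

bestScore = -Inf;
for k = 1:nChunks
    [idx, score] = fetchNext(f);        % next evaluation to finish
    fprintf('Chunk %d scored %.2f\n', idx, score);
    bestScore = max(bestScore, score);
    if score >= threshold
        cancel(f);                      % cancel evaluations not yet finished
        break;
    end
end
fprintf('Best score: %.2f\n', bestScore);
```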
Example
This example shows how you can use parfeval to evaluate a function and get the results as they become available.
%=================================
% Simple example of parfeval
% From MATLAB documentation
% must run setPool before this
%=================================
% preallocate the array of futures
f(1:10) = parallel.FevalFuture;
% evaluate the magic function 10 times
for idx = 1:10
    f(idx) = parfeval(@magic, 1, idx);
end
% preallocate place to store results
magicResults = cell(1,10);
% get the results as they finish and put them in the array
for idx = 1:10
    [completedIdx, value] = fetchNext(f);
    magicResults{completedIdx} = value;
    fprintf('Got results with index %d.\n', completedIdx);
end
% clean up the pool and exit
delete(gcp);
exit;
Expected Output
The batch script for this example is identical to the script for the previous spmd example. The output file contains MATLAB's version information, followed by the cluster and pool properties. The results themselves should read 'Got results with index 1.', then 2, then 3, and so on up to 10. The indexes are not guaranteed to be in order, especially in a much larger job, because fetchNext(f) returns whichever future object finishes first.
Quick Guide
A brief explanation of terms to know when using MATLAB's Parallel Computing Toolbox.
| Term | Description |
|---|---|
| worker | The MATLAB computational engine that processes the code. Also called a lab. Each worker is assigned a number called its rank. |
| numlabs | Returns the total number of workers running the current spmd block. |
| labindex | Returns the rank of the current worker. |
| parpool | The parallel pool of workers. It is created in the MATLAB program with parpool('local', #ofWorkers). The number of workers is the number of CPUs requested on the node minus 1. |
| gcp | MATLAB function that gets the current pool. At the end of the parallel code, delete(gcp) cleanly shuts down all the workers. |
| parfor | The parallel for loop. Splits the iterations of the for loop among the workers so they run in parallel. The loop must step by +1, the iterations cannot depend on one another, parfor loops cannot be nested, and you cannot break out of the loop early. |
| spmd | Single Program Multiple Data. Gives you control over each worker: use the worker's rank to assign jobs. Useful when you want to do the same thing to different data sets. Like parfor, spmd blocks cannot be nested and you cannot break out of them. |
| parfeval | Parallel function evaluation (an informal expansion of the name). Runs functions in parallel without blocking MATLAB from doing other work. Call parfeval once for each evaluation you want, for example in a loop, and call fetchNext to get the results. |
Troubleshooting
Sometimes things just don't go right; here are some tips to help. If none of them solve your problem, email us at rc-help@rit.edu.
- Check the .err files for your job; they will help you the most. They often state explicitly which error occurred, in which file, and on which line. If you don't understand the error message, try looking it up online. Always double-check your spelling and capitalization.
- Run sacct. This command shows the state and exit code of each of your jobs. If you see a state of OUT_OF_ME+ (OUT_OF_MEMORY, truncated by sacct) and an exit code of 125, you need to increase the amount of memory you request in your .sh file. Remember that #SBATCH --mem uses megabytes by default; append M, G, or T to the number to request megabytes, gigabytes, or terabytes.
- Your job might not be running because there are no resources available. Run squeue and look under the REASON column. If you see (Resources), you need to wait for other jobs to finish before yours can run. You can also see this by running sinfo: in the rows for the tier you are trying to run your job in, there will be no nodes with a STATE of idle or mix.
More Reading
The topics covered in this documentation will be enough to get you up and running with parallel code in MATLAB. However, there is much more to MATLAB's Parallel Computing Toolbox, such as sending messages between specific workers and improving the performance of your parfor loops.
If there are any further questions, or there is an issue with the documentation, please contact rc-help@rit.edu for additional assistance.