
MATLAB is a very powerful tool used for the creation, analysis, and visualization of data. The MATLAB language is an easy-to-learn, high-level programming language. This documentation will not cover how to install MATLAB or use its basics; there are many tutorials online that can teach you MATLAB, including MATLAB's official interactive online tutorial and its documentation. This document covers parallelization of your MATLAB code, i.e., breaking up long processes into chunks that can be processed at the same time. In MATLAB, the processes that execute these chunks in parallel are called workers. These workers can be spread across multiple nodes and cores to take advantage of the Research Computing resources, which means you can get your results much faster. To do this we will be using MATLAB's Parallel Computing Toolbox.

Getting Your MATLAB Code Ready

Parfor loop

MATLAB's Parallel Computing Toolbox provides the parfor loop construct. It splits the work done in the for loop among workers so that it is completed in parallel. To use it, simply replace the for loop in your code with parfor; however, there are some rules:

  1. Parfor loops cannot be nested within each other.
  2. Parfor loop iterations must be independent of each other; that is, the results of one iteration cannot depend on the results of any other iteration. (No recursion!)
  3. Parfor loop variables must be consecutive, increasing integers; the loop step cannot be anything other than 1. To work around this, loop over indices and look up the value for each iteration, where iValues holds the values you would have used in the regular for loop:
    iValues = 0:0.2:1; 
    parfor idx = 1:numel(iValues)
        i = iValues(idx);
         ... 
    end
  4. You cannot break out of a parfor loop early; that means no break or return statements in your parfor loop. If this creates an issue with your loop, try parfeval (see the sketch after this list).
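
If you do need to stop early, parfeval is one way around rule 4. The following is a minimal sketch, not one of the page's downloadable examples: it submits the same kind of eigenvalue computation as independent tasks and cancels the remaining ones once a condition is met. The 1.9 threshold and the task count of 10 are made up purely for illustration.

% Minimal parfeval sketch (illustrative only; the 1.9 threshold is a made-up condition)
futures(1:10) = parallel.FevalFuture;                        % preallocate an array of futures
for k = 1:10
    futures(k) = parfeval(@() max(abs(eig(rand(500)))), 1);  % each call returns 1 output
end
for k = 1:10
    [~, value] = fetchNext(futures);  % collect results as workers finish them
    if value > 1.9                    % hypothetical early-exit condition
        cancel(futures);              % cancel the tasks that are still running
        break
    end
end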

Example MATLAB Code

Using Parameters: If you use parameters in your program, you must start the code with a function declaration. This function must have the same name as the MATLAB file.

function [a]=matlab_example(nloop, jobid)
% Example function to demonstrate parfor loop and parallel code
% nloop: number of iterations
% jobid: Slurm job id used to save output to specific file
if ischar(nloop) %checking data type of first input
    nloop=str2num(nloop);
end
%preallocate output array in a nloop x 1 matrix
a=zeros(nloop, 1);
% TIME CONSUMING LOOP
tic; %beginning of timing the loop
parfor i=1:nloop
    a(i)=IntensiveFunction();
end
time=toc %end of timing the loop
%save the output to its own file
save(['example_' num2str(jobid) '.out'], 'time', 'a', '-ascii')
end
function result=IntensiveFunction()
% Computation intensive calculation
    result = max(abs(eig(rand(500))));
end

Click here to download matlab_example.m.

Running Your Code

Using One Core and a Slurm Array

This example does not run code in parallel, but it is useful if you want to test your parfor loop, learn how arrays work in Slurm, or simply run your MATLAB code multiple times in a row.

#!/bin/bash -l
#NOTE the -l flag!
#SBATCH -J matlab_example_multiple #Name of the job
#Standard out and Standard error output files %A is the job id and %a is the array index
#SBATCH -o matlab_example_multiple_%A_%a.out
#SBATCH -e matlab_example_multiple_%A_%a.err
# To send emails and notify when done
#SBATCH --mail-user abc1234@rit.edu
#SBATCH --mail-type=END
#SBATCH -t 1:00:00 # Request 1 hour MAX for the job
#Run on tier3
#SBATCH -p tier3
#SBATCH --mem=2000M #memory requirement of the job in MB
#SBATCH --array=1-10 #create array of 10 to run code 10 times
#Load my environment
module load matlab
#Create the Job and Array ID to pass into the function
slurmArrayID="${SLURM_ARRAY_JOB_ID}${SLURM_ARRAY_TASK_ID}"
export slurmArrayID
#Run the file with no graphical display
matlab -nodisplay -nosplash -singleCompThread -r "matlab_example(10, ${slurmArrayID});exit;"

Expected Output

All files will come in batches of 10 because we created an array of size 10. Each index in the array gets its own output, error, and results file.

Click here to download this example.

Running in Parallel on One Node

Now this is our first real taste of parallelization. You will notice the word parpool in this example. This function creates the workers that process the chunks created by parfor. In the example we ask for one node (--nodes=1) and for twenty tasks on that node (--ntasks-per-node=20). The number of workers we want is one less than the number of tasks we request: one task is dedicated to running the MATLAB instance that manages the workers, which is why we only request 19 workers with SLURM_NTASKS-1. The pool of workers is created in a separate file, setPool.m. It could be replaced by a direct call to parpool('local', ${numWorkers}), but a separate file gives us more flexibility if we want it.

setPool.m

%Basic setup for a parpool

pc = parcluster('local')    %use SPORC, the local cluster

parpool(pc, str2num(getenv('numWorkers'))) %create pool from local cluster with ntask - 1 workers
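
As an optional sanity check (not part of the page's downloadable files), you can confirm from within MATLAB how many workers the pool actually started with. This sketch assumes setPool has already run:

% Optional check: report the size of the current pool (assumes setPool has already run)
p = gcp('nocreate');                             % get the current pool without creating a new one
fprintf('Pool has %d workers\n', p.NumWorkers);  % should print ntasks - 1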


matlab_parallel_example.sh

#!/bin/bash -l
#NOTE the -l flag!
#SBATCH -J parforTest #Name of the job
#Standard out and Standard error output files
#SBATCH -o parforTest.out
#SBATCH -e parforTest.err
# To send emails and notify when done
#SBATCH --mail-user abc1234@rit.edu
#SBATCH --mail-type=END
#SBATCH -t 1:00:00 # Request 1 hour MAX for the job
#Run on tier3
#SBATCH -p tier3
#SBATCH --mem=15000M #memory requirement of jobs in MB
#SBATCH --nodes=1    #Number of nodes to run on
#SBATCH --ntasks-per-node=20 #One more than the number of workers to create on each node
#Load my environment
module load matlab
# Run the file with no graphical display
# Create a pool of workers on the 'local' SPORC cluster. 
# It must be one less than ntasks-per-node because one of those tasks runs the 
# MATLAB instance itself and cannot be a worker
numWorkers=$((SLURM_NTASKS-1))
export numWorkers
# Run the MATLAB program
matlab -nodisplay -nosplash -r \
 "setPool;matlab_example(200,${SLURM_JOB_ID});exit;"

Expected Output

Three files will be created by this example: example_JOBID.out, parforTest.err, and parforTest.out. If everything goes right, parforTest.err is empty. The results from all the workers are aggregated into the example_JOBID.out file. The parforTest.out file contains product information about MATLAB, then the statement: Starting parallel pool (parpool) using the 'local' profile ... connected to 19 workers, and then ends with the time it took to complete the job. However, if you want to see parallel computing in action, add fprintf('Loop %d completed\n', i) to the body of the parfor loop, as in the sketch below, and run the job again. You will now see that the iterations of the loop were not completed in order.
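
A minimal sketch of the modified loop described above (the rest of matlab_example.m stays the same):

% TIME CONSUMING LOOP, with a print statement to show out-of-order completion
parfor i=1:nloop
    a(i)=IntensiveFunction();
    fprintf('Loop %d completed\n', i); %iterations finish in no particular order
end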

Download this example.

Parallel Computing with Slurm Array on a Single Node

This next example combines the previous two. Now we will be using multiple workers and repeatedly running a MATLAB script on one node. The .sh file will look very similar to the previous two.

#!/bin/bash -l

#NOTE the -l flag!
#SBATCH -J multiple_array_test #Name of the job
#Standard out and Standard error output files %A is the job id and %a is the array index
#SBATCH -o multiple_array_%A_%a.out
#SBATCH -e multiple_array_%A_%a.err
# To send emails and notify when done
#SBATCH --mail-user abc1234@rit.edu
#SBATCH --mail-type=END
#SBATCH -t 1:00:00 # Request 1 hour MAX for the job
#Run on tier3
#SBATCH -p tier3
#SBATCH --mem=15000M #memory requirement of jobs in MB
#SBATCH --ntasks=20 #One more than the number of workers to create (one task runs MATLAB itself)
#SBATCH --array=1-10 #create array of 10 to run the code 10 times
#Load my environment
module load matlab
#create ID to identify job
slurmArrayID="${SLURM_ARRAY_JOB_ID}${SLURM_ARRAY_TASK_ID}"
export slurmArrayID
# get number for setPool
numWorkers=$((SLURM_NTASKS-1))
export numWorkers
# Run MATLAB parallel program
matlab -nodisplay -nosplash -r "setPool;matlab_example(200,${slurmArrayID});exit;"

Download this example.

Troubleshooting

Sometimes things just don't go right; here are some tips to help. If none of these tips help, email us at rc-help@rit.edu.

  1. Check the .err files for your job; they will help you the most. Often they will explicitly state the error that occurred, in a specific file on a specific line. If you don't understand the error message, try to look it up online. Always double-check your spelling and capitalization.
  2. Run sacct. This command tells you the state of each of your jobs and the exit code. You might get a state of OUT_OF_ME+ (truncated OUT_OF_MEMORY) and an exit code of 125. This means you need to increase the amount of memory you request in your .sh file. Remember that #SBATCH --mem uses megabytes by default. Append M, G, or T to the end of the number to request megabytes, gigabytes, or terabytes.
  3. Your job might not be running because there are no resources available. Run squeue and look under the REASON column. If you see (Resources), that means you need to wait for other jobs to finish before yours can run. You can also see this by running sinfo. If you look at the rows for the tier you are trying to run your job in, you'll see that none have a STATE of idle or mix.