Tutorial: Using MATLAB PCT and Parallel Server in Red Cloud
Revision as of 22:54, 18 May 2017
Introduction
This tutorial assumes that you have completed the steps in MATLAB Distributed Computing Server (MDCS) in Red Cloud, and that you have already launched a Red Cloud instance of at least 2 cores which will serve as your MDCS "cluster". The MDCS cluster should also have passed the built-in validation test in your local MATLAB client, as explained on that page.
Overview of PCT and MDCS
What is MDCS?
Quoting from The MathWorks:
- MATLAB Distributed Computing Server™ lets you run computationally intensive MATLAB® programs and Simulink models on computer clusters, clouds, and grids.
- You develop your program or model on a multicore desktop computer using Parallel Computing Toolbox™ (PCT) and then scale up to many computers by running it on MDCS.
- The server supports batch jobs, parallel computations, and distributed large data. The server includes a built-in cluster job scheduler.
PCT opens up parallel possibilities
MATLAB has multithreading built into core libraries, but it mostly aids big array operations and is not within user control. The Parallel Computing Toolbox (PCT) enables user-directed parallelism that allows you to take better advantage of the available CPUs.
- PCT supports various kinds of parallel computations that you can run interactively...
- - Parallel for-loops: parfor
- - Single program, multiple data: spmd, pmode
- - Array partitioning for big-data parallelism: (co)distributed
- PCT also facilitates batch-style parallelism...
- - Multiple independent runs of a serial function: createJob
- - Single run of parallelized code: createCommunicatingJob
This tutorial focuses mainly on batch-style usage, because such computations do not tie up the console of your MATLAB client. It is therefore a suitable way to offload work to Red Cloud, which is most often treated as a distinct resource for handling long-running jobs. But along the way, you will also get an overview of how parallel programming works in PCT. For more in-depth coverage, you can refer to The MathWorks' own documentation.
Parallel resources: local and remote
You use the Parallel menu to choose the cluster where your PCT parallel workers ("labs") will run. Often this is the local cluster, which is merely the collection of processor cores on the machine where your MATLAB client is running. (This is the default unless you say otherwise.) However, you can also specify a remote cluster that has been made available to you through MDCS.
Two ways to use PCT, interactive vs. batch-style
- Interactive: Start local or remote pool using parpool, run PCT commands directly (can be scripted)
- Batch: Specify local or remote cluster using parcluster, submit jobs and task functions to it
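The two styles can be sketched as follows. This is a minimal example, assuming a 2-worker local pool; for a remote cluster you would pass its profile name to parpool or select it as the default in the Parallel menu.

```matlab
% Interactive: open a pool, then run parallel constructs directly
parpool('local', 2);
parfor i = 1:4
    fprintf('iteration %d\n', i);
end
delete(gcp);              % close the pool when done

% Batch-style: hand a job to a cluster object and fetch results later
clust = parcluster();     % default cluster from the Parallel menu
job = createJob(clust);
createTask(job, @rand, 1, {3,3});
submit(job);
wait(job);
out = fetchOutputs(job);
```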
Interactive PCT: Major Concepts
- parpool: pool of distinct MATLAB processes = "labs"
- - Differs from multithreading! No shared address space
- - Ultimately allows the same basic concepts to work on distributed-memory MDCS clusters, remotely
- parfor: parallel for-loop, iterations must be independent
- - Labs (workers) in the pool split up iterations
- - The client communicates with the labs to assign iterations, so load balancing is built in
- spmd: single program, multiple data
- - All labs in the pool execute every command; labs can communicate
- (co)distributed: array is partitioned among labs
- - Treated as "multiple data" for spmd, but appears as one array to MATLAB functions
- - distributed from client's point of view; codistributed from labs' point of view
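A minimal interactive session touching each of these concepts might look like the following (a sketch, assuming a 2-worker local pool):

```matlab
parpool('local', 2);

spmd
    % every lab runs this block; labindex tells the labs apart
    fprintf('hello from lab %d of %d\n', labindex, numlabs);
end

% a distributed array is partitioned across the labs in the pool,
% yet many MATLAB functions treat it as one ordinary array
D = distributed.rand(1000);
s = sum(D(:));       % computed in parallel on the labs
total = gather(s);   % copy the result back to the client

delete(gcp);
```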
The MathWorks provides a complete list of PCT functions and details on how they are used.
Batch-Style PCT: Basic Workflow
Within a MATLAB session, submitting jobs to Red Cloud with MATLAB follows a general pattern:
- The user selects cacscheduler as the MATLAB parallel scheduler. This will cause parallel/distributed jobs to be submitted to Red Cloud with MATLAB for execution, rather than the local machine.
- Individual jobs are created, configured, and submitted by the user.
- Jobs may be composed of one or more individual tasks.
- If jobs consist of one's own MATLAB code, all files and associated job dependencies should be uploaded in advance to Red Cloud via gridFTP().
- Job status and results are stored in Red Cloud until destroyed by the user with the destroy(job) function. A user can log in and view his or her past and present jobs from any client at any time.
- The exact ending time of a job depends on the nature of the tasks that have been defined within it and the available resources.
Jobs and tasks
(Figure: tree of objects, showing that a scheduler contains jobs, and each job contains one or more tasks)
Types of jobs
PCT has 3 types of jobs: independent, SPMD, and pool
- Independent: createJob()
- Can contain many tasks; workers run the tasks one by one
- SPMD: createCommunicatingJob(...,'Type','SPMD',...)
- Has ONE task to be run by ALL workers, like a spmd block
- Pool: createCommunicatingJob(...,'Type','Pool',...)
- Has ONE task which is run by ONE worker
- Other workers run spmd blocks or parfor loops in the task
- Mimics the interactive mode of using PCT
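Creating each of the three job types looks like this in outline (a sketch; the task functions shown are placeholders):

```matlab
clust = parcluster();

% Independent job: many tasks, run one at a time by available workers
jIndep = createJob(clust);
for i = 1:4
    createTask(jIndep, @rand, 1, {i});
end

% SPMD job: ONE task function, executed by ALL workers at once
jSpmd = createCommunicatingJob(clust, 'Type', 'SPMD');
createTask(jSpmd, @() labindex, 1, {});

% Pool job: ONE task run by ONE worker; the others act as its pool
jPool = createCommunicatingJob(clust, 'Type', 'Pool');
createTask(jPool, @() 0, 1, {});
```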
The MATLAB documentation describes a wide array of options for running your code with the Parallel Computing Toolbox (PCT). The execution strategy depends on how jobs are configured. There are three recommended functions for creating jobs:
- createJob (Distributed Job). Submits tasks independently to available processing nodes. Typically used for parametric sweeps (running the same code with different inputs).
- createMatlabPoolJob (Pool Job). For code that requires one of the workers to distribute work to the other workers. Such code would be run locally with the batch command and might include, e.g., parfor loops.
- createParallelJob (Parallel Job). For code that would be run locally in a spmd block or in pmode. The code typically makes use of explicit message passing functions such as labBarrier, labindex, etc. Such code could also define one or more codistributed arrays as a means of handling data too large to fit into the memory of any one machine.
Most of our users have found that Distributed Jobs work best for running parametric sweeps. Users who had been using a parfor loop have often opted to convert their code from Pool Jobs to Distributed Jobs. One advantage of a Distributed Job is that it tends to be scheduled ahead of Pool Jobs on the CAC cluster. This tutorial covers Distributed and Pool job submission.
Simple distributed job
For this example we’ll consider a trivial function that takes any input and waits 5 seconds before returning the input value. We will run this function locally, then compare with distributed execution on Red Cloud.
First, create a file simple_distributed_job.m containing the following:
function output = simple_distributed_job(input)
pause(5);
output = input;
end
Running simple_distributed_job in your local MATLAB window should yield (after a pause):
>> simple_distributed_job(1)

ans =

     1

>>
To run it on MDCS, we will use the createJob function. Here is an example of running the same code on a 2-core instance in Red Cloud. That instance is assumed to be already validated, and it is assumed to be the default cluster in the Parallel menu.
>> clust = parcluster();
>> job = createJob(clust);
>> task = createTask(job, @simple_distributed_job, 1, {1});
>> submit(job);
>> wait(job);
Downloading completed job: Job60.
>> fetchOutputs(job)

ans =

    [1]

>>
This by itself is not that interesting, but let’s pretend we want to run simple_distributed_job 10 times in a row. Running locally in serial, this should take 10 x 5 = 50 seconds.
>> tic; for i=1:10; simple_distributed_job(1); end; toc
Elapsed time is 50.012614 seconds.
>>
Using a distributed job, we can make it faster.
>> job = createJob(clust);
>> for i=1:10
createTask(job, @simple_distributed_job, 1, {i});
end
>> tic; submit(job); wait(job); toc
Downloading completed job: Job61.
Elapsed time is 69.754727 seconds.
Here the distributed job actually takes longer than running the loop in serial. One might expect a speed-up, but there is an overhead associated with submitting the job and retrieving the results from the remote server, and for tasks this short the overhead outweighs the parallel gain. Calling getAllOutputArguments on the job object yields the results of the 10 tasks.
>> getAllOutputArguments(job)

ans =

    [ 1]
    [ 2]
    [ 3]
    [ 4]
    [ 5]
    [ 6]
    [ 7]
    [ 8]
    [ 9]
    [10]

>>
If you need help with any of the commands please be sure to make use of MATLAB’s built-in help function.
>> help gridFTP
  gridFTP is a CAC object that provides simple stateless access to the
  Red Cloud storage facility, enabling you to examine files in your home
  directory as well as upload and download files to the system. This
  tool is primarily useful for uploading or downloading single files or
  for ensuring the location of files on the storage system.
Simple pool job
For this example we’ll consider another trivial function that waits for 100 seconds before returning the input value.
function output = simple_pool_job(input)
parfor i=1:100
    pause(1);
end
output = input
end
To establish a baseline, first run the function locally without a pool; we would expect it to take about 100 seconds.
>> tic; simple_pool_job(1); toc

output =

     1

Elapsed time is 100.098521 seconds.
Running with two MATLAB labs would result in a speed-up.
>> matlabpool local 2
Starting matlabpool using the 'local' configuration ... connected to 2 labs.
>> tic; simple_pool_job(1); toc

output =

     1

Elapsed time is 50.275047 seconds.
>> matlabpool close
Sending a stop signal to all the labs ... stopped.
>>
Don’t forget to run “matlabpool close” after running this example.
Similar to the distributed job, the first step of running on the CAC cluster is to upload a copy of the “simple_pool_job.m” file to the CAC server (see the distributed job example).
To run on the CAC cluster we need to use the createMatlabPoolJob function from the MATLAB PCT. Here is an example of running the same code on the CAC cluster:
>> ClusterInfo.setQueueName('Default');
>> job = createMatlabPoolJob();
>> task = createTask(job, @simple_pool_job, 1, {1});
>> job.MinimumNumberOfWorkers = 8;
>> job.MaximumNumberOfWorkers = 8;
>> tic; submit(job); wait(job); toc
Downloading completed job: Job64.
Elapsed time is 89.065123 seconds.
But wait, the CAC cluster is slower than running it locally with two labs! Again, there is overhead involved in submitting the job and retrieving the results. In this example there were 7 labs running in the cluster, one less than the number of requested workers, because one worker oversees the work done on the other 7.
Note: if we were to submit this same job to the local configuration, we would get 8 labs for the pool job. The reason for this difference is that the MATLAB GUI is already running on the local machine, so there is no need to assign one worker as an overseer. (Try it—but don’t forget to switch back to cacscheduler when you’re done!)
Simple debugging
Here are a few examples of common problems and how to detect them. One common example is forgetting to upload your code, which will result in something like:
>> job = createJob();
>> createTask(job, @does_not_exist_function, 1, {});
>> submit(job);
>> wait(job);
Downloading completed job: Job65.
>> job

job =

Job ID 65 Information
=====================
           UserName : apb18
              State : finished
         SubmitTime : Tue Oct 25 13:16:00 GMT-05:00 2011
          StartTime : Tue Oct 25 14:16:22 EDT 2011
   Running Duration : 0 days 0h 0m 1s

- Data Dependencies

   FileDependencies : {}
   PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18

- Associated Task(s)

     Number Pending : 0
     Number Running : 0
    Number Finished : 1
   TaskID of errors : 1

>>
Note the last line "TaskID of errors". This indicates that one of the tasks had an error. Inspecting the task object associated with the job, we can see details of the error.
>> job.tasks(1)

ans =

Task ID 1 from Job ID 65 Information
====================================
              State : finished
           Function : @does_not_exist_function
          StartTime : Tue Oct 25 14:16:22 EDT 2011
   Running Duration : 0 days 0h 0m 1s

- Task Result Properties

    ErrorIdentifier : MATLAB:UndefinedFunction
       ErrorMessage : Undefined function or variable
                      'does_not_exist_function'.

>>
Long running jobs
Does your MATLAB computation require hours or even days to run? You'll be glad to know you can exit MATLAB completely and re-open it later to check on the status of a job that takes a long time to complete. For the sake of example, let’s create a job that waits for 5 minutes then returns. For this example we will just call the built-in pause function and will not upload any code.
>> job = createJob();
>> createTask(job, @pause, 0, {300});
>> submit(job); pause(10); job

job =

Job ID 66 Information
=====================
           UserName : apb18
              State : running
         SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011
          StartTime :
   Running Duration :

- Data Dependencies

   FileDependencies : {}
   PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18

- Associated Task(s)

     Number Pending : 1
     Number Running : 0
    Number Finished : 0
   TaskID of errors :

>>
Note that the job is in the state “running”. It is now possible to exit MATLAB and start a new session. To retrieve the job from the previous session, use the findResource and findJob functions; remember to record the “Job ID” before exiting.
>> sched = findResource();
>> job = findJob(sched, 'Name', 'Job66')

job =

Job ID 66 Information
=====================
           UserName : apb18
              State : running
         SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011
          StartTime :
   Running Duration :

- Data Dependencies

   FileDependencies : {}
   PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18

- Associated Task(s)

     Number Pending : 1
     Number Running : 0
    Number Finished : 0
   TaskID of errors :

>>
If you decide later that you do not want the job to complete you can cancel the job using the cancel function.
>> cancel(job);
>> job

job =

Job ID 66 Information
=====================
           UserName : apb18
              State : finished
         SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011
          StartTime :
   Running Duration :

- Data Dependencies

   FileDependencies : {}
   PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18

- Associated Task(s)

     Number Pending : 0
     Number Running : 0
    Number Finished : 1
   TaskID of errors : 1

>>
If you forgot the “Job ID” you can inspect the “jobs” field of the “sched” object that was returned by findResource (the following output is for R2011b and newer).
>> sched.jobs

ans =

Jobs: 46-by-1
=============
   #  Job ID  State     FinishTime        UserName  #tasks
  ----------------------------------------------------------
   1       1  queued    -                 apb18          2
   2       2  queued    -                 apb18          2
   3       5  finished  Oct 21 13:47:11   apb18          2
   4       8  finished  Oct 21 12:50...   apb18          2
   5       9  finished  Oct 21 12:50...   apb18          2
   6      10  finished  Oct 21 13:50:54   apb18          1
   7      13  finished  Oct 21 13:02...   apb18          2
   8      14  finished  Oct 21 13:03...   apb18          2
   9      15  finished  Oct 21 14:04:00   apb18          1
  10      18  finished  Oct 21 13:06...   apb18          2
  11      19  finished  Oct 21 13:07...   apb18          2
  12      20  finished  Oct 21 14:08:13   apb18          1
  13      23  finished  Oct 21 13:10...   apb18          2
  14      24  finished  Oct 21 13:11...   apb18          2
  15      25  finished  Oct 21 14:11:49   apb18          1
  16      28  finished  Oct 21 13:14...   apb18          2
  17      29  finished  Oct 21 13:14...   apb18          2
  18      30  finished  Oct 21 14:15:22   apb18          1
  19      33  finished  Oct 21 13:17...   apb18          2
  20      34  finished  Oct 21 13:18...   apb18          2
  21      35  finished  Oct 21 14:18:42   apb18          1
  22      38  finished  Oct 24 10:44...   apb18          2
  23      39  finished  Oct 24 10:45...   apb18          2
  24      40  finished  Oct 24 11:45:27   apb18          1
  25      43  finished  Oct 24 10:51...   apb18          2
  26      44  finished  Oct 24 10:51...   apb18          2
  27      45  finished  Oct 24 11:51:49   apb18          1
  28      46  finished  Oct 24 13:16:41   apb18          1
  29      47  finished  Oct 24 13:21:02   apb18          1
  30      48  finished  Oct 24 13:27:17   apb18         20
  31      49  finished  Oct 24 13:31:28   apb18         10
  32      50  failed    -                 apb18          8
  33      51  failed    -                 apb18          8
  34      54  queued    -                 apb18        101
  35      55  finished  Oct 24 15:11...   apb18          8
  36      56  finished  Oct 24 16:20:21   apb18          8
  37      57  finished  Oct 24 16:04...   apb18          8
  38      58  finished  Oct 24 17:05:55   apb18          8
  39      59  pending   -                 apb18          1
  40      60  finished  Oct 25 13:27:41   apb18          1
  41      61  finished  Oct 25 13:35:42   apb18         10
  42      62  finished  Oct 25 13:02...   apb18          8
  43      63  finished  Oct 25 13:05...   apb18          8
  44      64  finished  Oct 25 13:10...   apb18          8
  45      65  finished  Oct 25 14:16:23   apb18          1
  46      66  finished  Oct 25 13:29...   apb18          1

>>
More on SPMD jobs and spmd blocks
- The SPMD task function, like a spmd block, is responsible for implementing parallelism using "labindex" logic
- The lab* functions allow workers (labs) to communicate; they act just like MPI message-passing methods
- - labSend(data,dest,[tag]); % point-to-point
- - labReceive(source,tag); % datatype, size are implicit
- - labReceive(); % take any source
- - labBroadcast(source); labBarrier; gop(f,x); % collectives
- (Co)distributed arrays are sliced across workers so huge matrices can be operated on; collect slices with gather
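As an illustration of labindex logic, here is a hypothetical SPMD task function (the name send_around is made up for this sketch) in which lab 1 sends a value to every other lab:

```matlab
function out = send_around(x)
% SPMD task function: lab 1 distributes x to all the other labs
tag = 1;
if labindex == 1
    for dest = 2:numlabs
        labSend(x, dest, tag);    % point-to-point send
    end
    out = x;
else
    out = labReceive(1, tag);     % datatype and size are implicit
end
labBarrier;                       % wait until all labs reach this point
end
```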
Distributing work with parfeval and batch
- createJob() isn't the only way to run independent tasks...
- parfeval() requests that the given function be executed on one worker in a parpool, asynchronously
- batch() does the same on one worker NOT in a parpool
- - It creates a one-task job and submits it to a parcluster
- - It can also be a one-line method for initiating a pool job
- - It works with either a function or a script
- Either can easily be called in a loop over a list of tasks
- - Use fetchNext() to collect results as they become available
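Putting those pieces together, a task-farming loop with parfeval and fetchNext might be sketched like this (assuming a 2-worker local pool):

```matlab
pool = parpool('local', 2);

% request f(i) asynchronously; future objects come back immediately
for i = 1:8
    f(i) = parfeval(pool, @(x) x^2, 1, i);
end

% collect results in completion order, not submission order
results = zeros(1, 8);
for k = 1:8
    [idx, value] = fetchNext(f);
    results(idx) = value;
end

% batch() is the one-line, no-pool alternative for a single task
j = batch(@(x) x^2, 1, {5});
wait(j);
delete(gcp);
```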
Distributing work without PCT or MDCS
- Create a MATLAB .m file that takes one or more input parameters (such as the name of an input file).
- Apply the MATLAB C/C++ compiler (mcc), which converts the script to C, then to a standalone executable.
- Run N copies of the executable on an N-core batch node or a cluster, each with a different input parameter
- - mpirun can launch non-MPI processes, too
- MATLAB runtimes (free!) must be available on all nodes
- For process control, write a master script in Python, say
When Is File Transfer Needed?
- If your workers do not share a disk with your client, and they will require custom functions or datafiles
- Example:
>> j = createJob(sched);
>> createTask(j, @rand, 1, {3,3});
>> createTask(j, @myfunction, 1, {3,3});
>> submit(j);
- - The rand function is no problem at all, it’s built in
- - But myfunction.m does not exist on the remote computer
- - We’ll want to transfer this file and get it added to the path
MATLAB can copy files... or you can
- Setting the AutoAttachFiles property tells MATLAB to copy files containing your function definitions
- Use AttachedFiles to copy any data files or directories the task will need; directory structures are preserved
- - Not very efficient, though: file transfer occurs separately for each worker running a task for that particular job
- - OK for small projects with a couple of files
- A better-scaling alternative is to copy your files to disk(s) on the remote server(s) in advance
- - Use AdditionalPaths to make the files available at run time
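In code, the three properties might be used like this (mydata.mat and /shared/myproject are hypothetical names standing in for your own files):

```matlab
clust = parcluster();
job = createJob(clust);

% small projects: let MATLAB ship the files with each job
job.AutoAttachFiles = true;            % attach files defining your functions
job.AttachedFiles = {'mydata.mat'};    % plus any needed data files

% larger projects: stage files on the remote disk in advance,
% then tell the workers where to find them at run time
job.AdditionalPaths = {'/shared/myproject'};

createTask(job, @myfunction, 1, {3,3});
submit(job);
```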
GPGPU in MATLAB PCT: Fast and Easy
- Many functions are overloaded to call CUDA code automatically if objects are declared with gpuArray type
- Benchmarking with large 1D and 2D FFTs shows excellent acceleration on NVIDIA GPUs
- MATLAB code changes are trivial
- - Move data to GPU by declaring a gpuArray
- - Call method in the usual way
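For example (a sketch, assuming a supported NVIDIA GPU is present):

```matlab
% move data to the GPU by declaring a gpuArray
A = gpuArray(rand(4096));
% overloaded functions such as fft2 then run on the GPU automatically
F = fft2(A);
m = max(abs(F(:)));
% gather copies the result back to host memory
result = gather(m);
```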
Are GPUs really that simple?
- No. Your application must meet four important criteria.
- - Nearly all required operations must be implemented natively for type gpuArray.
- - The computation must be arranged so the data seldom have to leave the GPU.
- - The overall working dataset must be large enough to exploit 100s of thread processors
- - On the other hand, the overall working dataset must be small enough that it does not exceed GPU memory.
PCT and MDCS: The Bottom Line
- PCT can greatly speed up large-scale computations and the analysis of large datasets
- - GPU functionality is a nice addition to the arsenal
- MDCS allows parallel workers to run on cluster and cloud resources beyond one’s laptop, e.g., Red Cloud
- Yes, a learning curve must be climbed…
- - General knowledge of how to restructure code so that parallelism is exposed
- - Specific knowledge of PCT functions
- But speed often matters!