Tutorial: Using MATLAB PCT and Parallel Server in Red Cloud
Revision as of 21:20, 7 June 2017
Introduction
In order to run the examples in this tutorial, you must have completed the steps in MATLAB Distributed Computing Server (MDCS) in Red Cloud. Use those instructions to launch an appropriate Red Cloud instance of at least 2 cores, which will serve as your MDCS "cluster". The MDCS cluster should be able to pass the built-in validation test in your local MATLAB client, as explained on that page.
Overview of PCT and MDCS
What is MDCS?
Quoting from The MathWorks:
- MATLAB Distributed Computing Server™ lets you run computationally intensive MATLAB® programs and Simulink models on computer clusters, clouds, and grids.
- You develop your program or model on a multicore desktop computer using Parallel Computing Toolbox™ (PCT) and then scale up to many computers by running it on MDCS.
- The server supports batch jobs, parallel computations, and distributed large data. The server includes a built-in cluster job scheduler.
PCT opens up parallel possibilities
MATLAB has multithreading built into its core libraries, but multithreading mostly aids big array operations and is not within user control. The Parallel Computing Toolbox (PCT) enables user-directed parallelism that allows you to take better advantage of the available CPUs.
- PCT supports various kinds of parallel computations that you can run interactively...
- - Parallel for-loops: parfor
- - Single program, multiple data: spmd, pmode
- - Array partitioning for big-data parallelism: (co)distributed
- PCT also facilitates batch-style parallelism...
- - Multiple independent runs of a serial function: createJob
- - Single run of parallelized code: createCommunicatingJob
This tutorial focuses mainly on batch-style usage, because such computations do not tie up the console of your MATLAB client. It is therefore a suitable way to offload work to Red Cloud, which is most often used as a separate resource for handling long-running jobs. Along the way, you will also get an overview of how parallel programming works in PCT generally. For more in-depth coverage, refer to The MathWorks' own documentation.
Parallel resources: local and remote
Use the Parallel menu to choose the cluster where your PCT parallel workers ("labs") will run. Often this will be the local cluster, which is merely the collection of processor cores on the machine where your MATLAB client is running. (Local is the default unless you say otherwise.) However, you can also specify a remote cluster that has been made available to you through MDCS.
Two ways to use PCT, interactive vs. batch-style
- Interactive: Start local or remote pool using parpool, run PCT commands directly (can be scripted)
- Batch: Specify local or remote cluster using parcluster, submit jobs and task functions to the cluster
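A minimal sketch of the two styles (the profile name 'Red Cloud' and the function doSomething are placeholders, not part of this tutorial's examples):

```matlab
% Interactive: open a pool on the chosen cluster, then use parfor directly
parpool('Red Cloud', 2);       % or parpool('local', 2)
parfor i = 1:10
    doSomething(i);            % hypothetical user function
end
delete(gcp('nocreate'));       % shut the pool down when finished

% Batch-style: hand tasks to the cluster's scheduler instead
clust = parcluster('Red Cloud');
job = createJob(clust);
createTask(job, @doSomething, 1, {1});
submit(job);                   % returns immediately; call wait(job) later
```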
Interactive PCT: Major Concepts
The MATLAB documentation presents a wide array of options for parallelizing your code with PCT. The parallel execution strategy depends on the specific commands that are chosen. We'll start with commands that would be more typical of interactive work.
- parpool: pool of distinct MATLAB processes = "labs"
- - Labs are aware of each other; they can work together
- - Differs from multithreading! No shared address space
- - This is ultimately what allows the same basic concepts to work on remote, distributed-memory MDCS clusters
- parfor: parallel for-loop, iterations must be independent
- - Labs (workers) in the pool split up iterations
- - Communication among labs is needed because load balancing is built in
- spmd: single program, multiple data
- - All labs in the pool execute every command
- - Programmer specifies how and when the labs communicate
- (co)distributed: array is partitioned among labs
- - distributed from client's point of view; codistributed from labs' point of view
- - Treated as "multiple data" for spmd, but appears as one array to MATLAB functions
The MathWorks provides a complete list of PCT functions and details on how they are used.
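To make these concepts concrete, here is a minimal interactive sketch combining spmd and a distributed array, assuming a pool of two labs on the default cluster:

```matlab
parpool(2);                    % start two labs on the default cluster

spmd
    % every lab executes this block; labindex identifies each one
    fprintf('Hello from lab %d of %d\n', labindex, numlabs);
end

D = distributed.rand(1000);    % 1000x1000 array partitioned across the labs
s = sum(sum(D));               % overloaded functions operate on the slices
x = gather(s);                 % bring the scalar result back to the client

delete(gcp('nocreate'));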
Batch-Style PCT: Basic Workflow
Within a MATLAB session, submitting jobs to MDCS in Red Cloud follows a general pattern:
- The user designates a Red Cloud instance to be the parallel cluster. This allows jobs to be submitted to the MATLAB Job Scheduler (MJS) on that instance for execution, rather than the local machine.
- Individual jobs are created, configured, and submitted by the user.
- Jobs may be composed of one or more individual tasks.
- If tasks within the job consist of one's own MATLAB code, the function files will be auto-uploaded to the cluster by default. Data files may be included in this upload by specifying them in the AttachedFiles cell array.
- Alternatively, files can be uploaded in advance to Red Cloud using typical clients such as sftp or scp. MDCS finds them through the AdditionalPaths cell array.
- Job state and results are saved in Red Cloud and synced back to the client.
- Job state and results are retained until deleted by the user using the delete(job) function, or until the instance is terminated.
- Job state and results from past and present jobs can be viewed from any client at any time.
- The exact ending time of a job depends on the nature of the tasks that have been defined within it and the available resources.
- PCT has callbacks that allow actions to be taken upon job completion, depending on completion status.
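As an illustration of the last point, a completion callback can be attached through the job's FinishedFcn property (available when the cluster runs an MJS, as Red Cloud does). The two-argument callback signature below is a sketch based on the documented callback convention, not code from this tutorial:

```matlab
clust = parcluster();              % default profile points at Red Cloud
job = createJob(clust);
createTask(job, @rand, 1, {3,3});

% Runs on the client when the job reaches its 'finished' state
job.FinishedFcn = @(j, eventData) ...
    fprintf('Job %d ended in state %s\n', j.ID, j.State);

submit(job);                       % control returns immediately
```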
Jobs and tasks
[Figure: tree of objects; a cluster contains jobs, and each job contains one or more tasks]
Types of jobs
Batch-style PCT has 3 types of jobs: independent, SPMD, and pool.
- Independent: createJob()
- Can contain many tasks; workers run the tasks one by one
- SPMD: createCommunicatingJob(...,'Type','SPMD',...)
- Has ONE task to be run by ALL workers, like a spmd block
- Typically makes use of explicit message passing syntax such as labBarrier, labindex, etc.
- Could also define one or more codistributed arrays for data too large to fit into the memory of any one machine
- Pool: createCommunicatingJob(...,'Type','Pool',...)
- Has ONE task which is run by ONE worker
- Other workers run spmd blocks or parfor loops in the task
- Mimics the interactive mode of using PCT
Experience has shown that independent jobs work best for running parametric sweeps. A pool job seems most natural for this if you are accustomed to using a parfor loop, but one worker is essentially wasted: it does nothing more than manage the loop iterations of the other workers. Therefore, consider converting your code from pool to independent; it is a relatively easy process. This tutorial covers both independent and pool job submission.
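The conversion amounts to turning each parfor iteration into its own task. A sketch, assuming a per-iteration function sweep_step.m of your own (the name is a placeholder):

```matlab
% Pool-style original (one worker is tied up managing the loop):
%   parfor i = 1:100
%       results{i} = sweep_step(i);
%   end

% Independent-job equivalent: every worker executes tasks directly
clust = parcluster();
job = createJob(clust);
for i = 1:100
    createTask(job, @sweep_step, 1, {i});   % one task per parameter value
end
submit(job);
wait(job);
results = fetchOutputs(job);                % 100x1 cell array of outputs
```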
Simple independent job
For this example we’ll consider a trivial function that takes any input and waits 5 seconds before returning the input value. We will run this function locally, then compare with distributed execution on Red Cloud.
First, create a file simple_independent_job.m containing the following:
function output = simple_independent_job(input) pause(5); output = input; end
Running simple_independent_job in your local MATLAB window should yield (after a pause):
>> simple_independent_job(1) ans = 1
To run it on MDCS, we will use the createJob function. Here is an example of running the same code on a 2-core instance in Red Cloud. We assume that the instance is already validated, and that it is the default cluster in the Parallel menu.
>> clust = parcluster(); >> job = createJob(clust); >> task = createTask(job, @simple_independent_job, 1, {1}); >> submit(job); >> wait(job); >> fetchOutputs(job) ans = cell [1]
This by itself is not that interesting, but let's pretend we want to run simple_independent_job 10x in a row. Running locally in serial, this should take 10x5 = 50 seconds.
>> tic; for i=1:10; simple_independent_job(1); end; toc Elapsed time is 50.012614 seconds.
But if we run it as a PCT independent job, we can make it finish faster. Note that we don't have to upload a copy of the "simple_independent_job.m" file to the Red Cloud instance in advance because the AutoAttachFiles property is true (logical value 1) by default.
>> job = createJob(clust); >> for i=1:10 createTask(job,@simple_independent_job, 1, {i}); end >> tic; submit(job); wait(job); toc Elapsed time is 34.117942 seconds.
The independent job takes about 32% less time than the serial run. Since there are 2 workers, one might expect a speedup of 2 rather than roughly 1.5, but there is overhead associated with submitting the job and retrieving the results from the remote server. Calling fetchOutputs on the job object yields the results of the 10 tasks.
>> fetchOutputs(job) ans = 10×1 cell array [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7] [ 8] [ 9] [10]
If you need help with any of the commands please be sure to make use of MATLAB's built-in help function.
Simple pool job
For this example we'll consider another trivial function that pauses 1 second 50 times in a row before returning the input value.
function output = simple_pool_job(input) for i=1:50 %parfor i=1:50 pause(1); end output=input; end
Running the code locally, we would expect the function to take 50 seconds. What happens if we change the for-loop into a parfor-loop, i.e., if we move the comment sign up one line? Let's assume that our Red Cloud instance with two MATLAB labs is still active and that the default profile still points to it.
>> tic; simple_pool_job(1); toc Starting parallel pool (parpool) using the 'Red Cloud' profile ... connected to 2 workers. Analyzing and transferring files to the workers ...done. Elapsed time is 36.040343 seconds. >> delete(gcp('nocreate')) Parallel pool using the 'Red Cloud' profile is shutting down.
Simply issuing the parfor command triggers MATLAB to start a pool of workers on the default cluster and to use the pool to execute the loop. Just as before, parallel overhead prevents the code from running twice as fast on two labs. The final command closes the pool after running the example.
Another way to run this function on the Red Cloud cluster is to use the createCommunicatingJob command from PCT. Here is how to do it that way:
>> clust = parcluster(); >> job = createCommunicatingJob(clust,'Type','Pool'); >> task = createTask(job, @simple_pool_job, 1, {1}); >> tic; submit(job); wait(job); toc Elapsed time is 63.563045 seconds.
But wait, this pool job with two labs is far slower than the interactive pool with two labs! Again, there is overhead involved in submitting the job and retrieving the results. Moreover, only 1 worker actually ran the loop iterations: in a pool job, one of the available workers is set aside to oversee the work done by the others.
Note: if we were to submit this same job to the local configuration, we would get 2 labs for the pool job. The reason for this difference is that the MATLAB GUI is already running on the local machine, so there is no need to assign one worker as an overseer. (Try it—but don’t forget to switch back to Red Cloud when you’re done!)
Simple debugging
Here's an example of a common problem and how to diagnose it. It is very easy to forget to upload one or more files that may be needed by your task. The AutoAttachFiles property can help with this, but it only looks for program files (.m, .p, .mex). If you neglect to list your data file in the AttachedFiles cell array, the result will be something like this:
>> clust = parcluster(); >> job = createJob(clust); >> createTask(job, @type, 1, {'nonexistent_file.txt'}); >> submit(job); >> wait(job); >> job job = Job Properties: ID: 25 Type: independent Username: slantz State: finished SubmitDateTime: 07-Jun-2017 21:10:59 StartDateTime: 07-Jun-2017 21:10:59 Running Duration: 0 days 0h 0m 2s NumThreads: 1 AutoAttachFiles: true Auto Attached Files: List files AttachedFiles: {} AdditionalPaths: {} Associated Tasks: Number Pending: 0 Number Running: 0 Number Finished: 1 Task ID of Errors: [1] Task ID of Warnings: []
Note the next-to-last line "Task ID of Errors". This indicates that one of the tasks had an error. Inspecting the task object associated with the job, we can see details of the error.
>> job.Tasks(1) ans = Task with properties: ID: 1 State: finished Function: @type Parent: Job 25 StartDateTime: 07-Jun-2017 21:10:59 Running Duration: 0 days 0h 0m 2s Error: File 'nonexistent_file.txt' not found. Warnings:
Long running jobs
Does your MATLAB computation require hours or even days to run? You'll be glad to know you can exit MATLAB completely and re-open it later to check on the status of a long-running job. For the sake of example, let's create a job that waits for 5 minutes and then returns. We will just call the built-in pause function, so no code needs to be uploaded. (The listings in this section were captured with an older MATLAB release that uses the legacy findResource interface; in current releases, parcluster and findJob serve the same purpose.)
>> sched = findResource(); >> job = createJob(sched); >> createTask(job, @pause, 0, {300}); >> submit(job); pause(10); job job = Job ID 66 Information ===================== UserName : apb18 State : running SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011 StartTime : Running Duration : - Data Dependencies FileDependencies : {} PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18 - Associated Task(s) Number Pending : 1 Number Running : 0 Number Finished : 0 TaskID of errors : >>
Note that the job is in the state “running”. It is now possible to exit MATLAB and start a new session. To retrieve the previously running job, use the findResource and findJob functions. Be sure to record the “Job ID” from the previous session.
>> sched = findResource(); >> job = findJob(sched, 'Name', 'Job66') job = Job ID 66 Information ===================== UserName : apb18 State : running SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011 StartTime : Running Duration : - Data Dependencies FileDependencies : {} PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18 - Associated Task(s) Number Pending : 1 Number Running : 0 Number Finished : 0 TaskID of errors : >>
If you decide later that you do not want the job to complete you can cancel the job using the cancel function.
>> cancel(job); >> job job = Job ID 66 Information ===================== UserName : apb18 State : finished SubmitTime : Tue Oct 25 13:24:46 GMT-05:00 2011 StartTime : Running Duration : - Data Dependencies FileDependencies : {} PathDependencies : \\matlabstorage01.cac.cornell.edu\matlab\apb18 - Associated Task(s) Number Pending : 0 Number Running : 0 Number Finished : 1 TaskID of errors : 1 >>
If you forgot the “Job ID” you can inspect the “jobs” field of the “sched” object that was returned by findResource (the following output is for R2011b and newer).
>> sched.jobs ans = Jobs: 46-by-1 ============= # Job ID State FinishTime UserName #tasks ---------------------------------------------------------- 1 1 queued - apb18 2 2 2 queued - apb18 2 3 5 finished Oct 21 13:47:11 apb18 2 4 8 finished Oct 21 12:50... apb18 2 5 9 finished Oct 21 12:50... apb18 2 6 10 finished Oct 21 13:50:54 apb18 1 7 13 finished Oct 21 13:02... apb18 2 8 14 finished Oct 21 13:03... apb18 2 9 15 finished Oct 21 14:04:00 apb18 1 10 18 finished Oct 21 13:06... apb18 2 11 19 finished Oct 21 13:07... apb18 2 12 20 finished Oct 21 14:08:13 apb18 1 13 23 finished Oct 21 13:10... apb18 2 14 24 finished Oct 21 13:11... apb18 2 15 25 finished Oct 21 14:11:49 apb18 1 16 28 finished Oct 21 13:14... apb18 2 17 29 finished Oct 21 13:14... apb18 2 18 30 finished Oct 21 14:15:22 apb18 1 19 33 finished Oct 21 13:17... apb18 2 20 34 finished Oct 21 13:18... apb18 2 21 35 finished Oct 21 14:18:42 apb18 1 22 38 finished Oct 24 10:44... apb18 2 23 39 finished Oct 24 10:45... apb18 2 24 40 finished Oct 24 11:45:27 apb18 1 25 43 finished Oct 24 10:51... apb18 2 26 44 finished Oct 24 10:51... apb18 2 27 45 finished Oct 24 11:51:49 apb18 1 28 46 finished Oct 24 13:16:41 apb18 1 29 47 finished Oct 24 13:21:02 apb18 1 30 48 finished Oct 24 13:27:17 apb18 20 31 49 finished Oct 24 13:31:28 apb18 10 32 50 failed - apb18 8 33 51 failed - apb18 8 34 54 queued - apb18 101 35 55 finished Oct 24 15:11... apb18 8 36 56 finished Oct 24 16:20:21 apb18 8 37 57 finished Oct 24 16:04... apb18 8 38 58 finished Oct 24 17:05:55 apb18 8 39 59 pending - apb18 1 40 60 finished Oct 25 13:27:41 apb18 1 41 61 finished Oct 25 13:35:42 apb18 10 42 62 finished Oct 25 13:02... apb18 8 43 63 finished Oct 25 13:05... apb18 8 44 64 finished Oct 25 13:10... apb18 8 45 65 finished Oct 25 14:16:23 apb18 1 46 66 finished Oct 25 13:29... apb18 1 >>
More on SPMD jobs and spmd blocks
- The SPMD task function, like a spmd block, is responsible for implementing parallelism using "labindex" logic
- The lab* functions allow workers (labs) to communicate; they act just like MPI message-passing methods
- - labSend(data,dest,[tag]); % point-to-point
- - labReceive(source,tag); % datatype, size are implicit
- - labReceive(); % take any source
- - labBroadcast(source); labBarrier; gop(f,x); % collectives
- (Co)distributed arrays are sliced across workers so huge matrices can be operated on. Collect slices with gather.
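As a sketch, an SPMD task function using these primitives might look like the following (the function name and data values are hypothetical):

```matlab
function total = spmd_task_example()
% Runs on every lab of a communicating job of Type 'SPMD'.
if labindex == 1
    data = 42;
    labSend(data, 2);          % point-to-point send to lab 2
elseif labindex == 2
    data = labReceive(1);      % blocking receive from lab 1
else
    data = 0;
end
labBarrier;                    % wait until all labs reach this point
total = gop(@plus, data);      % collective: sum 'data' over all labs
end
```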
Distributing work with parfeval and batch
- createJob() isn't the only way to run independent tasks...
- parfeval() requests that the given function be executed on one worker in a parpool, asynchronously
- batch() does the same on one worker NOT in a parpool
- - It creates a one-task job and submits it to a parcluster
- - It can also be a one-line method for initiating a pool job
- - It works with either a function or a script
- Either can easily be called in a loop over a list of tasks
- - Use fetchNext() to collect results as they become available
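For example, a minimal parfeval loop with fetchNext might look like this (using the built-in factorial as a stand-in for real work):

```matlab
pool = parpool(2);                        % pool on the default cluster

% Queue ten asynchronous evaluations; each returns one output
for i = 10:-1:1
    futures(i) = parfeval(pool, @factorial, 1, i);
end

% Collect results in completion order, not submission order
for k = 1:10
    [idx, value] = fetchNext(futures);
    fprintf('factorial(%d) = %d\n', idx, value);
end

delete(gcp('nocreate'));
```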
Distributing work without PCT or MDCS
- Create a MATLAB .m file that takes one or more input parameters (such as the name of an input file).
- Apply the MATLAB compiler (mcc), which turns the script into a standalone executable.
- Run N copies of the executable on an N-core batch node or a cluster, each with a different input parameter
- - mpirun can launch non-MPI processes, too
- MATLAB runtimes (free!) must be available on all nodes
- For process control, write a master script in Python, say
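For reference, the compile step above is a single command at the MATLAB prompt; the script name my_sweep.m is a placeholder:

```matlab
% Requires MATLAB Compiler; produces a standalone executable 'my_sweep'
mcc -m my_sweep.m
```

Each copy of the resulting executable is then launched outside MATLAB (e.g., ./my_sweep 17) on nodes where the free MATLAB runtime is installed.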
When Is File Transfer Needed?
- If your workers do not share a disk with your client, and they will require custom functions or datafiles
- Example:
>> j = createJob(sched); >> createTask(j,@rand,1,{3,3}); >> createTask(j,@myfunction,1,{3,3}); >> submit(j);
- - The rand function is no problem at all, since it’s built in
- - But myfunction.m does not exist on the remote computer
- - We’ll want to transfer this file and get it added to the path
MATLAB can copy files... or you can
- Setting the AutoAttachFiles property tells MATLAB to copy files containing your function definitions
- Use AttachedFiles to copy any data files or directories the task will need; directory structures are preserved
- - Not very efficient, though: file transfer occurs separately for each worker running a task for that particular job
- - OK for small projects with a couple of files
- A better-scaling alternative is to copy your files to disk(s) on the remote server(s) in advance
- - Use AdditionalPaths to make the files available at run time
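In code, the two approaches look something like this (the file and directory names are placeholders):

```matlab
clust = parcluster();
job = createJob(clust);

% Option 1: let MATLAB copy the data file to every worker for this job
job.AttachedFiles = {'mydata.mat'};

% Option 2: files already uploaded (e.g., via scp) to the server;
% just tell the workers where to find them at run time
job.AdditionalPaths = {'/home/myuser/matlab_inputs'};

createTask(job, @myfunction, 1, {3, 3});
submit(job);
```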
GPGPU in MATLAB PCT: Fast and Easy
- Many functions are overloaded to call CUDA code automatically if objects are declared with gpuArray type
- Benchmarking with large 1D and 2D FFTs shows excellent acceleration on NVIDIA GPUs
- MATLAB code changes are trivial
- - Move data to GPU by declaring a gpuArray
- - Call method in the usual way
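For example (requires a supported NVIDIA GPU, so this will not run in Red Cloud):

```matlab
A = rand(4096);            % ordinary array in host memory
G = gpuArray(A);           % copy the data to the GPU
F = fft2(G);               % overloaded method runs CUDA code on the GPU
result = gather(F);        % copy the result back to host memory
```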
Are GPUs really that simple?
- No. Your application must meet four important criteria.
- - Nearly all required operations must be implemented natively for type gpuArray.
- - The computation must be arranged so the data seldom have to leave the GPU.
- - The overall working dataset must be large enough to exploit 100s of thread processors.
- - On the other hand, the overall working dataset must be small enough that it does not exceed GPU memory.
Are GPUs available in Red Cloud?
- No. Unfortunately, Red Cloud does not include GPUs at the present time.
- - You can try out GPU functionality on your laptop or workstation if it has an NVIDIA graphics card.
- - Your graphics card must have a sufficiently high level of compute capability and the latest drivers.
PCT and MDCS: The Bottom Line
- PCT can greatly speed up large-scale computations and the analysis of large datasets
- - GPU functionality is a nice addition to the arsenal
- MDCS allows parallel workers to run on cluster and cloud resources beyond one’s laptop, e.g., Red Cloud
- Yes, a learning curve must be climbed…
- - General knowledge of how to restructure code so that parallelism is exposed
- - Specific knowledge of PCT functions
- But speed often matters!