Red Cloud with MATLAB FAQ

From CAC Documentation wiki

Why does the cacscheduler configuration seem to fail MATLAB's built-in validation test?

MATLAB allows you to "validate" a configuration via the Parallel > Manage Configurations menu. If you do this for cacscheduler, the first few steps will work fine. But upon reaching the Parallel section of the procedure the validation will appear to fail with a message like, "Please set the maximum number of workers to a finite value prior to submission of a parallel job." This is expected behavior. The CAC parallel configuration purposely specifies a ClusterSize of Inf (infinity). This allows for total flexibility in adding or subtracting hardware from the resource and/or its various queues over time.

Alarming as this message from MATLAB's built-in test may look, it is not a sign that your setup is defective. To make the full validation procedure work, do the following: in the Parallel menu, go to Manage Configurations, then double-click "cacscheduler". This will allow you to edit the CAC configuration directly. Change the ClusterSize parameter from Inf to (say) 4 and click "Save". Re-run the validation; you should find that cacscheduler now passes. When you're done with the test, change the ClusterSize back to Inf and save again.

From the above, it should be clear that in any parallel job you submit, you'll want to set job.MaximumNumberOfWorkers appropriately for the queue to which you are submitting your job.
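As a minimal sketch of the point above: the following assumes the older, configuration-based Parallel Computing Toolbox interface (findResource, createParallelJob) that this FAQ describes, and a configuration named "cacscheduler". The worker count of 4 and the trivial labindex task are illustrative choices only; pick a value that fits the queue you are targeting.

```matlab
% Sketch only: assumes the configuration-based Parallel Computing Toolbox
% API and a configuration named 'cacscheduler'.
sched = findResource('scheduler', 'configuration', 'cacscheduler');

% Create the parallel job, applying the job settings from the configuration.
job = createParallelJob(sched, 'configuration', 'cacscheduler');

% ClusterSize is Inf in the CAC configuration, so give the job a finite
% upper bound before submitting (4 is an illustrative value).
set(job, 'MaximumNumberOfWorkers', 4);

% In a parallel job every worker runs the same task; here each worker
% simply reports its own labindex.
createTask(job, @labindex, 1, {});

submit(job);
waitForState(job, 'finished');
results = getAllOutputArguments(job);   % one row per worker
destroy(job);
```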

Generally, if you are concerned about whether you have a working configuration, it's best to try running cac_runtests. This will test more aspects of Red Cloud's functionality.

How many MATLAB workers can you use at a time?

The answer depends on both the job type and the queue to which you submit.

For a parallel job, the workers must all be able to communicate with each other; therefore, the maximum size is limited to the number of cores present in your chosen queue. The Default queue has 52 cores, so you could have up to 52 parallel workers there. In the Quick queue, the max is 4; in the GPU queue, it's 8.

For a pool job, the max is again limited by the number of cores. But bear in mind that one worker must take the place of your local MATLAB session, and its only role is to run the main job function for you. This means that the matlabpool size will be 1 less than the number of workers. Therefore, in the Default queue the max is 51 labs (52 workers), in Quick it's 3 labs, and in GPU it's 7 labs.

For a distributed job, there is essentially no limit. All tasks are independent, and they will make their way through any of the queues singly, task by task, until the list is exhausted. However, at any given instant, at most 52 of these tasks could be running simultaneously in the Default queue (and likewise, no more than the core count in any other queue).
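The limits above amount to simple arithmetic on the core count of each queue. The short sketch below tabulates them using the core counts quoted in this answer; the variable names are made up for illustration and are not part of any CAC tool.

```matlab
% Core counts per queue, taken from the answer above.
queueCores = containers.Map({'Default', 'Quick', 'GPU'}, {52, 4, 8});

queues = keys(queueCores);
for k = 1:numel(queues)
    cores = queueCores(queues{k});
    maxParallelWorkers = cores;      % parallel job: one worker per core
    maxPoolLabs        = cores - 1;  % pool job: one worker runs the main function
    maxConcurrentTasks = cores;      % distributed job: tasks running at once
    fprintf('%-8s parallel: %2d  pool labs: %2d  concurrent tasks: %2d\n', ...
            queues{k}, maxParallelWorkers, maxPoolLabs, maxConcurrentTasks);
end
```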

How do I save extremely large arrays in my pool job?

In a pool job, if a distributed array is too large to gather() into the memory of the master process, you can save() it piecemeal from within an spmd block or a parfor loop, using the technique described here. Subsequently, you may reassemble the array on your local workstation, after transferring its various pieces via gridFTP.
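One way to save an array piecemeal might look like the sketch below. It is not necessarily the technique the link above refers to. The function names saveDistributedPieces and saveLocalPart, and the file naming scheme, are hypothetical. The sketch also assumes each worker can write to its current working directory.

```matlab
function saveDistributedPieces(D, prefix)
% Sketch: write each worker's portion of the distributed array D to its own
% MAT-file: prefix_001.mat, prefix_002.mat, and so on.
% Call this from the main function of a pool job, where a pool is open.
    spmd
        localPart = getLocalPart(D);   % D is codistributed inside spmd
        fname = sprintf('%s_%03d.mat', prefix, labindex);
        saveLocalPart(fname, localPart, labindex, numlabs);
    end
end

function saveLocalPart(fname, localPart, pieceIndex, pieceCount)
% save() cannot be called directly in the body of an spmd block (or a
% parfor loop), so the call is wrapped in this helper function.
    save(fname, 'localPart', 'pieceIndex', 'pieceCount');
end
```

After the files are transferred, the pieces can be loaded in order of pieceIndex and joined with cat() along the dimension over which the array was distributed. For a matrix with the default one-dimensional distribution, that is the column dimension.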