MATLAB Distributed Computing Server (MDCS) in Red Cloud

From CAC Documentation wiki

What You Need to Know About This Service

MATLAB Distributed Computing Server will only be useful to you if you own and are familiar with the MATLAB Parallel Computing Toolbox (PCT is included in Cornell's normal site license). Gaining PCT knowledge is to your advantage, though, because it is the best way for MATLAB to make effective use of multiple CPUs on any system--even the multiple cores in your laptop. One starting point for learning PCT is CAC's tutorial. Extending PCT's basic concepts to MDCS in Red Cloud should be natural and easy.

Red Cloud offers the following advantages for your PCT/MDCS computations.

  • Up to 64 parallel workers are available in total.
  • Licenses for the workers are included in your subscription.
  • Workers have exclusive access to their allocated cores and memory.
  • Data are readily transferred through the campus network at no extra cost.

Assumptions

  1. You are a member of an academic community with access to an academic MATLAB R2016a or R2017a client.
  2. Your group has started a CAC project giving you access to Red Cloud, and you are familiar with its Eucalyptus console.
  3. The MATLAB R2016a or R2017a client, including the Parallel Computing Toolbox (PCT), is installed on your local workstation.
  4. You have completed your first-time Red Cloud login.
  5. You have created a Red Cloud key pair.

Create a Security Group

In the Eucalyptus Management Console (for redcloud-ith), you need to create a Security Group for your MATLAB instances. Its purpose is to open up certain TCP ports so your client has proper access to your MDCS server(s).

Choose "Security Groups", then "Create a new Security Group", and set the following Rules. Note, to auto-fill <your client IP address>/32, you can select the "Use my IP address" link after entering either the Custom TCP port range or the standard SSH port. You must click "Add Rule" at the bottom of the page after making each new entry. You must also click "Create Security Group" after adding all rules.

Protocol                     Port range       Allow traffic from IP address
Custom TCP                   14000 - 15000    <your client IP address>/32
Custom TCP                   27000 - 28000    <your client IP address>/32
SSH (for terminal access)    22 - 22          <your client IP address>/32


If you wish, you can set the security group to allow access from a broader range of IP addresses. For example, if the allowed IP ranges are set to:

  • 128.84.0.0/16,
  • 128.253.0.0/16,
  • 132.236.0.0/16,
  • 192.35.82.0/24,
  • 192.122.235.0/24,
  • 192.122.236.0/24, and
  • 10.0.0.0/8

access is permitted from anywhere on the Cornell network (reference 1, reference 2). However, you should be aware that in this case, any Cornell user who knows the IP address and the cluster name will be able to submit MDCS jobs to your Red Cloud instance(s).

In general, if other users require access to your Red Cloud instance for their MDCS jobs--or if you will be accessing MDCS from multiple IP addresses--then it is probably best to add the extra IP addresses to the security group individually, in the same way as the first.
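
The /32 and /16 suffixes in the rules above are CIDR prefix lengths: /32 matches exactly one address, while /16 matches every address sharing the first two octets. The following is a minimal sketch in POSIX shell for checking whether a given client address falls inside one of the listed ranges. The function names ip_to_int and ip_in_cidr, and the sample address 128.84.8.10, are hypothetical and chosen only for illustration.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
ip_in_cidr() {
    net=${2%/*}          # network part, e.g. 128.84.0.0
    bits=${2#*/}         # prefix length, e.g. 16
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    fi
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Illustrative check with a made-up client address.
if ip_in_cidr 128.84.8.10 128.84.0.0/16; then
    echo "128.84.8.10 matches 128.84.0.0/16"
fi
```

A client whose address matches none of the ranges in the security group will be refused, which is the most common cause of a failed first validation stage.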

Note, these same port ranges have to be open on the client side, too. For example, if you have Windows Firewall enabled, you will need to set up special rules that allow inbound TCP and UDP connections to MATLAB through/from any port (a typical client-side firewall will not have outbound restrictions). You may need to consult MathWorks documentation to see what to do in your particular case.

Start the MATLAB Cluster

Currently only one-node clusters are supported. However, a single node can support many workers, up to the total number of cores that you assigned to your instance. Multi-node cluster support is forthcoming.

In the Euca Console for Red Cloud ITH or Red Cloud NYC:

  • From the dashboard, click on the "Launch Instance" button.
  • On the "image" screen, select the image matching your MATLAB client version in either of two ways:
    1. Scroll to find the image and click its "Select" button.
    2. Enter the image ID matching your MATLAB client version in the search box at the bottom and click the "Next" button.
    Client Version    Image ID
    Red Cloud ITH (https://euca44.cac.cornell.edu)
    R2016a            emi-6758cbc6
    R2017a            emi-27b0c2cf
    Red Cloud NYC (https://euca4-nyc.cac.cornell.edu)
    R2016a            emi-d6272cf4
    R2017a            emi-ee590c3b
  • On the "Details" screen, select the desired instance type. One MATLAB worker will be started per CPU in the instance. Click on the Next button.
  • On the "Security" screen, select the ssh keypair you'd like to use to access the instance via ssh (optional), and select your MATLAB security group.
  • Click on "Launch Instance".
  • After the instance is running, the MATLAB cluster should be reachable at the public IP address of the instance within 2 minutes.

Note: after you finish using your MATLAB cluster, remember to stop or terminate the instance on the Euca Console to stop charges against your Red Cloud subscription.

Connect to Your MATLAB Cluster

Perform the following steps in your local MATLAB client:

  • Open Parallel > Manage Cluster Profiles.
  • Choose Add > Custom > MATLAB Job Scheduler (MJS).
  • In the warning dialog that comes up, click "OK".
  • In the lower right corner of the scheduler, click "Edit".
  • Enter at least the top three values in Properties:
    1. Description: Red Cloud or another name of your choosing.
    2. Host: 128.84.8.XXX (where XXX matches the public address of your Red Cloud instance)
    3. Name: EC2_job_manager
  • Optional: if you want each worker to have more than the standard allotment of memory or disk per core, scroll down and set NumWorkersRange to have a maximum value which is less than the number of cores in your cluster. (In that case, you may also choose to set NumThreads > 1.)
  • Click "Done". Click "Rename" in the toolbar to give the new scheduler a better name. This name will appear in your MATLAB client.
  • Click "Validate" in the toolbar to ensure the scheduler is configured properly. As each stage completes, a green circle with a check mark in it should be displayed.

Possible validation issues

Validation may fail for a number of reasons. Here is a short list of things to try if it does:

  1. If the first validation stage fails, it is most likely because the Euca Security Group is not allowing access from your client's IP address.
    • Log into the Euca Console (redcloud-ith) using a web browser on the same machine as your MATLAB client.
    • Choose the Security Group you created and try to add one of the above rules, finishing with "Use my IP address".
    • If you do not get the message "Rule already exists", go ahead and add all three of the above rules for your current IP address.
    • Wait a few minutes until the new rules take effect in your running instance (you may also want to restart your MATLAB client). Run the validation test again to ensure your MATLAB cluster passes all the stages.
  2. If the client is able to connect to the cluster, but the second stage of validation fails, check the results ("Show Details").
    • If you see an error message saying, "This MATLAB Job Scheduler does not support running jobs for MATLAB release...", this just means that the workers are not yet ready.
    • Wait a few more minutes and re-try validation.
  3. If you still cannot pass validation, and error messages such as "Could not contact an MJS lookup service on host..." persist, it means your network connection is being blocked.
    • Double-check your Eucalyptus Security Group and firewall settings as described above.
    • Then contact your departmental IT support, as there may be port blocking in effect on departmental routers. (Campus Wi-Fi connections should be sufficiently open.)
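
Before working through the list above, it can help to confirm from the client machine that the instance is reachable at all. The sketch below is one way to do so, assuming a netcat (nc) that supports the -z and -w options and a security group that includes the SSH rule. The function name probe_port is hypothetical. Note that a port only reports as reachable when a service is listening on it, so a failure on a port in the MDCS ranges is not conclusive by itself; port 22 is the most reliable first check.

```shell
# Test whether a TCP port on a host accepts connections.
# Exit status: 0 reachable, 2 invalid port argument, other nonzero not reachable.
probe_port() {
    host=$1
    port=${2:-22}        # default to the SSH port
    case $port in
        ''|*[!0-9]*)
            echo "invalid port: $port" >&2
            return 2 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 2
    fi
    # -z: connect without sending data; -w 3: give up after 3 seconds.
    nc -z -w 3 "$host" "$port" >/dev/null 2>&1
}

# Usage, with your instance's public address in place of 128.84.8.NNN:
#   probe_port 128.84.8.NNN 22 && echo "SSH port reachable"
```

If port 22 is reachable but validation still fails, the problem is more likely in the MDCS port rules or a client-side firewall than in general connectivity.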

Test Your MATLAB Cluster

Finally, you can run this quick "Hello, world" test from the command line in your client. In the first line, supply the name of your scheduler. If you did not rename the scheduler when you created it, its name appears in the Cluster Profiles Manager dialog.

pool = parpool('Red Cloud')
spmd; fprintf('Hello from lab %d of %d\n', labindex, numlabs); end
delete(pool)

The number of replies should equal the number of workers ("labs"), which by default is equal to the number of cores in your instance. Note that the labs are numbered starting from 1, not 0.

Upload Large Files to Your MATLAB Cluster

MATLAB PCT provides built-in mechanisms for uploading data files so they can be accessed by your MDCS workers. The primary ones are the AttachedFiles keyword in functions like parpool() and createJob(), and the addAttachedFiles() function for an existing parallel pool. Unfortunately, these mechanisms are not suitable for large files, because they generate a separate copy of the file for each worker. This is inefficient and unnecessary in Red Cloud, where in most cases, all the workers share a file system on the same instance. Here we present two alternatives that should help you to make data files available to your MDCS workers.

Prerequisites: you must have created a Red Cloud key pair before starting your instance, and you must have specified this key pair when the instance was launched. You should also be familiar with how public key authentication works in Linux. Finally, in order to connect to the instance using ssh, sftp, or scp, the Red Cloud security group should include a rule to allow incoming connections to port 22 from the address of the computer that is trying to connect.

Alternative 1: Upload to /tmp on your instance

This method is probably the simpler of the two. Any files you upload will persist on your instance until you terminate it. The only tricky part is knowing how to authenticate with the key pair when you connect to your instance with a file transfer client. It is straightforward to do this type of authentication from the command line in Linux or MacOS, if you use either sftp or scp:

sftp -i ~/.ssh/myCACid-key.pem root@128.84.8.NNN
sftp> put file.txt /tmp
scp -i ~/.ssh/myCACid-key.pem file.txt root@128.84.8.NNN:/tmp

The above examples assume you have stored the key pair (or at least the private-key portion of it) in your local .ssh folder in Linux or MacOS. Note that in Windows, the PuTTY client comes with a psftp client that you might want to try.

If sftp or scp does not accept the -i option in your OS, you can try using ssh-agent and ssh-add to make the private key available to these commands.
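
The ssh-agent approach mentioned above can be sketched as follows. The helper names run and upload_via_agent are hypothetical, as is the DRY_RUN switch, which prints the commands instead of executing them so the sequence can be inspected first. The key path and address in the usage line are the same placeholders used earlier in this section.

```shell
# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load a private key into ssh-agent, then copy one file to /tmp on the instance.
upload_via_agent() {
    key=$1
    file=$2
    host=$3
    # Start an agent for this shell unless one is already available.
    if [ -z "${SSH_AUTH_SOCK:-}" ] && [ "${DRY_RUN:-0}" != 1 ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
    run ssh-add "$key"                   # asks for the passphrase if the key has one
    run scp "$file" "root@$host:/tmp"    # no -i option; the agent supplies the key
}

# Usage:
#   upload_via_agent ~/.ssh/myCACid-key.pem file.txt 128.84.8.NNN
```

Once the key has been added, later sftp, scp, and ssh commands in the same shell session can also omit the -i option.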

For exceptionally large files, you can make use of your instance's ephemeral storage, which is located at /dev/vdb. You will need to format it and create a mount point for it. The volume persists only as long as the instance is running, but it is large (100 GB minimum) and fast (local RAID 5).
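
Formatting and mounting the ephemeral volume might look like the sketch below, run as root on the instance. The ext4 file system and the /mnt/ephemeral mount point are assumptions, not CAC recommendations, and the function name prepare_ephemeral is hypothetical. Because formatting destroys any data on the device, the function only prints the commands unless DRY_RUN=0 is set. It is worth checking first (for example with the mount command) whether the image has already mounted /dev/vdb somewhere.

```shell
# Execute a command, or just print it unless DRY_RUN=0 (printing is the default).
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Format the ephemeral device and mount it at a new mount point.
prepare_ephemeral() {
    dev=${1:-/dev/vdb}
    mnt=${2:-/mnt/ephemeral}
    run mkfs.ext4 "$dev"     # assumption: ext4; erases everything on the device
    run mkdir -p "$mnt"
    run mount "$dev" "$mnt"
}

# Show what would be executed with the default device and mount point.
prepare_ephemeral
```

After mounting, files can be uploaded to the mount point in the same way as to /tmp.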

Alternative 2: Upload to your CAC home folder

Your Red Cloud subscription includes a home folder at CAC. It is available as a network share located at //linuxlogin.cac.cornell.edu/myCACid, where myCACid is your CAC username. Your Red Cloud subscription comes with 50 GB of storage, part of which can be used to store data files at this location. (More storage can be added to your subscription if desired.) To upload files to your home folder, use your favorite file transfer client such as WinSCP, or a command-line utility such as sftp or scp. Point your file transfer client or utility to the above address, making sure to provide your CAC username and password.

However, this CAC home folder is not automatically available to your Red Cloud instances. The preferred way to make it accessible is to mount the network share using Samba/CIFS. First log in to your instance as root, which you do with your private key:

ssh -i ~/.ssh/myCACid-key.pem root@128.84.8.NNN

The above example again assumes you have stored the key pair in your .ssh folder in Linux or MacOS. In Windows, you may wish to use PuTTY as the ssh client (in which case you will have to generate a .ppk file from the .pem file using PuTTYgen). After you are logged in, issue the following commands:

yum install cifs-utils
mkdir /home/myCACid
mount -t cifs -o username=myCACid //storage01.cac.cornell.edu/myCACid /home/myCACid
<supply your CAC password when prompted>

At this point all files in your home folder should be available to all MDCS workers, via a path starting with /home/myCACid/.

If you stop this Red Cloud instance and start it back up, the mount command will have to be executed anew. To make the Samba mount automatic during restarts, add an appropriate entry to /etc/fstab in the instance.
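
One possible form of the /etc/fstab entry is sketched below. It assumes the share and mount point used in the mount command above, and it relies on the credentials option of mount.cifs so that the password is read from a file rather than typed at boot. The file path /root/.cac-credentials and the function name cifs_fstab_entry are hypothetical. The credentials file would contain two lines, username=myCACid and password=<your CAC password>, and should be readable by root only.

```shell
# Print an /etc/fstab line for mounting a CAC home folder over CIFS.
cifs_fstab_entry() {
    user=$1
    creds=${2:-/root/.cac-credentials}   # hypothetical credentials file path
    # _netdev delays the mount until the network is up.
    printf '//storage01.cac.cornell.edu/%s /home/%s cifs credentials=%s,_netdev 0 0\n' \
        "$user" "$user" "$creds"
}

# Print the entry for the placeholder username used in this document.
cifs_fstab_entry myCACid
```

After appending the printed line to /etc/fstab, running mount -a as root is a quick way to test the entry without restarting the instance.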

Quick example of file I/O

Let's say you have copied a file, file.txt, to /tmp on your instance by using scp as described in Alternative 1 above. Let's also suppose this file contains 3 lines (or any arbitrary number) with 1 integer per line. If you'd like to have all your MDCS workers read this file into vector b and print b to the MATLAB console, you can do the following:

spmd; fid=fopen('/tmp/file.txt'); b=fscanf(fid,'%d'); fclose(fid); disp(b); end

Vector b is now available in the workspace of all the workers, where it can be used for further parallel computations. Note: from your MATLAB client, you can also use spmd in combination with system(), pwd, etc., in order to explore the environment of your MDCS workers in Red Cloud. (Or you can just use ssh to take a look around.)
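
To try the example above without real data, a small input file can be generated on the client first. This sketch writes three arbitrary integers, one per line; the function name make_sample_file and the values are hypothetical.

```shell
# Write three integers, one per line, to the named file (default: file.txt).
make_sample_file() {
    printf '%s\n' 10 20 30 > "${1:-file.txt}"
}

make_sample_file file.txt

# Then upload it as in Alternative 1, for example:
#   scp -i ~/.ssh/myCACid-key.pem file.txt root@128.84.8.NNN:/tmp
```

With this file in place, each worker should display the same three values for b.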