V4 Linux Cluster

From Cornell CAC Documentation

Introduction

The largest part of v4 is a cluster of 112 Dell M600 server blades, each with 8 CPUs or cores, for a total of 896 cores. In addition, four Dell R900 servers have 64 GB of RAM each for large memory applications.

New Users

You have likely received an email with basic account information. The first time you log in, use the temporary password from the welcome email to connect to the login node, then follow the steps at #Changing a Password at First Login to pick a new password.

First Linux Cluster Job walks through a first login and submission of a batch job.

For any questions about accounting for projects and user limits, see Project Management and Member Limits.

Login Nodes

The login servers are your gateway to the clusters. On a login server, you can interactively compile programs, create and edit files, establish input data sets, submit and monitor jobs, and check resulting data from batch runs. Don't do intensive work there. Submit intensive jobs to the clusters listed below.

Hostname linuxlogin.cac.cornell.edu
Server Dell PowerEdge 1950
Processors Dual 3.0 GHz Quad Core Intel Xeon E5450 ("Harpertown")
L2 Cache 2x6 MB per processor, 24 MB total
RAM 8 GB
Operating System Red Hat Enterprise Linux Server release 5.1

Additional information about the processors can be displayed using "cat /proc/cpuinfo".

Software

Check the list of v4 Linux Software. It includes Eclipse and UberFTP.

To run Matlab on the cluster, see Matlab In Batch.

Connect to the Cluster

From Linux

Using Secure Shell

Secure Shell (SSH) will give you a remote command shell on the login node. Telnet is disabled for security reasons. Our clusters require the SSH2 protocol, which most Unix varieties implement.

Using X-Windows

The standard way to use X-Windows is to tunnel the X-Windows protocol through an ssh connection. If you open your ssh session with the -X option, it will automatically set up the necessary tunnel and environment variables.

localhost$ ssh -X username@linuxlogin3.cac.cornell.edu
linuxlogin3$ echo $DISPLAY
localhost:11.0
linuxlogin3$ xclock&

You can see that your DISPLAY environment variable is set and test it with xclock.
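
A quick check like the following can confirm that the tunnel is in place before you launch a larger graphical application. It is only a minimal sketch: the function name check_display is made up for illustration and is not a tool installed on the cluster.

```shell
# check_display is a hypothetical helper name, not a CAC tool.
# It reports whether ssh has set DISPLAY for X11 forwarding.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 forwarding is set up: DISPLAY=$DISPLAY"
    else
        echo "DISPLAY is not set; reconnect with ssh -X"
        return 1
    fi
}
```

After logging in with ssh -X, running check_display should print the DISPLAY value; a non-zero return suggests the session was opened without the -X option.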

Using VNC

For security reasons, all VNC connections must be tunneled inside ssh: the firewall running on linuxlogin3 blocks all incoming ports except the one for ssh. Set up the tunnel like this:

Initial Setup (You only need to do this once)
  • On linuxlogin3, set the password for your VNC server using the "vncpasswd" command.
Start your VNC server:
  • On linuxlogin3, start the VNC server using the "vncserver" command like this:
 vncserver -geometry 1024x768 -localhost

The geometry numbers, 1024x768, specify the size, in pixels, of the desktop.

  • You will get the display number from the output of the vncserver command:
 New 'linuxlogin3.cac.cornell.edu:1 (shl1)' desktop is linuxlogin3.cac.cornell.edu:1

 Starting applications specified in /home/gfs01/shl1/.vnc/xstartup
 Log file is /home/gfs01/shl1/.vnc/linuxlogin3.cac.cornell.edu:1.log

  • vncserver is running on port 5900 + display number. For example, :1 is running on port 5901.
  • On your client computer: Set up ssh forwarding. From Linux, type into a terminal:
 ssh -L 10000:localhost:<port number on which vncserver is running> linuxlogin3.cac.cornell.edu

From Windows, ssh clients such as Putty can set up the same port forwarding. See VNCTunnelWindows.

  • Leave this ssh session running on your local client computer.
  • On your client computer: Launch your vnc client program. Connect to localhost:10000. Type in your VNC server password when prompted.
To disconnect your client:
  • Close the vnc client program
  • Disconnect the ssh forwarding session.
When you are all done:
  • On linuxlogin3, type this command to shut down the VNC server
 vncserver -kill :<display number>
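
The port arithmetic in the steps above can be sketched as a pair of small shell functions. The function names vnc_port and vnc_tunnel_command are hypothetical and exist only for this illustration; local port 10000 and the host name are the ones used in the instructions.

```shell
# Hypothetical helpers (not CAC tools) for the port arithmetic described above.

# Port the VNC server listens on for a given display number (5900 + display).
vnc_port() {
    echo $((5900 + $1))
}

# Print the ssh forwarding command for a display number, using local port
# 10000 as in the instructions above.
vnc_tunnel_command() {
    echo "ssh -L 10000:localhost:$(vnc_port "$1") linuxlogin3.cac.cornell.edu"
}

# Display :1 maps to port 5901.
vnc_tunnel_command 1
```

For display :1 this prints the forwarding command with port 5901, which you can paste into a terminal on your client computer.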

From Windows

Using Secure Shell

Telnet is disabled for security reasons, but Secure Shell (ssh) clients work nicely as long as they support the SSH2 protocol. A popular client for Windows is the free Putty client.

Using X-Windows

You will need to install an X-Windows server on your Windows machine.

  • Hummingbird Exceed and Exceed 3D - Cornell has a site license. Installing Exceed 3D will improve performance of graphics applications.
  • Xming - Open Source. A shareware contribution will get you a version with improved performance for graphics (GLX). When you download Xming, be sure to install Xming-fonts, as well.
Configure the X Server for Multiwindow Mode

The program you just installed is an X-Windows server: it runs on your Windows machine and draws the windows of programs (X clients) that run on the CAC machine. In single window mode, it shows a full desktop from the CAC machine, which means you will see the remote version of a taskbar, a trashcan, and all those little goodies. We tend not to run this way because it can be a little slow. In multi-window mode, you first connect to the CAC machine with an ssh client, such as Putty, then start your editor (gedit, eclipse, emacs), which opens a nice point-and-click window on your local computer. It's the best of both worlds.

  • For Xming, just click the Xming icon.
  • For Exceed, click the start menu shortcut that just says "Exceed" (there are several other options).
Start Putty with X11 Forwarding

When the ssh client, Putty, starts, you can see where to type the host name. Look on the lower-left for "SSH". Click the plus sign beside it and select the now-visible "X11." On the upper-right will now be a box to "Enable X11 forwarding." Check it. Then you can click Open.

The X11 forwarding box asks Putty to set up an encrypted channel over which window and mouse data from X-Windows can travel. From the Linux command-line, typing

$ echo $DISPLAY
localhost:10.0

shows that X-Windows programs on the server send their data to a proxy display on the server, where ssh grabs the messages, encrypts them, and sends them to your computer for display. This is called ssh tunneling.

Test it by starting a window.

$ xclock

You should see a small clock appear.

Using VNC

Install a VNC client. TightVNC has the best performance. The instructions are mostly the same as connecting from a Linux client except that you must use Putty for ssh and enable tunneling. Detailed instructions are in VNCTunnelWindows.

File Transfer To Clusters

A single, central file server, storage01.cac.cornell.edu, serves all CAC user home directories. You can connect to this server in a variety of ways from any operating system to access your files.

From Linux

Secure Copy

Secure copy is a standard tool to copy files to and from remote hosts.

localhost$ scp localfile.dat username@linuxlogin3.cac.cornell.edu:remoteinput.dat
localhost$ scp username@linuxlogin3.cac.cornell.edu:results.dat localresults.dat 

Secure FTP

FTP is disabled for security reasons, but sftp's interface is nearly identical.

Samba Client

Type

smbclient //storage01.cac.cornell.edu/<user name> -U <user name>

Enter the password for your CAC account when prompted. You will see the smb:\> prompt. You can now transfer files between your local machine and your CAC home directory, much as you would with an ftp client. Type help for more instructions.

-sh-3.2$ smbclient //storage01.cac.cornell.edu/<user name> -U <user name>
    Password: 
    Domain=[CTC_ITH] OS=[Unix] Server=[Samba 3.0.28-1.el5_2.1]
    smb: \> help
    

From Windows

Secure Copy

The people who make Putty provide a secure copy client called pscp. From the command prompt, type:

cmd> pscp localfile.dat username@linuxlogin3.cac.cornell.edu:remoteinput.dat
    <enter your username's password when prompted>
cmd> pscp username@linuxlogin3.cac.cornell.edu:results.dat localresults.dat 

Secure FTP

FTP is disabled for security reasons, but psftp's interface is nearly identical. From the command prompt, type:

cmd> psftp username@linuxlogin3.cac.cornell.edu
    <enter your username's password when prompted>
psftp> put localresults.dat results.dat
psftp> quit

Home Directory Access

From computers on the Cornell network or using the Cornell VPN, it is possible to mount your home directory on your local machine.

Windows XP/Vista Users

  • Open My Computer
  • Click on Tools -> Map Network Drive
  • Drive H: (if you are already using this drive letter, use another letter)
  • Folder: \\storage01.cac.cornell.edu\<userid>
  • Then:
    • Select "Connect using a different user name:". This will allow you to enter the domain associated with CAC and your userid at CAC, rather than those associated with your own machine.
    • User name: CTC_ITH\your_userid
    • Password: your CAC password
  • Troubleshooting: If you have already mapped the drive and subsequently have problems, disconnect the drive and remap it.
  • Next, change the DNS settings for TCP/IP:
    • Start
    • Control Panel
    • Network and Internet Connections
    • Network Connections
    • Right click on a connection
    • Properties
    • Internet Protocol(TCP/IP)
    • Properties
    • Advanced
    • DNS Tab
    • Append this DNS suffix:
    • Add
    • cac.cornell.edu

MacOS X Users

  1. In the Finder, select Connect to Server... from the Go menu.

    Image:FileAccess1.jpg
  2. Enter smb://storage01.cac.cornell.edu/<user name> in the Server Address field as shown below. You may need to use smb://<username>@storage01.cac.cornell.edu/<username>.

    Image:FileAccess2.jpg
  3. Enter your CAC user name and password to log in.

Linux Users

You cannot mount the CAC home directory as NFS for security reasons. To mount it as a CIFS drive, you need to be root, which often means using the sudo command. Then execute

 mount -o user=<username> -t cifs //storage01.cac.cornell.edu/<username> /mount/point

where <username> is your username, and /mount/point is the name of a directory you have already created on your local filesystem. Enter the password for your CAC account when prompted. See man mount.cifs for the options available to the mount command.

If you see errors, such as "missing codepage or helper program," then you have not installed the mount and umount packages for CIFS on your local machine. If problems persist, send your initial command and the results of dmesg | tail.

Changing Your Password

Your password can be set or changed on any of the CAC login nodes. The password will be updated on all CAC resources. After you set or change your password, additional steps must be taken for batch and for using MPI. Passwords expire every six months. Do not share your password.

Rules for Creating Passwords

Do not share your password. Each user should be the only one to know the password for his or her account. Well-chosen passwords are essential to preserve the integrity of the system and individual user accounts. Never leave your password in plain text (unencrypted) in any of your files. Passwords stored in this way are easily stolen.

When you change your password, the new password must comply with our password complexity policy:

  • Each password must have at least eight characters.
  • Each password must contain at least three of the following four elements among its first eight characters:
    • uppercase letters (English, A through Z)
    • lowercase letters (English, a through z)
    • special characters (for example, !, $, #, %)
    • digits (0 through 9)
  • Do not use a space in a password. A space will cause the command used to register your password with the batch system to fail and you will not be able to run batch jobs.
  • Do not form a password by appending a digit to a word--this type of password is easily guessed.
  • Each password must differ from the user's login name and any permutation of that login name. For comparison purposes, an upper case letter and its corresponding lower case letter are equivalent.
  • New passwords should differ from the old by at least three characters.
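
Some of these rules can be checked mechanically before you try a new password. The following is a minimal sketch under stated assumptions: check_password is a hypothetical helper, not a CAC tool, and it covers only the length, space, and character-class rules. It does not compare the password with your login name or your old password.

```shell
# check_password is a hypothetical helper, not a CAC tool. It tests only the
# length, space, and character-class rules from the list above.
check_password() {
    pw=$1
    # Only the first eight characters count toward the character-class rule.
    first8=$(printf '%s' "$pw" | cut -c1-8)
    classes=0

    if [ "${#pw}" -lt 8 ]; then
        echo "too short"
        return 1
    fi
    case $pw in
        *" "*) echo "contains a space"; return 1 ;;
    esac

    # Count how many of the four character classes appear in the first eight.
    case $first8 in *[[:upper:]]*) classes=$((classes + 1)) ;; esac
    case $first8 in *[[:lower:]]*) classes=$((classes + 1)) ;; esac
    case $first8 in *[[:digit:]]*) classes=$((classes + 1)) ;; esac
    case $first8 in *[![:alnum:]]*) classes=$((classes + 1)) ;; esac

    if [ "$classes" -lt 3 ]; then
        echo "needs three character classes in the first eight characters"
        return 1
    fi
    echo "ok"
}
```

Note that a password whose digits and special characters all come after the eighth character fails this check, because only the first eight characters are counted.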

If you need additional ideas for creating a new password, please see http://online.securityfocus.com/infocus/1554/. Items 2, 4 and 8 are useful tips for creating strong passwords.

Changing a Password at First Login

When you are issued a login id, first log on to a login node. You will be prompted to change your password. Refer to the #Rules for Creating Passwords. After you change your password, the connection closes, as the example below shows, and you log in again with the new password.

Assume that you have an old password, 0ldpassw0rd!! and a new password, newpassw0rd!!. Here is what should happen:

$ ssh your_username@linuxlogin3.cac.cornell.edu
Password: (ENTER 0ldpassw0rd!!) 
WARNING: Your password has expired. 
You must change your password now and login again! 
Changing password for user your_username. 
Kerberos 5 Password: (ENTER 0ldpassw0rd!!) 
New UNIX password: (ENTER newpassw0rd!!) 
Retype new UNIX password: (ENTER newpassw0rd!!) 
passwd: all authentication tokens updated successfully. 
Connection to linuxlogin3 closed. 

If you get a token error, it very likely means that the password is not complex enough. Your password must be at least 8 characters long and mix at least three of the following: lower case letters, upper case letters, digits, and punctuation. It also must not be based on your user name or previous password; see #Rules for Creating Passwords above.

If you have additional trouble, you can use rdesktop or the Remote Desktop client to connect to the Windows login nodes, winlogin1.tc.cornell.edu or winlogin2.tc.cornell.edu. They give better information about password complexity issues during the password change.

Changing a Password at Any Time

You can change a password at any time using passwd at the command prompt. Be sure that you have no other open connections to any CAC resources:

  • The only open interactive session should be the one in which you are changing the password. Otherwise the system will lock your account. Disconnecting a session is not enough; you must log off.
  • Log off all other sessions connected to login nodes.
  • Log off all remote connections to other CAC machines.
  • Disconnect locally mapped drives to the CAC file server. If you do not do this, the system will automatically lock your account.

If Your Password Already Expired

Your password expires after six months (185 days). About a week before it expires, you will be asked whether you want to change it. You can change it then or wait until it expires. Once your password has expired, you will be prompted to change it, consistent with the #Rules for Creating Passwords. After you change your password, log in again with the new password.

Password Expiration Date

To see when your password expires, open a command prompt window on a Windows login node, then issue the command

 net user <your login id> /domain

There is no Linux equivalent.

Locked Accounts

There have been instances in which user accounts have been locked. Some common causes of locked accounts and the solutions are:

  • Mistyping your password several times in a row.
    Solution: Wait about half an hour and then try again. Be sure that your caps lock key is not on!
  • Trying to log in to a Windows login node by using SSH when you have a new or expired password.
    Solution: Log in to a Windows login node using Remote Desktop Connection, or SSH to a Linux login node.
  • Failing to log off all other sessions connected to login nodes.
    Solution: Log off all remote connections. Disconnecting the sessions is not enough.
  • Failing to disconnect locally mapped drives to the CAC file server before changing your password.
    Solution: Disconnect all locally mapped drives, wait half an hour until the account is unlocked, and then re-map the drive with the new password.

If you can't log on or can't wait, you can submit a Password Reset ticket on our issue tracking system.

Linux Shells

A good general introduction to the UNIX/Linux shells and their use is the UNIX Tutorial for Beginners.

The default login shell on v4 interactive and batch nodes is sh. Be aware that in Red Hat Enterprise Linux, /bin/sh is a soft-link to /bin/bash, so you are really using a variant of bash. Accordingly, you will find that "man sh" brings up the man page (the help document) for bash. In a way, then, you can think of your login shell as being bash, too.

There are slight differences between sh and bash, however. The "Invocation" section of the man page states: "If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible." Therefore, you will find that ~/.profile is run at login, because this behavior is common to both sh and bash; but any interactive sh shells you start thereafter will not run ~/.bashrc as you might expect from bash. The way to get sh to do this is to "export ENV=~/.bashrc" beforehand (perhaps as part of your .profile).
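
As a sketch, the corresponding fragment of ~/.profile might look like the following. It assumes your interactive settings live in ~/.bashrc; adapt the path if they do not.

```shell
# Fragment for ~/.profile (a sketch; adapt to your own setup).
# Point ENV at ~/.bashrc so that interactive sh shells read it at startup.
ENV="$HOME/.bashrc"
export ENV
```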

Let's say you simply prefer to have bash as your default shell and be done with it. There are two ways to accomplish this. First, you can "export SHELL=/bin/bash" in your .profile; then all subsequent interactive shells will truly be bash. Second, you can enter "chsh -s /bin/bash", which forces all login and interactive shells to be bash (because you have changed your default shell). The problem with the second method is it may well wreck your batch environment, too, because the scheduler sets it up under the assumption that the login shell is sh.

The relationship between the csh and tcsh shells is similar to the one between sh and bash. For instance, your csh shells are automatically endowed with the tcsh-style ability to retrieve history through the up- and down-arrow keys. The best way to make tcsh into your everyday working shell is to run it on top of sh after you log in (again, you can do this as part of your .profile).

References for Bash

Batch Scheduler - Moab

V4 is scheduled by the Moab scheduler and jobs are run on the system by the Torque resource manager. This means that you'll be submitting jobs to Moab, but they'll be executed by Torque. The links below provide you with information on how to effectively use the system.

Queues

Operating system: Red Hat Enterprise Linux Server release 5.1
Processors: 2 quad-core CPUs, 2.5 GHz Intel Xeon E5420 ("Harpertown")
L2 Cache: 2x6 MB/processor, 24 MB/server
Network: Force10 Gigabit Ethernet

  • v4 - Main batch queue to get access to blades.
    Number of Nodes: 19 servers (152 cores total)
    Named: compute-3-[28-46]
    Memory: 16 GB RAM/server
    Limits: Maximum of 19 nodes (152 processors); no walltime limit
  • v4dev - Development queue to get access to development blades for debugging/testing, etc.
    Named: compute-3-[47-48]
    Limits: Maximum of 2 nodes (16 processors). Maximum walltime: 60 minutes
  • v4-64g - Queue for access to high-memory (64GB) servers
    Number of Nodes: 4 servers, with total of 32 cores
    Named: lmcompute-4-[1-3]
    Memory: 64 GB RAM/server
    Limits: Maximum of 3 nodes (24 processors); no walltime limit
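
To show how the queue limits translate into a resource request, here is a rough sketch of a Torque job script for the v4dev queue. The job name testjob, the nodes=2:ppn=8 resource syntax, and the 30-minute walltime are assumptions chosen for illustration and are not taken from CAC documentation; check the batch documentation for the options this cluster actually requires.

```shell
#!/bin/sh
# Sketch of a Torque job script for the v4dev queue. The job name and the
# resource request below are assumptions, not CAC-provided settings.
#PBS -N testjob
#PBS -q v4dev
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:30:00
#PBS -j oe

NODES=2            # v4dev allows at most 2 nodes
CORES_PER_NODE=8   # each blade has 8 cores
TOTAL_CORES=$((NODES * CORES_PER_NODE))

# Torque sets PBS_O_WORKDIR to the directory the job was submitted from;
# fall back to the current directory when run outside the scheduler.
cd "${PBS_O_WORKDIR:-.}" || exit 1
echo "Running on $NODES nodes, $TOTAL_CORES cores"
```

The request stays within the v4dev limits of 2 nodes (16 processors) and 60 minutes of walltime listed above.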

Compiling and Linking Codes

Use /tmp to compile large codes and software packages. This will provide improved performance and greater system stability.

C/C++ and Fortran Codes

  • GNU compilers gcc, g++, g77 are in /usr/bin, which is in the default path.
  • Intel compilers require setup files.
    • Fortran (ifort):
      bash: source /opt/intel/fc/9.1.032/bin/ifortvars.sh
      tcsh: source /opt/intel/fc/9.1.032/bin/ifortvars.csh
    • C/C++ (icc):
      bash: source /opt/intel/cc/9.1.045/bin/iccvars.sh
      tcsh: source /opt/intel/cc/9.1.045/bin/iccvars.csh
  • Help: First source the setup file.
    • Fortran: man ifort or info ifort
    • C/C++: man icc or info icc
  • Standard compiler options - The cluster nodes have Intel Core2 processors, so standard compiler options are:
    • For Intel: -O3 -fno-alias -ipo -mtune=core2 -march=core2 -xT
  • Other options of possible interest (consult man pages):
    • For Intel: -align -scalar_rep -prefetch

Generating Debugging Info

  • Intel compilers
    • icc -Wall
    • ifort -g -debug -warn -C (-CB for bounds checking only)

MPI Programs

To convert an existing makefile to use the facilities at the CAC, we recommend that you use mpicc and mpif77. These use the Intel compilers by default. Documentation for the Intel MPI Library, including mpdboot and mpiexec, is available in PDF on the Intel Support Site. The ROCKS operating system also comes with several other versions of MPI (mpich2, mvapich), but you have to adjust environment variables and paths yourself to get them to work.

Intel MKL

Intel's Math Kernel Library (MKL) is a good source of optimized routines for linear algebra, Fast Fourier Transforms, vector math, and other mathematical operations. OpenMP multithreading is built into MKL. It can be enabled simply by using "export" or "setenv" to set OMP_NUM_THREADS to the desired number. This might be 8, say, to utilize eight cores on a given compute node. Here are examples of compiler commands you can use to link MKL statically:

  • icc mycode.c -o mycode -L/opt/intel/mkl/10.0.3.020/lib/em64t -lmkl_em64t -openmp
  • mpicc mympicode.c -o mycode -L/opt/intel/mkl/10.0.3.020/lib/em64t -lmkl_em64t -openmp

The -openmp option is preferred when using Intel compilers for static linking. If additional static libraries are required (e.g., LAPACK), they should be specified using the full path ($MKLPATH/libmkl_lapack.a). The $MKLPATH variable is defined after the following steps have been performed:

  • In bash (or sh):
    • source /opt/intel/mkl/10.0.3.020/tools/environment/mklvarsem64t.sh
    • export MKLPATH=$MKLROOT/lib/em64t
  • In tcsh (or csh):
    • source /opt/intel/mkl/10.0.3.020/tools/environment/mklvarsem64t.csh
    • setenv MKLPATH $MKLROOT/lib/em64t
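
The bash steps above, together with the OMP_NUM_THREADS setting mentioned earlier, could be combined in a startup file or job script roughly as follows. This is a sketch: the variable name MKL_SETUP is made up for illustration, and the fallback value for MKLROOT is an assumption based on the install path used on this page.

```shell
# Sketch of a bash/sh setup, e.g. for ~/.profile or a job script.
# MKL_SETUP is a hypothetical variable name used only in this example.
MKL_SETUP=/opt/intel/mkl/10.0.3.020/tools/environment/mklvarsem64t.sh

# Source the Intel script only if it exists on this machine.
if [ -f "$MKL_SETUP" ]; then
    . "$MKL_SETUP"
fi

# Assumption: fall back to the install prefix used on this page if the
# script did not define MKLROOT.
: "${MKLROOT:=/opt/intel/mkl/10.0.3.020}"
MKLPATH="$MKLROOT/lib/em64t"
export MKLPATH

# Let MKL use eight OpenMP threads, one per core on a compute node.
OMP_NUM_THREADS=8
export OMP_NUM_THREADS
```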

If you prefer to link dynamically, here are the equivalent commands to the above:

  • icc mycode.c -L/opt/intel/mkl/10.0.3.020/lib/em64t -lmkl -liomp5 -lpthread
  • mpicc mympicode.c -L/opt/intel/mkl/10.0.3.020/lib/em64t -lmkl -liomp5 -lpthread

Running a dynamically linked code requires the directory /opt/intel/mkl/10.0.3.020/lib/em64t to be in your LD_LIBRARY_PATH at runtime. You can use one of the "mklvars" scripts to take care of this. (Note: -lguide may be used in place of -liomp5 to provide backwards compatibility with older MKL binaries. However, -liomp5 is newer and preferred.)

MKL can also be linked via the GNU compilers, though the syntax becomes rather lengthy. Here are examples of the static and dynamic linking commands:

  • gcc mycode.c -o mycode -Wl,--start-group $MKLPATH/libmkl_gnu_lp64.a $MKLPATH/libmkl_gnu_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group $MKLPATH/libiomp5.a -lpthread
  • gcc mycode.c -L/opt/intel/mkl/10.0.3.020/lib/em64t -lmkl_gnu_lp64 -lmkl_gnu_thread -lmkl_core -liomp5 -lpthread

Note that if your main program is itself threaded with OpenMP, you may want to link MKL sequentially in order to get the best performance. Much more information on linking MKL can be found in Sec. 5 of the User Guide ("firefox /opt/intel/mkl/10.0.3.020/doc/userguide.pdf").

TeraGrid Access

The TeraGrid is an infrastructure of computing resources, data resources, software toolkits, and other facilities connected by a high performance network across the U.S.A. The CAC cluster V4 connects directly to it on a 10 Gb/s network link. File transfers with other TeraGrid sites over this network are much faster than internet transfers and do not incur Cornell's internet network charges (NUBB charges).

The CAC would be happy to help you apply for a TeraGrid allocation. This is an application process to obtain a TeraGrid account giving you access to computing resources around the country. This account is necessary to use the TeraGrid for file transfer.

To transfer files to or from the CAC from Cornell campus, use secure copy (scp) or mount a network drive. This transfer should be relatively quick over campus networks and will not cost NUBB charges.

Once the files are on CAC fileservers, log on to the v4 cluster's login server, find the Globus Toolkit with "ls -d /opt/globus*", and follow instructions to use TeraGrid Single sign-on.

The Globus Toolkit provides single sign-on versions of ssh (gsissh), scp (gsiscp), and ftp (globus-url-copy), all available from the login server on V4. The CAC does not run a gridftp server, but a gridftp client is enough to exchange files with any of the TeraGrid gridftp servers.

Applications

Using Subversion

What is Source Control?

Subversion and CVS are source control software, which means they help you collaborate with others on developing source code by saving versions of the code as you write it.

  • Subversion installed on Linux and Windows Clusters
  • CVS installed on Linux cluster
  • Git not installed

Cornell's SourceForge

Cornell has moved towards Subversion now that CIT runs a free SourceForge Enterprise site. You can log in with Cornell Single Sign-on. If you join Cornell's Forge and get an account, read the welcome email carefully because you need to register for a second password to use SVN.

SVN Versions at CAC

On Windows, the TortoiseSVN client is a newer version than the Linux cluster's version. If you check out code on Windows, then return to Linux, you will get an error of the form:

This client is too old to work with working copy...

You can downgrade the working copy to a common version using a script available on the Subversion FAQ.
