FAQ
Revision as of 16:40, 25 February 2019
How can I obtain a CAC account?
How can I determine the number of hours I have left before I reach my project limit?
Check links from the CAC Projects page.
My account is locked.
If it was locked after repeated password failures, it should unlock automatically after 30 minutes. Otherwise, Contact Support.
I forgot my password, or have problems with a new password, or need a password reset.
Are my login id and password the same for all machines?
Yes. For an ssh connection, give your login id at the prompt. In a Windows GUI, specify the username as CTC_ITH\<login_id> or <login_id>@tc.cornell.edu.
When I try to use a Remote Desktop client to connect to winlogin, it tells me that my username/password are incorrect.
Make sure that you are logging in with the CTC_ITH domain. If you put only your username in the "username" box, winlogin will try to log you in as a local user, which won't work. Put CTC_ITH\<username> in the "username" box instead.
How can I copy files to my desktop from H:?
Use an SSH client to transfer the files with sftp. See File_Transfer_To_Clusters.
Can't use scp to transfer files to the CAC.
Problems using WinSCP.
Showed the user how to use the outgoing ftp folder and sent detailed instructions by email.
Can't access files.
System problem. Send email to firstname.lastname@example.org.
Can see files in Explorer, but dir at the command prompt shows only the files in the home directory.
The user had opened Start | Run and typed command instead of cmd. Use the command cmd.
How Do I Transfer Files To and From CAC Machines?
- Use a program to send them - SecureShell
- Faster over slower connections.
- Less hassle.
- Make your CAC home directory look like a local drive - FileAccess
- Works fine on campus.
- Convenient for editing.
If you have any questions, please contact us through the Web site, send email, or call 607.254.8686.
Why use a temporary directory?
It is faster to perform local file I/O and copy complete data files to/from $HOME at the beginning and the end of the job, rather than perform I/O over the network ($HOME is network mounted on the compute nodes).
- Torque creates a uniquely named directory (/tmp/$PBS_JOBID) when a job starts and stores the path of this directory in the $TMPDIR environment variable. This directory is cleaned up when the job exits.
- To use this feature, reference $TMPDIR in your job script.
- You may also create your own directories in /tmp, outside /tmp/$PBS_JOBID, for reading and writing files. Any data you leave there is at risk, though: it may be deleted at any time if we see /tmp getting full.
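The staging pattern described above (copy inputs to the node-local job directory, compute there, copy results back) can be sketched as a small bash function to paste into a job script. The function name stage_run and its argument order are hypothetical; the only Torque-provided piece is $TMPDIR.

```bash
#!/bin/bash
# stage_run DATA_DIR INPUT OUTPUT CMD [ARGS...]   (hypothetical helper)
# Copies DATA_DIR/INPUT into the job's private $TMPDIR, runs "CMD ARGS... INPUT"
# there with stdout sent to OUTPUT, then copies OUTPUT back to DATA_DIR.
stage_run() {
  local data_dir=$1 input=$2 output=$3
  shift 3
  # Torque sets TMPDIR to /tmp/$PBS_JOBID; abort if it is missing.
  local work="${TMPDIR:?TMPDIR is not set, run this inside a Torque job}"
  cp "$data_dir/$input" "$work/" || return 1
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$work" && "$@" "$input" > "$output" ) || return 1
  # Copy the result back before the job exits and $TMPDIR is cleaned up.
  cp "$work/$output" "$data_dir/"
}
```

In a job script you might then call, for example, stage_run "$HOME/project" input.dat results.txt "$HOME/project/my_program", where the directory, file names, and program are placeholders. Because all heavy I/O happens in $TMPDIR, only two copies cross the network-mounted $HOME.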
Connecting to Instances
First, ensure that the instance has finished being created by checking in the OpenStack Web Interface or the OpenStack CLI. Next, refer to the available documentation for accessing Linux instances or accessing Windows instances. If you are having trouble connecting to your instance, please review this documentation first to make sure you are following the correct steps. If you have created a Linux instance and are having trouble connecting via ssh, try the troubleshooting steps. If you are still having trouble, Contact Support.
Scheduler Frequently Asked Questions
Why are you using Maui and Torque now?
We have switched to using a nationally recognized resource manager and scheduler in order to make the usage of our systems align more closely with the national community. This also allows us to leverage the considerable capabilities of the Maui software to ensure optimal and flexible use of our systems.
When's my job going to run?
If you have already submitted your job and want to know when it will start, use the showstart command to find its estimated start time. If you are trying to decide where to run your job so that it starts soonest, examine the output of the showbf command, which lets you search for when a job with particular resource requirements could run.
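As a sketch of the submit-then-ask workflow, the hypothetical bash function below submits a script with qsub and immediately asks Maui for an estimated start time. It assumes only that qsub prints the new job id on stdout, as Torque's qsub does.

```bash
#!/bin/bash
# submit_and_estimate SCRIPT   (hypothetical helper)
# Submit SCRIPT with qsub, then ask Maui when it expects the job to start.
submit_and_estimate() {
  local jobid
  # Torque's qsub prints the new job id (e.g. 12345.scheduler) on stdout.
  jobid=$(qsub "$1") || return 1
  echo "Submitted $jobid"
  showstart "$jobid"
}
```

Before submitting, showbf reports resources that are free now and for how long; something like showbf -n 4 -d 2:00:00 asks about 4 nodes for two hours. Check showbf -h on the cluster for the exact options your Maui version accepts.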
Why is my job stuck in the queue?
Sometimes your job doesn't run, even though it looks like it should. Maybe there are few jobs running in the cluster, and your job still won't run.
- Find your jobids with "showq -u username"
- Use "checkjob -v jobid" to examine one of the jobs. Examining Checkjob -v discusses how to read this output.
Jobs in the "Batch Hold" state initiate emails to the system administrators. For other problems, contact CAC help.
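The two steps above can be combined into one loop. This is a sketch with a hypothetical function name, and it assumes that showq prints one job per line with the job id in column 1 and the user name in column 2 (header lines are skipped because their second column is not your user name).

```bash
#!/bin/bash
# check_my_jobs [USER]   (hypothetical helper)
# Run "checkjob -v" on every job that "showq -u USER" lists (default: you).
check_my_jobs() {
  local user=${1:-$USER} id
  # Keep only rows whose second column is the user name; print the job id.
  for id in $(showq -u "$user" | awk -v u="$user" '$2 == u { print $1 }'); do
    echo "===== $id"
    checkjob -v "$id"
  done
}
```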
Why is my job deferred?
There can be several reasons for a job to defer. Sometimes when the Maui scheduler's queue is full, two jobs attempt to start on a node at the same time, and one will switch to being deferred. On this occasion, if you type "checkjob -v <jobid>", you will see, at the bottom, the message:
Message job rejected by RM 'scheduler' - job started on hostlist compute-3-40.v4linux,compute-3-37.v4linux,compute-3-35.v4linux,compute-3-34.v4linux at time 13:11:22_07/20, job reported idle at time 13:11:53_07/20 (see RM logs for details)
In this case, the only way to make this job run is to notify help at CAC.
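To check a deferred job for this particular case without reading the whole checkjob output, you could search for the message text quoted above. The helper name is hypothetical, and the match string is simply the "job rejected by RM" phrase from that message.

```bash
#!/bin/bash
# rm_rejected JOBID   (hypothetical helper)
# Exit status 0 if "checkjob -v JOBID" contains the "job rejected by RM"
# message described above, nonzero otherwise.
rm_rejected() {
  checkjob -v "$1" | grep -q 'job rejected by RM'
}
```

For example, rm_rejected 110432 && echo "rejected by RM, contact CAC help", where 110432 stands in for your own job id.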
What are the queues/affiliations?
Affiliations was the term the vsched scheduler used for the name of the queue that jobs were submitted to. Most schedulers use the term queue (Maui also uses the term "class" for the same entity), so you can substitute whichever word you prefer. V4 queues are listed on the v4 Linux Cluster page.
When I try to run mpdboot I get an error regarding bad python version
This type of message goes on to say, "You can't run mpdboot on ['compute-3-44.v4linux'] version of python must be >= 2.4, current..." Mpdboot uses python and ssh to start MPI daemons on all nodes of your job. It begins by using ssh to ask what version of python is running on each node.
Usually, this error means that ssh is having a problem establishing communication for the mpds. First, make sure you added "-r ssh" to your mpdboot line. If that looks OK, then try to rename (mv) the .ssh directory in your home directory to something like .ssh_bak. Log out, and log back in. A new .ssh directory should be recreated for you automatically (you can verify with "ls -la") which should have valid keys in it.
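The .ssh reset described above can be scripted as follows. This is a sketch with a hypothetical function name; it refuses to run if ~/.ssh_bak already exists, so an earlier backup of your keys is never overwritten (and mv never nests .ssh inside an existing .ssh_bak).

```bash
#!/bin/bash
# reset_ssh_dir   (hypothetical helper)
# Move ~/.ssh aside to ~/.ssh_bak so a fresh ~/.ssh is created at next login.
reset_ssh_dir() {
  if [ -e "$HOME/.ssh_bak" ]; then
    echo "$HOME/.ssh_bak already exists; move it out of the way first" >&2
    return 1
  fi
  mv "$HOME/.ssh" "$HOME/.ssh_bak" || return 1
  echo "Moved ~/.ssh to ~/.ssh_bak. Log out and back in, then check: ls -la ~/.ssh"
}
```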
You may also get this error if you are using a version of Python that does not work with mpdboot. In general, mpdboot needs Python 2.3 or newer, but it is also very picky about versions newer than 2.4. If you are running Python 2.5 or 2.6 from your own directory, mpdboot will sometimes find only older versions when it uses ssh to reach the other nodes in your job, because a non-interactive ssh session can have a different path. One way to make mpdboot run properly in this case is to ensure that it uses the system copy of Python. In bash, you can set the path for a single command by putting the assignment in front of it; here, that makes mpdboot find the system Python first:
PATH=/usr/bin:/bin:/opt/intel/impi/3.1/bin64/ mpdboot ...
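To see which Python each node actually finds over a non-interactive ssh, you could run a check like the one below inside a job. It is a sketch: the function name is hypothetical, and it reads the node list from Torque's $PBS_NODEFILE unless you pass a file.

```bash
#!/bin/bash
# check_node_pythons [NODEFILE]   (hypothetical helper)
# For each distinct host in NODEFILE (default: $PBS_NODEFILE), show which
# python a non-interactive ssh finds first on PATH, and its version.
check_node_pythons() {
  local nodefile=${1:-$PBS_NODEFILE} host
  sort -u "$nodefile" | while read -r host; do
    echo "== $host"
    # -n keeps ssh from reading (and swallowing) the rest of the host list.
    ssh -n "$host" which python
    # Older Pythons print the version on stderr, so merge it into stdout.
    ssh -n "$host" python -V 2>&1
  done
}
```

If some nodes report an older python than you expect, the PATH prefix shown above is one way to make mpdboot consistent.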
What variables does PBS define in the job script?
Some of the variables are listed in the qsub documentation, but a good way to see the working environment is to submit a batch job that just runs "env>variables.txt" and then look for the variables whose names start with "PBS_".
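A minimal job script for this, as a sketch: it keeps only the PBS_ entries and writes them to pbs_variables.txt (a hypothetical file name) in the directory you submitted from, which Torque records in PBS_O_WORKDIR. The resource request is only an example.

```bash
#!/bin/bash
#PBS -l nodes=1,walltime=00:05:00
# Record the PBS_* variables Torque defines for this job.
# list_pbs_vars is a hypothetical helper name.
list_pbs_vars() {
  env | grep '^PBS_' | sort
}

# PBS_O_WORKDIR is the directory qsub was run from; fall back to the
# current directory when trying the script outside a job.
list_pbs_vars > "${PBS_O_WORKDIR:-$PWD}/pbs_variables.txt"
```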
No Job Control Warning for CSH and TCSH
The output file from the script starts with the error:
Warning: no access to tty (Bad file descriptor). Thus no job control in this shell.
This warning means that job control (fg, bg, and the & operator) will not work in your script files. If your default user shell is csh or tcsh, the job will try to execute your script with csh or tcsh, and you'll get this warning. Bash doesn't have this problem.
You can force your script to start with the Bash shell using a PBS directive:
#PBS -S /bin/sh
When Torque starts your job, it will now use Bash, but it won't read your .bashrc. If you have startup files that modify the path or set other variables, source them explicitly at the start of your script, after the PBS directives.
Another nice way to ensure your favorite variables are defined is to submit the script with the -V option:
qsub -V batch.sh
This option copies whatever environment variables you have defined in the shell where you run qsub to the script when it runs. In short, if you could run something interactively, it should also run when the scheduler executes the job.
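Putting these pieces together, a job script header might look like the sketch below. It names /bin/bash explicitly in the -S directive (on many Linux systems /bin/sh is also bash, but spelling it out avoids relying on that), and it sources ~/.bashrc itself because Torque does not. The helper name and resource request are examples only.

```bash
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=1,walltime=00:10:00
# -S makes Torque run this script with bash instead of a csh/tcsh login
# shell, but Torque still does not read ~/.bashrc, so load it here.

load_bashrc() {
  if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
  fi
}

load_bashrc
# Your commands follow here.
```

Submitting it with qsub -V batch.sh additionally carries over the environment of the shell you submit from.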
Mpiexec Won't Accept -ppn Argument
The default MPI, Intel MPI, requires that you put the -ppn argument before the -np argument. The nodes have at least three versions of mpiexec installed. The default is Intel MPI under /opt/intel. If you modify your shell's path, in .bashrc or .cshrc, to put /usr/local/bin before the default path, then you may be getting the OSC mpiexec. This version does not depend on mpdboot. It talks directly with Torque to start jobs. A drawback is that the OSC mpiexec, on our system, cannot start more than one job per node. That's why it's not the default one to use.
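Since the mpiexec you get depends on your PATH, a quick check before launching can save confusion. The sketch below (hypothetical function name) reports the first mpiexec on PATH and warns if it is not under /opt/intel, where the default Intel MPI lives.

```bash
#!/bin/bash
# check_mpiexec   (hypothetical helper)
# Report which mpiexec the shell will run, warning if it is not the
# default Intel MPI one under /opt/intel.
check_mpiexec() {
  local exe
  exe=$(command -v mpiexec) || { echo "mpiexec not found on PATH" >&2; return 1; }
  case $exe in
    /opt/intel/*) echo "Intel MPI mpiexec: $exe" ;;
    *) echo "WARNING: not the default Intel MPI mpiexec: $exe" ;;
  esac
}
```

With the Intel MPI version, keep -ppn ahead of -np, for example mpiexec -ppn 4 -np 16 ./my_app to start 16 processes, 4 per node (./my_app is a placeholder).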
I cannot find my output file
If you do not specify an output file when submitting a batch script, then it will automatically produce a file with a name like 110432.scheduler.v4linux.OU in the directory which was the working directory when you submitted your job. If you specify an output file with a command like "#PBS -o out.txt", then that file will be in your $HOME directory. This behavior has changed in recent versions of the scheduler.
Microsoft Visual Studio
Has CAC installed Visual Studio and the Intel compilers on winlogin?
No, not at the present time. This section of the FAQ pertains to Red Cloud users who have installed this software.
Where is nmake?
It is in C:\Program Files\Microsoft Visual Studio\VC98\bin\nmake. Call setup_visualc.bat before using it.
How can you find the cl compiler?
Can't find uuid.lib.
It's in C:\Program Files\Microsoft SDK\lib.
LINK fatal error LNK1201: error writing to program database H:\users\...\some.pdb; check for insufficient disk space, invalid path, or insufficient privilege.
The likely cause is an older version of the file some.pdb. Delete that file and rebuild.
How do I use Intel Fortran at the command line?
First, call setup_intelf32.bat. The compilation command is ifort.
Fortran program gives an access violation. What to do? forrtl: severe (157): Program Exception - access violation
This is a segmentation fault. Look for a place where you are writing past the end of an array.
Fortran program gives stack overflow. What to do? forrtl: severe (170): Program Exception - stack overflow
Increase the space available on the stack with the flag /F<n>, where <n> is the size of the stack in bytes. The default is 1000000. Try /F10000000, and increase further as necessary.
What is the command line syntax to compile a Fortran code with OpenMP?
See the info provided by "ifort -h". There are 4 options beginning with /Qopenmp.
Fortran program gives convergence errors when compiled with /O1, /O2, or /O3.
Add the /Op flag to enable better floating-point precision.
For a Fortran code, how do I set up debugging, either for the Release version in VS or at a command prompt?
Let's say you would like to debug an optimized Intel Fortran code, created either as a Release version in Visual Studio (VS) or at a command prompt with /O2. A Debug version in VS sets the correct debugging flags, but disables optimization. Add the command-line flags /Zi /debug:full /traceback to the Release version. Specify the linker option /pdbfile:filename.pdb to create the program database file. This file and the executable must be copied into the same directory when you run the program.
Can the Intel C compiler handle makefile dependencies without having to use cygwin's makedepend?
Yes. You can use the /QMM compiler option, which is OFF by default.
- /QM - Generates makefile dependency lines for each source file, based on the #include lines found in the source file.
- /QMD - Preprocess and compile. Generate output file (.d extension) containing dependency information.
- /QMF file - Generate makefile dependency information in file. Must specify /QM or /QMM.
- /QMG - Similar to /QM, but treats missing header files as generated files.
- /QMM - Similar to /QM, but does not include system header files.
- /QMMD - Similar to /QMD, but does not include system header files.