|
|
(8 intermediate revisions by 2 users not shown) |
Line 11: |
Line 11: |
| | | |
| ====I forgot my password, or have problems with a new password, or need a password reset.==== | | ====I forgot my password, or have problems with a new password, or need a password reset.==== |
− | {{ContactCAC}} | + | [https://{{SERVERNAME}}/services/myacct.aspx Manage CAC Password] |
− | | |
− | ====Are my login id and password the same for all machines?====
| |
− | Yes. For an ssh connection give your login id at the prompt. With a Windows GUI, specify the username as CTC_ITH\<login_id> or <login_id>@tc.cornell.edu.
| |
| | | |
− | ====When I try to use a Remote Desktop client to connect to winlogin, it tells me that my username/password are incorrect.==== | + | ====Are my login id and password the same for all CAC managed machines?==== |
− | Make sure that you are logging in using the CTC_ITH domain. If you just put your username in the "username" box, it will try to log you into winlogin as a local user, which won't work. Put CTC_ITH\<username> in the "username" box.
| + | Yes. For an ssh connection give your CAC login id at the prompt. |
| | | |
| | | |
| =Files= | | =Files= |
| | | |
− | ====How can I copy files to my desktop from H:?====
| + | ====Needed to share a file with a colleague outside the university. This is typically provided via Globus.==== |
− | Use an SSH client to transfer files with sftp. See [[File_Transfer_To_Clusters]].
| + | {{ContactCAC}} |
− | | |
− | ====Can't use scp to transfer files to the CAC.====
| |
− | Use sftp.
| |
− | | |
− | ====Problems using WinSCP.====
| |
− | Use sftp.
| |
− | | |
− | ====Needed to share a file with a colleague outside the university. This is typically available only to CAC personnel.==== | |
− | Showed how to use outgoing ftp folder and sent detailed instructions by email.
| |
| | | |
| ====Can't access files.==== | | ====Can't access files.==== |
− | System problem. Send email to consult@tc.cornell.edu. | + | System problem. {{ContactCAC}} |
| | | |
− | ====Can see files in explorer, but sees files only in home directory with dir at command prompt.==== | + | ====Why use a temporary directory (/tmp) on a cluster compute node?==== |
− | The user had navigated to Start | Run and typed the command "command". Use the command "cmd" instead.
| + | :* $HOME is network mounted on cluster compute nodes, and local file I/O is faster than I/O over the network. Researchers should copy data from $HOME to the compute node's /tmp at the beginning of the job, run the job from /tmp, and copy the results from /tmp back to $HOME at the end of the job. |
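The stage-in, run, stage-out pattern described above might look like the following in a job script. This is a minimal sketch, not a CAC-provided tool: the function name <code>stage_and_run</code>, the <code>results</code> subdirectory convention, and the directory names in the usage note are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not a CAC tool): stage inputs to node-local /tmp,
# run a command there, then copy its results back to network storage.
stage_and_run() {
    local input_dir="$1" output_dir="$2"
    shift 2
    local scratch
    scratch="$(mktemp -d /tmp/job.XXXXXX)" || return 1

    # Stage in: copy inputs from the network-mounted directory to local disk.
    cp -r "$input_dir"/. "$scratch"/ || return 1

    # Run the command from the scratch directory.
    # Assumed convention: the command writes its output under ./results.
    mkdir -p "$scratch/results"
    ( cd "$scratch" && "$@" )
    local status=$?

    # Stage out: copy results back, then remove the scratch directory.
    mkdir -p "$output_dir" && cp -r "$scratch/results"/. "$output_dir"/
    rm -rf "$scratch"
    return "$status"
}
```

For example, <code>stage_and_run "$HOME/input" "$HOME/output" sh -c 'sort data.txt > results/sorted.txt'</code> would sort a file on local disk and leave only the sorted copy in <code>$HOME/output</code>. The scratch directory is removed at the end, so nothing is left behind in /tmp.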
| | | |
− | ====How Do I Transfer Files To and From CAC Machines?====
| |
− | # '''Use a program to send them''' - [[SecureShell]]
| |
− | #* Faster over slower connections.
| |
− | #* Less hassle.
| |
− | # '''Make your CAC home directory look like a local drive''' - [[FileAccess]]
| |
− | #* Works fine on campus.
| |
− | #* Convenient for editing.
| |
| | | |
| If you have any questions, please [mailto:help@cac.cornell.edu?subject=CAC Web site contact Send email] or call 607.254.8686. | | If you have any questions, please [mailto:help@cac.cornell.edu?subject=CAC Web site contact Send email] or call 607.254.8686. |
− |
| |
− | ====Why use a temporary directory====
| |
− | '''''It is faster to perform local file I/O and copy complete data files to/from $HOME at the beginning and the end of the job, rather than perform I/O over the network ($HOME is network mounted on the compute nodes).'''''
| |
− | '''
| |
− |
| |
− | * Torque creates a uniquely named directory (/tmp/$PBS_JOBID) when a job starts and stores the path of this directory in the $TMPDIR environment variable. This directory is cleaned up when the job exits.
| |
− | ** To use this feature, reference $TMPDIR
| |
− |
| |
− | * You may create directories for file reads and writes in /tmp outside your /tmp/$PBS_JOBID directory. Leaving data there is risky, however: it may be deleted at any time if we see /tmp getting full.
| |
| | | |
| = Red Cloud = | | = Red Cloud = |
| | | |
− | == Connecting to Instances == | + | == Getting Started == |
| | | |
− | First, ensure that the [[OpenStack#Instances|instance]] has finished being created by checking in the [[OpenStack#Using_the_OpenStack_Web_Interface_.28Horizon.29|OpenStack Web Interface]] or the [[OpenStack CLI]]. Next, refer to the available documentation for [[Red_Cloud_Linux_Instances#Accessing_Instances|accessing Linux instances]] or [[Red_Cloud_Windows_Instances#Accessing_Instances|accessing windows instances]]. If you are having trouble connecting to your instance, please review this documentation first to ensure you're following the correct steps. If you have created a [[Red_Cloud_Linux_Instances|Linux instance]] and are having trouble connecting via <code>ssh</code>, try the [[Red_Cloud_Linux_Instances#Troubleshooting|troubleshooting steps]]. If you are still having trouble, {{ContactCAC}}.
| + | [[Red_Cloud#New_Users|New to Red Cloud]]? The best way to get started is to read the documentation and try things out. Here is a suggested list of pages to look over to help with getting started managing resources. |
| | | |
− | =Linux Batch= | + | === Suggested Reading === |
− | ==Scheduler Frequently Asked Questions== | |
− | {{ContactCAC}}
| |
− | ====Why are you using Maui and Torque now?====
| |
− | We have switched to using a nationally recognized resource manager and scheduler in order to make the usage of our systems align more closely with the national community. This also allows us to leverage the considerable capabilities of the Maui software to ensure optimal and flexible use of our systems.
| |
− | ====When's my job going to run?====
| |
− | If you have already submitted your job and you'd like to know that, use the '''showstart''' command to find estimated start times. If you are trying to decide where to run your job so that it runs the soonest, you'll want to examine the '''showbf''' command. This allows you to search for when a job with particular resource requirements will run.
| |
− | ====Why is my job stuck in the queue?====
| |
− | Sometimes your job doesn't run, even though it looks like it should. Maybe there are few jobs running in the cluster, and your job still won't run.
| |
− | # Find your jobids with "showq -u username"
| |
− | # Use "checkjob -v jobid" to examine one of the jobs. [[Examining Checkjob -v]] discusses how to read this output.
| |
− | Jobs in the "Batch Hold" state initiate emails to the system administrators. For other problems, contact CAC help.
| |
− | ====Why is my job deferred?====
| |
− | There can be several reasons for a job to defer. Sometimes when the Maui scheduler's queue is full, two jobs attempt to start on a node at the same time, and one will switch to being deferred. On this occasion, if you type "checkjob -v <jobid>", you will see, at the bottom, the message:
| |
− | Message[0] job rejected by RM 'scheduler' - job started on hostlist
| |
− | compute-3-40.v4linux,compute-3-37.v4linux,compute-3-35.v4linux,compute-3-34.v4linux
| |
− | at time 13:11:22_07/20, job reported idle at time 13:11:53_07/20 (see RM logs for details)
| |
− | In this case, the only way to make this job run is to notify help at CAC.
| |
− | ====What are the queues/affiliations?====
| |
− | Affiliations was the term used by the vsched scheduler to indicate the name of the queue that jobs were submitted to. Most schedulers use the term queue (The scheduler also uses the term "class" to represent the same entity), so you can substitute the word you prefer. V4 queues are listed on the [[v4 Linux Cluster]] page.
| |
| | | |
− | ====When I try to run mpdboot I get an error regarding bad python version====
| + | # [[Red Cloud]] - includes information about: |
− | This type of message goes on to say, "You can't run mpdboot on ['compute-3-44.v4linux'] version of python must be >= 2.4, current..." Mpdboot uses python and ssh to start MPI daemons on all nodes of your job. It begins by using ssh to ask what version of python is running on each node.
| + | #* The [[Red_Cloud#First_Time_Login|first time you log in]] to your [https://www.cac.cornell.edu/services/myacct.aspx CAC Account] |
| + | #* [[Red_Cloud#How_to_Create_and_Manage_Red_Cloud_Resources|Create/Manage resources]] |
| + | #* [[Red_Cloud#How_to_Access_Instances|Accessing instances]] |
| + | #* [[Red_Cloud#Accounting:_Don.27t_Use_Up_Your_Subscription_by_Accident.21|Accounting]] |
| + | # [[OpenStack]] - a '''highly recommended''' quick-start page including instructions for: |
| + | #* The [[OpenStack#Using_the_OpenStack_Web_Interface_.28Horizon.29|Web interface]] |
| + | #* [[OpenStack#Instances|Managing instances]] including: |
| + | #** [[OpenStack#Launching an Instance|launching a new instance]] |
| + | #** [[OpenStack#Instance_States|changing instance state]] |
| + | # Either instructions for [[Red_Cloud_Linux_Instances|Linux Instances]] OR [[Red_Cloud_Windows_Instances|Windows Instances]] |
| | | |
− | Usually, this error means that ssh is having a problem establishing communication for the
| + | === Other Useful References === |
− | mpds. First, make sure you added "-r ssh" to your mpdboot line. If that
| |
− | looks OK, then try to rename (mv) the .ssh directory in your home directory
| |
− | to something like .ssh_bak. Log out, and log back in. A new .ssh
| |
− | directory should be recreated for you automatically (you can verify with
| |
− | "ls -la") which should have valid keys in it.
| |
| | | |
− | You may also get this error if you are using a version of Python which does not work with mpdboot. In general, mpdboot needs python 2.3 or newer, but it gets very picky about versions newer than 2.4, as well. If you are trying to run Python 2.5 or 2.6 from your own directory, sometimes mpdboot will find only older versions when it does ssh to the other nodes in your job (because a non-interactive ssh can have a different path). One way to ensure mpdboot runs properly in this case is to ensure it uses the system copy of python. In bash, you can set the path for a command before you invoke it, here so that the system Python is used.
| + | :* [[Linux Tutorial]] |
− | PATH=/usr/bin:/bin:/opt/intel/impi/3.1/bin64/ mpdboot ...
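The per-command PATH technique can be checked in isolation before using it with mpdboot. The sketch below is only an illustration: <code>run_with_system_path</code> is a hypothetical helper name, and it uses a shortened PATH of /usr/bin:/bin rather than the Intel MPI directory shown above.

```shell
#!/bin/bash
# Hypothetical helper: run one command with a minimal PATH so that it finds
# the system copies of tools first. The calling shell's PATH is not changed.
run_with_system_path() {
    env PATH=/usr/bin:/bin "$@"
}
```

Running <code>run_with_system_path sh -c 'echo "$PATH"'</code> shows the PATH that the child process sees, and echoing <code>$PATH</code> afterwards confirms that the login shell still has its original value.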
| + | :* [[Resizing an Instance|Resizing your instance]] |
| + | :* [[OpenStack Key Pairs| Key Pairs]] |
| + | :* [[OpenStack Security Groups| Security Groups]] |
| + | :* [[Volumes]] |
| + | :* [[Images]] |
| + | :* [[Networks]] |
| + | :* [[OpenStack CLI]] |
| | | |
− | ====What variables does PBS define in the job script?==== | + | == Connecting to Instances == |
− | Some of the variables are listed in [http://www.adaptivecomputing.com/resources/docs/torque/2-5-9/commands/qsub.php qsub documentation] but a good way to see the working environment is to submit a batch job which just does "env>variables.txt" and look for the ones starting in "PBS_".
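A slightly narrower variant of the "env>variables.txt" suggestion is to filter for the PBS_ prefix directly. This is a minimal sketch; <code>list_pbs_vars</code> is a hypothetical helper name and is not part of Torque.

```shell
#!/bin/bash
# Hypothetical helper: print every environment variable whose name starts
# with PBS_, sorted by name. Prints nothing when run outside a batch job.
list_pbs_vars() {
    env | grep '^PBS_' | sort
}
```

Calling <code>list_pbs_vars > variables.txt</code> from inside a batch script would save just the scheduler-defined variables for later inspection.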
| |
− | | |
− | ====No Job Control Warning for CSH and TCSH====
| |
− | The output file from the script starts with the error:
| |
− | Warning: no access to tty (Bad file descriptor).
| |
− | Thus no job control in this shell.
| |
− | This warning means that the <tt>fg</tt>, <tt>bg</tt>, and ampersand will not work in your script files. If your default user shell is csh or tcsh, the job will try to execute your script using csh or tcsh, and you'll get this warning. Bash doesn't have this problem.
| |
− | | |
− | You can force your script to start with the Bash shell using a PBS directive:
| |
− | #PBS -S /bin/bash
| |
− | When Torque starts your job, it will now use Bash, but it won't actually call your .bashrc. If you have any startup files to modify the path or set other variables, you can add to the start of your script, after the PBS directives:
| |
− | source ~/.bashrc
| |
− | | |
− | Another nice way to ensure your favorite variables are defined is to submit the script with the -V option:
| |
− | nsub -V batch.sh
| |
− | This option copies whatever environment variables you have defined on the command line to the script when it runs. In short, if you could run something interactively, it should run when the scheduler executes the job.
| |
− | | |
− | ====Mpiexec Won't Accept -ppn Argument====
| |
− | The default MPI, Intel MPI, requires that you put the -ppn argument before the -np argument.
| |
− | The nodes have at least three versions of mpiexec installed. The default is Intel MPI under /opt/intel. If you modify your shell's path, in .bashrc or .cshrc, to put /usr/local/bin before the default path, then you may be getting the [http://www.osc.edu/~pw/mpiexec/ OSC mpiexec]. This version does not depend on mpdboot. It talks directly with Torque to start jobs. A drawback is that the OSC mpiexec, on our system, cannot start more than one job per node. That's why it's not the default one to use.
| |
| | | |
− | ====I cannot find my output file====
| + | First, ensure that the [[OpenStack#Instances|instance]] has finished being created by checking in the [[OpenStack#Using_the_OpenStack_Web_Interface_.28Horizon.29|OpenStack Web Interface]] or the [[OpenStack CLI]]. Next, refer to the available documentation for [[Red_Cloud_Linux_Instances#Accessing_Instances|accessing Linux instances]] or [[Red_Cloud_Windows_Instances#Accessing_Instances|accessing Windows instances]]. If you are having trouble connecting to your instance, please review this documentation first to ensure you're following the correct steps. If you have created a [[Red_Cloud_Linux_Instances|Linux instance]] and are having trouble connecting via <code>ssh</code>, try the [[Red_Cloud_Linux_Instances#Troubleshooting|troubleshooting steps]]. If you are still having trouble, {{ContactCAC}}. |
− | If you do not specify an output file when submitting a batch script, then it will automatically produce a file with a name like 110432.scheduler.v4linux.OU in the directory which was the working directory when you submitted your job. If you specify an output file with a command like "#PBS -o out.txt", then that file will be in your $HOME directory. This behavior has changed in recent versions of the scheduler.
| |
| | | |
| | | |
| {{Template:ContactCAC}} | | {{Template:ContactCAC}} |
− |
| |
− | =Microsoft Visual Studio=
| |
− | ====Has CAC installed Visual Studio and the Intel compilers on winlogin?====
| |
− | No, not at the present time. This section of the FAQ pertains to Red Cloud users who have installed this software.
| |
− |
| |
− | ====Where is nmake?====
| |
− | C:\Program Files\Microsoft Visual Studio\VC98\bin\nmake. Call setup_visualc.bat
| |
− |
| |
− | ====How can you find the cl compiler?====
| |
− | Call setup_visualc.bat
| |
− |
| |
− | ====Can't find uuid.lib.====
| |
− | It's in C:\Program Files\Microsoft SDK\lib.
| |
− |
| |
− | ====LINK fatal error LNK1201: error writing to program database H:\users\...\some.pdb; check for insufficient disk space, invalid path, or insufficient privilege.====
| |
− | Suspicion is that there is an older version of the file some.pdb. Delete that file and rebuild.
| |
− |
| |
− | ====How do I use Intel Fortran at the command line?====
| |
− | First, call setup_intelf32.bat. The compilation command is ifort.
| |
− |
| |
− | ====Fortran program gives an access violation. What to do? forrtl: severe (157): Program Exception - access violation====
| |
− | Segmentation fault. Look for a place where you are writing past the end of an array.
| |
− |
| |
− | ====Fortran program gives stack overflow. What to do? forrtl: severe (170): Program Exception - stack overflow====
| |
− | Increase the space available on the stack with the flag /Fn, where n is the size of the stack in bytes. The default is 1000000. Try /F10000000. Increase as necessary.
| |
− |
| |
− | ====What is the command line syntax to compile a Fortran code with OpenMP?====
| |
− | See the info provided by "ifort -h". There are 4 options beginning with /Qopenmp.
| |
− |
| |
− | ====Fortran program gives convergence errors when compiled with /O1, /O2, /O3.====
| |
− | Add /Op flag to enable better floating point precision.
| |
− |
| |
− | ====For a Fortran code, how do I set up debugging, either for the Release version in VS or at a command prompt?====
| |
− | Let's say you would like to debug an optimized Intel Fortran code, created either as a Release version in Visual Studio (VS) or at a command prompt with /O2. A Debug version in VS sets the correct debugging flags, but disables optimization. Add the command-line flags /Zi /debug:full /traceback to the Release version. Specify the linker option /pdbfile:filename.pdb to create the program database file. This file and the executable must be copied into the same directory when you run the program.
| |
− |
| |
− | ==== Can the Intel C compiler handle makefile dependencies without having to use cygwin's makedepend?====
| |
− | Yes. You can use the /QMM compiler option, which is OFF by default.
| |
− | * /QM - Generates makefile dependency lines for each source file, based on the #include lines found in the source file.
| |
− | * /QMD - Preprocess and compile. Generate output file (.d extension) containing dependency information.
| |
− | * /QMF file - Generate makefile dependency information in file. Must specify /QM or /QMM.
| |
− | * /QMG - Similar to /QM, but treats missing header files as generated files.
| |
− | * /QMM - Similar to /QM, but does not include system header files.
| |
− | * /QMMD - Similar to /QMD, but does not include system header files.
| |