Troubleshooting
From Cornell CAC Documentation
Account
How can I determine the number of hours left on my allocation?
- Check the account management page at [1].
- When logged into one of the v4 linuxlogin nodes, you can run 'showbalance' to view remaining compute time. (If you have jobs currently running, the showbalance result already deducts the time requested for those jobs, then adjusts to the time actually used once they complete.)
How can I obtain a CAC account?
See Project Requests.
My account is locked.
If it was locked after repeated password failures, it should automatically unlock after 30 minutes. Otherwise: Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
I changed my password. Now I'm locked out.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Batch
My batch job includes vii0047, but I can't login and MPI/Pro says: MPI/Pro error: Failed to login the user on server: vii0047.tc.cornell.edu System Error: Logon failure: the user has not been granted the requested logon type at this com.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Output from batch is not copied to H:.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Allocated 2 nodes, only allowed to use remote desktop connection to master node.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
MPI/Pro Error:SocketException System Error:No connection could be made because the target machine actively refused it.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Copied files to T:\%USERNAME%, but job doesn't give output.
You must cd to T:\%USERNAME% before running the job.
Can't move or delete some files in T: on some batch nodes.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Batch jobs that just disappear from queue, having done nothing.
This can happen when a parameter is set with a space before and after the = sign, which puts a trailing space on the parameter. Remove the spaces.
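Checking a job script for stray spaces around the = sign can be automated. The following is a minimal sketch in Python, not a CAC tool. It assumes the script sets parameters on lines of the form "set NAME=value" or "NAME=value"; the function name find_spaced_assignments and the sample parameter names are hypothetical.

```python
import re

# Matches assignment lines where whitespace sits directly before or after "=".
# The optional leading "set" covers batch-style lines (an assumption about the script format).
SPACED_ASSIGNMENT = re.compile(r"^\s*(?:set\s+)?[\w.]+(\s+=|=\s+)", re.IGNORECASE)


def find_spaced_assignments(script_text):
    """Return the 1-based line numbers whose assignment has whitespace around '='."""
    flagged = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if SPACED_ASSIGNMENT.match(line):
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample: line 1 has spaces around "=", line 2 does not.
    sample = "set NODES = 4\nset WALLTIME=00:30:00\n"
    print(find_spaced_assignments(sample))
```

Any line number the function reports is a candidate for the fix described above: remove the spaces on both sides of the = sign.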
What can users do about the long time it takes for jobs to clear?
See the "MPI Cleanup" tip at http://www.tc.cornell.edu/services/support/batch/faster_cleanup.asp
Is there a way to make the copyback.bat file (which copies the output files periodically) copy output from all the nodes to the H: drive?
Yes. Use: start /b mpirun -np N parallel_copyback.bat
Need to have different files for each process. How to do this? Problem doing this by a system call in C++ program.
As part of the setup file, use the commands:
cd /D T:
del /Q T:\%USERNAME%
mkdir T:\%USERNAME%\%MSTI_RANK%
copy files.* T:\%USERNAME%\%MSTI_RANK%
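The same per-process layout (one directory per rank, each holding its own copy of the input files) can be sketched in Python for readers who prefer to stage files from a script. This is an illustrative sketch, not the CAC setup file. The function name stage_rank_files and the demo paths are hypothetical; MSTI_RANK is read from the environment as in the commands above, with a placeholder fallback of 0.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_rank_files(base_dir, rank, source_dir, pattern="files.*"):
    """Copy files matching `pattern` into a fresh per-rank directory under base_dir."""
    rank_dir = Path(base_dir) / str(rank)
    # Start from an empty directory so files from an earlier run are not reused.
    if rank_dir.exists():
        shutil.rmtree(rank_dir)
    rank_dir.mkdir(parents=True)
    for src in sorted(Path(source_dir).glob(pattern)):
        if src.is_file():
            shutil.copy(src, rank_dir / src.name)
    return rank_dir


if __name__ == "__main__":
    # Dry run in a temporary directory; on a batch node the base would be T:\%USERNAME%.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "input"
        source.mkdir()
        (source / "files.dat").write_text("example input\n")
        rank = os.environ.get("MSTI_RANK", "0")
        staged = stage_rank_files(Path(tmp) / "scratch", rank, source)
        print(sorted(p.name for p in staged.iterdir()))
```

Each process then reads and writes only inside its own rank directory, so no two processes share a file.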
Jobs are stuck clearing.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
How to direct jobs from remote machines to CAC for batch? Need software on CAC batch nodes.
This can be done; the consultants can explain how.
What do I need to do to use v3?
See http://www.tc.cornell.edu/Services/Policies/Pages/usage.htm
Copy of executable and input files failed on vi0004.
System problem. Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
I have an error about the path when connecting to a batch machine.
Check your userlogin.bat file. It may contain a reference to Visual Studio (VS), but VS is not installed on batch nodes. Change the line to "call setup_visualc.bat", or call a different setup file as appropriate.
Can I telnet to batch machines?
No. You need to use a remote desktop connection from a login node to login interactively to a machine on which you have a job running.
Compilers
Where is nmake?
C:\Program Files\Microsoft Visual Studio\VC98\bin\nmake. Call setup_visualc.bat
How can you find the cl compiler?
Call setup_visualc.bat
forrtl: severe (157): Program Exception - access violation
This is a segmentation fault. Look for a place where the program writes more data than was declared, for example past the end of an array.
Trouble with stack overflow in a Compaq Visual Fortran program.
Increase the stack reserve quota, through a flag to nmake or using editbin.
Intel 8.1 compiler gives stack overflow. Intel 7.1 fine. What to do? 0: forrtl: severe (170): Program Exception - stack overflow
Increase the space available on the stack with the flag /F<n>, where <n> is the size of the stack in bytes. The default is 1000000. Try /F10000000. Increase as necessary.
Can't find uuid.lib.
It's in C:\Program Files\Microsoft SDK\lib on the login nodes.
LINK fatal error LNK1201: error writing to program database H:\users\...\some.pdb; check for insufficient disk space, invalid path, or insufficient privilege.
An older version of the file some.pdb may be in the way. Delete that file and rebuild.
How do I use Intel Fortran at the command line?
First, call setup_intelf32.bat. The compilation command is ifort.
What is the command line syntax to compile with OpenMP?
See the info provided by "ifort -h". There are 4 options beginning with /Qopenmp.
Does the CAC have a tutorial on OpenMP with Fortran?
No, we don't. The focus is on MPI.
Getting convergence errors with Intel 8.1 Fortran with /O1, /O2, /O3. Answer comes out OK. Performance not obviously degraded. How can I fix this so that I don't get the errors?
Add the /Op flag to enable better floating-point precision. The convergence errors disappear.
I would like to debug an optimized Intel Fortran code, compiled with a flag such as /O2 , created either as a Release version in Visual Studio (VS) or at a command prompt. A Debug version in VS sets the correct debugging flags, but disables optimization. How do I set the appropriate debugging environment for a Release version in VS or at a command prompt?
Add the command-line flags /Zi /debug:full /traceback. Specify the linker option /pdbfile:filename.pdb to create the program database file. This file and the executable must be copied to the same directory on T: when you run the program.
Can the Intel C compiler handle makefile dependencies without having to use cygwin's makedepend?
Yes. You can use the /QMM compiler option, which is OFF by default.
- /QM - Generates makefile dependency lines for each source file, based on the #include lines found in the source file.
- /QMD - Preprocess and compile. Generate output file (.d extension) containing dependency information.
- /QMF file - Generate makefile dependency information in file. Must specify /QM or /QMM.
- /QMG - Similar to /QM, but treats missing header files as generated files.
- /QMM - Similar to /QM, but does not include system header files.
- /QMMD - Similar to /QMD, but does not include system header files.
Files
How can I copy files to my desktop from H:?
Use an SSH client to sftp the files. See File_Transfer_To_Clusters.
Can't use scp to transfer files to the CAC.
Use sftp.
Problems using WinSCP.
Use sftp.
Showed how to use outgoing ftp folder and sent detailed instructions by email.
Can't access files.
System problem. Send email to consult@tc.cornell.edu.
Can see files in Explorer, but dir at the command prompt shows only the files in the home directory.
The user had navigated to Start | Run and typed "command". Type "cmd" instead.
H: Network Drive
Mapping H:, can't see files.
Make sure that the DNS settings are correct. Look under Home_Directory_Access for DNS instructions.
Can't map H: any more. Nothing changed.
The password may have expired. Connect to a login node with RDC to change the password, then map the drive.
Can't find H:.
Send e-mail to consult@tc.cornell.edu.
Problems mapping H:. Can see files in CAC Tools but not home directory.
Disconnect H: and remap.
At home, can see the home directory, but no files.
Only certain domains can map H:. From home, you need to connect through VPN.
Can't see the files in one of his directories.
Permissions problem. Send email to useracct@tc.cornell.edu
Mapping H: with correct DNS settings, but can't see files.
Send email to consult@tc.cornell.edu.
Cannot see files on H:
Send email to consult@tc.cornell.edu.
User can now map drive but cannot enter directory. Files are located on ctcfsrv8\tc_k.
The user needs correct DNS settings. The user resolved the problem by pointing to 128.84.5.28 (ctcfsrv8) in the hosts file.
Can't map H: with DFS in Mac OS X.
Mac users need to obtain Thursby to map H:.
Using rover, pointing to the ctc winsserver does not allow him to see files when mapping H:.
Try mapping ctcfsrv8, which is where the files are. This worked. You can't use the DNS settings with rover unless you are using VPN, because rover isn't trusted the way Cornell IP addresses are.
Login
Which machines are the login nodes?
ctclogina, ctcloginb, ctcloginc, ctclogind.
Can't use login machine because of compute-bound processes on the machine.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Can't get to login node with RDC. Times out.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
rdesktop gives an error message:
$ rdesktop ctclogina.tc.cornell.edu
ERROR: connect: Connection timed out
A firewall may be blocking outgoing connections.
Can't connect to login node.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Can't login using ssh.
Try a different type of connection and see whether you need to change your password. Otherwise send email to useracct@tc.cornell.edu and ask to have your password reset.
Could not get to scicenter2(sp?) machine, could yesterday
Connect to a login node through terminal services and change your password.
Can't get to ctclogina. Gets error msg "The specified remote computer could not be found."
Use complete name ctclogina.tc.cornell.edu.
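As an illustration of the difference between the short and the complete name, the sketch below appends the domain to each login node name. It is a minimal Python example; the helper name fully_qualified is hypothetical, and the default domain is taken from the complete name given above.

```python
def fully_qualified(host, domain="tc.cornell.edu"):
    """Append the domain to a short host name; leave qualified names unchanged."""
    host = host.strip().rstrip(".")
    if "." in host:
        return host
    return f"{host}.{domain}"


if __name__ == "__main__":
    # Login node names as listed under "Which machines are the login nodes?"
    for node in ["ctclogina", "ctcloginb", "ctcloginc", "ctclogind"]:
        print(fully_qualified(node))
```

Passing a name that already contains the domain returns it unchanged, so the helper is safe to apply to either form.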
mpirun command not found on login node.
This is as expected. Don't run jobs on login node.
Connect from login node to batch node. Disconnect from login node. At reconnect, session hung. Can't close window or logoff.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
I have a disconnected session on a login node. When I reconnect, the login screen is blank. What should I do?
Issue ctrl-shift-esc to bring up Task Manager. Select the Applications tab, then New Task. Enter "explorer" and click OK. A normal desktop should reappear. If it doesn't, send e-mail to consult@tc.cornell.edu and ask to be logged off.
I have a login process on ctcloginb that I can not log off.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Can I debug on login nodes in Visual Studio?
No. Debugging is not permitted on login nodes. Use the collaboratory instead.
Can't use rdc to login node.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Can't close command windows on login node.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Why does RDC to ctclogina fail?
You may need to use the fully qualified name ctclogina.tc.cornell.edu.
Password
Forgot password.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Problems with new password.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Need password reset.
Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Changed password. Now locked out.
There may be disconnected sessions that the CTC will have to kill. Contact consultants by submitting a ticket on our issue tracking system or calling 607.254.8686.
Are my login id and password the same for all machines?
Yes. For an ssh connection give your login id at the prompt. With a Windows GUI, specify the User Name as <login_id>@tc.cornell.edu or CTC_ITH\<login_id>.
When I use a Remote Desktop Client to connect to winx64login, it says that my username/password are incorrect.
Make sure that you are logging in using the CTC_ITH domain. If you just put your username in the "username" box, it will try to log you into winx64login as a local user, which won't work. Put CTC_ITH\<username> in the "username" box.
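The two User Name forms accepted by a Windows GUI login can be built from a login id as follows. This is a small Python sketch for illustration only; the function name windows_user_names and the sample login id abc123 are hypothetical.

```python
def windows_user_names(login_id, domain="CTC_ITH", realm="tc.cornell.edu"):
    """Return the two User Name forms: DOMAIN\\login_id and login_id@realm."""
    login_id = login_id.strip()
    return [f"{domain}\\{login_id}", f"{login_id}@{realm}"]


if __name__ == "__main__":
    # "abc123" is a made-up login id used only for the demonstration.
    for name in windows_user_names("abc123"):
        print(name)
```

Either form tells the login node to authenticate against the domain rather than as a local user.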
Web
User wants access to CAC web space for a personal web page.
This is available only for CAC personnel.
Old links break on new CAC web site.
Navigate from the CAC home page.