Technology Insertion
System Error Log Analysis Focused on Failure Prediction & Prevention
OpenFabrics InfiniBand Subnet Manager Log Analysis
1. Extract the source from opensm-3.1.11.tar
   · The tar file can be obtained from http://www.openfabrics.org/
   · Altap Salamander 2.51 allows a Windows machine to unpack RedHat
2. Run the *Perl script osm_check.pl from within the opensm-3.1.11/opensm subdirectory and redirect its output to a file
   · The resulting output file will contain a list including:
     * Every opensm routine that calls the "osm_log" routine
     * The line number in the routine from which "osm_log" is called
     * The type of logging message (ERROR, DEBUG, INFO, VERBOSE, SYS, FUNCS)
     * The message that is sent to the "osm_log" routine
   (For the purpose of this experiment only OSM_LOG_ERROR messages are of interest and
3. Download the log files from ranger.tacc.utexas.edu
   · Files are in /scratch/projects/slaluer/osm
4. Run the *Perl script Process.pl, which:
   · Gunzips each log file
   · Runs Scan2.pl against it, which:
     * Creates regular expressions from all error messages
     * Scans the log file looking for error messages
     * Keeps a count of:
       - Number of lines in each log file
       - Number of errors found in each file
       - Number of comparisons (log file line vs. error message)
   · Re-gzips the file
   · Repeats for each log file
   · Results are in the file Errors.log
   · Statistics from Errors.log are in osm_log_stats.xlsx
*Note: Perl script may need to be modified to run on your system
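The scan described in step 4 can be sketched in Python under a few assumptions. This is not Scan2.pl or Process.pl: the function names format_to_regex and scan_log are hypothetical, the error messages are assumed to be printf-style format strings, and the log is read directly through the gzip module instead of being gunzipped and re-gzipped on disk. A line counts as an error as soon as one pattern matches it, so later patterns are not compared against that line.

```python
import gzip
import re

# A printf-style conversion such as %s, %d, %u, %016llx, or a literal %%.
CONVERSION_RE = re.compile(r'%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[a-zA-Z%]')


def format_to_regex(fmt):
    """Turn a C format string into a regex matching its rendered output."""
    fmt = fmt.replace('\\n', '').strip()  # drop the C newline escape
    parts = []
    pos = 0
    for m in CONVERSION_RE.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))
        # %% renders as a literal percent sign; anything else is a wildcard.
        parts.append('%' if m.group() == '%%' else '.*?')
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile(''.join(parts))


def scan_log(path, patterns):
    """Return (lines, errors, comparisons) for one gzip-compressed log file."""
    lines = errors = comparisons = 0
    with gzip.open(path, 'rt', errors='replace') as log:
        for line in log:
            lines += 1
            for pattern in patterns:
                comparisons += 1
                if pattern.search(line):
                    errors += 1
                    break  # first matching error message wins
    return lines, errors, comparisons
```

The three counters correspond to the counts listed in step 4: lines per log file, errors found, and line-versus-message comparisons.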
Cayuga (example from CAC—Ranger Data Analysis Workshop October 23-24 2008)
CayugaConfig.xml - a Cayuga configuration file
OpenSMSchema.xml - schema for incoming OpenSM events
SendLogtoCayuga.py - tails the log file and sends each event to Cayuga
SendLogtoCayuga.prop - configuration file for SendLogtoCayuga.py
Log_Sample.txt - sample error messages from an OpenSM log file
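The tailing behaviour attributed to SendLogtoCayuga.py can be illustrated with a short Python sketch. It is not the script's actual code, and how events reach Cayuga is not described here, so the send argument below is a placeholder for whatever transport is used. The names follow and forward_events, and the max_idle_polls parameter, are assumptions added so the example can terminate.

```python
import time


def follow(log, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines appended to an open text file, like `tail -f`.

    max_idle_polls is a hypothetical knob for demos and tests: the generator
    stops after that many consecutive empty reads. None means follow forever.
    To skip existing content, the caller can first call log.seek(0, 2).
    """
    pending = ''
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = log.readline()
        if not chunk:
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += chunk
        # Only emit a line once the writer has finished it.
        if pending.endswith('\n'):
            yield pending.rstrip('\n')
            pending = ''


def forward_events(log, send, **follow_options):
    """Pass each new log line to `send` (placeholder for the Cayuga sender)."""
    count = 0
    for event in follow(log, **follow_options):
        send(event)
        count += 1
    return count
```

For a quick trial, forward_events(open("Log_Sample.txt"), print, max_idle_polls=1) prints each sample line once and returns the number of lines forwarded.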
