Data Analysis and Technology Insertion

Texas Advanced Computing Center Ranger System

 

Cornell University Insignia

Technology Insertion

 

             System Error Log Analysis Focused on Failure Prediction & Prevention

 

                 Open Fabrics Infiniband Subnet Manager Log Analysis

 

1. Extract source from opensm-3.1.11.tar

· Tar file can be obtained from http://www.openfabrics.org/

· Altap Salamander 2.51 allows a Windows machine to unpack Red Hat RPMs: http://www.altap.cz/

2. Run the *Perl Script osm_check.pl from within the opensm-3.1.11/opensm subdirectory and redirect its output to a file

· The resulting output file will contain a list including:

*  Every opensm routine that calls the "osm_log" routine

*  The line number within that routine at which "osm_log" is called

*  The type of logging message it is (ERROR, DEBUG, INFO, VERBOSE, SYS, FUNCS)

*  The message that is sent to the "osm_log" routine

(For the purpose of this experiment only OSM_LOG_ERROR messages are of interest; the resulting file is opensm-3.1.11-error_messages.txt)
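
The extraction in step 2 can be sketched in Python as follows. This is not osm_check.pl itself, only a minimal illustration of the same idea. It assumes each call has the form osm_log(<log pointer>, OSM_LOG_<LEVEL>, "<message>", ...) with the message given as a single string literal. The names OSM_LOG_CALL and find_osm_log_calls are hypothetical, and the sketch reports the source file name rather than the enclosing routine.

```python
import re

# Assumed call shape: osm_log(<log pointer>, OSM_LOG_<LEVEL>, "<message>", ...)
# Only the first string literal of the message is captured; messages built
# from several adjacent literals are truncated by this sketch.
OSM_LOG_CALL = re.compile(
    r'osm_log\s*\(\s*[^,]+,\s*OSM_LOG_(\w+)\s*,\s*"((?:[^"\\]|\\.)*)"'
)


def find_osm_log_calls(source_text, filename="<source>"):
    """Return a (filename, line_number, level, message) tuple per osm_log call."""
    calls = []
    for match in OSM_LOG_CALL.finditer(source_text):
        # Line number of the call = newlines before the match start, plus one.
        line_number = source_text.count("\n", 0, match.start()) + 1
        calls.append((filename, line_number, match.group(1), match.group(2)))
    return calls


def error_calls(calls):
    """Keep only the OSM_LOG_ERROR entries, as the experiment does."""
    return [call for call in calls if call[2] == "ERROR"]
```

Running find_osm_log_calls over the text of each .c file in the opensm subdirectory and passing the combined list to error_calls would give a list comparable in spirit to opensm-3.1.11-error_messages.txt, although the exact layout of that file is not reproduced here.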

 

3.  Download log files from ranger.tacc.utexas.edu

·    Files are in /scratch/projects/slaluer/osm

 

4.  Run the *Perl Script Process.pl which:

· gunzips each log file

· Runs Scan2.pl against it which:

* Creates regular expressions from all error messages in opensm-3.1.11-error_messages.txt

* Scans each log file looking for error messages

* Keeps a count of:

¨ Number of lines in each log file

¨ Number of errors found in each file

¨ Number of comparisons (log file line vs. error message regular expressions)

· Re-gzips the file

· Repeats for each log file.

· Results are in the file Errors.log

· Statistics from Errors.log are in osm_log_stats.xlsx
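
The scan in step 4 can be sketched in Python as follows. This is not Process.pl or Scan2.pl, only a minimal illustration under stated assumptions: the error messages are printf-style format strings, possibly ending in a literal \n escape, and each conversion such as %u or %s can match any non-empty text. The names PRINTF_SPEC, format_to_regex and scan_log are hypothetical. The sketch reads the compressed file directly with gzip.open, so the separate gunzip and re-gzip steps are not needed.

```python
import gzip
import re

# A printf conversion such as %u, %s, %04x or %lu (a simplified subset).
PRINTF_SPEC = re.compile(r'%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[diouxXscp]')


def format_to_regex(message):
    """Turn one printf-style error message into a compiled regular expression."""
    # Drop the C newline escape; log lines carry a real newline instead.
    message = message.replace("\\n", "").strip()
    literal_parts = PRINTF_SPEC.split(message)
    # Escape the literal text and let every conversion match any non-empty run.
    return re.compile(".+?".join(re.escape(part) for part in literal_parts))


def scan_log(path, patterns):
    """Return (line_count, error_count, comparison_count) for one gzipped log."""
    line_count = error_count = comparison_count = 0
    with gzip.open(path, "rt", errors="replace") as log:
        for line in log:
            line_count += 1
            for pattern in patterns:
                comparison_count += 1
                if pattern.search(line):
                    error_count += 1
                    break  # count each log line at most once
    return line_count, error_count, comparison_count
```

Calling scan_log once per log file and writing the three counts to a results file yields the kind of per-file statistics listed above. Stopping at the first matching pattern is a design choice of this sketch; it keeps the comparison count lower than testing every pattern against every line.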

 

                                  *Note: The Perl scripts may need to be modified to run on your system

 

             Cayuga (example from CAC—Ranger Data Analysis Workshop October 23-24 2008)

                

CayugaConfig.xml - a Cayuga configuration file
(runs on Cayuga Server)

OpenSMSchema.xml - schema for incoming OpenSM events
(Cayuga directory on Cayuga Server)

SendLogtoCayuga.py - tails the log file and sends each event to Cayuga
(wherever log files of interest are being created)

SendLogtoCayuga.prop - configuration file for SendLogtoCayuga.py
(wherever log files of interest are being created)

EventHandler.py - reads events sent to it over a socket from Cayuga server
(wherever necessary to handle various events)

OpenSM_query.txt - a simple query that Cayuga will run against the incoming data stream
(Cayuga directory on Cayuga Server)

Log_Sample.txt - sample error messages from an OpenSM log file
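
The tail-and-send role described for SendLogtoCayuga.py can be sketched in Python as follows. This is not the workshop script. The function names, the polling interval, the host and port arguments, and the newline-delimited text sent over the socket are all assumptions made for illustration; the event format Cayuga actually expects is defined by OpenSMSchema.xml and is not reproduced here.

```python
import socket
import time


def read_new_lines(path, offset):
    """Return (complete lines appended after byte offset, new offset)."""
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Keep whole lines only; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def send_lines(sock, lines):
    """Send each line as newline-terminated UTF-8 (assumed wire format)."""
    for line in lines:
        sock.sendall(line.encode("utf-8") + b"\n")


def follow(path, host, port, poll_seconds=1.0):
    """Poll the log file forever and forward new lines to host:port."""
    offset = 0
    with socket.create_connection((host, port)) as sock:
        while True:
            lines, offset = read_new_lines(path, offset)
            send_lines(sock, lines)
            time.sleep(poll_seconds)
```

Tracking a byte offset rather than keeping the file open means a partially written last line is never forwarded. The sketch does not handle log rotation or truncation, which a long-running deployment would need to detect.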