Work stuff


These are some scripts I wrote at work. Since they are probably useful to other people too, I am publishing them here.
In case you're wondering why I publish them here and not at my workplace: personal homepages have been switched off at my employer and "replaced" by an (absolutely useless) blog system.


This is a simple script for checking the status of Knuerr Cool Loops (rack attachments for water cooling). It is really trivial and intended for use with Nagios. Usage instructions are in the comments at the beginning of the file.
[Download check_coolloop.pl 110817]
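Whatever a check measures, Nagios only expects one line of status text on stdout and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of that convention in C, assuming a single temperature reading with warning/critical thresholds. The function check_temperature, the thresholds and the "COOLLOOP" prefix are hypothetical and not necessarily what check_coolloop.pl looks at:

```c
#include <math.h>
#include <stdio.h>

/* Standard Nagios plugin exit codes. */
enum nagios_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/*
 * Hypothetical check: map one temperature reading (degrees Celsius) and
 * two thresholds to a Nagios state, printing the one-line status text
 * that Nagios displays. A NaN reading stands for "sensor not readable".
 */
int check_temperature(double temp_c, double warn_c, double crit_c)
{
    if (isnan(temp_c)) {
        printf("COOLLOOP UNKNOWN - no temperature reading\n");
        return STATE_UNKNOWN;
    }
    if (temp_c >= crit_c) {
        printf("COOLLOOP CRITICAL - temperature %.1f C\n", temp_c);
        return STATE_CRITICAL;
    }
    if (temp_c >= warn_c) {
        printf("COOLLOOP WARNING - temperature %.1f C\n", temp_c);
        return STATE_WARNING;
    }
    printf("COOLLOOP OK - temperature %.1f C\n", temp_c);
    return STATE_OK;
}
```

A plugin's main() would then simply return check_temperature(reading, 30.0, 35.0), with the reading fetched from wherever the device exposes it (the values here are placeholders).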


This is a rewrite in Perl of the mauireswww utility included with the Maui cluster scheduler. The original utility is also written in Perl but is not really configurable, so I decided it was time for a (better) rewrite.
[Download mauireswww-perl 081002]


This little Perl script uses hpacucli, HP's command-line array configuration utility, to read the status of HP RAID controllers. This is handy if you don't want to install the tons of software HP provides (which opens half a gazillion security holes) just to get notified when a disk in your RAID fails.
It has been tested with Smart Array P400 controllers as well as external MSA20 boxes connected to a Smart Array 6400.
It is meant to be run from cron and keeps the last known RAID status in a file. When anything changes, it prints the differences plus some additional information (serial number and whatever else you might need to open a support case) to stdout. Note that it doesn't send mail itself; that is left to cron.
Usage examples (cron lines):
# Make sure cron mails to the proper address

# Check status of disks on onboard controller in slot 1 once a day
49 3 * * *   /root/bin/checkhpraidstatus.pl -d slot=1 -q -f /var/cache/raidstatus-onboardp400

# Check the MSA20 with the chassis number 01 twice a day
29 10,16 * * * /root/bin/checkhpraidstatus.pl -d chassisname=01 -q -f /var/cache/raidstatus-msa20-nr01
Example output:
Changed RAID status detected!

Changed: physicaldrive 1:6 status: from 'OK' to 'Predictive Failure'
    status of physical drive changed (drive failed/replaced?),
    therefore dumping info about it:
        variable                 old value                 new value                    
        bay                      6                         6                            
        drive type               Data Drive                Data Drive                   
    !!! status                   OK                        Predictive Failure        !!!
        sata ncq capable         False                     False                        
        model                    Seagate ST3500641AS       Seagate ST3500641AS          
        firmware revision        3.AJJ                     3.AJJ                        
        serial number            3PM1R4WT                  3PM1R4WT                     
        size                     500.1 GB                  500.1 GB                     
        interface type           SATA                      SATA                         
        box                      1                         1                            

RAID data:
Serial number           : PAAAXXXXXXX999 (E09XXX9XX)
unused space            : 0 MB
expand priority         : Low
status                  : OK
sata ncq supported      : False
rebuild priority        : High
cache board present     : True
array                   : A
interface type          : SATA
host bus adapter slot   : 1b
bus interface           : SCSI
read cache size         : 56 MB
battery status          : OK
hardware revision       : Rev A
controller status       : OK
chassis name            : 03
surface scan delay      : 15 sec
raid 6 (adg) status     : Enabled
accelerator ratio       : 50% Read / 50% Write
write cache size        : 56 MB
battery pack count      : 2
drive write cache       : Disabled
chassis slot            : 2
total cache size        : 112 MB
firmware version        : 1.52
host bus adapter port   : 1
cache status            : OK

Some more hints: the -d parameter is essentially passed through as the controller parameter to hpacucli. For built-in controllers it is usually 'slot=1'. Run 'hpacucli controller all show' to get a list of available controllers.
[Download checkhpraidstatus.pl]
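The change detection described above boils down to comparing the stored snapshot with the current one and printing only the differences, so cron (which mails whatever a job prints) stays silent when nothing happened. A minimal C sketch of the comparison step, assuming both snapshots have already been parsed into key/value pairs. struct kv, report_changes and the "Added"/"Removed" labels are hypothetical, and reading and writing the state file is left out:

```c
#include <stdio.h>
#include <string.h>

/* One "variable: value" pair of a status snapshot (hypothetical representation). */
struct kv {
    const char *key;
    const char *value;
};

/* Return the value stored for key, or NULL if the snapshot lacks it. */
static const char *lookup(const struct kv *snap, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(snap[i].key, key) == 0)
            return snap[i].value;
    return NULL;
}

/*
 * Compare the stored (old) snapshot of one item, e.g. "physicaldrive 1:6",
 * with the current one, print one line per difference and return the
 * number of differences (0 means no output, so cron sends no mail).
 */
int report_changes(const char *item,
                   const struct kv *old, size_t n_old,
                   const struct kv *cur, size_t n_cur)
{
    int changes = 0;

    /* Keys present now: new ones or ones with a different value. */
    for (size_t i = 0; i < n_cur; i++) {
        const char *before = lookup(old, n_old, cur[i].key);
        if (before == NULL) {
            printf("Added: %s %s: '%s'\n", item, cur[i].key, cur[i].value);
            changes++;
        } else if (strcmp(before, cur[i].value) != 0) {
            printf("Changed: %s %s: from '%s' to '%s'\n",
                   item, cur[i].key, before, cur[i].value);
            changes++;
        }
    }

    /* Keys that disappeared since the last run. */
    for (size_t i = 0; i < n_old; i++) {
        if (lookup(cur, n_cur, old[i].key) == NULL) {
            printf("Removed: %s %s: '%s'\n", item, old[i].key, old[i].value);
            changes++;
        }
    }
    return changes;
}
```

Called with item "physicaldrive 1:6", an old snapshot holding status 'OK' and a new one holding 'Predictive Failure', it prints the same "Changed:" line as the example output above.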


This is a rather simple benchmark, written in C using MPI-IO, for testing the performance of (parallel) filesystems. Just compile it with mpicc.
mpiio-bench [--bufsize n] [--numiter n] [--file filename] [--read] [--write] [-v]
Each MPI process writes/reads bufsize bytes numiter times to/from the file and then prints the data rate it achieved. It also checks the data it reads back for errors. -v makes the output more verbose and can be given multiple times.
[Download mpiio-bench.c]
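Checking the data read back without keeping a second copy works if each process writes a pattern it can recompute later. A sketch of that bookkeeping in plain C, assuming an interleaved layout where iteration i of rank r goes to block i * nprocs + r. The layout, the pattern and all function names are hypothetical; mpiio-bench.c may do this differently:

```c
#include <stddef.h>

/*
 * Hypothetical interleaved layout: in iteration iter, process rank writes
 * block number iter * nprocs + rank, so processes never overlap and the
 * file grows by nprocs * bufsize bytes per iteration.
 */
long long block_offset(int rank, int nprocs, int iter, size_t bufsize)
{
    return ((long long)iter * nprocs + rank) * (long long)bufsize;
}

/* Byte i of the buffer written by rank in iteration iter. */
static unsigned char pattern_byte(int rank, int iter, size_t i)
{
    return (unsigned char)((rank * 31u + iter * 7u + i) & 0xffu);
}

/* Fill the write buffer with the expected pattern before each write. */
void fill_buffer(unsigned char *buf, size_t bufsize, int rank, int iter)
{
    for (size_t i = 0; i < bufsize; i++)
        buf[i] = pattern_byte(rank, iter, i);
}

/* Count the bytes of a read-back buffer that do not match the pattern. */
size_t count_errors(const unsigned char *buf, size_t bufsize, int rank, int iter)
{
    size_t errors = 0;
    for (size_t i = 0; i < bufsize; i++)
        if (buf[i] != pattern_byte(rank, iter, i))
            errors++;
    return errors;
}

/* Data rate of one process in MB/s (1 MB = 10^6 bytes). */
double data_rate_mb_s(size_t bufsize, int numiter, double seconds)
{
    return (double)bufsize * numiter / seconds / 1e6;
}
```

In the benchmark loop, each iteration would call fill_buffer() and then MPI_File_write_at(fh, block_offset(rank, nprocs, iter, bufsize), buf, (int)bufsize, MPI_BYTE, &status), with MPI_Wtime() taken before and after the loop to get the seconds for data_rate_mb_s(). The read pass would use MPI_File_read_at() at the same offsets, followed by count_errors(). Giving each process one contiguous region instead of interleaving is the other common choice and stresses the filesystem differently.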


This little tool pins the threads spawned by OpenMP programs to the CPU cores of the system. OpenMP runtimes create their threads with the pthread library, and this utility simply overrides the thread creation function (pthread_create) with one that also does the pinning. Compilation and usage instructions are included in the .c file.
[Download pthread-overload.c]
Note: An improved version of this is now included as likwid-pin in the likwid lightweight performance tools, so there is no reason to use this one anymore.
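For the curious, the overriding trick can be sketched as follows. This is a hypothetical reconstruction, not the code from pthread-overload.c: it defines its own pthread_create, looks up the real one with dlsym(RTLD_NEXT, ...), and starts each new thread in a small trampoline that pins it with pthread_setaffinity_np() before calling the original start routine. The core assignment (thread n goes to core n modulo the number of online CPUs, leaving core 0 to the initial thread) and the helper pin_core_for_thread are assumptions:

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Number of threads created so far through the wrapper. */
static atomic_int pinned_threads;

/* Hypothetical policy: the n-th created thread (n >= 1) goes to core
   n mod ncpus, so core 0 stays with the initial thread (not pinned here). */
int pin_core_for_thread(int n, long ncpus)
{
    return (int)(n % (ncpus > 0 ? ncpus : 1));
}

struct pin_args {
    void *(*start)(void *);
    void *arg;
    int core;
};

/* Runs in the new thread: pin it, then call the real start routine. */
static void *pin_trampoline(void *p)
{
    struct pin_args a = *(struct pin_args *)p;
    free(p);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a.core, &set);
    /* Failure (e.g. core outside the allowed cpuset) is ignored;
       the thread then simply runs unpinned. */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    return a.start(a.arg);
}

typedef int (*create_fn)(pthread_t *, const pthread_attr_t *,
                         void *(*)(void *), void *);

/* Replaces pthread_create for all callers, including the OpenMP runtime. */
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start)(void *), void *arg)
{
    /* The next definition in lookup order is the real pthread_create. */
    create_fn real_create = (create_fn)dlsym(RTLD_NEXT, "pthread_create");
    if (real_create == NULL)
        return ENOSYS;

    struct pin_args *a = malloc(sizeof *a);
    if (a == NULL)
        return EAGAIN;

    int n = atomic_fetch_add(&pinned_threads, 1) + 1;
    a->start = start;
    a->arg = arg;
    a->core = pin_core_for_thread(n, sysconf(_SC_NPROCESSORS_ONLN));

    int rc = real_create(thread, attr, pin_trampoline, a);
    if (rc != 0)
        free(a); /* the trampoline never ran, so free here */
    return rc;
}
```

Built as a shared library (for example gcc -shared -fPIC pin.c -o libpin.so -ldl, file names being placeholders) and loaded with LD_PRELOAD=./libpin.so ./omp_program, such a wrapper catches the threads created by the OpenMP runtime without recompiling the program.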

last modified: 28.08.11 / fox