Linux Guide/Monitoring

      Introduction

      This page is in a TODO state: anyone is free to complete or contribute to it. For now (2010-06-11) it contains random notes I have been collecting over time.

      TODO is a mark meaning "to do" (it is automatically recognized by some editing tools as a pending task).

      HARDWARE MONITORING

      Rescanning the SCSI Bus

      Scripts to rescan the SCSI bus in Linux are available on the net, but a simpler way usually works (run it as root, replacing host0 with the host adapter you want to rescan):

      echo "- - -" > /sys/class/scsi_host/host0/scan
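      The echo above only rescans host0. A minimal sketch that repeats it for every SCSI host adapter found in sysfs follows. The function name rescan_all_scsi_hosts and the SCSI_HOST_DIR override are illustrative assumptions, not a standard interface; writing to the real sysfs files requires root.

```shell
# Rescan every SCSI host instead of only host0.
# The three "-" fields are wildcards for channel, target ID and LUN.
# SCSI_HOST_DIR is an illustrative override (not a standard variable); it
# defaults to the real sysfs path, where writing requires root.
SCSI_HOST_DIR="${SCSI_HOST_DIR:-/sys/class/scsi_host}"

rescan_all_scsi_hosts() {
    for host in "$SCSI_HOST_DIR"/host*; do
        # If nothing matched, the glob stays literal and has no scan file.
        [ -e "$host/scan" ] || continue
        echo "rescanning ${host##*/}"
        echo "- - -" > "$host/scan"
    done
}
```

      Source the snippet and run rescan_all_scsi_hosts as root; newly attached LUNs should then appear in /proc/partitions.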
      

      A slightly more complex example for a QLogic card:

      #!/bin/bash
      # Ask every QLogic (qla2xxx) HBA listed in /proc to rescan its bus
      for HBA in /proc/scsi/qla2xxx/*
      do
         echo "scsi-qlascan" > "$HBA"
      done
      

      Alternatively, for iSCSI storage, iscsiadm (from open-iscsi) can be used if available, to discover the targets and log in to them:

       iscsiadm -m discovery -t sendtargets -p <IP>
       iscsiadm -m node -T <targetname> -p <IP> --login
      

      Amongst other documents available on the net, the Red Hat Enterprise Linux 5 Online Storage Reconfiguration Guide can also be a useful help.


      DMIDECODE

      Dmidecode reports information about your system's hardware as described in your system BIOS according to the SMBIOS/DMI standard. This information typically includes the system manufacturer, model name, serial number, BIOS version and asset tag, as well as many other details whose interest and reliability vary by manufacturer. It often also includes the usage status of the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).
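      As an illustration of the memory slot information, a small sketch that summarises slot usage from "dmidecode -t memory" output. It assumes each slot appears as a Memory Device entry with a "Size:" line, empty slots reading "No Module Installed"; the function name count_dimm_slots is hypothetical.

```shell
# Summarise memory slot usage from `dmidecode -t memory` output on stdin.
# Each "Memory Device" entry has a "Size:" line; empty slots report
# "Size: No Module Installed".
count_dimm_slots() {
    awk '
        /^[ \t]*Size:/ {
            total++
            if ($0 ~ /No Module Installed/) empty++
        }
        END { printf "slots=%d used=%d empty=%d\n", total, total - empty, empty }
    '
}
```

      Run it as root, e.g. dmidecode -t memory | count_dimm_slots. Single values can also be queried directly, e.g. dmidecode -s system-serial-number.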


      TODO IPMI

         What is IPMI?
         
         The Intelligent Platform Management Interface (IPMI) specification
         defines a set of interfaces for platform management.  It is
         implemented by a large number of hardware manufacturers to support
         system management on motherboards. The features of IPMI that most
         users will be interested in are sensor monitoring (e.g. CPU
         temperatures, fan speeds), remote power control, and serial-over-LAN
         (SOL).
         
         What is FreeIPMI?
         
         FreeIPMI provides in-band and out-of-band IPMI software based on the
         IPMI v1.5/2.0 specification.  FreeIPMI provides tools and libraries
         for users to access and read IPMI sensor readings, system event log
         (SEL) entries, serial-over-LAN (SOL), remote power control functions,
         field replaceable unit (FRU) device information, and more.  More
         information about FreeIPMI can be found at the FreeIPMI webpage at:
         
         http://www.gnu.org/software/freeipmi/index.html
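      A minimal sketch that filters FreeIPMI sensor readings for hot temperature sensors. It assumes the pipe-separated column layout (ID | Name | Type | Reading | Units | Event) printed by recent FreeIPMI versions, which may differ in older releases; hot_sensors is a hypothetical helper name.

```shell
# List temperature sensors above a limit, reading `ipmi-sensors` output on stdin.
# ASSUMPTION: the layout "ID | Name | Type | Reading | Units | Event" of recent
# FreeIPMI releases; check the header printed by your version.
hot_sensors() {
    awk -F'|' -v limit="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($3) == "Temperature" && trim($4) ~ /^[0-9.]+$/ && (trim($4) + 0) > (limit + 0) {
            print trim($2) ": " trim($4) " " trim($5)
        }
    '
}
```

      For example, ipmi-sensors | hot_sensors 60 lists the local temperature sensors above 60 C (run as root); for a remote BMC, add -h <host> -u <user> -p <password> to ipmi-sensors.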
      



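      TODO smartctl

      Sample output for three SAS disks behind an HP Smart Array (cciss) controller; -d cciss,N selects disk N on the controller: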

         ************************************************************************
         ~# smartctl -d cciss,0 -a /dev/cciss/c0d0
         smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
         Home page is http://smartmontools.sourceforge.net/
         
         Device: HP       DH072ABAA6       Version: HPD7
         Serial number: 3PD19ZMN0000983153B8
         Device type: disk
         Transport protocol: SAS
         Local Time is: Sat Jul 19 20:09:09 2008 CEST
         Device supports SMART and is Enabled
         Temperature Warning Enabled
         SMART Health Status: OK
         
         Current Drive Temperature:     29 C
         Drive Trip Temperature:        68 C
         Elements in grown defect list: 0
         Vendor (Seagate) cache information
           Blocks sent to initiator = 899299930
           Blocks received from initiator = 14843797
           Blocks read from cache and sent to initiator = 3793967485
           Number of read and write commands whose size <= segment size = 48565840
           Number of read and write commands whose size > segment size = 0
         Vendor (Seagate/Hitachi) factory information
           number of hours powered up = 945.00
           number of minutes until next internal SMART test = 7
         
         Error counter log:
                    Errors Corrected by           Total   Correction     Gigabytes    Total
                        ECC          rereads/    errors   algorithm      processed    uncorrected
                    fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
         read:          0        0         0         0          0          0.000           0
         write:         0        0         0         0          0          0.000           0
         
         Non-medium error count:        0
         No self-tests have been logged
         Long (extended) Self Test duration: 840 seconds [14.0 minutes]
         
         ************************************************************************
         ~# smartctl -d cciss,1 -a /dev/cciss/c0d0
         
         smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
         Home page is http://smartmontools.sourceforge.net/
         
         Device: HP       DH072ABAA6       Version: HPD7
         Serial number: 3PD19ZPV000098315CX2
         Device type: disk
         Transport protocol: SAS
         Local Time is: Sat Jul 19 20:09:12 2008 CEST
         Device supports SMART and is Enabled
         Temperature Warning Enabled
         SMART Health Status: OK
         
         Current Drive Temperature:     30 C
         Drive Trip Temperature:        68 C
         Elements in grown defect list: 0
         Vendor (Seagate) cache information
           Blocks sent to initiator = 920490987
           Blocks received from initiator = 14368268
           Blocks read from cache and sent to initiator = 3755437180
           Number of read and write commands whose size <= segment size = 48820139
           Number of read and write commands whose size > segment size = 0
         Vendor (Seagate/Hitachi) factory information
           number of hours powered up = 945.02
           number of minutes until next internal SMART test = 8
         
         Error counter log:
                    Errors Corrected by           Total   Correction     Gigabytes    Total
                        ECC          rereads/    errors   algorithm      processed    uncorrected
                    fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
         read:          0        0         0         0          0          0.000           0
         write:         0        0         0         0          0          0.000           0
         
         Non-medium error count:        0
         No self-tests have been logged
         Long (extended) Self Test duration: 840 seconds [14.0 minutes]
         
         ************************************************************************
         ~# smartctl -d cciss,2 -a /dev/cciss/c0d0
         smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
         Home page is http://smartmontools.sourceforge.net/
         
         Device: HP       DH072ABAA6       Version: HPD7
         Serial number: 3PD1A0SD000098300K39
         Device type: disk
         Transport protocol: SAS
         Local Time is: Sat Jul 19 20:09:15 2008 CEST
         Device supports SMART and is Enabled
         Temperature Warning Enabled
         SMART Health Status: OK
         
         Current Drive Temperature:     31 C
         Drive Trip Temperature:        68 C
         Elements in grown defect list: 0
         Vendor (Seagate) cache information
           Blocks sent to initiator = 913141941
           Blocks received from initiator = 11455509
           Blocks read from cache and sent to initiator = 3697098775
           Number of read and write commands whose size <= segment size = 49159966
           Number of read and write commands whose size > segment size = 0
         Vendor (Seagate/Hitachi) factory information
           number of hours powered up = 944.93
           number of minutes until next internal SMART test = 18
         
         Error counter log:
                    Errors Corrected by           Total   Correction     Gigabytes    Total
                        ECC          rereads/    errors   algorithm      processed    uncorrected
                    fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
         read:          0        0         0         0          0          0.000           0
         write:         0        0         0         0          0          0.000           0
         
         Non-medium error count:        0
         No self-tests have been logged
         Long (extended) Self Test duration: 840 seconds [14.0 minutes]
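      Reading three full reports is tedious. A sketch that reduces each one to its health status and current temperature, assuming the SAS/SCSI output format shown above (smart_summary is a hypothetical name):

```shell
# Print health status and current temperature from `smartctl -a` output
# of a SAS/SCSI disk, read on stdin.
smart_summary() {
    awk -F': *' '
        /^[ \t]*SMART Health Status:/       { health = $2 }
        /^[ \t]*Current Drive Temperature:/ { split($2, t, " "); temp = t[1] }
        END { printf "health=%s temp=%sC\n", health, temp }
    '
}
```

      For the three disks above: for i in 0 1 2; do echo "disk $i: $(smartctl -d cciss,$i -a /dev/cciss/c0d0 | smart_summary)"; done. ATA disks report their health on a different line (SMART overall-health self-assessment test result), so the pattern would need extending for them.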
      





      DELL OMSA monitoring

      Installing OMSA for hardware monitoring on Dell servers:

      OMSA can monitor the health of RAID arrays and motherboard/disk/chassis temperatures, generate alarms, set/modify BIOS settings and list the installed devices.

      To install under Debian:

      1.- Add the following line to /etc/apt/sources.list:

      deb ftp://ftp.sara.nl/pub/sara-omsa dell sara
      

      2.- Run:

       apt-get update && apt-get install dellomsa
      

      This installs OMSA in /opt/dell.

      3.- To start the OMSA services:

      ~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d -run
      ~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d -run
      

      OMSA Usage Examples:

      To check the health of the disks connected to controller 0:

      ~# /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0
      

      The output will look similar to:

      List of Physical Disks on Controller PERC 4e/Di (Embedded)
      
         Controller PERC 4e/Di (Embedded)
         ID                        : 0:0
         Status                    : Ok
         Name                      : Physical Disk 0:0
         State                     : Online
         Failure Predicted         : No
         Progress                  : Not Applicable
         Type                      : SCSI
         Capacity                  : 68.24 GB (73274490880 bytes)
         Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
         Available RAID Disk Space : 0.00 GB (0 bytes)
         Hot Spare                 : No
         Vendor ID                 : MAXTOR  
         Product ID                : ATLAS10K5_73SCA
         Revision                  : JNZY
         Serial No.                : J20KVCTK
         Negotiated Speed          : 320
         Capable Speed             : 320
         Manufacture Day           : Not Available
         Manufacture Week          : Not Available
         Manufacture Year          : Not Available
         SAS Address               : Not Available
      
         ID                        : 0:1
         Status                    : Ok
         Name                      : Physical Disk 0:1
         State                     : Online
         Failure Predicted         : No
         Progress                  : Not Applicable
         Type                      : SCSI
         Capacity                  : 68.24 GB (73274490880 bytes)
         Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
         Available RAID Disk Space : 0.00 GB (0 bytes)
         Hot Spare                 : No
         Vendor ID                 : MAXTOR  
         Product ID                : ATLAS10K5_73SCA
         Revision                  : JNZY
         Serial No.                : J20KV5RK
         Negotiated Speed          : 320
         Capable Speed             : 320
         Manufacture Day           : Not Available
         Manufacture Week          : Not Available
         Manufacture Year          : Not Available
         SAS Address               : Not Available
      
         ID                        : 0:2
         Status                    : Ok
         Name                      : Physical Disk 0:2
         State                     : Online
         Failure Predicted         : No
         Progress                  : Not Applicable
         Type                      : SCSI
         Capacity                  : 68.24 GB (73274490880 bytes)
         Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
         Available RAID Disk Space : 0.00 GB (0 bytes)
         Hot Spare                 : No
         Vendor ID                 : MAXTOR  
         Product ID                : ATLAS10K5_73SCA
         Revision                  : JNZY
         Serial No.                : J20KTS8K
         Negotiated Speed          : 320
         Capable Speed             : 320
         Manufacture Day           : Not Available
         Manufacture Week          : Not Available
         Manufacture Year          : Not Available
         SAS Address               : Not Available
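      Since the listing gets long with many disks, a sketch that prints only the disks needing attention: a Status other than Ok, or Failure Predicted set to Yes. It assumes the "Key : value" layout shown above; pdisk_problems is a hypothetical helper name.

```shell
# Report physical disks whose Status is not "Ok" or that predict a failure,
# reading `omreport storage pdisk controller=N` output on stdin.
pdisk_problems() {
    awk '
        {
            # Split "Key : value" on the first colon only (IDs such as 0:1 contain colons).
            key = $0; sub(/[ \t]*:.*/, "", key); sub(/^[ \t]+/, "", key)
            val = $0; sub(/^[^:]*:[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
        }
        key == "ID"                                { id = val }
        key == "Status" && val != "Ok"             { print "disk " id ": status " val }
        key == "Failure Predicted" && val == "Yes" { print "disk " id ": failure predicted" }
    '
}
```

      For example: /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0 | pdisk_problems. An empty result means no disk was flagged, which makes it convenient to call from cron, since cron typically mails only non-empty output.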
      


      To check the state/configuration of the RAID:

         ~# /etc/delloma.d/oma/bin/omreport.sh storage vdisk controller=0
      

      That will look like:

         Virtual Disk 0 on Controller PERC 4e/Di (Embedded)
         
         Controller PERC 4e/Di (Embedded)
         ID                  : 0
         Status              : Ok
         Name                : Virtual Disk 0
         State               : Ready
         Progress            : Not Applicable
         Layout              : RAID-5
         Size                : 136.48 GB (146548981760 bytes)
         Device Name         : /dev/sda
         Type                : SCSI
         Read Policy         : Adaptive Read Ahead
         Write Policy        : Write Back
         Cache Policy        : Direct I/O
         Stripe Element Size : 64 KB
      

      To get a summary of the server:

         ~# /etc/delloma.d/oma/bin/omreport.sh system summary
         System Summary
         
         ------------------
         Software Profile
         ------------------
         Systems Management
         Name                       : Information not available.
         Version                    : 3.2.0
         Description                : Systems Management Software
         
         Operating System
         Name                       : Linux
         Version                    : Kernel 2.6.18.2 (i686)
         System Time                : Sun Nov 25 18:30:37 2007
         System Bootup Time         : Fri Oct 12 15:20:31 2007
         
         --------
         System
         --------
         System
         Host Name                  : MySuperServidor
         System Location            : Please set the value
         
         ---------------------
         Main System Chassis
         ---------------------
         Chassis Information
         Chassis Model              : PowerEdge 2850
         Chassis Service Tag        :
         Chassis Lock               : Present
         Chassis Asset Tag          :
         
         Processor 1
         Processor Manufacturer     : Intel
         Processor Family           : Xeon
         Processor Version          : Model 4 Stepping 3
         Current Speed              : 3200 MHz
         Maximum Speed              : 3600 MHz
         External Clock Speed       : 800 MHz
         Voltage                    : 1400 mV
         
         Processor 2
         Processor Manufacturer     : Intel
         Processor Family           : Xeon
         Processor Version          : Model 4 Stepping 3
         Current Speed              : 3200 MHz
         Maximum Speed              : 3600 MHz
         External Clock Speed       : 800 MHz
         Voltage                    : 1400 mV
         
         Memory
         Total Installed Capacity   : 2048 MB
         Memory Available to the OS : 2023 MB
         Total Maximum Capacity     : 16384 MB
         Memory Array Count         : 1
         
         Memory Array 1
         Location                   : System Board or Motherboard
         Use                        : System Memory
         Installed Capacity         : 2048 MB
         Maximum Capacity           : 16384 MB
         Slots Available            : 6
         Slots Used                 : 2
         ECC Type                   : Multibit ECC
         
         Slot PCI1
         Adapter                    : [Not Occupied]
         Type                       : PCI X
         Data Bus Width             : 64 Bits
         Speed                      : 133 MHz
         Slot Length                : Long
         Voltage Supply             : 3.3 Volts
         
         Slot PCI2
         Adapter                    : [Not Occupied]
         Type                       : PCI X
         Data Bus Width             : 64 Bits
         Speed                      : 133 MHz
         Slot Length                : Long
         Voltage Supply             : 3.3 Volts
         
         Slot PCI3
         Adapter                    : PRO/100 S Server Adapter
         Type                       : PCI X
         Data Bus Width             : 64 Bits
         Speed                      : 133 MHz
         Slot Length                : Short
         Voltage Supply             : 3.3 Volts
         
         BIOS Information
         Manufacturer               : Dell Inc.
         Version                    : A04
         Release Date               : 09/22/2005
         
         --------------
         Network Data
         --------------
         IP Address Data
         IP Address 0               : 192.168.2.2
         IP Address 1               : 192.168.0.115
         
         --------------------
         Storage Enclosures
         --------------------
         Storage Enclosures
         Name                       : Backplane
         Service Tag                : 62P00P8
      



      TODO logwatch

      SOFTWARE MONITORING


      Table of monitoring tools

      Command              Brief explanation
      top                  Watches and manages running processes (useful to kill processes): press 'q' to quit, 'k' to kill a process.
      htop                 Similar to top, but with a friendlier, menu-based user interface.
      lsof                 Shows which processes are "touching" a file or directory, and the set of files being accessed by a process (including network sockets, pipes and devices).
      netstat              Provides statistics and reports on network usage and connections (established and listening).
      vmstat               Reports virtual memory statistics, as well as processes, paging, block I/O and CPU activity.
      iostat               Reports CPU statistics and input/output statistics for devices and partitions.
      inotifywait          Waits for inotify events: modern Linux kernels instantly notify user applications of any access to or change of a set of files/directories.
      inotifywatch         Collects statistics about inotify events on a set of files/directories.
      strace -p <pid>      Traces the system calls (requests for services offered by the kernel) made by a running user application.
      stap                 SystemTap: monitors the kernel in real time and in great detail.
      oprofile, perfmon2   Give access to hardware performance counters.
      AMD CodeAnalyst      Graphical front-end to OProfile.
      Intel VTune          Performance tuning on Intel hardware.
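      As an example of scripting one of these tools, a sketch that averages the CPU idle percentage (the "id" column) over several vmstat samples. The first data line of vmstat reports averages since boot, so it is skipped; the column is located by its header name because the set of columns can differ between vmstat versions. avg_idle is a hypothetical name.

```shell
# Print the mean CPU idle % from `vmstat <delay> <count>` output on stdin,
# skipping the first sample (an average since boot, not over the interval).
avg_idle() {
    awk '
        col == 0 {
            # Header lines: find which column is labelled "id".
            for (i = 1; i <= NF; i++) if ($i == "id") col = i
            next
        }
        $1 ~ /^[0-9]+$/ {
            samples++
            if (samples == 1) next   # since-boot average, ignore
            sum += $col; n++
        }
        END { if (n > 0) printf "%.1f\n", sum / n; else print "n/a" }
    '
}
```

      For example, vmstat 5 4 | avg_idle averages the idle time over three 5-second intervals.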
      Last modified on 3 November 2010, at 15:33