Linux Guide/Monitoring

Introduction

This page is in a TODO state; anyone is free to complete or contribute to it. For now (2010-06-11) it contains random notes I have been collecting over time.

TODO means "to do" (some editing tools automatically recognize "TODO" as marking a pending task).

HARDWARE MONITORING

Rescanning the SCSI Bus

Ready-made scripts exist to rescan the SCSI bus in Linux, but there is a simpler way that works in most cases:

echo "- - -" > /sys/class/scsi_host/host0/scan
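The command above only rescans host0. A machine with several adapters has one hostN directory per adapter, so a small loop can rescan all of them. This is a sketch: the optional directory argument is a hypothetical addition that only exists so the function can be tried against a scratch directory; the default is the sysfs location used above.

```shell
# Rescan every SCSI host adapter, not just host0.
# "- - -" is a wildcard for channel, target and LUN.
# $1 (optional, hypothetical) overrides the sysfs directory, e.g. for testing.
rescan_all_scsi_hosts() {
    base=${1:-/sys/class/scsi_host}
    for scan in "$base"/host*/scan; do
        [ -e "$scan" ] || continue          # the glob matched nothing
        echo "Rescanning ${scan%/scan}"
        echo "- - -" > "$scan"
    done
}

# Usage (as root): rescan_all_scsi_hosts
```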

A slightly more complex example script for QLogic cards, for drivers that expose /proc/scsi/qla2xxx:

#!/bin/bash
# Ask every QLogic HBA known to the qla2xxx driver to rescan for devices.
for HBA in /proc/scsi/qla2xxx/*
do
   echo "scsi-qlascan" > "${HBA}"
done
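Newer kernels no longer provide /proc/scsi/qla2xxx. On those systems Fibre Channel HBAs are commonly rescanned through sysfs instead: writing 1 to /sys/class/fc_host/hostN/issue_lip asks the adapter to issue a LIP (loop initialization), after which the matching SCSI host is scanned as above. The sketch below assumes that sysfs layout; the two optional directory arguments are hypothetical and only exist for testing.

```shell
# Rescan Fibre Channel HBAs on kernels without /proc/scsi/qla2xxx:
# issue a LIP on each FC host, then rescan the matching SCSI host.
# $1 and $2 (optional, hypothetical) override the sysfs directories for testing.
rescan_fc_hosts() {
    fc_base=${1:-/sys/class/fc_host}
    scsi_base=${2:-/sys/class/scsi_host}
    for lip in "$fc_base"/host*/issue_lip; do
        [ -e "$lip" ] || continue                 # no FC hosts present
        host=$(basename "$(dirname "$lip")")      # e.g. host3
        echo 1 > "$lip"
        echo "- - -" > "$scsi_base/$host/scan"
    done
}

# Usage (as root): rescan_fc_hosts
```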

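For iSCSI storage, iscsiadm (from open-iscsi) can be used instead, if available. First discover the targets offered by a portal, then log in to one of them:

 iscsiadm -m discovery --type sendtargets --portal <IP>
 iscsiadm -m node --targetname <targetname> --portal <IP> --login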

Amongst other documents available online, the Red Hat Enterprise Linux 5 Online Storage Reconfiguration Guide can also be a useful help.

DMIDECODE

Dmidecode reports information about your system's hardware as described in the system BIOS, according to the SMBIOS/DMI standard. This information typically includes the system manufacturer, model name, serial number, BIOS version and asset tag, as well as many other details whose level of interest and reliability depend on the manufacturer. It often includes the usage status of the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).
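For example, the memory slot usage can be summarised from the Memory Device entries (DMI type 17). This sketch reads the output of dmidecode -t 17 on standard input, so it can be tried without root; it assumes empty slots are reported as "Size: No Module Installed", which is the usual wording but may vary with the BIOS. The function name is hypothetical.

```shell
# Summarise DIMM slot usage from `dmidecode -t 17` output read on stdin.
# Each Memory Device entry has one "Size:" line; empty slots say
# "No Module Installed" (assumed wording).
memory_slot_summary() {
    awk '
        /^[ \t]*Size:/ {
            total++
            if ($0 !~ /No Module Installed/) used++
        }
        END { printf "%d of %d slots populated\n", used, total }
    '
}

# Usage (dmidecode needs root): dmidecode -t 17 | memory_slot_summary
```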

TODO IPMI

   What is IPMI?
   
   The Intelligent Platform Management Interface (IPMI) specification
   defines a set of interfaces for platform management.  It is
   implemented by a large number of hardware manufacturers to support
   system management on motherboards. The features of IPMI that most
   users will be interested in are sensor monitoring (i.e. CPU
   temperatures, fan speeds), remote power control, and serial-over-LAN
   (SOL).
   
   What is FreeIPMI?
   
   FreeIPMI provides in-band and out-of-band IPMI software based on the
   IPMI v1.5/2.0 specification.  FreeIPMI provides tools and libraries
   for users to access and read IPMI sensor readings, system event log
   (SEL) entries, serial-over-LAN (SOL), remote power control functions,
   field replaceable unit (FRU) device information, and more.  More
   information about FreeIPMI can be found at the FreeIPMI webpage at:
   
   http://www.gnu.org/software/freeipmi/index.html
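FreeIPMI's ipmi-sensors prints the sensor readings as a table. A filter like the following can reduce it to the sensors that need attention. It is a sketch that assumes the default pipe-separated layout (header row first, sensor name in the second column, event state in the last column, e.g. 'OK'); the exact layout differs between FreeIPMI versions, so check it on your system first. The function name is hypothetical.

```shell
# List IPMI sensors whose event state is anything other than OK.
# Assumed input: ipmi-sensors pipe-separated table on stdin, header first,
# name in column 2, event state in the last column.
ipmi_problem_sensors() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NR == 1 { next }                          # skip the header row
        {
            name  = trim($2)
            event = trim($NF)
            # ignore healthy sensors and sensors without an event state
            if (event !~ /OK/ && event !~ /N\/A/) print name ": " event
        }
    '
}

# Usage (in-band, as root): ipmi-sensors | ipmi_problem_sensors
```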


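TODO smartctl

On HP Smart Array (cciss) controllers, smartctl addresses each physical disk behind the logical drive with -d cciss,N. Sample output for three disks: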

   ************************************************************************
   ~# smartctl -d cciss,0 -a /dev/cciss/c0d0
   smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
   Home page is http://smartmontools.sourceforge.net/
   
   Device: HP       DH072ABAA6       Version: HPD7
   Serial number: 3PD19ZMN0000983153B8
   Device type: disk
   Transport protocol: SAS
   Local Time is: Sat Jul 19 20:09:09 2008 CEST
   Device supports SMART and is Enabled
   Temperature Warning Enabled
   SMART Health Status: OK
   
   Current Drive Temperature:     29 C
   Drive Trip Temperature:        68 C
   Elements in grown defect list: 0
   Vendor (Seagate) cache information
     Blocks sent to initiator = 899299930
     Blocks received from initiator = 14843797
     Blocks read from cache and sent to initiator = 3793967485
     Number of read and write commands whose size <= segment size = 48565840
     Number of read and write commands whose size > segment size = 0
   Vendor (Seagate/Hitachi) factory information
     number of hours powered up = 945.00
     number of minutes until next internal SMART test = 7
   
   Error counter log:
              Errors Corrected by           Total   Correction     Gigabytes    Total
                  ECC          rereads/    errors   algorithm      processed    uncorrected
              fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
   read:          0        0         0         0          0          0.000           0
   write:         0        0         0         0          0          0.000           0
   
   Non-medium error count:        0
   No self-tests have been logged
   Long (extended) Self Test duration: 840 seconds [14.0 minutes]
   
   ************************************************************************
   ~# smartctl -d cciss,1 -a /dev/cciss/c0d0
   
   smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
   Home page is http://smartmontools.sourceforge.net/
   
   Device: HP       DH072ABAA6       Version: HPD7
   Serial number: 3PD19ZPV000098315CX2
   Device type: disk
   Transport protocol: SAS
   Local Time is: Sat Jul 19 20:09:12 2008 CEST
   Device supports SMART and is Enabled
   Temperature Warning Enabled
   SMART Health Status: OK
   
   Current Drive Temperature:     30 C
   Drive Trip Temperature:        68 C
   Elements in grown defect list: 0
   Vendor (Seagate) cache information
     Blocks sent to initiator = 920490987
     Blocks received from initiator = 14368268
     Blocks read from cache and sent to initiator = 3755437180
     Number of read and write commands whose size <= segment size = 48820139
     Number of read and write commands whose size > segment size = 0
   Vendor (Seagate/Hitachi) factory information
     number of hours powered up = 945.02
     number of minutes until next internal SMART test = 8
   
   Error counter log:
              Errors Corrected by           Total   Correction     Gigabytes    Total
                  ECC          rereads/    errors   algorithm      processed    uncorrected
              fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
   read:          0        0         0         0          0          0.000           0
   write:         0        0         0         0          0          0.000           0
   
   Non-medium error count:        0
   No self-tests have been logged
   Long (extended) Self Test duration: 840 seconds [14.0 minutes]
   
   ************************************************************************
   ~# smartctl -d cciss,2 -a /dev/cciss/c0d0
   smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
   Home page is http://smartmontools.sourceforge.net/
   
   Device: HP       DH072ABAA6       Version: HPD7
   Serial number: 3PD1A0SD000098300K39
   Device type: disk
   Transport protocol: SAS
   Local Time is: Sat Jul 19 20:09:15 2008 CEST
   Device supports SMART and is Enabled
   Temperature Warning Enabled
   SMART Health Status: OK
   
   Current Drive Temperature:     31 C
   Drive Trip Temperature:        68 C
   Elements in grown defect list: 0
   Vendor (Seagate) cache information
     Blocks sent to initiator = 913141941
     Blocks received from initiator = 11455509
     Blocks read from cache and sent to initiator = 3697098775
     Number of read and write commands whose size <= segment size = 49159966
     Number of read and write commands whose size > segment size = 0
   Vendor (Seagate/Hitachi) factory information
     number of hours powered up = 944.93
     number of minutes until next internal SMART test = 18
   
   Error counter log:
              Errors Corrected by           Total   Correction     Gigabytes    Total
                  ECC          rereads/    errors   algorithm      processed    uncorrected
              fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
   read:          0        0         0         0          0          0.000           0
   write:         0        0         0         0          0          0.000           0
   
   Non-medium error count:        0
   No self-tests have been logged
   Long (extended) Self Test duration: 840 seconds [14.0 minutes]
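When only the health line matters, a small loop can query each disk with -H (health status only) instead of the full -a report. A sketch, assuming the three-disk layout of the example above; both the logical drive and the disk count are parameters, and the function name is hypothetical. The sed pattern matches the SAS/SCSI wording shown above; ATA disks report "SMART overall-health self-assessment test result: PASSED" instead and would show up as "unknown" here.

```shell
# Print the SMART health status of each physical disk behind an
# HP Smart Array (cciss) logical drive.
# $1: logical drive (default /dev/cciss/c0d0)
# $2: number of physical disks (default 3, as in the example above)
cciss_health() {
    dev=${1:-/dev/cciss/c0d0}
    ndisks=${2:-3}
    i=0
    while [ "$i" -lt "$ndisks" ]; do
        status=$(smartctl -d "cciss,$i" -H "$dev" | sed -n 's/^SMART Health Status: //p')
        echo "cciss,$i: ${status:-unknown}"
        i=$((i + 1))
    done
}

# Usage (as root): cciss_health /dev/cciss/c0d0 3
```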




Dell OMSA monitoring

Installing OMSA (OpenManage Server Administrator) for hardware monitoring on Dell servers:

OMSA can monitor the health of RAID arrays and motherboard/disk/chassis temperatures, generate alarms, view and modify BIOS settings, and monitor installed devices.

To install under Debian:

1. Add the following line to /etc/apt/sources.list:

deb ftp://ftp.sara.nl/pub/sara-omsa dell sara

2. Run:

 apt-get update && apt-get install dellomsa

This installs OMSA under /opt/dell.

3. Start the OMSA data manager and event manager services:

   ~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d -run
   ~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d -run

OMSA Usage Examples

To check the health of the physical disks connected to controller 0:

   ~# /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0

The output will look similar to:

List of Physical Disks on Controller PERC 4e/Di (Embedded)

   Controller PERC 4e/Di (Embedded)
   ID                        : 0:0
   Status                    : Ok
   Name                      : Physical Disk 0:0
   State                     : Online
   Failure Predicted         : No
   Progress                  : Not Applicable
   Type                      : SCSI
   Capacity                  : 68.24 GB (73274490880 bytes)
   Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
   Available RAID Disk Space : 0.00 GB (0 bytes)
   Hot Spare                 : No
   Vendor ID                 : MAXTOR  
   Product ID                : ATLAS10K5_73SCA
   Revision                  : JNZY
   Serial No.                : J20KVCTK
   Negotiated Speed          : 320
   Capable Speed             : 320
   Manufacture Day           : Not Available
   Manufacture Week          : Not Available
   Manufacture Year          : Not Available
   SAS Address               : Not Available

   ID                        : 0:1
   Status                    : Ok
   Name                      : Physical Disk 0:1
   State                     : Online
   Failure Predicted         : No
   Progress                  : Not Applicable
   Type                      : SCSI
   Capacity                  : 68.24 GB (73274490880 bytes)
   Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
   Available RAID Disk Space : 0.00 GB (0 bytes)
   Hot Spare                 : No
   Vendor ID                 : MAXTOR  
   Product ID                : ATLAS10K5_73SCA
   Revision                  : JNZY
   Serial No.                : J20KV5RK
   Negotiated Speed          : 320
   Capable Speed             : 320
   Manufacture Day           : Not Available
   Manufacture Week          : Not Available
   Manufacture Year          : Not Available
   SAS Address               : Not Available

   ID                        : 0:2
   Status                    : Ok
   Name                      : Physical Disk 0:2
   State                     : Online
   Failure Predicted         : No
   Progress                  : Not Applicable
   Type                      : SCSI
   Capacity                  : 68.24 GB (73274490880 bytes)
   Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
   Available RAID Disk Space : 0.00 GB (0 bytes)
   Hot Spare                 : No
   Vendor ID                 : MAXTOR  
   Product ID                : ATLAS10K5_73SCA
   Revision                  : JNZY
   Serial No.                : J20KTS8K
   Negotiated Speed          : 320
   Capable Speed             : 320
   Manufacture Day           : Not Available
   Manufacture Week          : Not Available
   Manufacture Year          : Not Available
   SAS Address               : Not Available
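For scripted checks (e.g. from cron), the pdisk report can be reduced to the disks that need attention. This sketch reads the omreport output on standard input and assumes the "Key : Value" layout shown above; the function name is hypothetical.

```shell
# Report physical disks whose Status is not "Ok" or that predict failure.
# Reads `omreport storage pdisk controller=0` output on stdin.
omsa_pdisk_alerts() {
    awk '
        {
            # split "   Key    : Value" into key and value
            key = $0; val = $0
            sub(/[ \t]*:.*/, "", key); sub(/^[ \t]+/, "", key)
            if (!sub(/^[^:]*:[ \t]*/, "", val)) next   # not a key/value line
            sub(/[ \t]+$/, "", val)
        }
        key == "ID"                                { id = val }
        key == "Status" && val != "Ok"             { print "disk " id ": status " val }
        key == "Failure Predicted" && val == "Yes" { print "disk " id ": failure predicted" }
    '
}

# Usage: /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0 | omsa_pdisk_alerts
```

No output means every disk reported "Ok" with no predicted failure.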


To check the state/configuration of the RAID:

   ~# /etc/delloma.d/oma/bin/omreport.sh storage vdisk controller=0

The output will look similar to:

   Virtual Disk 0 on Controller PERC 4e/Di (Embedded)
   
   Controller PERC 4e/Di (Embedded)
   ID                  : 0
   Status              : Ok
   Name                : Virtual Disk 0
   State               : Ready
   Progress            : Not Applicable
   Layout              : RAID-5
   Size                : 136.48 GB (146548981760 bytes)
   Device Name         : /dev/sda
   Type                : SCSI
   Read Policy         : Adaptive Read Ahead
   Write Policy        : Write Back
   Cache Policy        : Direct I/O
   Stripe Element Size : 64 KB
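The vdisk report can likewise be condensed to one line per virtual disk. A sketch assuming the field names shown above, with a new record starting at each "ID" line; the function name is hypothetical.

```shell
# Print one line per virtual disk: name, layout, size, device and state.
# Reads `omreport storage vdisk controller=0` output on stdin.
omsa_vdisk_summary() {
    awk '
        function flush() {
            if (name != "") print name ": " layout ", " size ", " dev ", " state
            name = layout = size = dev = state = ""
        }
        {
            # split "   Key    : Value" into key and value
            key = $0; val = $0
            sub(/[ \t]*:.*/, "", key); sub(/^[ \t]+/, "", key)
            if (!sub(/^[^:]*:[ \t]*/, "", val)) next
            sub(/[ \t]+$/, "", val)
        }
        key == "ID"          { flush() }             # a new virtual disk starts
        key == "Name"        { name = val }
        key == "State"       { state = val }
        key == "Layout"      { layout = val }
        key == "Size"        { size = val; sub(/ *\(.*/, "", size) }   # drop the byte count
        key == "Device Name" { dev = val }
        END                  { flush() }
    '
}

# Usage: /etc/delloma.d/oma/bin/omreport.sh storage vdisk controller=0 | omsa_vdisk_summary
```

For the output above this prints "Virtual Disk 0: RAID-5, 136.48 GB, /dev/sda, Ready".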

To get a summary of the server:

   ~# /etc/delloma.d/oma/bin/omreport.sh system summary
   System Summary
   
   ------------------
   Software Profile
   ------------------
   Systems Management
   Name                       : Information not available.
   Version                    : 3.2.0
   Description                : Systems Management Software
   
   Operating System
   Name                       : Linux
   Version                    : Kernel 2.6.18.2 (i686)
   System Time                : Sun Nov 25 18:30:37 2007
   System Bootup Time         : Fri Oct 12 15:20:31 2007
   
   --------
   System
   --------
   System
   Host Name                  : MySuperServidor
   System Location            : Please set the value
   
   ---------------------
   Main System Chassis
   ---------------------
   Chassis Information
   Chassis Model              : PowerEdge 2850
   Chassis Service Tag        :
   Chassis Lock               : Present
   Chassis Asset Tag          :
   
   Processor 1
   Processor Manufacturer     : Intel
   Processor Family           : Xeon
   Processor Version          : Model 4 Stepping 3
   Current Speed              : 3200 MHz
   Maximum Speed              : 3600 MHz
   External Clock Speed       : 800 MHz
   Voltage                    : 1400 mV
   
   Processor 2
   Processor Manufacturer     : Intel
   Processor Family           : Xeon
   Processor Version          : Model 4 Stepping 3
   Current Speed              : 3200 MHz
   Maximum Speed              : 3600 MHz
   External Clock Speed       : 800 MHz
   Voltage                    : 1400 mV
   
   Memory
   Total Installed Capacity   : 2048 MB
   Memory Available to the OS : 2023 MB
   Total Maximum Capacity     : 16384 MB
   Memory Array Count         : 1
   
   Memory Array 1
   Location                   : System Board or Motherboard
   Use                        : System Memory
   Installed Capacity         : 2048 MB
   Maximum Capacity           : 16384 MB
   Slots Available            : 6
   Slots Used                 : 2
   ECC Type                   : Multibit ECC
   
   Slot PCI1
   Adapter                    : [Not Occupied]
   Type                       : PCI X
   Data Bus Width             : 64 Bits
   Speed                      : 133 MHz
   Slot Length                : Long
   Voltage Supply             : 3.3 Volts
   
   Slot PCI2
   Adapter                    : [Not Occupied]
   Type                       : PCI X
   Data Bus Width             : 64 Bits
   Speed                      : 133 MHz
   Slot Length                : Long
   Voltage Supply             : 3.3 Volts
   
   Slot PCI3
   Adapter                    : PRO/100 S Server Adapter
   Type                       : PCI X
   Data Bus Width             : 64 Bits
   Speed                      : 133 MHz
   Slot Length                : Short
   Voltage Supply             : 3.3 Volts
   
   BIOS Information
   Manufacturer               : Dell Inc.
   Version                    : A04
   Release Date               : 09/22/2005
   
   --------------
   Network Data
   --------------
   IP Address Data
   IP Address 0               : 192.168.2.2
   IP Address 1               : 192.168.0.115
   
   --------------------
   Storage Enclosures
   --------------------
   Storage Enclosures
   Name                       : Backplane
   Service Tag                : 62P00P8
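The same summary answers practical questions such as which expansion slots are still free. A sketch assuming the "Slot PCIn" / "Adapter : [Not Occupied]" layout shown above; slot headers are recognised as "Slot ..." lines without a colon, so "Slot Length : ..." is not mistaken for a slot name. The function name is hypothetical.

```shell
# List expansion slots that omreport reports as empty.
# Reads `omreport system summary` output on stdin.
omsa_free_slots() {
    awk '
        /^[ \t]*Slot [^:]*$/                      { slot = $2 }   # e.g. "Slot PCI1"
        /^[ \t]*Adapter[ \t]*:/ && /Not Occupied/ { print slot }
    '
}

# Usage: /etc/delloma.d/oma/bin/omreport.sh system summary | omsa_free_slots
```

For the summary above this lists PCI1 and PCI2.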


TODO logwatch

SOFTWARE MONITORING

TODO: Monit

Table of monitoring tools

Command                      Brief explanation
top                          Watch and manage running processes (useful for killing processes). Press 'q' to quit, 'k' to kill a process.
htop                         Similar to top, but with a friendlier, menu-based user interface.
lsof                         Shows which processes are using a file or directory, and the set of files opened by a process (including network sockets, pipes and devices).
netstat                      Provides statistics and reports on network usage and connections (established and listening).
vmstat                       Reports virtual memory statistics (memory, swap, I/O and CPU usage).
iostat                       Reports read/write statistics for storage devices.
inotifywatch / inotifywait   Modern Linux kernels can notify processes (user applications) immediately of any access to or change of a file. inotifywait waits for such events on a set of files/directories and prints them; inotifywatch collects statistics about them.
strace -p <pid>              Monitors the system calls (calls to the services offered by the kernel) made by a running user application.
stap                         SystemTap: monitors the kernel in real time and in great detail.
oprofile, perfmon2           Give access to hardware performance counters.
AMD CodeAnalyst              Graphical user interface front-end to OProfile.
Intel VTune                  Performance tuning on Intel hardware.
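As an example of post-processing these tools' output, vmstat's CPU idle column can be averaged over a sampling run. The sketch locates the "id" column by name in the header, because its position can differ between vmstat versions, and skips the first sample, which reports averages since boot rather than over the interval. The function name is hypothetical.

```shell
# Average the CPU idle percentage ("id" column) reported by vmstat.
# Reads vmstat output on stdin; repeated header lines are tolerated.
vmstat_avg_idle() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
        col && $1 ~ /^[0-9]+$/ {
            if (++rows == 1) next          # first sample: averages since boot
            sum += $col; n++
        }
        END { if (n) printf "average idle: %.1f%%\n", sum / n }
    '
}

# Usage: vmstat 2 6 | vmstat_avg_idle
```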
Last modified on 3 November 2010, at 15:33