System Monitoring with Xymon/Other Docs/FAQ

Introductory FAQ

Q. What is hobbit ?

A. Hobbit is the former name of the Xymon system monitoring tool. http://www.hswn.dk/hobbit/help/about.html

Q. What is Xymon ?

Q. Why should I use Xymon instead of BB ?

User FAQ

When is the next version going to be released ?

A. Hobbit is a FOSS project whose developers contribute in their spare time. It is under active development, but no release date for the next version has been announced. The current development snapshot can be downloaded from SourceForge if required. If you are chasing a specific bug fix or feature that is not in the current production release, search the archive or post a query to the general discussion mailing list to see whether a solution is available. You can subscribe to the list by sending an e-mail to hobbit-subscribe@hswn.dk. A searchable archive of the list is available at http://www.hswn.dk/hobbiton/

To be notified when new versions are released, subscribe to the hobbit announcement list by sending an e-mail to hobbit-announce-subscribe@hswn.dk.

There is another answer for this question at http://www.hswn.dk/hobbiton/2008/02/msg00227.html

Where can I find more tests ?

A. Deadcat (http://www.deadcat.net) hosts mostly Big Brother scripts, but most of them will work on Hobbit without too much fuss.

The Shire Project (Xymonton, https://wiki.xymonton.org/doku.php) is specific to Xymon, which makes it a better source than Deadcat.

How do I put duplicate hosts in the bb-hosts file ?

A. Enter the first occurrence of the host as normal. Every further occurrence should be entered as:

  0.0.0.0 hostname # noconn 

How does the conn test work if the first ping doesn't respond ?

A. bbtest-net calls either hobbitping or fping (configurable via FPING= in hobbitserver.cfg).

How many pings are sent simultaneously ?

A. The exact number of parallel connections depends on your operating system: the default is FD_SETSIZE/4, which amounts to 256 on many Unix systems. You can choose the number of concurrent connections with the "--concurrency=N" option to bbtest-net.

How many packets are sent each time ?

A. hobbitping normally stops pinging a host after receiving a single response, and uses that to determine the round-trip time.

What is the ping time-out ?

A. The default timeout is 5 seconds.

Can we configure it?

A. Yes. The --timeout=N parameter to hobbitping allows you to set the timeout to N seconds.

What do the conn test (ping) colours mean?

A. Green means an OK response, yellow means packet loss (lost packets only, not delays), and red means no response.

How do I get the hobbit client to download etc/hobbitclient.cfg from the server (or other files in etc or ext), then restart?

A. Why? If the clients are set up to use server-side config, they don't even need the /etc/*.cfg files locally...

Is there a way within Hobbit of creating a graph with data from multiple RRD files ?

A. If you want to create a graph from multiple rrds, I would suggest using the 3rd party tool called drraw http://web.taranis.org/drraw/

How do I disable the REPEAT alerts ?

A. If REPEAT= is omitted, alerts repeat every 30 minutes by default. Setting REPEAT=0 makes them repeat every minute.

  To disable repeat alerts, set REPEAT=365d 
  If your problem lasts that long, you have bigger problems than too many alerts.

The data being sent from my clients is being truncated.

A. The maximum message size can be set in hobbitserver.cfg

  MAXMSG_CLIENT= 
  MAXMSG_DATA= 
  MAXMSG_STATUS= 
  Change the above parameters to suit your needs.

Do I need to restart the Hobbit server after editing the config files ?

A. No. All files are reread every 5 minutes.

  The only exception to this is hobbitserver.cfg. Changes to this file require a restart.

Do I need to restart my Hobbit client after making a change ?

A. No. All client config files are reread periodically.

How do I monitor a web page ?

A. Add the following line to the bb-hosts file

  ip.ad.dr.ress www.example.com # http://www.example.com/ 

or for https

  ip.ad.dr.ress www.example.com # https://user:passwd@www.example.com/ 

Note that there are also content checks and post and browser options available. See the "HTTP TESTS" section of the bb-hosts(5) manual for details.

My test only runs every hour. How do I get it to not go purple after 30 minutes ?

A. Sometimes you want something monitored/tested only once each hour (remote Internet site), once each day (backup job or DNS update), or even less often. Or perhaps more frequently (a critical connection once per minute or less).

There is a LIFETIME option when you call bb on the client to report status. It defines how long a status message is considered valid (i.e. fresh and not yet stale). Search "LIFETIME" on the bb man page for a bit more info. Syntax is like the following.

$BB $BBDISP "status+4h $MACHINE.$COLUMN $COLOR `date`"

The defaults are 5 minute testing intervals and 30 minute status lifetimes, so 5 misses are accepted by default. The lifetime should be adjusted for the acceptable number of misses, otherwise false positives might occur. For example, if the host goes completely offline for 2 hours, or twice coincidentally overlapping each hourly reporting period, an hourly test probably doesn't need immediate debugging. Alternatively, for something being tested every minute, a lifetime of 30 minutes should probably be adjusted downward.
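The relationship between test interval, accepted misses, and lifetime can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, a bare number after "status+" is assumed to mean minutes, and the hostname is written with commas instead of dots, following the Big Brother convention used for $MACHINE.

```python
def lifetime_minutes(interval_minutes, accepted_misses):
    """Lifetime that tolerates `accepted_misses` missed reports before purple."""
    return interval_minutes * (accepted_misses + 1)


def status_message(host, column, color, text, lifetime):
    """Build a status message with an explicit lifetime (hypothetical helper)."""
    # Assumption: BB-style $MACHINE replaces dots in the hostname with commas.
    machine = host.replace(".", ",")
    # Assumption: a bare number after "status+" is interpreted as minutes.
    return f"status+{lifetime} {machine}.{column} {color} {text}"


# Defaults from the text: 5-minute interval, 5 accepted misses.
default_lifetime = lifetime_minutes(5, 5)

# An hourly test that may miss one report before going purple.
hourly = status_message("www.example.com", "backup", "green", "backup OK",
                        lifetime_minutes(60, 1))

print(default_lifetime)
print(hourly)
```

The resulting string is what would be handed to the bb command in place of the quoted "status+4h ..." argument; choosing the number of accepted misses is the policy decision discussed above.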

A couple of sample scenarios:

  • If a script reports backup status only when a backup is finished, and the host or display is not communicative at that exact time, purple might or might not be desirable for that host's backup status in that NOC.
  • If a script checks the size of each subdirectory within a parent directory, and the status is more intended for graphing than for green/yellow/red reporting, many misses may be acceptable.

Warning: The man page suggests "slightly more than the interval between your tests". But if that were actually followed, the default lifetime would be 6 minutes. It isn't 6, and it isn't designed to be easily changed from 30: instead of being an option in a configuration file, it is hardcoded in the handle_status function in the hobbitd/hobbitd.c file:

int validity = 30;

Does Hobbit encrypt its transmissions ?

A. Natively, no. Some people have reported success using external encryption.

  Later versions might support data encryption.

How do I monitor files? I have added FILE /path/to/my/file to hobbitclient.cfg but still nothing happens.

A. You also need to add the monitored files to client-local.cfg. This tells the client to send the file metadata to the server. The server then uses the metadata and the configuration in hobbitclient.cfg to determine the test results. Also make sure hobbit has read access to the files.

How do I monitor log files other than the default ones?

Answer 1

"Kauffman, Tom" <KauffmanT@nibco.com> 
 
> On your hobbit server -
> 
> 1) set up etc/client-local.cfg to reference the logs you want AND any exclusions. For AIX, I have:
> [aix]
> log:/var/log/syslog:10240
>         ignore 3004-004
>         ignore 3004-035
>         ignore 3004
> log:/var/log/console.log:10240
> log:/var/log/dsmsched.log:10240
> 
> 2) set up etc/hobbit-clients.cfg to create your alerting criteria. For AIX, I've got these set:
> HOST=%.*
>         LOG /var/log/syslog %.*crit.su.*to.root red
>         LOG /var/log/syslog %.*crit.su   yellow
>         LOG %/var/(adm|log)/console.log %.*not.responding.still.trying yellow
> 
> 
> Change client-local.cfg first. Allow 15 to 20 minutes for this to propagate to the client;
> look for a file called logfetch.<hostname>.cfg in client/tmp. This should match your entries in
> client-local.cfg.
> 
> Once the logs start coming in, experiment with client-local.cfg on a test system to track
> what you're interested in.
> 
> Tom

Answer 2. I have added LOG /path/to/my/logfile WARNING COLOR=yellow to hobbitclient.cfg but still nothing happens.

You also need to add the monitored log files to client-local.cfg. This tells the client to send the log file to the server. The Hobbit message protocol is bi-directional: the client does not only send messages to the server, the server can also instruct the client to send in log files other than the default ones.

  The server then uses the sent data and the config in hobbitclient.cfg to determine test results. 
  Also make sure hobbit has read access to the log.

An example client-local.cfg file.


# following are by OS type to ask hobbit clients send in messages file.
[sunos]
log:/var/adm/messages:10240

[osf1]
log:/var/adm/messages:10240

[aix]
log:/var/adm/syslog/syslog.log:10240

[hp-ux]
log:/var/adm/syslog/syslog.log:10240

[win32]

[freebsd]
log:/var/log/messages:10240

[netbsd]
log:/var/log/messages:10240

[openbsd]
log:/var/log/messages:10240

[linux]
log:/var/log/messages:10240
dir:/tmp
ignore MARK

[linux22]
log:/var/log/messages:10240
ignore MARK

[redhat]
log:/var/log/messages:10240
ignore MARK

[debian]
log:/var/log/messages:10240
ignore MARK

[suse]
log:/var/log/messages:10240
ignore MARK

[mandrake]
log:/var/log/messages:10240
ignore MARK

[redhatAS]
log:/var/log/messages:10240
ignore MARK

[redhatES]
log:/var/log/messages:10240
ignore MARK

[rhel3]
log:/var/log/messages:10240
ignore MARK

[irix]
log:/var/adm/SYSLOG:10240

[darwin]
log:/var/log/system.log:10240

[sco_sv]
log:/var/adm/syslog:10240

# following are by machine names to ask hobbit clients send in messages file.
[caoffice2435.mainoffice.test.com]
log:/var/adm/messages:10240

[caoffice2436.mainoffice.test.com]
log:/var/adm/messages:10240

[caoffice2437.mainoffice.test.com]
log:/var/adm/messages:10240

[caoffice2444.mainoffice.test.com]
log:/var/adm/messages:10240

[caoffice2445.mainoffice.test.com]
log:/var/adm/messages:10240

[caoffice2141.comm.test.com]
# Solaris 10 OS log
#   "log:FILENAME:MAXDATA"
log:/var/adm/messages:10240
ignore MARK
# hobbit server logs
log:/var/opt/hobbitserver42/log/acknowledge.log:10240
log:/var/opt/hobbitserver42/log/bb-display.log:10240
log:/var/opt/hobbitserver42/log/bb-network.log:10240
log:/var/opt/hobbitserver42/log/bb-retest.log:10240
log:/var/opt/hobbitserver42/log/bbcombotest.log:10240
log:/var/opt/hobbitserver42/log/cgierror.log:10240
log:/var/opt/hobbitserver42/log/clientdata.log:10240
log:/var/opt/hobbitserver42/log/history.log:10240
log:/var/opt/hobbitserver42/log/hobbitd.log:10240
log:/var/opt/hobbitserver42/log/hobbitlaunch.log:10240
log:/var/opt/hobbitserver42/log/hostdata.log:10240
log:/var/opt/hobbitserver42/log/il02bbhostsallinone.ksh.log:10240
log:/var/opt/hobbitserver42/log/notifications.log:10240
log:/var/opt/hobbitserver42/log/page.log:10240
log:/var/opt/hobbitserver42/log/rrd-data.log:10240
log:/var/opt/hobbitserver42/log/rrd-status.log:10240
log:/var/opt/hobbitserver42/log/runwebalizer.log:10240

# httpd server logs
log:/var/opt/httpd222/log/access_log:102400
log:/var/opt/httpd222/log/error_log:102400

# mail log
log:/var/log/maillog:102400
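To check a client-local.cfg like the one above before deploying it, the section layout can be read mechanically. The following Python sketch is an illustration only, not part of Hobbit: parse_client_local is a hypothetical helper, it understands only the [section], log:FILENAME:MAXDATA and ignore lines seen in the example, and it skips other directives such as dir:.

```python
def parse_client_local(text):
    """Return {section: [{"path", "maxdata", "ignore"}, ...]} for log: entries."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # OS type or host name
            sections.setdefault(current, [])
        elif current is None:
            continue  # entry outside any section
        elif line.startswith("log:"):
            # "log:FILENAME:MAXDATA"; split on the last colon only
            path, maxdata = line[len("log:"):].rsplit(":", 1)
            sections[current].append(
                {"path": path, "maxdata": int(maxdata), "ignore": []})
        elif line.startswith("ignore ") and sections[current]:
            # attach the ignore pattern to the most recent log entry
            sections[current][-1]["ignore"].append(line[len("ignore "):].strip())
    return sections


SAMPLE = """\
# by OS type
[sunos]
log:/var/adm/messages:10240

[linux]
log:/var/log/messages:10240
ignore MARK
"""

parsed = parse_client_local(SAMPLE)
for section, logs in parsed.items():
    print(section, logs)
```

Listing the parsed entries per section makes it easy to spot a host that is missing the log file referenced by a LOG rule on the server.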

What's the meaning of the track alert-mail-number in the subject of hobbit alert emails ?

A. The number is the ack-code you can use for acknowledging the alert.

  They are random numbers generated for each alert.

How do I restrict access to Hobbit pages to specific people or groups ?

A. Apache has its own authentication. Use it.

  To give one group access to some info, and another group access to other info, use the PAGE
  statement in bb-hosts. This will create a new directory for each page, which can be controlled
  by Apache's authentication system.

I am having a problem with devmon ....

A. Please post devmon related questions to the devmon support mailing list.

How do I create custom scripts and graphs ?

A. http://xymonton.org/tutorials:customgraph

A. This is typical for tables. One must append the column name to the --multigraphs setting.

  1. Look in the source file web/hobbitsvc.c.
    • Find the multigraphs assignment and note its value.
  2. Edit the configuration file server/etc/hobbitcgi.cfg:
    1. Find the CGI_SVC_OPTS assignment.
    2. Add a --multigraphs= value to the assigned string, including the columns that tend to have tabular data. For example:
      CGI_SVC_OPTS="--env=/home/hobbit/server/etc/hobbitserver.cfg --no-svcid --history=top --multigraphs=disk,if_load,if_dsc"
      (Note that, while the setting in hobbitsvc.c starts and ends with a comma, the setting in hobbitcgi.cfg does not.)
  3. Then try out the change (refresh the web page).

More details on --multigraphs are in the hobbitsvc.cgi documentation (man hobbitsvc.cgi).

I don't want to display column foo in my display. How do I do that ?

A. Add the entry NOCOLUMNS:foo,bar to hide the columns foo and bar.

How do I check to ensure something is not running ?

A. In bb-hosts

1.2.3.4 my.host.com # !ftp 

This will cause the test to go red if FTP is running

I want to monitor Windows servers. How do I do that ?

If you have to, use BBWin as your client. http://bbwin.sourceforge.net/

What are the translations between BBWin's XML and Central Mode?

(Note: untested but hopefully accurate)

Each BBWin approximation is followed by the corresponding hobbit-clients.cfg setting.

BBWin approximation:
  <uptime>
    <setting name="delay" value="(bootlimit)" alarmcolor="yellow" />
    <setting name="maxdelay" value="(toolonglimit)" alarmcolor="yellow" />
  </uptime>
hobbit-clients.cfg:
  UP bootlimit toolonglimit

BBWin approximation:
  <cpu>
    <setting name="default" warnlevel="(warnlevel)" paniclevel="(paniclevel)" />
  </cpu>
hobbit-clients.cfg:
  LOAD warnlevel paniclevel

BBWin approximation:
  N/A
hobbit-clients.cfg:
  CLOCK maximum-offset

BBWin approximation:
  <disk>
    <setting name="(drive)" warnlevel="(warn)" paniclevel="(panic)" />
    <setting name="(drive)" ignore="true" />
  </disk>
hobbit-clients.cfg:
  DISK drive warn panic
  DISK drive IGNORE

BBWin approximation:
  <memory>
    <setting name="physical" warnlevel="(warn)" paniclevel="(panic)" />
    <setting name="virtual" warnlevel="(warn)" paniclevel="(panic)" />
    <setting name="page" warnlevel="(warn)" paniclevel="(panic)" />
  </memory>
hobbit-clients.cfg:
  MEMPHYS warn panic
  MEMACT warn panic
  MEMSWAP warn panic

BBWin approximation:
  <procs>
    <setting name="(processname)" rule="-=3" alarmcolor="(color)" comment="(text)" />
    <setting name="(processname)" rule="+=4" alarmcolor="(color)" comment="(text)" />
  </procs>
hobbit-clients.cfg:
  PROC processname 0 3 color TEXT=text
  PROC processname 4 -1 color TEXT=text

BBWin approximation:
  N/A
hobbit-clients.cfg:
  FILE filename color

BBWin approximation:
  (see fsmon.cfg)
hobbit-clients.cfg:
  DIR directory color SIZE<maxsize SIZE>minsize

BBWin approximation:
  <stats>
  </stats>
hobbit-clients.cfg:
  PORT

BBWin approximation:
  <svcs>
    <setting name="(svcname)" value="(status)" autoreset="(startup)" alarmcolor="(color)" />
  </svcs>
hobbit-clients.cfg:
  SVC svcname startup status color

BBWin approximation:
  <who>
  </who>
hobbit-clients.cfg:
  N/A

(Note: not completely accurate but good starting point)

BBWin:
  <msgs>
    <match logfile="System" />
    <ignore logfile="System" value="(text string)" />
  </msgs>
client-local.cfg:
  eventlog:System
  ignore text string

I am having a problem with bbwin on my Windows......

A. Please post bbwin issues to the bbwin forum

The BBWin client works well. I have reused a script that previously worked with Big Brother. Its result is written to C:\BBWin\logs.

Everything else is reported to the server, but the externals are not.

<?xml version="1.0" encoding="utf-8"?>
<configuration>

 <bbwin>
   <setting name="bbdisplay" value="192.168.1.132" />
   <setting name="mode" value="local" />
   <setting name="configclass" value="win32" />
   <setting name="autoreload" value="true" />
   <setting name="timer" value="5m" />
   <load name="cpu" value="cpu.dll" />
   <load name="disk" value="disk.dll" />
   <load name="externals" value="externals.dll" />
   <load name="filesystem" value="filesystem.dll" />
   <load name="memory" value="memory.dll" />
   <load name="msgs" value="msgs.dll" />
   <load name="procs" value="procs.dll" />
   <load name="stats" value="stats.dll" />
   <load name="svcs" value="svcs.dll" />
   <load name="uptime" value="uptime.dll" />
   <load name="who" value="who.dll" />
   <setting name="loglevel" value="3" />
   <setting name="logpath" value="C:\BBWin\logs\BBWin.log" />
   <setting name="logreportfailure" value="false" />
   <setting name="hostname" value="vs3k-gap" />
 </bbwin>
 <cpu>
   <setting name="alwaysgreen" value="false" />
   <setting name="default" warnlevel="90" paniclevel="95" delay="3" />
 </cpu>
 <disk>
   <setting name="alwaysgreen" value="false" />
   <setting name="default" warnlevel="85%" paniclevel="95%" />
   <setting name="remote" value="false" />
   <setting name="cdrom" value="false" />
 </disk>
 <externals>
   <setting name="timer" value="1m" />
   <setting name="logstimer" value="60s" />
   <load value="C:\BBWin\ext\sqlv.cmd" timer="1m" />
 </externals>
 <memory>
   <setting name="alwaysgreen" value="false" />
   <setting name="physical" warnlevel="78" paniclevel="98" />
   <setting name="page" warnlevel="70" paniclevel="90" />
   <setting name="virtual" warnlevel="78" paniclevel="90" />
 </memory>
 <msgs>
   <setting name="alwaysgreen" value="false" />
   <setting name="delay" value="1h" />
   <match logfile="System" type="error" alarmcolor="red" />
   <match logfile="System" type="warning" alarmcolor="yellow" />
   <match logfile="Application" type="error" alarmcolor="red" />
   <match logfile="Application" type="warning" alarmcolor="yellow" />
   <match logfile="Security" type="fail" />
 </msgs>
 <procs>
 </procs>
 <svcs>
   <setting name="alwaysgreen" value="false" />
   <setting name="autoreset" value="false" />
   <setting name="alarmcolor" value="yellow" />
   <setting name="Windows Time" value="started" autoreset="true" alarmcolor="red" />
 </svcs>
 <uptime>
   <setting name="delay" value="30m" />
   <setting name="maxdelay" value="365d" />
 </uptime>

</configuration>

The script generates its output file as follows:

OK

  echo green ***Error en Chequeo SQLV, Servidor:%computername% >> C:\BBWin\logs\sqlv
  echo ^&green ***Error en Chequeo SQLV, Servidor:%computername% >> C:\BBWin\logs\sqlv

BAD

  echo red ***OK en Chequeo SQLV, Servidor:%computername% >> C:\BBWin\logs\sqlv
  echo ^&red ***OK en Chequeo SQLV, Servidor:%computername% >> C:\BBWin\logs\sqlv

I set up my client, but nothing is appearing on the status page ?

A. Check the ghost client reports or the hobbitd status page. It could be misconfigured.

We are changing the name of a host, and want to keep monitoring it, and keep the history. Is there a way ?

A. See the Hobbit "Tips & Tricks" page in the Help menu. The command is:

  ~/server/bin/bb 127.0.0.1 "rename OLDHOSTNAME NEWHOSTNAME"

I don't want to use rrdtool to create data averages and round-robin the data, I want to keep all data forever. Can I do it ?

A. The current snapshot includes a way to do this, along with a new hobbitd_rrd manpage that describes how to run it and what the input to your custom script looks like.

  The option is --processor=COMMAND 
  It will feed the raw data via stdio into COMMAND 
  COMMAND can then process the data into another storage system.
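A minimal sketch of such a COMMAND in Python follows. It makes no assumption about the input format (that is described in the hobbitd_rrd manpage) and simply appends every line it receives to an archive file, so nothing is averaged or expired. The function name, the demo data, and the convention of passing the archive path as the only argument are hypothetical.

```python
import io
import sys


def archive(stream, out):
    """Copy every input line to `out` unchanged and return the line count."""
    count = 0
    for line in stream:
        out.write(line)  # lines are treated as opaque; no parsing is attempted
        count += 1
    out.flush()
    return count


# Small in-memory demonstration with made-up input lines.
demo_out = io.StringIO()
demo_count = archive(io.StringIO("line one\nline two\n"), demo_out)

if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed invocation: the script is given the archive file path as its
    # only argument and receives the raw data on standard input.
    with open(sys.argv[1], "a") as archive_file:
        archive(sys.stdin, archive_file)
```

A real processor would replace the plain copy with a parser for the documented input format and write into the storage system of choice.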

I don't have a compiler on my AIX system. Where can I get a precompiled Hobbit client ?

A. http://www.docum.org/twiki/bin/view/Hobbit/HobbitClients

How do I unsubscribe from the Hobbit mailing list ?

A. If you must go, send an e-mail to hobbit-unsubscribe@hswn.dk

How do I add a hobbit search engine plugin into Internet Explorer 7 ?

A. Awaiting contribution.

How do I add a hobbit search engine plugin into FireFox 2 ?

A. Copy the following file and save it as "c:\Program Files\Mozilla Firefox\searchplugins\hobbit.xml". It uses the OpenSearch description format, which is compatible with both Firefox 2 and Internet Explorer 7. Adjust the URL for your site accordingly.

hobbit.xml

Then restart Firefox. A blue hobbit smiley icon should appear in the search bar.

Is it possible to disable and acknowledge alerts via email, how?

Include an IGNORE exception for the hosts or alerts that you wish to ignore:

  HOST=* SERVICE=*
     IGNORE HOST=%[a-z]{3}[0-9]{4} 
     MAIL admin@foo.com

or perhaps more specific:

  HOST=* COLOR=red
     IGNORE HOST=marketing.foo.com SERVICE=cpu TIME=4:1500:1800
     MAIL admin@foo.com

and check configuration:

  $ cd ~/server
  $ ./bin/bbcmd hobbitd_alert --test tic0102 comm
  00026606 2012-03-01 10:52:51 *** Match with 'IGNORE HOST=%[a-z]{3}[0-9]{4}' ***
  00026606 2012-03-01 10:52:51 IGNORE rule found
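The host pattern itself can be tried out before editing the alert configuration. The sketch below uses Python's re module as a stand-in for the PCRE matching that Hobbit performs on the expression after the % prefix; for a simple pattern like this one the two behave alike, but treating the match as unanchored is an assumption. The helper name is hypothetical.

```python
import re

# Pattern taken from the rule "IGNORE HOST=%[a-z]{3}[0-9]{4}" (without the %).
IGNORE_HOST = re.compile(r"[a-z]{3}[0-9]{4}")


def host_ignored(hostname):
    """Return True if the IGNORE pattern matches anywhere in the hostname."""
    # Assumption: the match is unanchored, as with a plain PCRE search.
    return IGNORE_HOST.search(hostname) is not None


results = {name: host_ignored(name) for name in ("tic0102", "marketing.foo.com")}
for name, ignored in results.items():
    print(name, "ignored" if ignored else "alerts sent")
```

Here tic0102 matches (three lowercase letters followed by four digits), which agrees with the test output above, while marketing.foo.com does not.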

Can I configure a maximum time limit a alert can be acknowledged for ?

A.

Can I restrict what hosts can be disabled on the enable/disable page ?

A.

Can I enable/disable on only one display server and have it show on both?

Typical installations with dual display servers have them configured to act independently. Each Xymon display server lists only itself in the XYMSERVERS setting in xymonserver.cfg, so usually, the enable/disable page only applies to the server that served the form.

However, the XYMSERVERS value can be overridden for each CGI script, and the enable/disable form will send updates to all Xymon servers defined in XYMSERVERS. To override for enable/disable, perform the following.

Create the file xymonserver-enadis.cfg containing:

 include /etc/xymon/xymonserver.cfg
 XYMSERVERS="display1 display2"  # replace with IP addresses of Xymon servers

Then edit cgioptions.cfg and add this line:

 XYMONENV_ENADIS=/usr/lib/xymon/server/etc/xymonserver-enadis.cfg

and modify CGI_ENADIS_OPTS to reference the new variable:

 CGI_ENADIS_OPTS="--env=$XYMONENV_ENADIS"

Can the SMS alert format be reconfigured to display more or less information ?

A. Not directly, but an alert script (smsplus) has been written to provide more information than the default SMS output does. The script can be easily modified using the following environment variables (from the hobbit documentation):

  Name             Description
  BBCOLORLEVEL     The current color of the status
  BBALPHAMSG       The full text of the status log triggering the alert
  ACKCODE          The "cookie" that can be used to acknowledge the alert
  RCPT             The recipient, from the SCRIPT entry
  BBHOSTNAME       The name of the host that the alert is about
  MACHIP           The IP-address of the host that has a problem
  BBSVCNAME        The name of the service that the alert is about
  BBSVCNUM         The numeric code for the service. From SVCCODES definition.
  BBHOSTSVC        HOSTNAME.SERVICE that the alert is about.
  BBHOSTSVCCOMMAS  As BBHOSTSVC, but dots in the hostname replaced with commas
  BBNUMERIC        A 22-digit number made by BBSVCNUM, MACHIP and ACKCODE.
  RECOVERED        Is "1" if the service has recovered.
  DOWNSECS         Number of seconds the service has been down.
  DOWNSECSMSG      When recovered, holds the text "Event duration : N" where N is the DOWNSECS value.
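As an illustration of how an alert script can use these variables, the following Python sketch builds a compact SMS text from them. It is not the smsplus script; the short_sms function, the message layout, and the example values are assumptions, and only the variable names come from the table above.

```python
import os


def short_sms(env):
    """Build a compact SMS text from the alert environment variables."""
    hostsvc = env.get("BBHOSTSVC", "unknown")
    color = env.get("BBCOLORLEVEL", "unknown")
    if env.get("RECOVERED") == "1":
        return f"{hostsvc} recovered"
    minutes = int(env.get("DOWNSECS", "0")) // 60  # whole minutes down
    return f"{hostsvc} {color} for {minutes}m ack {env.get('ACKCODE', '-')}"


# Made-up example values for a red http alert.
example = short_sms({
    "BBHOSTSVC": "www.example.com.http",
    "BBCOLORLEVEL": "red",
    "DOWNSECS": "300",
    "ACKCODE": "123456",
    "RECOVERED": "0",
})
print(example)

if "BBHOSTSVC" in os.environ:
    # When run as an alert script, the variables come from the environment.
    print(short_sms(os.environ))
```

The text would then be handed to whatever SMS gateway command the site uses, with the recipient taken from RCPT.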

Q. How do I enable SNMP monitoring with Hobbit Server?

A 1. http://cerebro.victoriacollege.edu/hobbit-trap.html

A 2. http://devmon.sourceforge.net

Q. How do I make a test go red after it has stayed yellow for several cycles ?

On Wed, Apr 18, 2007 at 04:11:13PM -0400, Galen Johnson wrote:

> I'll admit I haven't put a lot of legwork into this but...is it 
> possible to configure hobbit to go red on a test after a certain 
> number of cycles at yellow?  I have some tests that I don't mind 
> if they are yellow for small period but if they are there too long 
> I need to know.

A. Use the "badTEST" setting in bb-hosts (see the man-page). This delays a yellow or red status from appearing until it has stayed yellow (or red) for a number of test cycles. So you could use this to suppress the yellow status until it had been yellow for some time, so when it does turn yellow you know this is something you have to handle.

Or if your custom test reported a "red" status, you could make it go yellow for the first 5 test cycles, and red after that.

Henrik

Note: Currently, this only works for the ping test.

Q. How do I use hobbit client with BB server ?

A.

On Wed, Sep 06, 2006 at 01:48:02PM -0500, Rich Smrcina wrote:
> Is the new Hobbit client compatible with the old BigBrother server?
> BigBrother is run by a different part of the organization and I may not 
> be able to get them to change to Hobbit, but for my Linux guests and my 
> z/VM systems, I would be interested in converting to the new Hobbit code.

In the default mode, you cannot use the Hobbit client to report to a Big
Brother system. No data would ever show up, because a Big Brother server
doesn't know how to feed the client data through the hobbitd_client
module, which takes care of converting the client data into status
columns.

However, you *can* run the Hobbit client in the local-configuration
mode. When the configure script asks
   Server side client configuration, or client side [server] ?
answer "client", and then launch the client with the "--local" option.

In this mode, the client sends normal "status" messages to the Hobbit/BB
server. I'm not sure if alerts will work, though, since the Hobbit
client doesn't generate the "page" messages that the BB server expects
to trigger sending out alerts. (Hobbit ignores these messages
completely, so it did seem like a waste of time to generate them).

Note that this isn't really described very well anywhere. It means you
will have to maintain the client configuration on the client, not on the
Hobbit server.

Regards,
Henrik

Q. How to enable fping as non-root user in Solaris 10 ?

The problem
bash-3.00$ more bb-network.log
2006-08-29 17:31:02 Execution of 'hobbitping -Ae' failed - program not suid root?
2006-08-29 17:31:02 2006-08-29 17:31:02 Cannot get RAW socket: Permission denied
bash-3.00$
The fix
There are 3 files that need to be updated so that fping can be executed as root (uid=0)
on Solaris by a named user, in this case the hobbit user.

The three files in question are,
/etc/user_attr
/etc/security/exec_attr
/etc/security/prof_attr

These have been updated with the following lines

In /etc/user_attr:
hobbit::::profiles=Hobbit Commands

In /etc/security/exec_attr:
Hobbit Commands:solaris:cmd:::/usr/local/hobbit/server/bin/hobbitping:uid=0

In /etc/security/prof_attr:
Hobbit Commands:::Hobbit Commands:

Regards,

Mike Rowell, edited by T.J. Yang
References: http://docs.sun.com/app/docs/doc/816-4557/6maosrjfc?a=view

Q. Can you use a hobbit server, but keep your bb clients?

A. Yes. Hobbit is 100% compatible with BB clients. Beware, however, that after BB 1.9c the Big Brother license became more restrictive, and you need to pay for a per-seat license.

Q. What is Big Sister ?

Q. Is there a quick overview of System Monitoring Tools ?

Q. What features should a monitoring system have ?

Q. How safe is it to migrate to Hobbit from BB ?

-----Original Message-----
From: Henrik Stoerner [mailto:henrik@hswn.dk] 
Sent: Tuesday, August 01, 2006 5:13 PM
To: hobbit@hswn.dk
Subject: Re: [hobbit] Hobbit newbie from BB: differences and what may I
lose from migrating?

Hi Jordan,

I'll try to answer your questions. Since I also develop Hobbit I am
probably slightly biased when it comes to the "is-this-more-difficult-
to-do-than-with-BB" type of questions, but I am sure others will
voice their opinions on that.

On Tue, Aug 01, 2006 at 12:36:29PM -0700, Jordan Mendler wrote:
> 
> First, after reading through whatever I could find on the website I am
> still a little bit confused about configuration and setup. With BB,
> you install and configure each client and server on the local machine,
> except for the universal bb-hosts. Is this the same on Hobbit, or does
> Hobbit use a central configuration file that is modified only on the
> server to configure clients? I am trying to figure out the difference
> between installing, maintaining and configuring BB and Hobbit setups.

First, let me stress that Hobbit is fully compatible with your existing
BB clients. You can keep your current client setup and just switch to
Hobbit on the server side, and all of your clients will continue to 
work as they do with BB as the server. So you can migrate the server
side first, and then migrate clients when you find that it is convenient
to do so - or you want to take advantage of some of the new stuff that
is in Hobbit.

The Hobbit client configuration is maintained on the Hobbit server. 
Clients in Hobbit are designed to be *really* dumb; they just collect
data, and all of the configuration of what to monitor, what thresholds
to use for e.g. disk utilization and so on is configured only on the
Hobbit server.

This is a major difference between Hobbit and BB. With BB you have
delegated the client administration to whoever manages each server.
Hobbit centralizes the monitoring configuration, so you will probably
have a group of people who take more control of the monitoring setup.

> Hobbit looks a lot more complex to setup, but once I get my feet wet is
> it any harder than BB?

I think it is easier, once you get used to the Hobbit way of doing
things. But as I said, I am biased.

> Second is performance. I know this list may be biased toward Hobbit,
> but is it actually faster? We have about 50-100 clients on BB and I did
> not notice any performance issues.

With that number of systems monitored, you probably will not see a huge
difference. BB works quite well for a small number of systems, but when
you move beyond a couple of hundred boxes the overhead of generating 
webpages through shell scripts becomes very noticeable. On my setup,
the servers were simply choking on the disk I/O caused by BB saving
every status in a separate file, and from the huge number of small
cut-grep-awk-sed etc. commands that ran to generate webpages.

> Hobbit looks like it is very complex, so does this mean it uses a lot 
> of resources on the client and server? What speed/ram server is
> usually the minimum recommended for a dedicated Hobbit server? Would
> something like a dual Pentium II 266mhz have any performance issues 
> as a server, if it does nothing else? What about for clients? We
> still have some testing, staging and production servers left that are
> single chip Pentium III 700-850 mhz, and even a couple Pentium II's. 
> Just need to make sure all the resources used for things like graphs
> are taken from the server and not each client.

The Hobbit server uses fewer resources than the BB server. The main
resource usage is memory; Hobbit keeps everything in memory except 
the history logs and the RRD files used for graphs. That doesn't mean
a whole lot, though: Here's a ps listing of the Hobbit processes running
on my main monitoring system - it handles about 2500 hosts:

$ ps vax|cut -c1-100|egrep "PID|hobbit"
  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
  732 ?        Ss     1:24      0   101  1802  696  0.0 hobbitlaunch
  735 ?        S    2434:37     1   162 31357 29784  2.8 hobbitd
 1470 ?        S     14:50      0    99  2332 1088  0.1 hobbitd_channel --channel=stachg
 1471 ?        S     25:18      0   108  2515 1048  0.1 hobbitd_history
 1472 ?        S    964:26      0    99  2332 1264  0.1 hobbitd_channel --channel=page
 1473 ?        S    1227:34     0   154  5661 3912  0.3 hobbitd_alert
 1474 ?        S    4090:05     0    99  2332 1264  0.1 hobbitd_channel --channel=status
 1475 ?        D    2962:15     0   178  7381 4392  0.4 hobbitd_rrd
 1476 ?        S    259:55      0    99  2332 1208  0.1 hobbitd_channel --channel=data
 1477 ?        S    494:13      0   178  5141 2128  0.2 hobbitd_rrd
 1478 ?        S    126:20      0    99  2844 1832  0.1 hobbitd_channel --channel=client
 1480 ?        S    291:20      0   146  4485 2792  0.2 hobbitd_client
 5552 ?        S      0:00      0   669  2002 1352  0.1 sh -c vmstat 300 2 1>/usr/lib/hobbit/client/

As you can see, the biggest chunk of memory goes to the "hobbitd"
process which is the one that keeps all state information. It's
currently using some 31 MB of memory. (This box has 1 GB RAM).

A rough estimate of how much memory Hobbit needs would be the size of
your bbvar/logs/ directory, plus 30 MB.
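The rule of thumb above (size of bbvar/logs/ plus 30 MB) can be worked out with a short Python sketch. The function name is hypothetical, and the demonstration uses a temporary directory with one made-up 2 MB file in place of a real bbvar/logs/ directory.

```python
import os
import tempfile


def estimate_memory_mb(logs_dir, base_mb=30):
    """Rough estimate: total size of the files under logs_dir, plus base_mb."""
    total_bytes = 0
    for root, _dirs, files in os.walk(logs_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / (1024 * 1024) + base_mb


# Demonstration with a stand-in directory holding a single 2 MB status file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "host.conn"), "wb") as status_file:
        status_file.write(b"x" * (2 * 1024 * 1024))
    demo_estimate = estimate_memory_mb(tmp)

print(demo_estimate)
```

On a real server, pass the path of the existing bbvar/logs/ directory to get the figure for that installation.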

As for CPU usage, your PII/266 should be adequate for 50-100 servers.
The box I'm running on is an old (7-8 years) Solaris server with a 
900 MHz UltraSparc II processor. That's roughly comparable to a PII
running at 1.2 GHz. And it handles 25 times as many hosts as you are
aiming for.

> Third is plugins. Are BB plugins compatible with Hobbit?

Yes.

> Also how hard are plugins to write for Hobbit?

Plugins that run on the monitored client systems are as easy to write
as for BB, since it is basically the same thing.

Hobbit also allows you to write plugins for the Hobbit server, which
receive events from the Hobbit server daemon. This is used by the 
core Hobbit tools - e.g. the hobbitd_rrd processes you see in the
ps-listing above are a plugin that handles updating of the RRD files
from the status- and data-messages that are sent to Hobbit. There
aren't any third-party plugins that use this yet (at least, I 
don't know of any), but writing them is fairly simple since it 
basically involves reading data from a pipe and processing it in
whatever way you want.
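
As an illustration of "reading data from a pipe", here is a minimal sketch of a server-side worker written in shell. It assumes that channel data arrives on standard input as messages whose first line starts with "@@" and that each message ends with a line containing only "@@". That framing is an assumption here and should be checked against the hobbitd_channel documentation for your version. The function name is hypothetical.

```shell
# Sketch of a channel worker: for every message read from stdin, print
# the header marker (the text between "@@" and the first "|") and the
# number of body lines in that message.
# ASSUMPTION: messages start with a "@@..." header line and end with a
# line that is exactly "@@".
process_channel() {
    header=""
    body_lines=0
    while IFS= read -r line; do
        case "$line" in
            "@@")
                # End of message: report it and reset the state.
                if [ -n "$header" ]; then
                    echo "$header $body_lines"
                fi
                header=""
                body_lines=0
                ;;
            "@@"*)
                # Start of message: keep the part before the first "|".
                header=${line#@@}
                header=${header%%"|"*}
                body_lines=0
                ;;
            *)
                # Body line of the current message, if any.
                if [ -n "$header" ]; then
                    body_lines=$((body_lines + 1))
                fi
                ;;
        esac
    done
}
```

A worker like this would normally be started by hobbitd_channel, with the worker command given after the --channel option, in the same way the hobbitd_rrd processes in the ps listing are attached to their channels.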

> I don't know if these even exist for bb, but I ultimately would 
> like to integrate plugins that 1) monitor legato tape backup,

Don't know about this.

> 2) run nmap to see what ports are open/can be seen from an external 
> machine,

The Hobbit client in version 4.2 (to be released soon) reports
details about the network services running on a host. So you can check
for which ports are open/listening for connections, and trigger alerts
if any unwanted ports show up.

> 3) run 'lshw -html' to show a list of all the hardware on the system,

This would typically be a client-side test.

> 4) monitor uptime,

This is standard.

> 5) monitor OS and kernel versions (uname -a and head -n 1 /etc/issue),

This data is collected by the Hobbit client.

> 6) maybe some more router/network monitoring stuff and

Hobbit comes with built-in network service monitoring. There is also
an SNMP add-on which can be used for monitoring devices such as routers.

> Fourth is relay. By this I mean monitoring systems on a private
> subnetwork that are only accessible to the Hobbit server by going
> through an intermediate server. Is this possible with Hobbit and is it
> any more difficult to do than on BB?

Two ways of doing that. First, there is a proxy utility which is used
to forward Hobbit messages from one network to another. This is used if
your client systems on the private subnet are allowed to make outgoing
connections to the proxy, and the proxy can connect to the real Hobbit
server.

Second, Hobbit 4.2 includes a set of tools where it's the server that
contacts clients to pick up the data they have collected (i.e. the
traffic is initiated by the server, where the normal BB setup is for 
the client to initiate the connection). Useful for DMZ style setups
where clients are not allowed to generate outbound connections.

> Fifth is portability. BB is very portable, I can make a 'model' client
> for say Red Hat and tar it and distribute it very easily to every
> server I have using only a few commands. Is Hobbit the same, or are there
> client dependencies or other things that may make this more difficult.

The Hobbit client uses only the system libraries and standard utilities 
found on your client systems. You will need at least one system where
you can compile the client binaries (that's similar to the BB
requirements), since a few of the client-side tools are written in C.

Once you have a client compiled for an OS, it is as portable as any
binary that is dynamically linked on your platform. I.e. you can 
just copy it over as long as the same run-time libraries are available.

So far, we haven't managed to find any unix-like system that couldn't
run the Hobbit client, including some rather odd ones. The current list
of client-side data collectors is:

hobbitclient-aix.sh    hobbitclient-darwin.sh  hobbitclient-freebsd.sh
hobbitclient-hp-ux.sh  hobbitclient-irix.sh    hobbitclient-linux.sh
hobbitclient-netbsd.sh hobbitclient-openbsd.sh hobbitclient-osf1.sh
hobbitclient-sunos.sh

> Sixth is development. How active is the development of Hobbit, how big
> is the community, etc? How many people can attest to having fully
> functional hobbit setups, how long has it been around and how often
> are new releases usually made?

Hobbit started back in late 2002 when it was called the "bbgen toolkit".
It was renamed to Hobbit in March 2005 when it had developed into a 
complete replacement for BB. More details in the hobbit(7) man-page
available online at http://www.hswn.dk/hobbit/help/manpages/

It is actively being developed by me, but people on this list have
made contributions of code. Some have picked up special projects
like the Windows client and run that completely on their own.
I'd say Hobbit currently has a very active user community, and
the development community is slowly growing beyond just myself.

There are currently 433 subscribers to the Hobbit mailing list.
According to the Sourceforge download statistics, it is downloaded
about 1000 times per month.
http://sourceforge.net/project/stats/?group_id=128058&ugn=hobbitmon&type=&mode=year

There was a thread on the mailing list back in May about who uses
Hobbit. The results were summarized here:
http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/User_Guide#Who_use_Hobbit_.3F

New releases have usually happened frequently - 2-4 times a year.
The current interval between the 4.1.2 release and version 4.2 is 
unusually long - a whole year. I don't expect that to happen again.

> Also I saw something this morning about a Windows client -- how 
> stable is that?

From what I hear it should be usable. But you can stick with the
current BBNT client until it reaches version 1.0.

> How stable is the Solaris version?

Rock-solid.

> Is there a client for Mac OSX?

Yes. It will run the Hobbit server also, if you want to.

> Is Hobbit like BB in the sense that you can change paths to system 
> binaries like grep and sed to allow easy use on other UNIXes like OSX?

Adding a client for a new OS will require implementing both a
client-side script to collect whatever data is interesting for this
system, and implementing the data parsing on the Hobbit server-side.
So it is somewhat more challenging. But since Hobbit already supports
all of the common Unix systems, I doubt that you will need to worry 
about that. If you do have a system which is not on the list, I will
help you with adding support for it.

> When will 4.2 be officially released as a production version?

Probably by the end of this week.

> Since we have a working BB setup for now, I need to
> decide if I should try to start migrating now or if I should wait some
> time for Hobbit to develop more before I migrate from BB.

I don't think you have to wait. But it's for you to decide.

Regards,
Henrik

Q. Does NOCOLUMNS have to refer to a client line or can it refer to a page/subpage in bb-hosts ?

A. Waiting for contribution.

Solaris Developer FAQ

Q. What is the benefit of 64-bit hobbit server vs 32-bit?

Q. How do I enable 64 bit compilation ?

32 bit

  CC=cc CFLAGS="-mr -Qn -xstrconst -xO2 -xtarget=ultra2 -xarch=v8plusa" 
  CC_LD_RT="-R"

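Meaning of each option (Sun Studio cc):
-mr: removes all strings from the .comment section of the object file
-Qn: does not add compiler identification information to the output file
-xstrconst: places string literals in the read-only data section
-xO2: optimization level 2
-xtarget=ultra2: generates code tuned for UltraSPARC II systems
-xarch=v8plusa: 32-bit SPARC V8plus instruction set with the VIS extensions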

64 bit

  CC=cc CFLAGS="-mr -Qn -xstrconst -xO2 -xtarget=general -xarch=v9" 
  CC_LD_RT="-R" 

Q. Example of 32-bit hobbit server

bash-3.00# file hobbitd
hobbitd:        ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, 
                UltraSPARC1 Extensions Required, dynamically linked, not stripped
bash-3.00#

Warning messages from Sun Compiler

The following warning messages were produced when building with the Sun compiler.

Q. "pointer to unsigned char "=" pointer to char"

  • The warning message:
bash-3.00# gmake
cc -mr -Qn -xstrconst -xO2 -xtarget=ultra2 -xarch=v8plusa -D_REENTRANT  -DSunOS -o safequery\
 safequery.c
"safequery.c", line 12: warning: assignment type mismatch:
        pointer to unsigned char "=" pointer to char
bash-3.00#

  • The source
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *safechars="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_+?~=";

int main(int argc, char *argv[])
{
    unsigned char *querystring;
    unsigned char *p;

    querystring = getenv("QUERY_STRING");
    if (!querystring) {
       return 0;
    };

    for (p=querystring; (*p); p++) {
      if (!strchr(safechars, *p)) {
         return 1;
      }
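    }

    return 0;
}
  • Note: getenv() is documented as returning char *, not unsigned char *. Declaring querystring and p as char * (or casting the return value of getenv()) avoids the warning.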

Others

"loadhosts.c", line 463: warning: statement not reached
         prototype: pointer to char : "/opt/build/hobbit-4.2.0/include/../lib/strfunc.h", line 16
         argument : pointer to unsigned char
"loadhosts.c", line 463: warning: statement not reached
"loadhosts.c", line 521: warning: return value type mismatch
"hobbitd_alert.c", line 543: warning: assignment type mismatch:
        pointer to char "=" pointer to unsigned char
"hobbitd_alert.c", line 665: warning: assignment type mismatch:
        pointer to unsigned char "=" pointer to char
"hobbitd_alert.c", line 692: warning: argument #1 is incompatible with prototype:
         prototype: pointer to unsigned char : "/opt/build/hobbit-4.2.0/include/../lib/encoding.h",
         line 17
         argument : pointer to char
"hobbitd_alert.c", line 701: warning: assignment type mismatch:
         pointer to unsigned char "=" pointer to char
"hobbitd_alert.c", line 719: warning: assignment type mismatch:
         pointer to unsigned char "=" pointer to char
"do_alert.c", line 182: warning: argument #2 is incompatible with prototype:
       prototype: pointer to char : "/opt/build/hobbit-4.2.0/include/../lib/strfunc.h", line 16
       argument : pointer to unsigned char
"do_alert.c", line 190: warning: argument #1 is incompatible with prototype:
       prototype: pointer to char : "/opt/build/hobbit-4.2.0/include/../lib/misc.h", line 25
       argument : pointer to unsigned char
"do_alert.c", line 253: warning: argument #1 is incompatible with prototype:
       prototype: pointer to char : "/opt/build/hobbit-4.2.0/include/../lib/misc.h", line 25
       argument : pointer to unsigned char
"do_alert.c", line 272: warning: argument #1 is incompatible with prototype:
       prototype: pointer to char : "/opt/build/hobbit-4.2.0/include/../lib/misc.h", line 25
       argument : pointer to unsigned char

Administration FAQ

How do I monitor HP-UX network log ?

On Mon, Jan 28, 2008 at 12:06:22PM -0500, Robert Herron wrote:
> HP-UX stores its network log in a binary file (/var/adm/nettl.LOG000) that
> you view with the netfmt command.  Before I start working on my own, does
> anyone have an EXT script to monitor it?  If so, could I have a copy?

Alternatively, you could modify the HP-UX client script to generate a
normal Hobbit "msgs" section with the text-output from the netfmt
command; then Hobbit can process it as if it were an ordinary text-based
logfile.

E.g. at the bottom of the hobbitclient-hp-ux.sh script running on your
clients, just before the "exit" command add this:

   echo "[msgs:/var/adm/nettl.LOG000]"
   netfmt ...whatever needs to go here to get the text-output ...

Then you can use a normal log-file entry on the Hobbit server to process
the log data.

Regards,
Henrik

Q. How do I enable RSS on hobbit server ?

A.

  • In the hobbitserver.cfg file, change the BBGENOPTS variable from the following:
BBGENOPTS="--recentgifs --subpagecolumns=2"     # Standard options for bbgen.
  • To the following:
# enable RSS by " --rss --rsslimit=yellow"
BBGENOPTS="--recentgifs --subpagecolumns=2  --rss --rsslimit=yellow"     # Standard options for bbgen.
  • The newly created RSS files in the www directory should resemble the following:
bash-3.00$ ls -lrt /opt/moto/hobbitserver42/www/*.rss
-rw-r--r--   1 hobbits  hobbits      714 Jan 22 07:50 /opt/moto/hobbitserver42/www/bb.rss
-rw-r--r--   1 hobbits  hobbits     6184 Jan 22 07:50 /opt/moto/hobbitserver42/www/bb2.rss
-rw-r--r--   1 hobbits  hobbits      337 Jan 22 07:50 /opt/moto/hobbitserver42/www/bbnk.rss
bash-3.00$

Q. How do I configure SMF for hobbit on Solaris 10 ?

A. From: Everett, Vernon

For those of you familiar with Solaris 10, you should know about services, but for some, adding new ones is a little tricky. To get Hobbit working as a service we need to do the following.

Create a file named /var/svc/manifest/application/hobbit.xml with the following content:

<?xml version="1.0"?>

<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<!--
Copyright 2007 Vernon Everett - vernon(a)everett.co.za
Free for use and distribution for non-commercial purposes.
No warranty exists either implicit or implied.
Standard disclaimer applies.
Commercial use is subject to license terms.
-->

<service_bundle type='manifest' name='Hobbit-monitor:hobbit'>

<service name='application/hobbit' type='service' version='1'>
<dependency name='filesystem' grouping='require_all' restart_on='none' type='service'>

<service_fmri value='svc:/system/filesystem/local'/>
</dependency>
<dependency name='multi-user-server' grouping='optional_all' type='service' restart_on='none'>
<service_fmri value='svc:/milestone/multi-user-server' />
</dependency>
        <exec_method type='method' name='start' exec='/usr/lib/hobbit/client/runclient.sh start'
         timeout_seconds='10'>
<method_context>
<method_credential user='hobbit'/>
</method_context>
</exec_method>
        <exec_method type='method' name='stop' exec='/usr/lib/hobbit/client/runclient.sh stop'
         timeout_seconds='10'>
<method_context>
<method_credential user='hobbit' />
</method_context>
</exec_method>
<exec_method type='method' name='restart' exec='/usr/lib/hobbit/client/runclient.sh restart'
 timeout_seconds='10'  >
<method_context>
<method_credential user='hobbit' />
</method_context>
</exec_method>
        <instance name='default' enabled='true' />
        <stability value='Unstable' />
        <template>
<common_name>
<loctext xml:lang='C'> Hobbit Monitor Client </loctext>

</common_name>

</template>

</service>

</service_bundle>

Take note of the exec= attributes of the three exec_method elements (start, stop and restart). You may need to edit the path to your Hobbit start script.

To avoid confusion or possible issues, shut down your hobbit client at this point using the runclient script.

Now, as root, run the command

#svccfg import /var/svc/manifest/application/hobbit.xml

We should now have a service called hobbit.

#svcs | grep hobbit
online 9:23:05 svc:/application/hobbit:default

(It will probably have gone online at this point)

You can now treat it as you would a regular service. If it hasn't gone online, kick it off as normal.

#svcadm enable hobbit

It may be necessary to do a disable and then an enable, but that should get it going.

And because we have set the default as enabled, the service should start automatically when you do a reboot.

Confirm it's all good by doing

# ps -efa | grep hobbit

All the usual scripts should be running.

If you don't want it as a service anymore, as root run

# svccfg delete hobbit

This will remove the service, and allow you to continue running it from the runclient script.
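
The import steps above can be collected into one small script. The following is only a sketch: the helper names are hypothetical, and the extra "svccfg validate" and "svcs -l" steps are additions that are not part of the original instructions. By default the script only prints the commands, so it can be reviewed before anything is changed; set DRY_RUN=0 to actually run them as root.

```shell
# Sketch: validate, import and enable an SMF manifest, then show its state.
# With DRY_RUN=1 (the default in this sketch) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# run is a hypothetical helper: print the command, or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_manifest MANIFEST FMRI
install_manifest() {
    manifest=$1    # e.g. /var/svc/manifest/application/hobbit.xml
    fmri=$2        # e.g. svc:/application/hobbit:default
    run svccfg validate "$manifest" || return 1   # check the XML first
    run svccfg import "$manifest" || return 1
    run svcadm enable "$fmri" || return 1
    run svcs -l "$fmri"                           # detailed service state
}

# Usage:
#   install_manifest /var/svc/manifest/application/hobbit.xml \
#       svc:/application/hobbit:default
```

Validating first catches malformed XML before the service repository is touched.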

Q. How do I configure Hobbit Client for Solaris 10 using SMF ?

A. copied from http://xymonton.org/addons:hobbitsmf

 These are service manifest files for Solaris 10. These will allow you to import the hobbit start
 and stop scripts for the server/client into the new Solaris Service Management Facility (Solaris 10
 replacement of /etc/rcN.d).

Installation

   1. mkdir -p /var/svc/manifest/application/monitoring/hobbit
   2. copy the client and server xml files to /var/svc/manifest/application/monitoring/hobbit
   3. import the service(s)

      svccfg import /var/svc/manifest/application/monitoring/hobbit/server.xml
      svccfg import /var/svc/manifest/application/monitoring/hobbit/client.xml

   4. enable the service(s)

      svcadm enable svc:/application/monitoring/hobbit/client:default
      svcadm enable svc:/application/monitoring/hobbit/server:default

Source
Hobbit Client

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<!--
    client.xml : Hobbit Client manifest, Galen Johnson
    2007-04-13
    based on bigbrother.xml : BigBrother manifest, Kyle Reynolds
    2006-07-02
-->
 
<!--
    Solaris 10 SMF manifest file for Hobbit client. Just place in
    /var/svc/manifest/application/monitoring/hobbit
    and run:
    'svccfg import /var/svc/manifest/application/monitoring/hobbit/client.xml'
    'svcadm enable svc:/application/monitoring/hobbit/client:default'
-->
 
<!--
    Be sure to change the path to runclient.sh to match your setup...
    Be sure to change the user and group hobbit runs as for your setup...
    If you need to provide the hobbit service additional group privs, add 
    them to the supp_groups in the method context.
-->
 
<service_bundle type="manifest" name="hobbit:client">
 
<service
    name="application/monitoring/hobbit/client"
    type="service"
    version="1">
 
    <create_default_instance enabled='false' />
 
    <single_instance />
    
    <dependency
        name="filesystem"
        grouping="require_all"
        restart_on="none"
        type="service">
        <service_fmri value="svc:/system/filesystem/local"/>
    </dependency>
 
    <dependency
        name="network"
        grouping="require_all"
        restart_on="none"
        type="service">
        <service_fmri value="svc:/network/initial"/>
    </dependency>
 
    <dependency
        name="multi-user-server"
        grouping="require_any"
        restart_on="error"
        type="service">
       <service_fmri value="svc:/milestone/multi-user-server:default"/>
    </dependency>
 
    <exec_method
        type="method"
        name="start"
        exec="/usr/local/hobbit/client/runclient.sh start"
        timeout_seconds="30">
        <method_context>
            <method_credential user="hobbit" group="bb"
                   supp_groups="" />
        </method_context>
    </exec_method>
 
    <exec_method
        type="method"
        name="stop"
        exec="/usr/local/hobbit/client/runclient.sh stop"
        timeout_seconds="30">
        <method_context>
            <method_credential user="hobbit" group="bb"
                  supp_groups="" />
        </method_context>
    </exec_method>
 
     <property_group name='startd' type='framework'>
        <!-- sub-process core dumps shouldn't restart session -->
        <propval name='ignore_error' type='astring' value='core,signal' />
    </property_group>
 
   <stability value="Unstable"/>
 
    <template>
        <common_name>
            <loctext xml:lang="C">
                Hobbit Client
            </loctext>
        </common_name>
        <documentation>
             <doc_link name='hobbit_monitor_site'
                uri='http://hobbitmon.sourceforge.net/' />
        </documentation>
    </template>
 
</service>
 
</service_bundle>

Q. How do I configure Hobbit Server for Solaris 10 using SMF ?

A. copied from http://xymonton.org/addons:hobbitsmf

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<!--
    server.xml : Hobbit Server manifest, Galen Johnson
    2007-04-13
    based on bigbrother.xml : BigBrother manifest, Kyle Reynolds
    2006-07-02
-->
 
<!--
    Solaris 10 SMF manifest file for Hobbit server. Just place in
    /var/svc/manifest/application/monitoring/hobbit
    and run:
    'svccfg import /var/svc/manifest/application/monitoring/hobbit/server.xml'
    'svcadm enable svc:/application/monitoring/hobbit/server:default'
-->
 
<!--
    Be sure to change the path to hobbit.sh to match your setup...
    Be sure to change the user and group hobbit runs as for your setup...
-->
 
<service_bundle type="manifest" name="hobbit:server">
 
<service
    name="application/monitoring/hobbit/server"
    type="service"
    version="1">
 
    <create_default_instance enabled='false' />
 
    <single_instance />
 
    <dependency
        name="filesystem"
        grouping="require_all"
        restart_on="none"
        type="service">
        <service_fmri value="svc:/system/filesystem/local"/>
    </dependency>
 
    <dependency
        name="network"
        grouping="require_all"
        restart_on="none"
        type="service">
        <service_fmri value="svc:/network/initial"/>
    </dependency>
 
    <dependency
        name="multi-user-server"
        grouping="require_any"
        restart_on="error"
        type="service">
       <service_fmri value="svc:/milestone/multi-user-server:default"/>
    </dependency>
 
    <exec_method
        type="method"
        name="start"
        exec="/usr/local/hobbit/server/hobbit.sh start"
        timeout_seconds="30">
        <method_context>
            <method_credential user="hobbit" group="bb"/>
        </method_context>
    </exec_method>
 
    <exec_method
        type="method"
        name="stop"
        exec="/usr/local/hobbit/server/hobbit.sh stop"
        timeout_seconds="30">
        <method_context>
            <method_credential user="hobbit" group="bb"/>
        </method_context>
    </exec_method>
 
    <property_group name='startd' type='framework'>
        <!-- sub-process core dumps shouldn't restart session -->
        <propval name='ignore_error' type='astring' value='core,signal' />
    </property_group>
 
   <stability value="Unstable"/>
 
    <template>
        <common_name>
            <loctext xml:lang="C">
                Hobbit Server
            </loctext>
        </common_name>
        <documentation>
             <doc_link name='hobbit_monitor_site'
                uri='http://hobbitmon.sourceforge.net/' />
            <manpage title="hobbit" section="1" manpath="/usr/local/man"/>
        </documentation>
    </template>
 
</service>
 
</service_bundle>

Q. How do I enable devmon on hobbit server ?

A. The page at http://www.techagent.com/devmon_snmp_hobbit_setup.htm describes a procedure for enabling devmon on Hobbit.

Q. How do I remove a test ?

A.

On Tue, Sep 05, 2006 at 08:07:34AM +0200, Ulric Eriksson wrote:
> I have figured out how to remove a single test from one host, or 
> all tests from a single host.

The command,   bb 127.0.0.1 "hobbitdboard"
is your friend, combined with a bit of scripting. E.g:

> Is it possible to remove a single test from *all* hosts?

bb 127.0.0.1 "hobbitdboard test=MYTEST fields=hostname" |
   while read H; do bb 127.0.0.1 "drop $H MYTEST"; done

> Or all tests from all hosts?

bb 127.0.0.1 "hobbitdboard test=info fields=hostname" |
   while read H; do bb 127.0.0.1 "drop $H"; done

> Or all tests that are purple?

bb 127.0.0.1 "hobbitdboard color=purple fields=hostname,testname" |
while read L; do 
      HOST=`echo $L | cut -d'|' -f1`
      TEST=`echo $L | cut -d'|' -f2`
      bb 127.0.0.1 "drop $HOST $TEST"
done
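
When dropping many tests at once, it can help to review the list of drop commands before sending them. The following sketch only builds the commands. It assumes the "hostname|testname" output format that the cut -d'|' calls above rely on, and the function name is hypothetical.

```shell
# Turn "host|test" lines (hobbitdboard output requested with
# fields=hostname,testname) into "drop HOST TEST" commands, one per line.
# drop_commands is a hypothetical helper name.
drop_commands() {
    while IFS='|' read -r host test; do
        # Skip blank or incomplete lines rather than dropping a whole host.
        [ -n "$host" ] && [ -n "$test" ] || continue
        printf 'drop %s %s\n' "$host" "$test"
    done
}

# Usage: review the output first, then feed each line to bb:
#   bb 127.0.0.1 "hobbitdboard color=purple fields=hostname,testname" |
#       drop_commands |
#       while read -r CMD; do bb 127.0.0.1 "$CMD"; done
```

Skipping lines with an empty test name is a deliberate choice in this sketch, since "drop HOST" without a test removes every test for that host.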

Q. How can I have the Hobbit server ask a Hobbit client to run a command locally, based on an alert ?

A.

Run this as a client extension:

  #!/bin/sh

  # Get the current status of the "msgs" column
  MSGSSTATUS=`$BB $BBDISP "query $MACHINE.msgs" | awk '{ print $1 }'`

  # Get the command we must run from the client config
  CMD=`grep "^msgsrecovercmd:" $BBTMP/logfetch.$MACHINEDOTS.cfg | sed -e 's!^msgsrecovercmd:!!'`

  # If "msgs" is red and there is a command, run it
  if test "$MSGSSTATUS" = "red" -a "$CMD" != ""
  then
     $CMD
  fi

  exit 0

Before doing this, consider the security implications of having your
servers run commands that they fetch from a remote host without
authentication.

Regards,
Henrik

Q. How do I configure "GROUP" alerts ?

A. The "GROUP" keyword is used in Hobbit to classify process names or disk partition names into groups. This is useful in a large IT environment, where different teams are usually responsible for different areas of the infrastructure, e.g. network, data/storage, backup, database, application and Unix teams.

  • hobbit-client.cfg : Here we specify which process or disk rule alerts which GROUP.
  • hobbit-alerts.cfg : Here we specify which email address receives the alerts for each GROUP.

Example

This is a simple setup for learning purposes. When / is over 93% usage, the Unix team should be paged. When /boot is over 15%, the apps team should be alerted. If the cron process is dead, the Unix team should be alerted. When there are more than 120 Xvnc processes, the apps team should be alerted as well.

hobbit-client.cfg

HOST=t-rh9.mywork.com
        DISK / 93 98            GROUP=UNIX_TEAM_PARTITION
        DISK /boot 15 20        GROUP=APPS_TEAM_PARTITION
        PROC cron 1 -1 yellow   GROUP=UNIX_TEAM_PROCESS
        PROC Xvnc 1 120 yellow  GROUP=APPS_TEAM_PROCESS
        PROC defunct  0 0  red
        LOG /var/log/messages  WARNING COLOR=yellow
        LOG /var/log/maillog   WARNING COLOR=yellow
        LOG /var/opt/hobbitclient42/log/clientlaunch.log WARNING COLOR=yellow
        LOG /var/opt/hobbitclient42/log/hobbitclient.log  WARNING
        FILE /etc/passwd  SIZE>0  OWNERID=root  COLOR=yellow

hobbit-alerts.cfg

The names after GROUP= must be exactly the same as those used in hobbit-client.cfg.

GROUP=UNIX_TEAM_PROCESS
     MAIL site02unix-admin-email@site02ad2141.mywork.com FORMAT=TEXT

GROUP=UNIX_TEAM_PARTITION
     MAIL site02unix-admin-email@site02ad2141.mywork.com FORMAT=TEXT

GROUP=APPS_TEAM_PROCESS
     MAIL site02unix-admin-email@site02ad2141.mywork.com FORMAT=TEXT

GROUP=APPS_TEAM_PARTITION
     MAIL site02unix-admin-email@site02ad2141.mywork.com FORMAT=TEXT
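
Because the group names must match exactly, a quick cross-check of the two files can catch typos. The following is a sketch only. It assumes one GROUP=NAME per rule line, with the alert rule starting with GROUP= as in the example above, and the function name is hypothetical.

```shell
# Print every GROUP name used in hobbit-client.cfg that has no matching
# GROUP= rule in hobbit-alerts.cfg.
# missing_groups is a hypothetical helper; it assumes one GROUP per line.
missing_groups() {
    client_cfg=$1
    alerts_cfg=$2
    used=$(mktemp) || return 1
    defined=$(mktemp) || return 1

    # GROUP=NAME can appear anywhere on a rule line in the client config.
    grep -v '^[[:space:]]*#' "$client_cfg" |
        sed -n 's/.*GROUP=\([^[:space:]]*\).*/\1/p' | sort -u > "$used"

    # In the alerts config a rule line starts with GROUP=NAME.
    grep -v '^[[:space:]]*#' "$alerts_cfg" |
        sed -n 's/^[[:space:]]*GROUP=\([^[:space:]]*\).*/\1/p' | sort -u > "$defined"

    # Lines only in the first file: groups that are used but have no rule.
    comm -23 "$used" "$defined"
    rm -f "$used" "$defined"
}

# Usage:
#   missing_groups hobbit-client.cfg hobbit-alerts.cfg
```

No output means every group used on the client side has an alert rule.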

Debugging

Hobbit provides a powerful debugging tool for tracing the alert rules. In the following examples we check a host against all the rules in hobbit-alerts.cfg.

  • How do I debug the "Oversize data/client msg" error message?
    • The following is an example error message:
Oversize data/client msg from 10.5.64.212 truncated (n=2068326, limit 1961984)
First line: linux2.test.com|linux|linux
    • Check the size of the message file sent from the Hobbit client side:
# ls -l msg.linux2.test.com.txt
-rw-r--r--  1 hobbitc hobbitc 2068297 Jan 29 07:54 msg.linux2.test.com.txt
#
    • Find out why msg.linux2.test.com.txt is so big.
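
One way to see what makes the client message so large is to total the bytes per section. This is a sketch only. It assumes the message is made up of sections that each start with a line of the form [name], like the [msgs:...] section shown earlier in this FAQ, and the function name is hypothetical.

```shell
# Print the approximate size in bytes of each section of a client
# message file, largest first. Lines before the first [section] header
# are counted under "(header)".
# section_sizes is a hypothetical helper name.
section_sizes() {
    awk '
        BEGIN { name = "(header)" }
        /^\[.*\]$/ { name = $0 }              # a new section starts here
        { bytes[name] += length($0) + 1 }     # +1 for the newline
        END { for (n in bytes) printf "%d %s\n", bytes[n], n }
    ' "$1" | sort -rn
}

# Usage:
#   section_sizes msg.linux2.test.com.txt | head
```

The section at the top of the output is the first candidate to trim, for example a log file section that matches too many lines.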
  • Set up the debugging environment by running bbcmd. It sets up all the Hobbit environment variables.
bash-3.00$ bin/bbcmd
2007-07-23 14:28:21 Using default environment file /etc/opt/hobbitserver42/hobbitserver.cfg
bash-3.00$
  • Also have a look at the command syntax of hobbitd_alert.
$ bin/hobbitd_alert --debug --test
Usage: hobbitd_alert --test HOST SERVICE [options]
Possible options:
        [--duration=SECONDS]
        [--color=COLOR]
        [--group=GROUPNAME]
        [--time=TIMESPEC]
$ bash
bash-3.00$
  • Using the hobbitd_alert debugging command: a successful match. (For a real alert, /boot disk usage actually has to be 16% or more.)
bash-3.00$ bin/hobbitd_alert --debug --test t-rh9.mywork.com disk --group=APPS_TEAM_PARTITION
2007-07-23 14:38:57 Opening file /etc/opt/hobbitserver42/bb-hosts
2007-07-23 14:38:57 Opening file /etc/opt/hobbitserver42/hobbit-alerts.cfg
2007-07-23 14:38:57 Compiling regex (t-rh9).mywork.com
2007-07-23 14:38:57 Compiling regex (t-rh9).mywork.com
2007-07-23 14:38:57 send_alert t-rh9.mywork.com:DISK state 0
00018286 2007-07-23 14:38:57 send_alert t-rh9.mywork.com:DISK state Paging
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 122
00018286 2007-07-23 14:38:57 Failed 'HOST=$site02test SERVICE=cpu,disk,memory,files,telnet'
 (service not in include list)
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 125
00018286 2007-07-23 14:38:57 Failed 'HOST=$site02test SERVICE=conn' (service not in include list)
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 129
00018286 2007-07-23 14:38:57 Failed 'GROUP=UNIX_TEAM_PROCESS' (group not in include list)
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 132
00018286 2007-07-23 14:38:57 Failed 'GROUP=UNIX_TEAM_PARTITION' (group not in include list)
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 135
00018286 2007-07-23 14:38:57 Failed 'GROUP=APPS_TEAM_PROCESS' (group not in include list)
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 138
00018286 2007-07-23 14:38:57 *** Match with 'GROUP=APPS_TEAM_PARTITION' ***
2007-07-23 14:38:57 Found a first matching rule
00018286 2007-07-23 14:38:57 Matching host:service:page 't-rh9.mywork.com:DISK:'
 against rule line 138
00018286 2007-07-23 14:38:57 *** Match with 'GROUP=APPS_TEAM_PARTITION' ***
2007-07-23 14:38:57   repeat t-rh9.mywork.com|DISK|mail|
 site02unix-admin-email@site02ad2141.mywork.com at 0
2007-07-23 14:38:57   Alert for t-rh9.mywork.com:DISK to
 site02unix-admin-email@site02ad2141.mywork.com
00018286 2007-07-23 14:38:57 Mail alert with command 'mailx -s "Hobbit [12345] t-rh9.mywork.com:DISK
 CRITICAL (RED)" site02unix-admin-email@site02ad2141.mywork.com'
2007-07-23 14:38:57 No more secondary matching rule
bash-3.00$
  • Look for the "Match with" keywords to locate the exact rule that matched.
  • Another check: for a real alert, the Xvnc process limit actually has to be exceeded.
bash-3.00$ bin/hobbitd_alert --debug --test t-rh9.mywork.com PROC  --group=APPS_TEAM_PARTITION
2007-07-23 14:55:38 Opening file /etc/opt/hobbitserver42/bb-hosts
2007-07-23 14:55:38 Opening file /etc/opt/hobbitserver42/hobbit-alerts.cfg
2007-07-23 14:55:38 Compiling regex (t-rh9).mywork.com
2007-07-23 14:55:38 Compiling regex (t-rh9).mywork.com
2007-07-23 14:55:38 send_alert t-rh9.mywork.com:PROC state 0
00018526 2007-07-23 14:55:38 send_alert t-rh9.mywork.com:PROC state Paging
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 122
00018526 2007-07-23 14:55:38 Failed 'HOST=$site02test SERVICE=cpu,disk,memory,files,telnet'
 (service not in include list)
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 125
00018526 2007-07-23 14:55:38 Failed 'HOST=$site02test SERVICE=conn' (service not in include list)
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 129
00018526 2007-07-23 14:55:38 Failed 'GROUP=UNIX_TEAM_PROCESS' (group not in include list)
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 132
00018526 2007-07-23 14:55:38 Failed 'GROUP=UNIX_TEAM_PARTITION' (group not in include list)
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 135
00018526 2007-07-23 14:55:38 Failed 'GROUP=APPS_TEAM_PROCESS' (group not in include list)
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 138
00018526 2007-07-23 14:55:38 *** Match with 'GROUP=APPS_TEAM_PARTITION' ***
2007-07-23 14:55:38 Found a first matching rule
00018526 2007-07-23 14:55:38 Matching host:service:page 't-rh9.mywork.com:PROC:'
 against rule line 138
00018526 2007-07-23 14:55:38 *** Match with 'GROUP=APPS_TEAM_PARTITION' ***
2007-07-23 14:55:38   repeat t-rh9.mywork.com|PROC|mail|
 site02unix-admin-email@site02ad2141.mywork.com at 0
2007-07-23 14:55:38   Alert for t-rh9.mywork.com:PROC to
 site02unix-admin-email@site02ad2141.mywork.com
00018526 2007-07-23 14:55:38 Mail alert with command 'mailx -s "Hobbit [12345] t-rh9.mywork.com:PROC
 CRITICAL (RED)" site02unix-admin-email@site02ad2141.mywork.com'
2007-07-23 14:55:38 No more secondary matching rule
bash-3.00$

Q. How do I get trimhistory to work for a hobbit server on Solaris ?

A. The default example in the Hobbit man page is for a Hobbit server on Linux.

Q. How do I exclude "info" and "trends" columns from the NK overview page?

A. As of Hobbit 4.1.2, the "info" and "trends" columns are hard-coded to show up on all pages, including the NK and BB2 pages. If you want those columns removed you'll have to edit the Hobbit source code. The change is pretty simple. In the hobbit-4.1.2/bbdisplay/pagegen.c file, lines 121-123 look like this:

/* TRENDS and INFO columns are always included on non-BB pages */
if (strcmp(column->name, xgetenv("INFOCOLUMN")) == 0) return 1;
if (strcmp(column->name, xgetenv("TRENDSCOLUMN")) == 0) return 1;

Change the "return 1" on both lines to "return 0", save the file, run "make" and either run "make install", or copy the bbdisplay/bbgen program to ~hobbit/server/bin/ . Next time the NK page is updated, those columns will be gone.

Q. How do I use the internal HTTP test feature of Hobbit to test a Squid proxy server?

A. If you want to check that the service is actually functional use something like this in your bb-hosts file:

0.0.0.0   squid.domain.com   # http://squid.domain.com:8080/http://www.google.com/

Q. How do I use the internal HTTP test feature of Hobbit to test a proxy server with authentication to a Windows domain?

A. If you want to check that the service is actually functional use something like this in your bb-hosts file:

0.0.0.0   servername.domain.com   # \
    http://domain\username:password@servername.domain.com:8080/http://www.google.com/

Q. How do I compare graphs from different hosts on one page?

A. As of Hobbit 4.1.2 there is no front-end to build the URLs needed for the graphs, but you can do it by hand for any graph with a -multi definition (e.g. load average, swap):

  1. Find the base graph you want, e.g. the cpu "load average" graph for one of your hosts.
  2. In your browser, right-click the graph and select "view image" or "open image". You now have a view of your load graph only.
  3. In the address bar field you'll see the URL for this image. E.g.
http://hobbit.domain.com/hobbit-cgi/hobbitgraph.sh?host=host1.domain.com&service=la\
&graph_width=576&graph_height=120&disp=host1%2edomain%2ecom&nostale&graph=hourly&action=view

Now, you can add more hosts after the "host=..." part of the URL - just list all of your hosts separated by commas. Like:

http://hobbit.domain.com/hobbit-cgi/hobbitgraph.sh?host=host1.domain.com,host2.domain.com,\
host3.domain.com&service=la&graph_width=576&graph_height=120&disp=host1%2edomain%2ecom&nostale&\
graph=hourly&action=view
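
If you build these URLs often, a small shell helper can do the joining for you. This is only a sketch based on the example URL above: the function name multi_graph_url is made up for illustration, and the fixed width, height and "hourly" values are simply copied from that URL.

```shell
# Build a hobbitgraph.sh URL that puts several hosts on one graph.
# Usage: multi_graph_url BASE_URL SERVICE HOST [HOST...]
# (hypothetical helper, not part of Hobbit)
multi_graph_url() {
    base=$1
    service=$2
    shift 2
    # Join the remaining arguments with commas for the host= parameter.
    hosts=$(IFS=,; printf '%s' "$*")
    # disp= is the first host with each dot written as %2e, as in the URL above.
    disp=$(printf '%s' "$1" | sed 's/\./%2e/g')
    printf '%s?host=%s&service=%s&graph_width=576&graph_height=120&disp=%s&nostale&graph=hourly&action=view\n' \
        "$base" "$hosts" "$service" "$disp"
}

multi_graph_url http://hobbit.domain.com/hobbit-cgi/hobbitgraph.sh la \
    host1.domain.com host2.domain.com host3.domain.com
```

Paste the printed URL into your browser. It only works for graphs that have a -multi definition, as noted above.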

Q. I just upgraded from the BigBrother client to the Hobbit client and I don't get any status for CPU or disk, but I get status for other tests

A. The most common cause is that the hostname used by the client is different from the hostname you have in your bb-hosts file. On the client, what is the hostname reported by the "uname -n" command? If that is different from the hostname you have in the bb-hosts file, start the client with the "--hostname=THE.REAL.HOSTNAME" option.
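
A quick way to check for the mismatch is to look up the "uname -n" name in a copy of the bb-hosts file. The sketch below assumes the usual "IP hostname # tags" line layout; the function name check_client_hostname is hypothetical, and you have to supply the path to your own bb-hosts file.

```shell
# Check whether a hostname appears as the second field of a bb-hosts line.
# Usage: check_client_hostname BBHOSTS_FILE [HOSTNAME]
# HOSTNAME defaults to the output of "uname -n" on this machine.
# (hypothetical helper, not part of Hobbit)
check_client_hostname() {
    bbhosts=$1
    name=${2:-$(uname -n)}
    # Skip comment lines and compare field 2 exactly against the name.
    if awk -v h="$name" '$1 !~ /^#/ && $2 == h { found = 1 } END { exit !found }' "$bbhosts"; then
        echo "$name is listed in $bbhosts"
    else
        echo "$name is NOT listed in $bbhosts; try --hostname=THE.REAL.HOSTNAME"
        return 1
    fi
}
```

Run it with the name the client reports, e.g. check_client_hostname /path/to/bb-hosts "$(uname -n)", using a bb-hosts copy taken from the server.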

Q. How do I fix "Oversize status msg from 192.168.1.31 for test.my.com:ports truncated (n=508634, limit=262144)"

A.

Try increasing the value of MAXMSG_STATUS in ~server/etc/hobbitserver.cfg:

MAXMSG_STATUS
   The maximum size of a "status" message in kB, default: 256. Status
   messages are the ones that end up as columns on the web display. The
   default size should be adequate in most cases, but some extension
   scripts can generate very large status messages - close to 1024 kB.
   You should only change this if you see messages in the hobbitd log
   file about status messages being truncated.

limit=262144 is 256 kB. Divide the n value by 1024 and round up (508634/1024 = 496.7, so at least 497 kB is needed), then set MAXMSG_STATUS="500" to leave a little headroom and restart the Hobbit server.
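
The rounding can be done in the shell. This is a small sketch; maxmsg_kb is a made-up helper name, and it only does the arithmetic, it does not touch hobbitserver.cfg.

```shell
# Print the smallest whole number of kB that can hold N_BYTES.
# Usage: maxmsg_kb N_BYTES   (N_BYTES is the n= value from the log line)
# (hypothetical helper, not part of Hobbit)
maxmsg_kb() {
    # Adding 1023 before the integer division rounds the result up.
    echo $(( ($1 + 1023) / 1024 ))
}

maxmsg_kb 508634   # prints 497
```

Pick a MAXMSG_STATUS value at or above the printed number.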

B.

On Wed, May 03, 2006 at 03:43:19PM +0200, Dominique Frise wrote:
> Hi,
> 
> ----hobbitd.log----
> 2006-05-03 12:34:27 Oversize data/client msg from 130.223.5.20 truncated 
> (n=815825, limit 524288)
> First line: godzilla|sunos

"godzilla" - a Solaris host - sent a too-large "client" message of
815825 bytes. There's a limit set in Hobbit for the size of client
message at 512 KB, so the message was truncated.

> [bb@iris hobbit]$ cat clientdata.log
> 2006-05-03 12:34:28 Worker process died with exit code 0, terminating

This is interesting. If the truncated message caused hobbitd_client to
crash, I would have expected a different exit-code. I'll have to check 
how it handles truncated messages.

> How can this happend?

Dont know, but apparently some input from your host caused it.

> Has this been fixed in latest snapshot?

Probably not. Which version are you running?

> Which worker process died? (hobbitd_client is still running)

It's restarted automatically by hobbitlaunch.

Henrik
  • We need to investigate why the client message is oversized. As shown below, we found a msg.*.txt file that is over 512 kB. This is abnormal for a client message sampled from a single system.
bash-3.00# wc msg.k206.test.com.txt
    7943   55662  611936 msg.k206.test.com.txt
bash-3.00# ls -l  msg.k206.test.com.txt
-rw-r--r--   1 hobbitc  hobbitc   611936 May  2 18:35 msg.k206.test.com.txt
bash-3.00#

  • Further investigation found that the [ports] section of msg.*.txt contains too much output from the two "netstat -na" commands.
bash-3.00# grep netstat /opt/hobbitclient42/bin/hobbitclient-sunos.sh
netstat -rn
echo "[netstat]"
netstat -s
netstat -na -f inet -P tcp | tail +3
netstat -na -f inet6 -P tcp | tail +5
bash-3.00#
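
To see which section is responsible for the size, you can total the lines under each "[section]" header of the message file. This is a sketch that assumes every section starts with a line consisting only of "[name]", as with the echo "[netstat]" line above; section_sizes is a hypothetical helper name, and the counts are characters plus one per newline, which equals bytes for plain ASCII output.

```shell
# Print the approximate size of each [section] in a client message file,
# largest first.
# Usage: section_sizes MSG_FILE
# (hypothetical helper, not part of Hobbit)
section_sizes() {
    awk '
        /^\[[^]]+\]$/ { section = $0; next }                 # a "[name]" line starts a new section
        section != "" { bytes[section] += length($0) + 1 }   # +1 for the newline
        END { for (s in bytes) printf "%d %s\n", bytes[s], s }
    ' "$1" | sort -rn
}
```

For example, section_sizes msg.k206.test.com.txt would list the biggest sections at the top.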

A. It isn't configurable, but in the hobbit-4.2.0/web/hobbitgraph.c file near the top of the file you'll find these lines:

 #define HOUR_GRAPH  "e-48h"
 #define DAY_GRAPH   "e-12d"
 #define WEEK_GRAPH  "e-48d"
 #define MONTH_GRAPH "e-576d"

Change them to suit you. Then search that same file for the HOUR_GRAPH etc. further down; you'll find 1 place where each is used like this:

 period = HOUR_GRAPH;
 persecs = 48*60*60;

and you need to change that "persecs" calculation also for all 4 graph types.

Change the legend text to match as well. For example, to turn the 12-day graph into a 7-day graph:

 //persecs = 12*24*60*60;
 persecs = 7*24*60*60;
 //glegend = "Last 12 Days";
 glegend = "Last 7 Days";

Then run "make" (from the hobbit-4.2.0 directory) and "make install" (or just copy the "web/hobbitgraph.cgi" file to your ~hobbit/server/bin/ directory).
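
The "persecs" value is just the graph period expressed in seconds. If you want to double-check your numbers before editing the C file, the sketch below converts a period string in the "e-48h" / "e-12d" style used above; the function name persecs is borrowed from the variable, and only the h and d units seen in those defines are handled.

```shell
# Convert a period such as "e-48h" or "e-12d" to seconds.
# Usage: persecs PERIOD
# (hypothetical helper, not part of Hobbit)
persecs() {
    spec=${1#e-}                 # strip the leading "e-"
    count=${spec%?}              # everything except the last character
    unit=${spec#"$count"}        # the last character: h or d
    case $unit in
        h) echo $(( count * 60 * 60 )) ;;
        d) echo $(( count * 24 * 60 * 60 )) ;;
        *) echo "unsupported unit: $unit" >&2; return 1 ;;
    esac
}

persecs e-48h   # prints 172800
persecs e-12d   # prints 1036800
```

A 7-day graph would use "e-7d", which gives 604800, the same as the 7*24*60*60 shown above.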

Graph FAQ

Q. Why does my Hobbit server have the following http response graph ?

Why does the response time of the http service vary so much? Is this a misconfigured http server?

A. Hundreds of things can cause a delay on an http server. The graph here only varies within a range of 2 to 10 ms, which is quite normal. Because of the automatic scaling, the differences look larger and more drastic than they actually are.

A. Inbound traffic is much higher than outbound traffic; this is normal. The Hobbit server receives lots of bb/hb messages from clients.

Q. What are those wrsmd* interfaces ?

A. The wrsmd(7D) status (WCI Remote Shared Memory (WRSM) DLPI driver) was reported by the Solaris 10 command "/usr/bin/kstat -p -s '[or]bytes64'" used for [ifstat] in hobbitclient-sunos.sh. I say "was" because on all our patched Solaris 10 servers, we have not seen this for quite a while.

You can avoid this output by using "/usr/bin/kstat -p -s '[or]bytes64' | grep -v wrsmd | sort" for [ifstat]. You also need to remove all ifstat.wrsmd*.rrd files.

Dominique UNIL - University of Lausanne

Q. Graph after correction

The wrsmd interfaces have now disappeared from the graph.

Q. How can we use Hobbit/BBWin client to collect Bit/s on the Network interfaces?

A. I'm very interested in this question, but have not found an answer yet.

Error FAQ

semop failed, Invalid argument ?

A. This is frequently seen when Hobbit dies ungracefully. Stop Hobbit, and run:

         # ipcs |grep hobbit
         0x0100ba76 162758665  hobbit    600        262144     2
         0x0200ba76 162791434  hobbit    600        262144     2

Then remove any remaining shared memory segments:

         # ipcrm -M 0x0200ba76

The error shows up in hobbitd.log like this:

bash-3.00# tail hobbitd.log
2008-02-18 12:28:00 semop failed, Invalid argument
2008-02-18 12:28:00 How did this happen? clients=-1, s.sem_op=0
2008-02-18 12:28:00 semop failed, Invalid argument
2008-02-18 12:28:00 How did this happen? clients=-1, s.sem_op=0
2008-02-18 12:28:00 semop failed, Invalid argument
2008-02-18 12:28:00 How did this happen? clients=-1, s.sem_op=0
2008-02-18 12:28:00 semop failed, Invalid argument
2008-02-18 12:28:00 How did this happen? clients=-1, s.sem_op=0
2008-02-18 12:28:00 semop failed, Invalid argument
2008-02-18 12:28:00 How did this happen? clients=-1, s.sem_op=0
bash-3.00#
bash-3.00#
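
If several segments are left over, you can let awk turn the ipcs listing into the matching ipcrm commands. This is a sketch that assumes the column layout shown in the ipcs output above (key first, owner third) and that you feed it only shared memory entries, e.g. from "ipcs -m"; other systems print ipcs columns in a different order. The function name hobbit_ipcrm_commands is made up, and it only prints the commands so you can review them before running anything.

```shell
# Read ipcs-style lines on stdin and print one "ipcrm -M KEY" command
# for each segment owned by OWNER (default: hobbit).
# Usage: ipcs -m | hobbit_ipcrm_commands [OWNER]
# (hypothetical helper, not part of Hobbit)
hobbit_ipcrm_commands() {
    owner=${1:-hobbit}
    # Assumed columns: key id owner perms bytes nattch
    awk -v o="$owner" '$3 == o && $1 ~ /^0x/ { print "ipcrm -M " $1 }'
}
```

Make sure Hobbit is stopped first, check the printed list, and then run the commands as root.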