XQuery/Uptime monitor

Motivation


You would like to monitor the service availability of several web sites or web services. You would like to do this all with XQuery and store the results in XML files. You would also like to see "dashboard" graphical displays of uptime.

Several commercial services (for example Pingdom and Host-tracker) will monitor the performance of your web sites in terms of uptime and response time.

Although the production of a reliable service requires a network of servers, the basic functionality can be performed using XQuery in a few scripts.

Method


This approach focuses on the uptime and response time of web pages. The core approach is to use the eXist job scheduler to execute an XQuery script at regular time intervals. This script performs an HTTP GET on a URI and records the statusCode of the response in an XML data file.

The operation is timed so that response times can be derived from elapsed time (valid on a lightly used server), and the test results are stored. Reports can then be run from the test results and alerts sent when a site is observed to be down.

Even though this is only a prototype, access to fine-grained data has already revealed some response time issues on one of the sites at the University.

Watch list

Conceptual Model


This ER model was created in QSEE, which can also generate SQL or XSD.

 

In this notation the bar indicates that Test is a weak entity with existence dependence on Watch.

Mapping ER model to Schemas


Watch-Test relationship


Since Test is dependent on Watch, the Watch-Test relationship can be implemented as composition, with the multiple Test elements contained in a Log element which itself is a child of the Watch element. Tests are stored in chronological order.

Watch Composition


Two possible approaches:

  • add the Log as an element amongst the base data for the Watch
  Watch
     uri
     name
     Log
        Test
  • construct a Watch element which contains the Watch base data as WatchSpec and the Log
  Watch
     WatchSpec (the Watch entity)
        uri
        name
     Log

The second approach preserves the original Watch entity as a node, and also fits with the use of XForms, allowing the whole WatchSpec node to be included in a form. However it introduces a difficult-to-name intermediate, and results in paths like

  $watch/WatchSpec/uri 

when

  $watch/uri would be more natural.

Here we choose the first approach on the grounds that it is not desirable to introduce intermediate elements in anticipation of simpler implementation of a particular interface.

Watch entity


A Watch entity may be implemented as a file or as an element in a collection. Here we choose to implement Watch as an element in a Monitor container in a document. However, this is a difficult decision and the XQuery code should hide it as much as possible.

Attribute implementation


Watch attributes are mapped to elements. Test attributes are mapped to attributes.
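
To make the mapping concrete, the following sketch builds a small Monitor document in the chosen shape: Watch data as child elements, Test data as attributes, and the Tests held in a Log. The function name local:sample-monitor, the example URI and the values are illustrative assumptions, not data from the running system.

```xquery
(: A hypothetical Monitor document with one Watch and two Tests. :)
declare function local:sample-monitor() as element(Monitor) {
  <Monitor>
    <Watch>
      <!-- Watch attributes are mapped to elements -->
      <uri>http://www.example.org/</uri>
      <name>Example site</name>
      <Log>
        <!-- Test attributes are mapped to attributes, in chronological order -->
        <Test at="2009-01-15T10:00:00Z" responseTime="120" statusCode="200"/>
        <Test at="2009-01-15T10:05:00Z" responseTime="95" statusCode="500"/>
      </Log>
    </Watch>
  </Monitor>
};
```

With this shape, a path such as local:sample-monitor()/Watch/uri reaches the base data directly, and local:sample-monitor()/Watch/Log/Test reaches the log entries.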

Schema

Model Generated

QSEE will generate an XML Schema. In this mapping, all relationships are implemented with foreign keys, with key and keyref used to describe the relationship. In this case, the schema would need to be edited to implement the Watch-Test relationship by composition.

By Inference

This schema has been generated by Trang (in Oxygen) from an example document, created as the system runs.

  • Compact Relax NG
    element Monitor {
        element Watch {
            element uri { xsd:anyURI },
            element name { text },
            element Log {
                element Test {
                    attribute at { xsd:dateTime },
                    attribute responseTime { xsd:integer },
                    attribute statusCode { xsd:integer }
                }+
            }
        }+
    }

  • XML Schema

XML Schema

Designed Schema


Editing the QSEE generated schema results in a schema which includes the restriction on statusCodes.

XML Schema


Test Data


An XQuery script transforms an XML Schema (or a subset thereof) to a random instance of a conforming document.

Random Document

The constraint that Tests are in ascending order of the attribute at is not defined in this schema. To generate useful test data, the generator needs additional information about the length of strings and the probability distribution of enumerated values, iterations and optional elements.

Equivalent SQL implementation

CREATE TABLE Watch(
	uri	VARCHAR(8) NOT NULL,
	name	VARCHAR(8) NOT NULL,
	CONSTRAINT	pk_Watch PRIMARY KEY (uri)
) ;

CREATE TABLE Test(
	at	TIMESTAMP NOT NULL,
	responseTime	INTEGER NOT NULL,
	statusCode	INTEGER NOT NULL,
	uri	VARCHAR(8) NOT NULL,
	CONSTRAINT	pk_Test PRIMARY KEY (at,uri)
) ;

ALTER TABLE Test
  ADD INDEX (uri), 
  ADD CONSTRAINT fk1_Test_to_Watch FOREIGN KEY(uri) 
                 REFERENCES Watch(uri) 
                 ON DELETE RESTRICT
                 ON UPDATE RESTRICT;

In the relational implementation the primary key uri of Watch is the foreign key of Test. There would be an advantage in adding a system-generated id to use in place of this meaningful URI, both to remove the redundancy created and to reduce the size of the foreign key. However, a mechanism is then needed to allocate unique ids.

Implementation


Dependencies


eXistdb modules

  • xmldb for database update and login
  • datetime for date formatting
  • util for the system-time function
  • httpclient - for HTTP GET
  • scheduler - to schedule the monitoring task
  • validation - for database validation

other

  • Google Charts


Functions


Functions in a single XQuery module.

module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor";

Database Access


The Monitor database is accessed through a function that takes the path of the Monitor document, which may be a local database document or a remote document.

declare function monitor:get-watch-list($base as xs:string) as element(Watch)* {
   doc($base)/Monitor/Watch
};

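The watch list is retrieved from the Monitor document, for example:

  let $wl := monitor:get-watch-list("/db/Wiki/Monitor3/Monitor.xml")

A specific Watch entity is identified by its URI. Once retrieved, the Watch element itself is passed to the other functions by reference: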

declare function monitor:get-watch-by-uri($base as xs:string, $uri as xs:string) as element(Watch)* {
   monitor:get-watch-list($base)[uri=$uri]
};

Executing Tests


The test does an HTTP GET on the uri. The GET is bracketed by calls to util:system-time() to compute the elapsed wall-clock time in milliseconds. The test report includes the statusCode.

declare function monitor:run-test($watch as element(Watch)) as element(Test) { 
   let $uri := $watch/uri
   let $start := util:system-time()
   let $response :=  httpclient:get(xs:anyURI($uri),false(),())
   let $end := util:system-time()
   let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S'))  * 1000  
   let $statusCode := string($response/@statusCode)
   return
       <Test  at="{current-dateTime()}" responseTime="{$runtimems}" statusCode="{$statusCode}"/>
};
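
The elapsed-time arithmetic can be checked in isolation. Subtracting two xs:dateTime values gives an xs:dayTimeDuration, and dividing that by a one-second duration gives a number of seconds. The sketch below wraps this in a hypothetical helper, local:elapsed-ms, which is not part of the monitor module and rounds the result to a whole number of milliseconds.

```xquery
(: Hypothetical helper: elapsed time between two instants, in whole milliseconds. :)
declare function local:elapsed-ms($start as xs:dateTime, $end as xs:dateTime) as xs:integer {
  (: dateTime - dateTime yields a dayTimeDuration;
     duration div duration yields a decimal number of seconds :)
  xs:integer(round(($end - $start) div xs:dayTimeDuration("PT1S") * 1000))
};
```

For two instants 1.25 seconds apart the function returns 1250. Rounding to an integer also keeps the stored responseTime consistent with the xs:integer type in the inferred schema.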


The generated test is appended to the end of the log:

declare function monitor:put-test($watch as element(Watch), $test as element(Test)) {
    update insert $test into $watch/Log
};

To execute the test, a script logs in, iterates through the Watch entities and for each, executes the test and stores the result:


import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

let $login := xmldb:login("/db/","user","password")
let $base := "/db/Wiki/Monitor3/Monitor.xml"
for $watch in monitor:get-watch-list($base)
let $test := monitor:run-test($watch)
let $update :=monitor:put-test($watch,$test) 
return $update

Job scheduling

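
A job is scheduled to run this script every 5 minutes. The eXist scheduler is based on Quartz, so the cron expression has a leading seconds field: seconds, minutes, hours, day of month, month, day of week.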

let $login := xmldb:login("/db","user","password")
return scheduler:schedule-xquery-cron-job("/db/Wiki/Monitor/runTests.xq" , "0 0/5 * * * ?")

Index page


The index page is based on a supplied Monitor document, by default the production database.

import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize  "method=xhtml media-type=text/html";
declare variable $heading := "Monitor Index";
declare  variable $base := request:get-parameter("base","/db/Wiki/Monitor3/Monitor.xml");

<html>
   <head>
        <title>{$heading}</title>
    </head>
    <body>
       <h1>{$heading}</h1>
       <ul>
          {for $watch in  monitor:get-watch-list($base)
          return 
               <li>{string($watch/name)}&#160;  &#160; 
                        <a href="report.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($watch/uri)}">Report</a>  
               </li>
          }
      </ul>
    </body>
</html>

In this implementation, the URI of the monitor document is passed to dependent scripts as a request parameter. An alternative would be to pass this data via a session variable.


Reporting


Reporting draws on the log of Tests for a Watch.

declare function monitor:get-tests($watch as element(Watch)) as element(Test)* {
    $watch/Log/Test
};

Overview Report


The basic report shows summary data about the watched URI and an embedded chart of response time over time. Up-time is the ratio of tests with a status code of 200 to the total number of tests.

import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize  "method=xhtml media-type=text/html";

let $base := request:get-parameter("base",())
let $uri:= request:get-parameter("uri",())
let $watch :=monitor:get-watch-by-uri($base,$uri)

let $tests := monitor:get-tests($watch)
let $countAll := count($tests)
let $uptests := $tests[@statusCode="200"]
let $last24hrs := $tests[position() >($countAll - 24 * 12)]
let $heading := concat("Performance results for ", string($watch/name))
return 
<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
       <h3>
            <a href="index.xq">Index</a>
        </h3>
        <h1>{$heading}</h1>
        <h2><a href="{$watch/uri}">{string($watch/uri)}</a></h2>
        {if (empty($tests)) 
         then ()
         else
   <div>      
        <table border="1">
              <tr>
                <th>Monitoring started</th>
                <td> {datetime:format-dateTime($tests[1]/@at,"EE dd/MM HH:mm")}</td>
            </tr>
            <tr>
                <th>Latest test</th>
                <td> {datetime:format-dateTime($tests[last()]/@at,"EE dd/MM HH:mm")}</td>
            </tr>
            <tr>
                <th>Minimum response time </th>
                <td> {min($tests/@responseTime)} ms </td>
            </tr>
            <tr>
                <th>Average response time</th>
                <td> { round(sum($tests/@responseTime) div count($tests))} ms</td>
            </tr>
            <tr>
                <th>Maximum response time </th>
                <td> {max($tests/@responseTime)} ms</td>
            </tr>
            <tr>
                <th>Uptime</th>
                <td>{round(count($uptests) div count($tests)  * 100) } %</td>
            </tr>
            <tr>
                <th>Raw Data </th>
                <td>
                    <a href="testData.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a>
                </td>
            </tr>
            <tr>
                <th>Response Distribution </th>
                <td>
                    <a href="responseDistribution.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a>
                </td>
            </tr>
        </table>
        <h2>Last 24 hours </h2>
            {monitor:responseTime-chart($last24hrs)}    
         <h2>1 hour averages </h2>
            {monitor:responseTime-chart(monitor:average($tests,12))}    

       </div>
       }
    </body>
</html>
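
The uptime figure in the table can be isolated as a small function, which makes the ratio easier to check on sample data. This is only a sketch: local:uptime-percent is a hypothetical helper, not part of the monitor module, and it returns the empty sequence when there are no tests rather than dividing by zero.

```xquery
(: Hypothetical helper: percentage of tests that returned status code 200. :)
declare function local:uptime-percent($tests as element(Test)*) as xs:decimal? {
  if (empty($tests))
  then ()
  else round(count($tests[@statusCode = "200"]) div count($tests) * 100)
};
```

With four tests of which three returned 200, the function returns 75.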


Response time graph


The graph is generated using the Google Chart API. The default vertical scale from 0 to 100 fits the typical response time. In this simple example, the graph has no labels or explanation.

declare function monitor:responseTime-chart($test as element(Test)* ) as element(img) {
   let $points := 
       string-join($test/@responseTime,",")
   let  $chartType := "lc"
   let $chartSize :=  "300x200"
   let $uri := concat("http://chart.apis.google.com/chart?",
                                       "cht=",$chartType,"&amp;chs=",$chartSize,"&amp;chd=t:",$points)
   return 
        <img src="{$uri}"/>  
};
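
The overview report calls monitor:average($tests, 12) to produce the one-hour averages, but that function is not listed on this page. The following is a guess at what such a function might do, written under the hypothetical name local:average: it splits the tests into consecutive blocks of $n and emits one Test per block carrying the rounded mean response time and the timestamp of the first test in the block.

```xquery
(: Hypothetical sketch: average response times over consecutive blocks of $n tests. :)
declare function local:average($tests as element(Test)*, $n as xs:integer) as element(Test)* {
  if (empty($tests))
  then ()
  else
    (: block indexes run from 0 to the index of the last, possibly partial, block :)
    for $i in 0 to (count($tests) - 1) idiv $n
    let $block := subsequence($tests, $i * $n + 1, $n)
    return
      <Test at="{$block[1]/@at}" responseTime="{round(avg($block/@responseTime))}"/>
};
```

With tests every 5 minutes, a block size of 12 corresponds to one hour. Because the result is again a sequence of Test elements, it can be passed to the chart function unchanged.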

Response Time Frequency Distribution


The frequency distribution summarises the response times. First the distribution itself is computed as a sequence of groups. The interval calculation is crude and uses 11 groups to fit with Google Chart.

declare function monitor:response-distribution($test as element(Test)* ) as element(Distribution) {
  let $times := $test/@responseTime
  let $min := min($times)
  let $max := max($times)
  let $range := $max - $min
  let $step := round( $range div 10)
  return
 <Distribution>
 {
      for $i in (0 to 10)
      let $low := $min + $i * $step
      let $high :=$low + $step
      return
           <Group i="{$i}" mid="{round(($low + $high ) div 2)}" count="{ count($times[. >= $low] [. < $high]) }"/>
 }
</Distribution>
};

This grouped distribution can then be charted as a bar chart. Scaling is needed in this case.

declare function monitor:distribution-chart($distribution as element(Distribution)) as element(img) {
   let $maxcount := max($distribution/Group/@count)
   let $scale :=100 div $maxcount 
   let $points := 
       string-join( $distribution/Group/xs:string($scale * @count),",")
   let  $chartType := "bvs"
   let $chartSize :=  "300x200"
   let $uri := concat("http://chart.apis.google.com/chart?",
                                       "cht=",$chartType,"&amp;chs=",$chartSize,"&amp;chd=t:",$points)
   return 
        <img src="{$uri}"/>  
};
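
The scaling step maps the largest count to 100, matching the default vertical scale from 0 to 100 mentioned earlier. It can be tried on plain numbers with a helper such as the one below, which is a sketch under the hypothetical name local:scale-to-100 and, unlike the function above, rounds the scaled values and guards against a maximum of zero.

```xquery
(: Hypothetical helper: scale counts so that the largest becomes 100. :)
declare function local:scale-to-100($counts as xs:integer*) as xs:integer* {
  let $max := max($counts)
  for $c in $counts
  return
    (: if every count is zero, avoid dividing by zero :)
    if ($max gt 0)
    then xs:integer(round($c * 100 div $max))
    else 0
};
```

For the counts (2, 5, 10) the result is (20, 50, 100). It could be applied to a distribution as local:scale-to-100($distribution/Group/@count).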

Finally, a script creates the page:

import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize  "method=xhtml media-type=text/html";

let $base := request:get-parameter("base",())
let $uri:= request:get-parameter("uri",())
let $watch := monitor:get-watch-by-uri($base,$uri)
let $tests := monitor:get-tests($watch)
let $heading := concat("Distribution for ", string($watch/name))
let $distribution := monitor:response-distribution($tests)
return 

<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h1>{$heading}</h1> {monitor:distribution-chart($distribution)} <br/>
        <table border="1">
            <tr>
                <th>I </th>
                <th>Mid</th>
                <th>Count</th>
            </tr> {for $group in $distribution/Group return <tr>
                <td>{string($group/@i)}</td>
                <td>{string($group/@mid)}</td>
                <td>{string($group/@count)}</td>
            </tr> } </table>
    </body>
</html>

Validation


The eXist validation module provides functions for validating a document against a schema. The Monitor document links to a schema, so it can be validated without naming one:


let $doc := "/db/Wiki/Monitor3/Monitor.xml"

return 
<report>
 <document>{$doc}</document>
  {validation:validate-report(doc($doc))}
</report>


Alternatively, a document can be validated against any schema:

let $schema := "http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/trangmonitor.xsd"
let $doc := "/db/Wiki/Monitor3/Monitor.xml"

return 
<report>
 <document>{$doc}</document>
  <schema>{$schema}</schema>
  {validation:validate-report(doc($doc),xs:anyURI($schema))}
</report>


This is used to check that the randomly generated instance is valid:

let $schema := request:get-parameter("schema",())
let $file := doc(concat("http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=",$schema))
return 
 <result>
   <schema>{$schema}</schema>
   {validation:validate-report($file,xs:anyURI($schema))}
   {$file}
</result>


Downtime alerts


The purpose of a monitor is to alert those responsible for a site to its failure. Such an alert might be by SMS, email or some other channel. The Watch entity will need to be augmented with configuration parameters.

Check if failed


First it is necessary to determine whether the site is down. monitor:failing() returns true() if no test in the past $watch/failMinutes minutes has returned a statusCode of 200.

 declare function monitor:failing($watch as element(Watch)) as xs:boolean  {
   let $now := current-dateTime()
   let $lastTestTime :=  $now - $watch/failMinutes * xs:dayTimeDuration("PT1M")
   let $recentTests := $watch/Log/Test[@at > $lastTestTime] 
   return
        (: statusCode is an attribute of Test :)
        every $t in $recentTests satisfies 
              not($t/@statusCode = "200")    
   };
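
To experiment with the failure rule without waiting for real time to pass, the current time can be supplied as a parameter. The following is a sketch rather than part of the monitor module: local:failing-at is a hypothetical name, it reads statusCode as an attribute of Test, and unlike the function above it also requires at least one recent test, since "every" is true for an empty sequence.

```xquery
(: Hypothetical variant of the failure check with an explicit "now". :)
declare function local:failing-at($watch as element(Watch), $now as xs:dateTime) as xs:boolean {
  (: only tests later than the cutoff are considered :)
  let $cutoff := $now - xs:integer($watch/failMinutes) * xs:dayTimeDuration("PT1M")
  let $recent := $watch/Log/Test[xs:dateTime(@at) > $cutoff]
  return
    exists($recent)
    and (every $t in $recent satisfies not($t/@statusCode = "200"))
};
```

Given a Watch with failMinutes of 10 and tests at 12:00 (200), 12:05 (500) and 12:10 (500), the check at 12:12 is true because only the two failed tests fall inside the window. Widening failMinutes to 15 brings the successful 12:00 test into the window and the check becomes false.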

Check if alert already sent


If this test is executed repeatedly by a scheduled job, an Alert message on the appropriate channel can be generated. However, the Alert message would be sent every time the condition is true. It would be better to send an Alert less frequently. One approach is to add Alert elements to the log, interspersed with the Tests. This does not affect the code which accesses Tests, but allows us to inhibit Alerts when one has been sent recently. monitor:alert-sent() will be true if an alert has been sent in the last $watch/alertMinutes minutes.

declare function monitor:alert-sent($watch as element(Watch)) as xs:boolean {
   let $now := current-dateTime()
   let $lastAlertTime := $now - $watch/alertMinutes * xs:dayTimeDuration("PT1M")
   let $recentAlerts := $watch/Log/Alert[@at > $lastAlertTime] 
   return
       exists($recentAlerts)                                 
};

Alert notification task


The task to check the monitor log iterates through the Watches and for each checks whether it is failing but no Alert has been sent in the period. If so, a message is constructed and an Alert element is added to the Log. The use of the Log to record Alert events means that no other state needs to be held, and the period with which this task is executed is unrelated to the Alert period.

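import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";
 
let $login := xmldb:login("/db/","user","password")
let $base := "/db/Wiki/Monitor3/Monitor.xml"
for $watch in monitor:get-watch-list($base)
return 
   if (monitor:failing($watch) and not(monitor:alert-sent($watch)))
   then 
       (: construct the message; the wording here is only illustrative :)
       let $message := concat(string($watch/name), " (", string($watch/uri), ") appears to be down")
       let $update := update insert <Alert at="{current-dateTime()}"/> into $watch/Log
       (: monitor:send-alert is not defined on this page; it stands for the chosen channel (SMS, email or other) :)
       let $alert := monitor:send-alert($watch,$message)
       return true()
   else false()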

Discussion


Alert events could be added to a separate AlertLog but it is arguably easier to add a new class of Events than create a separate sequence for each. There may also be cases where the sequential relationship between Tests and Events is useful.


[ Re-designed Schema]

To do

  • add create/edit Watch
  • detect missing tests
  • support analysis for date ranges by filtering tests by date prior to analysis
  • improve the appearance of the charts
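
As a starting point for the date-range item, tests could be filtered on their at attribute before being passed to the reporting functions. This is a sketch only: local:tests-between is a hypothetical name, and treating the range as inclusive at the start and exclusive at the end is an assumption.

```xquery
(: Hypothetical helper: tests with $from <= at < $to. :)
declare function local:tests-between(
    $tests as element(Test)*,
    $from as xs:dateTime,
    $to as xs:dateTime
) as element(Test)* {
  $tests[xs:dateTime(@at) ge $from and xs:dateTime(@at) lt $to]
};
```

The result is still a sequence of Test elements in log order, so it can be used wherever the full log is used.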