XQuery/UK shipping forecast

Motivation

The UK shipping forecast is prepared by the UK Met Office four times a day. It is broadcast on the radio and published on the Met Office and BBC web sites. However, it is not available in a machine-readable form.

Tim Duckett recently blogged about creating a Twitter stream. He uses Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC sites. However, as Tim points out, the format is designed for speech: it groups similar areas together to fit the time slot, which makes it hard to parse. The approach taken here is to scrape a JavaScript file containing the raw area forecast data.


Implementation

Dependencies

eXist-db Modules

The following scripts use these eXist modules:

  • request - to get HTTP request parameters
  • httpclient - to GET and POST
  • scheduler - to schedule scraping tasks
  • datetime - to format dateTimes
  • util - base64 conversions
  • xmldb - for database access

Other

  • UK Met Office web site

Met Office page

The Met Office page shows an area-by-area forecast, but this part of the page is generated by JavaScript from data in a generated JavaScript file. In this file, the data is assigned to multiple arrays. A typical section looks like this:

// Bailey
gale_in_force[28] = "0";
gale[28] = "0";
galeIssueTime[28] = "";
shipIssueTime[28] = "1725 Sun 06 Jul";
wind[28] = "Northeast 5 to 7.";
weather[28] = "Showers.";
visibility[28] = "Moderate or good.";
seastate[28] = "Moderate or rough.";
area[28] = "Bailey";
area_presentation[28] = "Bailey";
key[28] = "Bailey";
 
// Faeroes
...

Area Forecast

JavaScript conversion

This function fetches the current JavaScript data using the eXist httpclient module, converts the base64 data to a string, picks out the required area data and parses the code to generate an XML structure using the JavaScript array names.

declare namespace httpclient = "http://exist-db.org/xquery/httpclient";
 
declare function met:get-forecast($area as xs:string) as element(forecast)? {
 let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
(: fetch the javascript source  and locate the text of the body of the response :)
  let $base64:= httpclient:get(xs:anyURI($jsuri),true(),())/httpclient:body/text()
(: this is base64 encoded , so decode it back to text :)
  let $js :=  util:binary-to-string($base64)
(: isolate the section for the required area, prefixed with a comment :)
  let $areajs :=  normalize-space(substring-before( substring-after($js,concat("// ",$area)),"//"))
 return 
   if($areajs ="")  (: area not found :)
   then ()
   else 
(: build an XML element containing elements for each of the data items, using the array names as the element names :)
 
<forecast>
{
for $d in tokenize($areajs,";")[position() < last()] (: JavaScript statements terminated by ";" - ignore the last empty :)
   let $ds := tokenize(normalize-space($d)," *= *") (: separate the LHS and RHS of the assignment statement :)
   return 
     element {replace(substring-before($ds[1],"["),"_","")}(: element name is  the array name, converted to a legal name :)
             {replace($ds[2],'"','')}  (: element text is the RHS minus quotes :)
}
</forecast>
};

For example, the output for one selected area is:

<forecast>
    <galeinforce>0</galeinforce>
    <gale>0</gale>
    <galeIssueTime/>
    <shipIssueTime>0505 Mon 07 Jul</shipIssueTime>
    <wind>Northwest backing west 5 to 7.</wind>
    <weather>Squally showers.</weather>
    <visibility>Moderate or good.</visibility>
    <seastate>Moderate or rough.</seastate>
    <area>Fastnet</area>
    <areapresentation>Fastnet</areapresentation>
    <key>Fastnet</key>
</forecast>
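
The parsing step can be tried in isolation, without the HTTP fetch. The following is a minimal sketch that applies the same tokenize and replace steps to an inline sample. The variable $sample is a hypothetical stand-in for the decoded JavaScript; it holds a shortened copy of the Bailey section shown earlier.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for the fetched file: a shortened copy of the Bailey section :)
declare variable $sample as xs:string :=
'// Bailey
gale_in_force[28] = "0";
wind[28] = "Northeast 5 to 7.";
weather[28] = "Showers.";
area[28] = "Bailey";

// Faeroes
';

let $area := "Bailey"
(: text between the "// Bailey" comment and the next comment :)
let $areajs := normalize-space(substring-before(substring-after($sample, concat("// ", $area)), "//"))
return
  <forecast>{
    (: one token per statement; the token after the final ";" is empty and is dropped :)
    for $d in tokenize($areajs, ";")[position() < last()]
    let $ds := tokenize(normalize-space($d), " *= *")
    return
      element { replace(substring-before($ds[1], "["), "_", "") }
              { replace($ds[2], '"', '') }
  }</forecast>
```

With this sample the result should contain four child elements: galeinforce, wind, weather and area. The underscore is removed from gale_in_force to match the element naming used above.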

Format the forecast as text

The forecast data needs to be formatted into a string:

declare function met:forecast-as-text($forecast as element(forecast)) as xs:string {
      concat( $forecast/weather,
              " Wind ",  $forecast/wind,
              " Visibility ", $forecast/visibility, 
              " Sea ", $forecast/seastate
            )
};

Area Forecast

Finally these functions can be used in a script which accepts a shipping area name and returns an XML message:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
let $area := request:get-parameter("area","Lundy")
let $forecast := met:get-forecast($area)
return
   <message area="{$area}"  dateTime="{$forecast/shipIssueTime}">
       {met:forecast-as-text($forecast)} 
   </message>

Message abbreviation

To create a message suitable for texting (160 characters) or tweeting (140-character limit), the message can be compressed by abbreviating common words.


Abbreviation dictionary

A dictionary of words and abbreviations is created and stored locally. The dictionary has been developed using some of the abbreviations in Tim Duckett's Ruby implementation.

<dictionary>
 <entry  full="west" abbrev="W"/>
 <entry  full="westerly" abbrev="Wly"/>
..
 <entry  full="variable" abbrev="vbl"/>
 <entry  full="visibility" abbrev="viz"/>
 <entry  full="occasionally" abbrev="occ"/>
 <entry  full="showers" abbrev="shwrs"/>
 
</dictionary>

The full dictionary

Abbreviation function

The abbreviation function breaks down the text into words, replaces words with abbreviations and builds the text up again:

declare function met:abbreviate($forecast as xs:string) as xs:string {
   string-join(
(: lowercase the string, append a space (to ensure a final . is matched) and tokenise :)
      for $word in tokenize(concat(lower-case($forecast)," "),"\.? +")
      return
(: if there is an entry for the word , use its abbreviation, otherwise use the unabbreviated word :)
        ( /dictionary/entry[@full=$word]/@abbrev,$word) [1]
        ,
      " ") (: join the words back up with space separator :)  
};
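
The lookup-with-fallback idiom can be tried without a stored dictionary. This is a minimal sketch, assuming a small inline dictionary held in a hypothetical $dictionary variable and a local:abbreviate function that mirrors met:abbreviate; the three entries are taken from the dictionary extract above.

```xquery
xquery version "1.0";

(: Hypothetical inline dictionary; the real one is stored in the database :)
declare variable $dictionary as element(dictionary) :=
  <dictionary>
    <entry full="visibility" abbrev="viz"/>
    <entry full="showers" abbrev="shwrs"/>
    <entry full="occasionally" abbrev="occ"/>
  </dictionary>;

declare function local:abbreviate($text as xs:string) as xs:string {
  string-join(
    (: lowercase, append a space so a final "." is matched, then split into words :)
    for $word in tokenize(concat(lower-case($text), " "), "\.? +")
    return
      (: the first item of (abbreviation, word): the abbreviation if one exists, else the word :)
      string(($dictionary/entry[@full = $word]/@abbrev, $word)[1]),
    " ")
};

local:abbreviate("Showers. Wind Northeast 5 to 7. Visibility Moderate or good.")
```

In the result, "showers" and "visibility" are replaced by "shwrs" and "viz", and words with no dictionary entry pass through in lower case. Because the sequence (abbreviation, word) is empty-tolerant, no explicit conditional is needed.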

Abbreviated Message

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := request:get-parameter("area","Lundy")
let $forecast := met:get-forecast($area)
return
   <message area="{$area}"  dateTime="{$forecast/shipIssueTime}">
       {met:abbreviate(met:forecast-as-text($forecast))} 
   </message>

All Areas forecast

This function is an extension of the area forecast. The parser uses the comment separator to break up the script, ignores the first and last sections, and skips the area name in the comment.

declare function met:get-forecast() as element(forecast)* {
  let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
  let $base64:= httpclient:get(xs:anyURI($jsuri),true(),())/httpclient:body/text()
  let $js :=  util:binary-to-string($base64)
  for $js in tokenize($js,"// ")[position() > 1] [position()< last()]
  let $areajs := concat("gale",substring-after($js,"gale"))
  return      
<forecast>
{
for $d in tokenize($areajs,";")[position() < last()]
   let $ds := tokenize(normalize-space($d)," *= *")
   return 
     element {replace(substring-before($ds[1],"["),"_","")}
                     {replace($ds[2],'"','')}
}
</forecast>
};


XML version of forecast

This script returns the full Shipping forecast in XML:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
<ShippingForecast>
    {met:get-forecast()}
</ShippingForecast>

RSS version of forecast

XSLT would be suitable for transforming this XML to RSS format ...

SMS service

One possible use of this data would be to provide an on-request SMS service that takes an area name and returns the abbreviated forecast. The complete set of forecasts is created, the forecast for the area supplied in the message is selected, and it is returned as an abbreviated message.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
let $area := lower-case(request:get-parameter("text",()))
let $forecast := met:get-forecast()[lower-case(area) = $area]
return
   if (exists($forecast))
   then 
      concat("Reply: ", met:abbreviate(met:forecast-as-text($forecast)))
    else 
      concat("Reply: Area ",$area," not recognised")

The calling protocol is determined by the SMS service installed at UWE and described here.

Caching

Fetching the JavaScript on demand is neither efficient nor acceptable net behaviour. Since the forecast times are known, it is preferable to fetch the data on a schedule, convert it to XML, save it in the eXist database and use the cached XML for later requests.

Store XML forecast

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare variable $col := "/db/Wiki/Met/Forecast";
 
if (xmldb:login($col, "user", "password"))  (: a user who has write access to the Forecast collection :)
then 
       let $forecast := met:get-forecast()
       let $forecastDateTime := met:timestamp-to-xs-date(($forecast/shipIssueTime)[1])  (: convert to xs:dateTime :)
       let $store :=  xmldb:store( 
              $col,                    (: collection to store forecast in :)
              "shippingForecast.xml",  (: file name - overwrite is OK here as we only want the latest :)
                                       (: then the constructed XML to be stored :)
              <ShippingForecast  at="{$forecastDateTime}" >
                   {$forecast}
              </ShippingForecast>
              ) 
       return
           <result>
               Shipping forecast for {string($forecastDateTime)} stored in  {$store}
          </result>
else ()

The timestamp used on the source data is converted to an xs:dateTime for ease of later processing.

declare function met:timestamp-to-xs-date($dt as xs:string) as xs:dateTime {
(: convert timestamps in the form 0505 Tue 08 Jul to xs:dateTime :)
   let $year := year-from-date(current-date())  (: assume the current year since none provided :)
   let $dtp := tokenize($dt," ")
   let $mon := index-of(("Jan","Feb", "Mar","Apr","May", "Jun","Jul","Aug","Sep","Oct","Nov","Dec"),$dtp[4])
   let $monno := if($mon < 10) then concat("0",$mon) else $mon
   return xs:dateTime(concat($year,"-",$monno,"-",$dtp[3],"T",substring($dtp[1],1,2),":",substring($dtp[1],3,2),":00"))
};
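
The conversion can be checked on its own. This is a minimal sketch, assuming a hypothetical local:timestamp-to-dateTime that takes the year as an explicit parameter rather than reading the current date, so that the result is repeatable.

```xquery
xquery version "1.0";

(: Hypothetical variant of met:timestamp-to-xs-date with the year passed in :)
declare function local:timestamp-to-dateTime($dt as xs:string, $year as xs:integer) as xs:dateTime {
  (: parts are: time as HHmm, day name, day of month, month abbreviation :)
  let $parts := tokenize(normalize-space($dt), " ")
  let $months := ("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  let $mon := index-of($months, $parts[4])
  let $mm := if ($mon < 10) then concat("0", $mon) else string($mon)
  return
    xs:dateTime(concat($year, "-", $mm, "-", $parts[3], "T",
                       substring($parts[1], 1, 2), ":",
                       substring($parts[1], 3, 2), ":00"))
};

local:timestamp-to-dateTime("0505 Tue 08 Jul", 2008)
(: returns the xs:dateTime 2008-07-08T05:05:00 :)
```

Passing the year explicitly also makes it easier to handle a forecast issued late on 31 December and processed after midnight, a case in which assuming the current year would give the wrong date.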

Reducing the forecast data

The raw data contains redundant elements (several versions of the area name) and elements which are normally empty (all gale-related elements when there is no gale warning), but it lacks a case-normalised area name as a key. The following function performs this restructuring:

declare function met:reduce($forecast as element(forecast)) as element(forecast) {
            <forecast>  
                          { attribute area {lower-case($forecast/area)}}
                          { $forecast/*
                                    [not(name(.) = ("shipIssueTime","area","key"))] 
                                    [ if (../galeinforce = "0" ) 
                                      then not(name(.) = ("galeinforce","gale","galeIssueTime")) 
                                      else true()
                                    ]
                            }
             </forecast>
};

There would be a case to make for using XSLT for this transformation. The caching script applies this transformation to the forecast before saving.

SMS via cache

The revised SMS script can now access the cache. First, a function to get the stored forecast:

declare function met:get-stored-forecast($area as xs:string) as element(forecast)? {
  doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast/forecast[@area = $area]
};

The script then uses this function:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
let $area := lower-case(normalize-space(request:get-parameter("text",())))
let $forecast := met:get-stored-forecast($area)
return
   if (exists($forecast))
   then 
      concat("Reply: ", datetime:format-dateTime($forecast/../@at,"HH:mm")," ",met:abbreviate(met:forecast-as-text($forecast)))
    else 
      concat("Reply: Area ",$area," not recognised")

In this script, the selected forecast for the input area extracted by the met function call is a reference to the database element, not a copy. Thus it is still possible to navigate back to the parent element containing the timestamp.

The eXist datetime functions are wrappers for the Java class java.text.SimpleDateFormat which defines the date formatting syntax.

Job scheduling

eXist includes a scheduler module which is a wrapper for the Quartz scheduler. Jobs can only be created by a DBA user.

For example, to set a job to fetch the shipping forecast on the hour,

let $login := xmldb:login( "/db", "admin", "admin password" ) 
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq" , "0 0 * * * ?")
return $job

where "0 0 * * * ?" means to run at 0 seconds, 0 minutes past every hour of every day of every month, ignoring the day of the week.

To check on the set of scheduled jobs, including system scheduled jobs:

let $login := xmldb:login( "/db", "admin", "admin password" ) 
return scheduler:get-scheduled-jobs()

It would be better to schedule jobs according to the update schedule for the forecast. The forecast is updated at 0015, 0505, 1130 and 1725. These times cannot be fitted into a single cron pattern, so multiple jobs are required. Because jobs are identified by their path, the same URL cannot be used for all instances, so a dummy parameter is added.

Discussion

The scheduled times are one minute later than the published times. This may not be enough slack to account for discrepancies in timing on both sides. Clearly a push from the UK Met Office would be better than pull scraping. The scheduler clock runs in local time (BST), as do the publication times.

let $login := xmldb:login( "/db", "admin", "admin password" ) 
let $job1 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=1" , "0 16 0 * * ?")
let $job2 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=2" , "0 6 5 * * ?")
let $job3 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=3" , "0 31 11 * * ?")
let $job4 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=4" , "0 26 17 * * ?")
return ($job1, $job2, $job3, $job4)
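
The four cron patterns follow mechanically from the publication times plus the one-minute delay. This is a minimal sketch of that arithmetic, assuming a hypothetical helper local:cron-after; it only builds the pattern strings and does not call the scheduler.

```xquery
xquery version "1.0";

(: Hypothetical helper: Quartz-style daily cron pattern for a time $hhmm plus $delay minutes :)
declare function local:cron-after($hhmm as xs:string, $delay as xs:integer) as xs:string {
  let $total := xs:integer(substring($hhmm, 1, 2)) * 60
              + xs:integer(substring($hhmm, 3, 2))
              + $delay
  let $hour := ($total idiv 60) mod 24   (: wrap past midnight :)
  let $minute := $total mod 60
  (: fields: seconds, minutes, hours, day of month, month, day of week :)
  return concat("0 ", $minute, " ", $hour, " * * ?")
};

for $time in ("0015", "0505", "1130", "1725")
return local:cron-after($time, 1)
(: returns "0 16 0 * * ?", "0 6 5 * * ?", "0 31 11 * * ?", "0 26 17 * * ?" :)
```

Changing the second argument gives patterns with more slack, which may be worth trying given the timing concern noted in the discussion.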

Forecast as kml

Sea area coordinates

The UK Met Office provides a clickable map of forecasts but a KML map would be nice. The coordinates of the sea areas can be captured and manually converted to XML.

<?xml version="1.0" encoding="UTF-8"?>
<boundaries>
    <boundary area="viking">
        <point latitude="61" longitude="0"/>
        <point latitude="61" longitude="4"/>
        <point latitude="58.5" longitude="4"/>
        <point latitude="58.5" longitude="0"/>
      </boundary>
...

The boundary for an area is accessed by two functions. In this idiom, one function hides the document location and returns the root of the document. Subsequent functions use this base function to get the document and then apply further predicates to filter as required.

declare function met:area-boundaries() as element(boundaries) {
  doc("/db/Wiki/Met/shippingareas.xml")/boundaries
};
 
declare function met:area-boundary($area as xs:string) as element(boundary) {
   met:area-boundaries()/boundary[@area=$area]
};

The centre of an area can be roughly computed by averaging the latitudes and longitudes:

declare function met:area-centre($boundary as element(boundary)) as element(point) {
   <point 
      latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
      longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"
   />
};
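
As a worked example, the same averaging can be applied to the Viking boundary listed above. This is a minimal sketch in which local:area-centre is a copy of met:area-centre, renamed so that it runs without the met module.

```xquery
xquery version "1.0";

(: Copy of met:area-centre under a local name :)
declare function local:area-centre($boundary as element(boundary)) as element(point) {
  <point
     latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
     longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"
  />
};

local:area-centre(
  <boundary area="viking">
    <point latitude="61" longitude="0"/>
    <point latitude="61" longitude="4"/>
    <point latitude="58.5" longitude="4"/>
    <point latitude="58.5" longitude="0"/>
  </boundary>
)
(: returns <point latitude="59.75" longitude="2"/> :)
```

The mean of the corner coordinates is only a rough centre. For a rectangular area like Viking it is the midpoint, but for irregular areas it can fall away from the visual centre.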

kml Placemark

We can generate a kml Placemark from a forecast:

declare function met:forecast-to-kml($forecast as element(forecast)) as element(Placemark) {
   let $area := $forecast/@area
   let $boundary := met:area-boundary($area)
   let $centre := met:area-centre($boundary)
 
   return 
     <Placemark >
        <name>{string($forecast/areapresentation)}</name>
         <description>
           {met:forecast-as-text($forecast)}
         </description>
         <Point>
             <coordinates>
                 {string-join(($centre/@longitude,$centre/@latitude),",")}
             </coordinates>
         </Point>
    </Placemark>
};

kml area

Since we have the area coordinates, we can also generate the boundaries as a line in kml.

declare function met:sea-area-to-kml(
    $area as xs:string, 
    $showname as xs:boolean
    ) as element(Placemark)
 {
   let $boundary := met:area-boundary($area)
   return 
     <Placemark >
        {if($showname) then <name>{$area}</name> else()}
        <LineString>
            <coordinates>
            {string-join(
               for $point in $boundary/point
               return
                   string-join(($point/@longitude,$point/@latitude,"0"),",")
                , " "
                )
             }
            </coordinates>
         </LineString>
      </Placemark>
  };
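
A LineString drawn through the listed points stops at the last corner, leaving one side of the area undrawn. If a closed outline is wanted, one option is to repeat the first point at the end of the coordinate list. This is an assumption about the desired result, not part of the original function. A minimal sketch using the Viking boundary:

```xquery
xquery version "1.0";

let $boundary :=
  <boundary area="viking">
    <point latitude="61" longitude="0"/>
    <point latitude="61" longitude="4"/>
    <point latitude="58.5" longitude="4"/>
    <point latitude="58.5" longitude="0"/>
  </boundary>
(: append the first point again so the outline returns to its start :)
let $ring := ($boundary/point, $boundary/point[1])
return
  string-join(
    for $point in $ring
    return string-join(($point/@longitude, $point/@latitude, "0"), ","),
    " ")
(: returns "0,61,0 4,61,0 4,58.5,0 0,58.5,0 0,61,0" :)
```

The coordinate order is longitude, latitude, altitude, as in the function above.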

Generate the kml file

import module  namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
(: set the media type for a kml file :)
declare option exist:serialize  "method=xml indent=yes 
     media-type=application/vnd.google-earth.kml+xml"; 
 
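(: set the file name and extension when saved to allow GoogleEarth to be invoked :)
let $dummy := response:set-header('Content-Disposition','inline;filename=shipping.kml;')
 
(: get the latest forecast; this assumes a zero-argument met:get-stored-forecast() that returns the stored ShippingForecast element, which is not listed on this page :)
let $shippingForecast := met:get-stored-forecast()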
 
return
<kml >
   <Folder>
       <name>{datetime:format-dateTime($shippingForecast/@at,"EEEE HH:mm")} UK Met Office Shipping forecast</name>
       {for $forecast in $shippingForecast/forecast
        return 
         (met:forecast-to-kml($forecast),
          met:sea-area-to-kml($forecast/@area,false())
         )
       }
   </Folder>
</kml>


Push messages

An alternative use of this data is to provide a channel to push the forecasts through as soon as they are received. The channel could be an SMS alert to subscribers or a dedicated Twitter stream which users could follow.

Subscription SMS

This service should allow a user to request an alert for a specific area or areas. The application requires:

  • a data structure to record subscribers and their areas
  • a web service to register a user, their mobile phone number and initial area [to do]
  • an SMS service to change the required area and turn messaging on or off
  • a scheduled task to push the SMS messages when the new forecast has been obtained

Document Structure

<subscriptions>
  <subscription>
     <username>Fred Bloggs</username>
     <password>hafjahfjafa</password>
     <mobilenumber>447777777</mobilenumber>
     <area>lundy</area>
     <status>off</status>
  </subscription>
  ...
</subscriptions>


XML Schema

(to be completed)

Access control

Access to this document needs to be controlled.

The first level of access control is to place the file in a collection which is not accessible via the web. On the UWE server, the root (via mod-rewrite) is the collection /db/Wiki, so resources in this collection and its subcollections are accessible, subject to the access settings on the file, but files in parent or sibling collections are not. So this document is stored in the collection /db/Wiki2. The URL of this file, relative to the external root, is http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml but access fails.

The second level of control is to set the owner and permissions on the file. This is needed because a user on a client behind the firewall, using the internal server address, will gain access to this file. By default, world permissions are set to read and update. Removing this access requires the script to login to read as group or owner.

Ownership and permissions can be set either via the web client or by functions in the eXist xmldb module.

SMS push

This function takes a subscription, formulates a text message and calls a general sms:send function to send it. That function interfaces with our SMS service provider.

declare function met:push-sms($subscription as element(subscription))  as element(result) {
  let $area := $subscription/area
  let $forecast := met:get-stored-forecast($area)
  let $time := datetime:format-dateTime($forecast/../@at,"EE HH:mm")
  let $text := encode-for-uri(concat($area, " ",$time," ",met:abbreviate(met:forecast-as-text($forecast))))
  let $number := $subscription/mobilenumber
  let $sent := sms:send($number,$text)
  return 
       <result number="{$number}" area="{$area}" sent="{$sent}"/>
};

SMS push subscriptions

First we need to get the active subscriptions. The functions follow the same idiom used for boundaries:

declare function met:subscriptions() {
    doc("/db/Wiki2/shippingsubscriptions.xml")/subscriptions
};
 
declare function met:active-subscriptions() as element(subscription) *  {
    met:subscriptions()/subscription[status="on"]
};


and then to iterate through the active subscriptions and report the result:

declare function met:push-subscriptions() as element(results) {
<results>
   { 
     let $dummy := xmldb:login("/db","webuser","password")
     for  $subscription in  met:active-subscriptions()
     return     
        met:push-sms($subscription) 
   }
</results>
};

This script iterates through the subscriptions currently active and calls the push-SMS function for each one.

import module  namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
met:push-subscriptions()

This task could be scheduled to run after the caching task has run or the caching script modified to invoke the subscription task when it has completed. However eXist also supports triggers so the task could also be triggered by the database event raised when the forecast file store has been completed.

Subscription editing by SMS

A message format is required to edit the status of the subscription and to change the subscription area:

 metsub [ on |off |<area> ]

If the area is changed the status is set to on.

The area is validated against a list of area codes. These are extracted from the boundary data:

declare function met:area-names() as xs:string* {
   met:area-boundaries()/boundary/string(@area)
};


import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $login:= xmldb:login("/db","user","password")
let $text := normalize-space(request:get-parameter("text",()))
let $number := request:get-parameter("from",())
let $subscription := met:get-subscription($number)
 
return
   if (exists($subscription))
   then       
        let $update :=
           if ( $text= "on") 
           then update replace $subscription/status with <status>on</status>
           else if( $text = "off") 
           then update replace $subscription/status with <status>off</status>
           else if ( lower-case($text) = met:area-names())
           then ( update replace $subscription/area with <area>{lower-case($text)}</area>,
                  update replace $subscription/status with <status>on</status>
                )
           else ()
       return
         let $subscription := met:get-subscription($number)(: get the subscription post update :)
         return 
             concat("Reply: forecast is ",$subscription/status," for area ",$subscription/area)
 else ()
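
The script above calls met:get-subscription, which is not listed on this page. The following is a minimal sketch of what such a lookup might look like, assuming it selects a subscription by its mobilenumber element. It is written as a hypothetical local:get-subscription over an inline sample document so that it can be run on its own; the real function would read the stored subscriptions document instead.

```xquery
xquery version "1.0";

(: Hypothetical inline sample; the real data is in the stored subscriptions document :)
declare variable $subscriptions as element(subscriptions) :=
  <subscriptions>
    <subscription>
      <username>Fred Bloggs</username>
      <mobilenumber>447777777</mobilenumber>
      <area>lundy</area>
      <status>off</status>
    </subscription>
  </subscriptions>;

(: Assumed behaviour: the first subscription registered for the number, or empty :)
declare function local:get-subscription($number as xs:string) as element(subscription)? {
  ($subscriptions/subscription[mobilenumber = $number])[1]
};

let $subscription := local:get-subscription("447777777")
return
  if (exists($subscription))
  then concat("Reply: forecast is ", $subscription/status, " for area ", $subscription/area)
  else "Reply: number not registered"   (: hypothetical message for unknown numbers :)
(: returns "Reply: forecast is off for area lundy" :)
```

Declaring the return type as element(subscription)? lets the caller test the result with exists(), as the script above does.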


Twitter

Twitter has a simple REST API to update the status. We can use this to tweet the forecasts to a Twitter account. Twitter uses Basic Access Authentication. A suitable XQuery function to send a message for a given username and password, using the eXist httpclient module, is:

declare function met:send-tweet ($username as xs:string,$password as xs:string,$tweet as xs:string )  as xs:boolean {
   let $uri :=  xs:anyURI("http://twitter.com/statuses/update.xml")
   let $content :=concat("status=", encode-for-uri($tweet))
   let $headers := 
      <headers>
          <header name="Authorization" 
                  value="Basic {util:string-to-binary(concat($username,":",$password))}"/>
         <header name="Content-Type"
                  value="application/x-www-form-urlencoded"/>
     </headers>
   let $response :=   httpclient:post( $uri, $content, false(), $headers ) 
   return
        $response/@statusCode='200'
 };

A script is needed to access the stored forecast and tweet the forecast for an area. Different Twitter accounts could be set up for each shipping area. The script will need to be scheduled to run after the full forecast has been acquired.

In this example, the forecast for a given area is tweeted to a hard-coded Twitter account:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
 
declare variable $username := "kitwallace";
declare variable $password := "mypassword";
declare variable $area := request:get-parameter("area","lundy");
 
let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at,"HH:mm")
let $message := concat($area," at ",$time,":",met:abbreviate(met:forecast-as-text($forecast)))
return 
    <result>{met:send-tweet($username,$password,$message)}</result>

Chris Wallace's Twitter

To do

Creating and editing subscriptions

This task is ideal for XForms.

Triggers

Use a trigger to push the SMS messages when the update has been done.

Last modified on 9 January 2010, at 19:04