XML - Managing Data Exchange/RSS

Previous Chapter Next Chapter
RDF - Resource Description Framework JDNC

Learning Objectives edit

Upon completion of this chapter, you will

  • Understand the basics of RSS
  • Understand the history of RSS
  • Be able to construct a RSS 2.0 document using XML
  • Subscribe to an RSS aggregator/reader

Introduction edit

RSS is a simple XML format used to syndicate headlines. It is now popularly used by websites that publish new content regularly and provide a list of headlines with links to their latest content. Content such as news feeds, events listings, project updates, blogger and most recently podcasting, video and image distribution can all be distributed by RSS. RSS feeds are also used by major Internet portals such as Google, Yahoo and AOL for people to personalize and have information that they care about delivered to them, i.e. MyYahoo.

What does RSS mean? edit

RSS is considered a name variously used to refer to three different standards. The three separate branches are the RSS 0.9 branch, the RSS 1.0 branch (which is based on RDF) and RSS 2.0, and the initials have been expanded into three different names: "Really Simple Syndication" (RSS 0.9, 2.0), "Rich Site Summary" and "RDF Site Summary" (for RSS 1.0).

Several different versions have been developed by different developers under different names. According to XML.com, seven versions of RSS have been developed (see What is RSS?). Because RSS is understood as a term referring to many types of syndication protocols, these various RSS protocols have sometimes been accused of being "incompatible" with each other (see The myth of RSS compatibility). This is an important issue for RSS reader/aggregator developers.

History edit

The original version (version 0.90) of RSS was released by Netscape in 1999. Netscape developers were designing a format for making portals of headlines for news sites. After Netscape released the simplified version of RSS, they lost interest in developing RSS. However, another company, UserLand Software took over with intention to use RSS with their web-logging products and web-based writing software. While UserLand Software continued development with version 0.91, a third non-commercial group split off from the company and designed a new format based on version 0.90, which was a non-simplified version. The new format developed by this non-commercial group became known as version 1.0. In the meantime UserLand Software grew angered at the new 1.0 version, kept developing RSS and released version 2.0. Version 2.0 has become the leader and most widely adopted version of RSS. The 2.0 specification was donated to a non-commercial third party, Harvard Law School. Harvard Law is now responsible for the future development of the RSS 2.0 specification. Below is a table that describes each version, the owner, pros and cons, as well as its current status and recommendation for use.

RSS versions and recommendations, Table source: What is RSS? XML.com
Version Owner Pros Status Recommendation
0.90 Netscape   Obsoleted by 1.0 Don't use
0.91 UserLand Drop dead simple Officially obsoleted by 2.0, but still quite popular Use for basic syndication. Easy migration path to 2.0 if you need more flexibility
0.92, 0.93, 0.94 UserLand Allows richer metadata than 0.91 Obsoleted by 2.0 Use 2.0 instead
1.0 RSS-DEV Working Group RDF-based, extensibility via modules, not controlled by a single vendor Stable core, active module development Use for RDF-based applications or if you need advanced RDF-specific modules
2.0 UserLand Extensibility via modules, easy migration path from 0.9x branch Stable core, active module development Use for general-purpose, metadata-rich syndication

RSS structure edit

A RSS document is often known as RSS feed and can have three different types of file extensions: .RSS, .XML and .RDF. All RSS documents must conform 100% to the XML specification begin with the XML declaration. To identify a RSS document, the top level starts with a <rss> element, followed by a mandatory version attribute that specifies the RSS version. Sub-element to the <rss> element, is the single <channel> element which contains a brief description of the channel. Below is a sample of RSS(2.0) from the New York Times.

Exhibit 1: Data model for RSS

<rss version="2.0">
        <title>NYT > Home Page</title>
        <link> <nowiki>http    //www.nytimes.com/index.html</nowiki> </link>
        <description>New York Times > Breaking News, World News Multimedia</description>
        <copyright>Copyright 2004 The New York Times Company</copyright>
        <lastBuildDate>Sun,  7 Nov 2004 13    30    01 EST</lastBuildDate>
            <url> <nowiki>http    //www.nytimes.com/images/section/NytSectionHeader.gif</nowiki> </url>
            <title>NYT > Home Page</title>
            <link> <nowiki>http    //www.nytimes.com/index.html</nowiki> </link>
            <title>Iraq Declares State of Emergency as Insurgents Step Up Attacks</title>
            <link> <nowiki>http    //www.nytimes.com/2004/11/07/international/middleeast/07cnd-iraq.html</nowiki> </link>
            <description> Today's attacks, including three police post raids that killed 21, came a day after insurgents killed at least 30. </description>
            <author> By EDWARD WONG </author>
            <pubDate> Sun, 07 Nov 2004 00    00    00 EDT </pubDate>
            <guid> <nowiki>http    //www.nytimes.com/2004/11/07/international/middleeast/07cnd-iraq.html</nowiki> </guid>

Figure 1-1: New York Times - HomePage.xml - RSS version 2

The <channel> element has three mandatory elements and several optional elements.
Mandatory <channel> elements:

Element Description Example
<title> Name of the channel "The New York Times"
<description> Brief description of the channel New York Times > Breaking News, World News Multimedia
<link> URL to the channel associated website http://www.nytimes.com/index.html

Optional <channel> elements:

Element Description Example
<language> Channel language en-us
<copyright> Copyright notice for content in the channel Copyright 2004 The New York Times Company
<lastBuildDate> The last time the content of the channel was updated/changed Sun, 7 Nov 2004 13:30:01 EST

Other optional elements include: managingEditor, webMaster, pubDate, category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, skipDates. The requirement or sub-elements of each element please refer to the RSS specification.(see at Harvard Law). Below are example of image element.

<image> elements:

Element Description Example
<link> The URL to the item http://www.nytimes.com/index.html
<title> Picture title NYT > Home Page
<url> The URL to the picture http://www.nytimes.com/images/section/NytSectionHeader.gif

A channel may contain a number of <item>s. An item may represent a "story" - much like a story in a newspaper or magazine; if so, its description is a synopsis of the story. The link points to the full story. An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed; see examples), and the link and title may be omitted.
Each RSS channel can contain up to 15 items. All elements of an item are optional,however, an <item> element must contain at least one <title> or <description> element.

<item> elements:

Element Description Example
<title> Title of the item Iraq Declares State of Emergency as Insurgents Step Up Attacks
<link> The URL to the item http://www.nytimes.com/2004/11/07/international/middleeast/07cnd-iraq.html
<description> Brief description of the item Today's attacks, including three police post raids that killed 21, came a day after insurgents killed at least 30.
<author> Author's name and/or author's email address mail@nytimes.com (Edward Wong)
<pubDate> Date/time the item was published Sun, 07 Nov 2004 00:00:00 EDT
<guid> Is a string that uniquely identifies the item. Can be used by the aggregator to determine if an item is new. http://www.nytimes.com/2004/11/07/international/middleeast/07cnd-iraq.html

Others include:
source, enclosure, category, and comments.(see at Harvard Law).

An item can either be a child or a sibling of a channel.





More optional elements visit RSS 2.0 Specification

How does it work? edit

RSS can be divided into two parts; the reader/ag and the feed. The reader is the program that reads and presents the RSS feed in an understandable format. The feed is the website with its RSS file. RSS feeds are typically identified on webpages with an orange rectangle icon, or an orange icon with the letters RSS written on it. To view the XML code, you simply have to click on the icon.

Creating an RSS feed edit

A website author can establish a RSS feed for itself in different ways; either by doing it manually, by using software or by online services. Most large websites use content management software to produce their RSS feed. Every time a change is made on their website, the content management software produce a RSS file of the changes with the new items added and old items removed.

Subscribing to an RSS feed edit

As a RSS subscriber you need a RSS aggregator. By feeding a RSS link, the aggregator will search for information you subscribed and display them. Say that you subscribe on the sport section in the New York Times; each time the NY Times publish a new sport article the article’s headlines, description and the URL will be displayed on your computer. Whenever you are online, the aggregator will search out and sort your list of interests and display them.

RSS Aggregators edit

RSS aggregator (aka RSS Reader) is an application that is used to collect, update and display RSS feeds. Below is a list of some RSS aggregators for different platforms that the aggregator will work properly on.

Some others include:

Future of RSS edit

The future of RSS seems very promising as version 2.0 has become extremely popular with the Internet industry and somewhat the standard of the RSS versions. Yahoo recently released its new version of Yahoo Maps and the API is based on georRSS version 2.0. This version of Yahoo Maps allows users to edit the information on the maps, which makes the Maps and Local Search products more effective. RSS version 2.0 is also very popular with distributing podcasts to the subscriber base along with distributing content Google’s blogger product. Furthermore, RSS is being utilized in an innovative way for search engine marketers to submit time sensitive content to the engines. The Mozilla Firefox browser already contains an internal RSS aggregator that allows users to view RSS news and blog headlines in the bookmark toolbar or bookmark menu. This is accomplished through the Mozilla Firefox feature named “Live Bookmarks”. RSS has quickly become a mainstream technology in a relatively short period and has definitely become a major player in the Internet space.

Use RSS on Firefox edit

Summary edit

Now, RSS is commonly used in areas such as, websites and blogs, with version 2.0 being the most popular standard. RSS feeds are typically identified on webpages with an orange rectangle icon, or an orange icon with the letters RSS written on it. To view the XML code, you simply have to click on the icon.

Exercises edit

Answers edit

References edit

Technology at Harvard Law - Internet technology hosted by Berkman Center - RSS 2.0 Specification
Dive-into-XML by Mark Pilgrim - What is RSS?
Mozilla Firefox - Live Bookmarks
Apple - PodCasting
USA Today - USA Today