Reading from Atom or RSS feeds

<< Click to Display Table of Contents >>

Navigation:  Integrating SQLstream Blaze with Other Systems > Reading Data into s-Server > Reading from Other Sources  >

Reading from Atom or RSS feeds

Previous pageReturn to chapter overviewNext page

The Atom/RSS adapter reads an RSS or Atom feed. It tails this feed and parses results into a columned stream, using the xpath syntax. For more information on xpath, see http://www.w3schools.com/xpath/xpath_intro.asp.

As with other adapters, to implement the Atom/RSS adapter, you need to register the adapter, create a server object with connection details for the feed, and create a foreign stream that you can later query.

Registering the Plugin (Foreign Data Wrapper)

Creating a Foreign Data Wrapper

As with other adapters, the foreign data wrapper loads a Java class for the plugin, called class com.sqlstream.plugin.webfeed.WebFeedStreamControlPlugin. You will reference the wrapper when you create the server object below. For more information on foreign data wrappers, please see the topic CREATE FOREIGN DATA WRAPPER in the s-Server Streaming SQL Reference Guide. The following code creates a foreign data wrapper for the WebFeed adapter.

CREATE OR REPLACE FOREIGN DATA WRAPPER "WebFeed"

   LIBRARY 'class com.sqlstream.plugin.webfeed.WebFeedStreamControlPlugin'

   LANGUAGE JAVA

   DESCRIPTION 'Plugin to read/tail web-based data feeds';

 

Creating the Server

As with other adapters, the server object for a WebFeed adapter contains server connection information. For WebFeed, this information includes the base URL for the feed, as well as information on how frequently to fetch the feed. For more information on server objects, please see the topic CREATE FOREIGN DATA WRAPPER in the s-Server Streaming SQL Reference Guide.

Example Code for Creating a Server

CREATE OR REPLACE SERVER "GoogleNews"

   TYPE 'Google NewsFeeds'

   VERSION '0.1'

   FOREIGN DATA WRAPPER "WebFeed"

   OPTIONS (

       URL 'http://news.google.com/',

       MIN_SLEEP_INTERVAL  '60000',

       MAX_SLEEP_INTERVAL '300000',

       SLEEP_MULTIPLICAND    '1.0'

   )

   DESCRIPTION 'follow data from Google news feeds';

Creating the Foreign Stream

The foreign stream is the SQL object where data from the Atom/RSS adapter will be accessed. Once you create a foreign stream for the WebFeed adapter, you can query it as you would any other SQL object.

Since the Atom/RSS adapter does not know the structure of the target feed, you must define the SQLstream FOREIGN STREAM rowtype to match the rowtype of the feed. The foreign stream columns need to be in the same order  as those of the web feed.

Example Code for Creating a Foreign Stream

The following code creates a foreign stream using the server defined above. The list of columns defined for the foreign stream maps to the list of xpath expressions defined under server options.

CREATE OR REPLACE SCHEMA "RSSTest";

DROP SCHEMA "RSSTest" CASCADE;

CREATE SCHEMA "RSSTest" DESCRIPTION 'read RSS data feeds';

SET SCHEMA '"RSSTest"';

CREATE OR REPLACE FOREIGN STREAM "GoogleNewsRSS" (

   "title" VARCHAR(100),

   "link" VARCHAR(1024),

   "guid" VARCHAR(80),

   "category" VARCHAR(20),

   "pub_Date" TIMESTAMP,

   "description" VARCHAR(4096),

   "feed_ver" VARCHAR(5),

   "feed_lang" CHAR(2),

   "feed_pub" TIMESTAMP,

   "feed_title" VARCHAR(20),

   "feed_link" VARCHAR(100)

   )

 

   SERVER "GoogleNews"

   OPTIONS (

       URL 'nwshp?hl=en&output=rss',

       FEED_FORMAT 'RSS',

       DATE_FORMAT 'EEE, dd MMM yyyy HH:mm:ss z',

       -- xpath expressions are evaluated relative to the current "item" element

       -- note that rooted xpath expressions also work, e.g. /rss/channel/... or --channel/...

       UNIQUE_ID 'guid',

       "title_xpath" 'title',

       "link_xpath" 'link',

       "guid_xpath" 'guid',

       "category_xpath" 'category',

       "pub_date_xpath" 'pubDate',

       "description_xpath" 'description',

       "feed_ver_xpath" '/rss/@version',

       "feed_lang_xpath" '--channel/language',

       "feed_pub_xpath" '--channel/pubDate',

       "feed_title_xpath" '--channel/title',

       "feed_link_xpath" '--channel/link'

   )

   DESCRIPTION 'GoogleNews feed in RSS format';

 

 

Defining a Feed with Atom

Defining a foreign stream and server for an Atom feed is similar to those for RSS. Atom feeds specify a default namespace which requires a prefix for use with XPath expressions. The default prefix, "defaultns", is always available. You may also specify a preferred prefix in the DEFAULT_NS_PREFIX option. Use one of these prefixes to qualify each XPath expression element in the default namespace.

CREATE SCHEMA "AtomTest" DESCRIPTION 'test web-based data feeds';

SET SCHEMA '"AtomTest"';

CREATE OR REPLACE FOREIGN STREAM "GoogleNewsAtom" (

   "category" VARCHAR(20),

   "title" VARCHAR(100),

   "issued" TIMESTAMP,

   "modified" TIMESTAMP,

   "content" VARCHAR(4096),

   "link" VARCHAR(1024),

   "id" VARCHAR(80),

   "feed_ver" VARCHAR(5),

   "feed_lang" CHAR(2),

   "feed_mod" TIMESTAMP,

   "feed_title" VARCHAR(20),

   "feed_link" VARCHAR(100)

   )

   SERVER "GoogleNews"

   OPTIONS (

       URL 'nwshp?hl=en&output=atom',

       FEED_FORMAT 'Atom',

       DEFAULT_NS_PREFIX 'atom',

       DATE_FORMAT 'yyyy-MM-dd''T''HH:mm:ss',

       -- xpath expressions are evaluated relative to the current "entry" element

       -- note that rooted xpath expressions also work, e.g. /feed/...

       UNIQUE_ID 'defaultns:id',

       "category_xpath" 'atom:summary',

       "title_xpath" 'atom:title[@type=''text/html'']',

       "issued_xpath" 'atom:issued',

       "modified_xpath" 'atom:modified',

       "content_xpath" 'atom:content[@type=''text/html''][@mode=''escaped'']',

       "link_xpath" 'atom:link/@href',

       "id_xpath" 'atom:id',

       "feed_ver_xpath" '/atom:feed/@version',

       "feed_lang_xpath" '/atom:feed/@xml:lang',

       "feed_mod_xpath" '/atom:feed/atom:modified',

       "feed_title_xpath" '/atom:feed/atom:title',

       "feed_link_xpath" '/atom:feed/atom:link/@href'

   )

   DESCRIPTION 'GoogleNews feed in Atom format';