
Slashdot News

Querying XML from the Web

The following is a small example of how you can use MonetDB/XQuery. The script reads the RSS feed at Slashdot.org, formats it as an HTML list, and displays the title of each item as a link to the source.

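Explanation: the XQuery statement below reads the RSS feed from the URL http://slashdot.org/rss/index.rss and emits one list entry for every item it finds. The *: prefix is a namespace wildcard, so *:item, *:link and *:title match those element names in whatever namespace the feed uses.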

slashdot.xq:

<html>
 <body>
  <h1>Slashdot.org News</h1>
  <ul>
  {
    for $t in doc("http://slashdot.org/rss/index.rss")//*:item
    return 
    element li {
      element a {
        attribute href { $t/*:link/text() },
        text { $t/*:title }
      }
    }
  }
  </ul>
 </body>
</html>
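
To try the same iteration without network access, the query can be run against an inline sample feed. The following is only a sketch in standard XQuery 1.0: the two items, their example.org links, and the RDF-style wrapper with a default namespace are made-up test data, not the actual structure or content of the Slashdot feed.

```xquery
(: Hypothetical sample feed: the namespaces and items are illustrative only. :)
let $feed :=
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns="http://purl.org/rss/1.0/">
    <item>
      <title>First story</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/second</link>
    </item>
  </rdf:RDF>
return
  <ul>
  {
    (: *:item matches "item" in any namespace, including the default one above :)
    for $t in $feed//*:item
    return
    element li {
      element a {
        attribute href { $t/*:link/text() },
        text { $t/*:title }
      }
    }
  }
  </ul>
```

Because the sample items live in a default namespace, a plain //item step would select nothing here; the wildcard form used in slashdot.xq selects both items.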
  

URL Caching in MonetDB/XQuery

The XML document cache of MonetDB/XQuery stores documents that were previously accessed by URI with fn:doc(). If a document is cached, query performance is better, because the XML document does not need to be parsed again: it is already stored and indexed by MonetDB/XQuery.

However, MonetDB/XQuery does not automatically cache XML documents with HTTP URLs because, unlike for file URLs, it cannot guarantee that a cached HTTP URL remains up-to-date.

It may happen that document freshness is not a problem for your application. In that case, you can configure the XML cache of MonetDB/XQuery to cache HTTP documents anyway. You can even instruct it to cache different URIs, identified by a string prefix, for different amounts of time (some kinds of URIs you may not want to cache at all, some only for a short time, and others forever).

The documentation provides more information on how the XML cache works and can be configured.

How to add documents persistently to the database

If you want to add an XML document to the database in its current state on the web, because you are not interested in querying the latest version or because you want to guard against the URL becoming stale in the future, you can also add the document persistently to the database.

Read our quick guide to Document Management to find out how to do this.
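
As a rough sketch of what this might look like, assuming the pf:add-doc function described in the Document Management guide takes a source URI followed by the name under which the document is stored (check the guide for the exact signature; the name "slashdot.xml" is an arbitrary choice made for this example):

```xquery
(: Assumption: pf:add-doc(source URI, stored name), per the Document Management guide. :)
(: "slashdot.xml" is a hypothetical name chosen for this example. :)
pf:add-doc("http://slashdot.org/rss/index.rss", "slashdot.xml")
```

Under the same assumption, a later query would refer to the stored snapshot as doc("slashdot.xml") instead of the HTTP URL, so the result no longer changes when the feed is updated.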