
The Cocoon Sitemap(s) in risten.no

The sitemaps in risten.no have the following duties:

The following sections will cover each part in detail.

Design goal: generality, inheritance and overriding code

Since many dictionaries and term collections will be quite similar in structure, the goal is to design the sitemap so that general code is used where suitable and more specific code only where needed. At the same time, the specific code should contain only what is specific and inherit the rest. Specific code should either override more general functions with the same name, or be written specifically to cover particular use cases.

The sitemap should provide support for these goals. How this is implemented is outlined below.

Delegate incoming requests

Incoming requests are meant either for a specific collection or for a set of collections. When a request targets multiple collections, it is split into multiple requests, each targeting a single collection. This is done with an XQuery that returns a document containing XInclude elements, one for each requested collection. Cocoon then processes this document, in effect creating one request per collection. In the end, all requests are thus generalised and broken down into single requests for single collections, each specifying the collection wanted. Example:

<document xmlns:xi="http://www.w3.org/2001/XInclude"
          xmlns:i18n="http://apache.org/cocoon/i18n/2.1">
 <body>
  <section title="HitTitle" i18n:attr="title">
   <!-- one xi:include per requested collection -->
   <xi:include href="cocoon:/search-coll-entries.xq?srchcoll=/db/ordbase/terms/SD-terms"/>
   <xi:include href="cocoon:/search-coll-entries.xq?srchcoll=/db/ordbase/terms/mekanikk-1999"/>
   <xi:include href="cocoon:/search-coll-entries.xq?srchcoll=/db/ordbase/terms/propnouns"/>
   <xi:include href="cocoon:/search-coll-entries.xq?srchcoll=/db/ordbase/dicts/komi-JR"/>
   <p class="srchtime">
    <i18n:translate>
     <i18n:text>SearchTime</i18n:text>
     <i18n:param type="number" pattern="#.##">0.003</i18n:param>
    </i18n:translate>
   </p>
  </section>
 </body>
</document>
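To show how such a document turns into one request per collection, here is a minimal sketch of a pipeline that could process it. It is not taken from the risten.no sitemaps: the match pattern search.xq, the files xquery/search.xq and xslt/search2html.xsl, and the transformer type xinclude (the name under which Cocoon's XIncludeTransformer is commonly declared) are all assumptions.

```xml
<!-- Hypothetical sketch: "search.xq", xquery/search.xq, xslt/search2html.xsl
     and the "xinclude" transformer type are assumptions. -->
<map:match pattern="search.xq">
  <!-- The XQuery returns a document with one xi:include per collection -->
  <map:generate src="xquery/search.xq" type="xquery"/>
  <!-- Each cocoon:/search-coll-entries.xq?srchcoll=... href is resolved as an
       internal request to this sitemap, i.e. one request per collection -->
  <map:transform type="xinclude"/>
  <map:transform src="xslt/search2html.xsl"/>
  <map:transform type="i18n"/>
  <map:transform type="encodeURL"/>
  <map:serialize encoding="UTF-8" type="html"/>
</map:match>
```

Because the cocoon:/ protocol (single slash) addresses the current sitemap, each xi:include is resolved by the same sitemap's own matchers. A specific match like this one has to come before a generic *.xq match, since Cocoon uses the first matcher that matches.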

The collection is specified in one of three ways: as separate request attributes, as session attributes, or in a single request parameter that contains the whole eXist collection path, where the last two parts of that path contain the type and the ID, respectively. The example above is of this last type: in /db/ordbase/terms/SD-terms, the type is terms and the ID is SD-terms.

When the collection is stored in the session or provided as separate request attributes, Cocoon's built-in matchers provide the means to extract the collection ID and type for further sitemap processing. For the single-parameter case we have written our own Cocoon action, RistennoRequestAction.java (in termdb/src/db-app/risten/resources/), which provides the same functionality and returns the collection type and ID as the sitemap parameters requestType and requestCollection, respectively. For this to work, the source file must be copied to src/org/exist/cocoon/ and compiled together with eXist.

The root sitemap (in $EXIST_HOME/webapp/) contains the following configuration to enable our own action:

<map:actions>
...
   <map:action logger="sitemap.action.xmldb.ristenno"
        name="risten-coll" src="org.exist.cocoon.RistennoRequestAction"/>
</map:actions>

Sub-sitemaps do not need to contain any further configuration.
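Inside a sub-sitemap, the action can then be referred to by its declared name, risten-coll. A rough sketch follows; the pipeline body, the XQuery and XSL paths, and the assumption that the action reads the srchcoll parameter from the XInclude example above are illustrative only. The parameter names colltype and collID are borrowed from the file-not-found call in the *.xq fragment below.

```xml
<!-- Hypothetical sketch: paths and parameter names are assumptions -->
<map:match pattern="search-coll-entries.xq">
  <!-- RistennoRequestAction is assumed to read srchcoll and split the
       collection path into type and ID -->
  <map:act type="risten-coll">
    <map:generate src="xquery/search-coll-entries.xq" type="xquery">
      <!-- e.g. "terms" for srchcoll=/db/ordbase/terms/SD-terms -->
      <map:parameter name="colltype" value="{requestType}"/>
      <!-- e.g. "SD-terms" for the same request -->
      <map:parameter name="collID" value="{requestCollection}"/>
    </map:generate>
    <map:transform src="xslt/search-coll-entries2html.xsl"/>
    <map:serialize encoding="UTF-8" type="xml"/>
  </map:act>
</map:match>
```

Cocoon makes the values returned by an action available as {requestType} and {requestCollection} only inside the map:act element, and skips that element entirely if the action returns null.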

The next sitemap is the one in the root of the risten.no application (that is, in $EXIST_HOME/webapp/risten). It contains very little: it just forks further processing to either query/ or edit/, depending on the incoming request. For the rest of this description we will use the edit/ branch as an example (the bug mentioned above makes things a little more complicated on the query/search side, but the principles are the same); that is, the sitemap below is found in $EXIST_HOME/webapp/risten/edit/.
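The fork in the application root sitemap could be as simple as two mounts. This sketch assumes that requests reach it with the risten/ prefix already stripped and that each branch has its own sitemap.xmap; the real file may use different patterns.

```xml
<!-- Hypothetical sketch of $EXIST_HOME/webapp/risten/sitemap.xmap -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Editor requests: strip "edit/" and hand over to edit/sitemap.xmap -->
      <map:match pattern="edit/**">
        <map:mount check-reload="yes" src="edit/sitemap.xmap" uri-prefix="edit"/>
      </map:match>
      <!-- Search requests: the same for query/ -->
      <map:match pattern="query/**">
        <map:mount check-reload="yes" src="query/sitemap.xmap" uri-prefix="query"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

The uri-prefix removes edit/ (or query/) from the URI, so the sub-sitemaps see paths such as NAME.xq.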

The main XQuery processing using session attributes for identifying collection ID and type looks like the following:

<map:match pattern="*.xq">
  <!-- PROTECT: only allow registered users: -->
  <map:act src="xmldb:exist:///db" type="xmldb-login">      <!-- see #1 below -->

    <map:match type="session-attribute" pattern="colltype"> <!-- see #2 below -->
      <map:match type="session-attribute" pattern="collection">

        <map:select type="resource-exists">                 <!-- see #3 below -->
          <map:when test="{../1}/{1}/xquery/{../../../1}.xq">
            <map:mount check-reload="yes"
                       src="{../1}/{1}/sitemap.xmap"
                       uri-prefix=""/>
          </map:when>
          <map:when test="{../1}/xquery/{../../../1}.xq">
            <map:mount check-reload="yes"
                       src="{../1}/sitemap.xmap"
                       uri-prefix=""/>
          </map:when>

          <map:when test="xquery/{../../../1}.xq">
            <map:generate src="xquery/{../../../1}.xq" type="xquery"/>
            <map:transform src="xslt/{../../../1}2html.xsl">
                <map:parameter name="use-request-parameters" value="true"/>
            </map:transform>
            <map:transform type="i18n">
                <map:parameter name="locale" value="{../../../../locale}"/>
            </map:transform>
            <map:transform type="encodeURL"/>
            <map:serialize encoding="UTF-8" type="html"/>
          </map:when>

          <map:otherwise>                                   <!-- see #4 below -->
            <map:generate src="../xquery/file-not-found.xq" type="xquery">
              <map:parameter name="collID" value="{1}" />
              <map:parameter name="colltype" value="{../1}" />
              <map:parameter name="requested-file" value="{../../../1}.xq" />
            </map:generate>
            <map:transform src="../xslt/docu2html.xsl"/>
            <map:transform type="i18n">
                <map:parameter name="locale" value="{../../../../locale}"/>
            </map:transform>
            <map:transform type="encodeURL"/>
            <map:serialize encoding="UTF-8" type="html"/>
          </map:otherwise>

        </map:select>
      </map:match>
    </map:match>

  </map:act> <!-- End of PROTECTION -->
  <!-- no session found: redirect to login form -->
  <map:redirect-to uri="editframe.xml"/>
</map:match>

Some comments on the numbered sections of the sitemap fragment:

  1. Here we check whether a user is logged in. This is done only in the editor; the corresponding lines are missing in the query/ sitemap.
  2. Here we match against the session attributes colltype and collection (which we know exist), thereby making their values available for further processing in the sitemap as {../1} and {1}, respectively.
  3. Then we check whether the requested resource (an XQuery file) exists in three different locations, most specific first:
    1. in an XQuery folder specific to the requested collection: if found, the sitemap in that collection's folder is mounted, and the rest of the processing is handled there
    2. in an XQuery folder specific to the collection type (terms, dicts or classes): if found, the sitemap in the collection type folder is mounted, and the rest of the processing is handled there
    3. in the generic XQuery folder: if found, the rest of the processing is done in this sitemap:
      1. the XQuery is called with some parameters
      2. the result of the XQuery is transformed into a common HTML structure, augmented with <i18n> elements around text to be localised
      3. i18n processing: all text elements within <i18n> tags are replaced with localised text, if found, using the locale in the HTTP header or an explicitly requested locale
      4. finally, all URLs are encoded, and the document is serialised to HTML and sent back to the browser
  4. If no match is found, we return an error page (this should not happen except during development).
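When step 3.1 mounts a collection-specific sitemap, that sitemap only needs to contain what differs for the collection. A minimal sketch, assuming a sitemap at risten/edit/terms/propnouns/sitemap.xmap (the propnouns collection appears in the XInclude example). Because the mount uses uri-prefix="", the mounted sitemap sees the same NAME.xq request, and relative src paths resolve against its own folder. It relies on the component types declared in the parent sitemaps; the locale parameter passed to the i18n transformer in the parent fragment is left out, since its source is outside the fragment shown.

```xml
<!-- Hypothetical sketch of risten/edit/terms/propnouns/sitemap.xmap -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Same request as in the parent sitemap, e.g. NAME.xq -->
      <map:match pattern="*.xq">
        <!-- Resolves to terms/propnouns/xquery/NAME.xq -->
        <map:generate src="xquery/{1}.xq" type="xquery"/>
        <!-- Collection-specific XSL, which may import a less specific one -->
        <map:transform src="xslt/{1}2html.xsl">
          <map:parameter name="use-request-parameters" value="true"/>
        </map:transform>
        <!-- locale parameter omitted in this sketch -->
        <map:transform type="i18n"/>
        <map:transform type="encodeURL"/>
        <map:serialize encoding="UTF-8" type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```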

XSL naming conventions

As can be seen from the sitemap fragment above, there is a fixed relationship between XQuery and XSL filenames: for a given collection, XQuery files are named xquery/NAME.xq, and the corresponding XSL file is named xslt/NAME2html.xsl. The 2html part indicates that the XSL transforms to HTML. In the future one can imagine other output formats; the most likely candidate is LEXC (Xerox lexicon) format, so that query output can be used as source for compiling analysers and similar tools.
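Under the same convention, a LEXC output could reuse the XQuery unchanged and only swap the stylesheet. This is purely hypothetical: the *.lexc request pattern and the NAME2lexc.xsl stylesheets do not exist, and the serializer type text is assumed to be declared as in the default Cocoon sitemap.

```xml
<!-- Hypothetical sketch: neither *.lexc requests nor NAME2lexc.xsl exist yet -->
<map:match pattern="*.lexc">
  <!-- Same XQuery as the HTML view: xquery/NAME.xq -->
  <map:generate src="xquery/{1}.xq" type="xquery"/>
  <!-- Only the output-format part of the XSL name changes -->
  <map:transform src="xslt/{1}2lexc.xsl"/>
  <!-- LEXC is plain text, so there are no i18n or URL encoding steps -->
  <map:serialize encoding="UTF-8" type="text"/>
</map:match>
```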

There is always a corresponding XSL file for each XQuery file, although the XSL file can be as simple as a single import statement:

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:exist="http://exist.sourceforge.net/NS/exist"
  xmlns:i18n="http://apache.org/cocoon/i18n/2.1"
  version="1.0">

  <xsl:import href="../../../xslt/search2html.xsl" />

</xsl:stylesheet>

Because of the way XSL works, imported templates have lower import precedence than those in the importing stylesheet. Thus, when a template rule with the same match pattern (or a named template with the same name) exists in both the importing and the imported stylesheet, the one in the importing stylesheet is used. This makes it very easy to meet the inheritance and specificity goals above for the XSL part: just import the less specific XSL document and override what you need. The rest is taken care of automatically.
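As an illustration, a collection-specific stylesheet could override a single template and still reuse the imported one. The match pattern entry and the wrapping div are hypothetical, since the actual element names depend on what the XQuery returns. xsl:apply-imports (XSLT 1.0) applies the template rules from the imported stylesheets to the current node, so an override can extend the general behaviour instead of replacing it.

```xml
<!-- Hypothetical sketch: "entry" and the div wrapper are assumptions -->
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <!-- xsl:import must come before any other top-level element -->
  <xsl:import href="../../../xslt/search2html.xsl" />

  <!-- Overrides the imported template rule with the same match pattern -->
  <xsl:template match="entry">
    <div class="propnoun-entry">
      <!-- Runs the imported (less specific) template for this node -->
      <xsl:apply-imports/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```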

XQuery, inheritance and modularisation

Unlike XSL, the XQuery standard has no built-in mechanism for overriding imported code. What it does offer is library modules: separate files of functions and variables that other queries can import. Without overriding, duplicate code will inevitably increase, and more careful code design is needed to identify reusable patterns.

For now we have a set of library modules in the top XQuery folder risten/xquery/, as well as one module in risten/edit/xquery/saveroutines.xqm. These are imported whenever needed. Even more specific modules could be implemented if needed. Probably the best way to avoid too much duplicate XQuery code is to keep term and dictionary collections as similarly structured as possible. One could also imagine collection-specific modules that handle all collection-specific processing and in turn import less specific modules for the more general (but still not fully general) cases.

In other words, inheritance cannot be implemented for XQueries, but specificity can. To help manage the code, we instead need to make it as modular as possible.

CSS handling

The sitemap fragment for identifying and returning the correct CSS document is shown below:

<map:match pattern="*_*_*_*_*.css">
    <map:read mime-type="text/css" src="{3}/{4}/css/{1}_{2}_{3}_{4}_{5}.css"/>
</map:match>
<map:match pattern="*_*_*_*.css">
    <map:read mime-type="text/css" src="{3}/{4}/css/{1}_{2}_{3}_{4}.css"/>
</map:match>
<map:match pattern="*_*_*.css">
    <map:read mime-type="text/css" src="{3}/css/{1}_{2}_{3}.css"/>
</map:match>
<map:match pattern="*_*.css">
    <map:read mime-type="text/css" src="css/{1}_{2}.css"/>
</map:match>

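It is just a list of matches, ordered from more to less specific. Since Cocoon uses the first matcher that matches, and * also matches underscores, the order matters: *_*.css placed first would catch every request. The matches convert requests that follow a certain naming convention into file references, which are then simply read and sent back to the browser. The wildcards above should be read as follows: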

  1. {1} = risten
  2. {2} = edit or query (that is, the editor interface or the regular search interface)
  3. {3} = the collection type, e.g. terms
  4. {4} = the collection ID, e.g. propnouns
  5. {5} = a specific page in a collection, if needed

Like XSL, CSS has an import mechanism that gives imported declarations lower priority than declarations with the same selector in the importing stylesheet. Thus, by always importing the next less specific stylesheet up in the folder hierarchy, one only needs to override the CSS declarations that must change for a given case. The import statement looks like this:

@import url(risten_edit_terms.css);

and must be placed at the beginning of the CSS document: @import rules that follow any other rules (except @charset) are ignored.

Since CSS stylesheets are read independently of the XQuery and XSL files, there is no requirement that each XQuery has a corresponding CSS stylesheet. The most specific stylesheet needed should be referenced in the <header> element of the returned document, as shown below. This reference is transformed into the corresponding HTML header element before the document is returned to the browser.

<document>
    <header>
      <style href="risten_edit_terms.css"/>
      <title>Hits!</title>
    </header>
    <body>
      ...
    </body>
</document>
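This page does not show the stylesheet that performs the header transformation. A hypothetical set of templates could look like this; only the <document>/<header>/<style> input structure comes from the example above, and handling of the body content is left out.

```xml
<!-- Hypothetical sketch: not the actual risten.no stylesheet -->
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:template match="/document">
    <html>
      <xsl:apply-templates select="header"/>
      <!-- body content is handled by other templates (not shown) -->
      <body><xsl:apply-templates select="body/node()"/></body>
    </html>
  </xsl:template>

  <!-- <header> becomes the HTML <head> -->
  <xsl:template match="document/header">
    <head>
      <xsl:apply-templates select="title | style"/>
    </head>
  </xsl:template>

  <xsl:template match="header/title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

  <!-- <style href="..."/> becomes a stylesheet link with the same href -->
  <xsl:template match="header/style">
    <link rel="stylesheet" type="text/css" href="{@href}"/>
  </xsl:template>

</xsl:stylesheet>
```

Since the generated link keeps the relative href, the browser requests the CSS file relative to the page URL, where the CSS matchers above turn it into a file path.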

The above sitemap fragment and the CSS @import statement implement our goals of inheritance, specificity and overriding code as outlined at the top of this page.