% Source: main/trunk/greenstone3/docs/manual/manual.tex, revision 26102.
% Last change (checked in by kjdon): updated the gsf:metadata options in the formatting section.

\documentclass[a4paper,11pt]{article}

\usepackage{times}
\usepackage{graphicx}

\hyphenation{Message-Router Text-Query}

\newenvironment{gsc}% Greenstone text bits
{\begin{footnotesize}\begin{tt}}%
{\end{tt}\end{footnotesize}}

\newcommand{\gst}[1]{{\footnotesize \tt #1}}

\newcommand{\gsii}{Greenstone2}
\newcommand{\gsiii}{Greenstone3}
\newcommand{\gs}{Greenstone}

\begin{document}

\title{\gsiii\ : A modular digital library.}

% if you work on this manual, add your name here
\author{Katherine Don \\[1ex]
 Department of Computer Science \\
 University of Waikato \\ Hamilton, New Zealand \\ }

\date{}

\maketitle

\newenvironment{bulletedlist}%
{\begin{list}{$\bullet$}{\setlength{\itemsep}{0pt}\setlength{\parsep}{0pt}}}%
{\end{list}}

\noindent
Greenstone Digital Library Version 3 is a complete redesign and
reimplementation of the \gs\ digital library software. The current
version (\gsii) enjoys considerable success and is widely used.
\gsiii\ will capitalize on this success, and in addition it will
\begin{bulletedlist}
\item improve flexibility, modularity, and extensibility
\item lower the bar for ``getting into'' the \gs\ code with a view to
 understanding and extending it
\item use XML where possible internally to improve the amount of
 self-documentation
\item make full use of existing XML-related standards and software
\item provide improved internationalization, particularly in terms of sort order,
 information browsing, etc.
\item include new features that facilitate additional ``content management''
 operations
\item operate on a scale ranging from personal desktop to corporate library
\item easily permit the incorporation of text mining operations
\item use Java, to encourage multilinguality and cross-platform compatibility, and to permit
 easier inclusion of existing Java code (such as for text mining).
\end{bulletedlist}
Parts of \gs\ (e.g. the MG and MGPP indexers) will remain in other languages; JNI (Java
Native Interface) will be used to communicate with these.

The general design and architecture of \gsiii\ is described in the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the docs/manual directory).

This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\ installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and making small customizations to the interface. The remaining sections are aimed at the \gs\ developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software and the message format. Section~\ref{sec:new-features} describes how to add new features to \gs, such as new services, new page types, and new plugins for different document formats. Section~\ref{sec:distributed} describes how to make \gs\ run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\ from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\ and \gsiii\ format statements.
\newpage
\tableofcontents
\newpage
\section{\gs\ installation and administration}\label{sec:install}

This section covers where to get \gsiii, how to install it and how to run it. The standard method of running \gsiii\ is as a Java servlet. We provide the Tomcat servlet container to run the servlet. Other web servers may be configurable to provide servlet support, removing the need for Tomcat; please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access \gsiii, Tomcat must be started; the library can then be accessed via a web browser.

Ant (a Java-based build tool configured through XML files) is used to compile, install and run \gs. The \gst{build.xml} file is the Ant build file for the \gs\ project, and \gst{build.properties} contains parameters that can be altered by the user.

\subsection{Get and install \gs}\label{sec:getandinstall}

\gsiii\ is available for download from Sourceforge:\\
 \gst{https://sourceforge.net/projects/greenstone3}. There are Windows, Linux, and source releases. The binary releases are self-installing executables: download and run the file to install. A series of prompts will guide you through the installation process. The source release is a gzipped tar file. Unpack it, check \gst{build.properties}, then run \gst{ant install} to configure and compile the code.

The \gsiii\ library can be launched by running the server program. This is accessible from the Start menu on Windows, or by running the \gst{gs3-server.sh} (or \gst{gs3-server.bat}) script in the top level \gst{greenstone3} directory. This program will start up the Tomcat web server and launch a browser.

Alternatively, you can start it up using Ant: run \gst{ant start}, which starts up Tomcat, then in a browser go to \gst{http://localhost:8080/greenstone3}\\
(or \gst{http://your-computer-name:your-chosen-port/greenstone3}). \\
This gets you to a welcome page containing links to four servlets: the \gst{test} servlet (this allows you to check that Tomcat is running properly); the standard \gst{library} servlet, which serves the \gst{localsite} site with the \gst{gs2} interface; the \gst{gs3library} servlet, which serves \gst{localsite} using the \gst{default} \gsiii-style interface; and the \gst{gateway} servlet, which serves the \gst{gateway} site with the \gst{default} interface. The \gst{gateway} site uses a SOAP connection to communicate with \gst{localsite}, and demonstrates the library working in a distributed fashion. The SOAP connection is not enabled by default; to enable it, run \gst{ant deploy-localsite}.

\gsiii\ is also available through CVS (Concurrent Versions System). This provides the latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install \gsiii\ from CVS.

\subsection{How the library works}

The standard library program is a Java servlet. We use the Tomcat servlet container to present the servlets over the web. Tomcat takes CGI-style URLs and passes the arguments to the servlet, which processes these and returns a page of HTML. As far as an end-user is concerned, a servlet is a Java version of a CGI program. The interaction is similar: access is via a web browser, using arguments in a URL.

Other types of interfaces can be used, such as Java GUI programs. See Section~\ref{sec:new-interfaces} for details about how to make these.

\subsubsection{Restarting the library}

You can restart Tomcat by clicking `Restart Server' in the server program. You should restart the server whenever you change any of the following, so that the changes take effect:
\begin{bulletedlist}
\item \gst{\$GSDL3HOME/WEB-INF/web.xml}
\item \gst{\$GSDL3SRCHOME/packages/tomcat/conf/server.xml}
\item any classes or jar files used by the servlets
\end{bulletedlist}


\subsection{Directory structure}

Table~\ref{tab:dirs} shows the file hierarchy for \gsiii.
The first part shows the common files which can be shared between
\gs\ users---the source, libraries etc. The second part shows the file hierarchy for the web directory, which comprises the greenstone3 context for Tomcat, and is accessible via Tomcat. The main directories are for sites and interfaces: there can be several sites and interfaces per installation, and they are described in the following section.

Two environment variables used by \gsiii\ are often mentioned in this manual: \gst{\$GSDL3SRCHOME} and \gst{\$GSDL3HOME}. \gst{\$GSDL3SRCHOME} refers to the top-level \gst{greenstone3} directory, while \gst{\$GSDL3HOME} refers to the \gst{web} directory. The web directory contains everything needed to serve the \gsiii\ library using Tomcat, and does not need to live with the rest of the \gsiii\ source.

\begin{table}
\caption{The \gs\ directory structure}
\label{tab:dirs}
{\footnotesize
\begin{tabular}{l p{8cm}}
\hline
\bf directory & \bf description \\
\hline
greenstone3
 & The main installation directory---\$GSDL3SRCHOME is set to this directory \\
greenstone3/src
 & Source code lives here \\
greenstone3/src/java/
 & Main \gsiii\ Java source code \\
greenstone3/src/packages
 & Imported source packages from other systems, e.g. indexing packages may go here \\
greenstone3/lib
 & Shared library files\\
greenstone3/lib/java
 & Java jar files not needed in the \gsiii\ runtime\\
greenstone3/lib/jni
 & Jar files and shared library files (.so, .jnilib, .dll) needed for JNI components \\
greenstone3/resources
 & Any resources that may be needed\\
greenstone3/resources/soap
 & SOAP service description files \\
greenstone3/bin
 & Executable files live here\\
greenstone3/bin/script
 & Some Perl and/or shell scripts\\
greenstone3/packages
 & External packages that may be installed as part of \gs, e.g. Tomcat \\
greenstone3/docs
 & Documentation\\
greenstone3/gli
 & \gs\ Librarian Interface code \\
greenstone3/gs2build
 & Collection building code\\
\hline
greenstone3/web
 & This is where the web site is defined. Any static HTML files can go here. This directory is the root directory used by Tomcat when serving \gsiii. \$GSDL3HOME is set to this directory. \\
greenstone3/web/WEB-INF
 & The web.xml file lives here (servlet configuration information for Tomcat)\\
greenstone3/web/WEB-INF/classes
 & Individual class files needed by the servlet go in here, as do properties files for Java resource bundles, which are used to handle all the language-specific text. This directory is on the servlet classpath\\
greenstone3/web/WEB-INF/lib
 & Jar files needed by the servlets go here \\
greenstone3/web/sites
 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (e.g. SOAP) to other sites\\
greenstone3/web/sites/localsite
 & An example site; the site configuration file lives here\\
greenstone3/web/sites/localsite/collect
 & The collections directory \\
greenstone3/web/sites/localsite/images
 & Site specific images \\
greenstone3/web/sites/localsite/transforms
 & Site specific transforms \\
greenstone3/web/interfaces
 & Contains directories for different interfaces; an interface is defined by its images and XSLT files \\
greenstone3/web/interfaces/default
 & The default interface\\
greenstone3/web/interfaces/default/images
 & The images for the default interface\\
greenstone3/web/interfaces/default/js
 & The JavaScript libraries for the default interface\\
greenstone3/web/interfaces/default/style
 & The CSS stylesheets for the default interface\\
greenstone3/web/interfaces/default/transforms
 & The XSLT files for the default interface\\
greenstone3/web/applet
 & Jar files needed by applets can go here \\
\hline
\end{tabular}}
\end{table}


\subsection{Sites and interfaces}\label{sec:sites-and-ints}

Sites and interfaces contain the content and presentation information, respectively, for the digital library.
A site comprises a set of collections and possibly some site-wide services. An interface (in this web-based servlet context) is a set of images along with a set of XSLT files used for translating XML output from the library into an appropriate form---HTML in general.

One \gsiii\ installation can have many sites and interfaces, and these can be paired in different combinations. One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance. For example, a single site might be served with two different interfaces. This provides different modes of access to the same content (e.g. HTML vs WML), or a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.

Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialized will be loaded. Public collections will appear on the library home page, while private collections will be hidden; private collections can still be accessed by typing CGI arguments into the URL. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).

There are two sites that come with the distribution: \gst{localsite} and \gst{gateway}. \gst{localsite} has several demo collections, while \gst{gateway} has none. \gst{gateway} specifies that a SOAP connection should be made to \gst{localsite}. Getting this to work involves setting up a SOAP server for \gst{localsite}: see Section~\ref{sec:distributed} for details.
There are also two interfaces provided in the distribution: \gst{default} and \gst{gs2}. The \gst{default} interface is a generic \gsiii\ interface, while the \gst{gs2} interface aims to look like the old \gsii\ interface.

Each site and interface has a configuration file which specifies parameters for the site or interface---these are described in Section~\ref{sec:config}.

\subsection{Configuring Tomcat}\label{sec:tomcat-config}

The file \gst{\$GSDL3HOME/WEB-INF/web.xml} contains the configuration information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
There are four servlets specified in \gst{web.xml} (these correspond to the four servlet links on the welcome page for \gsiii). One is a test servlet that just prints ``hello greenstone'' to a web page; this is useful if you are having trouble getting Tomcat set up. The other three are the \gs\ library servlets described in Section~\ref{sec:getandinstall}: \gst{library}, \gst{gs3library} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. \gst{site\_name} and \gst{interface\_name} are just two examples of initialization parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.

For more details about Tomcat see Appendix~\ref{app:tomcat}.
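
As a rough illustration, a library servlet entry in \gst{web.xml} might look like the following sketch. The \gst{servlet}, \gst{init-param} and \gst{servlet-mapping} elements are standard servlet deployment descriptor syntax, and the parameter names come from Table~\ref{tab:serv-init}. The servlet class name and the URL pattern are assumptions made for illustration, so check the \gst{web.xml} shipped with your installation for the actual values.

\begin{gsc}\begin{verbatim}
<!-- sketch only: the class name and url-pattern are assumed -->
<servlet>
  <servlet-name>library</servlet-name>
  <servlet-class>org.greenstone.gsdl3.LibraryServlet</servlet-class>
  <init-param>
    <param-name>library_name</param-name>
    <param-value>library</param-value>
  </init-param>
  <init-param>
    <param-name>site_name</param-name>
    <param-value>localsite</param-value>
  </init-param>
  <init-param>
    <param-name>interface_name</param-name>
    <param-value>gs2</param-value>
  </init-param>
  <init-param>
    <param-name>default_lang</param-name>
    <param-value>en</param-value>
  </init-param>
</servlet>

<servlet-mapping>
  <servlet-name>library</servlet-name>
  <url-pattern>/library</url-pattern>
</servlet-mapping>
\end{verbatim}\end{gsc}

Serving the same site with a different look would then be a matter of adding a second \gst{servlet} entry with a different \gst{library\_name} and \gst{interface\_name}.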

\begin{table}
\caption{\gs\ servlet initialization parameters}
\label{tab:serv-init}
{\footnotesize
\begin{tabular}{lp{3.5cm}p{6cm}}
\hline
\bf name & \bf sample value & \bf description \\
\hline
library\_name & library & the web name of the servlet \\
interface\_name & default & the name of the interface to use\\
site\_name & localsite & the name of the local site to use (use either site\_name or the three remote\_site parameters)\\
remote\_site\_name & org.greenstone.site1 & the name of a remote site \\
remote\_site\_type & soap & the type of server running on the remote site \\
remote\_site\_address & http://www.greenstone.org/ greenstone3/services/ localsite & the address of the remote server \\
default\_lang & en & the default language for the interface\\
receptionist\_class & MyReceptionist & (optional) specifies an alternative Receptionist to use (default is DefaultReceptionist)\\
messagerouter\_class & NewMessageRouter & (optional) specifies an alternative MessageRouter to use (default is MessageRouter)\\
params\_class & GS2Params & (optional) specifies an alternative GSParams class to use \\
\hline
\end{tabular}}
\end{table}
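
A servlet that talks to a remote site uses the three \gst{remote\_site} parameters from Table~\ref{tab:serv-init} in place of \gst{site\_name}. The following sketch shows only the initialization parameters of such a servlet. It assumes that the remote site is a SOAP server for \gst{localsite} running on the same machine at the default Tomcat address; the values are illustrative rather than copied from a shipped \gst{web.xml}.

\begin{gsc}\begin{verbatim}
<!-- sketch only: values are illustrative assumptions -->
<init-param>
  <param-name>library_name</param-name>
  <param-value>remotelibrary</param-value>
</init-param>
<init-param>
  <param-name>interface_name</param-name>
  <param-value>default</param-value>
</init-param>
<init-param>
  <param-name>remote_site_name</param-name>
  <param-value>org.greenstone.localsite</param-value>
</init-param>
<init-param>
  <param-name>remote_site_type</param-name>
  <param-value>soap</param-value>
</init-param>
<init-param>
  <param-name>remote_site_address</param-name>
  <param-value>http://localhost:8080/greenstone3/services/localsite</param-value>
</init-param>
\end{verbatim}\end{gsc}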

\subsection{Configuring a \gs\ library}\label{sec:config}

Initial \gsiii\ system configuration is determined by a set of XML configuration files. Each site has a configuration file, \gst{siteConfig.xml}, that specifies parameters for the site. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies parameters for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
The configuration files are read in when the system is initialized, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a system command to the library. There is a series of system commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site, which removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.

\subsubsection{Site configuration file}\label{sec:siteconfig}

The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any \gst{ServiceClusters} that the site provides (for example, collection building), any \gst{ServiceRacks} that do not belong to a cluster or collection, and a list of
known external sites to connect to. Collections are not specified in the site
configuration file, but are determined by the contents of the site's
\gst{collect} directory.

The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible through Tomcat, any files (e.g. images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat greenstone3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML protocol. Also, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}.

Figure~\ref{fig:siteconfig} shows two example site configuration files. The first example is for a rudimentary site with no site-wide services,
which does not connect to any external sites. The second example is for a site with one site-wide service cluster: a collection building cluster. It also connects to the first site using SOAP.
These two sites happen to be running on the same machine, which is why they can use \gst{localhost} in the address. For site \gst{gsdl1} to talk to site \gst{localsite}, a SOAP server must be run for \gst{localsite}. The address of the SOAP server, in this case, is \gst{http://localhost:8080/greenstone3/services/localsite}.


\begin{figure}
\begin{gsc}\begin{verbatim}
<siteConfig>
  <localSiteName value="org.greenstone.localsite"/>
  <httpAddress value="http://localhost:8080/greenstone3/sites/localsite"/>
  <serviceClusterList/>
  <serviceRackList/>
  <siteList/>
</siteConfig>
\end{verbatim}\end{gsc}

\begin{gsc}\begin{verbatim}
<siteConfig>
  <localSiteName value="org.greenstone.gsdl1"/>
  <httpAddress value="http://localhost:8080/greenstone3/sites/gsdl1"/>
  <serviceClusterList>
    <serviceCluster name="build">
      <metadataList>
        <metadata name="Title">Collection builder</metadata>
        <metadata name="Description">Builds collections in a
          gsdl2-style manner</metadata>
      </metadataList>
      <serviceRackList>
        <serviceRack name="GS2Construct"/>
      </serviceRackList>
    </serviceCluster>
  </serviceClusterList>
  <siteList>
    <site name="org.greenstone.localsite"
      address="http://localhost:8080/greenstone3/services/localsite"
      type="soap"/>
  </siteList>
</siteConfig>
\end{verbatim}\end{gsc}
\caption{Two sample site configuration files}
\label{fig:siteconfig}
\end{figure}

Another element that can appear in a site configuration file is \gst{replaceList}. This must have an \gst{id} attribute, and may contain one or more \gst{replace} elements. Replace elements are discussed in Section~\ref{sec:collconfig}. The list found in a \gst{siteConfig.xml} file can be applied to any collection by adding a \gst{replaceListRef} element (with the appropriate \gst{id} attribute) to its \gst{collectionConfig.xml} file.
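
A minimal sketch of this arrangement follows. The \gst{id} value \gst{siteReplaceList} is a made-up name, and the \gst{replace} elements themselves are left out because their attributes are described in Section~\ref{sec:collconfig}.

\begin{gsc}\begin{verbatim}
<!-- in siteConfig.xml; the id value is hypothetical -->
<siteConfig>
  <!-- other site configuration elements omitted -->
  <replaceList id="siteReplaceList">
    <!-- one or more replace elements go here -->
  </replaceList>
</siteConfig>

<!-- inside a collection's collectionConfig.xml -->
<replaceListRef id="siteReplaceList"/>
\end{verbatim}\end{gsc}

Keeping the list in the site configuration file means that a replacement shared by many collections only has to be written once.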

\subsubsection{Interface configuration file}\label{sec:interfaceconfig}

The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at startup (others can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library URLs for the \gst{a} (action) parameter), e.g. QueryAction should use \gst{a=q}. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.

It also lists all the languages that the interface text files have been translated into. Each \gst{language} element has a \gst{name} attribute, which is the ISO code for the language, and a \gst{displayItem} element which gives the language name in that language (note that this file should be encoded in UTF-8). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a \gsiii\ library are shown in Section~\ref{sec:interface-language}.

An \gst{optionList} element can be used to disable or enable some optional functionality for the interface. Currently there are three options that can be enabled:

\begin{tabular}{lp{7cm}}
highlightQueryTerms & Whether search term highlighting is available or not\\
berryBaskets & Whether berry basket functionality is available or not\\
displayAnnotationService & Whether any annotation services (specified in the site config file) should be displayed with a document or not. \\
\end{tabular}

An interface may be based on an existing one; for example, the \gst{gs2} interface is based on the \gst{default} interface. This means that it will use any images or templates from the base interface unless they are overridden in the current one. The \gst{baseInterface} attribute of the \gst{<interfaceConfig>} element is used to specify the base interface.
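
For example, a derived interface might declare its base interface and options as in the following sketch. It assumes that \gst{baseInterface} takes the name of the base interface (here \gst{default}), and that \gst{displayAnnotationService} takes a true or false value in the same way as the other two options; the action and language lists are omitted.

\begin{gsc}\begin{verbatim}
<!-- sketch only: attribute values are assumptions -->
<interfaceConfig baseInterface="default">
  <!-- actionList and languageList omitted -->
  <optionList>
    <option name="highlightQueryTerms" value="true"/>
    <option name="berryBaskets" value="false"/>
    <option name="displayAnnotationService" value="true"/>
  </optionList>
</interfaceConfig>
\end{verbatim}\end{gsc}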

\begin{figure}
\begin{gsc}\begin{verbatim}
<interfaceConfig>
  <actionList>
    <action name='p' class='PageAction'>
      <subaction name='home' xslt='home.xsl'/>
      <subaction name='about' xslt='about.xsl'/>
      <subaction name='help' xslt='help.xsl'/>
      <subaction name='pref' xslt='pref.xsl'/>
      <subaction name='nav' xslt='nav.xsl'/><!-- used for the
        collection header frame -->
      <subaction name="html" xslt="html.xsl"/> <!-- used to put an
        external page into a frame with a collection header-->
    </action>
    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
    <action name='b' class='GS2BrowseAction' xslt='classifier.xsl'/>
    <action name='a' class='AppletAction' xslt='applet.xsl'/>
    <action name='d' class='DocumentAction' xslt='document.xsl'/>
    <action name='xd' class='XMLDocumentAction'>
      <subaction name='toc' xslt='document-toc.xsl'/>
      <subaction name='text' xslt='document-content.xsl'/>
    </action>
    <action name='pr' class='ProcessAction' xslt='process.xsl'/>
    <action name='s' class='SystemAction' xslt='system.xsl'/>
    <action name='g' class='GeneralAction'>
      <subaction name="berry" xslt='berry.xsl'/>
    </action>
  </actionList>
  <languageList>
    <language name="en">
      <displayItem name='name'>English</displayItem>
    </language>
    <language name="fr">
      <displayItem name='name'>Français</displayItem>
    </language>
    <language name='es'>
      <displayItem name='name'>Español</displayItem>
    </language>
  </languageList>
  <optionList>
    <option name="highlightQueryTerms" value="true"/>
    <option name="berryBaskets" value="true"/>
  </optionList>
</interfaceConfig>
\end{verbatim}\end{gsc}
\caption{Default interface configuration file}
\label{fig:ifaceconfig}
\end{figure}


\subsection{Run-time re-initialization}\label{sec:runtime-config}

When Tomcat is started up, the site and interface configuration files are read in, and actions/services/collections are loaded as necessary. The configuration is then static unless Tomcat is restarted or re-configuration commands are issued.

There are several commands that can be issued to the library to avoid having to restart Tomcat. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, Tomcat must be restarted if the Java classes are modified.

Currently, the runtime configuration commands can only be accessed by typing arguments into the URL; there is no web form for this yet.

The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate and deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.

\begin{table}
\caption{Example run-time configuration arguments.}
\label{tab:run-time config}
{\footnotesize
\begin{tabular}{lp{9cm}}
\hline
\gst{a=s\&sa=c} & reconfigures the whole site. Reads in siteConfig.xml, reloads all the collections. Just part of this can be specified with another argument \gst{ss} (system subset). The valid values are \gst{collectionList}, \gst{siteList}, \gst{serviceList}, \gst{clusterList}. \\
\gst{a=s\&sa=c\&sc=XXX} & reconfigures the XXX collection or cluster. \gst{ss} can also be used here; valid values are \gst{metadataList} and \gst{serviceList}. \\
\gst{a=s\&sa=a} & (re)activates a specific module. Modules are specified using two arguments, \gst{st} (system module type) and \gst{sn} (system module name). Valid types are \gst{collection}, \gst{cluster}, \gst{site}.\\
\gst{a=s\&sa=d} & deactivates a module. \gst{st} and \gst{sn} can be used here too. Valid types are \gst{collection}, \gst{cluster}, \gst{site}, \gst{service}. Modules are removed from the current configuration, but will reappear if Tomcat is restarted.\\
\gst{a=s\&sa=d\&sc=XXX} & deactivates a module belonging to the XXX collection or cluster. \gst{st} and \gst{sn} can be used here too. The only valid type is \gst{service}. \\
\hline
\end{tabular}}
\end{table}
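
For instance, assuming the default address \gst{http://localhost:8080/greenstone3} and the \gst{library} servlet, the commands in Table~\ref{tab:run-time config} would be typed into the browser as URLs like the following. The collection name \gst{mycoll} is a hypothetical example.

\begin{gsc}\begin{verbatim}
reconfigure the whole site:
  http://localhost:8080/greenstone3/library?a=s&sa=c

reconfigure only the collectionList subset of the site:
  http://localhost:8080/greenstone3/library?a=s&sa=c&ss=collectionList

reconfigure the (hypothetical) mycoll collection:
  http://localhost:8080/greenstone3/library?a=s&sa=c&sc=mycoll

deactivate the mycoll collection:
  http://localhost:8080/greenstone3/library?a=s&sa=d&st=collection&sn=mycoll
\end{verbatim}\end{gsc}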
\newpage
\section{Using \gsiii}\label{sec:user}

Once \gsiii\ is installed, the sample collections can be accessed. The installation comes with several example collections, and Section~\ref{sec:usecolls} describes these collections and how to use them. Section~\ref{sec:buildcol} describes how to build new collections.

\subsection{Using a collection}\label{sec:usecolls}

A collection typically consists of a set of documents, which could be text, HTML, Word, PDF, images, bibliographic records etc., along with some access methods, or ``services''. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.
Searching involves entering words or phrases and getting back lists of documents that contain those words. Searches may be restricted to particular fields of the document.

Browsing involves navigating pre-defined hierarchies of documents, following links of interest to find documents. The hierarchies may be constructed on different metadata fields, for example, alphabetical lists of Titles, or a hierarchy of Subject classifications. Clicking on a bookshelf icon takes you to a lower level in the hierarchy, while clicking on a book or page icon takes you to a document.

In the standard interface that comes with \gsiii\footnote{Of course, this is all customizable.}, collections in a digital library are presented in the following manner. The `home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's `about' page. The standard page banner for a collection looks something like that shown in Figure~\ref{fig:page-banner}.

\begin{figure}[h]
 \centering
 \includegraphics[width=4in]{pagebanner} %5.8
 \caption{A sample collection page banner}
 \label{fig:page-banner}
\end{figure}

The image at the top left is a link to the collection's home page. The top right has buttons that link to the library home page, and to the help and preferences pages. All the available services are arrayed along a navigation bar at the bottom of the banner. Clicking on a name accesses that service.

Search type services generally provide a form to fill in, with parameters including what field or granularity to search, and the query itself. Clicking the search button carries out the search, and a list of matching documents will be displayed. Clicking on the icons in the result list takes you to the document itself.

Once you are looking at a document, clicking the open book icon at the top of the document, underneath the navigation bar, will take you back to the service page that you accessed the document from.

405\subsection{Building a collection}\label{sec:buildcol}
406
[13893]407There are three ways to get a new collection into \gsiii. The most common way is to use the Greenstone Librarian Interface to create a collection. If you have existing collections in a \gsii\ installation, these can be imported into \gsiii. Thirdly, you can use the Perl command line building scripts directly.
[6312]408
[13893]409Collections live in the \gst{collect} directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\ installation. The collect directory is at \gst{\$GSDL3HOME/sites/site-name/collect}, where site-name is the name of the site you want your new collection to belong to.
[6312]410
The following three sections briefly describe how to create a collection using GLI, how to import a collection from \gsii, and how to use command line building. Once a collection has been built (and is located in the collect directory), the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{Eventually there will probably also be automatic polling for new collections.}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, an activate collection command can be issued to the servlet, using the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}, where \gst{collname} should be replaced with the collection name. This tells the library program to (re)load the \gst{collname} collection.
[6312]412
[13893]413\subsubsection{Using the Librarian Interface}
[6312]414
[13893]415The Greenstone Librarian Interface (GLI) can be used to create collections. The procedure is the same as for \gsii, but it works in a \gsiii\ context. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 Digital Library menu in the Program Files section of the Start menu. On Linux, run \gst{ant gli} from the \gst{greenstone3} directory, or run \gst{./gli4gs3.sh} from the \gst{\$GSDL3SRCHOME/gli} directory.
[6312]416
[13893]417Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use \gsiii\ XML configuration files.}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
[6312]418
[13893]419The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates within a single site---you can edit, delete, and create new collections within this site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences-$>$Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
[6511]420
Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\ configuration files. This means that format statements are no longer `live': a change to a format statement only takes effect once the \gsiii\ configuration files have been regenerated. Clicking the Preview Collection button will re-run the configuration file conversion script, so if you change anything on the Format panel, you will need to click Preview Collection. Just reloading the collection via a browser will not be enough.
422
[13920]423Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the \gsii\ User's Guide (\gst{GS2-User-en.pdf}). This can be found in your \gsii\ installation, or in the \gst{\$GSDL3SRCHOME/docs/manual} directory if you have installed \gsiii\ from a distribution.
[6511]424
[13893]425
426\subsubsection{Importing from \gsii}
427
428Pre-built \gsii\ collections can also be used in \gsiii. The collection folder should be copied to the collect directory of the site it is to appear in (or a symbolic link may be used if possible).
429The \gsiii\ run time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new \gst{collectionConfig.xml} and \gst{buildConfig.xml} from the old \gst{collect.cfg} and \gst{build.cfg} files. It does not change the collection in any way, so it can still be used by \gsii\ software.
430
431The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, make sure you have run \gst{source setup.bash} (or \gst{setup} in Windows) in the \gst{\$GSDL3SRCHOME/gs2build} directory (as well as running the standard \gst{gs3-setup} command). Then you need to specify the path to the collect directory and the collection name as parameters to the conversion script. For example,
432
433\begin{gsc}
434\begin{verbatim}
convert_coll_from_gs2.pl -collectdir \
  $GSDL3HOME/sites/localsite/collect gs2mgdemo
437\end{verbatim}
438\end{gsc}
439%$
440The script attempts to create \gsiii\ format statements from the old \gsii\ ones. The conversion may not always work properly, so if the collection looks a bit strange under \gsiii, you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
441
442Once again, to have the collection recognized by the library servlet, you can either restart Tomcat, or load it dynamically.
443
444\subsubsection{Using command line building}
445
446This is the same procedure as for \gsii\ command line building, with the addition of a final step to create the \gsiii\ configuration files. The basic steps are (for a new collection called testcol):
447
448Linux:
449
450\begin{gsc}
451\begin{verbatim}
cd greenstone3
source gs3-setup.sh
cd gs2build
source setup.bash
cd ../
mkcol.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
# put source documents and metadata into
#   $GSDL3HOME/sites/localsite/collect/testcol/import
# edit $GSDL3HOME/sites/localsite/collect/testcol/etc/collect.cfg as
#   appropriate
import.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
buildcol.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
# rename the $GSDL3HOME/sites/localsite/collect/testcol/building
#   directory to index
convert_coll_from_gs2.pl -collectdir \
  $GSDL3HOME/sites/localsite/collect testcol
\end{verbatim}
\end{gsc}
%$
471
472Windows:
473\begin{gsc}
474\begin{verbatim}
cd greenstone3
gs3-setup
cd gs2build
setup
cd ..
perl -S mkcol.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
rem put source documents and metadata into
rem   %GSDL3HOME%\sites\localsite\collect\testcol\import
rem edit %GSDL3HOME%\sites\localsite\collect\testcol\etc\collect.cfg as
rem   appropriate
perl -S import.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
perl -S buildcol.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
rem rename the %GSDL3HOME%\sites\localsite\collect\testcol\building directory
rem   to index
perl -S convert_coll_from_gs2.pl -collectdir ^
  %GSDL3HOME%\sites\localsite\collect testcol
491\end{verbatim}
492\end{gsc}
493
Once the build process is complete, the library servlet needs to (re)load the collection. This can be done either by restarting Tomcat, or by sending an activate collection command to the library servlet.
495
496Metadata for documents can be added using \gst{metadata.xml} files. A \gst{metadata.xml} file has a root element of \gst{<DirectoryMetadata>}. This encloses a series of \gst{<FileSet>} items. Neither of these tags has any attributes. Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata. Only files in the same directory as the \gst{metadata.xml} file, or in one of its child directories, will be selected. The filename tag encloses the regular expression as text, e.g.:
497
[6511]498\begin{gsc}\begin{verbatim}
<FileName>example</FileName>
500\end{verbatim}\end{gsc}
501
This would match any file containing the text `example' in its name. The second part of the \gst{<FileSet>} item is a \gst{<Description>} item. The \gst{<Description>} tag has no attributes, but encloses one or more \gst{<Metadata>} tags. Each \gst{<Metadata>} tag contains one metadata item, i.e. a label to describe the metadata and a corresponding value. The \gst{<Metadata>} tag has one compulsory attribute, \gst{name}, which gives the metadata label to add to the document. Each \gst{<Metadata>} tag also has an optional attribute, \gst{mode}. If this attribute is set to \gst{accumulate} then the value is added to the document, and any existing values for that metadata item are retained. If the attribute is set to \gst{set} or is omitted, then any existing value of the metadata item will be deleted and replaced by the new one.
[6511]503
[7635]504\begin{figure}
[6511]505\begin{gsc}\begin{verbatim}
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DirectoryMetadata SYSTEM
  "http://greenstone.org/dtd/DirectoryMetadata/1.0/DirectoryMetadata.dtd">
<DirectoryMetadata>
  <FileSet>
    <FileName>ec160e</FileName>
    <Description>
      <Metadata name="Title">The Courier - No.160 - Nov - Dec 1996 -
      Dossier Habitat - Country reports: Fiji , Tonga (ec160e)</Metadata>
      <Metadata mode="accumulate" name="Language">English</Metadata>
      <Metadata mode="accumulate" name="Subject">Settlements and housing:
      general works incl. low- cost housing, planning techniques, surveying,
      etc.</Metadata>
      <Metadata mode="accumulate" name="Subject">The Courier ACP 1990 - 1996
      Africa-Caribbean-Pacific - European Union</Metadata>
      <Metadata mode="accumulate" name="Organization">EC Courier</Metadata>
      <Metadata mode="accumulate" name="AZList">T.1</Metadata>
    </Description>
  </FileSet>
  <FileSet>
    <FileName>b22bue</FileName>
    <Description>
      <Metadata name="Title">Butterfly Farming in Papua New Guinea
      (b22bue)</Metadata>
      <Metadata mode="accumulate" name="Language">English</Metadata>
      <Metadata mode="accumulate" name="Subject">Other animals (micro-
      livestock, little known animals, silkworms, reptiles, frogs,
      snails, game, etc.)</Metadata>
      <Metadata mode="accumulate" name="Organization">BOSTID</Metadata>
      <Metadata mode="accumulate" name="AZList">T.1</Metadata>
      <Metadata mode="accumulate" name="Keyword">start a butterfly farm
      </Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
[6511]541\end{verbatim}\end{gsc}
[7635]542\caption{Sample metadata.xml file}
543\label{fig:metadatafile}
544\end{figure}
[6511]545
Figure~\ref{fig:metadatafile} shows an example \gst{metadata.xml} file.
Here, only one file pattern is found in each file set. However, the \gst{Description} tag contains a number of separate metadata items. Note that the \gst{Title} metadata does not have the \gst{mode="accumulate"} attribute. This means that when this title is assigned to a document, any existing \gst{Title} information will be lost.
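
As a further illustration of the rules above, the following sketch shows a single \gst{<FileSet>} that uses two \gst{<FileName>} patterns and a mix of modes. The file names and metadata values are hypothetical, the second pattern assumes that Perl-style regular expression syntax is accepted, and the DOCTYPE declaration shown in Figure~\ref{fig:metadatafile} is left out for brevity.

\begin{gsc}\begin{verbatim}
<?xml version="1.0" encoding="UTF-8"?>
<DirectoryMetadata>
  <FileSet>
    <!-- hypothetical patterns: any file with 'report' in its name,
         or any file whose name ends in .pdf -->
    <FileName>report</FileName>
    <FileName>\.pdf$</FileName>
    <Description>
      <!-- no mode attribute: any existing Title value is lost -->
      <Metadata name="Title">Annual report</Metadata>
      <!-- accumulate: existing Subject values are retained -->
      <Metadata mode="accumulate" name="Subject">Finance</Metadata>
      <Metadata mode="accumulate" name="Subject">Planning</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
\end{verbatim}\end{gsc}

With a file like this, a matching document would be expected to end up with a single Title and at least two Subject values.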
[6511]548
549
[6312]550\subsection{Collection configuration files}\label{sec:collconfig}
551
Each collection has two, or possibly three, \gsiii\ configuration files, \\
\gst{collectionConfig.xml}, \gst{buildConfig.xml}, and optionally \gst{collectionInit.xml}, that give metadata, display and other information for the
collection. Currently, \gst{collectionConfig.xml} and \gst{buildConfig.xml} are generated from \gst{collect.cfg} and \gst{build.cfg}. At some stage, the collection building process and the Librarian Interface will be modified to use these files directly.
\gst{collect.cfg} and/or \gst{collectionConfig.xml} include user-defined presentation metadata for the collection, such as its name and the {\em About this collection} text; give formatting information for the collection display; and also give instructions on how the collection is to be built. \gst{build.cfg} and/or \gst{buildConfig.xml} are produced by the build-time process and include any metadata that can be determined automatically. They also include configuration information for any ServiceRacks needed by the collection.
[6312]556
[6908]557All the configuration files should be encoded using UTF-8.
558
The formats of \gst{collect.cfg} and \gst{build.cfg} are not discussed here. Please see the \gsii\ manuals for more information regarding these files.
560
[6422]561\subsubsection{collectionInit.xml}
562
[10880]563This optional file is only used for non-standard, customized collections. It specifies the class name of the non-standard collection class. The only syntax so far is the class name:
[6422]564
565\begin{gsc}\begin{verbatim}
566<collectionInit class="XMLCollection"/>
567\end{verbatim}\end{gsc}
568
[10880]569Section~\ref{sec:new-coll-types} describes an example collection where this file is used. Depending on the type of collection that this is used for, one or both of the other configuration files may not be needed.
[6422]570
571\subsubsection{collectionConfig.xml}
572
[13893]573The collection configuration file is where the collection designer (e.g. a librarian) decides what form the collection should take. So far this file only includes the presentation aspects needed by the run-time system. Instructions for collection building have yet to be defined. Presentation aspects include collection metadata such as title and description, display text for indexes, and format statements for search results, classifiers etc. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far.
[6312]574
[13893]575Display elements for a collection can be entered in any language---use \gst{lang='en'} attributes to specify which language they are in.
[6312]576
577\begin{figure}
578\begin{gsc}\begin{verbatim}
<collectionConfig
  xmlns:gsf="http://www.greenstone.org/greenstone3/schema/ConfigFormat"
  xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
  <metadataList>
    <metadata name="creator">[email protected]</metadata>
    <metadata name="public">true</metadata>
  </metadataList>
  <displayItemList>
    <displayItem name='name' lang='en'>Greenstone3 MG demo collection</displayItem>
    <displayItem name='description' lang='en'>This is a demonstration
      collection for the Greenstone3 digital library software.</displayItem>
    <displayItem name='icon' lang='en'>gs3mgdemo.gif</displayItem>
    <displayItem name='smallicon' lang='en'>gs3mgdemo_sm.gif</displayItem>
  </displayItemList>
  <search>
    <index name="ste">
      <displayItem name='name' lang="en">chapters</displayItem>
      <displayItem name='name' lang="fr">chapitres</displayItem>
      <displayItem name='name' lang="es">capítulos</displayItem>
    </index>
    <!-- ... more indexes ... -->
    <format>
      <gsf:template match="documentNode"><td valign='top'>
        <gsf:link><gsf:icon/></gsf:link></td><td><gsf:metadata name='Title'/>
        </td></gsf:template>
    </format>
  </search>
  <browse>
    <classifier name="CL1" horizontalAtTop='true'>
      <displayItem name='name' lang='en'>Titles</displayItem>
    </classifier>
    <!-- ... more classifiers ... -->
    <classifier name="CL4">
      <displayItem name='name' lang='en'>HowTo</displayItem>
      <format>
        <gsf:template match="documentNode">
          <br /><gsf:link><gsf:metadata name='Keyword' />
          </gsf:link></gsf:template>
      </format>
    </classifier>
  </browse>
  <display>
    <format>
      <gsf:option name="coverImages" value="false"/>
      <gsf:option name="documentTOC" value="false"/>
    </format>
  </display>
</collectionConfig>
626\end{verbatim}\end{gsc}
[13893]627\caption{Sample collectionConfig.xml file}
[6312]628\label{fig:collconfig}
629\end{figure}
630
[13893]631The \gst{<metadataList>} element specifies some collection metadata, such as creator. The \gst{<displayItemList>} specifies some language dependent information that is used for collection display, such as collection name and short description. These \gst{displayItem} elements can be specified in different languages.
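
For instance, a collection name and description offered in two languages might be sketched as follows. The display strings are hypothetical; the repeated \gst{name} attribute with different \gst{lang} values follows the pattern used for the index names in Figure~\ref{fig:collconfig}.

\begin{gsc}\begin{verbatim}
<displayItemList>
  <!-- one displayItem per language for each piece of display text -->
  <displayItem name='name' lang='en'>Sample collection</displayItem>
  <displayItem name='name' lang='fr'>Exemple de collection</displayItem>
  <displayItem name='description' lang='en'>A small sample
    collection.</displayItem>
  <displayItem name='description' lang='fr'>Un petit exemple de
    collection.</displayItem>
</displayItemList>
\end{verbatim}\end{gsc}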
[6904]632
[13893]633The \gst{<search>} element provides some display and formatting information for the search indexes, while the \gst{<browse>} element concerns classifiers, and the \gst{<display>} element looks at document display.
[6904]634
[13893]635Inside the \gst{<search>} and \gst{<browse>} elements, \gst{<displayItem>} elements are used to provide titles for the indexes or classifiers, while \gst{<format>} elements provide formatting instructions, typically for a document or classifier node in a list of results. Placing the \gst{<format>} instructions at the top level in the \gst{search} or \gst{browse} element will apply the format to all the indexes or classifiers, while placing it inside an individual \gst{index} or \gst{classifier} element will restrict that formatting instruction to that item.
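
The difference between the two placements can be sketched as follows. The classifier names are taken from Figure~\ref{fig:collconfig}; that the classifier's own \gst{<format>} takes precedence over the top-level one for \gst{CL4} is an assumption made here, not something this manual states.

\begin{gsc}\begin{verbatim}
<browse>
  <classifier name="CL1">
    <displayItem name='name' lang='en'>Titles</displayItem>
  </classifier>
  <classifier name="CL4">
    <displayItem name='name' lang='en'>HowTo</displayItem>
    <!-- inside a classifier: restricted to CL4 -->
    <format>
      <gsf:template match="documentNode">
        <td><gsf:link><gsf:metadata name='Keyword'/></gsf:link></td>
      </gsf:template>
    </format>
  </classifier>
  <!-- top level of browse: applies to all the classifiers -->
  <format>
    <gsf:template match="documentNode">
      <td><gsf:link><gsf:icon/></gsf:link></td>
      <td><gsf:metadata name='Title'/></td>
    </gsf:template>
  </format>
</browse>
\end{verbatim}\end{gsc}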
[6904]636
[13893]637The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading} and \gst{DocumentContent}. Other formatting options may also be specified here, such as whether to display a table of contents and/or cover image for the documents.
[6904]638
[7861]639Format elements are described in Section~\ref{sec:formatstmt}.
[6904]640
[13920]641An optional \gst{<replaceList>} element can be included at the top level. This contains a list of strings and their replacements. This is particularly useful for \gsii\ collections that use macros.
[8696]642
643The format is like the following:
644\begin{gsc}\begin{verbatim}
645<replaceList>
  <replace scope='text' macro="xxx" text="yyy"/>
  <replace scope='metadata' macro="xxx" bundle="yyy" key="zzz"/>
  <replace scope='all' macro='xxx' metadata='yyy'/>
649</replaceList>
650\end{verbatim}\end{gsc}
651
Scope determines on what text the replacements are carried out: \gst{text}, \gst{metadata}, or \gst{all} (both text and metadata). An empty scope attribute is equivalent to \gst{scope='all'}. Each replace type can be used with all scope values. Replacing uses Java's \gst{String.replaceAll} functionality, so the macro is treated as a regular expression, and the replacement text may contain regular expression group references. The first example is a straight textual replacement. The second example uses dictionary lookups: \gst{xxx} will be replaced with the (language-dependent) value for key \gst{zzz} in resource bundle \gst{yyy}. The third example uses metadata: \gst{xxx} will be replaced by the value of the \gst{yyy} metadata for that document.
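
A more concrete sketch is given below. All macro names, replacement strings, and the bundle and key names are hypothetical and chosen only for illustration. Because the macro is a regular expression, characters such as a dot need escaping if they are meant literally. Java's \gst{String.replaceAll} also gives \gst{\$} and backslash a special meaning in the replacement text, so, assuming the values are passed through unchanged, these are best avoided or escaped there.

\begin{gsc}\begin{verbatim}
<replaceList>
  <!-- plain replacement in document text; underscores are ordinary
       characters in a regular expression -->
  <replace scope='text' macro="_httpsite_" text="/greenstone3"/>
  <!-- the dot is escaped so that it only matches a literal dot -->
  <replace scope='all' macro="example\.org" text="example.com"/>
  <!-- language-dependent lookup of key 'textabout' in a resource
       bundle called 'interface_custom' (both names hypothetical) -->
  <replace scope='metadata' macro="_about_"
    bundle="interface_custom" key="textabout"/>
</replaceList>
\end{verbatim}\end{gsc}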
[8696]653
[13920]654Appendix~\ref{app:gs2replace} gives some examples that have been used for \gsii\ collections.
[8696]655
[9445]656\subsubsection{buildConfig.xml}\label{sec:buildconfig}
[6312]657
[7861]658The file \gst{buildConfig.xml} is produced by the collection building process. Generally it is not necessary to look at this file, but it can be useful in determining what went wrong if the collection doesn't appear quite the way it was planned.
[6904]659
It contains metadata and other information about the collection that can
be determined automatically, such as the number of
documents in the collection. It also includes a list of \gst{ServiceRack} classes that are
required to provide the services that have been built into the
collection. The serviceRack names are Java classes that are loaded
dynamically at runtime. Any information inside the serviceRack element is
specific to that service; there is no set format.

Figure~\ref{fig:buildconfig} shows an example. This configuration file specifies that the collection should load three ServiceRacks: \gst{GS2Browse}, \gst{GS2MGPPRetrieve} and \gst{GS2MGPPSearch}. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration. The \gst{collectionConfig.xml} file content is also passed to the ServiceRack objects at configure time: the \gst{format} and \gst{displayItem} information is used directly from the \gst{collectionConfig.xml} file rather than added into \gst{buildConfig.xml} during building. This enables formatting and metadata changes in \gst{collectionConfig.xml} to take effect in the collection without rebuilding being necessary. However, as these files are cached, the collection needs to be reloaded for the changes to appear in the library.
[6312]667
668
669\begin{figure}
670\begin{gsc}\begin{verbatim}
<buildConfig>
  <metadataList>
    <metadata name="numDocs">11</metadata>
    <metadata name="buildType">mgpp</metadata>
  </metadataList>
  <serviceRackList>
    <serviceRack name="GS2Browse">
      <indexStem name="gs2mgppdemo"/>
      <classifierList>
        <classifier name="CL1" content="Title"/>
        <classifier name="CL2" content="Subject" />
        <classifier name="CL3" content="Organization" />
        <classifier name="CL4" content="Howto" />
      </classifierList>
    </serviceRack>
    <serviceRack name="GS2MGPPRetrieve">
      <indexStem name="gs2mgppdemo"/>
      <defaultLevel name="Sec" />
    </serviceRack>
    <serviceRack name="GS2MGPPSearch">
      <indexStem name="gs2mgppdemo"/>
      <defaultLevel name="Sec" />
      <levelList>
        <level name="Sec" />
        <level name="Doc" />
      </levelList>
      <fieldList>
        <field shortname="ZZ" name="allfields" />
        <field shortname="TX" name="text" />
        <field shortname="DL" name="dls.Title" />
        <field shortname="DS" name="dls.Subject" />
        <field shortname="DO" name="dls.Organization" />
      </fieldList>
      <searchTypeList>
        <searchType name="form" />
        <searchType name="plain" />
      </searchTypeList>
      <indexOptionList>
        <indexOption name="stemIndexes" value="3"/>
        <indexOption name="maxnumeric" value="4"/>
      </indexOptionList>
      <defaultIndex name="idx" />
      <indexList>
        <index name="idx" />
      </indexList>
    </serviceRack>
  </serviceRackList>
</buildConfig>
[6312]719\end{verbatim}\end{gsc}
[10863]720\caption{Sample buildConfig.xml file (gs2mgppdemo collection)}
[6312]721\label{fig:buildconfig}
722\end{figure}
723
724\subsection{Formatting the collection}\label{sec:formatstmt}
725
Part of collection design involves deciding how the collection should look. \gsiii\ has a default `look' for a collection, so this is optional. However, the default may not suit the purposes of some collections, so many parts of the look of a collection can be determined by the collection designer.
[6312]727
[13893]728In standard \gsiii, the library is served to a web browser by a servlet, and the HTML is generated using XSLT. XSLT templates are used to format all the parts of the pages. These templates can be overridden by including them in the \gst{collectionConfig.xml} file. Some commonly overridden templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display.
[6312]729
[6335]730Real XSLT templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document.
[6312]731
732\begin{gsc}\begin{verbatim}
<xsl:template match="documentNode" priority="2"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="collName"/>
  <td><a href="{$library_name}?a=d&amp;c={$collName}&amp;
    d={@nodeID}&amp;dt={@docType}"><xsl:value-of
    select="metadataList/metadata[@name='Keyword']"/></a>
  </td>
</xsl:template>
741 \end{verbatim}\end{gsc}
742
743To write this, the user would need to know that:
744\begin{bulletedlist}
\item the variable \gst{\$library\_name} exists,
\item the collection name is passed in as a parameter called \gst{collName},
\item metadata for a document is found in a \gst{<metadataList>} and that its form is \gst{<metadata name="Keyword">the value</metadata>},
\item the arguments needed for the link to the document are \gst{a}, \gst{c}, \gst{d} and \gst{dt}.
[6312]749\end{bulletedlist}
750
[13893]751We can use XSLT to transform XML into XSLT. \gsiii\ provides a simplified set of formatting commands, written in XML, which will be transformed into proper XSLT. The user specifies a \gst{<gsf:template>} for what they want to format---these typically match \gst{documentNode} or \gst{classifierNode} (for a node in a classification hierarchy).
[6312]752
[13893]753The template above can be represented as:
[6930]754
755\begin{gsc}\begin{verbatim}
<gsf:template match='documentNode'>
  <td><gsf:link><gsf:metadata name='Keyword'/></gsf:link></td>
</gsf:template>
759\end{verbatim}\end{gsc}
760
Table~\ref{tab:gsf-format} shows the set of \gst{gsf} (Greenstone Format) elements. If you have come from a \gsii\ background, Appendix~\ref{app:gs2format} shows \gsii\ format elements and their equivalents in \gsiii.
[6930]762
[6312]763\begin{table}
764\caption{Format elements for GSF format language}
765\label{tab:gsf-format}
[6422]766{\footnotesize
767\begin{tabular}{p{6.5cm}p{6.5cm}}
768\hline
[6312]769\bf Element & \bf Description \\
[6422]770\hline
771\gst{<gsf:text/>} & The document's text\\
[6908]772\hline
[6312]773\gst{<gsf:link>...</gsf:link>} & The HTML link to the document itself \\
[6422]774\gst{<gsf:link type='document'>...
775</gsf:link>} & Same as above\\
776\gst{<gsf:link type='classifier'>...
777</gsf:link>} & A link to a classification node (use in classifierNode templates)\\
778\gst{<gsf:link type='source'>...
779</gsf:link>} & The HTML link to the original file---set for documents that have been converted from e.g. Word, PDF, PS \\
[6908]780\hline
[6312]781\gst{<gsf:icon/>} & An appropriate icon\\
782\gst{<gsf:icon type='document'/>} & same as above\\
783\gst{<gsf:icon type='classifier'/>} & bookshelf icon for classification nodes\\
[6335]784\gst{<gsf:icon type='source'/>} & An appropriate icon for the original file e.g. Word, PDF icon\\
[6908]785\hline
\gst{<gsf:metadata name='Title'/>} & All the values of a metadata element for the current document or section, in this case, Title\\
\gst{<gsf:metadata name='Title' select='select-type' [separator='y' pos='first|last|n']/>} & A more extended selection of metadata values. The \gst{select} attribute can be one of the values shown in Table~\ref{tab:gsf-select-types}. There are two optional attributes: \gst{separator} gives a string that is used to separate the values (the default is ``, ''), and \gst{pos} can be set to return either the first, last or nth value for that metadata at each section.\\
\gst{<gsf:metadata name='Date' format='formatDate'/>} & The value of a metadata element for the current document, formatted in some way. The formatting options currently available are listed in Table~\ref{tab:gsf-process-types}. \\
[6908]789\hline
[6312]790\gst{<gsf:choose-metadata>
791 <gsf:metadata name='metaA'/>
792 <gsf:metadata name='metaB'/>
793 <gsf:metadata name='metaC'/>
794</gsf:choose-metadata>}
 & A choice of metadata. Will select the first existing one. The metadata elements can have the \gst{select}, \gst{separator} and \gst{pos} attributes as usual.\\
[6908]796\hline
[6422]797\gst{<gsf:switch preprocess=
798'preprocess-type'>
799<gsf:metadata name='Title'/>
800<gsf:when test='test-type'
801test-value='xxx'>...</gsf:when>
802<gsf:when test='test-type'
803test-value='yyy'>...</gsf:when>
804<gsf:otherwise>...</gsf:otherwise>
</gsf:switch>} & Switch on the value of a particular metadata element. The metadata is specified in \gst{gsf:metadata}, which takes the same attributes as usual.\\
806\hline
807\end{tabular}}
[6312]808\end{table}
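
As a rough sketch of how the link and icon elements in Table~\ref{tab:gsf-format} combine, the following pair of templates formats classification nodes and documents in a browse list. It assumes that classification nodes carry \gst{Title} metadata and that the documents were converted from source files such as Word or PDF.

\begin{gsc}\begin{verbatim}
<gsf:template match='classifierNode'>
  <!-- bookshelf icon linking to the lower level of the hierarchy -->
  <td><gsf:link type='classifier'>
    <gsf:icon type='classifier'/></gsf:link></td>
  <td><gsf:metadata name='Title'/></td>
</gsf:template>
<gsf:template match='documentNode'>
  <!-- link to the document itself -->
  <td><gsf:link><gsf:icon/></gsf:link></td>
  <!-- link to the original file the document was converted from -->
  <td><gsf:link type='source'>
    <gsf:icon type='source'/></gsf:link></td>
  <td><gsf:metadata name='Title'/></td>
</gsf:template>
\end{verbatim}\end{gsc}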
809
The \gst{<gsf:metadata>} elements are used to output metadata values. The simplest case is \gst{<gsf:metadata name='Title'/>}, which outputs all the Title metadata values for the current document or section. Namespaces are important here: if the Title metadata is in the Dublin Core (dc) namespace, then the element should look like \gst{<gsf:metadata name='dc.Title'/>}. Several other attributes are available for this element. By default, all values for the selected metadata are returned where more than one exists. The attribute \gst{pos} is used to request one particular value (the first, last or nth).
For instance, one document may fall into several classification categories, and therefore may have multiple Subject metadata values. When all are returned, the multiple values are separated by commas by default. The \gst{separator} attribute is used to change the separating string. For example, adding \gst{separator=':~'} to the element will separate all values by a colon and a space. Instead of retrieving all values for a piece of metadata, adding \gst{pos='first'} to the \gst{<gsf:metadata>} element will retrieve the first value.
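
The variations just described can be written out as follows, using the Subject metadata from the example above. This is only a sketch of the attribute combinations, not a complete format statement.

\begin{gsc}\begin{verbatim}
<!-- all Subject values, separated by the default separator -->
<gsf:metadata name='Subject'/>
<!-- all Subject values, separated by a colon and a space -->
<gsf:metadata name='Subject' separator=': '/>
<!-- only the first Subject value -->
<gsf:metadata name='Subject' pos='first'/>
<!-- only the last Subject value -->
<gsf:metadata name='Subject' pos='last'/>
\end{verbatim}\end{gsc}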
[6312]812
Sometimes you may want to display metadata values for sections other than the current one. For example, in the mgppdemo collection, in a search list we display the Titles of all the enclosing sections, followed by the Title of the current section, all separated by semicolons. The display ends up looking something like:
\emph{Farming snails 2; Starting out; Selecting your snails}
where \emph{Selecting your snails} is the Title of the section in the results list, and \emph{Farming snails 2} and \emph{Starting out} are the Titles of the enclosing sections. The \gst{select} attribute is used to display metadata for sections other than the current one. Table~\ref{tab:gsf-select-types} shows the options available for this attribute. The \gst{separator} attribute is used here also, to specify the separating text.

To produce the display above, the format statement would contain the following:
818
[6422]819\begin{gsc}
820\begin{verbatim}
<gsf:metadata name='Title' select='ancestors' separator='; '/>;
  <gsf:metadata name='Title'/>
823\end{verbatim}
824\end{gsc}
[6312]825
826\begin{table}
827\caption{Select types for metadata format elements}
828\label{tab:gsf-select-types}
[6422]829{\footnotesize
[6312]830\begin{tabular}{ll}
[6343]831\hline
[6312]832\bf Select Type & \bf Description\\
[6343]833\hline
[6312]834parent & The immediate parent section\\
835ancestors & All the parents back to the root (topmost) section\\
836root & The root or topmost section \\
[26102]837%siblings & All the sibling sections\\
838%children & The immediate child sections of the current section\\
839%descendants & All the descendent sections\\
[6343]840\hline
[6422]841\end{tabular}}
[6312]842\end{table}
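
The other select types in Table~\ref{tab:gsf-select-types} are used in the same way. The following sketch of a search result template shows the Title of the topmost section, then the Title of the matching section, then the Title of its immediate parent in brackets. The layout is purely illustrative.

\begin{gsc}\begin{verbatim}
<gsf:template match='documentNode'>
  <td><gsf:link><gsf:icon/></gsf:link></td>
  <td>
    <!-- Title of the root (topmost) section -->
    <gsf:metadata name='Title' select='root'/>:
    <!-- Title of the current section -->
    <gsf:metadata name='Title'/>
    <!-- Title of the immediate parent section -->
    (<gsf:metadata name='Title' select='parent'/>)
  </td>
</gsf:template>
\end{verbatim}\end{gsc}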
843
[26102]844\begin{table}
\caption{String processing options, for preprocess in gsf:switch, and format in gsf:metadata}
846\label{tab:gsf-process-types}
847{\footnotesize
848\begin{tabular}{ll}
849\hline
850\bf Process Type & \bf Description\\
851\hline
852toUpper & Make the value upper case \\
853toLower & Make the value lower case \\
854tidyWhitespace & Replace multiple whitespace characters with a single space \\
855stripWhitespace & Removes all whitespace characters \\
856cgiSafe &Make value safe to be a cgi argument \\
857formatDate & turns '20040201' into '01 February 2004' in a language dependent manner \\
858formatLanguage & turns 'en' into 'English' in a language dependent manner\\
859formatBigNumber & \\
860\hline
861\end{tabular}}
862\end{table}
863
[13893]864The \gst{<gsf:choose-metadata>} element selects the first available metadata value from the list of options.
[6312]865\begin{gsc}
866\begin{verbatim}
[6422]867<gsf:choose-metadata>
[8696]868 <gsf:metadata name='dc.Title'/>
869 <gsf:metadata name='dls.Title'/>
870 <gsf:metadata name='Title'/>
[6422]871</gsf:choose-metadata>
[6312]872\end{verbatim}
873\end{gsc}
874
[26102]875This will display dc.Title if available, otherwise it will use dls.Title if available, otherwise it will use the Title metadata. If there are no values for any of these metadata elements, then nothing will be displayed.
[6312]876
[10880]877The \gst{<gsf:switch>} element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organization it came from.
[6312]878
879\begin{gsc}
880\begin{verbatim}
<gsf:switch preprocess='toLower;stripWhitespace'>
882 <gsf:metadata name='Organization'/>
[6422]883 <gsf:when test='equals' test-value='bostid'>
884 <!-- output BOSTID image --></gsf:when>
885 <gsf:when test='equals' test-value='worldbank'>
886 <!-- output world bank image --></gsf:when>
[6312]887 <gsf:otherwise><!-- output default image--></gsf:otherwise>
888</gsf:switch>
889\end{verbatim}
890\end{gsc}
891
Preprocessing of the metadata value is optional. The preprocess types are listed in Table~\ref{tab:gsf-process-types}. These operations are applied to the value of the selected metadata before the test is carried out. Multiple processing types can be specified, separated by semicolons, and they will be applied in the order specified (from left to right).
[6312]893
Each option specifies a test and a test value. Test values are just text. Tests include \gst{startsWith}, \gst{contains}, \gst{exists}, \gst{equals}, and \gst{endsWith}. The \gst{exists} test does not need a test value. Including an \gst{otherwise} option ensures that something will be displayed even when none of the tests match.
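
As a further sketch (the metadata name and test values here are illustrative assumptions, not taken from a real collection), a switch can combine several test types. The \gst{exists} test is written without a \gst{test-value} attribute:

\begin{gsc}
\begin{verbatim}
<gsf:switch>
  <gsf:metadata name='Source'/>
  <gsf:when test='endsWith' test-value='.pdf'>
    <!-- output PDF icon --></gsf:when>
  <gsf:when test='exists'>
    <!-- output generic document icon --></gsf:when>
  <gsf:otherwise><!-- output nothing --></gsf:otherwise>
</gsf:switch>
\end{verbatim}
\end{gsc}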
[6312]895
[6335]896If none of the gsf elements meets your needs for formatting, XSLT can be entered directly into the format element, giving the collection designer full flexibility over how the collection appears.
[6312]897
The collection-specific templates are added to the configuration file \gst{collectionConfig.xml}. Any templates found in the XSLT files can be overridden.
The important part of adding templates to the configuration file is determining where to put them. Formatting templates cannot go just anywhere; there are standard places for them. Figure~\ref{fig:format-places} shows the positions where templates can occur.
[6312]900
901\begin{figure}
902\begin{gsc}\begin{verbatim}
903<collectionConfig>
904 <metadataList/>
905 <displayItemList/>
906 <search>
907 <format> <!--Put here templates related to searching and
[6422]908 the query page. The common one is the documentNode
909 template -->
[6312]910 <gsf:template match='documentNode'>...</gsf:template>
911 </format>
912 </search>
913 <browse>
914 <classifier name='xx'>
      <format><!-- put here templates related to formatting a
[6422]916 particular classifier page. Common ones are documentNode
917 and classifierNode templates-->
[6312]918 <gsf:template match='documentNode'>...</gsf:template>
919 <gsf:template match='classifierNode'>...</gsf:template>
920 <gsf:template match='classifierNode' mode='horizontal'>...
[6422]921 </gsf:template>
[6312]922 </format>
923 </classifier>
924 <classifier>...</classifier>
[6904]925 <format><!-- formatting for all the classifiers. these will
926 be overridden by any classifier specific formatting
927 instructions --></format>
[6312]928 </browse>
929 <display>
930 <format><!-- here goes any formatting relating to the display
[6422]931 of the documents. These are generally named templates,
932 and format options -->
[6312]933 <gsf:template name='documentContent'>...</gsf:template>
934 <gsf:option name='TOC' value='true'/>
935 </format>
936 </display>
937</collectionConfig>
938\end{verbatim}\end{gsc}
[6335]939\caption{Places for format statements}
[6312]940\label{fig:format-places}
941\end{figure}
942
943
There are also formatting instructions that are options rather than templates.
These are described in Table~\ref{tab:format_options}. They are entered into the configuration file in the form \gst{<gsf:option name='coverImages' value='false'/>}.
[6312]946
947\begin{table}
948\caption{Formatting options}
949\label{tab:format_options}
[6422]950{\footnotesize
[6312]951\begin{tabular}{llp{5cm}}
952\hline
953\bf option name & \bf values & \bf description \\
954\hline
955coverImages & true, false & whether or not to display cover images for documents \\
[13893]956documentTOC & true, false & whether or not to display the table of contents for the document\\
[6312]957\hline
958\end{tabular}}
959\end{table}
960
Note that format templates are added to the XSLT files before transforming, while the options are added to the page source and used in tests in the XSLT.
[13893]962
[9445]963\subsubsection{Changing the service text strings}
[6312]964
[9445]965Each collection has a set of services which are the access points for the information in the collection. Each service has a set of text strings which are used to display it. These include name, description, the text on the submit button, and names and descriptions of all the parameters to the service.
966
These text strings are found in \gst{.properties} files, in \gst{\$GSDL3HOME/WEB-INF/classes}. The names of the files are based on class names. Subclasses can define their own properties, or can use those of their parent class. For example, \gst{AbstractSearch} defines strings for the \gst{TextQuery} service, in \gst{AbstractSearch.properties}. \gst{GS2MGSearch} just uses these defaults, so it does not need its own properties file.
[9445]968
[13893]969A particular collection can override the properties for any service. For example, if a collection uses the \gst{GS2MGSearch} service rack (look in the \gst{buildConfig.xml} file for a list of service racks used), and the collection builder wants to change the text associated with this service, they can put a \gst{GS2MGSearch.properties} file in the resources directory of the collection. After a reconfigure of the collection, this will be used in preference to the one in the default resources directory.
[9445]970
[10880]971\subsection{Customizing the interface}\label{sec:interface-customise}
[6312]972
[10880]973Format statements in the collection configuration files provide a way to change small parts of the collection display. For large scale customizations to a collection, or ones that apply to a site as a whole, a second mechanism is available. The interface is defined by a set of XSLT files that transform the page data into HTML. Any of these files can be overridden to provide specialized display, on a site or collection basis.
[6312]974
[7635]975The first section looks at customizing the existing interface, while the second section looks at defining a whole new interface. The last section describes how to add a new language translation of an interface.
[6312]976
[6930]977\subsubsection{Modifying an existing interface}
[6312]978
Most of an interface is defined by XSLT files, which are stored in \gst{\$GSDL3HOME/\-interfaces/\-interface-name/\-transform}. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following order: collection, site, interface, default interface. (This currently only applies to sites, and therefore collections, that reside in the same \gs\ installation as the interface.)
[6312]980
[10880]981Sites and collections can have a transform directory, which is where customized XSLT files should go. Any XSLT files in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different layout for the about page of a collection, you can put a new \gst{about.xsl} file into the collection's \gst{transform} directory, and this will be used instead. This is what we do for the Gutenberg sample collection.
[6312]982
[13893]983This also applies to files that are included from other XSLT files. For example the \gst{query.xsl} for the query pages includes a file called \gst{querytools.xsl}. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site \gst{transform} directory will work. Either the new \gst{query.xsl} will include the default \gst{querytools.xsl}, or the default \gst{query.xsl} will include the new \gst{querytools.xsl}. The \gst{xsl:include} directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
[6312]984
Note that you cannot include a file with the same name as the including file. For example, \gst{query.xsl} cannot include \gst{query.xsl}. It is tempting to do this if you just want to change one template for a particular file and then include the default, but it does not work.
[6312]986
You can add the argument \gst{o=xml} to any URL and you will be returned the XML before transformation by a stylesheet. This shows you the XML page source. It can be useful when you are trying to write some new XSLT statements.
[13281]988
[6930]989\subsubsection{Defining a new interface}
[6312]990
[6930]991A new interface may be needed if different instantiations of the library require different interfaces, or different developers want their own look and feel. Creating a new interface will allow modifications to be made while leaving the original one intact.
[6312]992
A new interface needs a directory in \gst{\$GSDL3HOME/interfaces}; the name of this directory becomes the interface name. Inside, it needs \gst{images} and \gst{transform} directories, and an \gst{interfaceConfig.xml} file. The \gst{interfaceConfig.xml} file may specify a base interface, in which case the new interface only needs to define XSLT for the parts that are different. Otherwise, it will need a full set of XSLT files.
[6312]994
[13893]995To use a new interface, the \gst{\$GSDL3HOME/WEB-INF/web.xml} file must be edited: either change the interface that a current servlet instance is using, or add another servlet instantiation to the file (see Section~\ref{sec:sites-and-ints} or Appendix~\ref{app:tomcat}). The Tomcat server must be restarted for this to take effect.
[6312]996
[13893]997\subsubsection{Changing the interface language}\label{sec:interface-language}
[6312]998
[6930]999The interface language can be changed by going to the preferences page, and choosing a language from the list, which includes all languages into which the interface has been translated.
[6312]1000
It is easy to add a new interface language to \gs. Language-specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in \gst{\$GSDL3HOME/WEB-INF/classes}. Each interface has one named \gst{interface\_name.properties} (where \gst{'name'} is the interface name, for example, \gst{interface\_default.properties}, or \gst{interface\_gs2.properties}). Each service class has one with the same name as the class (e.g. \gst{GS2Search.properties}). To add another language, all of the base \gst{.properties} files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of \gst{interface\_default.properties} would be named \gst{interface\_default\_fr.properties}.
[6312]1002
[7861]1003Keys will be looked up in the properties file closest to the specified language. For example, if language \gst{fr\_CA} was specified (French language, country Canada), and the default locale was \gst{en\_GB}, Java would look at properties files in the following order, until it found the key: \gst{XXX\_fr\_CA.properties}, \gst{XXX\_fr.properties}, \gst{XXX\_en\_GB.properties}, then \gst{XXX\_en.properties}, and finally the default \gst{XXX.properties}.
[6312]1004
[10880]1005These new files are available straight away---to use the new language, add e.g. \gst{l=fr} to the arguments in the URL. To get \gs\ to add it in to the list of languages on the preferences page, an entry needs to be added into the languages list in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}). Modification of this file requires a restart of the Tomcat server for the changes to be recognized.
[6312]1006
[6422]1007\newpage
[6930]1008\section{Developing \gsiii : Run-time system}\label{sec:develop-runtime}
[6312]1009
[13893]1010[TODO: rewrite this section\\
[6312]1011runtime object structure diagram. describe the modules.\\
1012class hierarchy,\\
1013directory structure and where everything lives\\
1014message format.\\
1015overall description of message passing sequence.\\
1016configuration process - start up and runtime\\
1017\\
1018page generation\\
[13893]1019]
[6343]1020\subsection{Overview of modules??}
1021
[7861]1022A \gsiii\ 'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc. Figure~\ref{fig:local} shows how they fit together in a stand-alone system. The top left part is concerned with displaying the data, while the bottom right part is the collection data serving part. The two sides communicate through the MessageRouter. There is a one-to-one correspondence between modules and Java classes, with the exception of services: for coding and/or run-time efficiency reasons, several Service modules may be grouped together into one ServiceRack class.
[6312]1023
1024\begin{figure}[t]
1025 \centering
[9874]1026 \includegraphics[width=4in]{local} %5.8
[6312]1027 \caption{A simple stand-alone site.}
1028 \label{fig:local}
1029\end{figure}
1030
1031
1032{\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters, communicators needed. All messages pass through the MessageRouter. Communication between remote sites is always done between MessageRouters, one for each site.
1033
{\em Collection and ServiceCluster}: these are very similar, and each groups a set of services into a conceptual group. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g. all the building services may be part of a cluster. What is part of a cluster is specified by the site configuration file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.
[6312]1035Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
1036
[7826]1037{\em Service}: these provide the core functionality of the system e.g. searching, retrieving documents, building collections etc. One or more may be grouped into a single Java class (ServiceRack) for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory.
[6312]1038
1039{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
1040
[7826]1041{\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. For example, a Receptionist may: modify the request in some way before sending it to the appropriate Action; add some data to the page responses that is common to all pages; transform the response into another form using XSLT. There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}.
[6312]1042
[6335]1043{\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'CGI' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is returned to the Receptionist, which may transform it to HTML. The various actions are described in more detail in Section~\ref{sec:pagegen}.
[6312]1044
1045
1046\subsection{Start up configuration}\label{sec:startup-config}
1047
1048We use the Tomcat web server, which operates either stand-alone in a test mode
[6908]1049or in conjunction with the Apache web server. The \gs\ LibraryServlet
[6312]1050class is loaded by Tomcat and the servlet's \gst{init()} method is called. Each time a
1051\gst{get/put/post} (etc.) is used, a new thread is started and
1052\gst{doGet()/doPut()/doPost()} (etc.) is called.
1053
1054The \gst{init()} method creates a new Receptionist and a new
[6335]1055MessageRouter. Default classes (DefaultReceptionist, MessageRouter) are used unless subclasses have been specified in the servlet initiation parameters (see Section~\ref{sec:sites-and-ints}). The appropriate system variables are set for each object (interface
[6312]1056name, site name, etc.) and then \gst{configure()} is called on both. The MessageRouter handle
1057is passed to the Receptionist. The servlet then communicates only with
1058the Receptionist, not with the MessageRouter.
1059
The Receptionist reads in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}), and loads up all the different Action classes. Other Actions may be loaded on the fly as needed. Actions are added to a map, with shortnames for keys. For example, the QueryAction is added with key 'q'. The Actions are passed the MessageRouter reference too.
[6335]1061If the Receptionist is a TransformingReceptionist, a mapping between shortnames and XSLT file names is also created.
[6312]1062
The MessageRouter reads in its site configuration file \gst{siteConfig.xml} (see Section~\ref{sec:siteconfig}). It creates a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are part of what gets returned in response to a describe request (see Section~\ref{sec:describe}).
1064
[6335]1065Each ServiceRack specified in the configuration file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
[7826]1066
[6312]1067ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList.
1068
[7826]1069For each site specified, the MessageRouter creates an appropriate type of Communicator object. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the mapping with its site name as a key. The site's collections, services and clusters will also be added into the static xml lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-site command must be sent (see Section~\ref{sec:runtime-config}).
[6312]1070
[7826]1071The MessageRouter also looks inside the site's \gst{collect} directory, and loads up a Collection object for each valid collection found. If a \gst{collectionInit.xml} file is present, a subclass of Collection may be used.
[6312]1072The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
1073files, determines the metadata, and loads ServiceRack classes based on the
[6335]1074names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection configuration file.
1075Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList XML.
[6312]1076
[7635]1077\subsection{Message passing}
[6312]1078
[7635]1079There are two types of messages used by the system: external and internal messages. All messages have an enclosing \gst{<message>} element, which contains either one or more requests, or one or more responses. In the following descriptions, the message element is not shown, but is assumed to be present.
Activity in \gsiii\ is initiated by a request coming in from the outside. In the standard web-based \gs, this comes from a servlet and is passed into the Receptionist. This ``external'' type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML or another format depending on the output parameter of the request.
[6312]1081
[7826]1082Messages inside the system (``internal'' messages) all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned. Currently all requests are independent, so any requests can be combined into the same message, and they will be answered separately, with their responses being sent back in a single message.
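
For example, the following sketch combines two independent describe requests in a single message. Both request forms are taken from Section~\ref{sec:describe}; the collection name \gst{mgppdemo} is only illustrative.
\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' type='describe' to=''/>
  <request lang='en' type='describe' to='mgppdemo'/>
</message>
\end{verbatim}\end{gsc}\end{quote}
The reply would then be a single message holding two \gst{<response>} elements, one for each request.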
[6312]1083
[10880]1084When a page request (external request) comes in to the Receptionist, it looks at the action attribute and passes the request to the appropriate Action module. The Action will fire one or more internal requests to the MessageRouter, based on the arguments. The data is gathered into a response, which is returned to the Receptionist. The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the \gs\ servlet we transform using XSLT to generate HTML pages.
[6312]1085
Actions send internal style messages to the MessageRouter. Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, or retrieve document text.
There are different internal request types: describe, process, system, format, status. Process requests do the actual work of the system, while the other types get auxiliary information. The format of the requests and responses for each internal request type is described in the following sections. External style requests, and their page responses, are described in the section about page generation (Section~\ref{sec:pagegen}).
[6312]1088
1089\subsection{'describe'-type messages}\label{sec:describe}
1090
1091The most basic of the internal standard requests is ``describe-yourself'', which can be sent to any module in the system. The module responds with a semi-predefined piece of XML, making these requests very efficient. The response is predefined apart from any language-specific text strings, which are put together as each request comes in, based on the language attribute of the request.
1092\begin{quote}\begin{gsc}\begin{verbatim}
1093<request lang='en' type='describe' to=''/>
1094\end{verbatim}\end{gsc}\end{quote}
1095If the \gst{to} field is empty, a request is answered by the MessageRouter.
1096An example response from a MessageRouter might look like this:
1097\begin{quote}\begin{gsc}\begin{verbatim}
1098<response lang='en' type='describe'>
1099 <serviceList/>
1100 <siteList>
1101 <site name='org.greenstone.gsdl1'
[10826]1102 address='http://localhost:8080/greenstone3/services/localsite'
[6312]1103 type='soap' />
1104 </siteList>
1105 <serviceClusterList>
1106 <serviceCluster name="build" />
1107 </serviceClusterList>
1108 <collectionList>
1109 <collection name='org.greenstone.gsdl1/
1110 org.greenstone.gsdl2/fao' />
1111 <collection name='org.greenstone.gsdl1/demo' />
1112 <collection name='org.greenstone.gsdl1/fao' />
1113 <collection name='myfiles' />
1114 </collectionList>
1115</response>
1116\end{verbatim}\end{gsc}\end{quote}
1117This MessageRouter has no individual site-wide services (an empty \gst{<serviceList>}), but has a service cluster called build (which provides collection importing and building functionality). It
1118communicates with one site, \gst{org.greenstone.gsdl1}. It is aware of four
1119collections. One of these, \gst{myfiles}, belongs to it; the other three are
1120available through the external site. One of those collections is actually from
1121a further external site.
1122
1123It is possible to ask just for a specific part of the information provided by a
[7826]1124describe request, rather than the whole thing. For example, these two
1125messages get the \gst{collectionList} and the \gst{siteList} respectively:
[6312]1126\begin{quote}\begin{gsc}\begin{verbatim}
1127<request lang='en' type='describe' to=''>
1128 <paramList>
1129 <param name='subset' value='collectionList'/>
1130 </paramList>
1131</request>
1132
1133<request lang='en' type='describe' to=''>
1134 <paramList>
1135 <param name='subset' value='siteList'/>
1136 </paramList>
1137</request>
1138\end{verbatim}\end{gsc}\end{quote}
1139
[7826]1140Subset options for the MessageRouter include \gst{collectionList}, \gst{serviceClusterList}, \gst{serviceList}, \gst{siteList}.
[6312]1141
[7826]1142When a collection or service cluster is asked to describe itself, what is returned is a list of metadata, some display elements, and a list of services. For example, here is such a message, along with a sample response.
1143
[6312]1144\begin{quote}\begin{gsc}\begin{verbatim}
1145<request lang='en' type='describe' to='mgppdemo'/>
1146
1147<response from="mgppdemo" type="describe">
1148 <collection name="mgppdemo">
1149 <displayItem lang="en" name="name">greenstone mgpp demo
1150 </displayItem>
1151 <displayItem lang="en" name="description">This is a
1152 demonstration collection for the Greenstone digital
1153 library software. It contains a small subset (11 books)
1154 of the Humanity Development Library. It is built with
1155 mgpp.</displayItem>
1156 <displayItem lang="en" name="icon">mgppdemo.gif</displayItem>
1157 <serviceList>
1158 <service name="DocumentStructureRetrieve" type="retrieve" />
1159 <service name="DocumentMetadataRetrieve" type="retrieve" />
1160 <service name="DocumentContentRetrieve" type="retrieve" />
1161 <service name="ClassifierBrowse" type="browse" />
1162 <service name="ClassifierBrowseMetadataRetrieve"
1163 type="retrieve" />
1164 <service name="TextQuery" type="query" />
1165 <service name="FieldQuery" type="query" />
1166 <service name="AdvancedFieldQuery" type="query" />
1167 <service name="PhindApplet" type="applet" />
1168 </serviceList>
1169 <metadataList>
1170 <metadata name="creator">[email protected]</metadata>
1171 <metadata name="numDocs">11</metadata>
1172 <metadata name="buildType">mgpp</metadata>
[10775]1173 <metadata name="httpPath">http://kanuka:8090/greenstone3/sites/
[6312]1174 localsite/collect/mgppdemo</metadata>
1175 </metadataList>
1176 </collection>
1177</response>
1178\end{verbatim}\end{gsc}\end{quote}
1179
[7826]1180Subset options for a collection or serviceCluster include \gst{metadataList}, \gst{serviceList}, and \gst{displayItemList}.
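
As a sketch, assuming that a collection takes the \gst{subset} parameter in the same form as the MessageRouter requests shown earlier, a request for only the service list of the mgppdemo collection might look like this:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to='mgppdemo'>
  <paramList>
    <param name='subset' value='serviceList'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}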
[6312]1181
1182This collection provides many typical services. Notice how this response lists the services available, while the collection configuration file for this collection (Figure~\ref{fig:collconfig}) described serviceRacks. Once the service racks have been configured, they become transparent in the system, and only services are referred to.
[6335]1183There are three document retrieval services, for structural information, metadata, and content. The Classifier services retrieve classification structure and metadata. These five services were all provided by the GS2MGPPRetrieve ServiceRack. The three query services were provided by GS2MGPPSearch serviceRack, and provide different kinds of query interface. The last service, PhindApplet, is provided by the PhindPhraseBrowse serviceRack and is an applet service.
[6312]1184
1185A \gst{describe} request sent to a service returns a list of parameters that
[7826]1186the service accepts and some display information, (and in future may describe the content type for the request and response). Subset options for the request include \gst{paramList} and \gst{displayItemList}.
[6312]1187
[7826]1188Parameters can be in the following formats:
[6312]1189\begin{quote}\begin{gsc}\begin{verbatim}
1190<param name='xxx' type='integer|boolean|string|invisible' default='yyy'/>
<param name='xxx' type='enum_single|enum_multi' default='aa'>
1192 <option name='aa'/><option name='bb'/>...
1193</param>
1194<param name='xxx' type='multi' occurs='4'>
1195 <param .../>
1196 <param .../>
1197</param>
1198\end{verbatim}\end{gsc}\end{quote}
1199
1200If no default is specified, the parameter is assumed to be mandatory.
1201Here are some examples of parameters:
1202\begin{quote}\begin{gsc}\begin{verbatim}
1203<param name='case' type='boolean' default='0'/>
1204
1205<param name='maxDocs' type='integer' default='50'/>
1206
<param name='index' type='enum_single' default='dtx'>
  <option name='dtx'/>
  <option name='stt'/>
  <option name='stx'/>
</param>
1212
1213<!-- this one is for the text box and field list for the
1214simple field query-->
1215<param name='simpleField' type='multi' occurs='4'>
1216 <param name='fqv' type='string'/>
1217 <param name='fqf' type='enum_single'>
1218 <option name='TI'/><option name='AU'/><option name='OR'/>
1219 </param>
1220</param>
1221
1222\end{verbatim}\end{gsc}\end{quote}
The \gst{type} attribute determines how a parameter is displayed on a web page or other interface. For example, a \gst{string} parameter may result in a text entry box, a \gst{boolean} in an on/off button, and an \gst{enum\_single} or \gst{enum\_multi} in a drop-down menu from which one or many items, respectively, can be selected.
A \gst{multi}-type parameter indicates that two or more parameters are associated, and should be displayed together. For example, in a field query, the text box and field list should be associated. The \gst{occurs} attribute specifies how many times the parameter should be displayed on the page.
Parameters also come with display information: all the text strings needed to present them to the user. These include the name of the parameter and the display values for any options. They are included in the parameter descriptions in the form of \gst{<displayItem>} elements.
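
As a rough illustration of how an interface might act on this information, the following Python sketch maps a \gst{<param>} description to a form widget. It is not part of \gs: the function name \gst{widget\_for} and the widget labels are hypothetical, and the handling of \gst{integer} and \gst{invisible} parameters (a text box and a hidden field) is an assumption, since only the string, boolean and enum cases are described here.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

# Hypothetical widget labels; only string, boolean and enum_* are
# described in the text, the other two entries are assumptions.
WIDGETS = {
    "string": "text box",
    "integer": "text box",        # assumption
    "boolean": "on/off button",
    "enum_single": "single-select menu",
    "enum_multi": "multi-select menu",
    "invisible": "hidden field",  # assumption
}

def widget_for(param):
    """Describe how one <param> element could be displayed."""
    ptype = param.get("type")
    if ptype == "multi":
        # A multi parameter groups its child parameters, and the
        # whole group is shown 'occurs' times.
        return {
            "name": param.get("name"),
            "widget": "group",
            "occurs": int(param.get("occurs", "1")),
            "children": [widget_for(p) for p in param.findall("param")],
        }
    return {
        "name": param.get("name"),
        "widget": WIDGETS[ptype],
        "options": [o.get("name") for o in param.findall("option")],
        # No default means the parameter is assumed to be mandatory.
        "mandatory": param.get("default") is None,
    }

example = ET.fromstring(
    "<param name='simpleField' type='multi' occurs='4'>"
    "<param name='fqv' type='string'/>"
    "<param name='fqf' type='enum_single'>"
    "<option name='TI'/><option name='AU'/><option name='OR'/>"
    "</param></param>")
form = widget_for(example)
\end{verbatim}\end{gsc}\end{quote}

Here \gst{form} describes a group of two widgets (a text box and a single-select menu) that is to be displayed four times.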
1226
1227A service description also contains some display information---this includes the name of the service, and the text for the submit button.
1228
[10880]1229Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} shows an example HTML search form that may be generated from this describe response.
[6312]1230
1231\begin{quote}\begin{gsc}\begin{verbatim}
1232<request lang="en" to="mgppdemo/FieldQuery" type="describe" />
1233
1234<response from="mgppdemo/FieldQuery" type="describe">
1235 <service name="FieldQuery" type="query">
1236 <displayItem name="name">Form Query</displayItem>
1237 <displayItem name="submit">Search</displayItem>
1238 <paramList>
1239 <param default="Doc" name="level" type="enum_single">
1240 <displayItem name="name">Granularity to search at</displayItem>
1241 <option name="Doc">
1242 <displayItem name="name">Document</displayItem>
1243 </option>
1244 <option name="Sec">
1245 <displayItem name="name">Section</displayItem>
1246 </option>
1247 <option name="Para">
1248 <displayItem name="name">Paragraph</displayItem>
1249 </option>
1250 </param>
1251 <param default="1" name="case" type="boolean">
1252 <displayItem name="name">Turn casefolding </displayItem>
1253 <option name="0">
1254 <displayItem name="name">off</displayItem>
1255 </option>
1256 <option name="1">
1257 <displayItem name="name">on</displayItem>
1258 </option>
1259 </param>
1260 <param default="1" name="stem" type="boolean">
1261 <displayItem name="name">Turn stemming </displayItem>
1262 <option name="0">
1263 <displayItem name="name">off</displayItem>
1264 </option>
1265 <option name="1">
1266 <displayItem name="name">on</displayItem>
1267 </option>
1268 </param>
1269 <param default="10" name="maxDocs" type="integer">
1270 <displayItem name="name">Maximum documents to return
1271 </displayItem>
1272 </param>
1273 <param name="simpleField" occurs="4" type="multi">
1274 <displayItem name="name"></displayItem>
1275 <param name="fqv" type="string">
1276 <displayItem name="name">Word or phrase </displayItem>
1277 </param>
1278 <param default="ZZ" name="fqf" type="enum_single">
1279 <displayItem name="name">in field</displayItem>
1280 <option name="ZZ">
1281 <displayItem name="name">allfields</displayItem>
1282 </option>
1283 <option name="TX">
1284 <displayItem name="name">text</displayItem>
1285 </option>
1286 <option name="TI">
1287 <displayItem name="name">Title</displayItem>
1288 </option>
1289 <option name="SU">
1290 <displayItem name="name">Subject</displayItem>
1291 </option>
1292 <option name="ORG">
1293 <displayItem name="name">Organization</displayItem>
1294 </option>
1295 <option name="SO">
1296 <displayItem name="name">Source</displayItem>
1297 </option>
1298 </param>
1299 </param>
1300 </paramList>
1301 </service>
1302</response>
1303\end{verbatim}\end{gsc}\end{quote}
1304
1305\begin{figure}[t]
1306 \centering
[25804]1307 \includegraphics[width=3.5in]{query2}
[6312]1308 \caption{The previous query service describe response as displayed on the search page.}
1309 \label{fig:query-display}
1310\end{figure}
1311
[10880]1312A describe request to an applet type service returns the applet HTML element: this will be embedded into a web page to run the applet.
\begin{quote}\begin{gsc}\begin{verbatim}
<request type='describe' to='mgppdemo/PhindApplet'/>

<response type='describe'>
  <service name='PhindApplet' type='query'>
    <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
                     jaxp.jar, xml-apis.jar'
            CODE='org.greenstone.applet.phind.Phind.class'
            CODEBASE='lib/java'
            HEIGHT='400' WIDTH='500'>
      <PARAM NAME='library' VALUE=''/>
      <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
      <PARAM NAME='collection' VALUE='mgppdemo' />
      <PARAM NAME='classifier' VALUE='1' />
      <PARAM NAME='orientation' VALUE='vertical' />
      <PARAM NAME='depth' VALUE='2' />
      <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
      <PARAM NAME='backdrop'
             VALUE='interfaces/default/images/phindbg1.jpg'/>
      <PARAM NAME='fontsize' VALUE='10' />
      <PARAM NAME='blocksize' VALUE='10' />
      The Phind java applet.
    </applet>
    <displayItem name="name">Browse phrase hierarchies</displayItem>
  </service>
</response>
\end{verbatim}\end{gsc}\end{quote}
1340
[10880]1341Note that the library parameter has been left blank. This is because library refers to the current servlet that is running and the name is not necessarily known in advance. So either the applet action or the Receptionist must fill in this parameter before displaying the HTML.
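
A minimal Python sketch of this filling-in step follows, assuming the applet element has already been parsed. The function name \gst{fill\_library} and the servlet name \gst{library} used in the example call are hypothetical.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def fill_library(applet, servlet_name):
    """Set the VALUE of the blank 'library' PARAM in an <applet>."""
    for param in applet.findall("PARAM"):
        if param.get("NAME") == "library":
            param.set("VALUE", servlet_name)
    return applet

applet = ET.fromstring(
    "<applet CODE='org.greenstone.applet.phind.Phind.class'>"
    "<PARAM NAME='library' VALUE=''/>"
    "<PARAM NAME='collection' VALUE='mgppdemo'/>"
    "</applet>")
fill_library(applet, "library")  # 'library' is a made-up servlet name
values = {p.get("NAME"): p.get("VALUE")
          for p in applet.findall("PARAM")}
\end{verbatim}\end{gsc}\end{quote}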
[6312]1342
[7826]1343\subsection{'system'-type messages}\label{sec:system}
[6312]1344
[7826]1345``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently these requests are initiated by particular CGI requests (see Section~\ref{sec:runtime-config}).
[6312]1346
1347The basic format of a system request is as follows:
1348
1349\begin{quote}\begin{gsc}\begin{verbatim}
1350<request type='system' to=''>
1351 <system .../>
1352</request>
1353\end{verbatim}\end{gsc}\end{quote}
1354
1355One or more actual requests are specified in system elements. The following are examples:
1356\begin{quote}\begin{gsc}\begin{verbatim}
1357<system type='configure' subset=''/>
1358<system type='configure' subset='collectionList'/>
1359<system type='activate' moduleType='collection' moduleName='demo'/>
1360<system type='deactivate' moduleType='site' moduleName='site1'/>
1361\end{verbatim}\end{gsc}\end{quote}
1362
The first request reconfigures the whole site: the MessageRouter goes through its whole configure process again. The second request reconfigures only the collectionList: the MessageRouter deletes all its collection information, rescans the collect directory, and reloads all the collections.
The third request activates the collection demo. This could be a new collection, or a reactivation of an old one. If a collection module already exists, it is deleted and a new one is loaded. The final request deactivates the site site1: this removes the site from the siteList and module map, and also removes any of that site's collections and services from the static lists.
1365
[7826]1366A response just contains a status message\footnote{TODO: add in error/status codes}, for example:
[6312]1367\begin{quote}\begin{gsc}\begin{verbatim}
[7826]1368<status>MessageRouter reconfigured successfully</status>
1369<status>Error on reconfiguring collectionList</status>
1370<status>collection:demo activated</status>
1371<status>site:site1 deactivated</status>
[6312]1372\end{verbatim}\end{gsc}\end{quote}
1373
1374System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests.
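
The request format described in this section can be assembled programmatically. The following Python sketch builds a system request containing several \gst{<system>} elements. The helper name \gst{build\_system\_request} is hypothetical, and the code only constructs the XML; it does not send it anywhere.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def build_system_request(commands, to=""):
    """Wrap one or more <system> elements in a system request."""
    request = ET.Element("request", {"type": "system", "to": to})
    for attributes in commands:
        ET.SubElement(request, "system", attributes)
    return request

request = build_system_request([
    {"type": "configure", "subset": "collectionList"},
    {"type": "activate", "moduleType": "collection",
     "moduleName": "demo"},
])
xml_text = ET.tostring(request, encoding="unicode")
\end{verbatim}\end{gsc}\end{quote}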
1375
1376\subsection{'format'-type messages}\label{sec:format}
1377
Collection designers can, to a certain degree, specify how their collection looks. They can supply format statements that apply to, for example, the results of a search, the display of a document, or entries in a classification hierarchy. This information is generally service specific. All services respond to a format request by returning any service-specific formatting information. A typical request and response look like this:
1379\begin{quote}\begin{gsc}\begin{verbatim}
1380<request lang="en" to="mgppdemo/FieldQuery" type="format" />
1381
1382<response from="mgppdemo/FieldQuery" type="format">
1383 <format>
1384 <gsf:template match="documentNode"><td><gsf:link>
1385 <gsf:metadata name="Title" />(<gsf:metadata name="Source" />)
1386 </gsf:link></td>
1387 </gsf:template>
1388 </format>
1389</response>
1390\end{verbatim}\end{gsc}\end{quote}
1391
[7826]1392The actual format statements are described in Section~\ref{sec:formatstmt}. They are templates written directly in XSLT, or in GSF (GreenStone Format) which is a simple XML representation of the more complicated XSLT templates.
1393GSF-style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed to XSLT using XSLT with the config\_format.xsl stylesheet.
[6312]1394
1395\subsection{'status'-type messages}\label{sec:status}
1396
These are only used with process-type services, which are those where a request is sent to start some type of process (see Section~\ref{sec:process}). An initial 'process' request to a 'process' service generates a response which states whether the process has successfully started, and whether it is still continuing. If the process is not finished, status requests can be sent repeatedly to the service to poll the status, using the pid to identify the process. Status codes are used to identify the state of a process. The values used at the moment are listed in Table~\ref{tab:status codes}\footnote{A more standard set of codes should probably be used, for example, the HTTP codes.}.
[6312]1398
1399\begin{table}
[6908]1400\caption{Status codes currently used in \gsiii\ }
[6312]1401\label{tab:status codes}
[6422]1402{\footnotesize
[6312]1403\begin{tabular}{llp{8cm}}
[6422]1404\hline
[6312]1405\bf code name & \bf code & \bf meaning \\
1406& \bf value & \\
[6422]1407\hline
[6312]1408SUCCESS & 1 & the request was accepted, and the process was completed \\
1409ACCEPTED & 2 & the request was accepted, and the process has been started, but it is not completed yet \\
1410ERROR & 3 & there was an error and the process was stopped \\
1411CONTINUING & 10 & the process is still continuing \\
1412COMPLETED & 11 & the process has finished \\
1413HALTED & 12 & the process has stopped \\
[6335]1414INFO & 20 & just an info message that doesn't imply anything \\
[6422]1415\hline
1416\end{tabular}}
[6312]1417\end{table}
1418
The following shows an example status request, along with two responses, the first an 'OK but continuing' response, and the second a 'successfully completed' response. The content of the status elements in the two responses is the output from the process since the last status update was sent back.
[6312]1420
1421\begin{quote}\begin{gsc}\begin{verbatim}
1422<request lang="en" to="build/ImportCollection" type="status">
1423 <paramList>
1424 <param name="pid" value="2" />
1425 </paramList>
1426</request>
1427
1428<response from="build/ImportCollection">
1429 <status code="2" pid="2">Collection construction: import collection.
[10775]1430command = import.pl -collectdir /research/kjdon/home/greenstone3/web/sites/
[6312]1431 localsite/collect test1
1432starting
1433 </status>
1434</response>
1435
1436<response from="build/ImportCollection">
1437 <status code="11" pid="2">RecPlug: getting directory
[10775]1438/research/kjdon/home/greenstone3/web/sites/localsite/collect/test1/import
[6312]1439WARNING - no plugin could process /.keepme
1440
1441*********************************************
1442Import Complete
1443*********************************************
1444* 1 document was considered for processing
1445* 0 were processed and included in the collection
[10775]1446* 1 was rejected. See /research/kjdon/home/greenstone3/web/sites/
[6312]1447 localsite/collect/test1/etc/fail.log for a list of rejected documents
1448Success
1449 </status>
1450</response>
1451\end{verbatim}\end{gsc}\end{quote}
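
The polling cycle can be sketched in Python as follows. This is an illustration only: \gst{send} stands for whatever function delivers a request to the service and returns the parsed response, and is replaced here by a hypothetical stub that replays two canned responses. Treating SUCCESS, ERROR, COMPLETED and HALTED as final states is an assumption based on the meanings given in Table~\ref{tab:status codes}.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

# Code values from the status code table.
SUCCESS, ACCEPTED, ERROR = 1, 2, 3
CONTINUING, COMPLETED, HALTED, INFO = 10, 11, 12, 20
FINAL = {SUCCESS, ERROR, COMPLETED, HALTED}  # assumption

def status_request(service, pid, lang="en"):
    """Build a status request for the process with the given pid."""
    request = ET.Element(
        "request", {"lang": lang, "to": service, "type": "status"})
    params = ET.SubElement(request, "paramList")
    ET.SubElement(params, "param", {"name": "pid", "value": str(pid)})
    return request

def poll(send, service, pid):
    """Send status requests until a final code; collect the output."""
    output = []
    while True:
        status = send(status_request(service, pid)).find("status")
        output.append(status.text or "")
        code = int(status.get("code"))
        if code in FINAL:
            return code, "".join(output)

# Hypothetical stand-in for the real transport: two canned responses.
canned = iter([
    "<response><status code='2' pid='2'>starting\n</status></response>",
    "<response><status code='11' pid='2'>Success\n</status></response>",
])
def fake_send(request):
    return ET.fromstring(next(canned))

code, log = poll(fake_send, "build/ImportCollection", 2)
\end{verbatim}\end{gsc}\end{quote}

A real client would normally pause between status requests rather than polling in a tight loop.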
1452
[7826]1453\subsection{'process'-type messages}
[6312]1454
Process requests and responses provide the major functionality of the system: these are the ones that do the actual work. The format depends on the service they are for, so they are described here by service type.
[6312]1456
Query-type services currently include TextQuery, FieldQuery and AdvancedFieldQuery (GS2MGSearch, GS2MGPPSearch), and TextQuery (LuceneSearch).
The main type of request in the system is a request to a service. There are different types of services, currently: \gst{query}, \gst{browse}, \gst{retrieve}, \gst{process}, \gst{applet}, \gst{enrich}. Query services do some kind of search and return a list of document identifiers. Retrieve services can return the content of those documents, metadata about the documents, or other resources. Browse services are for browsing lists or hierarchies of documents. Process-type services are those where the request is for a command to be run. A status code is returned immediately, and then, if the command has not finished, an update of the status can be requested. Applet services are those that run an applet. Enrich services take a document and return the document with some extra markup added.
1459
1460 Other possibilities include transform, extract, accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
1461
1462The basic structure of a service 'process' request is as follows:
1463\begin{quote}\begin{gsc}\begin{verbatim}
1464
1465<request lang='en' type='process' to='demo/TextQuery'>
1466 <paramList/>
1467 other elements...
1468</request>
1469
1470\end{verbatim}\end{gsc}\end{quote}
1471
1472The parameters are name-value pairs corresponding to parameters that were specified in the service description sent in response to a describe request.
1473
1474\begin{quote}\begin{gsc}\begin{verbatim}
1475<param name='case' value='1'/>
1476<param name='maxDocs' value='34'/>
1477<param name='index' value='dtx'/>
1478\end{verbatim}\end{gsc}\end{quote}
1479
1480Some requests have other content---for document retrieval, this would be a list of document identifiers to retrieve. For metadata retrieval, the content is the list of documents to retrieve metadata for.
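
Putting these pieces together, the following Python sketch assembles a process request from a list of name-value pairs and an optional list of document identifiers. The helper name \gst{build\_process\_request} is hypothetical, and the code only builds the XML.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def build_process_request(to, params, node_ids=(), lang="en"):
    """Build a process request with a paramList and, if node_ids
    are given, a documentNodeList."""
    request = ET.Element(
        "request", {"lang": lang, "type": "process", "to": to})
    param_list = ET.SubElement(request, "paramList")
    # params is a list of pairs so that a name may be repeated.
    for name, value in params:
        ET.SubElement(
            param_list, "param", {"name": name, "value": str(value)})
    if node_ids:
        node_list = ET.SubElement(request, "documentNodeList")
        for node_id in node_ids:
            ET.SubElement(node_list, "documentNode", {"nodeID": node_id})
    return request

query = build_process_request(
    "demo/TextQuery", [("case", 1), ("maxDocs", 34), ("index", "dtx")])
names = [p.get("name") for p in query.iter("param")]
\end{verbatim}\end{gsc}\end{quote}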
1481
[6335]1482Responses vary depending on the type of request. The following sections look at the process type requests and responses for each type of service.
[6312]1483
1484\subsubsection{'query'-type services}
1485Responses to query requests contain a list of document identifiers, along with some other information, dependent on the query type. For a text query, this includes term frequency information, and some metadata about the result. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
1486
1487The following shows an example query request and its response.
1488
1489Find at most 10 Sections in the mgppdemo collection, containing the word snail (stemmed), returning the results in ranked order:
1490\begin{quote}\begin{gsc}\begin{verbatim}
1491<request lang='en' to="mgppdemo/TextQuery" type="process">
1492 <paramList>
1493 <param name="maxDocs" value="10"/>
1494 <param name="queryLevel" value="Section"/>
1495 <param name="stem" value="1"/>
1496 <param name="matchMode" value="some"/>
1497 <param name="sortBy" value="1"/>
1498 <param name="index" value="t0"/>
1499 <param name="case" value="0"/>
1500 <param name="query" value="snail"/>
1501 </paramList>
1502</request>
1503
1504<response from="mgppdemo/TextQuery" type="process">
1505 <metadataList>
1506 <metadata name="numDocsMatched" value="59" />
1507 </metadataList>
1508 <documentNodeList>
1509 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
1510 docType='hierarchy' nodeType="leaf" />
1511 <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"
1512 docType='hierarchy' nodeType="leaf" />
1513 <documentNode nodeID="HASH010f073f22033181e206d3b7.1"
1514 docType='hierarchy' nodeType="interior" />
1515 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.2.2"
1516 docType='hierarchy' nodeType="leaf" />
1517 ...
1518 </documentNodeList>
1519 <termList>
1520 <term field="" freq="454" name="snail" numDocsMatch="58" stem="3">
1521 <equivTermList>
1522 <term freq="" name="Snail" numDocsMatch="" />
1523 <term freq="" name="snail" numDocsMatch="" />
1524 <term freq="" name="Snails" numDocsMatch="" />
1525 <term freq="" name="snails" numDocsMatch="" />
1526 </equivTermList>
1527 </term>
1528 </termList>
1529</response>
1530\end{verbatim}\end{gsc}\end{quote}
1531
[6335]1532The list of document identifiers includes some information about document type and node type. Currently, document types include \gst{simple}, \gst{paged} and \gst{hierarchy}. \gst{simple} is for single section documents, i.e. ones with no sub-structure. \gst{paged} is documents that have a single list of sections, while \gst{hierarchy} type documents have a hierarchy of nested sections. For \gst{paged} and \gst{hierarchy} type documents, the node type identifies whether a section is the root of the document, an internal section, or a leaf.
[6312]1533
[6335]1534The term list identifies, for each term in the query, what its frequency in the collection is, how many documents contained that term, and a list of its equivalent terms (if stemming or casefolding was used).
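
To show how a client might consume such a response, here is a short Python sketch that extracts the match count, the node identifiers and the equivalent terms. The function name \gst{summarise\_query\_response} is hypothetical, and the input is a shortened copy of the example response.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def summarise_query_response(response):
    """Pull out the main parts of a query response element."""
    meta = {m.get("name"): m.get("value")
            for m in response.findall("metadataList/metadata")}
    nodes = [n.get("nodeID")
             for n in response.findall("documentNodeList/documentNode")]
    terms = {}
    for term in response.findall("termList/term"):
        terms[term.get("name")] = [
            t.get("name") for t in term.findall("equivTermList/term")]
    return meta, nodes, terms

response = ET.fromstring("""
<response from="mgppdemo/TextQuery" type="process">
  <metadataList>
    <metadata name="numDocsMatched" value="59" />
  </metadataList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
                  docType="hierarchy" nodeType="leaf" />
  </documentNodeList>
  <termList>
    <term field="" freq="454" name="snail" numDocsMatch="58" stem="3">
      <equivTermList>
        <term freq="" name="Snail" numDocsMatch="" />
        <term freq="" name="snails" numDocsMatch="" />
      </equivTermList>
    </term>
  </termList>
</response>""")
meta, nodes, terms = summarise_query_response(response)
\end{verbatim}\end{gsc}\end{quote}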
[6312]1535
1536\subsubsection{'browse'-type services}
1537
1538Browse type services are used for classification browsing. The request consists of a list of classifier identifiers, and some structure parameters listing what structure to retrieve.
1539
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/ClassifierBrowse" type="process">
  <paramList>
    <param name="structure" value="ancestors" />
    <param name="structure" value="children" />
  </paramList>
  <classifierNodeList>
    <classifierNode nodeID="CL1.2" />
  </classifierNodeList>
</request>

<response from="mgppdemo/ClassifierBrowse" type="process">
  <classifierNodeList>
    <classifierNode nodeID="CL1.2">
      <nodeStructure>
        <classifierNode nodeID="CL1">
          <classifierNode nodeID="CL1.2">
            <classifierNode nodeID="CL1.2.1" />
            <classifierNode nodeID="CL1.2.2" />
            <classifierNode nodeID="CL1.2.3" />
            <classifierNode nodeID="CL1.2.4" />
            <classifierNode nodeID="CL1.2.5" />
          </classifierNode>
        </classifierNode>
      </nodeStructure>
    </classifierNode>
  </classifierNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}
1569
[15209]1570Possible values for structure parameters are \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendants}. The response gives, for each identifier in the request, a \gst{<nodeStructure>} element with all the requested structure put together into a hierarchy. The structure may include classifier and document nodes.
[6312]1571
[22200]1572Structural info can also be requested in the \gst{paramList}, and will be returned in a \gst{<nodeStructureInfo>} element. (See the section on DocumentStructureRetrieve messages.) Possible values for info parameters are \gst{numSiblings}, \gst{siblingPosition}, \gst{numChildren}.
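
The nested \gst{<nodeStructure>} can be walked recursively. The Python sketch below flattens a shortened copy of the structure from the example response into (depth, identifier) pairs, which is one way an interface could indent a classifier listing. The function name \gst{flatten} is hypothetical.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def flatten(node, depth=0):
    """Yield (depth, nodeID) for a node and all nodes nested in it."""
    yield depth, node.get("nodeID")
    for child in node:
        # The structure may mix classifier and document nodes.
        if child.tag in ("classifierNode", "documentNode"):
            yield from flatten(child, depth + 1)

structure = ET.fromstring("""
<nodeStructure>
  <classifierNode nodeID="CL1">
    <classifierNode nodeID="CL1.2">
      <classifierNode nodeID="CL1.2.1" />
      <classifierNode nodeID="CL1.2.2" />
    </classifierNode>
  </classifierNode>
</nodeStructure>""")
rows = list(flatten(structure.find("classifierNode")))
\end{verbatim}\end{gsc}\end{quote}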
[6312]1573
1574\subsubsection{'retrieve'-type services}
1575
[6335]1576Retrieval services are special in that requests are not explicitly initiated by a user from a form on a web page, but are called from actions in response to other things. This means that their names are hard-coded into the Actions. DocumentContentRetrieve, DocumentStructureRetrieve and DocumentMetadataRetrieve are the standard names for retrieval services for content, structure, and metadata of documents. Requests to each of these include a list of document identifiers. Because these generally refer to parts of documents, the elements are called \gst{<documentNode>}. For the content, that is all that is required. For the metadata retrieval service, the request also needs parameters specifying what metadata is required. For structure retrieval services, requests need parameters specifying what structure or structural info is required.
[6312]1577
1578Some example requests and responses follow.
1579
1580Give me the Title metadata for these documents:
1581\begin{quote}\begin{gsc}\begin{verbatim}
1582
1583<request lang="en" to="mgppdemo/DocumentMetadataRetrieve" type="process">
1584 <paramList>
1585 <param name="metadata" value="Title" />
1586 </paramList>
1587 <documentNodeList>
1588 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"/>
1589 <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"/>
1590 <documentNode nodeID="HASH010f073f22033181e206d3b7.1"/>
1591 ...
1592 </documentNodeList>
1593</request>
1594
1595<response from="mgppdemo/DocumentMetadataRetrieve" type="process">
1596 <documentNodeList>
1597 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
1598 <metadataList>
1599 <metadata name="Title">Putting snails in your second pen</metadata>
1600 </metadataList>
1601 </documentNode>
1602 <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12">
1603 <metadataList>
1604 <metadata name="Title">Now you must decide</metadata>
1605 </metadataList>
1606 </documentNode>
1607 <documentNode nodeID="HASH010f073f22033181e206d3b7.1">
1608 <metadataList>
1609 <metadata name="Title">Introduction</metadata>
1610 </metadataList>
1611 </documentNode>
1612 </documentNodeList>
1613</response>
1614\end{verbatim}\end{gsc}\end{quote}
1615
[10880]1616One or more parameters specifying metadata may be included in a request. Also, a metadata value of \gst{all} will retrieve all the metadata for each document.
[6312]1617
1618Any browse-type service must also implement a metadata retrieval service to provide metadata for the nodes in the classification hierarchy. The name of it is the browse service name plus \gst{MetadataRetrieve}. For example, the ClassifierBrowse service described in the previous section should also have a ClassifierBrowseMetadataRetrieve service. The request and response format is exactly the same as for the DocumentMetadataRetrieve service, except that \gst{<documentNode>} elements are replaced by \gst{<classifierNode>} elements (and the corresponding list element is also changed).
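
As an illustration of this naming rule, the following Python sketch derives the metadata service address from a browse service address and builds the corresponding request. The function names are hypothetical, and \gst{classifierNodeList} is assumed to be the name of the changed list element, by analogy with the ClassifierBrowse request shown earlier.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

def metadata_service_for(browse_service):
    """Browse service name plus 'MetadataRetrieve'."""
    return browse_service + "MetadataRetrieve"

def classifier_metadata_request(browse_service, node_ids, metadata):
    """Build a metadata request for nodes of a classifier."""
    request = ET.Element("request", {
        "lang": "en", "type": "process",
        "to": metadata_service_for(browse_service)})
    params = ET.SubElement(request, "paramList")
    for name in metadata:
        ET.SubElement(
            params, "param", {"name": "metadata", "value": name})
    nodes = ET.SubElement(request, "classifierNodeList")
    for node_id in node_ids:
        ET.SubElement(nodes, "classifierNode", {"nodeID": node_id})
    return request

request = classifier_metadata_request(
    "mgppdemo/ClassifierBrowse", ["CL1.2.1", "CL1.2.2"], ["Title"])
\end{verbatim}\end{gsc}\end{quote}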
1619
1620Give me the text (content) of this document:
1621\begin{quote}\begin{gsc}\begin{verbatim}
1622<request lang="en" to="mgppdemo/DocumentContentRetrieve" type="process">
1623 <paramList />
1624 <documentNodeList>
1625 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
1626 </documentNodeList>
1627</request>
1628
1629<response from="mgppdemo/DocumentContentRetrieve" type="process">
1630 <documentNodeList>
1631 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
1632 <nodeContent>&lt;Section&gt;
1633 &lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
1634 &lt;P ALIGN=&quot;JUSTIFY&quot;&gt;190. When the plants in
1635 your second pen have grown big enough to provide food and
1636 shelter, you can put in the snails.&lt;/P&gt;
1637 </nodeContent>
1638 </documentNode>
1639 </documentNodeList>
1640</response>
1641\end{verbatim}\end{gsc}\end{quote}
1642
1643The content of a node is returned in a \gst{<nodeContent>} element. In this case it is escaped HTML.
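
A client does not normally need to undo this escaping by hand, because an XML parser does it when the response is read. The Python sketch below, which uses a shortened and purely illustrative copy of the response, shows the HTML being recovered from the \gst{<nodeContent>} element.

\begin{quote}\begin{gsc}\begin{verbatim}
import xml.etree.ElementTree as ET

response = ET.fromstring(
    '<response from="mgppdemo/DocumentContentRetrieve" type="process">'
    '<documentNodeList>'
    '<documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">'
    '<nodeContent>&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;you can put in '
    'the snails.&lt;/P&gt;</nodeContent>'
    '</documentNode></documentNodeList></response>')

# The XML parser undoes the escaping, so the text is the HTML itself.
content = {node.get("nodeID"): node.findtext("nodeContent")
           for node in response.iter("documentNode")}
\end{verbatim}\end{gsc}\end{quote}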
1644
1645Give me the ancestors and children of the specified node, along with the number of siblings it has:
1646\begin{quote}\begin{gsc}\begin{verbatim}
1647<request lang="en" to="mgppdemo/DocumentStructureRetrieve" type="process">
1648 <paramList>
1649 <param name="structure" value="ancestors" />
1650 <param name="structure" value="children" />
1651 <param name="info" value="numSiblings" />
1652 </paramList>
1653 <documentNodeList>
1654 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
1655 </documentNodeList>
1656</request>
1657
1658<response from="mgppdemo/DocumentStructureRetrieve" type="process">
1659 <documentNodeList>
1660 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
1661 <nodeStructureInfo>
1662 <info name="numSiblings" value="2" />
1663 </nodeStructureInfo>
1664 <nodeStructure>
1665 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd"
1666 docType='hierarchy' nodeType="root">
1667 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4"
1668 docType='hierarchy' nodeType="interior">
1669 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
1670 docType='hierarchy' nodeType="leaf" />
1671 </documentNode>
1672 </documentNode>
1673 </nodeStructure>
1674 </documentNode>
1675 </documentNodeList>
1676</response>
1677\end{verbatim}\end{gsc}\end{quote}
1678
[15209]1679Structure is returned inside a \gst{<nodeStructure>} element, while structural info is returned in a \gst{<nodeStructureInfo>} element. Possible values for structure parameters are as for browse services: \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendants}, \gst{entire}. Possible values for info parameters are \gst{numSiblings}, \gst{siblingPosition}, \gst{numChildren}.
[6312]1680
1681\subsubsection{'process'-type services}\label{sec:process}
1682Requests to process-type services are not requests for data---they request some action to be carried out, for example, create a new collection, or import a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a response is sent back after a successful start to the command. The status may be polled by the requester to see how the process is going.
1683
Process requests generally contain just a parameter list. As with any service, the parameters used by a process-type service can be obtained by a describe request to that service.
1685
1686Here are two example requests for process-services that are part of the build service cluster (hence the addresses all begin with 'build/'), followed by an example response:
1687
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='process' to='build/NewCollection'>
  <paramList>
    <param name='creator' value='[email protected]'/>
    <param name='collName' value='the demo collection'/>
    <param name='collShortName' value='demo'/>
  </paramList>
</request>

<request lang='en' type='process' to='build/ImportCollection'>
  <paramList>
    <param name='collection' value='demo'/>
  </paramList>
</request>

<response from="build/ImportCollection">
  <status code="2" pid="2">Starting process...</status>
</response>
\end{verbatim}\end{gsc}\end{quote}
1707
The \gst{code} attribute in the response specifies whether the command has been successfully started, whether it is still going, etc.\ (see Table~\ref{tab:status codes} for a list of currently used codes). The \gst{pid} attribute specifies a process id number that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, which were described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not.
[6312]1709
1710\subsubsection{'applet'-type services}
1711
[6335]1712Applet-type services are those that process the data for an applet. A request consists only of a list of parameters, and the response contains an \gst{<appletData>} element that contains the XML data to be returned to the applet. The format of this is entirely specific to the applet---there is no set format to the applet data.
[6312]1713
1714Here is an example request and response, used by the Phind applet:
1715\begin{quote}\begin{gsc}\begin{verbatim}
1716 <request type='query' to='mgppdemo/PhindApplet'>
1717 <paramList>
1718 <param name='pc' value='1'/>
1719 <param name='pptext' value='health'/>
1720 <param name='pfe' value='0'/>
1721 <param name='ple' value='10'/>
1722 <param name='pfd' value='0'/>
1723 <param name='pld' value='10'/>
1724 <param name='pfl' value='0'/>
1725 <param name='pll' value='10'/>
1726 </paramList>
1727 </request>
1728
1729 <response type='query' from='mgppdemo/PhindApplet'>
1730 <appletData>
1731 <phindData df='9' ef='46' id='933' lf='15' tf='296'>
1732 <expansionList end='10' length='46' start='0'>
1733 <expansion df='4' id='8880' num='0' tf='59'>
1734 <suffix> CARE</suffix>
1735 </expansion>
1736 ...
1737 </expansionList>
1738 <documentList end='10' length='9' start='0'>
1739 <document freq='78' hash='HASH4632a8a51d33c47a75c559' num='0'>
1740 <title>The Courier - N??159 - Sept- Oct 1996 Dossier Investing
1741 in People Country Reports: Mali ; Western Samoa
1742 </title>
1743 </document>
1744 ...
1745 </documentList>
1746 <thesaurusList end='10' length='15' start='0'>
1747 <thesaurus df='7' id='12387' tf='15' type='RT'>
1748 <phrase>PUBLIC HEALTH</phrase>
1749 </thesaurus>...
1750 </thesaurusList>
1751 </phindData>
1752 </appletData>
1753 </response>
1754
1755\end{verbatim}\end{gsc}\end{quote}
1756
1757\subsubsection{'enrich'-type services}
1758
Enrich services typically take some text of documents (inside \gst{<nodeContent>} tags) and return the text marked up in some way. One example of this is the GatePOSTag service: this identifies Dates, Locations, People and Organizations in the text, and annotates the text with the labels. In the following example, the request is for Locations and Dates to be identified.
[7826]1760
[6312]1761\begin{quote}\begin{gsc}\begin{verbatim}
1762<request lang="en" to="GatePOSTag" type="process">
1763 <paramList>
1764 <param name="annotationType" value="Date,Location" />
1765 </paramList>
1766 <documentNodeList>
1767 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd">
1768 <nodeContent>
1769 FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS
1770 Rome 1986
1771 P-69
1772 ISBN 92-5-102397-2
1773 FAO 1986
1774 </nodeContent>
1775 </documentNode>
1776 </documentNodeList>
1777</request>
1778
1779<response from="GatePOSTag" type="process">
1780 <documentNodeList>
1781 <documentNode nodeID="HASHac0a04dd14571c60d7fbfd">
1782 <nodeContent>
1783 FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS
1784 <annotation type="Location">Rome</annotation>
[7826]1785 <annotation type="Date">1986</annotation>
[6312]1786 P-69
1787 ISBN 92-5-102397-2
1788 FAO <annotation type="Date">1986</annotation>
1789 </nodeContent>
1790 </documentNode>
1791 </documentNodeList>
1792</response>
1793\end{verbatim}\end{gsc}\end{quote}
1794
[7826]1795\subsection{Page generation}\label{sec:pagegen}
[6312]1796
A 'page' is some XML or HTML (or other) data returned in response to an
external 'page'-type request. These requests originate from outside \gs, for example from a servlet or Java application, and are received by the Receptionist. As described in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\ URLs. One of the arguments is the action (\gst{a}), which tells the Receptionist which Action module to pass the request to.
[6312]1799
[7826]1800Action modules decode the rest of the arguments to determine what requests need to be made to the system. One or more internal requests may be made to the MessageRouter. A request for format information from the Collection/Service may also be made. The resulting data is gathered together into a single XML response, \gst{<page>}, and returned to the Receptionist.
[6312]1801
[7826]1802The page format is described in Section~\ref{sec:page-format}. The XML may be returned as is, or may be modified by the Receptionist. The various Receptionists are described in Section~\ref{sec:recepts}. The default receptionist used by a servlet transforms the XML into HTML using XSL stylesheets. Section~\ref{sec:collformat} looks at collection specific formatting, in particular for HTML output.
1803Sections~\ref{sec:pageaction} to \ref{sec:systemaction} look at the various actions and what kind of data they gather.
[6312]1804
[7826]1805\subsubsection{'page'-type requests and their arguments}\label{sec:page-requests}
[6312]1806
[7826]1807These are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a \gs\ URL. The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
1808
1809Here are some examples of requests\footnote{In a servlet context, these correspond to the arguments \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:
1810
\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='p' subaction='about'
         lang='fr' output='html'>
  <paramList>
    <param name='c' value='demo'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='q' lang='en' output='html'>
  <paramList>
    <param name='s' value='TextQuery'/>
    <param name='c' value='demo'/>
    <param name='rt' value='r'/>
    <!-- the rest are the service specific params -->
    <param name='ca' value='0'/> <!-- casefold -->
    <param name='st' value='1'/> <!-- stem -->
    <param name='m' value='10'/> <!-- maxdocs -->
    <param name='q' value='snail'/> <!-- query string -->
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}

There are some standard arguments used in \gs, and they are described in Table~\ref{tab:args}. These are used by Receptionists and Actions. The GSParams class specifies all the general basic arguments, and whether they should be saved or not. (Some arguments need to be saved during a session, and this needs to be implemented outside \gs\ proper; currently we do this in the servlet, using servlet session handling.) The servlet has an init parameter \gst{params\_class} which specifies which params class to use: GSParams can be subclassed if necessary. The Receptionist and Actions must not have conflicting argument names.

Other arguments are used dynamically and come from the Services. Service arguments must always be saved during a session. Services may be created by different people, and may reside on a different site, so there is no guarantee that argument names will not conflict between services and actions. Therefore service parameters are namespaced when they are put on the page, whereas interface (receptionist and action) parameters have no namespace. The default namespace is s1 (service1): any parameters that are for the service will be prefixed by this. For example, the case parameter for a search will be put in the page as s1.case, and the resulting argument in a search URL will be s1.case. When actions are deciding which parameters need to be sent in a request to a service, they can use the namespace information.

If there are two or more services combined on a page with a single submit button, they will use namespaces s1, s2, s3 etc. as needed. The s (service) parameter will end up with a list of services, for example \gst{s=TextQuery,MusicQuery}. The order of these determines the mapping of the namespaces, i.e. s1 will map to TextQuery and s2 to MusicQuery.

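As an illustration of the namespacing described above, a combined search over two services might produce a page request like the following. This is only a sketch: the \gst{case} parameter name follows the example in the text, while the assumption that MusicQuery also offers a \gst{case} parameter, and the values shown, are hypothetical.

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='q' lang='en' output='html'>
  <paramList>
    <param name='s' value='TextQuery,MusicQuery'/>
    <param name='c' value='demo'/>
    <param name='rt' value='r'/>
    <!-- s1 maps to TextQuery, the first service listed in s -->
    <param name='s1.case' value='1'/>
    <!-- s2 maps to MusicQuery (hypothetical parameter) -->
    <param name='s2.case' value='0'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
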
\begin{table}
{\footnotesize
\begin{tabular}{lll}
\hline
\bf Argument & \bf Meaning &\bf Typical values \\
\hline
a & action & a (applet), q (query), b (browse), p (page), pr (process), \\
& & s (system)\\
sa & subaction & home, about (page action)\\
c & collection or & demo, build \\
& service cluster \\
s & service name & TextQuery, ImportCollection \\
rt & request type & d (display), r (request), s (status) \\
ro & response only & 0 or 1: if set to 1, the request is carried out \\
& & but no processing of the results is done; \\
& & currently only used in process actions \\
o & output type & XML, HTML, WML \\
l & language & en, fr, zh ...\\
d & document id & HASHxxx \\
r & resource id & ???\\
pid & process handle & an integer identifying a particular process request \\
\hline
\end{tabular}}
\caption{Generic arguments that can appear in a \gs\ URL}
\label{tab:args}
\end{table}

\subsubsection{Page format}\label{sec:page-format}

The basic page format is:
\begin{quote}\begin{gsc}\begin{verbatim}
<page lang='en'>
  <pageRequest/>
  <pageResponse/>
</page>
\end{verbatim}\end{gsc}\end{quote}

[TODO: show configuration and describe what it is used for]

There are two main elements in the page: pageRequest and pageResponse. The pageRequest is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form. The pageResponse contains all the data that has been gathered from the system by the action. Two other elements, not shown above, contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (e.g. library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.

The following subsections outline, for each action, what data is needed and what requests are generated to send to the system.

Once the XML page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (i.e. where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (i.e. where their transforms and images are located). The default XSLT files are
located in interfaces/default/transforms. Collections, sites and other interfaces
can override these files by having their own copy of the appropriate
files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is: collection, site, current
interface, default interface.\footnote{This currently breaks down for remote sites and needs some rethinking.}
[TODO: describe a bit more?? currently this can only be obtained locally]

\subsubsection{Receptionists}\label{sec:recepts}

The receptionist is the controlling module for the page generation part of \gs. It has the job of loading up all the actions, and it knows which message router it and the actions are supposed to talk to. It routes each message it receives to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example, adding to the page received back from the action any information that is common to all pages.

There are different ways of providing an interface to \gs, from web-based CGI style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.

Receptionist: This is the most basic receptionist. The page it returns consists of the original request, and the response from the action it was sent to. The methods preProcessRequest and postProcessPage are called on the request and page, respectively, but in this basic receptionist they don't do anything.

TransformingReceptionist: This extends Receptionist, and overrides postProcessPage to transform the page using XSLT. An XSLT file is listed for each action in the receptionist's configuration file. First, some display and configuration information is added to the page. Then the page is transformed using the XSLT specified for the action, and returned.

WebReceptionist: This extends TransformingReceptionist. It doesn't do much else except some argument conversion. To keep the URLs short, parameters from the services are given short names, and these are used in the web pages.

DefaultReceptionist: This extends WebReceptionist, and is the default one for \gsiii\ servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. The receptionist sends a describe request to the collection to get this, and appends it to the page before transformation using XSLT.

By default, the LibraryServlet uses DefaultReceptionist. However, there is a servlet init-param called \gst{receptionist} which can be set to make the servlet use a different one.

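In a standard servlet deployment descriptor (\gst{web.xml}), such an init-param is written inside the \gst{<servlet>} element. The fragment below is a minimal sketch, assuming the value is the receptionist's class name; whether a short or a fully qualified name is expected is not specified here, so the value shown is illustrative only.

\begin{quote}\begin{gsc}\begin{verbatim}
<!-- inside the <servlet> element for the library servlet -->
<init-param>
  <param-name>receptionist</param-name>
  <!-- assumed value: name of the receptionist class to use -->
  <param-value>TransformingReceptionist</param-value>
</init-param>
\end{verbatim}\end{gsc}\end{quote}
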
\subsubsection{Collection specific formatting}\label{sec:collformat}

Format information is retrieved from the collection, and its \gs\ format (gsf) elements are transformed into XSL. The resulting stylesheet is then used to transform the page XML into HTML. Configuration parameters are passed in to the transformation.

\subsubsection{CGI arguments}

\subsubsection{Page action}\label{sec:pageaction}

PageAction is responsible for displaying various kinds of information pages, such as the home page of the library, the home page of a collection, or the help and preferences pages. These pages are not associated with specific services like the other page types. In general, the data comes from describe requests to various modules.
The different pages are requested using the subaction argument. For the 'home' page, a \gst{describe} request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a \gst{describe} request. This metadata is added into the previous result, which is then added into the page. For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata
and a list of services.

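The requests behind the 'home' page might look roughly as follows. This is a sketch under two assumptions: that describe requests use the same \gst{<request>} attributes as the process request shown for GatePOSTag, and that an empty \gst{to} attribute addresses the MessageRouter itself.

\begin{quote}\begin{gsc}\begin{verbatim}
<!-- to the MessageRouter: list the collections, services,
     serviceClusters and sites it knows about -->
<request lang='en' to='' type='describe'/>

<!-- then, for each collection returned (here demo),
     ask the collection for its metadata -->
<request lang='en' to='demo' type='describe'/>
\end{verbatim}\end{gsc}\end{quote}
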
To embed an external HTML page in a \gs\ collection, i.e. a two-frame page with the top frame containing the collection header and navigation bar and the second frame containing the external page, use the subaction \gst{html}.
A URL would look like
\gst{a=p\&sa=html\&c=collname\&url=externalurl}.

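Following the pattern of the examples in Section~\ref{sec:page-requests}, the XML form of such a request would presumably be along these lines. The \gst{lang} and \gst{output} values are assumed for illustration, and \gst{collname} and \gst{externalurl} are the placeholders from the URL above.

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='p' subaction='html'
         lang='en' output='html'>
  <paramList>
    <param name='c' value='collname'/>
    <param name='url' value='externalurl'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
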
\subsubsection{Query action}\label{sec:queryaction}

The basic URL is \gst{a=q\&s=TextQuery\&c=demo\&rt=d/r}.
Three query services have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by the query action.
For each page, the service description is requested from the service of the current collection (via a describe request). This is currently done every time the query page is
displayed, but should be cached. The description includes a list of the parameters available for the query, such as case/stem, maximum number of documents to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request is sent to the query service (e.g. TextQuery). This has all the parameters from the URL put into the parameter list. A list of document identifiers
is returned. A followup request is sent to the MetadataRetrieve service of the collection: the content includes the list of
documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the XSLT that will be used to transform the page. The service description and query result are combined into a page of XML, which is returned to the Receptionist.

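The two internal requests made when the form is submitted can be sketched as follows. The addresses use the collection-name/service-name form, and the element layout follows the GatePOSTag example; the parameter names (\gst{query}, \gst{maxDocs}, \gst{metadata}) are hypothetical stand-ins rather than the names the services actually declare.

\begin{quote}\begin{gsc}\begin{verbatim}
<!-- 1. the query itself; returns a list of document ids -->
<request lang='en' to='demo/TextQuery' type='process'>
  <paramList>
    <param name='query' value='snail'/>   <!-- hypothetical name -->
    <param name='maxDocs' value='10'/>    <!-- hypothetical name -->
  </paramList>
</request>

<!-- 2. followup: metadata for the documents that were returned -->
<request lang='en' to='demo/MetadataRetrieve' type='process'>
  <paramList>
    <param name='metadata' value='Title'/> <!-- hypothetical name -->
  </paramList>
  <documentNodeList>
    <documentNode nodeID='HASHac0a04dd14571c60d7fbfd'/>
  </documentNodeList>
</request>
\end{verbatim}\end{gsc}\end{quote}
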
\subsubsection{Applet action}\label{sec:appletaction}

There are two types of request to the applet action: \gst{a=a \& rt=d\/} and
\gst{a=a \& rt=r\/}. The value \gst{rt=d\/} means ``display the applet.'' A
\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. The transformation file \gst{applet.xsl} embeds this
into the page, and the servlet returns the HTML.

The value \gst{rt=r} signals a request from the applet. A process request containing all the parameters is sent to the applet service. The result contains an appletData element, which contains a single element; this element is returned
directly to the applet, in XML. No transformation is done.
Because the AppletAction neither knows nor cares about the applet data, it can work with any applet-service pair.

Note that the applet HTML may need to know the name of the \gst{library}
program. However, that name is chosen by the person who installed the software
and will not necessarily be ``library''. To get around this, the applet can
put a parameter called ``library'' into the applet data with an empty value:
\begin{quote}\begin{gsc}\begin{verbatim}
<PARAM NAME='library' VALUE=''/>
\end{verbatim}\end{gsc}\end{quote}
When the AppletAction encounters this parameter it inserts the name of the
current library servlet as its value.

\subsubsection{Document action}\label{sec:documentaction}

DocumentAction is responsible for displaying a document to the user. The display might involve some metadata and/or text for a document or part of a document. For hierarchical documents, a table of contents may be shown, while for paged documents (those with a single linear list of sections), next and previous page buttons may be shown. These different display types require different information about the document. Depending on the arguments, DocumentAction will send requests to several services: DocumentMetadataRetrieve, DocumentStructureRetrieve and DocumentContentRetrieve.

A basic display, for example, Title and text, involves a metadata request to get the Title, and a content request to get the text. A hierarchical table of contents display requires a structure request. If the entire table of contents is to be displayed, the parameter \gst{structure=entire} would be sent in the request. Otherwise, parameters \gst{structure=ancestors}, \gst{structure=children} and possibly \gst{structure=siblings} may be used, depending on the position of the current node in the document. These return a hierarchical structure of nodes, containing ancestor nodes, child nodes and sibling nodes, respectively.
For paged display, the structure is not actually needed. A structure request is still sent, but this time it requests some information about the structure, rather than the structure itself. The information requested includes the number of siblings and the current position of the current node, or the number of children (if the current node is the root of the document).

Metadata, and also content, may be requested for the current node or for any nodes in the structure. The metadata and content are added into the appropriate nodes in the structure hierarchy, and this is returned as the page data.

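A structure request for a partial table of contents might be sketched like this. The \gst{structure} parameter and its values are those named above; the request layout is assumed to follow the GatePOSTag example, and the node identifier (a section of the sample document) is hypothetical.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' to='demo/DocumentStructureRetrieve'
         type='process'>
  <paramList>
    <param name='structure' value='ancestors'/>
    <param name='structure' value='children'/>
  </paramList>
  <documentNodeList>
    <!-- the current node; this id is made up for illustration -->
    <documentNode nodeID='HASHac0a04dd14571c60d7fbfd.2'/>
  </documentNodeList>
</request>
\end{verbatim}\end{gsc}\end{quote}
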
\subsubsection{XML Document action}\label{sec:xmldocumentaction}

XMLDocumentAction is a little different to the standard DocumentAction. It operates in two modes, \gst{text} and \gst{toc}. In \gst{text} mode, it will retrieve the content of the current document node using a DocumentContentRetrieve request. In \gst{toc} mode, it retrieves the entire table of contents for the document using a DocumentStructureRetrieve request. Either mode may also retrieve metadata for the current section or each section in the table of contents.

\subsubsection{GS2Browse action}\label{sec:browseaction}

GS2BrowseAction is for displaying \gsii\ style classifiers.

\subsubsection{System action}\label{sec:systemaction}

SystemAction allows for manual reconfiguration of various components at run-time. There is no interactive web page displaying the options; it merely turns a set of CGI arguments into an XML system request. The response from a system request is a message which is displayed to the user. The arguments are listed in Table~\ref{tab:system-cgi}.

\begin{table}
\caption{Configure CGI arguments}
\label{tab:system-cgi}
{\footnotesize
\begin{tabular}{ll}
\hline
\bf arg & \bf description\\
\hline
a=s & system action\\
sa=c$|$a$|$d & type of system request: c (configure), a (add/activate), \\
& d (delete/deactivate) \\
c=demo & the request will go to this collection/servicecluster \\
& instead of the message router\\
ss=collectionList & subset for configure: only reconfigure this part.\\
& For the MessageRouter, can be serviceClusterList, serviceList, \\
& collectionList, siteList.\\
& For a collection/cluster, can be metadataList or serviceList.\\
sn=demo & \\
st=collection& \\
\hline
\end{tabular}}
\end{table}

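For example, asking the demo collection to reconfigure only its metadata would use the arguments \gst{a=s\&sa=c\&c=demo\&ss=metadataList}. Written as a page request in the style of Section~\ref{sec:page-requests}, this would presumably look as follows; the \gst{lang} and \gst{output} values are assumed for illustration.

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='s' subaction='c'
         lang='en' output='html'>
  <paramList>
    <!-- send the request to this collection, not the MessageRouter -->
    <param name='c' value='demo'/>
    <!-- only reconfigure this subset -->
    <param name='ss' value='metadataList'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
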
\subsection{Other code information}

\gs\ has a set of utility classes, which are briefly described in Table~\ref{tab:utils}.

\begin{table}[h]
\caption{The utility classes in org.greenstone.gsdl3.util}
\label{tab:utils}
{\footnotesize
\begin{tabular}{lp{3.75in}}
\hline
\bf Utility class & \bf Description\\
\hline
CollectionClassLoader & ClassLoader that knows about a collection's resource directory \\
DBInfo & Class to hold info from a GDBM database entry \\
Dictionary & Wrapper around a Resource Bundle, providing strings with parameters\\
GDBMWrapper & Wrapper for a GDBM database. Uses JavaGDBM\\
GSConstants & Holds some constants used for servlet arguments and configuration variables\\
GSEntityResolver & An EntityResolver which can be used to find resources such as DTDs\\
GSFile & Class to create all \gs\ file paths, e.g. used to locate configuration files, XSLT files and collection data\\
GSHTML & Provides convenience methods for dealing with HTML, e.g. making strings HTML safe\\
GSParams & Contains names and default values for interface parameters\\
GS2Params & A subclass of GSParams which holds default service parameters too, necessary for the gs2 style interface\\
GSPath & Used to create, examine and modify message address paths\\
GSStatus & Some static codes for status messages\\
GSXML & Lots of methods for extracting information out of \gs\ XML, and creating some common types of elements. Also has static Strings for element and attribute names used by \gs\\
GSXSLT & Some manipulation functions for \gs\ XSLT\\
GlobalProperties & Holds the global properties (from global.properties) \\
MacroResolver & Used with replace elements in collection configuration files; replaces a macro or string with another string, metadata or text from a dictionary\\
GS2MacroResolver & MacroResolver for GS2 collections, which uses the GDBM database\\
Misc & Miscellaneous functions\\
MyNodeList & A simple implementation of an XML NodeList\\
OID & Class to handle \gs\ (2) OIDs\\
Processing & Runs an external process and prints the output from the process \\
SQLQuery & Contains a connection to a SQL database, along with some methods for accessing the data, such as converting MG numbers to and from \gs\ OIDs\\
XMLConverter & Provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
XMLTransformer & Methods to transform XML using XSLT \\
XSLTUtil & Contains static methods to be called from within XSLT \\
\hline
\end{tabular}}
\end{table}

\newpage
\section{Developing \gsiii\ : Adding new features}\label{sec:new-features}

[TODO: finish this section ]

\subsection{Creating and using new services}\label{sec:new-services}

There are three parts to adding new services to \gsiii: defining the new service, specifying that it should be loaded, and using it. If you are talking to \gs\ using the SOAP interface, then the first two parts are all that need to be done. If you are using the \gs\ servlet interface, then you may need to do work for the third part, depending on what kind of new service it is.
If you are adding a service of a type that is already present, for example, a new query service, then the query action can just use your new service as is (assuming it is set up in the same way as the standard query services).
However, if it is a new type of service that the interface and actions don't know about, you will need to add a new action or modify an existing one so that your service is actually used.

\subsubsection{Creating the service}

You will need to write a new Java class which inherits from \gst{org.greenstone.gsdl3.service.ServiceRack} (or a subclass of this). The class will need to implement at least the \gst{configure}, \gst{process<ServiceName>} and \gst{getServiceDescription} methods. There is a dummy class called \gst{MyNewServicesTemplate.java} in \gst{greenstone3/resources/java} which describes these methods and what needs to be done.

\gst{ServiceRack.java} handles the main \gst{process} method. If the request type is 'describe', then it will send back a copy of short\_service\_info, which contains a list of services. If the request type is 'describe' but the request is for a particular service, then it will call \gst{getServiceDescription} for that service. For a format request, it will send any format element found in format\_info\_map for that service. For a processing request to a service, the \gst{process<ServiceName>} method will be called.

Once the class is written, it needs to be compiled and either included in one of the existing jar files, or added as a jar file to \gst{greenstone3/web/WEB-INF/lib} or as a class file to \gst{greenstone3/web/WEB-INF/classes}.

\subsubsection{Loading the service}

To have the library load your new service, it needs to be specified in a configuration file. For a collection service, add a new \gst{<serviceRack>} element to the collection's \gst{buildConfig.xml} file. This element should contain any information that the class needs to configure its service(s). For a site-wide service, add the \gst{<serviceRack>} element to the site's \gst{siteConfig.xml} file, either in the \gst{serviceRackList} or as part of a \gst{serviceCluster}.

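As a rough sketch of such an entry, assuming the class is identified by a \gst{name} attribute (an assumption; check an existing configuration file for the exact form) and using the hypothetical class name \gst{MyNewServices}:

\begin{quote}\begin{gsc}\begin{verbatim}
<serviceRackList>
  <!-- 'name' is assumed to give the ServiceRack class to load -->
  <serviceRack name="MyNewServices">
    <!-- any information the class needs to configure
         its service(s) goes here -->
  </serviceRack>
</serviceRackList>
\end{verbatim}\end{gsc}\end{quote}
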
\subsubsection{Using the service}

If you are using the SOAP web service, then you can send an XML request directly to the service. The 'address' of the request will be the service name if it is a site-wide service, cluster-name/service-name if it is site-wide but belonging to a cluster, or collection-name/service-name if it belongs to a collection. You will need to know the format of the XML request and response that the service expects and returns.

If you want to access your new service through the current servlet interface that uses actions, then whether you need to do more work or not depends on what kind of service you have implemented. If you have written a new query or browse service, for example, that has the same request and response format as the existing services, then you don't need to do anything else. Your collection can just use the new query service straight away.
If the service is of an existing type, but needs something different in the request/response format, then you may need to modify an existing action to supply or use the new information.
If the service is of a completely new type, then you will probably need a new action to talk to the service and display the results.

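The three forms of address described above for direct requests can be illustrated as follows. The service name \gst{MyNewService} and the cluster name \gst{mycluster} are hypothetical, and the request attributes are assumed to follow the GatePOSTag example; the body of each request depends on what the service expects.

\begin{quote}\begin{gsc}\begin{verbatim}
<!-- site-wide service -->
<request lang='en' to='MyNewService' type='process'/>

<!-- site-wide service belonging to a cluster -->
<request lang='en' to='mycluster/MyNewService' type='process'/>

<!-- service belonging to the demo collection -->
<request lang='en' to='demo/MyNewService' type='process'/>
\end{verbatim}\end{gsc}\end{quote}
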
\subsection{Creating new actions/pages}\label{sec:new-pages}

\subsection{New interfaces}\label{sec:new-interfaces}

It is easy to create new interfaces to \gsiii. Here we are talking about interfaces other than those displayed in a typical web browser.

Handheld devices: Use the standard servlet setup, but with a different set of XSLT files to format the pages for small screens, or use WML.

Java GUI interface: There are a couple of alternatives. Depending on what you want to display in the GUI, you could talk to either a Receptionist or a MessageRouter. The library classes can be set up and compiled into the GUI program.
Talking to a Receptionist will give you access to pages of XML. It is likely that the standard Receptionist class would be used, since this doesn't transform the data to HTML. Queries such as ``give me the home page of a collection'' and ``do the following search'' can be issued. All the data needed for the result view is returned. Queries are quite simple, but are limited to the kinds of Actions that are available in the library.
Talking to a MessageRouter requires a bit more effort on the part of the GUI program, but results in greater flexibility. The kinds of queries that can be issued are individual units of action, such as ``describe yourself'', ``search'', ``retrieve the content for this document''. More than one request may need to be made for a particular feature of the GUI. However, you can ask for any combination of data available in the system, since you are not relying on Actions. What you implement, though, may be a lot like the Action code in terms of request sequences.

Interfaces in other programming languages: Because the communication is all XML based, other interfaces can talk to the Java library if a communication protocol is set up. This could be done using SOAP, for example. As for Java GUI interfaces, the program could talk to a Receptionist or to a MessageRouter.
In summary: a handheld interface can use the servlet and a transforming receptionist with a new set of XSLT files; a Java or other program can talk to a receptionist and just get back XML data for pages; and a Java GUI can talk directly to the MessageRouter and do all the processing itself.

Remote interfaces: Remote interfaces can be set up in the same way as above, using a communication protocol between the interface and the library program.

\subsection{New types of collections}\label{sec:new-coll-types}

The standard type of collection is built with the \gsii\ Perl collection building system. There are many options to this, but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\ has the ability to use any type of collection you can come up with, assuming some Java code is provided.

There are four levels of customization that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\ to describe these different levels.

Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, MGPP collections were the first to be served in \gsiii. When we came to do MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar; they just specified MG serviceRack classes rather than MGPP ones.

The XML Sample Texts (gberg) collection, however, was done quite differently to the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (org.greenstone.gsdl3.collection.XMLCollection). This class is basically the same as a standard collection class except that it looks for and stores in memory the documentList from the collectionConfig file.

To tell \gs\ to load up a different type of collection class, we use another configuration file: \gst{etc/collectionInit.xml}. This specifies the name of the collection class to use.
Currently, this is all that is specified in that file, but you may want to add parameters for the class etc.

\gst{<collectionInit class="XMLCollection"/>}

The display for the collection is also quite different. The home page for the collection displays the list of documents. To achieve this, the describe response from the collection had to include the list, and a new XSLT was written for the collection that displayed this. Collection XSLT should be put in the transform directory of the collection\footnote{These are currently only used when running \gs\ in a non-distributed fashion, but proper support will be added at some stage.}.

Document display is significantly different to standard \gs. There are two modes of display: table of contents mode, and content mode. Clicking on a document link from the collection home page takes the user to the table of contents for the document. Clicking on one of the sections in the table of contents takes them to a display of that section. To facilitate this, we needed not only new XSLT files but also a new action. XMLDocumentAction was created, which uses two subactions, toc and text, for the different modes of display.

The Receptionist was told about this new action by the addition of the following element to the interfaceConfig.xml file:

\begin{gsc}\begin{verbatim}
<action name='xd' class='XMLDocumentAction'>
  <subaction name='toc' xslt='document-toc.xsl'/>
  <subaction name='text' xslt='document-content.xsl'/>
</action>
\end{verbatim}\end{gsc}

XSLT files are linked to subactions rather than the action as a whole. The collection supplies the two XSLT files written appropriately for the data it contains.

All links that link to the documents have to be changed to use the xd action rather than the standard d action. These include the links from the home page, and the links from query results.

Querying of the collection is almost the same as usual. The query service provides a list of parameters, does the query and then sends back a list of document identifiers. The standard query action was fine for this collection. The change occurs in the way that the results are displayed---this is accomplished using a format statement supplied in the collectionConfig file inside the search node.

\begin{gsc}\begin{verbatim}
<search>
  <format>
    <gsf:template match="documentNode">
      <xsl:param name="collName"/>
      <xsl:param name="serviceName"/>
      <td>
        <b><a href="{$library_name}?a=xd&amp;sa=text&amp;c={$collName}&amp;
          d={@nodeID}&amp;p.a=q&amp;p.s={$serviceName}">
          <xsl:choose>
            <xsl:when test="metadataList/metadata[@name='Title']">
              <gsf:metadata name="Title"/>
            </xsl:when>
            <xsl:otherwise>(section)</xsl:otherwise>
          </xsl:choose>
        </a>
        </b> from <b><a href="{$library_name}?a=xd&amp;sa=toc&amp;
          c={$collName}&amp;d={@nodeID}.rt&amp;p.a=q&amp;p.s={$serviceName}">
          <gsf:metadata name="Title" select="root"/></a></b>
      </td>
    </gsf:template>
  </format>
</search>
\end{verbatim}\end{gsc}
2149
Instead of displaying an icon and the Title, it displays the title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these links require non-standard arguments to the library, these parts of the template are written in XSLT rather than the \gs\ format language. As shown here, it is perfectly feasible to write a format statement that mixes XSLT with \gs\ format elements.

The document display uses CSS files to format the output. These are kept in the collection and specified in the collection's XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder; the other is to put the DTDs in \gs's DTD folder, \gst{\$GSDL3SRCHOME/resources/dtd}.
[6422]2153
\subsection{The gs2 Interface}

The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror of \gst{http://www.nzdl.org}: it aims to present the same collections in the same way, but using \gsiii\ instead of \gsii. It uses a new site (nzdl) with a new interface (nzdl), which is based on the gs2 interface. A new servlet entry was added to the \gst{web.xml} file to specify the combination of nzdl site and nzdl interface.
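
As a rough sketch, such a servlet entry might look like the following. The \gst{site\_name} parameter is the one referred to elsewhere in this manual; the servlet name, the servlet class, the \gst{interface\_name} parameter and the URL pattern are assumptions made for illustration, so check them against the entries already present in your \gst{web.xml}.

\begin{gsc}\begin{verbatim}
<!-- Sketch only: names other than site_name are assumed -->
<servlet>
  <servlet-name>nzdl</servlet-name>
  <servlet-class>org.greenstone.gsdl3.LibraryServlet</servlet-class>
  <init-param>
    <param-name>site_name</param-name>
    <param-value>nzdl</param-value>
  </init-param>
  <init-param>
    <param-name>interface_name</param-name>
    <param-value>nzdl</param-value>
  </init-param>
</servlet>

<servlet-mapping>
  <servlet-name>nzdl</servlet-name>
  <url-pattern>/nzdl/*</url-pattern>
</servlet-mapping>
\end{verbatim}\end{gsc}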
[6422]2157
The site was created by making a directory called \gst{nzdl} in the \gst{sites} folder. A siteConfig file was created. Because it is running on Linux, we were able to link to all the collections in the old \gs\ installation. The \gst{convert\_coll\_from\_gs2.pl} script was run over all the collections to produce the new XML configuration files.
[6422]2159
The gs2 interface was created to be used by this site (and is now a standard part of \gs).
In many cases, creating a new interface just requires the new images and XSLT files to be added to the new directory (see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). The gs2 interface required a bit more customization.
[6422]2162
The standard \gsiii\ navigation bar lists all the services available for the collection. In \gsii, the navigation bar provides the search option and the different classifiers. This is not service specific, but hard coded to search and the classifiers. The XSLT that produces the navigation bar needed to be altered to produce this.
The standard receptionist (DefaultReceptionist) gathers a little extra information for each page of XML before transforming it: the list of services for the collection and their display information, which allows the services to be listed along the navigation bar. This information is needed by every page (except for the library home page) and is therefore obtained by the receptionist instead of by each action. The nzdl interface uses the classifier list that comes in the ClassifierBrowse service description to display the list of classifiers.
[6422]2165
The nzdl interface extends the gs2 interface to provide a different-looking home page and an extra static `gsdl' page.
[6422]2167
\newpage
\section{Distributed \gs\ }\label{sec:distributed}

\gs\ is designed to run in a distributed fashion. One \gs\ installation can talk to several sites on different computers. This requires some sort of communication protocol. Any protocol can be used; currently we have a simple SOAP protocol.

% TODO: more explanation
2174
\begin{figure}[h]
  \centering
  \includegraphics[width=4in]{remote} %5.8
  \caption{A distributed digital library configuration running over several servers}
  \label{fig:remote}
\end{figure}
2181
We use the Apache Axis SOAP implementation, which runs as a servlet in Tomcat. Axis is set up during the installation of \gs. For more details about SOAP in \gs, see Appendix~\ref{app:soap}. Debugging SOAP is described in Appendix~\ref{app:soap-debug}.
[6312]2183
\subsection{Serving a site using SOAP}

A web service for localsite comes with \gs. However, it is not deployed by default. To deploy it, run \gst{ant deploy-localsite}. If you want to set up web services for other sites, run \gst{ant soap-deploy-site}. This will prompt you for the site name (its directory name) and a site URI, which is a unique identifier for the web service. Tomcat needs to be running for this to work, and you need to have installed the \gs\ source code.

The ant target deploys the service for the site specified. A resource file (\gst{<sitename>.wsdd}) is created, which is used to specify the service. It can be found in \gst{\$GSDL3HOME/resources/soap}, and is generated from \gst{site.wsdd.template}.

The address of the new SOAP service will be \gst{tomcatserver-address/greenstone3/services/sitename}, for example, \gst{www.greenstone.org/greenstone3/services/localsite}.
[6930]2191
\subsection{Connecting to a site web service}

There are two ways to use a remote site. The first applies if you have a local site running: that site can also connect to other remote sites. In its \gst{siteConfig.xml} file, you need to add a \gst{<site>} element to the \gst{<siteList>} element.

For example, to get siteA to talk to siteB, you need to deploy a SOAP server on siteB, then add a \gst{<site>} element to the \gst{<siteList>} of siteA's \gst{siteConfig.xml} file (\gst{\$GSDL3HOME/sites/siteA/siteConfig.xml}).
[13281]2197
In the \gst{<siteList>} element, add the following (substituting the chosen site URI for \gst{siteBuri}):

\begin{gsc}\begin{verbatim}
<site name="siteBuri"
  address="http://localhost:8080/greenstone3/services/siteBuri"
  type="soap"/>
\end{verbatim}\end{gsc}
2205
(Note that \gst{localhost} and \gst{8080} should be changed to the values you entered when installing \gsiii. \gst{localhost} will only work for servers on the same machine.)

If you have changed the \gst{siteConfig.xml} file for a site that is running, the site will need to be reconfigured. Either restart Tomcat, or reconfigure through a URL,
e.g.\ \gst{http://localhost:8080/greenstone3/library?a=s\&sa=c}.
Several sites can be connected to in this manner.
2211
The second option applies if you have a receptionist set up on a machine where you have no site, and you only want to connect to a single remote site. Instead of using \gst{site\_name} in the servlet initialisation parameters (in \gst{\$GSDL3HOME/WEB-INF/web.xml}), you can specify \gst{remote\_site\_name}, \gst{remote\_site\_type} and \gst{remote\_site\_address}. A communicator object will be set up instead of a MessageRouter, and the receptionist will talk to the communicator.
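
A minimal sketch of the corresponding initialisation parameters is shown below. The three parameter names are those given above; the values reuse the hypothetical \gst{siteBuri} service from the earlier example, and the surrounding \gst{<servlet>} element is omitted because its other contents depend on your installation.

\begin{gsc}\begin{verbatim}
<!-- Sketch only: replaces the site_name init-param of the servlet -->
<init-param>
  <param-name>remote_site_name</param-name>
  <param-value>siteBuri</param-value>
</init-param>
<init-param>
  <param-name>remote_site_type</param-name>
  <param-value>soap</param-value>
</init-param>
<init-param>
  <param-name>remote_site_address</param-name>
  <param-value>
    http://localhost:8080/greenstone3/services/siteBuri
  </param-value>
</init-param>
\end{verbatim}\end{gsc}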
[13281]2213
\appendix

\newpage
\section{Using \gsiii\ from CVS}\label{app:cvs}

\gsiii\ is also available via CVS, which lets you download the latest version of the code. This is not guaranteed to be stable; in fact, it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes.

Note that you will need the Java 2 SDK, version 1.4.0 or higher, and Ant (Apache's Java-based build tool, \gst{http://ant.apache.org}) installed.
[6499]2222
To check out the \gs\ code, use:

\begin{quote}\begin{gsc}\begin{verbatim}
cvs -d :pserver:cvs\[email protected]:2402/usr/local/global-cvs/gsdl-src co -P greenstone3
\end{verbatim}\end{gsc}\end{quote}
2229
If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some older versions of CVS have trouble accessing this repository because the port number is present. We are using version 1.11.1p1.

\gs\ is built and installed using Ant. You will need a Java Development
Environment (1.4 or higher) and Ant installed to use \gs. You can download Ant from \\\gst{http://ant.apache.org/bindownload.cgi}. Make sure that the environment variables \gst{JAVA\_HOME} and \gst{ANT\_HOME} are set.

In the \gst{greenstone3} directory, you can run \gst{ant}, which will give you a help message.
Running \gst{ant -projecthelp} gives a list of the targets that you can run; these
do various things, such as compiling the source code and starting up the server.
[6422]2239
The \gst{README.txt} file has up-to-date instructions for installing from CVS. Briefly, for a first-time install, run \gst{ant prepare install}.

The file \gst{build.properties} contains various parameters that can be set by the user. Please check these settings before running the installation process. The install process will ask you whether you accept the properties before starting.
For a non-interactive version of the install, run
\gst{ant -Dproperties.accepted=yes install}.

To log the output in \gst{build.log}, run
\gst{ant -Dproperties.accepted=yes -logfile build.log install}.
[9874]2248
Compilation includes Java and C/C++. On Windows, you will need to have Visual Studio or equivalent installed. Please check the \gst{compile.windows.c++.setup} property in \gst{build.properties} and make sure it is set to the setup script of Visual Studio.

Note: \gst{gs3-setup} sets the environment variables \gst{GSDL3HOME}, \gst{GSDL3SRCHOME}, \gst{CLASSPATH}, \gst{PATH} and \gst{JAVA\_HOME}, and needs to be run in a shell before doing collection building etc.

To run the library, use the \gst{gs3-server.sh} or \gst{gs3-server.bat} script.
[6499]2254
\newpage
\section{Tomcat}\label{app:tomcat}

Tomcat is a servlet container, and \gsiii\ runs as a servlet inside it.

The file \gst{\$GSDL3SRCHOME/packages/tomcat/conf/server.xml} is the Tomcat configuration file. A context for \gsiii\ is given by the file\\ \gst{\$GSDL3SRCHOME/packages/tomcat/conf/Catalina/localhost/greenstone3.xml}. This tells Tomcat where to find the \gst{web.xml} file, and what URL (\gst{/greenstone3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{Can we use .htaccess files to restrict access?}. For example, the \gst{index.html} file that lives in \gst{\$GSDL3HOME} can be accessed through the URL \gst{localhost:8080/greenstone3/index.html}. The gs2mgdemo collection's images can be accessed through \\
\gst{localhost:8080/greenstone3/sites/localsite/collect/gs2mgdemo/images/}.

\gs\ sets up Tomcat to run on port 8080 by default. To change this, edit the \gst{tomcat.port} property in \gst{build.properties}. If you do this before installing \gs, then running \gst{ant install} will use the new port number. If you want to change it later on, shut down Tomcat and run \gst{ant configure}; when you restart Tomcat, it will use the new port.
[6312]2265
Note: Tomcat must be shut down and restarted for changes to any of the following to take effect:
\begin{bulletedlist}
\begin{gsc}
\item \$GSDL3HOME/WEB-INF/web.xml
\item \$GSDL3SRCHOME/packages/tomcat/conf/server.xml
\end{gsc}
\item any classes or jar files used by the servlets
\end{bulletedlist}
2274
On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.

We have disabled following symlinks for the \gs\ servlet. To enable it, edit \gst{\$GSDL3SRCHOME/packages/tomcat/conf/Catalina/localhost/greenstone3.xml} and set \gst{allowLinking} to \gst{true}.
[6312]2278
By default, Tomcat allows directory listings. To disable this, change the \gst{listings} parameter to \gst{false} in the default servlet definition in Tomcat's \gst{web.xml} file (\gst{\$GSDL3SRCHOME/packages/tomcat/conf/web.xml}).
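
As a sketch, the default servlet definition in a stock Tomcat \gst{conf/web.xml} has roughly the following shape. The exact set of \gst{init-param} entries varies between Tomcat versions, so treat everything other than the \gst{listings} parameter as an assumption about your installation.

\begin{gsc}\begin{verbatim}
<servlet>
  <servlet-name>default</servlet-name>
  <servlet-class>
    org.apache.catalina.servlets.DefaultServlet
  </servlet-class>
  <init-param>
    <param-name>debug</param-name>
    <param-value>0</param-value>
  </init-param>
  <!-- set to false to disable directory listings -->
  <init-param>
    <param-name>listings</param-name>
    <param-value>false</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
</servlet>
\end{verbatim}\end{gsc}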
[6312]2280
We have set the greenstone context to be reloadable. This means that if a class or resource file in \gst{web/WEB-INF/lib} or \gst{web/WEB-INF/classes} changes, the servlet will be reloaded. This is useful for development, but should be turned off for production use (set the \gst{reloadable} attribute to \gst{false}).
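
The symlink and reloading settings both live on the \gst{<Context>} element of the context file \gst{greenstone3.xml}. The following is a sketch of a production-style setting, assuming a Tomcat version whose \gst{<Context>} element accepts the \gst{allowLinking} attribute; the \gst{docBase} path is a placeholder, and the file shipped with \gs\ may carry further attributes.

\begin{gsc}\begin{verbatim}
<!-- Sketch only: docBase is a placeholder path.
     reloadable="false"   do not watch classes/jars for changes
     allowLinking="false" do not follow symlinks (set to true
                          to enable) -->
<Context docBase="/path/to/greenstone3/web"
         reloadable="false"
         allowLinking="false"/>
\end{verbatim}\end{gsc}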
[6312]2282
Tomcat uses a Manager to handle HTTP session information. This information may be stored between restarts if possible. To use a persistent session handling manager, uncomment the \gst{<Manager>} element in \\
\gst{\$GSDL3SRCHOME/packages/tomcat/conf/server.xml}. For the default manager, session information is stored in the work directory:\\
\gst{\$GSDL3SRCHOME/packages/tomcat/work/Catalina/localhost/greenstone3/SESSIONS.ser}. Delete this file to clear the cached session information. Note that Tomcat needs to be shut down before you delete this file.
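
For orientation, a persistent manager declaration in Tomcat generally looks like the sketch below, which uses Tomcat's own \gst{PersistentManager} and \gst{FileStore} classes. The commented-out element in the \gst{server.xml} shipped with \gs\ may use different attributes, so this is an illustration rather than a copy of that file.

\begin{gsc}\begin{verbatim}
<!-- Sketch only: nested inside the relevant <Context> element -->
<Manager className="org.apache.catalina.session.PersistentManager"
         saveOnRestart="true">
  <Store className="org.apache.catalina.session.FileStore"/>
</Manager>
\end{verbatim}\end{gsc}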
[6312]2286
\subsection{Proxying Tomcat with Apache}

Instead of incorporating servlet support into your existing web server, an easy alternative is to proxy Tomcat. The \gst{http://www.greenstone.org/greenstone3} site uses Apache to proxy Tomcat. \gst{ProxyPass} and \gst{ProxyPassReverse} directives need to be added to the \gst{VirtualHost} description for the \gst{www.greenstone.org} server.
[6312]2290
\begin{quote}\begin{gsc}
<VirtualHost xx.xx.xx.xx>\\
ServerName www.greenstone.org\\
...\\
ProxyPass /greenstone3 http://puka.cs.waikato.ac.nz:8080/greenstone3\\
ProxyPassReverse /greenstone3 http://puka.cs.waikato.ac.nz:8080/greenstone3\\
</VirtualHost>\\
\end{gsc}\end{quote}
2299
In our example, the \gsiii\ servlet can be accessed at \\
\gst{http://www.greenstone.org/greenstone3/library}, instead of at \\
\gst{http://puka.cs.waikato.ac.nz:8080/greenstone3/library}, which is not publicly accessible.
[6312]2303
\subsection{Running Tomcat behind a proxy}

Almost everything works fine when Tomcat is running behind a proxy. The only time this causes trouble is if the servlet itself needs to make external HTTP connections. We do this in the infomine demo collection, for example: one of the service classes sends HTTP requests to the infomine database at Riverside. Since these requests go through the proxy, a username and password are needed. It is not sufficient to prompt the user for a password, because they are unlikely to have a password for the particular proxy that Tomcat is using. What we have done at present is to put a proxy element in the \gst{siteConfig.xml} file, in which you have to enter a suitable username and password for the proxy server. Unfortunately, these are entered in plain text, and the file is viewable via the servlet, so we need a better solution.
[6312]2307
\newpage
\section{SOAP}\label{app:soap}

\gs\ uses the Apache Axis SOAP implementation for distributed communications. Axis runs as a servlet inside Tomcat, and SOAP web services can be deployed by this Axis servlet. The \gs\ installation process sets up Axis for Tomcat, but does not deploy any services.

To deploy the SOAP service for localsite, run \gst{ant deploy-localsite}.

To deploy a SOAP service for other sites, run \gst{ant soap-deploy-site}.

This will prompt you for the site name (the site's directory name) and a unique URI for the site. It creates a new SOAPServer class for the site \\(\gst{\$GSDL3SRCHOME/src/java/org/greenstone/gsdl3/SOAPServer<sitename>.java}), creates a resource file for deployment (\gst{\$GSDL3SRCHOME/resources/soap/<sitename>.wsdd}), and then tries to deploy the service.

Information about deployed services is maintained between Tomcat sessions, so you only need to deploy a service once. To undeploy a site, use \gst{ant soap-undeploy-site}.

The Axis services can be accessed at \gst{localhost:8080/greenstone3/index.jsp}.
[6312]2322
\subsection{Debugging SOAP}\label{app:soap-debug}

If you need to debug the SOAP communication, or just want to look at the SOAP messages that are being passed back and forth, you can use the TCP monitor. This intercepts messages coming in to one port, displays them, and passes them on to another port.
To run it, type:

\begin{quote}\gst{java -cp \$GSDL3HOME/WEB-INF/lib/axis.jar \\
org.apache.axis.utils.tcpmon}
\end{quote}

The listen port is the port that you want the monitor to be listening on. It should `act as' a Listener, with target hostname 127.0.0.1 (localhost), and target port set to the port that Tomcat is running on (8080). You need to modify the address used to talk to the SOAP service. For example, if you want to monitor traffic between the gateway site and the localsite SOAP server, you will need to edit gateway's \gst{siteConfig.xml} file and change the port number (in the \gst{site} element) to whatever you have chosen as the listen port.

For example, in the Admin panel of TCPMonitor the Target Hostname might be 127.0.0.1, and the Target Port \# 8080. Set the Listen Port \# to a different port, such as 8070, and click Add. This produces a new tab panel where you can see the messages arriving at port 8070 before being forwarded to port 8080. You then need to direct the test requests from your SOAP application to port 8070, and you will see copies of the messages in the new tab panel.
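
With the ports used in this example, the \gst{site} element in gateway's \gst{siteConfig.xml} would change along the following lines. This is a sketch that assumes the localsite service was deployed with the site URI \gst{localsite}; only the port number differs from a normal configuration.

\begin{gsc}\begin{verbatim}
<siteList>
  <!-- port changed from 8080 (Tomcat) to 8070 (TCPMonitor) -->
  <site name="localsite"
    address="http://localhost:8070/greenstone3/services/localsite"
    type="soap"/>
</siteList>
\end{verbatim}\end{gsc}

Remember to change the port back to 8080 and reconfigure the site once you have finished monitoring.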
[13284]2335
2336
\newpage
\section{Tidying up the formatting for imported \gsii\ collections}\label{app:gs2tidy}

\subsection{Format statements: \gsii\ vs \gsiii\ }\label{app:gs2format}
The following table shows the \gsii\ format elements and their equivalents in \gsiii.
\begin{table}[h]
\caption{\gsiii\ equivalents of \gsii\ format statements}
{\footnotesize
\begin{tabular}{ll}
\hline
\bf \gsii\ & \bf \gsiii\ \\
\hline
\gst{[Text]} & \gst{<gsf:text/>} \\
\gst{[num]} & \gst{<gsf:metadata name='docnum'/>}\\
\gst{[link][/link]} & \gst{<gsf:link></gsf:link>} or \\
& \gst{<gsf:link type='document'></gsf:link>}\\
\gst{[srclink][/srclink]} & \gst{<gsf:link type='source'></gsf:link>}\\
\gst{[icon]} & \gst{<gsf:icon/>} or \\
& \gst{<gsf:icon type='document'/>}\\
\gst{[srcicon]} & \gst{<gsf:icon type='source'/>}\\
\gst{[Title]} (metadata) & \gst{<gsf:metadata name='Title'/>} or \\
& \gst{<gsf:metadata name='Title' select='current'/>}\\
\gst{[parent:Title]} & \gst{<gsf:metadata name='Title' select='parent' />}\\
\gst{[parent(All):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'/>}\\
\gst{[parent(Top):Title]} & \gst{<gsf:metadata name='Title' select='root' />}\\
\gst{[parent(All': '):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'}\\
& \gst{ separator=': ' />}\\
\gst{[sibling:dc.Title]} & \gst{<gsf:metadata name='dc.Title' pos='first' />} \\
\gst{[sibling(All': '):Title]} & \gst{<gsf:metadata name='Title'} \\
& \gst{ separator=': ' />}\\
\gst{\{Or\}\{[dc.Title],} & \gst{<gsf:choose-metadata>}\\
\gst{ [dls.Title], [Title]\}}& \gst{ <gsf:metadata name='dc.Title'/>}\\
& \gst{ <gsf:metadata name='dls.Title'/>}\\
& \gst{ <gsf:metadata name='Title'/>}\\
& \gst{</gsf:choose-metadata>}\\
\gst{\{If\}\{[parent:Title],} & \gst{<gsf:choose-metadata>}\\
\gst{ [parent:Title], [Title]\}}& \gst{ <gsf:metadata name='Title' select='parent'/>}\\
& \gst{ <gsf:metadata name='Title'/>}\\
& \gst{</gsf:choose-metadata>}\\
\gst{\{If\}\{[Subject],} & \gst{<gsf:switch>}\\
\gst{ <td>[Subject]</td>\}}& \gst{ <gsf:metadata name='Subject'/>}\\
& \gst{ <gsf:when test='exists'>} \\
& \gst{ <td><gsf:metadata name='Subject'/></td>}\\
& \gst{ </gsf:when></gsf:switch>}\\
\hline
\end{tabular}}
\end{table}
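
To illustrate how the table is applied, consider a hypothetical \gsii\ format statement (not taken from any particular collection) that shows a linked document icon followed by the first available title:

\begin{gsc}\begin{verbatim}
<td>[link][icon][/link]</td>
<td>{Or}{[dc.Title],[Title]}</td>
\end{verbatim}\end{gsc}

Translating each element using the table gives the following sketch of a \gsiii\ template. The use of \gst{match="documentNode"} is an assumption that the statement applies to document nodes.

\begin{gsc}\begin{verbatim}
<gsf:template match="documentNode">
  <td><gsf:link><gsf:icon/></gsf:link></td>
  <td>
    <gsf:choose-metadata>
      <gsf:metadata name='dc.Title'/>
      <gsf:metadata name='Title'/>
    </gsf:choose-metadata>
  </td>
</gsf:template>
\end{verbatim}\end{gsc}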
\subsection{Cleaning up macros}\label{app:gs2replace}

Here we show some of the replace items that have been used for \gsii\ collections.

Getting rid of stray backslashes:
\begin{gsc}\begin{verbatim}
<replace scope='text' macro="\\?\\\(" text="\("/>
\end{verbatim}\end{gsc}
2392
Macro resolving using resource bundles and metadata:
\begin{gsc}\begin{verbatim}
<replace scope='metadata' macro="_magazines_" bundle="NZDLMacros"
  key="Magazines"/>
<replace scope='all' macro='_thisOID_' metadata='archivedir'/>
<replace macro="_httpcollimg_"
  text="sites/localsite/collect/folktale/index/assoc"/>
\end{verbatim}\end{gsc}

Fixing up broken external links:
\begin{gsc}\begin{verbatim}
<replace macro="_httpextlink_&amp;rl=1&amp;href="
  text="?a=d&amp;c=folktale&amp;s0.ext=1&amp;d="/>
<replace macro="_httpextlink_&amp;rl=0&amp;href="
  text="?a=p&amp;sa=html&amp;c=folktale&amp;url="/>
\end{verbatim}\end{gsc}
2409
These two examples show how to deal with \gsii's external link macros. The first one is for a `relative' external link. In this case, the links look like URLs but actually refer to \gs\ internal documents. So the \gsiii\ link is to the document, but with the parameter \gst{s0.ext} signifying that the \gst{d} argument will need translating before the content is retrieved.
The second example is a truly external link. This is translated into an HTML-type page action, where the URL is presented in a frame, with the collection header in a separate frame.
[8757]2412
Sometimes we need to add in macros to be resolved in a second step:
\begin{gsc}\begin{verbatim}
<replace macro="_iconpdf_" scope="metadata"
  text="&lt;img title='_texticonpdf_' src='interfaces/default/images/ipdf.gif'/&gt;"/>
<replace macro="_texticonpdf_" scope="metadata" bundle="interface_gs2"
  key="texticonpdf"/>
\end{verbatim}\end{gsc}

\end{document}