% source: trunk/gsdl3/docs/manual/manual.tex@ 6422
% Last change on this file since 6422 was 6422, checked in by kjdon, 20 years ago ("more changes")
% Property svn:keywords set to Author Date Id Revision; file size 136.2 KB
\documentclass[a4paper,11pt]{article}
\usepackage{times,epsfig}
\hyphenation{Message-Router Text-Query}

\newenvironment{gsc}% Greenstone text bits
{\begin{footnotesize}\begin{tt}}%
{\end{tt}\end{footnotesize}}

\newcommand{\gst}[1]{{\footnotesize \tt #1}}
\newcommand{\gsdlhome}{\$GSDL3HOME}

\begin{document}

\title{Greenstone 3: A modular digital library.}

% if you work on this manual, add your name here
\author{Katherine Don, George Buchanan and Ian H. Witten \\[1ex]
 Department of Computer Science \\
 University of Waikato \\ Hamilton, New Zealand \\
 \{kjdon, grbuchan, ihw\}@cs.waikato.ac.nz}

\date{}

\maketitle

\newenvironment{bulletedlist}%
{\begin{list}{$\bullet$}{\setlength{\itemsep}{0pt}\setlength{\parsep}{0pt}}}%
{\end{list}}

\noindent
Greenstone Digital Library Version 3 is a complete redesign and
reimplementation of the Greenstone digital library software. The current
version (Greenstone2) enjoys considerable success and is being widely used.
Greenstone3 will capitalise on this success, and in addition it will
\begin{bulletedlist}
\item improve flexibility, modularity, and extensibility
\item lower the bar for ``getting into'' the Greenstone code with a view to
 understanding and extending it
\item use XML where possible internally to improve the amount of
 self-documentation
\item make full use of existing XML-related standards and software
\item provide improved internationalisation, particularly in terms of sort order,
 information browsing, etc.
\item include new features that facilitate additional ``content management''
 operations
\item operate on a scale ranging from personal desktop to corporate library
\item easily permit the incorporation of text mining operations
\item use Java, to encourage multilinguality and cross-platform compatibility, and to permit
 easier inclusion of existing Java code (such as for text mining).
\end{bulletedlist}
Parts of Greenstone will remain in other languages (e.g.\ MG, MGPP); JNI (Java
Native Interface) will be used to communicate with these.

A description of the general design and architecture of Greenstone3 is given in the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).

This documentation consists of several parts. Section~\ref{sec:install} covers Greenstone installation, how to access the library, and some administration issues. Section~\ref{sec:user} looks at using the sample collections, creating new collections, and making small customisations to the interface. The remaining sections are aimed at the Greenstone developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to Greenstone, such as new services, new page types, and new plugins for different document formats. Section~\ref{sec:distributed} describes how to make Greenstone run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, covering topics such as how to install Greenstone from CVS and a comparison of Greenstone 2 and Greenstone 3 format statements.
\newpage
\section{Greenstone installation and administration}\label{sec:install}

This section covers where to get Greenstone 3, how to install it and how to run it. The standard method of running Greenstone is as a Java servlet, and we provide the Tomcat servlet container to serve it. Other web servers can often be configured to provide servlet support, removing the need for Tomcat; please see your web server's documentation for details. This documentation assumes that you are using Tomcat. To access Greenstone, Tomcat must first be started; the library can then be accessed via a web browser.


\subsection{Get and install Greenstone}

Greenstone is available from www.... There are currently two distributions: a self-installing tar for Linux, and a Windows executable.

Greenstone is also available through CVS (Concurrent Versions System). This provides the very latest development version, which is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install Greenstone from CVS.

\subsubsection{Linux}
** add more once installer finished **

Download the latest version of the self-installing tar file, gsdl3-x.xx-unix.sh, and run it in a shell (./gsdl3-x.xx-unix.sh). It will prompt you for where to install Greenstone, the name of your computer, what port to run Tomcat on... Once Greenstone has been installed, you can start the library by running ./gsdl3.sh and pointing a browser at http://localhost:8080/gsdl3 (or your chosen computer name and port).

\subsubsection{Windows}
** add more once installer finished **

Download the latest Windows executable, gsdl3-x.xx-win32.exe, and double click it to start the installation. You will be prompted for ... Once Greenstone is installed, you can access the library by selecting Greenstone 3 Digital Library in the Start menu.

\subsubsection{Accessing the library in a browser}

Once you have started up the library (see the previous sections for OS-dependent instructions), you can access it in a browser at http://localhost:8080/gsdl3 (or http://your-computer-name:your-chosen-port/gsdl3). This gets you to a welcome page with three links: one runs a test servlet (which lets you check that Tomcat is running properly), one runs the standard library servlet using the site localsite, and one runs a library servlet using the site soapsite. The soapsite site uses a SOAP connection to communicate with localsite, and demonstrates the library working in a distributed fashion. See Section~\ref{sec:distributed} for details about how to set up distributed operation.

\subsection{How the library works}

The standard library program is a Java servlet.

Other types of interfaces can be used, such as Java GUI programs. See Section~\ref{sec:new-interfaces} for details about how to make these.

\subsubsection{Restarting the library}

The library program (actually Tomcat) can be restarted by ... (** put a mechanism in each install program **).


Tomcat must be shut down and restarted whenever you change any of the following, for the changes to take effect:
\begin{bulletedlist}
\begin{gsc}
\item \gsdlhome/web/WEB-INF/web.xml
\item \gsdlhome/comms/jakarta/tomcat/conf/server.xml
\end{gsc}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
\gst{\gsdlhome/comms/jakarta/tomcat/logs/catalina.out}


\subsection{Directory structure}

Table~\ref{tab:dirs} shows the file hierarchy for Greenstone3.
The first part shows the common files that can be shared between
Greenstone users: the source code, libraries, etc. Under Linux, these will eventually be installed into appropriate system directories. The second part shows
files used by one person or group: their sites and interface setup (see Section~\ref{sec:sites-and-ints}).
There can be several sites and interfaces per installation.

\begin{table}
\caption{The Greenstone directory structure}
\label{tab:dirs}
{\footnotesize
\begin{tabular}{l p{8cm}}
\hline
\bf directory & \bf description \\
\hline
gsdl3
 & The main installation directory---gsdl3home can be changed to something more standard\\
gsdl3/src
 & Source code lives here \\
gsdl3/src/java/
 & Java source code \\
gsdl3/src/cpp/
 & C/C++ source code---none yet \\
gsdl3/packages
 & Imported packages from other systems, e.g.\ MG, MGPP \\
gsdl3/lib
 & Shared library files\\
gsdl3/lib/java
 & Java jar files\\
gsdl3/resources
 & Any resources that may be needed\\
gsdl3/resources/java
 & Properties files for Java resource bundles, used to handle all the language-specific text. This directory is on the class path, so any other Java resources can be placed here \\
gsdl3/resources/soap
 & SOAP service description files \\
gsdl3/resources/dtd
 & Greenstone sometimes has trouble loading DTD files; they can be placed here\\
gsdl3/bin
 & Executable files live here\\
gsdl3/bin/script
 & Some Perl building scripts\\
gsdl3/bin/linux
 & Linux executables, e.g.\ for MGPP\\
gsdl3/bin/windows
 & Windows executables, e.g.\ for MGPP\\
gsdl3/comms
 & Things to do with servers and communication, placed here for want of a better place, e.g.\ SOAP files and the Tomcat servlet container\\
gsdl3/docs
 & Documentation\\
\hline
gsdl3/web
 & This is where the web site is defined. Any static HTML files can go here. This directory is the Tomcat root directory.\\
gsdl3/web/WEB-INF
 & The web.xml file lives here (servlet configuration information for Tomcat)\\
gsdl3/web/WEB-INF/classes
 & Servlet classes go in here\\
gsdl3/web/sites
 & Contains directories for different sites. A site is a set of collections and services served by a single MessageRouter (MR); the MR may have connections (e.g.\ SOAP) to other sites\\
gsdl3/web/sites/localsite
 & One site; the site configuration file lives here\\
gsdl3/web/sites/localsite/collect
 & The collections directory \\
gsdl3/web/sites/localsite/images
 & Site-specific images \\
gsdl3/web/sites/localsite/transforms
 & Site-specific transforms \\
gsdl3/web/interfaces
 & Contains directories for different interfaces. An interface is defined by its images and XSLT files \\
gsdl3/web/interfaces/default
 & The default interface\\
gsdl3/web/interfaces/default/images
 & The images for the default interface\\
gsdl3/web/interfaces/default/transforms
 & The XSLT files for the default interface\\
\hline
\end{tabular}}
\end{table}


\subsection{Sites and interfaces}\label{sec:sites-and-ints}

local gs stuff (sites and interfaces) vs installed stuff (code)\\
where they live, what's the difference, what each contains.\\

A site consists of a set of collections and possibly services. An interface is a set of images along with a set of XSLT files used for translating XML output from the library into an appropriate form (HTML, in the servlet case).
One Greenstone installation can have many sites and interfaces, but one instantiation of a servlet uses one site and one interface. Sites and interfaces can be matched up in different ways. For example, a single site might be served with two different interfaces to provide different modes of access to the same content, such as HTML versus WML, or a completely different look and feel for different audiences. Conversely, a standard interface may be used with many different sites, providing a consistent mode of access to a lot of different content.

Collections live in the collect directory of a site. Any collections that are found in this directory when the servlet is initialised will be loaded and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. A collection that is added while Tomcat is running will not be picked up automatically: you can either restart the server, or send a configuration request to the servlet, as described in Section~\ref{sec:runtime-config}.
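
The start-up scan of the collect directory can be pictured with the following minimal Java sketch. It is not Greenstone's own code: the class \gst{CollectionScanner} is hypothetical, and it assumes that a subdirectory counts as a collection when it contains \gst{etc/collectionConfig.xml} (the location of that file is an assumption made for the sketch). Because the scan only runs at initialisation, anything added afterwards is invisible until the scan is repeated.

\begin{gsc}\begin{verbatim}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the start-up scan of a site's collect dir.
class CollectionScanner {

    // Builds $GSDL3HOME/web/sites/<siteName>/collect.
    static Path collectDir(String gsdl3Home, String siteName) {
        return Paths.get(gsdl3Home, "web", "sites", siteName, "collect");
    }

    // Returns the sorted names of subdirectories that hold a
    // configuration file. ASSUMPTION: the file is expected at
    // <collection>/etc/collectionConfig.xml.
    static List<String> findCollections(Path collectDir)
            throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(collectDir)) {
            return names; // no collect directory: no collections
        }
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(collectDir)) {
            for (Path entry : entries) {
                Path config =
                    entry.resolve("etc").resolve("collectionConfig.xml");
                if (Files.isDirectory(entry)
                        && Files.isRegularFile(config)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names); // listing order is unspecified
        return names;
    }
}
\end{verbatim}\end{gsc}

A directory without a configuration file (for example, a collection that has not been built yet) is simply skipped, which mirrors the requirement above that collections have valid configuration files.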

There are two Greenstone sites that come with the distribution: \gst{localsite} and \gst{soapsite}. \gst{localsite} has several demo collections, while \gst{soapsite} has none. \gst{soapsite} specifies that a SOAP connection should be made to \gst{localsite}. Getting this to work involves setting up a SOAP server for \gst{localsite}: see Section~\ref{sec:distributed} for details.

Each site and interface has a configuration file which specifies parameters for the site or interface; these are described in Section~\ref{sec:config}.

The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
There are three servlets specified in web.xml (these correspond to the three links on the Greenstone welcome page). One is a test servlet that just prints ``hello greenstone'' to a web page; this is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets: {\em library}, which serves localsite, and {\em library1}, which serves soapsite. Both of these servlets use the standard interface (called {\em default}).

\begin{table}
\caption{Greenstone servlet initialisation parameters}
\label{tab:serv-init}
{\footnotesize
\begin{tabular}{llp{5cm}}
\hline
\bf name & \bf sample value & \bf description \\
\hline
gsdl3\_home & /research/kjdon/gsdl3 & the base directory of the gsdl3 installation \\
site\_name & localsite & the name of the site to use \\
interface\_name & default & the name of the interface to use\\
library\_name & library & the web name of the servlet \\
default\_lang & en & the default language for the interface\\
receptionist\_class & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\
messagerouter\_class & NewMessageRouter & (optional) specifies an alternative MessageRouter to use\\
\hline
\end{tabular}}
\end{table}

The initialisation parameters used by the library servlets are shown in Table~\ref{tab:serv-init}. This is where you define which site and interface each servlet uses. Any number of servlets can be specified here. See Appendix~\ref{app:tomcat} for more details about Tomcat.
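
To make the link between Table~\ref{tab:serv-init} and \gst{web.xml} concrete, the Java sketch below parses a trimmed-down deployment descriptor with the JDK's DOM parser and collects the initialisation parameters of one servlet. The element names \gst{servlet}, \gst{servlet-name}, \gst{init-param}, \gst{param-name} and \gst{param-value} are the standard Java Servlet deployment-descriptor elements. The sample descriptor (which omits required elements such as \gst{servlet-class}) and the class \gst{ServletParams} are illustrative assumptions, not the file shipped with Greenstone.

\begin{gsc}\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads one servlet's init-params from web.xml.
class ServletParams {

    // Builders for a trimmed, illustrative web.xml (no servlet-class).
    static String param(String name, String value) {
        return "<init-param><param-name>" + name + "</param-name>"
            + "<param-value>" + value + "</param-value></init-param>";
    }

    static String servlet(String name, String site) {
        return "<servlet><servlet-name>" + name + "</servlet-name>"
            + param("site_name", site)
            + param("interface_name", "default") + "</servlet>";
    }

    static final String SAMPLE = "<web-app>"
        + servlet("library", "localsite")
        + servlet("library1", "soapsite") + "</web-app>";

    // Trimmed text of the first element with this tag, or null.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? null : nodes.item(0).getTextContent().trim();
    }

    // Maps param-name to param-value for the named servlet.
    static Map<String, String> initParams(String webXml, String name)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                webXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> params = new LinkedHashMap<>();
        NodeList servlets = doc.getElementsByTagName("servlet");
        for (int i = 0; i < servlets.getLength(); i++) {
            Element s = (Element) servlets.item(i);
            if (!name.equals(childText(s, "servlet-name"))) {
                continue; // some other servlet
            }
            NodeList ps = s.getElementsByTagName("init-param");
            for (int j = 0; j < ps.getLength(); j++) {
                Element p = (Element) ps.item(j);
                params.put(childText(p, "param-name"),
                           childText(p, "param-value"));
            }
        }
        return params;
    }
}
\end{verbatim}\end{gsc}

Asking for the parameters of \gst{library1} yields \gst{site\_name} set to \gst{soapsite} and \gst{interface\_name} set to \gst{default}, which is exactly the site/interface pairing described above; adding another servlet is a matter of adding another \gst{servlet} element.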


\subsection{Configuring a Greenstone installation}\label{sec:config}

Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file, \gst{siteConfig.xml}, that binds parameters for the site. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a CGI-type command to the library. There is a series of CGI-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.

\subsubsection{Site configuration file}\label{sec:siteconfig}

The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of
known external sites to connect to. Collections are not specified in the site
configuration file; instead they are determined by the contents of the site's
collections directory.

The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible through Tomcat, any files (e.g.\ images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat gsdl3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML. Similarly, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}.

Figure~\ref{fig:siteconfig} shows two example site configuration files. The first example is for a rudimentary site with no site-wide services,
which does not connect to any external sites. The second example is for a site with one site-wide service cluster, a collection building cluster. It also connects to the first site using SOAP.
These two sites are running on the same machine. For site \gst{gsdl1} to talk to site \gst{localsite}, a SOAP server must be run for \gst{localsite}. The address of the SOAP server, in this case, is \gst{http://localhost:8080/soap/servlet/rpcrouter}.


\begin{figure}
\begin{gsc}\begin{verbatim}
<siteConfig>
  <localSiteName value="org.greenstone.localsite"/>
  <httpAddress value="http://localhost:8080/gsdl3/sites/localsite"/>
  <serviceClusterList/>
  <serviceRackList/>
  <siteList/>
</siteConfig>
\end{verbatim}\end{gsc}

\begin{gsc}\begin{verbatim}
<siteConfig>
  <localSiteName value="org.greenstone.gsdl1"/>
  <httpAddress value="http://localhost:8080/gsdl3/sites/gsdl1"/>
  <serviceClusterList>
    <serviceCluster name="build">
      <metadataList>
        <metadata name="Title">Collection builder</metadata>
        <metadata name="Description">Builds collections in a
          gsdl2-style manner</metadata>
      </metadataList>
      <serviceRackList>
        <serviceRack name="GS2Construct"/>
      </serviceRackList>
    </serviceCluster>
  </serviceClusterList>
  <siteList>
    <site name="org.greenstone.localsite"
          address="http://localhost:8090/soap/servlet/rpcrouter"
          type="soap"/>
  </siteList>
</siteConfig>
\end{verbatim}\end{gsc}
\caption{Two sample site configuration files}
\label{fig:siteconfig}
\end{figure}
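
As an illustration of how these elements fit together, the following Java sketch reads a \gst{siteConfig.xml} document with the JDK's DOM parser. The class \gst{SiteConfigSummary} is hypothetical and not part of Greenstone; it only shows which elements carry the site URI, the HTTP address, the service racks and the external site connections.

\begin{gsc}\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper summarising a siteConfig.xml document.
class SiteConfigSummary {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8)));
    }

    // value attribute of the first element with this tag ("" if none).
    static String valueOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? "" : ((Element) nodes.item(0)).getAttribute("value");
    }

    // One attribute from every element with the given tag, searching
    // the whole document (so racks inside clusters are included).
    static List<String> attributes(Document doc, String tag,
                                   String attr) {
        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(((Element) nodes.item(i)).getAttribute(attr));
        }
        return values;
    }
}
\end{verbatim}\end{gsc}

Applied to the second file in Figure~\ref{fig:siteconfig}, \gst{valueOf} for \gst{localSiteName} returns \gst{org.greenstone.gsdl1}, the \gst{name} attributes of \gst{serviceRack} give the single rack \gst{GS2Construct} from the \gst{build} cluster, and the \gst{type} attributes of \gst{site} give \gst{soap}. Because \gst{getElementsByTagName} searches all descendants, this sketch does not distinguish site-wide racks from racks inside clusters.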

\subsubsection{Interface configuration file}\label{sec:interfaceconfig}

The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at start-up (although other actions can be loaded dynamically). If the interface uses servlets, the file specifies the short name that each action uses for the action CGI parameter; for example, QueryAction uses \gst{a=q}. If the interface uses XSLT, it specifies which XSLT file should be used for each action and subaction. Figure~\ref{fig:ifaceconfig} shows the configuration file for the default interface.

\begin{figure}
\begin{gsc}\begin{verbatim}
<interfaceConfig>
  <actionList>
    <action name='p' class='PageAction'>
      <subaction name='home' xslt='home.xsl'/>
      <subaction name='about' xslt='about.xsl'/>
      <subaction name='help' xslt='help.xsl'/>
      <subaction name='pref' xslt='pref.xsl'/>
    </action>
    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
    <action name='b' class='GS2BrowseAction' xslt='classifier.xsl'/>
    <action name='a' class='AppletAction' xslt='applet.xsl'/>
    <action name='d' class='DocumentAction' xslt='document.xsl'/>
    <action name='xd' class='XMLDocumentAction'>
      <subaction name='toc' xslt='document-toc.xsl'/>
      <subaction name='text' xslt='document-content.xsl'/>
    </action>
    <action name='pr' class='ProcessAction' xslt='process.xsl'/>
    <action name='s' class='SystemAction' xslt='system.xsl'/>
  </actionList>
</interfaceConfig>
\end{verbatim}\end{gsc}
\caption{Default interface configuration file}
\label{fig:ifaceconfig}
\end{figure}

This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation, although Tomcat must still be restarted for the changes to take effect.
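
The lookup that the interface configuration file enables can be sketched as follows. This hypothetical Java class (not Greenstone's own code) uses the JDK's DOM parser to resolve a short action name such as \gst{q} to its Java class, and an action/subaction pair such as \gst{p}/\gst{about} to an XSLT file, using the structure of Figure~\ref{fig:ifaceconfig}. Falling back from an unknown subaction to the action's own \gst{xslt} attribute is an assumption made for the sketch.

\begin{gsc}\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical lookup over an interfaceConfig.xml document.
class InterfaceActions {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8)));
    }

    // The <action> element whose short name is a, or null.
    static Element action(Document doc, String a) {
        NodeList actions = doc.getElementsByTagName("action");
        for (int i = 0; i < actions.getLength(); i++) {
            Element action = (Element) actions.item(i);
            if (action.getAttribute("name").equals(a)) {
                return action;
            }
        }
        return null;
    }

    // Java class that handles the short action name a, or null.
    static String classFor(Document doc, String a) {
        Element action = action(doc, a);
        return action == null ? null : action.getAttribute("class");
    }

    // XSLT file for action a and subaction sa (sa may be null).
    // ASSUMPTION: falls back to the action's own xslt attribute.
    static String xsltFor(Document doc, String a, String sa) {
        Element action = action(doc, a);
        if (action == null) {
            return null;
        }
        if (sa != null) {
            NodeList subs = action.getElementsByTagName("subaction");
            for (int i = 0; i < subs.getLength(); i++) {
                Element sub = (Element) subs.item(i);
                if (sub.getAttribute("name").equals(sa)) {
                    return sub.getAttribute("xslt");
                }
            }
        }
        String xslt = action.getAttribute("xslt");
        return xslt.isEmpty() ? null : xslt;
    }
}
\end{verbatim}\end{gsc}

With the default configuration, the pair \gst{p}/\gst{about} resolves to \gst{about.xsl}, and the short name \gst{q} resolves to the class \gst{QueryAction} and the stylesheet \gst{basicquery.xsl}. Since the mapping is data rather than code, swapping a stylesheet only requires editing the file.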

\subsection{Run-time re-initialisation}\label{sec:runtime-config}

should this section go in here, cos its kind of adminy, or go into the user stuff, cos you need to do it after building a collection???

When Tomcat is started up, the site and interface configuration files are read in, and actions, services and collections are loaded as necessary. The configuration then remains static until Tomcat is restarted or re-configuration commands are issued.

There are several CGI-type commands that can be issued to Tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately, at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, Tomcat must be restarted if any Java classes are modified.

Currently, the run-time configuration commands can only be accessed by typing CGI arguments into the URL; there is no web form for this yet.

The CGI arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, and deactivate\footnote{There is no security for these commands yet in Greenstone, so the deactivate/delete command is disabled.}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection or cluster by adding \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the arguments in more detail.

\begin{table}
\caption{Example run-time configuration arguments.}
\label{tab:run-time config}
{\footnotesize
\begin{tabular}{lp{8cm}}
\hline
\gst{a=s\&sa=c} & Reconfigures the whole site: reads in siteConfig.xml and reloads all the collections. Just part of the site can be reconfigured by specifying another argument, \gst{ss} (system subset). The valid values are \gst{collectionList}, \gst{siteList}, \gst{serviceList} and \gst{clusterList}. \\
\gst{a=s\&sa=c\&sc=XXX} & Reconfigures the XXX collection or cluster. \gst{ss} can also be used here; valid values are \gst{metadataList} and \gst{serviceList}. \\
\gst{a=s\&sa=a} & (Re)activates a specific module. Modules are specified using two arguments, \gst{st} (system module type) and \gst{sn} (system module name). Valid types are \gst{collection}, \gst{cluster} and \gst{site}.\\
\gst{a=s\&sa=d} & Deactivates a module. \gst{st} and \gst{sn} can be used here too. Valid types are \gst{collection}, \gst{cluster}, \gst{site} and \gst{service}. Modules are removed from the current configuration, but will reappear if Tomcat is restarted.\\
\gst{a=s\&sa=d\&sc=XXX} & Deactivates a module belonging to the XXX collection or cluster. \gst{st} and \gst{sn} can be used here too. The only valid type is \gst{service}. \\
\hline
\end{tabular}}
\end{table}
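
Because these commands are just CGI arguments, they are easy to generate. The Java sketch below (a hypothetical helper, not part of Greenstone) composes the URLs for the configure and activate commands in Table~\ref{tab:run-time config}, URL-encoding each key and value. It assumes that the library servlet is reachable at \gst{http://localhost:8080/gsdl3/library}, as in a default local installation; deactivation is left out because it is currently disabled.

\begin{gsc}\begin{verbatim}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.StringJoiner;

// Hypothetical helper for composing run-time configuration URLs.
class ConfigCommand {

    // Builds libraryUrl?k1=v1&k2=v2..., URL-encoding keys and values.
    static String command(String libraryUrl, String... keyValues)
            throws UnsupportedEncodingException {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        StringJoiner query = new StringJoiner("&");
        for (int i = 0; i < keyValues.length; i += 2) {
            query.add(URLEncoder.encode(keyValues[i], "UTF-8") + "="
                + URLEncoder.encode(keyValues[i + 1], "UTF-8"));
        }
        return libraryUrl + "?" + query;
    }

    // a=s&sa=c: reconfigure the whole site.
    static String reconfigureSite(String lib)
            throws UnsupportedEncodingException {
        return command(lib, "a", "s", "sa", "c");
    }

    // a=s&sa=c&sc=XXX: reconfigure one collection or cluster.
    static String reconfigure(String lib, String collOrCluster)
            throws UnsupportedEncodingException {
        return command(lib, "a", "s", "sa", "c", "sc", collOrCluster);
    }

    // a=s&sa=a&st=TYPE&sn=NAME: (re)activate a module.
    static String activate(String lib, String type, String name)
            throws UnsupportedEncodingException {
        return command(lib, "a", "s", "sa", "a", "st", type, "sn", name);
    }
}
\end{verbatim}\end{gsc}

For example, activating the collection \gst{demo} produces \gst{.../library?a=s\&sa=a\&st=collection\&sn=demo}, and a site subset can be requested with \gst{command} by adding the pair \gst{ss}, \gst{collectionList}.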
\newpage
\section{Using Greenstone 3}\label{sec:user}

Once you have Greenstone 3 installed, you can access the sample collections that come with the installation. Section~\ref{sec:usecolls} describes these collections and how to use them, and Section~\ref{sec:buildcol} describes how to build your own collections.

\subsection{Using a collection}\label{sec:usecolls}

A collection typically consists of a set of documents, which could be plain text, HTML, Word, PDF, images, bibliographic records, etc., along with some access methods, or services. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.
Searching involves entering words or phrases and getting back lists of documents that contain those words. The search terms may be restricted to particular fields of the document. Browsing ...

In the standard interface that comes with Greenstone3\footnote{Of course, this is all customisable.}, collections in a digital library are presented in the following manner. The ``home'' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the ``about'' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.

\begin{figure}[h]
 \centering
 \includegraphics[width=4in]{pagebanner} %5.8
 \caption{A sample collection page banner}
 \label{fig:page-banner}
\end{figure}

The image at the top left is a link to the collection's about page. The top right has buttons that link to the library home page, help pages and preference pages. All the available services are listed in a navigation bar along the bottom of the banner; click on a service name to access that service.
Once you are looking at a document, clicking the open book icon at the top of the document, underneath the navigation bar, will take you back to the search or browse page from which you accessed the document.

describe the colls that the sample installation comes with\\
brief description of what a collection is.\\
how to get around the collection, services etc. \\
querying vs browsing \\
use the demo colls that come with greenstone - one gs2 coll, one gs3 coll, tei coll??\\

\subsection{Building a collection}\label{sec:buildcol}

There are two ways to get a new collection into Greenstone 3. The first is to build it using the Greenstone 3 building process. The second is to import a Greenstone 2 collection.

Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per Greenstone installation. The collect directory is at \gst{\gsdlhome/web/sites/site-name/collect}, where site-name is the name of the site you want your new collection to belong to.

The following two sections describe how to create a collection from scratch, and how to import a Greenstone 2 collection. Once a collection has been built, the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{Eventually there will probably also be automatic polling for new collections.}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, there is a CGI command to reload a collection, which can also load a new one. Use the CGI arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}; this tells the library program to reload the collname collection.


\subsubsection{Creating a collection from scratch}

****GEORGE****

how to build a collection, but none of the mechanisms of building.
talk a bit about configuration files? maybe just the parts that you use?? your changes should go into the next sections about configuration files, but they need to go here too.

\subsubsection{Importing a Greenstone 2 collection}

Greenstone 3 can also serve Greenstone 2 collections. If you have a Greenstone 2 collection\footnote{For information about the Greenstone 2 software, and how to build collections using it, visit \gst{www.greenstone.org}}, you can copy it into the collect directory of the site you are using. Alternatively, if your operating system supports links, you can make a link to it from the collect directory.
The Greenstone 3 run-time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new \gst{collectionConfig.xml} and \gst{buildConfig.xml} files from the old \gst{collect.cfg} and \gst{build.cfg} files. It does not change the collection in any way, so it can still be used by Greenstone 2 software.

The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, you need to specify the path to the collect directory, and the collection name. For example,

\gst{convert\_coll\_from\_gs2.pl -collectdir \$GSDL3HOME/web/\-sites/\-localsite/\-collect demo}

The script attempts to create Greenstone 3 format statements from the old Greenstone 2 ones. The conversion may not always work properly, so if the collection looks a bit strange under Greenstone 3, you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.

Once again, to have the collection recognised by the library servlet, you can either restart Tomcat, or load it manually by sending the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname} to the library servlet.

\subsection{Collection configuration files}\label{sec:collconfig}

Each collection has two, or possibly three, configuration files that give metadata, display and other information for the collection: \gst{collectionConfig.xml}, \gst{buildConfig.xml}, and, optionally, \gst{collectionInit.xml}.\footnote{\gst{siteConfig.xml} and \gst{interfaceConfig.xml} are new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
Greenstone2.} The first, \gst{collectionConfig.xml}, includes user-defined presentation metadata for the collection,
such as its name and the {\em About this collection} text; gives formatting information for the collection display; and also gives
instructions on how the collection is to be built. The second, \gst{buildConfig.xml}, is produced by
the build-time process and includes any metadata that can be determined
automatically. It also includes configuration information for any ServiceRacks needed by the collection.

\subsubsection{collectionInit.xml}

This optional file specifies an alternative collection class to use instead of the standard one. The only syntax defined so far is the class name:

\begin{gsc}\begin{verbatim}
<collectionInit class="XMLCollection"/>
\end{verbatim}\end{gsc}

Section~\ref{sec:new-coll-types} describes an example collection where this file is used. Depending on the type of collection that it is used for, one or both of the other configuration files may not be needed.
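
The effect of \gst{collectionInit.xml} amounts to a choice of class name, which the following minimal Java sketch makes explicit. It is hypothetical and not Greenstone's own code: a missing file or an empty \gst{class} attribute means the standard class is used, \gst{DEFAULT\_CLASS} is only a placeholder for whatever the standard class is actually called, and turning the chosen name into an object (for example, by reflection) is not shown.

\begin{gsc}\begin{verbatim}
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: decide which collection class to use.
class CollectionClassChooser {

    // ASSUMPTION: placeholder name for the standard collection class.
    static final String DEFAULT_CLASS = "Collection";

    // Reads the class attribute of collectionInit.xml if the file
    // exists; otherwise (or if the attribute is empty) uses the default.
    static String collectionClass(Path initFile) throws Exception {
        if (!Files.isRegularFile(initFile)) {
            return DEFAULT_CLASS;
        }
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(initFile.toFile());
        String cls = doc.getDocumentElement().getAttribute("class");
        return cls.isEmpty() ? DEFAULT_CLASS : cls;
    }
}
\end{verbatim}\end{gsc}

For the example file above, \gst{collectionClass} would return \gst{XMLCollection}; for a collection without the file it would return the placeholder default.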
417
418\subsubsection{collectionConfig.xml}
419
420The collection configuration file is where the collection designer (e.g. a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far. (Since collection building at this stage is still done using Greenstone2 Perl scripts and the old \gst{collect.cfg} file, we have only defined the format for the parts of \gst{collectionConfig.xml} that are used by the runtime-system.)
421
422Display elements for a collection or metadata for a document can be entered in any language---use lang='en' attributes to metadata elements to specify which language they are in.
423
424configuration files need to be encoded in utf-8.
425
426\begin{figure}
427\begin{gsc}\begin{verbatim}
428<collectionConfig xmlns:gsf="http://www.greenstone.org/configformat"
429 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
430 <metadataList>
431 <metadata name="creator">[email protected]</metadata>
432 </metadataList>
433 <displayItemList>
434 <displayItem name="smallicon" lang="en">mgppdemosm.gif</displayItem>
435 <displayItem name="description" lang="fr">C'est une collection pour
436 demonstration du logiciel Greenstone. Elle contient une petite
437 partie du projet de bibliotheques humanitaires et de developpement
438 (11 livres).</displayItem>
439 <displayItem name="description" lang="en">This is a demonstration
440 collection for the Greenstone digital library software. It contains
441 a small subset (11 books) of the Humanity Development Library. It is
442 built with mgpp.</displayItem>
443 <displayItem name="name" lang="en">greenstone mgpp demo</displayItem>
444 <displayItem name="icon" lang="en">mgppdemo.gif</displayItem>
445 </displayItemList>
446 <search>
447 <index name="idx"/>
448 <format>
449 <gsf:template match="documentNode">
450 <td valign='top'><gsf:link><gsf:icon/></gsf:link></td>
451 <td><gsf:metadata name='Title' select='ancestors'
452 separator=': '/>: <gsf:link><gsf:metadata name='Title' />
453 </gsf:link></td>
454 </gsf:template>
455 </format>
456 </search>
457 <browse>
458 <classifier name="CL1"/>
459 <classifier name="CL2"/>
460 <classifier name="CL3"/>
461 <classifier name="CL4">
462 <format>
463 <gsf:template match="documentNode">
464 <br /><gsf:link><gsf:metadata name='Keyword' />
465 </gsf:link></gsf:template>
466 </format>
467 </classifier>
468 </browse>
469 <display/>
470</collectionConfig>
471\end{verbatim}\end{gsc}
472\caption{Sample collectionConfig.xml file (mgppdemo collection)}
473\label{fig:collconfig}
474\end{figure}
475
The \gst{<metadataList>} element specifies collection metadata, such as the creator. The \gst{<displayItemList>} element specifies language-dependent information that is used when displaying the collection, such as the collection name and a short description. Each \gst{<displayItem>} can be given in several languages; if languages other than English are used, the configuration file must be encoded in UTF-8.
The \gst{<search>} and \gst{<browse>} elements give formatting information about the indexes and classifiers. \gst{<displayItem>} elements are used to provide titles for the indexes or classifiers, while \gst{<format>} elements provide formatting instructions, typically for a document or classifier node in a list of results.
478
The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading} and \gst{documentContent}. Other display options, such as whether or not to show the cover image or the table of contents, can also be specified here (see Table~\ref{tab:format_options}).
480
\subsubsection{buildConfig.xml}\label{sec:buildconfig}
482
The file \gst{buildConfig.xml} is produced by the collection building process, and contains metadata and other information about the collection that can
be determined automatically, such as the number of documents it contains. It also includes a list of the ServiceRack classes that are
required at runtime to provide the services that have been built into the
collection. The \gst{serviceRack} names are names of Java classes that are loaded
dynamically at runtime. Any information inside a \gst{<serviceRack>} element is
specific to that service rack; there is no set format. Figure~\ref{fig:buildconfig} shows an example. This configuration file specifies that the collection should load three ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the corresponding ServiceRack object for configuration. The \gst{collectionConfig.xml} file is also passed to the ServiceRack objects at configuration time: the \gst{format} and \gst{displayItem} information is used directly from \gst{collectionConfig.xml} rather than being copied into \gst{buildConfig.xml} during building. This means that changes to \gst{collectionConfig.xml} take effect without the collection having to be rebuilt.
490
491
492\begin{figure}
493\begin{gsc}\begin{verbatim}
494<buildConfig xmlns:gsf="http://www.greenstone.org/configformat">
495 <metadataList>
496 <metadata name="numDocs">11</metadata>
497 </metadataList>
498 <serviceRackList>
499 <serviceRack name="GS2MGPPRetrieve">
500 <defaultLevel name="Sec" />
501 <classifierList>
502 <classifier name="CL1" content="Subject" />
503 <classifier name="CL2" content="Title" horizontalAtTop="true" />
504 <classifier name="CL3" content="Organization" />
505 <classifier name="CL4" content="Keyword" />
506 </classifierList>
507 </serviceRack>
508 <serviceRack name="PhindPhraseBrowse" />
509 <serviceRack name="GS2MGPPSearch">
510 <defaultLevel name="Sec" />
511 <levelList>
512 <level name="Doc" />
513 <level name="Sec" />
514 <level name="Para" />
515 </levelList>
516 <fieldList>
517 <field shortname="ZZ" name="allfields" />
518 <field shortname="TX" name="text" />
519 <field shortname="TI" name="Title" />
520 <field shortname="SU" name="Subject" />
521 <field shortname="ORG" name="Organization" />
522 <field shortname="SO" name="Source" />
523 </fieldList>
524 <searchTypeList>
525 <searchType name="plain" />
526 <searchType name="form" />
527 </searchTypeList>
528 <defaultIndex name="idx" />
529 <indexList>
530 <index name="idx" />
531 </indexList>
532 </serviceRack>
533 </serviceRackList>
534</buildConfig>
535\end{verbatim}\end{gsc}
536\caption{Sample buildConfig.xml file (mgppdemo collection)}
537\label{fig:buildconfig}
538\end{figure}
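
The following Java sketch illustrates the dynamic loading described above. It is not Greenstone's own code: it reads the \gst{<serviceRack>} elements of a \gst{buildConfig.xml} file with the standard DOM API, instantiates each named class by reflection, and passes each object its own element for configuration. The \gst{ServiceRack} interface, the \gst{DummyRack} class and the class-name prefix argument are hypothetical; in the real system the names would resolve to Greenstone's ServiceRack classes.

\begin{gsc}\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BuildConfigLoader {

    /** Hypothetical stand-in for a Greenstone ServiceRack. */
    public interface ServiceRack {
        void configure(Element rackInfo);
    }

    /** Hypothetical rack, used only to demonstrate loading a class by name. */
    public static class DummyRack implements ServiceRack {
        public int defaultLevels = 0;

        @Override
        public void configure(Element rackInfo) {
            // Each rack interprets its own element; this one just counts <defaultLevel> children.
            defaultLevels = rackInfo.getElementsByTagName("defaultLevel").getLength();
        }
    }

    /** Parses the text of a buildConfig.xml file. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse buildConfig", e);
        }
    }

    /** Returns the name of every <serviceRack> element, in document order. */
    public static List<String> rackNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList racks = doc.getElementsByTagName("serviceRack");
        for (int i = 0; i < racks.getLength(); i++) {
            names.add(((Element) racks.item(i)).getAttribute("name"));
        }
        return names;
    }

    /** Instantiates each named class by reflection and hands it its own <serviceRack> element. */
    public static List<ServiceRack> loadRacks(Document doc, String classPrefix) {
        List<ServiceRack> loaded = new ArrayList<>();
        NodeList racks = doc.getElementsByTagName("serviceRack");
        for (int i = 0; i < racks.getLength(); i++) {
            Element rackElem = (Element) racks.item(i);
            String className = classPrefix + rackElem.getAttribute("name");
            try {
                Class<?> cls = Class.forName(className);
                ServiceRack rack = (ServiceRack) cls.getDeclaredConstructor().newInstance();
                rack.configure(rackElem);
                loaded.add(rack);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load " + className, e);
            }
        }
        return loaded;
    }
}
\end{verbatim}\end{gsc}

Because only class names appear in the file, a new kind of service can be added to a collection without changing the code that reads the configuration.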
539
540\subsection{Formatting the collection}\label{sec:formatstmt}
541
% Notes: format statements, displayItem elements, advanced collection design.
543
Part of collection design involves deciding how the collection should look. Greenstone has a default look for a collection, so this step is optional. However, the default may not suit the purposes of every collection, so many aspects of a collection's appearance can be determined by the collection designer.

In standard Greenstone, the library is served to a web browser by a servlet, and the HTML is generated using XSLT. XSLT templates are used to format all the parts of the pages. Some commonly overridden templates are those for formatting lists (search results lists and classifier browsing hierarchies) and those for parts of the document display.

Real XSLT templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list so that Keyword metadata is shown as a link to the document.
549
\begin{gsc}\begin{verbatim}
<xsl:template match="documentNode" priority="2"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="collName"/>
  <td>
    <a href="{$library_name}?a=d&amp;c={$collName}&amp;d={@nodeID}&amp;dt={@docType}">
      <xsl:value-of select="metadataList/metadata[@name='Keyword']"/>
    </a>
  </td>
</xsl:template>
\end{verbatim}\end{gsc}
560
561To write this, the user would need to know that:
562\begin{bulletedlist}
563\item the variable \$library\_name exists,
564\item the collection name is passed in as a parameter called collName
565\item metadata for a document is found in a metadataList and that its form is \gst{<metadata name="Keyword">the value</metadata>}
\item the arguments needed for the link to the document are a, c, d and dt.
567\end{bulletedlist}
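
As an illustration of the last point, the following Java sketch assembles a document link from its parts, URL-encoding each value. The class and method names are hypothetical, and the argument names are those used by the XSLT template above. This is exactly the kind of detail that a simpler format language can hide from the collection designer.

\begin{gsc}\begin{verbatim}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DocumentLink {
    /** Builds "library?a=d&c=...&d=...&dt=..." with every value URL-encoded. */
    public static String build(String libraryName, String collName,
                               String nodeID, String docType) {
        Map<String, String> args = new LinkedHashMap<>(); // keeps the argument order stable
        args.put("a", "d");        // action: display a document
        args.put("c", collName);   // collection name
        args.put("d", nodeID);     // document or section identifier
        args.put("dt", docType);   // document type
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return libraryName + "?" + query;
    }

    public static void main(String[] args) {
        // Illustrative values only; prints library?a=d&c=demo&d=HASH0123.2&dt=hierarchy
        System.out.println(build("library", "demo", "HASH0123.2", "hierarchy"));
    }
}
\end{verbatim}\end{gsc}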
568
Since XSLT is written in XML, XSLT can be used to transform XML into XSLT. Greenstone provides a simplified set of formatting commands, written in XML, which are transformed into proper XSLT. Table~\ref{tab:gsf-format} shows the set of `gsf' (Greenstone format) elements. If you have a Greenstone 2 background, Appendix~\ref{app:format} shows the Greenstone 2 format elements and their equivalents in Greenstone 3.
570
571\begin{table}
572\caption{Format elements for GSF format language}
573\label{tab:gsf-format}
574{\footnotesize
575\begin{tabular}{p{6.5cm}p{6.5cm}}
576\hline
577\bf Element & \bf Description \\
578\hline
579\gst{<gsf:text/>} & The document's text\\
580\gst{<gsf:link>...</gsf:link>} & The HTML link to the document itself \\
581\gst{<gsf:link type='document'>...
582</gsf:link>} & Same as above\\
583\gst{<gsf:link type='classifier'>...
584</gsf:link>} & A link to a classification node (use in classifierNode templates)\\
585\gst{<gsf:link type='source'>...
</gsf:link>} & The HTML link to the original file, for documents that have been converted from another format (e.g.\ Word, PDF, PostScript) \\
587\gst{<gsf:icon/>} & An appropriate icon\\
\gst{<gsf:icon type='document'/>} & Same as above\\
\gst{<gsf:icon type='classifier'/>} & A bookshelf icon for classification nodes\\
590\gst{<gsf:icon type='source'/>} & An appropriate icon for the original file e.g. Word, PDF icon\\
591\gst{<gsf:metadata name='Title'/>} & The value of a metadata element for the current document or section, in this case, Title\\
\gst{<gsf:metadata name='Title' select='select-type' [separator='y' multiple='true']/>} & A more extended selection of metadata values. The select field can be one of those shown in Table~\ref{tab:gsf-select-types}. There are two optional attributes: separator gives a string that will be used to separate the values (the default is a comma followed by a space), and if multiple is set to true, multiple values are retrieved at each section.\\
593
594\gst{<gsf:choose-metadata>
595 <gsf:metadata name='metaA'/>
596 <gsf:metadata name='metaB'/>
597 <gsf:metadata name='metaC'/>
598</gsf:choose-metadata>}
 & A choice of metadata: the first one that has a value is selected. The metadata elements can have the select, separator and multiple attributes as usual.\\
600\gst{<gsf:switch preprocess=
601'preprocess-type'>
602<gsf:metadata name='Title'/>
603<gsf:when test='test-type'
604test-value='xxx'>...</gsf:when>
605<gsf:when test='test-type'
606test-value='yyy'>...</gsf:when>
607<gsf:otherwise>...</gsf:otherwise>
</gsf:switch>} & Switch on the value of a particular metadata element. The metadata is specified by the gsf:metadata element, which takes the same attributes as usual.\\
609\hline
610\end{tabular}}
611\end{table}
612
The \gst{<gsf:metadata>} elements are used to output metadata values. The simplest case is \gst{<gsf:metadata name='Title'/>}, which outputs the Title metadata for the current document or section. Namespaces are important here: if the Title metadata is in the Dublin Core (dc) namespace, then the element should look like \gst{<gsf:metadata name='dc.Title'/>}. There are three other attributes for this element. The \gst{multiple} attribute is used when there may be more than one value for the selected metadata. For instance, a document may fall into several classification categories, and may therefore have several Subject metadata values. Adding \gst{multiple='true'} to the \gst{gsf:metadata} element retrieves all the values, not just the first one. Multiple values are separated by commas by default; the \gst{separator} attribute changes the separating string. For example, adding \gst{separator=': '} to the element separates the values by a colon and a space.
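
The combined effect of \gst{multiple} and \gst{separator} on a single section can be modelled by the following Java sketch. It is a model of the behaviour described above, not Greenstone's implementation, and it assumes that all values of the named metadata element for the section are available as a list.

\begin{gsc}\begin{verbatim}
import java.util.List;

public class MetadataValues {
    static final String DEFAULT_SEPARATOR = ", ";

    /**
     * Models <gsf:metadata name='...' [multiple='true'] [separator='...']/> for one section.
     * values:    every value of the named metadata for the section (may be empty)
     * multiple:  whether to output all values or only the first
     * separator: string placed between values; null means the default ", "
     */
    public static String format(List<String> values, boolean multiple, String separator) {
        if (values.isEmpty()) {
            return "";               // no value: nothing is output
        }
        if (!multiple) {
            return values.get(0);    // without multiple='true', only the first value
        }
        return String.join(separator == null ? DEFAULT_SEPARATOR : separator, values);
    }
}
\end{verbatim}\end{gsc}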
614
Sometimes you may want to display metadata values for sections other than the current one. For example, in a search results list in the mgppdemo collection, we display the Titles of all the enclosing sections, followed by the Title of the current section, all separated by semicolons. The display looks something like:
\begin{quote}
Farming snails 2; Starting out; Selecting your snails
\end{quote}
where \emph{Selecting your snails} is the Title of the section in the results list, and \emph{Farming snails 2} and \emph{Starting out} are the Titles of the enclosing sections. The \gst{select} attribute is used to display metadata for sections other than the current one. Table~\ref{tab:gsf-select-types} shows the options available for this attribute. The \gst{separator} attribute is used here too, to specify the separating text.
618
To produce this display, the format statement would contain the following:
620
621\begin{gsc}
622\begin{verbatim}
623<gsf:metadata name='Title' select='ancestors' separator='; '/>;
624 <gsf:metadata name='Title'/>
625\end{verbatim}
626\end{gsc}
627
628\begin{table}
629\caption{Select types for metadata format elements}
630\label{tab:gsf-select-types}
631{\footnotesize
632\begin{tabular}{ll}
633\hline
634\bf Select Type & \bf Description\\
635\hline
636current & The current section \\
637parent & The immediate parent section\\
638ancestors & All the parents back to the root (topmost) section\\
639root & The root or topmost section \\
640siblings & All the sibling sections\\
children & The immediate child sections of the current section\\
descendents & All the descendant sections\\
643\hline
644\end{tabular}}
645\end{table}
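
The select types can be modelled on a small section tree. In the following Java sketch the \gst{Section} class is hypothetical, and two details are assumptions: \gst{ancestors} lists the root first, and \gst{siblings} excludes the current section. Joining the Titles selected by \gst{ancestors} with \gst{'; '} and then appending the current Title reproduces the mgppdemo-style display shown earlier.

\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

public class SelectTypes {
    /** Hypothetical section node with a Title and links to its parent and children. */
    public static class Section {
        public final String title;
        public final Section parent;
        public final List<Section> children = new ArrayList<>();

        public Section(String title, Section parent) {
            this.title = title;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    /** Returns the sections picked by a select type, in document order. */
    public static List<Section> select(Section s, String type) {
        List<Section> out = new ArrayList<>();
        switch (type) {
            case "current": out.add(s); break;
            case "parent": if (s.parent != null) out.add(s.parent); break;
            case "ancestors":
                // walk up to the root, inserting at the front so the root comes first
                for (Section p = s.parent; p != null; p = p.parent) out.add(0, p);
                break;
            case "root": {
                Section r = s;
                while (r.parent != null) r = r.parent;
                out.add(r);
                break;
            }
            case "siblings":   // assumption: the current section itself is excluded
                if (s.parent != null)
                    for (Section c : s.parent.children) if (c != s) out.add(c);
                break;
            case "children": out.addAll(s.children); break;
            case "descendents":
                for (Section c : s.children) { out.add(c); out.addAll(select(c, "descendents")); }
                break;
            default: throw new IllegalArgumentException("unknown select type: " + type);
        }
        return out;
    }

    /** Joins the Titles of the selected sections with the given separator. */
    public static String titles(Section s, String type, String separator) {
        List<String> ts = new ArrayList<>();
        for (Section x : select(s, type)) ts.add(x.title);
        return String.join(separator, ts);
    }
}
\end{verbatim}\end{gsc}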
646
647The gsf:choose-metadata element selects the first available metadata value from the list of options.
648\begin{gsc}
649\begin{verbatim}
<gsf:choose-metadata>
 <gsf:metadata name='dc.Title'/>
 <gsf:metadata name='dls.Title'/>
 <gsf:metadata name='Title'/>
</gsf:choose-metadata>
655\end{verbatim}
656\end{gsc}
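
The selection rule can be sketched in Java as follows. The sketch assumes, hypothetically, that a section's metadata is available as a map from metadata names to lists of values; it is a model of the behaviour, not Greenstone's implementation.

\begin{gsc}\begin{verbatim}
import java.util.List;
import java.util.Map;

public class ChooseMetadata {
    /**
     * Models <gsf:choose-metadata>: returns the first value of the first listed
     * metadata name that has a value, or "" (nothing displayed) if none does.
     */
    public static String choose(Map<String, List<String>> sectionMetadata, List<String> names) {
        for (String name : names) {                   // options are tried in the order listed
            List<String> values = sectionMetadata.get(name);
            if (values != null && !values.isEmpty()) {
                return values.get(0);
            }
        }
        return "";
    }
}
\end{verbatim}\end{gsc}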
657
This will display the dc.Title metadata if available, otherwise the dls.Title metadata if available, otherwise the Title metadata. If there are no values for any of these metadata elements, nothing is displayed.
659
660The gsf:switch element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organisation it came from.
661
662\begin{gsc}
663\begin{verbatim}
664<gsf:switch metadata='Organization' preprocess='toLower;stripSpace'>
665 <gsf:when test='equals' test-value='bostid'>
666 <!-- output BOSTID image --></gsf:when>
667 <gsf:when test='equals' test-value='worldbank'>
668 <!-- output world bank image --></gsf:when>
669 <gsf:otherwise><!-- output default image--></gsf:otherwise>
670</gsf:switch>
671\end{verbatim}
672\end{gsc}
673
Preprocessing of the metadata value is optional. The preprocess types are toLower (make the value lowercase), toUpper (make the value uppercase) and stripSpace (remove any whitespace from the value). These operations are carried out on the value of the selected metadata before the tests are applied. Several preprocessing types can be specified, separated by semicolons, and they are applied in the order given (from left to right).

Each \gst{gsf:when} element specifies a test and a test value. Test values are plain text. Tests include startsWith, contains, exists, equals and endsWith; exists does not need a test value. Including a \gst{gsf:otherwise} element ensures that something is displayed even when none of the tests match.
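
The preprocessing and tests can be modelled by the following Java sketch. It is not Greenstone's implementation: the \gst{When} class is hypothetical, and it is assumed that the first matching \gst{gsf:when} is used, as with \gst{xsl:choose}.

\begin{gsc}\begin{verbatim}
import java.util.Locale;

public class SwitchModel {
    /** One hypothetical gsf:when branch: a test, its test value and the output to produce. */
    public static final class When {
        final String test;
        final String testValue;
        final String output;

        public When(String test, String testValue, String output) {
            this.test = test;
            this.testValue = testValue;
            this.output = output;
        }
    }

    /** Applies preprocess steps such as "toLower;stripSpace" from left to right. */
    public static String preprocess(String value, String steps) {
        if (steps == null || steps.isEmpty()) return value;
        for (String step : steps.split(";")) {
            switch (step.trim()) {
                case "toLower": value = value.toLowerCase(Locale.ROOT); break;
                case "toUpper": value = value.toUpperCase(Locale.ROOT); break;
                case "stripSpace": value = value.replaceAll("\\s+", ""); break;
                default: throw new IllegalArgumentException("unknown preprocess type: " + step);
            }
        }
        return value;
    }

    /** Evaluates one test; value is null when the metadata is absent. */
    public static boolean test(String value, String test, String testValue) {
        if (test.equals("exists")) return value != null;   // needs no test value
        if (value == null) return false;
        switch (test) {
            case "equals": return value.equals(testValue);
            case "startsWith": return value.startsWith(testValue);
            case "endsWith": return value.endsWith(testValue);
            case "contains": return value.contains(testValue);
            default: throw new IllegalArgumentException("unknown test: " + test);
        }
    }

    /** Models a whole gsf:switch: first matching branch wins, else the otherwise output. */
    public static String evaluate(String metadataValue, String preprocessSteps,
                                  String otherwise, When... whens) {
        String v = metadataValue == null ? null : preprocess(metadataValue, preprocessSteps);
        for (When w : whens) {
            if (test(v, w.test, w.testValue)) return w.output;
        }
        return otherwise;
    }
}
\end{verbatim}\end{gsc}

For example, with preprocessing \gst{toLower;stripSpace}, the Organization value \emph{World Bank} becomes \gst{worldbank} and so matches the second branch of the switch statement above.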
677
678
679If none of the gsf elements meets your needs for formatting, XSLT can be entered directly into the format element, giving the collection designer full flexibility over how the collection appears.
680
Collection-specific templates are added to the configuration file \gst{collectionConfig.xml}. Any template found in the XSLT files can be overridden.
The important part of adding templates to the configuration file is knowing where to put them. Formatting templates cannot go just anywhere: there are standard places for them. Figure~\ref{fig:format-places} shows the positions where templates can occur.
683
684\begin{figure}
685\begin{gsc}\begin{verbatim}
686<collectionConfig>
687 <metadataList/>
688 <displayItemList/>
689 <search>
690 <format> <!--Put here templates related to searching and
691 the query page. The common one is the documentNode
692 template -->
693 <gsf:template match='documentNode'>...</gsf:template>
694 </format>
695 </search>
696 <browse>
697 <classifier name='xx'>
 <format><!-- put here templates related to formatting a
699 particular classifier page. Common ones are documentNode
700 and classifierNode templates-->
701 <gsf:template match='documentNode'>...</gsf:template>
702 <gsf:template match='classifierNode'>...</gsf:template>
703 <gsf:template match='classifierNode' mode='horizontal'>...
704 </gsf:template>
705 </format>
706 </classifier>
707 <classifier>...</classifier>
708 </browse>
709 <display>
710 <format><!-- here goes any formatting relating to the display
711 of the documents. These are generally named templates,
712 and format options -->
713 <gsf:template name='documentContent'>...</gsf:template>
714 <gsf:option name='TOC' value='true'/>
715 </format>
716 </display>
717</collectionConfig>
718\end{verbatim}\end{gsc}
719\caption{Places for format statements}
720\label{fig:format-places}
721\end{figure}
722
The user specifies a \gst{<gsf:template>} for what they want to format. Templates can match \gst{documentNode}, or \gst{classifierNode} (for a node in a classification hierarchy).

Using the gsf elements, the XSLT template for the classifier list shown earlier can now be written as:
726
727\begin{gsc}\begin{verbatim}
728<gsf:template match='documentNode'>
729 <td><gsf:link><gsf:metadata name='Keyword'/></gsf:link></td>
730</gsf:template>
731\end{verbatim}\end{gsc}
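
Since gsf elements are expanded into XSLT, the expansion itself can be written in XSLT. The following Java sketch runs such an expansion with the JDK's built-in XSLT processor: a small stylesheet turns \gst{gsf:template} into \gst{xsl:template} and \gst{gsf:metadata} into \gst{xsl:value-of}, using \gst{xsl:namespace-alias} so that it can output XSLT elements. It is a simplified stand-in for Greenstone's own conversion: it handles only these two elements, copies everything else (including \gst{gsf:link}) unchanged, and assumes that metadata is held in the \gst{metadataList} structure described earlier.

\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GsfExpander {
    // A stylesheet that writes a stylesheet (simplified; not Greenstone's own).
    static final String META_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:gsf='http://www.greenstone.org/configformat'"
        + " xmlns:axsl='urn:example:xsl-alias'>"
        // "axsl:" elements in this stylesheet are written out as "xsl:" elements
        + "<xsl:namespace-alias stylesheet-prefix='axsl' result-prefix='xsl'/>"
        // gsf:template -> xsl:template with the same match pattern
        + "<xsl:template match='gsf:template'>"
        + "<axsl:template match='{@match}'><xsl:apply-templates/></axsl:template>"
        + "</xsl:template>"
        // gsf:metadata -> xsl:value-of on the metadataList structure
        + "<xsl:template match='gsf:metadata'>"
        + "<axsl:value-of select=\"metadataList/metadata[@name='{@name}']\"/>"
        + "</xsl:template>"
        // anything else (HTML such as <td>, other gsf elements) is copied unchanged
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Expands a gsf template (as XML text) into an XSLT template (as XML text). */
    public static String expand(String gsfTemplate) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(META_XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(gsfTemplate)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("expansion failed", e);
        }
    }
}
\end{verbatim}\end{gsc}

Given the template above without the \gst{gsf:link} wrapper, \gst{expand} should produce an \gst{xsl:template} for \gst{documentNode} whose cell contains an \gst{xsl:value-of} selecting the Keyword metadata.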
732
There are also formatting instructions that are options rather than templates.
These are described in Table~\ref{tab:format_options}. They are entered into the configuration file in the form \gst{<gsf:option name='coverImages' value='false'/>}.
735
736\begin{table}
737\caption{Formatting options}
738\label{tab:format_options}
739{\footnotesize
740\begin{tabular}{llp{5cm}}
741\hline
742\bf option name & \bf values & \bf description \\
743\hline
744coverImages & true, false & whether or not to display cover images for documents \\
745TOC & true, false & whether or not to display the table of contents for the document\\
746\hline
747\end{tabular}}
748\end{table}
749
Note that format templates are added to the XSLT files before the transformation takes place, whereas options are added to the page source and tested within the XSLT.

For local collections\footnote{And, eventually, remote collections.} whole XSLT files can be overridden. A collection can have its own transform directory, and any XSLT files in it are used in preference to the interface's files when this collection is being displayed. For example, to give a collection a completely different about page, put a new \gst{about.xsl} into the collection's transform directory, and it will be used instead. This is what we do for the Gutenberg sample collection.
753
754
755\subsection{Customising the interface}\label{sec:interface-customise}
756
The interface can be customised in several ways: by adding a new interface, by adding a new language, and by changing the look and feel for an interface, a site or a collection.

% what needs a tomcat restart?
762
763\subsubsection{Changing the interface language}
764
The interface language can be changed by going to the preferences page and choosing a language from the list, which shows all the languages in which the interface has been defined so far.

It is easy to add a new interface language to Greenstone. Language-specific text strings are kept separate from the rest of the system so that new languages can be incorporated easily. These text strings are contained in Java resource bundle properties files: plain text files of key-value pairs, located in \gst{resources/java}. Each interface has one, named \gst{interface\_name.properties} (where \gst{name} is the interface name). Each service class has one with the same name as the class (e.g.\ \gst{GS2Search.properties}). To add another language, all of the base \gst{.properties} files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of \gst{interface\_default.properties} would be named \gst{interface\_default\_fr.properties}.
768
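Keys are looked up in the properties file closest to the requested language. For example, suppose language \gst{fr\_CA} is requested (French language, country Canada) and the default locale is \gst{en\_GB}. Java first looks for \gst{XXX\_fr\_CA.properties}, then \gst{XXX\_fr.properties}; only if neither of these exists does it try the default locale's files, \gst{XXX\_en\_GB.properties} and then \gst{XXX\_en.properties}, and finally the base file \gst{XXX.properties}. Once a file has been found, a key that is missing from it is looked up in its parent files: for example, a key missing from \gst{XXX\_fr.properties} is looked up in \gst{XXX.properties}.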
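
The candidate file names for a given locale can be listed with the standard \gst{java.util.ResourceBundle.Control} class, as in the following sketch. The class name is illustrative; \gst{getCandidateLocales}, \gst{toBundleName} and \gst{getFallbackLocale} are standard library methods. The fallback locale is the JVM's default locale, so the second line printed by \gst{main} depends on the machine.

\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupOrder {
    // The JDK's default lookup policy, restricted to .properties files.
    private static final ResourceBundle.Control CONTROL =
        ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);

    /** Bundle names tried for one locale, most specific first. */
    public static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<>();
        for (Locale l : CONTROL.getCandidateLocales(baseName, locale)) {
            names.add(CONTROL.toBundleName(baseName, l));
        }
        return names;
    }

    public static void main(String[] args) {
        // Requested locale: French (Canada).
        System.out.println(candidates("interface_default", Locale.CANADA_FRENCH));
        // Locale tried next if none of those bundles exists; depends on the JVM default.
        Locale fallback = CONTROL.getFallbackLocale("interface_default", Locale.CANADA_FRENCH);
        if (fallback != null) {
            System.out.println(candidates("interface_default", fallback));
        }
    }
}
\end{verbatim}\end{gsc}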
770
You can tell Greenstone about a new language by ... (this is currently done in \gst{interfaceConfig.xml}).
772
773
774\subsubsection{Modifying an existing interface}
775
Most of an interface is defined by XSLT files, which are stored in \$GSDL3HOME/\-web/\-interfaces/\-interface-name/\-transform. These can be changed, and the changes take effect immediately. If changes should only apply to certain collections or sites, rather than to everything that uses the interface, you can override some of the files by putting new versions in a different place. XSLT files are looked for in the following order: collection, site, interface, default interface. (This currently only applies to sites, and therefore collections, that reside in the same Greenstone installation as the interface.) The same search order applies to files that are included from other XSLT files. For example, \gst{query.xsl}, which generates the query pages, includes a file called \gst{querytools.xsl}. To give a particular site a different query interface, either of these files may need to be modified. Creating a new version of either file and putting it in the site's transform directory will work: either the new \gst{query.xsl} will include the default \gst{querytools.xsl}, or the default \gst{query.xsl} will include the new \gst{querytools.xsl}. The \gst{xsl:include} directives are preprocessed by the Java code, which adds full paths based on which files are available, so that the correct file is used.

Note that a file cannot include another file with the same name. For example, \gst{query.xsl} cannot include \gst{query.xsl}. (It is tempting to do this when you only want to change one template in a file and include the default version for the rest, but it does not work.)
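
The search order can be sketched in Java as below. The directory layout built by \gst{searchPath} is an assumption for illustration: only the interface transform directory is given in this manual, and the site and collection paths are hypothetical. The same lookup would be applied to each file named in an \gst{xsl:include}, which is how a new \gst{querytools.xsl} in a site directory can be picked up by the default \gst{query.xsl}.

\begin{gsc}\begin{verbatim}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class XsltLocator {
    /**
     * Hypothetical layout: collection, site, interface, then default interface.
     * Only the interface path (web/interfaces/<name>/transform) comes from this manual.
     */
    public static List<File> searchPath(File gsdl3home, String site, String coll, String iface) {
        File web = new File(gsdl3home, "web");
        File siteDir = new File(new File(web, "sites"), site);        // assumed location
        File interfaces = new File(web, "interfaces");
        List<File> dirs = new ArrayList<>();
        if (coll != null) {
            dirs.add(new File(new File(new File(siteDir, "collect"), coll), "transform"));
        }
        dirs.add(new File(siteDir, "transform"));
        dirs.add(new File(new File(interfaces, iface), "transform"));
        dirs.add(new File(new File(interfaces, "default"), "transform"));
        return dirs;
    }

    /** Returns the first existing file called name in the directories, in order, or null. */
    public static File find(String name, List<File> transformDirs) {
        for (File dir : transformDirs) {
            File candidate = new File(dir, name);
            if (candidate.isFile()) return candidate;
        }
        return null;
    }
}
\end{verbatim}\end{gsc}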
779
780\subsubsection{Defining a new interface}
781
782A new interface may be needed if different instantiations of the library require different interfaces, or different developers want their own look and feel. Creating a new interface will allow modifications to be made while leaving the original one intact.
783
A new interface needs a directory in \$GSDL3HOME/web/interfaces; the name of this directory becomes the interface name. Inside it, there must be \gst{images} and \gst{transform} directories and an \gst{interfaceConfig.xml} file. Any XSLT file can be overridden for a new interface by putting the replacement in the new transform directory. If a particular XSLT file is not there, the one from the default interface is used, so only the files that need changing have to be overridden.

To use a new interface, the Tomcat \gst{web.xml} file must be edited: either change the interface that an existing servlet instance is using, or add another servlet instance to the file (see Section~\ref{sec:sites-and-ints} or Appendix~\ref{app:tomcat}). Tomcat must then be restarted for the change to take effect.
787
788\newpage
789\section{Developing Greenstone 3: Run-time system}\label{sec:develop-runtime}
790
% Notes: runtime object structure diagram; describe the modules;
% class hierarchy;
% directory structure and where everything lives;
% message format;
% overall description of message passing sequence;
% configuration process (start up and runtime);
% page generation;
% accessing the javadoc.
800
\subsection{Overview of modules}
802
A Greenstone3 `library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks, etc. Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
804
805\begin{figure}[t]
806 \centering
807 \includegraphics[width=4in]{local} %5.8
808 \caption{A simple stand-alone site.}
809 \label{fig:local}
810\end{figure}
811
812
813{\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters, communicators needed. All messages pass through the MessageRouter. Communication between remote sites is always done between MessageRouters, one for each site.
814
815{\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g. all the building services may be part of a cluster. What is part of a cluster is specified by the site configuration file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.
816Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
817
{\em Service}: these provide the core functionality of the system, e.g.\ searching, retrieving documents, building collections, etc. One or more services may be grouped into a single class (a ServiceRack) for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory, so grouping them in one ServiceRack means the index only has to be loaded once.
819
820{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
821
{\em Receptionist}: this is the point of contact for the `front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. For example, a Receptionist may modify the request in some way before sending it to the appropriate Action; add some data to the page responses that is common to all pages; or transform the response into another form, for example using XSLT. There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}.

{\em Actions}: these do the job of creating the `pages'. There is a different action for each type of page: for example, PageAction handles semi-static pages, QueryAction handles queries, and DocumentAction displays documents. Actions know a little about specific service types. Based on the `CGI' arguments passed to them, they construct requests for the system, and assemble the responses into data for the page. This data is returned to the Receptionist, which may transform it into HTML. The various actions are described in more detail in Section~\ref{sec:pagegen}.
825
826
827\subsection{Start up configuration}\label{sec:startup-config}
828
829We use the Tomcat web server, which operates either stand-alone in a test mode
830or in conjunction with the Apache web server. The Greenstone LibraryServlet
831class is loaded by Tomcat and the servlet's \gst{init()} method is called. Each time a
832\gst{get/put/post} (etc.) is used, a new thread is started and
833\gst{doGet()/doPut()/doPost()} (etc.) is called.
834
835The \gst{init()} method creates a new Receptionist and a new
836MessageRouter. Default classes (DefaultReceptionist, MessageRouter) are used unless subclasses have been specified in the servlet initiation parameters (see Section~\ref{sec:sites-and-ints}). The appropriate system variables are set for each object (interface
837name, site name, etc.) and then \gst{configure()} is called on both. The MessageRouter handle
838is passed to the Receptionist. The servlet then communicates only with
839the Receptionist, not with the MessageRouter.
840
The Receptionist reads in the \gst{interfaceConfig.xml} file and loads all the different Action classes. Other Actions may be loaded on the fly as needed. Actions are added to a map, with shortnames as keys: for example, the QueryAction is added with the key \gst{q}. The Actions are also passed the MessageRouter reference.
If the Receptionist is a TransformingReceptionist, a mapping between shortnames and XSLT file names is also created.
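
The action map can be sketched in Java as follows. The \gst{Action} interface and the string-based requests and responses are simplifications (Greenstone's Actions exchange XML), and the error response format is invented for the example.

\begin{gsc}\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

public class ActionMapSketch {
    /** Hypothetical stand-in for a Greenstone Action. */
    public interface Action {
        String process(Map<String, String> params);
    }

    // shortname (the 'a' argument) -> Action object
    private final Map<String, Action> actions = new HashMap<>();

    /** Registers an Action under its shortname, e.g. "q" for a query action. */
    public void addAction(String shortName, Action action) {
        actions.put(shortName, action);
    }

    /** Routes a page request to the Action registered under its shortname. */
    public String route(String shortName, Map<String, String> params) {
        Action action = actions.get(shortName);
        if (action == null) {
            return "<error>unknown action: " + shortName + "</error>";   // invented format
        }
        return action.process(params);
    }
}
\end{verbatim}\end{gsc}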
843
The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. It creates a module map that maps names to objects, which is used for routing messages. It also keeps small chunks of XML (serviceList, collectionList, clusterList and siteList), which are returned in response to a describe request (see Section~\ref{sec:describe}).
Each ServiceRack specified in the configuration file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object, and is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
ServiceClusters are created and passed their \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as key. Each serviceCluster is also added to the serviceClusterList.
For each site specified, the MessageRouter creates an appropriate type of Communicator object, then tries to get the site description. If the server for the remote site is up and running, this should succeed, and the site is added to the map with its site name as key. The site's collections, services and clusters are also added to the static XML lists. If the server for the remote site is not running, the site is not included in the siteList or the module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-sites command must be sent (see next section).
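
The way ServiceRacks become transparent can be sketched as follows. All types here are hypothetical simplifications (real messages are XML elements, not strings): every service name is mapped to the rack that provides it, so routing by service name needs no knowledge of racks.

\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ModuleMapSketch {
    /** Hypothetical: anything a message can be routed to (rack, cluster, collection, site). */
    public interface Module {
        String process(String to, String request);
    }

    /** Hypothetical: one object that provides several named services. */
    public interface ServiceRack extends Module {
        List<String> getServiceNames();
    }

    private final Map<String, Module> moduleMap = new HashMap<>();
    private final List<String> serviceList = new ArrayList<>();

    /** Adds each service of the rack to the map, all pointing at the same rack object. */
    public void addServiceRack(ServiceRack rack) {
        for (String service : rack.getServiceNames()) {
            moduleMap.put(service, rack);
            serviceList.add(service);
        }
    }

    /** Clusters, collections and remote sites are added under their own name. */
    public void addModule(String name, Module module) {
        moduleMap.put(name, module);
    }

    /** Routes a request by name; the caller never sees which rack provides a service. */
    public String route(String to, String request) {
        Module target = moduleMap.get(to);
        return target == null ? null : target.process(to, request);
    }

    public List<String> serviceNames() {
        return serviceList;
    }
}
\end{verbatim}\end{gsc}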
848
849The MessageRouter also looks inside the site's \gst{collect} directory, and loads up a Collection object for each valid collection found.
850
851The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
852files, determines the metadata, and loads ServiceRack classes based on the
853names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection configuration file.
854Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList XML.
855
856
857
858\subsection{Message passing}
859
Activity in Greenstone 3 is initiated by a request coming in from outside. In the standard web-based Greenstone, this comes from a servlet into the Receptionist. This external request is a request for a page of data, and contains a representation of the CGI-style arguments. A page of XML is returned, which may be converted to HTML or another format depending on the output parameter of the request. Messages inside the system all follow the same basic format: message elements contain either multiple request elements or multiple response elements. Messaging is synchronous, and the same number of responses as requests is returned.

When a page request comes in to the Receptionist, it looks at the action attribute to determine which Action to send it to. The Action returns a response. The page that the Receptionist returns contains the original request, the response from the Action, and other information as needed (depending on the type of Receptionist). The data may be transformed in some way: for the servlet version of Greenstone, we transform it using XSLT to generate HTML pages, which are returned to the servlet.

Actions send internal-style messages to the MessageRouter. Some can be answered by the MessageRouter itself; others are passed on to collections, and possibly on to services. Internal requests are for simple actions, such as searching, retrieving metadata, or retrieving document text.
There are different request types: describe, process, system...
866
867The message formats for each request type, and the response formats for each module are described in the following section.
868
\subsection{An attempt at an API: message formats}
870
\subsubsection{External$\rightarrow$action}\label{sec:page-requests}
872
873request:
These are the special `external'-style messages. Requests originate from outside Greenstone, for example from a servlet or a Java application. They are requests for a `page' of data: for example, the home page for a site, the query page for a collection, or the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the `CGI' arguments in a Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
875
876Here are some examples of requests\footnote{In a servlet context, these correspond to the URLs \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='p' subaction='about'
         lang='fr' output='html'>
  <paramList>
    <param name='c' value='demo'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='page' action='q' lang='en' output='html'>
  <paramList>
    <param name='s' value='TextQuery'/>
    <param name='c' value='demo'/>
    <param name='rt' value='r'/>
    <!-- the rest are the service specific params -->
    <param name='ca' value='0'/>     <!-- casefold -->
    <param name='st' value='1'/>     <!-- stem -->
    <param name='m' value='10'/>     <!-- maxdocs -->
    <param name='q' value='snail'/>  <!-- query string -->
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
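
The mapping from URL arguments to an XML page request can be sketched in a few lines of Java. This is a hypothetical illustration, not the Greenstone implementation. The class \gst{PageRequestBuilder} is invented for this example, the treatment of \gst{l} and \gst{o} as the \gst{lang} and \gst{output} attributes is inferred from the examples above, and the defaults used when they are absent (\gst{en} and \gst{html}) are assumptions.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: turns Greenstone-style CGI arguments into
// a page request like the ones above.
class PageRequestBuilder {
    // Arguments that become attributes of <request>; everything else
    // becomes a <param>. Mapping l/o to lang/output is an assumption.
    static final Set<String> ATTRIBUTE_ARGS = Set.of("a", "sa", "l", "o");

    // Escapes characters not allowed in a single-quoted attribute.
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;");
    }

    // Splits "a=q&l=en&q=snail" into an ordered name -> value map.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> args = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            args.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                     URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return args;
    }

    static String build(String query) {
        Map<String, String> args = parseQuery(query);
        StringBuilder sb = new StringBuilder("<request type='page'");
        sb.append(" action='").append(attr(args.getOrDefault("a", "")))
          .append('\'');
        if (args.containsKey("sa")) {
            sb.append(" subaction='").append(attr(args.get("sa")))
              .append('\'');
        }
        // Assumed defaults when l or o is absent
        sb.append(" lang='").append(attr(args.getOrDefault("l", "en")))
          .append("' output='")
          .append(attr(args.getOrDefault("o", "html")))
          .append("'>\n  <paramList>\n");
        for (Map.Entry<String, String> e : args.entrySet()) {
            if (ATTRIBUTE_ARGS.contains(e.getKey())) continue;
            sb.append("    <param name='").append(attr(e.getKey()))
              .append("' value='").append(attr(e.getValue()))
              .append("'/>\n");
        }
        return sb.append("  </paramList>\n</request>").toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}
Given the query string from the first footnote, \gst{build} returns the first request shown above, with the \gst{output} attribute defaulted to \gst{html}. Building the XML as a string keeps the sketch short; a real client might prefer a DOM API.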

The Receptionist routes the message to the appropriate Action, which it finds by looking up the action's short name in its short name$\rightarrow$Action map. The Action determines what information is needed and retrieves it by making one or more internal requests to the MessageRouter. This information is gathered together into a single response and returned to the Receptionist, which may process the result further, depending on what type of Receptionist it is.
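
The dispatch step amounts to a lookup in a map from short names to Action objects. The Java sketch below is hypothetical. \gst{SketchAction} and \gst{SketchReceptionist} stand in for Greenstone's Action and Receptionist classes, whose real interfaces may differ, and requests are passed around as strings only to keep the example short.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Action: page request in, response out.
interface SketchAction {
    String process(String pageRequest);
}

// Hypothetical Receptionist that dispatches on the action short name.
class SketchReceptionist {
    private final Map<String, SketchAction> actions = new HashMap<>();

    void register(String shortName, SketchAction action) {
        actions.put(shortName, action);
    }

    String handle(String shortName, String pageRequest) {
        SketchAction action = actions.get(shortName);
        if (action == null) {
            throw new IllegalArgumentException("no action: " + shortName);
        }
        String response = action.process(pageRequest);
        // The page holds the original request and the Action's
        // response; a servlet Receptionist would then apply XSLT.
        return "<page>" + pageRequest + response + "</page>";
    }
}
\end{verbatim}\end{gsc}\end{quote}
In a servlet setting, the string returned by \gst{handle} would then be transformed into HTML, as described at the start of this section.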


\begin{table}
{\footnotesize
\begin{tabular}{lll}
\hline
\bf Argument & \bf Meaning &\bf Typical values \\
\hline
a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
& & s (system)\\
sa & subaction & home, about (page action)\\
c & collection or & demo, build \\
& service cluster \\
s & service name & TextQuery, ImportCollection \\
rt & request type & d (display), r (request), s (status) \\
ro & response only & 0 or 1; if set to 1, the request is carried out \\
& & but no processing of the results is done \\
& & (currently only used in process actions) \\
o & output type & XML, html, WML \\
l & language & en, fr, zh ...\\
d & document id & HASHxxx \\
r & resource id & ???\\
pid & process handle & an integer identifying a particular process request \\
\hline
\end{tabular}}
\caption{Generic arguments that can appear in a Greenstone URL}
\label{tab:args}
\end{table}

\subsubsection{`describe'-type messages}\label{sec:describe}

The most basic of the internal standard requests is ``describe-yourself'', which can be sent to any module in the system. The module responds with a largely predefined piece of XML, which makes these requests very efficient: the response is fixed apart from any language-specific text strings, which are put together as each request comes in, based on the language attribute of the request.
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to=''/>
\end{verbatim}\end{gsc}\end{quote}
If the \gst{to} field is empty, the request is answered by the MessageRouter.
An example response from a MessageRouter might look like this:
\begin{quote}\begin{gsc}\begin{verbatim}
<response lang='en' type='describe'>
  <serviceList/>
  <siteList>
    <site name='org.greenstone.gsdl1'
          address='http://localhost:8080/soap/servlet/rpcrouter'
          type='soap' />
  </siteList>
  <serviceClusterList>
    <serviceCluster name="build" />
  </serviceClusterList>
  <collectionList>
    <collection name='org.greenstone.gsdl1/org.greenstone.gsdl2/fao' />
    <collection name='org.greenstone.gsdl1/demo' />
    <collection name='org.greenstone.gsdl1/fao' />
    <collection name='myfiles' />
  </collectionList>
</response>
\end{verbatim}\end{gsc}\end{quote}
This MessageRouter has no individual site-wide services (an empty \gst{<serviceList>}), but has a service cluster called \gst{build} (which provides collection importing and building functionality). It
communicates with one site, \gst{org.greenstone.gsdl1}. It is aware of four
collections. One of these, \gst{myfiles}, belongs to it; the other three are
available through the external site. One of those collections is actually from
a further external site.
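
A client reading this response could extract the collection names with the standard Java DOM API, as in the hypothetical sketch below. The rule used by \gst{isLocal}, that a name without a \gst{/} belongs to this MessageRouter while a site-qualified name is remote, is inferred from the example above rather than taken from Greenstone itself.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for reading a MessageRouter describe response.
class CollectionListReader {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Names of the <collection> elements inside <collectionList>.
    static List<String> collectionNames(String response) {
        List<String> names = new ArrayList<>();
        NodeList lists =
            parse(response).getElementsByTagName("collectionList");
        for (int i = 0; i < lists.getLength(); i++) {
            NodeList cols = ((Element) lists.item(i))
                    .getElementsByTagName("collection");
            for (int j = 0; j < cols.getLength(); j++) {
                names.add(((Element) cols.item(j)).getAttribute("name"));
            }
        }
        return names;
    }

    // Assumption: site-qualified names (containing '/') are remote.
    static boolean isLocal(String name) {
        return name.indexOf('/') < 0;
    }
}
\end{verbatim}\end{gsc}\end{quote}
Applied to the response above, \gst{isLocal} would be true only for \gst{myfiles}.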

It is possible to ask just for a specific part of the information provided by a
describe request, rather than the whole thing. For example, these two
messages get the \gst{collectionList} and the \gst{siteList} respectively:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to=''>
  <paramList>
    <param name='subset' value='collectionList'/>
  </paramList>
</request>

<request lang='en' type='describe' to=''>
  <paramList>
    <param name='subset' value='siteList'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
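
Since the describe requests above differ only in the optional \gst{subset} parameter, a client could generate them with one small function. The Java class below is a hypothetical sketch, not part of Greenstone; it builds the XML as a string and escapes attribute values.
\begin{quote}\begin{gsc}\begin{verbatim}
// Hypothetical builder for describe requests.
class DescribeRequest {
    // Escapes characters not allowed in a single-quoted attribute.
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;");
    }

    // to = "" addresses the MessageRouter; subset = null asks for
    // the whole description.
    static String build(String lang, String to, String subset) {
        String open = "<request lang='" + attr(lang)
                + "' type='describe' to='" + attr(to) + "'";
        if (subset == null) {
            return open + "/>";
        }
        return open + ">\n  <paramList>\n    <param name='subset' value='"
                + attr(subset) + "'/>\n  </paramList>\n</request>";
    }
}
\end{verbatim}\end{gsc}\end{quote}
With \gst{subset} set to \gst{siteList} it produces the second request above, and with \gst{subset} set to \gst{null} the plain describe request shown earlier.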

When a collection or service cluster is asked to describe itself, what is returned is a list of metadata, some display elements, and a list of services. For example, here is such
a message, along with a sample response.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to='mgppdemo'/>

<response from="mgppdemo" type="describe">
  <collection name="mgppdemo">
    <displayItem lang="en" name="name">greenstone mgpp demo
    </displayItem>
    <displayItem lang="en" name="description">This is a
      demonstration collection for the Greenstone digital
      library software. It contains a small subset (11 books)
      of the Humanity Development Library. It is built with
      mgpp.</displayItem>
    <displayItem lang="en" name="icon">mgppdemo.gif</displayItem>
    <serviceList>
      <service name="DocumentStructureRetrieve" type="retrieve" />
      <service name="DocumentMetadataRetrieve" type="retrieve" />
      <service name="DocumentContentRetrieve" type="retrieve" />
      <service name="ClassifierBrowse" type="browse" />
      <service name="ClassifierBrowseMetadataRetrieve"
               type="retrieve" />
      <service name="TextQuery" type="query" />
      <service name="FieldQuery" type="query" />
      <service name="AdvancedFieldQuery" type="query" />
      <service name="PhindApplet" type="applet" />
    </serviceList>
    <metadataList>
      <metadata name="creator">[email protected]</metadata>
      <metadata name="numDocs">11</metadata>
      <metadata name="buildType">mgpp</metadata>
      <metadata name="httpPath">http://kanuka:8090/gsdl3/sites/
        localsite/collect/mgppdemo</metadata>
    </metadataList>
  </collection>
</response>
\end{verbatim}\end{gsc}\end{quote}

The subset parameter can also be used in a describe request to a collection, to retrieve just the \gst{metadataList} or \gst{serviceList}.

This collection provides many typical services. Notice that this response lists the available services, whereas the collection configuration file for this collection (Figure~\ref{fig:collconfig}) described ServiceRacks. Once the ServiceRacks have been configured, they are transparent to the rest of the system, and only services are referred to.
There are three document retrieval services, for structural information, metadata, and content. The two Classifier services retrieve classification structure and metadata. These five services are all provided by the GS2MGPPRetrieve ServiceRack. The three query services, which provide different kinds of query interface, are provided by the GS2MGPPSearch ServiceRack. The last service, PhindApplet, is an applet service provided by the PhindPhraseBrowse ServiceRack.

A \gst{describe} request sent to a service returns a list of the parameters that
the service accepts and some display information; in future it may also describe the content types of the request and response.

Parameters can be in the following formats:
\begin{quote}\begin{gsc}\begin{verbatim}
<param name='xxx' type='integer|boolean|string|invisible' default='yyy'/>
<param name='xxx' type='enum_single|enum_multi' default='aa'>
  <option name='aa'/><option name='bb'/>...
</param>
<param name='xxx' type='multi' occurs='4'>
  <param .../>
  <param .../>
</param>
\end{verbatim}\end{gsc}\end{quote}

If no default is specified, the parameter is assumed to be mandatory.
Here are some examples of parameters:
\begin{quote}\begin{gsc}\begin{verbatim}
<param name='case' type='boolean' default='0'/>

<param name='maxDocs' type='integer' default='50'/>

<param name='index' type='enum_single' default='dtx'>
  <option name='dtx'/>
  <option name='stt'/>
  <option name='stx'/>
</param>

<!-- this one is for the text box and field list for the
     simple field query -->
<param name='simpleField' type='multi' occurs='4'>
  <param name='fqv' type='string'/>
  <param name='fqf' type='enum_single'>
    <option name='TI'/><option name='AU'/><option name='OR'/>
  </param>
</param>
\end{verbatim}\end{gsc}\end{quote}
The type attribute is used to determine how to display the parameters on a web page or other interface. For example, a string parameter may result in a text entry box, a boolean parameter in an on/off button, and an enum\_single or enum\_multi parameter in a drop-down menu from which one or many items, respectively, can be selected.
A multi-type parameter indicates that two or more parameters are associated and should be displayed together. For example, in a field query, the text box and the field list should be associated. The occurs attribute specifies how many times the parameter should be displayed on the page.
Parameters also come with display information: all the text strings needed to present them to the user. These include the name of the parameter and the display values for any options. In a describe response, these strings are included in each parameter description as \gst{<displayItem>} elements (the simplified examples above omit them).
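
The mapping from parameter type to form control can be sketched in Java. This is a hypothetical illustration rather than Greenstone's own rendering, which the servlet Receptionist performs with XSLT. The choices of a hidden field for \gst{invisible}, a text box for \gst{integer}, and an explicit off/on menu for \gst{boolean} are assumptions. \gst{multi} parameters and \gst{<displayItem>} labels are not handled; options are labelled with their raw names.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.List;

// Hypothetical sketch: picks an HTML form control for a parameter
// according to its type attribute.
class ParamWidget {
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("\"", "&quot;");
    }

    // A parameter with no default is mandatory.
    static boolean isMandatory(String defaultValue) {
        return defaultValue == null;
    }

    static String render(String name, String type, String def,
                         List<String> options) {
        String d = def == null ? "" : def;
        switch (type) {
            case "boolean":
                // Assumption: an explicit 0/1 menu, so that "off" is
                // submitted rather than silently omitted
                return select(name, false, d, List.of("0", "1"));
            case "enum_single":
                return select(name, false, d, options);
            case "enum_multi":
                return select(name, true, d, options);
            case "invisible":
                return input("hidden", name, d);
            default: // string, integer
                return input("text", name, d);
        }
    }

    static String input(String kind, String name, String value) {
        return "<input type=\"" + kind + "\" name=\"" + esc(name)
                + "\" value=\"" + esc(value) + "\">";
    }

    static String select(String name, boolean multi, String def,
                         List<String> options) {
        StringBuilder sb = new StringBuilder("<select name=\"")
                .append(esc(name)).append('"');
        if (multi) sb.append(" multiple");
        sb.append('>');
        for (String o : options) {
            sb.append("<option value=\"").append(esc(o)).append('"')
              .append(o.equals(def) ? " selected" : "").append('>')
              .append(esc(o)).append("</option>");
        }
        return sb.append("</select>").toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}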

A service description also contains some display information, including the name of the service and the text for the submit button.

Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} gives an example HTML search form that may be generated from this describe response.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/FieldQuery" type="describe" />

<response from="mgppdemo/FieldQuery" type="describe">
  <service name="FieldQuery" type="query">
    <displayItem name="name">Form Query</displayItem>
    <displayItem name="submit">Search</displayItem>
    <paramList>
      <param default="Doc" name="level" type="enum_single">
        <displayItem name="name">Granularity to search at</displayItem>
        <option name="Doc">
          <displayItem name="name">Document</displayItem>
        </option>
        <option name="Sec">
          <displayItem name="name">Section</displayItem>
        </option>
        <option name="Para">
          <displayItem name="name">Paragraph</displayItem>
        </option>
      </param>
      <param default="1" name="case" type="boolean">
        <displayItem name="name">Turn casefolding </displayItem>
        <option name="0">
          <displayItem name="name">off</displayItem>
        </option>
        <option name="1">
          <displayItem name="name">on</displayItem>
        </option>
      </param>
      <param default="1" name="stem" type="boolean">
        <displayItem name="name">Turn stemming </displayItem>
        <option name="0">
          <displayItem name="name">off</displayItem>
        </option>
        <option name="1">
          <displayItem name="name">on</displayItem>
        </option>
      </param>
      <param default="10" name="maxDocs" type="integer">
        <displayItem name="name">Maximum documents to return
        </displayItem>
      </param>
      <param name="simpleField" occurs="4" type="multi">
        <displayItem name="name"></displayItem>
        <param name="fqv" type="string">
          <displayItem name="name">Word or phrase </displayItem>
        </param>
        <param default="ZZ" name="fqf" type="enum_single">
          <displayItem name="name">in field</displayItem>
          <option name="ZZ">
            <displayItem name="name">allfields</displayItem>
          </option>
          <option name="TX">
            <displayItem name="name">text</displayItem>
          </option>
          <option name="TI">
            <displayItem name="name">Title</displayItem>
          </option>
          <option name="SU">
            <displayItem name="name">Subject</displayItem>
          </option>
          <option name="ORG">
            <displayItem name="name">Organization</displayItem>
          </option>
          <option name="SO">
            <displayItem name="name">Source</displayItem>
          </option>
        </param>
      </param>
    </paramList>
  </service>
</response>
\end{verbatim}\end{gsc}\end{quote}

\begin{figure}[t]
  \centering
  \includegraphics[width=3.5in]{query2.ps}
  \caption{The previous query service describe response as displayed on the search page.}
  \label{fig:query-display}
\end{figure}

A describe request to an applet-type service returns the HTML applet element, which is embedded into a web page to run the applet.
\begin{quote}\begin{gsc}\begin{verbatim}
<request type='describe' to='mgppdemo/PhindApplet'/>

<response type='describe'>
  <service name='PhindApplet' type='applet'>
    <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
                     jaxp.jar, xml-apis.jar'
            CODE='org.greenstone.applet.phind.Phind.class'
            CODEBASE='lib/java'
            HEIGHT='400' WIDTH='500'>
      <PARAM NAME='library' VALUE=''/>
      <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
      <PARAM NAME='collection' VALUE='mgppdemo' />
      <PARAM NAME='classifier' VALUE='1' />
      <PARAM NAME='orientation' VALUE='vertical' />
      <PARAM NAME='depth' VALUE='2' />
      <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
      <PARAM NAME='backdrop'
             VALUE='interfaces/default/images/phindbg1.jpg'/>
      <PARAM NAME='fontsize' VALUE='10' />
      <PARAM NAME='blocksize' VALUE='10' />
      The Phind java applet.
    </applet>
    <displayItem name="name">Browse phrase hierarchies</displayItem>
  </service>
</response>
\end{verbatim}\end{gsc}\end{quote}

Note that the library parameter has been left blank. This is because \gst{library} refers to the servlet that is currently running, whose name is not necessarily known in advance. Either the applet action or the Receptionist must therefore fill in this parameter before the HTML is displayed.
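
Filling in the missing value is a small DOM update. The Java sketch below is hypothetical and is not the applet action's actual code. It assumes the applet element is available as a DOM \gst{Document}, and the name passed in stands for whatever name the running servlet has, which the caller must supply.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: fills in the blank 'library' applet parameter.
class AppletLibraryFiller {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Sets VALUE on each <PARAM NAME='library'> that is still empty;
    // returns the number of parameters changed.
    static int fill(Document doc, String libraryName) {
        NodeList params = doc.getElementsByTagName("PARAM");
        int filled = 0;
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if ("library".equals(p.getAttribute("NAME"))
                    && p.getAttribute("VALUE").isEmpty()) {
                p.setAttribute("VALUE", libraryName);
                filled++;
            }
        }
        return filled;
    }
}
\end{verbatim}\end{gsc}\end{quote}
Only empty values are overwritten, so an applet element whose library is already named is left unchanged.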

\subsubsection{`system'-type messages}\label{sec:system}

``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and to activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections, which is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one is created, this information may need to change, and the list of available modules may also change. Currently, system requests are initiated by particular CGI parameters (see Section~\ref{sec:runtime-config}).

The basic format of a system request is as follows:

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='system' to=''>
  <system .../>
</request>
\end{verbatim}\end{gsc}\end{quote}

One or more actual requests are specified in system elements. The following are examples:
\begin{quote}\begin{gsc}\begin{verbatim}
<system type='configure' subset=''/>
<system type='configure' subset='collectionList'/>
<system type='activate' moduleType='collection' moduleName='demo'/>
<system type='deactivate' moduleType='site' moduleName='site1'/>
\end{verbatim}\end{gsc}\end{quote}

The first request reconfigures the whole site: the MessageRouter goes through its whole configure process again. The second request reconfigures only the collectionList: the MessageRouter deletes all its collection information, rescans the collect directory, and reloads all the collections.
The third request activates collection demo. This could be a new collection, or a reactivation of an old one. If a collection module already exists, it is deleted and a new one is loaded. The final request deactivates the site site1; this removes the site from the siteList and the module map, and also removes any of that site's collections and services from the static lists.
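
A client could assemble these requests with a few string helpers. The Java class below is a hypothetical sketch, not Greenstone code; it produces the same \gst{<system>} elements as the examples above and lets several of them share one request.
\begin{quote}\begin{gsc}\begin{verbatim}
// Hypothetical builder for 'system' requests.
class SystemRequest {
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;");
    }

    // An empty subset reconfigures everything; a value such as
    // "collectionList" reconfigures only that part.
    static String configure(String subset) {
        return "<system type='configure' subset='" + attr(subset) + "'/>";
    }

    // type is "activate" or "deactivate".
    static String module(String type, String moduleType,
                         String moduleName) {
        return "<system type='" + attr(type) + "' moduleType='"
                + attr(moduleType) + "' moduleName='"
                + attr(moduleName) + "'/>";
    }

    // Wraps one or more <system> elements; to = "" is the MessageRouter.
    static String request(String to, String... systems) {
        StringBuilder sb = new StringBuilder("<request type='system' to='")
                .append(attr(to)).append("'>\n");
        for (String s : systems) {
            sb.append("  ").append(s).append('\n');
        }
        return sb.append("</request>").toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}
For example, passing the result of \gst{module} with arguments \gst{activate}, \gst{collection} and \gst{demo} to \gst{request} with an empty \gst{to} yields the third example above, wrapped in a request to the MessageRouter.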


A response just contains a status message, for example:
\begin{quote}\begin{gsc}\begin{verbatim}
<response from="">
  <status>collectionList reconfigured successfully</status>
</response>
\end{verbatim}\end{gsc}\end{quote}

At some stage, an error or status code should be included.

System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests.

\subsubsection{`format'-type messages}\label{sec:format}

Collection designers can, to a certain degree, specify how their collection looks. They can specify display format statements that apply, for example, to the results of a search, the display of a document, or the entries in a classification hierarchy. This information is generally service specific. All services respond to a format request by returning any service-specific formatting information. A typical request and response look like this:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/FieldQuery" type="format" />

<response from="mgppdemo/FieldQuery" type="format">
  <format>
    <gsf:template match="documentNode"><td><gsf:link>
      <gsf:metadata name="Title" />(<gsf:metadata name="Source" />)
      </gsf:link></td>
    </gsf:template>
  </format>
</response>
\end{verbatim}\end{gsc}\end{quote}

The actual format statements are described in Section~\ref{sec:formatstmt}. They are templates written either directly in XSLT or in GSF (Greenstone Format), a simple XML representation of the more complicated XSLT templates.
GSF-style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed into XSLT by an XSLT transformation using the config\_format.xsl stylesheet.

\subsubsection{`status'-type messages}\label{sec:status}

These are only used with process-type services, which are those where a request is sent to start some type of process (see Section~\ref{sec:process}). The initial response states whether the process has successfully started, and whether it is still continuing. If the process is not finished, status requests can be sent repeatedly to the service to poll its status, using the pid to identify the process. Status codes are used to identify the state of a process. The values currently used are listed in Table~\ref{tab:status codes}\footnote{A more standard set of codes, for example the HTTP status codes, should probably be used.}.

\begin{table}
\caption{Status codes currently used in Greenstone 3}
\label{tab:status codes}
{\footnotesize
\begin{tabular}{llp{8cm}}
\hline
\bf code name & \bf code & \bf meaning \\
& \bf value & \\
\hline
SUCCESS & 1 & the request was accepted, and the process was completed \\
ACCEPTED & 2 & the request was accepted, and the process has been started, but it is not completed yet \\
ERROR & 3 & there was an error and the process was stopped \\
CONTINUING & 10 & the process is still continuing \\
COMPLETED & 11 & the process has finished \\
HALTED & 12 & the process has stopped \\
INFO & 20 & just an info message that doesn't imply anything \\
\hline
\end{tabular}}
\end{table}

The following shows an example status request, along with two responses: the first (code 2) says that the process has been accepted but is not yet completed, and the second (code 11) says that it has finished. The content of the status element in each response is the output from the process since the last status update was sent back.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="build/ImportCollection" type="status">
  <paramList>
    <param name="pid" value="2" />
  </paramList>
</request>

<response from="build/ImportCollection">
  <status code="2" pid="2">Collection construction: import collection.
command = import.pl -collectdir /research/kjdon/home/gsdl3/web/sites/
  localsite/collect test1
starting
  </status>
</response>

<response from="build/ImportCollection">
  <status code="11" pid="2">RecPlug: getting directory
/research/kjdon/home/gsdl3/web/sites/localsite/collect/test1/import
WARNING - no plugin could process /.keepme

*********************************************
Import Complete
*********************************************
* 1 document was considered for processing
* 0 were processed and included in the collection
* 1 was rejected. See /research/kjdon/home/gsdl3/web/sites/
  localsite/collect/test1/etc/fail.log for a list of rejected documents
Success
  </status>
</response>
\end{verbatim}\end{gsc}\end{quote}
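
A client polling a process could model these codes as a Java enum, as in the hypothetical sketch below. The codes come from Table~\ref{tab:status codes}. Which of them mean ``poll again'' is an assumption (in particular, INFO is treated as non-terminal), and \gst{statusRequest} simply reproduces the request format shown above without escaping its arguments.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of the status codes listed in the table.
enum ProcessStatus {
    SUCCESS(1), ACCEPTED(2), ERROR(3),
    CONTINUING(10), COMPLETED(11), HALTED(12), INFO(20);

    final int code;

    ProcessStatus(int code) {
        this.code = code;
    }

    private static final Map<Integer, ProcessStatus> BY_CODE =
            new HashMap<>();

    static {
        for (ProcessStatus s : values()) {
            BY_CODE.put(s.code, s);
        }
    }

    static ProcessStatus fromCode(int code) {
        ProcessStatus s = BY_CODE.get(code);
        if (s == null) {
            throw new IllegalArgumentException("unknown code " + code);
        }
        return s;
    }

    // Assumption: only these codes mean "still running, ask again".
    boolean keepPolling() {
        return this == ACCEPTED || this == CONTINUING || this == INFO;
    }

    // Status request for process 'pid' (arguments are not escaped).
    static String statusRequest(String lang, String to, int pid) {
        return "<request lang='" + lang + "' to='" + to
                + "' type='status'>\n  <paramList>\n"
                + "    <param name='pid' value='" + pid + "'/>\n"
                + "  </paramList>\n</request>";
    }
}
\end{verbatim}\end{gsc}\end{quote}
In the exchange above, the first response carries code 2, so \gst{keepPolling()} is true and another status request would be sent; the second carries code 11, and polling stops.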

\subsubsection{process messages}

Process requests and responses provide the major functionality of the system: these are the ones that do the actual work. Their format depends on the service they are for, so they are described below by service type.

Most requests in the system are requests to services. There are currently several types of service: \gst{query}, \gst{browse}, \gst{retrieve}, \gst{process}, \gst{applet}, and \gst{enrich}. Query services do some kind of search and return a list of document identifiers; examples are TextQuery, FieldQuery and AdvancedFieldQuery (provided by GS2MGSearch and GS2MGPPSearch) and TextQuery (provided by LuceneSearch). Retrieve services can return the content of those documents, metadata about the documents, or other resources. Browse services are for browsing lists or hierarchies of documents. Process-type services are those where the request is for a command to be run: a status code is returned immediately, and if the command has not finished, an update of the status can then be requested. Applet services are those that run an applet. Enrich services take a document and return it with some extra markup added.

Other possible service types include transform, extract, and accrete. These generally enhance the functionality of the first set. They may be used during collection formation: to `accrete' documents by adding them to a collection, `transform' the documents into a different format, `extract' information or acronyms from the documents, and `enrich' those documents with the information extracted or by adding new information. They may also be used during querying: to `transform' a query before using it to query a collection, or to `transform' the documents returned into an appropriate form.

The basic structure of a service `process' request is as follows:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='process' to='demo/TextQuery'>
  <paramList/>
  other elements...
</request>
\end{verbatim}\end{gsc}\end{quote}

The parameters are name-value pairs corresponding to parameters that were specified in the service description sent in response to a describe request, for example:

\begin{quote}\begin{gsc}\begin{verbatim}
<param name='case' value='1'/>
<param name='maxDocs' value='34'/>
<param name='index' value='dtx'/>
\end{verbatim}\end{gsc}\end{quote}

Some requests have other content: for document retrieval, this is a list of document identifiers to retrieve, and for metadata retrieval, it is the list of documents to retrieve metadata for.
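
Assembling a process request is mechanical once the service's parameters are known. The Java helper below is a hypothetical sketch, not a Greenstone API. It takes the parameters as an ordered list rather than a map, because a name such as \gst{structure} may appear more than once (as in the structure requests later in this section), and it appends any extra content, such as a \gst{<documentNodeList>}, verbatim.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.List;
import java.util.Map;

// Hypothetical builder for service 'process' requests.
class ProcessRequest {
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;");
    }

    // params keeps order and allows repeated names; content (may be
    // null) is extra XML such as a <documentNodeList>.
    static String build(String lang, String to,
                        List<Map.Entry<String, String>> params,
                        String content) {
        StringBuilder sb = new StringBuilder("<request lang='")
                .append(attr(lang)).append("' type='process' to='")
                .append(attr(to)).append("'>\n");
        if (params.isEmpty()) {
            sb.append("  <paramList/>\n");
        } else {
            sb.append("  <paramList>\n");
            for (Map.Entry<String, String> p : params) {
                sb.append("    <param name='").append(attr(p.getKey()))
                  .append("' value='").append(attr(p.getValue()))
                  .append("'/>\n");
            }
            sb.append("  </paramList>\n");
        }
        if (content != null) {
            sb.append(content).append('\n');
        }
        return sb.append("</request>").toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}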

Responses vary depending on the type of request. The following sections look at the process-type requests and responses for each type of service.

\subsubsection{`query'-type services}
Responses to query requests contain a list of document identifiers, along with some other information that depends on the query type. For a text query, this includes term frequency information and some metadata about the result. For instance, a text query on ``snail farming'' with the parameter \gst{maxDocs=10} might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{No metadata about the query result is returned yet.}

The following shows an example query request and its response.

Find at most 10 Sections in the mgppdemo collection, containing the word snail (stemmed), returning the results in ranked order:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' to="mgppdemo/TextQuery" type="process">
  <paramList>
    <param name="maxDocs" value="10"/>
    <param name="queryLevel" value="Section"/>
    <param name="stem" value="1"/>
    <param name="matchMode" value="some"/>
    <param name="sortBy" value="1"/>
    <param name="index" value="t0"/>
    <param name="case" value="0"/>
    <param name="query" value="snail"/>
  </paramList>
</request>

<response from="mgppdemo/TextQuery" type="process">
  <metadataList>
    <metadata name="numDocsMatched" value="59" />
  </metadataList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
                  docType='hierarchy' nodeType="leaf" />
    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"
                  docType='hierarchy' nodeType="leaf" />
    <documentNode nodeID="HASH010f073f22033181e206d3b7.1"
                  docType='hierarchy' nodeType="interior" />
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.2.2"
                  docType='hierarchy' nodeType="leaf" />
    ...
  </documentNodeList>
  <termList>
    <term field="" freq="454" name="snail" numDocsMatch="58" stem="3">
      <equivTermList>
        <term freq="" name="Snail" numDocsMatch="" />
        <term freq="" name="snail" numDocsMatch="" />
        <term freq="" name="Snails" numDocsMatch="" />
        <term freq="" name="snails" numDocsMatch="" />
      </equivTermList>
    </term>
  </termList>
</response>
\end{verbatim}\end{gsc}\end{quote}

The list of document identifiers includes some information about document type and node type. Currently, document types include \gst{simple}, \gst{paged} and \gst{hierarchy}. A \gst{simple} document has a single section, i.e.\ no sub-structure; a \gst{paged} document has a single list of sections; and a \gst{hierarchy} document has a hierarchy of nested sections. For \gst{paged} and \gst{hierarchy} documents, the node type identifies whether a section is the root of the document, an internal section, or a leaf.

The term list identifies, for each term in the query, what its frequency in the collection is, how many documents contained that term, and a list of its equivalent terms (if stemming or casefolding was used).
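
A client typically needs three things from this response: the match count, the node identifiers, and perhaps the equivalent terms (for instance, to highlight them in the results). The hypothetical Java sketch below extracts them with the standard DOM API, relying only on the element and attribute names used in the example above; it is not Greenstone's own code.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for a query-type process response.
class QueryResult {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // The numDocsMatched query metadata, or -1 if it was not sent.
    static int numDocsMatched(Document doc) {
        NodeList md = doc.getElementsByTagName("metadata");
        for (int i = 0; i < md.getLength(); i++) {
            Element m = (Element) md.item(i);
            if (m.getAttribute("name").equals("numDocsMatched")) {
                return Integer.parseInt(m.getAttribute("value"));
            }
        }
        return -1;
    }

    // nodeIDs of the matching document nodes, in result order.
    static List<String> nodeIds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("documentNode");
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttribute("nodeID"));
        }
        return ids;
    }

    // Equivalent forms listed for one query term in the termList.
    static List<String> equivalents(Document doc, String term) {
        List<String> out = new ArrayList<>();
        NodeList terms = doc.getElementsByTagName("term");
        for (int i = 0; i < terms.getLength(); i++) {
            Element t = (Element) terms.item(i);
            boolean topLevel =
                t.getParentNode().getNodeName().equals("termList");
            if (topLevel && t.getAttribute("name").equals(term)) {
                // nested <term> elements are the equivTermList entries
                NodeList eq = t.getElementsByTagName("term");
                for (int j = 0; j < eq.getLength(); j++) {
                    out.add(((Element) eq.item(j)).getAttribute("name"));
                }
            }
        }
        return out;
    }
}
\end{verbatim}\end{gsc}\end{quote}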

\subsubsection{`browse'-type services}

Browse-type services are used for classification browsing. The request consists of a list of classifier identifiers and some structure parameters specifying what structure to retrieve.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/ClassifierBrowse" type="process">
  <paramList>
    <param name="structure" value="ancestors" />
    <param name="structure" value="children" />
  </paramList>
  <classifierNodeList>
    <classifierNode nodeID="CL1.2" />
  </classifierNodeList>
</request>

<response from="mgppdemo/ClassifierBrowse" type="process">
  <classifierNodeList>
    <classifierNode nodeID="CL1.2">
      <nodeStructure>
        <classifierNode nodeID="CL1">
          <classifierNode nodeID="CL1.2">
            <classifierNode nodeID="CL1.2.1" />
            <classifierNode nodeID="CL1.2.2" />
            <classifierNode nodeID="CL1.2.3" />
            <classifierNode nodeID="CL1.2.4" />
            <classifierNode nodeID="CL1.2.5" />
          </classifierNode>
        </classifierNode>
      </nodeStructure>
    </classifierNode>
  </classifierNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}

Possible values for structure parameters are \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children} and \gst{descendents}. The response gives, for each identifier in the request, a \gst{<nodeStructure>} element with all the requested structure put together into a hierarchy. The structure may include classifier and document nodes.
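
Because the requested structure arrives as one nested tree, a client has to locate the requested node inside it. The hypothetical Java sketch below does this on the \gst{<nodeStructure>} element alone; since the structure may mix classifier and document nodes, either element name is treated as a node. It is an illustration, not Greenstone's own code.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for navigating a <nodeStructure> element.
class NodeStructure {
    // Parses a fragment and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // The structure may mix classifier and document nodes.
    static boolean isNode(Node n) {
        return n instanceof Element
                && (n.getNodeName().equals("classifierNode")
                    || n.getNodeName().equals("documentNode"));
    }

    static Element find(Element structure, String nodeId) {
        NodeList all = structure.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (isNode(e) && nodeId.equals(e.getAttribute("nodeID"))) {
                return e;
            }
        }
        return null;
    }

    // Direct children of the node, in document order.
    static List<String> children(Element structure, String nodeId) {
        List<String> out = new ArrayList<>();
        Element e = find(structure, nodeId);
        if (e == null) return out;
        for (Node c = e.getFirstChild(); c != null;
                c = c.getNextSibling()) {
            if (isNode(c)) out.add(((Element) c).getAttribute("nodeID"));
        }
        return out;
    }

    // Ancestors from the root down to the node's parent.
    static List<String> ancestors(Element structure, String nodeId) {
        LinkedList<String> out = new LinkedList<>();
        Element e = find(structure, nodeId);
        if (e == null) return out;
        for (Node p = e.getParentNode(); isNode(p); p = p.getParentNode()) {
            out.addFirst(((Element) p).getAttribute("nodeID"));
        }
        return out;
    }
}
\end{verbatim}\end{gsc}\end{quote}
Applied to the \gst{<nodeStructure>} in the response above, \gst{ancestors} for \gst{CL1.2} gives just \gst{CL1}, and \gst{children} gives \gst{CL1.2.1} to \gst{CL1.2.5}.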


\subsubsection{`retrieve'-type services}

Retrieval services are special in that requests to them are not explicitly initiated by a user from a form on a web page, but are made by Actions in response to other requests. This means that their names are hard-coded into the Actions. DocumentContentRetrieve, DocumentStructureRetrieve and DocumentMetadataRetrieve are the standard names for the services that retrieve the content, structure, and metadata of documents. Requests to each of these include a list of document identifiers. Because these generally refer to parts of documents, the elements are called \gst{<documentNode>}. For content retrieval, that is all that is required. For metadata retrieval, the request also needs parameters specifying what metadata is required, and for structure retrieval, parameters specifying what structure or structural information is required.

Some example requests and responses follow.

Give me the Title metadata for these documents:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/DocumentMetadataRetrieve" type="process">
  <paramList>
    <param name="metadata" value="Title" />
  </paramList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"/>
    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"/>
    <documentNode nodeID="HASH010f073f22033181e206d3b7.1"/>
    ...
  </documentNodeList>
</request>

<response from="mgppdemo/DocumentMetadataRetrieve" type="process">
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
      <metadataList>
        <metadata name="Title">Putting snails in your second pen</metadata>
      </metadataList>
    </documentNode>
    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12">
      <metadataList>
        <metadata name="Title">Now you must decide</metadata>
      </metadataList>
    </documentNode>
    <documentNode nodeID="HASH010f073f22033181e206d3b7.1">
      <metadataList>
        <metadata name="Title">Introduction</metadata>
      </metadataList>
    </documentNode>
  </documentNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}

One or more parameters specifying metadata may be included in a request. A value of \gst{all} retrieves all the metadata for each document.

Any browse-type service must also implement a metadata retrieval service to provide metadata for the nodes in the classification hierarchy. Its name is the browse service name plus \gst{MetadataRetrieve}. For example, the ClassifierBrowse service described in the previous section should also have a ClassifierBrowseMetadataRetrieve service. The request and response format is exactly the same as for the DocumentMetadataRetrieve service, except that \gst{<documentNode>} elements are replaced by \gst{<classifierNode>} elements (and the corresponding list element is also changed).
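
Following this naming convention, a client can derive the metadata service name from the browse service it used. The Java sketch below is hypothetical; it mirrors the DocumentMetadataRetrieve request shown above, with \gst{<classifierNode>} elements in place of \gst{<documentNode>} elements.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.List;

// Hypothetical builder for a browse service's metadata request.
class BrowseMetadataRequest {
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;");
    }

    // Naming convention: browse service name + "MetadataRetrieve".
    static String serviceFor(String browseService) {
        return browseService + "MetadataRetrieve";
    }

    static String build(String lang, String collection,
                        String browseService, List<String> metadataNames,
                        List<String> nodeIds) {
        StringBuilder sb = new StringBuilder("<request lang='")
                .append(attr(lang)).append("' to='")
                .append(attr(collection)).append('/')
                .append(attr(serviceFor(browseService)))
                .append("' type='process'>\n  <paramList>\n");
        for (String m : metadataNames) {
            sb.append("    <param name='metadata' value='")
              .append(attr(m)).append("'/>\n");
        }
        sb.append("  </paramList>\n  <classifierNodeList>\n");
        for (String id : nodeIds) {
            sb.append("    <classifierNode nodeID='").append(attr(id))
              .append("'/>\n");
        }
        return sb.append("  </classifierNodeList>\n</request>")
                 .toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}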

Give me the text (content) of this document:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/DocumentContentRetrieve" type="process">
  <paramList />
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
  </documentNodeList>
</request>

<response from="mgppdemo/DocumentContentRetrieve" type="process">
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
      <nodeContent>&lt;Section&gt;
        &lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
        &lt;P ALIGN=&quot;JUSTIFY&quot;&gt;190. When the plants in
        your second pen have grown big enough to provide food and
        shelter, you can put in the snails.&lt;/P&gt;
      </nodeContent>
    </documentNode>
  </documentNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}

The content of a node is returned in a \gst{<nodeContent>} element. In this case it is HTML, escaped so that it can be carried as text inside the XML message.
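
Because the HTML is escaped, a client that parses the response with a standard XML parser recovers the original markup simply by reading the text of \gst{<nodeContent>}. A minimal Java sketch, assuming the response shape shown above; \gst{NodeContentReader} is a hypothetical name, not a Greenstone class:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that extracts node content from a DocumentContentRetrieve response. */
class NodeContentReader {

    /** Returns the content of the given node with XML escapes resolved, or null if absent. */
    static String contentOf(String responseXml, String nodeId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed response", e);
        }
        NodeList nodes = doc.getElementsByTagName("documentNode");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            if (!nodeId.equals(node.getAttribute("nodeID"))) {
                continue;
            }
            NodeList content = node.getElementsByTagName("nodeContent");
            // getTextContent() turns &lt;P&gt; back into <P>, i.e. the original HTML
            return content.getLength() == 0 ? null : content.item(0).getTextContent();
        }
        return null;
    }
}
```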

Give me the ancestors and children of the specified node, along with the number of siblings it has:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="mgppdemo/DocumentStructureRetrieve" type="process">
  <paramList>
    <param name="structure" value="ancestors" />
    <param name="structure" value="children" />
    <param name="info" value="numSiblings" />
  </paramList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
  </documentNodeList>
</request>

<response from="mgppdemo/DocumentStructureRetrieve" type="process">
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
      <nodeStructureInfo>
        <info name="numSiblings" value="2" />
      </nodeStructureInfo>
      <nodeStructure>
        <documentNode nodeID="HASHac0a04dd14571c60d7fbfd"
            docType='hierarchy' nodeType="root">
          <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4"
              docType='hierarchy' nodeType="interior">
            <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
                docType='hierarchy' nodeType="leaf" />
          </documentNode>
        </documentNode>
      </nodeStructure>
    </documentNode>
  </documentNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}

Structure is returned inside a \gst{<nodeStructure>} element, and structural information inside a \gst{<nodeStructureInfo>} element. The possible values for the \gst{structure} parameter are the same as for browse services: \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children} and \gst{descendents}. The possible values for the \gst{info} parameter are \gst{numSiblings}, \gst{siblingPosition} and \gst{numChildren}.
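
The nested \gst{<nodeStructure>} can be read by locating the requested node inside it and climbing up through its parent elements, which yields the ancestor chain in the response above. A minimal Java sketch, assuming that response format; \gst{StructureReader} and its methods are hypothetical:

```java
import java.io.StringReader;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for DocumentStructureRetrieve responses. */
class StructureReader {

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed response", e);
        }
    }

    /** nodeIDs from the root down to targetId, taken from the nesting inside <nodeStructure>. */
    static List<String> pathTo(String responseXml, String targetId) {
        NodeList structures = parse(responseXml).getElementsByTagName("nodeStructure");
        for (int s = 0; s < structures.getLength(); s++) {
            Element structure = (Element) structures.item(s);
            NodeList nodes = structure.getElementsByTagName("documentNode");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                if (!targetId.equals(node.getAttribute("nodeID"))) {
                    continue;
                }
                LinkedList<String> path = new LinkedList<>();
                // climb until we reach the enclosing <nodeStructure> element
                for (Node n = node; n != structure; n = n.getParentNode()) {
                    path.addFirst(((Element) n).getAttribute("nodeID"));
                }
                return path;
            }
        }
        return List.of();
    }

    /** Value of a <nodeStructureInfo> entry such as numSiblings, or null if absent. */
    static String infoValue(String responseXml, String name) {
        NodeList infos = parse(responseXml).getElementsByTagName("info");
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            if (name.equals(info.getAttribute("name"))) {
                return info.getAttribute("value");
            }
        }
        return null;
    }
}
```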

\subsubsection{`process'-type services}\label{sec:process}
Requests to process-type services are not requests for data; they ask for some action to be carried out, for example creating a new collection or importing a collection. The response is a status or an error message. Because the import and build commands may take a long time to complete, a response is sent back as soon as the command has started successfully. The requester can then poll the status to see how the process is going.

Process requests generally contain just a parameter list. As with any service, the parameters used by a process-type service can be obtained by sending a describe request to that service.

Here are two example requests for process-type services that belong to the build service cluster (hence the addresses begin with \gst{build/}), followed by an example response:

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='process' to='build/NewCollection'>
  <paramList>
    <param name='creator' value='[email protected]'/>
    <param name='collName' value='the demo collection'/>
    <param name='collShortName' value='demo'/>
  </paramList>
</request>

<request lang='en' type='process' to='build/ImportCollection'>
  <paramList>
    <param name='collection' value='demo'/>
  </paramList>
</request>

<response from="build/ImportCollection">
  <status code="2" pid="2">Starting process...</status>
</response>
\end{verbatim}\end{gsc}\end{quote}

The \gst{code} attribute in the response indicates whether the command has been started successfully, whether it is still running, and so on (see Table~\ref{tab:status codes} for a list of the codes currently used). The \gst{pid} attribute gives a process id that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, described in Section~\ref{sec:status}, are used to find out how the process is going and whether or not it has finished.
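
A requester needs the \gst{code} and \gst{pid} attributes to follow a long-running process. The following Java sketch, with the hypothetical class name \gst{ProcessStatus}, simply extracts them from a response. It deliberately does not interpret the codes, whose meanings are listed in Table~\ref{tab:status codes}.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical value class for the <status> element of a process-type response. */
class ProcessStatus {
    final int code;      // meaning defined by the status codes table
    final String pid;    // kept as a string: it is only echoed back when polling
    final String output; // process output so far

    ProcessStatus(int code, String pid, String output) {
        this.code = code;
        this.pid = pid;
        this.output = output;
    }

    static ProcessStatus parse(String responseXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed response", e);
        }
        NodeList list = doc.getElementsByTagName("status");
        if (list.getLength() == 0) {
            throw new IllegalArgumentException("response has no <status> element");
        }
        Element status = (Element) list.item(0);
        // Integer.parseInt throws NumberFormatException if code is missing or not numeric
        return new ProcessStatus(Integer.parseInt(status.getAttribute("code")),
                status.getAttribute("pid"), status.getTextContent());
    }
}
```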

\subsubsection{`applet'-type services}

Applet-type services are those that process the data for an applet. A request consists only of a list of parameters, and the response contains an \gst{<appletData>} element holding the XML data to be returned to the applet. The format of this data is entirely specific to the applet; there is no standard format for applet data.

Here is an example request and response, used by the Phind applet:
\begin{quote}\begin{gsc}\begin{verbatim}
<request type='query' to='mgppdemo/PhindApplet'>
  <paramList>
    <param name='pc' value='1'/>
    <param name='pptext' value='health'/>
    <param name='pfe' value='0'/>
    <param name='ple' value='10'/>
    <param name='pfd' value='0'/>
    <param name='pld' value='10'/>
    <param name='pfl' value='0'/>
    <param name='pll' value='10'/>
  </paramList>
</request>

<response type='query' from='mgppdemo/PhindApplet'>
  <appletData>
    <phindData df='9' ef='46' id='933' lf='15' tf='296'>
      <expansionList end='10' length='46' start='0'>
        <expansion df='4' id='8880' num='0' tf='59'>
          <suffix> CARE</suffix>
        </expansion>
        ...
      </expansionList>
      <documentList end='10' length='9' start='0'>
        <document freq='78' hash='HASH4632a8a51d33c47a75c559' num='0'>
          <title>The Courier - N??159 - Sept- Oct 1996 Dossier Investing
            in People Country Reports: Mali ; Western Samoa
          </title>
        </document>
        ...
      </documentList>
      <thesaurusList end='10' length='15' start='0'>
        <thesaurus df='7' id='12387' tf='15' type='RT'>
          <phrase>PUBLIC HEALTH</phrase>
        </thesaurus>...
      </thesaurusList>
    </phindData>
  </appletData>
</response>
\end{verbatim}\end{gsc}\end{quote}

\subsubsection{`enrich'-type services}

Enrich services typically take the text of documents (inside \gst{<nodeContent>} tags) and return the text marked up in some way. One example is the GatePOSTag service, which identifies Dates, Locations, People and Organizations in the text and annotates the text with these labels. In the following example, the request is for Locations and Dates to be identified.
*** TODO ****
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang="en" to="GatePOSTag" type="process">
  <paramList>
    <param name="annotationType" value="Date,Location" />
  </paramList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd">
      <nodeContent>
        FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS
        Rome 1986
        P-69
        ISBN 92-5-102397-2
        FAO 1986
      </nodeContent>
    </documentNode>
  </documentNodeList>
</request>

<response from="GatePOSTag" type="process">
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd">
      <nodeContent>
        FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS
        <annotation type="Location">Rome</annotation>
        <annotation type="Date">1986</annotation>
        P-69
        ISBN 92-5-102397-2
        FAO <annotation type="Date">1986</annotation>
      </nodeContent>
    </documentNode>
  </documentNodeList>
</response>
\end{verbatim}\end{gsc}\end{quote}
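
A client that wants to use the annotations, for example to list every place mentioned in a document, only has to collect the \gst{<annotation>} elements inside \gst{<nodeContent>}. A minimal Java sketch, assuming the response format above; \gst{AnnotationReader} is a hypothetical name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that groups the annotations in an enrich-service response. */
class AnnotationReader {

    /** Maps each annotation type to the annotated strings, in document order. */
    static Map<String, List<String>> byType(String responseXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed response", e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList annotations = doc.getElementsByTagName("annotation");
        for (int i = 0; i < annotations.getLength(); i++) {
            Element a = (Element) annotations.item(i);
            // the type attribute holds the label, e.g. Date or Location
            result.computeIfAbsent(a.getAttribute("type"), k -> new ArrayList<>())
                  .add(a.getTextContent().trim());
        }
        return result;
    }
}
```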

\subsection{Page generation}\label{sec:pagegen} **** REDO ********

* talk general first: get data, get format info, transform gsf->xsl, transform xml->html

* state saving: the XSLT files assume that arguments are saved somehow. This needs to be implemented outside Greenstone proper; we do this in the servlet, using something or other.

URL-style requests are received by the Receptionist, which must return a page of data to the servlet based on the request's arguments. As described in Section~\ref{sec:page-requests}, these requests are XML representations of Greenstone URLs. One of the arguments is the action (\gst{a}), which tells the Receptionist which Action module to pass the request to. The Action module decodes the remaining CGI arguments to determine what requests need to be made to the system.
System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.

Once the data needed from the system has been gathered, it is put into a `page' of XML. The page is transformed into its output form, currently HTML, using XSLT, and returned to the user.

The basic page format is:
\begin{quote}\begin{gsc}\begin{verbatim}
<page>
  <pageRequest/>
  <pageResponse/>
</page>
\end{verbatim}\end{gsc}\end{quote}

* show configuration and describe what it's used for

There are two main elements in the page: pageRequest and pageResponse. The pageRequest is the original request that came into the Receptionist; it is included so that any parameters can be preset to their previous values, for example the query options on the query form. The pageResponse contains all the data that the action has gathered from the system. Two other elements, Config and Display, carry extra information needed by the XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name and the name of the executable that is running (e.g.\ \gst{library}); these are needed so that the XSLT can generate correct URLs in the HTML. Display contains some of the text strings needed in the interface; these are kept separate from the XSLT to allow for internationalization.
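
The two steps described here, assembling the page and transforming it, can be sketched in Java with the standard \gst{javax.xml.transform} API. This is a simplification, not Greenstone's own code: the class \gst{PageRenderer} is hypothetical, the page is built by string concatenation, the Config and Display elements are omitted, and in practice the stylesheet would be one of the interface's XSLT files rather than a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch of page assembly and XSLT transformation. */
class PageRenderer {

    /** Wraps the original request and the action's data in the basic page format. */
    static String assemblePage(String originalRequest, String actionData) {
        return "<page><pageRequest>" + originalRequest + "</pageRequest>"
                + "<pageResponse>" + actionData + "</pageResponse></page>";
    }

    /** Applies an XSLT stylesheet (given as a string) to the page. */
    static String transform(String pageXml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(pageXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```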

The following subsections outline, for each action, what data is needed and what requests are sent to the system.


Once the XML page has been put together, the page returned to the user is created by transforming the XML using XSLT. The output is currently HTML, but it will be possible to generate alternative outputs such as XML or WML. A set of XSLT files defines an `interface', and different users can change the look of their web pages by creating new XSLT files for a new interface. Just as there is a sites directory where different sites `live' (i.e.\ where their configuration file and collections are located), there is an interfaces directory where the different interfaces `live' (i.e.\ where their transforms and images are located). The default XSLT files are
located in \gst{interfaces/default/transforms}. Collections, sites and other interfaces
can override these files by having their own copy of the appropriate
files. New interfaces have their own directory inside \gst{interfaces/}. Sites and collections can have a transform directory containing XSLT files. The XSLT files are looked for in the following order: collection, site, current
interface, default interface.\footnote{This currently breaks down for remote sites and needs to be rethought.}
***TODO*** describe a bit more?? currently only can get this locally
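
The lookup order amounts to trying each transform directory in turn and taking the first copy of the file that exists. A Java sketch, assuming the caller supplies the directories in the order collection, site, current interface, default interface; \gst{XsltLocator} is hypothetical, and the directory names in its tests are only illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the collection, site, interface, default XSLT lookup. */
class XsltLocator {

    /**
     * Returns the first existing copy of xsltName, trying searchDirs in order.
     * A null entry (e.g. no collection on the library home page) is skipped.
     */
    static Optional<Path> locate(String xsltName, List<Path> searchDirs) {
        for (Path dir : searchDirs) {
            if (dir == null) {
                continue;
            }
            Path candidate = dir.resolve(xsltName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A caller that may pass a null directory should build the list with \gst{Arrays.asList}, since \gst{List.of} rejects null elements.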

\subsubsection{Receptionists}\label{sec:recepts}

The receptionist is the controlling module for the page generation part of Greenstone. Its job is to load all the actions, and it knows which message router it and the actions should talk to. It routes each message it receives either to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example adding to the page returned by the action any information that is common to all pages.

There are different ways of providing an interface to Greenstone, from web-based CGI style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.

Receptionist: This is the most basic receptionist. The page it returns consists of the original request and the response from the action the request was sent to. The methods preProcessRequest and postProcessPage are called on the request and the page respectively, but in this basic receptionist they do nothing.

TransformingReceptionist: This extends Receptionist and overrides postProcessPage to transform the page using XSLT. An XSLT file is listed for each action in the receptionist's configuration file. First, display and configuration information is added to the page; then the page is transformed using the XSLT specified for the action, and returned.

WebReceptionist: This extends TransformingReceptionist. Apart from some argument conversion it does little else: to keep URLs short, the parameters from the services are given short names, and these are used in the web pages.

DefaultReceptionist: This extends WebReceptionist and is the default receptionist for Greenstone 3 servlets. Because of the page design, each page needs some extra information: some metadata about the current collection. The receptionist sends a describe request to the collection to obtain this, and appends it to the page before transforming it with XSLT.

NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. A look-alike nzdl.org system needs even more information for each page, namely the list of classifiers available from the ClassifierBrowse service.

By default, the LibraryServlet uses DefaultReceptionist. However, an init-param called \gst{receptionist} can be set to make the servlet use a different one.
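
The division of labour between these classes follows the template-method pattern: the base class fixes the order of the steps, and subclasses override the hooks. A much-simplified Java sketch follows. It works on strings instead of DOM elements and uses functions in place of real action modules and XSLT, and none of it is Greenstone's actual code; only the hook names \gst{preProcessRequest} and \gst{postProcessPage} are taken from the description above, while \gst{SimpleReceptionist} and \gst{SimpleTransformingReceptionist} are hypothetical.

```java
import java.util.function.UnaryOperator;

/** Much-simplified, hypothetical stand-in for the basic Receptionist. */
class SimpleReceptionist {
    private final UnaryOperator<String> action; // stands in for the action module

    SimpleReceptionist(UnaryOperator<String> action) {
        this.action = action;
    }

    /** Fixed sequence: pre-process, call the action, build the page, post-process. */
    final String process(String request) {
        String req = preProcessRequest(request);
        String page = "<page><pageRequest>" + req + "</pageRequest><pageResponse>"
                + action.apply(req) + "</pageResponse></page>";
        return postProcessPage(page);
    }

    // both hooks do nothing in the basic receptionist
    String preProcessRequest(String request) {
        return request;
    }

    String postProcessPage(String page) {
        return page;
    }
}

/** Hypothetical counterpart of TransformingReceptionist: adds extra info, then transforms. */
class SimpleTransformingReceptionist extends SimpleReceptionist {
    private final UnaryOperator<String> xslt; // stands in for the action's XSLT

    SimpleTransformingReceptionist(UnaryOperator<String> action, UnaryOperator<String> xslt) {
        super(action);
        this.xslt = xslt;
    }

    @Override
    String postProcessPage(String page) {
        // add (empty) config and display elements before transforming
        String withExtras = page.replace("</page>", "<config/><display/></page>");
        return xslt.apply(withExtras);
    }
}
```

Further subclasses, in the spirit of WebReceptionist and DefaultReceptionist, would extend the chain in the same way, each overriding a hook and delegating to the superclass version.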

\subsubsection{CGI arguments}

The arguments used by a page come from several sources: the receptionist uses a couple, the actions use some, and the services use others. The receptionist and the actions are treated as a whole, so they must not have conflicting arguments. The GSParams class specifies all the general basic arguments, and whether or not each one should be saved. The servlet has an init parameter, \gst{params\_class}, that specifies which params class to use; if the actions or the receptionist need some new arguments, GSParams can be subclassed to add them.

Services, however, may be created by different people and may be on a different site, so there is no guarantee that their parameters will not conflict with the action parameters, or even with those of other services.
Service parameters are therefore namespaced when they are put on the page, while interface (receptionist and action) parameters have no namespace. The default namespace is \gst{s1} (service 1), and every parameter for the service is prefixed with it: for example, the \gst{case} parameter for a search is put in the page as \gst{s1.case}.
The actions must then look for all the \gst{s1} parameters to send to the service.

If two or more services are combined on a page with a single submit button, they use \gst{s1}, \gst{s2}, \gst{s3}, etc.\ as needed. The \gst{s} (service) parameter then holds a list, e.g.\ \gst{s=TextQuery,MusicQuery}, and the order of this list determines the mapping of the namespaces: \gst{s1} is TextQuery and \gst{s2} is MusicQuery.

Also talk about saving arguments: the arguments that GSParams says to save are saved, and service arguments should always be saved.
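
The namespace mapping can be sketched as a function that sorts the page arguments into one parameter map per service. In this Java sketch the class \gst{ServiceParams} is hypothetical, as is the MusicQuery parameter \gst{tempo} used in its tests; only the \gst{s}, \gst{s1}, \gst{s2} convention comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of splitting namespaced page arguments between services. */
class ServiceParams {

    /**
     * services is the value of the s argument split on commas, e.g. [TextQuery, MusicQuery].
     * Arguments s1.* go to the first service, s2.* to the second, and so on; arguments
     * without a service namespace belong to the interface and are ignored here.
     */
    static Map<String, Map<String, String>> split(Map<String, String> args, List<String> services) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String svc : services) {
            result.put(svc, new LinkedHashMap<>());
        }
        for (Map.Entry<String, String> e : args.entrySet()) {
            String key = e.getKey();
            int dot = key.indexOf('.');
            if (dot < 2 || key.charAt(0) != 's') {
                continue; // not of the form s<N>.name
            }
            int index;
            try {
                index = Integer.parseInt(key.substring(1, dot));
            } catch (NumberFormatException ex) {
                continue; // e.g. an argument that merely starts with 's'
            }
            if (index < 1 || index > services.size()) {
                continue; // no service for this namespace
            }
            result.get(services.get(index - 1)).put(key.substring(dot + 1), e.getValue());
        }
        return result;
    }
}
```

A caller would typically obtain the service list with \gst{Arrays.asList(args.get("s").split(","))}.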

\subsubsection{Page action}
The page action generates informational pages, whereas the other actions are associated with specific services; it obtains its data by sending describe requests to modules.
Depending on the subaction argument, different pages can be generated. For the `home' page, a \gst{describe} request is sent to the MessageRouter, which returns a list of all the collections, services, serviceClusters and sites it knows about. The metadata for each collection is then retrieved via a \gst{describe} request to that collection. This metadata is added into the previous result, which is then added into the page. The page is
transformed using \gst{home.xsl}. For the `about' page, a \gst{describe} request is sent to the module that the page is about, which may be a collection or a service cluster. This returns a list of metadata
and a list of services, and the result is transformed using \gst{about.xsl}.


\subsubsection{Query action}

The basic URL is \gst{a=q\&s=TextQuery\&c=demo\&rt=d} (or \gst{rt=r}).
Three query services have been implemented: TextQuery, FieldQuery and AdvancedFieldQuery. The query action handles them all in the same way.
For each page, the service description is requested from the query service of the current collection via a describe request. This is currently done every time the query page is
displayed, but should be cached. The description includes a list of the parameters available for the query, such as case and stemming options and the maximum number of documents to return. If the request type (\gst{rt}) parameter is \gst{d} (display), the action only needs to display the form, and this is the only request sent to the service. Otherwise the submit button has been pressed, and a query request is sent to the query service (e.g.\ TextQuery), with all the parameters from the URL put into its parameter list. A list of document identifiers
is returned. A follow-up request is then sent to the MetadataRetrieve service of the collection, containing the list of
documents and a request for some of their metadata. Which metadata to retrieve is determined by looking through the XSLT that will be used to transform the page (Formatter object??). The service description and query result are combined into a page of XML, which is
transformed using \gst{basicquery.xsl} to produce the HTML page.

\subsubsection{Applet action}

There are two types of request to the applet action: \gst{a=a \& rt=d\/} and
\gst{a=a \& rt=r\/}. The value \gst{rt=d\/} means ``display the applet.'' A
\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. The transformation file \gst{applet.xsl} embeds this
into the page, and the servlet returns the HTML.

The value \gst{rt=r} signals a request from the applet itself. The other
parameters are sent to the service untransformed, and the result, in XML, is
passed directly back to the applet code. The Applet action can therefore work
with any applet whose service understands the messages.

Here are two examples of requests handled by the Applet action.

The first request corresponds to the URL arguments \gst{a=a \&
rt=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
applet for the mgppdemo collection''.


The second request corresponds to the arguments \gst{a=a \& rt=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}, which
indicate a request to the service itself; its parameters match those of the Phind request shown earlier for applet-type services. The extra arguments (all except a, rt, sn and c) are simply copied into the
request as parameters. The response is in a form suitable for the applet, placed inside
\gst{<appletData>} in a standard Greenstone message. AppletAction returns the
contents of appletData to the browser, i.e.\ to the applet itself.
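
Copying the extra arguments amounts to filtering out the few that the action itself consumes. A Java sketch, assuming the arguments arrive as a name-to-value map; because the action's own arguments are the routing arguments \gst{a}, \gst{sn} and \gst{c} plus the request type (\gst{rt} in the example URL, with \gst{sa} as the general subaction argument), this sketch excludes all five. \gst{AppletParams} is a hypothetical class, and its output follows the \gst{<param>} format of the Phind request shown earlier.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns applet CGI arguments into a <paramList> for the service. */
class AppletParams {
    // arguments consumed by the action itself rather than passed to the service
    private static final Set<String> RESERVED = Set.of("a", "sa", "rt", "sn", "c");

    static String toParamList(Map<String, String> cgiArgs) {
        StringBuilder sb = new StringBuilder("<paramList>");
        for (Map.Entry<String, String> e : cgiArgs.entrySet()) {
            if (RESERVED.contains(e.getKey())) {
                continue;
            }
            sb.append("<param name='").append(escape(e.getKey()))
              .append("' value='").append(escape(e.getValue())).append("'/>");
        }
        return sb.append("</paramList>").toString();
    }

    // minimal escaping for single-quoted XML attribute values
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("'", "&apos;");
    }
}
```

Passing the map in URL order (for example a \gst{LinkedHashMap}) keeps the parameters in the same order as in the request.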


Note that the applet HTML may need to know the name of the \gst{library}
program. However, that name is chosen by the person who installed the software
and will not necessarily be ``library''. To get around this, the applet can
put a parameter called ``library'' with an empty value into the applet data:
\begin{quote}\begin{gsc}\begin{verbatim}
<PARAM NAME='library' VALUE=''/>
\end{verbatim}\end{gsc}\end{quote}
When the Applet action encounters this parameter, it inserts the name of the
current library servlet as its value.

\subsubsection{Document action}

DocumentAction sends a query to the DocumentRetrieve service of the collection, requesting the text of the specified document. At this stage no additional information is obtained, but in future, information such as the Title and
table of contents will be needed to make the display nicer.


\subsubsection{System action}\label{sec:system-action}

SystemAction allows various components to be reconfigured manually at run-time. There is no interactive web page displaying the options; the action simply turns a set of CGI arguments into an XML system request. The response to a system request is a message, which is displayed to the user.

\begin{table}
\caption{Configure CGI arguments}
\label{tab:system-cgi}
{\footnotesize
\begin{tabular}{ll}
\hline
\bf arg & \bf description\\
\hline
a=s & system action\\
sa=c$|$a$|$d & type of system request: c (configure), a (add/activate), \\
& d (delete/deactivate) \\
c=demo & the request will go to this collection/servicecluster \\
& instead of the message router\\
ss=collectionList & subset for configure: only reconfigure this part.\\
& For the MessageRouter, can be serviceClusterList, serviceList, \\
& collectionList, siteList.\\
& For a collection/cluster, can be metadataList or serviceList.\\
sn=demo & \\
st=collection& \\
\hline
\end{tabular}}
\end{table}


\subsubsection{Some class info - where should this go??}
\begin{table}
\caption{The utility classes in org.greenstone.gsdl3.util}
\label{tab:utils}
{\footnotesize
\begin{tabular}{lp{3.75in}}
\hline
\bf Utility class & \bf Description\\
\hline
ConfigVars & holds the servlet startup variables, including library name, site name, interface name, default language\\
Dictionary & wrapper around a Resource Bundle, providing strings with parameters\\
GSCGI & class to map between short name CGI arguments and long name request parameters \\
GSFile & class to create all Greenstone file paths e.g. used to locate configuration files, XSLT files and collection data. \\
GSHTML & provides convenience methods for dealing with HTML, e.g. making strings HTML safe\\
GSPath & used to create, examine and modify message address paths\\
GSStatus & some static codes for status messages\\
GSXML & lots of methods for extracting information out of Greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by Greenstone.\\
GSXSLT & some manipulation functions for Greenstone XSLT\\
Misc & miscellaneous functions\\
OID & class to handle Greenstone (2) OIDs\\
XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
XMLTransformer & methods to transform XML using XSLT \\
XSLTUtil & contains static methods to be called from within XSLT \\
\hline
\end{tabular}}
\end{table}

\newpage
\section{Collection building architecture}\label{sec:develop-build}
**** GEORGE ****
how building actually works\\
the building structure/architecture\\
modules API\\

\newpage
\section{Developing Greenstone 3: Adding new features}\label{sec:new-features}

\subsection{Creating new services}\label{sec:new-services}

* New services inherit from ServiceRack, an abstract base class. This handles the main process method, and determines the service name and the request type. If the request type is describe and the \gst{to} address is empty, it returns a list of services (\gst{short\_service\_info}), which is initialised in the \gst{configure} method. A describe request to a particular service results in \gst{getServiceDescription} being called, which must be supplied by the subclass.
Other request types (e.g.\ process) are sent to \gst{processXXX} methods, where \gst{XXX} is the service name.
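
The dispatch from a service name to its \gst{processXXX} method can be done with Java reflection. The sketch below is a hypothetical simplification of what such a base class might do, not ServiceRack's actual code: requests and responses are plain strings, the error format is invented, and \gst{SimpleServiceRack}, \gst{DemoRack} and its \gst{processEcho} method exist only for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical, much-simplified stand-in for a ServiceRack-style base class. */
abstract class SimpleServiceRack {

    /** Routes a process request for serviceName to a public method "process" + serviceName. */
    String process(String serviceName, String request) {
        Method m;
        try {
            m = getClass().getMethod("process" + serviceName, String.class);
        } catch (NoSuchMethodException e) {
            return "<error>unknown service " + serviceName + "</error>";
        }
        try {
            return (String) m.invoke(this, request);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("service " + serviceName + " failed", e);
        }
    }
}

/** Invented example subclass offering a single service called "Echo". */
class DemoRack extends SimpleServiceRack {
    public String processEcho(String request) {
        return "<response from='Echo'>" + request + "</response>";
    }
}
```

The benefit of this design is that a subclass adds a service simply by defining a suitably named method, without touching the dispatch code.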

* what methods are expected

* service type responses expected

* a browse-type service must also implement a \gst{<servicename>MetadataRetrieve} service.

* should a metadata retrieval service advertise what metadata is available??
\subsection{Creating new actions/pages}\label{sec:new-pages}

\subsection{New interfaces}\label{sec:new-interfaces}
E.g.\ a Java interface. Where you can interface to: the MessageRouter versus the Receptionist; different receptionists. Examples: a handheld interface, using the servlet and a transforming receptionist but a new set of XSLT; a Java program or other program, which talks to the receptionist but just gets back XML data for pages; a Java GUI, which talks directly to the MessageRouter and does all the processing itself.

\subsection{Adding new classifiers}\label{sec:new-classifiers}
*** GEORGE ***
\subsection{Adding new plugins}\label{sec:new-plugins}
*** GEORGE ***

\subsection{New types of collections}\label{sec:new-coll-types}

There are two types of standard Greenstone collection: collections built with the Greenstone 3 building system, and collections imported from Greenstone 2. Collection building has many options, but it is conceivable that they will not meet the needs of every collection builder. Greenstone 3 can use any type of collection you can come up with, provided some Java code is supplied.


There are four levels of customisation that may be needed for new collections: service, collection, interface XSLT and action levels. We will use the example collections that come with Greenstone to describe these different levels.

Firstly, new service classes need to be written to provide the functionality to search, browse or otherwise access the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the Greenstone 2 MGPP collections were the first to be served in Greenstone 3. When we came to do Greenstone 2 MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, nothing else was required: the configuration files had a similar format, and simply specified MG serviceRack classes instead of MGPP ones.

The nzmaps collection used the same level of customisation: it just implemented new services and fitted all the extra display elements into the standard query/display framework using JavaScript.

The gberg collection, however, was done quite differently from the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (\gst{org.greenstone.gsdl3.collection.XMLCollection}). This class is basically the same as the standard collection class, except that it looks for the documentList in the collectionConfig file and stores it in memory.

To tell Greenstone to load a different type of collection class, we use another configuration file, \gst{etc/collectionInit.xml}, which specifies the name of the collection class to use.
Currently this is all that the file specifies, but you may want to add parameters for the class, etc.

\gst{<collectionInit class="XMLCollection"/>}
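
Reading this file amounts to looking up the \gst{class} attribute and qualifying it with the collection package. A Java sketch, assuming that a short name is resolved against \gst{org.greenstone.gsdl3.collection} (the package of XMLCollection) and that a missing file or attribute falls back to a standard class called \gst{Collection} here. Both that fallback name and the class \gst{CollectionInit} are assumptions for illustration, not confirmed Greenstone behaviour.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical reader for etc/collectionInit.xml. */
class CollectionInit {
    static final String PACKAGE = "org.greenstone.gsdl3.collection.";
    static final String DEFAULT_CLASS = "Collection"; // assumed standard class name

    /** Fully qualified collection class name; xml is the file's content, or null if absent. */
    static String collectionClassName(String xml) {
        String cls = "";
        if (xml != null) {
            try {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                cls = doc.getDocumentElement().getAttribute("class");
            } catch (Exception e) {
                throw new IllegalArgumentException("malformed collectionInit.xml", e);
            }
        }
        if (cls.isEmpty()) {
            cls = DEFAULT_CLASS;
        }
        // a name that already includes a package is used as-is
        return cls.contains(".") ? cls : PACKAGE + cls;
    }
}
```

The resulting name could then be passed to \gst{Class.forName} to load the collection class.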

The display for the collection is also quite different. The home page for the collection displays the list of documents. To achieve this, the describe response from the collection had to include the list, and a new XSLT file that displays it was written for the collection. Collection XSLT files should be put in the collection's transform directory.\footnote{These are currently only used when running Greenstone in a non-distributed fashion, but proper support will be added at some stage.}

Document display is significantly different from standard Greenstone. There are two modes of display: table of contents mode and content mode. Clicking on a document link from the collection home page takes the user to the table of contents for the document. Clicking on one of the sections in the table of contents takes them to a display of that section. To support this, not only were new XSLT files needed, but also a new action: XMLDocumentAction was created, with two subactions, \gst{toc} and \gst{text}, for the two modes of display.

The Receptionist was told about this new action by adding the following to the \gst{interfaceConfig.xml} file:

\begin{gsc}\begin{verbatim}
<action name='xd' class='XMLDocumentAction'>
  <subaction name='toc' xslt='document-toc.xsl'/>
  <subaction name='text' xslt='document-content.xsl'/>
</action>
\end{verbatim}\end{gsc}

XSLT files are linked to subactions rather than to the action as a whole. The collection supplies the two XSLT files, written appropriately for the data it contains.

All links to documents have to be changed to use the \gst{xd} action rather than the standard \gst{d} action. These include the links from the home page and the links from query results.

Querying the collection works almost as usual. The query service provides a list of parameters, does the query and then sends back a list of document identifiers, so the standard query action was fine for this collection. The change is in the way the results are displayed, which is accomplished using a format statement supplied inside the \gst{search} element of the collectionConfig file.

\begin{gsc}\begin{verbatim}
<search>
  <format>
    <gsf:template match="documentNode">
      <xsl:param name="collName"/>
      <xsl:param name="serviceName"/>
      <td>
        <b><a href="{$library_name}?a=xd&amp;sa=text&amp;c={$collName}&amp;d={@nodeID}&amp;p.a=q&amp;p.s={$serviceName}">
          <xsl:choose>
            <xsl:when test="metadataList/metadata[@name='Title']">
              <gsf:metadata name="Title"/>
            </xsl:when>
            <xsl:otherwise>(section)</xsl:otherwise>
          </xsl:choose>
          </a>
        </b> from <b><a href="{$library_name}?a=xd&amp;sa=toc&amp;c={$collName}&amp;d={@nodeID}.rt&amp;p.a=q&amp;p.s={$serviceName}">
          <gsf:metadata name="Title" select="root"/></a></b>
      </td>
    </gsf:template>
  </format>
</search>
\end{verbatim}\end{gsc}
1910
1911Instead of displaying an icon and the Title, it displays the Title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these require non-standard arguments to the library, these parts of the template are written in XSLT not greenstone format language. As is shown here it is perfectly feasible to write a format statement that includes XSLT mixed in with greenstone format elements.
1912
1913The document display uses CSS to format the output---these are kept in the collection and specified in the collections XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to may all the links absolute links to files in the collection folder, the other option is to put them in Greenstone's DTD folder gsdl3/resources/dtd.

\subsection{The NZDL mirror site}

The library at \gst{http://www.greenstone.org/greenstone3/nzdl} is a mirror of \gst{http://www.nzdl.org}: it aims to present the same collections in the same way, but using Greenstone 3 instead of Greenstone 2. It uses a new site and a new interface, and the web.xml file has a new servlet entry that specifies this combination of nzdl site and nzdl interface.

The site was created by making a directory called nzdl in the sites folder and adding a siteConfig file. Because the server runs Linux, we were able to link to all the collections in the old Greenstone installation. The \gst{convert\_coll\_from\_gs2.pl} script was then run over all the collections to produce the new XML configuration files.

A new interface, also called nzdl, was created in the interfaces directory.
In many cases, creating a new interface just requires the new images and XSLT to be added to the new directory (see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This interface, however, needed some further customisation.

The standard Greenstone navigation bar lists all the services available for the collection. In Greenstone 2, by contrast, the navigation bar offers the search option and the individual classifiers: it is not service-specific, but hard-coded to searching and classifiers. To reproduce this, the XSLT that produces the navigation bar had to be altered, and a new Receptionist was also needed.
The standard receptionist (DefaultReceptionist) gathers some extra information for each page of XML before transforming it: the list of services for the collection and their display information, which allows the services to be listed along the navigation bar. Every page except the library home page needs this information, so the receptionist obtains it rather than each action. The nzdl interface needs more than this: if the collection has a ClassifierBrowse service, the list of classifiers and their display elements must also be obtained. So a new Receptionist was written that inherits from DefaultReceptionist and adds this information to the page.
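The shape of such a subclass can be sketched as follows. Greenstone's actual DefaultReceptionist API is not reproduced here: \gst{BaseReceptionist}, \gst{addExtraInfo}, the element names and the way the classifier list is supplied are all hypothetical stand-ins. The point is the pattern: the subclass calls the inherited method to keep the standard service list, then appends the classifier information only when a ClassifierBrowse service is present.

\begin{gsc}\begin{verbatim}
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in for DefaultReceptionist. */
class BaseReceptionist {
    /** Adds the collection's service list to every page. */
    protected void addExtraInfo(Document page, List<String> services) {
        page.getDocumentElement().appendChild(
            listElement(page, "serviceList", "service", services));
    }

    static Element listElement(Document page, String listTag,
                               String itemTag, List<String> names) {
        Element list = page.createElement(listTag);
        for (String name : names) {
            Element item = page.createElement(itemTag);
            item.setAttribute("name", name);
            list.appendChild(item);
        }
        return list;
    }

    static Document newPage() throws ParserConfigurationException {
        Document page = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        page.appendChild(page.createElement("page"));
        return page;
    }
}

/** Hypothetical nzdl-style receptionist. */
class NzdlStyleReceptionist extends BaseReceptionist {
    // In Greenstone this would come from the ClassifierBrowse
    // service; here it is passed in to keep the sketch small.
    private final List<String> classifiers;

    NzdlStyleReceptionist(List<String> classifiers) {
        this.classifiers = classifiers;
    }

    @Override
    protected void addExtraInfo(Document page, List<String> services) {
        super.addExtraInfo(page, services); // standard service list
        if (services.contains("ClassifierBrowse")) {
            page.getDocumentElement().appendChild(listElement(
                page, "classifierList", "classifier", classifiers));
        }
    }
}
\end{verbatim}\end{gsc}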

One of the servlet initialisation parameters is the receptionist class. It was added to the servlet definition in the web.xml file so that the LibraryServlet loads the right receptionist class.
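A servlet can turn such a parameter into an object with Java reflection. The sketch below is an illustration rather than LibraryServlet's code: the \gst{Receptionist} interface, the classes and the parameter name \gst{receptionist} are hypothetical, and the initialisation parameters are passed as a \gst{Map} so that the example does not depend on the servlet API. In a real servlet the value would come from \gst{getInitParameter}.

\begin{gsc}\begin{verbatim}
import java.util.Map;

/** Hypothetical receptionist interface for this sketch only. */
interface Receptionist {
    String describe();
}

class SketchDefaultReceptionist implements Receptionist {
    public String describe() {
        return "default";
    }
}

class SketchNzdlReceptionist implements Receptionist {
    public String describe() {
        return "nzdl";
    }
}

/** Instantiates the receptionist class named by an init parameter. */
class ReceptionistLoader {
    static Receptionist load(Map<String, String> initParams,
                             String defaultClass)
            throws ReflectiveOperationException {
        // "receptionist" is a hypothetical parameter name.
        String name =
            initParams.getOrDefault("receptionist", defaultClass);
        // asSubclass throws ClassCastException if the named class
        // does not implement Receptionist.
        return Class.forName(name)
            .asSubclass(Receptionist.class)
            .getDeclaredConstructor()
            .newInstance();
    }
}
\end{verbatim}\end{gsc}

With this design, switching an interface to a different receptionist needs only a change to web.xml and a Tomcat restart, not a recompilation of the servlet.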


\newpage
\section{Distributed Greenstone}\label{sec:distributed}

Greenstone 3 is designed to run in a distributed fashion: one Greenstone installation can talk to several sites on different computers, as in the configuration shown in Figure~\ref{fig:remote}. This requires a communication protocol. In principle any protocol can be used, but so far we have implemented only a simple SOAP protocol.

more explanation..

\begin{figure}[h]
 \centering
 \includegraphics[width=4in]{remote} %5.8
 \caption{A distributed digital library configuration running over several servers}
 \label{fig:remote}
\end{figure}

We use Apache SOAP for Java, which runs as a servlet in Tomcat.
If you have obtained Greenstone through CVS, you will need to install SOAP separately, as described in Appendix~\ref{app:soap-cvs}. Debugging SOAP is described in Appendix~\ref{app:soap-debug}.
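To make the protocol less abstract, the sketch below builds by hand the kind of SOAP 1.1 RPC request that one installation sends to another. In Apache SOAP the target service is identified by the namespace URI of the method element, so the values \gst{org.greenstone.localsite} and \gst{process} are taken from the deployment settings in Appendix~\ref{app:soap-cvs}. The parameter name \gst{xmlIn} is an assumption, as is the idea that the Greenstone XML message travels as an escaped string; in practice the Apache SOAP client classes build the envelope, and this code is only meant to show its structure.

\begin{gsc}\begin{verbatim}
/** Builds a SOAP 1.1 RPC request envelope by hand (sketch). */
class SoapEnvelope {
    static final String ENV =
        "http://schemas.xmlsoap.org/soap/envelope/";
    static final String ENC =
        "http://schemas.xmlsoap.org/soap/encoding/";
    static final String XSI =
        "http://www.w3.org/2001/XMLSchema-instance";
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /** Escapes the characters that are special in XML text. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps xmlMessage as the single string argument of method. */
    static String build(String serviceId, String method,
                        String param, String xmlMessage) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + ENV + "\"\n"
            + "  xmlns:xsi=\"" + XSI + "\"\n"
            + "  xmlns:xsd=\"" + XSD + "\">\n"
            + "<SOAP-ENV:Body>\n"
            + "<ns1:" + method + " xmlns:ns1=\"" + serviceId + "\"\n"
            + "  SOAP-ENV:encodingStyle=\"" + ENC + "\">\n"
            + "<" + param + " xsi:type=\"xsd:string\">"
            + escapeXml(xmlMessage) + "</" + param + ">\n"
            + "</ns1:" + method + ">\n"
            + "</SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>\n";
    }
}
\end{verbatim}\end{gsc}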

\subsection{Serving a site using SOAP}
what do we have to do?? resource file format, deploy the service etc.

\appendix

\newpage
\section{Using Greenstone 3 from CVS}\label{app:cvs}

*** need to make sure building stuff is in here ***

Greenstone 3 is also available via CVS, from which you can download the latest version of the code. This version is not guaranteed to be stable; in fact, it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes. What is in CVS is quite different from what comes in a release: the code needs to be compiled, and some files need editing.

To check out the Greenstone code, use:

\begin{quote}\begin{gsc}\begin{verbatim}
cvs -d :pserver:cvs\[email protected]:2402/usr/local/
 global-cvs/gsdl-src co gsdl3
\end{verbatim}\end{gsc}\end{quote}

If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some older versions of CVS have trouble accessing this repository because the repository string includes a port number. We are using version 1.11.1p1.

The software needs to be compiled and installed. The installation procedure has been semi-automated; the following sections describe installation under Linux and Windows.

\subsection{Linux install}

An \gst{install.bash} script is provided to compile and install Greenstone 3. To use it, type:

\begin{quote}\begin{gsc}
cd gsdl3\\
source setup.bash\\
install.bash\\
source setup.bash\\
\end{gsc}\end{quote}

Note: Mozilla does not always work with \gst{localhost}. If you have problems, edit the siteConfig files (\gst{web/sites/<sitename>/siteConfig.xml}) so that they use your computer's name instead of \gst{localhost}.

Note: \gst{source setup.bash} must be run once in each xterm window before running make or starting Tomcat. It sets the environment variables \gst{CLASSPATH}, \gst{PATH}, \gst{JAVA\_HOME}, etc.
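A quick way to confirm that \gst{setup.bash} has been sourced in the current window is to check the environment before starting Tomcat. The sketch below reports which expected variables are unset or empty. The list is an assumption: it contains the variables named above plus \gst{GSDL3HOME}, which the paths in this manual rely on and which is assumed to be set by \gst{setup.bash} as well.

\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Reports environment variables that setup.bash should set. */
class EnvCheck {
    // CLASSPATH, PATH and JAVA_HOME are named in the text;
    // GSDL3HOME is an assumption.
    static final String[] EXPECTED =
        {"GSDL3HOME", "CLASSPATH", "PATH", "JAVA_HOME"};

    /** Returns the expected names that are unset or empty. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }
}
\end{verbatim}\end{gsc}

Called as \gst{EnvCheck.missing(System.getenv())}, it should return an empty list in a correctly prepared window.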

To shut down or start up Tomcat, use:
\begin{quote}\begin{gsc}
\gsdlhome/comms/jakarta/tomcat/bin/shutdown.sh\\
\gsdlhome/comms/jakarta/tomcat/bin/startup.sh\\
\end{gsc}\end{quote}

Do not run install.bash more than once.
To update your installation, run \gst{update.bash}: this updates your code from CVS and recompiles all the Java code.

\subsection{Windows install}
\newpage
\section{Tomcat}\label{app:tomcat}

Tomcat is a servlet container; Greenstone 3 uses it to serve a site through a servlet.

The file \gst{\gsdlhome/comms/jakarta/tomcat/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for the Greenstone 3 servlets (\gst{\gsdlhome/web}), which tells Tomcat where to find the web.xml file and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}, and the demo collection's images through \\
\gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}.
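The mapping Tomcat applies is a simple one: a file's path relative to the context's docBase is appended to the context path. The following sketch states that rule in Java. \gst{ContextUrls} is a hypothetical helper rather than part of Tomcat or Greenstone, and it does not URL-encode characters such as spaces.

\begin{gsc}\begin{verbatim}
import java.nio.file.Path;

/** Maps a file under a context's docBase to its URL (sketch). */
class ContextUrls {
    static String urlFor(Path docBase, Path file,
                         String hostAndPort, String contextPath) {
        Path rel = docBase.toAbsolutePath().normalize()
            .relativize(file.toAbsolutePath().normalize());
        if (rel.startsWith("..")) {
            throw new IllegalArgumentException(
                file + " is outside " + docBase);
        }
        StringBuilder url = new StringBuilder("http://")
            .append(hostAndPort).append(contextPath);
        for (Path part : rel) {
            url.append('/').append(part); // no URL encoding
        }
        return url.toString();
    }
}
\end{verbatim}\end{gsc}

For instance, with the docBase \gst{\gsdlhome/web}, host \gst{localhost:8080} and context path \gst{/gsdl3}, the file \gst{index.html} maps to \gst{http://localhost:8080/gsdl3/index.html}.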

Tomcat runs on port 8080 by default. This can be changed in server.xml, in the \gst{<Connector>} element that follows the comment \gst{<!-- Define a non-SSL Coyote HTTP/1.1 Connector on port 8080 -->}. If Tomcat's port is changed, the siteConfig files also need changing: both \gst{<httpAddress>} for the site and \gst{<address>} for a remote site use the port.
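When changing the port it is easy to miss one of the places that must agree. The sketch below, which is not part of Greenstone, lists the ports of the \gst{<Connector>} elements in server.xml using the standard JAXP XPath API, and checks which port an address from a siteConfig file uses. Connectors that are commented out are ignored automatically, because comments are not elements. The class name \gst{PortCheck} is hypothetical.

\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Port checks for server.xml and siteConfig addresses. */
class PortCheck {
    /** Ports of all active Connector elements in server.xml text. */
    static List<Integer> connectorPorts(String serverXml)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(serverXml)));
        NodeList ports = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate("//Connector/@port", doc, XPathConstants.NODESET);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < ports.getLength(); i++) {
            String value = ports.item(i).getNodeValue().trim();
            result.add(Integer.parseInt(value));
        }
        return result;
    }

    /** True if an address such as httpAddress uses the port. */
    static boolean usesPort(String address, int port)
            throws URISyntaxException {
        return new URI(address).getPort() == port;
    }
}
\end{verbatim}\end{gsc}

To check the real file, read server.xml into a string (for example with \gst{Files.readAllBytes}) before calling \gst{connectorPorts}.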

Note: Tomcat must be shut down and restarted whenever you change any of the following, for the changes to take effect:
\begin{bulletedlist}
\begin{gsc}
\item \gsdlhome/web/WEB-INF/web.xml
\item \gsdlhome/comms/jakarta/tomcat/conf/server.xml
\end{gsc}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
\gst{\gsdlhome/comms/jakarta/tomcat/logs/catalina.out}

On startup, the servlet loads its collections and services. If the site or collection configuration files are changed, the changes will not take effect until the site or collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}) or by restarting Tomcat.

We have set up Tomcat to follow symbolic links. To disable this feature, remove the \gst{<Resources>} element from the gsdl3 context in \gst{\$GSDL3HOME/comms/jakarta/tomcat/conf/server.xml}:

\begin{quote}\begin{gsc}
<Context path="/gsdl3" docBase="\$GSDL3HOME/web" debug="1" \\
reloadable="true">\\
 <Resources allowLinking='true'/>\\
</Context>\\
\end{gsc}\end{quote}

We have set up Tomcat to disallow directory listings for everything in the docBase directory. To turn listings back on, edit Tomcat's default web.xml file (\gst{\$GSDL3HOME/comms/jakarta/tomcat/conf/web.xml}): in the default servlet definition, change the \gst{listings} parameter to \gst{true}.
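The same change can be scripted. The following sketch uses the DOM API to find the \gst{listings} initialisation parameter of Tomcat's \gst{default} servlet and set it; writing the document back to disk (for example with a \gst{javax.xml.transform.Transformer}) is left out. The helper names are hypothetical, and \gst{parse} switches off loading of the external DTD, a feature of the JDK's built-in parser, so that the DOCTYPE in web.xml does not trigger a network fetch.

\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Toggles the 'listings' init-param in a web.xml DOM (sketch). */
class ListingsToggle {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // JDK parser feature: do not fetch the DTD named in DOCTYPE.
        f.setFeature("http://apache.org/xml/features/"
            + "nonvalidating/load-external-dtd", false);
        return f.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Sets listings on the named servlet; false if not found. */
    static boolean setListings(Document webXml, String servletName,
                               boolean enabled) {
        NodeList servlets = webXml.getElementsByTagName("servlet");
        for (int i = 0; i < servlets.getLength(); i++) {
            Element servlet = (Element) servlets.item(i);
            if (!servletName.equals(childText(servlet, "servlet-name"))) {
                continue;
            }
            NodeList params = servlet.getElementsByTagName("init-param");
            for (int j = 0; j < params.getLength(); j++) {
                Element param = (Element) params.item(j);
                Node value = param.getElementsByTagName("param-value")
                    .item(0);
                if ("listings".equals(childText(param, "param-name"))
                        && value != null) {
                    value.setTextContent(String.valueOf(enabled));
                    return true;
                }
            }
        }
        return false;
    }

    /** Trimmed text of the first descendant named tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? null : nodes.item(0).getTextContent().trim();
    }
}
\end{verbatim}\end{gsc}

As with any edit to this file, Tomcat must be restarted before the change takes effect.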

Tomcat uses a Manager to handle HTTP session information, which it can preserve between restarts. To use a persistent session manager, uncomment the \gst{<Manager>} element in \gst{\$GSDL3HOME/comms/jakarta/tomcat/conf/server.xml}. With the default manager, session information is stored in the work directory: \gst{\$GSDL3HOME/comms/jakarta/tomcat/work/Standalone/localhost/gsdl3/SESSIONS.ser}. Delete this file to clear the cached session information.
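For completeness, here is the same clean-up as a small Java helper; it simply builds the path given above and deletes the file if it exists. Run it, or delete the file by hand, while Tomcat is shut down: the standard manager writes SESSIONS.ser when Tomcat stops, so a file deleted while Tomcat is running would be recreated at the next shutdown. The class name \gst{SessionCache} is hypothetical.

\begin{gsc}\begin{verbatim}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Clears Tomcat's cached session file (hypothetical helper). */
class SessionCache {
    /** The default manager's session file under GSDL3HOME. */
    static Path sessionFile(Path gsdl3Home) {
        return gsdl3Home.resolve(Paths.get("comms", "jakarta",
            "tomcat", "work", "Standalone", "localhost", "gsdl3",
            "SESSIONS.ser"));
    }

    /** Deletes the file; returns true if one was removed. */
    static boolean clear(Path gsdl3Home) throws IOException {
        // Run only while Tomcat is shut down (see above).
        return Files.deleteIfExists(sessionFile(gsdl3Home));
    }
}
\end{verbatim}\end{gsc}

A typical call is \gst{SessionCache.clear(Paths.get(System.getenv("GSDL3HOME")))}.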

\subsection{Proxying Tomcat with Apache}

Instead of incorporating servlet support into your existing web server, an easy alternative is to proxy Tomcat. The \gst{http://www.greenstone.org/greenstone3} site uses Apache to proxy Tomcat: ProxyPass and ProxyPassReverse directives were added to the VirtualHost description for the www.greenstone.org server.

\begin{quote}\begin{gsc}
<VirtualHost xx.xx.xx.xx>\\
ServerName www.greenstone.org\\
...\\
ProxyPass /greenstone3 http://puka.cs.waikato.ac.nz:8080/gsdl3\\
ProxyPassReverse /greenstone3 http://puka.cs.waikato.ac.nz:8080/gsdl3\\
</VirtualHost>\\
\end{gsc}\end{quote}

In our example, the Greenstone 3 servlet can be accessed at \gst{http://www.greenstone.org/greenstone3/library} instead of at \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, which is not publicly accessible.
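The effect of the two directives can be modelled as prefix rewriting: ProxyPass maps a request path on the public server to a URL on the Tomcat host, and ProxyPassReverse maps backend URLs in redirect headers (such as \gst{Location}) back to the public server. The sketch below is a simplified model under that reading, not Apache's implementation; like Apache, it matches plain string prefixes. The class name \gst{ProxyMapping} is hypothetical.

\begin{gsc}\begin{verbatim}
/** Simplified model of ProxyPass / ProxyPassReverse (sketch). */
class ProxyMapping {
    private final String publicUrl; // scheme and host of the proxy
    private final String path;      // public path prefix
    private final String backend;   // Tomcat URL prefix

    ProxyMapping(String publicUrl, String path, String backend) {
        this.publicUrl = publicUrl;
        this.path = path;
        this.backend = backend;
    }

    /** ProxyPass: public request path to backend URL, or null. */
    String forward(String requestPath) {
        if (!requestPath.startsWith(path)) {
            return null; // not proxied; served by Apache itself
        }
        return backend + requestPath.substring(path.length());
    }

    /** ProxyPassReverse: rewrite a backend redirect location. */
    String reverse(String location) {
        if (!location.startsWith(backend)) {
            return location; // not a backend URL: leave unchanged
        }
        return publicUrl + path
            + location.substring(backend.length());
    }
}
\end{verbatim}\end{gsc}

With the values from the example configuration (\gst{http://www.greenstone.org}, \gst{/greenstone3} and \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3}), a request for \gst{/greenstone3/library} is forwarded to \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, and a redirect to that backend URL is rewritten to \gst{http://www.greenstone.org/greenstone3/library}.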

\subsection{Running Tomcat behind a proxy}

Almost everything works fine when Tomcat is running behind a proxy. Problems arise only if the servlet itself needs to make external HTTP connections. We do this in the infomine demo collection, for example: one of its service classes sends HTTP requests to the Infomine database at Riverside. Because these requests go through the proxy, a username and password are needed. It is not sufficient to prompt the user for a password, because users are unlikely to have a password for the particular proxy that Tomcat is using. At present we put a proxy element in the siteConfig.xml file, in which you enter a suitable username and password for the proxy server. Unfortunately, these are entered in plain text, and the file can be viewed through the servlet, so a better solution is needed.
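Within a Java process, outgoing requests made with \gst{HttpURLConnection} can be sent through a proxy using the standard JDK networking properties \gst{http.proxyHost} and \gst{http.proxyPort}, with a \gst{java.net.Authenticator} supplying the credentials when the proxy asks for them. The sketch below shows that mechanism only; where the values come from (for example the proxy element in siteConfig.xml, whose exact format is not given here) is left open, and it does not solve the plain-text storage problem. It covers plain HTTP requests; HTTPS uses separate proxy properties. The class name \gst{OutgoingProxy} is hypothetical.

\begin{gsc}\begin{verbatim}
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Routes outgoing HTTP through an authenticating proxy (sketch). */
class OutgoingProxy {
    static void configure(String host, int port, String user,
                          char[] password) {
        // Standard JDK properties used by HttpURLConnection.
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
        final char[] secret = password.clone();
        // The default Authenticator is JVM-wide, so in Tomcat it
        // affects every web application in the same JVM.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Answer the proxy only, never the destination server.
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, secret);
                }
                return null;
            }
        });
    }
}
\end{verbatim}\end{gsc}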

\newpage
\section{SOAP}\label{app:soap}
\subsection{Setting up SOAP from CVS}\label{app:soap-cvs}

If you have obtained Greenstone through CVS, you will need to install SOAP by running:

\begin{quote}\begin{gsc}
install-soap.bash
\end{gsc}\end{quote}

This unpacks the SOAP distribution, adds a SOAP context to Tomcat's server.xml configuration file, and creates the file \gst{src/java/org/greenstone/gsdl3/SOAPServer.java} from \gst{src/java/org/greenstone/gsdl3/SOAPServer.java.in} (the template contains a place where the GSDL3HOME path must be inserted).
It also tries to deploy the SOAP service, but this often fails. If it does, run the following command from a shell:

\begin{gsc}\begin{verbatim}
java org.apache.soap.server.ServiceManagerClient \
 http://localhost:8080/soap/servlet/rpcrouter deploy \
 resources/soap/localsite.xml
\end{verbatim}\end{gsc}

You can also deploy a service through the website. If Tomcat is not running, start it up (see Section~\ref{subsec:runtomcat}).

The SOAP servlet can be accessed at \gst{http://localhost:8080/soap}. You should see a welcome page. Click on ``Run the admin client'', which lets you list, deploy and undeploy SOAP services.

To deploy the SOAPServer for localsite, click on ``deploy'' and edit the following fields in the deploy form:

\begin{tabular}{ll}
ID: & org.greenstone.localsite\\
Scope: & choose Session or Application; the options are\\
 & \quad Request: a new instantiation for each request\\
 & \quad Session: the same instantiation across a session\\
 & \quad Application: only one instantiation is used\\
Methods: & process\\
Java Provider / Provider Class: & org.greenstone.gsdl3.SOAPServer\\
\end{tabular}

Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left-hand ``List'' button.

Information about deployed services is maintained between Tomcat sessions, so you only need to deploy the service once. To get the library1 servlet talking to the SOAP server, shut down and restart Tomcat (see Section~\ref{subsec:runtomcat}). You should then see more collections when you run the library1 servlet.
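The three scope settings differ in how often a new SOAPServer object is created to handle calls. The following sketch models that rule as described in the deploy form above; \gst{SoapScope} and \gst{ScopedInstances} are illustrative names, not Apache SOAP classes, and the session identifier stands in for the HTTP session. With Request scope every call gets a freshly constructed object, whereas Application scope reuses one object for all callers.

\begin{gsc}\begin{verbatim}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

enum SoapScope { REQUEST, SESSION, APPLICATION }

/** Models how a deployment scope selects instances (sketch). */
class ScopedInstances<T> {
    private final SoapScope scope;
    private final Supplier<T> factory;
    private final Map<String, T> perSession = new HashMap<>();
    private T shared;

    ScopedInstances(SoapScope scope, Supplier<T> factory) {
        this.scope = scope;
        this.factory = factory;
    }

    /** Returns the instance that would handle a call. */
    synchronized T instanceFor(String sessionId) {
        switch (scope) {
            case REQUEST:
                return factory.get(); // new object per request
            case SESSION:
                return perSession.computeIfAbsent(
                    sessionId, id -> factory.get());
            default: // APPLICATION: one object for everyone
                if (shared == null) {
                    shared = factory.get();
                }
                return shared;
        }
    }
}
\end{verbatim}\end{gsc}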

\subsection{Debugging SOAP}\label{app:soap-debug}

If you need to debug SOAP communication, or just want to look at the SOAP messages being passed back and forth, use a program called TcpTunnelGui. It intercepts messages arriving on one port, displays them, and passes them on to another port.
To run it, type:

\begin{quote}\gst{java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080}
\end{quote}

Here 8070 is the port that TcpTunnelGui listens on, and 8080 is the port it forwards messages to, that is, the port Tomcat is using. You need to modify Greenstone to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. The address is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}); in the entry below, change \gst{8080} to \gst{8070} in the \gst{address} attribute while debugging.
\begin{quote}\begin{gsc}\begin{verbatim}
<site name="org.greenstone.localsite"
 address="http://localhost:8080/soap/servlet/rpcrouter"
 type="soap"/>
\end{verbatim}\end{gsc}\end{quote}

Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the
address for talking to the Tomcat SOAP servlet services.
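TcpTunnelGui is part of Apache SOAP. To show what it does, the following is a console-only sketch of the same idea, not its implementation: bytes from the client are copied to the target port and bytes from the target are copied back, with everything echoed to a log that is printed when the connection ends. It handles one connection at a time, and the class name \gst{TcpTunnel} is hypothetical. Called as \gst{TcpTunnel.run(8070, "localhost", 8080)}, it would sit between Greenstone and Tomcat exactly as in the command above.

\begin{gsc}\begin{verbatim}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Console TCP tunnel in the spirit of TcpTunnelGui (sketch). */
class TcpTunnel {
    /** Copies in to out until EOF, echoing to log; returns bytes. */
    static long pump(InputStream in, OutputStream out,
                     StringBuffer log) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush();
            // ISO-8859-1 maps each byte to one char: nothing is lost.
            log.append(new String(buf, 0, n,
                                  StandardCharsets.ISO_8859_1));
            total += n;
        }
        return total;
    }

    /** Forwards each connection on listenPort to host:port. */
    static void run(int listenPort, String host, int port)
            throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(listenPort)) {
            while (true) { // one connection at a time
                try (Socket client = server.accept();
                     Socket target = new Socket(host, port)) {
                    StringBuffer log = new StringBuffer();
                    Thread back = new Thread(() -> {
                        try { // response: target -> client
                            pump(target.getInputStream(),
                                 client.getOutputStream(), log);
                            client.shutdownOutput();
                        } catch (IOException e) {
                            // connection closed by one side
                        }
                    });
                    back.start();
                    // request: client -> target
                    pump(client.getInputStream(),
                         target.getOutputStream(), log);
                    target.shutdownOutput();
                    back.join();
                    System.out.println(log);
                } catch (IOException e) {
                    System.err.println("connection failed: " + e);
                }
            }
        }
    }
}
\end{verbatim}\end{gsc}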

\newpage
\section{Format statements: Greenstone 2 vs Greenstone 3}\label{app:format}
Table~\ref{tab:format} shows the Greenstone 2 format elements and their equivalents in Greenstone 3.
\begin{table}
\caption{Greenstone 3 equivalents of Greenstone 2 format statements}\label{tab:format}
{\footnotesize
\begin{tabular}{ll}
\hline
\bf Greenstone 2 & \bf Greenstone 3 \\
\hline
\gst{[Text]} & \gst{<gsf:text/>} \\
\gst{[num]} & \gst{<gsf:metadata name='docnum'/>}\\
\gst{[link][/link]} & \gst{<gsf:link></gsf:link>} or \\
& \gst{<gsf:link type='document'></gsf:link>}\\
\gst{[srclink][/srclink]} & \gst{<gsf:link type='source'></gsf:link>}\\
\gst{[icon]} & \gst{<gsf:icon/>} or \\
& \gst{<gsf:icon type='document'/>}\\
\gst{[srcicon]} & \gst{<gsf:icon type='source'/>}\\
\gst{[Title]} (metadata) & \gst{<gsf:metadata name='Title'/>} or \\
& \gst{<gsf:metadata name='Title' select='current'/>}\\
\gst{[parent:Title]} & \gst{<gsf:metadata name='Title' select='parent' />}\\
\gst{[parent(All):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'/>}\\
\gst{[parent(Top):Title]} & \gst{<gsf:metadata name='Title' select='root' />}\\
\gst{[parent(All': '):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'}\\
& \gst{ separator=': ' />}\\
\gst{[sibling(All': '):Title]} & \gst{<gsf:metadata name='Title' multiple='true'} \\
& \gst{ separator=': ' />}\\
\gst{\{Or\}\{[dc.Title],} & \gst{<gsf:choose-metadata>}\\
\gst{ [dls.Title], [Title]\}}& \gst{ <gsf:metadata name='dc.Title'/>}\\
& \gst{ <gsf:metadata name='dls.Title'/>}\\
& \gst{ <gsf:metadata name='Title'/>}\\
& \gst{</gsf:choose-metadata>}\\
\gst{\{If\}\{[parent:Title],} & \gst{<gsf:choose-metadata>}\\
\gst{ [parent:Title], [Title]\}}& \gst{ <gsf:metadata name='Title' select='parent'/>}\\
& \gst{ <gsf:metadata name='Title'/>}\\
& \gst{</gsf:choose-metadata>}\\
\gst{\{If\}\{[Subject],} & \gst{<gsf:switch>}\\
\gst{ <td>[Subject]</td>\}}& \gst{ <gsf:metadata name='Subject'/>}\\
& \gst{ <gsf:when test='exists'>} \\
& \gst{ <td><gsf:metadata name='Subject'/></td>}\\
& \gst{ </gsf:when></gsf:switch>}\\
\hline
\end{tabular}}
\end{table}
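To make the \gst{select} options in the table concrete, the sketch below applies them to a small tree of document sections. It is an illustration, not Greenstone's implementation: \gst{DocSection} is hypothetical, and it assumes that \gst{ancestors} excludes the current section and lists values from the root downwards. \gst{chooseMetadata} mirrors \gst{<gsf:choose-metadata>} in its simplest form, returning the first listed element that has a value for the current section.

\begin{gsc}\begin{verbatim}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical document section with metadata (sketch). */
class DocSection {
    final DocSection parent;
    final Map<String, String> metadata = new HashMap<>();

    DocSection(DocSection parent, String title) {
        this.parent = parent;
        metadata.put("Title", title);
    }

    /** Value of name for select = current|parent|root|ancestors. */
    String select(String name, String select, String separator) {
        switch (select) {
            case "current":
                return metadata.get(name);
            case "parent":
                return parent == null ? null : parent.metadata.get(name);
            case "root": {
                DocSection s = this;
                while (s.parent != null) {
                    s = s.parent;
                }
                return s.metadata.get(name);
            }
            case "ancestors": {
                // Walk upwards, prepending, so the root comes first.
                Deque<String> values = new ArrayDeque<>();
                for (DocSection s = parent; s != null; s = s.parent) {
                    String v = s.metadata.get(name);
                    if (v != null) {
                        values.addFirst(v);
                    }
                }
                return values.isEmpty()
                    ? null : String.join(separator, values);
            }
            default:
                throw new IllegalArgumentException(select);
        }
    }

    /** Like gsf:choose-metadata: first name with a value, or null. */
    String chooseMetadata(String... names) {
        for (String name : names) {
            String v = metadata.get(name);
            if (v != null) {
                return v;
            }
        }
        return null;
    }
}
\end{verbatim}\end{gsc}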
\end{document}