1\documentclass[a4paper,11pt]{article}
2\usepackage{times,epsfig}
3\hyphenation{Message-Router Text-Query}
4
5\newenvironment{gsc}% Greenstone text bits
6{\begin{footnotesize}\begin{tt}}%
7{\end{tt}\end{footnotesize}}
8
9\newcommand{\gst}[1]{{\footnotesize \tt #1} }
10\begin{document}
11
12\title{A modular digital library:\\
13 Architecture and implementation of Greenstone3}
14
15% if you work on this manual, add your name here
16\author{Katherine Don and Ian H. Witten \\[1ex]
17 Department of Computer Science \\
18 University of Waikato \\ Hamilton, New Zealand \\
19 \{kjdon, ihw\}@cs.waikato.ac.nz}
20
21\date{}
22
23\maketitle
24
25\newenvironment{bulletedlist}%
26{\begin{list}{$\bullet$}{\setlength{\itemsep}{0pt}\setlength{\parsep}{0pt}}}%
27{\end{list}}
28
29
30\noindent
31Greenstone Digital Library Version 3 is a complete redesign and
32reimplementation of the Greenstone digital library software. The current
33version (Greenstone2) enjoys considerable success and is being widely used.
34Greenstone3 will capitalize on this success, and in addition it will
35\begin{bulletedlist}
36\item improve flexibility, modularity, and extensibility
37\item lower the bar for ``getting into'' the Greenstone code with a view to
38 understanding and extending it
39\item use XML where possible internally to improve the amount of
40 self-documentation
41\item make full use of existing XML-related standards and software
42\item provide improved internationalization, particularly in terms of sort order,
43 information browsing, etc.
44\item include new features that facilitate additional ``content management''
45 operations
46\item operate on a scale ranging from personal desktop to corporate library
47\item easily permit the incorporation of text mining operations
48\item use Java, to encourage multilinguality, X-compatibility, and to permit
49 easier inclusion of existing Java code (such as for text mining).
50\end{bulletedlist}
51Parts of Greenstone will remain in other languages (e.g. MG, MGPP); JNI (Java
52Native Interface) will be used to communicate with these.
53
The general design and architecture of Greenstone3 are described in the document ``The design of Greenstone3: An agent based dynamic digital library'' (design-2002.ps, in the gsdl3/docs/manual directory).
55
56\section{System modules}\label{sec:modules}
57
A Greenstone3 `library' system consists of many components. Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
59
60\begin{figure}[t]
61 \centering
62 \includegraphics[width=4in]{local} %5.8
63 \caption{A simple stand-alone site.}
64 \label{fig:local}
65\end{figure}
66
67
{\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters, and communicators needed. All messages pass through the MessageRouter. Communication between remote sites always takes place between MessageRouters, one for each site.
69
{\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g.\ all the building services may be part of a cluster. What is part of a cluster is specified by the site config file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.
Functionally, Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
72
{\em ServiceRack}: these provide one or more services. The services are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory.
74
75{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
76
77{\em Receptionist}: this is the point of contact for the 'front end'. It is pretty much a router to actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages.
78
79{\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in more detail in Section~\ref{sec:pagegen}.
80
81
82\section{Configuration}\label{sec:config}
83
84Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for
85the site, \gst{siteConfig.xml}. Each collection has two configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, that give metadata and other information for the
86collection.\footnote{\gst{siteConfig.xml} is new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
87Greenstone2.} The first includes user-defined metadata for the collection,
88such as its name and the {\em About this collection} text; and also gives
89instructions on how the collection is to be built. The second is produced by
90the build-time process and includes any metadata that can be determined
91automatically. It also includes configuration information for any serviceRacks needed by the collection.
92
The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will have no effect. There is a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
94
95\subsection{Site configuration file}\label{sec:siteconfig}
96
The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of
known external sites to connect to. Collections are not specified in the site
configuration file; instead, they are determined by the contents of the site's
collections directory.
101
102The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible, any files (e.g. images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat gsdl3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML. Also, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}.
103
The first example in Figure~\ref{fig:siteconfig} shows a site configuration file for a rudimentary site with no site-wide services,
which does not connect to any external sites. The second example is for a site with one site-wide service cluster, a collection building cluster. It also connects to the first site using SOAP.
These two sites are running on the same machine. For site gsdl1 to talk to site localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is \gst{http://localhost:8090/soap/servlet/rpcrouter}.
107
108
109\begin{figure}
110\begin{gsc}\begin{verbatim}
111<siteConfig>
112 <localSiteName value="org.greenstone.localsite"/>
113 <httpAddress value="http://localhost:8090/gsdl3/sites/localsite"/>
114 <serviceClusterList/>
115 <serviceRackList/>
116 <siteList/>
117</siteConfig>
118\end{verbatim}\end{gsc}
119
120\begin{gsc}\begin{verbatim}
121<siteConfig>
122 <localSiteName value="org.greenstone.gsdl1"/>
123 <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/>
124 <serviceClusterList>
125 <serviceCluster name="build">
126 <metadataList>
127 <metadata name="Title">Collection builder</metadata>
128 <metadata name="Description">Builds collections in a
129 gsdl2-style manner</metadata>
130 </metadataList>
131 <serviceRackList>
132 <serviceRack name="GS2Construct"/>
133 </serviceRackList>
134 </serviceCluster>
135 </serviceClusterList>
136 <siteList>
137 <site name="org.greenstone.localsite"
138 address="http://localhost:8090/soap/servlet/rpcrouter"
139 type="soap"/>
140 </siteList>
141</siteConfig>
142\end{verbatim}\end{gsc}
143\caption{Two sample site config files}
144\label{fig:siteconfig}
145\end{figure}
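
As an illustration of how such a file can be read, the following Python sketch parses a site configuration like the second example in Figure~\ref{fig:siteconfig} and collects the site name, HTTP address, service clusters and external sites. Greenstone3 itself is written in Java; the function name \gst{summarise\_site\_config} and the layout of the returned dictionary are assumptions made for this example and are not part of the Greenstone3 code.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the second sample site configuration file.
SITE_CONFIG = """
<siteConfig>
  <localSiteName value="org.greenstone.gsdl1"/>
  <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/>
  <serviceClusterList>
    <serviceCluster name="build">
      <serviceRackList>
        <serviceRack name="GS2Construct"/>
      </serviceRackList>
    </serviceCluster>
  </serviceClusterList>
  <siteList>
    <site name="org.greenstone.localsite"
          address="http://localhost:8090/soap/servlet/rpcrouter"
          type="soap"/>
  </siteList>
</siteConfig>
"""


def summarise_site_config(xml_text):
    """Return the main settings of a siteConfig document as a dictionary."""
    root = ET.fromstring(xml_text)
    # Map each cluster name to the names of the ServiceRacks it loads.
    clusters = {
        cluster.get("name"): [
            rack.get("name")
            for rack in cluster.findall("serviceRackList/serviceRack")
        ]
        for cluster in root.findall("serviceClusterList/serviceCluster")
    }
    # Each external site is described by its name, type and address.
    sites = [
        (site.get("name"), site.get("type"), site.get("address"))
        for site in root.findall("siteList/site")
    ]
    return {
        "name": root.find("localSiteName").get("value"),
        "http": root.find("httpAddress").get("value"),
        "clusters": clusters,
        "sites": sites,
    }


summary = summarise_site_config(SITE_CONFIG)
```

Here \gst{summary["clusters"]} maps the cluster name \gst{build} to its single ServiceRack, and \gst{summary["sites"]} holds one SOAP site entry.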
146
147
148
149\subsection{Collection configuration file}\label{sec:collconfig}
150
The collection configuration file is where the collection designer (e.g.\ a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration; Figure~\ref{fig:collconfig}
shows an example of the format as it stands at present.
153
154\begin{figure}
155\begin{gsc}\begin{verbatim}
156<collectionConfig xmlns:gsf="http://www.greenstone.org/
157 configformat">
158 <metadataList>
159 <metadata name="colName" lang="en">greenstone mgpp demo
160 </metadata>
161 <metadata name="colDescription" lang="en">This is a
162 demonstration collection for the Greenstone digital
163 library software. It contains a small subset (11 books)
164 of the Humanity Development Library.</metadata>
165 <metadata name="colDescription" lang="fr">C'est une
166 collection pour demonstration du logiciel Greenstone.
167 Elle contient une petite partie du projet de bibliotheques
168 humanitaires et de developpement (11 livres).</metadata>
169 <metadata name="colIcon">mgppdemo.gif</metadata>
170 </metadataList>
171 <search type='mgpp'>
172 <index name="tt" content="text,metadata"
173 level="Document,Section">
174 <displayName lang="en">books</displayName>
175 </index>
176 <format>
177 <gsf:template match="documentNode">
178 <td><gsf:link><gsf:metadata name="Title"/>(<gsf:metadata
179 name="Source"/>)</gsf:link></td>
180 </gsf:template>
181 </format>
182 </search>
183 <browse>
184 <classifier name="CL1" type="Hierarchy" content="Subject"
185 level="Document">
186 <option name="hfile" value="sub.txt"/>
187 <option name="sort" value="Title"/>
188 </classifier>
189 <classifier name="CL2" type="AZList" content="Title"
190 level="Document">
191 <displayName lang='en'>all titles</displayName>
192 <format>
193 <gsf:template match="classifierNode">
194 <td><gsf:link type="classifier"><gsf:metadata name="Title"/>
195 </gsf:link></td>
196 </gsf:template>
197 </format>
198 </classifier>
199 <classifier name="CL3" type="List" content="Keyword"
200 level="Document">
201 <format>
202 <gsf:template match="documentNode"><td><gsf:link>
203 <gsf:metadata name="Keyword"/></gsf:link></td></gsf:template>
204 </format>
205 </classifier>
206 <classifier type="Phind" content="text" level="Section"/>
207 </browse>
208</collectionConfig>
209\end{verbatim}\end{gsc}
210\caption{Sample collectionConfig.xml file}
211\label{fig:collconfig}
212\end{figure}
213
The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in UTF-8.
The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like.
The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.
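
To make the language handling of \gst{<metadataList>} concrete, here is a small Python sketch that picks a metadata value for a requested language. The lookup order (requested language, then a fallback language, then an entry with no \gst{lang} attribute) is an assumption for illustration only; the manual does not specify how Greenstone3 resolves a missing translation. The function name \gst{collection\_metadata} is also hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened collectionConfig.xml containing only the metadata list.
COLLECTION_CONFIG = """
<collectionConfig>
  <metadataList>
    <metadata name="colName" lang="en">greenstone mgpp demo</metadata>
    <metadata name="colDescription" lang="en">A demonstration collection.</metadata>
    <metadata name="colDescription" lang="fr">Une collection pour demonstration.</metadata>
    <metadata name="colIcon">mgppdemo.gif</metadata>
  </metadataList>
</collectionConfig>
"""


def collection_metadata(xml_text, name, lang, fallback="en"):
    """Return the value of metadata `name`, preferring language `lang`.

    Assumed lookup order: requested language, fallback language,
    then an entry that carries no lang attribute at all.
    """
    root = ET.fromstring(xml_text)
    candidates = [
        m for m in root.findall("metadataList/metadata")
        if m.get("name") == name
    ]
    for wanted in (lang, fallback, None):
        for m in candidates:
            if m.get("lang") == wanted:
                return (m.text or "").strip()
    return None


french_description = collection_metadata(COLLECTION_CONFIG, "colDescription", "fr")
french_name = collection_metadata(COLLECTION_CONFIG, "colName", "fr")
```

With this order, a French request for \gst{colName} falls back to the English value because no French name is given.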
217
There is also a need for a description of how documents should be displayed: for example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}.
219
220\subsection{Building configuration file}\label{sec:buildconfig}
221
222The file \gst{buildConfig.xml} contains the metadata and other information about the collection that can
223be determined automatically when building the collection, such as the number of
224documents it contains. It also includes a list of serviceRack classes that are
225required at runtime to provide the services that have been built into the
226collection. The serviceRack names are Java classes that are loaded
227dynamically at runtime. Any information inside the serviceRack element is
228specific to that service---there is no set format. Figure~\ref{fig:buildconfig} shows an example. This config file specifies that the collection should load up 3 ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration.
229
230
231\begin{figure}
232\begin{gsc}\begin{verbatim}
233<buildConfig xmlns:gsf="www.greenstone.org/format" >
234 <metadataList>
235 <metadata name="numDocs">11</metadata>
236 <metadata name="documentMetadata"><element name="Title"/>
237 <element name="Subject"/><element name="Organization"/>
238 <element name="URL"/></metadata>
239 </metadataList>
240 <serviceRackList>
241 <serviceRack name="GS2MGPPRetrieve">
242 <defaultLevel name="Section"/>
243 <levelList>
244 <level name="Document"/>
245 <level name="Section"/>
246 </levelList>
247 <classifierList>
248 <classifier name="CL1" content="Subject"
249 documentInterleave="true" orientation='vertical'/>
250 <classifier name="CL2" content="Title"
251 documentInterleave="false" orientation='horizontal'/>
252 <classifier name="CL4" content="Organisation"
253 documentInterleave="true" orientation='vertical'/>
254 <classifier name="CL5" content="Keyword"
255 documentInterleave="true" orientation='vertical'/>
256 </classifierList>
257 </serviceRack>
258 <serviceRack name="GS2MGPPSearch">
259 <defaultIndex name="tt"/>
260 <defaultLevel name="Section"/>
261 <levelList>
262 <level name="Document"/>
263 <level name="Section"/>
264 </levelList>
265 <indexList>
266 <index name="tt"/>
267 <index name="t0"/>
268 </indexList>
269 <fieldList>
270 <field shortname="TX" name="TextOnly"/>
271 <field shortname="SU" name="Subject"/>
272 <field shortname="TI" name="Title"/>
273 </fieldList>
274 </serviceRack>
275 <serviceRack name="PhindPhraseBrowse"/>
276 </serviceRackList>
277</buildConfig>
278\end{verbatim}\end{gsc}
279\caption{Sample buildConfig.xml file}
280\label{fig:buildconfig}
281\end{figure}
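
The way a collection could turn the \gst{<serviceRackList>} into configured objects can be sketched as follows. This is a Python illustration, not the Greenstone3 implementation (which loads Java classes dynamically): the registry \gst{RACK\_CLASSES}, the \gst{configure} method signature and the attributes stored on each object are assumptions. Only the element and class names are taken from Figure~\ref{fig:buildconfig}.

```python
import xml.etree.ElementTree as ET

# A shortened buildConfig.xml with two of the ServiceRacks from the figure.
BUILD_CONFIG = """
<buildConfig>
  <serviceRackList>
    <serviceRack name="GS2MGPPSearch">
      <defaultIndex name="tt"/>
      <indexList><index name="tt"/><index name="t0"/></indexList>
    </serviceRack>
    <serviceRack name="PhindPhraseBrowse"/>
  </serviceRackList>
</buildConfig>
"""


class GS2MGPPSearch:
    """Stand-in for a search ServiceRack; reads its own settings."""

    def configure(self, element):
        self.default_index = element.find("defaultIndex").get("name")
        self.indexes = [i.get("name") for i in element.findall("indexList/index")]


class PhindPhraseBrowse:
    """Stand-in for a ServiceRack that needs no configuration."""

    def configure(self, element):
        pass


# Hypothetical registry standing in for dynamic class loading by name.
RACK_CLASSES = {cls.__name__: cls for cls in (GS2MGPPSearch, PhindPhraseBrowse)}


def load_service_racks(xml_text):
    """Create one object per <serviceRack> and hand it its own element."""
    racks = {}
    for element in ET.fromstring(xml_text).findall("serviceRackList/serviceRack"):
        name = element.get("name")
        rack = RACK_CLASSES[name]()
        rack.configure(element)  # the element content is rack-specific
        racks[name] = rack
    return racks


racks = load_service_racks(BUILD_CONFIG)
```

Because each object interprets its own element, the collection does not need to know the format of any particular ServiceRack's configuration.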
282
283
284\subsection{Start up configuration}\label{sec:startup-config}
285
286We use the Tomcat web server, which operates either stand-alone in a test mode
287or in conjunction with the Apache web server. The Greenstone LibraryServlet
288class is loaded by Tomcat and the servlet's \gst{init()} method is called. Each time a
289\gst{get/put/post} (etc.) is used, a new thread is started and
290\gst{doGet()/doPut()/doPost()} (etc.) is called.
291
292The \gst{init()} method creates a new Receptionist and a new
293MessageRouter. The appropriate system variables are set in each (interface
294name, site name, etc.) and then \gst{configure()} is called. A MessageRouter
295reference is given to the Receptionist. The servlet then communicates only with
296the Receptionist, not with the MessageRouter.
297
The Receptionist loads up all the different Action classes. A
static list is used initially, and other Actions may be loaded on the fly as needed. Actions are added to a map, with short names for keys. For example, the QueryAction is added with key \gst{q}. The Actions are passed the MessageRouter reference too.
300
301The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. This
302lists the ServiceRack and ServiceCluster classes that need to be loaded and any sites that need
303to be connected to.
It has a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}).
Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList.
For each site specified, the MessageRouter creates a Communicator object of the appropriate type. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the map with its site name as a key. The site's collections, services and clusters will also be added into the static lists.
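
The relationship between ServiceRacks and the module map can be sketched in a few lines of Python. This is only an illustration of the idea that every service name points at the ServiceRack that provides it. The class \gst{ServiceRackSketch}, the method \gst{get\_services} and the assignment of particular services to particular racks are assumptions, not the real Java API.

```python
class ServiceRackSketch:
    """Hypothetical stand-in for a ServiceRack that offers named services."""

    def __init__(self, name, services):
        self.name = name
        self._services = list(services)

    def get_services(self):
        return list(self._services)


def build_module_map(racks):
    """Register every service name against the rack that provides it."""
    module_map = {}    # service name -> ServiceRack object, used for routing
    service_list = []  # names reported in response to a describe request
    for rack in racks:
        for service in rack.get_services():
            module_map[service] = rack
            service_list.append(service)
    return module_map, service_list


# Which services each rack offers is assumed for this example.
search = ServiceRackSketch("GS2MGPPSearch", ["TextQuery"])
retrieve = ServiceRackSketch("GS2MGPPRetrieve", ["DocRetrieve", "MetadataRetrieve"])
module_map, service_list = build_module_map([search, retrieve])
```

After this step a message addressed to \gst{MetadataRetrieve} can be routed without the sender knowing which ServiceRack implements it.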
308
The MessageRouter also looks inside the site's \gst{collect} directory and loads up a Collection object for each valid collection found.
310
The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
files, determines the metadata, and loads ServiceRack classes based on the
names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The contents of \gst{collectionConfig.xml} are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file.
Collection objects are added to the module map with their name as a key, and a collection element is also added into the collectionList XML.
315
316\subsection{Run-time (re)configuration}\label{sec:runtime-config}
317
The startup configuration reads in the various config files and loads up quite a lot of XML into memory. This avoids having to read in files all the time. However, it also means that any changes to these files have no effect on the running system. So some run-time reconfiguration options are provided.
319
320Currently there are commands to reconfigure the entire site---i.e. the MessageRouter repeats the whole of its startup initialisation.
321
322***TODO***
323whats available, whats not. show URLS, refer to system messages in next section
324
325\section{System messages}\label{sec:messages}
326
327for each type of message, show the basic elements, then some example messages.
328Lists must only have the same elements in them.
329
330Once the system is up and running (the configuration
331process described in Section~\ref{sec:startup-config} has been carried out), it is passing messages back and forth. All modules communicate via message passing.
332
333First, we look at how messages originate, and how they flow in the system. Then, we examine the basic message
334format, and look at the different types of messages.
335
336\subsection{Message flow}
337
338\subsection{Basic format}
339
340All messages are enclosed in
341\begin{quote}\begin{gsc}\begin{verbatim}
342<message>
343\end{verbatim}\end{gsc}\end{quote}
Messages contain either \gst{<request>} or \gst{<response>} elements---a single message may contain multiple requests. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='xx'}.
345The language attribute is used by the XSLT to determine the language currently
346being used by the user interface. Virtually all messages contain text strings,
347and services use this attribute to return strings in the appropriate language.
348
349There are two different styles of messaging, explained in the two subsections
350below. The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system. The response is a page of data, typically in HTML. The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message.
351
352This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
353
354\subsection{cgi-type messages}\label{sec:cgi}
355
356Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a
357Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa}
358(subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for
359the page action, and is new for other actions. For example, a text query could
360be encoded as \gst{a=q \& sa=text\/}.} All other arguments are treated as
361parameters.
362
363Here is the XML representation of the arguments:
364
365\begin{quote}\begin{gsc}\begin{verbatim}
366<request type='cgi' action='a-arg-value' subaction='sa-arg-value'
367 lang='en' output='html'>
368 <paramList>
369 <param name='xx' value='yyy'/>
370 <param name=...
371 </paramList>
372</request>
373\end{verbatim}\end{gsc}\end{quote}
374The receptionist routes the message to the appropriate action. The output
375field is used to indicate what type of output to return. The actions do not
376return responses in the normal format; instead they return a page of
377information, expressed by default in HTML. Alternative formats could be XML or WML.
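
The mapping from URL arguments to this request format can be sketched in Python as follows. The argument names \gst{a}, \gst{sa}, \gst{l} and \gst{o} come from Table~\ref{tab:args}. The default values used when an argument is missing, and the function name \gst{cgi\_request\_from\_query}, are assumptions for illustration and do not describe the actual servlet code.

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET


def cgi_request_from_query(query, default_lang="en", default_output="html"):
    """Build a <message> holding one cgi-type <request> from a query string."""
    args = dict(parse_qsl(query))
    # a, sa, l and o become attributes; the defaults here are assumed.
    request = ET.Element("request", {
        "type": "cgi",
        "action": args.pop("a", ""),
        "subaction": args.pop("sa", ""),
        "lang": args.pop("l", default_lang),
        "output": args.pop("o", default_output),
    })
    # Every remaining argument is passed through as a parameter.
    param_list = ET.SubElement(request, "paramList")
    for name, value in args.items():
        ET.SubElement(param_list, "param", {"name": name, "value": value})
    message = ET.Element("message")
    message.append(request)
    return message


message = cgi_request_from_query("a=q&s=TextQuery&c=demo&rt=r&q=snail&l=fr")
xml_text = ET.tostring(message, encoding="unicode")
```

Note that the \gst{q} parameter (the query string) is distinct from the value \gst{q} of the \gst{a} argument, so both survive the conversion.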
378
379The LibraryServlet class communicates with the Receptionist, which is the entry
380point into the system. Future GUIs could communicate either with the
381Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
382
383The cgi arguments used currently are shown in Table~\ref{tab:args}.
Other arguments can be specified by particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
385
386\begin{table}
387\center{\footnotesize
388\begin{tabular}{llll}
389\hline
390\bf Argument & \bf Meaning &\bf Typical values \\
391\hline
392a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
393& & s (system)\\
394sa & subaction & home, about (page action)\\
395c & collection or & demo, build \\
396& service cluster \\
397s & service name & TextQuery, ImportCollection \\
398rt & request type & d (display), r (request), s (status) \\
399ro & request only & 0 or 1 - if set to one, the request is carried out \\
400& & but no processing of the results is done \\
401& & currently only used in process actions \\
402o & output type & xml, html, wml \\
403l & language & en, fr, zh ...\\
404d & document id & HASHxxx \\
405r & resource id & ???\\
406pid & process handle & an integer identifying a particular process request \\
407\hline
408\end{tabular}}
409\caption{Generic arguments that can appear in a Greenstone URL}
410\label{tab:args}
411\end{table}
412
413Here is an example message that retrieves the home page in French:
414\begin{quote}\begin{gsc}\begin{verbatim}
415<message>
416 <request lang='fr' type='cgi' action='p' subaction='home'
417 output='html'/>
418</message>
419\end{verbatim}\end{gsc}\end{quote}
420
421This message represents a text query:
422\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' type='cgi' action='q' output='html'>
    <paramList>
      <param name='s' value='TextQuery'/>
      <param name='c' value='demo'/>
      <param name='rt' value='r'/>
      <!-- the rest are the service specific params -->
      <param name='ca' value='0'/>  <!-- casefold -->
      <param name='st' value='1'/>  <!-- stem -->
      <param name='m' value='10'/>  <!-- maxdocs -->
      <param name='q' value='snail'/>  <!-- query string -->
    </paramList>
  </request>
</message>
436\end{verbatim}\end{gsc}\end{quote}
437
\subsection{Module to module messages}
439
In Greenstone3's modular architecture, messages are used extensively to pass
441information from one module to another, for example from an Action to the
442MessageRouter module, and from that module to a service module. Requests have
443a \gst{to} attribute and responses have \gst{from}. These are addresses used
444by routing modules. For example \gst{to='site1/site2/demo/TextQuery'} routes a
445message to a MessageRouter (\gst{site1}), from there to another MessageRouter
446(\gst{site2}), from there to a collection (\gst{demo}), and from there to a
447particular service (\gst{TextQuery}).
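
The hop-by-hop interpretation of the \gst{to} attribute can be sketched in Python. Each module strips the first component of the address, looks it up among the modules it knows, and forwards the request with the remainder. An empty address is answered by the module that receives it. The class \gst{Module}, its \gst{process} method and the returned strings are assumptions made for this illustration, and the name \gst{local} for the first MessageRouter is hypothetical.

```python
class Module:
    """A node in the routing tree: a MessageRouter, collection or service."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = {child.name: child for child in children}

    def process(self, to, trace):
        """Answer the request here, or forward it one hop along `to`."""
        trace.append(self.name)
        if not to:
            return f"answered by {self.name}"
        # Split off the next hop; the remainder travels with the request.
        head, _, rest = to.partition("/")
        if head not in self.children:
            return f"{self.name}: unknown module '{head}'"
        return self.children[head].process(rest, trace)


# Chain matching the example address in the text.
text_query = Module("TextQuery")
demo = Module("demo", [text_query])
site2 = Module("site2", [demo])
site1 = Module("site1", [site2])
local = Module("local", [site1])

trace = []
result = local.process("site1/site2/demo/TextQuery", trace)
```

In this sketch \gst{trace} records every module the request passed through, which makes the routing order easy to inspect.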
448
449Each request asks for a description of a single module, or requests a particular service. Unlike the first type of message which requests pre-defined types of pages, these internal requests can ask for any functionality available in the system.
450
451\subsection{'describe'-type messages}\label{sec:describe}
452The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
453\begin{quote}\begin{gsc}\begin{verbatim}
454<message>
455 <request lang='en' type='describe' to=''/>
456</message>
457\end{verbatim}\end{gsc}\end{quote}
458If the \gst{to} field is empty, the request is answered by the first module that it is passed to.
459An example response from a MessageRouter might look like this:
460\begin{quote}\begin{gsc}\begin{verbatim}
461<message>
462 <response lang='en' type='describe'>
463 <serviceList>
464 <service name='CrossCollectionSearch' type='query' />
465 </serviceList>
466 <siteList>
467 <site name='org.greenstone.gsdl1'
468 address='http://localhost:8080/soap/servlet/rpcrouter'
469 type='soap' />
470 </siteList>
471 <collectionList>
472 <collection name='org.greenstone.gsdl1/
473 org.greenstone.gsdl2/fao' />
474 <collection name='org.greenstone.gsdl1/demo' />
475 <collection name='org.greenstone.gsdl1/fao' />
476 <collection name='myfiles' />
477 </collectionList>
478 </response>
479</message>
480\end{verbatim}\end{gsc}\end{quote}
481This MessageRouter has one site-wide service, a cross-collection searching service. It
482communicates with one site, \gst{org.greenstone.gsdl1}. It is aware of four
483collections. One of these, \gst{myfiles}, belongs to it; the other three are
484available through the external site. One of those collections is actually from
485a further external site.
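
A client can recover this grouping from the collection names alone, since a remote collection's name is prefixed by the path of sites it is reached through. The Python sketch below groups the collections of a describe response by that prefix. The function name \gst{group\_collections} and the label \gst{(local)} are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# The collectionList part of the sample MessageRouter response.
RESPONSE = """
<message>
  <response lang='en' type='describe'>
    <collectionList>
      <collection name='org.greenstone.gsdl1/org.greenstone.gsdl2/fao'/>
      <collection name='org.greenstone.gsdl1/demo'/>
      <collection name='org.greenstone.gsdl1/fao'/>
      <collection name='myfiles'/>
    </collectionList>
  </response>
</message>
"""


def group_collections(xml_text):
    """Group collection names by the site path that precedes them."""
    groups = {}
    root = ET.fromstring(xml_text)
    for collection in root.findall("response/collectionList/collection"):
        # Everything before the last '/' is the route to the owning site.
        site_path, _, name = collection.get("name").rpartition("/")
        groups.setdefault(site_path or "(local)", []).append(name)
    return groups


groups = group_collections(RESPONSE)
```

Collections with an empty prefix belong to the responding MessageRouter itself; a prefix with two components indicates a collection reached through a further external site.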
486
487It is possible to ask just for a specific part of the information provided by a
488describe request, rather than the whole message. For example, these two
489messages get the \gst{collectionList} and the \gst{siteList} respectively:
490\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' type='describe' to='' info='collectionList'/>
</message>

<message>
  <request lang='en' type='describe' to='' info='siteList'/>
</message>
498\end{verbatim}\end{gsc}\end{quote}
499When a collection is asked to describe itself, what is returned is all of the
500collection specific metadata and a list of services. For example, here is such
501a message, along with a sample response.
502
503\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' type='describe' to='demo'/>
</message>

<message>
  <response lang='en' type='describe' from='demo'>
    <collection name='demo'>
      <serviceList>
        <service name='TextQuery' type='query'/>
        <service name='DocRetrieve' type='query'/>
        <service name='MetadataRetrieve' type='query'/>
      </serviceList>
      <metadataList>
        <metadata name='numDocs'>321</metadata>
        <metadata name='numSections'>5532</metadata>
        <metadata name='title'>The demo collection</metadata>
        <metadata name='aboutText'>This is a demo collection.
        </metadata>
      </metadataList>
    </collection>
  </response>
</message>
526\end{verbatim}\end{gsc}\end{quote}
527A \gst{describe} request sent to a service returns a list of parameters that
528the service accepts, and describes the content type for the request and
529response.
530
531Parameters have the following format:
532\begin{quote}\begin{gsc}\begin{verbatim}
<param name='xxx' type='integer|boolean|string' default='yyy'/>
<param name='xxx' type='enum_single|enum_multi' default='aa'>
  <option name='aa'/><option name='bb'/>...
</param>
<param name='xxx' type='multi' occurs='4'>
  <param .../>
  <param .../>
</param>
541\end{verbatim}\end{gsc}\end{quote}
542If no default is specified, the parameter is assumed to be mandatory.
543Here are some examples of parameters:
544\begin{quote}\begin{gsc}\begin{verbatim}
<param name='Case' type='boolean' default='0'/>

<param name='MaxDocs' type='integer' default='50'/>

<param name='Index' type='enum' default='dtx'>
  <option name='dtx'/>
  <option name='stt'/>
  <option name='stx'/>
</param>

<!-- this one is for the text box and field list for the
simple field query-->
<param name='simple' type='multi' occurs='4'>
  <param name='fqv' type='string'/>
  <param name='fqf' type='enum_single'>
    <option name='TI'/><option name='AU'/><option name='OR'/>
  </param>
</param>

564\end{verbatim}\end{gsc}\end{quote}
565Here is a message, along with a sample response.
566\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' type='describe' to='demo/TextQuery'/>
</message>

<message>
  <response lang='en' type='describe' from='demo/TextQuery'>
    <service name='TextQuery' type='query'>
      <paramList>
        <param name='matchDocs' type='integer' default='50'/>
        <param name='case' type='boolean' default='1'/>
        <param name='index' type='enum' default='tt'>
          <option name='tt'/>
          <option name='t0'/>
        </param>
      </paramList>
    </service>
  </response>
</message>
584\end{verbatim}\end{gsc}\end{quote}
585
586So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services.
587
588\subsection{'system'-type messages}
589``System'' requests are used to tell the MessageRouter or a Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
590
591So far, we have \gst{activate} and \gst{deactivate} configure requests.
592Some examples are as follows.
593\begin{quote}\begin{gsc}\begin{verbatim}
594<message><request type='configure' to=''>
595<configure action='deactivate' type='collection' name='demo'/>
596</request></message>
597
598<message><request type='configure' to=''>
599<configure action='activate' type='collection' name='demo'/>
600</request></message>
601
602<message><request type='configure' to=''>
603<configure action='activate' type='serviceRack'
604 name='TranslationServices'/>
605</request></message>
606\end{verbatim}\end{gsc}\end{quote}
607
608The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first.
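
The activate and deactivate behaviour described above amounts to a small registry in which activation always replaces any existing module of the same name. The following Java sketch illustrates that logic only. \gst{ModuleRegistry} and its methods are hypothetical names, not the MessageRouter's actual interface, and the sketch leaves out the collection list XML and the re-reading of configuration files.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the MessageRouter's module list.
// All names here are hypothetical.
class ModuleRegistry {
    private final Map<String, Object> modules =
        new LinkedHashMap<String, Object>();

    // Removes the named module; returns true if one was present.
    boolean deactivate(String name) {
        return modules.remove(name) != null;
    }

    // (Re)activates a module: an existing module of that name is
    // deactivated first. Returns true if an old module was replaced.
    boolean activate(String name, Object freshModule) {
        boolean replaced = deactivate(name);
        modules.put(name, freshModule);
        return replaced;
    }

    boolean isActive(String name) {
        return modules.containsKey(name);
    }

    public static void main(String[] args) {
        ModuleRegistry registry = new ModuleRegistry();
        System.out.println(registry.activate("demo", new Object())); // false
        System.out.println(registry.activate("demo", new Object())); // true
        registry.deactivate("demo");
        System.out.println(registry.isActive("demo"));               // false
    }
}
\end{verbatim}\end{gsc}\end{quote}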
609
610The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:
611\begin{quote}\begin{gsc}\begin{verbatim}
612<message><response from='' type='configure'>
613 <status>demo collection activated</status>
614</response></message>
615\end{verbatim}\end{gsc}\end{quote}
616\footnote{this format not properly defined yet}
617
618Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
619
620\subsection{'process'-type messages}
621
622divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse...
623show basic structure, then more detailed format for each subtype
624
The main type of request in the system is a request for a service. There are several types of service: query, browse, retrieve, process, and applet. Query services perform some kind of search and return a list of documents. Retrieve services return those documents, metadata about them, or other resources. Browse services are for browsing lists or hierarchies of documents. Process services are those where the request is for a command to be run: a status code is returned immediately and, if the command has not finished, an update of the status can be requested later. Applet services are those that run an applet.
626
627 Other possibilities include transform, enrich, extract, accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
628
629The basic structure of a service request is as follows:
630\begin{quote}\begin{gsc}\begin{verbatim}
631<message>
632 <request lang='en' type='query' to='demo/TextQuery'>
633 <paramList/>
634 other elements...
635 </request>
636</message>
637\end{verbatim}\end{gsc}\end{quote}
638
The parameters are name--value pairs corresponding to the parameters specified in the service description that is sent in response to a describe request.
640
641\begin{quote}\begin{gsc}\begin{verbatim}
642<param name='case' value='1'/>
643<param name='maxDocs' value='34'/>
644<param name='index' value='dtx'/>
645\end{verbatim}\end{gsc}\end{quote}
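
A complete service request can be assembled from such name--value pairs. The following Java sketch builds the message as a string, following the layout of the basic request structure above. The class \gst{ServiceRequest} and its methods are hypothetical and not part of Greenstone3, and a real implementation would more likely build a DOM tree than concatenate strings.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the message layout is taken from the
// examples in this section.
class ServiceRequest {

    // Escapes characters that are unsafe in a single-quoted attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("'", "&apos;");
    }

    static String build(String type, String to,
                        Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        sb.append("<message>\n");
        sb.append("  <request lang='en' type='").append(escape(type))
          .append("' to='").append(escape(to)).append("'>\n");
        sb.append("    <paramList>\n");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append("      <param name='").append(escape(e.getKey()))
              .append("' value='").append(escape(e.getValue()))
              .append("'/>\n");
        }
        sb.append("    </paramList>\n  </request>\n</message>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("case", "1");
        params.put("maxDocs", "34");
        params.put("index", "dtx");
        System.out.print(build("query", "demo/TextQuery", params));
    }
}
\end{verbatim}\end{gsc}\end{quote}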
646
647Some requests have other content---for document retrieval, this would be a list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document.
648
649Responses vary depending on the type of request.
650
651\subsubsection{'query'-type services}
652Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
653
654The following shows some example query requests and their responses.
655
656Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
657\begin{quote}\begin{gsc}\begin{verbatim}
658<message>
659 <request lang='en' to="mgppdemo/TextQuery" type="process">
660 <paramList>
661 <param name="maxDocs" value="10"/>
662 <param name="queryLevel" value="Section"/>
663 <param name="stem" value="1"/>
664 <param name="matchMode" value="some"/>
665 <param name="sortBy" value="natural"/>
666 <param name="index" value="t0"/>
667 <param name="case" value="0"/>
668 <param name="query" value="snail"/>
669 </paramList>
670 </request>
671</message>
672\end{verbatim}\end{gsc}\end{quote}
673
674\begin{quote}\begin{gsc}\begin{verbatim}
675<message>
676 <response lang='en' from="mgppdemo/TextQuery" type="query">
677 <documentList>
678 <document name="HASH010f073f22033181e206d3b7"/>
679 <document name="HASH010f073f22033181e206d3b7.2"/>
680 <document name="HASHac0a04dd14571c60d7fbfd"/>
681 </documentList>
682 </response>
683</message>
684\end{verbatim}\end{gsc}\end{quote}
685
686\subsubsection{'retrieve'-type services}
687Give me the Title metadata for these documents:
688\begin{quote}\begin{gsc}\begin{verbatim}
<message>
  <request lang='en' to="mgppdemo/MetadataRetrieve"
    type="retrieve">
    <content>
      <documentList>
        <document name="HASH010f073f22033181e206d3b7"/>
        <document name="HASH010f073f22033181e206d3b7.2"/>
        <document name="HASHac0a04dd14571c60d7fbfd"/>
      </documentList>
      <metadataList>
        <metadata name="Title"/>
      </metadataList>
    </content>
  </request>
</message>
703\end{verbatim}\end{gsc}\end{quote}
704
705\begin{quote}\begin{gsc}\begin{verbatim}
706<message>
707 <response lang='en' from="mgppdemo/MetadataRetrieve"
708 type="retrieve">
709 <content>
710 <documentList>
711 <document name="HASH010f073f22033181e206d3b7">
712 <metadataList>
713 <metadata name="Title">Farming snails 1:
714Learning about snails; Building a pen; Food and shelter plants
715 </metadata>
716 </metadataList>
717 </document>
718 <document name="HASH010f073f22033181e206d3b7.2">
719 <metadataList>
720 <metadata name="Title">Learning about snails
721 </metadata>
722 </metadataList>
723 </document>
724 <document name="HASHac0a04dd14571c60d7fbfd">
725 <metadataList>
726 <metadata name="Title">Farming snails 2:
727Choosing snails; Care and harvesting; Further improvement
728 </metadata>
729 </metadataList>
730 </document>
731 </documentList>
732 </content>
733 </response>
734</message>
735\end{verbatim}\end{gsc}\end{quote}
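
A client of the MetadataRetrieve service typically wants the titles as a simple lookup table. The following Java sketch shows one way to read them out of a response shaped like the one above, using the standard JAXP DOM parser. \gst{TitleExtractor} is a hypothetical class, not part of Greenstone3, and it assumes that each \gst{<document>} carries at most one \gst{Title} value.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the Greenstone3 code base.
class TitleExtractor {

    // Maps each document name to its Title metadata value.
    static Map<String, String> titles(String responseXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(responseXml)))
                .getDocumentElement();
            Map<String, String> result =
                new LinkedHashMap<String, String>();
            NodeList docs = root.getElementsByTagName("document");
            for (int i = 0; i < docs.getLength(); i++) {
                Element doc = (Element) docs.item(i);
                NodeList meta = doc.getElementsByTagName("metadata");
                for (int j = 0; j < meta.getLength(); j++) {
                    Element m = (Element) meta.item(j);
                    if ("Title".equals(m.getAttribute("name"))) {
                        // Collapse the line breaks used for layout.
                        String text = m.getTextContent().trim();
                        result.put(doc.getAttribute("name"),
                                   text.replaceAll("\\s+", " "));
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response><content><documentList>"
            + "<document name='HASH01'><metadataList>"
            + "<metadata name='Title'>Learning about snails\n"
            + "</metadata></metadataList></document>"
            + "</documentList></content></response>";
        System.out.println(titles(xml));
        // prints {HASH01=Learning about snails}
    }
}
\end{verbatim}\end{gsc}\end{quote}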
736
737Give me the text for this document:
738\begin{quote}\begin{gsc}\begin{verbatim}
739<message>
740 <request lang='en' to="mgppdemo/DocumentRetrieve"
741 type="retrieve">
742 <content>
743 <documentList>
744 <document name="HASH010f073f22033181e206d3b7.2"/>
745 </documentList>
746 </content>
747 </request>
748</message>
749\end{verbatim}\end{gsc}\end{quote}
750
751\begin{quote}\begin{gsc}\begin{verbatim}
752<message>
753 <response lang='en' from="mgppdemo/DocumentRetrieve"
754 type="retrieve">
755 <content>
756 <document name="HASH010f073f22033181e206d3b7.2">
757 <content>
758&lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
759&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;11. To farm snails is not hard; however,
760it is quite different from keeping chickens or ducks or from growing crops
761such as maize, rice, cassava or groundnuts.&lt;/P&gt;
762&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
763&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;12. Since farming snails is so different
764from other kinds of farming, you will have to learn a lot of new things.
765&lt;/P&gt;....
766 </content>
767 </document>
768 </content>
769 </response>
770</message>
771\end{verbatim}\end{gsc}\end{quote}
772
773\subsubsection{'browse'-type services}
774
775\subsubsection{'process'-type services}
776Build requests are not a request for data---they are a request for some action to be carried out, for example, create or import or build or activate a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back after a successful start of the command. The status may be polled by the requester to see how the process is going.
777
Build requests generally do not need any content; they just have a parameter list.\footnote{or is the collection the content?} As with any service, the parameters used by the service can be obtained by a describe request to that service.
779
780Some example requests (note that the build services are grouped into a service cluster called 'build', hence the addresses all begin with 'build/'):
781
782\begin{quote}\begin{gsc}\begin{verbatim}
783<message>
784 <request lang='en' type='process' to='build/NewCollection'>
785 <paramList>
786 <param name='creator' value='[email protected]'/>
787 <param name='collName' value='the demo collection'/>
788 <param name='collShortName' value='demo'/>
    </paramList>
790 </request>
791</message>
792
793<message>
794 <request lang='en' type='process' to='build/ImportCollection'>
795 <paramList>
796 <param name='collection' value='demo'/>
    </paramList>
798 </request>
799</message>
800\end{verbatim}\end{gsc}\end{quote}
801
\subsubsection{'enrich'-type services}
803
804\subsection{'status'-type messages}
805
806
807\subsection{'format'-type messages}
808
809\subsection{'applet'-type services}
810
811\section{Page generation}\label{sec:pagegen}
812
813URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
814System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
815
816Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
817
818The basic page format is:
819\begin{quote}\begin{gsc}\begin{verbatim}
820<page>
821 <pageExtra>
822 <config/>
823 <display/>
824 </pageExtra>
825 <pageRequest/>
826 <pageResponse/>
827</page>
828\end{verbatim}\end{gsc}\end{quote}
829
There are four main elements in the page: config, display, request, and response. The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well} The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (e.g.\ library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are kept separate from the XSLT to allow for internationalization.
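
The transformation step can be tried out in isolation with the XSLT processor that ships with Java. The following sketch is illustrative only: \gst{PageTransform} is a hypothetical class, the stylesheet is a toy one rather than one of the Greenstone transforms, and the \gst{<collection>} element inside \gst{<pageResponse>} is an assumed example of gathered data. Only the surrounding page skeleton comes from the format above.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the Greenstone3 code base.
class PageTransform {

    // Applies the stylesheet to the page and returns the output text.
    static String transform(String pageXml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(pageXml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<page><pageExtra><config/><display/></pageExtra>"
            + "<pageRequest/>"
            + "<pageResponse><collection name='demo'/></pageResponse>"
            + "</page>";
        String xslt = "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/page'>"
            + "<h1><xsl:value-of"
            + " select='pageResponse/collection/@name'/></h1>"
            + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(page, xslt));
    }
}
\end{verbatim}\end{gsc}\end{quote}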
831
832The following subsections outline, for each action, what data is needed and what requests are generated to send to the system.
833
834
835Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are
836located in interfaces/default/transforms. Collections, sites and other interfaces
837can override these files by having their own copy of the appropriate
838files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
839interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
840***TODO*** describe a bit more??
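
The lookup order for XSLT files (collection, site, current interface, default interface) can be expressed as a list of candidate paths that is searched from the most specific to the least specific. The following Java sketch is not the Greenstone3 implementation. \gst{TransformLookup} is a hypothetical class, and the directory names, in particular a \gst{transforms} directory inside each collection, are assumptions based on the directory table later in this manual.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the directory layout is an assumption.
class TransformLookup {

    // Candidate locations for an XSLT file, most specific first.
    static List<String> candidates(String webHome, String site,
                                   String coll, String iface,
                                   String xsl) {
        String siteDir = webHome + "/sites/" + site;
        List<String> paths = new ArrayList<String>();
        paths.add(siteDir + "/collect/" + coll + "/transforms/" + xsl);
        paths.add(siteDir + "/transforms/" + xsl);
        paths.add(webHome + "/interfaces/" + iface + "/transforms/" + xsl);
        paths.add(webHome + "/interfaces/default/transforms/" + xsl);
        return paths;
    }

    // Returns the first candidate that exists, or null if none does.
    static String find(String webHome, String site, String coll,
                       String iface, String xsl) {
        for (String path : candidates(webHome, site, coll, iface, xsl)) {
            if (new File(path).isFile()) {
                return path;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String path : candidates("/path/to/gsdl3/web", "localsite",
                                      "demo", "default", "home.xsl")) {
            System.out.println(path);
        }
    }
}
\end{verbatim}\end{gsc}\end{quote}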
841
842\subsection{Internationalization}
843
844Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
845
846Language specific text strings are specified in resource bundle property files. These live in resources/java.
847
There is a properties file per class, and one per interface. At the moment, we have
\begin{bulletedlist}
\item \gst{GS2MGPPSearch.properties}, \gst{GS2MGPPRetrieve.properties} etc., for the service classes
\item \gst{interface\_default.properties}, for the default interface
\end{bulletedlist}

To add other languages, create e.g.\ \gst{GS2MGPPSearch\_fr.properties}.
856
The interface properties files are treated differently from the others. The action does not know which text strings a particular transform needs, so it reads them all from the properties file and puts them into an XML \gst{<display>} element; the XSLT then picks out the ones it needs.
The XSLT could perhaps fetch the strings from the resource bundle on the fly using Java extension elements: would this be better?

All other class-specific text strings are retrieved one by one as they are needed and added into the XML. For example, the names for query parameters are retrieved when the service description is created.
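
The fallback from a language-specific file to the base file can be illustrated without any files on the classpath. The following Java sketch uses \gst{java.util.Properties} with a defaults table to mimic what a resource bundle does when a string is missing from the \gst{\_fr} file. It is a stand-in, not the mechanism Greenstone3 uses, and the class \gst{DisplayStrings} and the keys \gst{query.submit} and \gst{query.maxdocs} are invented for the example.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.Properties;

// Hypothetical sketch; the keys and strings are invented.
class DisplayStrings {

    // Base strings, as they might appear in interface_default.properties.
    static Properties base() {
        Properties p = new Properties();
        p.setProperty("query.submit", "Search");
        p.setProperty("query.maxdocs", "Maximum documents");
        return p;
    }

    // French strings, as in interface_default_fr.properties. Anything
    // not overridden here falls back to the base table.
    static Properties french() {
        Properties p = new Properties(base());
        p.setProperty("query.submit", "Rechercher");
        return p;
    }

    public static void main(String[] args) {
        Properties fr = french();
        System.out.println(fr.getProperty("query.submit"));
        // prints Rechercher
        System.out.println(fr.getProperty("query.maxdocs"));
        // prints Maximum documents
    }
}
\end{verbatim}\end{gsc}\end{quote}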
861
862\subsection{Page action}
863
864Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page. The page is
865transformed using \gst{home.xsl}. For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata
866and a list of services, and the result is transformed using \gst{about.xsl}.
867
868
869\subsection{Query action}
870
Three query services have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. The query action handles them all in the same way.
For each page, the service description is requested from the service of the current collection (via a describe request). This is done every time the query page is
displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as the case and stem settings and the maximum number of documents to return. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request is sent to the TextQuery service, with all the parameters from the URL put into the parameter list. A list of document identifiers
is returned. A follow-up request is then sent to the MetadataRetrieve service of the collection: its content includes the list of
documents, with a request for their \gst{Title} metadata. The service description and query result are combined into a page of XML, which is
transformed using \gst{basicquery.xsl} to produce the HTML page.
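
The sequence of requests made by the query action can be summarised in a few lines of Java. This is a sketch of the control flow only: \gst{QueryPlan} is a hypothetical class, the strings it returns are informal labels rather than message syntax, and any \gst{rt} value other than \gst{d} is treated as a submitted form.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of the query action's control flow.
class QueryPlan {

    // Lists, in order, the requests needed for one query page.
    // Each entry has the form "<request type> <address>".
    static List<String> requests(String collection, String service,
                                 String rt) {
        List<String> plan = new ArrayList<String>();
        // Always needed: the form is built from the parameter list.
        plan.add("describe " + collection + "/" + service);
        if (!"d".equals(rt)) {
            // The form was submitted: run the query, then fetch titles.
            plan.add("query " + collection + "/" + service);
            plan.add("retrieve " + collection + "/MetadataRetrieve");
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(requests("demo", "TextQuery", "d"));
        // prints [describe demo/TextQuery]
        System.out.println(requests("demo", "TextQuery", "r").size());
        // prints 3
    }
}
\end{verbatim}\end{gsc}\end{quote}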
877
878\subsection{Applet action}
879
880There are two types of request to the applet action: \gst{a=a \& sa=d\/} and
881\gst{a=a \& sa=r\/}. The value \gst{sa=d\/} means ``display the applet.'' A
882\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. The transformation file \gst{applet.xsl} embeds this
883into the page, and the servlet returns the HTML.
884
885The value \gst{sa=r} signals a request from the applet. The result is returned
886directly to the applet code, in XML. The other parameters are sent to the
887service untransformed, and the result is passed directly back to the applet.
888Applet action can therefore work with any applet whose service understands the
889messages.
890
891Here are two examples of requests generated by the Applet action, along with their corresponding responses.
892
893The first request corresponds to the URL arguments \gst{a=a \&
894sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
895applet for the mgppdemo collection''.
896
897\begin{quote}\begin{gsc}\begin{verbatim}
898<message>
899 <request type='describe' to='mgppdemo/PhindApplet'/>
900</message>
901
902<message>
903 <response type='describe'>
904 <service name='PhindApplet' type='query'>
905 <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
906 jaxp.jar, xml-apis.jar'
907 CODE='org.greenstone.applet.phind.Phind.class'
908 CODEBASE='lib/java'
909 HEIGHT='400' WIDTH='500'>
910 <PARAM NAME='library' VALUE=''/>
911 <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
912 <PARAM NAME='collection' VALUE='mgppdemo' />
913 <PARAM NAME='classifier' VALUE='1' />
914 <PARAM NAME='orientation' VALUE='vertical' />
915 <PARAM NAME='depth' VALUE='2' />
916 <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
917 <PARAM NAME='backdrop' VALUE='interfaces/default/
918 images/phindbg1.jpg'/>
919 <PARAM NAME='fontsize' VALUE='10' />
920 <PARAM NAME='blocksize' VALUE='10' />
921 The Phind java applet.
922 </applet>
923 </service>
924 </response>
925</message>
926\end{verbatim}\end{gsc}\end{quote}
927
928The second request corresponds to the arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
929indicates a request to the service itself. The extra arguments (not a, sa, sn, c) are simply copied into the
930request as parameters. The response is in a form suitable for the applet, placed inside
931\gst{<appletData>} in a standard Greenstone message. AppletAction returns the
932contents of appletData to the browser, i.e. to the applet itself.
933
934\begin{quote}\begin{gsc}\begin{verbatim}
935<message>
936 <request type='query' to='mgppdemo/PhindApplet'>
937 <paramList>
938 <param name='pc' value='1'/>
939 <param name='pptext' value='health'/>
940 <param name='pfe' value='0'/>
941 <param name='ple' value='10'/>
942 <param name='pfd' value='0'/>
943 <param name='pld' value='10'/>
944 <param name='pfl' value='0'/>
945 <param name='pll' value='10'/>
946 </paramList>
947 </request>
948</message>
949
950<message>
951 <response type='query' from='mgppdemo/PhindApplet'>
952 <appletData>
953 <phindData df='9' ef='46' id='933' lf='15' tf='296'>
954 <expansionList end='10' length='46' start='0'>
955 <expansion df='4' id='8880' num='0' tf='59'>
956 <suffix> CARE</suffix>
957 </expansion>
958 ...
959 </expansionList>
960 <documentList end='10' length='9' start='0'>
961 <document freq='78' hash='HASH4632a8a51d33c47a75c559' num='0'>
962 <title>The Courier - N??159 - Sept- Oct 1996 Dossier Investing
963 in People Country Reports: Mali ; Western Samoa
964 </title>
965 </document>
966 ...
967 </documentList>
968 <thesaurusList end='10' length='15' start='0'>
969 <thesaurus df='7' id='12387' tf='15' type='RT'>
970 <phrase>PUBLIC HEALTH</phrase>
971 </thesaurus>...
972 </thesaurusList>
973 </phindData>
974 </appletData>
975 </response>
976</message>
977\end{verbatim}\end{gsc}\end{quote}
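
The way the extra arguments are copied into the request can be sketched as follows. \gst{AppletArgs} is a hypothetical Java class, not the AppletAction code. It assumes the arguments arrive as a plain query string, and it skips URL decoding for brevity.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Greenstone3 code base.
class AppletArgs {

    // Arguments that address the request rather than parameterise it.
    static final Set<String> ROUTING =
        new HashSet<String>(Arrays.asList("a", "sa", "sn", "c"));

    // Returns every argument except a, sa, sn and c, in URL order.
    static Map<String, String> serviceParams(String query) {
        Map<String, String> params =
            new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed pairs
            }
            String name = pair.substring(0, eq);
            if (!ROUTING.contains(name)) {
                params.put(name, pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "a=a&sa=r&sn=Phind&c=mgppdemo&pc=1&pptext=health";
        System.out.println(serviceParams(query));
        // prints {pc=1, pptext=health}
    }
}
\end{verbatim}\end{gsc}\end{quote}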
978
979Note that the applet HTML may need to know the name of the \gst{library}
980program. However, that name is chosen by the person who installed the software
981and will not necessarily be ``library''. To get around this, the applet can
982put a parameter called ``library'' into the applet data with a null value:
983\begin{quote}\begin{gsc}\begin{verbatim}
<PARAM NAME='library' VALUE=''/>
985\end{verbatim}\end{gsc}\end{quote}
986When the Applet action encounters this parameter it inserts the name of the
987current library servlet as its value.
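
A minimal Java sketch of this substitution follows. \gst{LibraryParam} is a hypothetical class. It works on the applet HTML as text and assumes the empty parameter is spelled exactly as shown above, whereas code working on the parsed XML would be more robust.
\begin{quote}\begin{gsc}\begin{verbatim}
// Hypothetical helper, not part of the Greenstone3 code base.
class LibraryParam {

    static final String EMPTY = "<PARAM NAME='library' VALUE=''/>";

    // Fills in the name of the current library servlet.
    static String fill(String appletHtml, String libraryName) {
        String filled =
            "<PARAM NAME='library' VALUE='" + libraryName + "'/>";
        return appletHtml.replace(EMPTY, filled);
    }

    public static void main(String[] args) {
        String html = "<applet>" + EMPTY + "</applet>";
        System.out.println(fill(html, "library1"));
        // prints <applet><PARAM NAME='library' VALUE='library1'/></applet>
    }
}
\end{verbatim}\end{gsc}\end{quote}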
988
989\subsection{Document action}
990
DocumentAction sends a request to the DocumentRetrieve service of the collection, asking for the text of the specified document. At this stage no additional information is obtained, but in future, metadata such as the Title and a
table of contents will be needed to improve the display.
993
994
995
996\section{Collection formation}
997
998
999Greenstone 2 compatible building has been implemented in gsdl3.
1000
Collection construction can be done through the web, using the build service cluster in localsite, by working through the required steps in sequence. So far, addDocument does not work, so documents need to be added manually to the import directory.
1002
You need to carry out the following steps, in order:
\begin{bulletedlist}
\item NewCollection
\item add documents to the import directory
\item ImportCollection
\item BuildCollection
\item ActivateCollection
\end{bulletedlist}
1009
1010If you want anything other than the default for the config file, you need to add it by hand - there is currently no ConfigureCollection service which would enable you to do this.
1011
1012Collection building can also be done on the command line:
1013
1014\gst{ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name>}
1015
For example:
1017
1018\gst{ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol}
1019
The options are passed on to the underlying script; there is no good help message yet.

The import and build modes use the Greenstone2 scripts \gst{import.pl} and \gst{buildcol.pl}, so you can specify any of their options.
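
When building is scripted, the four modes are run one after the other. The following Java sketch assembles the argument list for each mode according to the usage line above, in a form that could be handed to \gst{java.lang.ProcessBuilder}. \gst{ConstructCommands} is a hypothetical helper, the site path is a placeholder, and the sketch does not run anything itself.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Greenstone3 code base.
class ConstructCommands {

    // Builds: ConstructCollection -site <site-path> -mode <mode>
    //         [options] <coll-name>
    static List<String> command(String sitePath, String mode,
                                String collName, String... options) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("ConstructCollection");
        cmd.add("-site");
        cmd.add(sitePath);
        cmd.add("-mode");
        cmd.add(mode);
        cmd.addAll(Arrays.asList(options));
        cmd.add(collName);
        return cmd;
    }

    public static void main(String[] args) {
        String site = "/path/to/gsdl3/web/sites/localsite"; // placeholder
        // Documents must be copied into the import directory by hand
        // between the "new" and "import" steps.
        String[] modes = {"new", "import", "build", "activate"};
        for (String mode : modes) {
            System.out.println(
                String.join(" ", command(site, mode, "testcol")));
        }
    }
}
\end{verbatim}\end{gsc}\end{quote}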
1023
The building code is in \gst{src/java/org/greenstone/gsdl3/build}.

CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses the Greenstone2 Perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor, and they will be notified of events.
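
The listener arrangement follows the usual observer pattern, which the following Java sketch illustrates. The classes here are simplified stand-ins with the same roles as the ones named above: the method names \gst{addListener} and \gst{eventOccurred}, the \gst{message} field, and the event texts are all assumptions, and the real classes in the \gst{build} package may differ.
\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins; the real Greenstone3 classes may differ.
class SketchConstructor {
    private final List<ConstructionListener> listeners =
        new ArrayList<ConstructionListener>();

    void addListener(ConstructionListener listener) {
        listeners.add(listener);
    }

    // Notifies every registered listener, in registration order.
    private void fire(String message) {
        ConstructionEvent event = new ConstructionEvent(message);
        for (ConstructionListener listener : listeners) {
            listener.eventOccurred(event);
        }
    }

    // Pretends to run an import, reporting its main stages.
    void run() {
        fire("import started");
        fire("import finished");
    }

    public static void main(String[] args) {
        SketchConstructor constructor = new SketchConstructor();
        constructor.addListener(new ConstructionListener() {
            public void eventOccurred(ConstructionEvent e) {
                System.out.println("status: " + e.message);
            }
        });
        constructor.run();
    }
}

class ConstructionEvent {
    final String message;

    ConstructionEvent(String message) {
        this.message = message;
    }
}

interface ConstructionListener {
    void eventOccurred(ConstructionEvent e);
}
\end{verbatim}\end{gsc}\end{quote}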
1027
1028\subsection{Collection design}\label{sec:colldesign}
1029
1030\section{Installation details}
1031
This section describes the directory structure of the Greenstone source, and explains how to install Greenstone from CVS.
1033
1034\subsection{Directory structure}
1035
The first part of Table~\ref{tab:dirs} shows the common files that can be shared between
Greenstone users (the source, libraries, etc.). These will eventually be installed into appropriate system directories. The second part shows
the files used by one person or group (their sites, interface setup,
etc.). There can be several sites and interfaces per installation.
1040
1041\begin{table}
1042\caption{The Greenstone directory structure}
1043\label{tab:dirs}
1044\center{\footnotesize
1045\begin{tabular}{l p{7cm}}
1046\hline
1047gsdl3
1048 & The main installation directory---gsdl3home can be changed to something more standard\\
1049gsdl3/src
1050 & Source code lives here \\
1051gsdl3/src/java/org/greenstone/gsdl3
1052 & Contains the top level classes that either have main programs, or are server/servlet classes\\
1053gsdl3/src/java/org/greenstone/gsdl3/core
1054 & ModuleInterface, MessageRouter, Receptionist---the central classes that the others hang off\\
1055gsdl3/src/java/org/greenstone/gsdl3/service
1056 & The various service modules---these things do the work\\
1057gsdl3/src/java/org/greenstone/gsdl3/util
1058 & Utility classes \\
1059gsdl3/src/java/org/greenstone/gsdl3/collection
1060 & ServiceCluster and Collection classes\\
1061gsdl3/src/java/org/greenstone/gsdl3/comms
1062 & Communicator classes, eg SOAP\\
1063gsdl3/src/java/org/greenstone/gsdl3/build
1064 & stuff for collection building \\
1065gsdl3/src/java/org/greenstone/gsdl3/action
1066 & Action classes used by the Receptionist---do the work of displaying the pages\\
1067gsdl3/src/java/org/greenstone/gsdl3/classes
1068 & On compilation, the Java classes get put here---they can then be combined into a single jar file, and copied to the java lib directory \\
1069gsdl3/src/java/org/greenstone/gdbm
1070 & Java wrapper for gdbm---uses j-gdbm, a jni gdbm wrapper\\
1071gsdl3/src/java/org/greenstone/testing
1072 & Junit scaffolding for unit testing.\\
1073gsdl3/src/java/org/greenstone/applet
1074 & where the code for applets goes \\
1075gsdl3/src/java/org/greenstone/applet/phind
1076 & the phind applet (phrase browsing) \\
1077gsdl3/src/cpp/
1078 & Place for any cpp source code---none yet \\
1079gsdl3/packages
1080 & Imported packages from other systems eg mg, mgpp \\
1081gsdl3/lib
1082 & Shared library files\\
1083gsdl3/lib/java
1084 & Java jar files\\
1085gsdl3/resources
1086 & any resources that may be needed\\
1087gsdl3/resources/java
 & Properties files for Java resource bundles, used to handle all the language-specific text. This directory is on the classpath, so any other Java resources can be placed here \\
1089gsdl3/resources/soap
1090 & soap service description files \\
1091gsdl3/bin
1092 & executable stuff lives here\\
1093gsdl3/bin/script
1094 & some perl building scripts\\
1095gsdl3/bin/linux
1096 & linux executables for eg mgpp\\
1097gsdl3/comms
1098 & Put some stuff here for want of a better place---things to do with servers and communication. eg soap stuff, and tomcat servlet container\\
1099gsdl3/docs
1100 & Documentation :-)\\
1101\hline
1102gsdl3/web
1103 & This is where the web site is defined. Any static html files can go here. This directory is the Tomcat root directory.\\
1104gsdl3/web/WEB-INF
1105 & The web.xml file lives here (servlet configuration information for tomcat)\\
1106gsdl3/web/WEB-INF/classes
1107 & Servlet classes go in here\\
1108gsdl3/web/sites
1109 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (eg soap) to other sites\\
1110gsdl3/web/sites/localsite
1111 & One site - the site configuration file lives here\\
1112gsdl3/web/sites/localsite/collect
1113 & The collections directory \\
1114gsdl3/web/sites/localsite/images
1115 & Site specific images \\
1116gsdl3/web/sites/localsite/transforms
1117 & Site specific transforms \\
1118gsdl3/web/interfaces
1119 & Contains directories for different interfaces - an interface is defined by its images and xslt files \\
1120gsdl3/web/interfaces/default
1121 & The default interface\\
1122gsdl3/web/interfaces/default/images
1123 & The images for the default interface\\
1124gsdl3/web/interfaces/default/transforms
1125 & The XSLT files for the default interface\\
1126\hline
1127\end{tabular}}
1128\end{table}
1129
1130\subsection{Installation guide}
1131
1132\newcommand{\gsdlhome}{\$GSDL3HOME}
1133\newcommand{\gshome}{\$GSDLHOME}
1134
Currently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note that these instructions are for installation on Linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions at \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}.
1136
1137\subsubsection{Get the source}
1138
1139If you have a greenstone\_cvs account, you can use the following:
1140
1141\begin{quote}\begin{gsc}\begin{verbatim}
1142export CVS_RSH=ssh
1143cvs -d :ext:@cvs.scms.waikato.ac.nz:/usr/local/global-cvs/
1144 gsdl-src co gsdl3
1145\end{verbatim}\end{gsc}\end{quote}
1146
1147Otherwise, you can get it through anonymous access:
1148
1149\begin{quote}\begin{gsc}\begin{verbatim}
1150cvs -d :pserver:cvs\[email protected]:2402/usr/local/
1151 global-cvs/gsdl-src co gsdl3
1152\end{verbatim}\end{gsc}\end{quote}
1153
1154If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some versions of CVS have trouble accessing this repository. We are using version 1.11.1p1.
1155
1156\subsubsection{Compile and install greenstone}\label{subsec:compile}
1157
An \gst{install.bash} script has been written to compile and install Greenstone3. What you need to do is:
1159
1160\begin{quote}\begin{gsc}
1161cd gsdl3\\
1162source setup.bash\\
1163install.bash\\
1164source setup.bash\\
1165\end{gsc}\end{quote}
1166
If you want to do Greenstone2-compatible building (currently the only type), you need to have Greenstone2 installed. Run \gst{source setup.bash} in the top-level Greenstone2 directory, then run \gst{source setup.bash} again for Greenstone3. This sets \gst{\gshome} for Tomcat.
1168
1169\noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc.
1170
1171If you want to use SOAP to talk to remote sites, you also need to do the following:
1172
1173\begin{quote}\begin{gsc}
1174install-soap.bash
1175\end{gsc}\end{quote}
1176
1177There is one java command that sometimes doesn't work under bash, so you may need to cut and paste it into the terminal to get it to work. See the output from the bash-script for details.
1178
To shut down or start up Tomcat, the commands are:
1180\begin{quote}\begin{gsc}
1181\gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\
1182\gsdlhome/comms/tomcat/jakarta/bin/startup.sh\\
1183\end{gsc}\end{quote}
1184
Do not run \gst{install.bash} twice, because it adds entries to existing files.
To update your installation, run \gst{update.bash}: this updates your code from CVS and rebuilds all the Java code.
1187
1188
1189\subsubsection{The sample sites}
1190
\noindent There are two Greenstone {\em sites} that come with the checkout: localsite and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
localsite does not connect to any other sites. soapsite specifies a SOAP connection to localsite.
1193
1194\subsubsection{Tomcat}
1195
1196\noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.
1197
The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat: it tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
There are three servlets specified in web.xml. One is a test servlet that just prints ``hello greenstone'' to a web page; this is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets: {\em library}, which serves localsite, and {\em library1}, which serves soapsite.
1200
1201The initialisation parameters used by the library servlets are as follows:
1202
\begin{tabular}{llp{2.25in}}
\bf name & \bf sample value & \bf description \\
\hline
gsdl3home & /research/kjdon/gsdl3 & the base directory of the gsdl3 installation \\
sitename & localsite & the site to use \\
interfacename & default & the interface to use\\
libraryname & library & the name of the library program \\
defaultlang & en & the default language for the interface\\
receptionist & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\
messagerouter & NewMessageRouter & (optional) specifies an alternative MessageRouter to use\\
\hline
\end{tabular}
1215
1216It is possible to run several servlets at once, with different combinations of sites and/or interfaces.
1217
The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web}). This tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.
1219
1220
Tomcat runs by default on port 8080; this can be changed in server.xml. If Tomcat's port is changed, the siteConfig files need changing too: \gst{<httpAddress>} for the site, and \gst{<address>} for a remote site, both include the port.
1222
1223
1224\subsubsection{Serving your site using tomcat}\label{subsec:runtomcat}
1225
\noindent To run Tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see Section~\ref{subsec:compile}). Then,
1227
1228\begin{gsc}\begin{tt}
1229\noindent cd \gsdlhome/comms/tomcat/jakarta/bin\\
1230./startup.sh
1231\end{tt}\end{gsc}
1232
1233\noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat)
1234\\
1235\\
\noindent The Tomcat server can be accessed on the web at \gst{http://localhost:8080} (this gets you to a welcome page).
The Greenstone pages are at \gst{http://localhost:8080/gsdl3} (this displays \gst{\gsdlhome/web/index.html}). You should be able to run the test servlet and both library servlets from this page.
1238
\noindent Note: Tomcat must be shut down and restarted for changes to any of the following to take effect:
\begin{bulletedlist}
\item \gst{\gsdlhome/web/WEB-INF/web.xml}
\item \gst{\gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
\gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out}
1249
On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site or collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.
1251
1252\subsubsection{Using SOAP to talk to a remote site}
1253
\noindent The installation described so far is sufficient if you only want to talk to local sites. To connect to a remote site using SOAP, some further setup is needed. soapsite specifies a SOAP connection to localsite. If you run soapsite without connecting to localsite, you don't get any collections. However, if you connect to localsite, you can see all of {\em its} collections.

\noindent The SOAP server we use is actually run as a servlet in Tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service.
This is done by the \gst{install-soap.bash} script.
You can also deploy a service through the website. If Tomcat is not running, start it up (see Section~\ref{subsec:runtomcat}).
1260
1261\noindent The SOAP servlet can be accessed at \begin{gsc}{\tt http://localhost:8080/soap}\end{gsc}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.
1262
1263\noindent To deploy the SOAPServer for localsite:
1264
1265\noindent Click on ``deploy'' and edit the following fields in the deploy form:
1266
1267\begin{tabular}{ll}
1268ID: & org.greenstone.localsite\\
1269Scope: (any will do) & Request---new instantiation for each request\\
1270 & Session---same instantiation across a session\\
1271 & Application---only uses one instantiation\\
1272Methods: &process\\
1273Java Provider / Provider Class: & org.greenstone.gsdl3.SOAPServer\\
1274\end{tabular}
1275
1276\noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the lefthand ``List'' button.
1277
\noindent Information about deployed services is maintained between Tomcat sessions, so you only need to deploy a service once. To get the library1 servlet talking to the SOAP server, you need to shut down and restart Tomcat (see Section~\ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.
1279
1280\subsubsection{Debugging SOAP}
1281
If you need to debug the SOAP communication, or just want to look at the SOAP messages that are being passed back and forth, use a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them on to another port.
To run it, type:
1284
1285\begin{quote}\gst{java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080}
1286\end{quote}
1287
8070 is the port that TcpTunnelGui listens on, and 8080 is the port that it forwards the messages to, ie the port that Tomcat is using. For the messages to go through TcpTunnelGui, Greenstone must talk to port 8070 rather than 8080 when it wants to talk to Tomcat. The port is part of the \gst{address} attribute of the \gst{<site>} element in the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}):
1289\begin{quote}\begin{gsc}\begin{verbatim}
1290<site name="org.greenstone.localsite"
1291 address="http://localhost:8080/soap/servlet/rpcrouter"
1292 type="soap"/>
1293\end{verbatim}\end{gsc}\end{quote}
1294
Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the
address for talking directly to the Tomcat SOAP servlet services. Change 8080 to 8070 in this address to route the messages through TcpTunnelGui.
1297
1298\section{Developer's notes}
1299
This section collects notes for developers who want to modify the source code.
1301\subsection{Greenstone utility classes}
1302
1303These are found in \gst{gsdl3/src/java/org/greenstone/gsdl3/util} and provide a variety of useful functions. Table~\ref{tab:utils} gives a brief description of the various classes.
1304
1305\begin{table}
1306\caption{The utility classes in org.greenstone.gsdl3.util}
1307\label{tab:utils}
1308\center{\footnotesize
1309\begin{tabular}{lp{3.75in}}
1310\hline
1311\bf Utility class & \bf Description\\
1312ConfigVars & holds the servlet startup variables, including library name, site name, interface name, default language\\
Dictionary & wrapper around a ResourceBundle, providing strings with parameters\\
1314GSCGI & class to map between short name cgi args and long name request parameters \\
1315GSFile & class to create all greenstone file paths eg used to locate configuration files, xslt files and collection data. \\
1316GSHTML & provides convenience methods for dealing with HTML, eg making strings HTML safe\\
1317GSPath & used to create, examine and modify message address paths\\
1318GSStatus & some static codes for status messages\\
1319GSXML & lots of methods for extracting information out of greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by greenstone.\\
1320GSXSLT & some manipulation functions for greenstone XSLT\\
1321Misc & miscellaneous functions\\
1322OID & class to handle greenstone (2) OIDs\\
1323XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
1324XMLTransformer & methods to transform XML using XSLT \\
1325XSLTUtil & contains static methods to be called from within XSLT \\
1326\hline
1327\end{tabular}
1328}
1329\end{table}
1330
1331\subsection{Creating new services}
1332
A browse-type service must also implement a \gst{servicenameMetadataRetrieve} service, where \gst{servicename} stands for the name of the browse service.
1334\subsection{Working with XML}
1335
We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic unit of the tree, and all the other types inherit from it. A Document represents a whole document, and is a kind of container for all the nodes. Elements and Nodes are not supposed to exist outside the context of a document, so you need a Document in order to create them. The Document itself is not the top-level element of the tree; to get that, use \gst{Document.getDocumentElement()}. If you create nodes but do not append them to something already in the document tree, they remain detached from the tree, but they still know which Document owns them.
1337
To create new Documents, and to convert Strings or Files to Documents, use XMLConverter. For example:
\begin{quote}\begin{gsc}\begin{verbatim}
XMLConverter converter = new XMLConverter();
// create a new, empty Document
Document doc = converter.newDOM();

// parse a File into a Document
File stylesheet = new File("query.xsl");
Document style = converter.getDOM(stylesheet);

// parse a String into a Document
String message = "<message><request type='cgi'/></message>";
Document m = converter.getDOM(message);
\end{verbatim}\end{gsc}\end{quote}
1350
1351To output a document as a String, use \gst{converter.getString(doc);}
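
The internals of XMLConverter are not shown here. As a rough illustration of what such conversions involve, the following sketch parses a String into a Document and writes it back out as a String using only the standard JAXP classes. The class \gst{DomRoundTrip} and its method names are hypothetical and are not part of Greenstone.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Greenstone.
public class DomRoundTrip {

    // Parse an XML string into a DOM Document.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Write a DOM Document out as a String, without the XML declaration.
    public static String serialize(Document doc) {
        try {
            // A Transformer created without a stylesheet copies its input.
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        Document m = parse("<message><request type='cgi'/></message>");
        System.out.println(m.getDocumentElement().getTagName());
        System.out.println(serialize(m));
    }
}
\end{verbatim}\end{footnotesize}\end{quote}

The exact serialized form (for example, the quoting of attributes) depends on the XML implementation in use.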
1352
To add nodes to an empty document, create them and then append them to the tree:
\begin{quote}\begin{gsc}\begin{verbatim}
Document doc = converter.newDOM();
Element e = doc.createElement("message");
doc.appendChild(e);
\end{verbatim}\end{gsc}\end{quote}
1359
Note that you can only append one element to a Document: this becomes the top-level element. After that, you can append nodes to child nodes as you like, but a document is only allowed one top-level element.
1361
1362Nodes can only be created by a Document. Document has creation methods for all types of Nodes, for example \gst{createElement(element\_name)}, \gst{createAttribute(attr\_name)}, \gst{createTextNode(text\_data)} etc.
1363
The error ``DOM006 Hierarchy request error'' occurs if you try to give a document more than one root node.
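
The points above about node ownership and the single top-level element can be checked with a small stand-alone program. This is a minimal sketch that uses the DOM implementation bundled with the JDK rather than XMLConverter; the class name \gst{SingleRootSketch} is hypothetical.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demonstration class; not part of Greenstone.
public class SingleRootSketch {

    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if appending a second top-level element is
    // rejected with a hierarchy request error.
    public static boolean secondRootRejected() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("message"));
        try {
            doc.appendChild(doc.createElement("extra"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element message = doc.createElement("message");
        Element request = doc.createElement("request");
        request.setAttribute("type", "cgi");

        // Created but not yet appended: the node knows its owner
        // document, but it has no parent in the tree.
        System.out.println(request.getOwnerDocument() == doc);
        System.out.println(request.getParentNode() == null);

        doc.appendChild(message);       // becomes the top-level element
        message.appendChild(request);   // further nodes go below it
        System.out.println(secondRootRejected());
    }
}
\end{verbatim}\end{footnotesize}\end{quote}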
1365
1366\subsection{Greenstone XML}
1367
The Greenstone format namespace is, at the moment:
\begin{quote}\gst{xmlns:gsf="http://www.greenstone.org/configformat"}\end{quote}

There are no DTDs or Schemas defined yet. Until there are, try to keep to the following rules:
1373
1374\begin{bulletedlist}
1375
\item Always return expected elements even if they are empty, eg \gst{<paramList/>}.

\item If you get the whole document, it is called \gst{<document>}. However, if you are returned a list of pointers to parts of documents, they are \gst{<documentNode>}s.

\item Inside a list you can only have elements whose name matches that of the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it.
1381
1382\end{bulletedlist}
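
As an illustration of the first and third rules, the following sketch builds a \gst{<paramList>} that is returned even when it is empty and that only ever contains \gst{<param>} children. It uses the plain JDK DOM classes; the class \gst{ParamListSketch}, its methods, and the \gst{name} attribute on \gst{<param>} are assumptions made for the example, not Greenstone code.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; not part of Greenstone.
public class ParamListSketch {

    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Always returns a <paramList> element, even when there are no
    // params, and puts nothing but <param> children inside it.
    public static Element paramList(Document doc, String... names) {
        Element list = doc.createElement("paramList");
        for (String name : names) {
            Element param = doc.createElement("param");
            param.setAttribute("name", name); // attribute name is assumed
            list.appendChild(param);
        }
        return list;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        // An empty list is still a <paramList/> element.
        System.out.println(
            paramList(doc).getChildNodes().getLength());
        System.out.println(
            paramList(doc, "case", "stem").getChildNodes().getLength());
    }
}
\end{verbatim}\end{footnotesize}\end{quote}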
1383\subsection{Working with XSLT}
1384
1385\begin{bulletedlist}
\item {\em adding HTML to an XML document:}

Suppose a \gst{resource} element contains a text node with HTML inside it.
To add that to a new XML document, the obvious choice is
\gst{<xsl:value-of select='resource'/>}

If the output method is xml or html, this escapes any special characters,
ie $<$ and $>$ etc.

To copy the HTML through unescaped, use
\gst{<xsl:value-of disable-output-escaping="yes" select='resource'/>}
instead.
1398
\item {\em including an XML document in a stylesheet:}

\gst{<xsl:variable name='import' select='document("newdoc.xml")'/>}

The imported content can then be used:

\gst{<xsl:value-of select='\$import/element'/>}
1406
\item {\em selecting an ancestor:}

 The ancestor axis contains the parent of the context node, its
 parent, and so on up to the root. To pick out the ancestors with a
 particular name, use \gst{ancestor::elem-name}. If more than one ancestor
 has that name, all of them are selected; \gst{ancestor::elem-name[1]} selects the nearest one.
1413
\item {\em basic XSLT elements:}
\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:template match='xxx' name='yyy'/>

<xsl:apply-templates select='xxx'/>
<xsl:call-template name='yyy'/>

<xsl:variable name='doc' select='document("layout.xml")'/>

<xsl:value-of select='$doc/chapter1'/>
\end{verbatim}\end{footnotesize}\end{quote}
1425
\item {\em using namespaces:}
If you are using the same namespace in more than one file, eg in the source XML and in the stylesheet, make sure that the URI in the \gst{xmlns:xxx} declaration is exactly the same in both cases, including the \gst{http://} on the front. Otherwise the names do not match.
1428
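\item \gst{<xsl:with-param name='xxx' select='true'/>} is not
the same as \gst{<xsl:with-param name='xxx'>true</xsl:with-param>}.
In the first, \gst{true} is an XPath expression that selects child elements named \gst{true}; the second passes the text ``true''.
Use the second one.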
1432
\item To select a node from a list based on an attribute value, compare the attribute in a predicate. For example:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:variable name='name'>CL1</xsl:variable>

<xsl:value-of select="classifier[@name=$name]/@content"/>
\end{verbatim}\end{footnotesize}\end{quote}
1439
1440
1441\end{bulletedlist}
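
The difference between the two \gst{xsl:with-param} forms in the list above can be checked with a small stand-alone program. The sketch below runs three variants through the XSLT processor bundled with the JDK; the class name \gst{WithParamSketch}, the \gst{show} template and the \gst{<page/>} input are made up for the example. Under XSLT 1.0, \gst{select='true'} looks for child elements named \gst{true} and so yields an empty value here, while the other two variants pass the string.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demonstration class; not part of Greenstone.
public class WithParamSketch {

    // Wrap one <xsl:with-param> variant in a minimal stylesheet, run
    // it over a one-element input document, and return the text
    // output, which is the parameter value in square brackets.
    public static String run(String withParam) {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:call-template name='show'>" + withParam
          + "</xsl:call-template></xsl:template>"
          + "<xsl:template name='show'><xsl:param name='flag'/>"
          + "[<xsl:value-of select='$flag'/>]</xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<page/>")),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XPath expression: child elements named "true" (none here)
        System.out.println(
            run("<xsl:with-param name='flag' select='true'/>"));
        // element content: the text "true"
        System.out.println(
            run("<xsl:with-param name='flag'>true</xsl:with-param>"));
        // quoted XPath string literal: also the text "true"
        System.out.println(
            run("<xsl:with-param name='flag' select=\"'true'\"/>"));
    }
}
\end{verbatim}\end{footnotesize}\end{quote}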
1442\subsubsection{What can I do to speed up XSL transformations?}
1443
This information is taken from the Xalan FAQ page.
1445
1446\begin{bulletedlist}
1447
\item Use a Templates object (with a different Transformer for each
transformation) to perform multiple transformations with the same set
of stylesheet instructions.
1451
1452\item Set up your stylesheets to function efficiently.
1453
1454\item Don't use "//" (descendant axes) patterns near the root of a
1455large document.
1456
1457\item Use xsl:key elements and the key() function as an efficient way
1458to retrieve node sets.
1459
1460\item Where possible, use pattern matching rather than xsl:if or
1461xsl:when statements.
1462
1463\item xsl:for-each is fast because it does not require pattern matching.
1464
1465\item Keep in mind that xsl:sort prevents incremental processing.
1466
\item When you create variables,\\
\gst{<xsl:variable name="fooElem" select="foo"/>} is usually faster
than \\
\gst{<xsl:variable name="fooElem"><xsl:value-of select="foo"/></xsl:variable>}.
1471
1472\item Be careful using the last() function.
1473
1474\item The use of index predicates within match patterns can be expensive.
1475
1476\item Decoding and encoding is expensive.
1477
1478\item For the ultimate in server-side scalability, perform transform
1479operations on the client.
1480
1481\end{bulletedlist}
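
The first tip in the list can be sketched with the standard JAXP classes as follows: the stylesheet is compiled once into a Templates object, and each transformation then obtains its own Transformer from it. The class name \gst{TemplatesSketch}, the inline stylesheet and the \gst{<page>} input documents are made up for the example, and this is not how Greenstone's XMLTransformer is necessarily written.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demonstration class; not part of Greenstone.
public class TemplatesSketch {

    // Made-up stylesheet: prints the title attribute of <page>.
    private static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/page'>Title: "
      + "<xsl:value-of select='@title'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Parse and compile the stylesheet once; the result can be reused.
    public static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each transformation gets its own Transformer from the Templates.
    public static String transform(Templates templates, String xml) {
        try {
            Transformer t = templates.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile();
        System.out.println(transform(templates, "<page title='Home'/>"));
        System.out.println(transform(templates, "<page title='About'/>"));
    }
}
\end{verbatim}\end{footnotesize}\end{quote}

A Transformer should not be shared between concurrent transformations, whereas a Templates object is designed to be shared, which is why the compiled form is the part worth caching.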
1482
1483\subsection{Java gdbm}
1484
To talk to gdbm, a JNI wrapper called java-gdbm is used. It was
obtained from:\\ \gst{http://aurora.rg.iupui.edu/\textasciitilde{}schadow/dbm-java/pip/gdbm/}
1487
It uses packing objects to convert between the arrays of bytes stored in
the gdbm file and Java objects. In my GDBMWrapper class I use
StringPacking, which uses UTF-8 encoding. However, some strings came out garbled, so
I had to change the from\_bytes method in StringPacking.java to use
\gst{new String(raw, "UTF-8")} instead of \gst{new String(raw)} (which decodes using the platform's default character set). This seems to
work.
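
The effect of that change can be reproduced without gdbm. In the sketch below, decoding with ISO-8859-1 stands in for \gst{new String(raw)} on a platform whose default character set is not UTF-8. The class \gst{Utf8DecodeSketch} and its \gst{fromBytes} method are hypothetical, and \gst{StandardCharsets} (Java 7 and later) is used in place of the \gst{"UTF-8"} string.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; not the java-gdbm StringPacking.
public class Utf8DecodeSketch {

    // Decode bytes as UTF-8 regardless of the platform default.
    public static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Maori" with a macron on the a, stored as UTF-8 bytes
        String original = "M\u0101ori";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 recovers the original five characters.
        String right = fromBytes(raw);
        // Decoding the same bytes as Latin-1 turns the two-byte
        // character into two wrong characters.
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(right.length());         // 5
        System.out.println(wrong.length());         // 6
    }
}
\end{verbatim}\end{footnotesize}\end{quote}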
1494
Note: if we use this gdbm code to create the file too, we may also need to
alter the to-bytes method.
1497
The Makefile in j-gdbm does not work as shipped: it tries to get files from its
original CVS tree. I have created a new Makefile in the my-j-gdbm
directory. This should probably go into CVS.
1501
1502
1503
1504\subsection{Resources}
1505
1506This is a list of some useful resources that we have come across during development of gsdl3.
1507
Contents of ``The Java Native Interface Programmer's Guide and
Specification'' online\\
\gst{http://java.sun.com/docs/books/jni/html/jniTOC.html}

Java Native Interface Specification\\
\gst{http://java.sun.com/j2se/1.4/docs/guide/jni/spec/jniTOC.doc.html}

JNI Documentation Contents\\
\gst{http://java.sun.com/j2se/1.4/docs/guide/jni/index.html}

Another JNI page\\
\gst{http://mindprod.com/jni.html}

Java 1.4 API index\\
\gst{http://java.sun.com/j2se/1.4/docs/api/index.html}

Java tutorial index\\
\gst{http://java.sun.com/docs/books/tutorial/index.html}

Safari Books Online (has Java, XML, XSLT, etc books)\\
\gst{http://proquest.safaribooksonline.com/mainhom.asp?home}

Java 1.4 i18n FAQ\\
\gst{http://www.sun.com/developers/gadc/faq/java/java1.4.html}

Java and XSLT page\\
\gst{http://www.javaolympus.com/java/Java\%20and\%20XSLT.html}

Xalan-Java overview\\
\gst{http://xml.apache.org/xalan-j/overview.html}

Tomcat documentation index\\
\gst{http://jakarta.apache.org/tomcat/tomcat-4.0-doc/index.html}

Servlet and JSP tutorial\\
\gst{http://www.apl.jhu.edu/\textasciitilde{}hall/java/Servlet-Tutorial/}

Core Servlets and JavaServer Pages, book by Marty Hall. Download the
PDF from here (``try before you buy'' link)\\
\gst{http://www.coreservlets.com/}

J-gdbm page\\
\gst{http://aurora.rg.iupui.edu/\textasciitilde{}schadow/dbm-java/pip/gdbm/}

Stuart's page of links\\
\gst{http://www.cs.waikato.ac.nz/\textasciitilde{}nzdl/gsdl3/}

A good basic XSLT tutorial\\
\gst{http://www.zvon.org/xxl/XSLTutorial/Books/Output/contents.html}

JAXP (Java API for XML Processing) package overview\\
\gst{http://java.sun.com/xml/jaxp/dist/1.1/docs/api/overview-summary.html}

DeveloperWorks, XML zone\\
\gst{http://www-106.ibm.com/developerworks/xml/}

xslt.com\\
\gst{http://www.xslt.com/}

Jeni Tennison's XSLT pages\\
\gst{http://www.jenitennison.com/xslt/}

Apache's XML tools\\
\gst{http://xml.apache.org/}
1572
1573
1574%\clearpage
1575%\addcontentsline{toc}{chapter}{Bibliography}
1576%\bibliography{main}
1577
1578\end{document}