% Source: trunk/gsdl3/docs/manual/manual.tex, revision 3711 (last change checked in by kjdon).
% Commit note: some changes, still needs more work but I've run out of time.
1\documentclass[a4paper,11pt]{article}
2\usepackage{times,epsfig}
3\hyphenation{Message-Router Text-Query}
4
5\begin{document}
6
7\title{A modular digital library:\\
8 Architecture and implementation of Greenstone3}
9
10% if you work on this manual, add your name here
11\author{Katherine Don and Ian H. Witten \\[1ex]
12 Department of Computer Science \\
13 University of Waikato \\ Hamilton, New Zealand \\
14 \{kjdon, ihw\}@cs.waikato.ac.nz}
15
16\date{}
17
18\maketitle
19
20\newenvironment{bulletedlist}%
21{\begin{list}{$\bullet$}{\setlength{\itemsep}{0pt}\setlength{\parsep}{0pt}}}%
22{\end{list}}
23
24
25\noindent
26Greenstone Digital Library Version 3 is a complete redesign and
27reimplementation of the Greenstone digital library software. The current
28version (Greenstone2) enjoys considerable success and is being widely used.
29Greenstone3 will capitalize on this success, and in addition it will
30\begin{bulletedlist}
31\item improve flexibility, modularity, and extensibility
32\item lower the bar for ``getting into'' the Greenstone code with a view to
33 understanding and extending it
34\item use XML where possible internally to improve the amount of
35 self-documentation
36\item make full use of existing XML-related standards and software
37\item provide improved internationalization, particularly in terms of sort order,
38 information browsing, etc.
39\item include new features that facilitate additional ``content management''
40 operations
41\item operate on a scale ranging from personal desktop to corporate library
42\item easily permit the incorporation of text mining operations
\item use Java, to encourage multilinguality and cross-platform compatibility, and to permit
 easier inclusion of existing Java code (such as for text mining).
45\end{bulletedlist}
46Parts of Greenstone will remain in other languages (e.g. MG, MGPP); JNI (Java
47Native Interface) will be used to communicate with these.
48
49
50\section{Architecture}
51
This section is covered by the paper ``An agent based architecture for dynamic digital library construction and configuration''. It remains to be decided whether to incorporate that text here, link to it, or keep two separate documents; we do not want to have to maintain two separate versions of the same material.
53
54\section{Greenstone Implementation}
55\label{sec:impl}
56
57\subsection{Configuring Greenstone}
58\label{subsec:config}
59
60Greenstone3 involves several different kinds of configuration files, all
61expressed in XML. Each site has a configuration file that binds parameters for
62the site, {\em siteConfig.xml}. Each collection has two configuration files, {\em collectionConfig.xml} and {\em buildConfig.xml\/}, that give metadata for the
63collection.\footnote{These replace {\em collect.cfg} and {\em build.cfg} in
64Greenstone2.} The first includes user-defined metadata for the collection,
65such as its name and the {\em About this collection} text; and also gives
66instructions on how the collection is to be built. The second is produced by
67the build-time process and includes any metadata that can be determined
automatically.\footnote{Currently only the {\em buildConfig.xml} file is used: collections are built using Greenstone2-style building and therefore still use the old {\em collect.cfg}.}
69
70\subsubsection{Site configuration file}
71
72The file {\em siteConfig.xml} specifies the URI for the site ({\em
73localSiteName\/}), any services or service clusters provided by the site that are not connected
74with a particular collection (for example, translation services, or collection building), and a list of
known external sites to connect to. Collections are not specified in the site
configuration file; instead, they are determined by the contents of the site's
collections directory.
78
79Here is a configuration file for a rudimentary site with no site-wide services,
80which does not connect to any external sites.\footnote{should the code be tolerant of missing elements? or do we require empty elements?}
81\begin{quote}\begin{footnotesize}\begin{verbatim}
82<config>
83 <localSiteName value="org.greenstone.localsite"/>
84 <serviceClusterList/>
85 <serviceRackList/>
86 <siteList/>
87</config>
88\end{verbatim}\end{footnotesize}\end{quote}
89The following configuration file is for a site with one site-wide service cluster - a collection building cluster. It also connects to the previous site using SOAP.
90\begin{quote}\begin{footnotesize}\begin{verbatim}
91<config>
92 <localSiteName value="org.greenstone.gsdl1"/>
  <serviceRackList>
    <serviceRack name="TranslationServices"/>
  </serviceRackList>
96 <serviceClusterList>
97 <serviceCluster name="build">
98 <metadataList>
99 <metadata name="Title">Collection builder</metadata>
100 <metadata name="Description">Builds collections in a gsdl2-style manner</metadata>
101 </metadataList>
102 <serviceRackList>
103 <serviceRack name="GS2Construct"/>
104 </serviceRackList>
105 </serviceCluster>
106 </serviceClusterList>
107 <siteList>
108 <site name="org.greenstone.localsite"
109 address="http://localhost:8080/soap/servlet/rpcrouter"
110 type="soap"/>
111 </siteList>
112</config>
113\end{verbatim}\end{footnotesize}\end{quote}
114
These two sites are running on the same machine. For {\em gsdl1} to talk to {\em localsite}, a SOAP server must be run for {\em localsite}. The address of the SOAP server, in this case, is {\footnotesize \verb#http://localhost:8080/soap/servlet/rpcrouter#}.
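
As a rough illustration of how such a file can be read, the following Python sketch (not part of Greenstone3, which is written in Java) extracts the site name, the service cluster names and the remote sites from a trimmed copy of the second configuration file. The function name and the decision to tolerate missing elements are assumptions made for this example only.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

# Trimmed copy of the second site configuration file shown above.
SITE_CONFIG = """
<config>
  <localSiteName value="org.greenstone.gsdl1"/>
  <serviceClusterList>
    <serviceCluster name="build">
      <serviceRackList>
        <serviceRack name="GS2Construct"/>
      </serviceRackList>
    </serviceCluster>
  </serviceClusterList>
  <siteList>
    <site name="org.greenstone.localsite"
          address="http://localhost:8080/soap/servlet/rpcrouter"
          type="soap"/>
  </siteList>
</config>
"""

def summarize_site_config(xml_text):
    """Return the site name, cluster names and remote sites (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    name_elem = root.find("localSiteName")
    return {
        # Assumption: a missing element is tolerated rather than an error.
        "site": name_elem.get("value") if name_elem is not None else None,
        "clusters": [c.get("name")
                     for c in root.findall("serviceClusterList/serviceCluster")],
        "remote_sites": [(s.get("name"), s.get("type"), s.get("address"))
                         for s in root.findall("siteList/site")],
    }

summary = summarize_site_config(SITE_CONFIG)
print(summary["site"])
print(summary["clusters"])
\end{verbatim}\end{footnotesize}\end{quote}
Given an empty {\em config} element, this function returns no site name and two empty lists, which is one possible way of tolerating missing elements.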
116
117\subsubsection{Building configuration file}
118
119The file {\em buildConfig.xml} contains all metadata and other information about the collection that can
120be determined automatically when building the collection, such as the number of
121documents it contains. It also includes a list of serviceRack classes that are
122required at runtime to provide the services that have been built into the
123collection. The serviceRack names are Java classes that are loaded
124dynamically at runtime. Any information inside the serviceRack element is
125specific to that service---there is no set format. Here is an example:
126
127\begin{quote}\begin{footnotesize}\begin{verbatim}
128
129<buildConfig>
130 <metadataList>
131 <metadata name="numDocs">11</metadata>
132 <metadata name="colIcon">mgppdemo.gif</metadata>
133 <metadata name="colName">Greenstone demo collection</metadata>
134 <metadata name="colDescription">This is a demonstration collection for the Greenstone digital library software. It contains a small subset of the Humanitarian and Development Libraries.</metadata>
135 </metadataList>
136 <serviceRackList>
137 <serviceRack name="GS2MGPPRetrieve">
138 <defaultLevel name="Section"/>
      <!-- something like this should be used to advertise what metadata the collection has available to be retrieved - however, it is not used yet -->
140 <metadataList>
141 <element name="Title"/><element name="Subject"/><element name="Organization"/><element name="URL"/>
142 </metadataList>
143 </serviceRack>
144 <serviceRack name="GS2MGPPSearch">
145 <defaultIndex name="tt"/>
146 <defaultLevel name="Section"/>
147 <levelList>
148 <level name="Document"/>
149 <level name="Section"/>
150 </levelList>
151 <indexList>
152 <index name="tt"/>
153 <index name="t0"/>
154 </indexList>
155 <fieldList>
156 <field name="TX"/><field name="SU"/><field name="TI"/>
157 </fieldList>
158 </serviceRack>
159 <serviceRack name="PhindPhraseBrowse"/>
160 <serviceRack name="GS2Browse">
161 <classifierList>
162 <classifier name="CL1"><metadataList><metadata name="Title">Subject</metadata></metadataList></classifier>
163 <classifier name="CL2" ><metadataList><metadata name="Title">Title</metadata></metadataList></classifier>
164 <classifier name="CL4"><metadataList><metadata name="Title">Organization</metadata></metadataList></classifier>
165 <classifier name="CL5" ><metadataList><metadata name="Title">Keyword</metadata></metadataList></classifier>
166 </classifierList>
167 </serviceRack>
168 </serviceRackList>
169</buildConfig>
170\end{verbatim}\end{footnotesize}\end{quote}
171Note: because {\em collectionConfig.xml} is not used yet, the {\em colIcon}, {\em colDescription}
172and {\em colName} metadata elements have been specified here.
173
174\subsubsection{Collection configuration file}
175
176The format of {\em collectionConfig.xml} has not yet been defined.
177
178\subsubsection{Starting up}
179
180We use the Tomcat web server, which operates either stand-alone in a test mode
181or in conjunction with the Apache web server. The Greenstone LibraryServlet
182class is loaded by Tomcat and the servlet's {\em init()} method is called. Each time a
183{\em get\/}/{\em put\/}/{\em post} (etc.) is used, a new thread is started and
184{\em doGet()\/}/{\em doPut()\/}/{\em doPost()} (etc.) is called.
185
186The {\em init()} method creates a new Receptionist and a new instance of the
187MessageRouter. The appropriate system variables are set in each (interface
188name, site name, etc.) and then {\em configure()} is called. A MessageRouter
189reference is given to the Receptionist. The servlet then communicates only with
190the Receptionist, not with the MessageRouter.
191
192The Receptionist loads up all the different Action classes. A
193static list is used initially, and other Actions may be loaded on the fly as needed.
194
195The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This
196lists the ServiceRack classes that need to be loaded, and lists any sites that need
197to be connected to. It looks inside the {\em collect} directory which contains
198all the site's collections and loads up a Collection object for each valid
199collection found.
200
201The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml}
202files, determines the metadata, and loads ServiceRack classes based on the
names specified in {\em buildConfig.xml\/}. The corresponding {\footnotesize \verb#<serviceRack>#} XML element is passed to each ServiceRack object for use in its configuration.
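
The last step can be sketched as follows. This is an illustrative Python model only: the real system loads Java classes dynamically, whereas here a dictionary stands in for the class loader, and the class bodies and the handling of unknown names are assumptions.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

class ServiceRack:
    """Base class: each rack is configured from its own XML element."""
    def configure(self, elem):
        self.name = elem.get("name")
        return True

class GS2MGPPSearch(ServiceRack):
    def configure(self, elem):
        super().configure(elem)
        default = elem.find("defaultIndex")
        self.default_index = default.get("name") if default is not None else None
        self.indexes = [i.get("name") for i in elem.findall("indexList/index")]
        return True

class PhindPhraseBrowse(ServiceRack):
    pass

# Hypothetical registry standing in for Java's dynamic class loading.
RACK_CLASSES = {"GS2MGPPSearch": GS2MGPPSearch,
                "PhindPhraseBrowse": PhindPhraseBrowse}

def load_service_racks(build_config_xml):
    """Instantiate and configure one rack per <serviceRack> element."""
    racks, skipped = {}, []
    root = ET.fromstring(build_config_xml)
    for elem in root.findall("serviceRackList/serviceRack"):
        cls = RACK_CLASSES.get(elem.get("name"))
        if cls is None:
            skipped.append(elem.get("name"))  # assumption: unknown racks are skipped
            continue
        rack = cls()
        if rack.configure(elem):              # the element itself is handed over
            racks[rack.name] = rack
    return racks, skipped

BUILD_CONFIG = """<buildConfig>
 <serviceRackList>
  <serviceRack name="GS2MGPPSearch">
   <defaultIndex name="tt"/>
   <indexList><index name="tt"/><index name="t0"/></indexList>
  </serviceRack>
  <serviceRack name="PhindPhraseBrowse"/>
  <serviceRack name="NoSuchRack"/>
 </serviceRackList>
</buildConfig>"""

racks, skipped = load_service_racks(BUILD_CONFIG)
print(sorted(racks), skipped)
\end{verbatim}\end{footnotesize}\end{quote}
Because each rack receives its own element, the contents of that element can remain specific to the rack, as described for {\em buildConfig.xml}.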
204
205\section{System messages}
206
Once the system is up and running (that is, once the configuration
process described in Section~\ref{subsec:config} has been carried out), all modules communicate by passing messages back and forth.
We first examine the basic message
formats, and then how the system creates and responds to the messages.
211
212All messages are enclosed in
213\begin{quote}\begin{footnotesize}\begin{verbatim}
214<message>
215\end{verbatim}\end{footnotesize}\end{quote}
Messages contain either {\footnotesize \verb#<request>#} or {\footnotesize \verb#<response>#} elements; a single message may contain multiple requests. Each {\footnotesize \verb#<request>#} (and perhaps each {\footnotesize \verb#<response>#}) has a language attribute, of the form {\footnotesize \verb#lang='xx'#}.
217The language attribute is used by the XSLT to determine the language currently
218being used by the user interface. Virtually all messages contain text strings,
219and services use this attribute to return strings in the appropriate language.
220
221There are two different styles of messaging, explained in the two subsections
below. The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system. The response is a page of data, typically in HTML. The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like {\em matchDocs}.} They typically request one service or one action, and the response contains either the data requested or a status message.
223
224This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
225
\subsection{Servlet to Receptionist messages}\label{subsec:url-type}
227
Servlet to Receptionist messages are requests for a ``page'' of data: for example, the home page for a site, the query page for a collection, or the text of a document. They contain, in XML, a representation of the arguments in a
229Greenstone URL. The two main arguments are {\em a} (action) and {\em sa}
230(subaction).\footnote{The {\em sa} replaces Greenstone's old {\em p} arg for
231the page action, and is new for other actions. For example, a text query could
232be encoded as {\em a=q \& sa=text\/}.} All other arguments are treated as
233parameters.
234
235Here is the XML representation of the arguments:
236
237\begin{quote}\begin{footnotesize}\begin{verbatim}
238<request type='cgi' action='a-arg-value' subaction='sa-arg-value'
239 lang='en' output='html'>
240 <paramList>
    <param name='xx' value='yyy'/>
242 <param name=...
243 </paramList>
244</request>
245\end{verbatim}\end{footnotesize}\end{quote}
The Receptionist routes the message to the appropriate Action. The {\em output}
attribute indicates what type of output to return. Actions do not
return responses in the normal format; instead they return a page of
information, expressed by default in HTML. Alternative formats could be XML or WML.
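
The mapping from URL arguments to this XML form might look like the following Python sketch. It is not the servlet's actual code: the function name, the default values, and the choice to lift {\em l} and {\em o} into the {\em lang} and {\em output} attributes are assumptions based on the argument table and examples in this section.
\begin{quote}\begin{footnotesize}\begin{verbatim}
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET

def url_args_to_message(query_string, default_lang="en", default_output="html"):
    """Build a <message> holding one cgi-type <request> (hypothetical helper)."""
    args = dict(parse_qsl(query_string, keep_blank_values=True))
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {
        "type": "cgi",
        "action": args.pop("a", "p"),       # assumed default: the page action
        "subaction": args.pop("sa", ""),
        "lang": args.pop("l", default_lang),
        "output": args.pop("o", default_output),
    })
    if args:
        # Every remaining argument is treated as a parameter.
        param_list = ET.SubElement(request, "paramList")
        for name, value in args.items():
            ET.SubElement(param_list, "param", {"name": name, "value": value})
    return message

msg = url_args_to_message("a=q&sa=text&c=demo&s=TextQuery&q=snail&l=fr")
print(ET.tostring(msg, encoding="unicode"))
\end{verbatim}\end{footnotesize}\end{quote}
Here the action and subaction become attributes of the request, while the collection, service and query string end up in the parameter list.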
250
251The LibraryServlet class communicates with the Receptionist, which is the entry
252point into the system. Future GUIs could communicate either with the
253Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
254
255The cgi arguments used currently are shown in Table~\ref{tab:args}.
Other arguments can be specified by particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
257
\begin{table}
\begin{center}
\footnotesize
\begin{tabular}{llp{7cm}}
\hline
\bf Argument & \bf Meaning & \bf Typical values \\
\hline
a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
sa & subaction & home, about (page action) \\
c & collection or service cluster & demo, build \\
s & service name & TextQuery, ImportCollection \\
rt & request type & d (display), r (request), s (status) \\
ro & request only & 0 or 1; if set to 1, the request is carried out but the results are not processed \\
o & output type & xml, html, wml \\
l & language & en, fr, zh \\
d & document id & HASHxxx \\
r & resource id & ??? \\
id & process handle & an integer identifying a particular process request \\
\hline
\end{tabular}
\end{center}
\caption{Generic arguments that can appear in a Greenstone URL}
\label{tab:args}
\end{table}
280
281Here is an example message that retrieves the home page in French:
282\begin{quote}\begin{footnotesize}\begin{verbatim}
283<message>
284 <request lang='fr' type='cgi' action='p' subaction='home' output='html'/>
285</message>
286\end{verbatim}\end{footnotesize}\end{quote}
287
288This message represents a text query:
289\begin{quote}\begin{footnotesize}\begin{verbatim}
290<message>
291 <request lang='en' type='cgi' action='q' output='html'>
292 <paramList>
293 <param name='s' value='TextQuery'/>
294 <param name='c' value='demo'/>
295 <param name='rt' value='r'/>
296 <!-- the rest are the service specific params -->
297 <param name='ca' value='0'/> <!-- casefold -->
298 <param name='st' value='1'/> <!-- stem -->
299 <param name='m' value='10'/> <!-- maxdocs -->
300 <param name='q' value='snail'/> <!-- query string -->
    </paramList>
  </request>
</message>
303\end{verbatim}\end{footnotesize}\end{quote}
304
\subsection{Module to module messages}
307
308In Greenstone3's modular architecture messages are used extensively to pass
309information from one module to another, for example from an Action to the
310MessageRouter module, and from that module to a service module. Requests have
311a {\em to} attribute and responses have {\em from\/}. These are addresses used
312by routing modules. For example {\em to='site1/site2/demo/TextQuery'} routes a
313message to a MessageRouter ({\em site1\/}), from there to another MessageRouter
314({\em site2\/}), from there to a collection ({\em demo\/}), and from there to a
315particular service ({\em TextQuery\/}).
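
The addressing scheme can be modelled in a few lines. The Python sketch below is an assumption-laden toy, not the MessageRouter implementation: each module strips the first component of the address and forwards the rest, and a module answers the request itself when the remaining address is empty. The class and the module named {\em local} are invented for the example.
\begin{quote}\begin{footnotesize}\begin{verbatim}
class Module:
    """A node that either answers a request or forwards it to a child."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = {child.name: child for child in children}

    def process(self, to):
        """Return the names of the modules the request passes through."""
        if not to:                      # empty address: answer here
            return [self.name]
        head, _, rest = to.partition("/")
        child = self.children.get(head)
        if child is None:
            raise LookupError("%s has no module named %s" % (self.name, head))
        return [self.name] + child.process(rest)

text_query = Module("TextQuery")
demo = Module("demo", [text_query])
site2 = Module("site2", [demo])
site1 = Module("site1", [site2])
local = Module("local", [site1])        # hypothetical router receiving the request

print(local.process("site1/site2/demo/TextQuery"))
# ['local', 'site1', 'site2', 'demo', 'TextQuery']
\end{verbatim}\end{footnotesize}\end{quote}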
316
317Each request asks for a description of a single module, or requests a particular service. Unlike the first type of message which requests pre-defined types of pages, these internal requests can ask for any functionality available in the system.
318
319The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
320\begin{quote}\begin{footnotesize}\begin{verbatim}
321<message>
322 <request lang='en' type='describe' to=''/>
323</message>
324\end{verbatim}\end{footnotesize}\end{quote}
325If the {\em to} field is empty, the request is answered by the first module that it is passed to.
326An example response from a MessageRouter might look like this:
327\begin{quote}\begin{footnotesize}\begin{verbatim}
328<message>
329 <response lang='en' type='describe'>
330 <serviceList>
331 <service name='CrossCollectionSearch' type='query' />
332 </serviceList>
333 <siteList>
334 <site name='org.greenstone.gsdl1'
335 address='http://localhost:8080/soap/servlet/rpcrouter'
336 type='soap' />
337 </siteList>
338 <collectionList>
339 <collection name='org.greenstone.gsdl1/
340 org.greenstone.gsdl2/fao' />
341 <collection name='org.greenstone.gsdl1/demo' />
342 <collection name='org.greenstone.gsdl1/fao' />
343 <collection name='myfiles' />
344 </collectionList>
345 </response>
346</message>
347\end{verbatim}\end{footnotesize}\end{quote}
348This MessageRouter has one site-wide service, a cross-collection searching service. It
349communicates with one site, {\em org.greenstone.gsdl1\/}. It is aware of four
350collections. One of these, {\em myfiles\/}, belongs to it; the other three are
351available through the external site. One of those collections is actually from
352a further external site.
353
354It is possible to ask just for a specific part of the information provided by a
355describe request, rather than the whole message. For example, these two
356messages get the {\em collectionList} and the {\em siteList} respectively:
357\begin{quote}\begin{footnotesize}\begin{verbatim}
<message>
  <request lang='en' type='describe' to='' info='collectionList'/>
</message>

<message>
  <request lang='en' type='describe' to='' info='siteList'/>
</message>
365\end{verbatim}\end{footnotesize}\end{quote}
366When a collection is asked to describe itself, what is returned is all of the
367collection specific metadata and a list of services. For example, here is such
368a message, along with a sample response.
369
370\begin{quote}\begin{footnotesize}\begin{verbatim}
<message>
  <request lang='en' type='describe' to='demo'/>
373</message>
374
375<message>
376 <response lang='en' type='describe' from='demo' >
377 <collection name='demo'>
378 <serviceList>
379 <service name='TextQuery' type='query' />
        <service name='DocRetrieve' type='retrieve' />
        <service name='MetadataRetrieve' type='retrieve' />
382 </serviceList>
383 <metadataList>
384 <metadata name='numDocs'>321</metadata>
385 <metadata name='numSections'>5532</metadata>
386 <metadata name='title'>The demo collection</metadata>
387 <metadata name='aboutText'>This is a demo collection.</metadata>
388 </metadataList>
389 </collection>
390 </response>
391</message>
392\end{verbatim}\end{footnotesize}\end{quote}
393A {\em describe} request sent to a service returns a list of parameters that
394the service accepts, and describes the content type for the request and
395response.
396
397Parameters have the following format:
398\begin{quote}\begin{footnotesize}\begin{verbatim}
399<param name='xxx' type='integer|boolean|string' default='yyy'/>
<param name='xxx' type='enum_single|enum_multi' default='aa'>
401 <option name='aa'/><option name='bb'/>...
402</param>
403<param name='xxx' type='multi' occurs='4'>
404 <param .../>
405 <param .../>
406</param>
407\end{verbatim}\end{footnotesize}\end{quote}
408If no default is specified, the parameter is assumed to be mandatory.
409Here are some examples of parameters:
410\begin{quote}\begin{footnotesize}\begin{verbatim}
411<param name='Case' type='boolean' default='0'/>
412
413<param name='MaxDocs' type='integer' default='50'/>
414
<param name='Index' type='enum_single' default='dtx'>
416 <option name='dtx'/>
417 <option name='stt'/>
418 <option name='stx'/>
</param>
420
421<!-- this one is for the text box and field list for the simple field query-->
422<param name='simple' type='multi' occurs='4'>
423 <param name='fqv' type='string'/>
424 <param name='fqf' type='enum_single'>
425 <option name='TI'/><option name='AU'/><option name='OR'/>
426 </param>
427</param>
428
429\end{verbatim}\end{footnotesize}\end{quote}
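
To show how a client might use such a description, the following Python sketch merges the values a user supplied with the declared defaults and reports parameters that have no default and were not supplied. It is a simplified illustration under stated assumptions: the function name and the {\em query} parameter are hypothetical, and the nested {\em multi} type and multiple selections are not handled.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

def resolve_params(description_xml, supplied):
    """Merge supplied values with defaults from a <paramList> description."""
    resolved, missing = {}, []
    for param in ET.fromstring(description_xml).findall("param"):
        name = param.get("name")
        if name in supplied:
            value = supplied[name]
            options = [o.get("name") for o in param.findall("option")]
            if options and value not in options:
                raise ValueError("%s: %r is not one of %s" % (name, value, options))
            resolved[name] = value
        elif param.get("default") is not None:
            resolved[name] = param.get("default")
        else:
            missing.append(name)   # no default: treated as mandatory
    return resolved, missing

# 'query' is a hypothetical parameter added to show the mandatory case.
DESCRIPTION = """<paramList>
 <param name='case' type='boolean' default='0'/>
 <param name='maxDocs' type='integer' default='50'/>
 <param name='index' type='enum_single' default='dtx'>
  <option name='dtx'/><option name='stt'/><option name='stx'/>
 </param>
 <param name='query' type='string'/>
</paramList>"""

resolved, missing = resolve_params(DESCRIPTION, {"maxDocs": "10", "index": "stt"})
print(resolved)
print(missing)
\end{verbatim}\end{footnotesize}\end{quote}
In this run the {\em case} parameter falls back to its default, while {\em query} is reported as missing because it has neither a default nor a supplied value.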
430Here is a message, along with a sample response.
431\begin{quote}\begin{footnotesize}\begin{verbatim}
432<message>
433 <request lang='en' type='describe' to='demo/TextQuery'/>
434</message>
435
436<message>
437 <response lang='en' type='describe' from='demo/TextQuery' >
438 <service name='TextQuery' type='query'>
439 <paramList>
        <param name='matchDocs' type='integer' default='50'/>
        <param name='case' type='boolean' default='1'/>
        <param name='index' type='enum_single' default='tt'>
          <option name='tt'/>
          <option name='t0'/>
        </param>
      </paramList>
    </service>
447 </response>
448</message>
449\end{verbatim}\end{footnotesize}\end{quote}
450
451So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services.
452
453``Configure'' requests are used to tell the MessageRouter to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
454
455So far, we have {\em activate} and {\em deactivate} configure requests.
456Some examples are as follows.
457\begin{quote}\begin{footnotesize}\begin{verbatim}
458<message><request type='configure' to=''>
459<configure action='deactivate' type='collection' name='demo'/>
460</request></message>
461
462<message><request type='configure' to=''>
463<configure action='activate' type='collection' name='demo'/>
464</request></message>
465
466<message><request type='configure' to=''>
467<configure action='activate' type='serviceRack'
468 name='TranslationServices'/>
469</request></message>
470\end{verbatim}\end{footnotesize}\end{quote}
471
472The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first.
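
The activate and deactivate behaviour for collections can be summarised by the following Python sketch. It models only the bookkeeping described above; the class, its method names and the status strings are assumptions, and a plain dictionary stands in for a configured Collection module.
\begin{quote}\begin{footnotesize}\begin{verbatim}
class MessageRouterSketch:
    """Toy model of the module list and the cached collection list."""

    def __init__(self):
        self.modules = {}          # name -> module object
        self.collection_list = []  # names reported by a describe request

    def deactivate_collection(self, name):
        """Drop the module and its entry in the collection list."""
        removed = self.modules.pop(name, None) is not None
        if name in self.collection_list:
            self.collection_list.remove(name)
        return removed

    def activate_collection(self, name, factory):
        """Deactivate any existing module of this name, then create a new one."""
        self.deactivate_collection(name)
        module = factory(name)
        if module is None:
            return "error: could not configure %s" % name
        self.modules[name] = module
        self.collection_list.append(name)
        return "%s collection activated" % name

def make_collection(name):
    # Stand-in for creating and configuring a Collection module.
    return {"name": name}

router = MessageRouterSketch()
print(router.activate_collection("demo", make_collection))
first = router.modules["demo"]
router.activate_collection("demo", make_collection)  # reactivation replaces it
print(router.modules["demo"] is first)                # False
print(router.deactivate_collection("demo"), router.collection_list)
\end{verbatim}\end{footnotesize}\end{quote}
Activating twice leaves a single entry in the collection list, because the existing module is deactivated before the new one is added.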
473
The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:\footnote{This format is not yet properly defined.}
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message><response from='' type='configure'>
  <status>demo collection activated</status>
</response></message>
\end{verbatim}\end{footnotesize}\end{quote}
481
482Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
483
The main type of request in the system is a request for a service. There are several types of service: query, browse, retrieve, process, and applet. Query services do some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse services are for browsing lists or hierarchies of documents. Process services are those where the request is for a command to be run: a status code is returned immediately, and if the command has not finished, an update of the status can be requested later. Applet services are those that run an applet.
485
Other possibilities include transform, enrich, extract, and accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: ``accrete'' documents by adding them to a collection, ``transform'' the documents into a different format, ``extract'' information or acronyms from the documents, and ``enrich'' those documents with the information extracted or by adding new information. They may also be used during querying: ``transform'' a query before using it to query a collection, or ``transform'' the documents you get back into an appropriate form.
487
488The basic structure of a service request is as follows:
489\begin{quote}\begin{footnotesize}\begin{verbatim}
490<message>
491 <request lang='en' type='query' to='demo/TextQuery'>
492 <paramList/>
493 <content/>
494 </request>
495</message>
496\end{verbatim}\end{footnotesize}\end{quote}
497
The parameters are name/value pairs corresponding to the parameters specified in the service description that was sent in response to a describe request.
499
500\begin{quote}\begin{footnotesize}\begin{verbatim}
501<param name='case' value='1'/>
502<param name='maxDocs' value='34'/>
503<param name='index' value='dtx'/>
504\end{verbatim}\end{footnotesize}\end{quote}
505
Some requests have content. For document retrieval, the content is the list of documents to retrieve. For metadata retrieval, the content is the list of documents together with a list of the metadata to retrieve for each document.
507
508Responses vary depending on the type of request.
Responses to query requests contain content, which is the actual result, along with some metadata about the query.\footnote{is this called metadata or something else?} For instance, a text query on ``snail farming'' with the parameter {\em maxDocs=10} might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
510
511The following shows some example query requests and their responses.
512
513Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
514\begin{quote}\begin{footnotesize}\begin{verbatim}
515<message>
516 <request lang='en' to="mgppdemo/TextQuery" type="query">
517 <paramList>
518 <param name="maxDocs" value="10"/>
519 <param name="queryLevel" value="Section"/>
520 <param name="stem" value="1"/>
521 <param name="matchMode" value="some"/>
522 <param name="sortBy" value="natural"/>
523 <param name="index" value="t0"/>
524 <param name="case" value="0"/>
525 </paramList>
526 <content>snail</content>
527 </request>
528</message>
529\end{verbatim}\end{footnotesize}\end{quote}
530
531\begin{quote}\begin{footnotesize}\begin{verbatim}
532<message>
533 <response lang='en' from="mgppdemo/TextQuery" type="query">
534 <content>
535 <documentList>
536 <document name="HASH010f073f22033181e206d3b7"/>
537 <document name="HASH010f073f22033181e206d3b7.2"/>
538 <document name="HASHac0a04dd14571c60d7fbfd"/>
539 </documentList>
540 </content>
541 </response>
542</message>
543\end{verbatim}\end{footnotesize}\end{quote}
544
545Give me the Title metadata for these documents:
546\begin{quote}\begin{footnotesize}\begin{verbatim}
547<message>
548 <request lang='en' to="mgppdemo/MetadataRetrieve" type="retrieve">
549 <content>
550 <documentList>
551 <document name="HASH010f073f22033181e206d3b7"/>
552 <document name="HASH010f073f22033181e206d3b7.2"/>
553 <document name="HASHac0a04dd14571c60d7fbfd"/>
554 </documentList>
555 <metadataList>
556 <metadata name="Title"/>
557 </metadataList>
558 </content>
559 </request>
560</message>
561\end{verbatim}\end{footnotesize}\end{quote}
562
563\begin{quote}\begin{footnotesize}\begin{verbatim}
564<message>
565 <response lang='en' from="mgppdemo/MetadataRetrieve" type="retrieve">
566 <content>
567 <documentList>
568 <document name="HASH010f073f22033181e206d3b7">
569 <metadataList>
570 <metadata name="Title">Farming snails 1:
571Learning about snails; Building a pen; Food and shelter plants
572 </metadata>
573 </metadataList>
574 </document>
575 <document name="HASH010f073f22033181e206d3b7.2">
576 <metadataList>
577 <metadata name="Title">Learning about snails</metadata>
578 </metadataList>
579 </document>
580 <document name="HASHac0a04dd14571c60d7fbfd">
581 <metadataList>
582 <metadata name="Title">Farming snails 2:
583Choosing snails; Care and harvesting; Further improvement
584 </metadata>
585 </metadataList>
586 </document>
587 </documentList>
588 </content>
589 </response>
590</message>
591\end{verbatim}\end{footnotesize}\end{quote}
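
The two exchanges above are naturally chained: the document names returned by the query become the content of the metadata request. The Python sketch below illustrates this under the message layout shown in the examples; the helper names are hypothetical and no real transport to a MessageRouter is involved.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

# Copy of the sample TextQuery response shown earlier.
QUERY_RESPONSE = """<message>
 <response lang='en' from="mgppdemo/TextQuery" type="query">
  <content>
   <documentList>
    <document name="HASH010f073f22033181e206d3b7"/>
    <document name="HASH010f073f22033181e206d3b7.2"/>
    <document name="HASHac0a04dd14571c60d7fbfd"/>
   </documentList>
  </content>
 </response>
</message>"""

def document_names(response_xml):
    """Collect the document identifiers from a query response."""
    root = ET.fromstring(response_xml)
    return [d.get("name")
            for d in root.findall("response/content/documentList/document")]

def metadata_request(collection, names, fields, lang="en"):
    """Build a MetadataRetrieve request for the given documents and fields."""
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {
        "lang": lang,
        "to": collection + "/MetadataRetrieve",
        "type": "retrieve",
    })
    content = ET.SubElement(request, "content")
    doc_list = ET.SubElement(content, "documentList")
    for name in names:
        ET.SubElement(doc_list, "document", {"name": name})
    meta_list = ET.SubElement(content, "metadataList")
    for field in fields:
        ET.SubElement(meta_list, "metadata", {"name": field})
    return message

names = document_names(QUERY_RESPONSE)
request = metadata_request("mgppdemo", names, ["Title"])
print(ET.tostring(request, encoding="unicode"))
\end{verbatim}\end{footnotesize}\end{quote}
A client talking directly to the MessageRouter would have to perform this kind of assembly itself, whereas an Action does it on behalf of the Receptionist.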
592
593Give me the text for this document:
594\begin{quote}\begin{footnotesize}\begin{verbatim}
595<message>
596 <request lang='en' to="mgppdemo/DocumentRetrieve" type="retrieve">
597 <content>
598 <documentList>
599 <document name="HASH010f073f22033181e206d3b7.2"/>
600 </documentList>
601 </content>
602 </request>
603</message>
604\end{verbatim}\end{footnotesize}\end{quote}
605
606\begin{quote}\begin{footnotesize}\begin{verbatim}
607<message>
608 <response lang='en' from="mgppdemo/DocumentRetrieve" type="retrieve">
609 <content>
610 <document name="HASH010f073f22033181e206d3b7.2">
611 <content>
612&lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
613&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;11. To farm snails is not hard; however,
614it is quite different from keeping chickens or ducks or from growing crops
615such as maize, rice, cassava or groundnuts.&lt;/P&gt;
616&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
617&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;12. Since farming snails is so different
618from other kinds of farming, you will have to learn a lot of new things.
619&lt;/P&gt;....
620 </content>
621 </document>
622 </content>
623 </response>
624</message>
625\end{verbatim}\end{footnotesize}\end{quote}
626
627Build requests are not a request for data---they are a request for some action to be carried out, for example, create or import or build or activate a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back after a successful start of the command. The status may be polled by the requester to see how the process is going.
628
Build requests generally do not need any content; they just have a parameter list.\footnote{or is the collection the content?} As with any service, the parameters used by a build service can be obtained by sending a describe request to that service.
630
631Some example requests (note that the build services are grouped into a service cluster called 'build', hence the addresses all begin with 'build/'):
632
633\begin{quote}\begin{footnotesize}\begin{verbatim}
<message>
  <request lang='en' type='process' to='build/NewCollection'>
    <paramList>
      <param name='creator' value='[email protected]'/>
      <param name='collName' value='the demo collection'/>
      <param name='collShortName' value='demo'/>
    </paramList>
  </request>
</message>

<message>
  <request lang='en' type='process' to='build/ImportCollection'>
    <paramList>
      <param name='collection' value='demo'/>
    </paramList>
  </request>
</message>
651\end{verbatim}\end{footnotesize}\end{quote}
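
Build requests like these can be generated from a service name and a dictionary of parameters. The sketch below is only an illustration: the helper {\footnotesize \verb#build_process_request#} is a hypothetical name, and the message layout is copied from the examples above.

```python
import xml.etree.ElementTree as ET


def build_process_request(service, params, lang="en"):
    """Build a 'process' request addressed to a service in the 'build' cluster."""
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {
        "lang": lang,
        "type": "process",
        "to": f"build/{service}",
    })
    param_list = ET.SubElement(request, "paramList")
    # Each dictionary entry becomes one <param name='...' value='...'/> element.
    for name, value in params.items():
        ET.SubElement(param_list, "param", {"name": name, "value": value})
    return message
```

Serializing the result with {\footnotesize \verb#ET.tostring(message, encoding="unicode")#} yields a well-formed message, since the library writes the matching closing tags itself.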
652
653
654\subsection{Generating the pages}
655
656URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{subsec:url-type}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
657System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
658
659Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
660
661The basic page format is:
662\begin{quote}\begin{footnotesize}\begin{verbatim}
663<page>
664 <config/>
665 <display/>
666 <request/>
667 <response/>
668</page>
669\end{verbatim}\end{footnotesize}\end{quote}
670
There are four main elements in the page: config, display, request and response. The request is the original request that came into the Receptionist. It is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{This should instead be kept in some sort of saved state: if you leave a page and go back, you want your parameters to be the same as well.} The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by the XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (eg library); these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface; these are kept separate from the XSLT to allow for internationalization.
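
As an illustration of how such a page might be assembled, here is a small Python sketch. Only the four top-level element names come from the page format above. The child elements used inside config and display, and the helper name {\footnotesize \verb#build_page#}, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_page(config, display, request, response):
    """Assemble a <page>; request and response are existing XML elements."""
    page = ET.Element("page")
    config_elem = ET.SubElement(page, "config")
    for name, value in config.items():
        # Assumption: one child element per run-time variable.
        ET.SubElement(config_elem, name).text = value
    display_elem = ET.SubElement(page, "display")
    for name, text in display.items():
        # Assumption: each interface string is stored as <text name='...'>.
        ET.SubElement(display_elem, "text", {"name": name}).text = text
    page.append(request)
    page.append(response)
    return page
```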
672
673The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and display information, and the xslt files.
674
675\subsubsection{Page action}
676
677Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page. The page is
678transformed using {\em home.xsl\/}. For the 'about' page, a {\em
679describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata
680and a list of services, and the result is transformed using {\em about.xsl\/}.
681
682\subsubsection{Query action}
683
Three query services have been implemented: TextQuery, SimpleFieldQuery, and AdvancedFieldQuery. The query action handles them all in the same way.
For each page, the service description is requested from the query service of the current collection (via a {\em describe} request). This is done every time the query page is
displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as the case and stem options, the maximum number of documents to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request sent to the service. Otherwise, the submit button has been pressed, and a query request is sent to the query service (TextQuery, for example), with all the parameters from the URL placed in its parameter list. A list of document identifiers
is returned. A follow-up request is then sent to the MetadataRetrieve service of the collection: its content is the list of
documents, with a request for their {\em Title} metadata. The service description and query result are combined into a page of XML, which is
transformed using {\em basicquery.xsl\/} to produce the HTML page.
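
The sequence of requests can be summarized in a short Python sketch. It only plans which requests would be sent, as (address, type) pairs. The function name is hypothetical, and the request types used for the query and metadata steps are assumptions based on the other examples in this section.

```python
def plan_query_requests(collection, service, args):
    """Return the (address, type) pairs the query action would send, in order."""
    # The service description is always requested, to build the query form.
    requests = [(f"{collection}/{service}", "describe")]
    if args.get("rt") == "d":
        # Display only: the form is shown and nothing else is needed.
        return requests
    # Otherwise the form was submitted: run the query, then fetch Title metadata.
    requests.append((f"{collection}/{service}", "query"))
    requests.append((f"{collection}/MetadataRetrieve", "retrieve"))
    return requests
```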
690
691\subsubsection{Applet action}
692
693There are two types of request to the applet action: {\em a=a \& sa=d\/} and
694{\em a=a \& sa=r\/}. The value {\em sa=d\/} means ``display the applet.'' A
695{\em describe} request is sent to the service, which returns the {\footnotesize \verb#<applet>#} HTML element. The transformation file {\em applet.xsl} embeds this
696into the page, and the servlet returns the HTML.
697
698The value {\em sa=r} signals a request from the applet. The result is returned
699directly to the applet code, in XML. The other parameters are sent to the
700service untransformed, and the result is passed directly back to the applet.
701Applet action can therefore work with any applet whose service understands the
702messages.
703
704Here are two examples of requests generated by the Applet action, along with their corresponding responses.
705
706The first request corresponds to the URL arguments {\em a=a \&
707sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
708applet for the mgppdemo collection''.
709
710\begin{quote}\begin{footnotesize}\begin{verbatim}
711<message>
712 <request type='describe' to='mgppdemo/PhindApplet'/>
713</message>
714
715<message>
716 <response type='describe'>
717 <service name='PhindApplet' type='query'>
718 <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
719 jaxp.jar, xml-apis.jar'
720 CODE='org.greenstone.applet.phind.Phind.class'
721 CODEBASE='lib/java'
722 HEIGHT='400' WIDTH='500'>
723 <PARAM NAME='library' VALUE=''/>
724 <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
725 <PARAM NAME='collection' VALUE='mgppdemo' />
726 <PARAM NAME='classifier' VALUE='1' />
727 <PARAM NAME='orientation' VALUE='vertical' />
728 <PARAM NAME='depth' VALUE='2' />
729 <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
730 <PARAM NAME='backdrop' VALUE='interfaces/default/
731 images/phindbg1.jpg'/>
732 <PARAM NAME='fontsize' VALUE='10' />
733 <PARAM NAME='blocksize' VALUE='10' />
734 The Phind java applet.
735 </applet>
736 </service>
737 </response>
738</message>
739\end{verbatim}\end{footnotesize}\end{quote}
740
741The second request corresponds to the arguments {\em a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
742indicates a request to the service itself. The extra arguments (not a, sa, sn, c) are simply copied into the
743request as parameters. The response is in a form suitable for the applet, placed inside
744{\footnotesize \verb#<appletData>#} in a standard Greenstone message. AppletAction returns the
745contents of appletData to the browser, i.e. to the applet itself.
746
747\begin{quote}\begin{footnotesize}\begin{verbatim}
748<message>
749 <request type='query' to='mgppdemo/PhindApplet'>
750 <paramList>
751 <param name='pc' value='1'/>
752 <param name='pptext' value='health'/>
753 <param name='pfe' value='0'/>
754 <param name='ple' value='10'/>
755 <param name='pfd' value='0'/>
756 <param name='pld' value='10'/>
757 <param name='pfl' value='0'/>
758 <param name='pll' value='10'/>
759 </paramList>
760 </request>
761</message>
762
763<message>
764 <response type='query' from='mgppdemo/PhindApplet'>
765 <appletData>
766 <phindData df='9' ef='46' id='933' lf='15' tf='296'>
767 <expansionList end='10' length='46' start='0'>
768 <expansion df='4' id='8880' num='0' tf='59'>
769 <suffix> CARE</suffix>
770 </expansion>
771 ...
772 </expansionList>
773 <documentList end='10' length='9' start='0'>
774 <document freq='78' hash='HASH4632a8a51d33c47a75c559' num='0'>
775 <title>The Courier - N??159 - Sept- Oct 1996 Dossier Investing
776 in People Country Reports: Mali ; Western Samoa
777 </title>
778 </document>
779 ...
780 </documentList>
781 <thesaurusList end='10' length='15' start='0'>
782 <thesaurus df='7' id='12387' tf='15' type='RT'>
783 <phrase>PUBLIC HEALTH</phrase>
784 </thesaurus>...
785 </thesaurusList>
786 </phindData>
787 </appletData>
788 </response>
789</message>
790\end{verbatim}\end{footnotesize}\end{quote}
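
The copying of URL arguments into the request can be sketched in Python as follows. The function name is hypothetical, and forming the service address by appending ``Applet'' to the sn argument is an assumption inferred from the Phind example, not a documented rule.

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET

# Arguments consumed by the action itself; everything else is passed through.
RESERVED_ARGS = {"a", "sa", "sn", "c"}


def applet_request_from_query(query_string, service_suffix="Applet"):
    """Turn URL arguments into a 'query' request for the applet's service."""
    args = parse_qsl(query_string)
    lookup = dict(args)
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {
        "type": "query",
        # Assumption: sn=Phind maps to the service named PhindApplet.
        "to": f"{lookup['c']}/{lookup['sn']}{service_suffix}",
    })
    param_list = ET.SubElement(request, "paramList")
    for name, value in args:
        if name not in RESERVED_ARGS:
            ET.SubElement(param_list, "param", {"name": name, "value": value})
    return message
```

The parameters keep the order in which they appeared in the URL.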
791
792Note that the applet HTML may need to know the name of the {\em library}
793program. However, that name is chosen by the person who installed the software
794and will not necessarily be ``library''. To get around this, the applet can
795put a parameter called ``library'' into the applet data with a null value:
796\begin{quote}\begin{footnotesize}\begin{verbatim}
<PARAM NAME='library' VALUE=''/>
798\end{verbatim}\end{footnotesize}\end{quote}
799When the Applet action encounters this parameter it inserts the name of the
800current library servlet as its value.
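
A minimal Python sketch of this substitution is shown below. The function name is illustrative, and the real Applet action may implement the step differently.

```python
import xml.etree.ElementTree as ET


def fill_library_param(applet, library_name):
    """Set VALUE on the PARAM named 'library' when it has been left empty."""
    for param in applet.iter("PARAM"):
        if param.get("NAME") == "library" and not param.get("VALUE"):
            param.set("VALUE", library_name)
    return applet
```

Other parameters, and a library parameter that already has a value, are left untouched.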
801
802\subsubsection{Document action}
803
DocumentAction sends a request to the DocumentRetrieve service of the collection, asking for the text of the specified document. At this stage no additional information is obtained, but in future, information such as the Title and the
table of contents will be needed to improve the display.
806
807\subsubsection{Formatting the page using XSLT}\label{subsec:xslt}
808
Once the XML page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML or WML. A set of XSLT files defines an `interface'. Different users can change the look of their web pages by creating new XSLT files for a new interface. Just as there is a sites directory where different sites `live' (ie where their configuration files and collections are located), there is an interfaces directory where the different interfaces `live' (ie where their transforms and images are located). The default XSLT files are
located in interfaces/default/transforms. Collections, sites and other interfaces
can override these files by providing their own copies of the appropriate
files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transforms directory containing XSLT files. XSLT files are looked for in the following order: collection, site, current
interface, default interface.\footnote{This currently breaks down for remote sites and needs to be rethought.}
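
The search order can be expressed as a short Python sketch. The function name is hypothetical, and the location of a collection's transforms directory (inside the site's collect directory) is an assumption. The site and interface locations follow the directory layout used in this manual.

```python
from pathlib import Path


def find_transform(gsdl3_home, name, site, collection, interface):
    """Return the first matching XSLT file in override order, or None."""
    home = Path(gsdl3_home)
    candidates = [
        # Assumed location of collection-specific transforms.
        home / "sites" / site / "collect" / collection / "transforms",
        home / "sites" / site / "transforms",
        home / "interfaces" / interface / "transforms",
        home / "interfaces" / "default" / "transforms",
    ]
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None
```

A site that provides its own about.xsl therefore overrides the default interface's copy, while files it does not provide are still found in interfaces/default/transforms.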
814
815\subsection{Internationalization}
816
817Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
818
819Language specific text strings are specified in resource bundle property files. These live in resources/java.
820
There is one properties file per class, and one per interface. At the moment, we have
\begin{bulletedlist}
\item {\footnotesize \verb#GS2MGPPSearch.properties#}, {\footnotesize \verb#GS2MGPPRetrieve.properties#} etc., for the service classes
\item {\footnotesize \verb#interface_default.properties#}, for the default interface
\end{bulletedlist}

To add another language, create a properties file with the language code appended to the base name, eg {\footnotesize \verb#GS2MGPPSearch_fr.properties#} for French.

The interface properties files are treated differently from the others. The action does not know which text strings are needed by a particular transform, so it gets them all out of the properties file and puts them into an XML $<$display$>$ element; the XSLT can then pick out the ones it needs.
An open question is whether the XSLT should instead fetch the strings from the resource bundle on the fly, using Java extension elements.
832
All other class specific text strings are retrieved one by one as they are needed and added into the XML. For example, the names of the query parameters are retrieved when the service description is created.
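
The way a language specific file takes precedence over the base file can be illustrated with a simplified Python sketch. It mimics only the language and base-name steps of Java resource bundle lookup. The function names and the string keys in the usage example are invented for the illustration.

```python
def bundle_candidates(base_name, lang=None):
    """List properties file names from most to least specific."""
    names = []
    if lang:
        names.append(f"{base_name}_{lang}.properties")
    names.append(f"{base_name}.properties")
    return names


def lookup_string(bundles, base_name, key, lang=None):
    """Find a text string, falling back to the base file when needed.

    bundles maps a properties file name to a dict of its key/value pairs.
    """
    for file_name in bundle_candidates(base_name, lang):
        strings = bundles.get(file_name, {})
        if key in strings:
            return strings[key]
    return None
```

With this scheme a translation file only has to contain the strings that have actually been translated; anything missing is taken from the base file.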
834
835\subsection{Collection formation}
836
Greenstone 2 compatible building has been implemented in gsdl3. So far, only mgpp collections will work.

Collection construction can be done through the web, using the build service cluster in localsite: just sequence through the steps needed. So far, addDocument does not work, so documents need to be added manually to the import directory.

You need to carry out the following steps, in order:
\begin{bulletedlist}
\item NewCollection
\item add documents to the import directory (by hand)
\item ImportCollection
\item BuildCollection
\item ActivateCollection
\end{bulletedlist}

If you want anything other than the defaults in the config file, you need to add it by hand: there is currently no ConfigureCollection service which would enable you to do this.
849
Collection building can also be done on the command line:

\noindent {\footnotesize \verb#ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name>#}

\noindent For example:

\noindent {\footnotesize \verb#ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol#}

The options are passed on to the underlying script; there is no good help message yet.

The import and build modes use the Greenstone 2 scripts import.pl and buildcol.pl, so you can specify any of their options.
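
The full command line sequence can be generated with a small Python sketch. The function name, site path and creator address in the usage below are placeholders. The option names are those of the usage line above.

```python
def construct_commands(site_path, coll_name, creator):
    """Return argument lists for the new, import, build and activate steps."""
    base = ["ConstructCollection", "-site", site_path]
    commands = []
    for mode in ("new", "import", "build", "activate"):
        argv = base + ["-mode", mode]
        if mode == "new":
            # Only the 'new' step is shown taking a creator in the example.
            argv += ["-creator", creator]
        commands.append(argv + [coll_name])
    return commands
```

Each list can be handed to {\footnotesize \verb#subprocess.run#}. Remember that the documents have to be copied into the import directory after the first command and before the second.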
861
The building code is in src/java/org/greenstone/gsdl3/build.

CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses the Greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages are reached. You can add one or more listeners to the constructor, and they will be notified of these events.
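
The event mechanism is an instance of the listener (observer) pattern. The following Python sketch shows the idea only: the class and method names are loosely modelled on the Java class names above and do not reproduce the actual Greenstone API.

```python
class ConstructionEvent:
    """Describes one stage of the build and what happened to it."""

    def __init__(self, stage, status):
        self.stage = stage
        self.status = status


class Constructor:
    """Runs build stages and notifies registered listeners (illustrative only)."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        # A listener is any callable that accepts a ConstructionEvent.
        self._listeners.append(listener)

    def _fire(self, stage, status):
        event = ConstructionEvent(stage, status)
        for listener in self._listeners:
            listener(event)

    def run(self, stages):
        for stage in stages:
            self._fire(stage, "started")
            # The real work for the stage would happen here.
            self._fire(stage, "finished")
```

A listener could, for example, append each event to a log that is returned when the requester polls for the status of a long-running build.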
865
866\section{Details}
867
This section describes the directory structure of the Greenstone source, and provides a guide to installing Greenstone from CVS.
869
870\subsection{Directory structure}
871
The first part of Table~\ref{tab:dirs} shows the common material that can be shared between
Greenstone users: the source, libraries etc. These will eventually be installed into appropriate system directories. The second part shows
material used by one person or group: their sites, interface setup
etc. There can be several sites and interfaces per installation.
876
877\begin{table}
878\center{\footnotesize
879\begin{tabular}{l p{7cm}}
880\hline
881gsdl3
882 & The main installation directory---gsdl3home can be changed to something more standard\\
883gsdl3/src
884 & Source code lives here \\
885gsdl3/src/java/org/greenstone/gsdl3
886 & Contains the top level classes that either have main programs, or are server/servlet classes\\
887gsdl3/src/java/org/greenstone/gsdl3/core
888 & ModuleInterface, MessageRouter, Receptionist---the central classes that the others hang off\\
889gsdl3/src/java/org/greenstone/gsdl3/service
890 & The various service modules---these things do the work\\
891gsdl3/src/java/org/greenstone/gsdl3/util
892 & Utility classes \\
893gsdl3/src/java/org/greenstone/gsdl3/collection
894 & ServiceCluster and Collection classes\\
895gsdl3/src/java/org/greenstone/gsdl3/comms
896 & Communicator classes, eg SOAP\\
897gsdl3/src/java/org/greenstone/gsdl3/build
898 & stuff for collection building \\
899gsdl3/src/java/org/greenstone/gsdl3/action
900 & Action classes used by the Receptionist---do the work of displaying the pages\\
901gsdl3/src/java/org/greenstone/gsdl3/classes
902 & On compilation, the Java classes get put here---they can then be combined into a single jar file, and copied to the java lib directory \\
903gsdl3/src/java/org/greenstone/gdbm
904 & Java wrapper for gdbm---uses j-gdbm, a jni gdbm wrapper\\
905gsdl3/src/java/org/greenstone/testing
906 & Junit scaffolding for unit testing.\\
907gsdl3/src/cpp/
908 & Place for any cpp source code---none yet \\
909gsdl3/packages
910 & Imported packages from other systems eg mg, mgpp \\
911gsdl3/lib
912 & Shared library files\\
913gsdl3/lib/java
914 & Java jar files\\
915gsdl3/resources
916 & any resources that may be needed\\
917gsdl3/resources/java
918 & properties files for java resource bundles - used to handle all the language specific text\\
919gsdl3/bin
920 & executable stuff lives here\\
921gsdl3/bin/script
922 & some perl building scripts\\
923gsdl3/bin/linux
924 & linux executables for eg mgpp\\
925gsdl3/comms
926 & Put some stuff here for want of a better place---things to do with servers and communication. eg soap stuff, and tomcat servlet container\\
927gsdl3/docs
928 & Documentation :-)\\
929gsdl3/web
930 & The place to put any web stuff that the servlet needs. html files go here\\
931gsdl3/web/WEB-INF
932 & The web.xml file lives here (configuration information for tomcat)\\
933gsdl3/web/WEB-INF/classes
934 & Servlet classes go in here\\
935\hline
936gsdl3/sites
937 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (eg soap) to other sites\\
938gsdl3/sites/localsite
939 & One site\\
940gsdl3/sites/localsite/collect
941 & The collections directory \\
942gsdl3/sites/localsite/images
943 & Site specific images \\
944gsdl3/sites/localsite/transforms
945 & Site specific transforms \\
gsdl3/interfaces
 & Contains all interface specific stuff (eg images and XSLT transforms)\\
948gsdl3/interfaces/default
949 & The default interface\\
950gsdl3/interfaces/default/images
951 & The images\\
952gsdl3/interfaces/default/transforms
953 & The XSLT files\\
954\hline
955\end{tabular}}
\caption{The Greenstone directory structure}
\label{tab:dirs}
958\end{table}
959
960\subsection{Installation guide}
961
962\newcommand{\gsdlhome}{\begin{footnotesize}{\em \$GSDL3HOME}\end{footnotesize}}
963
Currently, Greenstone3 is only available through CVS. The installation procedure has been automated.
965
966\subsubsection{Get the source}
967
968\noindent If you have a greenstone\_cvs account, you can use the following:
969
970\begin{footnotesize}\begin{tt}
971\noindent export CVSROOT=:ext:{\em your-username}@cvs.scms.waikato.ac.nz:\\
972\indent /usr/local/global-cvs/gsdl-src\\
973export CVS\_RSH=ssh\\
974cvs co gsdl3\\
975\end{tt}\end{footnotesize}
976
977\noindent Otherwise, you can get it through anonymous access:
978
979\begin{footnotesize}\begin{tt}
980\noindent export CVSROOT=:pserver:cvs\[email protected]:2402\\
981\indent /usr/local/global-cvs/gsdl-src\\
982export CVS\_RSH=ssh\\
983cvs co gsdl3\\
984\end{tt}\end{footnotesize}
985
986\noindent If you need it, the password for anonymous CVS access is {\footnotesize \verb#anonymous#}.
987
988\subsubsection{Compile and install greenstone}\label{subsec:compile}
989
An install.bash script has been constructed (thanks, Stuart) to compile and install Greenstone 3. What you need to do is:

\begin{footnotesize}\begin{tt}
\noindent cd gsdl3\\
source setup.bash\\
install.bash\\
source setup.bash\\
\end{tt}\end{footnotesize}
998
If you want to do Greenstone 2 compatible building (currently the only type), you need to have Greenstone 2 installed. Run `source setup.bash' in the top level Greenstone 2 directory, then run `source setup.bash' again for Greenstone 3. This sets GSDLHOME for tomcat.
1000
1001\noindent Note: 'source setup.bash' needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc.
1002
1003If you want to use SOAP to talk to remote sites, you also need to do the following:
1004
1005\begin{footnotesize}\begin{tt}
1006install-soap.bash
1007\end{tt}\end{footnotesize}
1008
That's it.

Do not run install.bash twice: it adds entries to existing files.

To update your installation, you can run update.bash, which remakes all the Java code.
1014
1015
1016\subsubsection{The sample sites}
1017
1018\noindent There are two greenstone ``sites'' that come with the checkout: localsite, and site1. localsite has several collections, only two of which have any actual data. The third is a dummy collection. site1 has one dummy collection. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
1019localsite does not connect to any other sites. site1 specifies a SOAP connection to localsite.
1020
\noindent The collections which do not have data can be looked at, but you cannot run any queries on them.
1022
1023
1024\subsubsection{Tomcat}
1025
1026\noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.
1027\\
1028\\
\noindent The file \begin{footnotesize}{\tt \gsdlhome/web/WEB-INF/web.xml}\end{footnotesize} contains the setup information for tomcat: it tells tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
1030There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets, ``library'', which serves localsite, and ``library1'' which serves site1.
1031\\
1032\\
1033\noindent One initialisation parameter for the library servlets is {\footnotesize \verb#gsdl3home#}.
1034\begin{footnotesize}\begin{verbatim}
1035<init-param>
1036 <param-name>gsdl3home</param-name>
1037 <param-value>/research/kjdon/home/gsdl3</param-value>
1038</init-param>
1039\end{verbatim}\end{footnotesize}
1040
The file \gsdlhome/comms/tomcat/jakarta/conf/server.xml is the tomcat configuration file. setup.bash adds a context for the gsdl servlets: this tells tomcat where to find the web.xml file, and what URL (eg /gsdl3) to give it.

\noindent Note: tomcat runs on port 8080; you can change that in this file if you wish.
1044
1045\subsubsection{Serving your site using tomcat}\label{subsec:runtomcat}
1046
1047\noindent To run tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). Then,
1048
1049\begin{footnotesize}\begin{tt}
1050\noindent cd \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/bin\\
1051./startup.sh
1052\end{tt}\end{footnotesize}
1053
1054\noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat)
1055\\
1056\\
1057\noindent The tomcat server can be accessed on the web at {\footnotesize \verb#http://localhost:8080#}---this gets you to a welcome page.
1058The greenstone stuff is at {\footnotesize \verb#http://localhost:8080/gsdl3#}---this displays {\footnotesize \gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.
1059
\noindent Note: tomcat must be shut down and restarted for changes to any of the following to take effect:
\begin{bulletedlist}
\item \begin{footnotesize}{\tt \gsdlhome/web/WEB-INF/web.xml}\end{footnotesize}
\item \begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}\end{footnotesize}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
1069\begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/logs/catalina.out}\end{footnotesize}
1070
1071\subsubsection{Using SOAP to talk to a remote site}
1072
\noindent The installation described so far is fine if you only want to talk to local sites. However, if you want to connect to a remote site using SOAP, some more setting up is needed. site1 specifies a SOAP connection to localsite. If you run site1 without connecting to localsite, you can only see the local collections, eg the dummy collection myfiles. However, if you connect to localsite, you can see all of {\em its} collections as well.
1074\\
1075\\
1076\noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your service, and then deploy that service.
1077
This is done by install-soap.bash.
You can also deploy a service through the website. If tomcat is not running, start it up (see Section~\ref{subsec:runtomcat}).
1080
1081\noindent The SOAP servlet can be accessed at \begin{footnotesize}{\tt http://localhost:8080/soap}\end{footnotesize}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.
1082
1083\noindent To deploy the SOAPServer for localsite:
1084
1085\noindent Click on ``deploy'' and edit the following fields in the deploy form:
1086
1087\begin{tabular}{ll}
1088ID: & org.greenstone.localsite\\
1089Scope: (any will do) & Request---new instantiation for each request\\
1090 & Session---same instantiation across a session\\
1091 & Application---only uses one instantiation\\
1092Methods: &process\\
1093Java Provider / Provider Class: & org.greenstone.gsdl3.SOAPServer\\
1094\end{tabular}
1095
1096\noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the lefthand ``List'' button.
1097
\noindent Information about deployed services is maintained between tomcat sessions, so you only need to deploy a service once. To get the library1 servlet talking to the SOAP server, you need to shut down and restart tomcat (see Section~\ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.
1099
1100\subsubsection{Debugging SOAP}
1101
1102\noindent If you need to debug the SOAP stuff for some reason, or just want to look at the SOAP messages that are being passed back and forth, there is a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them to another port.
1103
1104\noindent To run it:
1105
1106\noindent {\footnotesize \verb#java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080#}
1107
\noindent tomcat uses port 8080, so you need to modify Greenstone to talk to port 8070 instead of 8080. The port is specified in the address of the {\footnotesize \verb#site#} element of the site configuration file.
1109\\
1110\\
1111\noindent eg, in \begin{footnotesize}{\tt \gsdlhome/sites/site1/siteConfig.xml}\end{footnotesize}:
1112\begin{footnotesize}\begin{verbatim}
1113<site name="org.greenstone.localsite"
1114 address="http://localhost:8080/soap/servlet/rpcrouter"
1115 type="soap"/>
1116\end{verbatim}\end{footnotesize}
1117
1118\noindent You can replace the 8080 with 8070 if you want to run TcpTunnelGui.
1119
1120\noindent Note that \begin{footnotesize}{\tt http://localhost:8080/soap/servlet/rpcrouter}\end{footnotesize} is the
1121address for talking to the tomcat SOAP servlet services.
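
Switching the address between the two ports is a simple URL rewrite. The Python sketch below shows one way to do it; the function name is illustrative, and it assumes the address contains a host name and no user information.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(address, port):
    """Return the address with its port replaced, keeping the rest intact."""
    parts = urlsplit(address)
    # Assumes a plain host[:port] network location.
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For the address above, {\footnotesize \verb#with_port(address, 8070)#} gives the value to put in the site element while TcpTunnelGui is running, and {\footnotesize \verb#with_port(address, 8080)#} restores the original.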
1122
1123
1124%\clearpage
1125%\addcontentsline{toc}{chapter}{Bibliography}
1126%\bibliography{main}
1127
1128\end{document}