1\documentclass[a4paper,11pt]{article}
2\usepackage{times,epsfig}
3\hyphenation{Message-Router Text-Query}
4
5\begin{document}
6
7\title{A modular digital library:\\
8 Architecture and implementation of Greenstone3}
9
10% if you work on this manual, add your name here
11\author{Katherine Don and Ian H. Witten \\[1ex]
12 Department of Computer Science \\
13 University of Waikato \\ Hamilton, New Zealand \\
14 \{kjdon, ihw\}@cs.waikato.ac.nz}
15
16\date{}
17
18\maketitle
19
20\newenvironment{bulletedlist}%
21{\begin{list}{$\bullet$}{\setlength{\itemsep}{0pt}\setlength{\parsep}{0pt}}}%
22{\end{list}}
23
24\noindent
25{\em \tiny This is intended to turn into a multipurpose document that
26\begin{bulletedlist}
27\item forms the basis of a JCDL paper submission
28\item fulfills our NERF pledge to produce a ``design document for Greenstone3''
29by December 2002 ...
30\item ... and a ``definition of internal and external interfaces for all major
31components (including API for external clients)'' by July 2003
32\item turns into a proper manual for Greenstone3
33\end{bulletedlist}
34}
35
36\noindent
37Greenstone Digital Library Version 3 is a complete redesign and
38reimplementation of the Greenstone digital library software. The current
39version (Greenstone2) enjoys considerable success and is being widely used.
40Greenstone3 will capitalize on this success, and in addition it will
41\begin{bulletedlist}
42\item improve flexibility, modularity, and extensibility
43\item lower the bar for ``getting into'' the Greenstone code with a view to
44 understanding and extending it
45\item use XML where possible internally to improve the amount of
46 self-documentation
47\item make full use of existing XML-related standards and software
48\item provide improved internationalization, particularly in terms of sort order,
49 information browsing, etc.
50\item include new features that facilitate additional ``content management''
51 operations
52\item operate on a scale ranging from personal desktop to corporate library
53\item easily permit the incorporation of text mining operations
54\item use Java, to encourage multilinguality, X-compatibility, and to permit
55 easier inclusion of existing Java code (such as for text mining).
56\end{bulletedlist}
Some parts of Greenstone (e.g.\ MG and MGPP) will remain in other languages; JNI (Java
Native Interface) will be used to communicate with these.
59
60\section{Architecture}
61
62A typical basic Greenstone3 digital library system is made up of a ``back
63end,'' which we call a digital library {\em site\/}, coupled to a ``front end''
64that provides the user interface. Figure 1 shows a simple stand-alone digital library with a web-based front end which communicates with a single site. In this simple example, the entire system is compiled together as a single executable. The point of contact with the back end is the MessageRouter (MR) module---all communication with the site occurs through this module.
65
66The digital library back end in Figure 1 contains two collections, {\em demo}
67and {\em myfiles\/}, and a cluster of collection-formation services. All
68functions of the digital library are called ``services.'' For example,
69AddDocument is a service that adds a document to a collection; ImportCollection
70imports into the Greenstone system all documents associated with a collection,
71converting them as necessary from their original form; BuildCollection builds
all indexes and browsing structures that are associated with a collection;
73ActivateCollection makes a newly-built collection active, so that it can be
74seen by digital library users. These particular services are related: they are
75all concerned with creating a digital library collection. Related
76services may be grouped together into a ``service cluster'': all these services are provided by
77the CollectionFormation ServiceCluster module in Figure 1.
78
79A collection, which as far as the digital library user is concerned is a
80focused group of documents with a uniform means of access, is a type of service
81cluster that groups a set of services that are related by the set of data they
82work on. For example, the {\em demo} collection in Figure 1 contains four
83services. These provide text searching, metadata searching,
84document retrieval, and browsing services to the user.
85
86The Web-based front end in Figure 1 centers around the
87Receptionist, which is the point of contact for the interface generator. A
88servlet takes HTTP commands (in the form of URLs and arguments) and translates
them into XML form for the Receptionist. The Receptionist can execute various
Actions, each of which involves one or (usually) many calls to the
digital library's MessageRouter.
92
93Figure 1 shows a very simple example of a digital library structure.
94In practice, there may be many digital library sites, possibly involving
95distributed computers. Each site will have a structure similar to that of the back end in Figure
961. Different sites may know about each other and can gain access to each other's
97collections by forwarding requests. There may also be different user
98interfaces to the library. Figure 1 shows a simple web-based interface, but
99other interfaces may exist, ranging from applets that display documents in
100different ways to alert services that note when new information becomes
101available in one of the collections and formulate email to users. Although in
102the simplest case the front and back ends are compiled together into
103one executable process, in general different MessageRouters will communicate
104amongst themselves, and with Receptionists, using a protocol.
105
106The following subsections elaborate on this structure.
107
108\subsection{Modular structure}
109
110Greenstone3 is made up of independent modules that communicate via a single
111method call:
112\begin{quote}
113 XMLout = process(XMLin);
114\end{quote}
115Both input and output are expressed in XML. This decision shifts attention
116from the design of an Applications Programming Interface (API) to the design of XML
117forms that encode the equivalent information. The advantage is modularization:
118the XML specifications can be modified locally and communication will proceed
119effectively according to the new scheme provided only that all affected modules
120are altered appropriately. Conversely, if an API is changed then all modules
121usually have to be recompiled to reflect the update.
122
123Modules are thought of as ``agents'' that have, or have access to, certain
124functionality. A module may respond to a message by processing it itself, or
125forwarding it to another module, or a combination of the two.\footnote{Francois
126used some nice words to tie up modules and agents. Kathy, can you remember
127what he was saying?}
128
129If modules are on different computers, the communication will take place using
130SOAP (Simple Object Access Protocol) (although other protocols are possible). Figure 2 shows a Greenstone system where the local site has no collections or services of its own. Instead, the MessageRouter (1 in the diagram) talks to two other sites using SOAP. The local MR has two Communicator modules
131 that enable it to make SOAP requests; the two remote sites each have a SOAP server which
132listens for such requests and fulfills them.
133
134A potential downside of expressing the programming interface structure in XML
135is execution efficiency. The input and output XMLin and XMLout in the above
136statement can be either a serialized String representation, which is the
137primary representation method, or a Document Object Model (DOM), which is a
138tree that represents the parsed XML string. Two versions of the processing
139operation will be provided, string to string and tree to tree.
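
The relationship between the two versions can be sketched in Java. This is
a minimal illustration under stated assumptions, not the Greenstone3 code:
the names {\em ModuleSketch} and {\em EchoModuleSketch} are hypothetical, and
the empty response is a placeholder rather than a real message format. The
sketch implements the string-to-string call once on top of the tree-to-tree
call, so that parsing and serialization happen only where a message actually
arrives as a string.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical module interface: one operation, two representations. */
interface ModuleSketch {
    /** Tree-to-tree version: works directly on the parsed message. */
    Document process(Document xmlIn);

    /** String-to-string version: parse, delegate, then serialize. */
    default String process(String xmlIn) {
        try {
            Document in = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlIn)));
            Document out = process(in);
            Transformer t =
                    TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            t.transform(new DOMSource(out), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot process message", e);
        }
    }
}

/** Toy module: answers any message with an empty response element. */
class EchoModuleSketch implements ModuleSketch {
    @Override
    public Document process(Document xmlIn) {
        try {
            Document out = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element message = out.createElement("message");
            // Echo the language of the incoming message.
            message.setAttribute("lang",
                    xmlIn.getDocumentElement().getAttribute("lang"));
            message.appendChild(out.createElement("response"));
            out.appendChild(message);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("cannot build response", e);
        }
    }
}
\end{verbatim}\end{footnotesize}\end{quote}
A module written this way only has to supply the tree-to-tree method; a
caller that already holds a DOM tree can invoke it directly and skip the
string round trip.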
140
141\subsection{Dynamic configurability}
142
143Digital libraries need to be dynamic. It must be possible to routinely add new
144collections, or new user interfaces, or completely new kinds of service, to a
145running digital library without having to bring it down and restart it.
146
147The digital library back end is built around a central MessageRouter module
148that provides a way of gaining access to any collection or service. When new
149collections come up, they register with the MessageRouter in order to make
150themselves visible throughout the system. When users make requests, they are
151passed to the MessageRouter, which forwards them to the appropriate module for
152processing. Requests are synchronous; the requesting process is blocked until
153the result is received. (An asynchronous-to-synchronous buffering module is
154envisaged if this should become necessary for certain purposes.)
155
156The most basic request, which any module will respond to, is
157``describe-yourself''. (In fact, the ability to respond to
158``describe-yourself'' is really what defines a ``module.'') The MessageRouter
159responds with an XML document which typically specifies some collections that
160are available locally, and some other Greenstone sites (their own collections
161may also be listed). Its response may also describe service clusters or single services provided by the
162MessageRouter itself, for example, cross-collection searching, or collection formation capability.
163
164A plain ``describe-yourself'' request will return a complete description. A
165``describe-yourself'' message sent to a collection returns collection-specific
166metadata, and a list of services that the collection provides. It is possible
167to add a qualifier to the request which asks for a particular facet of the
168complete description instead, thereby achieving communication economy.
169
170Using these facilities, it is possible for a user interface module to ask a
171MessageRouter for a list of local collections, remote sites and their
172collections, and for each collection a list of the services available. The XML
173documents containing this information could be amalgamated and presented to the
174user as an XML form that actually implements the services that are represented.
175
176\subsection{Interacting with the user}
177
178The MessageRouter, together with the services it provides access to, forms the
179core of the Greenstone digital library system. Clients could be written that
180call in a variety of ways upon the services that Greenstone provides.
181
182A very important form of client is one that implements user interaction with
183Greenstone3 through a Web browser, which is the standard way of communicating
184with the digital library system. The user makes a request by clicking a URL or
185submitting a Web form. This request is intercepted by a servlet which invokes
186a Greenstone module called a Receptionist. The Receptionist represents the
user's normal point of contact with the system: based on the input, it creates XML messages which it passes into the Greenstone system through the
MessageRouter. The responses are gathered together and translated into the form of
a Web page for presentation to the user.
190
191The Receptionist receives from the servlet an XML representation of
192the arguments in the URL (``CGI arguments'', though we do not use the
CGI mechanism). One of these arguments is the Action, which, along
with the Subaction argument, determines what information must be
requested from the MessageRouter to fulfill the request. Table 1 shows
a list of the actions that are understood by Greenstone2; Greenstone3
will have similar functionality.
198
199The Receptionist includes a Java class for each action. These classes do not
200know anything about the collections, services, or other sites that are
201available in the Greenstone system. Instead, they decode the other arguments in
202the URL to determine what information must be requested, and send it through
203the MessageRouter. A single action often generates several different requests:
204for example, to generate the traditional Greenstone home page, the PageAction must query the MessageRouter for a list of its collections. Then, for each collection, collection metadata such as the collection image and collection Title must be retrieved. The XML results returned by these requests are put together
205into one large XML tree, to which is appended system configuration and
206translation information. The resulting XML structure is converted, using XSLT
207files appropriate to that particular action, to an HTML page for presentation
208to the user.
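
The final conversion step can be illustrated with the XSLT support in the
standard Java library ({\em javax.xml.transform\/}). The sketch below is an
assumption-laden example rather than Greenstone3's code: the class name
{\em PageRenderSketch\/}, the shape of the page XML (a {\em page} element
holding {\em collection} elements) and the inline stylesheet are all
hypothetical, chosen only to show an XML tree being turned into HTML.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: render a combined XML result tree as an HTML page. */
class PageRenderSketch {
    // Hypothetical stylesheet: lists the collection names.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/page'>"
        + "<html><body><ul>"
        + "<xsl:for-each select='collection'>"
        + "<li><xsl:value-of select='@name'/></li>"
        + "</xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the page XML and returns the HTML. */
    static String render(String pageXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(
                        new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            t.transform(new StreamSource(new StringReader(pageXml)),
                        new StreamResult(html));
            return html.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot render page", e);
        }
    }
}
\end{verbatim}\end{footnotesize}\end{quote}
In a real system the stylesheet would presumably be read from a file chosen
according to the action, as the text describes, instead of being embedded in
the class.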
209
210Other types of client which do not use HTML may interact with the Receptionist. An output type specifier is included in each request to the Receptionist: using XSLT modes, different output formats may be generated such as XML or WML.
211
212\subsection{Digital library services}
213
214A digital library consists of several different ``collections,'' each
215represented by a collection module. For each collection, a set of ``services''
216is provided. Examples of services are
217\begin{bulletedlist}
218 \item full-text query
219 \item fielded query
220 \item music query
221 \item document retrieval
222 \item metadata retrieval
223 \item browsing classifier
224 \item hierarchical phrase browsing.
225\end{bulletedlist}
226
Services are provided by ``service modules'', each of which
implements a group of related operations. For example, one service module is MGPPGDBM,
which implements four operations: full-text and fielded queries, and document
and metadata retrieval. MGPPGDBM operates on collections that are in the
format of standard Greenstone2 collections, and provides these four services
for such collections. Another service module is GSDL2Classifier, which provides
operations that correspond to a browsing classifier. Together these two
modules allow a Greenstone2 collection to be used, completely unchanged, within
Greenstone3 (provided an appropriate configuration file is created).
236
237Service modules are self-describing modules: that is, they respond to the
238``describe-yourself'' message. As noted above, collections are also
239self-describing modules: they respond to ``describe-yourself'' by returning
240collection-specific metadata, and a list of services that the collection
241provides---which can then be queried individually using ``describe-yourself''
242messages. Thus a collection may be viewed as a cluster of services.
Greenstone3 also uses service clusters to represent things other than collections.
244For example, all the operations associated with building a particular kind of
245collection may be grouped together into a service cluster.
246
247\subsection{Data in the system}\footnote{I haven't discussed this with anyone yet, however I like it :-) actually now Rob likes it too. NOTE: if we keep this document-resource idea, need to change all the resource refs in this paper to document!!}
248
Data in the system consists of ``documents'' and ``resources''. A document is an XML document\footnote{What's a better word for a generic document, not a greenstone document ??} that exists independently in the system. You could delete all other documents and it would still be valid (although links to other documents may become invalid). A resource is something that is associated with a document, and doesn't exist outside of that document's context.
250
For example, a book that has been added to a collection will be represented by an XML document. The document contains metadata associated with the book, for example Title, Source, Author, etc. It has XLinks to associated resources or other documents. Any images in the book would be resources belonging to that document. The original representation of the book, e.g.\ the PDF file, would also be a resource of the document. There may be associated documents, such as the same book translated into a different language. This translation is a document in its own right, but is linked to by the original document.
252
Documents are indexed, but resources are not. This means that documents can be discovered through searching and browsing. Resources, on the other hand, can only be found via the containing document. Both can be retrieved. Documents are identified by a system id, e.g.\ HASHxxx. Resources are identified by a unique identifier. This is likely to be a file path, which could be appended to an HTTP address to enable retrieval of the resource via HTTP, or could be used as an identifier to request the resource from the site via XML messages.
254
255The content of the document need not be stored with the document---it may live in the compressed data files. The documents themselves may be stored compressed or in a database. Currently, in Greenstone2, the equivalent information is stored in a gdbm database.
256
257Documents don't just have to be books and text files. A collection could contain images---each image would have a document, and the content of the document would point to the image file.
A document could be a sequence of other documents, e.g.\ a PowerPoint show of individual slides.
A classifier is also a document: a hierarchical ordering by metadata of a set of documents into lists or categories.
260
261\subsection{Getting off the ground}
262
263We have described in broad terms the basic components of Greenstone3. It is a
264highly configurable system that allows new modules to be added while it is
265running---dynamic configuration. However, in order to get it off the ground,
266configuration files are used to define an initial configuration.
267
268A single computer system may have several different Greenstone systems
or ``sites'' running simultaneously, each of which typically serves
270different collections. For example, a single user may have a public
271Greenstone site which offers collections to external users over the
272web, as well as a private site that offers personal collections (like
273email) that cannot be accessed externally. Or in a multiuser research
274environment, each user may have one or more sites reflecting
275Greenstone collections, or additional facilities, in different stages
276of development.
277
278The computer system will have just one Greenstone directory structure,
279though this structure may support several different sites. Each site
280has a home directory in the Greenstone structure, inside which is a
281``collect'' directory that contains the collections offered by that site.
282
283The sites can be ``served'' in different ways. A servlet can be started up, which invokes a
284Receptionist and a MessageRouter. One of the arguments to
285the servlet is the site's home directory. This configuration has a client and server compiled together. The information in this site can then be accessed via the web. Alternatively, a SOAPServer could be started up, which just invokes a MessageRouter. Other Greenstone systems or clients can communicate with this site via SOAP. Greenstone is not limited to SOAP communication---any protocol which can transmit XML may be used to communicate between sites, or between clients and servers.
286
287For each site there is a configuration file that specifies the URI for the site
288(localSiteName), and a list of external sites that the site connects to. It
289may also specify any services or service clusters provided by the site that are not connected with
290a collection---for example, a language translation service. Collections are
291not specified in this configuration file; instead they are determined by the
292contents of the ``collect'' directory for the site. This allows new
293collections to be added dynamically by placing them in that directory.
294
295\section{Greenstone Implementation}
296\label{sec:impl}
297
298
299\subsection{classes etc??}
300
In general, a Greenstone module corresponds to a Java class. The Receptionist, Action, MessageRouter, Collection, and ServiceCluster modules are all Java classes. The exception is the service. Many services share operations, for example, access to the MGPP index files. For this reason, several services may be implemented by a single class, which we call a ServicesImpl class. For example, MGPPGDBMServices is a subclass of ServicesImpl which provides services that use the MGPP files and GDBM databases of a Greenstone2 collection: TextQuery, DocumentRetrieve and MetadataRetrieve. MGGDBMServices provides the same services, but uses MG and GDBM files from a Greenstone2 collection.
302
303\subsection{Configuring Greenstone}
304\label{subsec:config}
305
306Greenstone3 involves several different kinds of configuration files, all
307expressed in XML. Each site has a configuration file that binds parameters for
308the site, {\em siteConfig.xml}. Each collection has two configuration files, {\em collectionConfig.xml} and {\em buildConfig.xml\/}, that give metadata for the
309collection.\footnote{These replace {\em collect.cfg} and {\em build.cfg} in
310Greenstone2.} The first includes user-defined metadata for the collection,
311such as its name and the {\em About this collection} text; and also gives
312instructions on how the collection is to be built. The second is produced by
313the build-time process and includes any metadata that can be determined
314automatically.\footnote{Currently it is produced by hand, because collections must
315be built with Greenstone2.}
316
317\subsubsection{Site configuration file}
318
319The file {\em siteConfig.xml} specifies the URI for the site ({\em
320localSiteName\/}), any services or service clusters provided by the site that are not connected
321with a particular collection (for example, translation services), and a list of
322known external sites to connect to. Collections are not specified in the site
configuration file; instead they are determined by the contents of the site's
324collections directory.
325
326Here is a configuration file for a rudimentary site with no site-wide services,
327which does not connect to any external sites.
328\begin{quote}\begin{footnotesize}\begin{verbatim}
329<config>
330 <localSiteName value="org.greenstone.localsite"/>
331 <serviceClusterList/>
332 <servicesImplList/>
333 <siteList/>
334</config>
335\end{verbatim}\end{footnotesize}\end{quote}
336The following configuration file is for a site with one site-wide service, a
337translation service. It connects to the previous site using SOAP.
338\begin{quote}\begin{footnotesize}\begin{verbatim}
339<config>
340 <localSiteName value="org.greenstone.gsdl1"/>
341 <servicesImplList>
342 <servicesImpl name="TranslationServices"/>
343 </servicesImplList>
344 <serviceClusterList/>
345 <siteList>
346 <site name="org.greenstone.localsite"
347 address="http://localhost:8080/soap/servlet/rpcrouter"
348 type="soap"/>
349 </siteList>
350</config>
351\end{verbatim}\end{footnotesize}\end{quote}
352
353\subsubsection{Building configuration file}
354
355The file {\em buildConfig.xml} contains all metadata about the collection that can
356be determined automatically when building the collection, such as the number of
357documents it contains. It also includes a list of servicesImpl classes that are
358required at runtime to provide the services that have been built into the
359collection. The servicesImpl names are Java classes that are loaded
360dynamically at runtime. Any information inside the servicesImpl element is
361specific to that service---there is no set format. Here is an example:
362
363\begin{quote}\begin{footnotesize}\begin{verbatim}
<buildConfig>
365 <metadataList>
366 <metadata name="iconCollection">mgppdemo.gif</metadata>
367 <metadata name="colName">mgpp demo</metadata>
368 <metadata name="numDocs">5</metadata>
369 <metadata name="numSections">189</metadata>
370 </metadataList>
371 <servicesImplList>
372 <servicesImpl name="MGPPGDBMServices">
373 <defaultIndex name="tt"/>
374 <defaultLevel name="Section"/>
375 <levelList>
376 <level name="Document"/>
377 <level name="Section"/>
378 </levelList>
379 <indexList>
380 <index name="tt"/>
381 <index name="t0"/>
382 </indexList>
383 <metadataList>
384 <element name="Title"/>
385 <element name="Subject"/>
386 <element name="Organization"/>
387 <element name="URL"/>
388 </metadataList>
389 </servicesImpl>
390 <servicesImpl name="PhindServices"/>
391 <servicesImpl name="GSDL2ClassifierServices">
392 <classifierList>
393 <classifier name="CL1">
394 <metadataList>
395 <metadata name="Title">Subject</metadata>
396 </metadataList>
397 </classifier>
398 <classifier name="CL2" >
399 <metadataList>
400 <metadata name="Title">Title</metadata>
401 </metadataList>
402 </classifier>
403 </classifierList>
404 </servicesImpl>
405 </servicesImplList>
406</buildConfig>
407\end{verbatim}\end{footnotesize}\end{quote}
408Note: because {\em collectionConfig.xml} is not used yet, the {\em iconCollection}
409and {\em colName} metadata elements have been specified here.
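
Loading a class from a name found in a configuration file can be done with
Java reflection. The following is a minimal sketch, not the actual loader:
{\em ServicesLoaderSketch\/}, {\em ServicesImplSketch} and
{\em DemoServicesSketch} are hypothetical stand-ins, and the package prefix
is an assumption, since this manual does not say which package the
ServicesImpl classes live in.
\begin{quote}\begin{footnotesize}\begin{verbatim}
/** Sketch: load service classes by the names found in a config file. */
class ServicesLoaderSketch {
    /** Stand-in for the common ServicesImpl base class. */
    public static class ServicesImplSketch { }

    /** Stand-in for a concrete class such as MGPPGDBMServices. */
    public static class DemoServicesSketch extends ServicesImplSketch { }

    /**
     * Loads and instantiates the class named packagePrefix + name.
     * The prefix is an assumption made for this example.
     */
    static ServicesImplSketch load(String packagePrefix, String name) {
        try {
            Class<? extends ServicesImplSketch> cls =
                    Class.forName(packagePrefix + name)
                         .asSubclass(ServicesImplSketch.class);
            // Requires a no-argument constructor on the loaded class.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "cannot load servicesImpl " + name, e);
        }
    }
}
\end{verbatim}\end{footnotesize}\end{quote}
Because the class is resolved at runtime, a collection can name a service
class that did not exist when the rest of the system was compiled, provided
the class is on the classpath.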
410
411\subsubsection{Collection configuration file}
412
413The format of {\em collectionConfig.xml} has not yet been defined.
414
415\subsubsection{Starting up}
416
417We use the Tomcat web server, which operates either stand-alone in a test mode
418or in conjunction with the Apache web server. The Greenstone LibraryServlet
419class is loaded by Tomcat and the servlet's {\em init()} method is called. Each time a
420{\em get\/}/{\em put\/}/{\em post} (etc.) is used, a new thread is started and
421{\em doGet()\/}/{\em doPut()\/}/{\em doPost()} (etc.) is called.
422
423The {\em init()} method creates a new Receptionist and a new instance of the
424MessageRouter. The appropriate system variables are set in each (interface
425name, site name, etc.) and then {\em configure()} is called. A MessageRouter
426reference is given to the Receptionist. The servlet then communicates only with
427the Receptionist, not with the MessageRouter.
428
429The Receptionist loads up all the different Action classes. A
430static list is used initially, and other Actions may be loaded on the fly as needed.
431
432The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This
433lists the ServicesImpl classes that need to be loaded, and lists any sites that need
434to be connected to. It looks inside the {\em collect} directory which contains
435all the site's collections and loads up a Collection object for each valid
436collection found.
437
438The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml}
439files, determines the metadata, and loads ServicesImpl classes based on the
names specified in {\em buildConfig.xml\/}. Each {\footnotesize \verb#<servicesImpl>#} XML element is passed to the ServicesImpl object loaded for it, which uses the element's contents to configure itself.\footnote{Kathy, I don't
understand this sentence.}
442
443\section{System messages}
444
Once the system is up and running (that is, once the configuration
process described in Section~\ref{subsec:config} has been carried out), all modules communicate by passing messages back and forth.
First, we examine the basic message
formats, then how the system creates and responds to the messages.
449
450All messages are enclosed in
451\begin{quote}\begin{footnotesize}\begin{verbatim}
452<message lang='xx'>
453\end{verbatim}\end{footnotesize}\end{quote}
454The language attribute is used by the XSLT to determine the language currently
455being used by the user interface. Virtually all messages contain text strings,
456and services use this attribute to return strings in the appropriate language.
Requests are called {\footnotesize \verb#<request>#}, responses are called {\footnotesize \verb#<response>#}.
458A single message can hold several requests or responses.
459
460There are two different types of message, explained in the two subsections
461below. The first is a simple representation of the arguments in a Greenstone
462URL. It is a rudimentary message passed into the digital library system from
463outside. The response is a page of data, typically in HTML. All other messages
464are internal Greenstone messages, and have the same basic format.\footnote{We
465format names in lower case with the first letter of internal words capitalized,
like {\em matchDocs\/}.} They typically request one service or one action, and the response contains either the data requested, or a status message.
467
468This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
469
470\subsubsection{Servlet to Receptionist messages}\label{subsec:url-type}
471
Servlet to Receptionist messages are requests for a ``page'' of data: for example, the home page for a site, the query page for a collection, or the text of a document. They contain, in XML, a representation of the arguments in a
473Greenstone URL. The two main arguments are {\em a} (action) and {\em sa}
474(subaction).\footnote{The {\em sa} replaces Greenstone's old {\em p} arg for
475the page action, and is new for other actions. For example, a text query could
476be encoded as {\em a=q \& sa=text\/}.} All other arguments are treated as
477parameters.
478
479Here is the XML representation of the arguments:
480
481\begin{quote}\begin{footnotesize}\begin{verbatim}
482<request type='action' action='a-arg-value' subaction='sa-arg-value'
483 output='html'>
484 <paramList>
    <param name='xx' value='yyy'/>
486 <param name=...
487 </paramList>
488</request>
489\end{verbatim}\end{footnotesize}\end{quote}
The Receptionist routes the message to the appropriate action. The output
491field is used to indicate what type of output to return. The actions do not
492return responses in the normal format; instead they return a page of
493information, expressed by default in HTML. Alternative formats could be XML or WML.
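
The translation from URL arguments to this XML form is mechanical, as the
following Java sketch suggests. It is an illustration only: the class
{\em RequestBuilderSketch} is hypothetical, the fallback values for a
missing {\em a} or {\em sa} argument are assumptions, and the output type is
fixed to {\em html} for brevity.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.util.Map;

/** Sketch: turn decoded URL arguments into a request message. */
class RequestBuilderSketch {
    /** Escapes characters that are special in a quoted attribute. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;");
    }

    /** Builds the message; 'a' and 'sa' become attributes. */
    static String toMessage(Map<String, String> args, String lang) {
        // Assumed defaults when the URL names no action or subaction.
        String action = args.getOrDefault("a", "p");
        String subaction = args.getOrDefault("sa", "home");

        StringBuilder xml = new StringBuilder();
        xml.append("<message lang='").append(escape(lang)).append("'>");
        xml.append("<request type='action'")
           .append(" action='").append(escape(action)).append("'")
           .append(" subaction='").append(escape(subaction)).append("'")
           .append(" output='html'>");
        xml.append("<paramList>");
        for (Map.Entry<String, String> e : args.entrySet()) {
            String name = e.getKey();
            if (name.equals("a") || name.equals("sa")) {
                continue; // already encoded as attributes
            }
            xml.append("<param name='").append(escape(name))
               .append("' value='").append(escape(e.getValue()))
               .append("'/>");
        }
        xml.append("</paramList></request></message>");
        return xml.toString();
    }
}
\end{verbatim}\end{footnotesize}\end{quote}
Passing an insertion-ordered map such as {\em java.util.LinkedHashMap}
keeps the parameters in the order in which they appeared in the URL.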
494
495The LibraryServlet class communicates with the Receptionist, which is the entry
496point into the system. Future GUIs could communicate either with the
497Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
498
499The arguments used currently are shown in Table~\ref{tab:args}a.
500Other arguments can be specified by the particular service. For example, the
TextQuery service that the MGPPGDBMServices module provides uses the additional
502arguments shown in Table~\ref{tab:args}b.
503
504\begin{table}
505\center{\footnotesize
506\begin{tabular}{llll}
507\cline{2-4}
508(a) & \bf Action & \bf Argument & \bf Typical value \\
509\cline{2-4}
510& p (page) & sa & home, about \\
511& & c (collection) & demo, mgppdemo, ... \\
512& q (query) & sa & text, field, music\\
513& & c & demo, mgppdemo, ... \\
514& & q (query) & the \\
515& r (resource) & sa & (not used yet) \\
516& & c & demo, mgppdemo, ... \\
517& & r (resource) & HASH01af33...\\
518& a (applet) & sa & d (display), r (request) \\
519& & c & demo, mgppdemo, ... \\
520\cline{2-4}\\
521\cline{2-4}
522(b) & \bf Argument & \bf Values \\
523\cline{2-4}
524& s (stem) & 0, 1 \\
525& k (casefold) & 0, 1 \\
526& mm (matchMode) & all, some \\
527& sb (sortBy) & rank, natural \\
528& ql (queryLevel) & \multicolumn{2}{l}{Document, Section, Paragraph} \\
529& md (matchDocs) & 10, 20, ... \\
530\cline{2-4}
531\end{tabular}}
\caption{Arguments that can appear in a Greenstone URL: (a) generic;
(b) additional arguments for the TextQuery service}
\label{tab:args}
535\end{table}

Here is an example message that retrieves the home page in French:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='fr'>
  <request type='action' action='p' subaction='home' output='html'/>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

This message represents a text query:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='action' page='q/text' output='html'>
    <paramList>
      <param name='k' value='0'/>
      <param name='s' value='1'/>
      <param name='md' value='10'/>
      <param name='c' value='demo'/>
      <param name='q' value='the'/>
    </paramList>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
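As a rough illustration of how URL arguments map onto this kind of message,
the following Python sketch wraps a dictionary of decoded arguments in a
request element. It follows the attribute layout of the first example above
(action and subaction attributes). The function name, and the choice to pass
every remaining argument as a param, are assumptions made for illustration;
this is not the LibraryServlet's actual code.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

def args_to_message(args, lang="en", output="html"):
    """Wrap decoded URL arguments in an XML request message."""
    args = dict(args)  # copy, so the caller's dict is untouched
    message = ET.Element("message", {"lang": lang})
    request = ET.SubElement(message, "request", {
        "type": "action",
        "action": args.pop("a"),           # e.g. p, q, r or a
        "subaction": args.pop("sa", ""),   # e.g. home, text
        "output": output,
    })
    if args:  # remaining arguments become <param> elements
        param_list = ET.SubElement(request, "paramList")
        for name, value in args.items():
            ET.SubElement(param_list, "param",
                          {"name": name, "value": value})
    return message

msg = args_to_message({"a": "q", "sa": "text", "c": "demo", "q": "the"})
print(ET.tostring(msg, encoding="unicode"))
\end{verbatim}\end{footnotesize}\end{quote}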

\subsubsection{Module to module messages}

In Greenstone3's modular architecture, messages are used extensively to pass
information from one module to another, for example from an Action to the
MessageRouter module, and from that module to a service module. Requests have
a {\em to} attribute and responses have {\em from\/}. These are addresses used
by routing modules. For example, {\em to='site1/site2/demo/TextQuery'} routes a
message to a MessageRouter ({\em site1\/}), from there to another MessageRouter
({\em site2\/}), from there to a collection ({\em demo\/}), and from there to a
particular service ({\em TextQuery\/}).
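
The address handling described above can be sketched in a few lines of
Python: each routing module strips the leading component of the address and
forwards the message using the remainder. The function names are hypothetical
and the sketch only models the string handling, not the modules themselves.
\begin{quote}\begin{footnotesize}\begin{verbatim}
def next_hop(address):
    """Split an address into (first component, remainder)."""
    head, _, rest = address.partition("/")
    return head, rest

def route(address):
    """Return the sequence of modules a message passes through."""
    hops = []
    while address:
        head, address = next_hop(address)
        hops.append(head)
    return hops

print(route("site1/site2/demo/TextQuery"))
\end{verbatim}\end{footnotesize}\end{quote}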

Each request asks for a description of a single module, or requests a particular service. Unlike the first type of message, which requests predefined types of pages, these internal requests can ask for any functionality available in the system.

The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='describe' to=''/>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
If the {\em to} field is empty, the request is answered by the first module that it is passed to.
An example response from a MessageRouter might look like this:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <response type='describe'>
    <serviceList>
      <service name='CrossCollectionSearch' type='query' />
    </serviceList>
    <siteList>
      <site name='org.greenstone.gsdl1'
            address='http://localhost:8080/soap/servlet/rpcrouter'
            type='soap' />
    </siteList>
    <collectionList>
      <collection name='org.greenstone.gsdl1/
                        org.greenstone.gsdl2/fao' />
      <collection name='org.greenstone.gsdl1/demo' />
      <collection name='org.greenstone.gsdl1/fao' />
      <collection name='myfiles' />
    </collectionList>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
This MessageRouter has one site-wide service, a cross-collection searching service. It
communicates with one site, {\em org.greenstone.gsdl1\/}. It is aware of four
collections. One of these, {\em myfiles\/}, belongs to it; the other three are
available through the external site. One of those collections is actually from
a further external site.
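
A client might separate local from remote collections in such a response as
in the Python sketch below. It assumes, based only on the example above, that
a collection name containing a site prefix (a `/') is reached through another
site; the function name is hypothetical.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

RESPONSE = """
<message lang='en'>
  <response type='describe'>
    <collectionList>
      <collection name='org.greenstone.gsdl1/org.greenstone.gsdl2/fao'/>
      <collection name='org.greenstone.gsdl1/demo'/>
      <collection name='org.greenstone.gsdl1/fao'/>
      <collection name='myfiles'/>
    </collectionList>
  </response>
</message>
"""

def split_collections(message_xml):
    """Return (local, remote) collection names from a response."""
    root = ET.fromstring(message_xml)
    names = [c.get("name") for c in root.iter("collection")]
    local = [n for n in names if "/" not in n]
    remote = [n for n in names if "/" in n]
    return local, remote

local, remote = split_collections(RESPONSE)
print(local, remote)
\end{verbatim}\end{footnotesize}\end{quote}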

It is possible to ask just for a specific part of the information provided by a
describe request, rather than the whole message. For example, these two
messages get the {\em collectionList} and the {\em siteList} respectively:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='describe' to='' info='collectionList'/>
</message>

<message lang='en'>
  <request type='describe' to='' info='siteList'/>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
When a collection is asked to describe itself, what is returned is all of the
collection-specific metadata and a list of services. For example, here is such
a message, along with a sample response.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='describe' to='demo'/>
</message>

<message lang='en'>
  <response type='describe' from='demo' >
    <collection name='demo'>
      <serviceList>
        <service name='TextQuery' type='query' />
        <service name='DocRetrieve' type='query' />
        <service name='MetadataRetrieve' type='query' />
      </serviceList>
      <metadataList>
        <metadata name='numDocs'>321</metadata>
        <metadata name='numSections'>5532</metadata>
        <metadata name='title'>The demo collection</metadata>
        <metadata name='aboutText'>This is a demo collection.</metadata>
      </metadataList>
    </collection>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
A {\em describe} request sent to a service returns a list of parameters that
the service accepts, and describes the content type for the request and
response.

Parameters have the following format:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<param name='xxx' type='integer|boolean|string|input' default='yyy'/>
<param name='xxx' type='enum' default='aa'>
  <option name='aa'/><option name='bb'/>...
</param>
\end{verbatim}\end{footnotesize}\end{quote}
If no default is specified, the parameter is assumed to be mandatory.
Here are three examples of parameters:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<param name='Case' type='boolean' default='0'/>

<param name='MaxDocs' type='integer' default='50'/>

<param name='Index' type='enum' default='dtx'>
  <option name='dtx'/>
  <option name='stt'/>
  <option name='stx'/>
</param>
\end{verbatim}\end{footnotesize}\end{quote}
Here is a describe request sent to a service, along with a sample response.
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='describe' to='demo/TextQuery'/>
</message>

<message lang='en'>
  <response type='describe' from='demo/TextQuery' >
    <service name='TextQuery' type='query'>
      <paramList>
        <param name='matchDocs' type='integer' default='50'/>
        <param name='case' type='boolean' default='1'/>
        <param name='index' type='enum' default='tt'>
          <option name='tt'/>
          <option name='t0'/>
        </param>
      </paramList>
    </service>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
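A client can use such a description to fill in defaults before sending a
request. The Python sketch below merges user-supplied values with the
advertised defaults, treats a parameter without a default as mandatory, and
checks enum values against the listed options. The function name is
hypothetical, and the `query' parameter is added to the sample only to show
the mandatory case; it does not appear in the response above.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

DESCRIPTION = """
<service name='TextQuery' type='query'>
  <paramList>
    <param name='matchDocs' type='integer' default='50'/>
    <param name='case' type='boolean' default='1'/>
    <param name='index' type='enum' default='tt'>
      <option name='tt'/>
      <option name='t0'/>
    </param>
    <param name='query' type='string'/>  <!-- hypothetical -->
  </paramList>
</service>
"""

def resolve_params(description_xml, supplied):
    """Merge supplied values with defaults; reject bad values."""
    resolved = {}
    for param in ET.fromstring(description_xml).iter("param"):
        name = param.get("name")
        value = supplied.get(name, param.get("default"))
        if value is None:  # no default means mandatory
            raise ValueError("missing mandatory parameter: " + name)
        options = [o.get("name") for o in param.findall("option")]
        if options and value not in options:
            raise ValueError("invalid value for " + name)
        resolved[name] = value
    return resolved

print(resolve_params(DESCRIPTION, {"query": "snail", "index": "t0"}))
\end{verbatim}\end{footnotesize}\end{quote}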

So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services.

``Configure'' requests are used to tell the MessageRouter to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections, which is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.

So far, we have {\em activate} and {\em deactivate} configure requests.
Some examples are as follows.
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message><request type='configure' to=''>
  <configure action='deactivate' type='collection' name='demo'/>
</request></message>

<message><request type='configure' to=''>
  <configure action='activate' type='collection' name='demo'/>
</request></message>

<message><request type='configure' to=''>
  <configure action='activate' type='servicesImpl'
             name='TranslationServices'/>
</request></message>
\end{verbatim}\end{footnotesize}\end{quote}

The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified or newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and its information is added into the XML tree. The final request (re)activates the services provided by the servicesImpl class TranslationServices. The site config file is re-read, and the appropriate element is used to configure the new servicesImpl object. As for collections, if one already exists, it is deactivated first.

The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:\footnote{This format is not properly defined yet.}
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message><response from='' type='configure'>
  <status>demo collection activated</status>
</response></message>
\end{verbatim}\end{footnotesize}\end{quote}

Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
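
The activate and deactivate bookkeeping described above can be modelled
roughly as follows. This Python sketch is a toy stand-in and not the
MessageRouter implementation: the class, its attributes and the factory
argument are all hypothetical, and the collection list is a plain list of
names rather than an XML tree.
\begin{quote}\begin{footnotesize}\begin{verbatim}
class Router:
    """Toy model of the MessageRouter's module bookkeeping."""

    def __init__(self):
        self.modules = {}          # name -> module object
        self.collection_list = []  # names given in describe replies

    def deactivate(self, name):
        """Forget a module and its collection list entry, if any."""
        self.modules.pop(name, None)
        if name in self.collection_list:
            self.collection_list.remove(name)

    def activate(self, name, factory):
        """(Re)create a module; an existing one is dropped first."""
        self.deactivate(name)
        self.modules[name] = factory(name)
        self.collection_list.append(name)
        return name + " collection activated"

router = Router()
print(router.activate("demo", lambda name: {"name": name}))
\end{verbatim}\end{footnotesize}\end{quote}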

The main type of request in the system is a request for a service. There are different types of services: query, build\footnote{need new name?}, transform, enrich, extract, accrete. The two most common ones are build and query. Build is for collection formation; query is for the typical use of those collections: querying, browsing, retrieving documents. The other types of service generally enhance the functionality of the first two. They may be used during collection formation: `accrete' documents by adding them to a collection, `transform' the documents into a different format, `extract' information or acronyms from the documents, `enrich' those documents with the information extracted or by adding new information. They may also be used during querying: `transform' a query before using it to query a collection, or `transform' the documents you get back into an appropriate form.

`Query' requests are the most used requests in the system. They are requests for data of some kind, for example a list of the documents matching certain criteria, the Title and Author metadata for some specified documents, the text for a specified document, and so on. Each request has a content, and some parameters that specify modifications to the way the query is carried out. So the basic form of a query request is as follows:

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='query' to='demo/TextQuery'>
    <paramList/>
    <content/>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

The parameters are name-value pairs corresponding to parameters that were specified in the service description sent in response to a describe request. The value of the parameter can be given either as an attribute or as the content of the parameter element.
Attributes can be used for simple strings.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<param name='case' value='1'/>
<param name='maxDocs' value='34'/>
<param name='index' value='dtx'/>
\end{verbatim}\end{footnotesize}\end{quote}
or
\begin{quote}\begin{footnotesize}\begin{verbatim}
<param name='case'>1</param>
<param name='maxDocs'>34</param>
<param name='index'>dtx</param>
\end{verbatim}\end{footnotesize}\end{quote}
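A service that accepts both forms has to look in two places for the value.
One possible way to do this is sketched below in Python. The function name is
hypothetical, and preferring the attribute when both forms are present is an
assumption, since the text does not say which should win.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

def param_value(param):
    """Return a parameter's value from its attribute or content."""
    value = param.get("value")
    if value is None:  # fall back to the element's text content
        value = (param.text or "").strip()
    return value

a = ET.fromstring("<param name='maxDocs' value='34'/>")
b = ET.fromstring("<param name='maxDocs'>34</param>")
print(param_value(a), param_value(b))
\end{verbatim}\end{footnotesize}\end{quote}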

The content of the query is the actual query itself. For a text query, this is the query string. For an image or music query, it would be the image file or music clip. For document retrieval, the identifier of the document is the content.

Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on `snail farming' with the parameter `maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.

The following shows some example query requests and their responses.

Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request to="mgppdemo/TextQuery" type="query">
    <paramList>
      <param name="maxDocs" value="10"/>
      <param name="queryLevel" value="Section"/>
      <param name="stem" value="1"/>
      <param name="matchMode" value="some"/>
      <param name="sortBy" value="natural"/>
      <param name="index" value="t0"/>
      <param name="case" value="0"/>
    </paramList>
    <content>snail</content>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <response from="mgppdemo/TextQuery" type="query">
    <content>
      <resourceList>
        <resource name="HASH010f073f22033181e206d3b7"/>
        <resource name="HASH010f073f22033181e206d3b7.2"/>
        <resource name="HASHac0a04dd14571c60d7fbfd"/>
      </resourceList>
    </content>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

Give me the Title metadata for these documents:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request to="mgppdemo/MetadataRetrieve" type="query">
    <content>
      <resourceList>
        <resource name="HASH010f073f22033181e206d3b7"/>
        <resource name="HASH010f073f22033181e206d3b7.2"/>
        <resource name="HASHac0a04dd14571c60d7fbfd"/>
      </resourceList>
      <metadataList>
        <metadata name="Title"/>
      </metadataList>
    </content>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <response from="mgppdemo/MetadataRetrieve" type="query">
    <content>
      <resourceList>
        <resource name="HASH010f073f22033181e206d3b7">
          <metadataList>
            <metadata name="Title">Farming snails 1:
Learning about snails; Building a pen; Food and shelter plants
            </metadata>
          </metadataList>
        </resource>
        <resource name="HASH010f073f22033181e206d3b7.2">
          <metadataList>
            <metadata name="Title">Learning about snails</metadata>
          </metadataList>
        </resource>
        <resource name="HASHac0a04dd14571c60d7fbfd">
          <metadataList>
            <metadata name="Title">Farming snails 2:
Choosing snails; Care and harvesting; Further improvement
            </metadata>
          </metadataList>
        </resource>
      </resourceList>
    </content>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

Give me the text for this document:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request to="mgppdemo/ResourceRetrieve" type="query">
    <content>
      <resourceList>
        <resource name="HASH010f073f22033181e206d3b7.2"/>
      </resourceList>
    </content>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <response from="mgppdemo/ResourceRetrieve" type="query">
    <content>
      <resource name="HASH010f073f22033181e206d3b7.2">
        <content>
&lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;11. To farm snails is not hard; however,
it is quite different from keeping chickens or ducks or from growing crops
such as maize, rice, cassava or groundnuts.&lt;/P&gt;
&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;12. Since farming snails is so different
from other kinds of farming, you will have to learn a lot of new things.
&lt;/P&gt;....
        </content>
      </resource>
    </content>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

Build requests are not requests for data; they ask for some action to be carried out, for example to create, import, build or activate a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back once the command has started successfully. The requester may then poll the status to see how the process is going.

Build requests generally do not need a content; they just have a parameter list.\footnote{or is the collection the content?} As with any service, the parameters used by the service can be obtained by a describe request to that service.

Some example requests follow. Note that the build services are grouped into a service cluster called `build', hence the addresses all begin with `build/'.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message lang='en'>
  <request type='build' to='build/NewCollection'>
    <paramList>
      <param name='creator' value='[email protected]'/>
      <param name='collName' value='the demo collection'/>
      <param name='collShortName' value='demo'/>
    </paramList>
  </request>
</message>

<message lang='en'>
  <request type='build' to='build/ImportCollection'>
    <paramList>
      <param name='collection' value='demo'/>
    </paramList>
  </request>
</message>
\end{verbatim}\end{footnotesize}\end{quote}


\subsection{Generating the pages}

URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{subsec:url-type}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.

Once the data needed from the system has been accumulated, it is put into a `page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.

The basic page format is:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<page>
  <config/>
  <translate/>
  <request/>
  <response/>
</page>
\end{verbatim}\end{footnotesize}\end{quote}

There are four main elements in the page: config, translate, request and response. The request is the original request that came into the Receptionist. It is included so that any parameters can be preset to their previous values, for example the query options on the query form. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (e.g.\ library). These are needed to allow the XSLT to generate correct HTML URLs. Translate contains some of the text strings needed in the interface. These are kept separate from the XSLT to allow for internationalization.

The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and translate information, and the XSLT files.

\subsubsection{Page action}

Depending on the subaction argument, different pages can be generated. For the `home' page, a {\em describe} request is sent to the MessageRouter. This returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a {\em describe} request. This metadata is added into the previous result, which is then added into the page. The page is
transformed using {\em home.xsl\/}. For the `about' page, a {\em
describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata
and a list of services, and the result is transformed using {\em about.xsl\/}.

\subsubsection{Query action}

Currently, only text query has been implemented.
For each page, the service description is requested from the TextQuery service of the current collection (via a describe request). This is done every time the query page is
displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case folding, stemming, and the maximum number of documents to return. If there is no query
string specified in the URL, the request was for the blank query page and only this information is needed.
If there is a query string specified, i.e.\ the user has entered a query, a query request is sent to the TextQuery service. This has the query string as content, and all the parameters from the URL in the parameter list. A list of document identifiers
is returned. A follow-up query is sent to the MetadataRetrieve service of the collection: the content includes the list of
documents, with a request for their {\em Title} metadata. The result is
transformed using {\em textquery.xsl\/}.

\subsubsection{Applet action}

There are two types of request to the applet action: {\em a=a \& sa=d\/} and
{\em a=a \& sa=r\/}. The value {\em sa=d\/} means ``display the applet.'' A
{\em describe} request is sent to the service, which returns the {\footnotesize \verb#<applet>#} HTML element. The transformation file {\em applet.xsl} embeds this
into the page, and the servlet returns the HTML.

The value {\em sa=r} signals a request from the applet. The result is returned
directly to the applet code, in XML. The other parameters are sent to the
service untransformed, and the result is passed directly back to the applet.
Applet action can therefore work with any applet whose service understands the
messages.

Here are two examples of requests generated by the Applet action, along with their corresponding responses.

The first request corresponds to the URL arguments {\em a=a \&
sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
applet for the mgppdemo collection''.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message>
  <request type='describe' to='mgppdemo/PhindApplet'/>
</message>

<message>
  <response type='describe'>
    <service name='PhindApplet' type='query'>
      <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
                       jaxp.jar, xml-apis.jar'
              CODE='org.greenstone.applet.phind.Phind.class'
              CODEBASE='lib/java'
              HEIGHT='400' WIDTH='500'>
        <PARAM NAME='library' VALUE=''/>
        <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
        <PARAM NAME='collection' VALUE='mgppdemo' />
        <PARAM NAME='classifier' VALUE='1' />
        <PARAM NAME='orientation' VALUE='vertical' />
        <PARAM NAME='depth' VALUE='2' />
        <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
        <PARAM NAME='backdrop' VALUE='interfaces/default/
                                      images/phindbg1.jpg'/>
        <PARAM NAME='fontsize' VALUE='10' />
        <PARAM NAME='blocksize' VALUE='10' />
        The Phind java applet.
      </applet>
    </service>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}

The second request corresponds to the arguments {\em a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}. This
indicates a request to the service itself. The extra arguments (those other than a, sa, sn and c) are simply copied into the
request as parameters. The response is in a form suitable for the applet, placed inside
{\footnotesize \verb#<appletData>#} in a standard Greenstone message. AppletAction returns the
contents of appletData to the browser, i.e.\ to the applet itself.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<message>
  <request type='query' to='mgppdemo/PhindApplet'>
    <paramList>
      <param name='pc' value='1'/>
      <param name='pptext' value='health'/>
      <param name='pfe' value='0'/>
      <param name='ple' value='10'/>
      <param name='pfd' value='0'/>
      <param name='pld' value='10'/>
      <param name='pfl' value='0'/>
      <param name='pll' value='10'/>
    </paramList>
  </request>
</message>

<message>
  <response type='query' from='mgppdemo/PhindApplet'>
    <appletData>
      <phindData df='9' ef='46' id='933' lf='15' tf='296'>
        <expansionList end='10' length='46' start='0'>
          <expansion df='4' id='8880' num='0' tf='59'>
            <suffix> CARE</suffix>
          </expansion>
          ...
        </expansionList>
        <documentList end='10' length='9' start='0'>
          <document freq='78' hash='HASH4632a8a51d33c47a75c559' num='0'>
            <title>The Courier - N??159 - Sept- Oct 1996 Dossier Investing
              in People Country Reports: Mali ; Western Samoa
            </title>
          </document>
          ...
        </documentList>
        <thesaurusList end='10' length='15' start='0'>
          <thesaurus df='7' id='12387' tf='15' type='RT'>
            <phrase>PUBLIC HEALTH</phrase>
          </thesaurus>...
        </thesaurusList>
      </phindData>
    </appletData>
  </response>
</message>
\end{verbatim}\end{footnotesize}\end{quote}
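The way the Applet action turns URL arguments into such a request can be
sketched as follows. This Python fragment is an illustration only: the
function name is hypothetical, and the rule that the service address is the
collection name followed by the {\em sn} value plus `Applet' is inferred from
the single Phind example above.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import xml.etree.ElementTree as ET

RESERVED = {"a", "sa", "sn", "c"}  # consumed by the action itself

def applet_request(args):
    """Build the service request for an sa=r call."""
    to = args["c"] + "/" + args["sn"] + "Applet"  # inferred rule
    message = ET.Element("message")
    request = ET.SubElement(message, "request",
                            {"type": "query", "to": to})
    param_list = ET.SubElement(request, "paramList")
    for name, value in args.items():
        if name not in RESERVED:  # copy extra arguments unchanged
            ET.SubElement(param_list, "param",
                          {"name": name, "value": value})
    return message

applet_msg = applet_request({"a": "a", "sa": "r", "sn": "Phind",
                             "c": "mgppdemo", "pc": "1",
                             "pptext": "health"})
print(ET.tostring(applet_msg, encoding="unicode"))
\end{verbatim}\end{footnotesize}\end{quote}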

Note that the applet HTML may need to know the name of the {\em library}
program. However, that name is chosen by the person who installed the software
and will not necessarily be ``library''. To get around this, the applet can
put a parameter called ``library'' into the applet data with a null value:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<PARAM NAME='library' VALUE=''/>
\end{verbatim}\end{footnotesize}\end{quote}
When the Applet action encounters this parameter it inserts the name of the
current library servlet as its value.

\subsubsection{Resource action}

ResourceAction sends a query to the ResourceRetrieve service of the collection requesting the text of the specified document. At this stage no additional information is obtained, but in future, information such as the Title and
table of contents will be needed to improve the display.

\subsubsection{Formatting the page using XSLT}\label{subsec:xslt}

Once the XML page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML or WML. A set of XSLT files defines an `interface'. Different users can change the look of their web pages by creating new XSLT files for a new `interface'. Just as we have a sites directory where different sites `live' (i.e.\ where their configuration file and collections are located), we have an interfaces directory where the different interfaces `live' (i.e.\ where their transforms and images are located). The default XSLT files are
located in interfaces/default/transforms. Collections, sites and other interfaces
can override these files by having their own copy of the appropriate
files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
interface, default interface.
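
The lookup order can be expressed as a search through a list of directories,
returning the first copy of the stylesheet that exists. In the Python sketch
below, only interfaces/default/transforms is a path given in this manual; the
other three directories and the function name are placeholders chosen for
illustration.
\begin{quote}\begin{footnotesize}\begin{verbatim}
import os

def find_stylesheet(name, search_dirs):
    """Return the first existing copy of a stylesheet, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Priority order from the text: collection, site, current
# interface, default interface. The first three paths are
# placeholders, not paths defined by Greenstone3.
search_dirs = [
    "sites/localsite/collect/demo/transforms",
    "sites/localsite/transforms",
    "interfaces/custom/transforms",
    "interfaces/default/transforms",
]
print(find_stylesheet("home.xsl", search_dirs))
\end{verbatim}\end{footnotesize}\end{quote}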

\subsection{Internationalization}

Internationalization is a big part of Greenstone3. Language-specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.

At the moment:\footnote{this may change soon, so I haven't `nice'd this text yet}

Language-specific text strings are specified as XML files, named by
the language code, e.g.\ en.xml, fr.xml.

They are located in interfaces/translate. This assumes one set of
language files per system setup. (Or should they be site/interface
specific?)

A Translate class is used to hold the XML for the languages. The
Receptionist has a Translate object. It sets the default language to
`en'; the current language is whatever the {\em lang} attribute of a
message specifies.

The translation object internally holds DOM trees for the languages it
has loaded. It has a mapping between language name and the tree. When
the default language is set, the appropriate XML file is read in and
parsed into a DOM tree.

A call to getLanguageTree(lang) returns a DOM element of the form:

\begin{quote}\begin{footnotesize}\begin{verbatim}
<translate>
<current><text>.. the actual text elems...</text></current>
<default><text>.. the actual text elems...</text></default>
</translate>
\end{verbatim}\end{footnotesize}\end{quote}
If the specified lang has not been loaded yet, it will be read into
memory. Only languages which have been asked for are loaded into
memory, but once loaded, they stay there. We will need to see how much
memory this requires once we use full language files. It may be
necessary to limit the number of cached languages, or perhaps to hold
only two in memory and read the others in from file again when a new one is asked for.
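
One way to bound the cache, should memory become a problem, is to keep only
the most recently used language trees. The Python sketch below illustrates
that option with a least-recently-used policy. It is not the Translate class:
the class name, the loader callback and the limit of two are assumptions made
for the example.
\begin{quote}\begin{footnotesize}\begin{verbatim}
from collections import OrderedDict

class LanguageCache:
    """Load language trees on demand; keep the most recent few."""

    def __init__(self, loader, max_languages=2):
        self.loader = loader  # callable: language code -> tree
        self.max_languages = max_languages
        self.trees = OrderedDict()

    def get(self, lang):
        if lang in self.trees:
            self.trees.move_to_end(lang)  # mark as recently used
        else:
            self.trees[lang] = self.loader(lang)
            if len(self.trees) > self.max_languages:
                # drop the least recently used language
                self.trees.popitem(last=False)
        return self.trees[lang]

loads = []  # records which languages were read from file

def fake_loader(lang):
    loads.append(lang)
    return "<tree for " + lang + ">"

cache = LanguageCache(fake_loader)
for lang in ["en", "fr", "en", "de", "fr"]:
    cache.get(lang)
print(loads)
\end{verbatim}\end{footnotesize}\end{quote}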

The XML files start with the {\footnotesize \verb#<text>#} element. The elements are
organized hierarchically. An example is the following.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<text>
<common>
<nzdl>New Zealand Digital Library</nzdl>
<aboutpage>about page</aboutpage>
<search>Search</search>
<browse>Browse</browse>
<applet>Applet</applet>
<home>HOME</home>
<on>on</on>
<off>off</off>
</common>
<query>
<queryoptions>Query Options:</queryoptions>
<params><case><name>Case differences:</name>
<on>ignore case differences</on>
<off>upper/lower case must match</off></case>
<stem><name>Word endings:</name>
<on>ignore word endings</on>
<off>whole word must match</off></stem>
<sortBy><name>Sort results by:</name>
<rank>rank</rank><natural>none</natural></sortBy>
<maxDocs><name>Maximum number of documents to return:</name></maxDocs>
<matchMode><name>Match mode:</name>
<all>all</all><some>some</some></matchMode>
<queryLevel><name>Level:</name><Section>Section</Section>
<Document>Document</Document></queryLevel></params>
<beginsearch>Begin Search</beginsearch>
</query>
</text>
\end{verbatim}\end{footnotesize}\end{quote}
Most of the text strings will be specified by the main XML files, but
some will come from the services/collections. In this case, the lang
attribute of the message will indicate which language text to return.

Text strings can be added to the HTML output in two ways. In the XSLT, we
know which text strings are needed, e.g.\ `home' for the home link. The
home string is in common/home, so we get the text by calling the text template
with common/home as a param:

\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:call-template name='text'>
<xsl:with-param name='key'>common/home</xsl:with-param>
</xsl:call-template>
\end{verbatim}\end{footnotesize}\end{quote}
If we want to specify text strings in the XML result (rather than in the
XSLT; would we want to do this?), we can use
{\footnotesize \verb#<text key='common/home'/>#}.
{\footnotesize \verb#<xsl:apply-templates select='text'/>#} must then be used when
processing the parent node.

The template is shown below. Basically, it looks for an appropriate
element in the current language tree, and if it is not found, it looks
in the default language tree.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:template name='text' match='text'>
<xsl:param name='key'><xsl:value-of select='@key'/></xsl:param>
<!-- try the current language -->
<xsl:variable name='path1'>
ancestor::page/translate/current/text/<xsl:value-of select='$key'/>
</xsl:variable>
<xsl:variable name='string1'><xsl:value-of
select='java:org.apache.xalan.lib.Extensions.evaluate($path1)'/>
</xsl:variable>
<xsl:choose><xsl:when test='boolean(string($string1))'>
<xsl:value-of select='$string1'/></xsl:when>
<xsl:otherwise>
<!-- try the default language -->
<xsl:variable name='path2'>
ancestor::page/translate/default/text/<xsl:value-of select='$key'/>
</xsl:variable>
<xsl:value-of select=
'java:org.apache.xalan.lib.Extensions.evaluate($path2)'/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
\end{verbatim}\end{footnotesize}\end{quote}
1180
1181
\subsection{Collection formation}


There is no facility to create collections in GSDL3 yet. There are three
working servicesImpl classes: MGPPGDBMServices, GSDL2ClassifierServices and
PhindServices. These use standard collections built with MGPP and gdbm from
GSDL2. For PhindServices, you need to add `classify phind' to the collect.cfg
file during building. For GSDL2ClassifierServices, you need to have any other
classifiers specified.

To use a collection in GSDL3, build it using mgpp in the old Greenstone
(see mgpp\_in\_greenstone.txt in the mgpp/docs directory of either
Greenstone installation).

Then copy the collection over into the appropriate collect directory,
and create index/buildConfig.xml (see Section~\ref{subsec:config}). The basic
information that you need is shown below. Substitute the appropriate values
for your collection. Only include the PhindServices entry if you have a phind
classifier.

\begin{quote}\begin{footnotesize}\begin{verbatim}
<buildConfiguration>
  <metadataList>
    <metadata name="iconCollection">mgppdemo.gif</metadata>
    <metadata name="colName">mgpp demo</metadata>
  </metadataList>
  <servicesImplList>
    <servicesImpl name="MGPPGDBMServices">
      <defaultIndex name="tt"/>
      <defaultLevel name="Section"/>
    </servicesImpl>
    <servicesImpl name="PhindServices"/>
    <servicesImpl name="GSDL2ClassifierServices">
      <classifierList>
        <classifier name="CL1">
          <metadataList>
            <metadata name="Title">Subject</metadata>
          </metadataList>
        </classifier>
        <classifier name="CL2">
          <metadataList>
            <metadata name="Title">Title</metadata>
          </metadataList>
        </classifier>
      </classifierList>
    </servicesImpl>
  </servicesImplList>
</buildConfiguration>
\end{verbatim}\end{footnotesize}\end{quote}

\section{Details}

This section describes the directory structure of the Greenstone source, and provides a guide to installing Greenstone from CVS.

\subsection{Directory structure}

The first part of Table~\ref{tab:dirs} shows the common material that can be shared between
Greenstone users: the source, libraries, etc. These will eventually be installed into appropriate system directories. The second part shows
material used by one person or group: their sites, interface setup,
etc. There can be several sites and interfaces per installation.

\begin{table}
\centering
{\footnotesize
\begin{tabular}{l p{7cm}}
\hline
gsdl3
 & The main installation directory; gsdl3home can be changed to something more standard\\
gsdl3/src
 & Source code lives here \\
gsdl3/src/java/org/greenstone/gsdl3
 & Contains the top-level classes that either have main programs or are server/servlet classes\\
gsdl3/src/java/org/greenstone/gsdl3/core
 & ModuleInterface, MessageRouter, Receptionist: the central classes that the others hang off\\
gsdl3/src/java/org/greenstone/gsdl3/service
 & The various service modules, which do the work\\
gsdl3/src/java/org/greenstone/gsdl3/util
 & Utility classes \\
gsdl3/src/java/org/greenstone/gsdl3/collection
 & Collection class\\
gsdl3/src/java/org/greenstone/gsdl3/comms
 & Communicator classes, e.g.\ SOAP\\
gsdl3/src/java/org/greenstone/gsdl3/action
 & Action classes used by the Receptionist; they do the work of displaying the pages\\
gsdl3/src/java/org/greenstone/gsdl3/classes
 & On compilation, the Java classes get put here; they can then be combined into a single jar file and copied to the Java lib directory \\
gsdl3/src/java/org/greenstone/gdbm
 & Java wrapper for gdbm; uses j-gdbm, a JNI gdbm wrapper\\
gsdl3/src/java/org/greenstone/testing
 & JUnit scaffolding for unit testing\\
gsdl3/src/cpp/
 & Place for any C++ source code (none yet) \\
gsdl3/packages
 & Imported packages from other systems, e.g.\ mg, mgpp \\
gsdl3/lib
 & Shared library files\\
gsdl3/lib/java
 & Java jar files\\
gsdl3/comms
 & Things to do with servers and communication, e.g.\ SOAP and the tomcat servlet container (placed here for want of a better place)\\
gsdl3/docs
 & Documentation :-)\\
gsdl3/web
 & The place to put any web material that the servlet needs; HTML files go here\\
gsdl3/web/WEB-INF
 & The web.xml file lives here (configuration information for tomcat)\\
gsdl3/web/WEB-INF/classes
 & Servlet classes go in here\\
\hline
gsdl3/sites
 & Contains directories for different sites. A site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (e.g.\ SOAP) to other sites\\
gsdl3/sites/localsite
 & One site\\
gsdl3/sites/localsite/collect
 & The collections directory \\
gsdl3/sites/localsite/images
 & Site-specific images \\
gsdl3/sites/localsite/transforms
 & Site-specific transforms \\
gsdl3/interfaces
 & Contains all interface-specific material (e.g.\ images and XSLT transforms)\\
gsdl3/interfaces/default
 & The default interface\\
gsdl3/interfaces/default/images
 & The images\\
gsdl3/interfaces/default/transforms
 & The XSLT files\\
gsdl3/interfaces/translate
 & Language-specific material; the language XML files containing all the text strings go here\\
\hline
\end{tabular}}
\caption{The Greenstone directory structure}
\label{tab:dirs}
\end{table}

\subsection{Installation guide}

\newcommand{\gsdlhome}{\begin{footnotesize}{\em \$GSDL3HOME}\end{footnotesize}}

Currently, Greenstone3 is only available through CVS, and the installation procedure has not been automated. Eventually, all that should be needed is a {\footnotesize \verb#configure, make, make install#} sequence. For now, all the steps must be done by hand.

\subsubsection{Get the source}

\noindent If you have a greenstone\_cvs account, you can use the following:

\begin{footnotesize}\begin{tt}
\noindent export CVSROOT=:ext:{\em your-username}@cvs.scms.waikato.ac.nz:\\
\indent /usr/local/global-cvs/gsdl-src\\
export CVS\_RSH=ssh\\
cvs co gsdl3\\
\end{tt}\end{footnotesize}

\noindent Otherwise, you can get it through anonymous access:

\begin{footnotesize}\begin{tt}
\noindent export CVSROOT=:pserver:cvs\[email protected]:2402\\
\indent /usr/local/global-cvs/gsdl-src\\
export CVS\_RSH=ssh\\
cvs co gsdl3\\
\end{tt}\end{footnotesize}

\noindent If you need it, the password for anonymous CVS access is {\footnotesize \verb#anonymous#}.
\\
\\
\noindent You also need to download the mgpp code, which comes in a separate CVS module.

\noindent The gsdl3 module in CVS still contains a leftover mgpp directory in gsdl3/packages that could not be removed, so you need to delete it before checking out mgpp.

\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/packages\\
rm -r mgpp\\
cvs co mgpp\\
\end{tt}\end{footnotesize}

\subsubsection{Compile and install greenstone}\label{subsec:compile}

\noindent From here on, \gsdlhome\ is the absolute path to the top-level directory of the gsdl3 checkout,
for example /research/kjdon/gsdl3.
\\
\\
\noindent First, set up your classpath:\\
\begin{footnotesize}\begin{tt}
cd \gsdlhome\\
source setup.bash
\end{tt}\end{footnotesize}

\noindent Note: this step needs to be done once in each terminal window before running make or tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#}, etc.
\\
\\
\noindent Compile mgpp:\\
\begin{footnotesize}\begin{tt}
cd \gsdlhome/packages/mgpp\\
./configure --prefix \gsdlhome\\
make\\
make install\\
\end{tt}\end{footnotesize}

\noindent Note: you need to pass \gsdlhome\ as the prefix to mgpp's configure at this stage. mgpp has been set up properly for configure and install, but gsdl3 has not yet.

\noindent Next you need to compile greenstone.

\noindent A jar file from tomcat is used during compilation, so tomcat must be unpacked first.
\begin{footnotesize}\begin{tt}
cd \gsdlhome/comms/tomcat/\\
tar xzvf jakarta-tomcat-4.0.1.tar.gz \\
\end{tt}\end{footnotesize}
\\
\\
\noindent Do a \verb#make#, then a \verb#make install#, in each of the following directories:\\
\begin{footnotesize}\begin{tt}
\gsdlhome/src/java/org/greenstone/gdbm\\
\gsdlhome/src/java/org/greenstone/testing\\
\gsdlhome/src/java/org/greenstone/gsdl3\\
\gsdlhome/src/java/org/greenstone/applet/phind
\end{tt}\end{footnotesize}

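\noindent The four builds listed above can be run in one go. The following is only a sketch and is not part of the Greenstone checkout: the function name {\footnotesize \verb#build_gsdl3_java#} and the optional {\footnotesize \verb#MAKE#} override are our own assumptions. It takes the absolute path of the gsdl3 checkout as its argument and stops at the first directory that fails.

\begin{footnotesize}\begin{verbatim}
# Hypothetical helper (not shipped with gsdl3): run "make" and then
# "make install" in each of the four source directories, in order.
# $1 = absolute path of the gsdl3 checkout.
# MAKE may name an alternative make program (our own convention).
build_gsdl3_java() {
    for dir in gdbm testing gsdl3 applet/phind; do
        ( cd "$1/src/java/org/greenstone/$dir" \
            && "${MAKE:-make}" \
            && "${MAKE:-make}" install ) || return 1
    done
}
\end{verbatim}\end{footnotesize}

\noindent For example, after sourcing setup.bash: {\footnotesize \verb#build_gsdl3_java /research/kjdon/gsdl3#}
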
\subsubsection{Set up the sample sites}

\noindent There are two greenstone ``sites'' that come with the checkout: localsite and site1. localsite has three collections: two have actual data, and the third is a dummy collection. site1 has one dummy collection. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
localsite does not connect to any other sites. site1 specifies a SOAP connection to localsite.

\noindent The collections which do not have data can be viewed, but you cannot run any queries on them.

\noindent The data comes in tar files, which need to be unpacked:

\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/sites/localsite/collect/mgppdemo/index/\\
tar xzvf mgpp-indexfiles.tar.gz\\
cd ../../chinesedemo/index\\
tar xzvf chinese-index-files.tar.gz\\
\end{tt}\end{footnotesize}

\subsubsection{Set up tomcat}

\noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.
\\
\\
\noindent The file \begin{footnotesize}{\tt \gsdlhome/web/WEB-INF/web.xml}\end{footnotesize} contains the setup information for tomcat: it tells tomcat which servlets to load, what initial parameters to pass them, and which web names map to the servlets.
There are three servlets specified in web.xml. One is a test servlet that just prints ``hello greenstone'' to a web page; this is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets: ``library'', which serves localsite, and ``library1'', which serves site1.
\\
\\
\noindent One initialisation parameter for the library servlets is {\footnotesize \verb#gsdl3home#}.
\begin{footnotesize}\begin{verbatim}
<init-param>
  <param-name>gsdl3home</param-name>
  <param-value>/research/kjdon/home/gsdl3</param-value>
</init-param>
\end{verbatim}\end{footnotesize}

\noindent You need to replace {\footnotesize \verb#/research/kjdon/home/gsdl3#} with the correct path for \gsdlhome, in both library servlet entries.
\\
\\
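\noindent The substitution of the sample path in web.xml can also be scripted. The sketch below is an illustration only: the function name {\footnotesize \verb#set_gsdl3home_param#} is hypothetical, and it assumes that each {\footnotesize \verb#param-value#} element sits on a single line as shown above and that your path contains no {\footnotesize \verb#|#} or {\footnotesize \verb#&#} characters.

\begin{footnotesize}\begin{verbatim}
# Hypothetical helper: replace the sample gsdl3home value in web.xml.
# $1 = path to web.xml, $2 = absolute path of your gsdl3 checkout.
# Every line holding the sample value is rewritten, so both library
# servlet entries are updated in one pass.
set_gsdl3home_param() {
    old='<param-value>/research/kjdon/home/gsdl3</param-value>'
    new="<param-value>$2</param-value>"
    sed "s|$old|$new|" "$1" > "$1.tmp" \
        && cat "$1.tmp" > "$1" \
        && rm -f "$1.tmp"
}
\end{verbatim}\end{footnotesize}

\noindent For example: {\footnotesize \verb#set_gsdl3home_param web/WEB-INF/web.xml /home/me/gsdl3#} (the path here is made up).
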
\noindent Next, symbolic links to the sites, interfaces and lib directories need to be set up. This enables tomcat to `see' files in these directories.

\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/web\\
ln -s ../interfaces\\
ln -s ../sites\\
ln -s ../lib
\end{tt}\end{footnotesize}

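\noindent If you expect to repeat this step, the links can be created by a small function that skips any that already exist. This is a sketch under our own assumptions: the name {\footnotesize \verb#link_web_dirs#} is hypothetical, and the function takes the absolute path of the gsdl3 checkout as its only argument.

\begin{footnotesize}\begin{verbatim}
# Hypothetical helper: create web/interfaces, web/sites and web/lib as
# relative symbolic links to the directories one level up.
# $1 = absolute path of the gsdl3 checkout.
# A name that already exists (as a link or otherwise) is left untouched.
link_web_dirs() {
    for name in interfaces sites lib; do
        [ -e "$1/web/$name" ] || [ -L "$1/web/$name" ] \
            || ln -s "../$name" "$1/web/$name" \
            || return 1
    done
}
\end{verbatim}\end{footnotesize}

\noindent Running it a second time is harmless, which is the only difference from typing the three {\footnotesize \verb#ln -s#} commands by hand.
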
\noindent The test servlet needs to be compiled (you need to set up your {\footnotesize CLASSPATH} first if you haven't already; see Section~\ref{subsec:compile}):\\
\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/web/WEB-INF/classes\\
javac TestServlet.java
\end{tt}\end{footnotesize}
\\
\\
\noindent Next, one of the scripts that runs tomcat needs to be altered to use our {\footnotesize CLASSPATH}.

\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1
\end{tt}\end{footnotesize}
\\
\\
\noindent Edit {\footnotesize \verb#bin/catalina.sh#}:

\noindent On line 89, add {\footnotesize \$CLASSPATH} to the start of the {\footnotesize CP="...."} line, i.e.\ {\footnotesize CP="\$CLASSPATH:..."}. This
sets up the classpath properly.
\\
\\
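\noindent The same edit to catalina.sh can be made with sed rather than by hand. The sketch below makes assumptions that should be checked against your copy of the script: it matches any line that begins (after optional indentation) with {\footnotesize \verb#CP="#} instead of relying on the line number, and the function name {\footnotesize \verb#add_classpath_to_cp#} is hypothetical. Lines that already start with {\footnotesize \verb#CP="$CLASSPATH:#} are left alone.

\begin{footnotesize}\begin{verbatim}
# Hypothetical helper: prepend $CLASSPATH to the CP="..." assignment.
# $1 = path to catalina.sh.
# The file is rewritten in place via cat so that its executable
# permission is preserved.
add_classpath_to_cp() {
    sed -e '/^[[:space:]]*CP="\$CLASSPATH:/b' \
        -e 's/^\([[:space:]]*CP="\)/\1$CLASSPATH:/' \
        "$1" > "$1.tmp" \
        && cat "$1.tmp" > "$1" \
        && rm -f "$1.tmp"
}
\end{verbatim}\end{footnotesize}

\noindent For example, from the jakarta-tomcat-4.0.1 directory: {\footnotesize \verb#add_classpath_to_cp bin/catalina.sh#}
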
\noindent Now you need to tell tomcat about the greenstone context:
\\
\\
\noindent Edit {\footnotesize \verb#conf/server.xml#}:

\noindent You need to add a context for the gsdl servlets. There are other context elements in the XML; this one goes at the same level as those.\\
Add the following (substituting the correct path for \gsdlhome):

\begin{footnotesize}\begin{tt}
\noindent <!-- GSDL3 Service -->\\
<Context path="/gsdl3" docBase="\gsdlhome/web" debug="1" reloadable="true"/>
\end{tt}\end{footnotesize}

\noindent Note: tomcat runs on port 8080; you can change this in the same file if you wish.

\subsubsection{Serving your site using tomcat}\label{subsec:runtomcat}

\noindent To run tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see Section~\ref{subsec:compile}). Then,

\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/bin\\
./startup.sh
\end{tt}\end{footnotesize}

\noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat)
\\
\\
\noindent The tomcat server can be accessed on the web at {\footnotesize \verb#http://localhost:8080#}, which shows a welcome page.
The greenstone pages are at {\footnotesize \verb#http://localhost:8080/gsdl3#}, which displays {\footnotesize \gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.

\noindent Note: tomcat must be shut down and restarted for changes to any of the following to take effect:
\begin{bulletedlist}
\item {\footnotesize\tt \gsdlhome/web/WEB-INF/web.xml}
\item {\footnotesize\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
\begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/logs/catalina.out}\end{footnotesize}

\subsubsection{Using SOAP to talk to a remote site}

\noindent The installation steps so far are sufficient if you only want to talk to local sites. However, if you want to connect to a remote site using SOAP, some more setup is needed. site1 specifies a SOAP connection to localsite. If you run site1 without connecting to localsite, you can only see site1's own collections, e.g.\ the dummy collection myfiles. However, if you connect to localsite, you can see all of {\em its} collections as well.
\\
\\
\noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your service, and then deploy that service.
\\
\\
\noindent Set up SOAP:
\\
\\
\begin{footnotesize}\begin{tt}
\noindent cd \gsdlhome/comms/soap\\
tar xzvf soap-bin-2.2.tar.gz
\end{tt}\end{footnotesize}
\\
\\
\noindent The context for the SOAP servlet needs to be added to the tomcat server.xml file in the same way that you added the context for gsdl3:

\noindent Edit \begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}\end{footnotesize}

\noindent Add the following (substituting the correct path for \gsdlhome):

\begin{footnotesize}\begin{tt}
\noindent <!-- SOAP Service -->\\
<Context path="/soap" docBase="\gsdlhome/comms/soap/soap-2\_2/webapps/soap"\\
debug="1" reloadable="true"/>
\end{tt}\end{footnotesize}
\\
\\
\noindent Next, the class SOAPServer must be altered. Its constructor is not allowed any arguments, so the path of the site to be served is hard-coded in it. In \begin{footnotesize}{\tt \gsdlhome/src/java/org/greenstone/gsdl3/SOAPServer.java}\end{footnotesize}, you need to change the {\footnotesize \verb#site_home#} variable to \begin{footnotesize}{\tt \gsdlhome/sites/localsite}\end{footnotesize} (using the absolute path).
\\
\\
\noindent The SOAPServer service now needs to be deployed. If tomcat is not running, start it up (see Section~\ref{subsec:runtomcat}).

\noindent The SOAP servlet can be accessed at \begin{footnotesize}{\tt http://localhost:8080/soap}\end{footnotesize}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.

\noindent To deploy the SOAPServer for localsite:

\noindent Click on ``deploy'' and edit the following fields in the deploy form:

\begin{tabular}{ll}
ID: & org.greenstone.localsite\\
Scope: (any will do) & Request: new instantiation for each request\\
 & Session: same instantiation across a session\\
 & Application: only uses one instantiation\\
Methods: & process\\
Java Provider / Provider Class: & org.greenstone.gsdl3.SOAPServer\\
\end{tabular}

\noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left-hand ``List'' button.

\noindent Information about deployed services is maintained between tomcat sessions, so you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shut down and restart tomcat (see Section~\ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.

\subsubsection{Debugging SOAP}

\noindent If you need to debug the SOAP communication, or just want to look at the SOAP messages that are being passed back and forth, there is a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them on to another port.

\noindent To run it:

\noindent {\footnotesize \verb#java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080#}

\noindent This listens on port 8070 and forwards to port 8080 on localhost, which is the port tomcat uses. You then need to modify greenstone to talk to port 8070 instead of 8080. The port is part of the address given in the {\footnotesize \verb#site#} element of the site configuration file.
\\
\\
\noindent For example, in \begin{footnotesize}{\tt \gsdlhome/sites/site1/siteConfig.xml}\end{footnotesize}:
\begin{footnotesize}\begin{verbatim}
<site name="org.greenstone.localsite"
      address="http://localhost:8080/soap/servlet/rpcrouter"
      type="soap"/>
\end{verbatim}\end{footnotesize}

\noindent You can replace the 8080 with 8070 if you want to run TcpTunnelGui.

\noindent Note that \begin{footnotesize}{\tt http://localhost:8080/soap/servlet/rpcrouter}\end{footnotesize} is the
address for talking to the tomcat SOAP servlet services.

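\noindent Switching the site address between the tomcat port and the TcpTunnelGui port can be scripted. The following sketch assumes the {\footnotesize \verb#address#} attribute has the form shown above; the function name {\footnotesize \verb#set_soap_port#} is hypothetical.

\begin{footnotesize}\begin{verbatim}
# Hypothetical helper: change the port in SOAP site addresses of the
# form address="http://localhost:PORT/soap/...".
# $1 = path to siteConfig.xml, $2 = current port, $3 = new port.
set_soap_port() {
    sed "s|\(address=\"http://localhost:\)$2\(/soap/\)|\1$3\2|" \
        "$1" > "$1.tmp" \
        && cat "$1.tmp" > "$1" \
        && rm -f "$1.tmp"
}
\end{verbatim}\end{footnotesize}

\noindent For example, {\footnotesize \verb#set_soap_port sites/site1/siteConfig.xml 8080 8070#} routes site1 through the tunnel, and swapping the two port arguments restores the direct connection. Tomcat needs to be restarted for the library1 servlet to pick up the change.
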

%\clearpage
%\addcontentsline{toc}{chapter}{Bibliography}
%\bibliography{main}

\end{document}