Changeset 4162


Timestamp:
2003-04-15T15:36:16+12:00
Author:
kjdon
Message:

partial update of the manual

File:
1 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r3712 r4162  
    33\hyphenation{Message-Router Text-Query}
    44
     5\newenvironment{gsc}% Greenstone text bits
     6{\begin{footnotesize}\begin{tt}}%
     7{\end{tt}\end{footnotesize}}
     8 
     9\newcommand{\gst}[1]{{\footnotesize\ttfamily #1}}
    510\begin{document}
    611
     
    4752Native Interface) will be used to communicate with these.
    4853
    49 
    50 \section{Architecture}
    51 
    52 This section is covered by the paper: An agent based architecture for dynamic digital library construction and configuration. Either cut and paste it in here, or link to the text?? or have two separate docs. dont want to have to maintain two separate versions of the same thing.
    53 
    54 \section{Greenstone Implementation}
    55 \label{sec:impl}
    56 
    57 \subsection{Configuring Greenstone}
    58 \label{subsec:config}
    59 
    60 Greenstone3 involves several different kinds of configuration files, all
    61 expressed in XML. Each site has a configuration file that binds parameters for
    62 the site, {\em siteConfig.xml}.  Each collection has two configuration files, {\em collectionConfig.xml} and {\em buildConfig.xml\/}, that give metadata for the
    63 collection.\footnote{These replace {\em collect.cfg} and {\em build.cfg} in
     54A description of the general design and architecture of Greenstone3 is covered by the document ``The design of Greenstone3: An agent based dynamic digital library'' (design-2002.ps, in the gsdl3/docs/manual directory).
     55
     56\section{System modules}\label{sec:modules}
     57
     58A Greenstone3 `library' system consists of many components... Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
     59
     60\begin{figure}[t]
     61  \centering
     62  \includegraphics[width=4in]{local} %5.8
     63  \caption{A simple stand-alone site.}
     64  \label{fig:local}
     65\end{figure}
     66
     67
     68{\em MessageRouter}: this is the central module for a site. It controls the site, loading all the collections, clusters and communicators it needs. All messages pass through the MessageRouter. Communication between remote sites always takes place between MessageRouters, one for each site.
     69
     70{\em Collection and ServiceCluster}: these are very similar. Both provide some metadata about the collection or cluster, and a list of services. The services are provided by ServiceRack objects that the collection or cluster loads. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are conceptually related; for example, all the building services may form one cluster. The site configuration file specifies what belongs to a cluster. A Collection's services are grouped by the fact that they all operate on common data: the documents in the collection.
     71Functionally, Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
     72
     73{\em ServiceRack}: these provide one or more services. Services are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, all the MGPP searching services need the same index loaded into memory, so grouping them in one ServiceRack means the index is loaded only once.
     74
     75{\em Communicator/Server}: these facilitate communication between remote modules. For example, for a MessageRouter MR1 to talk to a remote MessageRouter MR2, a Communicator-Server pair is needed: the Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs its own pair. So far only SOAP has been used, so there is a SOAPCommunicator and a SOAPServer.
     76
     77{\em Receptionist}: this is the point of contact for the `front end'. It acts mainly as a router to the Actions, but it also handles anything common to all pages, such as creating some XML data for the pages.
     78
     79{\em Actions}: these do the job of creating the `pages'. There is a different action for each type of page: for example, PageAction handles semi-static pages, QueryAction handles queries, and DocumentAction displays documents. Actions know a little about specific service types. Based on the `cgi' arguments passed in to them, they construct requests for the system and assemble the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in more detail in Section~\ref{sec:pagegen}.
     80
     81
     82\section{Configuration}\label{sec:config}
     83
     84Initial Greenstone3 system configuration  is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for
     85the site, \gst{siteConfig.xml}.  Each collection has two configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, that give metadata and other information for the
     86collection.\footnote{\gst{siteConfig.xml} is new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
    6487Greenstone2.}  The first includes user-defined metadata for the collection,
    6588such as its name and the {\em About this collection} text; and also gives
    6689instructions on how the collection is to be built.  The second is produced by
    6790the build-time process and includes any metadata that can be determined
    68 automatically.\footnote{Currently only the buildConfig.xml file is used - collections are built using gs2 style building and therefore use the old collect.cfg.}
    69 
    70 \subsubsection{Site configuration file}
    71 
    72 The file {\em siteConfig.xml} specifies the URI for the site ({\em
    73 localSiteName\/}), any services or service clusters provided by the site that are not connected
    74 with a particular collection (for example, translation services, or collection building), and a list of
     91automatically. It also includes configuration information for any serviceRacks needed by the collection.
     92
     93The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will have no effect. There is a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system for such changes to take effect. These commands are described in Section~\ref{sec:runtime-config}.
     94
     95\subsection{Site configuration file}\label{sec:siteconfig}
     96
     97The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of
    7598known external sites to connect to.  Collections are not specified in the site
    7699configuration file, instead they are determined by the contents of the site's
    77100collections directory.
    78101
    79 Here is a configuration file for a rudimentary site with no site-wide services,
    80 which does not connect to any external sites.\footnote{should the code be tolerant of missing elements? or do we require empty elements?}
    81 \begin{quote}\begin{footnotesize}\begin{verbatim}
    82 <config>
     102The HTTP address is used to retrieve resources from a site without going through the XML protocol. Because a site is HTTP-accessible, any files (e.g.\ images) belonging to that site or to its collections can be referenced by URL in the HTML of a page. This avoids having to retrieve these files from a remote site via the XML protocol.\footnote{Currently, sites live inside the Tomcat gsdl3 root context, so all their content is accessible over HTTP via the Tomcat address; we need to see whether parts can be restricted. If a different protocol is used, resources from remote sites may need to come through the XML protocol. Similarly, if we are running locally without Tomcat, we may want to retrieve them via file:// rather than http://.}
     103 
     104The first example in Figure~\ref{fig:siteconfig} shows a site configuration file for a rudimentary site with no site-wide services,
     105which does not connect to any external sites. The second example is for a site with one site-wide service cluster, a collection building cluster. It also connects to the first site using SOAP.
     106These two sites are running on the same machine. For site gsdl1 to talk to site localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is \gst{http://localhost:8090/soap/servlet/rpcrouter}.
     107
     108
     109\begin{figure}
     110\begin{gsc}\begin{verbatim}
     111<siteConfig>
    83112  <localSiteName value="org.greenstone.localsite"/>
     113  <httpAddress value="http://localhost:8090/gsdl3/sites/localsite"/>
    84114  <serviceClusterList/>
    85115  <serviceRackList/>
    86116  <siteList/>
    87 </config>
    88 \end{verbatim}\end{footnotesize}\end{quote}
    89 The following configuration file is for a site with one site-wide service cluster - a collection building cluster.  It also connects to the previous site using SOAP.
    90 \begin{quote}\begin{footnotesize}\begin{verbatim}
    91 <config>
     117</siteConfig>
     118\end{verbatim}\end{gsc}
     119
     120\begin{gsc}\begin{verbatim}
     121<siteConfig>
    92122  <localSiteName value="org.greenstone.gsdl1"/>
    93   <serviceRackList/>
    94     <servicesImpl name="TranslationServices"/>
    95   </servicesImplList>
     123  <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/>
    96124  <serviceClusterList> 
    97125    <serviceCluster name="build">
     
    108136  <siteList>
    109137    <site name="org.greenstone.localsite"
    110       address="http://localhost:8080/soap/servlet/rpcrouter"
     138      address="http://localhost:8090/soap/servlet/rpcrouter"
    111139      type="soap"/>
    112140  </siteList>
    113 </config>
    114 \end{verbatim}\end{footnotesize}\end{quote}
    115 
    116 These two sites are running on the same machine. For site1 to talk to localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is "http://localhost:8080/soap/servlet/rpcrouter"
    117 
    118 \subsubsection{Building configuration file}
    119 
    120 The file {\em buildConfig.xml} contains all metadata and other information about the collection that can
     141</siteConfig>
     142\end{verbatim}\end{gsc}
     143\caption{Two sample site config files}
     144\label{fig:siteconfig}
     145\end{figure}
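As an illustration of what the MessageRouter needs from \gst{siteConfig.xml}, the following minimal Python sketch extracts the local site name, the HTTP address and the list of remote sites. It is not Greenstone code (Greenstone3 is written in Java). \gst{SiteConfig} and \gst{parse\_site\_config} are hypothetical names, and the input is a cut-down version of the second sample file with the build service cluster left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Cut-down copy of the second sample site file (build cluster omitted).
SITE_CONFIG = """\
<siteConfig>
  <localSiteName value="org.greenstone.gsdl1"/>
  <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/>
  <serviceClusterList/>
  <serviceRackList/>
  <siteList>
    <site name="org.greenstone.localsite"
      address="http://localhost:8090/soap/servlet/rpcrouter"
      type="soap"/>
  </siteList>
</siteConfig>
"""


@dataclass
class SiteConfig:
    local_site_name: str
    http_address: str
    # One (name, address, type) tuple per remote site to connect to.
    remote_sites: list = field(default_factory=list)


def parse_site_config(text: str) -> SiteConfig:
    root = ET.fromstring(text)
    # Single-valued settings are held in a 'value' attribute.
    name = root.find("localSiteName").get("value")
    http = root.find("httpAddress").get("value")
    # Each <site> names a remote MessageRouter and how to reach it.
    sites = [(s.get("name"), s.get("address"), s.get("type"))
             for s in root.iterfind("siteList/site")]
    return SiteConfig(name, http, sites)


config = parse_site_config(SITE_CONFIG)
print(config.local_site_name, len(config.remote_sites))
```

Collections do not appear in the result because they are discovered from the site's collections directory rather than listed in this file.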
     146
     147
     148
     149\subsection{Collection configuration file}\label{sec:collconfig}
     150
     151The collection configuration file is where the collection designer (e.g.\ a librarian) decides what form the collection should take. This includes collection metadata such as the title and description, and also which indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig}
     152shows an example of the format as it is at present.
     153
     154\begin{figure}
     155\begin{gsc}\begin{verbatim}
     156<collectionConfig xmlns:gsf="http://www.greenstone.org/
     157                         configformat">
     158  <metadataList>   
     159    <metadata name="colName" lang="en">greenstone mgpp demo
     160    </metadata>
     161    <metadata name="colDescription" lang="en">This is a
     162      demonstration collection for the Greenstone digital
     163      library software. It contains a small subset (11 books)
     164      of the Humanity Development Library.</metadata>
     165    <metadata name="colDescription" lang="fr">C'est une
     166      collection pour demonstration du logiciel Greenstone.
     167      Elle contient une petite partie du projet de bibliotheques
     168      humanitaires et de developpement (11 livres).</metadata>
     169    <metadata name="colIcon">mgppdemo.gif</metadata>
     170  </metadataList>
     171  <search type='mgpp'>
     172    <index name="tt" content="text,metadata"
     173            level="Document,Section">
     174      <displayName lang="en">books</displayName>
     175    </index>
     176    <format>   
     177      <gsf:template match="documentNode">
     178        <td><gsf:link><gsf:metadata name="Title"/>(<gsf:metadata
     179            name="Source"/>)</gsf:link></td>
     180      </gsf:template>
     181    </format>
     182  </search>
     183  <browse>
     184    <classifier name="CL1" type="Hierarchy" content="Subject"
     185    level="Document">
     186      <option name="hfile" value="sub.txt"/>
     187      <option name="sort" value="Title"/>
     188    </classifier>
     189    <classifier name="CL2" type="AZList" content="Title"
     190    level="Document">
     191      <displayName lang='en'>all titles</displayName>
     192      <format>
     193        <gsf:template match="classifierNode">
     194          <td><gsf:link type="classifier"><gsf:metadata name="Title"/>
     195            </gsf:link></td>
     196        </gsf:template>
     197      </format>
     198    </classifier>
     199    <classifier name="CL3" type="List" content="Keyword"
     200    level="Document">
     201      <format>
     202        <gsf:template match="documentNode"><td><gsf:link>
     203          <gsf:metadata name="Keyword"/></gsf:link></td></gsf:template>
     204      </format>
     205    </classifier>
     206    <classifier type="Phind" content="text" level="Section"/>     
     207  </browse>
     208</collectionConfig>
     209\end{verbatim}\end{gsc}
     210\caption{Sample collectionConfig.xml file}
     211\label{fig:collconfig}
     212\end{figure}
     213
     214The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in UTF-8.
     215The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like.
     216The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.
     217
     218There is also a need for a description of how documents should be displayed: for example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be specified in an element such as \gst{<documentDisplay>}.
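The same metadata element may appear once per language, so a consumer of \gst{collectionConfig.xml} has to choose one value per name. The Python sketch below does this for an abridged copy of the metadata in Figure~\ref{fig:collconfig}. The fallback order is an assumption made for the example rather than documented Greenstone behaviour: first the requested language, then a default language, then an entry with no \gst{lang} attribute. \gst{collection\_metadata} is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample <metadataList>.
COLLECTION_CONFIG = """\
<collectionConfig>
  <metadataList>
    <metadata name="colName" lang="en">greenstone mgpp demo
    </metadata>
    <metadata name="colDescription" lang="en">This is a demonstration
      collection for the Greenstone digital library software.</metadata>
    <metadata name="colDescription" lang="fr">C'est une collection
      pour demonstration du logiciel Greenstone.</metadata>
    <metadata name="colIcon">mgppdemo.gif</metadata>
  </metadataList>
</collectionConfig>
"""


def collection_metadata(text, lang, default_lang="en"):
    """Return {name: value}, choosing one value per metadata name.

    Assumed fallback order: the requested language, then default_lang,
    then an entry with no lang attribute.
    """
    root = ET.fromstring(text)
    by_name = {}
    for m in root.iterfind("metadataList/metadata"):
        # Collapse the line breaks and indentation used for layout.
        value = " ".join((m.text or "").split())
        by_name.setdefault(m.get("name"), {})[m.get("lang")] = value
    chosen = {}
    for name, by_lang in by_name.items():
        for key in (lang, default_lang, None):
            if key in by_lang:
                chosen[name] = by_lang[key]
                break
    return chosen


french = collection_metadata(COLLECTION_CONFIG, "fr")
print(french["colDescription"])
```

With \gst{"fr"}, the description comes from the French entry. The name falls back to English, and the icon, which has no language, is used as-is.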
     219
     220\subsection{Building configuration file}\label{sec:buildconfig}
     221
     222The file \gst{buildConfig.xml} contains the metadata and other information about the collection that can
    121223be determined automatically when building the collection, such as the number of
    122224documents it contains.  It also includes a list of serviceRack classes that are
     
    124226collection.  The serviceRack names are Java classes that are loaded
    125227dynamically at runtime. Any information inside the serviceRack element is
    126 specific to that service---there is no set format.  Here is an example:
    127 
    128 \begin{quote}\begin{footnotesize}\begin{verbatim}
    129 
    130 <buildConfig>
     228specific to that service; there is no set format. Figure~\ref{fig:buildconfig} shows an example. This configuration file specifies that the collection should load three ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the corresponding ServiceRack object for configuration.
     229
     230
     231\begin{figure}
     232\begin{gsc}\begin{verbatim}
     233<buildConfig xmlns:gsf="www.greenstone.org/format" >
    131234  <metadataList>
    132235    <metadata name="numDocs">11</metadata>
    133     <metadata name="colIcon">mgppdemo.gif</metadata>
    134     <metadata name="colName">Greenstone demo collection</metadata>
    135     <metadata name="colDescription">This is a demonstration
    136 collection for the Greenstone digital library software. It
    137 contains a small subset  of the Humanitarian and Development
    138 Libraries.</metadata>
     236    <metadata name="documentMetadata"><element name="Title"/>
     237        <element name="Subject"/><element name="Organization"/>
     238        <element name="URL"/></metadata>
    139239  </metadataList>
    140240  <serviceRackList>
    141241    <serviceRack name="GS2MGPPRetrieve">
    142242      <defaultLevel name="Section"/>
    143     <!-- something list this should be used to advertise
    144 what metadata the collection has available to be retrieved -
    145 however, it is not used yet -->
    146       <metadataList>
    147         <element name="Title"/><element name="Subject"/>
    148     <element name="Organization"/><element name="URL"/>
    149       </metadataList>
     243      <levelList>
     244        <level name="Document"/>
     245        <level name="Section"/>
     246      </levelList>
     247      <classifierList>
     248        <classifier name="CL1" content="Subject"
     249          documentInterleave="true" orientation='vertical'/>
     250        <classifier name="CL2" content="Title"
     251          documentInterleave="false" orientation='horizontal'/>
     252        <classifier name="CL4" content="Organization"
     253          documentInterleave="true" orientation='vertical'/>
     254        <classifier name="CL5" content="Keyword"
     255          documentInterleave="true" orientation='vertical'/>
     256      </classifierList>
    150257    </serviceRack>
    151258    <serviceRack name="GS2MGPPSearch">
     
    161268      </indexList>
    162269      <fieldList>
    163         <field name="TX"/><field name="SU"/><field name="TI"/>
     270        <field shortname="TX" name="TextOnly"/>
     271        <field shortname="SU" name="Subject"/>
     272        <field shortname="TI" name="Title"/>
    164273      </fieldList>
    165274    </serviceRack>
    166275    <serviceRack name="PhindPhraseBrowse"/>
    167     <serviceRack name="GS2Browse">
    168       <classifierList>
    169         <classifier name="CL1"><metadataList>
    170        <metadata name="Title">Subject</metadata>
    171     </metadataList></classifier>
    172         <classifier name="CL2" ><metadataList>
    173        <metadata name="Title">Title</metadata>
    174         </metadataList></classifier>
    175     <classifier name="CL4"><metadataList>
    176           <metadata name="Title">Organization</metadata>
    177         </metadataList></classifier>
    178     <classifier name="CL5" ><metadataList>
    179           <metadata name="Title">Keyword</metadata>
    180         </metadataList></classifier>
    181       </classifierList>
    182     </serviceRack>
    183276  </serviceRackList>
    184 </buildConfig>   
    185 \end{verbatim}\end{footnotesize}\end{quote}
    186 Note: because {\em collectionConfig.xml} is not used yet, the {\em colIcon}, {\em colDescription}
    187 and {\em colName} metadata elements have been specified here.
    188 
    189 \subsubsection{Collection configuration file}
    190 
    191 The format of {\em collectionConfig.xml} has not yet been defined.
    192 
    193 \subsubsection{Starting up}
     277</buildConfig>
     278\end{verbatim}\end{gsc}
     279\caption{Sample buildConfig.xml file}
     280\label{fig:buildconfig}
     281\end{figure}
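In the loading step, each \gst{<serviceRack>} name selects a Java class that is loaded at runtime and handed its own element. This can be sketched in Python with a name-to-class registry standing in for Java's dynamic class loading. The rack names are those in Figure~\ref{fig:buildconfig}, but the classes here are empty stand-ins. \gst{load\_service\_racks} and \gst{RACK\_CLASSES} are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample buildConfig.xml.
BUILD_CONFIG = """\
<buildConfig>
  <metadataList>
    <metadata name="numDocs">11</metadata>
  </metadataList>
  <serviceRackList>
    <serviceRack name="GS2MGPPRetrieve">
      <defaultLevel name="Section"/>
    </serviceRack>
    <serviceRack name="GS2MGPPSearch"/>
    <serviceRack name="PhindPhraseBrowse"/>
  </serviceRackList>
</buildConfig>
"""


class ServiceRack:
    """Stand-in base class; real racks would parse their own element."""

    def configure(self, rack_element, collection_config):
        # There is no set format: each rack interprets its own element.
        self.rack_element = rack_element
        self.collection_config = collection_config


class GS2MGPPRetrieve(ServiceRack):
    pass


class GS2MGPPSearch(ServiceRack):
    pass


class PhindPhraseBrowse(ServiceRack):
    pass


# Python stand-in for loading a class by name at runtime.
RACK_CLASSES = {cls.__name__: cls
                for cls in (GS2MGPPRetrieve, GS2MGPPSearch, PhindPhraseBrowse)}


def load_service_racks(build_config_text, collection_config=None):
    root = ET.fromstring(build_config_text)
    racks = []
    for element in root.iterfind("serviceRackList/serviceRack"):
        rack = RACK_CLASSES[element.get("name")]()  # KeyError if unknown
        rack.configure(element, collection_config)
        racks.append(rack)
    return racks


racks = load_service_racks(BUILD_CONFIG)
print([type(r).__name__ for r in racks])
```

An unknown rack name raises \gst{KeyError} here. How the real system reports a missing class is not covered by this sketch.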
     282
     283 
     284\subsection{Start up configuration}\label{sec:startup-config}
    194285
    195286We use the Tomcat web server, which operates either stand-alone in a test mode
    196287or in conjunction with the Apache web server.  The Greenstone LibraryServlet
    197 class is loaded by Tomcat  and the servlet's {\em init()} method is called.  Each time a
    198 {\em get\/}/{\em put\/}/{\em post} (etc.) is used, a new thread is started and
    199 {\em doGet()\/}/{\em doPut()\/}/{\em doPost()} (etc.) is called.
    200 
    201 The {\em init()} method creates a new Receptionist and a new instance of the
     288class is loaded by Tomcat and the servlet's \gst{init()} method is called. Each time an
     289HTTP GET, PUT or POST (etc.) request arrives, the corresponding
     290\gst{doGet()/doPut()/doPost()} (etc.) method is called in its own request thread.
     291
     292The \gst{init()} method creates a new Receptionist and a new
    202293MessageRouter. The appropriate system variables are set in each (interface
    203 name, site name, etc.) and then {\em configure()} is called. A MessageRouter
     294name, site name, etc.) and then \gst{configure()} is called. A MessageRouter
    204295reference is given to the Receptionist. The servlet then communicates only with
    205296the Receptionist, not with the MessageRouter.
    206297
    207298The Receptionist loads up all the different Action classes. A
    208 static list is used initially, and other Actions may be loaded on the fly as needed.
    209 
    210 The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This
    211 lists the ServiceRack classes that need to be loaded, and lists any sites that need
    212 to be connected to.  It looks inside the {\em collect} directory which contains
    213 all the site's collections and loads up a Collection object for each valid
    214 collection found.
    215 
    216 The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml}
     299static list is used initially, and other Actions may be loaded on the fly as needed. Actions are added to a map, with their short names as keys; for example, the QueryAction is added with key `q'. The Actions are also given the MessageRouter reference.
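A minimal Python sketch of this dispatch table follows. For illustration only, it assumes that each Action exposes a single \gst{process} method taking the parsed \gst{<request>} element; the real Java interface may differ. Only the keys \gst{p} and \gst{q} come from this manual, and \gst{add\_action} models loading an Action on the fly.

```python
import xml.etree.ElementTree as ET


class Action:
    def __init__(self, message_router):
        # Actions keep the MessageRouter reference so they can send requests.
        self.message_router = message_router

    def process(self, request):
        raise NotImplementedError


class PageAction(Action):
    def process(self, request):
        return "page:" + request.get("subaction", "home")


class QueryAction(Action):
    def process(self, request):
        return "query:" + request.get("subaction", "")


class Receptionist:
    def __init__(self, message_router):
        self.message_router = message_router
        # Static initial list, keyed by the short name used in the 'a' argument.
        self.actions = {"p": PageAction(message_router),
                        "q": QueryAction(message_router)}

    def add_action(self, short_name, action_class):
        # Models loading an extra Action on the fly.
        self.actions[short_name] = action_class(self.message_router)

    def process(self, message_text):
        # Simplification: only the first <request> in the message is handled.
        request = ET.fromstring(message_text).find("request")
        action = self.actions.get(request.get("action"))
        if action is None:
            raise ValueError("no action for a=" + str(request.get("action")))
        return action.process(request)


receptionist = Receptionist(message_router=None)
home = receptionist.process(
    "<message><request lang='fr' type='cgi' action='p' subaction='home'"
    " output='html'/></message>")
print(home)
```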
     300
     301The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. This
     302lists the ServiceRack and ServiceCluster classes that need to be loaded and any sites that need
     303to be connected to. 
     304It has a module map that maps names to objects; this map is used for routing messages. It also keeps small chunks of XML (serviceList, collectionList, clusterList and siteList), which are returned in response to a describe request (see Section~\ref{sec:describe}).
     305Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
     306ServiceClusters are created and passed their \gst{<serviceCluster>} element for configuration. They are added to the map as-is, with the cluster name as the key. Each ServiceCluster is also added to the serviceClusterList.
     307For each site specified, the MessageRouter creates a Communicator object of the appropriate type, then tries to get the site description. If the server for the remote site is up and running, this should succeed. The site is then added to the map with its site name as the key, and the site's collections, services and clusters are also added to the static lists.
     308
     309The MessageRouter also looks inside the site's \gst{collect} directory and loads a Collection object for each valid collection found.
     310
     311The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
    217312files, determines the metadata, and loads ServiceRack classes based on the
    218 names specified in {\em buildConfig.xml\/}. The {\footnotesize \verb#<ServiceRack>#} XML element is passed to the object to be used in  configuration.
    219 
    220 \section{System messages}
     313names specified in \gst{buildConfig.xml}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed to the ServiceRacks; any format or display information that the services need must be extracted from the collection configuration file.
     314Collection objects are added to the module map with their name as the key, and a collection element is also added to the collectionList XML.
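The module-map bookkeeping described above can be sketched in a few lines of Python. The classes are hypothetical stand-ins for the Java ones. Each service offered by a ServiceRack is registered under its own name, so later routing never sees the rack. Clusters, collections and remote sites are registered as-is. For brevity the cached lists hold names rather than the XML chunks the MessageRouter actually keeps, and \gst{FieldQuery} is a made-up service name.

```python
class ServiceRack:
    """Hypothetical rack: one object offering several named services."""

    def __init__(self, service_names):
        self._service_names = list(service_names)

    def service_names(self):
        return list(self._service_names)


class MessageRouter:
    def __init__(self):
        self.module_map = {}       # name -> object that handles messages for it
        self.service_list = []     # names cached for 'describe' responses
        self.cluster_list = []
        self.collection_list = []
        self.site_list = []

    def add_service_rack(self, rack):
        # Each service is mapped individually to its rack, so after this step
        # the rest of the system sees services, not racks.
        for name in rack.service_names():
            self.module_map[name] = rack
            self.service_list.append(name)

    def add_module(self, name, module, listing):
        # Clusters, collections and remote sites are mapped as-is by name.
        self.module_map[name] = module
        listing.append(name)

    def route(self, name):
        return self.module_map[name]


router = MessageRouter()
search_rack = ServiceRack(["TextQuery", "FieldQuery"])
router.add_service_rack(search_rack)
router.add_module("demo", object(), router.collection_list)  # placeholder Collection
```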
     315
     316\subsection{Run-time (re)configuration}\label{sec:runtime-config}
     317
     318The startup configuration reads in the various config files and loads quite a lot of XML into memory. This avoids having to read in files all the time. However, it means that changes to these files have no effect on the running system, so some run-time reconfiguration options are provided.
     319
     320Currently there are commands to reconfigure the entire site, i.e.\ the MessageRouter repeats the whole of its startup initialisation.
     321
     322***TODO***
     323What's available, what's not. Show URLs; refer to system messages in the next section.
     324
     325\section{System messages}\label{sec:messages}
     326
     327For each type of message, show the basic elements, then some example messages.
     328Each list must contain elements of only one type.
    221329
    222330Once the system is up and running (the configuration
    223 process described in Section~\ref{subsec:config} has been carried out), it is passing messages back and forth. All modules communicate via message passing.
    224  First, we examine the basic message
    225 formats, then how the system creates and responds to the messages.
     331process described in Section~\ref{sec:startup-config} has been carried out), it is passing messages back and forth. All modules communicate via message passing.
     332
     333First, we look at how messages originate, and how they flow in the system. Then, we examine the basic message
     334format, and look at the different types of messages.
     335
     336\subsection{Message flow}
     337
     338\subsection{Basic format}
    226339
    227340All messages are enclosed in
    228 \begin{quote}\begin{footnotesize}\begin{verbatim}
    229 <message>
    230 \end{verbatim}\end{footnotesize}\end{quote}
    231 Messages contain either {\em <request>\/} or {\em <response>\/} elements--- a single message may contain multiple requests. Each {\em <request>\/} (and {\em <response>\/}?) has a language attribute, of the form ``lang='xx'''.
     341\begin{quote}\begin{gsc}\begin{verbatim}
     342<message>
     343\end{verbatim}\end{gsc}\end{quote}
     344Messages contain either \gst{<request>} or \gst{<response>} elements; a single message may contain multiple requests. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='xx'}.
    232345The language attribute is used by the XSLT to determine the language currently
    233346being used by the user interface.  Virtually all messages contain text strings,
     
    239352This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
    240353
    241 \subsubsection{Servlet to Receptionist messages}\label{subsec:url-type}
     354\subsection{cgi-type messages}\label{sec:cgi}
    242355
     356Servlet to Receptionist messages are requests for a `page' of data: for example, the home page for a site, the query page for a collection, or the text of a document. They contain, in XML, a representation of the arguments in a
    244 Greenstone URL.  The two main arguments are {\em a} (action) and {\em sa}
    245 (subaction).\footnote{The {\em sa} replaces Greenstone's old {\em p} arg for
     357Greenstone URL.  The two main arguments are \gst{a} (action) and \gst{sa}
     358(subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for
    246359the page action, and is new for other actions.  For example, a text query could
    247 be encoded as {\em a=q \& sa=text\/}.}  All other arguments are treated as
     360be encoded as \gst{a=q \& sa=text}.}  All other arguments are treated as
    248361parameters.
    249362
    250363Here is the XML representation of the arguments:
    251364
    252 \begin{quote}\begin{footnotesize}\begin{verbatim}
     365\begin{quote}\begin{gsc}\begin{verbatim}
    253366<request type='cgi' action='a-arg-value' subaction='sa-arg-value'
    254367         lang='en' output='html'>
    255368  <paramList>
    256     <param name='xx' value=''yyy'/>
     369    <param name='xx' value='yyy'/>
    257370    <param name=...
    258371  </paramList>
    259372</request>
    260 \end{verbatim}\end{footnotesize}\end{quote}
     373\end{verbatim}\end{gsc}\end{quote}
    261374The receptionist routes the message to the appropriate action.  The output
    262375field is used to indicate what type of output to return. The actions do not
     
    278391\hline
    279392a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
     393& & s (system)\\
    280394sa & subaction & home, about (page action)\\
    281395c & collection or  & demo, build \\
     
    285399ro & request only & 0 or 1 - if set to one, the request is carried out \\
    286400& & but no processing of the results is done \\
     401& & currently only used in process actions \\
    287402o & output type & xml, html, wml \\
    288 l & language & en, fr, zh \\
     403l & language & en, fr, zh ...\\
    289404d & document id & HASHxxx \\
    290405r & resource id & ???\\
    291 id & process handle & an integer identifying a particular process request \\
     406pid & process handle & an integer identifying a particular process request \\
    292407\hline
    293408\end{tabular}}
     409\caption{Generic arguments that can appear in a Greenstone URL}
    294410\label{tab:args}
    295 \caption{Generic rguments that can appear in a Greenstone URL}
    296411\end{table}
    297412
    298413Here is an example message that retrieves the home page in French:
    299 \begin{quote}\begin{footnotesize}\begin{verbatim}
     414\begin{quote}\begin{gsc}\begin{verbatim}
    300415<message>
    301416  <request lang='fr' type='cgi' action='p' subaction='home'
    302417    output='html'/>
    303418</message>
    304 \end{verbatim}\end{footnotesize}\end{quote}
     419\end{verbatim}\end{gsc}\end{quote}
    305420
    306421This message represents a text query:
    307 \begin{quote}\begin{footnotesize}\begin{verbatim}
     422\begin{quote}\begin{gsc}\begin{verbatim}
    308423<message>
    309424  <request  lang='en' type='cgi' action='q'  output='html'>
     
    319434  </paramList>
    320435</message>
    321 \end{verbatim}\end{footnotesize}\end{quote}
     436\end{verbatim}\end{gsc}\end{quote}
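The mapping from URL arguments to the XML request can be sketched in Python as follows. This is an illustration under stated assumptions, not the Receptionist's actual code. \gst{a} and \gst{sa} become the \gst{action} and \gst{subaction} attributes, as described above. \gst{l} and \gst{o} are assumed to become the \gst{lang} and \gst{output} attributes seen in the examples. All remaining arguments become \gst{<param>} elements. \gst{cgi\_to\_message} is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl


def cgi_to_message(query_string, default_lang="en", default_output="html"):
    """Build a cgi-type <message> from Greenstone URL arguments.

    Assumption: 'l' and 'o' become the lang and output attributes, as in
    the examples; every other argument except 'a' and 'sa' becomes a param.
    If an argument is repeated, the last value wins.
    """
    args = dict(parse_qsl(query_string))
    attrs = {"type": "cgi",
             "action": args.pop("a", ""),
             "lang": args.pop("l", default_lang),
             "output": args.pop("o", default_output)}
    if "sa" in args:
        attrs["subaction"] = args.pop("sa")
    message = ET.Element("message")
    request = ET.SubElement(message, "request", attrs)
    if args:  # no <paramList> when no arguments remain
        param_list = ET.SubElement(request, "paramList")
        for name, value in args.items():
            ET.SubElement(param_list, "param", name=name, value=value)
    return message


home = cgi_to_message("a=p&sa=home&l=fr")
print(ET.tostring(home, encoding="unicode"))
```

The first call corresponds to the French home-page request shown earlier. Serialising with \gst{ET.tostring} produces the XML form that would be passed to the Receptionist.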
    322437
    323438\subsubsection{Module to module messages}
     
    326441information from one module to another, for example from an Action to the
    327442MessageRouter module, and from that module to a service module.  Requests have
    328 a {\em to} attribute and responses have {\em from\/}.  These are addresses used
    329 by routing modules.  For example {\em to='site1/site2/demo/TextQuery'} routes a
    330 message to a MessageRouter ({\em site1\/}), from there to another MessageRouter
    331 ({\em site2\/}), from there to a collection ({\em demo\/}), and from there to a
    332 particular service ({\em TextQuery\/}).
     443a \gst{to} attribute and responses have \gst{from}.  These are addresses used
     444by routing modules.  For example \gst{to='site1/site2/demo/TextQuery'} routes a
     445message to a MessageRouter (\gst{site1}), from there to another MessageRouter
     446(\gst{site2}), from there to a collection (\gst{demo}), and from there to a
     447particular service (\gst{TextQuery}).
    333448
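The hop-by-hop routing of a \gst{to} address can be sketched in Python. This is a minimal illustration, not Greenstone3 code. It assumes that hops are separated by `/' and that each module consumes the first component before passing the message on. An empty address means the receiving module answers the request itself. The helper names \gst{next\_hop} and \gst{route} are hypothetical.

```python
def next_hop(address):
    """Split a 'to' address into (first hop, remaining address).

    Hypothetical helper: hops are assumed to be separated by '/'.
    An empty address yields ('', ''), i.e. the receiving module
    answers the request itself.
    """
    head, _, rest = address.partition("/")
    return head, rest


def route(address):
    """List the modules a message passes through, in order."""
    hops = []
    while address:
        hop, address = next_hop(address)
        hops.append(hop)
    return hops
```

For the address used above, \gst{route} yields the hops site1, site2, demo and TextQuery, matching the MessageRouter, MessageRouter, collection, service chain described in the text.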
    334449Each request either asks a single module to describe itself or requests a particular service. Unlike the first type of message, which requests predefined types of pages, these internal requests can ask for any functionality available in the system.
    335450
     451\subsection{`describe'-type messages}\label{sec:describe}
    336452The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
    337 \begin{quote}\begin{footnotesize}\begin{verbatim}
     453\begin{quote}\begin{gsc}\begin{verbatim}
    338454<message>
    339455  <request lang='en' type='describe' to=''/>
    340456</message>
    341 \end{verbatim}\end{footnotesize}\end{quote}
    342 If the {\em to} field is empty, the request is answered by the first module that it is passed to.
     457\end{verbatim}\end{gsc}\end{quote}
     458If the \gst{to} field is empty, the request is answered by the first module that it is passed to.
    343459An example response from a MessageRouter might look like this:
    344 \begin{quote}\begin{footnotesize}\begin{verbatim}
     460\begin{quote}\begin{gsc}\begin{verbatim}
    345461<message>
    346462  <response lang='en' type='describe'>
     
    362478  </response>
    363479</message>
    364 \end{verbatim}\end{footnotesize}\end{quote}
     480\end{verbatim}\end{gsc}\end{quote}
    365481This MessageRouter has one site-wide service, a cross-collection searching service. It
    366 communicates with one site, {\em org.greenstone.gsdl1\/}.  It is aware of four
    367 collections.  One of these, {\em myfiles\/}, belongs to it; the other three are
     482communicates with one site, \gst{org.greenstone.gsdl1}.  It is aware of four
     483collections.  One of these, \gst{myfiles}, belongs to it; the other three are
    368484available through the external site.  One of those collections is actually from
    369485a further external site.
     
    371487It is possible to ask just for a specific part of the information provided by a
    372488describe request, rather than the whole message.  For example, these two
    373 messages get the {\em collectionList} and the {\em siteList} respectively:
    374 \begin{quote}\begin{footnotesize}\begin{verbatim}
     489messages get the \gst{collectionList} and the \gst{siteList} respectively:
     490\begin{quote}\begin{gsc}\begin{verbatim}
    375491<message lang='en'>
    376492  <request type='describe' to='' info='collectionList'/>
     
    380496  <request type='describe' to='' info='siteList'/>
    381497</message>
    382 \end{verbatim}\end{footnotesize}\end{quote}
     498\end{verbatim}\end{gsc}\end{quote}
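Describe requests like these are easy to construct programmatically. The following Python sketch uses the standard \gst{xml.etree.ElementTree} module. The function name \gst{describe\_request} is hypothetical. Placing \gst{lang} on the request element rather than on the message is an assumption, because the examples in this section use both placements.

```python
import xml.etree.ElementTree as ET


def describe_request(to="", lang="en", info=None):
    """Build a describe-type request message as an XML string.

    'to' is the routing address ('' means the first module that receives
    it answers); 'info', if given, asks for just one part of the
    description, e.g. 'collectionList' or 'siteList'.
    """
    message = ET.Element("message")
    request = ET.SubElement(
        message, "request", {"lang": lang, "type": "describe", "to": to})
    if info is not None:
        request.set("info", info)
    return ET.tostring(message, encoding="unicode")
```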
    383499When a collection is asked to describe itself, it returns all of its
    384500collection-specific metadata and a list of its services.  For example, here is such
    385501a message, along with a sample response.
    386502
    387 \begin{quote}\begin{footnotesize}\begin{verbatim}
     503\begin{quote}\begin{gsc}\begin{verbatim}
    388504<message lang='en'>
    389505  <request type='describe' to='demo'/>
     
    408524  </response>
    409525</message>
    410 \end{verbatim}\end{footnotesize}\end{quote}
    411 A {\em describe} request sent to a service returns a list of parameters that
     526\end{verbatim}\end{gsc}\end{quote}
     527A \gst{describe} request sent to a service returns a list of parameters that
    412528the service accepts, and describes the content type for the request and
    413529response.
    414530
    415531Parameters have the following format:
    416 \begin{quote}\begin{footnotesize}\begin{verbatim}
     532\begin{quote}\begin{gsc}\begin{verbatim}
    417533<param name='xxx' type='integer|boolean|string' default='yyy'/>
    418534<param name='xxx' type='enum_single|enum_multi' default='aa'/>
     
    423539 <param .../>
    424540</param>
    425 \end{verbatim}\end{footnotesize}\end{quote}
     541\end{verbatim}\end{gsc}\end{quote}
    426542If no default is specified, the parameter is assumed to be mandatory.
    427543Here are some examples of parameters:
    428 \begin{quote}\begin{footnotesize}\begin{verbatim}
     544\begin{quote}\begin{gsc}\begin{verbatim}
    429545<param name='Case' type='boolean' default='0'/>
    430546
     
    446562</param>
    447563
    448 \end{verbatim}\end{footnotesize}\end{quote}
     564\end{verbatim}\end{gsc}\end{quote}
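A parameter without a default is treated as mandatory, so a client can work out which parameters it must supply from a service description alone. The following is a minimal Python sketch that assumes the description is available as an XML string. The function name is hypothetical. The sketch checks every \gst{param} element, including any nested ones.

```python
import xml.etree.ElementTree as ET


def mandatory_params(description_xml):
    """Return the names of parameters that have no default value.

    Every <param> element is checked, including nested ones and the
    root element itself if it is a <param>.
    """
    root = ET.fromstring(description_xml)
    return [p.get("name") for p in root.iter("param")
            if p.get("default") is None]
```

For example, the \gst{Case} parameter above has \gst{default='0'} and is therefore optional.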
    449565Here is a message, along with a sample response.
    450 \begin{quote}\begin{footnotesize}\begin{verbatim}
     566\begin{quote}\begin{gsc}\begin{verbatim}
    451567<message>
    452568  <request lang='en'  type='describe' to='demo/TextQuery'/>
     
    466582  </response>
    467583</message>
    468 \end{verbatim}\end{footnotesize}\end{quote}
     584\end{verbatim}\end{gsc}\end{quote}
    469585
    470586So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services.
    471587
    472 ``Configure'' requests are used to tell the MessageRouter to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
    473 
    474 So far, we have {\em activate} and {\em deactivate} configure requests.
     588\subsection{`system'-type messages}
     589``System'' requests are used to tell the MessageRouter or a Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
     590
     591So far, we have \gst{activate} and \gst{deactivate} configure requests.
    475592Some examples are as follows.
    476 \begin{quote}\begin{footnotesize}\begin{verbatim}
     593\begin{quote}\begin{gsc}\begin{verbatim}
    477594<message><request type='configure' to=''>
    478595<configure action='deactivate' type='collection' name='demo'/>
     
    487604           name='TranslationServices'/>
    488605</request></message>
    489 \end{verbatim}\end{footnotesize}\end{quote}
     606\end{verbatim}\end{gsc}\end{quote}
    490607
    491608The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above.  Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first.
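The replace-on-activate behaviour described above can be illustrated with a toy registry in Python. This is not the MessageRouter's code. The class and method names are hypothetical, and modules are keyed by (type, name). A caller-supplied factory stands in for creating and configuring a real Collection or serviceRack object. The returned string imitates the status text in the example response below, whose format is not yet fixed.

```python
class ModuleRegistry:
    """Toy model of a module list that supports activate/deactivate."""

    def __init__(self, factory):
        # factory(kind, name) must return a new, configured module object.
        self.factory = factory
        self.modules = {}  # (kind, name) -> module

    def deactivate(self, kind, name):
        """Remove a module if present; return True if one was removed."""
        if (kind, name) in self.modules:
            del self.modules[(kind, name)]
            return True
        return False

    def activate(self, kind, name):
        """(Re)create a module, deactivating any existing one first."""
        self.deactivate(kind, name)
        self.modules[(kind, name)] = self.factory(kind, name)
        return f"{name} {kind} activated"
```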
    492609
    493610The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:
    494 \begin{quote}\begin{footnotesize}\begin{verbatim}
     611\begin{quote}\begin{gsc}\begin{verbatim}
    495612<message><response from='' type='configure'>
    496613  <status>demo collection activated</status>
    497614</response></message>
    498 \end{verbatim}\end{footnotesize}\end{quote}
     615\end{verbatim}\end{gsc}\end{quote}
    499616\footnote{This format is not properly defined yet.}
    500617
    501618Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
    502619
     620\subsection{`process'-type messages}
     621
     622divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse...
     623show basic structure, then more detailed format for each subtype
     624
    503625The main type of request in the system is a request for a service. There are different types of services: query, browse, retrieve, process and applet. Query services perform some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse services are for browsing lists or hierarchies of documents. Process services are those where the request is for a command to be run: a status code is returned immediately, and if the command has not finished, an update of the status can be requested. Applet services are those that run an applet.
    504626
     
    506628
    507629The basic structure of a service request is as follows:
    508 \begin{quote}\begin{footnotesize}\begin{verbatim}
     630\begin{quote}\begin{gsc}\begin{verbatim}
    509631<message>
    510632  <request lang='en'  type='query' to='demo/TextQuery'>
    511633    <paramList/>
    512     <content/>
     634    other elements...
    513635  </request>
    514636</message>
    515 \end{verbatim}\end{footnotesize}\end{quote}
     637\end{verbatim}\end{gsc}\end{quote}
    516638
    517639The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request.
    518640
    519 \begin{quote}\begin{footnotesize}\begin{verbatim}
     641\begin{quote}\begin{gsc}\begin{verbatim}
    520642<param name='case' value='1'/>
    521643<param name='maxDocs' value='34'/>
    522644<param name='index' value='dtx'/>
    523 \end{verbatim}\end{footnotesize}\end{quote}
    524 
    525 Some requests have a content---for document retrieval, the content is the list of documents to retrieve. For metadata retrieval, teh content is the list of documents, and a list of metadata to retrieve for each document.
     645\end{verbatim}\end{gsc}\end{quote}
     646
     647Some requests have other content---for document retrieval, this would be a list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document.
    526648
    527649Responses vary depending on the type of request.
     650
     651\subsubsection{`query'-type services}
    528652Responses to query requests contain the actual result, along with some metadata about the query.\footnote{Is this called metadata or something else?} For instance, a text query on `snail farming' with the parameter `maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{No metadata about the query result is returned yet.}
    529653
     
    531655
    532656Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
    533 \begin{quote}\begin{footnotesize}\begin{verbatim}
    534 <message>
    535   <request lang='en'  to="mgppdemo/TextQuery" type="query">
     657\begin{quote}\begin{gsc}\begin{verbatim}
     658<message>
     659  <request lang='en'  to="mgppdemo/TextQuery" type="process">
    536660    <paramList>
    537661      <param name="maxDocs" value="10"/>
     
    542666      <param name="index" value="t0"/>
    543667      <param name="case" value="0"/>
     668      <param name="query" value="snail"/>
    544669    </paramList>
    545     <content>snail</content>
    546670  </request>
    547671</message>
    548 \end{verbatim}\end{footnotesize}\end{quote}
    549 
    550 \begin{quote}\begin{footnotesize}\begin{verbatim}
     672\end{verbatim}\end{gsc}\end{quote}
     673
     674\begin{quote}\begin{gsc}\begin{verbatim}
    551675<message>
    552676  <response lang='en' from="mgppdemo/TextQuery" type="query">
    553     <content>
    554       <documentList>
    555         <document name="HASH010f073f22033181e206d3b7"/>
    556         <document name="HASH010f073f22033181e206d3b7.2"/>
    557         <document name="HASHac0a04dd14571c60d7fbfd"/>
    558       </documentList>
    559     </content>
     677    <documentList>
     678      <document name="HASH010f073f22033181e206d3b7"/>
     679      <document name="HASH010f073f22033181e206d3b7.2"/>
     680      <document name="HASHac0a04dd14571c60d7fbfd"/>
     681    </documentList>
    560682  </response>
    561683</message>
    562 \end{verbatim}\end{footnotesize}\end{quote}
    563 
     684\end{verbatim}\end{gsc}\end{quote}
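A client typically needs only the document identifiers from such a response, for example to pass them on to a retrieve service. The following is a minimal Python sketch using \gst{xml.etree.ElementTree}. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def document_ids(response_xml):
    """Return the 'name' of every <document> in a query response, in order."""
    root = ET.fromstring(response_xml)
    return [doc.get("name") for doc in root.iter("document")]
```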
     685
     686\subsubsection{`retrieve'-type services}
    564687Give me the Title metadata for these documents:
    565 \begin{quote}\begin{footnotesize}\begin{verbatim}
     688\begin{quote}\begin{gsc}\begin{verbatim}
    566689<message>
    567690  <request lang='en'  to="mgppdemo/MetadataRetrieve"
    568691    type="retrieve">
    569     <content>
    570692      <documentList>
    571693        <document name="HASH010f073f22033181e206d3b7"/>
     
    579701  </request>
    580702</message>
    581 \end{verbatim}\end{footnotesize}\end{quote}
    582 
    583 \begin{quote}\begin{footnotesize}\begin{verbatim}
     703\end{verbatim}\end{gsc}\end{quote}
     704
     705\begin{quote}\begin{gsc}\begin{verbatim}
    584706<message>
    585707  <response lang='en' from="mgppdemo/MetadataRetrieve"
     
    611733  </response>
    612734</message>
    613 \end{verbatim}\end{footnotesize}\end{quote}
     735\end{verbatim}\end{gsc}\end{quote}
    614736
    615737Give me the text for this document:
    616 \begin{quote}\begin{footnotesize}\begin{verbatim}
     738\begin{quote}\begin{gsc}\begin{verbatim}
    617739<message>
    618740  <request lang='en'   to="mgppdemo/DocumentRetrieve"
     
    625747  </request>
    626748</message>
    627 \end{verbatim}\end{footnotesize}\end{quote}
    628 
    629 \begin{quote}\begin{footnotesize}\begin{verbatim}
     749\end{verbatim}\end{gsc}\end{quote}
     750
     751\begin{quote}\begin{gsc}\begin{verbatim}
    630752<message>
    631753  <response lang='en' from="mgppdemo/DocumentRetrieve"
     
    647769  </response>
    648770</message>
    649 \end{verbatim}\end{footnotesize}\end{quote}
    650 
     771\end{verbatim}\end{gsc}\end{quote}
     772
     773\subsubsection{`browse'-type services}
     774
     775\subsubsection{`process'-type services}
    651776Build requests are not requests for data; they request that some action be carried out, for example, creating, importing, building or activating a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back as soon as the command has started successfully. The requester may then poll the status to see how the process is going.
    652777
     
    655780Some example requests (note that the build services are grouped into a service cluster called `build', so the addresses all begin with `build/'):
    656781
    657 \begin{quote}\begin{footnotesize}\begin{verbatim}
     782\begin{quote}\begin{gsc}\begin{verbatim}
    658783<message>
    659784  <request lang='en'  type='process' to='build/NewCollection'>
     
    673798  </request>
    674799</message>
    675 \end{verbatim}\end{footnotesize}\end{quote}
    676 
    677 
    678 \subsection{Generating the pages}
    679 
    680 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{subsec:url-type}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
     800\end{verbatim}\end{gsc}\end{quote}
     801
     802\subsubsection{`enrich'-type services}
     803
     804\subsection{`status'-type messages}
     805
     806
     807\subsection{`format'-type messages}
     808
     809\subsection{`applet'-type services}
     810
     811\section{Page generation}\label{sec:pagegen}
     812
     813URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
    681814System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
    682815
     
    684817
    685818The basic  page format  is:
    686 \begin{quote}\begin{footnotesize}\begin{verbatim}
     819\begin{quote}\begin{gsc}\begin{verbatim}
    687820<page>
    688  <config/>
    689  <display/>
    690  <request/>
    691  <response/>
     821  <pageExtra>
     822    <config/>
     823    <display/>
     824  </pageExtra>
     825  <pageRequest/>
     826  <pageResponse/>
    692827</page>
    693 \end{verbatim}\end{footnotesize}\end{quote}
     828\end{verbatim}\end{gsc}\end{quote}
    694829
    695830There are four main elements in the page: config, display, request and response, with config and display grouped inside the \gst{pageExtra} element. The request is the original request that came into the Receptionist; it is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{This should instead be saved in some sort of state-saving mechanism: if you leave a page and go back, you want your parameters to be the same as well.} The response contains all the data that the action has gathered from the system. The other two elements contain extra information needed by the XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name and the name of the executable that is running (e.g. library); these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface; these are kept separate from the XSLT to allow for internationalization.
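The page skeleton above can be assembled with a few lines of Python. This is only a sketch, and the function name is hypothetical. The sketch nests the original \gst{request} element inside \gst{pageRequest} and the \gst{response} element inside \gst{pageResponse}. That nesting is an assumption, because the exact content of those elements is not specified here.

```python
import xml.etree.ElementTree as ET


def build_page(request, response, config=None, display=None):
    """Assemble <page> from the original request and the gathered response.

    config and display default to empty elements; request and response
    are ElementTree elements that get nested unchanged.
    """
    page = ET.Element("page")
    extra = ET.SubElement(page, "pageExtra")
    extra.append(config if config is not None else ET.Element("config"))
    extra.append(display if display is not None else ET.Element("display"))
    ET.SubElement(page, "pageRequest").append(request)
    ET.SubElement(page, "pageResponse").append(response)
    return page
```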
    696831
    697 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and display information, and the xslt files.
    698 
    699 \subsubsection{Page action}
     832The following subsections outline, for each action, what data is needed and what requests are generated to send to the system.
     833
     834
     835Once the XML page has been put together, the page returned to the user is created by transforming the XML using XSLT. The output is currently HTML, but it will be possible to generate alternative outputs, such as XML or WML. A set of XSLT files defines an `interface'. Different users can change the look of their web pages by creating new XSLT files for a new interface. Just as we have a sites directory where the different sites live (i.e. where their configuration files and collections are located), we have an interfaces directory where the different interfaces live (i.e. where their transforms and images are located). The default XSLT files are
     836located in interfaces/default/transforms. Collections, sites and other interfaces
     837can override these files by having their own copy of the appropriate
     838files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
     839interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
     840***TODO*** describe a bit more??
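The search order for transforms amounts to a first-match lookup. The Python sketch below assumes that the caller already knows the transform directory for each level. Levels with no such directory are passed as \gst{None} and skipped. Apart from \gst{interfaces/default/transforms}, the directory names in the usage example are hypothetical, and so is the function name.

```python
import os


def find_transform(name, search_dirs, exists=os.path.isfile):
    """Return the path of the first matching XSLT file, or None.

    search_dirs should be ordered collection, site, current interface,
    default interface; entries that are None (no transform directory at
    that level) are skipped. 'exists' can be replaced for testing.
    """
    for directory in search_dirs:
        if directory is None:
            continue
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

For example, if a site provides its own \gst{home.xsl}, that copy is found before the one in \gst{interfaces/default/transforms}.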
     841
     842\subsection{Internationalization}
     843
     844Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
     845
     846Language specific text strings are specified in resource bundle property files. These live in resources/java.
     847
     848There is one properties file per class, and one per interface. At the moment, we have:
     849\begin{itemize}
     850\item \gst{GS2MGPPSearch.properties} and
     851\gst{GS2MGPPRetrieve.properties} etc. for the service classes;
     852
     853\item \gst{interface\_default.properties} for the default interface.
     854\end{itemize}
     855To add another language, create a file with the language code appended to the base name, e.g. \gst{GS2MGPPSearch\_fr.properties} for French.
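Java resource bundles fall back from the most specific language file to the base file. The Python sketch below lists the candidate file names in that order. It simplifies \gst{java.util.ResourceBundle} lookup: it ignores the JVM default locale, scripts and variants. The function name is hypothetical.

```python
def bundle_candidates(base_name, locale):
    """Property-file names to try, most specific first (simplified).

    A locale such as 'fr_CA' yields base_fr_CA, then base_fr, then the
    base file itself; an empty locale yields only the base file.
    """
    parts = [p for p in locale.split("_") if p] if locale else []
    names = []
    for i in range(len(parts), 0, -1):
        names.append(base_name + "_" + "_".join(parts[:i]) + ".properties")
    names.append(base_name + ".properties")
    return names
```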
     856
     857The interface property files are treated differently from the class ones. The action doesn't know which text strings a particular transform needs, so it reads them all from the properties file and puts them into an XML \gst{<display>} element; the XSLT can then pick out the ones it needs.
     858The XSLT could perhaps get the strings from the properties bundle on the fly using Java extension elements; would this be better?
     859
     860All other class-specific text strings are retrieved one by one as they are needed and added into the XML; for example, the names of query parameters are retrieved when the service description is created.
     861
     862\subsection{Page action}
    700863
    701864Depending on the subaction argument, different pages can be generated. For the `home' page, a `describe' request is sent to the MessageRouter; this returns a list of all the collections, services, serviceClusters and sites that it knows about. For each collection, its metadata is retrieved via a `describe' request. This metadata is added into the previous result, which is then added into the page.  The page is
    702 transformed using {\em home.xsl\/}.  For the 'about' page, a {\em
    703 describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
    704 and a list of services, and the result is transformed using {\em about.xsl\/}.
    705 
    706 \subsubsection{Query action}
    707 
    708 There are three query services which have been implemented: TextQuery, SimpleFieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action.
     865transformed using \gst{home.xsl}.  For the `about' page, a \gst{describe} request is sent to the module that the page describes: this may be a collection or a service cluster.  This returns a list of metadata
     866and a list of services, and the result is transformed using \gst{about.xsl}.
     867
     868
     869\subsection{Query action}
     870
     871Three query services have been implemented: TextQuery, FieldQuery and AdvancedFieldQuery. These are all handled in the same way by the Query action.
    709872For each page, the service description is requested from the query service of the current collection (via a describe request).  This is done every time the query page is
    710873displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem and the maximum number of documents to return. If the request type (rt) parameter is set to d (display), the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request is sent to the TextQuery service, with all the parameters from the URL put into its parameter list. A list of document identifiers
    711874is returned. A follow-up query is sent to the MetadataRetrieve service of the collection: the content includes the list of
    712 documents, with a request for their {\em Title} metadata.  The service description and query result are combined into a page of xml, which is
    713 transformed using {\em basicquery.xsl\/} to produce the html page.
    714 
    715 \subsubsection{Applet action}
    716 
    717 There are two types of request to the applet action: {\em a=a \& sa=d\/} and
    718 {\em a=a \& sa=r\/}.  The value {\em sa=d\/} means ``display the applet.'' A
    719 {\em describe} request is sent to the service, which returns the {\footnotesize \verb#<applet>#} HTML element.  The transformation file {\em applet.xsl} embeds this
     875documents, with a request for their \gst{Title} metadata.  The service description and query result are combined into a page of xml, which is
     876transformed using \gst{basicquery.xsl} to produce the html page.
     877
     878\subsection{Applet action}
     879
     880There are two types of request to the Applet action: \gst{a=a \& sa=d} and
     881\gst{a=a \& sa=r}.  The value \gst{sa=d} means ``display the applet.'' A
     882\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element.  The transformation file \gst{applet.xsl} embeds this
    720883into the page, and the servlet returns the HTML.
    721884
    722 The value {\em sa=r} signals a request from the applet.  The result is returned
     885The value \gst{sa=r} signals a request from the applet.  The result is returned
    723886directly to the applet code, in XML.  The other parameters are sent to the
    724887service untransformed, and the result is passed directly back to the applet.
     
    728891Here are two examples of requests generated by the Applet action, along with their corresponding responses.
    729892
    730 The first request corresponds to the URL arguments {\em a=a \&
     893The first request corresponds to the URL arguments \gst{a=a \&
    731894sa=d \& sn=Phind \& c=mgppdemo}, which translate to ``display the Phind
    732895applet for the mgppdemo collection''.
    733896
    734 \begin{quote}\begin{footnotesize}\begin{verbatim}
     897\begin{quote}\begin{gsc}\begin{verbatim}
    735898<message>
    736899  <request type='describe' to='mgppdemo/PhindApplet'/>
     
    761924  </response>
    762925</message>
    763 \end{verbatim}\end{footnotesize}\end{quote}
    764 
    765 The second request corresponds to the  arguments {\em a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
     926\end{verbatim}\end{gsc}\end{quote}
     927
     928The second request corresponds to the  arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
    766929indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
    767930request as parameters. The response is in a form suitable for the applet, placed inside
    768 {\footnotesize \verb#<appletData>#} in a standard Greenstone message.  AppletAction returns the
     931\gst{<appletData>} in a standard Greenstone message.  AppletAction returns the
    769932contents of appletData to the browser, i.e. to the applet itself.
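The argument handling for \gst{sa=r} requests can be sketched in Python: every argument except \gst{a}, \gst{sa}, \gst{sn} and \gst{c} becomes a \gst{param} in the request's \gst{paramList}. The function names are hypothetical. The sketch works on a raw query string rather than on the Receptionist's XML request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

# Arguments used by the Applet action itself; all others go to the service.
RESERVED_ARGS = {"a", "sa", "sn", "c"}


def applet_params(query_string):
    """Return (name, value) pairs to forward, keeping their original order."""
    return [(name, value)
            for name, value in parse_qsl(query_string, keep_blank_values=True)
            if name not in RESERVED_ARGS]


def param_list(pairs):
    """Wrap name/value pairs as <paramList><param name=... value=.../>..."""
    plist = ET.Element("paramList")
    for name, value in pairs:
        ET.SubElement(plist, "param", {"name": name, "value": value})
    return plist
```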
    770933
    771 \begin{quote}\begin{footnotesize}\begin{verbatim}
     934\begin{quote}\begin{gsc}\begin{verbatim}
    772935<message>
    773936  <request type='query' to='mgppdemo/PhindApplet'>
     
    812975  </response>
    813976</message>
    814 \end{verbatim}\end{footnotesize}\end{quote}
    815 
    816 Note that the applet HTML may need to know the name of the {\em library}
     977\end{verbatim}\end{gsc}\end{quote}
     978
     979Note that the applet HTML may need to know the name of the \gst{library}
    817980program.  However, that name is chosen by the person who installed the software
    818981and will not necessarily be ``library''.  To get around this, the applet can
    819982put a parameter called ``library'' into the applet data with a null value:
    820 \begin{quote}\begin{footnotesize}\begin{verbatim}
     983\begin{quote}\begin{gsc}\begin{verbatim}
     984<PARAM NAME='library' VALUE=''/>
    822 \end{verbatim}\end{footnotesize}\end{quote}
     985\end{verbatim}\end{gsc}\end{quote}
    823986When the Applet action encounters this parameter it inserts the name of the
    824987current library servlet as its value.
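The substitution can be sketched as a regular-expression rewrite in Python. This sketch assumes the parameter appears in the form shown above, with single or double quotes. A real implementation would more likely operate on the parsed applet element. The function name is hypothetical.

```python
import re

# <PARAM NAME='library' VALUE=''...>, with either quote style.
_LIBRARY_PARAM = re.compile(
    r"""(<PARAM\s+NAME=(['"])library\2\s+VALUE=)(['"])\3""",
    re.IGNORECASE)


def fill_library_param(applet_html, servlet_name):
    """Insert the servlet name into an empty 'library' applet parameter."""
    return _LIBRARY_PARAM.sub(
        lambda m: m.group(1) + m.group(3) + servlet_name + m.group(3),
        applet_html)
```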
    825988
    826 \subsubsection{Document action}
     989\subsection{Document action}
    827990
    828991DocumentAction sends a query to the DocumentRetrieve service of the collection, requesting the text of the specified document.  At this stage no additional information is obtained, but in future, information such as the Title and
    829992table of contents will be needed to make the display nicer.
    830993
    831 \subsubsection{Formatting the page using XSLT}\label{subsec:xslt}
    832 
    833 Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are
    834 located in interfaces/default/transforms. Collections, sites and other interfaces
    835 can override these files by having their own copy of the appropriate
    836 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
    837 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
    838 
    839 \subsection{Internationalization}
    840 
    841 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
    842 
    843 Language specific text strings are specified in resource bundle property files. These live in resources/java.
    844 
    845 There is a properties file per class, and one per interface. At the moment, we have
    846 
    847 GS2MGPPSearch.properties
    848 GS2MGPPRetrieve.properties etc - the service classes
    849 
    850 interface\_default.properties. - for the default interface
    851 
    852 To add other languages, create eg GS2MGPPSearch\_fr.properties.
    853 
    854 The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml $<$display$>$ element - the xslt can get the ones it needs from there.
    855 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better?

All other class-specific text strings are retrieved one at a time as they are needed and added into the XML; for example, the names of query parameters are retrieved when the service description is created.


\section{Collection formation}

Greenstone2-compatible building has been implemented in gsdl3.

Collection construction can be done through the web, using the build service cluster in localsite: simply step through the required stages in order. So far, addDocument does not work, so documents need to be added manually to the import directory.
     
Collection building can also be done on the command line:

\gst{ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name>}

For example:

\gst{ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator kjdon@cs.waikato.ac.nz testcol}

Any other options are passed to the underlying script; there is no good help message yet.
     
CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses the Greenstone2 Perl scripts. As the building process reaches important stages, it sends events (ConstructionEvent) to any listeners (ConstructionListener); you can add one or more listeners to the constructor, and they will be notified of these events.
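
The event mechanism described above is the standard listener pattern. The following sketch shows that pattern with hypothetical stand-in classes (StageEvent, StageListener and SketchConstructor). They are not the real ConstructionEvent, ConstructionListener and CollectionConstructor classes, whose methods are not described here.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; names and methods are
// illustrative only and are not the real Greenstone API.
class StageEvent {
    private final String message;

    StageEvent(String message) {
        this.message = message;
    }

    String getMessage() {
        return message;
    }
}

interface StageListener {
    void stageReached(StageEvent event);
}

class SketchConstructor {
    private final List<StageListener> listeners = new ArrayList<>();

    void addListener(StageListener listener) {
        listeners.add(listener);
    }

    // Notifies every registered listener, in registration order.
    protected void fire(String message) {
        StageEvent event = new StageEvent(message);
        for (StageListener listener : listeners) {
            listener.stageReached(event);
        }
    }

    // Stands in for the real import and build steps, reporting each stage.
    void run() {
        fire("import started");
        fire("import finished");
        fire("build started");
        fire("build finished");
    }
}
\end{verbatim}\end{gsc}\end{quote}

A listener could, for example, write each message to a log or to a progress display in the web interface.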

\subsection{Collection design}\label{sec:colldesign}

\section{Installation details}

This section describes the directory structure of the Greenstone source, and provides a guide to installing Greenstone from CVS.
     

\begin{table}
\caption{The Greenstone directory structure}
\label{tab:dirs}
\center{\footnotesize
\begin{tabular}{l p{7cm}}
     
gsdl3/src/java/org/greenstone/testing
  & JUnit scaffolding for unit testing.\\
gsdl3/src/java/org/greenstone/applet
 & Code for applets \\
gsdl3/src/java/org/greenstone/applet/phind
  & the phind applet (phrase browsing) \\
gsdl3/src/cpp/
  & Place for any cpp source code---none yet \\
     
 & any resources that may be needed\\
gsdl3/resources/java
 & properties files for Java resource bundles, used to handle all the language-specific text. This directory is on the classpath, so any other Java resources can be placed here too.\\
gsdl3/resources/soap
 & SOAP service description files \\
gsdl3/bin
  & executable stuff lives here\\
     
gsdl3/docs
  & Documentation :-)\\
\hline
gsdl3/web
  & This is where the web site is defined. Any static HTML files can go here. This directory is the root of the Greenstone3 Tomcat context (\gst{/gsdl3}).\\
gsdl3/web/WEB-INF
  & The web.xml file lives here (servlet configuration information for Tomcat)\\
gsdl3/web/WEB-INF/classes
  & Servlet classes go in here\\
gsdl3/web/sites
  & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (e.g. SOAP) to other sites\\
gsdl3/web/sites/localsite
  & One site; the site configuration file lives here\\
gsdl3/web/sites/localsite/collect
  & The collections directory \\
gsdl3/web/sites/localsite/images
  & Site-specific images \\
gsdl3/web/sites/localsite/transforms
  & Site-specific transforms \\
gsdl3/web/interfaces
  & Contains directories for different interfaces; an interface is defined by its images and XSLT files \\
gsdl3/web/interfaces/default
  & The default interface\\
gsdl3/web/interfaces/default/images
  & The images for the default interface\\
gsdl3/web/interfaces/default/transforms
  & The XSLT files for the default interface\\
\hline
\end{tabular}}
\end{table}

\subsection{Installation guide}

\newcommand{\gsdlhome}{\$GSDL3HOME}
\newcommand{\gshome}{\$GSDLHOME}

Currently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note that these instructions are for installation on Linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions at \gst{http://www.cs.waikato.ac.nz/{\char`\~}mdewsnip/GSDL3Windows.html}.

\subsubsection{Get the source}

If you have a greenstone\_cvs account, you can use the following, replacing {\em your-username} with your account name:

\begin{quote}\begin{gsc}\begin{verbatim}
export CVS_RSH=ssh
cvs -d :ext:your-username@cvs.scms.waikato.ac.nz:/usr/local/global-cvs/gsdl-src \
    co gsdl3
\end{verbatim}\end{gsc}\end{quote}

Otherwise, you can get it through anonymous access:

\begin{quote}\begin{gsc}\begin{verbatim}
cvs -d :pserver:cvs_anon@cvs.scms.waikato.ac.nz:2402/usr/local/global-cvs/gsdl-src \
    co gsdl3
\end{verbatim}\end{gsc}\end{quote}

If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some versions of CVS have trouble accessing this repository; we are using version 1.11.1p1.

\subsubsection{Compile and install Greenstone}\label{subsec:compile}

An install.bash script has been constructed to compile and install Greenstone3. What you need to do is:

\begin{quote}\begin{gsc}
cd gsdl3\\
source setup.bash\\
install.bash\\
source setup.bash\\
\end{gsc}\end{quote}

If you want to do Greenstone2-compatible building (currently the only type), you need to have Greenstone2 installed. Run \gst{source setup.bash} in the top-level Greenstone2 directory, then run \gst{source setup.bash} again for Greenstone3. This sets \gst{\gshome} for Tomcat.

\noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before running make or Tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc.

If you want to use SOAP to talk to remote sites, you also need to do the following:

\begin{quote}\begin{gsc}
install-soap.bash
\end{gsc}\end{quote}

One Java command in this script sometimes fails under bash; if so, you may need to copy it into the terminal and run it yourself. See the output from the script for details.

To shut down or start up Tomcat, the commands are:
\begin{quote}\begin{gsc}
\gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\
\gsdlhome/comms/tomcat/jakarta/bin/startup.sh\\
\end{gsc}\end{quote}

Do not run install.bash twice, because it adds content to some files and a second run would add it again.
To update your installation, run update.bash; this updates your code from CVS and rebuilds all the Java code.


\subsubsection{The sample sites}

\noindent There are two Greenstone {\em sites} that come with the checkout: localsite and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
localsite does not connect to any other sites. soapsite specifies a SOAP connection to localsite.

\subsubsection{Tomcat}

\noindent Tomcat is a servlet container. It is used to serve a Greenstone site using a servlet.

The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat: it tells Tomcat which servlets to load, what initial parameters to pass them, and which web names map to the servlets.
There are three servlets specified in web.xml. One is a test servlet that just prints ``hello greenstone'' to a web page, which is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets: {\em library}, which serves localsite, and {\em library1}, which serves soapsite.

The initialisation parameters used by the library servlets are as follows:

\begin{tabular}{llp{2.5in}}
\bf name & \bf sample value & \bf description \\
\hline
gsdl3home & /research/kjdon/gsdl3 & the base directory of the gsdl3 installation \\
sitename & localsite & the site to use \\
interfacename & default & the interface to use\\
libraryname & library & the name of the library program \\
defaultlang & en & the default language for the interface\\
receptionist & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\
messagerouter & NewMessageRouter & (optional) specifies an alternative MessageRouter to use\\
\hline
\end{tabular}
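
Inside a servlet, these values are normally read with \gst{getInitParameter(name)}. As a sketch of how the \gst{<init-param>} entries in web.xml map names to values, the following hypothetical helper (InitParams is not a Greenstone class) reads them straight out of a web.xml string with the JDK's DOM parser, for example to check which site each servlet will serve. The sketch parses a string without a DOCTYPE; a real web.xml that declares an external DTD may make the parser try to fetch it.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: init-param name/value pairs for one servlet.
class InitParams {

    static Map<String, String> forServlet(String webXml, String servletName)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                webXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> params = new LinkedHashMap<>();
        NodeList servlets = doc.getElementsByTagName("servlet");
        for (int i = 0; i < servlets.getLength(); i++) {
            Element servlet = (Element) servlets.item(i);
            if (!servletName.equals(text(servlet, "servlet-name"))) {
                continue; // some other servlet
            }
            NodeList inits = servlet.getElementsByTagName("init-param");
            for (int j = 0; j < inits.getLength(); j++) {
                Element p = (Element) inits.item(j);
                params.put(text(p, "param-name"), text(p, "param-value"));
            }
        }
        return params;
    }

    // Trimmed text of the first descendant with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() == 0 ? null : nl.item(0).getTextContent().trim();
    }
}
\end{verbatim}\end{gsc}\end{quote}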

It is possible to run several servlets at once, with different combinations of sites and/or interfaces.

The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web}); this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.

Tomcat runs on port 8080 by default; this can be changed in server.xml. The siteConfig files also need changing if Tomcat's port is changed, because both the \gst{<httpAddress>} for the site and the \gst{address} attribute of a remote \gst{<site>} element use the port.


\subsubsection{Serving your site using Tomcat}\label{subsec:runtomcat}
     
\noindent To run Tomcat, you need to have sourced \gst{setup.bash} in \gsdlhome\ to set up \gst{\$CLASSPATH} (see Section~\ref{subsec:compile}). Then:

\begin{gsc}
\noindent cd \gsdlhome/comms/tomcat/jakarta/bin\\
./startup.sh
\end{gsc}

\noindent (\gst{./shutdown.sh} shuts down Tomcat.)
\\
\\
\noindent The Tomcat server can be accessed on the web at \gst{http://localhost:8080}, which takes you to a welcome page.
The Greenstone pages are at \gst{http://localhost:8080/gsdl3}, which displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.

\noindent Note: Tomcat must be shut down and restarted whenever you make changes to any of the following, for those changes to take effect:\\
\begin{bulletedlist}
\item \gst{\gsdlhome/web/WEB-INF/web.xml}
\item \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml}
\item any classes or jar files used by the servlets
\end{bulletedlist}
\noindent Note: stdout and stderr for the servlets both go to\\
\gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out}

On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site or collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.

\subsubsection{Using SOAP to talk to a remote site}

\noindent The installation steps above are sufficient if you only want to talk to local sites. However, connecting to a remote site using SOAP needs some additional setup. soapsite specifies a SOAP connection to localsite. If you run soapsite without connecting to localsite, you don't get any collections. However, if you connect to localsite, you can see all of {\em its} collections.
\\
\\
\noindent The SOAP server we use is actually run as a servlet in Tomcat. You need to set up SOAP, set up the SOAP server class that will be your SOAP web service, and then deploy that service.
This is done by install-soap.bash.
You can also deploy a service through the website. If Tomcat is not running, start it up (see Section~\ref{subsec:runtomcat}).

\noindent The SOAP servlet can be accessed at \gst{http://localhost:8080/soap}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.

\noindent To deploy the SOAPServer for localsite:
     
\subsubsection{Debugging SOAP}

If you need to debug the SOAP communication, or just want to look at the SOAP messages that are being passed back and forth, use a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them on to another port.
To run it, type:

\begin{quote}\gst{java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080}
\end{quote}

8070 is the port that TcpTunnelGui listens on, and 8080 is the port that it forwards the messages to, that is, the port that Tomcat is using. You need to modify Greenstone to use port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}); to route the messages through TcpTunnelGui, replace the 8080 in the address below with 8070:
\begin{quote}\begin{gsc}\begin{verbatim}
<site name="org.greenstone.localsite"
      address="http://localhost:8080/soap/servlet/rpcrouter"
      type="soap"/>
\end{verbatim}\end{gsc}\end{quote}

Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the
address for talking to the Tomcat SOAP servlet services.

\section{Developer's notes}

Here are some miscellaneous notes for developers who want to modify the source code.

\subsection{Greenstone utility classes}

These are found in \gst{gsdl3/src/java/org/greenstone/gsdl3/util} and provide a variety of useful functions. Table~\ref{tab:utils} gives a brief description of the various classes.

\begin{table}
\caption{The utility classes in org.greenstone.gsdl3.util}
\label{tab:utils}
\center{\footnotesize
\begin{tabular}{lp{3.75in}}
\hline
\bf Utility class & \bf Description\\
\hline
ConfigVars & holds the servlet startup variables, including library name, site name, interface name and default language\\
Dictionary & wrapper around a ResourceBundle, providing strings with parameters\\
GSCGI & class to map between short-name CGI arguments and long-name request parameters \\
GSFile & class to create all Greenstone file paths, e.g. used to locate configuration files, XSLT files and collection data \\
GSHTML & provides convenience methods for dealing with HTML, e.g. making strings HTML-safe\\
GSPath & used to create, examine and modify message address paths\\
GSStatus & some static codes for status messages\\
GSXML & lots of methods for extracting information out of Greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by Greenstone.\\
GSXSLT & some manipulation functions for Greenstone XSLT\\
Misc & miscellaneous functions\\
OID & class to handle Greenstone2 OIDs\\
XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
XMLTransformer & methods to transform XML using XSLT \\
XSLTUtil & contains static methods to be called from within XSLT \\
\hline
\end{tabular}
}
\end{table}

\subsection{Creating new services}

A browse-type service must also implement a {\em servicename}MetadataRetrieve service.

\subsection{Working with XML}

We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic unit of the tree, and all the other types inherit from it. A Document represents a whole document, and acts as a container for all the nodes. Elements and Nodes are not supposed to exist outside the context of a document, so you need a document to create them. The document is not the top-level node in the tree; to get that, use Document.getDocumentElement(). If you create nodes but do not append them to something already in the document tree, they remain separate from the tree, but they still know which document owns them.

To create new Documents, and to convert Strings or Files to Documents, use XMLConverter.
For example:
\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.File;
import org.w3c.dom.Document;
import org.greenstone.gsdl3.util.XMLConverter;

XMLConverter converter = new XMLConverter();
Document doc = converter.newDOM();               // empty document

File stylesheet = new File("query.xsl");
Document style = converter.getDOM(stylesheet);   // parse a file

String message = "<message><request type='cgi'/></message>";
Document m = converter.getDOM(message);          // parse a string
\end{verbatim}\end{gsc}\end{quote}

To output a document as a String, use \gst{converter.getString(doc);}
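
For reference, the three XMLConverter operations used above correspond to standard JAXP calls. The following is a hypothetical stand-in built only on JDK classes; SimpleConverter is not a Greenstone class, and XMLConverter's actual implementation may differ.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for XMLConverter using only JAXP.
class SimpleConverter {
    private final DocumentBuilder builder;

    SimpleConverter() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // stylesheets use the xsl: namespace
        builder = factory.newDocumentBuilder();
    }

    Document newDOM() {
        return builder.newDocument();
    }

    Document getDOM(String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    Document getDOM(File file) throws Exception {
        return builder.parse(file);
    }

    // Serialises a node with an identity transform, without <?xml ...?>.
    String getString(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}

A DocumentBuilder is not thread-safe, so each thread should use its own SimpleConverter instance.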

To add nodes to an empty document, create them and then append them to the tree:
\begin{quote}\begin{gsc}\begin{verbatim}
Document doc = converter.newDOM();
Element e = doc.createElement("message");
doc.appendChild(e); // e becomes the document element
\end{verbatim}\end{gsc}\end{quote}

Note that you can only append one element directly to a document; this becomes the top-level node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top-level element.

Nodes can only be created by a Document. Document has creation methods for all types of Nodes, for example \gst{createElement(element\_name)}, \gst{createAttribute(attr\_name)}, \gst{createTextNode(text\_data)} etc.

The error ``DOM006 Hierarchy request error'' occurs if you try to give your document more than one root node.
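
The one-root rule can be demonstrated with the JDK's own DOM implementation: appending a second element to a Document throws a DOMException whose code is HIERARCHY\_REQUEST\_ERR. OneRoot is a hypothetical helper written for this illustration; the exact message text (such as DOM006) depends on the parser version.

\begin{quote}\begin{gsc}\begin{verbatim}
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class OneRoot {

    // Tries to give a document two root elements; returns the error code.
    static short secondRootErrorCode() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("message"));      // first root: fine
        try {
            doc.appendChild(doc.createElement("response")); // second root
        } catch (DOMException e) {
            return e.code;
        }
        return -1; // the DOM implementation did not enforce the rule
    }
}
\end{verbatim}\end{gsc}\end{quote}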

\subsection{Greenstone XML}

The Greenstone format namespace is currently:
\gst{xmlns:gsf="http://www.greenstone.org/configformat"}

No DTDs or Schemas have been defined yet. Until there are, try to keep to the following rules:

\begin{bulletedlist}

\item Always return expected elements, even if they are empty, e.g. \gst{<paramList/>}.

\item If you get the whole document, it is called \gst{<document>}. However, if you are returned a list of pointers to parts of the documents, they are \gst{<documentNode>}s.

\item Inside a list you can only have elements with the same name as the list, minus the ``List'' suffix. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it.

\end{bulletedlist}
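
Until a schema exists, the list rule above could be checked mechanically. This is a hypothetical helper, not part of Greenstone; it assumes every list element's name ends in \gst{List} and that its items are named after the prefix, as with \gst{paramList} and \gst{param}.

\begin{quote}\begin{gsc}\begin{verbatim}
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ListRule {

    // Walks the subtree under e and returns the first element that sits
    // inside an xxxList element but is not named xxx, or null if none.
    static Element firstViolation(Element e) {
        String name = e.getTagName();
        String itemName = name.endsWith("List")
            ? name.substring(0, name.length() - "List".length())
            : null;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // ignore text, comments etc.
            }
            Element child = (Element) n;
            if (itemName != null && !child.getTagName().equals(itemName)) {
                return child;
            }
            Element deeper = firstViolation(child);
            if (deeper != null) {
                return deeper;
            }
        }
        return null;
    }
}
\end{verbatim}\end{gsc}\end{quote}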
\subsection{Working with XSLT}

\begin{bulletedlist}
\item {\em Adding HTML to an XML document:}

For example, suppose a text node containing HTML sits inside a resource element.
To add that to a new XML document, I use
\gst{<xsl:value-of select='resource'/>}

If the output method is xml or html, this will escape any special characters,
i.e. $<$ and $>$ etc.

Use
\gst{<xsl:value-of disable-output-escaping="yes" select='resource'/>}
instead.

\item {\em Including an XML document into a stylesheet:}

\gst{<xsl:variable name='import' select='document("newdoc.xml")'/>}

You can then use the information:

\gst{<xsl:value-of select='\$import/element'/>}

\item {\em Selecting an ancestor:}

 The ancestor axis contains the parent of the context node, its
 parent, and so on. To pick out one node among these, use
 ancestor::elem-name. If two ancestors have that name, this selects both of them; in XSLT 1.0, \gst{xsl:value-of} uses the first in document order, which is the outermost one. \gst{ancestor::elem-name[1]} picks the nearest one, because positions on the ancestor axis count outwards from the context node.

\item {\em Basic XSLT elements:}
\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:template match='xxx' name='yyy'/>

<xsl:apply-templates select='xxx'/>
<xsl:call-template name='yyy'/>

<xsl:variable name='doc' select='document("layout.xml")'/>

<xsl:value-of select='$doc/chapter1'/>
\end{verbatim}\end{footnotesize}\end{quote}

\item {\em Using namespaces:}
If you are using the same namespace in more than one file, e.g. in the source XML and in the stylesheet, make sure that the URI in the xmlns:xxx declaration is exactly the same in both cases; otherwise the names will not match. This includes the http:// prefix.

\item \gst{<xsl:with-param name='xxx' select='true'/>} is not
the same as \gst{<xsl:with-param name='xxx'>true</xsl:with-param>}: the select attribute is an XPath expression, so \gst{select='true'} selects child elements named {\em true} rather than the string ``true''.
Use the second form.

\item To select a node from a list based on an attribute value, for example:
\begin{quote}\begin{footnotesize}\begin{verbatim}
<xsl:variable name='name'>CL1</xsl:variable>

<xsl:value-of select="classifier[@name=$name]/@content"/>
\end{verbatim}\end{footnotesize}\end{quote}

\end{bulletedlist}
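
The following sketch runs two of the tips above through the JDK's built-in XSLT processor: the \gst{resource} text is copied once with escaping and once with \gst{disable-output-escaping}, and a classifier is selected by comparing its \gst{name} attribute with a variable. The class XsltTips and the sample \gst{<page>} document are made up for the example.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltTips {
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:variable name='name'>CL1</xsl:variable>"
      + "<xsl:template match='/page'>"
      // 1. escaped copy: the tags come out as &lt;b&gt; text
      + "<p><xsl:value-of select='resource'/></p>"
      // 2. unescaped copy: the tags come out as real markup
      + "<p><xsl:value-of disable-output-escaping='yes' select='resource'/></p>"
      // 3. select a classifier whose name attribute equals $name
      + "<p><xsl:value-of select='classifier[@name=$name]/@content'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return out.toString();
    }
}
\end{verbatim}\end{gsc}\end{quote}

With a page whose resource holds escaped \gst{<b>bold</b>} text and two classifiers CL1 and CL2, the first paragraph shows the tags as literal text, the second contains a real \gst{<b>} element, and the third contains only CL1's content.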
     1442\subsubsection{What can I do to speed up XSL transformations?}
     1443
     1444This information taken from the Xalan FAQS page.
     1445
     1446\begin{bulletedlist}
     1447
     1448\item Use a Templates object (with a different Transformers for each
     1449transformation) to perform multiple transformations with the same set
     1450of stylesheet instructions.
     1451
     1452\item Set up your stylesheets to function efficiently.
     1453
     1454\item Don't use "//" (descendant axes) patterns near the root of a
     1455large document.
     1456
     1457\item Use xsl:key elements and the key() function as an efficient way
     1458to retrieve node sets.
     1459
     1460\item Where possible, use pattern matching rather than xsl:if or
     1461xsl:when statements.
     1462
     1463\item xsl:for-each is fast because it does not require pattern matching.
     1464
     1465\item Keep in mind that xsl:sort prevents incremental processing.
     1466
     1467\item When you create variables,\\
     1468\gst{<xsl:variable name="fooElem" select="foo"/>} is usually faster
     1469than \\
     1470\gst{<xsl:variable name="fooElem"><xsl:value-of-select="foo"/></xsl:variable>}.
     1471
\item Be careful using the \gst{last()} function, since evaluating it may require the processor to count the whole context node list.

\item The use of index predicates within match patterns can be expensive.

\item Decoding and encoding is expensive.

\item For the ultimate in server-side scalability, perform transform
operations on the client.

\end{bulletedlist}
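Two of these points, reusing a compiled \gst{Templates} object and using \gst{xsl:key}, can be combined in a short JAXP sketch. The stylesheet is compiled once, a fresh \gst{Transformer} is created for each transformation, and the lookup by attribute value (as in the classifier example earlier) is done with \gst{key()} instead of a predicate. The class, key and parameter names and the sample XML are hypothetical; only the \gst{javax.xml.transform} calls are standard API.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheet {

    // Looks up a classifier by its name attribute through an xsl:key index.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='wanted'/>"
      + "<xsl:key name='classifierByName' match='classifier' use='@name'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"key('classifierByName', $wanted)/@content\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private final Templates templates;

    CompiledStylesheet(String xsl) {
        try {
            // Parse and compile the stylesheet once; Templates can be shared.
            templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("bad stylesheet", e);
        }
    }

    // Each call gets its own Transformer from the compiled Templates.
    String apply(String xml, String wanted) {
        try {
            Transformer t = templates.newTransformer();
            t.setParameter("wanted", wanted);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompiledStylesheet sheet = new CompiledStylesheet(XSL);
        String xml = "<classifierList>"
                   + "<classifier name='CL1' content='Titles'/>"
                   + "<classifier name='CL2' content='Subjects'/>"
                   + "</classifierList>";
        System.out.println(sheet.apply(xml, "CL1")); // Titles
        System.out.println(sheet.apply(xml, "CL2")); // Subjects
    }
}
\end{verbatim}\end{footnotesize}\end{quote}

The JAXP documentation requires a \gst{Templates} object to be usable from several threads at once, whereas a \gst{Transformer} is not thread-safe, which is why \gst{apply} creates a new one for every call.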

\subsection{Java gdbm}

A JNI wrapper called java-gdbm is used to talk to gdbm. It was
obtained from:\\ \gst{http://aurora.rg.iupui.edu/~schadow/dbm-java/pip/gdbm/}

It uses packing objects to convert Java objects to and from the arrays of
bytes stored in the gdbm file. My GDBMWrapper class uses StringPacking,
which uses UTF-8 encoding, but some text came out garbled. I had to change
the from\_bytes method in StringPacking.java to use \gst{new String(raw, "UTF-8")}
instead of \gst{new String(raw)}, which decodes with the platform's default
charset. This seems to work.

Note: if we also use this gdbm code to create the database file, the
corresponding to-bytes method may need the same change.
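
The fix described above amounts to naming the charset explicitly in both directions. The following sketch is a hypothetical stand-in for the packing methods, not the actual j-gdbm \gst{StringPacking} class, whose method signatures may differ. It uses \gst{StandardCharsets.UTF\_8} (Java 7 or later), which is equivalent to passing \gst{"UTF-8"} but avoids the checked \gst{UnsupportedEncodingException}.

\begin{quote}\begin{footnotesize}\begin{verbatim}
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for StringPacking: only the charset handling
// is the point here; the real class's method names may differ.
public class Utf8StringPacking {

    // Java string -> bytes to store in the gdbm file, always as UTF-8
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes read from the gdbm file -> Java string, decoded as UTF-8.
    // new String(raw) would use the platform default charset instead.
    static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "M\u0101ori";             // contains a non-ASCII macron
        byte[] raw = toBytes(title);
        System.out.println(raw.length);           // 6: the macron 'a' needs two bytes
        System.out.println(fromBytes(raw).equals(title)); // true
    }
}
\end{verbatim}\end{footnotesize}\end{quote}

Because both directions name the same charset, the round trip no longer depends on the locale of the machine that builds or serves the collection.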

The Makefile in j-gdbm does not work as distributed: it tries to fetch
files from its original CVS tree. I have created a new Makefile in the
my-j-gdbm directory; this should probably be added to CVS.

\subsection{Resources}

This is a list of some useful resources that we have come across during the development of gsdl3.

\emph{The Java Native Interface Programmer's Guide and
Specification}, online contents\\
\gst{http://java.sun.com/docs/books/jni/html/jniTOC.html}

Java Native Interface Specification\\
\gst{http://java.sun.com/j2se/1.4/docs/guide/jni/spec/jniTOC.doc.html}

JNI documentation contents\\
\gst{http://java.sun.com/j2se/1.4/docs/guide/jni/index.html}

Another JNI page\\
\gst{http://mindprod.com/jni.html}

Java 1.4 API index\\
\gst{http://java.sun.com/j2se/1.4/docs/api/index.html}

Java tutorial index\\
\gst{http://java.sun.com/docs/books/tutorial/index.html}

Safari Books Online: Java, XML, XSLT and other books\\
\gst{http://proquest.safaribooksonline.com/mainhom.asp?home}

Java 1.4 i18n FAQ\\
\gst{http://www.sun.com/developers/gadc/faq/java/java1.4.html}

Java and XSLT page\\
\gst{http://www.javaolympus.com/java/Java\%20and\%20XSLT.html}

Xalan-Java overview\\
\gst{http://xml.apache.org/xalan-j/overview.html}

Tomcat documentation index\\
\gst{http://jakarta.apache.org/tomcat/tomcat-4.0-doc/index.html}

Servlet and JSP tutorial\\
\gst{http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/}

\emph{Core Servlets and JavaServer Pages}, a book by Marty Hall; the PDF
can be downloaded from this site (the ``try before you buy'' link)\\
\gst{http://www.coreservlets.com/}

J-gdbm page\\
\gst{http://aurora.rg.iupui.edu/~schadow/dbm-java/pip/gdbm/}

Stuart's page of links\\
\gst{http://www.cs.waikato.ac.nz/~nzdl/gsdl3/}

A good basic XSLT tutorial\\
\gst{http://www.zvon.org/xxl/XSLTutorial/Books/Output/contents.html}

JAXP (Java API for XML Processing) package overview\\
\gst{http://java.sun.com/xml/jaxp/dist/1.1/docs/api/overview-summary.html}

IBM developerWorks, XML zone\\
\gst{http://www-106.ibm.com/developerworks/xml/}

xslt.com\\
\gst{http://www.xslt.com/}

Jeni Tennison's XSLT pages\\
\gst{http://www.jenitennison.com/xslt/}

Apache XML tools\\
\gst{http://xml.apache.org/}