Changeset 4892


Timestamp:
2003-07-10T16:44:56+12:00 (21 years ago)
Author:
kjdon
Message:

updated some more, but still not finished going through it

File:
1 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r4236 r4892  
    77{\end{tt}\end{footnotesize}}
    88 
    9 \newcommand{\gst}[1]{{\footnotesize \tt #1} }
     9\newcommand{\gst}[1]{{\footnotesize \tt #1}}
    1010\begin{document}
    1111
     
    7575{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
    7676
    77 {\em Receptionist}: this is the point of contact for the 'front end'. It is pretty much a router to Actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages.
    78 
    79 {\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in  more detail in Section~\ref{sec:pagegen}.
     77{\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality is routing requests to the Actions, but it may do more than that. For example, a Receptionist may modify the request in some way before sending it to the appropriate Action, add data that is common to all pages to the page responses, or transform the response into another form (for example, using XSLT). There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}.
     78
     79{\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is returned to the Receptionist, which may transform it to HTML. The various actions are described in  more detail in Section~\ref{sec:pagegen}.
    8080
    8181
    8282\section{Configuration}\label{sec:config}
    8383
    84 Initial Greenstone3 system configuration  is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for
    85 the site, \gst{siteConfig.xml}.  Each collection has two configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, that give metadata and other information for the
    86 collection.\footnote{\gst{siteConfig.xml} is new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
    87 Greenstone2.}  The first includes user-defined metadata for the collection,
    88 such as its name and the {\em About this collection} text; and also gives
     84Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a config file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Each collection has two configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, that give metadata, display and other information for the
     85collection.\footnote{\gst{siteConfig.xml} and \gst{interfaceConfig.xml} are new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
     86Greenstone2.}  The first includes user-defined presentation metadata for the collection,
     87such as its name and the {\em About this collection} text; gives formatting information for the collection display; and also gives
    8988instructions on how the collection is to be built.  The second is produced by
    9089the build-time process and includes any metadata that can be determined
     
    102101The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible, any files (e.g. images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat gsdl3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML. Also, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}.
    103102 
    104 The first example in Figure~\ref{fig:siteconfig} shows a site configuration file for a rudimentary site with no site-wide services,
     103Figure~\ref{fig:siteconfig} shows two example site configuration files. The first example is for a rudimentary site with no site-wide services,
    105104which does not connect to any external sites. The second example is for a site with one site-wide service cluster - a collection building cluster.  It also connects to the first site using SOAP.
    106 These two sites are running on the same machine. For site gsdl1 to talk to site localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is \gst{http://localhost:8090/soap/servlet/rpcrouter}.
     105These two sites are running on the same machine. For site \gst{gsdl1} to talk to site \gst{localsite}, a SOAP server must be run for \gst{localsite}. The address of the SOAP server, in this case, is \gst{http://localhost:8090/soap/servlet/rpcrouter}.
    107106
    108107
     
    141140</siteConfig>
    142141\end{verbatim}\end{gsc}
    143 \caption{Two sample site config files}
     142\caption{Two sample site configuration files}
    144143\label{fig:siteconfig}
    145144\end{figure}
    146145
    147 
     146\subsection{Interface configuration file}\label{sec:interfaceconfig}
     147
     148The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at startup (others can be loaded dynamically). If the interface uses servlets, it specifies the short name each action should use for the action cgi parameter, e.g.\ QueryAction should use \gst{a=q}. If the interface uses XSLT, it specifies which XSLT file should be used for each action and subaction. Figure~\ref{fig:ifaceconfig} shows an example.
     149
     150\begin{figure}
     151\begin{gsc}\begin{verbatim}
     152<interfaceConfig>
     153  <actionList>
     154    <action name='p' class='PageAction'>
     155      <subaction name='home' xslt='home.xsl'/>
     156      <subaction name='about' xslt='about.xsl'/>
     157    </action>
     158    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
     159    <action name='b' class='BrowseAction' xslt='classifier.xsl'/>
     160    <action name='a' class='AppletAction' xslt='applet.xsl'/>
     161    <action name='d' class='DocumentAction' xslt='document.xsl'/>
     162    <action name='pr' class='ProcessAction' xslt='process.xsl'/>
     163    <action name='s' class='SystemAction' xslt='system.xsl'/>
     164  </actionList>
     165</interfaceConfig>
     166\end{verbatim}\end{gsc}
     167\caption{A sample interface config file}
     168\label{fig:ifaceconfig}
     169\end{figure}
     170
     171This makes it easy for developers to implement and use different actions and/or xslt files without recompilation. The server must be restarted, however.
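
The way a Receptionist might read this file can be sketched as follows. This is only an illustration in Python, not the Greenstone3 Java code: the function name \gst{load\_interface\_config} and the shape of the two maps are assumptions, and the sample XML is an abbreviated copy of Figure~\ref{fig:ifaceconfig}.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample interfaceConfig.xml shown in the figure.
SAMPLE_INTERFACE_CONFIG = """
<interfaceConfig>
  <actionList>
    <action name='p' class='PageAction'>
      <subaction name='home' xslt='home.xsl'/>
      <subaction name='about' xslt='about.xsl'/>
    </action>
    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
    <action name='d' class='DocumentAction' xslt='document.xsl'/>
  </actionList>
</interfaceConfig>
"""


def load_interface_config(xml_text):
    """Return (shortname -> class name, (shortname, subaction) -> XSLT file).

    Hypothetical helper: an action-level xslt attribute is stored under the
    key (shortname, None); subaction-level files use (shortname, subaction).
    """
    root = ET.fromstring(xml_text)
    action_classes = {}
    xslt_files = {}
    for action in root.findall("actionList/action"):
        short = action.get("name")
        action_classes[short] = action.get("class")
        if action.get("xslt") is not None:
            xslt_files[(short, None)] = action.get("xslt")
        for sub in action.findall("subaction"):
            xslt_files[(short, sub.get("name"))] = sub.get("xslt")
    return action_classes, xslt_files


action_classes, xslt_files = load_interface_config(SAMPLE_INTERFACE_CONFIG)
```

Keeping the class names as plain strings in the map mirrors the point made above: changing an action or an XSLT file only requires editing the configuration file and restarting the server.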
    148172
    149173\subsection{Collection configuration file}\label{sec:collconfig}
    150174
    151 The collection configuration file is where the collection designer (eg a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig}
    152 here is an example as it is at present.
     175The collection configuration file is where the collection designer (e.g.\ a librarian) decides what form the collection should take. This includes the collection metadata, such as title and description, and also the indexes and browsing structures that should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far. (Since collection building at this stage is still done using Greenstone2 Perl scripts and the old \gst{collect.cfg} file, we have only defined the format for the parts of \gst{collectionConfig.xml} that are used by the runtime system.)
     176
    153177
    154178\begin{figure}
     
    210234\caption{Sample collectionConfig.xml file}
    211235\label{fig:collconfig}
     236***** REDO *****
    212237\end{figure}
    213238
     239****REDO****
    214240The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in utf-8.
    215241The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like.
    216242The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes, and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.
    217243
    218 There is also a need for a description of how documents should be displayed. For example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}.
     244The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading} and \gst{DocumentContent}. Other information that could be specified here (in a format yet to be decided) includes whether or not to display the cover image, the table of contents, etc.
    219245
    220246\subsection{Building configuration file}\label{sec:buildconfig}
     
    226252collection.  The serviceRack names are Java classes that are loaded
    227253dynamically at runtime. Any information inside the serviceRack element is
    228 specific to that service---there is no set format. Figure~\ref{fig:buildconfig} shows an example. This config file specifies that the collection should load up 3 ServiceRacks: GS2MGPPRetrieve,  GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration.
     254specific to that service; there is no set format. Figure~\ref{fig:buildconfig} shows an example. This config file specifies that the collection should load three ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack object for configuration. The \gst{collectionConfig.xml} file is also passed to the ServiceRack objects at configure time: the \gst{format} and \gst{displayItem} information is used directly from \gst{collectionConfig.xml} rather than being added into \gst{buildConfig.xml} during building. This allows changes in \gst{collectionConfig.xml} to take effect in the collection without rebuilding.
    229255
    230256
     
    291317
    292318The \gst{init()} method creates a new Receptionist and a new
    293 MessageRouter. By default, the base Receptionist and MessageRouter classes are used, but subclasses can be used if they are specified in the servlet init params (see Section~\ref{sec:tomcat}). The appropriate system variables are set in each (interface
    294 name, site name, etc.) and then \gst{configure()} is called. The MessageRouter
     319MessageRouter. Default classes (DefaultReceptionist, MessageRouter) are used unless subclasses have been specified in the servlet initialization parameters (see Section~\ref{sec:tomcat}). The appropriate system variables are set for each object (interface
     320name, site name, etc.) and then \gst{configure()} is called on both. The MessageRouter
    295321is passed to the Receptionist. The servlet then communicates only with
    296322the Receptionist, not with the MessageRouter.
    297323
    298 The Receptionist loads up all the different Action classes. A
    299 static list is used initially, and other Actions may be loaded on the fly as needed. Actions are added to a map, with shortnames for keys. Eg the QueryAction is added with key 'q'. The Actions are passed the MessageRouter reference too.
    300 
    301 The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. This
    302 lists the ServiceRack and ServiceCluster classes that need to be loaded and  any sites that need
    303 to be connected to. 
    304 It has a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}.).
     324The Receptionist reads in the \gst{interfaceConfig.xml} file, and loads all the Action classes listed there. Other Actions may be loaded on the fly as needed. Actions are added to a map, with short names for keys, e.g.\ the QueryAction is added with key 'q'. The Actions are passed the MessageRouter reference too.
     325If the Receptionist is a transforming Receptionist, a mapping between short names and XSLT files is also created.
     326
     327The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. It creates a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML: serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}).
    305328Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
    306329ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList.
    307 For each site specified, the MessageRouter creates an appropriate type Communicator object. Then is tries to get the site description. If the server for the remote site is up and running, this should  be successful. The site will be added to the map with its site name as a key. The sites collections, services and clusters will also be added into the static xml lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure sites commands must be sent (see next section).
     330For each site specified, the MessageRouter creates a Communicator object of the appropriate type. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the map with its site name as a key. The site's collections, services and clusters will also be added into the static XML lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time command to reconfigure the sites must be sent (see next section).
    308331
    309332The MessageRouter also looks inside the site's \gst{collect} directory, and  loads up a Collection object for each valid collection found.
     
    311334The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
    312335files, determines the metadata, and loads ServiceRack classes based on the
    313 names specified in \gst{buildConfig.xml\/}. The \gst{<ServiceRack>} XML element is passed to the object to be used in configuration. The collectionConfig.xml contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file.
     336names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file.
    314337Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList xml.
    315338
     
    318341The startup configuration reads in the various config files and loads quite a lot of XML into memory. This avoids having to read the files repeatedly, but it means that changes to these files have no effect on the running system. Some run-time reconfiguration options are therefore provided. Currently, these can only be accessed by typing cgi arguments into the URL; there is no web form for this yet. SystemAction converts these arguments into system requests, which are described in Section~\ref{sec:system}.
    319342
    320 The cgi arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (a is action, sa is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{c=xxx}, where \gst{xxx} is the name of the collection or cluster.
    321 
     343The cgi arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the arguments in a bit more detail.
     344
     345\begin{table}
     346\caption{Example run-time configuration arguments.}
     347\label{tab:run-time config}
    322348\begin{tabular}{lp{8cm}}
    323 a=s\&sa=c & reconfigures the whole site, reads in siteConfig.xml, reloads all the collections. Just part of this can be specified with another argument ss (system subset). The valid values are collectionList, siteList, serviceList, clusterList. \\
    324 a=s\&sa=c\&c=demo & reconfigures a collection or cluster. ss can also be used here, valid values are metadataList and serviceList. \\
    325 a=s\&sa=a & activate a specific module. Modules are specified using two arguments, st (system module type) and sn (system module name). Valid types are collection, cluster site.\\
    326 a=s\&sa=d & deactivate a module. st and sn can be used here too. Valid types are collection, cluster, site, service. \\
    327 a=s\&sa=d\&c=demo & deactivate a module belonging to a collection or cluster. Valid types are service. \\
     349\gst{a=s\&sa=c} & reconfigures the whole site: reads in \gst{siteConfig.xml} and reloads all the collections. A subset can be reconfigured by adding the argument \gst{ss} (system subset); valid values are \gst{collectionList}, \gst{siteList}, \gst{serviceList} and \gst{clusterList}. \\
     350\gst{a=s\&sa=c\&sc=XXX} & reconfigures the XXX collection or cluster. \gst{ss} can also be used here; valid values are \gst{metadataList} and \gst{serviceList}. \\
     351\gst{a=s\&sa=a} & activates a specific module. Modules are specified using two arguments, \gst{st} (system module type) and \gst{sn} (system module name). Valid types are \gst{collection}, \gst{cluster} and \gst{site}.\\
     352\gst{a=s\&sa=d} & deactivates a module. \gst{st} and \gst{sn} can be used here too. Valid types are \gst{collection}, \gst{cluster}, \gst{site} and \gst{service}. \\
     353\gst{a=s\&sa=d\&sc=XXX} & deactivates a module belonging to the XXX collection or cluster. \gst{st} and \gst{sn} can be used here too. The only valid type is \gst{service}. \\
    328354\end{tabular}
    329 
     355\end{table}
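
As a rough illustration of how the arguments in Table~\ref{tab:run-time config} combine, the following Python sketch builds the argument string that would follow \gst{library?} in the URL. The helper \gst{system\_args} and its keyword names are hypothetical; only the cgi argument names (\gst{a}, \gst{sa}, \gst{sc}, \gst{ss}, \gst{st}, \gst{sn}) come from the table.

```python
from urllib.parse import urlencode


def system_args(subaction, target=None, subset=None,
                module_type=None, module_name=None):
    """Build the cgi argument string for a run-time system command.

    subaction is 'c' (configure), 'a' (activate) or 'd' (deactivate).
    The optional values map onto sc, ss, st and sn respectively.
    """
    args = [("a", "s"), ("sa", subaction)]
    if target is not None:
        args.append(("sc", target))       # collection or cluster to address
    if subset is not None:
        args.append(("ss", subset))       # system subset to reconfigure
    if module_type is not None:
        args.append(("st", module_type))  # system module type
    if module_name is not None:
        args.append(("sn", module_name))  # system module name
    return urlencode(args)


reconfigure_site = system_args("c")
reconfigure_demo_metadata = system_args("c", target="demo", subset="metadataList")
activate_demo = system_args("a", module_type="collection", module_name="demo")
```

The sketch does not check that a given combination is valid (for example, that \gst{st} is one of the types allowed for the chosen command); that validation is left to the system.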
    330356
    331357\section{System messages}\label{sec:messages}
     
    335361process described in Section~\ref{sec:startup-config} has been carried out), it is passing messages back and forth. All modules communicate via message passing.
    336362
    337 There are two different styles of messaging.  The first style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.Each individual  communication is contained in a \gst{<message>} element\footnote{all sample requests and responses shown here  are assumed to have \gst{<message>} elements}.
    338 They contain either \gst{<request>} or \gst{<response>} elements--- a single message may contain multiple requests/responses. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='...'}.
    339 The language attribute is used by the XSLT to determine the language currently
    340 being used by the user interface.  Virtually all messages contain text strings,
    341 and services use this attribute to return strings in the appropriate language. Element and attribute names are formated in lower case with the first letter of internal words capitalized, like 'matchDocs'. Each request typically specifies one service or one action, and the response contains either the data requested, or a status message.
    342 Lists must only have the same elements in them.(put this here??)
    343 
    344 Requests have
    345 a \gst{to} attribute and responses have \gst{from}.  These are addresses used
     363There are two different styles of messaging. The first style is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML. Each individual communication is contained in a \gst{<message>} element\footnote{All sample requests and responses shown are assumed to be enclosed in \gst{<message>} elements.}.
     364Messages contain either \gst{<request>} or \gst{<response>} elements; a single message may contain multiple requests/responses. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='...'}. Virtually all responses contain text strings, and this attribute specifies the preferred language for these strings.  Element and attribute names are formatted in lower case with the first letter of internal words capitalized, like 'matchDocs'. Each request typically specifies one service or one action, and the response contains either the data requested, or a status message.
     365
     366
     367Requests have a \gst{to} attribute and responses have \gst{from}.  These are addresses used
    346368by routing modules.  For example \gst{to='site1/demo/TextQuery'} routes a
    347 message to modules named site1, demo then TextQuery. These modules happen to be a MessageRouter for a remote site (site1), a Collection (demo), and a Service (TextQuery).
    348 
    349 There are several types of request: 'describe', 'system', 'process', 'status', 'format'. These requests can ask for any functionality available in the system.
    350 The second messaging style is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has a request type of 'cgi'. It has the same format as any other request in the system.  The response, however, is a page of data, typically in HTML.
    351 
    352 These cgi-type messages come into the Receptionist and are passed to the appropriate action. The actions generate appropriate internal messages which are sent to the MessageRouter. The responses are put together into a single piece of XML and transformed, using XSLT, into a 'page' of HTML.
    353 
    354 \subsection{cgi-type messages}\label{sec:cgi}
    355 
    356 These are the special 'external'-style messages. Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a
    357 Greenstone URL.  The two main arguments are \gst{a} (action) and \gst{sa}
     369message to modules named \gst{site1}, \gst{demo} then \gst{TextQuery}. These modules happen to be a MessageRouter for a remote site (\gst{site1}), a Collection (\gst{demo}), and a Service (\gst{TextQuery}).
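
The address handling can be pictured with a small Python sketch. It assumes that each routing module consumes the first component of the \gst{to} address and forwards the request with the remainder; that division of labour, and the function names, are assumptions made for illustration rather than a description of the MessageRouter code.

```python
def split_address(to):
    """Split a 'to' address into (next module name, remaining address)."""
    head, _, rest = to.partition("/")
    return head, rest


def trace_route(to):
    """List the modules a request would visit, in order."""
    hops = []
    while to:
        head, to = split_address(to)
        hops.append(head)
    return hops


hops = trace_route("site1/demo/TextQuery")
```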
     370
     371There are several types of request, specified by the \gst{type} attribute: \gst{describe}, \gst{system}, \gst{process}, \gst{status}, \gst{format}. These requests can ask for any functionality available in the system. They are described in more detail in Sections~\ref{sec:describe}, \ref{sec:system}, \ref{sec:process}, \ref{sec:status}, and \ref{sec:format}, respectively.
     372
     373The second messaging style is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has a request type of 'page', as it is a request for a page of data. It has the same format as any other request in the system.  The response, however, does not follow the same format as other responses, and may be given in different formats, such as XML or HTML.
     374
     375These page-type messages come into the Receptionist and are passed to the appropriate action. The actions generate appropriate internal messages which are sent to the MessageRouter. The responses are put together into a single page of XML. This may be returned as XML, or transformed into some other form, e.g.\ HTML using XSLT. This type of message is described in Section~\ref{sec:page}.
     376
     377\subsection{page-type messages}\label{sec:page}
     378
     379These are the special 'external'-style messages. Requests originate from outside Greenstone, for example from a servlet or a Java application. They are requests for a 'page' of data, for example the home page for a site, the query page for a collection, or the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'cgi' arguments in a Greenstone URL.  The two main arguments are \gst{a} (action) and \gst{sa}
    358380(subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for
    359381the page action, and is new for other actions.  For example, a text query could
    360 be encoded as \gst{a=q \& sa=text\/}.}  All other arguments are treated as
     382be encoded as \gst{a=q \& sa=text\/}.}  All other arguments are encoded as
    361383parameters.
    362384
    363 Here is the XML representation of the arguments:
    364 
    365 \begin{quote}\begin{gsc}\begin{verbatim}
    366 <request type='cgi' action='a-arg-value' subaction='sa-arg-value'
    367          lang='en' output='html'>
     385Here are some examples of requests\footnote{In a servlet context, these correspond to the URL arguments \gst{a=p\&sa=about\&c=demo\&l=fr} and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:
     386
     387\begin{quote}\begin{gsc}\begin{verbatim}
     388<request type='page' action='p' subaction='about'
     389         lang='fr' output='html'>
    368390  <paramList>
    369     <param name='xx' value='yyy'/>
    370     <param name=...
     391    <param name='c' value='demo'/>
    371392  </paramList>
    372393</request>
    373394\end{verbatim}\end{gsc}\end{quote}
    374 The receptionist routes the message to the appropriate action.  The output
    375 field is used to indicate what type of output to return. The actions do not
    376 return responses in the normal format; instead they return a page of
    377 information, expressed by default in HTML. Alternative formats could be XML or WML. The basic structure of the XML data (before transformation to HTML or other) is described in Section~\ref{sec:pagegen}. What the HTML looks like depends on the XSLT used to transform the data, and will not be shown here.
     395
     396\begin{quote}\begin{gsc}\begin{verbatim}
     397<request  lang='en' type='page' action='q'  output='html'>
     398  <paramList>
     399    <param name='s' value='TextQuery'/>
     400    <param name='c' value='demo'/>
     401    <param name='rt' value='r'/>
     402    <!-- the rest are the service specific params -->
     403    <param name='ca' value='0'/> <!-- casefold -->
     404    <param name='st' value='1'/> <!-- stem -->
     405    <param name='m' value='10'/> <!-- maxdocs -->
     406    <param name='q' value='snail'/> <!-- query string -->
     407  </paramList>
     408</request>
     409\end{verbatim}\end{gsc}\end{quote}
     410
     411The Receptionist routes the message to the appropriate Action (determined by looking it up in its shortname$->$Action object map). The Action determines what information is needed from the server and retrieves it, making one or more internal requests to the MessageRouter. This information is gathered together into a single response, and returned to the Receptionist. The Receptionist may process the result further, depending on what type of Receptionist it is, and then returns the page to the external entity. Section~\ref{sec:pagegen} describes the different types of Receptionist, and details the structure of the 'pages' they produce.
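
The correspondence between URL arguments and a page request can be sketched in Python as below. This is an illustrative reconstruction from the two examples above, not the servlet's actual code: the function name \gst{page\_request} is hypothetical, and the default \gst{output='html'} is an assumption (the examples show it, but do not say how it is chosen).

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET


def page_request(query, output="html"):
    """Turn Greenstone URL arguments into a <request type='page'> element.

    'a', 'sa' and 'l' become the action, subaction and lang attributes;
    every other argument becomes a <param> inside <paramList>.
    """
    args = dict(parse_qsl(query))
    request = ET.Element("request", {"type": "page", "output": output})
    request.set("action", args.pop("a"))
    if "sa" in args:
        request.set("subaction", args.pop("sa"))
    if "l" in args:
        request.set("lang", args.pop("l"))
    if args:
        param_list = ET.SubElement(request, "paramList")
        for name, value in args.items():
            ET.SubElement(param_list, "param", {"name": name, "value": value})
    return request


about = page_request("a=p&sa=about&c=demo&l=fr")
query = page_request("a=q&l=en&s=TextQuery&c=demo&rt=r&ca=0&st=1&m=10&q=snail")
about_xml = ET.tostring(about, encoding="unicode")
```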
    378412
    379413The LibraryServlet class communicates with the Receptionist, which is the entry
    380414point into the system.  Future GUIs could communicate either with the
    381 Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. However, the Receptionist will pass other types of request directly to the MessageRouter. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
    382 
    383 The cgi arguments used currently are shown in Table~\ref{tab:args}.
    384 Other arguments can be specified by  particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
     415Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they may use either page-type requests, asking for predefined pages of information, or any of the other internal request types---these requests will be passed directly to the MessageRouter. If they communicate with the MessageRouter directly, they must use the internal message format described in the next sections---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
     416
     417The main arguments/parameters currently used are shown in Table~\ref{tab:args}.
     418Other arguments can be specified by particular actions. These include any parameters needed to access services. For example, the TextQuery service has a set of parameters, including stem and case, that are only used by the query action.
    385419
    386420\begin{table}
     
    411445\end{table}
    412446
    413 Here is an example request that retrieves the home page in French:
    414 \begin{quote}\begin{gsc}\begin{verbatim}
    415 a=p&sa=home&l=fr
    416 
    417 <request lang='fr' type='cgi' action='p' subaction='home'
    418     output='html'/>
    419 \end{verbatim}\end{gsc}\end{quote}
    420 
    421 This request represents a text query:
    422 \begin{quote}\begin{gsc}\begin{verbatim}
    423 a=q&l=en&s=TextQuery&c=demo&rt=r&ca=0&st=1&m=10&q=snail
    424 
    425 <request  lang='en' type='cgi' action='q'  output='html'>
    426   <paramList>
    427     <param name='s' value='TextQuery'/>
    428     <param name='c' value='demo'/>
    429     <param name='rt' value='r'/>
    430     <!-- the rest are the service specific params -->
    431     <param name='ca' value='0'/> <!-- casefold -->
    432     <param name='st' value='1'/> <!-- stem -->
    433     <param name='m' value='10'/> <!-- maxdocs -->
    434     <param name='q' value='snail'/> <!-- query string -->
    435   </paramList>
    436 </request>
    437 \end{verbatim}\end{gsc}\end{quote}
    438 
    439 These cgi requests get passed to the appropriate action, which determines what data is required for the page, and what internal requests to send off. The page generation process for the different actions is described in Section~\ref{sec:pagegen}.
     447
    440448\subsection{'describe'-type messages}\label{sec:describe}
    441 This is the first of the internal messages.
    442 The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a semi-predefined piece of XML, making these requests very efficient. The info is predefined apart from any language specific text strings, which are put together as each request comes in.
     449**** REDO (the responses may now contain display information which is not shown here) ****
     450This is the first of the standard internal messages.
     451The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a semi-predefined piece of XML, making these requests very efficient. The response is predefined apart from any language-specific text strings, which are put together as each request comes in.
    443452\begin{quote}\begin{gsc}\begin{verbatim}
    444453<request lang='en' type='describe' to=''/>
     
    528537</param>
    529538\end{verbatim}\end{gsc}\end{quote}
    530 ****describe the various types, what the type means - display purposes- etc.
    531539
    532540If no default is specified, the parameter is assumed to be mandatory.
    533541Here are some examples of parameters:
    534542\begin{quote}\begin{gsc}\begin{verbatim}
    535 <param name='Case' type='boolean' default='0'/>
    536 
    537 <param name='MaxDocs' type='integer' default='50'/>
    538 
    539 <param name='Index' type='enum' default='dtx'>
     543<param name='case' type='boolean' default='0'/>
     544
     545<param name='maxDocs' type='integer' default='50'/>
     546
     547<param name='index' type='enum' default='dtx'>
    540548  <option name='dtx'/>
    541549  <option name='stt'/>
     
    545553<!-- this one is for the text box and field list for the
    546554simple field query-->
    547 <param name='simple' type='multi' occurs='4'>
     555<param name='simpleField' type='multi' occurs='4'>
    548556  <param name='fqv' type='string'/>
    549557  <param name='fqf' type='enum_single'>
     
    553561
    554562\end{verbatim}\end{gsc}\end{quote}
    555 The type attribute is used to determine how to display the parameters on a web page or interface. For example, a string parameter may result in   a text entry box, a boolean an on/off button, enum\_single/enum\_multi a drop-down menu, where one or more items, respectively, can be selected.
     563The type attribute is used to determine how to display the parameters on a web page or interface. For example, a string parameter may result in   a text entry box, a boolean an on/off button, enum\_single/enum\_multi a drop-down menu, where one or many items, respectively, can be selected.
    556564A multi-type parameter indicates that two or more parameters are associated, and should be displayed appropriately. For example, in a field query, the text box and field list should be associated. The occurs attribute specifies how many times the parameter should be displayed on the page.
    557 Parameters also come with display information...
     565Parameters also come with display information: all the text strings needed to present them to the user. These include the name of the parameter and the display values for any options.
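
The mapping from parameter type to form control described above can be sketched as follows. This is a hedged Python illustration, not the XSLT that Greenstone uses to build its pages: the function name `widget_for` and the exact HTML produced are assumptions, and types not mentioned in the text (such as `integer`) simply fall back to a text box.

```python
import xml.etree.ElementTree as ET


def widget_for(param):
    """Return an HTML form control for one <param> element (illustrative only)."""
    name = param.get("name")
    ptype = param.get("type")
    if ptype == "boolean":
        # An on/off control.
        return "<input type='checkbox' name='%s'/>" % name
    if ptype in ("enum_single", "enum_multi"):
        # A drop-down menu; enum_multi allows several selections.
        multiple = " multiple='multiple'" if ptype == "enum_multi" else ""
        options = "".join(
            "<option>%s</option>" % o.get("name") for o in param.findall("option")
        )
        return "<select name='%s'%s>%s</select>" % (name, multiple, options)
    # string and any other type: a plain text entry box.
    return "<input type='text' name='%s'/>" % name
```

A multi-type parameter would be handled by calling the function on each nested `param` and laying the results out together, repeated `occurs` times.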
    558566
    559567A service description also contains a display element, which holds all the language-dependent text strings and is put together on the fly. These strings are the name of the service, the text to use for the submit button, and the text strings for all the parameters: the name, what each value is called, etc.
    560 Here is a request, along with a sample response.
    561 
    562 \begin{quote}\begin{gsc}\begin{verbatim}
    563 <request lang='en'  type='describe' to='demo/TextQuery'/>
    564 
    565 <response lang='en' type='describe' from='demo/TextQuery' >
    566   <service name='TextQuery' type='query'>
    567   <paramList>
    568     <param name='matchDocs' type='integer' default='50/>
    569     <param name='case' type='boolean' default='1'/>
    570     <param name='index' type='enum' default='tt'>
    571       <option name='tt'/>
    572       <option name='t0'/>
    573     </param>
    574   </paramList>
    575 </response>
    576 \end{verbatim}\end{gsc}\end{quote}
    577 \begin{figure}
     568
     569Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. Figure~\ref{fig:query-display} gives an example html search form that may be generated from this describe response.
     570
    578571\begin{quote}\begin{gsc}\begin{verbatim}
    579572<request lang="en" to="mgppdemo/FieldQuery" type="describe" />
     
    632625</response>
    633626\end{verbatim}\end{gsc}\end{quote}
    634 \end{figure}
    635627
    636628\begin{figure}[t]
    637629  \centering
    638630  \includegraphics[width=3.5in]{query2.ps}
    639   \caption{Sample query form.}
    640   \label{fig:query}
     631  \caption{The previous query service describe response as displayed on the search page.}
     632  \label{fig:query-display}
    641633\end{figure}
    642634
    643 describe request to an applet type service: returns ...
     635A describe request to an applet type service returns the applet html element: this will be embedded into a web page to run the applet.
    644636\begin{quote}\begin{gsc}\begin{verbatim}
    645637<request type='describe' to='mgppdemo/PhindApplet'/>
     
    669661\end{verbatim}\end{gsc}\end{quote}
    670662
     663Note that the library parameter has been left blank. This is because library refers to the current servlet that is running and the name is not necessarily known in advance. So either the applet action or the receptionist must fill in this parameter before displaying the html.
     664
    671665\subsection{'system'-type messages}\label{sec:system}
    672 ``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
     666
     667``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently they are initiated by particular cgi parameters (see Section~\ref{sec:runtime-config}).
    673668
    674669The basic format of a system request is as follows:
     
    680675\end{verbatim}\end{gsc}\end{quote}
    681676
    682 Each system request is specified in a system element. The following are examples:
     677One or more actual requests are specified in system elements. The following are examples:
    683678\begin{quote}\begin{gsc}\begin{verbatim}
    684679<system type='configure' subset=''/>
     
    699694\end{verbatim}\end{gsc}\end{quote}
    700695
     696At some stage, an error or status code should be included.
    701697
    702698System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests.
    703699
    704 \subsection{'process'-type messages} ***** TODO ****
    705 
    706 divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse...
    707 show basic structure, then more detailed format for each subtype
    708 
    709 The main type of requests in the system are for services. There are different types of services: query, browse, retrieve, process, applet. Query services do some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse is for browsing lists or hierarchies of documents. process type services are those where the request is for a command to be run. A status code will be returned immediately, and then if the command has not finished, an update of the status can be requested. Applet services are those that run an applet.
    710 
    711   Other possibilities include  transform, enrich, extract, accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
    712 
    713 The basic structure of a service request is as follows:
    714 \begin{quote}\begin{gsc}\begin{verbatim}
    715 <message>
    716   <request lang='en'  type='query' to='demo/TextQuery'>
    717     <paramList/>
    718     other elements...
    719   </request>
    720 </message>
    721 \end{verbatim}\end{gsc}\end{quote}
    722 
    723 The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request.
     700\subsection{'process'-type messages}
     701
     702The main type of request in the system is a request for a service. There are currently several types of service: \gst{query}, \gst{browse}, \gst{retrieve}, \gst{process}, \gst{applet}, \gst{enrich}. Query services do some kind of search and return a list of document identifiers. Retrieve services can return the content of those documents, metadata about the documents, or other resources. Browse is for browsing lists or hierarchies of documents. Process type services are those where the request is for a command to be run. A status code will be returned immediately, and then if the command has not finished, an update of the status can be requested. Applet services are those that run an applet. Enrich services take a document and return the document with some extra markup added.
     703
     704  Other possibilities include transform, extract, accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
     705
     706The basic structure of a service 'process' request is as follows:
     707\begin{quote}\begin{gsc}\begin{verbatim}
     708
     709<request lang='en'  type='process' to='demo/TextQuery'>
     710  <paramList/>
     711  other elements...
     712</request>
     713
     714\end{verbatim}\end{gsc}\end{quote}
     715
     716The parameters are name-value pairs corresponding to parameters that were specified in the service description sent in response to a describe request.
    724717
    725718\begin{quote}\begin{gsc}\begin{verbatim}
     
    729722\end{verbatim}\end{gsc}\end{quote}
    730723
    731 Some requests have other content---for document retrieval, this would be a list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document.
    732 
    733 Responses vary depending on the type of request.
     724Some requests have other content---for document retrieval, this would be a list of document identifiers to retrieve. For metadata retrieval, the content is the list of documents to retrieve metadata for.
     725
     726Responses vary depending on the type of request. The following sections look at the process type requests and responses for each type of service.
    734727
    735728\subsubsection{'query'-type services}
    736 Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
    737 
    738 The following shows some example query requests and their responses.
    739 
    740 Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
    741 \begin{quote}\begin{gsc}\begin{verbatim}
    742 <message>
    743   <request lang='en'  to="mgppdemo/TextQuery" type="process">
    744     <paramList>
    745       <param name="maxDocs" value="10"/>
    746       <param name="queryLevel" value="Section"/>
    747       <param name="stem" value="1"/>
    748       <param name="matchMode" value="some"/>
    749       <param name="sortBy" value="natural"/>
    750       <param name="index" value="t0"/>
    751       <param name="case" value="0"/>
    752       <param name="query" value="snail"/>
    753     </paramList>
    754   </request>
    755 </message>
    756 \end{verbatim}\end{gsc}\end{quote}
    757 
    758 \begin{quote}\begin{gsc}\begin{verbatim}
    759 <message>
    760   <response lang='en' from="mgppdemo/TextQuery" type="query">
    761     <documentList>
    762       <document name="HASH010f073f22033181e206d3b7"/>
    763       <document name="HASH010f073f22033181e206d3b7.2"/>
    764       <document name="HASHac0a04dd14571c60d7fbfd"/>
    765     </documentList>
    766   </response>
    767 </message>
    768 \end{verbatim}\end{gsc}\end{quote}
     729Responses to query requests contain a list of document identifiers, along with some other information, dependent on the query type. For a text query, this includes term frequency information, and some metadata about the result. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
     730
     731The following shows an example query request and its response.
     732
     733Find at most 10 Sections in the mgppdemo collection, containing the word snail (stemmed), returning the results in ranked order:
     734\begin{quote}\begin{gsc}\begin{verbatim}
     735<request lang='en'  to="mgppdemo/TextQuery" type="process">
     736  <paramList>
     737    <param name="maxDocs" value="10"/>
     738    <param name="queryLevel" value="Section"/>
     739    <param name="stem" value="1"/>
     740    <param name="matchMode" value="some"/>
     741    <param name="sortBy" value="1"/>
     742    <param name="index" value="t0"/>
     743    <param name="case" value="0"/>
     744    <param name="query" value="snail"/>
     745  </paramList>
     746</request>
     747
     748<response from="mgppdemo/TextQuery" type="process">
     749  <metadataList>
     750    <metadata name="numDocsMatched" value="59" />
     751  </metadataList>
     752  <documentNodeList>
     753    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
     754    docType='hierarchy' nodeType="leaf" />
     755    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"
     756    docType='hierarchy' nodeType="leaf" />
     757    <documentNode nodeID="HASH010f073f22033181e206d3b7.1"
     758    docType='hierarchy' nodeType="interior" />
     759    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.2.2"
     760    docType='hierarchy' nodeType="leaf" />
     761    ...
     762  </documentNodeList>
     763  <termList>
     764    <term field="" freq="454" name="snail" numDocsMatch="58" stem="3">
     765      <equivTermList>
     766        <term freq="" name="Snail" numDocsMatch="" />
     767        <term freq="" name="snail" numDocsMatch="" />
     768        <term freq="" name="Snails" numDocsMatch="" />
     769        <term freq="" name="snails" numDocsMatch="" />
     770      </equivTermList>
     771    </term>
     772  </termList>
     773</response>
     774\end{verbatim}\end{gsc}\end{quote}
     775
     776The list of document identifiers includes some information about document type and node type. Currently, document types include \gst{simple}, \gst{paged} and \gst{hierarchy}. \gst{simple} is for single section documents, i.e. ones with no sub-structure. \gst{paged} is for documents that have a single list of sections, while \gst{hierarchy} type documents have a hierarchy of nested sections. For \gst{paged} and \gst{hierarchy} type documents, the node type identifies whether a section is the root of the document, an interior section, or a leaf.
     777
     778The term list identifies, for each term in the query, what its frequency in the collection is, how many documents contained that term, and a list of its equivalent terms (if stemming or casefolding was used).
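
A client that receives such a response has to pull these three parts apart. The following Python sketch shows one way to do so with the standard library; the function name `summarise_query_response` and the shape of the returned values are assumptions made for illustration, while the element and attribute names are taken from the sample response above (shortened here).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample TextQuery response shown above.
SAMPLE_RESPONSE = """
<response from="mgppdemo/TextQuery" type="process">
  <metadataList>
    <metadata name="numDocsMatched" value="59" />
  </metadataList>
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
      docType="hierarchy" nodeType="leaf" />
    <documentNode nodeID="HASH010f073f22033181e206d3b7.1"
      docType="hierarchy" nodeType="interior" />
  </documentNodeList>
  <termList>
    <term field="" freq="454" name="snail" numDocsMatch="58" stem="3">
      <equivTermList>
        <term freq="" name="Snail" numDocsMatch="" />
        <term freq="" name="snails" numDocsMatch="" />
      </equivTermList>
    </term>
  </termList>
</response>
"""


def summarise_query_response(xml_text):
    """Return (query metadata, node identifiers, per-term information)."""
    response = ET.fromstring(xml_text)
    metadata = {
        m.get("name"): m.get("value")
        for m in response.findall("metadataList/metadata")
    }
    node_ids = [
        n.get("nodeID") for n in response.findall("documentNodeList/documentNode")
    ]
    terms = {}
    for term in response.findall("termList/term"):
        terms[term.get("name")] = {
            "freq": int(term.get("freq")),
            "numDocsMatch": int(term.get("numDocsMatch")),
            # Equivalent terms arise from stemming or casefolding.
            "equivalents": [e.get("name") for e in term.findall("equivTermList/term")],
        }
    return metadata, node_ids, terms
```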
     779
     780\subsubsection{'browse'-type services}
     781
     782Browse type services are used for classification browsing. The request consists of a list of classifier identifiers, and some structure parameters listing what structure to retrieve.
     783
     784\begin{quote}\begin{gsc}\begin{verbatim}
     785<request lang="en" to="mgppdemo/ClassifierBrowse" type="process">
     786  <paramList>
     787    <param name="structure" value="ancestors" />
     788    <param name="structure" value="children" />
     789  </paramList>
     790  <classifierNodeList>
     791    <classifierNode nodeID="CL1.2" />
     792  </classifierNodeList>
     793</request>
     794
     795<response from="mgppdemo/ClassifierBrowse" type="process">
     796  <classifierNodeList>
     797    <classifierNode nodeID="CL1">
     798      <nodeStructure>
     799        <classifierNode nodeID="CL1">
     800          <classifierNode nodeID="CL1.2">
     801            <classifierNode nodeID="CL1.2.1" />
     802            <classifierNode nodeID="CL1.2.2" />
     803            <classifierNode nodeID="CL1.2.3" />
     804            <classifierNode nodeID="CL1.2.4" />
     805            <classifierNode nodeID="CL1.2.5" />
     806          </classifierNode>
     807        </classifierNode>
     808      </nodeStructure>
     809    </classifierNode>
     810  </classifierNodeList>
     811</response>
     812\end{verbatim}\end{gsc}\end{quote}
     813
     814Possible values for structure parameters are \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendents}. The response gives, for each identifier in the request, a \gst{<nodeStructure>} element with all the requested structure put together into a hierarchy. The structure may include classifier and document nodes.
     815
    769816
    770817\subsubsection{'retrieve'-type services}
     818
     819Retrieval services are special in that requests are not explicitly initiated by a user from a form on a web page, but are called from actions in response to other things. This means that their names are hard-coded into the Actions. DocumentContentRetrieve, DocumentStructureRetrieve and DocumentMetadataRetrieve are the standard names for retrieval services for content, structure, and metadata of documents. Requests to each of these include a list of document identifiers. Because these generally refer to parts of documents, the elements are called \gst{<documentNode>}. For the content, that is all that is required. For the metadata retrieval service, the request also needs parameters specifying what metadata is required. For structure retrieval services, requests need parameters specifying what structure or structural info is required.
     820
     821Some example requests and responses follow.
     822
    771823Give me the Title metadata for these documents:
    772824\begin{quote}\begin{gsc}\begin{verbatim}
    773 <message>
    774   <request lang='en'  to="mgppdemo/MetadataRetrieve"
    775     type="retrieve">
    776       <documentList>
    777         <document name="HASH010f073f22033181e206d3b7"/>
    778         <document name="HASH010f073f22033181e206d3b7.2"/>
    779         <document name="HASHac0a04dd14571c60d7fbfd"/>
    780       </documentList>
     825
     826<request lang="en" to="mgppdemo/DocumentMetadataRetrieve" type="process">
     827  <paramList>
     828    <param name="metadata" value="Title" />
     829  </paramList>
     830  <documentNodeList>
     831    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"/>
     832    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12"/>
     833    <documentNode nodeID="HASH010f073f22033181e206d3b7.1"/>
     834    ...
     835  </documentNodeList>
     836</request>
     837
     838<response from="mgppdemo/DocumentMetadataRetrieve" type="process">
     839  <documentNodeList>
     840    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
    781841      <metadataList>
    782         <metadata name="Title"/>
     842        <metadata name="Title">Putting snails in your second pen</metadata>
    783843      </metadataList>
    784     </content>
    785   </request>
    786 </message>
    787 \end{verbatim}\end{gsc}\end{quote}
    788 
    789 \begin{quote}\begin{gsc}\begin{verbatim}
    790 <message>
    791   <response lang='en' from="mgppdemo/MetadataRetrieve"
    792     type="retrieve">
    793     <content>
    794       <documentList>
    795         <document name="HASH010f073f22033181e206d3b7">
    796           <metadataList>
    797             <metadata name="Title">Farming snails 1:
    798 Learning about snails; Building a pen; Food and shelter plants
    799             </metadata>
    800           </metadataList>
    801         </document>
    802         <document name="HASH010f073f22033181e206d3b7.2">
    803           <metadataList>
    804             <metadata name="Title">Learning about snails
    805         </metadata>
    806           </metadataList>
    807         </document>
    808         <document name="HASHac0a04dd14571c60d7fbfd">
    809           <metadataList>
    810             <metadata name="Title">Farming snails 2:
    811 Choosing snails; Care and harvesting; Further improvement
    812             </metadata>
    813           </metadataList>
    814         </document>
    815       </documentList>
    816     </content>
    817   </response>
    818 </message>
    819 \end{verbatim}\end{gsc}\end{quote}
    820 
    821 Give me the text for this document:
    822 \begin{quote}\begin{gsc}\begin{verbatim}
    823 <message>
    824   <request lang='en'   to="mgppdemo/DocumentRetrieve"
    825     type="retrieve">
    826     <content>
    827       <documentList>
    828         <document name="HASH010f073f22033181e206d3b7.2"/>
    829       </documentList>
    830     </content>
    831   </request>
    832 </message>
    833 \end{verbatim}\end{gsc}\end{quote}
    834 
    835 \begin{quote}\begin{gsc}\begin{verbatim}
    836 <message>
    837   <response lang='en' from="mgppdemo/DocumentRetrieve"
    838     type="retrieve">
    839     <content>
    840       <document name="HASH010f073f22033181e206d3b7.2">
    841         <content>
     844    </documentNode>
     845    <documentNode nodeID="HASH010f073f22033181e206d3b7.2.12">
     846      <metadataList>
     847        <metadata name="Title">Now you must decide</metadata>
     848      </metadataList>
     849    </documentNode>
     850    <documentNode nodeID="HASH010f073f22033181e206d3b7.1">
     851      <metadataList>
     852        <metadata name="Title">Introduction</metadata>
     853      </metadataList>
     854    </documentNode>
     855  </documentNodeList>
     856</response>
     857\end{verbatim}\end{gsc}\end{quote}
     858
     859One or more parameters specifying metadata may be included in a request. Also, a value of \gst{all} will retrieve all the metadata for each document.
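
A request of this shape can be assembled programmatically. The Python sketch below is illustrative only: the helper name `build_metadata_request` and its arguments are assumptions, while the element names, attribute names and the service name follow the example above.

```python
import xml.etree.ElementTree as ET


def build_metadata_request(collection, node_ids, metadata_names, lang="en"):
    """Build a DocumentMetadataRetrieve 'process' request as an XML string."""
    request = ET.Element(
        "request",
        {
            "lang": lang,
            "to": collection + "/DocumentMetadataRetrieve",
            "type": "process",
        },
    )
    # One 'metadata' parameter per requested metadata name
    # (the value 'all' asks for all the metadata).
    param_list = ET.SubElement(request, "paramList")
    for name in metadata_names:
        ET.SubElement(param_list, "param", {"name": "metadata", "value": name})
    # The list of document nodes to retrieve metadata for.
    node_list = ET.SubElement(request, "documentNodeList")
    for node_id in node_ids:
        ET.SubElement(node_list, "documentNode", {"nodeID": node_id})
    return ET.tostring(request, encoding="unicode")
```

For example, `build_metadata_request("mgppdemo", ["HASHac0a04dd14571c60d7fbfd.4.2"], ["Title"])` produces a request equivalent to the first one shown above, restricted to a single node.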
     860
     861Any browse-type service must also implement a metadata retrieval service to provide metadata for the nodes in the classification hierarchy. Its name is the browse service name plus \gst{MetadataRetrieve}. For example, the ClassifierBrowse service described in the previous section should also have a ClassifierBrowseMetadataRetrieve service. The request and response format is exactly the same as for the DocumentMetadataRetrieve service, except that \gst{<documentNode>} elements are replaced by \gst{<classifierNode>} elements (and the corresponding list element is also changed).
     862
     863Give me the text (content) of this document:
     864\begin{quote}\begin{gsc}\begin{verbatim}
     865<request lang="en" to="mgppdemo/DocumentContentRetrieve" type="process">
     866  <paramList />
     867  <documentNodeList>
     868    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
     869  </documentNodeList>
     870</request>
     871
     872<response from="mgppdemo/DocumentContentRetrieve" type="process">
     873  <documentNodeList>
     874    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
     875      <nodeContent>&lt;Section&gt;
     876 
    842877&lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
    843 &lt;P ALIGN=&quot;JUSTIFY&quot;&gt;11. To farm snails is not hard; however,
    844 it is quite different from keeping chickens or ducks or from growing crops
    845 such as maize, rice, cassava or groundnuts.&lt;/P&gt;
    846 &lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
    847 &lt;P ALIGN=&quot;JUSTIFY&quot;&gt;12. Since farming snails is so different
    848 from other kinds of farming, you will have to learn a lot of new things.
    849 &lt;/P&gt;....
    850         </content>
    851       </document>
    852     </content>
    853   </response>
    854 </message>
    855 \end{verbatim}\end{gsc}\end{quote}
    856 
    857 \subsubsection{'browse'-type services}
    858 
    859 \subsubsection{'process'-type services}
    860 Build requests are not a request for data---they are a request for some action to be carried out, for example, create or import or build or activate a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back after a successful start of the command. The status may be polled by the requester to see how the process is going.
    861 
    862 Build requests generally do not need a content, they just have a parameter list.\footnote{or is the collection the content?} Like any service, the parameters used by the service can be obtained by a describe request to that service.
    863 
    864 Some example requests (note that the build services are grouped into a service cluster called 'build', hence the addresses all begin with 'build/'):
    865 
    866 \begin{quote}\begin{gsc}\begin{verbatim}
    867 <message>
    868   <request lang='en'  type='process' to='build/NewCollection'>
    869     <paramList>
    870       <param name='creator' value='[email protected]'/>
    871       <param name='collName' value='the demo collection'/>
    872       <param name='collShortName' value='demo'/>
    873     </paramlist>
    874   </request>
    875 </message>
    876 
    877 <message>
    878   <request lang='en'  type='process' to='build/ImportCollection'>
    879     <paramList>
    880       <param name='collection' value='demo'/>
    881     </paramlist>
    882   </request>
    883 </message>
    884 \end{verbatim}\end{gsc}\end{quote}
    885 
    886 \subsubsection{'enrich]-type services}
     878&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;190. When the plants in your second pen have
     879grown big enough to provide food and shelter, you can put in the snails.&lt;/P&gt;
     880
     881      </nodeContent>
     882    </documentNode>
     883  </documentNodeList>
     884</response>
     885\end{verbatim}\end{gsc}\end{quote}
     886
     887The content of a node is returned in a \gst{<nodeContent>} element.
     888
     889Give me the ancestors and children of the specified node, along with the number of siblings it has:
     890\begin{quote}\begin{gsc}\begin{verbatim}
     891<request lang="en" to="mgppdemo/DocumentStructureRetrieve" type="process">
     892  <paramList>
     893    <param name="structure" value="ancestors" />
     894    <param name="structure" value="children" />
     895    <param name="info" value="numSiblings" />
     896  </paramList>
     897  <documentNodeList>
     898    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2" />
     899  </documentNodeList>
     900</request>
     901
     902<response from="mgppdemo/DocumentStructureRetrieve" type="process">
     903  <documentNodeList>
     904    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
     905      <nodeStructureInfo>
     906        <info name="numSiblings" value="2" />
     907      </nodeStructureInfo>
     908      <nodeStructure>
     909        <documentNode nodeID="HASHac0a04dd14571c60d7fbfd"
     910                docType='hierarchy' nodeType="root">
     911          <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4"
     912                  docType='hierarchy' nodeType="interior">
     913            <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
     914                   docType='hierarchy' nodeType="leaf" />
     915          </documentNode>
     916        </documentNode>
     917      </nodeStructure>
     918    </documentNode>
     919  </documentNodeList>
     920</response>
     921\end{verbatim}\end{gsc}\end{quote}
     922
     923Structure is returned inside a nodeStructure element, while structural info is returned in a nodeStructureInfo element. Possible values for structure parameters are as for browse services: \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendents}. Possible values for info parameters are \gst{numSiblings}, \gst{siblingPosition}, \gst{numChildren}.
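As a rough illustration of how a client might read such a response, the following Python sketch pulls the structural info and the chain of node IDs out of the example above. It is not part of Greenstone: the function name \gst{read\_structure} is made up for this example, and it assumes each nested \gst{documentNode} has at most one child, as in the ancestors case shown.

```python
import xml.etree.ElementTree as ET

# The example response from above, abbreviated to the parts the sketch reads.
RESPONSE = """<response from="mgppdemo/DocumentStructureRetrieve" type="process">
  <documentNodeList>
    <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2">
      <nodeStructureInfo>
        <info name="numSiblings" value="2" />
      </nodeStructureInfo>
      <nodeStructure>
        <documentNode nodeID="HASHac0a04dd14571c60d7fbfd"
                docType='hierarchy' nodeType="root">
          <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4"
                  docType='hierarchy' nodeType="interior">
            <documentNode nodeID="HASHac0a04dd14571c60d7fbfd.4.2"
                   docType='hierarchy' nodeType="leaf" />
          </documentNode>
        </documentNode>
      </nodeStructure>
    </documentNode>
  </documentNodeList>
</response>"""


def read_structure(response_xml):
    """Return (info, path) for the first documentNode in the response.

    info is a dict built from the nodeStructureInfo element; path is the
    list of nodeIDs found by following the first nested documentNode at
    each level of the nodeStructure element (hypothetical helper).
    """
    root = ET.fromstring(response_xml)
    node = root.find("documentNodeList/documentNode")
    info = {i.get("name"): i.get("value")
            for i in node.findall("nodeStructureInfo/info")}
    path = []
    current = node.find("nodeStructure/documentNode")
    while current is not None:
        path.append(current.get("nodeID"))
        current = current.find("documentNode")  # first nested child, if any
    return info, path


info, path = read_structure(RESPONSE)
```

When \gst{children} or \gst{siblings} are requested, a level may hold several nodes, so a real client would need to walk the whole tree rather than a single chain.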
     924
     925\subsubsection{'process'-type services}\label{sec:process}
     926Process requests are not a request for data---they are a request for some action to be carried out, for example, create a new collection, or import a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a response is sent back after a successful start to the command. The status may be polled by the requester to see how the process is going.
     927
     928Process requests generally contain just a parameter list. As with any service, the parameters used by a process-type service can be obtained by a describe request to that service.
     929
     930Here are two example requests for process-services that are part of the build service cluster (hence the addresses all begin with 'build/'), followed by an example response:
     931
     932\begin{quote}\begin{gsc}\begin{verbatim}
     933<request lang='en'  type='process' to='build/NewCollection'>
     934  <paramList>
     935    <param name='creator' value='[email protected]'/>
     936    <param name='collName' value='the demo collection'/>
     937    <param name='collShortName' value='demo'/>
     938  </paramList>
     939</request>
     940
     941<request lang='en'  type='process' to='build/ImportCollection'>
     942  <paramList>
     943    <param name='collection' value='demo'/>
     944  </paramList>
     945</request>
     946
     947<response from="build/ImportCollection">
     948  <status code="2" pid="2">Starting process...</status>
     949</response>
     950\end{verbatim}\end{gsc}\end{quote}
     951
     952The \gst{code} attribute in the response specifies whether the command has been successfully started, whether it is still going, etc.\ (see Table~\ref{tab:status codes} for a list of currently used codes). The \gst{pid} attribute specifies a process id number that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, which are described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not.
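Since XML element names are case sensitive, it can be safer to generate process requests programmatically than to type them by hand. The Python sketch below builds a request of the shape shown above. The helper name \gst{build\_process\_request} is hypothetical and not part of Greenstone; only the element and attribute names are taken from the examples.

```python
import xml.etree.ElementTree as ET


def build_process_request(to, params, lang="en"):
    """Build a process-type request for the service address `to`.

    `params` maps parameter names to values; each entry becomes one
    <param> element inside <paramList>. (Hypothetical helper.)
    """
    request = ET.Element("request", {"lang": lang, "type": "process", "to": to})
    param_list = ET.SubElement(request, "paramList")
    for name, value in params.items():
        ET.SubElement(param_list, "param", {"name": name, "value": value})
    return ET.tostring(request, encoding="unicode")


# Example: the ImportCollection request from above.
import_request = build_process_request("build/ImportCollection",
                                       {"collection": "demo"})
```

Which parameters a given service accepts would still come from a describe request to that service.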
    887953
    888954\subsubsection{'applet'-type services}
    889955
    890 \begin{quote}\begin{gsc}\begin{verbatim}
    891 <message>
     956Applet-type services are those that process the data for an applet. A request consists only of a list of parameters, and the response contains an \gst{<appletData>} element that contains the XML data to be returned to the applet. The format of this is entirely specific to the applet---there is no set format to the applet data.
     957
     958Here is an example request and response, used by the Phind applet:
     959\begin{quote}\begin{gsc}\begin{verbatim}
    892960  <request type='query' to='mgppdemo/PhindApplet'>
    893961    <paramList>
     
    902970    </paramList>
    903971  </request>
    904 </message>
    905 
    906 <message>
     972
    907973  <response type='query' from='mgppdemo/PhindApplet'>
    908974    <appletData>
     
    930996    </appletData>
    931997  </response>
    932 </message>
    933 \end{verbatim}\end{gsc}\end{quote}
    934 
    935 \subsection{'status'-type messages}
    936 
    937 
    938 \subsection{'format'-type messages}
    939 
     998
     999\end{verbatim}\end{gsc}\end{quote}
     1000
     1001\subsubsection{'enrich'-type services}
     1002
     1003*** TODO ****
     1004
     1005\subsection{'status'-type messages}\label{sec:status}
     1006
     1007These are only used with process-type services, which are those where a request is sent to start some type of process (see Section~\ref{sec:process}). The initial response states whether the process has successfully started, and whether it is still continuing. If the process is not finished, status requests can be sent repeatedly to the service to poll the status, using the pid to identify the process. Status codes are used to identify the state of a process. The values used at the moment are listed in Table~\ref{tab:status codes}\footnote{A more standard set of codes should probably be used, for example, the HTTP codes}.
     1008
     1009\begin{table}
     1010\caption{Status codes currently used in Greenstone 3}
     1011\label{tab:status codes}
     1012\begin{tabular}{llp{8cm}}
     1013\bf code name & \bf code  & \bf meaning \\
     1014& \bf value & \\
     1015SUCCESS &  1 & the request was accepted, and the process was  completed \\
     1016ACCEPTED & 2 & the request was accepted, and the process has been started, but it is not completed yet \\
     1017ERROR & 3 & there was an error and the process was stopped \\
     1018CONTINUING & 10 & the process is still continuing \\
     1019COMPLETED & 11 & the process has  finished \\
     1020HALTED & 12 & the process has stopped  \\
     1021INFO & 20 & just an info message that doesn't imply anything \\
     1022\end{tabular}
     1023\end{table}
     1024
     1025The following shows an example status request, along with two responses: the first an 'ok but continuing' response, and the second a 'successfully completed' response. The content of the status elements in the two responses is the output from the process since the last status update was sent back.
     1026
     1027\begin{quote}\begin{gsc}\begin{verbatim}
     1028<request lang="en" to="build/ImportCollection" type="status">
     1029  <paramList>
     1030    <param name="pid" value="2" />
     1031  </paramList>
     1032</request>
     1033
     1034<response from="build/ImportCollection">
     1035  <status code="2" pid="2">Collection construction: import collection.
     1036command = import.pl -collectdir /research/kjdon/home/gsdl3/web/sites/
     1037    localsite/collect test1
     1038starting
     1039  </status>
     1040</response>
     1041
     1042<response from="build/ImportCollection">
     1043  <status code="11" pid="2">RecPlug: getting directory
     1044/research/kjdon/home/gsdl3/web/sites/localsite/collect/test1/import
     1045WARNING - no plugin could process /.keepme
     1046 
     1047*********************************************
     1048Import Complete
     1049*********************************************
     1050* 1 document was considered for processing
     1051* 0 were processed and included in the collection
     1052* 1 was rejected. See /research/kjdon/home/gsdl3/web/sites/
     1053    localsite/collect/test1/etc/fail.log for a list of rejected documents
     1054Success
     1055  </status>
     1056</response>
     1057\end{verbatim}\end{gsc}\end{quote}
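The polling described in this section could be sketched in Python as follows. This is only an illustration: \gst{send} is a hypothetical callable that delivers a request string to the system and returns the response string, and grouping ACCEPTED, CONTINUING and INFO as 'keep polling' is an assumption made for this example rather than documented behaviour.

```python
import time
import xml.etree.ElementTree as ET

# Status codes from the table of codes currently used in Greenstone 3.
SUCCESS, ACCEPTED, ERROR = 1, 2, 3
CONTINUING, COMPLETED, HALTED = 10, 11, 12
INFO = 20

# Assumption: these codes mean the process may still be running.
KEEP_POLLING = {ACCEPTED, CONTINUING, INFO}


def build_status_request(service, pid):
    """Build a status-type request carrying the pid as its only parameter."""
    request = ET.Element("request", {"lang": "en", "to": service, "type": "status"})
    param_list = ET.SubElement(request, "paramList")
    ET.SubElement(param_list, "param", {"name": "pid", "value": pid})
    return ET.tostring(request, encoding="unicode")


def read_status(response_xml):
    """Return (code, pid, text) from the <status> element of a response."""
    status = ET.fromstring(response_xml).find("status")
    return int(status.get("code")), status.get("pid"), (status.text or "").strip()


def poll(send, service, pid, delay_seconds=1.0, max_polls=100):
    """Poll until a code outside KEEP_POLLING arrives.

    Returns (final_code, output), where output collects the text of every
    status update received. `send` is a hypothetical transport function.
    """
    output = []
    for _ in range(max_polls):
        code, _pid, text = read_status(send(build_status_request(service, pid)))
        output.append(text)
        if code not in KEEP_POLLING:
            return code, output
        time.sleep(delay_seconds)
    raise TimeoutError("process %s did not finish after %d polls" % (pid, max_polls))
```

Because each status element only holds the output produced since the previous update, the caller has to accumulate the pieces itself, which is why \gst{poll} returns a list.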
     1058
     1059\subsection{'format'-type messages}\label{sec:format}
     1060
     1061Collection designers are able to specify how their collection looks to a certain degree. They can specify format statements for display that will apply to the results of a search, the display of a document, entries in a classification hierarchy, for example. This info is generally service specific. All services respond to a format request, where they return any service specific formatting information. A typical request and response looks like this:
    9401062\begin{quote}\begin{gsc}\begin{verbatim}
    9411063<request lang="en" to="mgppdemo/FieldQuery" type="format" />
     
    9431065<response from="mgppdemo/FieldQuery" type="format">
    9441066  <format>
    945     <gsf:template match="documentNode"><td><gsf:link><gsf:metadata name="Title" />(<gsf:metadata name="Source" />)</gsf:link></td></gsf:template>
     1067    <gsf:template match="documentNode"><td><gsf:link>
     1068      <gsf:metadata name="Title" />(<gsf:metadata name="Source" />)
     1069      </gsf:link></td>
     1070    </gsf:template>
    9461071  </format>
    9471072</response>
    9481073\end{verbatim}\end{gsc}\end{quote}
    9491074
    950 \section{Page generation}\label{sec:pagegen}
    951 
    952 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
     1075The actual format statements are described further in Section~\ref{sec:colldesign}. They are templates written directly in XSLT, or in GSF, which  stands for Greenstone Format, and is a simple XML representation of the more complicated XSLT templates.
     1076GSF style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed into XSLT by applying the config\_format.xsl stylesheet.
     1077
     1078\section{Page generation}\label{sec:pagegen} **** REDO ********
     1079
     1080\subsection{Receptionists}\label{sec:recepts}
     1081
     1082The receptionist is the controlling module for the page generation part of Greenstone. It loads all the actions, and it knows which message router it and the actions should talk to. It routes each message it receives either to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example, adding to the page received back from the action any information that is common to all pages.
     1083
     1084There are different ways of providing an interface to greenstone, from web based cgi style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.
     1085
     1086Receptionist: This is the most basic receptionist. The page it returns consists of the original request, and the response from the action it was sent to. The methods preProcessRequest and postProcessPage are called on the request and page, respectively, but in this basic receptionist they do not do anything.
     1087
     1088TransformingReceptionist: This extends Receptionist, and overrides postProcessPage to transform the page using XSLT. An XSLT file is listed for each action in the receptionist's config file, and this is used to transform the page. First, some display information and config information is added to the page. Then the page is transformed using the XSLT specified for the action, and returned.
     1089
     1090WebReceptionist: The WebReceptionist extends TransformingReceptionist. It doesn't do much else except some argument conversion. To keep the URLs short, parameters from the services are given short names, and these are used in the web pages.
     1091
     1092DefaultReceptionist: This extends WebReceptionist, and is the default one for Greenstone 3 servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. The receptionist sends a describe request to the collection to get this, and appends it to the page before transformation using XSLT.
     1093
     1094NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. For a look-alike nzdl.org system, even more information is needed for each page, namely the list of classifiers available from the ClassifierBrowse service.
     1095
     1096By default, the LibraryServlet uses DefaultReceptionist. However, there is an init-param called receptionist which can be set to make the servlet use a different one.
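The relationship between the basic receptionist and its subclasses can be pictured with a small Python sketch. The real classes are Java classes in Greenstone; the code below only mirrors the idea that subclasses override the two hook methods (here \gst{pre\_process\_request} and \gst{post\_process\_page}, standing in for preProcessRequest and postProcessPage). The dictionary-based page and the \gst{transforms} table, which stands in for the per-action XSLT files, are assumptions made for this illustration.

```python
class Receptionist:
    """Basic receptionist: the page is just the request plus the response."""

    def __init__(self, actions):
        # actions maps an action name to a callable(request) -> response
        self.actions = actions

    def pre_process_request(self, request):
        return request  # the basic receptionist changes nothing

    def post_process_page(self, page):
        return page  # the basic receptionist changes nothing

    def process(self, request):
        request = self.pre_process_request(request)
        response = self.actions[request["action"]](request)
        page = {"request": request, "response": response}
        return self.post_process_page(page)


class TransformingReceptionist(Receptionist):
    """Adds display/config information, then applies a per-action transform."""

    def __init__(self, actions, transforms, config=None, display=None):
        super().__init__(actions)
        # transforms maps an action name to a callable(page) -> output;
        # it stands in for the XSLT file listed for each action.
        self.transforms = transforms
        self.config = config or {}
        self.display = display or {}

    def post_process_page(self, page):
        page = dict(page, config=self.config, display=self.display)
        return self.transforms[page["request"]["action"]](page)
```

A further subclass would follow the same pattern, for example overriding \gst{pre\_process\_request} to convert short argument names before the action sees them.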
     1097
     1098
     1099* talk general first: get data, get format info, transform gsf->xsl. transform xml->html
     1100
     1101URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
    9531102System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
    9541103
     
    9671116\end{verbatim}\end{gsc}\end{quote}
    9681117
     1118* show config and describe what it's used for
     1119
    9691120There are four main elements in the page: config, translate, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters  can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well}. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.
    9701121
     
    9771128files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
    9781129interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
    979 ***TODO*** describe a bit more??
     1130***TODO*** describe a bit more?? currently only can get this locally
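The lookup order (collection, site, current interface, default interface) amounts to a first-match search over a list of directories. A minimal Python sketch, assuming the caller already knows the four transform directories; the function name \gst{find\_stylesheet} is hypothetical and not taken from the Greenstone source:

```python
from pathlib import Path


def find_stylesheet(name, search_dirs):
    """Return the first existing file called `name` in `search_dirs`.

    `search_dirs` should be given in priority order: collection, site,
    current interface, default interface. Returns None if no directory
    contains the file. (Hypothetical helper.)
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

With this ordering, a collection only needs to supply the XSLT files it wants to change; everything else falls through to the site or interface versions.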
    9801131
    9811132\subsection{Internationalization}
     
    9951146
    9961147The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml \gst{<display>} element - the xslt can get the ones it needs from there.
    997 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better?
     1148xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better? But we don't want to re-load the properties file every time a new text string is needed.
    9981149
    9991150All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created.
    10001151
     1152* for each page type, show a typical request (cgi or xml??) and a sample response
     1153
    10011154\subsection{Page action}
    1002 
     1155* kind of info pages. other actions are associated with specific services.
     1156* uses describe requests to modules
    10031157Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.  The page is
    10041158transformed using \gst{home.xsl}.  For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
     
    10081162\subsection{Query action}
    10091163
     1164The basic URL is \gst{a=q\&s=TextQuery\&c=demo\&rt=d/r}.
    10101165There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action.
    1011 For each page, the service description is requested from the  service  of the current collection (via a describe request).  This is done every time the query page is
    1012 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has  all the parameters from the URL put into the parameter list. A list of document identifiers
     1166For each page, the service description is requested from the  service  of the current collection (via a describe request).  This is currently done every time the query page is
     1167displayed, but should be cached. The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has  all the parameters from the URL put into the parameter list. A list of document identifiers
    10131168is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of
    1014 documents, with a request for their \gst{Title} metadata. The service description and query result are combined into a page of xml, which is
     1169documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the xslt that will be used to transform the page (Formatter object??). The service description and query result are combined into a page of xml, which is
    10151170transformed using \gst{basicquery.xsl} to produce the html page.
    10161171
    10171172\subsection{Applet action}
    10181173
    1019 There are two types of request to the applet action: \gst{a=a \& sa=d\/} and
    1020 \gst{a=a \& sa=r\/}.  The value \gst{sa=d\/} means ``display the applet.'' A
     1174There are two types of request to the applet action: \gst{a=a \& rt=d\/} and
     1175\gst{a=a \& rt=r\/}.  The value \gst{rt=d\/} means ``display the applet.'' A
    10211176\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element.  The transformation file \gst{applet.xsl} embeds this
    10221177into the page, and the servlet returns the HTML.
    10231178
    1024 The value \gst{sa=r} signals a request from the applet.  The result is returned
     1179The value \gst{rt=r} signals a request from the applet.  The result is returned
    10251180directly to the applet code, in XML.  The other parameters are sent to the
    10261181service untransformed, and the result is passed directly back to the applet.
     
    10311186
    10321187The first request corresponds to the URL arguments \gst{a=a \&
    1033 sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
     1188rt=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
    10341189applet for the mgppdemo collection''.
    10351190
    10361191
    1037 The second request corresponds to the  arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
     1192The second request corresponds to the  arguments \gst{a=a \& rt=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
    10381193indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
    10391194request as parameters. The response is in a form suitable for the applet, placed inside
     
    10691224\bf arg & \bf description\\
    10701225a=s & system action\\
    1071 sa=c|a|d & type of system request: c (configure), a (add/activate), \\
     1226sa=c$|$a$|$d & type of system request: c (configure), a (add/activate), \\
    10721227& d (delete/deactivate) \\
    10731228c=demo &  the request will go to this collection/servicecluster \\
     
    10771232& collectionList, siteList.\\
    10781233&  For a collection/cluster, can be metadataList or serviceList.\\
    1079 sn= & \\
    1080 st & \\
     1234sn=demo & \\
     1235st=collection& \\
    10811236\hline
    10821237\end{tabular}
     
    10921247\subsection{Importing gs2 collections}
    10931248
    1094 Collections built in a Greenstone2 system can be used in Greenstone3. Just copy across the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs3.pl}. You need to specify the collect directory and the collection name. Eg.
    1095 
    1096 \gst{convert\_coll\_from\_gs2.pl -collectdir /research/kjdon/gsdl3/web/sites/localsite/collect demo}
    1097 
    1098 This creates the appropriate Greenstone3 XML configuration files. If you restart Tomcat, or give an add command (\gst{a=s\&sa=a\&st=collection\&sn=demo}), you should be able to see your new collection.
     1249Collections built in a Greenstone2 system can be used in Greenstone3. Just copy across the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs2.pl}. You need to specify the collect directory and the collection name. Eg.
     1250
     1251\gst{convert\_coll\_from\_gs2.pl -collectdir /research/kjdon/gsdl3/web/\-sites/\-localsite/collect demo}
     1252
     1253This creates the appropriate Greenstone3 XML configuration files. If you restart Tomcat, or give an add command (\gst{a=s\&sa=a\&st=collection\&sn=demo}), you should be able to see your new collection. You may need to edit some of the format statements by hand.
    10991254
    11001255
    11011256\subsection{Building new collections through the web interface}
    11021257
    1103 Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page, you have to go back to the build 'about' page, and select the next service manually. So far, AddDocument does not work, so documents need to be manually added to the import directory. And there is no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the config files by hand. Editing collect.cfg will change the way building is done (by Greenstone2), and editing collectionConfig.xml will change the way the collection is used (by Greenstone3).
     1258Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page: you have to go back to the build 'about' page and select the next service manually. So far, AddDocument does not work, so documents need to be manually added to the import directory. There is also no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the collect.cfg config file by hand.
    11041259
    11051260You need to carry out the following steps:
     
    11081263NewCollection\\
    11091264- add docs to import directory\\
    1110 - optionally edit collect.cfg and/or collectionConfig.cfg\\
     1265- optionally edit collect.cfg\\
    11111266ImportCollection\\
    11121267BuildCollection\\
     
    11141269\end{quote}
    11151270
     1271Note, activate uses \gst{activate\_gs2\_style\_coll.pl} which is similar to \gst{convert\_coll\_from\_gs2.pl} but assumes that collectionConfig.xml already exists.
    11161272
    11171273\subsection{Command line building}
     
    11341290\end{verbatim}\end{gsc}
    11351291
    1136 the options get passed to the underlying script, - there is no good help message yet.
    1137 
     1292The options get passed to the underlying script; there is no good help message yet.
    11381293import and build use gs2 import.pl and buildcol.pl so you can specify any of their options if you like.
     1294The sequence of steps is the same as for building via the web interface: new, manually add documents to the import directory, and edit collect.cfg if needed, import, build, activate.
    11391295
    11401296Building stuff is in src/java/org/greenstone/gsdl3/build.
    1141 
    1142 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses Greenstone 2 Perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events.
     1297CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses Greenstone 2 Perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events. The Perl implementation currently just passes any messages on; this should be more informative in future.
    11431298
    11441299\subsection{Collection design}\label{sec:colldesign}
    11451300
    1146 \section{Installation details}
     1301Part of collection design involves deciding how the collection should look. Greenstone has a default 'look' for a collection, so this is optional. However, the default may not suit the purposes of some collections, so many parts to the look of a collection can be determined by the collection designer.
     1302
     1303In standard Greenstone, the library is served to a web browser by a servlet, and the HTML is generated using XSLT. XSLT templates are used to format all the parts of the pages. Some commonly overridden templates are those for formatting lists (search results lists and classifier browsing hierarchies) and for parts of the document display.
     1304
     1305Real XSL templates for formatting search results or classifier  lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document.
     1306 
     1307\begin{gsc}\begin{verbatim}
     1308<xsl:template match="documentNode" priority="2"
     1309     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     1310  <xsl:param name="collName"/>
     1311    <td><a href="{$library_name}?a=d&amp;c={$collName}&amp;
     1312           d={@nodeID}&amp;dt={@docType}"><xsl:value-of
     1313           select="metadataList/metadata[@name='Keyword']"/></a>
     1314    </td>
     1315</xsl:template>
     1316 \end{verbatim}\end{gsc}
     1317 
     1318To write this, the user would need to know that:
     1319\begin{bulletedlist}
     1320\item the variable \$library\_name exists,
     1321\item the collection name is passed in as a parameter called collName
     1322\item metadata for a document is found in a metadataList and that its form is \gst{<metadata name="Keyword">the value</metadata>}
     1323\item the arguments needed for the link to the document are a, c, d and dt.
     1324\end{bulletedlist}
     1325 
     1326Since XSLT is written in XML, we can use XSLT to transform XML into XSLT. GSF uses a simple set of XML elements to represent the old (Greenstone2) format statement elements, and we use XSLT to transform it into a proper XSLT template.
     1327 
     1328\begin{tabular}{ll}
     1329\bf Greenstone 2        & \bf Greenstone 3 \\
     1330\gst{[Text]} & \gst{<gsf:text/>} \\
     1331\gst{[num]} & \gst{<gsf:num/>}\\
     1332\gst{[link][/link]} & \gst{<gsf:link></gsf:link>} or \\
     1333& \gst{<gsf:link type='document'></gsf:link>}\\
     1334\gst{[srclink][/srclink]} & \gst{<gsf:link type='source'></gsf:link>}\\
     1335\gst{[icon]} & \gst{<gsf:icon/>} or \\
     1336& \gst{<gsf:icon type='document'/>}\\
     1337\gst{[srcicon]} & \gst{<gsf:icon type='source'/>}\\
     1338\gst{[Title]} (metadata) & \gst{<gsf:metadata name='Title'/>} or \\
     1339& \gst{<gsf:metadata name='Title' select='current'/>}\\
     1340\gst{[parent:Title]} & \gst{<gsf:metadata name='Title' select='parent' />}\\
     1341\gst{[parent(All):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'/>}\\
     1342\gst{[parent(Top):Title]} & \gst{<gsf:metadata name='Title' select='root' />}\\
     1343\gst{[parent(All': '):Title]} & \gst{<gsf:metadata name='Title' select='ancestors'}\\
     1344& \gst{separator=': ' />}\\
     1345\end{tabular}
     1346 
     1347 Other select values for gsf:metadata are \gst{children} and \gst{descendents}. How you would actually use these is unclear.
     1348 
     1349The user specifies a \gst{<gsf:template>} for what they want to format---these can match \gst{documentNode} or \gst{classifierNode} (for nodes in a classification hierarchy).
     1350 
     1351The template above is now represented as:
     1352 
     1353\begin{gsc}\begin{verbatim}
     1354<gsf:template match='documentNode'>
     1355  <td><gsf:link><gsf:metadata name='Keyword'/></gsf:link></td>
     1356</gsf:template>
     1357\end{verbatim}\end{gsc}
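To make the GSF-to-XSLT step more concrete, here is a toy converter in Python. It is not how Greenstone does it (the real conversion is an XSLT stylesheet), and it handles only the three elements used in the example above. The GSF namespace URI is a placeholder invented for this sketch; the generated \gst{select} and \gst{href} values are copied from the hand-written template shown earlier.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
GSF = "http://example.org/gsf"  # placeholder URI, assumed for this sketch only
ET.register_namespace("xsl", XSL)


def convert(node):
    """Recursively convert a GSF element tree into an XSLT element tree.

    Handles gsf:template, gsf:link and gsf:metadata; any other element
    is copied unchanged. (Toy converter, not the Greenstone stylesheet.)
    """
    if node.tag == f"{{{GSF}}}template":
        out = ET.Element(f"{{{XSL}}}template", {"match": node.get("match")})
        # The hand-written template declares collName as a parameter.
        ET.SubElement(out, f"{{{XSL}}}param", {"name": "collName"})
    elif node.tag == f"{{{GSF}}}metadata":
        select = "metadataList/metadata[@name='%s']" % node.get("name")
        out = ET.Element(f"{{{XSL}}}value-of", {"select": select})
    elif node.tag == f"{{{GSF}}}link":
        href = "{$library_name}?a=d&c={$collName}&d={@nodeID}&dt={@docType}"
        out = ET.Element("a", {"href": href})
    else:
        out = ET.Element(node.tag, dict(node.attrib))
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(convert(child))
    return out


gsf_template = ET.fromstring(
    "<gsf:template xmlns:gsf='http://example.org/gsf' match='documentNode'>"
    "<td><gsf:link><gsf:metadata name='Keyword'/></gsf:link></td>"
    "</gsf:template>"
)
xsl_template = convert(gsf_template)
xsl_text = ET.tostring(xsl_template, encoding="unicode")
```

Serialising \gst{xsl\_template} escapes the ampersands in the \gst{href} attribute, so the output is well-formed XML that an XSLT processor could load.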
     1358 
     1359I am not sure how the \{If\} and \{Or\} stuff will go yet. Any ideas????
     1360\section{Greenstone Installation}
    11471361
    11481362This section describes the directory structure of the Greenstone source, and provides an installation guide to installing Greenstone from CVS.
    11491363
    11501364\subsection{Directory structure}
    1151 
    1152 The first part of Table~\ref{tab:dirs} shows the common stuff which can be shared between
    1153 Greenstone users---the src, libraries etc. These will eventually be installed into appropriate system directories. The second part shows
     1365Table~\ref{tab:dirs} shows the file hierarchy for Greenstone3.
     1366The first part shows the common stuff which can be shared between
     1367Greenstone users---the src, libraries etc. Under Linux, these will eventually be installed into appropriate system directories. The second part shows
    11541368stuff used by one person/group---their sites and interface setup
    11551369etc. There can be several sites/interfaces per installation.
     
    12951509To shutdown or startup Tomcat, the commands are:
    12961510\begin{quote}\begin{gsc}
    1297 \gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\
    1298 \gsdlhome/comms/tomcat/jakarta/bin/startup.sh\\
     1511\gsdlhome/comms/jakarta/tomcat/bin/shutdown.sh\\
     1512\gsdlhome/comms/jakarta/tomcat/bin/startup.sh\\
    12991513\end{gsc}\end{quote}
    13001514
     
    13171531The initialisation parameters used by the library servlets are as follows:
    13181532
    1319 \begin{tabular}{lll}
     1533\begin{tabular}{llp{5cm}}
    13201534\bf name & \bf sample value & \bf description \\
    13211535\hline
     
    13321546It is possible to run several servlets at once, with different combinations of sites and/or interfaces.
    13331547
    1334 The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.
     1548The file \gst{\gsdlhome/comms/jakarta/tomcat/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.
    13351549
    13361550
     
    13431557
    13441558\begin{gsc}\begin{tt}
    1345 \noindent cd \gsdlhome/comms/tomcat/jakarta/bin\\
     1559\noindent cd \gsdlhome/comms/jakarta/tomcat/bin\\
    13461560./startup.sh
    13471561\end{tt}\end{gsc}
     
    13571571\begin{gsc}
    13581572\item \gsdlhome/web/WEB-INF/web.xml
    1359 \item \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml
     1573\item \gsdlhome/comms/jakarta/tomcat-tomcat-4.0.1/conf/server.xml
    13601574\end{gsc}
    13611575\item any classes or jar files used by the servlets
    13621576\end{bulletedlist}
    13631577\noindent Note: stdout and stderr for the servlets both go to\\
    1364 \gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out}
     1578\gst{\gsdlhome/comms/jakarta/tomcat/logs/catalina.out}
    13651579
    13661580On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.
     
    14121626address for talking to the Tomcat SOAP servlet services.
    14131627
    1414 \section{Developer's notes}
     1628\section{Greenstone Customization}
     1629
     1630\subsection{How to define a new interface}
     1631
     1632Most of an interface is defined by XSLT files, which are stored in web/interfaces/interface-name/transform.
     1633\subsection{Adding a new language}
     1634
     1635Adding a new interface language to Greenstone 3 is easy. All of the language-dependent text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in resources/java. Each interface has one named interface\_name.properties (where name is the interface name). Each service class has one with the same name as the class (e.g. GS2Search.properties). To add another language, these files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of interface\_default.properties would be named interface\_default\_fr.properties.
     1636
     1637Keys will be looked up in the properties file closest to the specified language. For example, if language fr\_CA was specified (French language, country Canada), and the default locale was en\_GB, Java would look at properties files in the following order, until it found the key: XXX\_fr\_CA.properties, XXX\_fr.properties, XXX\_en\_GB.properties, then XXX\_en.properties, and finally the default XXX.properties.
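
The lookup order described above can be sketched in a few lines of Java. This is an illustration only, not Greenstone code: the class name BundleLookupOrder and the method candidateFiles are hypothetical, and in a real servlet java.util.ResourceBundle performs this search itself. The sketch simply lists the file names that would be consulted, most specific first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists properties files in the order they would be searched.
class BundleLookupOrder {

    // Requested locale first, then the default (fallback) locale, then the base file.
    static List<String> candidateFiles(String baseName, Locale requested, Locale fallback) {
        List<String> names = new ArrayList<>();
        addLocaleChain(names, baseName, requested);
        addLocaleChain(names, baseName, fallback);
        addOnce(names, baseName + ".properties");
        return names;
    }

    // Adds base_lang_COUNTRY.properties (if a country is set), then base_lang.properties.
    private static void addLocaleChain(List<String> names, String baseName, Locale locale) {
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        if (lang.isEmpty()) {
            return; // no language: nothing more specific than the base file
        }
        if (!country.isEmpty()) {
            addOnce(names, baseName + "_" + lang + "_" + country + ".properties");
        }
        addOnce(names, baseName + "_" + lang + ".properties");
    }

    // Avoids duplicates when the requested and fallback locales overlap.
    private static void addOnce(List<String> names, String name) {
        if (!names.contains(name)) {
            names.add(name);
        }
    }

    public static void main(String[] args) {
        // fr_CA requested, en_GB as the default locale, as in the example above.
        for (String name : candidateFiles("interface_default", Locale.CANADA_FRENCH, Locale.UK)) {
            System.out.println(name);
        }
    }
}
```

Run with fr\_CA as the requested locale and en\_GB as the default, the sketch lists the five files in the order given in the paragraph above, ending with interface\_default.properties.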
     1638\section{Greenstone Development}
    14151639
    14161640Here are some random notes for developers who want to modify the source code.
     
    14471671\subsection{Creating new services}
    14481672
    1449 a browse type service must also implement servicenameMetadataRetrieve service.
     1673*inherit from service rack
     1674
     1675* what methods are expected
     1676
     1677*service type responses expected
     1678
     1679*a browse type service must also implement servicenameMetadataRetrieve service.
     1680
     1681* should a metadata retrieval service advertise what metadata is available??
     1682\subsection{Creating new actions/pages}
     1683
    14501684\subsection{Working with XML}
    14511685
     
    14611695Document style = converter.getDOM(stylesheet);\\
    14621696
    1463 String message = ``<message><request type='cgi'/></message>'';\\
     1697String message = ``<message><request type='page'/></message>'';\\
    14641698Document m = converter.getDOM(message);\\
    14651699\end{gsc}\end{quote}
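
For readers without the Greenstone classes to hand, the same kind of string-to-DOM conversion can be sketched with the standard JAXP API. This is an assumption-laden illustration, not the implementation behind converter.getDOM: the class MessageParseSketch and its methods parse and requestType are hypothetical names introduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a string-to-DOM converter, using only standard JAXP calls.
class MessageParseSketch {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reads the type attribute of the first request element in a message.
    static String requestType(Document message) {
        Element request = (Element) message.getDocumentElement()
                .getElementsByTagName("request").item(0);
        return request.getAttribute("type");
    }

    public static void main(String[] args) throws Exception {
        Document m = parse("<message><request type='page'/></message>");
        System.out.println(requestType(m)); // prints "page"
    }
}
```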
     
    14851719xmlns:gsf="http://www.greenstone.org/configformat"
    14861720
    1487 
     1721(xslt namespace: xmlns:xsl="http://www.w3.org/1999/XSL/Transform")
    14881722no DTDs or Schema defined yet. Until there are, try and keep to the following rules:
    14891723
     
    15541788\end{verbatim}\end{footnotesize}\end{quote}
    15551789
     1790\item{\em using Java extension elements:}
     1791
     1792Declare the namespace for your java extensions using one of the following
     1793three formats.
     1794
     1795class format: \gst{xmlns:my-class="xalan://FQCN"} where FQCN is the fully qualified class name. Examples: \gst{xmlns:my-class="xalan://java.util.Hashtable"},  \gst{xmlns:my-class="xalan://mypackage.myclass"}
     1796
     1797package format: \gst{xmlns:my-class="xalan://PJPN"} where PJPN is a partial java package name. That is, it is the beginning of or the complete name of a java  package. Examples: \gst{xmlns:my-package="xalan://java.util"}, \gst{xmlns:my-package="xalan://mypackage"}
     1798
     1799Java format: \gst{xmlns:java="http://xml.apache.org/xalan/java"}
     1800
     1801Then, how you use the Java classes and methods depends on which format you used to declare your namespace.
     1802
     1803class format:
     1804
     1805To create an instance of an object: \gst{prefix:new (args)}. Example: \gst{<xsl:variable name="myType" select="my-class:new()">}
     1806
     1807To invoke an instance method on a specified object: \gst{prefix:methodName (object, args)} where methodName is the name of the method to invoke on object with the args arguments. object must be an object of the class indicated by the namespace declaration. Example: \gst{<xsl:variable name="new-pop" select="my-class:valueOf(\$myType, string(@population))">}
     1808
     1809To invoke an instance method on a default object: \gst{prefix:methodName (args)}  where  methodName is the name of the method to invoke with the args arguments. If a matching method is found, a default instance of the class will be created if it does not already exist. Example: \gst{<xsl:variable name="new-pop" select="my-class:valueOf(string(@population))">}
     1810
     1811To invoke a static method: \gst{prefix:methodName (args)} where methodName is the name of the method to invoke with the args arguments. Example: \gst{<xsl:variable name="new-pop" select="my-class:printit(string(@population))">}
     1812
     1813package format:
     1814
     1815To create an instance of an object:
     1816                   prefix:subpackage.class.new (args)
     1817
     1818                   where prefix is the extension namespace prefix, subpackage is the rest of the package name (the
     1819                   beginning of the package name was in the namespace declaration), and class is the name of the class.
     1820                   A new instance is to be created with the args constructor arguments (if any). All constructor methods
     1821                   are qualified for method selection.
     1822                   Example: <xsl:variable name="myType"
     1823                          select="my-package:extclass.new()">
     1824
     1825                   To invoke an instance method on a specified instance:
     1826                   prefix:methodName (object, args)
     1827
     1828                   where prefix is the extension namespace prefix and methodName is the name of the method to invoke
     1829                   on object with the args arguments. Only instance methods of the object with the name methodName
     1830                   are qualified methods. If a matching method is found, object will be used to identify the object instance
     1831                   and args will be passed to the invoked method.
     1832                   Example: <xsl:variable name="new-pop"
     1833                        select="my-package:valueOf(\$myType, string(@population))">
     1834
     1835                   To invoke a static method:
     1836                   prefix:subpackage.class.methodName (args)
     1837
     1838                   where prefix is the extension namespace prefix, subpackage is the rest of the package name (the
     1839                   beginning of the package name was in the namespace declaration), class is the name of the class, and
     1840                   methodName is the name of the method to invoke with the args arguments. Only static methods with
     1841                   the name methodName are qualified methods. If a matching method is found, args will be passed to the
     1842                   invoked static method.
     1843                   Example: <xsl:variable name="new-pop"
     1844                        select="my-package:extclass.printit(string(@population))">
     1845
     1846
     1847                       Unlike the class format namespace, there is no concept of a default object since the namespace
     1848                       declaration does not identify a unique class.
     1849
     1850java format:
     1851
     1852
     1853
     1854
     1855                   To create an instance of an object:
     1856                   prefix:FQCN.new (args)
     1857
     1858                   where prefix is the extension namespace prefix for the Java namespace and FQCN is the fully qualified
     1859                   class name of the class whose constructor is to be called. A new instance is to be created with the
     1860                   args constructor arguments (if any). All constructor methods are qualified for method selection.
     1861                   Example: <xsl:variable name="myHash"
     1862                          select="java:java.util.Hashtable.new()">
     1863
     1864                   To invoke an instance method on a specified instance:
     1865                   prefix:methodName (object, args)
     1866
     1867                   where prefix is the extension namespace prefix and methodName is the name of the method to invoke
     1868                   on object with the args arguments. Only instance methods of the object with the name methodName
     1869                   are qualified methods. If a matching method is found, object will be used to identify the object instance
     1870                   and args will be passed to the invoked method.
     1871                   Example: <xsl:variable name="new-pop"
     1872                        select="java:put(\$myHash, string(@region), \$newpop)">
     1873
     1874                   To invoke a static method:
     1875                   prefix:FQCN.methodName (args)
     1876
     1877                   where prefix is the extension namespace prefix, FQCN is the fully qualified class name of the class
     1878                   whose static method is to be called, and methodName is the name of the method to invoke with the
     1879                   args arguments. Only static methods with the name methodName are qualified methods. If a matching
     1880                   method is found, args will be passed to the invoked static method.
     1881                   Example: <xsl:variable name="new-pop"
     1882                        select="java:java.lang.Integer.valueOf(string(@population))">
     1883
     1884
     1885                       Unlike the class format namespace, there is no concept of a default object since the namespace
     1886                       declaration does not identify a unique class.
     1887
     1888
     1889
     1890
     1891
     1892
    15561893
    15571894\end{bulletedlist}
     
    15781915
    15791916\item xsl:for-each is fast because it does not require pattern matching.
     1917
     1918\item avoid recursion
    15801919
    15811920\item Keep in mind that xsl:sort prevents incremental processing.