Changeset 7826


Timestamp:
2004-07-29T13:32:35+12:00
Author:
kjdon
Message:

added some more info

Location:
trunk/gsdl3/docs/manual
Files:
2 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r7635 r7826  
    462462
    463463Once the build process is complete, the building directory should be renamed to index (after deleting or renaming the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
     464
     465Summary:
    464466
    465467[TODO: need to describe namespaces somewhere? ]
     
    10111013\subsection{Overview of modules??}
    10121014
    1013 A \gsiii\  'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system. There is a one-to-one correspondance between modules and Java classes, with the exception of services: for coding and/or run-time efficiency reasons, several Service modules may be grouped together into one ServiceRack class.
     1015A \gsiii\  'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system. The top left part is concerned with displaying the data, while the bottom right part is the collection data serving part. The two sides communicate through the MessageRouter. There is a one-to-one correspondence between modules and Java classes, with the exception of services: for coding and/or run-time efficiency reasons, several Service modules may be grouped together into one ServiceRack class.
    10141016
    10151017\begin{figure}[t]
    10161018  \centering
    1017   \includegraphics[width=4in]{local} %5.8
     1019  \includegraphics[width=4in]{newlocal} %5.8
    10181020  \caption{A simple stand-alone site.}
    10191021  \label{fig:local}
     
    10231025{\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters, communicators needed. All messages pass through the MessageRouter. Communication between remote sites is always done between MessageRouters, one for each site.
    10241026
    1025 {\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g. all the building services may be part of a cluster. What is part of a cluster is specified by the site configuration file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.
     1027{\em Collection and ServiceCluster}: these are very similar, and group a set of services into a conceptual group. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g. all the building services may be part of a cluster. What is part of a cluster is specified by the site configuration file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.
    10261028Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
    10271029
    1028 {\em Service}: these provide the core functionality of the system e.g. searching, retrieving documents, building collections etc. One or more may be grouped into a single Java class (ServiceRack) for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory. Services provide the core functionality for the system, e.g. searching, retrieving documents, building collections etc.
     1030{\em Service}: these provide the core functionality of the system e.g. searching, retrieving documents, building collections etc. One or more may be grouped into a single Java class (ServiceRack) for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory.
    10291031
    10301032{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
    10311033
    1032 {\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. For example, a Receptionist may: modify the request in some way before sending it to the appropriate Action; add some data to the page responses that is common to all pages; transform the response into another form using XSLT for example. There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}.
     1034{\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. For example, a Receptionist may: modify the request in some way before sending it to the appropriate Action; add some data to the page responses that is common to all pages; transform the response into another form using XSLT. There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}.
    10331035
    10341036{\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'CGI' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is returned to the Receptionist, which may transform it to HTML. The various actions are described in  more detail in Section~\ref{sec:pagegen}.
     
    10521054If the Receptionist is a TransformingReceptionist, a mapping between shortnames  and XSLT file names is also created.
    10531055
    1054 The MessageRouter reads in its site configuration file \gst{siteConfig.xml} (see Section~\ref{sec:siteconfig}). It creates a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}.).
     1056The MessageRouter reads in its site configuration file \gst{siteConfig.xml} (see Section~\ref{sec:siteconfig}). It creates a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are part of what gets returned in response to a describe request (see Section~\ref{sec:describe}).
     1057
    10551058Each ServiceRack specified in the configuration file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
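
The registration step described above can be sketched roughly as follows. This is a Python illustration, not the Java of the actual system: the \gst{ServiceRack} stand-in class, its \gst{service\_names} method, the function \gst{register\_racks} and the rack names are all hypothetical. Only the service names are taken from examples elsewhere in this manual.

```python
class ServiceRack:
    """Hypothetical stand-in for a Java ServiceRack class."""

    def __init__(self, name, services):
        self.name = name
        self._services = list(services)

    def service_names(self):
        # The rack is queried for the list of services it provides.
        return list(self._services)


def register_racks(racks):
    """Map each service name to the rack object that provides it."""
    module_map = {}
    service_list = []
    for rack in racks:
        for service in rack.service_names():
            module_map[service] = rack      # service name -> ServiceRack
            service_list.append(service)    # also recorded in the serviceList
    return module_map, service_list


# Rack names are invented for illustration.
racks = [
    ServiceRack("QueryRack", ["TextQuery", "FieldQuery"]),
    ServiceRack("BuildRack", ["ImportCollection"]),
]
module_map, service_list = register_racks(racks)
print(service_list)
```

After this step a lookup by service name returns the rack that answers for it, which is why the racks themselves no longer need to be visible to the rest of the system.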
     1059
    10561060ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList.
    1057 For each site specified, the MessageRouter creates an appropriate type of Communicator object. Then it tries to get the site description. If the server for the remote site is up and running, this should  be successful. The site will be added to the mapping with its site name as a key. The site's collections, services and clusters will also be added into the static xml lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-sites commands must be sent (see Section~\ref{sec:runtime-config}).
    1058 
    1059 The MessageRouter also looks inside the site's \gst{collect} directory, and  loads up a Collection object for each valid collection found.
    1060 
     1061
     1062For each site specified, the MessageRouter creates an appropriate type of Communicator object. Then it tries to get the site description. If the server for the remote site is up and running, this should  be successful. The site will be added to the mapping with its site name as a key. The site's collections, services and clusters will also be added into the static xml lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-site command must be sent (see Section~\ref{sec:runtime-config}).
     1063
     1064The MessageRouter also looks inside the site's \gst{collect} directory, and  loads up a Collection object for each valid collection found. If a \gst{collectionInit.xml} file is present, a subclass of Collection may be used.
    10611065The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
    10621066files, determines the metadata, and loads ServiceRack classes based on the
     
    10671071
    10681072There are two types of messages used by the system: external and internal messages. All messages have an enclosing \gst{<message>} element, which contains either one or more requests, or one or more responses. In the following descriptions, the message element is not shown, but is assumed to be present. 
    1069 Action in \gsiii\  is originated by a request coming in from the outside. In the standard web-based \gs\ , this comes from a servlet into the receptionist. This ``external'' type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML format or other depending on the output parameter to the request.
    1070 
    1071 Messages inside the system (``internal'' messages) all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned. Currently all requests are individual, so any requests can be combined into the same message, and they will be answered separately, with their responses being sent back in a single message.
    1072 
    1073 When a page request comes in to the Receptionist, it looks at the action attribute to determine which action to send it to. The response is returned from the action. The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the servlet \gs\  we transform using XSLT to generate html pages which get returned to the servlet.
     1073Action in \gsiii\  is originated by a request coming in from the outside. In the standard web-based \gs, this comes from a servlet and is passed into the Receptionist. This ``external'' type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML format or other depending on the output parameter of the request.
     1074
     1075Messages inside the system (``internal'' messages) all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned. Currently all requests are independent, so any requests can be combined into the same message, and they will be answered separately, with their responses being sent back in a single message.
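
The one-response-per-request behaviour can be illustrated with a small Python sketch (again, not the Java of the real system). It assumes that each request names its target in a \gst{to} attribute and that each response echoes it in a \gst{from} attribute. The function \gst{answer\_message}, the handler table and the response texts are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def answer_message(message, modules):
    """Return a <message> holding one <response> per <request>, in order."""
    reply = ET.Element("message")
    for request in message.findall("request"):
        target = request.get("to", "")
        response = ET.SubElement(reply, "response", {"from": target})
        handler = modules.get(target)
        # Placeholder content only: real responses carry structured XML.
        response.text = handler(request) if handler else "no such module"
    return reply


incoming = ET.fromstring(
    "<message>"
    "<request lang='en' type='describe' to=''/>"
    "<request lang='en' type='describe' to='demo'/>"
    "</message>"
)
# Hypothetical handlers: '' stands for the MessageRouter itself.
modules = {
    "": lambda request: "answered by the MessageRouter",
    "demo": lambda request: "answered by collection demo",
}
reply = answer_message(incoming, modules)
print(ET.tostring(reply, encoding="unicode"))
```

Because the requests are independent, the loop answers each one separately and the caller receives all responses together in a single message.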
     1076
     1077When a page request (external request) comes in to the Receptionist, it looks at the action attribute and passes the request to the appropriate Action module. The Action will fire one or more internal requests to the MessageRouter, based on the arguments. The data is gathered into a  response, which is returned to the Receptionist.  The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the \gs\ servlet  we transform using XSLT to generate html pages.
    10741078
    10751079Actions send internal style messages to the MessageRouter. Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, retrieve document text.
    1076 There are different request types: describe, process, system...
    1077 
    1078 The message formats for each request type, and the response formats for each module are described in the following section.
    1079 
    1080 \subsection{an attempt at an API: message formats}
    1081 
    1082 \subsubsection{external$->$action}\label{sec:page-requests}
    1083 
    1084 request:
    1085 These are the special 'external'-style messages. Requests originate from outside \gs\ , for example from a servlet, or java application. They are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a \gs\  URL.  The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
    1086 
    1087 Here are some examples of  requests\footnote{In a servlet context, these correspond to the arguments \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:
    1088 
    1089 \begin{quote}\begin{gsc}\begin{verbatim}
    1090 <request type='page' action='p' subaction='about'
    1091          lang='fr' output='html'>
    1092   <paramList>
    1093     <param name='c' value='demo'/>
    1094   </paramList>
    1095 </request>
    1096 \end{verbatim}\end{gsc}\end{quote}
    1097 
    1098 \begin{quote}\begin{gsc}\begin{verbatim}
    1099 <request type='page' action='q' lang='en' output='html'>
    1100   <paramList>
    1101     <param name='s' value='TextQuery'/>
    1102     <param name='c' value='demo'/>
    1103     <param name='rt' value='r'/>
    1104     <!-- the rest are the service specific params -->
    1105     <param name='ca' value='0'/> <!-- casefold -->
    1106     <param name='st' value='1'/> <!-- stem -->
    1107     <param name='m' value='10'/> <!-- maxdocs -->
    1108     <param name='q' value='snail'/> <!-- query string -->
    1109   </paramList>
    1110 </request>
    1111 \end{verbatim}\end{gsc}\end{quote}
    1112 
    1113 The Receptionist routes the message to the appropriate Action (determined by looking up its shortname$->$Action object map). The actions determine what information is needed from the server and retrieves it, making one or more internal requests to the MessageRouter. This information is gathered together into a single response, and returned to the Receptionist. The Receptionist may process the result further, depending on what type of Receptionist is it.
    1114 
    1115 
    1116 \begin{table}
    1117 {\footnotesize
    1118 \begin{tabular}{lll}
    1119 \hline
    1120 \bf Argument & \bf Meaning &\bf Typical values \\
    1121 \hline
    1122 a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
    1123 & & s (system)\\
    1124 sa & subaction & home, about (page action)\\
    1125 c & collection or  & demo, build \\
    1126 & service cluster \\
    1127 s & service name & TextQuery, ImportCollection \\
    1128 rt & request type & d (display), r (request), s (status) \\
    1129 ro & response only & 0 or 1 - if set to one, the request is carried out \\
    1130 & & but no processing of the results is done \\
    1131 & & currently only used in process actions \\
    1132 o & output type & XML, html, WML \\
    1133 l & language & en, fr, zh ...\\
    1134 d & document id & HASHxxx \\
    1135 r & resource id & ???\\
    1136 pid & process handle & an integer identifying a particular process request \\
    1137 \hline
    1138 \end{tabular}}
    1139 \caption{Generic arguments that can appear in a \gs\  URL}
    1140 \label{tab:args}
    1141 \end{table}
     1080There are different internal request types: describe, process, system, format, status. Process requests do the actual work of the system, while the other types get auxiliary information. The formats of the requests and responses for each internal request type are described in the following sections. External-style requests and their page responses are described in the section on page generation (Section~\ref{sec:pagegen}).
    11421081
    11431082\subsection{'describe'-type messages}\label{sec:describe}
     
    11761115
    11771116It is possible to ask just for a specific part of the information provided by a
    1178 describe request, rather than the whole thing.  For example, these two
    1179 messages get the \gst{collectionList} and the \gst{siteList} respectively:
     1117describe request, rather than the whole thing. For example, these two
     1118messages get the \gst{collectionList} and the \gst{siteList} respectively: 
    11801119\begin{quote}\begin{gsc}\begin{verbatim}
    11811120<request lang='en' type='describe' to=''>
     
    11921131\end{verbatim}\end{gsc}\end{quote}
    11931132
    1194 When a collection or service cluster is asked to describe itself, what is returned is a list of metadata, some display elements, and  a list of services.  For example, here is such
    1195 a message, along with a sample response.
     1133Subset options for the MessageRouter include \gst{collectionList}, \gst{serviceClusterList}, \gst{serviceList}, \gst{siteList}.
     1134
     1135When a collection or service cluster is asked to describe itself, what is returned is a list of metadata, some display elements, and  a list of services.  For example, here is such a message, along with a sample response.
    11961136
    11971137\begin{quote}\begin{gsc}\begin{verbatim}
     
    12311171\end{verbatim}\end{gsc}\end{quote}
    12321172
    1233 The subset parameter can also be used in a describe request to a collection, to retrieve just the \gst{metadataList} or \gst{serviceList}.
     1173Subset options for a collection or serviceCluster include \gst{metadataList}, \gst{serviceList}, and \gst{displayItemList}.
    12341174
    12351175This collection provides many typical services. Notice how this response lists the services available, while the collection configuration file for this collection (Figure~\ref{fig:collconfig}) described serviceRacks. Once the service racks have been configured, they become transparent in the system, and only services are referred to.
     
    12371177
    12381178A \gst{describe} request sent to a service returns a list of parameters that
    1239 the service accepts, some display information, (and in future may describe the content type for the request and response).
    1240 
    1241 Parameters can by in the following formats:
     1179the service accepts and some display information (and in future may describe the content type for the request and response). Subset options for the request include \gst{paramList} and \gst{displayItemList}.
     1180
     1181Parameters can be in the following formats:
    12421182\begin{quote}\begin{gsc}\begin{verbatim}
    12431183<param name='xxx' type='integer|boolean|string|invisible' default='yyy'/>
     
    12801220A service description also contains some display information---this includes the name of the service, and the text  for the submit button.
    12811221
    1282 Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} gives an example html search form that may be generated from this describe response.
     1222Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} shows an example html search form that may be generated from this describe response.
    12831223
    12841224\begin{quote}\begin{gsc}\begin{verbatim}
     
    13941334Note that the library parameter has been left blank. This is because library refers to the current servlet that is running and the name is not necessarily known in advance. So either the applet action or the Receptionist must fill in this parameter before displaying the html.
    13951335
    1396 \subsubsection{'system'-type messages}\label{sec:system}
    1397 
    1398 ``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently they are initiated by particular CGI parameters (see Section~\ref{sec:runtime-config}).
     1336\subsection{'system'-type messages}\label{sec:system}
     1337
     1338``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently these requests are initiated by particular CGI requests (see Section~\ref{sec:runtime-config}).
    13991339
    14001340The basic format of a system request is as follows:
     
    14171357The third request is to activate collection demo. This could be a new collection, or a reactivation of an old one. If a collection module already exists, it will be deleted, and a new one loaded. The final request deactivates the site site1---this removes the site from the siteList and module map, and also removes any of that site's collections/services from the static lists.
    14181358
    1419 
    1420 A response just contains a status message, for example:
    1421 \begin{quote}\begin{gsc}\begin{verbatim}
    1422 <response from="">
    1423   <status>collectionList reconfigured successfully</status>
    1424 </response>
    1425 \end{verbatim}\end{gsc}\end{quote}
    1426 
    1427 At some stage, an error or status code should be included.
     1359A response just contains a status message\footnote{TODO: add in error/status codes}, for example:
     1360\begin{quote}\begin{gsc}\begin{verbatim}
     1361<status>MessageRouter reconfigured successfully</status>
     1362<status>Error on reconfiguring collectionList</status>
     1363<status>collection:demo activated</status>
     1364<status>site:site1 deactivated</status>
     1365\end{verbatim}\end{gsc}\end{quote}
    14281366
    14291367System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests.
     
    14451383\end{verbatim}\end{gsc}\end{quote}
    14461384
    1447 The actual format statements are described in Section~\ref{sec:formatstmt}. They are templates written directly in XSLT, or in GSF, which  stands for Greenstone Format, and is a simple XML representation of the more complicated XSLT templates.
    1448 GSF style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed to XSLT using XSLT with the config\_format.xsl stylesheet.
     1385The actual format statements are described in Section~\ref{sec:formatstmt}. They are templates written directly in XSLT, or in GSF (Greenstone Format), which is a simple XML representation of the more complicated XSLT templates.
     1386GSF-style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed to XSLT using XSLT with the config\_format.xsl stylesheet.
    14491387
    14501388\subsection{'status'-type messages}\label{sec:status}
    14511389
    1452 These are only used with process-type services, which are those where a request is sent to start some type of process (see Section~\ref{sec:process}). The initial response states whether the process had successfully started, and whether its still continuing. If the process is not finished, status requests can be sent repeatedly to the service to poll the status, using the pid to identify the  process.  Status codes are used to identify the state of a process. The values used at the moment are listed in Table~\ref{tab:status codes}\footnote{A more standard set of codes should probably be used, for example, the HTTP codes}.
     1390These are only used with process-type services, which are those where a request is sent to start some type of process (see Section~\ref{sec:process}). An initial 'process' request to a 'process' service generates a response which states whether the process has successfully started, and whether it's still continuing. If the process is not finished, status requests can be sent repeatedly to the service to poll the status, using the pid to identify the  process.  Status codes are used to identify the state of a process. The values used at the moment are listed in Table~\ref{tab:status codes}\footnote{A more standard set of codes should probably be used, for example, the HTTP codes}.
    14531391
    14541392\begin{table}
     
    15061444\end{verbatim}\end{gsc}\end{quote}
    15071445
    1508 \subsubsection{process messages}
     1446\subsection{'process'-type messages}
    15091447
    15101448Process requests and responses  provide  the major functionality of the system---these are the ones that do the actual work. The format depends on the service they are for, so I'll describe these by service.
     
    17601698\end{verbatim}\end{gsc}\end{quote}
    17611699
    1762 The \gst{code} attribute in the response specifies whether the command has been successfully stated, whether its still going, etc (see Table~\ref{tab:status codes} for a list of currently used codes). The pid attribute specifies a process id number that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, which are described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not.
     1700The \gst{code} attribute in the response specifies whether the command has been successfully started, whether it's still going, etc. (see Table~\ref{tab:status codes} for a list of currently used codes). The pid attribute specifies a process id number that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, which were described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not.
    17631701
    17641702\subsubsection{'applet'-type services}
     
    18121750
    18131751Enrich services typically take some text of documents (inside \gst{<nodeContent>} tags) and return the text marked up in some way. One example of this is the GatePOSTag service: this identifies Dates, Locations, People and Organizations in the text, and annotates the text with the labels. In the following example, the request is for Location and Dates to be identified.
    1814 *** TODO ****
     1752
    18151753\begin{quote}\begin{gsc}\begin{verbatim}
    18161754<request lang="en" to="GatePOSTag" type="process">
     
    18371775    FOOD AND AGRICULTURE ORGANIZATION OF THE UNITED NATIONS
    18381776    <annotation type="Location">Rome</annotation>
    1839           <annotation type="Date">1986</annotation>
     1777        <annotation type="Date">1986</annotation>
    18401778        P-69
    18411779        ISBN 92-5-102397-2
     
    18471785\end{verbatim}\end{gsc}\end{quote}
    18481786
    1849 \subsection{Page generation}\label{sec:pagegen} **** REDO ********
    1850 
    1851 * talk general first: get data, get format info, transform gsf->xsl. transfrom xml->html
    1852 
    1853 * state saving. the XSLT files assume that arguments are saved somehow. This needs to be implemented outside \gs\  proper - we do this in the servlet, using something or other.
    1854 
    1855 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\  URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the CGI-arguments to determine what requests need to be made to the system.
    1856 System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
    1857 
    1858 Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
     1787\subsection{Page generation}\label{sec:pagegen}
     1788
     1789A 'page' is some XML or HTML (or other?) data returned in response to an
     1790external 'page'-type request. These requests originate from outside \gs, for example from a servlet or Java application, and are received by the Receptionist. As described below in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\  URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to.
     1791
     1792Action modules decode the rest of the arguments to determine what requests need to be made to the system. One or more internal requests may be made to the MessageRouter. A request for format information from the Collection/Service may also be made. The resulting data is gathered together into a single XML response, \gst{<page>}, and returned to the Receptionist.
     1793
     1794The page format is described in Section~\ref{sec:page-format}. The XML may be returned as is, or may be modified by the Receptionist. The various Receptionists are described in Section~\ref{sec:recepts}. The default receptionist used by a servlet transforms the XML into HTML using XSL stylesheets. Section~\ref{sec:collformat} looks at collection specific formatting, in particular for HTML output.
     1795Sections~\ref{sec:pageaction} to \ref{sec:systemaction} look at the various actions and what kind of data they gather.
     1796
     1797\subsubsection{'page'-type requests and their arguments}\label{sec:page-requests}
     1798
     1799These are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a \gs\  URL.  The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
     1800
     1801Here are some examples of  requests\footnote{In a servlet context, these correspond to the arguments \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:
     1802
     1803\begin{quote}\begin{gsc}\begin{verbatim}
     1804<request type='page' action='p' subaction='about'
     1805         lang='fr' output='html'>
     1806  <paramList>
     1807    <param name='c' value='demo'/>
     1808  </paramList>
     1809</request>
     1810\end{verbatim}\end{gsc}\end{quote}
     1811
     1812\begin{quote}\begin{gsc}\begin{verbatim}
     1813<request type='page' action='q' lang='en' output='html'>
     1814  <paramList>
     1815    <param name='s' value='TextQuery'/>
     1816    <param name='c' value='demo'/>
     1817    <param name='rt' value='r'/>
     1818    <!-- the rest are the service specific params -->
     1819    <param name='ca' value='0'/> <!-- casefold -->
     1820    <param name='st' value='1'/> <!-- stem -->
     1821    <param name='m' value='10'/> <!-- maxdocs -->
     1822    <param name='q' value='snail'/> <!-- query string -->
     1823  </paramList>
     1824</request>
     1825\end{verbatim}\end{gsc}\end{quote}
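
The mapping from URL-style arguments to a 'page' request can be sketched roughly as follows. This is only an illustration, not the servlet's actual code: the class and method names are hypothetical, and the fallback values (en for the language, html for the output type) are assumptions. It treats a, sa, l and o as request attributes and copies every other argument into the paramList, as in the two examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Greenstone code base.
class PageRequestSketch {

    /** Escapes characters that are unsafe inside a single-quoted XML attribute. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;");
    }

    /** Turns a map of URL arguments into the XML text of a 'page' request. */
    static String toPageRequest(Map<String, String> args) {
        // Work on a copy so the caller's map is left untouched.
        Map<String, String> rest = new LinkedHashMap<>(args);
        String action = rest.remove("a");
        String subaction = rest.remove("sa");
        String lang = rest.remove("l");
        String output = rest.remove("o");

        StringBuilder xml = new StringBuilder("<request type='page'");
        if (action != null) {
            xml.append(" action='").append(escape(action)).append("'");
        }
        if (subaction != null) {
            xml.append(" subaction='").append(escape(subaction)).append("'");
        }
        // Assumed defaults: 'en' and 'html' when l or o is missing.
        xml.append(" lang='").append(escape(lang == null ? "en" : lang)).append("'");
        xml.append(" output='").append(escape(output == null ? "html" : output)).append("'>\n");

        // Every remaining argument becomes a <param> element.
        xml.append("  <paramList>\n");
        for (Map.Entry<String, String> e : rest.entrySet()) {
            xml.append("    <param name='").append(escape(e.getKey()))
               .append("' value='").append(escape(e.getValue())).append("'/>\n");
        }
        xml.append("  </paramList>\n</request>");
        return xml.toString();
    }

    public static void main(String[] argv) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("a", "p");
        args.put("sa", "about");
        args.put("c", "demo");
        args.put("l", "fr");
        System.out.println(toPageRequest(args));
    }
}
```

Run with the arguments a=p, sa=about, c=demo, l=fr, the sketch produces a request equivalent to the first example above.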
     1826
     1827There are some standard arguments used in Greenstone, and they are described in Table~\ref{tab:args}. These are used by Receptionists and Actions. The GSParams class specifies all the general basic arguments, and whether they should be saved or not. Some arguments need to be saved during a session, and this needs to be implemented outside \gs\  proper; currently we do this in the servlet, using servlet session handling. The servlet has an init parameter \gst{params\_class} which specifies which params class to use: GSParams can be subclassed if necessary. The Receptionist and Actions must not have conflicting argument names.
     1828
     1829Other arguments are used dynamically and come from the Services. Service arguments must always be saved during a session. Services may be created by different people, and may reside on a different site. There is no guarantee that there is no conflict with argument names between services and actions. Therefore service parameters are namespaced when they are put on the page, whereas interface (receptionist and action) parameters have no namespace. The default namespace is s1 (service1) --- any parameters that are for the service will be prefixed by this. For example, the case parameter for a search will be put in the page as s1.case, and the resulting argument in a search URL will be s1.case. When actions are deciding which parameters need to be sent in a request to a service, they can use the namespace information.
     1830
     1831If there are  two or more services combined on a page with a single submit button, they will use namespaces s1, s2, s3 etc as needed. The s  (service) parameter will end up with a list of services. For example, \gst{s=TextQuery,MusicQuery,} and the order of these determines the mapping order of the namespaces, i.e. s1 will map to TextQuery, s2 to MusicQuery.
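
As an illustration of the namespace mapping described above, the following sketch groups namespaced arguments by the service they belong to. It is a minimal example under stated assumptions, not the code the Actions actually use: the class and method names are hypothetical, and it assumes that a namespaced argument always has the form sN.name, where N is the 1-based position of the service in the s list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Greenstone code base.
class ServiceParamSketch {

    /**
     * Groups namespaced arguments (s1.case, s2.q, ...) by service name,
     * using the order of the comma-separated s argument for the mapping.
     * Arguments without a namespace (interface arguments) are ignored.
     */
    static Map<String, Map<String, String>> groupByService(Map<String, String> args) {
        // s may carry a trailing comma, e.g. "TextQuery,MusicQuery,".
        List<String> services = new ArrayList<>();
        String s = args.get("s");
        if (s != null) {
            for (String name : s.split(",")) {
                if (!name.isEmpty()) {
                    services.add(name);
                }
            }
        }

        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String name : services) {
            result.put(name, new LinkedHashMap<>());
        }

        for (Map.Entry<String, String> e : args.entrySet()) {
            String key = e.getKey();
            int dot = key.indexOf('.');
            if (dot < 2 || key.charAt(0) != 's') {
                continue; // not of the form sN.name
            }
            int index;
            try {
                index = Integer.parseInt(key.substring(1, dot));
            } catch (NumberFormatException ex) {
                continue; // prefix is not a namespace number
            }
            if (index < 1 || index > services.size()) {
                continue; // namespace with no matching service
            }
            // s1 maps to the first service in the list, s2 to the second, etc.
            result.get(services.get(index - 1)).put(key.substring(dot + 1), e.getValue());
        }
        return result;
    }

    public static void main(String[] argv) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("a", "q");
        args.put("c", "demo");
        args.put("s", "TextQuery,MusicQuery,");
        args.put("s1.case", "1");
        args.put("s2.q", "snail");
        System.out.println(groupByService(args));
    }
}
```

With \gst{s=TextQuery,MusicQuery,} the argument s1.case ends up in the TextQuery group and s2.q in the MusicQuery group, while a and c are left for the interface.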
     1832
     1833\begin{table}
     1834{\footnotesize
     1835\begin{tabular}{lll}
     1836\hline
     1837\bf Argument & \bf Meaning &\bf Typical values \\
     1838\hline
     1839a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
     1840& & s (system)\\
     1841sa & subaction & home, about (page action)\\
     1842c & collection or  & demo, build \\
     1843& service cluster \\
     1844s & service name & TextQuery, ImportCollection \\
     1845rt & request type & d (display), r (request), s (status) \\
     1846ro & response only & 0 or 1 - if set to one, the request is carried out \\
     1847& & but no processing of the results is done \\
     1848& & currently only used in process actions \\
     1849o & output type & XML, html, WML \\
     1850l & language & en, fr, zh ...\\
     1851d & document id & HASHxxx \\
     1852r & resource id & ???\\
     1853pid & process handle & an integer identifying a particular process request \\
     1854\hline
     1855\end{tabular}}
     1856\caption{Generic arguments that can appear in a \gs\  URL}
     1857\label{tab:args}
     1858\end{table}
     1859
     1860\subsubsection{page format}\label{sec:page-format}
    18591861
    18601862The basic  page format  is:
    18611863\begin{quote}\begin{gsc}\begin{verbatim}
    1862 <page>
     1864<page lang='en'>
    18631865  <pageRequest/>
    18641866  <pageResponse/>
     
    18961898NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. For a look-alike nzdl.org system, even more information is needed for each page, namely the list of classifiers available from the ClassifierBrowse service.
    18971899
    1898 By default, the LibraryServlet uses DefaultReceptionist. However, there is an init-param called receptionist which can be set to make the servlet use a different one.
    1899 
     1900By default, the LibraryServlet uses DefaultReceptionist. However, there is a servlet init-param called \gst{receptionist} which can be set to make the servlet use a different one.
     1901
     1902\subsubsection{Collection specific formatting}\label{sec:collformat}
     1903get format info, transform gsf->xsl. transform xml->html
     1904
     1905config params are passed in to the transformation
    19001906\subsubsection{CGI arguments}
    19011907
    1902 The arguments used by the page come from several sources. Receptionist uses a couple, actions use some and services. the receptionist and actions are treated as a whole so must not have conflicting arguments. GSParams class specifies all the general basic arguments, and whether they should be saved or not. servlet has an init parameter params\_class, that specifies which params class to use - if subclass it. actions or receptionist  may specify some new ones
    1903 
    1904 services may be created by different people, may be on a different site. cant guarantee no conflict with action params, or even with other services.
    1905 so service params are namespaced when they are put on the page. interface (recept and action) params will have no namespace) the default namespace is s1 (service1) - any parameters that are for the service will be prefixed by this. e.g. the case parameter for a search will be put in the page as s1.case.
    1906 The actions must now look for all the s1 parameters to send to the service.
    1907 
    1908 if there are  two or more services combined on a page with a single submit button, they will use s1, s2, s3 etc as needed. the s parameter (service) will end up with a list e.g. s=TextQuery,MusicQuery, and the order of these determines the mapping order of the namespaces, ie s1 will be TextQuery, s2 MusicQuery.
    1909 
    1910 also talk about saving arguments - save ones that GSParams says to save, and any service ones should always save.
    1911 
    1912 \subsubsection{Page action}
    1913 * kind of info pages. other actions are associated with specific services.
    1914 * uses describe requests to modules
    1915 Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.  The page is
    1916 transformed using \gst{home.xsl}.  For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
    1917 and a list of services, and the result is transformed using \gst{about.xsl}.
    1918 
    1919 
    1920 \subsubsection{Query action}
     1908
     1909\subsubsection{Page action}\label{sec:pageaction}
     1910
     1911PageAction is responsible for displaying information pages, such as the home page of the library, the home page of a collection, or the help and preferences pages. Unlike the other page types, these pages are not associated with specific services. In general, the data comes from describe requests to various modules.
     1912The different pages are requested using the subaction argument. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.    For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
     1913and a list of services.
     1914
     1915
     1916\subsubsection{Query action}\label{sec:queryaction}
    19211917
    19221918The basic URL is \gst{a=q\&s=TextQuery\&c=demo\&rt=d/r}.
     
    19251921displayed, but should be cached. The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has  all the parameters from the URL put into the parameter list. A list of document identifiers
    19261922is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of
    1927 documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the XSLT that will be used to transform the page (Formatter object??). The service description and query result are combined into a page of XML, which is
    1928 transformed using \gst{basicquery.xsl} to produce the html page.
    1929 
    1930 \subsubsection{Applet action}
     1923documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the XSLT that will be used to transform the page. The service description and query result are combined into a page of XML, which is returned to the Receptionist.
     1924
     1925\subsubsection{Applet action}\label{sec:appletaction}
    19311926
    19321927There are two types of request to the applet action: \gst{a=a \& rt=d\/} and
     
    19351930into the page, and the servlet returns the HTML.
    19361931
    1937 The value \gst{rt=r} signals a request from the applet.  The result is returned
    1938 directly to the applet code, in XML.  The other parameters are sent to the
    1939 service untransformed, and the result is passed directly back to the applet.
    1940 Applet action can therefore work with any applet whose service understands the
    1941 messages.
    1942 
    1943 Here are two examples of requests generated by the Applet action, along with their corresponding responses.
    1944 
    1945 The first request corresponds to the URL arguments \gst{a=a \&
    1946 rt=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
    1947 applet for the mgppdemo collection''.
    1948 
    1949 
    1950 The second request corresponds to the  arguments \gst{a=a \& rt=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
    1951 indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
    1952 request as parameters. The response is in a form suitable for the applet, placed inside
    1953 \gst{<appletData>} in a standard \gs\  message.  AppletAction returns the
    1954 contents of appletData to the browser, i.e. to the applet itself.
    1955 
     1932The value \gst{rt=r} signals a request from the applet. A process request containing all the parameters is sent to the applet service. The result contains an appletData element, which contains a single  element - this element is returned
     1933directly to the applet, in XML. No transformation is done.
     1934Because the AppletAction doesn't know or care anything about the applet data, it can work with any applet-service pair.
    19561935
    19571936Note that the applet HTML may need to know the name of the \gst{library}
     
    19621941<PARAM NAME='library' VALUE=''/>
    19631942\end{verbatim}\end{gsc}\end{quote}
    1964 When the Applet action encounters this parameter it inserts the name of the
     1943When the AppletAction encounters this parameter it inserts the name of the
    19651944current library servlet as its value.
    19661945
    1967 \subsubsection{Document action}
    1968 
    1969 DocumentAction sends a query to the DocumentRetrieve service of the collection requesting the text of the specified document.  At this stage no additional information is obtained, but in future stuff like Title and
    1970 table of contents would be needed to make the display nicer.
    1971 
    1972 
    1973 \subsubsection{System action}\label{sec:system-action}
     1946\subsubsection{Document action}\label{sec:documentaction}
     1947
     1948DocumentAction is responsible for displaying a document to the user. The display might involve some metadata and/or text for a document or part of a document. For hierarchical documents, a table of contents may be shown, while for paged documents (those with a single linear list of sections), next and previous page buttons may be shown. These different display types require different information about the document. Depending on the arguments, DocumentAction will send requests to several services: DocumentMetadataRetrieve, DocumentStructureRetrieve and DocumentContentRetrieve.
     1949
     1950A basic display, for example, Title and text, involves a metadata request to get the Title, and a content request to get the text. Hierarchical table of contents display requires a structure request. If the entire contents is to be displayed, the parameter \gst{structure=entire} would be sent in the request. Otherwise, parameters \gst{structure=ancestors}, \gst{structure=children} and possibly \gst{structure=siblings} may be used, depending on the position of the current node in the document. These return a hierarchical structure of nodes, containing ancestor nodes, child nodes and sibling nodes, respectively.
     1951For paged display, the structure is not actually needed. A structure request is still sent, but this time it requests some information, rather than the structure itself. The information requested includes the number of siblings and the current position of the current node, or the number of children (if the current node is the root of the document).
     1952
     1953Metadata may be requested for the current node, or for any nodes in the structure, and content also. The metadata and content are added into the appropriate nodes in the structure hierarchy, and this is returned as the page data.
     1954
     1955\subsubsection{XML Document action}\label{sec:xmldocumentaction}
     1956
     1957XMLDocumentAction is a little different to the standard DocumentAction. It operates in two modes, \gst{text} and \gst{toc}. In \gst{text} mode, it will retrieve the content of the current document node using a DocumentContentRetrieve request. In \gst{toc} mode, it retrieves the entire table of contents for the document using a DocumentStructureRetrieve request. Either mode may also retrieve metadata for the current section or each section in the table of contents.
     1958
     1959\subsubsection{GS2Browse action}\label{sec:browseaction}
     1960
     1961GS2BrowseAction is for displaying Greenstone 2 style classifiers.
     1962\subsubsection{System action}\label{sec:systemaction}
    19741963
    19751964SystemAction allows for manual reconfiguration of various components at run-time. There is no interactive web-page displaying the options, it merely turns a set of CGI arguments into an XML system request. The response from a system request is a message which is displayed to the user.
     
    19991988
    20001989
    2001 \subsubsection{Some class info - where should this go??}
     1990\subsection{Other code information}
     1991
     1992Greenstone has a set of Utility classes, which are briefly described in Table~\ref{tab:utils}.
     1993
    20021994\begin{table}[h]
    20031995\caption{The utility classes in org.greenstone.gsdl3.util}
     
    20082000\bf Utility class & \bf Description\\
    20092001\hline
    2010 ConfigVars & holds the servlet startup variables, including library name, site name, interface name, default language\\
    2011 Dictionary & wrapper around a Resource Bundle, providing strings with parameter\\
    2012 GSCGI & class to map between short name CGI arguments and long name request parameters \\
     2002Dictionary & wrapper around a Resource Bundle, providing strings with parameters\\
     2003GSConstants & holds some constants used for servlet arguments and configuration variables\\
     2004GSEntityResolver & an EntityResolver which can be used to find resources such as DTDs\\
    20132005GSFile & class to create all \gs\  file paths e.g. used to locate configuration files, XSLT files and collection data. \\
    20142006GSHTML & provides convenience methods for dealing with HTML, e.g. making strings HTML safe\\
     2007GSParams & contains names and default values for interface parameters\\
     2008NZDLParams & a subclass of GSParams which holds default service parameters too, necessary for the classic style interface.\\
    20152009GSPath & used to create, examine and modify message address paths\\
     2010GSSQL & contains static strings for all the SQL table/field names\\
    20162011GSStatus & some static codes for status messages\\
    20172012GSXML & lots of methods for extracting information out of \gs\  XML, and creating some common types of elements. Also has static Strings for element and attribute names used by \gs\ .\\
     
    20192014Misc & miscellaneous functions\\
    20202015OID & class to handle \gs\  (2) OIDs\\
     2016GS3OID & subclass of OID to handle \gsiii\ OIDs\\
     2017SQLQuery & contains a connection to a SQL database, along with some methods for accessing the data, such as converting MG numbers to and from Greenstone OIDs.\\
    20212018XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
    20222019XMLTransformer & methods to transform XML using XSLT \\
     
    20542051
    20552052\subsection{new interfaces}\label{sec:new-interfaces}
     2053
     2054It is easy to create new interfaces to \gsiii. Here we are talking about interfaces other than those for display in a typical browser.
     2055
     2056Handheld devices: Use the standard servlet setup, but with a different set of XSLT files to format the pages for small screens, or use WML.
     2057
     2058Java GUI Interface: There are a couple of alternatives. Depending on what you want to display in the GUI, you could talk to either a Receptionist or a MessageRouter. The library classes can be set up and compiled into the GUI program.
     2059Talking to a Receptionist will give you access to pages of XML. It is likely that the standard Receptionist class would be used - this doesn't transform the data to HTML. Queries such as ``give me the home page of a collection'' and ``do the following search'' can be issued. All the data needed for the result view is returned. Queries are quite simple, but are limited to the kinds of Actions that are available in the library.
     2060Talking to a MessageRouter requires a bit more effort on the part of the GUI program, but results in greater flexibility. The kinds of queries that can be issued are individual units of action, such as ``describe yourself'', ``search'', ``retrieve the content for this document''. More than one request may need to be made for a particular feature of the GUI. However, you can ask for any combination of data available in the system; you are not relying on Actions. What you implement, though, may be a lot like the Action code in terms of request sequences.
     2061
     2062Interfaces in other programming languages: Because the communication is all XML based, other interfaces can talk to the Java library if a communication protocol is set up. This could be done using SOAP for example. As with Java GUI interfaces, the program could talk to a Receptionist or to a MessageRouter.
    20562063e.g. java interface. where you can interface to. MR vs Receptionist. diff receptionists. egs, handheld - using servlet, transforming recpt, but new set of XSLT java program other program - talk to recpt but just get back XML data for pages. java gui - just talk to MR, do all processing itself.
     2064
     2065Remote interfaces: Remote interfaces can be set up in the same way as above, using a communication protocol between the interface and the library program.
    20572066
    20582067\subsection{Adding new classifiers}\label{sec:new-classifiers}
     
    20652074There are two types of standard \gs\  collections: collections built with the \gsiii\  building system, and collections that are imported from \gsii\ . There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\  has an ability to use any type of collection you can come up with, assuming  some java code is provided.
    20662075
    2067 
    20682076There are four levels of customisation that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\  to describe these different levels.
    20692077
    20702078Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the \gsii\  MGPP collections were the first to be served in \gsiii\ . When we came to do \gsii\  MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
    20712079
    2072 The nzmaps collection used the same level of customisation, just implementing new services and fitting all the extra display elements into the standard query/display framework using javascript.
    2073 
    2074 The gberg collection, however, was done quite differently to the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (org.greenstone.gsdl3.collection.XMLCollection). This class is basically the same as a standard collection class except that it looks for and stores in memory the documentList from the collectionConfig file.
     2080The XML Sample Texts (gberg) collection, however, was done quite differently to the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (org.greenstone.gsdl3.collection.XMLCollection). This class is basically the same as a standard collection class except that it looks for and stores in memory the documentList from the collectionConfig file.
    20752081
    20762082To tell \gs\  to load up a different type of collection class, we use another configuration file: etc/collectionInit.xml. This  specifies the name of the collection class to use.
     
    20832089Document display is  significantly different to standard \gs\ . There are two modes of display: table of contents mode, and content mode. Clicking on a document link from the collection home page takes the user to the table of contents for the collection. Clicking on one of the sections in the table of contents takes them to a display of that section. To facilitate this, not only do we need new XSLT files , we also needed a new action. XMLDocumentAction was created, that used two subactions, toc and text, for the different modes of display.
    20842090
    2085 The Receptionist was told about this new action by the addition of the following to the interfaceConfig.xml file:
     2091The Receptionist was told about this new action by the addition of the following element to the interfaceConfig.xml file:
    20862092
    20872093\begin{gsc}\begin{verbatim}
     
    21252131Instead of displaying an icon and the Title, it displays the Title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these require non-standard arguments to the library, these parts of the template are written in XSLT not \gs\  format language. As is shown here it is perfectly feasible to write a format statement that includes XSLT mixed in with \gs\  format elements.
    21262132
    2127 The document display uses CSS to format the output---these are kept in the collection and specified in the collections XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to may all the links absolute links to files in the collection folder, the other option is to put them in \gs\ 's DTD folder gsdl3/resources/dtd.
    2128 
    2129 \subsection{The NZDL mirror site}
    2130 
    2131 The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using \gsiii\  instead of \gsii\ . It uses a new site and a new interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and interface.
    2132 
    2133 The site was created by making a directory called nzdl in the sites folder. A siteConfig file was created. Because its running on Linux, we were able to link to all the collections in the old \gs\  installation. The convert\_coll\_from\_gs2.pl script was run over all the collections to produce the new XML configuration files.
    2134 
    2135 A new interface, also called nzdl, was created in the interfaces directory.
    2136 In many cases, creating a new interface just requires the new images and XSLT  to be added to the new directory(see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This setup also required a bit more customisation.
    2137 
    2138 The standard \gs\  navigation bar lists all the services available for the collection. In \gsii\ , the navigation bar provided the search option, and the different classifiers. This is not service specific, but hard coded to the search and classifiers. The XSLT that produced the navigation bar needed to be altered to produce this. But also, a new Receptionist was needed.
    2139 The standard receptionist (DefaultReceptionist) gathers a little bit of extra info for each page of XML before transforming it: this is the list of services for the collection and their display information, allowing the services to be listed along the navigation bar. This is information that is needed by every page (except for the library home page) and therefore is obtained by the receptionist instead of by each action. The nzdl interface needed a bit more information than this: for the ClassifierBrowse service, if there was one, the list of classifiers and their display elements must be obtained. So a new Receptionist was written that inherited from DefaultReceptionist, and added this new info into the page.
     2133The document display uses CSS to format the output; the stylesheets are kept in the collection and specified in the collection's XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder; the other option is to put them in \gs\ 's DTD folder gsdl3/resources/dtd.
     2134
     2135\subsection{The Classic Interface}
     2136
     2137The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using \gsiii\  instead of \gsii\ . It uses a new site (nzdl) with the classic interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and classic interface.
     2138
     2139The site was created by making a directory called nzdl in the sites folder. A siteConfig file was created. Because it is running on Linux, we were able to link to all the collections in the old \gs\  installation. The convert\_coll\_from\_gs2.pl script was run over all the collections to produce the new XML configuration files.
     2140
     2141The classic interface was created to be used by this site (and is now a standard part of Greenstone).
     2142In many cases, creating a new interface just requires the new images and XSLT  to be added to the new directory (see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This classic interface required a bit more customisation.
     2143
     2144The standard \gsiii\  navigation bar lists all the services available for the collection. In \gsii\ , the navigation bar provides the search option, and the different classifiers. This is not service specific, but hard coded to the search and classifiers. The XSLT that produces the navigation bar needed to be altered to produce this. But also, a new Receptionist was needed.
     2145The standard receptionist (DefaultReceptionist) gathers a little bit of extra information for each page of XML before transforming it: this is the list of services for the collection and their display information, allowing the services to be listed along the navigation bar. This is information that is needed by every page (except for the library home page) and therefore is obtained by the receptionist instead of by each action. The nzdl interface needed a bit more information than this: for the ClassifierBrowse service, if there was one, the list of classifiers and their display elements must be obtained. So a new Receptionist (NZDLReceptionist) was written that inherited from DefaultReceptionist, and added this new info into the page.
    21402146
    21412147One of the servlet initialisation parameters is the receptionist class: this was added to the servlet definition in the web.xml file so that the LibraryServlet would load up the right receptionist class.
     
    21662172Sitename is the name of the site's directory, eg localsite. The siteuri is the identifier that will be used for the SOAP resource, eg org.greenstone.localsite. It should be a unique name amongst all the SOAP services that you want to connect to.
    21672173
    2168 The script makes sure that the SOAP servlet is deployed on Tomcat, and then deploys the service for the site specified. A resource file (\gst{sitename.xml}) is created which is used to specify the service. It can be found in \gst{gsdl3/resources/soap}, and is generated from \gst{site.xml.in}.
     2174The script deploys the service for the site specified. A resource file (\gst{sitename.xml}) is created which is used to specify the service. It can be found in \gst{gsdl3/resources/soap}, and is generated from \gst{site.xml.in}.
    21692175
    21702176To get siteA to talk to siteB, you need to deploy a SOAP server on siteB, then add a \gst{<site>} element to the \gst{<siteList>} of siteA's \gst{siteConfig.xml} file (in \gst{gsdl3/web/sites/siteA/siteConfig.xml}).
     
    21852191\section{Using \gsiii\  from CVS}\label{app:cvs}
    21862192
    2187 *** need to make sure building stuff is in here ***
     2193[TODO: need to make sure building stuff is in here]
    21882194
    21892195\gsiii\  is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable; in fact, it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes.
     
    21952201\begin{quote}\begin{gsc}\begin{verbatim}
    21962202cvs -d :pserver:cvs\[email protected]:2402/usr/local/
    2197            global-cvs/gsdl-src co gsdl3
     2203           global-cvs/gsdl-src co -P gsdl3
    21982204\end{verbatim}\end{gsc}\end{quote}
    21992205
    22002206If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some older versions of CVS have trouble accessing this repository due to the port number being present. We are using version 1.11.1p1.
    22012207
    2202 The software needs to be compiled and installed. The installation procedure has been semi-automated. The following sections describe installation under Linux and Windows.
     2208The software needs to be compiled and installed. The installation procedure has been semi-automated. The following sections describe installation under Linux and Windows. The most up-to-date instructions may be found in the README.txt file in the top level gsdl3 directory.
    22032209
    22042210\subsection{Linux install}
    22052211
    2206 An install.sh script is provided to compile and install \gsiii\ . What you need to do is:
     2212What you need to do is:
    22072213
    22082214\begin{quote}\begin{gsc}
     
    22102216source gs3-setup.sh\\
    22112217./gs3-prepare.sh\\
    2212 ./gs3-configure.sh \\
    2213 ./gs3-compile.sh \\
     2218./configure \\
     2219make \\
     2220make install \\
     2221\[make docs\] \\
    22142222./gs3-finalise.sh\\
     2223source gs3-setup.sh \\
    22152224\end{gsc}\end{quote}
    22162225
     
    22252234\subsection{Windows install}
    22262235
     2236[TODO: check that these are correct]
    22272237Make sure that the following environment variables are set: JAVA\_HOME (where the Java 2 SDK is installed); PATH (should include the CVS program, and \%JAVA\_HOME\%$\backslash$bin). The following commands should be run in a DOS prompt.
    22282238
     
    22472257
    22482258To run \gs, run gs3-launch.bat. This will start the Tomcat server in a new DOS window (stop it by closing the window), and open a browser window showing the \gsiii\  homepage.
    2249 
    2250 \subsection{Creating a distribution}
    2251 
    2252 The installation scripts have been set up in such a way that it is easy to create different distribution types (for linux). To create a standard binary distribution, carry out the following steps:
    2253 
    2254 \begin{gsc}\begin{verbatim}
    2255 cvs co gsdl3
    2256 cd gsdl3
    2257 source gs3-setup.sh
    2258 ./gs3-prepare.sh
    2259 ./gs3-configure.sh
    2260 ./gs3-compile.sh
    2261 ./gs3-for-distribution.sh
    2262 
    2263 mv Header ../
    2264 cd ../
    2265 tar czvf gsdl3.tgz gsdl3/
    2266 cat Header gsdl3.tgz > gsdl3-x.xx-unix.sh
    2267 \end{verbatim}\end{gsc}
    2268 
    2269 Note that gs3-for-distribution.sh removes some files that are not needed for the distribution, including all the CVS directories. Once you have run this, you will no longer be able to update your gsdl3 code via cvs.
    2270 
    2271 To create a source distribution, you can do:
    2272 \begin{gsc}\begin{verbatim}
    2273 cvs co gsdl3
    2274 cd gsdl3
    2275 source gs3-setup.sh
    2276 ./gs3-prepare.sh
    2277 <delete unnecessary files>
    2278 cd ../
    2279 tar czvf gsdl3-x.xx-src.tgz gsdl3/
    2280 \end{verbatim}\end{gsc}
    2281 
    2282 Part of the gs3-for-distribution script will need to be run (at the <delete unnecessary files> stage), and there need to be instructions on what to do when someone downloads the source distribution.
    2283  
    2284 I think it would be:
    2285 \begin{gsc}\begin{verbatim}
    2286 tar xzvf gsdl3-x.xx-src.tgz
    2287 cd gsdl3
    2288 source gs3-setup.sh
    2289 ./gs3-configure.sh
    2290 ./gs3-compile.sh
    2291 ./gs3-finalise.sh
    2292 \end{verbatim}\end{gsc}
    2293 
    22942259
    22952260\newpage