Timestamp:
2003-05-08T09:29:29+12:00
Author:
kjdon
Message:

some more changes, but still not finished

File:
1 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r4190 r4236  
    5252Native Interface) will be used to communicate with these.
    5353
    54 A description of the general design and architecture of Greenstone3 is covered by the document ``The design of Greenstone3: An agent based dynamic digital library'' (design-2002.ps, in the gsdl3/docs/manual directory).
     54A description of the general design and architecture of Greenstone3 is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).
    5555
    5656\section{System modules}\label{sec:modules}
    5757
    58 A Greenstone3 'library' system consists of many components... Figure~\ref{fig:local} shows they fit together in a stand-alone system.
     58A Greenstone3 'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
    5959
    6060\begin{figure}[t]
     
    7171Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.
    7272
    73 {\em ServiceRack}: these provide one or more services - they are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory.
     73{\em ServiceRack}: these provide one or more services - they are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory. Services provide the core functionality for the system, e.g.\ searching, retrieving documents, building collections, etc.
    7474
    7575{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.
    7676
    77 {\em Receptionist}: this is the point of contact for the 'front end'. It is pretty much a router to actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages.
     77{\em Receptionist}: this is the point of contact for the 'front end'. It is pretty much a router to Actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages.
    7878
    7979{\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in  more detail in Section~\ref{sec:pagegen}.
     
    8989instructions on how the collection is to be built.  The second is produced by
    9090the build-time process and includes any metadata that can be determined
    91 automatically. It also includes configuration information for any serviceRacks needed by the collection.
     91automatically. It also includes configuration information for any ServiceRacks needed by the collection.
    9292
    9393The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will have no effect. There are a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shutdown and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
     
    213213
    214214The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in utf-8.
    215 The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list suold look like.
    216 The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in teh hierarchy, both classifier nodes, and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.
    217 
    218 There is also a need for a descripiton of how documents should be displayed. For example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}.
     215The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like.
     216The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes, and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.
     217
     218There is also a need for a description of how documents should be displayed. For example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}.
    219219
    220220\subsection{Building configuration file}\label{sec:buildconfig}
    221221
    222 The file \gst{buildConfig.xml} contains the metadata and other information about the collection that can
    223 be determined automatically when building the collection, such as the number of
    224 documents it contains.  It also includes a list of serviceRack classes that are
     222The file \gst{buildConfig.xml} is produced by the collection building process, and contains metadata and other information about the collection that can
     223be determined automatically, such as the number of
     224documents it contains.  It also includes a list of ServiceRack classes that are
    225225required at runtime to provide the services that have been built into the
    226226collection.  The serviceRack names are Java classes that are loaded
     
    291291
    292292The \gst{init()} method creates a new Receptionist and a new
    293 MessageRouter. The appropriate system variables are set in each (interface
    294 name, site name, etc.) and then \gst{configure()} is called. A MessageRouter
    295 reference is given to the Receptionist. The servlet then communicates only with
     293MessageRouter. By default, the base Receptionist and MessageRouter classes are used, but subclasses can be used if they are specified in the servlet init params (see Section~\ref{sec:tomcat}). The appropriate system variables are set in each (interface
     294name, site name, etc.) and then \gst{configure()} is called. The MessageRouter
     295is passed to the Receptionist. The servlet then communicates only with
    296296the Receptionist, not with the MessageRouter.
    297297
     
    303303to be connected to. 
    304304It has a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}).
    305 Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
     305Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
    306306ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList.
    307 For each site specified, the MessageRouter creates an appropriate type Communicator object. Then is tries to get the site description. If teh server for teh remote site is up and running, this should  be successful. The site will be added to the map with its site name as a key. The sites collections, services and clusters will also be added into the static lists.
    308 
    309 The MessageRouter also looks inside the site's \gst{collect} directory loads up a Collection object for each valid collection found.
     307For each site specified, the MessageRouter creates a Communicator object of the appropriate type. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the map with its site name as a key. The site's collections, services and clusters will also be added into the static xml lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-sites command must be sent (see next section).
     308
     309The MessageRouter also looks inside the site's \gst{collect} directory, and loads up a Collection object for each valid collection found.
    310310
    311311The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
    312312files, determines the metadata, and loads ServiceRack classes based on the
    313313names specified in \gst{buildConfig.xml\/}. The \gst{<ServiceRack>} XML element is passed to the object to be used in configuration. The collectionConfig.xml contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file.
    314 Collection objects are added to teh module map with their name as a key, and also a collection element is added into teh collectionList xml.
     314Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList xml.
    315315
    316316\subsection{Run-time (re)configuration}\label{sec:runtime-config}
    317317
    318 The startup configuration reads in teh various config files and loads up quite a lot of XML into memory. This avoids having to read in files all the time. However, this means that any changes to these files will have no effect in the system. So some run-time reconfiguration options are provided.
    319 
    320 Currently there are commands to reconfigure the entire site, part of the site, single collections etc.
    321 The configure request messages are described in Section~\ref{sec:system}. A new action, SystemAction, is used to convert 'cgi'-arguments into system requests. Currently there is no configure web pages, but the arguments can be entered in the URL. The arguments and urls are described in Section~\ref{sec:system-action}.
    322 
    323 
    324 ***TODO***
    325 whats available, whats not. show URLS, refer to system messages in next section
     318The startup configuration reads in the various config files and loads up quite a lot of XML into memory. This avoids having to read in files all the time. However, this means that any changes to these files will have no effect in the system. So some run-time reconfiguration options are provided. Currently, these can only be accessed by typing cgi-arguments into the URL; there is no web form for this yet. SystemAction converts these arguments into system requests, which are described in Section~\ref{sec:system}.
     319
     320The cgi arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (a is action, sa is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{c=xxx}, where \gst{xxx} is the name of the collection or cluster.
     321
     322\begin{tabular}{lp{8cm}}
     323a=s\&sa=c & reconfigures the whole site, reads in siteConfig.xml, reloads all the collections. Just part of this can be specified with another argument ss (system subset). The valid values are collectionList, siteList, serviceList, clusterList. \\
     324a=s\&sa=c\&c=demo & reconfigures a collection or cluster. ss can also be used here, valid values are metadataList and serviceList. \\
     325a=s\&sa=a & activate a specific module. Modules are specified using two arguments, st (system module type) and sn (system module name). Valid types are collection, cluster, site.\\
     326a=s\&sa=d & deactivate a module. st and sn can be used here too. Valid types are collection, cluster, site, service. \\
     327a=s\&sa=d\&c=demo & deactivate a module belonging to a collection or cluster. Valid types are service. \\
     328\end{tabular}
     329
    326330
    327331\section{System messages}\label{sec:messages}
    328332
    329 for each type of message, show the basic elements, then some example messages.
    330 Lists must only have the same elements in them.
    331333
    332334Once the system is up and running (the configuration
    333335process described in Section~\ref{sec:startup-config} has been carried out), it is passing messages back and forth. All modules communicate via message passing.
    334336
    335 First, we look at how messages originate, and how they flow in the system. Then, we examine the basic message
    336 format, and look at the different types of messages.
    337 
    338 \subsection{Message flow}
    339 
    340 \subsection{Basic format}
    341 
    342 All messages are enclosed in
    343 \begin{quote}\begin{gsc}\begin{verbatim}
    344 <message>
    345 \end{verbatim}\end{gsc}\end{quote}
    346 Messages contain either \gst{<request>} or \gst{<response>} elements--- a single message may contain multiple requests. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='xx'}.
     337There are two different styles of messaging.  The first style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML. Each individual communication is contained in a \gst{<message>} element\footnote{All sample requests and responses shown here are assumed to be enclosed in \gst{<message>} elements.}.
     338They contain either \gst{<request>} or \gst{<response>} elements--- a single message may contain multiple requests/responses. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='...'}.
    347339The language attribute is used by the XSLT to determine the language currently
    348340being used by the user interface.  Virtually all messages contain text strings,
    349 and services use this attribute to return strings in the appropriate language.
    350 
    351 There are two different styles of messaging, explained in the two subsections
    352 below.  The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system.  The response is a page of data, typically in HTML.  The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message.
    353 
    354 This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
     341and services use this attribute to return strings in the appropriate language. Element and attribute names are formatted in lower case with the first letter of internal words capitalized, like 'matchDocs'. Each request typically specifies one service or one action, and the response contains either the data requested, or a status message.
     342Lists must only have the same elements in them.(put this here??)
     343
     344Requests have
     345a \gst{to} attribute and responses have \gst{from}.  These are addresses used
     346by routing modules.  For example \gst{to='site1/demo/TextQuery'} routes a
     347message to modules named site1, demo then TextQuery. These modules happen to be a MessageRouter for a remote site (site1), a Collection (demo), and a Service (TextQuery).
     348
     349There are several types of request: 'describe', 'system', 'process', 'status', 'format'. These requests can ask for any functionality available in the system.
     350The second messaging style is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has a request type of 'cgi'. It has the same format as any other request in the system.  The response, however, is a page of data, typically in HTML.
     351
     352These cgi-type messages come into the Receptionist and are passed to the appropriate action. The actions generate appropriate internal messages which are sent to the MessageRouter. The responses are put together into a single piece of XML and transformed, using XSLT, into a 'page' of HTML.
    355353
    356354\subsection{cgi-type messages}\label{sec:cgi}
    357355
    358 Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a
     356These are the special 'external'-style messages. Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a
    359357Greenstone URL.  The two main arguments are \gst{a} (action) and \gst{sa}
    360358(subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for
     
    377375field is used to indicate what type of output to return. The actions do not
    378376return responses in the normal format; instead they return a page of
    379 information, expressed by default in HTML. Alternative formats could be XML or WML.
     377information, expressed by default in HTML. Alternative formats could be XML or WML. The basic structure of the XML data (before transformation to HTML or other) is described in Section~\ref{sec:pagegen}. What the HTML looks like depends on the XSLT used to transform the data, and will not be shown here.
    380378
    381379The LibraryServlet class communicates with the Receptionist, which is the entry
    382380point into the system.  Future GUIs could communicate either with the
    383 Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
     381Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. However, the Receptionist will pass other types of request directly to the MessageRouter. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
    384382
    385383The cgi arguments used currently are shown in Table~\ref{tab:args}.
    386 Other arguments can be specified by  particular actions.. For example, when the query action recieves a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
     384Other arguments can be specified by  particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
    387385
    388386\begin{table}
     
    399397s & service name & TextQuery, ImportCollection \\
    400398rt & request type & d (display), r (request), s (status) \\
    401 ro & request only & 0 or 1 - if set to one, the request is carried out \\
     399ro & response only & 0 or 1 - if set to one, the request is carried out \\
    402400& & but no processing of the results is done \\
    403401& & currently only used in process actions \\
     
    413411\end{table}
    414412
    415 Here is an example message that retrieves the home page in French:
    416 \begin{quote}\begin{gsc}\begin{verbatim}
    417 <message>
    418   <request lang='fr' type='cgi' action='p' subaction='home'
     413Here is an example request that retrieves the home page in French:
     414\begin{quote}\begin{gsc}\begin{verbatim}
     415a=p&sa=home&l=fr
     416
     417<request lang='fr' type='cgi' action='p' subaction='home'
    419418    output='html'/>
    420 </message>
    421 \end{verbatim}\end{gsc}\end{quote}
    422 
    423 This message represents a text query:
    424 \begin{quote}\begin{gsc}\begin{verbatim}
    425 <message>
    426   <request  lang='en' type='cgi' action='q'  output='html'>
     419\end{verbatim}\end{gsc}\end{quote}
     420
     421This request represents a text query:
     422\begin{quote}\begin{gsc}\begin{verbatim}
     423a=q&l=en&s=TextQuery&c=demo&rt=r&ca=0&st=1&m=10&q=snail
     424
     425<request  lang='en' type='cgi' action='q'  output='html'>
    427426  <paramList>
    428427    <param name='s' value='TextQuery'/>
     
    435434    <param name='q' value='snail'/> <!-- query string -->
    436435  </paramList>
    437 </message>
    438 \end{verbatim}\end{gsc}\end{quote}
    439 
    440 \subsubsection{Module to module messages}
    441 
    442 In Greenstone3's modular architecture messages are used extensively to pass
    443 information from one module to another, for example from an Action to the
    444 MessageRouter module, and from that module to a service module.  Requests have
    445 a \gst{to} attribute and responses have \gst{from}.  These are addresses used
    446 by routing modules.  For example \gst{to='site1/site2/demo/TextQuery'} routes a
    447 message to a MessageRouter (\gst{site1}), from there to another MessageRouter
    448 (\gst{site2}), from there to a collection (\gst{demo}), and from there to a
    449 particular service (\gst{TextQuery}).
    450 
    451 Each request asks for a description of a single module, or requests a particular service. Unlike the first type of message which requests pre-defined types of pages, these internal requests can ask for any functionality available in the system.
    452 
     436</request>
     437\end{verbatim}\end{gsc}\end{quote}
     438
     439These cgi requests get passed to the appropriate action, which determines what data is required for the page, and what internal requests to send off. The page generation process for the different actions is described in Section~\ref{sec:pagegen}.
    453440\subsection{'describe'-type messages}\label{sec:describe}
    454 The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
    455 \begin{quote}\begin{gsc}\begin{verbatim}
    456 <message>
    457   <request lang='en' type='describe' to=''/>
    458 </message>
    459 \end{verbatim}\end{gsc}\end{quote}
    460 If the \gst{to} field is empty, the request is answered by the first module that it is passed to.
     441This is the first of the internal messages.
     442The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a semi-predefined piece of XML, making these requests very efficient. The info is predefined apart from any language specific text strings, which are put together as each request comes in.
     443\begin{quote}\begin{gsc}\begin{verbatim}
     444<request lang='en' type='describe' to=''/>
     445\end{verbatim}\end{gsc}\end{quote}
     446If the \gst{to} field is empty, the request is answered by the MessageRouter.
    461447An example response from a MessageRouter might look like this:
    462448\begin{quote}\begin{gsc}\begin{verbatim}
    463 <message>
    464   <response lang='en' type='describe'>
    465     <serviceList>
    466       <service name='CrossCollectionSearch' type='query' />
    467     </serviceList>
    468     <siteList>
    469       <site name='org.greenstone.gsdl1'
     449<response lang='en' type='describe'>
     450  <serviceList>
     451    <service name='CrossCollectionSearch' type='query' />
     452  </serviceList>
     453  <siteList>
     454    <site name='org.greenstone.gsdl1'
    470455            address='http://localhost:8080/soap/servlet/rpcrouter'
    471456            type='soap' />
    472     </siteList>
    473     <collectionList>
    474       <collection name='org.greenstone.gsdl1/
     457  </siteList>
     458  <collectionList>
     459    <collection name='org.greenstone.gsdl1/
    475460                  org.greenstone.gsdl2/fao' />
    476       <collection name='org.greenstone.gsdl1/demo' />
    477       <collection name='org.greenstone.gsdl1/fao' />
    478       <collection name='myfiles' />
    479     </collectionList>
    480   </response>
    481 </message>
     461    <collection name='org.greenstone.gsdl1/demo' />
     462    <collection name='org.greenstone.gsdl1/fao' />
     463    <collection name='myfiles' />
     464  </collectionList>
     465</response>
    482466\end{verbatim}\end{gsc}\end{quote}
    483467This MessageRouter has one site-wide service, a cross-collection searching service. It
     
    488472
    489473It is possible to ask just for a specific part of the information provided by a
    490 describe request, rather than the whole message.  For example, these two
     474describe request, rather than the whole thing.  For example, these two
    491475messages get the \gst{collectionList} and the \gst{siteList} respectively:
    492476\begin{quote}\begin{gsc}\begin{verbatim}
    493 <message lang='en'>
    494   <request type='describe' to='' info='collectionList'/>
    495 </message>
    496 
    497 <message lang='en'>
    498   <request type='describe' to='' info='siteList'/>
    499 </message>
    500 \end{verbatim}\end{gsc}\end{quote}
    501 When a collection is asked to describe itself, what is returned is all of the
     477<request lang='en' type='describe' to=''>
     478  <paramList>
     479    <param name='subset' value='collectionList'/>
     480  </paramList>
     481</request>
     482
     483<request lang='en' type='describe' to=''>
     484  <paramList>
     485    <param name='subset' value='siteList'/>
     486  </paramList>
     487</request>
     488\end{verbatim}\end{gsc}\end{quote}
     489When a collection or service cluster is asked to describe itself, what is returned is all of the
    502490collection specific metadata and a list of services.  For example, here is such
    503491a message, along with a sample response.
    504492
    505493\begin{quote}\begin{gsc}\begin{verbatim}
    506 <message lang='en'>
    507   <request type='describe' to='demo'/>
    508 </message>
    509 
    510 <message>
    511   <response lang='en' type='describe' from='demo' >
    512     <collection name='demo'>
    513       <serviceList>
    514         <service name='TextQuery' type='query' />
    515         <service name='DocRetrieve' type='query' />
    516         <service name='MetadataRetrieve' type='query' />
    517       </serviceList>
    518       <metadataList>
    519         <metadata name='numDocs'>321</metadata>
    520         <metadata name='numSections'>5532</metadata>
    521         <metadata name='title'>The demo collection</metadata>
    522         <metadata name='aboutText'>This is a demo collection.
    523     </metadata>
    524       </metadataList>
    525     </collection>
    526   </response>
    527 </message>
    528 \end{verbatim}\end{gsc}\end{quote}
     494<request lang='en' type='describe' to='demo'/>
     495
     496<response lang='en' type='describe' from='demo' >
     497  <collection name='demo'>
     498    <serviceList>
     499      <service name='TextQuery' type='query' />
     500      <service name='DocumentContentRetrieve' type='retrieve' />
     501      <service name='DocumentMetadataRetrieve' type='retrieve' />
     502    </serviceList>
     503    <metadataList>
     504      <metadata name='numDocs'>321</metadata>
     505      <metadata name='numSections'>5532</metadata>
     506      <metadata name='colName' lang='en'>The demo collection</metadata>
     507      <metadata name='colDescription' lang='en'>This is a demo collection.
     508      </metadata>
     509    </metadataList>
     510  </collection>
     511</response>
     512\end{verbatim}\end{gsc}\end{quote}
     513
     514The subset parameter can also be used in a describe request to a collection, to retrieve just the metadataList or serviceList.
     515
    529516A \gst{describe} request sent to a service returns a list of parameters that
    530 the service accepts, and describes the content type for the request and
    531 response.
     517the service accepts, some display information, (and in future may describe the content type for the request and response).
    532518
    533519Parameters have the following format:
     
    542528</param>
    543529\end{verbatim}\end{gsc}\end{quote}
     530****describe the various types, what the type means - display purposes- etc.
     531
    544532If no default is specified, the parameter is assumed to be mandatory.
    545533Here are some examples of parameters:
     
    565553
    566554\end{verbatim}\end{gsc}\end{quote}
    567 Here is a message, along with a sample response.
    568 \begin{quote}\begin{gsc}\begin{verbatim}
    569 <message>
    570   <request lang='en'  type='describe' to='demo/TextQuery'/>
    571 </message>
    572 
    573 <message>
    574   <response lang='en' type='describe' from='demo/TextQuery' >
    575     <service name='TextQuery' type='query'>
     555The type attribute is used to determine how to display the parameters on a web page or interface. For example, a string parameter may result in   a text entry box, a boolean an on/off button, enum\_single/enum\_multi a drop-down menu, where one or more items, respectively, can be selected.
     556A multi-type parameter indicates that two or more parameters are associated, and should be displayed appropriately. For example, in a field query, the text box and field list should be associated. The occurs attribute specifies how many times the parameter should be displayed on the page.
     557Parameters also come with display information...
     558
     559A service description also contains a display element - this contains all the language dependent text strings - put together on the fly. These strings are name of the service, what to use for the submit button, and text strings for all the parameters: name, what each value is called, etc.
     560Here is a request, along with a sample response.
     561
     562\begin{quote}\begin{gsc}\begin{verbatim}
     563<request lang='en'  type='describe' to='demo/TextQuery'/>
     564
     565<response lang='en' type='describe' from='demo/TextQuery' >
     566  <service name='TextQuery' type='query'>
     567  <paramList>
     568    <param name='matchDocs' type='integer' default='50'/>
     569    <param name='case' type='boolean' default='1'/>
     570    <param name='index' type='enum' default='tt'>
     571      <option name='tt'/>
     572      <option name='t0'/>
     573    </param>
     574  </paramList>
          </service>
     575</response>
     576\end{verbatim}\end{gsc}\end{quote}
     577\begin{figure}
     578\begin{quote}\begin{gsc}\begin{verbatim}
     579<request lang="en" to="mgppdemo/FieldQuery" type="describe" />
     580
     581<response from="mgppdemo/FieldQuery" type="describe">
     582  <service name="FieldQuery" type="query">
    576583    <paramList>
    577       <param name='matchDocs' type='integer' default='50/>
    578       <param name='case' type='boolean' default='1'/>
    579       <param name='index' type='enum' default='tt'>
    580         <option name='tt'/>
    581     <option name='t0'/>
     584      <param default="Section" name="level" type="enum_single">
     585        <option name="Document" />
     586        <option name="Section" />
     587      </param>
     588      <param default="1" name="case" type="boolean" />
     589      <param default="1" name="stem" type="boolean" />
     590      <param default="10" name="maxDocs" type="integer" />
     591      <param name="simpleField" occurs="4" type="multi">
     592        <param name="fqv" type="string" />
     593        <param default="" name="fqf" type="enum_single">
     594          <option name="ZZ" /><option name="TX" />
     595          <option name="SU" /><option name="TI" />
     596        </param>
    582597      </param>
    583598    </paramList>
    584   </response>
    585 </message>
    586 \end{verbatim}\end{gsc}\end{quote}
    587 
    588 So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services.
     599    <display>
     600      <name>Form Query</name>
     601      <submit>Search</submit>
     602      <param name="level">
     603        <name>Granularity to search at</name>
     604        <option name="Document">Document</option>
     605        <option name="Section">Section</option>
     606      </param>
     607      <param name="case">
     608        <name>Turn casefolding </name>
     609        <option name="0">off</option>
     610        <option name="1">on</option>
     611      </param>
     612      <param name="stem">
     613        <name>Turn stemming </name>
     614        <option name="0">off</option>
     615        <option name="1">on</option>
     616      </param>
     617      <param name="maxDocs">
     618        <name>Maximum documents to return</name>
     619      </param>
     620      <param name="fqv">
     621        <name>Search for </name>
     622      </param>
     623      <param name="fqf">
     624        <name>in field</name>
     625        <option name="ZZ">All fields</option>
     626        <option name="TX">TextOnly</option>
     627        <option name="SU">Subject</option>
     628        <option name="TI">Title</option>
     629     </param>
     630    </display>
     631  </service>
     632</response>
     633\end{verbatim}\end{gsc}\end{quote}
     634\end{figure}
     635
     636\begin{figure}[t]
     637  \centering
     638  \includegraphics[width=3.5in]{query2.ps}
     639  \caption{Sample query form.}
     640  \label{fig:query}
     641\end{figure}
     642
     643describe request to an applet type service: returns ...
     644\begin{quote}\begin{gsc}\begin{verbatim}
     645<request type='describe' to='mgppdemo/PhindApplet'/>
     646
     647<response type='describe'>
     648  <service name='PhindApplet' type='query'>
     649    <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
     650            jaxp.jar, xml-apis.jar'
     651            CODE='org.greenstone.applet.phind.Phind.class'
     652            CODEBASE='lib/java'
     653            HEIGHT='400' WIDTH='500'>
     654      <PARAM NAME='library' VALUE=''/>
     655      <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
     656      <PARAM NAME='collection' VALUE='mgppdemo' />
     657      <PARAM NAME='classifier' VALUE='1' />
     658      <PARAM NAME='orientation' VALUE='vertical' />
     659      <PARAM NAME='depth' VALUE='2' />
     660      <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
     661      <PARAM NAME='backdrop' VALUE='interfaces/default/
     662                      images/phindbg1.jpg'/>
     663      <PARAM NAME='fontsize' VALUE='10' />
     664      <PARAM NAME='blocksize' VALUE='10' />
     665      The Phind java applet.
     666    </applet>
     667  </service>
     668</response>
     669\end{verbatim}\end{gsc}\end{quote}
    589670
    590671\subsection{'system'-type messages}\label{sec:system}
    591 ``System'' requests are used to tell the MessageRouter or a Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
    592 
    593 So far, we have \gst{activate} and \gst{deactivate} configure requests.
    594 Some examples are as follows.
    595 \begin{quote}\begin{gsc}\begin{verbatim}
    596 <message><request type='configure' to=''>
    597 <configure action='deactivate' type='collection' name='demo'/>
    598 </request></message>
    599 
    600 <message><request type='configure' to=''>
    601 <configure action='activate' type='collection' name='demo'/>
    602 </request></message>
    603 
    604 <message><request type='configure' to=''>
    605 <configure action='activate' type='serviceRack'
    606            name='TranslationServices'/>
    607 </request></message>
    608 \end{verbatim}\end{gsc}\end{quote}
    609 
    610 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above.  Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first.
    611 
    612 The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:
    613 \begin{quote}\begin{gsc}\begin{verbatim}
    614 <message><response from='' type='configure'>
    615   <status>demo collection activated</status>
    616 </response></message>
    617 \end{verbatim}\end{gsc}\end{quote}
    618 \footnote{this format not properly defined yet}
    619 
    620 Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
    621 
    622 \subsection{'process'-type messages}
     672``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change.
     673
     674The basic format of a system request is as follows:
     675
     676\begin{quote}\begin{gsc}\begin{verbatim}
     677<request type='system' to=''>
     678  <system .../>
     679</request>
     680\end{verbatim}\end{gsc}\end{quote}
     681
     682Each system request is specified in a system element. The following are examples:
     683\begin{quote}\begin{gsc}\begin{verbatim}
     684<system type='configure' subset=''/>
     685<system type='configure' subset='collectionList'/>
     686<system type='activate' moduleType='collection' moduleName='demo'/>
     687<system type='deactivate' moduleType='site' moduleName='site1'/>
     688\end{verbatim}\end{gsc}\end{quote}
     689
     690The first request reconfigures the whole site---the MessageRouter goes through its whole configure process again. The second request just reconfigures the collectionList---the MessageRouter deletes all its collection information, then scans the collect directory again and reloads all the collections.
     691The third request activates the collection demo. This could be a new collection, or a reactivation of an old one. If a collection module of that name already exists, it is deleted and a new one loaded. The final request deactivates the site site1---this removes the site from the siteList and module map, and also removes any of that site's collections/services from the static lists.
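
Putting the wrapper and the system element together, the third example above would be sent as a complete request along the following lines. This is only a sketch assembled from the two fragments already shown; the empty \gst{to} attribute is assumed to address the MessageRouter itself, as in the basic format.

\begin{quote}\begin{gsc}\begin{verbatim}
<request type='system' to=''>
  <system type='activate' moduleType='collection' moduleName='demo'/>
</request>
\end{verbatim}\end{gsc}\end{quote}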
     692
     693
     694A response just contains a status message, for example:
     695\begin{quote}\begin{gsc}\begin{verbatim}
     696<response from="">
     697  <status>collectionList reconfigured successfully</status>
     698</response>
     699\end{verbatim}\end{gsc}\end{quote}
     700
     701
     702System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests.
     703
     704\subsection{'process'-type messages} ***** TODO ****
    623705
    624706divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse...
     
    804886\subsubsection{'enrich'-type services}
    805887
    806 \subsection{'status'-type messages}
    807 
    808 
    809 \subsection{'format'-type messages}
    810 
    811 \subsection{'applet'-type services}
    812 
    813 \section{Page generation}\label{sec:pagegen}
    814 
    815 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
    816 System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
    817 
    818 Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
    819 
    820 The basic  page format  is:
    821 \begin{quote}\begin{gsc}\begin{verbatim}
    822 <page>
    823   <pageExtra>
    824     <config/>
    825     <display/>
    826   </pageExtra>
    827   <pageRequest/>
    828   <pageResponse/>
    829 </page>
    830 \end{verbatim}\end{gsc}\end{quote}
    831 
    832 There are four main elements in the page: config, translate, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters  can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well}. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.
    833 
    834 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system.
    835 
    836 
    837 Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are
    838 located in interfaces/default/transforms. Collections, sites and other interfaces
    839 can override these files by having their own copy of the appropriate
    840 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
    841 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
    842 ***TODO*** describe a bit more??
    843 
    844 \subsection{Internationalization}
    845 
    846 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
    847 
    848 Language specific text strings are specified in resource bundle property files. These live in resources/java.
    849 
    850 There is a properties file per class, and one per interface. At the moment, we have
    851 
    852 GS2MGPPSearch.properties
    853 GS2MGPPRetrieve.properties etc - the service classes
    854 
    855 interface\_default.properties. - for the default interface
    856 
    857 To add other languages, create eg GS2MGPPSearch\_fr.properties.
    858 
    859 The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml \gst{<display>} element - the xslt can get the ones it needs from there.
    860 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better?
    861 
    862 All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created.
    863 
    864 \subsection{Page action}
    865 
    866 Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.  The page is
    867 transformed using \gst{home.xsl}.  For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
    868 and a list of services, and the result is transformed using \gst{about.xsl}.
    869 
    870 
    871 \subsection{Query action}
    872 
    873 There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action.
    874 For each page, the service description is requested from the  service  of the current collection (via a describe request).  This is done every time the query page is
    875 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has  all the parameters from the URL put into the parameter list. A list of document identifiers
    876 is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of
    877 documents, with a request for their \gst{Title} metadata.  The service description and query result are combined into a page of xml, which is
    878 transformed using \gst{basicquery.xsl} to produce the html page.
    879 
    880 \subsection{Applet action}
    881 
    882 There are two types of request to the applet action: \gst{a=a \& sa=d\/} and
    883 \gst{a=a \& sa=r\/}.  The value \gst{sa=d\/} means ``display the applet.'' A
    884 \gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element.  The transformation file \gst{applet.xsl} embeds this
    885 into the page, and the servlet returns the HTML.
    886 
    887 The value \gst{sa=r} signals a request from the applet.  The result is returned
    888 directly to the applet code, in XML.  The other parameters are sent to the
    889 service untransformed, and the result is passed directly back to the applet.
    890 Applet action can therefore work with any applet whose service understands the
    891 messages.
    892 
    893 Here are two examples of requests generated by the Applet action, along with their corresponding responses.
    894 
    895 The first request corresponds to the URL arguments \gst{a=a \&
    896 sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
    897 applet for the mgppdemo collection''.
    898 
    899 \begin{quote}\begin{gsc}\begin{verbatim}
    900 <message>
    901   <request type='describe' to='mgppdemo/PhindApplet'/>
    902 </message>
    903 
    904 <message>
    905   <response type='describe'>
    906     <service name='PhindApplet' type='query'>
    907       <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar,
    908             jaxp.jar, xml-apis.jar'
    909               CODE='org.greenstone.applet.phind.Phind.class'
    910               CODEBASE='lib/java'
    911               HEIGHT='400' WIDTH='500'>
    912         <PARAM NAME='library' VALUE=''/>
    913         <PARAM NAME='phindcgi' VALUE='?a=a&amp;sa=r&amp;sn=Phind'/>
    914         <PARAM NAME='collection' VALUE='mgppdemo' />
    915         <PARAM NAME='classifier' VALUE='1' />
    916         <PARAM NAME='orientation' VALUE='vertical' />
    917         <PARAM NAME='depth' VALUE='2' />
    918         <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' />
    919         <PARAM NAME='backdrop' VALUE='interfaces/default/
    920                       images/phindbg1.jpg'/>
    921         <PARAM NAME='fontsize' VALUE='10' />
    922         <PARAM NAME='blocksize' VALUE='10' />
    923         The Phind java applet.
    924       </applet>
    925     </service>
    926   </response>
    927 </message>
    928 \end{verbatim}\end{gsc}\end{quote}
    929 
    930 The second request corresponds to the  arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
    931 indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
    932 request as parameters. The response is in a form suitable for the applet, placed inside
    933 \gst{<appletData>} in a standard Greenstone message.  AppletAction returns the
    934 contents of appletData to the browser, i.e. to the applet itself.
     888\subsubsection{'applet'-type services}
    935889
    936890\begin{quote}\begin{gsc}\begin{verbatim}
     
    979933\end{verbatim}\end{gsc}\end{quote}
    980934
     935\subsection{'status'-type messages}
     936
     937
     938\subsection{'format'-type messages}
     939
     940\begin{quote}\begin{gsc}\begin{verbatim}
     941<request lang="en" to="mgppdemo/FieldQuery" type="format" />
     942
     943<response from="mgppdemo/FieldQuery" type="format">
     944  <format>
     945    <gsf:template match="documentNode"><td><gsf:link><gsf:metadata name="Title" />(<gsf:metadata name="Source" />)</gsf:link></td></gsf:template>
     946  </format>
     947</response>
     948\end{verbatim}\end{gsc}\end{quote}
     949
     950\section{Page generation}\label{sec:pagegen}
     951
     952URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.
     953System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
     954
     955Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
     956
     957The basic  page format  is:
     958\begin{quote}\begin{gsc}\begin{verbatim}
     959<page>
     960  <pageExtra>
     961    <config/>
     962    <display/>
     963  </pageExtra>
     964  <pageRequest/>
     965  <pageResponse/>
     966</page>
     967\end{verbatim}\end{gsc}\end{quote}
     968
     969There are four main elements in the page: config, display, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well} The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.
     970
     971The following subsections outline, for each action, what data is needed and what requests are generated to send to the system.
     972
     973
     974Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are
     975located in interfaces/default/transforms. Collections, sites and other interfaces
     976can override these files by having their own copy of the appropriate
     977files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
     978interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
     979***TODO*** describe a bit more??
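
As an illustration of the lookup order, a request for \gst{home.xsl} on behalf of the demo collection in localsite would be searched for roughly as listed below. The paths are illustrative only: the site and collection directory names follow the examples used elsewhere in this manual, the name of the per-site and per-collection transform directory is assumed, and \gst{myinterface} is a hypothetical interface name.

\begin{quote}\begin{gsc}\begin{verbatim}
1. sites/localsite/collect/demo/transform/home.xsl   (collection)
2. sites/localsite/transform/home.xsl                (site)
3. interfaces/myinterface/transforms/home.xsl        (current interface)
4. interfaces/default/transforms/home.xsl            (default interface)
\end{verbatim}\end{gsc}\end{quote}

The first file found is the one used, so a collection can override a single transform while inheriting the rest from the levels below it.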
     980
     981\subsection{Internationalization}
     982
     983Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
     984
     985Language specific text strings are specified in resource bundle property files. These live in resources/java.
     986
     987There is a properties file per class, and one per interface. At the moment, we have
     988
     989GS2MGPPSearch.properties
     990GS2MGPPRetrieve.properties etc - the service classes
     991
     992interface\_default.properties - for the default interface
     993
     994To add other languages, create eg GS2MGPPSearch\_fr.properties.
     995
     996The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml \gst{<display>} element - the xslt can get the ones it needs from there.
     997xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better?
     998
     999All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created.
     1000
     1001\subsection{Page action}
     1002
     1003Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.  The page is
     1004transformed using \gst{home.xsl}.  For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
     1005and a list of services, and the result is transformed using \gst{about.xsl}.
     1006
     1007
     1008\subsection{Query action}
     1009
     1010There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action.
     1011For each page, the service description is requested from the  service  of the current collection (via a describe request).  This is done every time the query page is
     1012displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request is sent to the same service, with all the parameters from the URL put into the parameter list. A list of document identifiers
     1013is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of
     1014documents, with a request for their \gst{Title} metadata.  The service description and query result are combined into a page of xml, which is
     1015transformed using \gst{basicquery.xsl} to produce the html page.
     1016
     1017\subsection{Applet action}
     1018
     1019There are two types of request to the applet action: \gst{a=a \& sa=d\/} and
     1020\gst{a=a \& sa=r\/}.  The value \gst{sa=d\/} means ``display the applet.'' A
     1021\gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element.  The transformation file \gst{applet.xsl} embeds this
     1022into the page, and the servlet returns the HTML.
     1023
     1024The value \gst{sa=r} signals a request from the applet.  The result is returned
     1025directly to the applet code, in XML.  The other parameters are sent to the
     1026service untransformed, and the result is passed directly back to the applet.
     1027Applet action can therefore work with any applet whose service understands the
     1028messages.
     1029
     1030Here are two examples of requests generated by the Applet action, along with their corresponding responses.
     1031
     1032The first request corresponds to the URL arguments \gst{a=a \&
     1033sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind
     1034applet for the mgppdemo collection''.
     1035
     1036
     1037The second request corresponds to the  arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this
     1038indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
     1039request as parameters. The response is in a form suitable for the applet, placed inside
     1040\gst{<appletData>} in a standard Greenstone message.  AppletAction returns the
     1041contents of appletData to the browser, i.e. to the applet itself.
     1042
     1043
    9811044Note that the applet HTML may need to know the name of the \gst{library}
    9821045program.  However, that name is chosen by the person who installed the software
     
    10231086\section{Collection formation}
    10241087
    1025 So far, only Greenstone2 style building is available. This uses the import.pl and buildcol.pl perl scripts from Greenstone2. THese scripts and their needed perl modules have not been added to teh Greenstone3 system, so to do building, you need to have Greenstoen2 installed, and GSDLHOME, and GSDLOS set. (can do this by running 'source setup.bash' in the top level directory of gsdl.
    1026 
    1027 There are three ways of getting collections into greenstoen3.
     1088So far, only Greenstone2 style building is available. This uses the import.pl and buildcol.pl Perl scripts from Greenstone2. These scripts and the Perl modules they need have not been added to the Greenstone3 system, so to build collections you need to have Greenstone2 installed, with GSDLHOME and GSDLOS set (you can do this by running 'source setup.bash' in the top level directory of gsdl).
     1089
     1090There are three ways of getting collections into Greenstone3.
    10281091
    10291092\subsection{Importing gs2 collections}
    10301093
    1031 Collections built in a Greenstone2 system can be used in Greensotne3. Just copy across the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs3.pl}. You need to specify the collect directory and the collection name. Eg.
     1094Collections built in a Greenstone2 system can be used in Greenstone3. Just copy across the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs2.pl}. You need to specify the collect directory and the collection name. Eg.
    10321095
    10331096\gst{convert\_coll\_from\_gs2.pl -collectdir /research/kjdon/gsdl3/web/sites/localsite/collect demo}
     
    10381101\subsection{Building new collections through the web interface}
    10391102
    1040 Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page, you have to go back to teh build 'about' page, and select the next service manually. So far, AddDocument does not work, so documents need to be manually added to teh import directory. And there is no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the config files by hand. Editing collect.cfg will change the way building is done (by Greenstone2), and editing collectionConfig.xml will change the way the collection is used (by Greenstone3).
     1103Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page, you have to go back to the build 'about' page, and select the next service manually. So far, AddDocument does not work, so documents need to be manually added to the import directory. And there is no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the config files by hand. Editing collect.cfg will change the way building is done (by Greenstone2), and editing collectionConfig.xml will change the way the collection is used (by Greenstone3).
    10411104
    10421105You need to carry out the following steps:
     
    10771140Building stuff is in src/java/org/greenstone/gsdl3/build.
    10781141
    1079 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events.
     1142CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses Greenstone 2 Perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events.
    10801143
    10811144\subsection{Collection design}\label{sec:colldesign}
     
    11861249\newcommand{\gshome}{\$GSDLHOME}
    11871250
    1188 Cuurently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note, these instructions are for installation on linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions in \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}.
     1251Currently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note, these instructions are for installation on linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions in \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}.
    11891252
    11901253\subsubsection{Get the source}
     
    12071270If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some versions of CVS have trouble accessing this repository. We are using version 1.11.1p1.
    12081271
    1209 \subsubsection{Compile and install greenstone}\label{subsec:compile}
     1272\subsubsection{Compile and install Greenstone}\label{subsec:compile}
    12101273
    12111274An install.sh script has been constructed to compile and install Greenstone3. What you need to do is:
     
    12181281\end{gsc}\end{quote}
    12191282
    1220 If you want to do Greenstone2 compatible building (currently the only type) you need to have Greenstone2 installed, \gst{source setup.bash} in the top level Greenstone2 directory, then re-\gst{source setup.bash} for Greenstone3. This is to set \gst{\gshome} for tomcat.
    1221 
    1222 \noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc.
     1283If you want to do Greenstone2 compatible building (currently the only type) you need to have Greenstone2 installed, \gst{source setup.bash} in the top level Greenstone2 directory, then re-\gst{source setup.bash} for Greenstone3. This is to set \gst{\gshome} for Tomcat.
     1284
     1285\noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running Tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc.
    12231286
    12241287If you want to use SOAP to talk to remote sites, you also need to do the following:
     
    12301293There is one java command that sometimes doesn't work under bash, so you may need to cut and paste it into the terminal to get it to work. See the output from the bash-script for details.
    12311294
    1232 To shutdown or startup tomcat, the commands are:
     1295To shut down or start up Tomcat, the commands are:
    12331296\begin{quote}\begin{gsc}
    12341297\gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\
     
    12361299\end{gsc}\end{quote}
    12371300
    1238 You dont want to run install.bash twice - it adds stuff into files.
    1239 To update your installation, you can run update.bash - this updates your code form cvs, and remakes all the java stuff.
     1301You don't want to run install.bash twice - it adds stuff into files.
     1302To update your installation, you can run update.bash - this updates your code from CVS, and remakes all the Java stuff.
    12401303
    12411304
    12421305\subsubsection{The sample sites}
    12431306
    1244 \noindent There are two greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
     1307\noindent There are two Greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.
    12451308localsite does not connect to any other sites. soapsite specifies a SOAP connection to localsite.
    12461309
    1247 \subsubsection{Tomcat}
    1248 
    1249 \noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.
    1250 
    1251 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for tomcat---tells it what servlets to load, what initial paramaters to pass them, and what web names map to the servlets.
    1252 There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite.
     1310\subsubsection{Tomcat}\label{sec:tomcat}
     1311
     1312\noindent Tomcat is a servlet container. It is used to serve a Greenstone site using a servlet.
     1313
     1314The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat---tells it what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
     1315There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite.
    12531316
    12541317The initialisation parameters used by the library servlets are as follows:
     
    12691332It is possible to run several servlets at once, with different combinations of sites and/or interfaces.
    12701333
    1271 The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the tomcat configuration file. The installation process adds a context for greenstone3 servlets (\gst{\gsdlhome/web})---this tells tomcat where to find the web.xml file, and what url (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.
     1334The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.
    12721335
    12731336
     
    12751338
    12761339
    1277 \subsubsection{Serving your site using tomcat}\label{subsec:runtomcat}
    1278 
    1279 \noindent To run tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\  to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). Then,
     1340\subsubsection{Serving your site using Tomcat}\label{subsec:runtomcat}
     1341
     1342\noindent To run Tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\  to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). Then,
    12801343
    12811344\begin{gsc}\begin{tt}
     
    12841347\end{tt}\end{gsc}
    12851348
    1286 \noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat)
     1349\noindent ({\footnotesize \verb#./shutdown.sh#} shuts down Tomcat)
    12871350\\
    12881351\\
    1289 \noindent The tomcat server can be accessed on the web at \gst{http://localhost:8080}---this gets you to a welcome page.
    1290 The greenstone stuff is at \gst{http://localhost:8080/gsdl3}---this displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.
    1291 
    1292 \noindent Note: tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\
     1352\noindent The Tomcat server can be accessed on the web at \gst{http://localhost:8080}---this gets you to a welcome page.
     1353The Greenstone stuff is at \gst{http://localhost:8080/gsdl3}---this displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.
     1354
     1355\noindent Note: Tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\
    12931356\begin{bulletedlist}
    12941357\begin{gsc}
     
    13011364\gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out}
    13021365
    1303 On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}, or by restarting tomcat.
     1366On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.
    13041367
    13051368\subsubsection{Using SOAP to talk to a remote site}
     
    13081371\\
    13091372\\
    1310 \noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service.
     1373\noindent The SOAP server we use is actually run as a servlet in Tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service.
    13111374This is done by install-soap.bash.
    1312 You can also deploy a service through the website.  If tomcat is not running, start it up (see \ref{subsec:runtomcat}).
     1375You can also deploy a service through the website.  If Tomcat is not running, start it up (see \ref{subsec:runtomcat}).
    13131376
    13141377\noindent The SOAP servlet can be accessed at \begin{gsc}{\tt http://localhost:8080/soap}\end{gsc}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.
     
    13271390\end{tabular}
    13281391
    1329 \noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the lefthand ``List'' button.
    1330 
    1331 \noindent Information about deployed services is maintained between tomcat sessions---you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shutdown and restart tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.
     1392\noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left hand ``List'' button.
     1393
     1394\noindent Information about deployed services is maintained between Tomcat sessions---you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shutdown and restart Tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.
    13321395
    13331396\subsubsection{Debugging SOAP}
     
    13471410
    13481411Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the
    1349 address for talking to the tomcat SOAP servlet services.
     1412address for talking to the Tomcat SOAP servlet services.
    13501413
    13511414\section{Developer's notes}
     
    13661429Dictionary & wrapper around a ResourceBundle, providing strings with parameters\\
    13671430GSCGI & class to map between short name cgi args and long name request parameters \\
    1368 GSFile & class to create all greenstone file paths eg used to locate configuration files, xslt files and collection data. \\
     1431GSFile & class to create all Greenstone file paths eg used to locate configuration files, xslt files and collection data. \\
    13691432GSHTML & provides convenience methods for dealing with HTML, eg making strings HTML safe\\
    13701433GSPath & used to create, examine and modify message address paths\\
    13711434GSStatus & some static codes for status messages\\
    1372 GSXML & lots of methods for extracting information out of greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by greenstone.\\
    1373 GSXSLT & some manipulation functions for greenstone XSLT\\
     1435GSXML & lots of methods for extracting information out of Greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by Greenstone.\\
     1436GSXSLT & some manipulation functions for Greenstone XSLT\\
    13741437Misc & miscellaneous functions\\
    1375 OID & class to handle greenstone (2) OIDs\\
     1438OID & class to handle Greenstone (2) OIDs\\
    13761439XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
    13771440XMLTransformer & methods to transform XML using XSLT \\
     
    13871450\subsection{Working with XML}
    13881451
    1389 We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic thing in the tree, all others inherit from this. A Document represents a whole document, and is a kind of container for all the nodes. Elements and Nodes are not supposed to exist outside of the context of a document, so you have to have a document to create them. The document is not the top level node in the tree, to get this, use Document.getDocumentElement(). If you create nodes etc but dont append them to something already in the document tree, they will be separate - but they still know who their owner document is.
     1452We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic thing in the tree, all others inherit from this. A Document represents a whole document, and is a kind of container for all the nodes. Elements and Nodes are not supposed to exist outside of the context of a document, so you have to have a document to create them. The document is not the top level node in the tree, to get this, use Document.getDocumentElement(). If you create nodes etc but don't append them to something already in the document tree, they will be separate - but they still know who their owner document is.
    13901453
    13911454To create new Documents, and convert Strings or Files to Documents, use XMLConverter.
     
    14111474\end{gsc}\end{quote}
    14121475
    1413 Note that you can only append one node to a document---this will become the toplevel node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top level node.
     1476Note that you can only append one node to a document---this will become the top level node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top level node.
    14141477
    14151478Nodes can only be created by a Document. Document has creation methods for all types of Nodes, for example \gst{createElement(element\_name)}, \gst{createAttribute(attr\_name)},  \gst{createTextNode(text\_data)} etc.
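
The DOM rules above can be sketched in a few lines of Java. This is only an illustration using the standard \gst{org.w3c.dom} and \gst{javax.xml.parsers} classes, not Greenstone's own XMLConverter; the class name \gst{DomSketch}, the \gst{message} element and the parameter values are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; not part of Greenstone3.
public class DomSketch {

    // Build a small tree using only standard DOM calls.
    public static Document build() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // A document accepts exactly one top level element.
        Element message = doc.createElement("message"); // element name is an assumption
        doc.appendChild(message);

        // Nodes are created by the document, then attached to a parent.
        Element paramList = doc.createElement("paramList");
        message.appendChild(paramList);

        Element param = doc.createElement("param");
        param.setAttribute("name", "query");             // illustrative values
        param.appendChild(doc.createTextNode("health"));
        paramList.appendChild(param);

        return doc;
    }

    // A node that was created but never appended has no parent,
    // yet it still knows which document owns it.
    public static boolean detachedNodeKnowsOwner(Document doc) {
        Element orphan = doc.createElement("orphan");
        return orphan.getOwnerDocument() == doc && orphan.getParentNode() == null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        // The document itself is not the top level node; ask for it explicitly.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(detachedNodeKnowsOwner(doc));
    }
}
```

Appending a second element directly to the document would be rejected by the DOM implementation, which is why everything after the first element hangs off \gst{message}.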
     
    14231486
    14241487
    1425 no DTDs or Schema defined yet. Until there are, try and keep to teh following rules:
     1488No DTDs or Schemas are defined yet. Until there are, try to keep to the following rules:
    14261489
    14271490\begin{bulletedlist}
     
    14291492\item always return expected elements even if empty, eg \gst{<paramList/>}.
    14301493
    1431 \item If you get the whole documetn it is called \gst{<document>}. However if you are returned a list of pointers to parts of the documetns, they are \gst{<documentNode>}s.
    1432 
    1433 \item insiode a list you can only have elements of the same name as the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it.
     1494\item If you get the whole document it is called \gst{<document>}. However if you are returned a list of pointers to parts of the documents, they are \gst{<documentNode>}s.
     1495
     1496\item inside a list you can only have elements of the same name as the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it.
    14341497
    14351498\end{bulletedlist}
     
    14781541
    14791542\item {\em using namespaces:}
    1480 If you are using the same namespace in more than one file, eg in the source xml and in the stylesheet, make sure that the URI for the xmlns:xxx thingy is the same in both cases---otherwise the names dont match. This includes http:// on the front.
    1481 
    1482 \item I dont think \gst{<xsl:with-param name='xxx' select='true'/>} is
     1543If you are using the same namespace in more than one file, eg in the source xml and in the stylesheet, make sure that the URI for the xmlns:xxx thingy is the same in both cases---otherwise the names don't match. This includes http:// on the front.
     1544
     1545\item I don't think \gst{<xsl:with-param name='xxx' select='true'/>} is
    14831546the same as \gst{<xsl:with-param name='xxx'>true</xsl:with-param>}.
    14841547Use the second one.
     
    15511614The makefile in j-gdbm is crap---it tries to get stuff from its
    15521615original CVS tree.  I have created a new Makefile---in my-j-gdbm
    1553 directory.  this stuff needs to go into cvs probably.
     1616directory.  this stuff needs to go into CVS probably.
    15541617
    15551618
     
    15721635\gst{http://mindprod.com/jni.html}
    15731636
    1574 Java 1.4 api index\\
     1637Java 1.4 API index\\
    15751638\gst{http://java.sun.com/j2se/1.4/docs/api/index.html}
    15761639
     
    15781641\gst{http://java.sun.com/docs/books/tutorial/index.html}
    15791642
    1580 Safari books online - has java, XML, XSLT, etc books\\
     1643Safari books online - has Java, XML, XSLT, etc books\\
    15811644\gst{http://proquest.safaribooksonline.com/mainhom.asp?home}
    15821645