Changeset 3711


Timestamp:
2003-01-24T17:20:38+13:00
Author:
kjdon
Message:

some changes, still needs more work but I've run out of time.

File:
1 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r3557 r3711  
    2222{\end{list}}
    2323
    24 \noindent
    25 {\em \tiny This is intended to turn into a multipurpose document that
    26 \begin{bulletedlist}
    27 \item forms the basis of a JCDL paper submission
    28 \item fulfills our NERF pledge to produce a ``design document for Greenstone3''
    29 by December 2002 ...
    30 \item ... and a ``definition of internal and external interfaces for all major
    31 components (including API for external clients)'' by July 2003
    32 \item turns into a proper manual for Greenstone3
    33 \end{bulletedlist}
    34 }
    3524
    3625\noindent
     
    5847Native Interface) will be used to communicate with these.
    5948
     49
    6050\section{Architecture}
    6151
    62 A typical basic Greenstone3 digital library system is made up of a ``back
    63 end,'' which we call a digital library {\em site\/}, coupled to a ``front end''
    64 that provides the user interface. Figure 1 shows a simple stand-alone digital library with a web-based front end which communicates with a single site. In this simple example,  the entire system is compiled together as a single executable. The point of contact with the back end is the MessageRouter (MR) module---all communication with the site occurs through this module.
    65 
    66 The digital library back end in Figure 1 contains two collections, {\em demo}
    67 and {\em myfiles\/}, and a cluster of collection-formation services.  All
    68 functions of the digital library are called ``services.''  For example,
    69 AddDocument is a service that adds a document to a collection; ImportCollection
    70 imports into the Greenstone system all documents associated with a collection,
    71 converting them as necessary from their original form; BuildCollection builds
    72 all indexes and browsing structures that are associated with a collection;
    73 ActivateCollection makes a newly-built collection active, so that it can be
    74 seen by digital library users.  These particular services are related: they are
    75 all concerned with creating a digital library collection.  Related
    76 services may be grouped together into a  ``service cluster'': all these services are provided by
    77 the CollectionFormation ServiceCluster module in Figure 1.
    78 
    79 A collection, which as far as the digital library user is concerned is a
    80 focused group of documents with a uniform means of access, is a type of service
    81 cluster that groups a set of services that are related by the set of data they
    82 work on.  For example, the {\em demo} collection in Figure 1 contains four
    83 services. These provide text searching, metadata searching,
    84 document retrieval, and browsing services to the user. 
    85 
    86 The Web-based front end in Figure 1 centers around the
    87 Receptionist, which is the point of contact for the interface generator.  A
    88 servlet takes HTTP commands (in the form of URLs and arguments) and translates
    89 them into XML form for the Receptionist.  This is capable of executing various
    90 different Actions, each of which involves one or (usually) many calls to the
    91 digital library's MessageRouter.
    92 
    93 Figure 1 shows a very simple example of a digital library structure.
    94 In practice, there may be many digital library sites, possibly involving
    95 distributed computers.  Each site will have a structure similar to that of the back end in Figure
    96 1.  Different sites may know about each other and can gain access to each other's
    97 collections by forwarding requests.  There may also be different user
    98 interfaces to the library.  Figure 1 shows a simple web-based interface, but
    99 other interfaces may exist, ranging from applets that display documents in
    100 different ways to alert services that note when new information becomes
    101 available in one of the collections and formulate email to users.  Although in
    102 the simplest case the front and back ends are compiled together into
    103 one executable process, in general different MessageRouters will communicate
    104 amongst themselves, and with Receptionists, using a protocol.
    105 
    106 The following subsections elaborate on this structure.
    107 
    108 \subsection{Modular structure}
    109 
    110 Greenstone3 is made up of independent modules that communicate via a single
    111 method call:
    112 \begin{quote}
    113     XMLout =  process(XMLin);
    114 \end{quote}
    115 Both input and output are expressed in XML.  This decision shifts attention
    116 from the design of an Applications Programming Interface (API) to the design of XML
    117 forms that encode the equivalent information.  The advantage is modularization:
    118 the XML specifications can be modified locally and communication will proceed
    119 effectively according to the new scheme provided only that all affected modules
    120 are altered appropriately.  Conversely, if an API is changed then all modules
    121 usually have to be recompiled to reflect the update.
    122 
    123 Modules are thought of as ``agents'' that have, or have access to, certain
    124 functionality.  A module may respond to a message by processing it itself, or
    125 forwarding it to another module, or a combination of the two.\footnote{Francois
    126 used some nice words to tie up modules and agents.  Kathy, can you remember
    127 what he was saying?}
    128 
    129 If modules are on different computers, the communication will take place using
    130 SOAP (Simple Object Access Protocol) (although other protocols are possible).  Figure 2 shows a Greenstone system where the local site has no collections or services of its own. Instead, the MessageRouter (1 in the diagram) talks to two other sites using SOAP.  The local MR  has two Communicator modules
    131  that enable it to make SOAP requests; the two remote sites each have a SOAP server which
    132 listens for such requests and fulfills them. 
    133 
    134 A potential downside of expressing the programming interface structure in XML
    135 is execution efficiency.  The input and output XMLin and XMLout in the above
    136 statement can be either a serialized String representation, which is the
    137 primary representation method, or a Document Object Model (DOM), which is a
    138 tree that represents the parsed XML string.  Two versions of the processing
    139 operation will be provided, string to string and tree to tree.
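
The two processing versions described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the names ModuleSketch and EchoModule are hypothetical and are not taken from the Greenstone3 source. Here the string version is derived from the tree version by parsing and serializing, which is one possible arrangement rather than the one the system necessarily uses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical module contract; names are illustrative, not the Greenstone3 API. */
interface ModuleSketch {
    /** Tree-to-tree version: works on a parsed DOM element. */
    Element process(Element xmlIn);

    /** String-to-string version: parse, delegate to the tree version, serialize. */
    default String process(String xmlIn) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlIn)));
            Element out = process(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(out), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process message", e);
        }
    }
}

/** Toy module (an assumption for the demo) that answers any message with an empty response. */
class EchoModule implements ModuleSketch {
    @Override
    public Element process(Element xmlIn) {
        Document doc = xmlIn.getOwnerDocument();
        Element message = doc.createElement("message");
        message.appendChild(doc.createElement("response"));
        return message;
    }

    public static void main(String[] args) {
        System.out.println(new EchoModule().process("<message><request/></message>"));
    }
}
```

Modules in the same process can pass DOM trees directly and skip the parse and serialize steps, which is where the efficiency concern mentioned above is addressed.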
    140 
    141 \subsection{Dynamic configurability}
    142 
    143 Digital libraries need to be dynamic.  It must be possible to routinely add new
    144 collections, or new user interfaces, or completely new kinds of service, to a
    145 running digital library without having to bring it down and restart it.
    146 
    147 The digital library back end is built around a central MessageRouter module
    148 that provides a way of gaining access to any collection or service.  When new
    149 collections come up, they register with the MessageRouter in order to make
    150 themselves visible throughout the system.  When users make requests, they are
    151 passed to the MessageRouter, which forwards them to the appropriate module for
    152 processing.  Requests are synchronous; the requesting process is blocked until
    153 the result is received.  (An asynchronous-to-synchronous buffering module is
    154 envisaged if this should become necessary for certain purposes.)
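
A minimal sketch of the registration and forwarding behaviour described above, assuming a string-to-string message interface. The class RouterSketch, its method names, and the error response format are all hypothetical; they are not taken from the Greenstone3 source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical router: modules register by name and requests are forwarded synchronously. */
class RouterSketch {
    // A concurrent map lets modules come and go while requests are being served.
    private final Map<String, UnaryOperator<String>> modules = new ConcurrentHashMap<>();

    /** Called by a collection or service when it comes up. */
    void register(String name, UnaryOperator<String> module) {
        modules.put(name, module);
    }

    /** Called when a module is taken down. */
    void unregister(String name) {
        modules.remove(name);
    }

    /** Blocks until the addressed module has produced its response. */
    String process(String to, String xmlIn) {
        UnaryOperator<String> module = modules.get(to);
        if (module == null) {
            // The error format is an assumption made for this sketch.
            return "<message><response type='error'>unknown module</response></message>";
        }
        return module.apply(xmlIn);
    }

    public static void main(String[] args) {
        RouterSketch router = new RouterSketch();
        router.register("demo", in -> "<message><response from='demo'/></message>");
        System.out.println(router.process("demo", "<message><request/></message>"));
    }
}
```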
    155 
    156 The most basic request, which any module will respond to, is
    157 ``describe-yourself''.  (In fact, the ability to respond to
    158 ``describe-yourself'' is really what defines a ``module.'')  The MessageRouter
    159 responds with an XML document which typically specifies some collections that
    160 are available locally, and some other Greenstone sites (their own collections
    161 may also be listed).  Its response may also describe service clusters or single  services provided by the
    162 MessageRouter itself, for example, cross-collection searching, or collection formation capability.
    163 
    164 A plain ``describe-yourself'' request will return a complete description.  A
    165 ``describe-yourself'' message sent to a collection returns collection-specific
    166 metadata, and a list of services that the collection provides.  It is possible
    167 to add a qualifier to the request which asks for a particular facet of the
    168 complete description instead, thereby achieving communication economy.
    169 
    170 Using these facilities, it is possible for a user interface module to ask a
    171 MessageRouter for a list of local collections, remote sites and their
    172 collections, and for each collection a list of the services available.  The XML
    173 documents containing this information could be amalgamated and presented to the
    174 user as an XML form that actually implements the services that are represented.
    175 
    176 \subsection{Interacting with the user}
    177 
    178 The MessageRouter, together with the services it provides access to, forms the
    179 core of the Greenstone digital library system.  Clients could be written that
    180 call in a variety of ways upon the services that Greenstone provides.
    181 
    182 A very important form of client is one that implements user interaction with
    183 Greenstone3 through a Web browser, which is the standard way of communicating
    184 with the digital library system.  The user makes a request by clicking a URL or
    185 submitting a Web form.  This request is intercepted by a servlet which invokes
    186 a Greenstone module called a Receptionist.  The Receptionist represents the
    187 user's normal point of contact with the system: based on the input, it creates XML messages which it passes into the Greenstone system through the
    188 MessageRouter. The responses are gathered together and translated into the form of
    189 a Web page for presentation to the user.
    190 
    191 The Receptionist receives from the servlet an XML representation of
    192 the arguments in the URL (``CGI arguments'', though we do not use the
    193 CGI mechanism).  One of these arguments is the Action, which, along
    194 with the Subaction argument determines what information must be
    195 requested from the MessageRouter to fulfill the request.  Table 1 shows
    196 a list of the actions that are understood by Greenstone2; Greenstone
    197 3 will have similar functionality.
    198 
    199 The Receptionist includes a Java class for each action.  These classes do not
    200 know anything about the collections, services, or other sites that are
    201 available in the Greenstone system.  Instead, they decode the other arguments in
    202 the URL to determine what information must be requested, and send it through
    203 the MessageRouter.  A single action often generates several different requests:
    204 for example, to generate the traditional Greenstone home page, the PageAction must query the MessageRouter for a list of its collections. Then, for each collection, collection metadata such as the collection image and collection Title must be retrieved.  The XML results returned by these requests are put together
    205 into one large XML tree, to which is appended system configuration and
    206 translation information.  The resulting XML structure is converted, using XSLT
    207 files appropriate to that particular action, to an HTML page for presentation
    208 to the user.
    209 
    210 Other types of client which do not use HTML may interact with the Receptionist. An output type specifier is included in each request to the Receptionist: using XSLT modes, different output formats may be generated such as XML or WML.
    211 
    212 \subsection{Digital library services}
    213 
    214 A digital library consists of several different ``collections,'' each
    215 represented by a collection module.  For each collection, a set of ``services''
    216 is provided.  Examples of services are
    217 \begin{bulletedlist}
    218      \item  full-text query
    219      \item  fielded query
    220      \item  music query
    221      \item document retrieval
    222      \item  metadata retrieval
    223      \item browsing classifier
    224      \item  hierarchical phrase browsing.
    225 \end{bulletedlist}
    226 
    227 Services are provided by modules called ``service modules'', which each
    228 implement a group of related operations.  For example, one service is MGPPGDBM,
    229 which implements four operations: full-text and fielded queries, and document
    230 and metadata retrieval.  MGPPGDBM operates on collections that are in the
    231 format of standard Greenstone2 collections, and provides these four services
    232 for such collections.  Another service is GSDL2Classifier, which provides
    233 operations that correspond to a browsing classifier.  Together these two
    234 classes allow a Greenstone2 collection to be used, completely unchanged, within
    235 Greenstone3 (provided an appropriate configuration file is created).
    236 
    237 Service modules are self-describing modules: that is, they respond to the
    238 ``describe-yourself'' message.  As noted above, collections are also
    239 self-describing modules: they respond to ``describe-yourself'' by returning
    240 collection-specific metadata, and a list of services that the collection
    241 provides---which can then be queried individually using ``describe-yourself''
    242 messages.  Thus a collection may be viewed as a cluster of services.
    243 Greenstone3 uses service clusters to represent other things than collections.
    244 For example, all the operations associated with building a particular kind of
    245 collection may be grouped together into a service cluster.
    246 
    247 \subsection{Data in the system}\footnote{I haven't discussed this with anyone yet, however I like it :-)  actually now Rob likes it too. NOTE: if we keep this document-resource idea, need to change all the resource refs in this paper to document!!}
    248 
    249 Data in the system consists of 'documents' and 'resources'. A document is an XML document\footnote{whats a better word for a generic document, not a greenstone document ??} that exists independently in the system. You could delete all other documents and it would still be valid (although links to other documents may become invalid). A resource is something that is associated with a document, and doesn't exist outside of that document's context.
    250 
    251 For example, a book that has been added to a collection will be represented by an XML document. The document contains metadata associated with the book, for example Title, Source Author etc. It has xlinks to associated resources or other documents. Any images in the book would be resources belonging to that document. The original representation of the book, eg the pdf file, would also be a resource of the document. There may be associated documents, such as the same book but translated into a different language. This translation is a document in its own right, but is linked to by the original document.
    252 
    253 Documents are indexed, but resources are not. This means that documents can be discovered through searching and browsing. Resources, on the other hand, can only be found via the containing document. Both can be retrieved. Documents are identified by a system id eg HASHxxx. Resources are identified by a unique identifier. This is likely to be a file path---this could be appended to an HTTP address to enable retrieval of the document via HTTP, or could be used as an identifier to request the resource from the site via XML messages.
    254 
    255 The content of the document need not be stored with the document---it may live in the compressed data files. The documents themselves may be stored compressed or in a database. Currently, in Greenstone2, the equivalent information is stored in a gdbm database.
    256 
    257 Documents don't just have to be books and text files. A collection could contain images---each image would have a document, and the content of the document would point to the image file.
    258 A document could be a sequence of other documents eg a powerpoint show of individual slides.
    259 A classifier is a document - a hierarchical ordering by metadata of a set of documents into lists or categories.
    260 
    261 \subsection{Getting off the ground}
    262 
    263 We have described in broad terms the basic components of Greenstone3.  It is a
    264 highly configurable system that allows new modules to be added while it is
    265 running---dynamic configuration.  However, in order to get it off the ground,
    266 configuration files are used to define an initial configuration.
    267 
    268 A single computer system may have several different Greenstone systems
    269 or ``sites'' running simultaneously, each of which typically serve
    270 different collections.  For example, a single user may have a public
    271 Greenstone site which offers collections to external users over the
    272 web, as well as a private site that offers personal collections (like
    273 email) that cannot be accessed externally.  Or in a multiuser research
    274 environment, each user may have one or more sites reflecting
    275 Greenstone collections, or additional facilities, in different stages
    276 of development.
    277 
    278 The computer system will have just one Greenstone directory structure,
    279 though this structure may support several different sites.  Each site
    280 has a home directory in the Greenstone structure, inside which is a
    281 ``collect'' directory that contains the collections offered by that site.
    282 
    283 The sites can be ``served'' in different ways. A servlet can be started up, which invokes a
    284 Receptionist and a MessageRouter.   One of the arguments to
    285 the servlet is the site's home directory. This configuration has a client and server compiled together. The information in this site can then be accessed via the web. Alternatively, a SOAPServer could be started up, which just invokes a MessageRouter. Other Greenstone systems or clients can communicate with this site via SOAP. Greenstone is not limited to SOAP communication---any protocol which can transmit XML may be used to communicate between sites, or between clients and servers.
    286 
    287 For each site there is a configuration file that specifies the URI for the site
    288 (localSiteName), and a list of external sites that the site connects to.  It
    289 may also specify any services or service clusters provided by the site that are not connected with
    290 a collection---for example, a language translation service.  Collections are
    291 not specified in this configuration file; instead they are determined by the
    292 contents of the ``collect'' directory for the site.  This allows new
    293 collections to be added dynamically by placing them in that directory.
     52This section is covered by the paper ``An agent based architecture for dynamic digital library construction and configuration''. Either cut and paste it in here, link to the text, or have two separate documents. We don't want to have to maintain two separate versions of the same thing.
    29453
    29554\section{Greenstone Implementation}
    29655\label{sec:impl}
    297 
    298 
    299 \subsection{classes etc??}
    300 
    301 In general, a Greenstone module corresponds to a Java class. The Receptionist, Action, MessageRouter, Collection, and ServiceCluster modules are all Java classes. The exception is the service. Many services share operations, for example, access to the MGPP index files. For this reason, several services may be implemented by a single class---we call this a ServicesImpl class. For example, MGPPGDBMServices is a subclass of ServicesImpl which provides services that use the MGPP files and GDBM databases of a Greenstone 2 collection: TextQuery, DocumentRetrieve and MetadataRetrieve. MGGDBMServices provides the same services, but uses MG and GDBM files from a Greenstone 2 collection.
    30256
    30357\subsection{Configuring Greenstone}
     
    31266instructions on how the collection is to be built.  The second is produced by
    31367the build-time process and includes any metadata that can be determined
    314 automatically.\footnote{Currently it is produced by hand, because collections must
    315 be built with Greenstone2.}
     68automatically.\footnote{Currently only the buildConfig.xml file is used---collections are built using gs2-style building and therefore use the old collect.cfg.}
    31669
    31770\subsubsection{Site configuration file}
     
    31972The file {\em siteConfig.xml} specifies the URI for the site ({\em
    32073localSiteName\/}), any services or service clusters provided by the site that are not connected
    321 with a particular collection (for example, translation services), and a list of
     74with a particular collection (for example, translation services, or collection building), and a list of
    32275known external sites to connect to.  Collections are not specified in the site
    32376configuration file, instead they are determined by the contents of the site's
     
    32578
    32679Here is a configuration file for a rudimentary site with no site-wide services,
    327 which does not connect to any external sites.
     80which does not connect to any external sites.\footnote{should the code be tolerant of missing elements? or do we require empty elements?}
    32881\begin{quote}\begin{footnotesize}\begin{verbatim}
    32982<config>
    33083  <localSiteName value="org.greenstone.localsite"/>
    33184  <serviceClusterList/>
    332   <servicesImplList/>
     85  <serviceRackList/>
    33386  <siteList/>
    33487</config>
    33588\end{verbatim}\end{footnotesize}\end{quote}
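
As an illustration of how such a file might be read, the sketch below parses a site configuration with the standard Java DOM API. The class SiteConfigSketch and its methods are hypothetical. The sketch is written to tolerate missing elements, which is an assumption and not a statement about what the real code requires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a siteConfig.xml document held in a string. */
class SiteConfigSketch {
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Value of the localSiteName element, or an empty string if it is missing. */
    static String localSiteName(Element config) {
        NodeList nodes = config.getElementsByTagName("localSiteName");
        return nodes.getLength() == 0 ? "" : ((Element) nodes.item(0)).getAttribute("value");
    }

    /** Names of the external sites listed in the file (empty if there are none). */
    static List<String> siteNames(Element config) {
        List<String> names = new ArrayList<>();
        NodeList sites = config.getElementsByTagName("site");
        for (int i = 0; i < sites.getLength(); i++) {
            names.add(((Element) sites.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        Element config = parse("<config><localSiteName value='org.greenstone.localsite'/>"
                + "<serviceClusterList/><serviceRackList/><siteList/></config>");
        System.out.println(localSiteName(config) + " " + siteNames(config));
    }
}
```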
    336 The following configuration file is for a site with one site-wide service, a
    337 translation service.  It connects to the previous site using SOAP.
      89The following configuration file is for a site with one site-wide service cluster---a collection building cluster.  It also connects to the previous site using SOAP.
    33890\begin{quote}\begin{footnotesize}\begin{verbatim}
    33991<config>
    34092  <localSiteName value="org.greenstone.gsdl1"/>
    341   <servicesImplList>
     93  <serviceRackList/>
    34294    <servicesImpl name="TranslationServices"/>
    34395  </servicesImplList>
    344   <serviceClusterList/>
     96  <serviceClusterList> 
     97    <serviceCluster name="build">
     98      <metadataList>
     99        <metadata name="Title">Collection builder</metadata>
     100        <metadata name="Description">Builds collections in a gsdl2-style manner</metadata>
     101      </metadataList>
     102      <serviceRackList>
     103        <serviceRack name="GS2Construct"/>
     104      </serviceRackList>
     105    </serviceCluster>
     106  </serviceClusterList>
    345107  <siteList>
    346108    <site name="org.greenstone.localsite"
     
    351113\end{verbatim}\end{footnotesize}\end{quote}
    352114
     115These two sites are running on the same machine. For gsdl1 to talk to localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is ``http://localhost:8080/soap/servlet/rpcrouter''.
     116
    353117\subsubsection{Building configuration file}
    354118
    355 The file {\em buildConfig.xml} contains all metadata about the collection that can
     119The file {\em buildConfig.xml} contains all metadata and other information about the collection that can
    356120be determined automatically when building the collection, such as the number of
    357 documents it contains.  It also includes a list of servicesImpl classes that are
     121documents it contains.  It also includes a list of serviceRack classes that are
    358122required at runtime to provide the services that have been built into the
    359 collection.  The servicesImpl names are Java classes that are loaded
    360 dynamically at runtime. Any information inside the servicesImpl element is
     123collection.  The serviceRack names are Java classes that are loaded
     124dynamically at runtime. Any information inside the serviceRack element is
    361125specific to that service---there is no set format.  Here is an example:
    362126
    363127\begin{quote}\begin{footnotesize}\begin{verbatim}
    364 <buildConfiguration>
     128
     129<buildConfig>
    365130  <metadataList>
    366     <metadata name="iconCollection">mgppdemo.gif</metadata>
    367     <metadata name="colName">mgpp demo</metadata>
    368     <metadata name="numDocs">5</metadata>
    369     <metadata name="numSections">189</metadata>
     131    <metadata name="numDocs">11</metadata>
     132    <metadata name="colIcon">mgppdemo.gif</metadata>
     133    <metadata name="colName">Greenstone demo collection</metadata>
     134    <metadata name="colDescription">This is a demonstration collection for the Greenstone digital library software. It contains a small subset  of the Humanitarian and Development Libraries.</metadata>
    370135  </metadataList>
    371   <servicesImplList>
    372     <servicesImpl name="MGPPGDBMServices">
     136  <serviceRackList>
     137    <serviceRack name="GS2MGPPRetrieve">
     138      <defaultLevel name="Section"/>
     139    <!-- something like this should be used to advertise what metadata the collection has available to be retrieved - however, it is not used yet -->
     140      <metadataList>
     141        <element name="Title"/><element name="Subject"/><element name="Organization"/><element name="URL"/>
     142      </metadataList>
     143    </serviceRack>
     144    <serviceRack name="GS2MGPPSearch">
    373145      <defaultIndex name="tt"/>
    374146      <defaultLevel name="Section"/>
     
    380152        <index name="tt"/>
    381153        <index name="t0"/>
    382       </indexList> 
    383       <metadataList>
    384         <element name="Title"/>
    385         <element name="Subject"/>
    386         <element name="Organization"/>
    387         <element name="URL"/>
    388       </metadataList>
    389     </servicesImpl>
    390     <servicesImpl name="PhindServices"/>
    391     <servicesImpl name="GSDL2ClassifierServices">
     154      </indexList>
     155      <fieldList>
     156        <field name="TX"/><field name="SU"/><field name="TI"/>
     157      </fieldList>
     158    </serviceRack>
     159    <serviceRack name="PhindPhraseBrowse"/>
     160    <serviceRack name="GS2Browse">
    392161      <classifierList>
    393         <classifier name="CL1">
    394           <metadataList>
    395             <metadata name="Title">Subject</metadata>
    396           </metadataList>
    397         </classifier>
    398         <classifier name="CL2" >
    399           <metadataList>
    400             <metadata name="Title">Title</metadata>
    401           </metadataList>
    402         </classifier>
     162        <classifier name="CL1"><metadataList><metadata name="Title">Subject</metadata></metadataList></classifier>
     163        <classifier name="CL2" ><metadataList><metadata name="Title">Title</metadata></metadataList></classifier>
     164        <classifier name="CL4"><metadataList><metadata name="Title">Organization</metadata></metadataList></classifier>
     165        <classifier name="CL5" ><metadataList><metadata name="Title">Keyword</metadata></metadataList></classifier>
    403166      </classifierList>
    404     </servicesImpl>
    405   </servicesImplList>
     167    </serviceRack>
     168  </serviceRackList>
    406169</buildConfig>   
    407170\end{verbatim}\end{footnotesize}\end{quote}
    408 Note: because {\em collectionConfig.xml} is not used yet, the {\em iconCollection}
     171Note: because {\em collectionConfig.xml} is not used yet, the {\em colIcon}, {\em colDescription}
    409172and {\em colName} metadata elements have been specified here.
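
The sketch below shows one way the collection-level metadata and the serviceRack names could be pulled out of a buildConfig.xml document. BuildConfigSketch is a hypothetical class. It assumes that collection metadata sits in a metadataList that is a direct child of the root element, as in the example above, so that the metadataList elements nested inside classifiers are not picked up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a buildConfig.xml document held in a string. */
class BuildConfigSketch {
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Metadata from metadataList elements directly under the root only. */
    static Map<String, String> collectionMetadata(Element root) {
        Map<String, String> meta = new LinkedHashMap<>();
        for (Node list = root.getFirstChild(); list != null; list = list.getNextSibling()) {
            if (!"metadataList".equals(list.getNodeName())) continue;
            for (Node m = list.getFirstChild(); m != null; m = m.getNextSibling()) {
                if (m instanceof Element && "metadata".equals(m.getNodeName())) {
                    meta.put(((Element) m).getAttribute("name"), m.getTextContent());
                }
            }
        }
        return meta;
    }

    /** Names of all serviceRack elements, in document order. */
    static List<String> rackNames(Element root) {
        List<String> names = new ArrayList<>();
        NodeList racks = root.getElementsByTagName("serviceRack");
        for (int i = 0; i < racks.getLength(); i++) {
            names.add(((Element) racks.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = parse("<buildConfig><metadataList>"
                + "<metadata name='numDocs'>11</metadata></metadataList>"
                + "<serviceRackList><serviceRack name='GS2MGPPRetrieve'/>"
                + "<serviceRack name='GS2Browse'/></serviceRackList></buildConfig>");
        System.out.println(collectionMetadata(root) + " " + rackNames(root));
    }
}
```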
    410173
     
    431194
    432195The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This
    433 lists the ServicesImpl classes that need to be loaded, and lists any sites that need
     196lists the ServiceRack classes that need to be loaded, and lists any sites that need
    434197to be connected to.  It looks inside the {\em collect} directory which contains
    435198all the site's collections and loads up a Collection object for each valid
     
    437200
    438201The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml}
    439 files, determines the metadata, and loads ServicesImpl classes based on the
    440 names specified in {\em buildConfig.xml\/}. The {\footnotesize \verb#<ServicesImpl>#} XML element is passed to the object to be used in  configuration.\footnote{Kathy, I don't
    441 understand this sentence.}
     202files, determines the metadata, and loads ServiceRack classes based on the
     203names specified in {\em buildConfig.xml\/}. The {\footnotesize \verb#<ServiceRack>#} XML element is passed to the object to be used in  configuration.
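
Loading a class by the name given in a configuration file, and handing it its own XML element, might look roughly like this. The Rack interface, the configure method, and DemoRack are assumptions made for the sketch; the real ServiceRack classes and their package are not shown in this section, and the element is passed as a string here only to keep the example short.

```java
/** Hypothetical loader: instantiates a rack class by name and passes it its configuration. */
class RackLoaderSketch {
    /** Assumed contract for anything that can be loaded as a rack. */
    public interface Rack {
        void configure(String rackElementXml);
    }

    /** Toy implementation used only to exercise the loader. */
    public static class DemoRack implements Rack {
        String config = "";

        public DemoRack() {}

        @Override
        public void configure(String rackElementXml) {
            this.config = rackElementXml;
        }
    }

    /** Returns the configured rack, or null if the class cannot be loaded or is not a Rack. */
    static Rack load(String className, String rackElementXml) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            Rack rack = (Rack) instance;
            rack.configure(rackElementXml);
            return rack;
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null; // a real system would presumably log this and skip the rack
        }
    }

    public static void main(String[] args) {
        Rack rack = load(DemoRack.class.getName(), "<serviceRack name='DemoRack'/>");
        System.out.println(rack != null ? "loaded" : "not loaded");
    }
}
```

Because the class is resolved at run time, a new kind of service can be made available by adding its class to the classpath and naming it in a configuration file, without recompiling the modules that load it.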
    442204
    443205\section{System messages}
     
    450212All messages are enclosed in
    451213\begin{quote}\begin{footnotesize}\begin{verbatim}
    452 <message lang='xx'>
    453 \end{verbatim}\end{footnotesize}\end{quote}
     214<message>
     215\end{verbatim}\end{footnotesize}\end{quote}
     216Messages contain either {\em <request>\/} or {\em <response>\/} elements---a single message may contain multiple requests. Each {\em <request>\/} (and {\em <response>\/}?) has a language attribute, of the form ``lang='xx'''.
    454217The language attribute is used by the XSLT to determine the language currently
    455218being used by the user interface.  Virtually all messages contain text strings,
    456219and services use this attribute to return strings in the appropriate language.
    457 Requests are called {\em <request>\/}, responses are called {\em <response>\/}.
    458 A single message can hold several requests or responses.
    459 
    460 There are two different types of message, explained in the two subsections
    461 below.  The first is a simple representation of the arguments in a Greenstone
    462 URL.  It is a rudimentary message passed into the digital library system from
    463 outside. The response is a page of data, typically in HTML.  All other messages
    464 are internal Greenstone messages, and have the same basic format.\footnote{We
    465 format names in lower case with the first letter of internal words capitalized,
    466 like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message.
     220
     221There are two different styles of messaging, explained in the two subsections
     222below.  The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system.  The response is a page of data, typically in HTML.  The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message.
    467223
    468224This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system.
     
    480236
    481237\begin{quote}\begin{footnotesize}\begin{verbatim}
    482 <request type='action' action='a-arg-value' subaction='sa-arg-value'
    483          output='html'>
     238<request type='cgi' action='a-arg-value' subaction='sa-arg-value'
     239         lang='en' output='html'>
    484240  <paramList>
    485241    <param name='xx' value='yyy'/>
     
    497253Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information.  If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.
    498254
    499 The arguments used currently are shown in Table~\ref{tab:args}a.
    500 Other arguments can be specified by the particular service. For example, the
    501 TextQuery service that the MGPPGDBMService module provides uses the additional
    502 arguments shown in Table~\ref{tab:args}b.
     255The cgi arguments used currently are shown in Table~\ref{tab:args}.
     256Other arguments can be specified by particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.
    503257
    504258\begin{table}
    505259\center{\footnotesize
    506260\begin{tabular}{llll}
    507 \cline{2-4}
    508 (a) & \bf Action & \bf Argument & \bf Typical value \\
    509 \cline{2-4}
    510 & p (page) & sa & home, about \\
    511 & & c (collection) & demo, mgppdemo, ... \\
    512 & q (query) & sa & text, field, music\\
    513 & & c  & demo, mgppdemo, ... \\
    514 & & q (query) & the \\
    515 & r (resource) & sa & (not used yet) \\
    516 & & c  & demo, mgppdemo, ... \\
    517 & & r (resource) & HASH01af33...\\
    518 & a (applet) & sa & d (display), r (request) \\
    519 & & c  & demo, mgppdemo, ... \\
    520 \cline{2-4}\\
    521 \cline{2-4}
    522 (b) & \bf Argument & \bf Values \\
    523 \cline{2-4}
    524 & s (stem) & 0, 1 \\
    525 & k (casefold) & 0, 1 \\
    526 & mm (matchMode) & all, some \\
    527 & sb (sortBy) & rank, natural \\
    528 & ql (queryLevel) & \multicolumn{2}{l}{Document, Section, Paragraph} \\
    529 & md (matchDocs) & 10, 20, ... \\
    530 \cline{2-4}
     261\hline
     262\bf Argument & \bf Meaning &\bf Typical values \\
     263\hline
     264a & action & a (applet), q (query), b (browse), p (page), pr (process) \\
     265sa & subaction & home, about (page action)\\
     266c & collection or service cluster & demo, build \\
     267s & service name & TextQuery, ImportCollection \\
     268rt & request type & d (display), r (request), s (status) \\
     269ro & request only & 0 or 1 - if set to one, the request is carried out but no processing of the results is done \\
     270o & output type & xml, html, wml \\
     271l & language & en, fr, zh \\
     272d & document id & HASHxxx \\
     273r & resource id & ???\\
     274id & process handle & an integer identifying a particular process request \\
     275\hline
    531276\end{tabular}}
    532277\label{tab:args}
    533 \caption{Arguments that can appear in a Greenstone URL: (a) generic;
    534 (b) additional arguments for the TextQuery service}
     278\caption{Generic arguments that can appear in a Greenstone URL}
    535279\end{table}
    536280
    537281Here is an example message that retrieves the home page in French:
    538282\begin{quote}\begin{footnotesize}\begin{verbatim}
    539 <message lang='fr'>
    540   <request type='action' action='p' subaction='home' output='html'/>
     283<message>
     284  <request lang='fr' type='cgi' action='p' subaction='home' output='html'/>
    541285</message>
    542286\end{verbatim}\end{footnotesize}\end{quote}
     
    544288This message represents a text query:
    545289\begin{quote}\begin{footnotesize}\begin{verbatim}
    546 <message lang='en'>
    547   <request type='action' page='q/text' output='html'>
     290<message>
     291  <request  lang='en' type='cgi' action='q' output='html'>
    548292  <paramList>
    549     <param name='k' value='0'/>
    550     <param name='s' value='1'/>
    551     <param name='md' value='10'/>
     293    <param name='s' value='TextQuery'/>
    552294    <param name='c' value='demo'/>
    553     <param name='q' value='the'/>
     295    <param name='rt' value='r'/>
     296    <!-- the rest are the service specific params -->
     297    <param name='ca' value='0'/> <!-- casefold -->
     298    <param name='st' value='1'/> <!-- stem -->
     299    <param name='m' value='10'/> <!-- maxdocs -->
     300    <param name='q' value='snail'/> <!-- query string -->
    554301  </paramList>
    555302</message>
    556303\end{verbatim}\end{footnotesize}\end{quote}
    557304
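The mapping from URL arguments to a cgi-type request can be sketched in Python as follows. This is only an illustration, not the Receptionist's actual code: the function name build_cgi_request is hypothetical, and the split between arguments that become attributes of the request (a, sa, l, o) and those that become param elements is inferred from the two example messages above.

```python
import xml.etree.ElementTree as ET

# Assumption: these short URL arguments map onto <request> attributes,
# as in the examples above; all other arguments become <param> elements.
ATTRIBUTE_ARGS = {"a": "action", "sa": "subaction", "l": "lang", "o": "output"}


def build_cgi_request(args):
    """Build a <message> holding one cgi-type <request> from URL arguments."""
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {"type": "cgi"})
    param_list = None
    for name, value in args.items():
        if name in ATTRIBUTE_ARGS:
            request.set(ATTRIBUTE_ARGS[name], value)
        else:
            # Create the <paramList> lazily so that requests without
            # extra arguments stay as a single empty element.
            if param_list is None:
                param_list = ET.SubElement(request, "paramList")
            ET.SubElement(param_list, "param", {"name": name, "value": value})
    return message


if __name__ == "__main__":
    url_args = {"a": "q", "l": "en", "o": "html",
                "s": "TextQuery", "c": "demo", "rt": "r", "q": "snail"}
    print(ET.tostring(build_cgi_request(url_args), encoding="unicode"))
```

Building the message with an XML library rather than by string concatenation means the request element is always closed and attribute values are escaped.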
     305**** UP TO HERE **************
    558306\subsubsection{Module to module messages}
    559307
     
    571319The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient.
    572320\begin{quote}\begin{footnotesize}\begin{verbatim}
    573 <message lang='en'>
    574   <request type='describe' to=''/>
     321<message>
     322  <request lang='en' type='describe' to=''/>
    575323</message>
    576324\end{verbatim}\end{footnotesize}\end{quote}
     
    578326An example response from a MessageRouter might look like this:
    579327\begin{quote}\begin{footnotesize}\begin{verbatim}
    580 <message lang='en'>
    581   <response type='describe'>
     328<message>
     329  <response lang='en' type='describe'>
    582330    <serviceList>
    583331      <service name='CrossCollectionSearch' type='query' />
     
    625373</message>
    626374
    627 <message lang='en'>
    628   <response type='describe' from='demo' >
     375<message>
     376  <response lang='en' type='describe' from='demo' >
    629377    <collection name='demo'>
    630378      <serviceList>
     
    649397Parameters have the following format:
    650398\begin{quote}\begin{footnotesize}\begin{verbatim}
    651 <param name='xxx' type='integer|boolean|string|input' default='yyy'/>
    652 <param name='xxx' type='enum' default='aa'/>
     399<param name='xxx' type='integer|boolean|string' default='yyy'/>
     400<param name='xxx' type='enum_single|enum_multi' default='aa'>
    653401  <option name='aa'/><option name='bb'/>...
    654402</param>
     403<param name='xxx' type='multi' occurs='4'>
     404 <param .../>
     405 <param .../>
     406</param>
    655407\end{verbatim}\end{footnotesize}\end{quote}
    656408If no default is specified, the parameter is assumed to be mandatory.
    657 Here are three examples of parameters:
     409Here are some examples of parameters:
    658410\begin{quote}\begin{footnotesize}\begin{verbatim}
    659411<param name='Case' type='boolean' default='0'/>
     
    666418  <option name='stx'/>
    667419</param>
     420
     421<!-- this one is for the text box and field list for the simple field query-->
     422<param name='simple' type='multi' occurs='4'>
     423  <param name='fqv' type='string'/>
     424  <param name='fqf' type='enum_single'>
     425    <option name='TI'/><option name='AU'/><option name='OR'/>
     426  </param>
     427</param>
     428
    668429\end{verbatim}\end{footnotesize}\end{quote}
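The rule that a parameter without a default is mandatory can be illustrated with a small Python sketch that fills in defaults from a parameter description and rejects incomplete requests. The function name resolve_params is hypothetical, the maxDocs default of 50 is made up for the example, and treating boolean values as '0' or '1' is an assumption based on the default shown above; nested multi parameters are not handled.

```python
import xml.etree.ElementTree as ET

# Illustrative description in the format above; values are examples only.
DESCRIPTION = """
<paramList>
  <param name='case' type='boolean' default='0'/>
  <param name='maxDocs' type='integer' default='50'/>
  <param name='index' type='enum_single' default='dtx'>
    <option name='dtx'/><option name='stt'/><option name='stx'/>
  </param>
  <param name='query' type='string'/>
</paramList>
"""


def resolve_params(description, supplied):
    """Merge supplied values with defaults; raise ValueError on bad input."""
    resolved = {}
    for param in description.findall("param"):
        name = param.get("name")
        value = supplied.get(name, param.get("default"))
        if value is None:
            # No value supplied and no default: the parameter is mandatory.
            raise ValueError("missing mandatory parameter: " + name)
        ptype = param.get("type")
        if ptype == "integer":
            int(value)  # raises ValueError if the value is not an integer
        elif ptype == "boolean" and value not in ("0", "1"):
            raise ValueError("bad boolean for " + name + ": " + value)
        elif ptype == "enum_single":
            options = [opt.get("name") for opt in param.findall("option")]
            if value not in options:
                raise ValueError("bad option for " + name + ": " + value)
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    description = ET.fromstring(DESCRIPTION)
    print(resolve_params(description, {"query": "snail", "index": "stx"}))
```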
    669430Here is a message, along with a sample response.
    670431\begin{quote}\begin{footnotesize}\begin{verbatim}
    671 <message lang='en'>
    672   <request type='describe' to='demo/TextQuery'/>
    673 </message>
    674 
    675 <message lang='en'>
    676   <response type='describe' from='demo/TextQuery' >
     432<message>
     433  <request lang='en'  type='describe' to='demo/TextQuery'/>
     434</message>
     435
     436<message>
     437  <response lang='en' type='describe' from='demo/TextQuery' >
    677438    <service name='TextQuery' type='query'>
    678439    <paramList>
     
    704465
    705466<message><request type='configure' to=''>
    706 <configure action='activate' type='servicesImpl'
     467<configure action='activate' type='serviceRack'
    707468           name='TranslationServices'/>
    708469</request></message>
    709470\end{verbatim}\end{footnotesize}\end{quote}
    710471
    711 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above.  Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the servicesImpl class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new servicesImpl object. As for collections, if one already exists, it is deactivated first.
     472The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above.  Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first.
    712473
    713474The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is:
     
    721482Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also.
    722483
    723 The main type of requests in the system are for services. There are different types of services: query, build\footnote{need new name?}, transform, enrich, extract, accrete. The two most common ones are build and query. Build is for collection formation, query is for the typical use of those collections---querying, browsing, retrieving documents. The other types of service generally enhance the functionality of the first two. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
    724 
    725 'Query' requests are the most used requests in the system. They are requests for data of some kind, for example, a list of the documents matching a certain criteria, the Title and Author metadata for some specified documents, the text for a specified document, and so on. Each request has a content, and some parameters that specify modifications to the way the query is carried out. So the basic form of a query request is as follows:
    726 
    727 \begin{quote}\begin{footnotesize}\begin{verbatim}
    728 <message lang='en'>
    729   <request type='query' to='demo/TextQuery'>
     484The main type of request in the system is for services. There are different types of services: query, browse, retrieve, process, applet. Query services do some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse services are for browsing lists or hierarchies of documents. Process services are those where the request is for a command to be run. A status code is returned immediately, and if the command has not finished, an update of the status can be requested. Applet services are those that run an applet.
     485
     486Other possibilities include transform, enrich, extract, and accrete. These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form.
     487
     488The basic structure of a service request is as follows:
     489\begin{quote}\begin{footnotesize}\begin{verbatim}
     490<message>
     491  <request lang='en'  type='query' to='demo/TextQuery'>
    730492    <paramList/>
    731493    <content/>
     
    734496\end{verbatim}\end{footnotesize}\end{quote}
    735497
    736 The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request. The value of the parameter can be an attribute, or the content of the parameter.
    737 Attributes can be used for simple strings.
     498The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request.
    738499
    739500\begin{quote}\begin{footnotesize}\begin{verbatim}
     
    742503<param name='index' value='dtx'/>
    743504\end{verbatim}\end{footnotesize}\end{quote}
    744 or
    745 \begin{quote}\begin{footnotesize}\begin{verbatim}
    746 <param name='case'>1</param>
    747 <param name='maxDocs'>34</param>
    748 <param name='index'>dtx</param>
    749 \end{verbatim}\end{footnotesize}\end{quote}
    750 
    751 The content of the query is the actual query itself---for a text query, this is the query string. For an image or music query, it would be the image file or music clip. For document retrieval, the identifier of the document is the content.
    752 
    753 Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.
     505
     506Some requests have a content---for document retrieval, the content is the list of documents to retrieve. For metadata retrieval, teh content is the list of documents, and a list of metadata to retrieve for each document.
     507
     508Responses vary depending on the type of request.
     509Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.}
    754510
    755511The following shows some example query requests and their responses.
     
    757513Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order:
    758514\begin{quote}\begin{footnotesize}\begin{verbatim}
    759 <message lang='en'>
    760   <request to="mgppdemo/TextQuery" type="query">
     515<message>
     516  <request lang='en'  to="mgppdemo/TextQuery" type="query">
    761517    <paramList>
    762518      <param name="maxDocs" value="10"/>
     
    774530
    775531\begin{quote}\begin{footnotesize}\begin{verbatim}
    776 <message lang='en'>
    777   <response from="mgppdemo/TextQuery" type="query">
     532<message>
     533  <response lang='en' from="mgppdemo/TextQuery" type="query">
    778534    <content>
    779       <resourceList>
    780         <resource name="HASH010f073f22033181e206d3b7"/>
    781         <resource name="HASH010f073f22033181e206d3b7.2"/>
    782         <resource name="HASHac0a04dd14571c60d7fbfd"/>
    783       </resourceList>
     535      <documentList>
     536        <document name="HASH010f073f22033181e206d3b7"/>
     537        <document name="HASH010f073f22033181e206d3b7.2"/>
     538        <document name="HASHac0a04dd14571c60d7fbfd"/>
     539      </documentList>
    784540    </content>
    785541  </response>
     
    789545Give me the Title metadata for these documents:
    790546\begin{quote}\begin{footnotesize}\begin{verbatim}
    791 <message lang='en'>
    792   <request to="mgppdemo/MetadataRetrieve" type="query">
     547<message>
     548  <request lang='en'  to="mgppdemo/MetadataRetrieve" type="retrieve">
    793549    <content>
    794       <resourceList>
    795         <resource name="HASH010f073f22033181e206d3b7"/>
    796         <resource name="HASH010f073f22033181e206d3b7.2"/>
    797         <resource name="HASHac0a04dd14571c60d7fbfd"/>
    798       </resourceList>
     550      <documentList>
     551        <document name="HASH010f073f22033181e206d3b7"/>
     552        <document name="HASH010f073f22033181e206d3b7.2"/>
     553        <document name="HASHac0a04dd14571c60d7fbfd"/>
     554      </documentList>
    799555      <metadataList>
    800556        <metadata name="Title"/>
     
    806562
    807563\begin{quote}\begin{footnotesize}\begin{verbatim}
    808 <message lang='en'>
    809   <response from="mgppdemo/MetadataRetrieve" type="query">
     564<message>
     565  <response lang='en' from="mgppdemo/MetadataRetrieve" type="retrieve">
    810566    <content>
    811       <resourceList>
    812         <resource name="HASH010f073f22033181e206d3b7">
     567      <documentList>
     568        <document name="HASH010f073f22033181e206d3b7">
    813569          <metadataList>
    814570            <metadata name="Title">Farming snails 1:
     
    816572            </metadata>
    817573          </metadataList>
    818         </resource>
    819         <resource name="HASH010f073f22033181e206d3b7.2">
     574        </document>
     575        <document name="HASH010f073f22033181e206d3b7.2">
    820576          <metadataList>
    821577            <metadata name="Title">Learning about snails</metadata>
    822578          </metadataList>
    823         </resource>
    824         <resource name="HASHac0a04dd14571c60d7fbfd">
     579        </document>
     580        <document name="HASHac0a04dd14571c60d7fbfd">
    825581          <metadataList>
    826582            <metadata name="Title">Farming snails 2:
     
    828584            </metadata>
    829585          </metadataList>
    830         </resource>
    831       </resourceList>
     586        </document>
     587      </documentList>
    832588    </content>
    833589  </response>
     
    837593Give me the text for this document:
    838594\begin{quote}\begin{footnotesize}\begin{verbatim}
    839 <message lang='en'>
    840   <request  to="mgppdemo/ResourceRetrieve" type="query">
     595<message>
     596  <request lang='en'   to="mgppdemo/DocumentRetrieve" type="retrieve">
    841597    <content>
    842       <resourceList>
    843         <resource name="HASH010f073f22033181e206d3b7.2"/>
    844       </resourceList>
     598      <documentList>
     599        <document name="HASH010f073f22033181e206d3b7.2"/>
     600      </documentList>
    845601    </content>
    846602  </request>
     
    849605
    850606\begin{quote}\begin{footnotesize}\begin{verbatim}
    851 <message lang='en'>
    852   <response from="mgppdemo/ResourceRetrieve" type="query">
     607<message>
     608  <response lang='en' from="mgppdemo/DocumentRetrieve" type="retrieve">
    853609    <content>
    854       <resource name="HASH010f073f22033181e206d3b7.2">
     610      <document name="HASH010f073f22033181e206d3b7.2">
    855611        <content>
    856612&lt;/B&gt;&lt;P ALIGN=&quot;JUSTIFY&quot;&gt;&lt;/P&gt;
     
    863619&lt;/P&gt;....
    864620        </content>
    865       </resource>
     621      </document>
    866622    </content>
    867623  </response>
     
    876632
    877633\begin{quote}\begin{footnotesize}\begin{verbatim}
    878 <message lang='en'>
    879   <request type='build' to='build/NewCollection'>
     634<message>
     635  <request lang='en'  type='process' to='build/NewCollection'>
    880636    <paramList>
    881637      <param name='creator' value='[email protected]'/>
     
    886642</message>
    887643
    888 <message lang='en'>
    889   <request type='build' to='build/ImportCollection'>
     644<message>
     645  <request lang='en'  type='process' to='build/ImportCollection'>
    890646    <paramList>
    891647      <param name='collection' value='demo'/>
     
    907663<page>
    908664 <config/>
    909  <translate/>
     665 <display/>
    910666 <request/>
    911667 <response/>
     
    913669\end{verbatim}\end{footnotesize}\end{quote}
    914670
    915 There are four main elements in the page: config, translate, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters  can be preset to their previous values, for example, the query options on the query form. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Translate contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.
    916 
    917 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and translate information, and the xslt files.
     671There are four main elements in the page: config, display, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{This should be saved instead in some sort of state saving: if you leave a page and go back, you want your parameters to be the same as well.} The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by the XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (e.g.\ library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.
     672
     673The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and display information, and the xslt files.
    918674
    919675\subsubsection{Page action}
     
    926682\subsubsection{Query action}
    927683
    928 Currently, only text query has been implemented.
    929 For each page, the service description is requested from the TextQuery service  or the current collection (via a describe request).  This is done every time the query page is
    930 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If there is no query
    931 string specified in the URL, only this information is needed---the request was for the blank query page.
    932 If there is a query string specified, i.e. the user has entered a query, a query request to the TextQuery service is sent. This has the query string as content, and all the parameters from the URL in the parameter list. A list of document identifiers
     684Three query services have been implemented: TextQuery, SimpleFieldQuery, and AdvancedFieldQuery. These are all handled in the same way by the query action.
     685For each page, the service description is requested from the service of the current collection (via a describe request).  This is done every time the query page is
     686displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request is sent to the selected query service. This has all the parameters from the URL put into the parameter list. A list of document identifiers
    933687is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of
    934 documents, with a request for their {\em Title} metadata.  The result is
    935 transformed using {\em textquery.xsl\/}.
     688documents, with a request for their {\em Title} metadata.  The service description and query result are combined into a page of xml, which is
     689transformed using {\em basicquery.xsl\/} to produce the html page.
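The follow-up step of the query action, turning the document list of a query response into a MetadataRetrieve request, might look like the following Python sketch. The function name build_metadata_request is hypothetical; the element and attribute names are taken from the example messages earlier in this section.

```python
import xml.etree.ElementTree as ET

# A query response in the format shown earlier in this section.
QUERY_RESPONSE = """
<message>
  <response lang='en' from='mgppdemo/TextQuery' type='query'>
    <content>
      <documentList>
        <document name='HASH010f073f22033181e206d3b7'/>
        <document name='HASH010f073f22033181e206d3b7.2'/>
        <document name='HASHac0a04dd14571c60d7fbfd'/>
      </documentList>
    </content>
  </response>
</message>
"""


def build_metadata_request(query_response, collection, metadata_names, lang="en"):
    """Build a retrieve request asking for metadata of the returned documents."""
    doc_names = [doc.get("name") for doc in query_response.iter("document")]
    message = ET.Element("message")
    request = ET.SubElement(message, "request", {
        "lang": lang,
        "to": collection + "/MetadataRetrieve",
        "type": "retrieve",
    })
    content = ET.SubElement(request, "content")
    document_list = ET.SubElement(content, "documentList")
    for name in doc_names:
        ET.SubElement(document_list, "document", {"name": name})
    metadata_list = ET.SubElement(content, "metadataList")
    for name in metadata_names:
        ET.SubElement(metadata_list, "metadata", {"name": name})
    return message


if __name__ == "__main__":
    response = ET.fromstring(QUERY_RESPONSE)
    followup = build_metadata_request(response, "mgppdemo", ["Title"])
    print(ET.tostring(followup, encoding="unicode"))
```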
    936690
    937691\subsubsection{Applet action}
     
    1046800current library servlet as its value.
    1047801
    1048 \subsubsection{Resource action}
    1049 
    1050 ResourceAction sends a query to the ResourceRetrieve service of the collection requesting the text of the specified document.  At this stage no additional information is obtained, but in future stuff like Title and
     802\subsubsection{Document action}
     803
     804DocumentAction sends a query to the DocumentRetrieve service of the collection requesting the text of the specified document.  At this stage no additional information is obtained, but in future stuff like Title and
    1051805table of contents would be needed to make the display nicer.
    1052806
     
    1057811can override these files by having their own copy of the appropriate
    1058812files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
    1059 interface, default interface. 
     813interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
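The lookup order can be sketched in Python as a search through a list of directories, most specific first. This is a minimal illustration only: the function name find_xslt, the directory layout, and the file name home.xsl are assumptions made for the example, and remote sites are not considered.

```python
import tempfile
from pathlib import Path


def find_xslt(filename, search_dirs):
    """Return the first existing copy of filename in search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


def demo():
    """Create a throwaway layout and resolve three stylesheets against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Search order: collection, site, current interface, default interface.
        order = [
            root / "collection" / "transform",
            root / "site" / "transform",
            root / "interfaces" / "custom" / "transform",
            root / "interfaces" / "default" / "transform",
        ]
        for directory in order:
            directory.mkdir(parents=True)
        (order[1] / "basicquery.xsl").write_text("<!-- site override -->")
        (order[3] / "basicquery.xsl").write_text("<!-- default -->")
        (order[3] / "home.xsl").write_text("<!-- default -->")
        overridden = find_xslt("basicquery.xsl", order).relative_to(root).as_posix()
        fallback = find_xslt("home.xsl", order).relative_to(root).as_posix()
        missing = find_xslt("missing.xsl", order)
        return overridden, fallback, missing


if __name__ == "__main__":
    print(demo())
```

In the demo, the site copy of basicquery.xsl wins over the default interface copy because the site directory comes earlier in the search list.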
    1060814
    1061815\subsection{Internationalization}
    1062816
    1063 Internationalization is a bit part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
    1064 
    1065 At the moment:\footnote{this may change soon, so I haven't 'nice'd this text yet}
    1066 
    1067 Language specific text strings are specified as xml files, named by
    1068 the language code, eg en.xml, fr.xml.
    1069 
    1070 They are located in interfaces/translate. This assumes one set of
    1071 language files per system set up. (or should they be site/interface
    1072 specific??)
    1073 
    1074 A Translate class is used to hold the xml for the languages. The
    1075 Receptionist has a Translate object. It sets the default language to
    1076 be 'en', the current language is whatever a message lang attribute
    1077 specifies.
    1078 
    1079 The translation object internally holds DOM trees for the languages it
    1080 has loaded. It has a mapping between language name and the tree. When
    1081 the default language is set, the appropriate xml file is read in and
    1082 parsed into a DOM tree.
    1083 
    1084 A call to getLanguageTree(lang) returns a DOM element of the form:
    1085 
    1086 \begin{quote}\begin{footnotesize}\begin{verbatim}
    1087 <translate>
    1088 <current><text>.. the actual text elems...</text></current>
    1089 <default><text>.. the actual text elems...</text></default>
    1090 <translate>
    1091 \end{verbatim}\end{footnotesize}\end{quote}
    1092 If the specified lang has not been loaded yet, it will be read into
    1093 memory. Only languages which have been asked for are loaded into
    1094 memory. But once loaded, they stay there. Will need to see how much
    1095 memory this requires once we use full language files.---may need to
    1096 limit the number of cached languages? or maybe only hold two in
    1097 memory, and read them in from file again when a new one is asked for.
    1098 
    1099 The xml files start with the {\em <text>\/} element. The elements are
    1100 organized hierarchically. An example is the following.
    1101 
    1102 \begin{quote}\begin{footnotesize}\begin{verbatim}
    1103 <text>
    1104 <common>
    1105 <nzdl>New Zealand Digital Library</nzdl>
    1106 <aboutpage>about page</aboutpage>
    1107 <search>Search</search>
    1108 <browse>Browse</browse>
    1109 <applet>Applet</applet>
    1110 <home>HOME</home>
    1111 <on>on</on>
    1112 <off>off</off>
    1113 </common>
    1114 <query>
    1115 <queryoptions>Query Options:</queryoptions>
    1116 <params><case><name>Case differences:</name>
    1117 <on>ignore case differences</on>
    1118 <off>upper/lower case must match</off></case>
    1119 <stem><name>Word endings:</name>
    1120 <on>ignore word endings</on>
    1121 <off>whole word must match</off></stem>
    1122 <sortBy><name>Sort results by:</name>
    1123 <rank>rank</rank><natural>none</natural></sortBy>
    1124 <maxDocs><name>Maximum number of documents to return:</name></maxDocs>
    1125 <matchMode><name>Match mode:</name>
    1126 <all>all</all><some>some</some></matchMode>
    1127 <queryLevel><name>Level:</name><Section>Section</Section>
    1128 <Document>Document</Document></queryLevel></params>
    1129 <beginsearch>Begin Search</beginsearch>
    1130 </query>
    1131 </text>
    1132 \end{verbatim}\end{footnotesize}\end{quote}
    1133 Most of the text strings will be specified by the main xml files, but
    1134 some will come from the services/collections. In this case, the lang
    1135 attribute of the message will indicate which language text to return.
    1136 
    1137 Text strings can added to the HTML output in two ways. In the XSLT, we
    1138 know which text strings are needed, eg 'home' for the home link. home
    1139 is in common/home, so we get the text by calling the text template
    1140 with common/home as a param:
    1141 
    1142 \begin{quote}\begin{footnotesize}\begin{verbatim}
    1143 <xsl:call-template name='text'>
    1144 <xsl:with-param name='key'>common/home</xsl:with-param>
    1145 </xsl:call-template>
    1146 \end{verbatim}\end{footnotesize}\end{quote}
    1147 If we want to specify text strings in the xml result (rather than the
    1148 XSLT---would we want to do this?), we can use
    1149 {\footnotesize \verb#<text key='common/home'/>#}.
    1150 {\footnotesize \verb#<xsl:apply-templates select='text'/>#} must then be used when
    1151 processing the parent node.
    1152 
    1153 The template is shown below. Basically, it looks for an appropriate
    1154 element in the current language tree, and if it's not found, it looks
    1155 in the default language tree.
    1156 
    1157 \begin{quote}\begin{footnotesize}\begin{verbatim}
    1158 <xsl:template name='text' match='text'>
    1159 <xsl:param name='key'><xsl:value-of select='@key'/></xsl:param>
    1160 <!-- try the current language -->
    1161 <xsl:variable name='path1'>
    1162 ancestor::page/translate/current/text/<xsl:value-of select='$key'/>
    1163 </xsl:variable>
    1164 <xsl:variable name='string1'><xsl:value-of
    1165 select='java:org.apache.xalan.lib.Extensions.evaluate($path1)'/>
    1166 </xsl:variable>
    1167 <xsl:choose><xsl:when test='boolean(string($string1))'>
    1168 <xsl:value-of select='$string1'/></xsl:when>
    1169 <xsl:otherwise>
    1170 <!-- try the default language -->
    1171 <xsl:variable name='path2'>
    1172 ancestor::page/translate/default/text/<xsl:value-of select='$key'/>
    1173 </xsl:variable>
    1174 <xsl:value-of select=
    1175 'java:org.apache.xalan.lib.Extensions.evaluate($path2)'/>
    1176 </xsl:otherwise>
    1177 </xsl:choose>
    1178 </xsl:template>
    1179 \end{verbatim}\end{footnotesize}\end{quote}
    1180 
     817Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages.
     818
     819Language specific text strings are specified in resource bundle property files. These live in resources/java.
     820
     821There is a properties file per class, and one per interface. At the moment, we have
     822
     823GS2MGPPSearch.properties
     824GS2MGPPRetrieve.properties etc - the service classes
     825
     826interface_default.properties - for the default interface
     827
     828To add other languages, create eg GS2MGPPSearch_fr.properties.
     829
     830The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml $<$display$>$ element - the xslt can get the ones it needs from there.
     831The XSLT could perhaps get the strings from the properties bundle on the fly using Java extension elements - would this be better?
     832
     833All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created.
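
The per-language lookup described above can be sketched with the standard java.util.ResourceBundle classes. This is only an illustration, not the Greenstone3 code: the class name BundleFallbackSketch, the methods getText and fromString, and the keys and strings are assumptions made for the example, and the bundles are built from in-memory strings rather than from the properties files in resources/java.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleFallbackSketch {

    /**
     * Look up a key in the language-specific bundle first and fall back
     * to the default bundle. Returning the key itself when neither bundle
     * has it is a choice made for this sketch only.
     */
    public static String getText(ResourceBundle langBundle,
                                 ResourceBundle defaultBundle,
                                 String key) {
        if (langBundle != null && langBundle.containsKey(key)) {
            return langBundle.getString(key);
        }
        if (defaultBundle.containsKey(key)) {
            return defaultBundle.getString(key);
        }
        return key;
    }

    /** Build a bundle from properties-file text held in a string. */
    public static ResourceBundle fromString(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a default properties file and a French one (hypothetical content).
        ResourceBundle dflt = fromString("search=Search\nbrowse=Browse\n");
        ResourceBundle fr = fromString("search=Rechercher\n");

        System.out.println(getText(fr, dflt, "search")); // found in the French bundle
        System.out.println(getText(fr, dflt, "browse")); // falls back to the default bundle
    }
}
```

When the bundles are loaded from files with ResourceBundle.getBundle and a Locale, the JDK performs a similar fallback from the language-specific file to the base file itself; the explicit version here just makes the order of lookup visible.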
    1181834
    1182835\subsection{Collection formation}
    1183836
    1184 
    1185 There is no facility to create collections in GSDL3 yet. There are three
    1186 working servicesImpl classes: MGPPGDBMServices, GSDL2ClassifierServices and PhindServices---these use
    1187 standard collections built with MGPP and gdbm from GSDL2. For
    1188 PhindService, you need to add 'classify phind' to the collect.cfg file
    1189 during building. For the GSDL2ClassifierServices you need to have any other classifiers specified.
    1190 
    1191 To use a collection in GSDL3, build using mgpp in the old greenstone
    1192 (see mgpp\_in\_greenstone.txt in the mgpp/docs directory of either
    1193 gsdl).
    1194 
    1195 Then copy the collection over into the appropriate collect directory,
    1196 and create index/buildConfig.xml (see \ref{subsec:config}). The basic info
    1197 that you need is shown below. Substitute the appropriate values for
    1198 your collection. Only put the phind service one in if you have a phind
    1199 classifier.
    1200 
    1201 \begin{quote}\begin{footnotesize}\begin{verbatim}
    1202 <buildConfiguration>
    1203   <metadataList>
    1204     <metadata name="iconCollection">mgppdemo.gif</metadata>
    1205     <metadata name="colName">mgpp demo</metadata>
    1206   </metadataList>
    1207   <servicesImplList>
    1208     <servicesImpl name="MGPPGDBMServices">
    1209       <defaultIndex name="tt"/>
    1210       <defaultLevel name='Section'/>
    1211     </servicesImpl>
    1212     <servicesImpl name="PhindServices"/>
    1213     <servicesImpl name="GSDL2ClassifierServices">
    1214       <classifierList>
    1215         <classifier name="CL1">
    1216           <metadataList>
    1217             <metadata name="Title">Subject</metadata>
    1218           </metadataList>
    1219         </classifier>
    1220         <classifier name="CL2" >
    1221           <metadataList>
    1222             <metadata name="Title">Title</metadata>
    1223           </metadataList>
    1224         </classifier>
    1225       </classifierList>
    1226     </servicesImpl>
    1227   </servicesImplList>
    1228 </buildConfiguration>
    1229 \end{verbatim}\end{footnotesize}\end{quote}
     837Greenstone 2 compatible building has been implemented in gsdl3. So far, only mgpp collections will work.
     838
     839Collection construction can be done through the web, using the build servicecluster in localsite. Just sequence through the steps needed. So far, addDocument does not work, so documents need to be manually added to the import directory.
     840
     841You need to carry out the following services:
     842NewCollection
     843- add docs to import directory
     844ImportCollection
     845BuildCollection
     846ActivateCollection
     847
     848If you want anything other than the default for the config file, you need to add it by hand - there is currently no ConfigureCollection service which would enable you to do this.
     849
     850Collection building can also be done on the command line:
     851
     852ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name>
     853
     854eg
     855
     856ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol
     857
     858The options are passed to the underlying script. There is no good help message yet.
     859
     860import and build use gs2 import.pl and buildcol.pl so you can specify any of their options if you like.
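
The command-line sequence above can be sketched as a small helper that assembles the four ConstructCollection invocations in order. Only the -site and -mode flags and the four mode names come from the text; the class ConstructCommandsSketch, its method names, and the example site path are hypothetical, and the helper only builds argument lists without running anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConstructCommandsSketch {

    /** The four modes named in the text, in the order they are carried out. */
    public static final List<String> MODES =
            Arrays.asList("new", "import", "build", "activate");

    /** Assemble the argument list for one ConstructCollection call (no extra options). */
    public static List<String> commandFor(String sitePath, String mode, String collName) {
        if (!MODES.contains(mode)) {
            throw new IllegalArgumentException("unknown mode: " + mode);
        }
        return Arrays.asList("ConstructCollection", "-site", sitePath,
                             "-mode", mode, collName);
    }

    /** Assemble all four calls in order. */
    public static List<List<String>> fullSequence(String sitePath, String collName) {
        List<List<String>> commands = new ArrayList<>();
        for (String mode : MODES) {
            commands.add(commandFor(sitePath, mode, collName));
        }
        return commands;
    }

    public static void main(String[] args) {
        // Hypothetical site path and collection name.
        for (List<String> cmd : fullSequence("/path/to/gsdl3/sites/localsite", "testcol")) {
            System.out.println(String.join(" ", cmd));
            if (cmd.contains("new")) {
                // Documents must be placed in the import directory by hand at this point.
                System.out.println("# add documents to the import directory");
            }
        }
    }
}
```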
     861
     862Building stuff is in src/java/org/greenstone/gsdl3/build.
     863
     864CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events.
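
The event/listener arrangement described above follows the usual observer pattern, which can be sketched as follows. The names SketchEvent, SketchListener and SketchConstructor are stand-ins invented for this example; they do not reproduce the real signatures of ConstructionEvent, ConstructionListener or CollectionConstructor, and the stage messages are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    /** Stand-in for an event object; carries only a message. */
    public static class SketchEvent {
        private final String message;

        public SketchEvent(String message) {
            this.message = message;
        }

        public String getMessage() {
            return message;
        }
    }

    /** Stand-in for a listener interface with a single callback. */
    public interface SketchListener {
        void stageReached(SketchEvent event);
    }

    /** Stand-in for a constructor class that notifies listeners as stages happen. */
    public static class SketchConstructor {
        private final List<SketchListener> listeners = new ArrayList<>();

        public void addListener(SketchListener listener) {
            listeners.add(listener);
        }

        private void fire(String message) {
            SketchEvent event = new SketchEvent(message);
            for (SketchListener listener : listeners) {
                listener.stageReached(event);
            }
        }

        /** Pretend to run a building step, reporting two stages. */
        public void run() {
            fire("import started");
            fire("import finished");
        }
    }

    public static void main(String[] args) {
        SketchConstructor constructor = new SketchConstructor();
        // Any number of listeners can be registered; each one sees every event.
        constructor.addListener(e -> System.out.println("stage: " + e.getMessage()));
        constructor.run();
    }
}
```

The point of the design is that the building code does not need to know who is watching: a command-line tool could print the events, while a web service could store them for a status page.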
    1230865
    1231866\section{Details}
     
    1257892  & Utility classes \\
    1258893gsdl3/src/java/org/greenstone/gsdl3/collection
    1259   & Collection class\\
     894  & ServiceCluster and Collection classes\\
    1260895gsdl3/src/java/org/greenstone/gsdl3/comms
    1261896  & Communicator classes, eg SOAP\\
     897gsdl3/src/java/org/greenstone/gsdl3/build
     898  & stuff for collection building \\
    1262899gsdl3/src/java/org/greenstone/gsdl3/action
    1263900  & Action classes used by the Receptionist---do the work of displaying the pages\\
     
    1276913gsdl3/lib/java
    1277914  & Java jar files\\
     915gsdl3/resources
     916 & any resources that may be needed\\
     917gsdl3/resources/java
     918 & properties files for java resource bundles - used to handle all the language specific text\\
     919gsdl3/bin
     920  & executable stuff lives here\\
     921gsdl3/bin/script
     922  & some perl building scripts\\
     923gsdl3/bin/linux
     924  & linux executables for eg mgpp\\
    1278925gsdl3/comms
    1279926  & Put some stuff here for want of a better place---things to do with servers and communication. eg soap stuff, and tomcat servlet container\\
     
    1305952gsdl3/interfaces/default/transforms
    1306953  & The XSLT files\\
    1307 gsdl3/interfaces/translate
    1308   & Language specific stuff---language xml files containing all the text strings go here\\
    1309954\hline
    1310955\end{tabular}}
     
    1317962\newcommand{\gsdlhome}{\begin{footnotesize}{\em \$GSDL3HOME}\end{footnotesize}}
    1318963
    1319 Currently, greenstone3 is only available through CVS. The installation procedure has not been automated. Eventually, all that will be needed (hopefully) will be a {\footnotesize \verb#configure, make, make install#} sequence. But for now, all the steps must be done by hand.
     964Currently, greenstone3 is only available through CVS. The installation procedure has been automated.
    1320965
    1321966\subsubsection{Get the source}
     
    1340985
    1341986\noindent If you need it, the password for anonymous CVS access is {\footnotesize \verb#anonymous#}.
    1342 \\
    1343 \\
    1344 \noindent You also need to download the mgpp code - it comes in a separate CVS module.
    1345 
    1346 \noindent I once added a directory for mgpp in gsdl3/packages in cvs---now I can't get
    1347 rid of it, so you need to delete it before you start.
     987
     988\subsubsection{Compile and install greenstone}\label{subsec:compile}
     989
     990An install.sh script has been constructed (thanks, Stuart) to compile and install greenstone 3. What you need to do is:
    1348991
    1349992\begin{footnotesize}\begin{tt}
    1350 \noindent cd \gsdlhome/packages\\
    1351 rm -r mgpp\\
    1352 cvs co mgpp\\
    1353 \end{tt}\end{footnotesize}
    1354 
    1355 \subsubsection{Compile and install greenstone}\label{subsec:compile}
    1356 
    1357 \noindent From here on, \gsdlhome\  is the absolute path to the top-level directory of the gsdl3 checkout.
    1358 For example, /research/kjdon/gsdl3.
    1359 \\
    1360 \\
    1361 \noindent First, set up your classpath:\\
    1362 \begin{footnotesize}\begin{tt}
    1363 cd \gsdlhome\\
     993cd gsdl3
     994source setup.bash
     995install.bash
    1364996source setup.bash
    1365997\end{tt}\end{footnotesize}
    1366998
    1367 \noindent Note: this step needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc.
    1368 \\
    1369 \\
    1370 \noindent Compile mgpp:\\
     999If you want to do greenstone2 compatible building (currently the only type) you need to have greenstone 2 installed, 'source setup.bash' in the top level greenstone 2 directory, then re-'source setup.bash' for greenstone 3. This is to set GSDLHOME for tomcat.
     1000
     1001\noindent Note: 'source setup.bash' needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc.
     1002
     1003If you want to use SOAP to talk to remote sites, you also need to do the following:
     1004
    13711005\begin{footnotesize}\begin{tt}
    1372 cd \gsdlhome/packages/mgpp\\
    1373 ./configure --prefix \gsdlhome\\
    1374 make\\
    1375 make install\\
     1006install-soap.bash 
    13761007\end{tt}\end{footnotesize}
    13771008
    1378 \noindent Note: you need to use \gsdlhome\  as the prefix for mgpp's configure at this stage---mgpp has been set up properly, but gsdl3 hasn't.
    1379 
    1380 \noindent Next you need to compile greenstone.
    1381 
    1382 \noindent A jar file is used from tomcat during compilation, so this must be unpacked first.
    1383 \begin{footnotesize}\begin{tt}
    1384 cd \gsdlhome/comms/tomcat/\\
    1385 tar xzvf jakarta-tomcat-4.0.1.tar.gz \\
    1386 \end{tt}\end{footnotesize}
    1387 \\
    1388 \\
    1389 \noindent Do a \verb#make#, then a \verb#make install# in each of the following directories:\\
    1390 \begin{footnotesize}\begin{tt}
    1391 \gsdlhome/src/java/org/greenstone/gdbm\\
    1392 \gsdlhome/src/java/org/greenstone/testing\\
    1393 \gsdlhome/src/java/org/greenstone/gsdl3\\
    1394 \gsdlhome/src/java/org/greenstone/applet/phind
    1395 \end{tt}\end{footnotesize}
    1396 
    1397 \subsubsection{Set up the sample sites}
      1009That's it.
      1010
      1011You don't want to run install.bash twice - it adds stuff into files.
     1012
     1013To update your installation, you can run update.bash - this remakes all the java stuff.
     1014
     1015
     1016\subsubsection{The sample sites}
    13981017
    13991018\noindent There are two greenstone ``sites'' that come with the checkout: localsite, and site1. localsite has several collections, only two of which have any actual data. The third is a dummy collection. site1 has one dummy collection. Each site has a configuration file which specifies the site name,  site-wide services if any, and a list of remote sites to connect to.
     
    14021021\noindent The collections which do not have data can be looked at, but you cannot do any queries on them.
    14031022
    1404 \noindent The data comes in tar files, which need to be unpacked:
    1405 
    1406 \begin{footnotesize}\begin{tt}
    1407 \noindent cd \gsdlhome/sites/localsite/collect/mgppdemo/index/\\
    1408 tar xzvf mgpp-indexfiles.tar.gz\\
    1409 cd ../../chinesedemo/index\\
    1410 tar xzvf chinese-index-files.tar.gz\\
    1411 \end{tt}\end{footnotesize}
    1412 
    1413 \subsubsection{Set up tomcat}
     1023
     1024\subsubsection{Tomcat}
    14141025
    14151026\noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.
     
    14281039\end{verbatim}\end{footnotesize}
    14291040
    1430 \noindent You need to replace {\footnotesize \verb#/research/kjdon/home/gsdl3#} with the correct path for \gsdlhome, in both library servlet entries.
    1431 \\
    1432 \\
    1433 \noindent Next, symbolic links to the sites, interfaces and lib directories need to be set up---this enables tomcat to 'see' files in these directories.
    1434 
    1435 \begin{footnotesize}\begin{tt}
    1436 \noindent cd \gsdlhome/web\\
    1437 ln -s ../interfaces\\
    1438 ln -s ../sites\\
    1439 ln -s ../lib
    1440 \end{tt}\end{footnotesize}
    1441 
    1442 \noindent The test servlet needs to be compiled: (you need to set up your {\footnotesize CLASSPATH} if you haven't already, see \ref{subsec:compile})\\
    1443 \begin{footnotesize}\begin{tt}
    1444 \noindent cd \gsdlhome/web/WEB-INF/classes\\
    1445 javac TestServlet.java
    1446 \end{tt}\end{footnotesize}
    1447 \\
    1448 \\
    1449 \noindent Next, one of the scripts that runs tomcat needs to be altered to use our {\footnotesize CLASSPATH}.
    1450 
    1451 \begin{footnotesize}\begin{tt}
    1452 \noindent cd \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1
    1453 \end{tt}\end{footnotesize}
    1454 \\
    1455 \\
    1456 \noindent edit {\footnotesize \verb#bin/catalina.sh#}:
    1457 
    1458 \noindent on line 89 add {\footnotesize \$CLASSPATH} to the {\footnotesize CP="...."} line ie. {\footnotesize CP="\$CLASSPATH:..."}---this
    1459 sets up the classpath properly
    1460 \\
    1461 \\
    1462 \noindent Now you need to tell tomcat about the greenstone context:
    1463 \\
    1464 \\
    1465 \noindent edit {\footnotesize \verb#conf/server.xml#}:
    1466 
    1467 \noindent you need to add a context for gsdl servlets---there are other context elements in the xml---this one goes at the same level as those ones.\\
    1468 add the following (putting the correct path for \gsdlhome)
    1469 
    1470 \begin{footnotesize}\begin{tt}
    1471 \noindent <!-- GSDL3 Service -->\\
    1472 <Context path="/gsdl3" docBase="\gsdlhome/web" debug="1" reloadable="true"/>
    1473 \end{tt}\end{footnotesize}
     1041The file \gsdlhome/comms/tomcat/jakarta/conf/server.xml is the tomcat configuration file. setup.bash adds a context for gsdl servlets - this tells tomcat where to find the web.xml file, and what url (eg /gsdl3) to give it.
    14741042
    14751043\noindent Note: tomcat runs on port 8080 - you can change that if you wish in this file
     
    15071075\\
    15081076\noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your service, and then deploy that service.
    1509 \\
    1510 \\
    1511 \noindent Set up SOAP:
    1512 \\
    1513 \\
    1514 \begin{footnotesize}\begin{tt}
    1515 \noindent cd \gsdlhome/comms/soap\\
    1516 tar xzvf soap-bin-2.2.tar.gz
    1517 \end{tt}\end{footnotesize}
    1518 \\
    1519 \\
    1520 \noindent The context for the SOAP servlet needs to be added to the tomcat server.xml file in the same way that you added the context for gsdl3:
    1521 
    1522 \noindent edit \begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}\end{footnotesize}
    1523 
    1524 \noindent add the following (put the proper path for \gsdlhome)
    1525 
    1526 \begin{footnotesize}\begin{tt}
    1527 \noindent <!-- SOAP Service -->\\
    1528 <Context path="/soap" docBase="\gsdlhome/comms/soap/soap-2\_2/webapps/soap"\\
    1529 debug="1" reloadable="true"/>
    1530 \end{tt}\end{footnotesize}
    1531 \\
    1532 \\
    1533 \noindent Next, the class SOAPServer must be altered---the constructor is not allowed any arguments, so it has a path hard coded in it. This is the address of the site that is to be served. In \begin{footnotesize}{\tt \gsdlhome/src/java/org/greenstone/gsdl3/SOAPServer.java}\end{footnotesize}, you need to change the {\footnotesize \verb#site_home#} variable to \begin{footnotesize}{\tt \gsdlhome/sites/localsite}\end{footnotesize} (using the absolute path).
    1534 \\
    1535 \\
    1536 \noindent The SOAPServer service now needs to be deployed. If tomcat is not running, start it up (see \ref{subsec:runtomcat}).
     1077
      1078This is done by install-soap.bash.
     1079You can also deploy a service through the website.  If tomcat is not running, start it up (see \ref{subsec:runtomcat}).
    15371080
    15381081\noindent The SOAP servlet can be accessed at \begin{footnotesize}{\tt http://localhost:8080/soap}\end{footnotesize}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.