Changeset 10880


Timestamp: 2005-11-11T10:12:44+13:00
Author: kjdon
Message: ran a spell checker
Location: trunk/gsdl3/docs/manual
Files: 2 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r10863 r10880  
    2121\author{Katherine Don, George Buchanan and Ian H. Witten \\[1ex]
    2222        Department of Computer Science \\
    23          University of Waikato \\ Hamilton, New Zealand \\
    24         \{kjdon, grbuchan, ihw\}@cs.waikato.ac.nz}
     23        University of Waikato \\ Hamilton, New Zealand \\ }
    2524
    2625\date{}
     
    3635reimplementation of the \gs\  digital library software.  The current
    3736version (\gsii) enjoys considerable success and is being widely used.
    38 \gsiii \  will capitalise on this success, and in addition it will
     37\gsiii \  will capitalize on this success, and in addition it will
    3938\begin{bulletedlist}
    4039\item improve flexibility, modularity, and extensibility
     
    4443   self-documentation
    4544\item make full use of existing XML-related standards and software
    46 \item provide improved internationalisation, particularly in terms of sort order,
     45\item provide improved internationalization, particularly in terms of sort order,
    4746   information browsing, etc.
    4847\item include new features that facilitate additional ``content management''
     
    5857A description of the general design and architecture of \gsiii\  is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the docs/manual directory).
    5958
    60 This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
     59This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customizations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
    6160\newpage
    6261\tableofcontents
     
    125124  & Imported source packages from other systems e.g. MG, MGPP \\
    126125greenstone3/extensions
    127   & Extensions to greenstone 3 core functionality, eg, Vishnu visualizer, Alerting service \\
     126  & Extensions to greenstone 3 core functionality, e.g., Vishnu visualizer, Alerting service \\
    128127greenstone3/lib
    129128  & Shared library files\\
     
    141140  & some Perl and/or shell building scripts\\
    142141greenstone3/packages
    143   & External packages that may be installed as part of greenstone, e.g. Tomcat, Mysql \\
     142  & External packages that may be installed as part of greenstone, e.g. Tomcat, MySQL \\
    144143greenstone3/docs
    145144  & Documentation\\
     
    150149  & The web.xml file lives here (servlet configuration information for Tomcat)\\
    151150greenstone3/web/WEB-INF/classes
    152   & Individual class files needed by the servlet go in here, also properties files for java resource bundles - used to handle all the language specific text. This direcotry is on the servlet classpath\\
     151  & Individual class files needed by the servlet go in here, also properties files for java resource bundles - used to handle all the language specific text. This directory is on the servlet classpath\\
    153152greenstone3/web/WEB-INF/lib
    154153  & jar files needed by the servlets go here \\
     
    187186One \gsiii\  installation can have many sites and interfaces, and these can be paired in different combinations.  One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance.  For example, a single site might be served with two different interfaces. This provides different modes of access to the same content. e.g. HTML vs WML, or perhaps providing a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
    188187
    189 Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialised will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
     188Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialized will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
    190189
    191190There are two  sites that come with the distribution: \gst{localsite}, and \gst{gateway}. \gst{localsite} has several demo  collections, while \gst{gateway} has none. \gst{gateway} specifies that a SOAP connection should be made to \gst{localsite}. Getting this to work involves setting up a soap server for localsite: see Section~\ref{sec:distributed} for details.
     
    197196
    198197The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the configuration information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
    199 There are four servlets specified in web.xml (these correspond to the four servlet links in the welcome page for \gsiii): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other three are the \gs\  library servlets described in Section~\ref{sec:browser-access}, \gst{library}, \gst{classic} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. Site\_name and interface\_name are just two examples of initialisation parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.
     198There are four servlets specified in web.xml (these correspond to the four servlet links in the welcome page for \gsiii): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other three are the \gs\  library servlets described in Section~\ref{sec:browser-access}, \gst{library}, \gst{classic} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. Site\_name and interface\_name are just two examples of initialization parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.
    200199
    201200For more details about Tomcat see Appendix~\ref{app:tomcat}.
    202201
    203202\begin{table}
    204 \caption{\gs\  servlet initialisation parameters}
     203\caption{\gs\  servlet initialization parameters}
    205204\label{tab:serv-init}
    206205{\footnotesize
     
    223222
    224223Initial \gsiii\  system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
    225 The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated sending a system command to the library.  There are a series of system commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
     224The configuration files are read in when the system is initialized, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated sending a system command to the library.  There are a series of system commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
    226225
    227226\subsubsection{Site configuration file}\label{sec:siteconfig}
     
    322321
    323322
    324 \subsection{Run-time re-initialisation}\label{sec:runtime-config}
     323\subsection{Run-time re-initialization}\label{sec:runtime-config}
    325324
    326325When Tomcat is started up, the site and interface configuration files are read in, and actions/services/collections loaded as necessary. The configuration is then static unless Tomcat is restarted, or re-configuration commands issued.
     
    358357Browsing involves navigating pre-defined hierarchies of documents, following links of interest to find documents. The hierarchies may be constructed on different metadata fields, for example, alphabetical lists of Titles, or a hierarchy of Subject classifications. Clicking on a bookshelf icon takes you to a lower level in the hierarchy, while clicking on a book or page icon takes you to a document.
    359358
    360 In the standard interface that comes with \gsiii\ \footnote{of course, this is all customisable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
     359In the standard interface that comes with \gsiii\ \footnote{of course, this is all customizable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
    361360
    362361\begin{figure}[h]
     
    387386Building native \gsiii\  collections is done using the \gst{gs3-build.sh/bat} script, with the \gst{collectionConfig.xml} file controlling how the building is done.  There are a number of considerations in building a collection:  what documents appear in the collection, how they are indexed for searching, which classifications are used for browsing, etc.
    388387
    389 Firstly, the documents that comprise the collection should be placed in the import subdirectory.  At present, only documents in this directory will appear in the collection. Documents can be organised into sub folders inside the import directory.
     388Firstly, the documents that comprise the collection should be placed in the import subdirectory.  At present, only documents in this directory will appear in the collection. Documents can be organized into sub folders inside the import directory.
    390389[TODO: describe the kinds of documents that can be added, something about METS files?]
    391390
     
    449448The collectionConfig.xml file controls the all of these options for collection building, and the format is described in Section~\ref{sec:collconfig}.
    450449
    451 To build a collection, place the source documents and optional metadata.xml file(s) in the import directory, place the \gst{collectionConfig.xml} file in the etc directory, and execute \gst{gs3build.sh/bat sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have mysql running before you start building---running \gst{ant start} will start up the MySQL server as well as tomcat.
     450To build a collection, place the source documents and optional metadata.xml file(s) in the import directory, place the \gst{collectionConfig.xml} file in the etc directory, and execute \gst{gs3build.sh/bat sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have MySQL running before you start building---running \gst{ant start} will start up the MySQL server as well as tomcat.
    452451
    453452Once the build process is complete, the building directory should be renamed to index (after deleting or renaming the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
     
    457456The Greenstone Librarian Interface (GLI) can be used to create \gsii\ style collections for \gsiii. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 Digital Library menu in the Program Files section of the Start menu. On Linux, run \gst{./gli4gs3.sh} from the \gst{greenstone3/gli} directory.
    458457
    459 Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use native \gsiii\ config files and collection building}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
     458Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use native \gsiii\ configuration files and collection building}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
    460459
    461460The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates within a single site---you can edit, delete, create new collections within this site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences-$>$Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
    462461
    463 Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\  configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ config files. You can either rebuild the collection through the GLI (may take a while), or run the conversion script directly (see following section).
     462Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\  configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ configuration files. You can either rebuild the collection through the GLI (may take a while), or run the conversion script directly (see following section).
    464463 
    465464Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the Greenstone 2 User's Guide (\gst{GS2-User-en.pdf}. This can be found in  your \gsii\ installation, or in the greenstone3/docs/manual directory if you have installed \gsiii\ from a distribution.
     
    483482The script attempts to create \gsiii\  format statements from the old \gsii\  ones. The conversion may not always work properly, so if the collection looks a bit strange under \gsiii\ , you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
    484483
    485 Once again, to have the collection recognised by the library servlet, you can either restart Tomcat, or load it dynamically.
     484Once again, to have the collection recognized by the library servlet, you can either restart Tomcat, or load it dynamically.
    486485
    487486\subsection{Collection configuration files}\label{sec:collconfig}
     
    498497\subsubsection{collectionInit.xml}
    499498
    500 This optional file is only used for non-standard, customised collections. It specifies the class name of the non-standard collection class.  The only syntax so far is the class name:
     499This optional file is only used for non-standard, customized collections. It specifies the class name of the non-standard collection class.  The only syntax so far is the class name:
    501500
    502501\begin{gsc}\begin{verbatim}
     
    504503\end{verbatim}\end{gsc}
    505504
    506 Section~\ref{sec:new-coll-types} describes an example collection where this file is used. Depending on the type of collection that this is used for, one or both of the other config files may not be needed.
     505Section~\ref{sec:new-coll-types} describes an example collection where this file is used. Depending on the type of collection that this is used for, one or both of the other configuration files may not be needed.
    507506
    508507\subsubsection{collectionConfig.xml}
     
    578577The \gst{<metadataList>} element specifies some collection metadata, such as creator. The \gst{<displayItemList>} specifies some language dependent information that is used for collection display, such as collection name and short description. These displayItem elements can be specified in different languages.
    579578 
    580 The \gst{<search>} element specifies what indexes should be built, and provides some display and formatting information for each one. Search has an attribute, \gst{type}, which specifies which indexer to be used for indexing. Currently, \gst{mg} and \gst{mgpp}[??] are available. If type is not specified, mg is used. Multiple search elements may be specified, if more than one indexer is to be used. (Note, this is not yet recognised by the run-time system.)
     579The \gst{<search>} element specifies what indexes should be built, and provides some display and formatting information for each one. Search has an attribute, \gst{type}, which specifies which indexer to be used for indexing. Currently, \gst{mg} and \gst{mgpp}[??] are available. If type is not specified, mg is used. Multiple search elements may be specified, if more than one indexer is to be used. (Note, this is not yet recognized by the run-time system.)
    581580
    582581Search indexes appear as individual \gst{<index>} elements within the \gst{<search>} element. Some choices for the index are made using attributes of the element itself, and some through child elements. 
     
    597596</index>
    598597\end{verbatim}\end{gsc}
    599 ...in this case the \gst{<field>} tag refers to the ``title'' metadata item, found in the Dublin Core namespace.  The mg search engine would be used on this index.
     598...in this case the \gst{<field>} tag refers to the ``title'' metadata item, found in the Dublin Core namespace.  The MG search engine would be used on this index.
    600599
    601600Alternatively, to index the full document texts by section:
     
    865864This will display the dls.Title metadata if available, otherwise it will use the dc.Title metadata if available, otherwise it will use the Title metadata. If there are no values for any of these metadata elements, then nothing will be displayed.
    866865
    867 The \gst{<gsf:switch>} element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organisation it came from.
     866The \gst{<gsf:switch>} element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organization it came from.
    868867
    869868\begin{gsc}
     
    958957A particular collection can override the properties for any service. For example, if a collection uses the GS2MGSearch service rack (look in the buildConfig.xml file for a list of service racks used), and the collection builder wants to change the text associated with this service, they can put a GS2MGSearch.properties file in the resources directory of the collection.
    959958This will be used in preference to one in the default resources directory.
    960 Note that while changes in the default properties files seem to require a tomcat restart to take effect, changes in the colleciton specific properties files take effect immediately.
    961 
    962 \subsection{Customising the interface}\label{sec:interface-customise}
    963 
    964 Format statements in the collection configuration files provide a way to change small parts of the collection display. For large scale customisations to a collection, or ones that apply to a site as a whole, a second mechanism is available. The interface is defined by a set of XSLT files that transform the page data into HTML. Any of these files can be overridden to provide specialised display, on a site or collection basis.
     959Note that while changes in the default properties files seem to require a tomcat restart to take effect, changes in the collection specific properties files take effect immediately.
     960
     961\subsection{Customizing the interface}\label{sec:interface-customise}
     962
     963Format statements in the collection configuration files provide a way to change small parts of the collection display. For large scale customizations to a collection, or ones that apply to a site as a whole, a second mechanism is available. The interface is defined by a set of XSLT files that transform the page data into HTML. Any of these files can be overridden to provide specialized display, on a site or collection basis.
    965964
    966965The first section looks at customizing the existing interface, while the second section looks at defining a whole new interface. The last section describes how to add a new language translation of an interface.
     
    970969Most of an interface is defined by XSLT files, which are stored in \gst{\$GSDL3HOME/\-web/\-interfaces/\-interface-name/\-transform}. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following  order: collection, site, interface, default interface. (This currently only apples to sites, and therefore collections, that reside in the same \gs\  installation as the interface.)
    971970
    972 Sites and collections can have a transform directory, which is where customised XSLT files should go. Any XSLT files in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different layout for the about page of a collection, you can put a new \gst{about.xsl} file into the collection's \gst{transform} directory, and this will be used instead. This is what we do for the Gutenberg sample collection.
    973 
    974 This also applies to files that are included from other XSLT files. For example the query.xsl for the query pages includes a file called querytools.xsl. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site transform directory will work. Either the new query.xsl will include the default querytools, or the default query.xsl will include the new querytools.xsl. The xsl:include directives are preprocessed by the java code and full paths added based on availability of the files, so that the correct one is used.
     971Sites and collections can have a transform directory, which is where customized XSLT files should go. Any XSLT files in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different layout for the about page of a collection, you can put a new \gst{about.xsl} file into the collection's \gst{transform} directory, and this will be used instead. This is what we do for the Gutenberg sample collection.
     972
     973This also applies to files that are included from other XSLT files. For example the query.xsl for the query pages includes a file called querytools.xsl. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site transform directory will work. Either the new query.xsl will include the default querytools, or the default query.xsl will include the new querytools.xsl. The xsl:include directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
    975974
    976975Note that you cannot include a file with the same name as the including file. For example query.xsl cannot include query.xsl (it is tempting to want to do this if you just want to change one template for a particular file, and then include the default. but you cant).
     
    992991Keys will be looked up in the properties file closest to the specified language. For example, if language \gst{fr\_CA} was specified (French language, country Canada), and the default locale was \gst{en\_GB},  Java would look at properties files in the following order, until it found the key: \gst{XXX\_fr\_CA.properties}, \gst{XXX\_fr.properties},  \gst{XXX\_en\_GB.properties}, then \gst{XXX\_en.properties}, and finally the default \gst{XXX.properties}.
    993992
    994 These new files are available straight away---to use the new language, add e.g. \gst{l=fr} to the arguments in the URL. To get \gs\ to add it in to the list of languages on the preferences page, an entry needs to be added into the languages list in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}). Modification of this file requires a restart of the Tomcat server for the changes to be recognised.
     993These new files are available straight away---to use the new language, add e.g. \gst{l=fr} to the arguments in the URL. To get \gs\ to add it in to the list of languages on the preferences page, an entry needs to be added into the languages list in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}). Modification of this file requires a restart of the Tomcat server for the changes to be recognized.
    995994
    996995\newpage
     
    10721071Messages inside the system (``internal'' messages) all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned. Currently all requests are independent, so any requests can be combined into the same message, and they will be answered separately, with their responses being sent back in a single message.
    10731072
    1074 When a page request (external request) comes in to the Receptionist, it looks at the action attribute and passes the request to the appropriate Action module. The Action will fire one or more internal requests to the MessageRouter, based on the arguments. The data is gathered into a  response, which is returned to the Receptionist.  The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the \gs\ servlet  we transform using XSLT to generate html pages.
     1073When a page request (external request) comes in to the Receptionist, it looks at the action attribute and passes the request to the appropriate Action module. The Action will fire one or more internal requests to the MessageRouter, based on the arguments. The data is gathered into a  response, which is returned to the Receptionist.  The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the \gs\ servlet  we transform using XSLT to generate HTML pages.
    10751074
    10761075Actions send internal style messages to the MessageRouter. Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, retrieve document text
     
    12171216A service description also contains some display information---this includes the name of the service, and the text  for the submit button.
    12181217
    1219 Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} shows an example html search form that may be generated from this describe response.
     1218Here is a sample describe request to the FieldQuery service of collection mgppdemo, along with its response. The parameters in this example include their display information. Figure~\ref{fig:query-display} shows an example HTML search form that may be generated from this describe response.
    12201219
    12211220\begin{quote}\begin{gsc}\begin{verbatim}
     
    13001299\end{figure}
    13011300
    1302 A describe request to an applet type service returns the applet html element: this will be embedded into a web page to run the applet.
     1301A describe request to an applet type service returns the applet HTML element: this will be embedded into a web page to run the applet.
    13031302\begin{quote}\begin{gsc}\begin{verbatim}
    13041303<request type='describe' to='mgppdemo/PhindApplet'/>
     
    13291328\end{verbatim}\end{gsc}\end{quote}
    13301329
    1331 Note that the library parameter has been left blank. This is because library refers to the current servlet that is running and the name is not necessarily known in advance. So either the applet action or the Receptionist must fill in this parameter before displaying the html.
     1330Note that the library parameter has been left blank. This is because library refers to the current servlet that is running and the name is not necessarily known in advance. So either the applet action or the Receptionist must fill in this parameter before displaying the HTML.
    13321331
    13331332\subsection{'system'-type messages}\label{sec:system}
     
    16031602\end{verbatim}\end{gsc}\end{quote}
    16041603
    1605 One or more parameters specifying metadata may be included in a request. Also, ametadata  value of \gst{all} will retrieve all the metadata for each document.
     1604One or more parameters specifying metadata may be included in a request. Also, a metadata  value of \gst{all} will retrieve all the metadata for each document.
    16061605
    16071606Any browse-type service must also implement a metadata retrieval service to provide metadata for the nodes in the classification hierarchy. The name of it is the browse service name plus \gst{MetadataRetrieve}. For example, the ClassifierBrowse service described in the previous section should also have a ClassifierBrowseMetadataRetrieve service. The request and response format is exactly the same as for the DocumentMetadataRetrieve service, except that \gst{<documentNode>} elements are replaced by \gst{<classifierNode>} elements (and the corresponding list element is also changed).
     
    17851784
    17861785A 'page' is some XML or HTML (or other?) data returned in response to an
    1787 external 'page'-type request. These requests originate from outside \gs\ , for example from a servlet, or java application, and are received by the Receptionist. As described below in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\  URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to.
     1786external 'page'-type request. These requests originate from outside \gs\ , for example from a servlet, or Java application, and are received by the Receptionist. As described below in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\  URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to.
    17881787
    17891788Action modules decode the rest of the arguments to determine what requests need to be made to the system. One or more internal requests may be made to the MessageRouter. A request for format information from the Collection/Service may also be made. The resulting data is gathered together into a single XML response, \gst{<page>}, and returned to the Receptionist.
     
    18441843& & but no processing of the results is done \\
    18451844& & currently only used in process actions \\
    1846 o & output type & XML, html, WML \\
     1845o & output type & XML, HTML, WML \\
    18471846l & language & en, fr, zh ...\\
    18481847d & document id & HASHxxx \\
     
    18981897
    18991898\subsubsection{Collection specific formatting}\label{sec:collformat}
    1900 get format info, transform gsf->xsl. transfrom xml->html
    1901 
    1902 config params are passed in to the transformation
     1899get format info, transform gsf->xsl. transform xml->html
     1900
     1901configuration params are passed in to the transformation
    19031902\subsubsection{CGI arguments}
    19041903
     
    19061905\subsubsection{Page action}\label{sec:pageaction}
    19071906
    1908 PageAction is responsible for displaying kinds of information pages, such as the home page of the library, or the home page of a collection, or the help and preferenecs pages. These pages are not associated with specific services like the other page types. In general, the data comes from describe requests to various modules.
     1907PageAction is responsible for displaying kinds of information pages, such as the home page of the library, or the home page of a collection, or the help and preferences pages. These pages are not associated with specific services like the other page types. In general, the data comes from describe requests to various modules.
    19091908The different pages are requested using the subaction argument. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page.    For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster.  This returns a list of metadata
    19101909and a list of services.
     
    19521951\subsubsection{XML Document action}\label{sec:xmldocumentaction}
    19531952
    1954 XMLDOcumentAction is a little different to the standard DocumentAction. It operates in two modes, \gst{text} and \gst{toc}. In \gst{text} mode, it will retrieve the content of the current document node using a DocumentContentRetrieve request. In \gst{toc} mode, it retrieves the entire table of contents for the document using a DocumentStructureRetrieve request. Either mode may also retrieve metadata for the current section or each section in the table of contents.
     1953XMLDocumentAction is a little different to the standard DocumentAction. It operates in two modes, \gst{text} and \gst{toc}. In \gst{text} mode, it will retrieve the content of the current document node using a DocumentContentRetrieve request. In \gst{toc} mode, it retrieves the entire table of contents for the document using a DocumentStructureRetrieve request. Either mode may also retrieve metadata for the current section or each section in the table of contents.
    19551954
    19561955\subsubsection{GS2Browse action}\label{sec:browseaction}
     
    20132012GSXSLT & some manipulation functions for \gs\  XSLT\\
    20142013GlobalProperties & Holds the global properties (from global.properties) \\
    2015 MacroResolver & Used with replace elements in collection config files, replaces a macro or string with another string, metadata or text from a dictionary\\
     2014MacroResolver & Used with replace elements in collection configuration files, replaces a macro or string with another string, metadata or text from a dictionary\\
    20162015GS2MacroResolver & MacroResolver for GS2 collections, that uses the GDBM database\\
    20172016Misc & miscellaneous functions\\
     
    20402039\subsection{Creating new services}\label{sec:new-services}
    20412040
    2042 *inherit from ServiceRack - abstract base class. this handles the main process method, determines the service name and request type. if request type is describe, and to is empty, it returns a list of services (short\_service\_info) which is initialised in the configure method. a describe request to a particular service results in getServiceDescription being called, which must be supplied by the subclass.
     2041*inherit from ServiceRack - abstract base class. this handles the main process method, determines the service name and request type. if request type is describe, and to is empty, it returns a list of services (short\_service\_info) which is initialized in the configure method. a describe request to a particular service results in getServiceDescription being called, which must be supplied by the subclass.
    20432042other request types (process) get sent to processXXX methods, where XXX is the service name.
    20442043
     
    20622061
    20632062Java GUI Interface: There are couple of alternatives. Depending on what you want to display in the GUI, you could talk to either a Receptionist or a MessageRouter. The library classes can be set up and compiled into the GUI program.
    2064 Talking to a Receptionist will give you access to pages of XML. It is likely that the standard Receptionist class would be used - this doesn't transform the data to HTML. Queries such as ``give me the home page of a collection'' and ``do the following search'' can be issued. All teh data needed for the result view is returned. Queries are quite simple, but are limited to what kinds of Actions are available in the library.
    2065 Talking to a MessageRouter requires a bit more effort on the part of the GUI program, but results in greater flexibility. The kinds of queries that can be issued are individual units of action, such as ``describe yourself'', ``search'', ``retrieve the content for this document''. More than one request may need to be made for a particular feature of the GUI. However you can ask for any combination of data available in the system, you are not relying on Actions. What you will implemenet though, may be a lot like the Action code in terms of request sequences.
    2066 
    2067 Interfaces in other programming languages: Because the communication is all XML based, other interfaces can talk to the Java library if a communication protocol is set up. This could be done using SOAP for example. LIke for Java GUI interfaces, the program could talk to a Receptionist or to a MessageRouter.
    2068 e.g. java interface. where you can interface to. MR vs Receptionist. diff receptionists. egs, handheld - using servlet, transforming recpt, but new set of XSLT java program other program - talk to recpt but just get back XML data for pages. java gui - just talk to MR, do all processing itself.
     2063Talking to a Receptionist will give you access to pages of XML. It is likely that the standard Receptionist class would be used - this doesn't transform the data to HTML. Queries such as ``give me the home page of a collection'' and ``do the following search'' can be issued. All the data needed for the result view is returned. Queries are quite simple, but are limited to what kinds of Actions are available in the library.
     2064Talking to a MessageRouter requires a bit more effort on the part of the GUI program, but results in greater flexibility. The kinds of queries that can be issued are individual units of action, such as ``describe yourself'', ``search'', ``retrieve the content for this document''. More than one request may need to be made for a particular feature of the GUI. However you can ask for any combination of data available in the system, you are not relying on Actions. What you will implement though, may be a lot like the Action code in terms of request sequences.
     2065
     2066Interfaces in other programming languages: Because the communication is all XML based, other interfaces can talk to the Java library if a communication protocol is set up. This could be done using SOAP for example. Like for Java GUI interfaces, the program could talk to a Receptionist or to a MessageRouter.
     2067e.g. Java interface. where you can interface to. MR vs Receptionist. different receptionists. e.g., handheld - using servlet, transforming recpt, but new set of XSLT Java program other program - talk to recpt but just get back XML data for pages. Java gui - just talk to MR, do all processing itself.
    20692068
    20702069Remote interfaces: remote interfaces can be set up in the same way as above, using a communication protocol between the interface, and the library program.
     
    20772076\subsection{New types of collections}\label{sec:new-coll-types}
    20782077
    2079 There are two types of standard \gs\  collections: collections built with the \gsiii\  building system, and collections that are imported from \gsii\ . There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\  has an ability to use any type of collection you can come up with, assuming  some java code is provided.
    2080 
    2081 There are four levels of customisation that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\  to describe these different levels.
     2078There are two types of standard \gs\  collections: collections built with the \gsiii\  building system, and collections that are imported from \gsii\ . There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\  has an ability to use any type of collection you can come up with, assuming  some Java code is provided.
     2079
     2080There are four levels of customization that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\  to describe these different levels.
    20822081
    20832082Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the \gsii\  MGPP collections were the first to be served in \gsiii\ . When we came to do \gsii\  MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
     
    21452144
    21462145The classic interface was created to be used by this site (and is now a standard part of Greenstone).
    2147 In many cases, creating a new interface just requires the new images and XSLT  to be added to the new directory(see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This classic interface required a bit more customisation.
     2146In many cases, creating a new interface just requires the new images and XSLT  to be added to the new directory(see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This classic interface required a bit more customization.
    21482147
    21492148The standard \gsiii\  navigation bar lists all the services available for the collection. In \gsii\ , the navigation bar provides the search option, and the different classifiers. This is not service specific, but hard coded to the search and classifiers. The XSLT that produces the navigation bar needed to be altered to produce this. But also, a new Receptionist was needed.
    21502149The standard receptionist (DefaultReceptionist) gathers a little bit of extra information for each page of XML before transforming it: this is the list of services for the collection and their display information, allowing the services to be listed along the navigation bar. This is information that is needed by every page (except for the library home page) and therefore is obtained by the receptionist instead of by each action. The nzdl interface needed a bit more information than this: for the ClassifierBrowse service, if there was one, the list of classifiers and their display elements must be obtained. So a new Receptionist (NZDLReceptionist) was written that inherited from DefaultReceptionist, and added this new info into the page.
    21512150
    2152 One of the servlet initialisation parameters is the receptionist class: this was added to the servlet definition in the web.xml file so that the LibraryServlet would load up the right receptionist class.
     2151One of the servlet initialization parameters is the receptionist class: this was added to the servlet definition in the web.xml file so that the LibraryServlet would load up the right receptionist class.
    21532152
    21542153
     
    22492248
    22502249
    2251 Grenstone sets up Tomcat to run on port 8080 by default. To change this, you can edit the tomcat.port property in build.properties. If you do this before installing Greenstone, then running 'ant install' will use the new port number. If you want to change it later on, shutdown tomcat, run 'ant reconfigure-server-settings', then when you restart tomcat it will use the new port.
     2250Greenstone sets up Tomcat to run on port 8080 by default. To change this, you can edit the tomcat.port property in build.properties. If you do this before installing Greenstone, then running 'ant install' will use the new port number. If you want to change it later on, shutdown tomcat, run 'ant reconfigure-server-settings', then when you restart tomcat it will use the new port.
    22522251
    22532252Note: Tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:
     
    22592258\item any classes or jar files used by the servlets
    22602259\end{bulletedlist}
    2261 \noindent Note: stdin and stdout for the servlets (on linux) both go to\\
     2260\noindent Note: stdin and stdout for the servlets (on Linux) both go to\\
    22622261\gst{\gsdlhome/packages/tomcat/logs/catalina.out}
    22632262
     
    22732272\end{gsc}\end{quote}
    22742273
    2275 By default, Tomcat allows directory listings. To disable this, change the 'listings' paramter to false in the default servlet definition, in Tomcat's web.xml file (\gst{\$GSDL3HOME/packages/tomcat/conf/web.xml}):
     2274By default, Tomcat allows directory listings. To disable this, change the 'listings' parameter to false in the default servlet definition, in Tomcat's web.xml file (\gst{\$GSDL3HOME/packages/tomcat/conf/web.xml}):
    22762275
    22772276We have set the greenstone context to be reloadable. This means that if a class or resource file in web/WEB-INF/lib or web/WEB-INF/classes changes, the servlet will be reloaded. This is useful for development, but should be turned off for production mode (set the reloadable attribute to false).
     
    23002299\subsection{Running Tomcat behind a proxy}
    23012300
    2302 Almost everything works fine when Tomcat is running behind a proxy. The only time this causes trouble is if the servlet itself needs to make external http connections. We do this in the infomine demo collection for example. One of the service classes sends http requests to the infomine database at riverside. Since this is going through the proxy, a username and password is needed. It is not sufficient to prompt the user for a password because they are unlikely to have a password for the particular proxy that Tomcat is using. What we have done at present is to put a proxy element in the siteConfig.xml file. Here you have to enter a suitable username and password for the proxy server. Unfortunately these are entered in plain text. And the file is viewable via the servlet. So we need a better solution.
     2301Almost everything works fine when Tomcat is running behind a proxy. The only time this causes trouble is if the servlet itself needs to make external HTTP connections. We do this in the infomine demo collection for example. One of the service classes sends HTTP requests to the infomine database at riverside. Since this is going through the proxy, a username and password is needed. It is not sufficient to prompt the user for a password because they are unlikely to have a password for the particular proxy that Tomcat is using. What we have done at present is to put a proxy element in the siteConfig.xml file. Here you have to enter a suitable username and password for the proxy server. Unfortunately these are entered in plain text. And the file is viewable via the servlet. So we need a better solution.
    23032302
    23042303\newpage
    23052304\section{SOAP}\label{app:soap}
    23062305
    2307 Grenstone uses the Apache Axis SOAP implementation for distributed communications. Axis runs as a servlet inside Tomcat, and SOAP web services can be deployed by this Axis servlet. The Greenstone installation process sets up Axis for Tomcat, and predeploys the localsite web service.
     2306Greenstone uses the Apache Axis SOAP implementation for distributed communications. Axis runs as a servlet inside Tomcat, and SOAP web services can be deployed by this Axis servlet. The Greenstone installation process sets up Axis for Tomcat, and predeploys the localsite web service.
    23082307
    23092308To deploy a SOAP service for other sites, run \gst{ant soap-deploy-site}
     
    23982397\end{verbatim}\end{gsc}
    23992398
    2400 These two examples show how to deal with Greenstone 2's external link macros. The first one is for a 'relative' external link. In this case, the links are like URL's but they actually refer to Greenstone internal documents. So the Greensotne 3 link is to the document, but with parameter s0.ext signifying that the d argument will need translating before retrieving the content.
    2401 The second example is a truly external link. This is translated into a html type page action, where the url is presented as a frame along with the collection header in a separate frame.
     2399These two examples show how to deal with Greenstone 2's external link macros. The first one is for a 'relative' external link. In this case, the links are like URL's but they actually refer to Greenstone internal documents. So the Greenstone 3 link is to the document, but with parameter s0.ext signifying that the d argument will need translating before retrieving the content.
     2400The second example is a truly external link. This is translated into a HTML type page action, where the URL is presented as a frame along with the collection header in a separate frame.
    24022401
    24032402Sometimes we need to add in macros to be resolved in a second step: