Changeset 6335
- Timestamp:
- 2004-01-05T12:39:51+13:00
- File:
-
- 1 edited
trunk/gsdl3/docs/manual/manual.tex
r6312 r6335 15 15 16 16 % if you work on this manual, add your name here 17 \author{Katherine Don and Ian H. Witten \\[1ex]17 \author{Katherine Don, George Buchanan and Ian H. Witten \\[1ex] 18 18 Department of Computer Science \\ 19 19 University of Waikato \\ Hamilton, New Zealand \\ 20 \{kjdon, ihw\}@cs.waikato.ac.nz}20 \{kjdon, grbuchan, ihw\}@cs.waikato.ac.nz} 21 21 22 22 \date{} … … 32 32 reimplementation of the Greenstone digital library software. The current 33 33 version (Greenstone2) enjoys considerable success and is being widely used. 34 Greenstone3 will capitali ze on this success, and in addition it will34 Greenstone3 will capitalise on this success, and in addition it will 35 35 \begin{bulletedlist} 36 36 \item improve flexibility, modularity, and extensibility … … 40 40 self-documentation 41 41 \item make full use of existing XML-related standards and software 42 \item provide improved internationali zation, particularly in terms of sort order,42 \item provide improved internationalisation, particularly in terms of sort order, 43 43 information browsing, etc. 44 44 \item include new features that facilitate additional ``content management'' … … 54 54 A description of the general design and architecture of Greenstone3 is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory). 55 55 56 This documentation consists of several parts. Section~\ref{sec:install} covers greenstone installation, how to access the library, and some administration issues. Section~\ref{sec:user} looks at usi gn the sample collections, creating new collections, and how to make small customizations to the interface. The remaining sections are aimed towards the Greenstone developer. Section~\ref{sec:runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:buildtime} describes the collection building process. 
Section~\ref{sec:develop} describes how to add new features to Greenstone, such as how to add new services, new page types, new plugins for different document formats. Section~\ref{sec:distributed} describes how to make Greentsone run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install Greentsone from CVS, and a comparison of greenstone 2 and greenstone 3 format statements.56 This documentation consists of several parts. Section~\ref{sec:install} covers greenstone installation, how to access the library, and some administration issues. Section~\ref{sec:user} looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed towards the Greenstone developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to Greenstone, such as how to add new services, new page types, new plugins for different document formats. Section~\ref{sec:distributed} describes how to make Greenstone run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install Greenstone from CVS, and a comparison of greenstone 2 and greenstone 3 format statements. 57 57 58 58 \section{Greenstone installation and administration}\label{sec:install} 59 59 60 This section covers where to get Greenstone 3 from, how to install it and how to run it. The standard method of running Green tsone is as a Java servlet. We provide the Tomcat servlet container to serve the servlet :-). Standard web servers may be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. 
Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access Greenstone, tomcat must be started up, and then it can be accessed via a web browser. 60 This section covers where to get Greenstone 3 from, how to install it and how to run it. The standard method of running Greenstone is as a Java servlet. We provide the Tomcat servlet container to serve the servlet :-). Standard web servers may be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access Greenstone, Tomcat must be started, and the library can then be accessed via a web browser. 61 61 62 62 Greenstone is also available through CVS (Concurrent Versions System). This provides the absolute latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install Greenstone from CVS. … … 64 64 \subsection{Get and install Greenstone} 65 65 66 Greenstone is available from www.... There are cur ently two distributions: a self-installing tar for linux, and a Windows executable. 66 Greenstone is available from www.... There are currently two distributions: a self-installing tar for Linux, and a Windows executable. 67 67 68 68 \subsubsection{Linux} … … 96 96 Table~\ref{tab:dirs} shows the file hierarchy for Greenstone3. 97 97 The first part shows the common stuff which can be shared between 98 Greenstone users---the s rc, libraries etc. Under linux, these will eventually be installed into appropriate system directories. The second part shows 98 Greenstone users---the source, libraries etc. Under Linux, these will eventually be installed into appropriate system directories. The second part shows 99 99 stuff used by one person/group---their sites and interface setup (see Section~\ref{sec:sites-and-ints}). 100 100 etc. 
There can be several sites/interfaces per installation. … … 115 115 & c/ cpp source code---none yet \\ 116 116 gsdl3/packages 117 & Imported packages from other systems e g mg, mgpp\\117 & Imported packages from other systems e.g. MG, MGPP \\ 118 118 gsdl3/lib 119 119 & Shared library files\\ … … 123 123 & any resources that may be needed\\ 124 124 gsdl3/resources/java 125 & properties files for java resource bundles - used to handle all the language specific text This directory is on the class path, so any other Java resources can be placed here \\125 & properties files for java resource bundles - used to handle all the language specific text This directory is on the class path, so any other Java resources can be placed here \\ 126 126 gsdl3/resources/soap 127 127 & soap service description files \\ 128 128 gsdl3/resources/dtd 129 & Green tsone has trouble loading DTD files sometimes. They can go here\\129 & Greenstone has trouble loading DTD files sometimes. They can go here\\ 130 130 gsdl3/bin 131 131 & executable stuff lives here\\ 132 132 gsdl3/bin/script 133 & some perl building scripts\\133 & some Perl building scripts\\ 134 134 gsdl3/bin/linux 135 & linux executables for eg mgpp\\135 & Linux executables for e.g. MGPP\\ 136 136 gsdl3/bin/windows 137 & windows executables for e g mgpp\\137 & windows executables for e.g. MGPP\\ 138 138 gsdl3/comms 139 & Put some stuff here for want of a better place---things to do with servers and communication. e gsoap stuff, and tomcat servlet container\\139 & Put some stuff here for want of a better place---things to do with servers and communication. e.g. soap stuff, and tomcat servlet container\\ 140 140 gsdl3/docs 141 141 & Documentation :-)\\ … … 148 148 & Servlet classes go in here\\ 149 149 gsdl3/web/sites 150 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). 
The MR may have connections (e gsoap) to other sites\\150 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (e.g. soap) to other sites\\ 151 151 gsdl3/web/sites/localsite 152 152 & One site - the site configuration file lives here\\ … … 158 158 & Site specific transforms \\ 159 159 gsdl3/web/interfaces 160 & Contains directories for different interfaces - an interface is defined by its images and xsltfiles \\160 & Contains directories for different interfaces - an interface is defined by its images and XSLT files \\ 161 161 gsdl3/web/interfaces/default 162 162 & The default interface\\ … … 170 170 171 171 172 \subsection{Sites and interfaces}\label{s ites-and-ints}172 \subsection{Sites and interfaces}\label{sec:sites-and-ints} 173 173 174 174 local gs stuff (sites and interfaces) vs installed stuff (code)\\ 175 where they live, whats t ehdifference, what each contains.\\175 where they live, whats the difference, what each contains.\\ 176 176 177 177 There are two Greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to. … … 204 204 \subsection{Configuring a greenstone installation} 205 205 206 Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a config file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have serveral configuration files; these are discussed in Section~\ref{sec:collconfig}.207 The configuration files are read in when the system is initialised, and their contents are cached in memory. 
This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to t eh interface config file to take effect. However, changes to the site config file can be incorporated sending a cgi-type command to teh library. cgi command can be sent to the library are made to the interface config file, tomcat needs to be restartedThere are a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shutdown and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}. 206 Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}. 207 The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a CGI-type command to the library. A series of CGI-type commands can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}. 
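As a rough illustration of the CGI-type commands mentioned above, the two URLs below send a configure command to the MessageRouter and (re)activate a single collection. The host, port and servlet path (\gst{localhost:8080/gsdl3/library}) are assumptions about a default local Tomcat setup, and \gst{collname} is a placeholder collection name; only the arguments after \gst{library?} are taken from this manual.

\begin{gsc}\begin{verbatim}
http://localhost:8080/gsdl3/library?a=s&sa=c
http://localhost:8080/gsdl3/library?a=s&sa=a&st=collection&sn=collname
\end{verbatim}\end{gsc}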
208 208 209 209 \subsubsection{Site configuration file}\label{sec:siteconfig} … … 261 261 \subsubsection{Interface configuration file}\label{sec:interfaceconfig} 262 262 263 The interface config file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (but other ones can be loaded dynamically). If the interface uses servlets, it specifies what short name each action should use for the action cgi parameter eg QueryAction should use a=q. If the interface uses xslt, it specifies what xsltfile should be used for each action and subaction.263 The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (but other ones can be loaded dynamically). If the interface uses servlets, it specifies what short name each action should use for the action CGI parameter e.g. QueryAction should use a=q. If the interface uses XSLT, it specifies what XSLT file should be used for each action and subaction. 264 264 265 265 \begin{figure} … … 280 280 </interfaceConfig> 281 281 \end{verbatim}\end{gsc} 282 \caption{A sample interface config file}282 \caption{A sample interface configuration file} 283 283 \label{fig:ifaceconfig} 284 284 \end{figure} 285 285 286 This makes it easy for developers to implement and use different actions and/or xsltfiles without recompilation. The server must be restarted, however.286 This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however. 287 287 288 288 \subsection{Run-time re-initialisation}\label{sec:runtime-config} … … 290 290 should this section go in here, cos its kind of adminy, or go into the user stuff, cos you need to do it after building a collection??? 291 291 292 When tomcat is started up, the site and interface config files are read in, and actions/services/collections loaded as necessary. 
The configuration is then static unless tomcat is restarted, or re-configuration commands issued.293 294 There are several cgi-type commands that can be issued to tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configfile has changed, tomcat must be restarted for those changes to take effect. Similarly, if the java classes are modified, tomcat must be restarted then too.295 296 Currently, the runtime config commands can only be accessed by typing in cgi-arguments into the URL, there is no nice web form yet to do this.297 298 The cgiarguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in Greenstone, so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the arguments in a bit more detail.292 When tomcat is started up, the site and interface configuration files are read in, and actions/services/collections loaded as necessary. The configuration is then static unless tomcat is restarted, or re-configuration commands issued. 293 294 There are several CGI-type commands that can be issued to tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, tomcat must be restarted for those changes to take effect. 
Similarly, if the java classes are modified, tomcat must be restarted then too. 295 296 Currently, the runtime configuration commands can only be accessed by typing in CGI-arguments into the URL, there is no nice web form yet to do this. 297 298 The CGI arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in Greenstone, so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the arguments in a bit more detail. 299 299 300 300 \begin{table} … … 305 305 \gst{a=s\&sa=c\&sc=XXX} & reconfigures the XXX collection or cluster. \gst{ss} can also be used here, valid values are \gst{metadataList} and \gst{serviceList}. \\ 306 306 \gst{a=s\&sa=a} & (re)activate a specific module. Modules are specified using two arguments, \gst{st} (system module type) and \gst{sn} (system module name). Valid types are \gst{collection}, \gst{cluster} \gst{site}.\\ 307 \gst{a=s\&sa=d} & deactivate a module. \gst{st} and \gst{sn} can be used here too. Valid types are \gst{collection}, \gst{cluster}, \gst{site}, \gst{service}. Modules are removed from the current configuration, but will reappear if Tomcat is restar ed.\\307 \gst{a=s\&sa=d} & deactivate a module. \gst{st} and \gst{sn} can be used here too. Valid types are \gst{collection}, \gst{cluster}, \gst{site}, \gst{service}. Modules are removed from the current configuration, but will reappear if Tomcat is restarted.\\ 308 308 \gst{a=s\&sa=d\&sc=XXX} & deactivate a module belonging to the XXX collection or cluster. \gst{st} and \gst{sn} can be used here too. 
Valid types are \gst{service}. \\\end{tabular} 309 309 \end{table} … … 315 315 \subsection{Using a collection}\label{sec:usecolls} 316 316 317 A collection typically consists of a set of documents, which could be text, html, word, pdf, images, bibliographic records etc, along with some access methods, or services. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.317 A collection typically consists of a set of documents, which could be text, html, word, PDF, images, bibliographic records etc, along with some access methods, or services. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers. 318 318 Searching involves entering words or phrases and getting back lists of documents that contain those words. The search terms may be restricted to particular fields of the document. Browsing ... 319 319 320 In the standard interface that comes with Greenstone3\footnote{of course, this is all customi zable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.320 In the standard interface that comes with Greenstone3\footnote{of course, this is all customisable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}. 
321 321 322 322 \begin{figure}[h] … … 340 340 There are two ways to get a new collection into Greenstone 3. The first is to build it using the greenstone 3 building process. The second way is to import a greenstone 2 collection. 341 341 342 Collections live in the collect directory of a site. As described in Section~\ref{sec: rungs}, there can be several sites per greenstone installation. The collect directory is at \$GSDL3HOME/web/sites/site-name/collect, where site-name is the name of the site you want your new collection to belong to.343 344 The following two sections describe how to create a collection f orm scratch, or how to import a greenstone 2 collection. Once a collection has been built, the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, there is a CGI command to reload a collection which can also load a new one. Use the cgiarguments \gst{a=s\&sa=a\&st=collection\&sn=collname}---this tells the library program to reload the collname collection.342 Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per greenstone installation. The collect directory is at \$GSDL3HOME/web/sites/site-name/collect, where site-name is the name of the site you want your new collection to belong to. 343 344 The following two sections describe how to create a collection from scratch, and how to import a greenstone 2 collection. Once a collection has been built, the library server needs to be notified that there is a new collection. 
This can be accomplished in two ways\footnote{eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, there is a CGI command to reload a collection which can also load a new one. Use the CGI arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}---this tells the library program to reload the collname collection. 345 345 346 346 … … 350 350 351 351 how to build a collection, but none of the mechanisms of building. 352 talk a bit about config files? maybe just the parts that you use?? your changes should go into the next sections about configfiles, but they need to go here too.352 talk a bit about configuration files? maybe just the parts that you use?? your changes should go into the next sections about configuration files, but they need to go here too. 353 353 354 354 \subsubsection{Importing a greenstone 2 collection} … … 375 375 automatically. It also includes configuration information for any ServiceRacks needed by the collection. 376 376 377 The collection configuration file is where the collection designer (e g a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far. (Since collection building at this stage is still done using Greenstone2 perl scripts and the old \gst{collect.cfg} file, we have only defined the format for the parts of \gst{collectionConfig.xml} that are used by the runtime-system.)377 The collection configuration file is where the collection designer (e.g. 
a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far. (Since collection building at this stage is still done using Greenstone2 Perl scripts and the old \gst{collect.cfg} file, we have only defined the format for the parts of \gst{collectionConfig.xml} that are used by the runtime-system.) 378 378 379 379 Display elements for a collection or metadata for a document can be entered in any language---use lang='en' attributes to metadata elements to specify which language they are in. 380 380 381 config files need to be encoded in utf-8.381 configuration files need to be encoded in utf-8. 382 382 383 383 \begin{figure} … … 444 444 collection. The serviceRack names are Java classes that are loaded 445 445 dynamically at runtime. Any information inside the serviceRack element is 446 specific to that service---there is no set format. Figure~\ref{fig:buildconfig} shows an example. This config file specifies that the collection should load up 3 ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration. The collectionConfig.xml file is also passed otthe ServiceRack objects at configure time---the \gst{format} and \gst{displayItem} information is used directly from the \gst{collectionConfig.xml} file rather than added into \gst{buildConfig.xml} during building. This enables changes in \gst{collectionConfig.xml} to take effect in the collection without rebuilding being necessary.446 specific to that service---there is no set format. Figure~\ref{fig:buildconfig} shows an example. 
This configuration file specifies that the collection should load up 3 ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration. The collectionConfig.xml file is also passed to the ServiceRack objects at configure time---the \gst{format} and \gst{displayItem} information is used directly from the \gst{collectionConfig.xml} file rather than added into \gst{buildConfig.xml} during building. This enables changes in \gst{collectionConfig.xml} to take effect in the collection without rebuilding being necessary. 447 447 448 448 … … 503 503 In standard greenstone, the library is served to a web browser by a servlet, and the html is generated using XSLT. XSLT templates are used to format all the parts of the pages. Some commonly overwritten templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display. 504 504 505 Real XSL templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document.505 Real XSLT templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document. 
506 506 507 507 \begin{gsc}\begin{verbatim} … … 535 535 \gst{<gsf:link type='document'>...</gsf:link>} & Same as above\\ 536 536 \gst{<gsf:link type='classifier'>...</gsf:link>} & A link to a classification node (use in classifierNode templates)\\ 537 \gst{<gsf:link type='source'>...</gsf:link>} & The HTML link to the original file---set for documents that have been converted from e gWord, PDF, PS \\537 \gst{<gsf:link type='source'>...</gsf:link>} & The HTML link to the original file---set for documents that have been converted from e.g. Word, PDF, PS \\ 538 538 \gst{<gsf:icon/>} & An appropriate icon\\ 539 539 \gst{<gsf:icon type='document'/>} & same as above\\ 540 540 \gst{<gsf:icon type='classifier'/>} & bookshelf icon for classification nodes\\ 541 \gst{<gsf:icon type='source'/>} & An appropriate icon for the original file e gWord, PDF icon\\541 \gst{<gsf:icon type='source'/>} & An appropriate icon for the original file e.g. Word, PDF icon\\ 542 542 \gst{<gsf:metadata name='Title'/>} & The value of a metadata element for the current document or section, in this case, Title\\ 543 543 \gst{<gsf:metadata name='Title' select='select-type' [separator='y' multiple='true']/>} & A more extended selection of metadata values. The select field can be one of those shown in Table~\ref{tab:gsf-select-types}. There are two optional attributes: separator gives a String that will be used to separate the fields, default is ``, ``, and if multiple is set to true, looks for multiple values at each section.\\ … … 548 548 <gsf:metadata name='metaC'/> 549 549 </gsf:choose-metadata>} 550 & A choice of metadata. Will select the first existing one. the metadata elements can have the select, separator and multiple attrib tues like normal.\\550 & A choice of metadata. Will select the first existing one. 
the metadata elements can have the select, separator and multiple attributes like normal.\\ 551 551 \gst{<gsf:switch preprocess='preprocess-type'> 552 <gsf:metadata name='Title'/><gsf:when test='test-type' test-value='xxx'>.....</gsf:when><gsf:when test='test-type' test-value='xxx'>...</gsf:when><gsf:otherwise>...</gsf:otherwise></gsf:switch>} & switch on the value of a particular metadata - the metadata is specified in gsf:metadata, has the same att s as normal.\\552 <gsf:metadata name='Title'/><gsf:when test='test-type' test-value='xxx'>.....</gsf:when><gsf:when test='test-type' test-value='xxx'>...</gsf:when><gsf:otherwise>...</gsf:otherwise></gsf:switch>} & switch on the value of a particular metadata - the metadata is specified in gsf:metadata, has the same attributes as normal.\\ 553 553 \end{tabular} 554 554 \end{table} … … 558 558 Sometimes you may want to display metadata values for sections other than the current one. For example, in the mgppdemo collection, in a search list we display the Title of all the enclosing sections, followed by the Title of the current section, all separated by semi-colons. The display ends up looking something like: 559 559 Farming snails 2; Starting out; Selecting your snails 560 where Selecting your snails is the Title of the section in the results list, and Farming snails 2 and Starting out are the Titles of t eh enclosing sections. The select attribute is used to display metadata for sections other than the current one. Table~\ref{tab:gsf-select-types} shows the options available for this attribtue. The separator attribute is used here also, to specify the separating text.560 where Selecting your snails is the Title of the section in the results list, and Farming snails 2 and Starting out are the Titles of the enclosing sections. The select attribute is used to display metadata for sections other than the current one. Table~\ref{tab:gsf-select-types} shows the options available for this attribute. 
The separator attribute is used here also, to specify the separating text. 561 561 562 562 To get the previous metadata, the format statement would have the following in it: … … 588 588 This will display the dls.Title metadata if available, otherwise it will use the dc.Title metadata if available, otherwise it will use the Title metadata. If there are no values for any of these metadata elements, then nothing will be displayed. 589 589 590 The gsf:switch element allows different formatting depending on the value of a specified metadata element. For example, the foll woignswitch statement could be used to display a different icon for each document in a list depending on which organisation it came from.590 The gsf:switch element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organisation it came from. 591 591 592 592 \begin{gsc} … … 605 605 606 606 607 If none of the gsf elements meets your needs for formatting, xsltcan be entered directly into the format element, giving the collection designer full flexibility over how the collection appears.608 609 The collection specific templates are added into the configuration file \gst{collectionConfig.xml}. Any templates found in the xsltfiles can be overwritten.610 The important part to ad ing templates into the configfile is determining where to put them. Formatting templates cannot go just anywhere---there are standard places for them. Figure~\ref{fig:format-places} shows the positions that templates can occur.607 If none of the gsf elements meets your needs for formatting, XSLT can be entered directly into the format element, giving the collection designer full flexibility over how the collection appears. 608 609 The collection specific templates are added into the configuration file \gst{collectionConfig.xml}. 
Any templates found in the XSLT files can be overwritten. 610 The important part to adding templates into the configuration file is determining where to put them. Formatting templates cannot go just anywhere---there are standard places for them. Figure~\ref{fig:format-places} shows the positions that templates can occur. 611 611 612 612 \begin{figure} … … 645 645 </collectionConfig> 646 646 \end{verbatim}\end{gsc} 647 \caption{Places for format stat ments}647 \caption{Places for format statements} 648 648 \label{fig:format-places} 649 649 \end{figure} … … 660 660 661 661 There are also formatting instructions that are not templates but are options. 662 These are described in Table~\ref{tab:format_options}. They are entered into the config file like \gst{<gsf:option name='coverImages' value='false'/>}662 These are described in Table~\ref{tab:format_options}. They are entered into the configuration file like \gst{<gsf:option name='coverImages' value='false'/>} 663 663 664 664 \begin{table} … … 676 676 \end{table} 677 677 678 Note, format templates are added into the xslt files before transforming, while the options are added into the page source, and used in tests in the xslt.679 680 For local collections\footnote{and eventually remote collections} whole xslt files can be overridden. A collection can have a transform directory. Any xsltfiles in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different about page for the collection, you can put a new about.xsl into the collections transform directory, and this will be used instead. This is what we do for the Gutenberg sample collection.678 Note, format templates are added into the XSLT files before transforming, while the options are added into the page source, and used in tests in the XSLT. 679 680 For local collections\footnote{and eventually remote collections} whole XSLT files can be overridden. 
A collection can have a transform directory. Any XSLT files in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different about page for the collection, you can put a new about.xsl into the collections transform directory, and this will be used instead. This is what we do for the Gutenberg sample collection. 681 681 682 682 683 683 \subsection{Customising the interface} 684 684 685 The interface can be customi zed in several ways.685 The interface can be customised in several ways. 686 686 adding a new interface, adding a new language, \\ 687 687 changing the look and feel for an interface vs a site vs a collection\\ … … 693 693 The interface language can be changed by going to the preferences page, and choosing a language from the list. The list lists (:-)) all languages in which the interface has been defined so far. 694 694 695 It is easy to add a new interface language to greenstone. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in resources/java. Each interface has one named interface\_name.properties (where name is the interface name). Each service class has one with the same name as the class (e gGS2Search.properties). To add another language all of the base .properties files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of interface\_default.properties would be named interface\_default\_fr.properties.695 It is easy to add a new interface language to greenstone. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. 
These are plain text files consisting of key-value pairs, located in resources/java. Each interface has one named interface\_name.properties (where name is the interface name). Each service class has one with the same name as the class (e.g. GS2Search.properties). To add another language all of the base .properties files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of interface\_default.properties would be named interface\_default\_fr.properties. 696 696 697 697 Keys will be looked up in the properties file closest to the specified language. For example, if language fr\_CA was specified (french language, country Canada), and the default locale was en\_GB, java would look at properties files in the following order, until it found the key: XXX\_fr\_CA.properties, XXX\_fr.properties, XXX\_en\_GB.properties, then XXX\_en.properties, and finally the default XXX.properties. … … 710 710 A new interface may be needed if different instantiations of the library require different interfaces, or different developers want their own look and feel. Creating a new interface will allow modifications to be made while leaving the original one intact. 711 711 712 A new interface needs a directory in \$GSDL3HOME/web/interfaces, the name of this directory becomes the interface name. Inside, it needs images and transform directories, and an interfaceConfig.xml file. Any xslt may be overridden for a new interface by putting the replacement in the new transform directory. If the appropriate xslt file is not there, the one from the default interface will be used - this enables just overriding a few xsltfiles as needed.713 714 To use a new interface, the tomcat web.xml must be edited: either change the interface that a current version of the servlet is using, or add another servlet instantiation to the file (see Section~\ref{sec: tomcat}). 
The Tomcat server must be restarted for this to take effect.712 A new interface needs a directory in \$GSDL3HOME/web/interfaces, the name of this directory becomes the interface name. Inside, it needs images and transform directories, and an interfaceConfig.xml file. Any XSLT may be overridden for a new interface by putting the replacement in the new transform directory. If the appropriate XSLT file is not there, the one from the default interface will be used - this enables just overriding a few XSLT files as needed. 713 714 To use a new interface, the tomcat web.xml must be edited: either change the interface that a current version of the servlet is using, or add another servlet instantiation to the file (see Section~\ref{sec:sites-and-ints} or Appendix~\ref{app:tomcat}). The Tomcat server must be restarted for this to take effect. 715 715 716 716 … … 739 739 {\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters, communicators needed. All messages pass through the MessageRouter. Communication between remote sites is always done between MessageRouters, one for each site. 740 740 741 {\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e g all the building services may be part of a cluster. What is part of a cluster is specified by the site configfile. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection.741 {\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. 
A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, e.g. all the building services may be part of a cluster. What is part of a cluster is specified by the site configuration file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection. 742 742 Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different. 743 743 744 {\em ServiceRack}: these provide one or more services - they are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory. Services provide the core functionality for the system, e gsearching, retrieving documents, building collections etc.744 {\em ServiceRack}: these provide one or more services - they are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory. Services provide the core functionality for the system, e.g. searching, retrieving documents, building collections etc. 745 745 746 746 {\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer. 747 747 748 {\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. 
For example, a Receptionist may: modify the request in some way before sending it to t eh appropriate Action; add some data to the page responses that is common to all pages; transform the response into another form using XSLT for example. There is a hierarchy of different REceptionist types, which is described in Section~\ref{sec:recepts}.749 750 {\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the ' cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is returned to the Receptionist, which may transform it to HTML. The various actions are described in more detail in Section~\ref{sec:pagegen}.748 {\em Receptionist}: this is the point of contact for the 'front end'. Its core functionality involves routing requests to the Actions, but it may do more than that. For example, a Receptionist may: modify the request in some way before sending it to the appropriate Action; add some data to the page responses that is common to all pages; transform the response into another form using XSLT for example. There is a hierarchy of different Receptionist types, which is described in Section~\ref{sec:recepts}. 749 750 {\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. Based on the 'CGI' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is returned to the Receptionist, which may transform it to HTML. 
The various actions are described in more detail in Section~\ref{sec:pagegen}. 751 751 752 752 … … 760 760 761 761 The \gst{init()} method creates a new Receptionist and a new 762 MessageRouter. Default classes (DefaultReceptionist, MessageRouter) are used unless subclasses have been specified in the servlet initiation parameters (see Section~\ref{sec: tomcat}). The appropriate system variables are set for each object (interface 762 MessageRouter. Default classes (DefaultReceptionist, MessageRouter) are used unless subclasses have been specified in the servlet initiation parameters (see Section~\ref{sec:sites-and-ints}). The appropriate system variables are set for each object (interface 763 763 name, site name, etc.) and then \gst{configure()} is called on both. The MessageRouter handle 764 764 is passed to the Receptionist. The servlet then communicates only with … … 766 766 767 767 The Receptionist reads in the \gst{interfaceConfig.xml} file, and loads up all the different Action classes. Other Actions may be loaded on the fly as needed. Actions are added to a map, with shortnames for keys. For example, the QueryAction is added with key 'q'. The Actions are passed the MessageRouter reference too. 768 If the Receptionist is a TransformingReceptionist, a mapping between shortnames and xsltfile names is also created. 768 If the Receptionist is a TransformingReceptionist, a mapping between shortnames and XSLT file names is also created. 769 769 770 770 The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. It creates a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}). 771 Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object.
Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module. 771 Each ServiceRack specified in the configuration file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. Each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module. 772 772 ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList. 773 773 For each site specified, the MessageRouter creates an appropriate type of Communicator object. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the mapping with its site name as a key. The site's collections, services and clusters will also be added into the static XML lists. If the server for the remote site is not running, the site will not be included in the siteList or module map. To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure-sites command must be sent (see next section). … … 777 777 The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml} 778 778 files, determines the metadata, and loads ServiceRack classes based on the 779 names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed in to the ServiceRacks.
Any format or display information that the services need must be extracted from the collection config file.780 Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList xml.779 names specified in \gst{buildConfig.xml\/}. The \gst{<serviceRack>} XML element is passed to the object to be used in configuration. The \gst{collectionConfig.xml} contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection configuration file. 780 Collection objects are added to the module map with their name as a key, and also a collection element is added into the collectionList XML. 781 781 782 782 … … 784 784 \subsection{Message passing} 785 785 786 Action in Greenstone 3 is originated by a request coming in from the outside. In the standard web-based greenstone, this comes from a servlet into the receptionist. This external type request is a request for a page of data, and contains a representation of the cgi style args. A page of XML is returned, which can be in HTML format or other depending on the output parameter to the request. Messages inside the system all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned.787 788 When a page request comes in to t eh REceptionist, it looks at the action attribtue to determine which action to send it to. The response is returned from the action.The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). THe data may be transformed in some way --- for the servlet greenstone we transform using xsltto generate html pages which get returned to the servlet.789 790 Actions send internal style messages to the M EssageRouter. 
Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, retrieve document text. 786 Action in Greenstone 3 is originated by a request coming in from the outside. In the standard web-based greenstone, this comes from a servlet into the receptionist. This external type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML format or other depending on the output parameter to the request. Messages inside the system all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned. 787 787 788 When a page request comes in to the Receptionist, it looks at the action attribute to determine which action to send it to. The response is returned from the action. The page that the receptionist returns contains the original request, the response from the action and other info as needed (depends on the type of Receptionist). The data may be transformed in some way --- for the servlet greenstone we transform using XSLT to generate html pages which get returned to the servlet. 789 789 790 Actions send internal style messages to the MessageRouter. Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, retrieve document text. 791 791 There are different request types: describe, process, system... 792 792 … … 798 798 799 799 request: 800 These are the special 'external'-style messages. Requests originate from outside Greenstone, for example from a servlet, or java application. They are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document.
They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the ' cgi' arguments in a Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.800 These are the special 'external'-style messages. Requests originate from outside Greenstone, for example from a servlet, or java application. They are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters. 801 801 802 802 Here are some examples of requests\footnote{In a servlet context, these correspond to the URLs \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}: … … 845 845 & & but no processing of the results is done \\ 846 846 & & currently only used in process actions \\ 847 o & output type & xml, html, wml\\847 o & output type & XML, html, WML \\ 848 848 l & language & en, fr, zh ...\\ 849 849 d & document id & HASHxxx \\ … … 949 949 950 950 This collection provides many typical services. Notice how this response lists the services available, while the collection configuration file for this collection (Figure~\ref{fig:collconfig}) described serviceRacks. Once the service racks have been configured, they become transparent in the system, and only services are referred to. 951 There are three document retrieval services, for structural information, metadata, and content. The Classifier services retrieve classification structure and metadata. These five services were all provided by the GS2MGPPRetrieve ServiceRack. 
T He three query services were provided by GS2MGPPSearch serviceRack, ansprovide different kinds of query interface. The last service, PhindApplet, is provided by the PhindPhraseBrowse serviceRack and is an applet service.951 There are three document retrieval services, for structural information, metadata, and content. The Classifier services retrieve classification structure and metadata. These five services were all provided by the GS2MGPPRetrieve ServiceRack. The three query services were provided by GS2MGPPSearch serviceRack, and provide different kinds of query interface. The last service, PhindApplet, is provided by the PhindPhraseBrowse serviceRack and is an applet service. 952 952 953 953 A \gst{describe} request sent to a service returns a list of parameters that … … 1117 1117 \subsubsection{'system'-type messages}\label{sec:system} 1118 1118 1119 ``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently they are initiated by particular cgiparameters (see Section~\ref{sec:runtime-config}).1119 ``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. 
If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. Currently they are initiated by particular CGI parameters (see Section~\ref{sec:runtime-config}). 1120 1120 1121 1121 The basic format of a system request is as follows: … … 1167 1167 1168 1168 The actual format statements are described in Section~\ref{sec:formatstmt}. They are templates written directly in XSLT, or in GSF, which stands for Greenstone Format, and is a simple XML representation of the more complicated XSLT templates. 1169 GSF style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format xml is transformed to xslt using xsltwith the config\_format.xsl stylesheet.1169 GSF style format statements need to be converted to proper XSLT. This is currently done by the Receptionist (but may be moved to an ActionHelper): the format XML is transformed to XSLT using XSLT with the config\_format.xsl stylesheet. 1170 1170 1171 1171 \subsection{'status'-type messages}\label{sec:status} … … 1185 1185 COMPLETED & 11 & the process has finished \\ 1186 1186 HALTED & 12 & the process has stopped \\ 1187 INFO & 20 & just an info message that doesn t imply anything \\1187 INFO & 20 & just an info message that doesn't imply anything \\ 1188 1188 \end{tabular} 1189 1189 \end{table} 1190 1190 1191 The following shows an example status request, along with two responses, the first a ' okbut continuing' response, and the second a 'successfully completed' response. The content of the status elements in the two responses is the output from the process since the last status update was sent back.1191 The following shows an example status request, along with two responses, the first a 'OK but continuing' response, and the second a 'successfully completed' response. 
The content of the status elements in the two responses is the output from the process since the last status update was sent back. 1192 1192 1193 1193 \begin{quote}\begin{gsc}\begin{verbatim} … … 1223 1223 \end{verbatim}\end{gsc}\end{quote} 1224 1224 1225 \subsubsection{process mes ages}1226 1227 Process requests and responses provide the major functionality of t ehsystem---these are the ones that do the actual work. The format depends on the service they are for, so I'll describe these by service.1225 \subsubsection{process messages} 1226 1227 Process requests and responses provide the major functionality of the system---these are the ones that do the actual work. The format depends on the service they are for, so I'll describe these by service. 1228 1228 1229 1229 Query type services TextQuery, FieldQuery, AdvancedFieldQuery (GS2MGSearch, GS2MGPPSearch), TextQuery (LuceneSearch) … … 1252 1252 Some requests have other content---for document retrieval, this would be a list of document identifiers to retrieve. For metadata retrieval, the content is the list of documents to retrieve metadata for. 1253 1253 1254 Responses vary depending on the type of request. The following sections look at hte process type requests and responses for each type of service.1254 Responses vary depending on the type of request. The following sections look at the process type requests and responses for each type of service. 1255 1255 1256 1256 \subsubsection{'query'-type services} … … 1302 1302 \end{verbatim}\end{gsc}\end{quote} 1303 1303 1304 The list of document identifiers includes some information about document type and node type. Currently, document types include \gst{simple}, \gst{paged} and \gst{hierarchy}. \gst{simple} is for single section documents, i.e. ones with no sub-structure. \gst{paged} is documents that have a single list of sections, while \gst{hierarchy} type documents have a hierarchy of nested sections. 
For \gst{paged} and \gst{hierarchy} type documents, the node type identifies wh ather a section is the root of the document, an internal section, or a leaf.1305 1306 The term list identifies, for each term in t ehquery, what its frequency in the collection is, how many documents contained that term, and a list of its equivalent terms (if stemming or casefolding was used).1304 The list of document identifiers includes some information about document type and node type. Currently, document types include \gst{simple}, \gst{paged} and \gst{hierarchy}. \gst{simple} is for single section documents, i.e. ones with no sub-structure. \gst{paged} is documents that have a single list of sections, while \gst{hierarchy} type documents have a hierarchy of nested sections. For \gst{paged} and \gst{hierarchy} type documents, the node type identifies whether a section is the root of the document, an internal section, or a leaf. 1305 1306 The term list identifies, for each term in the query, what its frequency in the collection is, how many documents contained that term, and a list of its equivalent terms (if stemming or casefolding was used). 1307 1307 1308 1308 \subsubsection{'browse'-type services} … … 1345 1345 \subsubsection{'retrieve'-type services} 1346 1346 1347 Retrieval services are special in that requests are not explici lty initiated by a user from a form on a web page, but are called from actions in response to other things. This means that their names are hard-coded into the Actions. DocumentContentRetrieve, DocumentStructureRetrieve and DocumentMetadataRetrieve are the standard names for retrieval services for content, structure, and metadata of documents. Requests to each of these include a list of document identifiers. Because these generally refer to parts of documents, the elements are called \gst{<documentNode>}. For the content, that is all that is required. 
For the metadata retrieval service, the request also needs parameters specifying what metadata is required. For structure retrieval services, requests need parameters specifying what structure or structural info is required.1347 Retrieval services are special in that requests are not explicitly initiated by a user from a form on a web page, but are called from actions in response to other things. This means that their names are hard-coded into the Actions. DocumentContentRetrieve, DocumentStructureRetrieve and DocumentMetadataRetrieve are the standard names for retrieval services for content, structure, and metadata of documents. Requests to each of these include a list of document identifiers. Because these generally refer to parts of documents, the elements are called \gst{<documentNode>}. For the content, that is all that is required. For the metadata retrieval service, the request also needs parameters specifying what metadata is required. For structure retrieval services, requests need parameters specifying what structure or structural info is required. 1348 1348 1349 1349 Some example requests and responses follow. … … 1448 1448 \end{verbatim}\end{gsc}\end{quote} 1449 1449 1450 Structure is returned inside a \gst{<nodeStructure>} element, while structural info is returned in a \gst{<nodeStructureInfo>} element. Possible values for str cuture parameters are as for browse services: \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendents}. Possible values for info parameters are \gst{numSiblings}, \gst{siblingPosition}, \gst{numChildren}.1450 Structure is returned inside a \gst{<nodeStructure>} element, while structural info is returned in a \gst{<nodeStructureInfo>} element. Possible values for structure parameters are as for browse services: \gst{ancestors}, \gst{parent}, \gst{siblings}, \gst{children}, \gst{descendents}. Possible values for info parameters are \gst{numSiblings}, \gst{siblingPosition}, \gst{numChildren}. 
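The structure and info parameter values listed above can be illustrated on a toy section tree. This is only a sketch in Python, not Greenstone's Java implementation: the \gst{Section} class, the node identifiers, and the helper names are hypothetical, and it assumes that \gst{numSiblings} excludes the node itself and that \gst{siblingPosition} counts from 1, neither of which is confirmed by the text.

```python
from typing import Dict, List, Optional


class Section:
    """A toy document section (hypothetical; not a Greenstone class)."""

    def __init__(self, node_id: str, parent: Optional["Section"] = None) -> None:
        self.node_id = node_id
        self.parent = parent
        self.children: List["Section"] = []
        if parent is not None:
            parent.children.append(self)


def structure(node: Section, relation: str) -> List[str]:
    """Return the node ids selected by one structure parameter value."""
    if relation == "parent":
        return [node.parent.node_id] if node.parent else []
    if relation == "ancestors":
        found = []
        current = node.parent
        while current is not None:
            found.append(current.node_id)
            current = current.parent
        return found[::-1]  # root first
    if relation == "siblings":
        if node.parent is None:
            return []
        return [s.node_id for s in node.parent.children if s is not node]
    if relation == "children":
        return [c.node_id for c in node.children]
    if relation == "descendents":  # spelling follows the parameter value
        found = []
        stack = list(reversed(node.children))
        while stack:  # depth-first, in document order
            current = stack.pop()
            found.append(current.node_id)
            stack.extend(reversed(current.children))
        return found
    raise ValueError("unknown structure value: " + relation)


def structure_info(node: Section) -> Dict[str, int]:
    """Return the three info values for a node (semantics are assumptions)."""
    siblings = node.parent.children if node.parent else [node]
    return {
        "numSiblings": len(siblings) - 1,             # assumed to exclude the node
        "siblingPosition": siblings.index(node) + 1,  # assumed to be 1-based
        "numChildren": len(node.children),
    }


# Made-up identifiers for a document with two sections and two subsections.
root = Section("HASH01")
s1 = Section("HASH01.1", root)
s2 = Section("HASH01.2", root)
s21 = Section("HASH01.2.1", s2)
s22 = Section("HASH01.2.2", s2)

print(structure(s22, "ancestors"))
print(structure_info(s22))
```

In the real system these values would come back inside \gst{<nodeStructure>} and \gst{<nodeStructureInfo>} elements rather than as Python lists and dictionaries.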
1451 1451 1452 1452 \subsubsection{'process'-type services}\label{sec:process} … … 1477 1477 \end{verbatim}\end{gsc}\end{quote} 1478 1478 1479 The \gst{code} attribute in the response specifies whether the command has been successfully started, whether it is still going, etc. (see Table~\ref{tab:status codes} for a list of currently used codes). The pid attribute specifies a process id number that can be used when querying the status of this process. The content of t eh status element is (currenlty) just the output from the process so far. Status messages, which are described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not. 1479 The \gst{code} attribute in the response specifies whether the command has been successfully started, whether it is still going, etc. (see Table~\ref{tab:status codes} for a list of currently used codes). The pid attribute specifies a process id number that can be used when querying the status of this process. The content of the status element is (currently) just the output from the process so far. Status messages, which are described in Section~\ref{sec:status}, are used to find out how the process is going, and whether it has finished or not. 1480 1480 1481 1481 \subsubsection{'applet'-type services} 1482 1482 1483 Applet-type services are those that process the data for an applet. A request consists only of a list of parameters, and the response contains an \gst{<appletData>} element that contains the XML data to be returned to t ehe applet. The format of this is entirely specific to the applet---there is no set format to the applet data. 1483 Applet-type services are those that process the data for an applet. A request consists only of a list of parameters, and the response contains an \gst{<appletData>} element that contains the XML data to be returned to the applet. The format of this is entirely specific to the applet---there is no set format to the applet data.
1484 1484 1485 1485 Here is an example request and response, used by the Phind applet: … … 1568 1568 * talk general first: get data, get format info, transform gsf->xsl. transfrom xml->html 1569 1569 1570 * state saving. the xslt files assume that args are saved somehow. This needs to be implemented outside Greenstone proper - we do this in the servlet, using something or other.1571 1572 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page-requests}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system.1570 * state saving. the XSLT files assume that arguments are saved somehow. This needs to be implemented outside Greenstone proper - we do this in the servlet, using something or other. 1571 1572 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page-requests}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the CGI-arguments to determine what requests need to be made to the system. 1573 1573 System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module. 1574 1574 … … 1583 1583 \end{verbatim}\end{gsc}\end{quote} 1584 1584 1585 * show config and describe whats its used for1586 1587 There are two main elements in the page: pageRequest, pageResponse. 
The pageRequest is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form. The pageResponse contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (e glibrary)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.1585 * show configuration and describe whats its used for 1586 1587 There are two main elements in the page: pageRequest, pageResponse. The pageRequest is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form. The pageResponse contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (e.g. library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization. 1588 1588 1589 1589 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. 1590 1590 1591 1591 1592 Once the xmlpage has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. 
A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are1592 Once the XML page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are 1593 1593 located in interfaces/default/transforms. Collections, sites and other interfaces 1594 1594 can override these files by having their own copy of the appropriate … … 1601 1601 The receptionist is the controlling module for the page generation part of greenstone. It has the job of loading up all the actions, and it knows about the message router it and the actions are supposed to talk to. It routes messages received to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example, adding to the page received back from the action any information that is common to all pages. 1602 1602 1603 There are different ways of providing an interface to greenstone, from web based cgistyle (using servlets) to Java GUI applications. 
These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.1604 1605 Receptionist: This is the most basic receptionist. The page it returns consists of the original request, and the response from the action it was sent to. Methods preProcessRequest, and postProcessPage are called on the request and page, respectively, but in this basic receptionist, they don t do anything.1606 1607 TransformingReceptionist: This extends Receptionist, and overwrites postProcessPage to transform the page using xslt. An xslt is listed for each action in the receptionists config file, and this is used to transform the page. First, some display information, and config information is added to the page. Then it is transformed using the specified xsltfor the action, and returned.1608 1609 WebReceptionist: The WebReceptionist extends TransformingR Eceptionist. It doesn't do much else except some argument conversion. To keep the url's short, parameters from the services are given shortnames, and these are used in the web pages.1610 1611 DefaultReceptionist: This extends WebReceptionist, and is the default one for greenstone 3 servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. T He receptionist sends a describe request to teh collection to get this, and appends it to teh page before transformation using xslt.1612 1613 NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. For a look-alike nzdl.org system, even more information is needed for each page, namely the list of classifiers available from t ehClassifierBrowse service.1603 There are different ways of providing an interface to greenstone, from web based CGI style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist. 
1604 1605 Receptionist: This is the most basic receptionist. The page it returns consists of the original request, and the response from the action it was sent to. Methods preProcessRequest, and postProcessPage are called on the request and page, respectively, but in this basic receptionist, they don't do anything. 1606 1607 TransformingReceptionist: This extends Receptionist, and overwrites postProcessPage to transform the page using XSLT. An XSLT is listed for each action in the receptionists configuration file, and this is used to transform the page. First, some display information, and configuration information is added to the page. Then it is transformed using the specified XSLT for the action, and returned. 1608 1609 WebReceptionist: The WebReceptionist extends TransformingReceptionist. It doesn't do much else except some argument conversion. To keep the URLs short, parameters from the services are given shortnames, and these are used in the web pages. 1610 1611 DefaultReceptionist: This extends WebReceptionist, and is the default one for greenstone 3 servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. The receptionist sends a describe request to the collection to get this, and appends it to the page before transformation using XSLT. 1612 1613 NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. For a look-alike nzdl.org system, even more information is needed for each page, namely the list of classifiers available from the ClassifierBrowse service. 1614 1614 1615 1615 By default, the LibraryServlet uses DefaultReceptionist. However, there is an init-param called receptionist which can be set to make the servlet use a different one. 1616 1616 1617 \subsubsection{ cgi args}1618 1619 T He args used by the page come from several sources. Receptionist uses a couple, actions use some and services. 
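The relationship between the basic Receptionist and the TransformingReceptionist described above can be sketched as follows. This is only an illustration in Python, not the Greenstone Java code: the snake\_case method names mirror preProcessRequest and postProcessPage, while the dictionary-based page, the \gst{actions} mapping and the \gst{transforms} mapping (a plain function standing in for an XSLT file) are assumptions made for the example.

```python
class Receptionist:
    """Basic receptionist: the page is the request plus the action's response."""

    def __init__(self, actions):
        # Hypothetical mapping: action name -> callable(request) -> response.
        self.actions = actions

    def pre_process_request(self, request):
        # Hook; the basic receptionist leaves the request unchanged.
        return request

    def post_process_page(self, page):
        # Hook; the basic receptionist leaves the page unchanged.
        return page

    def process(self, request):
        request = self.pre_process_request(request)
        response = self.actions[request["action"]](request)
        page = {"pageRequest": request, "pageResponse": response}
        return self.post_process_page(page)


class TransformingReceptionist(Receptionist):
    """Adds config and display data, then transforms the page per action."""

    def __init__(self, actions, transforms, config, display):
        super().__init__(actions)
        # Hypothetical stand-in for the per-action XSLT listed in the
        # receptionist's configuration file: action name -> callable(page).
        self.transforms = transforms
        self.config = config
        self.display = display

    def post_process_page(self, page):
        # Copy the page so the caller's dictionary is not modified.
        page = dict(page, config=self.config, display=self.display)
        transform = self.transforms[page["pageRequest"]["action"]]
        return transform(page)
```

A WebReceptionist or DefaultReceptionist would extend this in the same way, overriding one hook and delegating the rest to its parent.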
the receptionist and actions are treated as a whole so must not have conflicting args. GSParams class specifies all teh general basic args, and whether they should be saved or not. servlet has an init parameter params\_class, that specifies which params class to use - if subclass it. actions or receptionist may specify some new ones1620 1621 services may be created by different people, may be on a different site. cant g arantee no conflict with action params, or even with other services.1622 so service params are namespaced when they are put on the page. interface (recept and action) params wil have no namespace) the default namespace is s1 (service1) - any params that are for the service will be prefixed by this. eg the case paramfor a search will be put in the page as s1.case.1623 T He actions must now look for all the s1 params to send to tehservice.1624 1625 if there are two or more services combined on a page with a single submit button, they will use s1, s2, s3 etc as needed. the s param (service) will end up with a list eg s=TextQuery,MusicQuery, and the order of these determines the mapping order of teh namespaces, ie s1 will be TExtQuery, s2 MusicQuery.1626 1627 also talk abo tu saving args - save ones that GSParams says to save, and any service ones should always save.1617 \subsubsection{CGI arguments} 1618 1619 The arguments used by the page come from several sources. Receptionist uses a couple, actions use some and services. the receptionist and actions are treated as a whole so must not have conflicting arguments. GSParams class specifies all the general basic arguments, and whether they should be saved or not. servlet has an init parameter params\_class, that specifies which params class to use - if subclass it. actions or receptionist may specify some new ones 1620 1621 services may be created by different people, may be on a different site. cant guarantee no conflict with action params, or even with other services. 
1622 so service params are namespaced when they are put on the page. interface (recept and action) params will have no namespace) the default namespace is s1 (service1) - any parameters that are for the service will be prefixed by this. e.g. the case parameter for a search will be put in the page as s1.case. 1623 The actions must now look for all the s1 parameters to send to the service. 1624 1625 if there are two or more services combined on a page with a single submit button, they will use s1, s2, s3 etc as needed. the s parameter (service) will end up with a list e.g. s=TextQuery,MusicQuery, and the order of these determines the mapping order of the namespaces, ie s1 will be TextQuery, s2 MusicQuery. 1626 1627 also talk about saving arguments - save ones that GSParams says to save, and any service ones should always save. 1628 1628 1629 1629 \subsubsection{Page action} … … 1637 1637 \subsubsection{Query action} 1638 1638 1639 The basic urlis \gst{a=q\&s=TextQuery\&c=demo\&rt=d/r}.1639 The basic URL is \gst{a=q\&s=TextQuery\&c=demo\&rt=d/r}. 1640 1640 There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action. 1641 1641 For each page, the service description is requested from the service of the current collection (via a describe request). This is currently done every time the query page is 1642 1642 displayed, but should be cached. The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has all the parameters from the URL put into the parameter list. A list of document identifiers 1643 1643 is returned. 
A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of 1644 documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the xslt that wil be used to transform the page (Formatter object??). The service description and query result are combined into a page of xml, which is1644 documents, with a request for some of their metadata. Which metadata to retrieve is determined by looking through the XSLT that will be used to transform the page (Formatter object??). The service description and query result are combined into a page of XML, which is 1645 1645 transformed using \gst{basicquery.xsl} to produce the html page. 1646 1646 … … 1690 1690 \subsubsection{System action}\label{sec:system-action} 1691 1691 1692 SystemAction allows for manual reconfiguration of various components at run-time. There is no interactive web-page displaying the options, it merely turns a set of cgi arguments into an xmlsystem request. The response from a system request is a message which is displayed to the user.1692 SystemAction allows for manual reconfiguration of various components at run-time. There is no interactive web-page displaying the options, it merely turns a set of CGI arguments into an XML system request. The response from a system request is a message which is displayed to the user. 
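Looking back at the service parameter namespaces described under the CGI arguments heading, the mapping from s1, s2, and so on to the services listed in the s parameter can be sketched as below. This is a hedged illustration in Python rather than the actual GSParams or action code: the function name \gst{split\_params} and the \gst{tempo} parameter are hypothetical, and only the \gst{s1.case} form and the \gst{s=TextQuery,MusicQuery} ordering come from the text.

```python
def split_params(params):
    """Separate interface parameters from namespaced service parameters.

    params is a flat dict of CGI arguments. The order of the names in the
    's' argument decides which service owns the s1, s2, ... namespaces.
    """
    services = [name for name in params.get("s", "").split(",") if name]
    interface = {}
    per_service = {name: {} for name in services}
    for key, value in params.items():
        prefix, dot, rest = key.partition(".")
        # A service parameter looks like s<N>.<name>, with N starting at 1.
        if dot and prefix[:1] == "s" and prefix[1:].isdecimal():
            index = int(prefix[1:]) - 1
            if 0 <= index < len(services):
                per_service[services[index]][rest] = value
                continue
        # Everything else belongs to the receptionist and actions.
        interface[key] = value
    return interface, per_service
```

With \gst{s=TextQuery,MusicQuery}, a parameter \gst{s1.case} would be sent to TextQuery as \gst{case}, and any \gst{s2} parameter would go to MusicQuery.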
1693 1693 1694 1694 \begin{table} 1695 \caption{Configure cgiarguments}1695 \caption{Configure CGI arguments} 1696 1696 \label{tab:system-cgi} 1697 1697 \begin{tabular}{ll} … … 1723 1723 \bf Utility class & \bf Description\\ 1724 1724 ConfigVars & holds the servlet startup variables, including library name, site name, interface name, default language\\ 1725 Dictionary & wrapper around a Resource Bundle, providing strings with parameter\\1726 GSCGI & class to map between short name cgi args and long name request parameters \\1727 GSFile & class to create all Greenstone file paths e g used to locate configuration files, xsltfiles and collection data. \\1728 GSHTML & provides convenience methods for dealing with HTML, e gmaking strings HTML safe\\1725 Dictionary & wrapper around a Resource Bundle, providing strings with parameter\\ 1726 GSCGI & class to map between short name CGI arguments and long name request parameters \\ 1727 GSFile & class to create all Greenstone file paths e.g. used to locate configuration files, XSLT files and collection data. \\ 1728 GSHTML & provides convenience methods for dealing with HTML, e.g. making strings HTML safe\\ 1729 1729 GSPath & used to create, examine and modify message address paths\\ 1730 1730 GSStatus & some static codes for status messages\\ … … 1745 1745 how building actually works\\ 1746 1746 the building structure/architecture\\ 1747 modules api\\1748 1749 \section{Developing Greenstone 3: Adding new features} 1747 modules API\\ 1748 1749 \section{Developing Greenstone 3: Adding new features}\label{sec:new-features} 1750 1750 1751 1751 \subsection{Creating new services} 1752 1752 1753 *inherit from ServiceRack - abstract base class. this handles the main process method, determines hte service name and request type. if request type is describe, and to is empty, it returns a list of services (short\_service\_info) which is initialised in the configure method. 
a describe request to a particular service results in getServiceDescription being called, which must be supplied by the subclass.1753 *inherit from ServiceRack - abstract base class. this handles the main process method, determines the service name and request type. if request type is describe, and to is empty, it returns a list of services (short\_service\_info) which is initialised in the configure method. a describe request to a particular service results in getServiceDescription being called, which must be supplied by the subclass. 1754 1754 other request types (process) get sent to processXXX methods, where XXX is the service name. 1755 1755 … … 1764 1764 1765 1765 \subsection{new interfaces} 1766 e g java interface. where you can interface to. MR vs Receptionist. diff receptionists. egs, handheld - usign servlet, transforming recpt, but new set of xslts java program other prpgram - talk to recpt but just get back XML data for pages. java gui - just talk to MR, do all processing itself.1766 e.g. java interface. where you can interface to. MR vs Receptionist. diff receptionists. egs, handheld - using servlet, transforming recpt, but new set of XSLT java program other program - talk to recpt but just get back XML data for pages. java gui - just talk to MR, do all processing itself. 1767 1767 1768 1768 \subsection{Adding new classifiers} … … 1777 1777 1778 1778 We have created a second interface that can be seen at \gst{http://www.greenstone.org/greenstone3/nzdl}. There are some small differences between this and the standard greenstone interface. 1779 We created a new interface---called nzdl, put into the web/interfaces directory. It has a set of images and transform files like the standard interface. And most of the xsltfiles have been overridden.1780 1781 * Along the navigation bar, it has search and classifiers. The standard interface has each service along there. 
We needed to modify the nav bar xsltcode, but also we added a new receptionist.1782 interface fou dnat www...1779 We created a new interface---called nzdl, put into the web/interfaces directory. It has a set of images and transform files like the standard interface. And most of the XSLT files have been overridden. 1780 1781 * Along the navigation bar, it has search and classifiers. The standard interface has each service along there. We needed to modify the navigation bar XSLT code, but also we added a new receptionist. 1782 interface found at www... 1783 1783 what did we have to do to get this interface? 1784 1784 classifiers displayed instead of services, query services all have same button, hard coded query page. … … 1792 1792 \centering 1793 1793 \includegraphics[width=4in]{remote} %5.8 1794 \caption{A distributed digital library configuration runnin gover several servers}1794 \caption{A distributed digital library configuration running over several servers} 1795 1795 \label{fig:remote} 1796 1796 \end{figure} … … 1801 1801 1802 1802 We have used Apache SOAP for Java. This is run as a servlet in Tomcat. 1803 If you have obtained Greenstone through cvs, you wil need to install soap separatelly, describe in Appendix~\ref{app:soap-cvs}. Debugging soap is described in Appendix~\ref{app:soap-debug}.1803 If you have obtained Greenstone through CVS, you will need to install soap separately, describe in Appendix~\ref{app:soap-cvs}. Debugging soap is described in Appendix~\ref{app:soap-debug}. 1804 1804 1805 1805 1806 1806 \appendix 1807 1807 1808 \section{Tomcat} 1808 \section{Tomcat}\label{app:tomcat} 1809 1809 1810 1810 Tomcat is a servlet container. It is used to serve a Greenstone site using a servlet. … … 1840 1840 We have set up tomcat to disallow directory listings for everything in the docBase directory. 
To turn this back on, you need to edit Tomcat's default web.xml file (\$GSDL3HOME/comms/jakarta/tomcat/conf/web.xml): 1841 1841 1842 In the default servlet definition, change the 'listings' param to true.1843 1844 Tomcat uses a Manager to handle HTTP session information. This may be stored between restarts if possible. To use a persist ant session handling manager, uncomment the \gst{<Manager>} element in \gst{\$GSDL3HOME/comms/jakarta/tomcat/conf/server.xml}. For the default manager, session information is stored in the work directory: \gst{\$GSDL3HOME/comms/jakarta/tomcat/work/Standalone/localhost/gsdl3/SESSIONS.ser}. Delete this file to clear the cached session info.1842 In the default servlet definition, change the 'listings' parameter to true. 1843 1844 Tomcat uses a Manager to handle HTTP session information. This may be stored between restarts if possible. To use a persistent session handling manager, uncomment the \gst{<Manager>} element in \gst{\$GSDL3HOME/comms/jakarta/tomcat/conf/server.xml}. For the default manager, session information is stored in the work directory: \gst{\$GSDL3HOME/comms/jakarta/tomcat/work/Standalone/localhost/gsdl3/SESSIONS.ser}. Delete this file to clear the cached session info. 1845 1845 1846 1846 \subsection{Proxying tomcat with apache} … … 1857 1857 \end{gsc}\end{quote} 1858 1858 1859 In our example, THe greenstone 3 servlet can be accessed at \gst{http://www.greenstone.org/greenstone3/library}, instead of at \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, which is not publically accessible.1859 In our example, the greenstone 3 servlet can be accessed at \gst{http://www.greenstone.org/greenstone3/library}, instead of at \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, which is not publically accessible. 
1860 1860 1861 1861 \subsection{Running tomcat behind a proxy} … … 1872 1872 \end{gsc}\end{quote} 1873 1873 1874 This unpacks the soap distribution, adds a SOAP context to Tomcat's server.xml config file, and creates the file \gst{src/java/org/greenstone/gsdl3/SOAPServer.java} from \gst{src/java/org/greenstone/gsdl3/SOAPServer.java.in} (it has a place where gsdl3home needs to be added).1874 This unpacks the soap distribution, adds a SOAP context to Tomcat's server.xml configuration file, and creates the file \gst{src/java/org/greenstone/gsdl3/SOAPServer.java} from \gst{src/java/org/greenstone/gsdl3/SOAPServer.java.in} (it has a place where gsdl3home needs to be added). 1875 1875 It also tries to deploy the SOAP service, but this often doesn't work. You may need to run from a shell the following command: 1876 1876 … … 1925 1925 *** need to make sure building stuff is in here *** 1926 1926 1927 Greenstone 3 is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable, in fact it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes. Whats in CVS is quite different to what comes in a release. T He code needs to be compiled, and some files need editing...1927 Greenstone 3 is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable, in fact it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes. Whats in CVS is quite different to what comes in a release. The code needs to be compiled, and some files need editing... 1928 1928 1929 1929 To check out the greenstone code, use: … … 1936 1936 If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some older versions of CVS have trouble accessing this repository due to the port number being present. We are using version 1.11.1p1. 
1937 1937 1938 The software needs to be compiled and installed. The installation procedure has been semi-automated. The following sections describe installation under linux and windows. 1938 The software needs to be compiled and installed. The installation procedure has been semi-automated. The following sections describe installation under Linux and Windows. 1939 1939 1940 1940 \subsection{Linux install} … … 1960 1960 1961 1961 You shouldn't run install.bash twice. 1962 To update your installation, you can run update.bash - this updates your code from CVS, and re makes all the java stuff. 1962 To update your installation, you can run update.bash - this updates your code from CVS, and re-makes all the Java code. 1963 1963 1964 1964 \subsection{Windows install}