Changeset 13893


Timestamp:
2007-02-12T11:57:40+13:00
Author:
kjdon
Message:

updated first two sections

Location:
trunk/gsdl3/docs/manual
Files:
2 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r13284 r13893  
    88 
    99\newcommand{\gst}[1]{{\footnotesize \tt #1}}
    10 \newcommand{\gsdlhome}{\$GSDL3HOME}
    1110
    1211\newcommand{\gsii}{Greenstone 2}
     
    5756A description of the general design and architecture of \gsiii\  is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the docs/manual directory).
    5857
    59 This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customizations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
     58This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customizations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format. Section~\ref{sec:new-features} describes how to add new features to \gs, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
    6059\newpage
    6160\tableofcontents
     
    6564This section covers where to get \gsiii\  from, how to install it and how to run it. The standard method of running \gsiii\  is as a Java servlet. We provide the Tomcat servlet container to run the servlet. Standard web servers may  be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access \gsiii, Tomcat must be started up, and then it can be accessed via a web browser.
    6665
    67 Ant (Java's XML based build tool) is used for compilation, installation and running Greenstone. The build.xml file is the configuration file for the Greenstone project, and build.properties contains parameters that can be altered by the user.
    68 
    69 \subsection{Get and install \gs\ }
     66Ant (Java's XML based build tool) is used for compilation, installation and running Greenstone. The \gst{build.xml} file is the configuration file for the Greenstone project, and \gst{build.properties} contains parameters that can be altered by the user.
     67
     68\subsection{Get and install \gs\ }\label{sec:getandinstall}
    7069
    7170\gsiii\  is available for download from Sourceforge:\\
    72  \gst{https://sourceforge.net/projects/greenstone3}. There are Windows, Linux and Mac OS X releases. They consist of a ZIP/TAR file which should be unpacked. Please check and edit (if necessary) the installation properties in build.properties, then run 'ant install' in the greenstone3 directory. Please read the file README.txt for more detailed (and up to date) instructions.
    73 
    74 Greenstone 3 can be started by running 'ant start', and will be available at \gst{http://localhost:8080/greenstone3}\\
     71 \gst{https://sourceforge.net/projects/greenstone3}. There are Windows, Linux, and source releases. The binary releases are self-installing executables: download and run the file to install. A series of prompts will guide you through the installation process. The source release is a gzip'd tar file. Unzip and untar this, check build.properties, then run \gst{'ant install'} to configure and compile the code.
     72
     73The Greenstone 3 library can be launched by running the server program. This is accessible from the Start menu on Windows, or by running the \gst{gs3-server.sh/bat} script in the top level \gst{greenstone3} directory. This program will start up the Tomcat web server and launch a browser.
     74
     75Alternatively, you can start it up using Ant: run \gst{'ant start'}, which starts up Tomcat, then in a browser go to \gst{http://localhost:8080/greenstone3}\\
    7576(or \gst{http://your-computer-name:your-chosen-port/greenstone3}). \\
    76 This gets you to a welcome page containing  links to four servlets: the \gst{test} servlet (this allows you to check that Tomcat is running properly); the standard \gst{library} servlet which serves \gst{localsite} site with the \gst{default} interface; the \gst{classic} servlet which serves \gst{localsite} using the \gst{classic} or \gsii-style interface; the \gst{gateway} servlet, which serves \gst{gateway} site with the \gst{default} interface. The \gst{gateway} site uses a SOAP connection to communicate with \gst{localsite}, and demonstrates the library working in a distributed fashion.
     77This gets you to a welcome page containing links to four servlets: the \gst{test} servlet (this allows you to check that Tomcat is running properly); the standard \gst{library} servlet which serves \gst{localsite} site with the \gst{default} interface; the \gst{classic} servlet which serves \gst{localsite} using the \gst{classic} or \gsii-style interface; and the \gst{gateway} servlet, which serves \gst{gateway} site with the \gst{default} interface. The \gst{gateway} site uses a SOAP connection to communicate with \gst{localsite}, and demonstrates the library working in a distributed fashion. The SOAP connection is not enabled by default - to enable it, run \gst{'ant deploy-localsite'}.
    7778
    7879\gsiii\  is also available through CVS (Concurrent Versioning System). This provides the latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install \gsiii\  from CVS.
     
    8081\subsection{How the library works}
    8182
    82 The standard library program is a Java servlet. We use the Tomcat servlet container to present the servlets over the web. Tomcat takes CGI-style URLs and passes the arguments to the servlet, which processes these and returns a page of HTML. As far as an end-user is concerned, a servlet is a Java version of a CGI program. The interaction is similar: access is via a web browser,  using arguments in a URL.
     83The standard library program is a Java servlet. We use the Tomcat servlet container to present the servlets over the web. Tomcat takes CGI-style URLs and passes the arguments to the servlet, which processes these and returns a page of HTML. As far as an end-user is concerned, a servlet is a Java version of a CGI program. The interaction is similar: access is via a web browser, using arguments in a URL.
    8384
    8485Other types of interfaces can be used, such as Java GUI programs. See Section~\ref{sec:new-interfaces} for details about how to make these.
     
    8687\subsubsection{Restarting the library}
    8788
    88 The library program (actually Tomcat and MYSQL) can be restarted by running \gst{ant restart} in the greenstone3 directory.
    89 
    90 Tomcat must be restarted any time you make changes in the following for those changes to take effect:\\
     89You can restart Tomcat by clicking 'Restart Server' on the little server program. You should restart the server any time you make changes in the following for those changes to take effect:\\
    9190\begin{bulletedlist}
    9291\begin{gsc}
    93 \item \gsdlhome/web/WEB-INF/web.xml
    94 \item \gsdlhome/packages/tomcat/conf/server.xml
     92\item \$GSDL3HOME/WEB-INF/web.xml
     93\item \$GSDL3SRCHOME/packages/tomcat/conf/server.xml
    9594\end{gsc}
    9695\item any classes or jar files used by the servlets
    9796\end{bulletedlist}
    98 \noindent Note: stdout and stderr for the servlets (on Linux and Mac OS X) both go to\\
    99 \gst{\gsdlhome/packages/tomcat/logs/catalina.out}
    10097
    10198
    10299\subsection{Directory structure}
    103100
    104 Table~\ref{tab:dirs} shows the file hierarchy for \gsiii\ .
     101Table~\ref{tab:dirs} shows the file hierarchy for \gsiii.
    105102The first part  shows the common stuff which can be shared between
    106 \gs\  users---the source, libraries etc. The second part shows the file hierarchy for the greenstone3/web directory, which comprises the greenstone3 context for Tomcat, and is accessible via Tomcat. The main directories are for sites and interfaces: there can be several sites and interfaces per installation, and they are described in the following section.
    107 
     103\gs\  users---the source, libraries etc. The second part shows the file hierarchy for the web directory, which comprises the greenstone3 context for Tomcat, and is accessible via Tomcat. The main directories are for sites and interfaces: there can be several sites and interfaces per installation, and they are described in the following section.
     104
     105Two environment variables used by \gsiii\ are often mentioned in this manual: \gst{\$GSDL3SRCHOME} and \gst{\$GSDL3HOME}. \gst{\$GSDL3SRCHOME} refers to the top-level \gst{greenstone3} directory, while \gst{\$GSDL3HOME} refers to the \gst{web} directory. The web directory contains everything needed to serve the \gsiii\ library using Tomcat, and doesn't necessarily need to live with the rest of the \gsiii\ source.
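The relationship between the two variables can be sketched in a few lines of Python. This is only an illustration: the function name and the example install path are hypothetical, and it assumes the default layout in which the web directory sits directly inside the top-level greenstone3 directory (the text notes it does not have to).

```python
from pathlib import PurePosixPath


def greenstone_paths(src_home):
    """Derive the paths the manual refers to from the top-level directory.

    src_home stands in for $GSDL3SRCHOME. $GSDL3HOME is assumed to be the
    'web' directory inside it, which is the default layout only.
    """
    gsdl3srchome = PurePosixPath(src_home)
    gsdl3home = gsdl3srchome / "web"
    return {
        "GSDL3SRCHOME": gsdl3srchome,
        "GSDL3HOME": gsdl3home,
        # The two configuration files listed as needing a server restart.
        "web_xml": gsdl3home / "WEB-INF" / "web.xml",
        "server_xml": gsdl3srchome / "packages" / "tomcat" / "conf" / "server.xml",
    }


# Hypothetical install location, used only for illustration.
paths = greenstone_paths("/opt/greenstone3")
print(paths["web_xml"])
print(paths["server_xml"])
```

If the web directory has been moved elsewhere, only the `GSDL3HOME` entry (and the paths derived from it) would change.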
    108106
    109107\begin{table}
     
    116114\hline
    117115greenstone3
    118   & The main installation directory---gsdl3home can be changed to something more standard\\
     116  & The main installation directory---\$GSDL3SRCHOME is set to this directory \\
    119117greenstone3/src
    120118  & Source code lives here \\
     
    122120  & main greenstone 3 java source code \\
    123121greenstone3/src/packages
    124   & Imported source packages from other systems e.g. MG, MGPP \\
    125 greenstone3/extensions
    126   & Extensions to greenstone 3 core functionality, e.g., Vishnu visualizer, Alerting service \\
     122  & Imported source packages from other systems e.g. indexing packages may go here \\
    127123greenstone3/lib
    128124  & Shared library files\\
     
    138134  & executable stuff lives here\\
    139135greenstone3/bin/script
    140   & some Perl and/or shell building scripts\\
     136  & some Perl and/or shell scripts\\
    141137greenstone3/packages
    142   & External packages that may be installed as part of greenstone, e.g. Tomcat, MySQL \\
     138  & External packages that may be installed as part of greenstone, e.g. Tomcat \\
    143139greenstone3/docs
    144140  & Documentation\\
     141greenstone3/gli
     142  & \gs\ Librarian Interface code \\
     143greenstone3/gs2build
     144  & collection building code\\
    145145\hline
    146146greenstone3/web
    147   & This is where the web site is defined. Any static HTML files can go here. This directory is the Tomcat root directory.\\
     147  & This is where the web site is defined. Any static HTML files can go here. This directory is the root directory used by Tomcat when serving \gsiii. \$GSDL3HOME is set to this directory. \\
    148148greenstone3/web/WEB-INF
    149149  & The web.xml file lives here (servlet configuration information for Tomcat)\\
     
    168168greenstone3/web/interfaces/default/images
    169169  & The images for the default interface\\
     170greenstone3/web/interfaces/default/js
     171  & The javascript libraries for the default interface\\
     172greenstone3/web/interfaces/default/style
     173  & The CSS stylesheets for the default interface\\
    170174greenstone3/web/interfaces/default/transforms
    171175  & The XSLT files for the default interface\\
     
    179183\subsection{Sites and interfaces}\label{sec:sites-and-ints}
    180184
    181 [local gs stuff (sites and interfaces) vs installed stuff (code)\\
    182 where they live, whats the difference, what each contains.]\\
    183185Sites and interfaces contain the content and presentation information, respectively, for the digital library.
    184186A site is comprised of a set of collections and possibly some site-wide services. An interface (in this web-based servlet context) is a set of images along with a set of XSLT files used for translating xml output from the library into an appropriate form---HTML in general.
     
    186188One \gsiii\  installation can have many sites and interfaces, and these can be paired in different combinations.  One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance.  For example, a single site might be served with two different interfaces. This provides different modes of access to the same content. e.g. HTML vs WML, or perhaps providing a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
    187189
    188 Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialized will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
     190Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialized will be loaded up. Public collections will appear on the library home page, while private collections will be hidden. These can still be accessed by typing in cgi arguments. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
    189191
    190192There are two  sites that come with the distribution: \gst{localsite}, and \gst{gateway}. \gst{localsite} has several demo  collections, while \gst{gateway} has none. \gst{gateway} specifies that a SOAP connection should be made to \gst{localsite}. Getting this to work involves setting up a soap server for localsite: see Section~\ref{sec:distributed} for details.
     
    195197\subsection{Configuring Tomcat}\label{sec:tomcat-config}
    196198
    197 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the configuration information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
    198 There are four servlets specified in web.xml (these correspond to the four servlet links in the welcome page for \gsiii): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other three are the \gs\  library servlets described in Section~\ref{sec:browser-access}, \gst{library}, \gst{classic} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. Site\_name and interface\_name are just two examples of initialization parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.
     199The file \gst{\$GSDL3HOME/WEB-INF/web.xml} contains the configuration information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
     200There are four servlets specified in web.xml (these correspond to the four servlet links in the welcome page for \gsiii): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other three are the \gs\  library servlets described in Section~\ref{sec:getandinstall}, \gst{library}, \gst{classic} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. \gst{site\_name} and \gst{interface\_name} are just two examples of initialization parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.
    199201
    200202For more details about Tomcat see Appendix~\ref{app:tomcat}.
     
    204206\label{tab:serv-init}
    205207{\footnotesize
    206 \begin{tabular}{llp{5cm}}
     208\begin{tabular}{lp{3.5cm}p{6cm}}
    207209\hline
    208210\bf name & \bf sample value & \bf description \\
     
    210212library\_name & library & the web name of the servlet \\
    211213interface\_name & default & the name of the interface to use\\
    212 site\_name & localsite & the name of the site to use (use either site\_name or the three remote\_site parameters)\\
     214site\_name & localsite & the name of the local site to use (use either site\_name or the three remote\_site parameters)\\
    213215remote\_site\_name & org.greenstone.site1 & the name of a remote site (can be anything??) \\
    214216remote\_site\_type & soap & the type of server running on the site \\
    215 remote\_site\_address & http://www.greenstone.org/greenstone3/services/localsite & The address of the server \\
     217remote\_site\_address & http://www.greenstone.org/ greenstone3/services/ localsite & The address of the server \\
    216218default\_lang & en & the default language for the interface\\
    217219receptionist\_class & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\
     
    224226\subsection{Configuring a \gs\ library}\label{sec:config}
    225227
    226 Initial \gsiii\  system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
     228Initial \gsiii\  system configuration is determined by a set of XML configuration files. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies parameters for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
    227229The configuration files are read in when the system is initialized, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a system command to the library.  There are a series of system commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
    228230
    229231\subsubsection{Site configuration file}\label{sec:siteconfig}
    230232
    231 The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of
     233The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any \gst{ServiceClusters} that the site provides (for example, collection building), any \gst{ServiceRacks} that do not belong to a cluster or collection, and a list of
    232234known external sites to connect to.  Collections are not specified in the site
    233235configuration file, but are determined by the contents of the site's
    234 collections directory.
     236collect directory.
    235237
    236238The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible through Tomcat, any files (e.g. images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat greenstone3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML. Also, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}.
     
    279281\end{figure}
    280282
     283Another element that can appear in a site configuration file is \gst{replaceList}. This must have an \gst{id} attribute, and may contain one or more \gst{replace} elements. Replace elements are discussed in Section \ref{sec:collconfig}. The list found in a \gst{siteConfig.xml} file can be applied to any collection by adding a \gst{replaceListRef} element (with the appropriate \gst{id} attribute) to its \gst{collectionConfig.xml} file.
     284
    281285\subsubsection{Interface configuration file}\label{sec:interfaceconfig}
    282286
    283 The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library URLs for the a (action) parameter) e.g. QueryAction should use a=q. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
    284 
    285 It also lists all the languages that the interface text files have been translated into. These have a \gst{name} attribute, which is the ISO code for the language, and a \gst{displayElement} which gives the language name in that language (note that this file should be encoded in UTF-8). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a \gsiii\  library are shown in Section~\ref{sec:interface-customise}.
    286 
     287The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library URLs for the a (action) parameter) e.g. QueryAction should use \gst{a=q}. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
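To make the action mapping concrete, the following Python sketch reads an abbreviated configuration and builds a table from short action name to class, XSLT file and subactions. The sample XML reuses a few `action` elements from the figure in this changeset. The helper name `action_table` is hypothetical, and this is not how \gsiii\ itself loads the file; it only illustrates the structure.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample; the action elements are copied from the manual's figure.
SAMPLE = """<interfaceConfig>
  <actionList>
    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
    <action name='pr' class='ProcessAction' xslt='process.xsl'/>
    <action name='s' class='SystemAction' xslt='system.xsl'/>
    <action name='g' class='GeneralAction'>
      <subaction name="berry" xslt='berry.xsl'/>
    </action>
  </actionList>
</interfaceConfig>"""


def action_table(config_xml):
    """Map each short action name to its class, XSLT file and subactions."""
    root = ET.fromstring(config_xml)
    table = {}
    for action in root.findall("./actionList/action"):
        table[action.get("name")] = {
            "class": action.get("class"),
            "xslt": action.get("xslt"),  # None when only subactions carry XSLT
            "subactions": {
                sub.get("name"): sub.get("xslt")
                for sub in action.findall("subaction")
            },
        }
    return table


table = action_table(SAMPLE)
print(table["q"]["class"])
print(table["g"]["subactions"])
```

A URL containing `a=q` would therefore be handled by the class listed under `q`, with the page produced by the XSLT file recorded next to it.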
     288
     289It also lists all the languages that the interface text files have been translated into. These have a \gst{name} attribute, which is the ISO code for the language, and a \gst{displayElement} which gives the language name in that language (note that this file should be encoded in UTF-8). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a \gsiii\  library are shown in Section~\ref{sec:interface-language}.
     290
     291An \gst{optionList} element can be used to disable or enable some optional functionality for the interface. Currently there are three options that can be enabled:
     292
     293\begin{tabular}{lp{7cm}}
     294highlightQueryTerms & Whether search term highlighting is available or not\\
     295berryBaskets & Whether berry basket functionality is available or not\\
     296displayAnnotationService & Whether any annotation services (specified in the site config file) should be displayed with a document or not. \\
     297\end{tabular}
     298
     299An interface may be based on an existing one, for example, the classic interface is based on the default interface. This means that it will use any images or templates from the base one unless overridden in the current one. The \gst{baseInterface} attribute of the \gst{<interfaceConfig>} element is used to specify the base interface.
     300 
    287301\begin{figure}
    288302\begin{gsc}\begin{verbatim}
     
    294308      <subaction name='help' xslt='help.xsl'/>
    295309      <subaction name='pref' xslt='pref.xsl'/>
     310      <subaction name='nav' xslt='nav.xsl'/><!-- used for the
     311            collection header frame -->
     312      <subaction name="html" xslt="html.xsl"/> <!-- used to put an
     313            external page into a frame with a collection header-->
    296314    </action>
    297315    <action name='q' class='QueryAction' xslt='basicquery.xsl'/>
     
    305323    <action name='pr' class='ProcessAction' xslt='process.xsl'/>
    306324    <action name='s' class='SystemAction' xslt='system.xsl'/>
     325    <action name='g' class='GeneralAction'>
     326      <subaction name="berry" xslt='berry.xsl'/>
     327    </action>
    307328  </actionList>
    308329  <languageList>
     
    317338    </language>
    318339  </languageList>
     340  <optionList>
     341    <option name="highlightQueryTerms" value="true"/>
     342    <option name="berryBaskets" value="true"/>
     343  </optionList>
    319344</interfaceConfig>
    320345\end{verbatim}\end{gsc}
     
    332357Currently, the runtime configuration commands can only be accessed by typing arguments into the URL; there is no nice web form yet to do this.
    333358
    334 The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in \gs, so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
     359The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
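As an illustration of how these arguments combine, here is a small Python sketch that assembles the URLs. The helper name `system_command_url` and the collection name `demo` are hypothetical. The base address assumes the default `library` servlet on `localhost:8080` as described in the installation section; substitute your own host, port and servlet name.

```python
from urllib.parse import urlencode

# Subaction codes for the three command types described above.
SUBACTIONS = {"configure": "c", "activate": "a", "deactivate": "d"}


def system_command_url(command, target=None,
                       base="http://localhost:8080/greenstone3/library",
                       **extra):
    """Build a runtime configuration URL.

    command: 'configure', 'activate' or 'deactivate'
    target:  optional collection or cluster name, sent as sc=...
    extra:   further arguments, e.g. ss='collectionList'
    """
    params = {"a": "s", "sa": SUBACTIONS[command]}
    if target is not None:
        params["sc"] = target
    params.update(extra)
    return base + "?" + urlencode(params)


# Reconfigure the whole site (request goes to the MessageRouter).
print(system_command_url("configure"))
# Send the configure request to a collection named 'demo' instead.
print(system_command_url("configure", target="demo"))
# Reload only the collection list.
print(system_command_url("configure", ss="collectionList"))
```

Typing the resulting address into a browser has the same effect as entering the arguments by hand after `library?`.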
    335360
    336361\begin{table}
     
    338363\label{tab:run-time config}
    339364{\footnotesize
    340 \begin{tabular}{lp{8cm}}
     365\begin{tabular}{lp{9cm}}
    341366\hline
    342367\gst{a=s\&sa=c} & reconfigures the whole site. Reads in siteConfig.xml, reloads all the collections. Just part of this can be specified with another argument \gst{ss} (system subset). The valid values are \gst{collectionList}, \gst{siteList}, \gst{serviceList}, \gst{clusterList}. \\
     
    360385Browsing involves navigating pre-defined hierarchies of documents, following links of interest to find documents. The hierarchies may be constructed on different metadata fields, for example, alphabetical lists of Titles, or a hierarchy of Subject classifications. Clicking on a bookshelf icon takes you to a lower level in the hierarchy, while clicking on a book or page icon takes you to a document.
    361386
    362 In the standard interface that comes with \gsiii\ \footnote{of course, this is all customizable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
     387In the standard interface that comes with \gsiii\ \footnote{of course, this is all customizable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's 'about' page. The standard page banner for a collection looks something like that shown in Figure~\ref{fig:page-banner}.
    363388
    364389\begin{figure}[h]
     
    377402\subsection{Building a collection}\label{sec:buildcol}
    378403
    379 There are three ways to get a new collection into \gsiii. The first is to build it using the \gsiii\ command line building process. The second way is to use the Greenstone Librarian Interface to build a new collection. This creates a collection in a \gsiii\ context, but uses the \gsii\ Perl collection building process. The third way is to import a pre-built \gsii\  collection.
    380 
    381 Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\  installation. The collect directory is at \gst{\$GSDL3HOME/web/sites/site-name/collect}, where site-name is the name of the site you want your new collection to belong to.
    382 
    383 The following three sections describe how to create a collection from scratch, using command line and GLI building, and how to import a \gsii\  collection. Once a collection has been built (and is located in the collect directory), the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{and eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, an activate collection command can be issued to the servlet, using the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}, where \gst{collname} should be replaced with the collection name---this tells the library program to (re)load the \gst{collname} collection.
    384 
    385 
    386 \subsubsection{Creating a collection from scratch}
    387 
    388 To create the director
    389 Building native \gsiii\  collections is done using the \gst{gs3-build.sh/bat} script, with the \gst{collectionConfig.xml} file controlling how the building is done.  There are a number of considerations in building a collection:  what documents appear in the collection, how they are indexed for searching, which classifications are used for browsing, etc.
    390 
    391 Firstly, the documents that comprise the collection should be placed in the import subdirectory.  At present, only documents in this directory will appear in the collection. Documents can be organized into sub folders inside the import directory.
    392 [TODO: describe the kinds of documents that can be added, something about METS files?]
    393 
    394 Metadata for documents can be added using metadata.xml files.  These files have already been used in \gsii, and the format is the same in \gsiii.  A metadata.xml file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the metadata.xml, or in one of its child directories, will be selected.  The filename tag encloses the regular expression as text, e.g.:
     404There are three ways to get a new collection into \gsiii. The most common way is to use the Greenstone Librarian Interface to create a collection. If you have existing collections in a \gsii\ installation, these can be imported into \gsiii. Thirdly, you can use the Perl command line building scripts directly.
     405
     406Collections live in the \gst{collect} directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\  installation. The collect directory is at \gst{\$GSDL3HOME/sites/site-name/collect}, where site-name is the name of the site you want your new collection to belong to.
     407
     408The following three sections briefly  describe how to create a collection using GLI, how to import a collection from \gsii, and how to use command line building.  Once a collection has been built (and is located in the collect directory), the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{and eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, an activate collection command can be issued to the servlet, using the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}, where \gst{collname} should be replaced with the collection name---this tells the library program to (re)load the \gst{collname} collection.
     409
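As a sketch, the activate command could be sent by requesting a URL of the following form in a browser. The host, port and servlet path shown here (\gst{localhost:8080} and \gst{greenstone3/library}) are assumptions about a default local installation and are not taken from this manual; substitute the values for your own server and servlet, and replace \gst{collname} with the collection name.

\begin{gsc}
\begin{verbatim}
http://localhost:8080/greenstone3/library?a=s&sa=a&st=collection&sn=collname
\end{verbatim}
\end{gsc}
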
     410\subsubsection{Using the Librarian Interface}
     411
     412The Greenstone Librarian Interface (GLI) can be used to create collections. The procedure is the same as for \gsii, but it works in a \gsiii\  context. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 Digital Library menu in the Program Files section of the Start menu. On Linux, run \gst{ant gli} from the \gst{greenstone3} directory, or run \gst{./gli4gs3.sh} from the \gst{\$GSDL3SRCHOME/gli} directory.
     413
     414Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use \gsiii\ XML  configuration files.}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
     415
     416The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates within a single site---you can edit, delete, and create new collections within this site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences-$>$Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
     417
     418Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\  configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ configuration files. Clicking the Preview Collection button will re-run the configuration file conversion script. If you change anything on the Format panel, you will need to click Preview Collection. Just reloading the collection via a browser will not be enough.
     419 
     420Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the Greenstone 2 User's Guide (\gst{GS2-User-en.pdf}). This can be found in  your \gsii\ installation, or in the \gst{\$GSDL3SRCHOME/docs/manual} directory if you have installed \gsiii\ from a distribution.
     421
     422
     423\subsubsection{Importing from \gsii}
     424
     425Pre-built \gsii\ collections can also be used in \gsiii. The collection folder should be copied to the collect directory of the site it is to appear in (or a symbolic link may be used if possible).
     426The \gsiii\  run time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new \gst{collectionConfig.xml} and \gst{buildConfig.xml} from the old \gst{collect.cfg} and \gst{build.cfg} files. It does not change the collection in any way, so it can still be used by \gsii\  software.
     427
     428The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, make sure you have run \gst{source setup.bash} (or \gst{setup} in Windows) in the \gst{\$GSDL3SRCHOME/gs2build} directory (as well as running the standard \gst{gs3-setup} command). Then you need to specify the path to the collect directory and the collection name as parameters to the conversion script. For example,
     429
     430\begin{gsc}
     431\begin{verbatim}
     432convert_coll_from_gs2.pl -collectdir
     433   $GSDL3HOME/sites/localsite/collect gs2mgdemo
     434\end{verbatim}
     435\end{gsc}
     436%$
     437The script attempts to create \gsiii\  format statements from the old \gsii\  ones. The conversion may not always work properly, so if the collection looks a bit strange under \gsiii, you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
     438
     439Once again, to have the collection recognized by the library servlet, you can either restart Tomcat, or load it dynamically.
     440
     441\subsubsection{Using command line building}
     442
     443This is the same procedure as for \gsii\ command line building, with the addition of a final step to create the \gsiii\ configuration files. The basic steps are (for a new collection called testcol):
     444
     445Linux:
     446
     447\begin{gsc}
     448\begin{verbatim}
     449cd greenstone3
     450source gs3-setup.sh
     451cd gs2build
     452source setup.bash
     453cd ../
     454mkcol.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
     455put source documents and metadata into
     456            $GSDL3HOME/sites/localsite/collect/testcol/import
     457edit $GSDL3HOME/sites/localsite/collect/testcol/etc/collect.cfg as
     458            appropriate
     459import.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
     460buildcol.pl -collectdir $GSDL3HOME/sites/localsite/collect testcol
     461rename the $GSDL3HOME/sites/localsite/collect/testcol/building
     462            directory to index
     463convert_coll_from_gs2.pl -collectdir $GSDL3HOME/sites/localsite/collect
     464            testcol
     465\end{verbatim}
     466\end{gsc}
     467%$
     468
     469Windows:
     470\begin{gsc}
     471\begin{verbatim}
     472cd greenstone3
     473gs3-setup
     474cd gs2build
     475setup
     476cd ..
     477perl -S mkcol.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
     478put source documents and metadata into
     479            %GSDL3HOME%\sites\localsite\collect\testcol\import
     480edit %GSDL3HOME%\sites\localsite\collect\testcol\etc\collect.cfg as
     481            appropriate
     482perl -S import.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
     483perl -S buildcol.pl -collectdir %GSDL3HOME%\sites\localsite\collect testcol
     484rename the %GSDL3HOME%\sites\localsite\collect\testcol\building directory
     485            to index
     486perl -S convert_coll_from_gs2.pl -collectdir
     487            %GSDL3HOME%\sites\localsite\collect testcol
     488\end{verbatim}
     489\end{gsc}
     490
     491Once the build process is complete, Tomcat should be prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
     492
     493Metadata for documents can be added using \gst{metadata.xml} files.  A \gst{metadata.xml} file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the \gst{metadata.xml} file, or in one of its child directories, will be selected.  The filename tag encloses the regular expression as text, e.g.:
    395494
    396495\begin{gsc}\begin{verbatim}
     
    398497\end{verbatim}\end{gsc}
    399498
    400 This would match any file containing the text 'example' in its name.  The second part of the \gst{<FileSet>} item is a \gst{<Description>} item.  The \gst{<Description>} tag has no attributes, but encloses one or more \gst{<Metadata>} tags.  Each \gst{<Metadata>} tag contains one metadata item, i.e. a label to describe the metadata and a corresponding value.  The \gst{<Metadata>} tag has one compulsory attribute: ``name''.  This attribute gives the metadata label to add to the document.  Each \gst{<Metadata>} tag also has an optional attribute: ``mode''.  If this attribute is set to ``accumulate'' then the value is added to the document, and any existing values for that metadata item are retained.  If the attribute is set to ``set'' or is omitted, then any existing value of the metadata item will be deleted.
     499This would match any file containing the text 'example' in its name.  The second part of the \gst{<FileSet>} item is a \gst{<Description>} item.  The \gst{<Description>} tag has no attributes, but encloses one or more \gst{<Metadata>} tags.  Each \gst{<Metadata>} tag contains one metadata item, i.e. a label to describe the metadata and a corresponding value.  The \gst{<Metadata>} tag has one compulsory attribute: \gst{'name'}.  This attribute gives the metadata label to add to the document.  Each \gst{<Metadata>} tag also has an optional attribute: \gst{'mode'}.  If this attribute is set to \gst{'accumulate'} then the value is added to the document, and any existing values for that metadata item are retained.  If the attribute is set to \gst{'set'} or is omitted, then any existing value of the metadata item will be deleted.
    401500
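As a minimal illustration of the \gst{mode} attribute, the following fragment adds a \gst{Subject} value while retaining any existing ones, and replaces any existing \gst{Title} value. The metadata names and values are invented for this example.

\begin{gsc}\begin{verbatim}
<Description>
  <Metadata name="Subject" mode="accumulate">Farming</Metadata>
  <Metadata name="Title">Example document</Metadata>
</Description>
\end{verbatim}\end{gsc}
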
    402501\begin{figure}
     
    443542
    444543Figure~\ref{fig:metadatafile} shows an example metadata.xml file.
    445 Here, only one file pattern is found in each file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the \gst{mode=accumulate} attribute.  This means that when the title is assigned to a document, its existing \gst{Title} information will be lost.
    446 
    447 The basic means of finding documents in \gs\  is search. Options for building the search indexes include which indexer to use, what granularity to use for the indexes (e.g. whether to index documents as a whole, or sections of documents), what content the index should have (the whole text of the document or one or many metadata fields).  Section-level indexes allow a reader to recall part of a document (for instance, a chapter) rather than the entire document.  However, \gsiii\  must be able to identify the internal structure of the document to achieve this.  The degree to which structure can be found varies from file format to file format.
    448 
    449 An alternative means of finding documents is through browsing. Greenstone can create pre-defined browsing hierarchies based on document metadata. Each browsing structure is called a classifier. Options for building classifiers include what type of classifier to use (linear list or multi-level hierarchy), what metadata to build the classifier on, e.g. Title, Author etc.
    450 
    451 The collectionConfig.xml file controls the all of these options for collection building, and the format is described in Section~\ref{sec:collconfig}.
    452 
    453 To build a collection, place the source documents and optional metadata.xml file(s) in the import directory, place the \gst{collectionConfig.xml} file in the etc directory, and execute \gst{gs3build.sh/bat sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have MySQL running before you start building---running \gst{ant start} will start up the MySQL server as well as tomcat.
    454 
    455 Once the build process is complete, the building directory should be renamed to index (after deleting or renaming the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
    456 
    457 \subsubsection{Using the Librarian Interface}
    458 
    459 The Greenstone Librarian Interface (GLI) can be used to create \gsii\ style collections for \gsiii. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 Digital Library menu in the Program Files section of the Start menu. On Linux, run \gst{./gli4gs3.sh} from the \gst{greenstone3/gli} directory.
    460 
    461 Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use native \gsiii\ configuration files and collection building}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
    462 
    463 The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates within a single site---you can edit, delete, create new collections within this site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences-$>$Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
    464 
    465 Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\  configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ configuration files. You can either rebuild the collection through the GLI (may take a while), or run the conversion script directly (see following section).
    466  
    467 Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the Greenstone 2 User's Guide (\gst{GS2-User-en.pdf}. This can be found in  your \gsii\ installation, or in the greenstone3/docs/manual directory if you have installed \gsiii\ from a distribution.
    468 
    469 
    470 \subsubsection{Importing a \gsii\  collection}
    471 
    472 
    473 Pre-built \gsii\ collections can also be used in \gsiii\footnote{For information about the \gsii\  software, and how to build collections using it, visit \gst{www.greenstone.org}}. The collection folder should be copied to the collect directory of the site it is to appear in (or a symbolic link may be used if possible).
    474 The \gsiii\  run time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new collectionConfig.xml and buildConfig.xml from the old collect.cfg and build.cfg files. It does not change the collection in any way, so it can still be used by \gsii\  software.
    475 
    476 The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, make sure you have run \gst{source setup.bash} (or \gst{setup} in Windows) in your top-level gsdl directory of the \gsii\  installation (as well as running the standard \gst{gs3-setup} command). Then you need to specify the path to the collect directory and the collection name as parameters to the conversion script. For example,
    477 
    478 \begin{gsc}
    479 \begin{verbatim}
    480 convert_coll_from_gs2.pl -collectdir
    481    $GSDL3HOME/web/sites/localsite/collect demo
    482 \end{verbatim}
    483 \end{gsc}
    484 %$
    485 The script attempts to create \gsiii\  format statements from the old \gsii\  ones. The conversion may not always work properly, so if the collection looks a bit strange under \gsiii\ , you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
    486 
    487 Once again, to have the collection recognized by the library servlet, you can either restart Tomcat, or load it dynamically.
     544Here, only one file pattern is found in each file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the \gst{mode=accumulate} attribute.  This means that when this title is assigned to a document, any existing \gst{Title} information will be lost.
     545
    488546
    489547\subsection{Collection configuration files}\label{sec:collconfig}
    490548
    491 Each collection has two, or possibly three, configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, and optionally \gst{collectionInit.xml}, that give metadata, display and other information for the
    492 collection.\footnote{For collections imported from \gsii, \gst{collectionConfig.xml} and \gst{buildConfig.xml}are generated from \gst{collect.cfg} and \gst{build.cfg}.}  The first includes user-defined presentation metadata for the collection,
    493 such as its name and the {\em About this collection} text; gives formatting information for the collection display; and also gives
    494 instructions on how the collection is to be built.  The second is produced by
    495 the build-time process and includes any metadata that can be determined
    496 automatically. It also includes configuration information for any ServiceRacks needed by the collection.
     549Each collection has two, or possibly three, \gsiii\ configuration files, \\
     550\gst{collectionConfig.xml}, \gst{buildConfig.xml}, and optionally \gst{collectionInit.xml}, that give metadata, display and other information for the
     551collection. Currently, \gst{collectionConfig.xml} and \gst{buildConfig.xml} are generated from \gst{collect.cfg} and \gst{build.cfg}. At some stage, the collection building process and the Librarian Interface will be modified to use these files directly.
     552\gst{collect.cfg} and/or \gst{collectionConfig.xml} include user-defined presentation metadata for the collection, such as its name and the {\em About this collection} text; give formatting information for the collection display; and also give instructions on how the collection is to be built. \gst{build.cfg} and/or \gst{buildConfig.xml} are produced by the build-time process and include any metadata that can be determined automatically. \gst{buildConfig.xml} also includes configuration information for any ServiceRacks needed by the collection.
    497553
    498554All the configuration files should be encoded using UTF-8.
     555
     556The formats of \gst{collect.cfg} and \gst{build.cfg} are not discussed here. Please see the \gsii\ manuals for more information regarding these files.
    499557
    500558\subsubsection{collectionInit.xml}
     
    510568\subsubsection{collectionConfig.xml}
    511569
    512 The collection configuration file is where the collection designer (e.g. a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far.
    513 
    514 Display elements for a collection or metadata for a document can be entered in any language---use lang='en' attributes to metadata elements to specify which language they are in.
     570The collection configuration file is where the collection designer (e.g. a librarian) decides what form the collection should take. So far this file only includes the presentation aspects needed by the run-time system. Instructions for collection building have yet to be defined. Presentation aspects include collection metadata such as title and description, display text for indexes, and format statements for search results, classifiers etc. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} shows the parts of it that have been defined so far.
     571
     572Display elements for a collection can be entered in any language---use \gst{lang='en'} attributes to specify which language they are in.
    515573
    516574\begin{figure}
     
    520578  <metadataList>
    521579    <metadata name="creator">[email protected]</metadata>
     580    <metadata name="public">true</metadata>
    522581  </metadataList>
    523582  <displayItemList>
     
    528587    <displayItem name='smallicon' lang='en'>gs3mgdemo_sm.gif</displayItem>
    529588  </displayItemList>
    530   <recognise>
    531       <docType name="HTML"/><docType name="Text"/>
    532       <docType name="Metadata"/><docType name="JPEG"/>
    533   </recognise>         
    534   <search type="mg" name="mgsearch">
    535     <index name="sectext">
    536       <field>text</field>
    537       <level>section</level>
     589  <search>
     590    <index name="ste">
    538591      <displayItem name='name' lang="en">chapters</displayItem>
    539592      <displayItem name='name' lang="fr">chapitres</displayItem>
     
    548601  </search>
    549602  <browse>
    550     <classifier name="CLTit" type="AZList" horizontalAtTop='true'>
    551       <field>Title</field>
    552       <sort>Title</sort>
     603    <classifier name="CL1" horizontalAtTop='true'>
    553604      <displayItem name='name' lang='en'>Titles</displayItem>
    554605    </classifier>
    555606    [... more classifiers ...]
    556     <classifier name="CLKeyword" type="Hierarchy">
    557       <field>Keyword</field>
    558       <sort>Title</sort>
     607    <classifier name="CL4">
    559608      <displayItem name='name' lang='en'>HowTo</displayItem>
    560       <file URL="keyword.xml"/>
    561609      <format>
    562610        <gsf:template match="documentNode">
     
    569617    <format>
    570618      <gsf:option name="coverImages" value="false"/>
    571       <!--<gsf:option name="documentTOC" value="false"/>-->
     619      <gsf:option name="documentTOC" value="false"/>
    572620    </format>
    573621  </display>
    574622</collectionConfig>
    575623\end{verbatim}\end{gsc}
    576 \caption{Sample collectionConfig.xml file (gs3mgdemo collection)}
     624\caption{Sample collectionConfig.xml file}
    577625\label{fig:collconfig}
    578626\end{figure}
    579627
    580 The \gst{<metadataList>} element specifies some collection metadata, such as creator. The \gst{<displayItemList>} specifies some language dependent information that is used for collection display, such as collection name and short description. These displayItem elements can be specified in different languages.
     628The \gst{<metadataList>} element specifies some collection metadata, such as creator. The \gst{<displayItemList>} specifies some language dependent information that is used for collection display, such as collection name and short description. These \gst{displayItem} elements can be specified in different languages.
    581629 
    582 The \gst{<search>} element specifies what indexes should be built, and provides some display and formatting information for each one. Search has an attribute, \gst{type}, which specifies which indexer to be used for indexing. Currently, \gst{mg} and \gst{mgpp}[??] are available. If type is not specified, mg is used. Multiple search elements may be specified, if more than one indexer is to be used. (Note, this is not yet recognized by the run-time system.)
    583 
    584 Search indexes appear as individual \gst{<index>} elements within the \gst{<search>} element. Some choices for the index are made using attributes of the element itself, and some through child elements. 
    585 
    586 Each index must have a unique name, which is used to identify it within \gsiii\   The name is given as an attribute of the \gst{<index>} element. 
    587 
    588 The other choices are described using child elements of \gst{<index>}.  The \gst{<level>} tag indicates the index level and the \gst{<field>} tag the text to be used.  The \gst{<level>} tag can contain one of document, section or paragraph, while the \gst{<field>} tag can contain ``text'' or the name of a metadata field.  If the \gst{<level>} tag is omitted, the default setting is to index by document, and if the \gst{<field>} tag is omitted, the default setting is to index the document text.
    589 
    590 Example index specifications include:
    591 
    592 [NOTE: I think we shouldn't have default level and field and that it must be specified--kjdon]
    593 
    594 To index only the title of each separate document in the collection:
    595 \begin{gsc}\begin{verbatim}
    596 <index name="dtt">
    597   <level>document</level>
    598   <field>dc:title</field>
    599 </index>
    600 \end{verbatim}\end{gsc}
    601 ...in this case the \gst{<field>} tag refers to the ``title'' metadata item, found in the Dublin Core namespace.  The MG search engine would be used on this index.
    602 
    603 Alternatively, to index the full document texts by section:
    604 \begin{gsc}\begin{verbatim}
    605 <index name="stx">
    606   <level>section</level>
    607 </index>
    608 \end{verbatim}\end{gsc}
    609 ...or...
    610 \begin{gsc}\begin{verbatim}
    611 <index name="stx">
    612   <level>section</level>
    613   <field>text</field>
    614 </index>
    615 \end{verbatim}\end{gsc}
    616 ...in the first example, the \gst{<field>} tag is not explicitly defined, and would default to 'text', whereas it is explicitly set to 'text' in the second example. As they are of the same name, they should not appear in the same \gst{collectionConfig.xml} file.
    617 
    618 Moving onto \gst{<classifier>} items, the format is broadly similar to \gst{<index>} items, but with a couple of different choices.  Firstly, each classifier should have ``name'' and ``type'' attributes.  In the case of \gst{<classifier>} items the ``type'' attribute identifies the type of classifier it is.  At present, this should either be ``Hierarchy'' or ``AZList''. 
    619 
    620 The remaining choices for the classifier should follow as child elements of the \gst{<classifier>} element.  The \gst{<file>} element should contain the name of the file that describes the classifier as its ``URL'' attribute.  The format of this file varies from classifier type to classifier type.  The \gst{<field>} element identifies the name of the field to index.  More than one \gst{<field>} element may appear if two or more metadata fields are to be used with the classifier.  Finally, the \gst{<sort>} item identifies another metadata field which the items within one classifier node are to be ordered.  Unlike the \gst{<index>} element, the \gst{<classifier>} element does not have default, assumed values for its children.
    621 
    622 Figure~\ref{fig:hierarchyfile} shows the format of the file for a Hierarchy classifier. [TODO add a  description]
    623 \begin{figure}
    624 \begin{gsc}\begin{verbatim}
    625 <Hierarchy>
    626   <Classification>
    627     <Name>ACCU</Name>
    628     <Path>1</Path>
    629     <Description>ACCU</Description>
    630   </Classification>
    631   <Classification>
    632     <Name>Agenda 21</Name>
    633     <Path>2</Path>
    634     <Description>Agenda 21</Description>
    635   </Classification>
    636   <Classification>
    637     <Name>FAO</Name>
    638     <Path>3</Path>
    639     <Description>FAO</Description>
    640     <Children>
    641       <Classification>
    642         <Name>FAO Better Farming series</Name>
    643         <Path>3.1</Path>
    644         <Description>FAO Better Farming Series</Description>
    645       </Classification>
    646     </Children>
    647   </Classification>
    648 </Hierarchy>
    649 \end{verbatim}\end{gsc}
    650 \caption{Sample Hierarchy classifier file}
    651 \label{fig:hierarchyfile}
    652 \end{figure}
    653 
    654 Inside the \gst{<search>} and \gst{<browse>} elements, \gst{<displayItem>} elements are used to provide titles for the indexes or classifiers, while \gst{<format>} elements provide formatting instructions, typically for a document or classifier node in a list of results. Placing the \gst{<format>} instructions at the top level in the search or browse element will apply the format to all the indexes or classifiers, while placing it inside an individual index or classifier element will restrict that formatting instruction to that item.
    655 
    656 The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading}, \gst{DocumentContent}. Other formatting options may also be specified here, such as whether to display a table of contents and/or cover image for the documents.
     630The \gst{<search>} element provides some display and formatting information for the search indexes, while the \gst{<browse>} element concerns classifiers, and   the \gst{<display>} element looks at document display.
     631
     632Inside the \gst{<search>} and \gst{<browse>} elements, \gst{<displayItem>} elements are used to provide titles for the indexes or classifiers, while \gst{<format>} elements provide formatting instructions, typically for a document or classifier node in a list of results. Placing the \gst{<format>} instructions at the top level in the \gst{search} or \gst{browse} element will apply the format to all the indexes or classifiers, while placing it inside an individual \gst{index} or \gst{classifier} element will restrict that formatting instruction to that item.
     633
     634The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading} and \gst{DocumentContent}. Other formatting options may also be specified here, such as whether to display a table of contents and/or cover image for the documents.
    657635
    658636Format elements are described in Section~\ref{sec:formatstmt}.
     
    669647\end{verbatim}\end{gsc}
    670648
    671 Scope determines on what text the replacements are carried out: text, metadata, or both (all). An empty scope attribute is equivalent to scope=all. Each replace type can be used with all scope values. Replacing uses Java's 'String.replaceAll' functionality, so macro and replacement text are actually regular expressions. The first example is a straight textual replacement. The second example uses dictionary lookups. xxx will be replaced with the (language-dependent) value for key zzz in resource bundle yyy. The third example uses metadata: xxx will be replaced by the value of the yyy metadata for that document.
     649Scope determines which text the replacements are carried out on: \gst{text}, \gst{metadata}, or \gst{all} (both text and metadata). An empty scope attribute is equivalent to scope=all. Each replace type can be used with all scope values. Replacing uses Java's 'String.replaceAll' functionality, so macro and replacement text are actually regular expressions. The first example is a straight textual replacement. The second example uses dictionary lookups. xxx will be replaced with the (language-dependent) value for key zzz in resource bundle yyy. The third example uses metadata: xxx will be replaced by the value of the yyy metadata for that document.
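
Because the replacement goes through \gst{String.replaceAll}, characters such as \gst{.} in the macro and \gst{\$} in the replacement text have special meaning. The following is a minimal Java sketch of that behaviour, not the \gs\ implementation: the class name, method names and sample strings are made up for illustration, and only standard \gst{java.util.regex} calls are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllSketch {

    // Mirrors String.replaceAll: the macro is a regular expression and
    // the replacement may contain group references such as $1.
    static String regexReplace(String text, String macro, String replacement) {
        return text.replaceAll(macro, replacement);
    }

    // Quotes both arguments so that they are treated as literal text.
    static String literalReplace(String text, String macro, String replacement) {
        return text.replaceAll(Pattern.quote(macro),
                               Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the '.' in the macro matches any character.
        String text = "See _icon.gif_ and _iconXgif_";
        System.out.println(regexReplace(text, "_icon.gif_", "IMG"));
        System.out.println(literalReplace(text, "_icon.gif_", "IMG"));
    }
}
```

In the first call both occurrences are replaced, because the unescaped \gst{.} matches the \gst{X} as well. If a macro is meant literally, escape any regular expression metacharacters in it.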
    672650
    673651Appendix~\ref{app:gs2replace} gives some examples that have been used for Greenstone 2 collections.
     
    679657It contains metadata and other information about the collection that can
    680658be determined automatically, such as the number of
    681 documents it contains.  It also includes a list of ServiceRack classes that are
     659documents in the collection.  It also includes a list of \gst{ServiceRack} classes that are
    682660required to provide the services that have been built into the
    683661collection.  The serviceRack names are Java classes that are loaded
     
    695673  <serviceRackList>
    696674    <serviceRack name="GS2Browse">
     675      <indexStem name="gs2mgppdemo"/>
    697676      <classifierList>
    698677        <classifier name="CL1" content="Title"/>
     
    703682    </serviceRack>
    704683    <serviceRack name="GS2MGPPRetrieve">
     684      <indexStem name="gs2mgppdemo"/>
    705685      <defaultLevel name="Sec" />
    706686    </serviceRack>
    707687    <serviceRack name="GS2MGPPSearch">
     688      <indexStem name="gs2mgppdemo"/>
    708689      <defaultLevel name="Sec" />
    709690      <levelList>
     
    722703        <searchType name="plain" />
    723704      </searchTypeList>
     705      <indexOptionList>
     706        <indexOption name="stemIndexes" value="3"/>
     707        <indexOption name="maxnumeric" value="4"/>
     708      </indexOptionList>
    724709      <defaultIndex name="idx" />
    725710      <indexList>
     
    728713    </serviceRack>
    729714  </serviceRackList>
     715</buildConfig>
    730716\end{verbatim}\end{gsc}
    731717\caption{Sample buildConfig.xml file (gs2mgppdemo collection)}
     
    737723Part of collection design involves deciding how the collection should look. \gsiii\  has a default 'look' for a collection, so this is optional. However, the default may not suit the purposes of some collections, so many parts to the look of a collection can be determined by the collection designer.
    738724
    739 In standard \gsiii\ , the library is served to a web browser by a servlet, and the HTML is generated using XSLT. XSLT templates are used to format all the parts of the pages. These templates can be overridden by including them in the \gst{collectionConfig.xml} file. Some commonly overridden templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display.
     725In standard \gsiii, the library is served to a web browser by a servlet, and the HTML is generated using XSLT. XSLT templates are used to format all the parts of the pages. These templates can be overridden by including them in the \gst{collectionConfig.xml} file. Some commonly overridden templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display.
    740726
    741727Real XSLT templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document.
     
    745731     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    746732  <xsl:param name="collName"/>
    747     <td><a href="{\$library_name}?a=d&amp;c={\$collName}&amp;
     733    <td><a href="{$library_name}?a=d&amp;c={$collName}&amp;
    748734           d={@nodeID}&amp;dt={@docType}"><xsl:value-of
    749735           select="metadataList/metadata[@name='Keyword']"/></a>
     
    760746\end{bulletedlist}
    761747 
    762 Since XSLT is written in XML, we can use XSLT to transform XML into XSLT. \gsiii\  provides a simplified set of formatting commands, written in XML, which will be transformed into proper XSLT. The user specifies a \gst{<gsf:template>} for what they want to format---these typically match \gst{documentNode} or \gst{classifierNode} (for node in a classification hierarchy).
     748We can use XSLT to transform XML into XSLT. \gsiii\  provides a simplified set of formatting commands, written in XML, which will be transformed into proper XSLT. The user specifies a \gst{<gsf:template>} for what they want to format---these typically match \gst{documentNode} or \gst{classifierNode} (for a node in a classification hierarchy).
    763749 
    764 The template at the start of this section can be represented as:
     750The template above can be represented as:
    765751 
    766752\begin{gsc}\begin{verbatim}
     
    770756\end{verbatim}\end{gsc}
    771757
    772 Table~\ref{tab:gsf-format} shows the set of 'gsf' (Greenstone Format) elements. If you have come from a \gsii\  background, Appendix~\ref{app:gs2format} shows \gsii\  format elements and their equivalents in \gsiii\ .
     758Table~\ref{tab:gsf-format} shows the set of \gst{'gsf'} (Greenstone Format) elements. If you have come from a \gsii\  background, Appendix~\ref{app:gs2format} shows \gsii\  format elements and their equivalents in \gsiii\ .
    773759 
    774760\begin{table}
     
    854840\end{table}
    855841
    856 The \gst{<gsf:choose-metadata} element selects the first available metadata value from the list of options.
     842The \gst{<gsf:choose-metadata>} element selects the first available metadata value from the list of options.
    857843\begin{gsc}
    858844\begin{verbatim}
     
    865851\end{gsc}
    866852
    867 This will display the dls.Title metadata if available, otherwise it will use the dc.Title metadata if available, otherwise it will use the Title metadata. If there are no values for any of these metadata elements, then nothing will be displayed.
     853This will display dls.Title if available, otherwise it will use dc.Title if available, otherwise it will use the Title metadata. If there are no values for any of these metadata elements, then nothing will be displayed.
    868854
    869855The \gst{<gsf:switch>} element allows different formatting depending on the value of a specified metadata element. For example, the following switch statement could be used to display a different icon for each document in a list depending on which organization it came from.
     
    946932\hline
    947933coverImages & true, false & whether or not to display cover images for documents \\
    948 TOC & true, false & whether or not to display the table of contents for the document\\
     934documentTOC & true, false & whether or not to display the table of contents for the document\\
    949935\hline
    950936\end{tabular}}
     
    952938
    953939Note, format templates are added into the XSLT files before transforming, while the options are added into the page source, and used in tests in the XSLT.
     940
    954941\subsubsection{Changing the service text strings}
    955942
    956943Each collection has a set of services which are the access points for the information in the collection. Each service has a set of text strings which are used to display it. These include name, description, the text on the submit button, and names and descriptions of all the parameters to the service.
    957944
    958 These text strings are found in .properties files, in greenstone3/resources/java. The names of the files are based on class names. Subclasses can defined their own properties, or can use their parent class ones. For example, AbstractSearch defines strings for the TextQuery service, in AbstractSearch.properties. GS2MGSearch just uses these default ones, so doesn't need its own property file.
    959 
    960 A particular collection can override the properties for any service. For example, if a collection uses the GS2MGSearch service rack (look in the buildConfig.xml file for a list of service racks used), and the collection builder wants to change the text associated with this service, they can put a GS2MGSearch.properties file in the resources directory of the collection.
    961 This will be used in preference to one in the default resources directory.
    962 Note that while changes in the default properties files seem to require a tomcat restart to take effect, changes in the collection specific properties files take effect immediately.
     945These text strings are found in \gst{.properties} files, in \gst{\$GSDL3HOME/WEB-INF/classes}. The names of the files are based on class names. Subclasses can define their own properties, or can use their parent class ones. For example, \gst{AbstractSearch} defines strings for the \gst{TextQuery} service, in \gst{AbstractSearch.properties}. \gst{GS2MGSearch} just uses these default ones, so doesn't need its own properties file.
     946
     947A particular collection can override the properties for any service. For example, if a collection uses the \gst{GS2MGSearch} service rack (look in the \gst{buildConfig.xml} file for a list of service racks used), and the collection builder wants to change the text associated with this service, they can put a \gst{GS2MGSearch.properties} file in the resources directory of the collection. After a reconfigure of the collection, this will be used in preference to the one in the default resources directory.
    963948
    964949\subsection{Customizing the interface}\label{sec:interface-customise}
     
    970955\subsubsection{Modifying an existing interface}
    971956
    972 Most of an interface is defined by XSLT files, which are stored in \gst{\$GSDL3HOME/\-web/\-interfaces/\-interface-name/\-transform}. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following  order: collection, site, interface, default interface. (This currently only apples to sites, and therefore collections, that reside in the same \gs\  installation as the interface.)
     957Most of an interface is defined by XSLT files, which are stored in \gst{\$GSDL3HOME/\-interfaces/\-interface-name/\-transform}. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following order: collection, site, interface, default interface. (This currently only applies to sites, and therefore collections, that reside in the same \gs\  installation as the interface.)
    973958
    974959Sites and collections can have a transform directory, which is where customized XSLT files should go. Any XSLT files in here will be used in preference to the interface files when using this collection. For example, if you want to have a completely different layout for the about page of a collection, you can put a new \gst{about.xsl} file into the collection's \gst{transform} directory, and this will be used instead. This is what we do for the Gutenberg sample collection.
    975960
    976 This also applies to files that are included from other XSLT files. For example the query.xsl for the query pages includes a file called querytools.xsl. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site transform directory will work. Either the new query.xsl will include the default querytools, or the default query.xsl will include the new querytools.xsl. The xsl:include directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
    977 
    978 Note that you cannot include a file with the same name as the including file. For example query.xsl cannot include query.xsl (it is tempting to want to do this if you just want to change one template for a particular file, and then include the default. but you cant).
    979 
    980 You can add the argument o=xml to any URL and you wil be returned the XML before transformation by a stylesheet. This shows you the XML page source. It can be useful when you are trying to write some new XSLT statements.
     961This also applies to files that are included from other XSLT files. For example the \gst{query.xsl} for the query pages includes a file called \gst{querytools.xsl}. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site \gst{transform} directory will work. Either the new \gst{query.xsl} will include the default \gst{querytools.xsl}, or the default \gst{query.xsl} will include the new \gst{querytools.xsl}. The \gst{xsl:include} directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
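
The lookup order described above (collection, site, interface, default interface) amounts to taking the first directory in an ordered list that contains the requested file. The following Java sketch illustrates the idea only. It is not the \gs\ code: the class, the method names and the temporary directory layout are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class StylesheetLookupSketch {

    // Returns the first existing copy of fileName, searching the
    // transform directories in the order given.
    static Optional<Path> findStylesheet(List<Path> transformDirs, String fileName) {
        for (Path dir : transformDirs) {
            Path candidate = dir.resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Builds a throwaway directory tree and reports which level supplies
    // querytools.xsl and which supplies query.xsl.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("gs3-lookup");
            List<Path> order = new ArrayList<>();
            for (String level : new String[] {"collection", "site", "interface", "default"}) {
                order.add(Files.createDirectories(root.resolve(level).resolve("transform")));
            }
            Files.createFile(order.get(1).resolve("querytools.xsl")); // site override
            Files.createFile(order.get(3).resolve("querytools.xsl")); // default copy
            Files.createFile(order.get(3).resolve("query.xsl"));      // default only
            Path tools = findStylesheet(order, "querytools.xsl").get();
            Path query = findStylesheet(order, "query.xsl").get();
            // Report the level directory (two levels above the file) for each.
            return tools.getParent().getParent().getFileName()
                + " " + query.getParent().getParent().getFileName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the site level overrides only \gst{querytools.xsl}, so the default \gst{query.xsl} ends up including the site's version of that file.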
     962
     963Note that you cannot include a file with the same name as the including file. For example, \gst{query.xsl} cannot include \gst{query.xsl}. It is tempting to do this if you just want to change one template in a particular file and then include the default, but you can't.
     964
     965You can add the argument \gst{o=xml} to any URL and you will be returned the XML before transformation by a stylesheet. This shows you the XML page source. It can be useful when you are trying to write some new XSLT statements.
    981966
    982967\subsubsection{Defining a new interface}
     
    984969A new interface may be needed if different instantiations of the library require different interfaces, or different developers want their own look and feel. Creating a new interface will allow modifications to be made while leaving the original one intact.
    985970
    986 A new interface needs a directory in \gst{\$GSDL3HOME/web/interfaces}, the name of this directory becomes the interface name. Inside, it needs images and transform directories,  and an interfaceConfig.xml file. Any XSLT may be overridden for a new interface by putting the replacement in the new transform directory. If the appropriate XSLT file is not there, the  one from the default interface will be used - this enables just overriding a few XSLT files as needed.
    987 
    988 To use a new interface, the Tomcat web.xml must be edited: either change the interface that a current servlet instance is using, or add another servlet instantiation to the file (see Section~\ref{sec:sites-and-ints} or Appendix~\ref{app:tomcat}). The Tomcat server must be restarted for this to take effect.
    989 
    990 \subsubsection{Changing the interface language}
     971A new interface needs a directory in \gst{\$GSDL3HOME/interfaces}; the name of this directory becomes the interface name. Inside, it needs \gst{images} and \gst{transform} directories, and an \gst{interfaceConfig.xml} file. The \gst{interfaceConfig.xml} file may specify a base interface, in which case the new interface only needs to define XSLT for the parts that are different. Otherwise, it will need a full set of XSLT files.
     972
     973To use a new interface, the \gst{\$GSDL3HOME/WEB-INF/web.xml} file must be edited: either change the interface that a current servlet instance is using, or add another servlet instantiation to the file (see Section~\ref{sec:sites-and-ints} or Appendix~\ref{app:tomcat}). The Tomcat server must be restarted for this to take effect.
     974
     975\subsubsection{Changing the interface language}\label{sec:interface-language}
    991976
    992977The interface language can be changed by going to the preferences page, and choosing a language from the list, which includes all languages into which the interface has been translated.
    993978
    994 It is easy to add a new interface language to \gs\ .  Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in \gst{resources/java}. Each interface has one named \gst{interface\_name.properties} (where `name' is the interface name). Each service class has one with the same name as the class (e.g. \gst{GS2Search.properties}). To add another language all of the base .properties  files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of \gst{interface\_default.properties} would be named \gst{interface\_default\_fr.properties}.
     979It is easy to add a new interface language to \gs. Language-specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in \gst{\$GSDL3HOME/WEB-INF/classes}. Each interface has one named \gst{interface\_name.properties} (where \gst{'name'} is the interface name, for example, \gst{interface\_default.properties}, or \gst{interface\_classic.properties}). Each service class has one with the same name as the class (e.g. \gst{GS2Search.properties}). To add another language, all of the base \gst{.properties} files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of \gst{interface\_default.properties} would be named \gst{interface\_default\_fr.properties}.
    995980
    996981Keys will be looked up in the properties file closest to the specified language. For example, if language \gst{fr\_CA} was specified (French language, country Canada), and the default locale was \gst{en\_GB},  Java would look at properties files in the following order, until it found the key: \gst{XXX\_fr\_CA.properties}, \gst{XXX\_fr.properties},  \gst{XXX\_en\_GB.properties}, then \gst{XXX\_en.properties}, and finally the default \gst{XXX.properties}.
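
The file order above can be reproduced with the standard \gst{java.util.ResourceBundle.Control} class. The following is a minimal sketch, not part of \gs: the class and method names are made up, \gst{XXX} stands for any base name, and the fallback locale is passed in explicitly instead of being read from the JVM default. It lists the order in which candidate files are considered when a bundle is loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupSketch {

    // Lists the properties file names considered for the requested locale,
    // then for the fallback locale, with the base file last.
    static List<String> searchOrder(String baseName, Locale requested, Locale fallback) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale locale : new Locale[] {requested, fallback}) {
            for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
                // The root locale corresponds to the base file, added once at the end.
                if (!candidate.equals(Locale.ROOT)) {
                    names.add(control.toBundleName(baseName, candidate) + ".properties");
                }
            }
        }
        names.add(control.toBundleName(baseName, Locale.ROOT) + ".properties");
        return names;
    }

    public static void main(String[] args) {
        // fr_CA requested, en_GB assumed as the default locale.
        for (String name : searchOrder("XXX", Locale.CANADA_FRENCH, Locale.UK)) {
            System.out.println(name);
        }
    }
}
```

A translator therefore only needs to supply the keys that differ at the more specific levels, for example a \gst{XXX\_fr\_CA.properties} file that contains only the Canadian variations on \gst{XXX\_fr.properties}.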
     
    1001986\section{Developing \gsiii : Run-time system}\label{sec:develop-runtime}
    1002987
    1003 [TODO: rewrite this!!]
     988[TODO: rewrite this section\\
    1004989runtime object structure diagram. describe the modules.\\
    1005990class hierarchy,\\
     
    1010995\\
    1011996page generation\\
    1012 accessing the javadoc\\
    1013 
     997]
    1014998\subsection{Overview of modules??}
    1015999
     
    18811865files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current
    18821866interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.}
    1883 ***TODO*** describe a bit more?? currently only can get this locally
     1867[TODO: describe a bit more?? currently only can get this locally]
    18841868
    18851869\subsubsection{Receptionists}\label{sec:recepts}
     
    20151999NZDLParams & a subclass of GSParams which holds default service parameters too, necessary for the classic style interface.\\
    20162000GSPath & used to create, examine and modify message address paths\\
    2017 GSSQL & contains static strings for all the SQL table/field names\\
    20182001GSStatus & some static codes for status messages\\
    20192002GSXML & lots of methods for extracting information out of \gs\  XML, and creating some common types of elements. Also has static Strings for element and attribute names used by \gs\ .\\
     
    20252008MyNodeList & A simple implementation of an XML NodeList\\
    20262009OID & class to handle \gs\  (2) OIDs\\
    2027 GS3OID & subclass of OID to handle \gsiii\ OIDs\\
    20282010Processing & Runs an external process and prints the output from the process \\
    20292011SQLQuery & contains a connection to a SQL database, along with some methods for accessing the data, such as converting MG numbers to and from Greenstone OIDs.\\
     
    20352017\end{table}
    20362018
    2037 \newpage
    2038 \section{Collection building architecture}\label{sec:develop-build}
    2039 **** GEORGE ****
    2040 how building actually works\\
    2041 the building structure/architecture\\
    2042 modules API\\
    20432019
    20442020\newpage
    20452021\section{Developing \gsiii\ : Adding new features}\label{sec:new-features}
     2022
     2023[TODO: finish this section ]
    20462024
    20472025\subsection{Creating new services}\label{sec:new-services}
     
    20772055Remote interfaces: remote interfaces can be set up in the same way as above, using a communication protocol between the interface, and the library program.
    20782056
    2079 \subsection{Adding new classifiers}\label{sec:new-classifiers}
    2080 *** GEORGE ***
    2081 \subsection{Adding new plugins}\label{sec:new-plugins}
    2082 *** GEORGE ***
    20832057
    20842058\subsection{New types of collections}\label{sec:new-coll-types}
    20852059
    2086 There are two types of standard \gs\  collections: collections built with the \gsiii\  building system, and collections that are imported from \gsii\ . There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\  has an ability to use any type of collection you can come up with, assuming some Java code is provided.
     2060The standard type of collection is built with the \gsii\ Perl collection building system. There are many options to this, but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\  has an ability to use any type of collection you can come up with, assuming some Java code is provided.
    20872061
    20882062There are four levels of customization that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\  to describe these different levels.
    20892063
    2090 Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the \gsii\  MGPP collections were the first to be served in \gsiii\ . When we came to do \gsii\ MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
     2064Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, MGPP collections were the first to be served in \gsiii\ . When we came to do MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
    20912065
    20922066The XML Sample Texts (gberg) collection, however, was done quite differently to the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (org.greenstone.gsdl3.collection.XMLCollection). This class is basically the same as a standard collection class except that it looks for and stores in memory the documentList from the collectionConfig file.
    20932067
    2094 To tell \gs\  to load up a different type of collection class, we use another configuration file: etc/collectionInit.xml. This specifies the name of the collection class to use.
     2068To tell \gs\  to load up a different type of collection class, we use another configuration file: \gst{etc/collectionInit.xml}. This specifies the name of the collection class to use.
    20952069Currently, this is all that is specified in that file, but you may want to add parameters for the class etc.
    20962070
     
    21432117Instead of displaying an icon and the Title, it displays the Title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these require non-standard arguments to the library, these parts of the template are written in XSLT not \gs\  format language. As is shown here it is perfectly feasible to write a format statement that includes XSLT mixed in with \gs\  format elements.
    21442118
    2145 The document display uses CSS to format the output---these are kept in the collection and specified in the collections XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder, the other option is to put them in \gs\ 's DTD folder greenstone3/resources/dtd.
     2119The document display uses CSS to format the output---the stylesheets are kept in the collection and specified in the collection's XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder; the other option is to put them in \gs\ 's DTD folder \gst{\$GSDL3SRCHOME/resources/dtd}.
    21462120
    21472121\subsection{The Classic Interface}
    21482122
    2149 The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using \gsiii\  instead of \gsii\ . It uses a new site (nzdl) with the classic interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and classic interface.
     2123The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using \gsiii\  instead of \gsii\ . It uses a new site (nzdl) with a new interface (nzdl) which is based on the classic interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and nzdl interface.
    21502124
    21512125The site was created by making a directory called nzdl in the sites folder. A siteConfig file was created. Because it is running on Linux, we were able to link to all the collections in the old \gs\  installation. The convert\_coll\_from\_gs2.pl script was run over all the collections to produce the new XML configuration files.
     
    21742148\end{figure}
    21752149
    2176 We have used Apache Axis SOAP implementation. This is run as a servlet in Tomcat. Axis is setup during installation of Greenstone. For more details about SOAP in Greenstone, see Appendix~\ref{app:soap}. Debugging soap is described in Appendix~\ref{app:soap-debug}.
     2150We have used the Apache Axis SOAP implementation. This is run as a servlet in Tomcat. Axis is set up during installation of Greenstone. For more details about SOAP in Greenstone, see Appendix~\ref{app:soap}. Debugging SOAP is described in Appendix~\ref{app:soap-debug}.
    21772151
    21782152\subsection{Serving a site using soap}
    21792153
    2180 A webs service for localsite comes predeployed, but if you want to setup a service for another site, run \gst{ant soap-deploy-site}. This will prompt you for the sitename (its directory name), and a siteuri - a unique identifier for the web service. Tomcat needs to be running for this to work.
    2181 
    2182 The ant target deploys the service for the site specified. A resource file (\gst{<sitename>.wsdd}) is created which is used to specify the service. It can be found in \gst{greenstone3/resources/soap}, and is generated from \gst{site.wsdd.template}.
     2154A web service for localsite comes with \gs. However, it is not deployed by default. To deploy it, run \gst{ant deploy-localsite}. If you want to set up web services for other sites, run \gst{ant soap-deploy-site}. This will prompt you for the sitename (its directory name), and a siteuri - a unique identifier for the web service. Tomcat needs to be running for this to work, and you need to have installed the \gs\ source code.
     2155
     2156The ant target deploys the service for the site specified. A resource file (\gst{<sitename>.wsdd}) is created which is used to specify the service. It can be found in \gst{\$GSDL3HOME/resources/soap}, and is generated from \gst{site.wsdd.template}.
    21832157
    21842158The address of the new SOAP service will be tomcatserver-address/greenstone3/services/sitename, for example, www.greenstone.org/greenstone3/services/localsite.
     
    21882162There are two ways to use a remote site. First, if you have a local site running, then the site can also connect to other remote sites. In the siteConfig.xml file, you need to add a site element into the siteList element.
    21892163
    2190 For example, to get siteA to talk to siteB, you need to deploy a SOAP server on siteB, then add a \gst{<site>} element to the \gst{<siteList>} of siteA's \gst{siteConfig.xml} file (in \gst{greenstone3/web/sites/siteA/siteConfig.xml}).
     2164For example, to get siteA to talk to siteB, you need to deploy a SOAP server on siteB, then add a \gst{<site>} element to the \gst{<siteList>} of siteA's \gst{siteConfig.xml} file (in \gst{\$GSDL3HOME/sites/siteA/siteConfig.xml}).
    21912165
    21922166In the \gst{<siteList>} element, add the following (substituting the chosen site uri for siteBuri):
     
    22042178Several sites can be connected to in this manner.
    22052179
    2206 The second option is if you have a receptionist set up on a machine where you have no site, and you only want to connect to a single remote site. Instead of using site\_name in the servlet initialisation parameters (in greenstone3/web/WEB-INF/web.xml), you can specify remote\_site\_name, remote\_site\_type and remote\_site\_address. A communicator object will be set up instead of a MessageRouter and the receptionist will talk to the communicator.
     2180The second option is if you have a receptionist set up on a machine where you have no site, and you only want to connect to a single remote site. Instead of using site\_name in the servlet initialisation parameters (in \$GSDL3HOME/WEB-INF/web.xml), you can specify remote\_site\_name, remote\_site\_type and remote\_site\_address. A communicator object will be set up instead of a MessageRouter and the receptionist will talk to the communicator.
    22072181
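As a rough illustration of this second option, the servlet initialisation parameters in \gst{web.xml} might look like the fragment below. This is only a sketch: the three parameter names come from the description above, but the values shown (a site called siteB, a site type of soap, and an address on www.example.org built from the service address pattern given earlier) are assumptions for illustration and should be checked against your own installation.

\begin{quote}\begin{gsc}
<!-- illustrative values only; all three values are assumptions -->\\
<init-param>\\
<param-name>remote\_site\_name</param-name>\\
<param-value>siteB</param-value>\\
</init-param>\\
<init-param>\\
<param-name>remote\_site\_type</param-name>\\
<param-value>soap</param-value>\\
</init-param>\\
<init-param>\\
<param-name>remote\_site\_address</param-name>\\
<param-value>http://www.example.org:8080/greenstone3/services/siteB</param-value>\\
</init-param>\\
\end{gsc}\end{quote}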
    22082182\appendix
     
    22112185\section{Using \gsiii\  from CVS}\label{app:cvs}
    22122186
    2213 [TODO: need to make sure building stuff is in here]
    2214 
    22152187\gsiii\  is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable, in fact it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes.
    22162188
     
    22282200Greenstone is built and installed using Ant (Apache's Java based build tool,
    22292201http://ant.apache.org). You will need a Java Development
    2230 Environment (1.4 or higher), and Ant installed to use Greenstone. You can download Ant from \\\gst{http://ant.apache.org/bindownload.cgi}.
    2231 
    2232 In the greenstone3 directory, you can run 'ant' which will give you a help message.
    2233 Running 'ant -projecthelp' gives a list of the targets that you can run - these
     2202Environment (1.4 or higher), and Ant installed to use Greenstone. You can download Ant from \\\gst{http://ant.apache.org/bindownload.cgi}. Make sure that the environment variables JAVA\_HOME and ANT\_HOME are set.
     2203
     2204In the \gst{greenstone3} directory, you can run \gst{'ant'} which will give you a help message.
     2205Running \gst{'ant -projecthelp'} gives a list of the targets that you can run --- these
    22342206do various things like compile the source code, start up the server, etc.
    22352207
    2236 The README.txt file has up-to-date instructions for installing from CVS. Briefly, for a first time install, run 'ant prepare install'.
    2237 
    2238 The file build.properties contains various parameters that can be set by the user. Please check these settings before running the installation process. The install process will ask you if you accept the properties before starting.
     2208The \gst{README.txt} file has up-to-date instructions for installing from CVS. Briefly, for a first time install, run \gst{'ant prepare install'}.
     2209
     2210The file \gst{build.properties} contains various parameters that can be set by the user. Please check these settings before running the installation process. The install process will ask you if you accept the properties before starting.
    22392211For a  non-interactive version of the install, run
    2240 ant -Dproperties.accepted=yes install
     2212\gst{'ant -Dproperties.accepted=yes install'}
    22412213
    22422214To log the output in build.log, run
    2243 ant -Dproperties.accepted=yes -logfile build.log install
    2244 
    2245 Under Linux, Java and C/C++ compilation is carried out. For windows, since Visual Studio is not a standard component, only Java compilation is carried out. Pre-compiled binaries are provided for the C/C++ components (source packages and Greenstone 2 style building). If you have Visual Studio installed (version 6), you can run the compile-windows-c++ targets to compile the code locally. (Don't forget to setup the Visual Studio environment first, by running, e.g. C:/Program Files/Microsoft Visual Studio/VC98/Bin/VCVARS32.BAT or equivalent.)
    2246 
    2247 
    2248 Note: \gst{gs3-setup} sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} and needs to be done in a shell before doing collection building etc.
    2249 
    2250 To startup or shutdown the library (includes the Tomcat server and MYSQL server), the commands are (run from the greenstone3 directory):
    2251 
    2252 \begin{quote}\begin{gsc}
    2253 ant start \\
    2254 ant stop
    2255 \end{gsc}\end{quote}
    2256 
    2257 If you want to restart only Tomcat, run \gst{ant restart-tomcat}.
     2215\gst{'ant -Dproperties.accepted=yes -logfile build.log install'}
     2216
     2217Compilation includes Java and C/C++. On Windows, you will need to have Visual Studio or equivalent installed. Please check the \gst{compile.windows.c++.setup} property in build.properties --- make sure it is set to the setup script of Visual Studio.
     2218
     2219Note: \gst{gs3-setup} sets the environment variables \gst{GSDL3HOME, GSDL3SRCHOME, CLASSPATH, PATH, JAVA\_HOME} and needs to be run in a shell before doing collection building etc.
     2220
     2221To run the library, use the \gst{gs3-server.sh/bat} shell scripts.
    22582222
    22592223\newpage
     
    22622226Tomcat is a servlet container, and Greenstone 3 runs as a servlet inside it.
    22632227
    2264 The file \gst{\gsdlhome/packages/tomcat/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for \gsiii\  servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/greenstone3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/greenstone3/index.html}. The demo collection's images can be accessed through \\
    2265 \gst{localhost:8080/greenstone3/sites/localsite/collect/demo/images/}.
    2266 
    2267 
    2268 Greenstone sets up Tomcat to run on port 8080 by default. To change this, you can edit the tomcat.port property in build.properties. If you do this before installing Greenstone, then running 'ant install' will use the new port number. If you want to change it later on, shutdown tomcat, run 'ant reconfigure-server-settings', then when you restart tomcat it will use the new port.
     2228The file \gst{\$GSDL3SRCHOME/packages/tomcat/conf/server.xml} is the Tomcat configuration file. A context for \gsiii\  is given by the file\\ \gst{\$GSDL3SRCHOME/packages/tomcat/conf/Catalina/localhost/greenstone3.xml}. This tells Tomcat where to find the web.xml file, and what URL (\gst{/greenstone3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\$GSDL3HOME} can be accessed through the URL \gst{localhost:8080/greenstone3/index.html}. The gs2mgdemo collection's images can be accessed through \\
     2229\gst{localhost:8080/greenstone3/sites/localsite/collect/gs2mgdemo/images/}.
     2230
     2231
     2232Greenstone sets up Tomcat to run on port 8080 by default. To change this, you can edit the tomcat.port property in build.properties. If you do this before installing Greenstone, then running 'ant install' will use the new port number. If you want to change it later on, shut down Tomcat, run 'ant configure', then when you restart Tomcat it will use the new port.
    22692233
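As a sketch of the port change, assuming you want Tomcat on port 8090 (an arbitrary example value, not a recommendation), the edit amounts to a single line in \gst{build.properties}. With Tomcat shut down, running \gst{ant configure} and then restarting Tomcat should make the library available at \gst{localhost:8090/greenstone3}.

\begin{quote}\begin{gsc}
\# build.properties: 8090 is a hypothetical example port \\
tomcat.port=8090 \\
\end{gsc}\end{quote}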
    22702234Note: Tomcat must be shut down and restarted whenever you change any of the following, for the changes to take effect:
    22712235\begin{bulletedlist}
    22722236\begin{gsc}
    2273 \item \gsdlhome/web/WEB-INF/web.xml
    2274 \item \gsdlhome/packages/tomcat/conf/server.xml
     2237\item \$GSDL3HOME/WEB-INF/web.xml
     2238\item \$GSDL3SRCHOME/packages/tomcat/conf/server.xml
    22752239\end{gsc}
    22762240\item any classes or jar files used by the servlets
    22772241\end{bulletedlist}
    2278 \noindent Note: stdin and stdout for the servlets (on Linux) both go to\\
    2279 \gst{\gsdlhome/packages/tomcat/logs/catalina.out}
    22802242
    22812243On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}), or by restarting Tomcat.
    22822244
    2283 We have set up Tomcat to follow symlinks. To disable this feature, remove the \gst{<Resources>} element from the greenstone3 context in \\\gst{\$GSDL3HOME/packages/tomcat/conf/server.xml}:
    2284 
    2285 \begin{quote}\begin{gsc}
    2286 <Context path="/greenstone3" docBase="\$GSDL3HOME/web" debug="1" \\
    2287 reloadable="true">\\
    2288    <Resources allowLinking='true'/>\\
    2289 </Context>\\
    2290 \end{gsc}\end{quote}
    2291 
    2292 By default, Tomcat allows directory listings. To disable this, change the 'listings' parameter to false in the default servlet definition, in Tomcat's web.xml file (\gst{\$GSDL3HOME/packages/tomcat/conf/web.xml}):
    2293 
    2294 We have set the greenstone context to be reloadable. This means that if a class or resource file in web/WEB-INF/lib or web/WEB-INF/classes changes, the servlet will be reloaded. This is useful for development, but should be turned off for production mode (set the reloadable attribute to false).
     2245We have disabled following symlinks for the greenstone servlet. To enable it, edit \gst{\$GSDL3SRCHOME/packages/tomcat/conf/Catalina/localhost/greenstone3.xml} and set 'allowLinking' to true.
     2246
     2247By default, Tomcat allows directory listings. To disable this, change the 'listings' parameter to false in the default servlet definition, in Tomcat's web.xml file (\gst{\$GSDL3SRCHOME/packages/tomcat/conf/web.xml}).
     2248
     2249We have set the greenstone context to be reloadable. This means that if a class or resource file in web/WEB-INF/lib or web/WEB-INF/classes changes, the servlet will be reloaded. This is useful for development, but should be turned off for production mode (set the 'reloadable' attribute to false).
    22952250
    22962251Tomcat uses a Manager to handle HTTP session information. This may be stored between restarts if possible. To use a persistent session handling manager, uncomment the \gst{<Manager>} element in \\
    2297 \gst{\$GSDL3HOME/packages/tomcat/conf/server.xml}. For the default manager, session information is stored in the work directory:\\
    2298 \gst{\$GSDL3HOME/packages/tomcat/work/Catalina/localhost/greenstone3/SESSIONS.ser}. Delete this file to clear the cached session info. Note that Tomcat needs to be shutdown to delete this file.
     2252\gst{\$GSDL3SRCHOME/packages/tomcat/conf/server.xml}. For the default manager, session information is stored in the work directory:\\
     2253\gst{\$GSDL3SRCHOME/packages/tomcat/work/Catalina/localhost/greenstone3/SESSIONS.ser}. Delete this file to clear the cached session info. Note that Tomcat needs to be shut down before you delete this file.
    22992254
    23002255\subsection{Proxying Tomcat with apache}
     
    23222277\section{SOAP}\label{app:soap}
    23232278
    2324 Greenstone uses the Apache Axis SOAP implementation for distributed communications. Axis runs as a servlet inside Tomcat, and SOAP web services can be deployed by this Axis servlet. The Greenstone installation process sets up Axis for Tomcat, and predeploys the localsite web service.
     2279Greenstone uses the Apache Axis SOAP implementation for distributed communications. Axis runs as a servlet inside Tomcat, and SOAP web services can be deployed by this Axis servlet. The Greenstone installation process sets up Axis for Tomcat, but does not deploy any services.
     2280
     2281To deploy the SOAP service for localsite, run \gst{ant deploy-localsite}.
    23252282
    23262283To deploy a SOAP service for other sites, run \gst{ant soap-deploy-site}
    23272284
    2328 This will prompt you for the sitename (the site's directory name), and a unique URI for the site. It creates a new SOAPServer class for the site \\(\gst{\$GSDL3HOME/src/java/org/greenstone/gsdl3/SOAPServer<sitename>.java}), creates a resource file for deployment (\gst{\$GSDL3HOME/resources/soap/<sitename>.wsdd}), and then tries to deploy the service.
     2285This will prompt you for the sitename (the site's directory name), and a unique URI for the site. It creates a new SOAPServer class for the site \\(\gst{\$GSDL3SRCHOME/src/java/org/greenstone/gsdl3/SOAPServer<sitename>.java}), creates a resource file for deployment (\gst{\$GSDL3SRCHOME/resources/soap/<sitename>.wsdd}), and then tries to deploy the service.
    23292286
    23302287Information about deployed services is maintained between Tomcat sessions---you only need to deploy something once. To undeploy a site, use \gst{ant undeploy-soap-site}.
     
    23372294To run it, type:
    23382295
    2339 \begin{quote}\gst{java -cp \$GSDL3HOME/web/WEB-INF/lib/axis.jar \\
     2296\begin{quote}\gst{java -cp \$GSDL3HOME/WEB-INF/lib/axis.jar \\
    23402297org.apache.axis.utils.tcpmon}
    23412298\end{quote}
     
    23432300The listen port is the port that you want the monitor to be listening on. It should 'act as' a Listener, with target hostname 127.0.0.1 (localhost), and target port the port that Tomcat is running on (8080). You need to modify the address used to talk to the SOAP service. For example, if you want to monitor traffic between the gateway site and the localsite SOAP server, you will need to edit gateway's siteConfig.xml file and change the port number (in the site element) to whatever you have chosen as the listen port.
    23442301
    2345 For example, in the Admin panel of TCPMonitor the Target Hostname might be 127.0.0.1, and the Target Port # 8080. Set the Listen Port # to be a different port, such as 8070 and click Add. This produces a new tab panel where you can see the messages arriving at port 8070 before being forwarded to port 8080. You then need to set your test request from your SOAP application to arrive at port 8070 and you will see copies of the messages in the new tab panel.
     2302For example, in the Admin panel of TCPMonitor the Target Hostname might be 127.0.0.1, and the Target Port \# 8080. Set the Listen Port \# to be a different port, such as 8070 and click Add. This produces a new tab panel where you can see the messages arriving at port 8070 before being forwarded to port 8080. You then need to set your test request from your SOAP application to arrive at port 8070 and you will see copies of the messages in the new tab panel.
    23462303
    23472304