Changeset 6908


Timestamp:
2004-03-04T10:29:44+13:00 (20 years ago)
Author:
kjdon
Message:

more changes

Location:
trunk/gsdl3/docs/manual
Files:
2 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r6904 r6908  
    1010\newcommand{\gsdlhome}{\$GSDL3HOME}
    1111
     12\newcommand{\gsii}{Greenstone 2}
     13\newcommand{\gsiii}{Greenstone 3}
     14\newcommand{\gs}{Greenstone}
     15
    1216\begin{document}
    1317
    14 \title{Greenstone 3: A modular digital library.}
     18\title{\gsiii\ : A modular digital library.}
    1519
    1620% if you work on this manual, add your name here
     
    3034\noindent
    3135Greenstone Digital Library Version 3 is a complete redesign and
    32 reimplementation of the Greenstone digital library software.  The current
    33 version (Greenstone2) enjoys considerable success and is being widely used.
    34 Greenstone3 will capitalise on this success, and in addition it will
     36reimplementation of the \gs\ digital library software.  The current
     37version (\gsii) enjoys considerable success and is being widely used.
     38\gsiii \ will capitalise on this success, and in addition it will
    3539\begin{bulletedlist}
    3640\item improve flexibility, modularity, and extensibility
    37 \item lower the bar for ``getting into'' the Greenstone code with a view to
     41\item lower the bar for ``getting into'' the \gs\ code with a view to
    3842   understanding and extending it
    3943\item use XML where possible internally to improve the amount of
     
    4953   easier inclusion of existing Java code (such as for text mining).
    5054\end{bulletedlist}
    51 Parts of Greenstone will remain in other languages (e.g. MG, MGPP); JNI (Java
     55Parts of \gs\ will remain in other languages (e.g. MG, MGPP); JNI (Java
    5256Native Interface) will be used to communicate with these.
    5357
    54 A description of the general design and architecture of Greenstone3 is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).
    55 
    56 This documentation consists of several parts. Section~\ref{sec:install} covers greenstone installation, how to access the library, and some administration issues. Section~\ref{sec:user} looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed towards  the Greenstone developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to Greenstone, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make Greenstone run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install Greenstone from CVS, and a comparison of Greenstone2 and Greenstone3 format statements.
     58A description of the general design and architecture of \gsiii\ is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).
     59
     60This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\ installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed at the \gs\ developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs, such as new services, new page types, and new plugins for different document formats. Section~\ref{sec:distributed} describes how to make \gs\ run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\ from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\ and \gsiii\ format statements.
    5761\newpage
    58 \section{Greenstone installation and administration}\label{sec:install}
    59 
    60 This section covers where to get Greenstone 3 from, how to install it and how to run it. The standard method of running Greenstone is as a Java servlet. We provide the Tomcat servlet container to serve the servlet :-). Standard web servers may  be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access Greenstone, Tomcat must be started up, and then it can be accessed via a web browser.
    61 
    62 
    63 \subsection{Get and install Greenstone}
    64 
    65 Greenstone is available from \gst{http://www.greenstone.org/greensone3}. There are currently two distributions: a self-installing tar for Linux, and a Windows executable.
    66 
    67 Greenstone is also available through CVS (Concurrent Versioning System). This provides the absolute latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install Greenstone from CVS.
     62\section{\gs\ installation and administration}\label{sec:install}
     63
     64This section covers where to get \gsiii\ from, how to install it, and how to run it. The standard method of running \gsiii\ is as a Java servlet. We provide the Tomcat servlet container to serve the servlet. It may be possible to configure a standard web server to provide servlet support, removing the need for Tomcat; please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access \gsiii, Tomcat must first be started; the library can then be accessed via a web browser.
     65
     66
     67\subsection{Get and install \gs\ }
     68
     69\gsiii\  is available from \gst{http://www.greenstone.org/greenstone3}. There are currently two distributions: a self-installing tar for Linux, and a Windows executable.
     70
     71\gsiii\ is also available through CVS (Concurrent Versions System). This provides the latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install \gsiii\ from CVS.
    6872
    6973\subsubsection{Linux}
    7074
    71 Download the latest version of the self-installing tar file, gsdl3-x.xx-unix.sh, and run it in a shell (./gsdl3-x.xx-unix.sh). Greenstone will be installed into a directory called gsdl3 inside the current directory. The install script will prompt you for  the name of your computer and what port to run Tomcat on (the defaults being localhost and 8080).  Once Greenstone has been installed, you can start the library  by running ./gsdl3/gs3-launch.sh, and opening up a browser pointing to localhost:8080/gsdl3 (or different computer name and port).
     75Download the latest version of the self-installing tar file, \gst{gsdl3-x.xx-unix.sh}, and run it in a shell (\gst{./gsdl3-x.xx-unix.sh}). \gsiii\  will be installed into a directory called \gst{gsdl3} inside the current directory. The install script will prompt you for  the name of your computer and what port to run Tomcat on (the defaults being \gst{localhost} and \gst{8080}).  Once \gsiii\  has been installed, you can start the library  by running \gst{./gsdl3/gs3-launch.sh}, and opening up a browser pointing to \gst{http://localhost:8080/gsdl3} (substituting your chosen name and port if necessary).
    7276
    7377\subsubsection{Windows}
    7478
    75 Download the latest Windows executable, gsdl3-x.xx-win32.exe, and double click it to start the installation. You will be prompted for your computer name and port number to run Tomcat on (defaults are localhost and 8080). Once Greenstone is installed, you can access the library by selecting Greenstone 3 Digital Library in the Start menu.
     79Download the latest Windows executable, \gst{gsdl3-x.xx-win32.exe}, and double click it to start the installation. You will be prompted for your computer name and the port number to run Tomcat on (defaults are \gst{localhost} and \gst{8080}). Once \gsiii\  is installed, you can access the library by selecting \gst{Greenstone Digital Library 3} in the Start menu.
    7680
    7781\subsubsection{Accessing the library in a browser}
    7882
    79 Once you have started up the library (see the previous sections for OS dependent instructions), you can access it in a browser at http://localhost:8080/gsdl3 (or http://your-computer-name:your-chosen-port/gsdl3). This gets you to a welcome page, with three links: one to run a test servlet (this allows you to check that Tomcat is running properly), one to run the standard library servlet using the site \gst{localsite}, and one to run a library servlet using the site \gst{soapsite}. This site uses a SOAP connection to communicate with localsite, and demonstrates the library working in a distributed fashion. See Section~\ref{sec:distributed} for details about how to run Greenstone distributedly.
     83Once you have started up the library (see the previous sections for OS dependent instructions), you can access it in a browser at \gst{http://localhost:8080/gsdl3} (or \gst{http://your-computer-name:your-chosen-port/gsdl3}). This gets you to a welcome page with three links: one to run a test servlet (this allows you to check that Tomcat is running properly), one to run the standard library servlet using the site \gst{localsite}, and one to run a library servlet using the site \gst{soapsite}. The \gst{soapsite} site uses a SOAP connection to communicate with \gst{localsite}, and demonstrates the library working in a distributed fashion. The SOAP connection is not enabled by default: see Section~\ref{sec:distributed} for details about how to run \gsiii\ in a distributed setup.
    8084
    8185\subsection{How the library works}
    8286
    83 The standard library program is a Java servlet.
     87The standard library program is a Java servlet. We use the Tomcat servlet container to present the servlets over the web. Tomcat takes CGI-style URLs and passes the arguments to the servlet, which processes these and returns a page of HTML. As far as an end-user is concerned, a servlet is a Java version of a CGI program. The interaction is similar: access is via a web browser,  using arguments in a URL.
    8488
    8589Other types of interfaces can be used, such as Java GUI programs. See Section~\ref{sec:new-interfaces} for details about how to make these.
     
    8791\subsubsection{Restarting the library}
    8892
    89 The library program (actually Tomcat) can be restarted by ... (** put a mechanism in each install program **).
    90 
    91 
    92 Tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\
     93The library program (actually Tomcat) can be restarted in Windows by closing the window, and restarting it from the Start menu. In Linux, you need to go to the gsdl3 directory, and run \gst{gsdl3/gs3-launch.sh -shutdown}, then \gst{gsdl3/gs3-launch.sh}.
     94
     95
     96Tomcat must be restarted any time you make changes in the following for those changes to take effect:\\
    9397\begin{bulletedlist}
    9498\begin{gsc}
     
    104108\subsection{Directory structure}
    105109
    106 Table~\ref{tab:dirs} shows the file hierarchy for Greenstone3.
     110Table~\ref{tab:dirs} shows the file hierarchy for \gsiii.
    107111The first part  shows the common stuff which can be shared between
    108 Greenstone users---the source, libraries etc. Under Linux, these can be installed into appropriate system directories. The second part shows
     112\gs\ users---the source, libraries etc. Under Linux, these can be installed into appropriate system directories. The second part shows
    109113stuff used by one person/group---their sites and interface setup (see Section~\ref{sec:sites-and-ints}).
    110 etc. There can be several sites/interfaces per installation.
     114There can be several sites/interfaces per installation. All the files inside the gsdl3/web directory comprise the gsdl3 context for Tomcat, and are accessible via Tomcat.
    111115
    112116\begin{table}
    113 \caption{The Greenstone directory structure}
     117\caption{The \gs\ directory structure}
    114118\label{tab:dirs}
    115119{\footnotesize
     
    139143 & soap service description files \\
    140144gsdl3/resources/dtd
    141  & Greenstone has trouble loading DTD files sometimes. They can go here\\
     145 & \gsiii\ has trouble loading DTD files sometimes. They can go here\\
    142146gsdl3/bin
    143147  & executable stuff lives here\\
     
    184188\subsection{Sites and interfaces}\label{sec:sites-and-ints}
    185189
    186 local gs stuff (sites and interfaces) vs installed stuff (code)\\
    187 where they live, whats the difference, what each contains.\\
     190[local gs stuff (sites and interfaces) vs installed stuff (code)\\
     191where they live, whats the difference, what each contains.]\\
    188192
    189193A site comprises a set of collections and possibly some site-wide services. An interface (in this web-based servlet context) is a set of images along with a set of XSLT files used for translating XML output from the library into an appropriate form---HTML in general.
    190194
    191 One greenstone installation can have many sites and interfaces. One instantiation of a servlet uses one site and one interface. Sites and interfaces can be matched up in different ways. For example, a single site might be served with two different interfaces. This provides different modes of access to the same content. eg HTML vs WML, or perhaps providing a completely different look and feel for different audiences. A standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
    192 
    193 Collections live in the collect directory of a site. Any collections that are found in this directory when the servlet is initialised will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
    194 
    195 There are two Greenstone sites that come with the distribution: localsite, and soapsite. localsite has several demo  collections, while soapsite has none. soapsite specifies that a soap connection should be made to localsite. Getting this to work involves setting up a soap server for localsite: see Section~\ref{sec:distributed} for details.
     195One \gsiii\ installation can have many sites and interfaces, and these can be paired in different combinations. One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance. For example, a single site might be served with two different interfaces. This provides different modes of access to the same content, e.g. HTML vs WML, or perhaps a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
     196
     197Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialised will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
     198
     199There are two  sites that come with the distribution: \gst{localsite}, and \gst{soapsite}. \gst{localsite} has several demo  collections, while \gst{soapsite} has none. \gst{soapsite} specifies that a soap connection should be made to \gst{localsite}. Getting this to work involves setting up a soap server for localsite: see Section~\ref{sec:distributed} for details.
    196200
    197201Each site and interface has a configuration file which specifies parameters for the site or interface---these are described in Section~\ref{sec:config}.
    198202
    199203The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
    200 There are three servlets specified in web.xml (these correspond to the three links in the welcome page for greenstone): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite. Both of these servlets use the standard interface (called {\em default}).
     204There are three servlets specified in web.xml (these correspond to the three links in the welcome page for \gsiii\ ): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other two are \gs\ library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite. Both of these servlets use the standard interface (called {\em default}).
    201205
    202206\begin{table}
    203 \caption{Greenstone servlet initialisation parameters}
     207\caption{\gs\ servlet initialisation parameters}
    204208\label{tab:serv-init}
    205209{\footnotesize
     
    210214gsdl3\_home & /research/kjdon/gsdl3 & the base directory of the gsdl3 installation \\
    211215site\_name & localsite & the name of the site to use \\
    212 interface\_name & default & the name or the interface to use\\
     216interface\_name & default & the name of the interface to use\\
    213217library\_name & library & the web name of the servlet \\
    214218default\_lang & en & the default language for the interface\\
    215219receptionist\_class & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\
    216220messagerouter\_class & NewMessageRouter & (optional) specifies an alternative MessageRouter to use\\
     221params\_class & NZDLParams & (optional) specifies an alternative GSParams class to use \\
    217222\hline
    218223\end{tabular}}
     
    222227
    223228
    224 \subsection{Configuring a greenstone installation}\label{sec:config}
    225 
    226 Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
    227 The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated sending a CGI-type command to the library.  There are a series of commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shutdown and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
     229\subsection{Configuring a \gs\ installation}\label{sec:config}
     230
     231Initial \gsiii\ system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
     232The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a CGI-type command to the library. There is a series of commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
    228233
    229234\subsubsection{Site configuration file}\label{sec:siteconfig}
     
    238243Figure~\ref{fig:siteconfig} shows two example site configuration files. The first example is for a rudimentary site with no site-wide services,
    239244which does not connect to any external sites. The second example is for a site with one site-wide service cluster - a collection building cluster.  It also connects to the first site using SOAP.
    240 These two sites are running on the same machine. For site \gst{gsdl1} to talk to site \gst{localsite}, a SOAP server must be run for \gst{localsite}. The address of the SOAP server, in this case, is \gst{http://localhost:8080/soap/servlet/rpcrouter}.
     245These two sites happen to be running on the same machine, which is why they can use \gst{localhost} in the address. For site \gst{gsdl1} to talk to site \gst{localsite}, a SOAP server must be run for \gst{localsite}. The address of the SOAP server, in this case, is \gst{http://localhost:8080/soap/servlet/rpcrouter}.
    241246
    242247
     
    281286\subsubsection{Interface configuration file}\label{sec:interfaceconfig}
    282287
    283 The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). It specifies what short name each action maps to (this is used in library urls for the a (action) parameter) e.g. QueryAction should use a=q. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
    284 
    285 It also lists all the languages that the interface text files have been translated into. These have a name attribute, which is the ISO code for the language, and a displayElement which gives the language name in that language (note the non-English characters have been specified in UTF-8 codes). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a Greenstone library are shown in Section~\ref{sec:interface-customise}.
     288The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library URLs for the \gst{a} (action) parameter), e.g. QueryAction should use \gst{a=q}. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
     289
     290It also lists all the languages that the interface text files have been translated into. These have a \gst{name} attribute, which is the ISO code for the language, and a \gst{displayElement} which gives the language name in that language (note that this file should be encoded in UTF-8). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a \gsiii\ library are shown in Section~\ref{sec:interface-customise}.
    286291
    287292\begin{figure}
     
    326331\subsection{Run-time re-initialisation}\label{sec:runtime-config}
    327332
    328 should this section go in here, cos its kind of adminy, or go into the user stuff, cos you need to do it after building a collection???
     333[**should this section go in here, cos its kind of adminy, or go into the user stuff, cos you need to do it after building a collection???**]
    329334
    330335When Tomcat is started up, the site and interface configuration files are read in, and actions/services/collections loaded as necessary. The configuration is then static unless Tomcat is restarted, or re-configuration commands issued.
    331336
    332 There are several CGI-type commands that can be issued to Tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, if the java classes are modified, Tomcat must be restarted then too.
    333 
    334 Currently, the runtime configuration commands can only be accessed by typing in CGI-arguments into the URL, there is no nice web form yet to do this.
    335 
    336 The CGI arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in Greenstone, so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
     337There are several commands that can be issued to Tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, if the Java classes are modified, Tomcat must also be restarted.
     338
     339Currently, the runtime configuration commands can only be accessed by typing arguments into the URL; there is no nice web form yet to do this.
     340
     341The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, and deactivate\footnote{There is no security for these commands yet in \gs, so the deactivate/delete command is disabled.}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
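
As a rough illustration of how these arguments combine into a URL, the following Python sketch builds the command URLs from the argument names given above. It is not part of the \gs\ distribution: the helper name \gst{runtime\_command\_url} is hypothetical, and the base address assumes the default host and port together with the \gst{library} servlet name, which may differ on a given installation. Note that the \gst{\&} separators are escaped in the LaTeX source but appear as plain ampersands in a real URL.

```python
from urllib.parse import urlencode

# Assumed address of the library servlet: default host and port from the
# installation notes plus the "library" servlet name. Adjust as needed.
LIBRARY_URL = "http://localhost:8080/gsdl3/library"

# Subaction codes as listed in the text: configure, activate, deactivate.
SUBACTIONS = {"configure": "c", "activate": "a", "deactivate": "d"}


def runtime_command_url(command, target=None, base=LIBRARY_URL):
    """Build the URL for a run-time command (hypothetical helper).

    command -- one of "configure", "activate", "deactivate"
    target  -- optional collection or cluster name, sent as the sc argument;
               when omitted the request goes to the MessageRouter
    """
    params = [("a", "s"), ("sa", SUBACTIONS[command])]
    if target is not None:
        params.append(("sc", target))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Command addressed to the MessageRouter (no sc argument).
    print(runtime_command_url("configure"))
    # Command addressed to a collection or cluster named "xxx".
    print(runtime_command_url("activate", target="xxx"))
```

The resulting URL can be pasted into a browser; the sketch only assembles the string and does not contact the server.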
    337342
    338343\begin{table}
     
    351356\end{table}
    352357\newpage
    353 \section{Using Greenstone 3}\label{sec:user}
    354 
    355 Once you have greenstone 3 installed, you can access the sample collections. The installation comes with some example collections, and Section~\ref{sec:usecolls} describes these collections and how to use them. Section~\ref{sec:buildcol} describes how to build your own collections.
    356 
    357 \subsection{Using a collection}\label{sec:usecolls}
     358\section{Using \gsiii\ }\label{sec:user}
     359
     360Once \gsiii\ is installed, the sample collections can be accessed. The installation comes with several example collections, and Section~\ref{sec:usecolls} describes these collections and how to use them. Section~\ref{sec:buildcol} describes how to build new collections.
     361
     362\subsection{Using a collection}\label{sec:usecolls}
     363[TODO: expand this section]
    358364
    359365A collection typically consists of a set of documents, which could be text, HTML, Word, PDF, images, bibliographic records etc., along with some access methods, or ``services''. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.
     
    362368Browsing involves navigating pre-defined hierarchies of documents, following links of interest to find documents. The hierarchies may be constructed on different metadata fields, for example, alphabetical lists of Titles, or a hierarchy of Subject classifications. Clicking on a bookshelf icon takes you to a lower level in the hierarchy, while clicking on a book or page icon takes you to a document.
    363369
    364 In the standard interface that comes with Greenstone3\footnote{of course, this is all customisable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
     370In the standard interface that comes with \gsiii\footnote{Of course, this is all customisable.}, collections in a digital library are presented in the following manner. The `home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the `about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
    365371
    366372\begin{figure}[h]
     
    371377\end{figure}
    372378
    373 The image at the top left is a link to the collection's home page. The top right has buttons to link to the library home page, help pages and preference pages. All the available services are arrayed along a navigation bar, along the bottom of the banner. Clicking on a name accesses that service. Search type services generally provide a form to fill in, with parameters including what field or granularity to index, and the query itself. Clicking the
    374 The results of a search
     379The image at the top left is a link to the collection's home page. The top right has buttons to link to the library home page, help pages and preference pages. All the available services are arrayed along a navigation bar, along the bottom of the banner. Clicking on a name accesses that service.
     380
     381Search type services generally provide a form to fill in, with parameters including which field or granularity to search, and the query itself. Clicking the search button carries out the search, and a list of matching documents will be displayed. Clicking on the icons in the results list takes you to the document itself.
     382
    375383Once you are looking at a document, clicking the open book icon at the top of the document, underneath the navigation bar, will take you back to the service page that you accessed the document from.
    376384
    377 describe the colls that the sample installation comes with\\
     385[TODO: describe the colls that the sample installation comes with\\
    378386brief description of what a collection is.\\
    379387how to get around the collection, services etc. \\
    380388querying vs browsing \\
    381 use the demo colls that come with greenstone - one gs2 coll, one gs3 coll, tei coll??\\
     389use the demo colls that come with \gsiii\  - one gs2 coll, one gs3 coll, tei coll??\\]
    382390
    383391\subsection{Building a collection}\label{sec:buildcol}
    384392
    385 There are two ways to get a new collection into Greenstone 3. The first is to build it using the greenstone 3 building process. The second way is to import a greenstone 2 collection.
    386 
    387 Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per greenstone installation. The collect directory is at \$GSDL3HOME/web/sites/site-name/collect, where site-name is the name of the site you want your new collection to belong to.
    388 
    389 The following two sections describe how to create a collection from scratch, and how to import a greenstone 2 collection. Once a collection has been built, the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, there is a CGI command to reload a collection which can also load a new one. Use the CGI arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}---this tells the library program to reload the collname collection.
     393There are two ways to get a new collection into \gsiii. The first is to build it using the \gsiii\ building process. The second is to import a \gsii\ collection.
     394
     395Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\ installation. The collect directory is at \$GSDL3HOME/web/sites/site-name/collect, where site-name is the name of the site you want your new collection to belong to.
     396
     397The following two sections describe how to create a collection from scratch, and how to import a \gsii\ collection. Once a collection has been built, the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, there is a CGI command to reload a collection which can also load a new one. Use the CGI arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}---this tells the library program to reload the collname collection.
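
As a rough illustration, the reload request can be assembled by appending the CGI arguments above to the address of the library servlet. The sketch below is a small shell helper; the servlet address \gst{http://localhost:8080/gsdl3/library} and the function name \gst{reload\_url} are assumptions made for the example, not part of the \gsiii\ distribution.

```shell
#!/bin/sh
# Sketch: build the URL that asks the library servlet to (re)load a collection.
# The servlet address passed in is an assumption; adjust it to your installation.
reload_url() {
    library_url="$1"   # hypothetical, e.g. http://localhost:8080/gsdl3/library
    collname="$2"      # name of the collection to load or reload
    # CGI arguments as given in the text: a=s, sa=a, st=collection, sn=<collection>
    printf '%s?a=s&sa=a&st=collection&sn=%s\n' "$library_url" "$collname"
}

reload_url "http://localhost:8080/gsdl3/library" "demo"
```

The resulting address can then be opened in a web browser to trigger the reload.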
    390398
    391399
    392400\subsubsection{Creating a collection from scratch}
    393401
    394 Building Greenstone 3 collections is done using the \gst{gs3-build.sh} script, with the \gst{collectionConfig.xml} file controlling how the building is done.  There are a number of considerations in building a collection: including what documents appear in the collection, how they are indexed for searching, which classifications are used for browsing, etc.
     402Building \gsiii\ collections is done using the \gst{gs3-build.sh} script, with the \gst{collectionConfig.xml} file controlling how the building is done.  There are a number of considerations in building a collection: including what documents appear in the collection, how they are indexed for searching, which classifications are used for browsing, etc.
    395403
    396404Firstly, the documents that comprise the collection should be placed in the import subdirectory.  At present, only documents in this directory will appear in the collection.
    397405[TODO: describe the kinds of documents that can be added, something about METS files?]
    398406
    399 Metadata for documents can be added using metadata.xml files.  These files have already been used in Greenstone 2, and the format is the same in Greenstone 3.  A metadata.xml file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the metadata.xml, or in one of its child directories, file will be selected.  The filename tag encloses the regular expression as text, eg:
     407Metadata for documents can be added using metadata.xml files.  These files were already used in \gsii, and the format is the same in \gsiii.  A metadata.xml file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the metadata.xml file, or in one of its child directories, will be selected.  The \gst{<FileName>} tag encloses the regular expression as text, e.g.:
    400408
    401409\begin{gsc}\begin{verbatim}
     
    425433\end{verbatim}\end{gsc}
    426434
    427 Here, only one file pattern is found in the file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the accumulate metadata.  This means that when the title is assigned to a document, its existing \gst{Title} information will be lost.
    428 
    429 The basic means of finding documents in Greenstone is search. Options for building the search indexes include which indexer to use, what granularity to use for the indexes (e.g. whether to index documents as a whole, or sections of documents), what content the index should have (the whole text of the document or one or many metadata fields).
    430 
    431 Indexes can alter which search engine to use for that index, the level at which the index should be built (e.g. document, section or paragraph) and the text over which it should be built (e.g. the document text, titles alone, author names, etc.).  Section-level indexes allow a reader to recall part of a document (for instance, a chapter) rather than the entire document.  However, Greenstone 3 must be able to identify the internal structure of the document to achieve this.  The degree to which structure can be found varies from file format to file format.
     435Here, only one file pattern is found in the file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the mode=accumulate attribute.  This means that when the title is assigned to a document, its existing \gst{Title} information will be lost.
     436
     437The basic means of finding documents in \gs\  is search. Options for building the search indexes include which indexer to use, what granularity to use for the indexes (e.g. whether to index documents as a whole, or sections of documents), what content the index should have (the whole text of the document or one or many metadata fields).  Section-level indexes allow a reader to recall part of a document (for instance, a chapter) rather than the entire document.  However, \gsiii\  must be able to identify the internal structure of the document to achieve this.  The degree to which structure can be found varies from file format to file format.
    432438
    433439The collectionConfig.xml file controls all of these options for collection building, and the format is described in Section~\ref{sec:collconfig}.
    434440
    435 Wherever possible, the Greenstone 3 will import and use options from a Greenstone 2 \gst{collect.cfg} file.  However, it is strongly recommended that a proper \gst{collectionConfig.xml} file is used wherever possible.
     441If a collectionConfig.xml file is not found, the \gsiii\  build process will import and use options, wherever possible, from a \gsii\  \gst{collect.cfg} file.  However, it is strongly recommended that a proper \gst{collectionConfig.xml} file is used wherever possible. [NOTE: I think we should require a proper config file for gs3 building--kjdon]
    436442
    437443To build a collection, execute \gst{gs3build.sh sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have MySQL running before you start building---running \gst{gs3-launch.sh} will start up the MySQL server as well as Tomcat.
    438444
    439 Once the build process is complete, the building directory should be renamed to index (after deleting the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
     445Once the build process is complete, the building directory should be renamed to index (after deleting or renaming the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
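
The directory swap described above might be scripted along the following lines. This is only a sketch: the function name \gst{activate\_build} and the \gst{index.old} backup directory are assumptions for illustration, and the collection still has to be reloaded afterwards as described.

```shell
#!/bin/sh
# Sketch: make a freshly built collection live by renaming building -> index.
# The argument is the collection's own directory, e.g. (hypothetical path)
#   $GSDL3HOME/web/sites/localsite/collect/mycoll
activate_build() {
    coll_dir="$1"
    if [ ! -d "$coll_dir/building" ]; then
        echo "no building directory in $coll_dir" >&2
        return 1
    fi
    # Rename the existing index (if any) instead of deleting it,
    # so the previous build can be restored by hand if needed.
    if [ -d "$coll_dir/index" ]; then
        rm -rf "$coll_dir/index.old"
        mv "$coll_dir/index" "$coll_dir/index.old"
    fi
    mv "$coll_dir/building" "$coll_dir/index"
}
```

Keeping the old index under another name follows the "deleting or renaming" option mentioned in the text; deleting it outright would work equally well.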
    440446
    441447[TODO: need to describe namespaces somewhere? ]
    442448
    443 \subsubsection{Importing a greenstone 2 collection}
    444 
    445 Greenstone 3 can also serve Greenstone 2 collections. If you have a Greenstone 2 collection\footnote{For information about the Greenstone 2 software, and how to build collections using it, visit \gst{www.greenstone.org}}, you can copy it into the collect directory of the site you are using. Or make a link to it from the collect directory if your OS supports that.
    446 The Greenstone 3 run time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new collectionConfig.xml and buildConfig.xml from the old collect.cfg and build.cfg files. It does not change the collection in any way, so it can still be used by Greenstone 2 software.
    447 
    448 The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, make sure you have sourced setup.bash (or run setup in Windows) in your top-level gsdl directory of the greenstone 2 installation. Then you need to specify the path to the collect directory, and the collection name as parameters to the conversion script. For example,
     449\subsubsection{Importing a \gsii\ collection}
     450
     451\gsiii\ can also serve \gsii\ collections. If you have a \gsii\ collection\footnote{For information about the \gsii\ software, and how to build collections using it, visit \gst{www.greenstone.org}}, you can copy it into the collect directory of the site you are using, or make a link to it from the collect directory if your OS supports that.
     452The \gsiii\  run time system requires different configuration files for a collection, so you need to run a conversion script. All this does is create the new collectionConfig.xml and buildConfig.xml from the old collect.cfg and build.cfg files. It does not change the collection in any way, so it can still be used by \gsii\ software.
     453
     454The conversion script is \gst{convert\_coll\_from\_gs2.pl}. To run it, make sure you have sourced setup.bash (or run setup in Windows) in your top-level gsdl directory of the \gsii\ installation. Then you need to specify the path to the collect directory, and the collection name as parameters to the conversion script. For example,
    449455
    450456\gst{convert\_coll\_from\_gs2.pl -collectdir \$GSDL3HOME/web/\-sites/\-localsite/\-collect demo}
    451457
    452 The script attempts to create gs3 format statements from the old greenstone 2 ones. The conversion may not always work properly, so if the collection looks a bit strange under Greenstone 3, you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
     458The script attempts to create gs3 format statements from the old \gsii\ ones. The conversion may not always work properly, so if the collection looks a bit strange under \gsiii, you should check the format statements. Format statements are described in Section~\ref{sec:formatstmt}.
    453459
    454460Once again, to have the collection recognised by the library servlet, you can either restart Tomcat, or load it dynamically.
     
    458464Each collection has two, or possibly three, configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, and optionally \gst{collectionInit.xml} that give metadata, display and other information for the
    459465collection.\footnote{\gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in
    460 Greenstone2.}  The first includes user-defined presentation metadata for the collection,
     466\gsii.}  The first includes user-defined presentation metadata for the collection,
    461467such as its name and the {\em About this collection} text; gives formatting information for the collection display; and also gives
    462468instructions on how the collection is to be built.  The second is produced by
     
    464470automatically. It also includes configuration information for any ServiceRacks needed by the collection.
    465471
     472All the configuration files should be encoded using UTF-8.
     473
    466474\subsubsection{collectionInit.xml}
    467475
    468 This optional file specifies a new collection class if the standrad one is not to be used. The only syntax so far is the class name:
     476This optional file is only used for non-standard, customised collections. It specifies the class name of the non-standard collection class. The only syntax so far is the class name:
    469477
    470478\begin{gsc}\begin{verbatim}
     
    479487
    480488Display elements for a collection or metadata for a document can be entered in any language---use lang='en' attributes to metadata elements to specify which language they are in.
    481 
    482 configuration files need to be encoded in utf-8.
    483489
    484490\begin{figure}
     
    490496  </metadataList>
    491497  <displayItemList>
    492     <displayItem name="smallicon" lang="en">mgppdemosm.gif</displayItem>
    493     <displayItem name="description" lang="fr">C'est une collection pour
    494       demonstration du logiciel Greenstone. Elle contient une petite
    495       partie du projet de bibliotheques humanitaires et de developpement
    496         (11 livres).</displayItem>
     498    <displayItem name="name" lang="en">Greenstone 3 demo</displayItem>
     499    <displayItem name="icon" lang="en">gs3demo.gif</displayItem>
     500    <displayItem name="smallicon" lang="en">gs3demosm.gif</displayItem>
     501    <displayItem name="description" lang="fr">Il s'agit d'une collection
     502      de démonstration pour le logiciel Greenstone. Elle contient
     503      seulement un petit échantillon des Bibliothèques humanitaires
     504      pour le Développement (11 documents).</displayItem>
    497505    <displayItem name="description" lang="en">This is a demonstration
    498506      collection for the Greenstone digital library software. It contains
    499       a small subset (11 books) of the Humanity Development Library. It is
    500       built with mgpp.</displayItem>
    501     <displayItem name="name" lang="en">greenstone mgpp demo</displayItem>
    502     <displayItem name="icon" lang="en">mgppdemo.gif</displayItem>
     507      a small subset (11 books) of the Humanity Development Library. It
     508      is built with mg using Greenstone 3 native building.</displayItem>
    503509  </displayItemList>
    504   <search type='mgpp'>
    505     <index name="idx"/>
     510  <search type='mg'>
     511    <index name="i1">
     512      <field>text</field>
     513      <level>document</level>
     514      <displayItem name='name' lang="en">entire documents</displayItem>
     515      <displayItem name='name' lang="fr">documents entiers</displayItem>
     516      <displayItem name='name' lang="es">documentos enteros</displayItem>
     517    </index>
     518    <index name="i2">
     519      <field>text</field>
     520      <level>section</level>
     521      <displayItem name='name' lang="en">chapters</displayItem>
     522      <displayItem name='name' lang="fr">chapitres</displayItem>
     523      <displayItem name='name' lang="es">capítulos</displayItem>
     524    </index>
    506525    <format>
    507526      <gsf:template match="documentNode">
    508527        <td valign='top'><gsf:link><gsf:icon/></gsf:link></td>
    509         <td><gsf:metadata name='Title' select='ancestors'
    510           separator=': '/>: <gsf:link><gsf:metadata name='Title' />
    511           </gsf:link></td>
     528        <td><gsf:metadata name='Title' /></td>
    512529      </gsf:template>
    513530    </format>
     
    528545</collectionConfig>
    529546\end{verbatim}\end{gsc}
    530 \caption{Sample collectionConfig.xml file (mgppdemo collection)}
     547[TODO: add in building instructions for the classifiers]
     548\caption{Sample collectionConfig.xml file (gs3demo collection)}
    531549\label{fig:collconfig}
    532550\end{figure}
    533 [TODO: add in building istructions for the config file]
    534551
    535552The \gst{<metadataList>} element specifies some collection metadata, such as creator. The \gst{<displayItemList>} specifies some language dependent information that is used for collection display, such as collection name and short description. These displayItem elements can be specified in different languages. If languages other than English are used, the configuration file should be encoded in UTF-8.
     
    539556Search indexes appear as individual \gst{<index>} elements within the \gst{<search>} element. Some choices for the index are made using attributes of the element itself, and some through child elements. 
    540557
    541 Each index must have a unique name, which is used to identify it within Greenstone  The name is given as an attribute of the \gst{<index>} element. 
     558Each index must have a unique name, which is used to identify it within \gsiii.  The name is given as an attribute of the \gst{<index>} element. 
    542559
    543560The other choices are described using child elements of \gst{<index>}.  The \gst{<level>} tag indicates the index level and the \gst{<field>} tag the text to be used.  The \gst{<level>} tag can contain one of document, section or paragraph, while the \gst{<field>} tag can contain ``text'' or the name of a metadata field.  If the \gst{<level>} tag is omitted, the default setting is to index by document, and if the \gst{<field>} tag is omitted, the default setting is to index the document text.
    544561
    545562Example index specifications include:
     563
     564[NOTE: I think we shouldn't have default level and field and that it must be specified--kjdon]
    546565
    547566To index only the title of each separate document in the collection:
     
    550569  <level>document</level>
    551570  <field>dc:title</field>
    552   <displayItem name='name' lang="en">entire documents</displayItem>
    553   <displayItem name='name' lang="fr">documents entiers</displayItem>
    554   <displayItem name='name' lang="es">documentos enteros</displayItem>
    555571</index>
    556572\end{verbatim}\end{gsc}
     
    559575Alternatively, to index the full document texts by section:
    560576\begin{gsc}\begin{verbatim}
    561 <index name="stx" type=''mgpp''>
     577<index name="stx">
    562578  <level>section</level>
    563   <displayItem name='name' lang="en">entire documents</displayItem>
    564   <displayItem name='name' lang="fr">documents entiers</displayItem>
    565   <displayItem name='name' lang="es">documentos enteros</displayItem>     
    566579</index>
    567580\end{verbatim}\end{gsc}
    568581...or...
    569582\begin{gsc}\begin{verbatim}
    570 <index name="stx" type=''mg''>
     583<index name="stx">
    571584  <level>section</level>
    572585  <field>text</field>
    573   <displayItem name='name' lang="en">entire documents</displayItem>
    574   <displayItem name='name' lang="fr">documents entiers</displayItem>
    575   <displayItem name='name' lang="es">documentos enteros</displayItem>
    576586</index>
    577587\end{verbatim}\end{gsc}
    578 ...in the first example, the \gst{<field>} tag is not explicitly defined, and would default to 'text', whereas it is explicitly set to 'text' in the second example.  Note the different indexer selected for these two indexes.  As they are of the same name, they should not appear in the same \gst{collectionConfig.xml} file.
    579 
    580 The \gst{<search>} and \gst{<browse>} elements give some formatting information about the indexes and classifiers. \gst{<displayItem>} elements are used to provide titles for the indexes or classifiers, while \gst{<format>} elements provide formatting instructions, typically for a document or classifier node in a list of results. 
    581 
    582 of the \gst{collectionConfig.xml} file, and classifications as individual \gst{<classifier>} elements within the \gst{<browse>} element.  In each case, some choices are made using attributes of the element itself, and some through child elements. 
    583 Moving onto \gst{<classifier>} items, the format is broadly similar to \gst{<index>} items, but with a couple of different choices.  Firstly, each classifier should have a ``name'' and ``type'' attribute as with \gst{<index>} tags.  In the case of \gst{<classifier>} items the ``type'' attribute identifies the type of classifier it is.  At present, this should either be ``Hierarchy'' or ``AZList''. 
     588...in the first example, the \gst{<field>} tag is not explicitly defined, and would default to 'text', whereas it is explicitly set to 'text' in the second example. As they are of the same name, they should not appear in the same \gst{collectionConfig.xml} file.
     589
     590Moving on to \gst{<classifier>} items, the format is broadly similar to \gst{<index>} items, but with a couple of different choices.  Firstly, each classifier should have ``name'' and ``type'' attributes.  In the case of \gst{<classifier>} items the ``type'' attribute identifies the type of classifier it is.  At present, this should either be ``Hierarchy'' or ``AZList''. 
    584591
    585592The remaining choices for the classifier should follow as child elements of the \gst{<classifier>} element.  The \gst{<file>} element should contain the name of the file that describes the classifier as its ``URL'' attribute.  The format of this file will be described later - it will vary from classifier type to classifier type.  The \gst{<field>} element identifies the name of the field to index.  More than one \gst{<field>} element may appear if two or more metadata fields are to be used with the classifier.  Finally, the \gst{<sort>} item identifies another metadata field by which the items within one classifier node are to be ordered.  Unlike the \gst{<index>} element, the \gst{<classifier>} element does not have default, assumed values for its children.
     
    654661\subsection{Formatting the collection}\label{sec:formatstmt}
    655662
    656 format statements. and displayItem stuff. advanced collection design.\\
    657 
    658 Part of collection design involves deciding how the collection should look. Greenstone has a default 'look' for a collection, so this is optional. However, the default may not suit the purposes of some collections, so many parts to the look of a collection can be determined by the collection designer.
    659 
    660 In standard greenstone, the library is served to a web browser by a servlet, and the html is generated using XSLT. XSLT templates are used to format all the parts of the pages. Some commonly overwritten templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display.
     663Part of collection design involves deciding how the collection should look. \gsiii\  has a default 'look' for a collection, so this is optional. However, the default may not suit the purposes of some collections, so many parts to the look of a collection can be determined by the collection designer.
     664
     665In standard \gsiii\ , the library is served to a web browser by a servlet, and the html is generated using XSLT. XSLT templates are used to format all the parts of the pages. Some commonly overwritten templates are those for formatting lists: search results list, classifier browsing hierarchies, and for parts of the document display.
    661666
    662667Real XSLT templates for formatting search results or classifier lists are quite complicated, and not at all easy for a new user to write. For example, the following is a sample template for formatting a classifier list, to show Keyword metadata as a link to the document.
     
    681686\end{bulletedlist}
    682687 
    683 Since XSLT is written in XML, we can use XSLT to transform XML into XSLT. Greenstone provides a simplified set of formatting commands, written in XML, which will be transformed into proper XSLT. Table~\ref{tab:gsf-format} shows the set of 'gsf' (greenstone format) elements. If you have come from a Greenstone 2 background, Appendix~\ref{app:format} shows Greenstone 2 format elements and their equivalents in Greenstone 3.
     688Since XSLT is written in XML, we can use XSLT to transform XML into XSLT. \gsiii\ provides a simplified set of formatting commands, written in XML, which will be transformed into proper XSLT. Table~\ref{tab:gsf-format} shows the set of 'gsf' (Greenstone Format) elements. If you have come from a \gsii\ background, Appendix~\ref{app:format} shows \gsii\ format elements and their equivalents in \gsiii.
    684689 
    685690\begin{table}
     
    692697\hline
    693698\gst{<gsf:text/>} & The document's text\\
     699\hline
    694700\gst{<gsf:link>...</gsf:link>} & The HTML link to the document itself \\
    695701\gst{<gsf:link type='document'>...
     
    699705\gst{<gsf:link type='source'>...
    700706</gsf:link>} & The HTML link to the original file---set for documents that have been converted from e.g. Word, PDF, PS \\
     707\hline
    701708\gst{<gsf:icon/>}  & An appropriate icon\\
    702709\gst{<gsf:icon type='document'/>} & same as above\\
    703710\gst{<gsf:icon type='classifier'/>} & bookshelf icon for classification nodes\\
    704711\gst{<gsf:icon type='source'/>} & An appropriate icon for the original file e.g. Word, PDF icon\\
     712\hline
    705713\gst{<gsf:metadata name='Title'/>} & The value of a metadata element for the current document or section, in this case, Title\\
    706714\gst{<gsf:metadata name='Title' select='select-type' [separator='y' multiple='true']/>} & A more extended selection of metadata values. The select field can be one of those shown in Table~\ref{tab:gsf-select-types}. There are two optional attributes: separator gives a string that will be used to separate the fields (the default is ``, ''), and multiple, if set to true, looks for multiple values at each section.\\
    707 
     715\hline
    708716\gst{<gsf:choose-metadata>
    709717  <gsf:metadata name='metaA'/>
     
    712720</gsf:choose-metadata>}
    713721 & A choice of metadata. Will select the first existing one. The metadata elements can have the select, separator and multiple attributes as usual.\\
     722\hline
    714723\gst{<gsf:switch preprocess=
    715724'preprocess-type'>
     
    882891The interface language can be changed by going to the preferences page, and choosing a language from the list. The list shows all languages in which the interface has been defined so far.
    883892
    884 It is easy to add a new interface language to greenstone.  Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in resources/java. Each interface has one named interface\_name.properties (where `name' is the interface name). Each service class has one with the same name as the class (e.g. GS2Search.properties). To add another language all of the base .properties  files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of interface\_default.properties would be named interface\_default\_fr.properties.
     893It is easy to add a new interface language to \gs.  Language-specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in resources/java. Each interface has one named interface\_name.properties (where `name' is the interface name). Each service class has one with the same name as the class (e.g. GS2Search.properties). To add another language, all of the base .properties files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of interface\_default.properties would be named interface\_default\_fr.properties.
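
One way to start a translation is to copy a base properties file to its language-specific name and then translate the values in the copy. The helper below is a hypothetical sketch (the function name \gst{make\_translation\_stub} is not part of \gs); it only applies the naming rule described above.

```shell
#!/bin/sh
# Sketch: create the language-specific copy of one base .properties file.
# The copy still contains the original strings, which then need translating.
make_translation_stub() {
    base="$1"   # path to a base file, e.g. resources/java/interface_default.properties
    lang="$2"   # language extension to add, e.g. fr
    # interface_default.properties + fr -> interface_default_fr.properties
    target="${base%.properties}_${lang}.properties"
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    cp "$base" "$target" && printf '%s\n' "$target"
}
```

Refusing to overwrite an existing file is a design choice of this sketch, so that work on a translation already in progress is not lost.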
    885894
    886895Keys will be looked up in the properties file closest to the specified language. For example, if language fr\_CA was specified (French language, country Canada), and the default locale was en\_GB, Java would look at properties files in the following order, until it found the key: XXX\_fr\_CA.properties, XXX\_fr.properties,  XXX\_en\_GB.properties, then XXX\_en.properties, and finally the default XXX.properties.
    887896
    888 You can tell Greenstone about a new language by adding it in to the languageList in the interfaceConfig.xml file. This will add it in to the list of languages on the preferences page. Modification of this file requires a restart of the Tomcat server for the changes to be recognised.
     897You can tell \gs\ about a new language by adding it in to the languageList in the interfaceConfig.xml file. This will add it in to the list of languages on the preferences page. Modification of this file requires a restart of the Tomcat server for the changes to be recognised.
    889898
    890899
    891900\subsubsection{Modifying an existing interface}
    892901
    893 Most of an interface is defined by XSLT files, which are stored in \$GSDL3HOME/\-web/\-interfaces/\-interface-name/\-transform. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following  order: collection, site, interface, default interface. (This currently only applies to sites, and therefore collections, that reside in the same greenstone installation as the interface.) This also applies to files that are included from other XSLT files. For example the query.xsl for the query pages includes a file called querytools.xsl. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site transform directory will work. Either the new query.xsl will include the default querytools.xsl, or the default query.xsl will include the new querytools.xsl. The xsl:include directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
     902Most of an interface is defined by XSLT files, which are stored in \$GSDL3HOME/\-web/\-interfaces/\-interface-name/\-transform. These can be changed and the changes will take effect straight away. If changes only apply to certain collections or sites, not everything that uses the interface, you can override some of the files by putting new ones in a different place. XSLT files are looked for in the following  order: collection, site, interface, default interface. (This currently only applies to sites, and therefore collections, that reside in the same \gs\ installation as the interface.) This also applies to files that are included from other XSLT files. For example the query.xsl for the query pages includes a file called querytools.xsl. To have a particular site show a different query interface either of these files may need to be modified. Creating a new version of either of these and putting it in the site transform directory will work. Either the new query.xsl will include the default querytools.xsl, or the default query.xsl will include the new querytools.xsl. The xsl:include directives are preprocessed by the Java code and full paths added based on availability of the files, so that the correct one is used.
    894903
    895904Note that you cannot include a file with the same name as the including file. For example, query.xsl cannot include query.xsl (it is tempting to do this if you just want to change one template in a particular file and then include the default, but you can't).
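The collection, site, interface, default interface search order amounts to a first-match lookup over an ordered list of directories. A minimal sketch follows, assuming each level has its own transform directory; the class StylesheetLookup, the method findFirst and the directory paths in main are hypothetical and are not taken from the Greenstone source.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a first-match stylesheet lookup.
class StylesheetLookup {

    // Returns the first existing copy of fileName, trying the directories in order,
    // or null if no directory contains it.
    static File findFirst(List<File> searchDirs, String fileName) {
        for (File dir : searchDirs) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative paths only, most specific first.
        List<File> searchDirs = Arrays.asList(
            new File("web/sites/localsite/collect/demo/transform"), // collection
            new File("web/sites/localsite/transform"),              // site
            new File("web/interfaces/nzdl/transform"),              // interface
            new File("web/interfaces/default/transform"));          // default interface
        File found = findFirst(searchDirs, "querytools.xsl");
        System.out.println(found == null ? "not found" : found.getPath());
    }
}
```

Running the same lookup for each included file gives the behaviour described above: a site-level querytools.xsl is picked up even when query.xsl itself comes from the interface.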
     
    904913
    905914\newpage
    906 \section{Developing Greenstone 3: Run-time system}\label{sec:develop-runtime}
    907 
     915\section{Developing \gsiii\ : Run-time system}\label{sec:develop-runtime}
     916
     917[TODO: rewrite this!!]
    908918runtime object structure diagram. describe the modules.\\
    909919class hierarchy,\\
     
    918928\subsection{Overview of modules??}
    919929
    920 A Greenstone3 'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
     930A \gsiii\ 'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system.
    921931
    922932\begin{figure}[t]
     
    945955
    946956We use the Tomcat web server, which operates either stand-alone in a test mode
    947 or in conjunction with the Apache web server.  The Greenstone LibraryServlet
     957or in conjunction with the Apache web server.  The \gs\ LibraryServlet
    948958class is loaded by Tomcat  and the servlet's \gst{init()} method is called.  Each time a
    949959\gst{get/put/post} (etc.) is used, a new thread is started and
     
    975985\subsection{Message passing}
    976986
    977 Action in Greenstone 3 is originated by a request coming in from the outside. In the standard web-based greenstone, this comes from a servlet into the receptionist. This external type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML format or other depending on the output parameter to the request. Messages inside the system all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned.
    978 
    979 When a page request comes in to the Receptionist, it looks at the action attribute to determine which action to send it to. The response is returned from the action. The page that the receptionist returns contains the original request, the response from the action and other info as needed (depending on the type of Receptionist). The data may be transformed in some way --- for the servlet greenstone we transform using XSLT to generate HTML pages which get returned to the servlet.
     987Action in \gsiii\  is originated by a request coming in from the outside. In the standard web-based \gs\ , this comes from a servlet into the receptionist. This external type request is a request for a page of data, and contains a representation of the CGI style arguments. A page of XML is returned, which can be in HTML format or other depending on the output parameter to the request. Messages inside the system all follow the same basic format: message elements contain multiple request elements, or multiple response elements. Messaging is all synchronous. The same number of responses as requests will be returned.
     988
     989When a page request comes in to the Receptionist, it looks at the action attribute to determine which action to send it to. The response is returned from the action. The page that the receptionist returns contains the original request, the response from the action and other info as needed (depending on the type of Receptionist). The data may be transformed in some way --- for the servlet \gs\ we transform using XSLT to generate HTML pages which get returned to the servlet.
    980990
    981991Actions send internal style messages to the MessageRouter. Some can be answered by it, others are passed on to collections, and maybe on to services. Internal requests are for simple actions, such as search, retrieve metadata, retrieve document text
     
    989999
    9901000request:
    991 These are the special 'external'-style messages. Requests originate from outside Greenstone, for example from a servlet, or java application. They are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a Greenstone URL.  The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
     1001These are the special 'external'-style messages. Requests originate from outside \gs\ , for example from a servlet, or java application. They are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a list of arguments specifying what type of page is required. If the external context is a servlet, the arguments represent the 'CGI' arguments in a \gs\ URL.  The two main arguments are \gst{a} (action) and \gst{sa} (subaction). All other arguments are encoded as parameters.
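As a rough illustration of how the arguments split up, the sketch below takes a CGI-style query string, pulls out a and sa, and keeps everything else as parameters. It is not the Greenstone implementation: the class PageRequestArgs and its fields are hypothetical, and URL decoding is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the parts of a page request.
class PageRequestArgs {
    final String action;              // value of the "a" argument
    final String subaction;           // value of the "sa" argument
    final Map<String, String> params; // all other arguments, in URL order

    PageRequestArgs(String action, String subaction, Map<String, String> params) {
        this.action = action;
        this.subaction = subaction;
        this.params = params;
    }

    // Splits "name=value&name=value..." into action, subaction and parameters.
    // Values are taken as-is; no URL decoding is attempted in this sketch.
    static PageRequestArgs parse(String query) {
        String action = "";
        String subaction = "";
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (name.equals("a")) {
                action = value;
            } else if (name.equals("sa")) {
                subaction = value;
            } else {
                params.put(name, value);
            }
        }
        return new PageRequestArgs(action, subaction, params);
    }

    public static void main(String[] args) {
        PageRequestArgs req = parse("a=p&sa=about&c=demo&l=fr");
        System.out.println("action=" + req.action + " subaction=" + req.subaction
                + " params=" + req.params);
    }
}
```

In the real system the result is an XML request element rather than a Java object; the sketch only shows which arguments are treated specially.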
    9921002
    9931003Here are some examples of  requests\footnote{In a servlet context, these correspond to the URLs \gst{a=p\&sa=about\&c=demo\&l=fr}, and \gst{a=q\&l=en\&s=TextQuery\&c=demo\&rt=r\&ca=0\&st=1\&m=10\&q=snail}.}:
     
    10431053\hline
    10441054\end{tabular}}
    1045 \caption{Generic arguments that can appear in a Greenstone URL}
     1055\caption{Generic arguments that can appear in a \gs\ URL}
    10461056\label{tab:args}
    10471057\end{table}
     
    13591369
    13601370\begin{table}
    1361 \caption{Status codes currently used in Greenstone 3}
     1371\caption{Status codes currently used in \gsiii\ }
    13621372\label{tab:status codes}
    13631373{\footnotesize
     
    17571767* talk general first: get data, get format info, transform gsf->xsl. transform xml->html
    17581768
    1759 * state saving. the XSLT files assume that arguments are saved somehow. This needs to be implemented outside Greenstone proper - we do this in the servlet, using something or other.
    1760 
    1761 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page-requests}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the CGI-arguments to determine what requests need to be made to the system.
     1769* state saving. the XSLT files assume that arguments are saved somehow. This needs to be implemented outside \gs\ proper - we do this in the servlet, using something or other.
     1770
     1771URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:page-requests}, the requests are XML representations of \gs\ URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the CGI-arguments to determine what requests need to be made to the system.
    17621772System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module.
    17631773
     
    17881798\subsubsection{Receptionists}\label{sec:recepts}
    17891799
    1790 The receptionist is the controlling module for the page generation part of greenstone. It has the job of loading up all the actions, and it knows about the message router it and the actions are supposed to talk to. It routes messages received to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example, adding to the page received back from the action any information that is common to all pages.
    1791 
    1792 There are different ways of providing an interface to greenstone, from web based CGI style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.
     1800The receptionist is the controlling module for the page generation part of \gs\ . It has the job of loading up all the actions, and it knows about the message router it and the actions are supposed to talk to. It routes messages received to the appropriate action (page-type messages) or directly to the message router (all other types). Receptionists also do other things, for example, adding to the page received back from the action any information that is common to all pages.
     1801
     1802There are different ways of providing an interface to \gs\ , from web based CGI style (using servlets) to Java GUI applications. These different interfaces require slightly different responses from a receptionist, so we provide several standard types of receptionist.
    17931803
    17941804Receptionist: This is the most basic receptionist. The page it returns consists of the original request, and the response from the action it was sent to. Methods preProcessRequest, and postProcessPage are called on the request and page, respectively, but in this basic receptionist, they don't do anything.
     
    17981808WebReceptionist: The WebReceptionist extends TransformingReceptionist. It doesn't do much else except some argument conversion. To keep the URLs short, parameters from the services are given shortnames, and these are used in the web pages.
    17991809
    1800 DefaultReceptionist: This extends WebReceptionist, and is the default one for greenstone 3 servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. The receptionist sends a describe request to the collection to get this, and appends it to the page before transformation using XSLT.
     1810DefaultReceptionist: This extends WebReceptionist, and is the default one for \gsiii\ servlets. Due to the page design, some extra information is needed for each page: some metadata about the current collection. The receptionist sends a describe request to the collection to get this, and appends it to the page before transformation using XSLT.
    18011811
    18021812NZDLReceptionist: (do we want to talk about this?) This is an example of a custom receptionist. For a look-alike nzdl.org system, even more information is needed for each page, namely the list of classifiers available from the ClassifierBrowse service.
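The way the receptionist types build on one another can be pictured as a base class with two overridable hooks. The sketch below borrows the method names preProcessRequest and postProcessPage from the description above, but everything else is an assumption: strings stand in for XML, the signatures are guesses, and the page, response and collectionInfo element names are invented for the example.

```java
// Hypothetical sketch of the receptionist hook structure; strings stand in for XML.
class ReceptionistSketch {

    static class Receptionist {
        // Builds a page from the original request and the action's response.
        String process(String request) {
            String prepared = preProcessRequest(request);
            String response = sendToAction(prepared);
            String page = "<page>" + prepared + response + "</page>";
            return postProcessPage(page);
        }

        // In the basic receptionist both hooks do nothing.
        protected String preProcessRequest(String request) {
            return request;
        }

        protected String postProcessPage(String page) {
            return page;
        }

        // Stand-in for routing the request to the appropriate action.
        protected String sendToAction(String request) {
            return "<response/>";
        }
    }

    // A subclass that appends extra information to every page, in the spirit of
    // DefaultReceptionist adding collection metadata before transformation.
    static class CollectionInfoReceptionist extends Receptionist {
        @Override
        protected String postProcessPage(String page) {
            return page.replace("</page>", "<collectionInfo/></page>");
        }
    }

    public static void main(String[] args) {
        System.out.println(new Receptionist().process("<request/>"));
        System.out.println(new CollectionInfoReceptionist().process("<request/>"));
    }
}
```

A custom receptionist such as the NZDL one would follow the same pattern, overriding a hook to add the list of classifiers.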
     
    18571867indicates a request to the service itself. The extra arguments (not a, sa, sn, c)  are simply copied into the
    18581868request as parameters. The response is in a form suitable for the applet, placed inside
    1859 \gst{<appletData>} in a standard Greenstone message.  AppletAction returns the
     1869\gst{<appletData>} in a standard \gs\ message.  AppletAction returns the
    18601870contents of appletData to the browser, i.e. to the applet itself.
    18611871
     
    19061916
    19071917\subsubsection{Some class info - where should this go??}
    1908 \begin{table}
     1918\begin{table}[h]
    19091919\caption{The utility classes in org.greenstone.gsdl3.util}
    19101920\label{tab:utils}
     
    19171927Dictionary & wrapper around a Resource Bundle, providing strings with parameter\\
    19181928GSCGI & class to map between short name CGI arguments and long name request parameters \\
    1919 GSFile & class to create all Greenstone file paths e.g. used to locate configuration files, XSLT files and collection data. \\
     1929GSFile & class to create all \gs\ file paths e.g. used to locate configuration files, XSLT files and collection data. \\
    19201930GSHTML & provides convenience methods for dealing with HTML, e.g. making strings HTML safe\\
    19211931GSPath & used to create, examine and modify message address paths\\
    19221932GSStatus & some static codes for status messages\\
    1923 GSXML & lots of methods for extracting information out of Greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by Greenstone.\\
    1924 GSXSLT & some manipulation functions for Greenstone XSLT\\
     1933GSXML & lots of methods for extracting information out of \gs\  XML, and creating some common types of elements. Also has static Strings for element and attribute names used by \gs\ .\\
     1934GSXSLT & some manipulation functions for \gs\ XSLT\\
    19251935Misc & miscellaneous functions\\
    1926 OID & class to handle Greenstone (2) OIDs\\
     1936OID & class to handle \gs\ (2) OIDs\\
    19271937XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\
    19281938XMLTransformer & methods to transform XML using XSLT \\
     
    19401950
    19411951\newpage
    1942 \section{Developing Greenstone 3: Adding new features}\label{sec:new-features}
     1952\section{Developing \gsiii\ : Adding new features}\label{sec:new-features}
    19431953
    19441954\subsection{Creating new services}\label{sec:new-services}
     
    19691979\subsection{New types of collections}\label{sec:new-coll-types}
    19701980
    1971 There are two types of standard Greenstone collections: collections built with the Greenstone 3 building system, and collections that are imported from Greenstone 2. There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. Greenstone 3 has an ability to use any type of collection you can come up with, assuming  some java code is provided.
    1972 
    1973 
    1974 There are four levels of customisation that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with Greenstone to describe these different levels.
    1975 
    1976 Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the Greenstone 2 MGPP collections were the first to be served in Greenstone 3. When we came to do Greenstone 2 MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
     1981There are two types of standard \gs\  collections: collections built with the \gsiii\  building system, and collections that are imported from \gsii\ . There are many options to collection building but it is conceivable that these options don't meet the needs of all collection builders. \gsiii\ has an ability to use any type of collection you can come up with, assuming  some java code is provided.
     1982
     1983
     1984There are four levels of customisation that may be needed with new collections: service, collection, interface XSLT, and action levels. We will use the example collections that come with \gs\ to describe these different levels.
     1985
     1986Firstly, new service classes need to be written to provide the functionality to search/browse/whatever the collection. If the services have similar interfaces and functionality to the standard services, this may be all that is needed. For example, the \gsii\  MGPP collections were the first to be served in \gsiii\ . When we came to do \gsii\ MG collections, all we had to do was write some new service classes that interacted with MG instead of MGPP. Because these collections used the same type of services, this was all we had to do. The format of the configuration files was similar, they just specified MG serviceRack classes rather than MGPP ones.
    19771987
    19781988The nzmaps collection used the same level of customisation, just implementing new services and fitting all the extra display elements into the standard query/display framework using javascript.
     
    19801990The gberg collection, however, was done quite differently to the standard collections. New services were provided to search the database (built with Lucene) and to provide the documents and parts of documents (using XSLT to transform the raw XML files). The collectionConfig file had some extra information in it: a list of the documents in the collection along with their Titles. Because the standard collection class has no notion of document lists, a new class was created (org.greenstone.gsdl3.collection.XMLCollection). This class is basically the same as a standard collection class except that it looks for and stores in memory the documentList from the collectionConfig file.
    19811991
    1982 To tell Greenstone to load up a different type of collection class, we use another configuration file: etc/collectionInit.xml. This  specifies the name of the collection class to use.
     1992To tell \gs\ to load up a different type of collection class, we use another configuration file: etc/collectionInit.xml. This  specifies the name of the collection class to use.
    19831993Currently, this is all that is specified in that file, but you may want to add parameters for the class etc.
    19841994
    19851995\gst{<collectionInit class="XMLCollection"/>}
    19861996
    1987 The display for the collection is also quite different. The home page for the collection  displays the list of documents. To achieve this, the describe response from the collection had to include the list, and a new XSLT was written for the collection that displayed this. Collection XSLT should be put in the transform directory of the collection\footnote{These are currently only used when running greenstone in a non-distributed fashion, but it will be added in properly at some stage}.
    1988 
    1989 Document display is  significantly different to standard greenstone. There are two modes of display: table of contents mode, and content mode. Clicking on a document link from the collection home page takes the user to the table of contents for the collection. Clicking on one of the sections in the table of contents takes them to a display of that section. To facilitate this, not only do we need new XSLT files , we also needed a new action. XMLDocumentAction was created, that used two subactions, toc and text, for the different modes of display.
     1997The display for the collection is also quite different. The home page for the collection  displays the list of documents. To achieve this, the describe response from the collection had to include the list, and a new XSLT was written for the collection that displayed this. Collection XSLT should be put in the transform directory of the collection\footnote{These are currently only used when running \gs\ in a non-distributed fashion, but it will be added in properly at some stage}.
     1998
     1999Document display is  significantly different to standard \gs\ . There are two modes of display: table of contents mode, and content mode. Clicking on a document link from the collection home page takes the user to the table of contents for the collection. Clicking on one of the sections in the table of contents takes them to a display of that section. To facilitate this, not only do we need new XSLT files , we also needed a new action. XMLDocumentAction was created, that used two subactions, toc and text, for the different modes of display.
    19902000
    19912001The Receptionist was told about this new action by the addition of the following to the interfaceConfig.xml file:
     
    20292039\end{verbatim}\end{gsc}
    20302040
    2031 Instead of displaying an icon and the Title, it displays the Title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these require non-standard arguments to the library, these parts of the template are written in XSLT not greenstone format language. As is shown here it is perfectly feasible to write a format statement that includes XSLT mixed in with greenstone format elements.
    2032 
    2033 The document display uses CSS to format the output---the stylesheets are kept in the collection and specified in the collection's XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder; the other option is to put them in Greenstone's DTD folder gsdl3/resources/dtd.
     2041Instead of displaying an icon and the Title, it displays the Title of the section and the title of the document. Both of these are linked to the document: the section title to the content of that section, the document title to the table of contents for the document. Because these require non-standard arguments to the library, these parts of the template are written in XSLT not \gs\  format language. As is shown here it is perfectly feasible to write a format statement that includes XSLT mixed in with \gs\ format elements.
     2042
     2043The document display uses CSS to format the output---the stylesheets are kept in the collection and specified in the collection's XSLT files. The documents also specify DTD files. Due to the way we read in the XML files, Tomcat sometimes has trouble locating the DTDs. One option is to make all the links absolute links to files in the collection folder; the other option is to put them in \gs\ 's DTD folder gsdl3/resources/dtd.
    20342044
    20352045\subsection{The NZDL mirror site}
    20362046
    2037 The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using Greenstone 3 instead of Greenstone 2. It uses a new site and a new interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and interface.
    2038 
    2039 The site was created by making a directory called nzdl in the sites folder. A siteConfig file was created. Because it's running on Linux, we were able to link to all the collections in the old greenstone installation. The convert\_coll\_from\_gs2.pl script was run over all the collections to produce the new XML configuration files.
     2047The library seen at \gst{http://www.greenstone.org/greenstone3/nzdl} is like a mirror to \gst{http://www.nzdl.org}---it aims to present the same collections, in the same way but using \gsiii\  instead of \gsii\ . It uses a new site and a new interface. The web.xml file had a new servlet entry in it to specify the combination of nzdl site and interface.
     2048
     2049The site was created by making a directory called nzdl in the sites folder. A siteConfig file was created. Because it's running on Linux, we were able to link to all the collections in the old \gs\ installation. The convert\_coll\_from\_gs2.pl script was run over all the collections to produce the new XML configuration files.
    20402050
    20412051A new interface, also called nzdl, was created in the interfaces directory.
    20422052In many cases, creating a new interface just requires the new images and XSLT  to be added to the new directory (see Sections~\ref{sec:sites-and-ints} and \ref{sec:interface-customise}). This setup also required a bit more customisation.
    20432053
    2044 The standard Greenstone navigation bar lists all the services available for the collection. In Greenstone 2, the navigation bar provided the search option, and the different classifiers. This is not service specific, but hard coded to the search and classifiers. The XSLT that produced the navigation bar needed to be altered to produce this. But also, a new Receptionist was needed.
     2054The standard \gs\  navigation bar lists all the services available for the collection. In \gsii\ , the navigation bar provided the search option, and the different classifiers. This is not service specific, but hard coded to the search and classifiers. The XSLT that produced the navigation bar needed to be altered to produce this. But also, a new Receptionist was needed.
    20452055The standard receptionist (DefaultReceptionist) gathers a little bit of extra info for each page of XML before transforming it: this is the list of services for the collection and their display information, allowing the services to be listed along the navigation bar. This is information that is needed by every page (except for the library home page) and therefore is obtained by the receptionist instead of by each action. The nzdl interface needed a bit more information than this: for the ClassifierBrowse service, if there was one, the list of classifiers and their display elements must be obtained. So a new Receptionist was written that inherited from DefaultReceptionist, and added this new info into the page.
    20462056
     
    20492059
    20502060\newpage
    2051 \section{Distributed Greenstone}\label{sec:distributed}
    2052 
    2053 Greenstone is designed to run in a distributed fashion. One greenstone installation can talk to several sites on different computers. This requires some sort of communication protocol. Any protocol can be used, however we have only implemented a simple SOAP protocol.
     2061\section{Distributed \gs\ }\label{sec:distributed}
     2062
     2063\gs\  is designed to run in a distributed fashion. One \gs\ installation can talk to several sites on different computers. This requires some sort of communication protocol. Any protocol can be used, however we have only implemented a simple SOAP protocol.
    20542064
    20552065more explanation..
     
    20632073
    20642074We have used Apache SOAP for Java. This is run as a servlet in Tomcat.
    2065 If you have obtained Greenstone through CVS, you will need to install soap separately, as described in Appendix~\ref{app:soap-cvs}. Debugging soap is described in Appendix~\ref{app:soap-debug}.
     2075If you have obtained \gs\ through CVS, you will need to install soap separately, as described in Appendix~\ref{app:soap-cvs}. Debugging soap is described in Appendix~\ref{app:soap-debug}.
    20662076
    20672077\subsection{Serving a site using soap}
     
    20712081
    20722082\newpage
    2073 \section{Using Greenstone 3 from CVS}\label{app:cvs}
     2083\section{Using \gsiii\ from CVS}\label{app:cvs}
    20742084
    20752085*** need to make sure building stuff is in here ***
    20762086
    2077 Greenstone 3 is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable, in fact it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes.
     2087\gsiii\ is also available via CVS. You can download the latest version of the code. This is not guaranteed to be stable, in fact it is likely to be unstable. The advantage of using CVS is that you can update the code and get the latest fixes.
    20782088
    20792089Note that you will need the Java 2 SDK, version 1.4.0 or higher.
    20802090
    2081 To check out the greenstone code, use:
     2091To check out the \gs\ code, use:
    20822092
    20832093\begin{quote}\begin{gsc}\begin{verbatim}
     
    20922102\subsection{Linux install}
    20932103
    2094 An install.sh script is provided to compile and install Greenstone3. What you need to do is:
     2104An install.sh script is provided to compile and install \gsiii\ . What you need to do is:
    20952105
    20962106\begin{quote}\begin{gsc}
     
    21342144Run gs3-finalise.bat\\
    21352145
    2136 To run Greenstone, run gs3-launch.bat. This will start the Tomcat server in a new DOS window (stop it by closing the window), and open a browser window showing the Greenstone 3 homepage.
     2146To run \gs\ , run gs3-launch.bat. This will start the Tomcat server in a new DOS window (stop it by closing the window), and open a browser window showing the \gsiii\  homepage.
     2147
     2148\subsection{Creating a distribution}
     2149
     2150The installation scripts have been set up in such a way that it is easy to create different distribution types (for linux). To create a standard binary distribution, carry out the following steps:
     2151
     2152\begin{gsc}\begin{verbatim}
     2153cvs co gsdl3
     2154cd gsdl3
     2155source gs3-setup.sh
     2156./gs3-prepare.sh
     2157./gs3-configure.sh
     2158./gs3-compile.sh
     2159./gs3-for-distribution.sh
     2160
     2161mv Header ../
     2162cd ../
     2163tar czvf gsdl3.tgz gsdl3/
     2164cat Header gsdl3.tgz > gsdl3-x.xx-unix.sh
     2165\end{verbatim}\end{gsc}
     2166
     2167Note that gs3-for-distribution.sh removes some files that are not needed for the distribution, including all the CVS directories. Once you have run this, you will no longer be able to update your gsdl3 code via cvs.
     2168
     2169To create a source distribution, you can do:
     2170\begin{gsc}\begin{verbatim}
     2171cvs co gsdl3
     2172cd gsdl3
     2173source gs3-setup.sh
     2174./gs3-prepare.sh
     2175<delete unnecessary files>
     2176cd ../
     2177tar czvf gsdl3-x.xx-src.tgz gsdl3/
     2178\end{verbatim}\end{gsc}
     2179
     2180Some of the gs3-for-distribution script will need to be run (at the ``delete unnecessary files'' stage), and there need to be instructions on what to do when someone downloads the source distribution.
     2181 
     2182I think it would be:
     2183\begin{gsc}\begin{verbatim}
     2184tar xzvf gsdl3-x.xx-src.tgz
     2185cd gsdl3
     2186source gs3-setup.sh
     2187./gs3-configure.sh
     2188./gs3-compile.sh
     2189./gs3-finalise.sh
     2190\end{verbatim}\end{gsc}
     2191
    21372192
    21382193\newpage
    21392194\section{Tomcat}\label{app:tomcat}
    21402195
    2141 Tomcat is a servlet container. It is used to serve a Greenstone site using a servlet.
    2142 
    2143 The file \gst{\gsdlhome/comms/jakarta/tomcat/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \\
     2196Tomcat is a servlet container. It is used to serve a \gs\ site using a servlet.
     2197
     2198The file \gst{\gsdlhome/comms/jakarta/tomcat/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for \gsiii\ servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \\
    21442199\gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}.
    21452200
     
    21882243\end{gsc}\end{quote}
    21892244
    2190 In our example, the greenstone 3 servlet can be accessed at \gst{http://www.greenstone.org/greenstone3/library}, instead of at \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, which is not publically accessible.
     2245In our example, the \gsiii\ servlet can be accessed at \gst{http://www.greenstone.org/greenstone3/library}, instead of at \gst{http://puka.cs.waikato.ac.nz:8080/gsdl3/library}, which is not publicly accessible.
    21912246
    21922247\subsection{Running Tomcat behind a proxy}
     
    21992254\subsection{Setting up SOAP from CVS}\label{app:soap-cvs}
    22002255
    2201 If you have obtained greenstone through CVS, you will need to install the SOAP stuff by running:
     2256If you have obtained \gs\ through CVS, you will need to install the SOAP components by running:
    22022257
    22032258\begin{quote}\begin{gsc}
     
    22432298\end{quote}
    22442299
    2245 8070 is the port that TcpTunnelGui listens on, and 8080 is the port that it sends the messages onto---the port that Tomcat is using. You need to modify Greenstone to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}).
     23008070 is the port that TcpTunnelGui listens on, and 8080 is the port that it forwards the messages to---the port that Tomcat is using. You need to modify \gs\ to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}).
    22462301\begin{quote}\begin{gsc}\begin{verbatim}
    22472302<site name="org.greenstone.localsite"
     
    22552310
    22562311\newpage
    2257 \section{Format statements: Greenstone 2 vs Greenstone 3}\label{app:format}
    2258 The following table shows the Greenstone 2 format elements, and their equivalents in Greenstone 3
    2259 \begin{table}
    2260 \caption{Greenstone 3 equivalents of Greenstone 2 format statements}
     2312\section{Format statements: \gsii\  vs \gsiii\ }\label{app:format}
     2313The following table shows the \gsii\  format elements, and their equivalents in \gsiii.
     2314\begin{table}[h]
     2315\caption{\gsiii\  equivalents of \gsii\ format statements}
    22612316{\footnotesize
    22622317\begin{tabular}{ll}
    22632318\hline
    2264 \bf Greenstone 2        & \bf Greenstone 3 \\
     2319\bf \gsii\         & \bf \gsiii\ \\
    22652320\hline
    22662321\gst{[Text]} & \gst{<gsf:text/>} \\