Changeset 7861


Timestamp:
2004-08-04T17:23:56+12:00
Author:
kjdon
Message:

more changes

Location:
trunk/gsdl3/docs/manual
Files:
2 edited

  • trunk/gsdl3/docs/manual/manual.tex

    r7826 r7861  
    5858A description of the general design and architecture of \gsiii\  is covered by the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).
    5959
    60 This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs\ , such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
     60This documentation consists of several parts. Section~\ref{sec:install} is for administrators, and covers \gsiii\  installation, how to access the library, and some administration issues. Section~\ref{sec:user} is for users of the software, and looks at using the sample collections, creating new collections, and how to make small customisations to the interface. The remaining sections are aimed towards  the \gs\  developer. Section~\ref{sec:develop-runtime} describes the run-time system, including the structure of the software, and the message format, while Section~\ref{sec:develop-build} describes the collection building process. Section~\ref{sec:new-features} describes how to add new features to \gs, such as how to add new services, new page types, new plugins for different document formats.  Section~\ref{sec:distributed} describes how to make \gs\  run in a distributed fashion, using SOAP as an example communications protocol. Finally, there are several appendices, including how to install \gs\  from CVS, some notes on Tomcat and SOAP, and a comparison of \gsii\  and \gsiii\  format statements.
    6161\newpage
    6262\section{\gs\  installation and administration}\label{sec:install}
    6363
    64 This section covers where to get \gsiii\  from, how to install it and how to run it. The standard method of running \gsiii\  is as a Java servlet. We provide the Tomcat servlet container to serve the servlet :-). Standard web servers may  be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access \gsiii\ , Tomcat must be started up, and then it can be accessed via a web browser.
     64This section covers where to get \gsiii\  from, how to install it and how to run it. The standard method of running \gsiii\  is as a Java servlet. We provide the Tomcat servlet container to run the servlet. Standard web servers may  be able to be configured to provide servlet support, and thereby remove the need to use Tomcat. Please see your web server documentation for this. This documentation assumes that you are using Tomcat. To access \gsiii, Tomcat must be started up, and then it can be accessed via a web browser.
    6565
    6666
    6767\subsection{Get and install \gs\ }
    6868
    69 \gsiii\  is available from \gst{http://www.greenstone.org/greenstone3}. There are currently two distributions: a self-installing tar for Linux, and a Windows executable.
     69\gsiii\  is available from \gst{http://www.greenstone.org/greenstone3}. There are currently two releases: one for Linux, one for Windows. They were built using InstallShieldX, a new multi-platform installer. It uses Java and is quite slow.
    7070
    7171\gsiii\  is also available through CVS (Concurrent Versioning System). This provides the latest development version, and is not guaranteed to be stable. Appendix~\ref{app:cvs} describes how to download and install \gsiii\  from CVS.
     
    7373\subsubsection{Linux}
    7474
    75 Download the latest version of the self-installing tar file, \gst{gsdl3-x.xx-unix.sh}, and run it in a shell (\gst{./gsdl3-x.xx-unix.sh}). \gsiii\  will be installed into a directory called \gst{gsdl3} inside the current directory. The install script will prompt you for  the name of your computer and what port to run Tomcat on (the defaults being \gst{localhost} and \gst{8080}).  Once \gsiii\  has been installed, you can start the library  by running \gst{./gsdl3/gs3-launch.sh}, and opening up a browser pointing to \gst{http://localhost:8080/gsdl3} (substituting your chosen name and port if necessary).
     75Download the latest version of the installer, \gst{gsdl3-x.xx-linux}, and run it in a shell (\gst{./gsdl3-x.xx-linux}). The installation process will prompt you for the installation directory, the name of your computer and what port to run Tomcat on (the defaults being \gst{localhost} and \gst{8080}).  Once \gsiii\  has been installed, you can start the library  by running \gst{./gs3-launch.sh} from the gsdl3 directory, and opening up a browser pointing to \gst{http://localhost:8080/gsdl3} (substituting your chosen name and port if necessary).
    7676
    7777\subsubsection{Windows}
    7878
    79 Download the latest Windows executable, \gst{gsdl3-x.xx-win32.exe}, and double click it to start the installation. You will be prompted for your computer name and the port number to run Tomcat on (defaults are \gst{localhost} and \gst{8080}). Once \gsiii\  is installed, you can access the library by selecting \gst{Greenstone Digital Library 3} in the Start menu.
    80 
    81 \subsubsection{Accessing the library in a browser}
    82 
    83 Once you have started up the library (see the previous sections for OS dependent instructions), you can access it in a browser at \gst{http://localhost:8080/gsdl3} (or \gst{http://your-computer-name:your-chosen-port/gsdl3}). This gets you to a welcome page, with three links: one to run a test servlet (this allows you to check that Tomcat is running properly), one to run the standard library servlet using the site \gst{localsite}, and one to run a library servlet using the site \gst{soapsite}. This site uses a SOAP connection to communicate with localsite, and demonstrates the library working in a distributed fashion. The SOAP connection is not enabled by default: see Section~\ref{sec:distributed} for details about how to run \gsiii\  distributedly.
     79Download the latest Windows installer, \gst{gsdl3-x.xx-win32.exe}, and double click it to start the installation. You will be prompted for the installation directory, installation type, your computer name and the port number to run Tomcat on (defaults are \gst{localhost} and \gst{8080}). Once \gsiii\  is installed, you can access the library by selecting \gst{Greenstone Digital Library 3} in the Start menu.
     80
     81\subsubsection{Accessing the library in a browser}\label{sec:browser-access}
     82
     83Once you have started up the library (see the previous sections for OS dependent instructions), you can access it in a browser at \gst{http://localhost:8080/gsdl3} (or \gst{http://your-computer-name:your-chosen-port/gsdl3}). This gets you to a welcome page containing  links to four servlets: the \gst{test} servlet (this allows you to check that Tomcat is running properly); the standard \gst{library} servlet, which serves the \gst{localsite} site with the \gst{default} interface; the \gst{classic} servlet, which serves \gst{localsite} using the \gst{classic} or \gsii-style interface; and the \gst{gateway} servlet, which serves the \gst{gateway} site with the \gst{default} interface. The \gst{gateway} site uses a SOAP connection to communicate with \gst{localsite}, and demonstrates the library working in a distributed fashion.
    8484
    8585\subsection{How the library works}
     
    9191\subsubsection{Restarting the library}
    9292
    93 The library program (actually Tomcat) can be restarted in Windows by closing the window, and restarting it from the Start menu. In linux, you nned to go to the gsdl3 directory, and run \gst{gsdl3/gs3-launch.sh -shutdown}, then \gst{gsdl3/gs3-launch.sh}.
     93The library program (actually Tomcat) can be restarted in Windows by closing the window, and restarting it from the Start menu. In Linux, you need to go to the gsdl3 directory, and run \gst{./gs3-launch.sh -shutdown}, then \gst{./gs3-launch.sh}.
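
As a short sketch of the Linux restart sequence described above, assuming \gsiii\  was installed into a directory called \gst{gsdl3} under the current directory (the actual location depends on the installation directory chosen):

\begin{verbatim}
# assumption: gsdl3 is the installation directory
cd gsdl3
# stop the running library (Tomcat)
./gs3-launch.sh -shutdown
# start it again
./gs3-launch.sh
\end{verbatim}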
    9494
    9595
     
    110110Table~\ref{tab:dirs} shows the file hierarchy for \gsiii\ .
    111111The first part  shows the common stuff which can be shared between
    112 \gs\  users---the source, libraries etc. Under Linux, these can be installed into appropriate system directories. The second part shows
    113 stuff used by one person/group---their sites and interface setup (see Section~\ref{sec:sites-and-ints}).
    114 etc. There can be several sites/interfaces per installation. All the files inside the gsdl3/web directory comprise the gsdl3 context for Tomcat, and are accessible via Tomcat.
     112\gs\  users---the source, libraries etc. The second part shows the file hierarchy for the gsdl3/web directory, which comprises the gsdl3 context for Tomcat, and is accessible via Tomcat. The main directories are for sites and interfaces: there can be several sites and interfaces per installation, and they are described in the following section.
     113
    115114
    116115\begin{table}
     
    128127gsdl3/src/java/
    129128  & java source code \\
    130 gsdl3/src/cpp/
    131   & c/ cpp source code---none yet \\
    132129gsdl3/packages
    133130  & Imported packages from other systems e.g. MG, MGPP \\
     
    143140 & soap service description files \\
    144141gsdl3/resources/dtd
    145  & \gsiii\  has trouble loading DTD files sometimes. They can go here\\
     142 & \gsiii\  has trouble locating DTD files sometimes. They can go here\\
    146143gsdl3/bin
    147144  & executable stuff lives here\\
    148145gsdl3/bin/script
    149   & some Perl building scripts\\
    150 gsdl3/bin/linux
    151   & Linux executables for e.g. MGPP\\
    152 gsdl3/bin/windows
    153   & windows executables for e.g. MGPP\\
     146  & some Perl and/or shell building scripts\\
    154147gsdl3/comms
    155   & Put some stuff here for want of a better place---things to do with servers and communication. e.g. soap stuff, and Tomcat servlet container\\
     148  & Communication packages: Tomcat and SOAP\\
    156149gsdl3/docs
    157   & Documentation :-)\\
     150  & Documentation\\
    158151\hline
    159152gsdl3/web
    160   & This is where the web site is defined. Any static html files can go here. This directory is the Tomcat root directory.\\
     153  & This is where the web site is defined. Any static HTML files can go here. This directory is the Tomcat root directory.\\
    161154gsdl3/web/WEB-INF
    162155  & The web.xml file lives here (servlet configuration information for Tomcat)\\
     
    166159  & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (e.g. soap) to other sites\\
    167160gsdl3/web/sites/localsite
    168   & One site - the site configuration file lives here\\
     161  & An example site - the site configuration file lives here\\
    169162gsdl3/web/sites/localsite/collect
    170163  & The collections directory \\
     
    190183[local gs stuff (sites and interfaces) vs installed stuff (code)\\
    191184where they live, whats the difference, what each contains.]\\
    192 
    193 A site is comprised of a set of collections and possibly some site-wide services. An interface (in this web-based servlet context) is a set of images along with a set of xslt files used for translating xml output from the library into an appropriate form---html in general.
    194 
    195 One \gsiii\  installation can have many sites and interfaces, and these can be paired in different combinations.  One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance.  For example, a single site might be served with two different interfaces. This provides different modes of access to the same content. eg HTML vs WML, or perhaps providing a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
     185Sites and interfaces contain the content and presentation information, respectively, for the digital library.
     186A site is comprised of a set of collections and possibly some site-wide services. An interface (in this web-based servlet context) is a set of images along with a set of XSLT files used for translating XML output from the library into an appropriate form---HTML in general.
     187
     188One \gsiii\  installation can have many sites and interfaces, and these can be paired in different combinations.  One instantiation of a servlet uses one site and one interface, so every specified pairing results in a new servlet instance.  For example, a single site might be served with two different interfaces. This provides different modes of access to the same content. e.g. HTML vs WML, or perhaps providing a completely different look and feel for different audiences. Alternatively, a standard interface may be used with many different sites---providing a consistent mode of access to a lot of different content.
    196189
    197190Collections live in the \gst{collect} directory of a site. Any collections that are found in this directory when the servlet is initialised will be loaded up and presented to the user. Collections require valid configuration files, but apart from this, nothing needs to be done to the site to use new collections. Collections added while Tomcat is running will not be noticed automatically. Either the server needs to be restarted, or a configuration request may be sent to the library, triggering a (re)load of the collection (this is described in Section~\ref{sec:runtime-config}).
    198191
    199 There are two  sites that come with the distribution: \gst{localsite}, and \gst{soapsite}. \gst{localsite} has several demo  collections, while \gst{soapsite} has none. \gst{soapsite} specifies that a soap connection should be made to \gst{localsite}. Getting this to work involves setting up a soap server for localsite: see Section~\ref{sec:distributed} for details.
     192There are two  sites that come with the distribution: \gst{localsite}, and \gst{gateway}. \gst{localsite} has several demo  collections, while \gst{gateway} has none. \gst{gateway} specifies that a SOAP connection should be made to \gst{localsite}. Getting this to work involves setting up a SOAP server for \gst{localsite}: see Section~\ref{sec:distributed} for details.
     193There are also two interfaces provided in the distribution: \gst{default} and \gst{classic}. The default interface is a generic \gsiii\ interface, while the \gst{classic} interface aims to look like the old \gsii\ interface.
    200194
    201195Each site and interface has a configuration file which specifies parameters for the site or interface---these are described in Section~\ref{sec:config}.
    202196
    203 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
    204 There are three servlets specified in web.xml (these correspond to the three links in the welcome page for \gsiii\ ): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other two are \gs\  library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite. Both of these servlets use the standard interface (called {\em default}).
     197\subsection{Configuring Tomcat}\label{sec:tomcat-config}
     198
     199The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the configuration information for Tomcat. It tells Tomcat what servlets to load, what initial parameters to pass them, and what web names map to the servlets.
     200There are four servlets specified in web.xml (these correspond to the four servlet links in the welcome page for \gsiii): one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other three are the \gs\  library servlets described in Section~\ref{sec:browser-access}, \gst{library}, \gst{classic} and \gst{gateway}. Each servlet must specify which site and which interface to use. Having multiple servlets provides a way of serving different sites, or the same site with a different style of presentation. Site\_name and interface\_name are just two examples of initialisation parameters used by the library servlets. The full list is shown in Table~\ref{tab:serv-init}.
     201
     202For more details about Tomcat see Appendix~\ref{app:tomcat}.
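
As a rough illustration, a servlet entry in \gst{web.xml} might look something like the following sketch. The element layout follows the standard servlet deployment descriptor. The servlet class name and URL pattern are assumptions made for illustration. The \gst{library} servlet, its site and interface, and the \gst{site\_name} and \gst{interface\_name} parameters are the only values taken from the description above.

\begin{verbatim}
<!-- illustrative sketch only: the servlet-class value and
     the url-pattern are assumptions -->
<servlet>
  <servlet-name>library</servlet-name>
  <servlet-class>org.greenstone.gsdl3.LibraryServlet</servlet-class>
  <init-param>
    <param-name>site_name</param-name>
    <param-value>localsite</param-value>
  </init-param>
  <init-param>
    <param-name>interface_name</param-name>
    <param-value>default</param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>library</servlet-name>
  <url-pattern>/library</url-pattern>
</servlet-mapping>
\end{verbatim}

Under these assumptions, serving the same site with a different interface would mean adding a second \gst{servlet} entry that differs only in its name and its \gst{interface\_name} value.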
    205203
    206204\begin{table}
     
    224222\end{table}
    225223
    226 The initialisation parameters used by the library servlets are shown in Table~\ref{tab:serv-init}. This is where you define what site and interface each servlet uses. Any number of servlets can be specified here. See Appendix~\ref{app:tomcat} for more details about Tomcat.
    227 
    228 
    229 \subsection{Configuring a \gs\  installation}\label{sec:config}
     224\subsection{Configuring a \gs\ library}\label{sec:config}
    230225
    231226Initial \gsiii\  system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for the site, \gst{siteConfig.xml}. Each interface has a configuration file, \gst{interfaceConfig.xml}, that specifies Actions for the interface. Collections also have several configuration files; these are discussed in Section~\ref{sec:collconfig}.
    232 The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated sending a CGI-type command to the library.  There are a series of commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
     227The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will not take immediate effect. Tomcat needs to be restarted for changes to the interface configuration file to take effect. However, changes to the site configuration file can be incorporated by sending a system command to the library.  There is a series of system commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}.
    233228
    234229\subsubsection{Site configuration file}\label{sec:siteconfig}
     
    236231The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of
    237232known external sites to connect to.  Collections are not specified in the site
    238 configuration file, instead they are determined by the contents of the site's
     233configuration file, but are determined by the contents of the site's
    239234collections directory.
    240235
     
    275270  <siteList>
    276271    <site name="org.greenstone.localsite"
    277       address="http://localhost:8090/soap/servlet/rpcrouter"
     272      address="http://localhost:8080/soap/servlet/rpcrouter"
    278273      type="soap"/>
    279274  </siteList>
     
    286281\subsubsection{Interface configuration file}\label{sec:interfaceconfig}
    287282
    288 The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library urls for the a (action) parameter) e.g. QueryAction should use a=q. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
     283The interface configuration file \gst{interfaceConfig.xml} lists all the actions that the interface knows about at the start (other ones can be loaded dynamically). Actions create the web pages for the library: there is generally one Action per type of page. For example, a query action produces the pages for searching, while a document action displays the documents. The configuration file specifies what short name each action maps to (this is used in library URLs for the a (action) parameter) e.g. QueryAction should use a=q. If the interface uses XSLT, it specifies what XSLT file should be used for each action and possibly each subaction. This makes it easy for developers to implement and use different actions and/or XSLT files without recompilation. The server must be restarted, however.
    289284
    290285It also lists all the languages that the interface text files have been translated into. These have a \gst{name} attribute, which is the ISO code for the language, and a \gst{displayElement} which gives the language name in that language (note that this file should be encoded in UTF-8). This language list is used on the Preferences page to allow the user to change the interface language. Details on how to add a new language to a \gsiii\  library are shown in Section~\ref{sec:interface-customise}.
     
    331326\subsection{Run-time re-initialisation}\label{sec:runtime-config}
    332327
    333 [**should this section go in here, cos its kind of adminy, or go into the user stuff, cos you need to do it after building a collection???**]
    334 
    335328When Tomcat is started up, the site and interface configuration files are read in, and actions/services/collections loaded as necessary. The configuration is then static unless Tomcat is restarted, or re-configuration commands issued.
    336329
    337 There are several commands that can be issued to Tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, if the java classes are modified, Tomcat must be restarted then too.
     330There are several commands that can be issued to Tomcat to avoid having to restart the server. These can reload the entire site, or just individual collections. Unfortunately at present there are no commands to reconfigure the interface, so if the interface configuration file has changed, Tomcat must be restarted for those changes to take effect. Similarly, if the Java classes are modified, Tomcat must be restarted then too.
    338331
    339332Currently, the runtime configuration commands can only be accessed by typing arguments into the URL; there is no nice web form yet to do this.
    340333
    341 The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in \gs\ , so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
     334The arguments are entered after the \gst{library?} part of the URL. There are three types of commands: configure, activate, deactivate\footnote{There is no security for these commands yet in \gs, so the deactivate/delete command is disabled}. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection/cluster by the addition of \gst{sc=xxx}, where \gst{xxx} is the name of the collection or cluster. Table~\ref{tab:run-time config} describes the commands and arguments in a bit more detail.
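
For example, assuming the default host and port and the standard \gst{library} servlet (so that the library URL begins \gst{http://localhost:8080/gsdl3/library}, which is an assumption about the local setup), the commands might be entered as follows. Here \gst{xxx} stands for a collection or cluster name, as above.

\begin{verbatim}
(configure command, sent to the MessageRouter)
http://localhost:8080/gsdl3/library?a=s&sa=c

(configure command, sent to collection or cluster xxx)
http://localhost:8080/gsdl3/library?a=s&sa=c&sc=xxx

(activate command, sent to the MessageRouter)
http://localhost:8080/gsdl3/library?a=s&sa=a
\end{verbatim}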
    342335
    343336\begin{table}
     
    361354
    362355\subsection{Using a collection}\label{sec:usecolls}
    363 [TODO: expand this section]
    364 
    365 A collection typically consists of a set of documents, which could be text, html, word, PDF, images, bibliographic records etc, along with some access methods, or ``services''. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.
     356
     357A collection typically consists of a set of documents, which could be text, HTML, Word, PDF, images, bibliographic records etc., along with some access methods, or ``services''. Typical access methods include searching or browsing for document identifiers, and retrieval of content or metadata for those identifiers.
    366358Searching involves entering words or phrases and getting back lists of documents that contain those words. The search terms may be restricted to particular fields of the document.
    367359
    368360Browsing involves navigating pre-defined hierarchies of documents, following links of interest to find documents. The hierarchies may be constructed on different metadata fields, for example, alphabetical lists of Titles, or a hierarchy of Subject classifications. Clicking on a bookshelf icon takes you to a lower level in the hierarchy, while clicking on a book or page icon takes you to a document.
    369361
    370 In the standard interface that comes with \gsiii\ \footnote{of course, this is all customisable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
     362In the standard interface that comes with \gsiii\ \footnote{of course, this is all customisable}, collections in a digital library are presented in the following manner. The 'home' page of the library shows a list of all the public collections in that library. Clicking on a collection link takes you to the home page for the collection, which we call the collection's 'about' page. The standard page banner looks something like that shown in Figure~\ref{fig:page-banner}.
    371363
    372364\begin{figure}[h]
     
    377369\end{figure}
    378370
    379 The image at the top left is a link to the collection's home page. The top right has buttons to link to the library home page, help pages and preference pages. All the available services are arrayed along a navigation bar, along the bottom of the banner. Clicking on a name accesses that service.
     371The image at the top left is a link to the collection's home page. The top right has buttons to link to the library home page, help and preferences pages. All the available services are arrayed along a navigation bar, along the bottom of the banner. Clicking on a name accesses that service.
    380372
    381373Search type services generally provide a form to fill in, with parameters including what field or granularity to search, and the query itself. Clicking the search button carries out the search, and a list of matching documents will be displayed. Clicking on the icons in the result list takes you to the document itself.
     
    383375Once you are looking at a document, clicking the open book icon at the top of the document, underneath the navigation bar, will take you back to the service page that you accessed the document from.
    384376
    385 [TODO: describe the colls that the sample installation comes with\\
    386 brief description of what a collection is.\\
    387 how to get around the collection, services etc. \\
    388 querying vs browsing \\
    389 use the demo colls that come with \gsiii\  - one gs2 coll, one gs3 coll, tei coll??\\]
    390 
    391377\subsection{Building a collection}\label{sec:buildcol}
    392378
    393 There are three ways to get a new collection into \gsiii\ . The first is to build it using the \gsiii\ command line building process. The second way is to use the Greenstone Librarian Interface to build a new collection. This creates a collection in a \gsiii\ context, but uses the \gsii\ perl build process. The third way is to import a pre-built \gsii\  collection.
    394 
    395 Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\  installation. The collect directory is at \$GSDL3HOME/web/sites/site-name/collect, where site-name is the name of the site you want your new collection to belong to.
    396 
    397 The following three sections describe how to create a collection from scratch, using command line and GLI building, and how to import a \gsii\  collection. Once a collection has been built (and is located in the collect directory), the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, an activate collection command can be issued to the servlet, using the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}, where \gst{collname} should be replaced with the collection name---this tells the library program to (re)load the \gst{collname} collection.
     379There are three ways to get a new collection into \gsiii. The first is to build it using the \gsiii\ command line building process. The second way is to use the Greenstone Librarian Interface to build a new collection. This creates a collection in a \gsiii\ context, but uses the \gsii\ Perl collection building process. The third way is to import a pre-built \gsii\  collection.
     380
     381Collections live in the collect directory of a site. As described in Section~\ref{sec:sites-and-ints}, there can be several sites per \gsiii\  installation. The collect directory is at \gst{\$GSDL3HOME/web/sites/site-name/collect}, where site-name is the name of the site you want your new collection to belong to.
     382
     383The following three sections describe how to create a collection from scratch, using command line and GLI building, and how to import a \gsii\  collection. Once a collection has been built (and is located in the collect directory), the library server needs to be notified that there is a new collection. This can be accomplished in two ways\footnote{and eventually there will also probably be automatic polling for new collections}. If you are the library administrator, you can restart Tomcat. The library servlet will then be created afresh, and will discover the new collection when it scans the collect directory for the collection list. Alternatively, an activate collection command can be issued to the servlet, using the arguments \gst{a=s\&sa=a\&st=collection\&sn=collname}, where \gst{collname} should be replaced with the collection name---this tells the library program to (re)load the \gst{collname} collection.
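
As a rough illustration, the activate request could be assembled as in the following Java sketch. Only the argument string comes from the text above. The servlet address \gst{http://localhost:8080/gsdl3/library}, the collection name \gst{demo}, and the class and method names are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds the "activate collection" request described in the text.
class ActivateCollectionUrl {

    // Returns the arguments a=s&sa=a&st=collection&sn=<collname>,
    // with the collection name URL-encoded.
    static String activateQuery(String collName) {
        String encoded = URLEncoder.encode(collName, StandardCharsets.UTF_8);
        return "a=s&sa=a&st=collection&sn=" + encoded;
    }

    public static void main(String[] args) {
        // Assumption: substitute the host, port and servlet name of your own installation.
        String servletUrl = "http://localhost:8080/gsdl3/library";
        // Assumption: "demo" is a placeholder collection name.
        String collName = args.length > 0 ? args[0] : "demo";
        System.out.println(servletUrl + "?" + activateQuery(collName));
    }
}
```

Requesting the printed URL from a browser should then ask the library servlet to (re)load that collection.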
    398384
    399385
     
    405391[TODO: describe the kinds of documents that can be added, something about METS files?]
    406392
    407 Metadata for documents can be added using metadata.xml files.  These files have already been used in \gsii\ , and the format is the same in \gsiii\ .  A metadata.xml file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the metadata.xml, or in one of its child directories, will be selected.  The filename tag encloses the regular expression as text, eg:
     393Metadata for documents can be added using metadata.xml files.  These files have already been used in \gsii, and the format is the same in \gsiii.  A metadata.xml file has a root element of \gst{<DirectoryMetadata>}.  This encloses a series of \gst{<FileSet>} items.  Neither of these tags has any attributes.  Each \gst{<FileSet>} item includes two parts: firstly, one or more \gst{<FileName>} tags, each of which encloses a regular expression to identify the files which are to be assigned the metadata.  Only files in the same directory as the metadata.xml, or in one of its child directories, will be selected.  The filename tag encloses the regular expression as text, e.g.:
    408394
    409395\begin{gsc}\begin{verbatim}
     
    411397\end{verbatim}\end{gsc}
    412398
    413 This would match any file containing the text 'example' in its name.  The second part of the \gst{<FileSet>} item is a \gst{<Description>} item.  The \gst{<Description>} tag has no attributes, but encloses one or more \gst{<Metadata>} tags.  Each \gst{<Metadata>} tag contains one metadata item, i.e. a label to describe the metadata and a corresponding value.  The \gst{<Metadata>} tag has one compulsory attribute: ``name''.  This attribute gives the metadata label to add to the document.  Each \gst{<Metadata>} tag also has an optional attribute: ``mode''.  If this attribute is set to ``accumulate'' then the value is added to the document, and any existing values for that metadata item are retained.  If the attribute is set to ``set'' or is omitted, then the existing value of the metadata item will be deleted.
     399This would match any file containing the text 'example' in its name.  The second part of the \gst{<FileSet>} item is a \gst{<Description>} item.  The \gst{<Description>} tag has no attributes, but encloses one or more \gst{<Metadata>} tags.  Each \gst{<Metadata>} tag contains one metadata item, i.e. a label to describe the metadata and a corresponding value.  The \gst{<Metadata>} tag has one compulsory attribute: ``name''.  This attribute gives the metadata label to add to the document.  Each \gst{<Metadata>} tag also has an optional attribute: ``mode''.  If this attribute is set to ``accumulate'' then the value is added to the document, and any existing values for that metadata item are retained.  If the attribute is set to ``set'' or is omitted, then any existing value of the metadata item will be deleted.
    414400
    415401\begin{figure}
     
    436422    <FileName>b22bue</FileName>
    437423    <Description>
    438       <Metadata name="Title">Butterfly Farming in Papua New Guinea (b22bue)</Metadata>
     424      <Metadata name="Title">Butterfly Farming in Papua New Guinea
     425        (b22bue)</Metadata>
    439426      <Metadata mode="accumulate" name="Language">English</Metadata>
    440       <Metadata mode="accumulate" name="Subject">Other animals (micro-livestock, little known animals, silkworms, reptiles, frogs, snails, game, etc.)</Metadata>
     427      <Metadata mode="accumulate" name="Subject">Other animals (micro-
     428        livestock, little known animals, silkworms, reptiles, frogs,
     429        snails, game, etc.)</Metadata>
    441430      <Metadata mode="accumulate" name="Organization">BOSTID</Metadata>
    442431      <Metadata mode="accumulate" name="AZList">T.1</Metadata>
    443       <Metadata mode="accumulate" name="Keyword">start a butterfly farm</Metadata>
     432      <Metadata mode="accumulate" name="Keyword">start a butterfly farm
     433        </Metadata>
    444434    </Description>
    445435  </FileSet>
     
    451441
    452442Figure~\ref{fig:metadatafile} shows an example metadata.xml file.
    453 Here, only one file pattern is found in each file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the mode=accumulate attribute.  This means that when the title is assigned to a document, its existing \gst{Title} information will be lost.
     443Here, only one file pattern is found in each file set.  However, the \gst{Description} tag contains a number of separate metadata items.  Note that the \gst{Title} metadata does not have the \gst{mode=accumulate} attribute.  This means that when the title is assigned to a document, its existing \gst{Title} information will be lost.
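
The difference between the two modes can be pictured with a small Java model. This is a hypothetical sketch of the behaviour described above, not the code used by the \gs\ build process; the class \gst{MetadataModes} and its methods are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "set" and "accumulate" modes of a <Metadata> tag.
class MetadataModes {
    private final Map<String, List<String>> values = new HashMap<>();

    // mode may be "accumulate", "set", or null (attribute omitted).
    void assign(String name, String value, String mode) {
        List<String> current = values.computeIfAbsent(name, k -> new ArrayList<>());
        if (!"accumulate".equals(mode)) {
            // "set" or omitted: any existing values are discarded first.
            current.clear();
        }
        current.add(value);
    }

    List<String> get(String name) {
        return values.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        MetadataModes doc = new MetadataModes();
        doc.assign("Title", "Extracted title", null);
        doc.assign("Title", "Butterfly Farming in Papua New Guinea (b22bue)", null);
        doc.assign("Language", "English", "accumulate");
        doc.assign("Language", "French", "accumulate");
        System.out.println(doc.get("Title"));
        System.out.println(doc.get("Language"));
    }
}
```

In this model the document ends up with a single \gst{Title} value but two \gst{Language} values.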
    454444
    455445The basic means of finding documents in \gs\  is search. Options for building the search indexes include which indexer to use, what granularity to use for the indexes (e.g. whether to index documents as a whole, or sections of documents), what content the index should have (the whole text of the document or one or many metadata fields).  Section-level indexes allow a reader to recall part of a document (for instance, a chapter) rather than the entire document.  However, \gsiii\  must be able to identify the internal structure of the document to achieve this.  The degree to which structure can be found varies from file format to file format.
    456446
    457 An alternative means of finding documents is through browsing. Greenstone can create pre-defined browsing hierarchies based on document metadata. Each browsing structure is called a classifier. Options for building classifiers include what type of classifier to use (linear list or multi-level hierarchy), what metadata to build the classifier on, eg Title, Author etc.
     447An alternative means of finding documents is through browsing. Greenstone can create pre-defined browsing hierarchies based on document metadata. Each browsing structure is called a classifier. Options for building classifiers include what type of classifier to use (linear list or multi-level hierarchy), what metadata to build the classifier on, e.g. Title, Author etc.
    458448
    459450The collectionConfig.xml file controls all of these options for collection building, and the format is described in Section~\ref{sec:collconfig}.
    460450
    461 To build a collection, place the source documents and optional metadata.xml file(s) in the import directory, place the \gst{collectionConfig.xml} file in the etc directory, and execute \gst{gs3build.sh sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have mysql running before you start building---running \gst{gs3-launch.sh} will start up the mysql server as well as tomcat.
     451To build a collection, place the source documents and optional metadata.xml file(s) in the import directory, place the \gst{collectionConfig.xml} file in the etc directory, and execute \gst{gs3build.sh/bat sitename collectionname}.  The process will run, placing the new indexes in the \gst{building} subdirectory of the collection's directory. You must have MySQL running before you start building---running \gst{gs3-launch.sh/bat} will start up the MySQL server as well as Tomcat.
    462452
    463453Once the build process is complete, the building directory should be renamed to index (after deleting or renaming the existing index directory, if any), and Tomcat prompted to reload the collection---either by restarting the server, or by sending an activate collection command to the library servlet.
    464454
    465 Summary:
    466 
    467 [TODO: need to describe namespaces somewhere? ]
    468 
    469455\subsubsection{Using the Librarian Interface}
    470456
    471 [TODO: check that this is true with the new installer]
    472 
    473 The Greenstone Librarian Interface (GLI) can be used to create \gsii\ style collections for \gsiii. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 .. menu in the Program Files section of the Start menu. On linux, run ./gli4gs3.sh from the gsdl3/gli directory.
     457The Greenstone Librarian Interface (GLI) can be used to create \gsii\ style collections for \gsiii. It can be started under Windows by selecting Greenstone Librarian Interface from the Greenstone 3 Digital Library menu in the Program Files section of the Start menu. On Linux, run \gst{./gli4gs3.sh} from the \gst{gsdl3/gli} directory.
    474458
    475459Currently, the GLI works almost exactly the same as for \gsii\footnote{Eventually the GLI will be modified to use native \gsiii\ config files and collection building}. Collection configuration is done in a \gsii\ manner. The main difference is that \gsiii\ has different sites and interfaces and servlets, whereas \gsii\ has a single collect directory, and a single runtime cgi program.
    476460
    477 The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates using one site---you can edit, delete, create new collections within a single site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences->Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
    478 
    479 Collection building using the GLI will use the \gsii\ perl scripts and plugins. At the conclusing of the \gsii\ build process, a conversion script will be run to create the \gsiii configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ config files. You can either rebuild the collection through the GLI (may take a while), or run the conversion script directly (see following section).
     461The GLI for \gsiii\ has a couple of new configuration parameters: site and servlet. It operates within a single site---you can edit, delete, and create collections within this site. A servlet is also specified for that site---this is used when previewing a collection. While you are working in one site, you cannot edit collections from another site. However, you can base a collection on one from another site. To change the working site and/or servlet, go to Preferences-$>$Connection in the File menu. By default, the GLI will use site \gst{localsite}, and servlet \gst{library}.
     462
     463Collection building using the GLI will use the \gsii\ Perl scripts and plugins. At the conclusion of the \gsii\ build process, a conversion script will be run to create the \gsiii\ configuration files. This means that format statements are no longer 'live'---changing these will require changes to the \gsiii\ config files. You can either rebuild the collection through the GLI (may take a while), or run the conversion script directly (see following section).
    480464 
    481 Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the Greenstone 2 User's Guide. This can be found in your \gsii\ installation, or in... if you have installed the ... installer.
     465Detailed instructions about using the GLI can be found in Sections 3.1 and 3.2 of the Greenstone 2 User's Guide (\gst{GS2-User-en.pdf}). This can be found in your \gsii\ installation, or in the gsdl3/docs/manual directory if you have installed \gsiii\ from a distribution.
    482466
    483467
     
    682666The \gst{<display>} element contains optional formatting information for the display of documents. Templates that can be specified here include \gst{documentHeading} and \gst{DocumentContent}. Other information that could be specified (in a yet to be decided format) includes whether or not to display the cover image, the table of contents, etc.
    683667
    684 Format elements are desribed in Section~\ref{sec:formatstmt}.
     668Format elements are described in Section~\ref{sec:formatstmt}.
    685669
    686670\subsection{buildConfig.xml}\label{sec:buildconfig}
    687671
    688 The file \gst{buildConfig.xml} is produced by the collection building process. Gererally it is not necessary to look at this file, but it can be useful in determining what went wrong if the collection doesn't appear quite the way it was planned.
     672The file \gst{buildConfig.xml} is produced by the collection building process. Generally it is not necessary to look at this file, but it can be useful in determining what went wrong if the collection doesn't appear quite the way it was planned.
    689673
    690674It contains  metadata and other information about the collection that can
     
    993977It is easy to add a new interface language to \gs.  Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. These text strings are contained in Java resource bundle properties files. These are plain text files consisting of key-value pairs, located in \gst{resources/java}. Each interface has one named \gst{interface\_name.properties} (where `name' is the interface name). Each service class has one with the same name as the class (e.g. \gst{GS2Search.properties}). To add another language, all of the base .properties files must be translated. The translated files keep the same names, but with a language extension added. For example, a French version of \gst{interface\_default.properties} would be named \gst{interface\_default\_fr.properties}.
    994978
    995 Keys will be looked up in the properties file closest to the specified language. For example, if language \gst{fr\_CA} was specified (french language, country Canada), and the default locale was \gst{en\_GB},  java would look at properties files in the following order, until it found the key: \gst{XXX\_fr\_CA.properties}, \gst{XXX\_fr.properties},  \gst{XXX\_en\_GB.properties}, then \gst{XXX\_en.properties}, and finally the default \gst{XXX.properties}.
    996 
    997 These new files are available straight away---to use the new language, add e.g. \gst{l=fr} to the arguments in the URL. To get \gs\ to add it in to the list of languages on the preferences page, an entry needs to be added into the languagss list in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}). Modification of this file requires a restart of the Tomcat server for the changes to be recognised.
     979Keys will be looked up in the properties file closest to the specified language. For example, if language \gst{fr\_CA} was specified (French language, country Canada), and the default locale was \gst{en\_GB},  Java would look at properties files in the following order, until it found the key: \gst{XXX\_fr\_CA.properties}, \gst{XXX\_fr.properties},  \gst{XXX\_en\_GB.properties}, then \gst{XXX\_en.properties}, and finally the default \gst{XXX.properties}.
     980
     981These new files are available straight away---to use the new language, add e.g. \gst{l=fr} to the arguments in the URL. To get \gs\ to add it in to the list of languages on the preferences page, an entry needs to be added into the languages list in the \gst{interfaceConfig.xml} file (see Section~\ref{sec:interfaceconfig}). Modification of this file requires a restart of the Tomcat server for the changes to be recognised.
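
The lookup order given above can be written out as a short Java sketch. It only lists the candidate file names in the order stated in the text; it does not load any bundle, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: lists properties file names in the order described in the text.
class PropertiesLookupOrder {

    // Adds base_lang_COUNTRY.properties, then base_lang.properties, for one locale.
    private static void addCandidates(List<String> out, String base, Locale locale) {
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        if (!lang.isEmpty() && !country.isEmpty()) {
            out.add(base + "_" + lang + "_" + country + ".properties");
        }
        if (!lang.isEmpty()) {
            out.add(base + "_" + lang + ".properties");
        }
    }

    // Requested locale first, then the default locale, then the base file.
    static List<String> lookupOrder(String base, Locale requested, Locale defaultLocale) {
        List<String> order = new ArrayList<>();
        addCandidates(order, base, requested);
        addCandidates(order, base, defaultLocale);
        order.add(base + ".properties");
        return order;
    }

    public static void main(String[] args) {
        // fr_CA requested, en_GB as the default locale, as in the example above.
        lookupOrder("XXX", Locale.CANADA_FRENCH, Locale.UK).forEach(System.out::println);
    }
}
```

A translator who supplies only \gst{XXX\_fr.properties} therefore also covers requests for \gst{fr\_CA}, since that file is the second candidate in the list.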
    998982
    999983\newpage
     
    1013997\subsection{Overview of modules??}
    1014998
    1015 A \gsiii\  'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system. The top left part is concerned with displaying the data, while the bottom right part is the collection data serving part. The two sides communicate through the MessaegRouter. There is a one-to-one correspondance between modules and Java classes, with the exception of services: for coding and/or run-time efficiency reasons, several Service modules may be grouped together into one ServiceRack class.
     999A \gsiii\  'library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks etc.  Figure~\ref{fig:local} shows how they fit together in a stand-alone system. The top left part is concerned with displaying the data, while the bottom right part is the collection data serving part. The two sides communicate through the MessageRouter. There is a one-to-one correspondence between modules and Java classes, with the exception of services: for coding and/or run-time efficiency reasons, several Service modules may be grouped together into one ServiceRack class.
    10161000
    10171001\begin{figure}[t]
     
    21702154\gst{./gs3-soap-deploy-site.sh <sitename> <siteuri>}
    21712155
    2172 Sitename is the name of the site's directory, eg localsite. The siteuri is the identifier that will be used for the SOAP resource, eg org.greenstone.localsite. It should be a unique name amongst all the SOAP services that you want to connect to.
     2156Sitename is the name of the site's directory, e.g. localsite. The siteuri is the identifier that will be used for the SOAP resource, e.g. org.greenstone.localsite. It should be a unique name amongst all the SOAP services that you want to connect to.
    21732157
    21742158The script  deploys the service for the site specified. A resource file (\gst{sitename.xml}) is created which is used to specify the service. It can be found in \gst{gsdl3/resources/soap}, and is generated from \gst{site.xml.in}.
     
    23552339Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left hand ``List'' button.
    23562340
    2357 Information about deployed services is maintained between Tomcat sessions---you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shutdown and restart Tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.
     2341Information about deployed services is maintained between Tomcat sessions---you only need to deploy it once. To get the gateway servlet talking to the SOAP server, you need to shut down and restart Tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the gateway servlet.
    23582342
    23592343\subsection{Debugging SOAP}\label{app:soap-debug}
     
    23652349\end{quote}
    23662350
    2367 8070 is the port that TcpTunnelGui listens on, and 8080 is the port that it sends the messages onto---the port that Tomcat is using. You need to modify \gs\  to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}).
     23518070 is the port that TcpTunnelGui listens on, and 8080 is the port that it forwards the messages to---the port that Tomcat is using. You need to modify \gs\  to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the gateway site configuration file (\gst{\gsdlhome/web/sites/gateway/siteConfig.xml}).
    23682352\begin{quote}\begin{gsc}\begin{verbatim}
    23692353<site name="org.greenstone.localsite"