Changeset 4162 for trunk/gsdl3/docs/manual
- Timestamp:
- 2003-04-15T15:36:16+12:00 (21 years ago)
- File:
-
- 1 edited
trunk/gsdl3/docs/manual/manual.tex
r3712 r4162 3 3 \hyphenation{Message-Router Text-Query} 4 4 5 \newenvironment{gsc}% Greenstone text bits 6 {\begin{footnotesize}\begin{tt}}% 7 {\end{tt}\end{footnotesize}} 8 9 \newcommand{\gst}[1]{{\footnotesize \tt #1} } 5 10 \begin{document} 6 11 … … 47 52 Native Interface) will be used to communicate with these. 48 53 49 50 \section{Architecture} 51 52 This section is covered by the paper: An agent based architecture for dynamic digital library construction and configuration. Either cut and paste it in here, or link to the text?? or have two separate docs. dont want to have to maintain two separate versions of the same thing. 53 54 \section{Greenstone Implementation} 55 \label{sec:impl} 56 57 \subsection{Configuring Greenstone} 58 \label{subsec:config} 59 60 Greenstone3 involves several different kinds of configuration files, all 61 expressed in XML. Each site has a configuration file that binds parameters for 62 the site, {\em siteConfig.xml}. Each collection has two configuration files, {\em collectionConfig.xml} and {\em buildConfig.xml\/}, that give metadata for the 63 collection.\footnote{These replace {\em collect.cfg} and {\em build.cfg} in 54 A description of the general design and architecture of Greenstone3 is given in the document ``The design of Greenstone3: An agent based dynamic digital library'' (design-2002.ps, in the gsdl3/docs/manual directory). 55 56 \section{System modules}\label{sec:modules} 57 58 A Greenstone3 'library' system consists of many components. Figure~\ref{fig:local} shows how they fit together in a stand-alone system. 59 60 \begin{figure}[t] 61 \centering 62 \includegraphics[width=4in]{local} %5.8 63 \caption{A simple stand-alone site.} 64 \label{fig:local} 65 \end{figure} 66 67 68 {\em MessageRouter}: this is the central module for a site. It controls the site, loading up all the collections, clusters and communicators needed. All messages pass through the MessageRouter. 
Communication between remote sites is always done between MessageRouters, one for each site. 69 70 {\em Collection and ServiceCluster}: these are very similar. They both provide some metadata about the collection/cluster, and a list of services. The services are provided by ServiceRack objects that the collection/cluster loads up. A Collection is a specific type of ServiceCluster. A ServiceCluster groups services that are related conceptually, eg all the building services may be part of a cluster. What is part of a cluster is specified by the site config file. A Collection's services are grouped by the fact that they all operate on some common data---the documents in the collection. 71 Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different. 72 73 {\em ServiceRack}: these provide one or more services - they are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times. For example, MGPP searching services all need to have the index loaded into memory. 74 75 {\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator. Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer. 76 77 {\em Receptionist}: this is the point of contact for the 'front end'. It is pretty much a router to actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages. 78 79 {\em Actions}: these do the job of creating the 'pages'. There is a different action for each type of page, for example PageAction handles semi-static pages, QueryAction handles queries, DocumentAction displays documents. They know a little bit about specific service types. 
Based on the 'cgi' arguments passed in to them, they construct requests for the system, and put together the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in more detail in Section~\ref{sec:pagegen}. 80 81 82 \section{Configuration}\label{sec:config} 83 84 Initial Greenstone3 system configuration is determined by a set of configuration files, all expressed in XML. Each site has a configuration file that binds parameters for 85 the site, \gst{siteConfig.xml}. Each collection has two configuration files, \gst{collectionConfig.xml} and \gst{buildConfig.xml}, that give metadata and other information for the 86 collection.\footnote{\gst{siteConfig.xml} is new for Greenstone3, while \gst{collectionConfig.xml} and \gst{buildConfig.xml} replace \gst{collect.cfg} and \gst{build.cfg} in 64 87 Greenstone2.} The first includes user-defined metadata for the collection, 65 88 such as its name and the {\em About this collection} text; and also gives 66 89 instructions on how the collection is to be built. The second is produced by 67 90 the build-time process and includes any metadata that can be determined 68 automatically. \footnote{Currently only the buildConfig.xml file is used - collections are built using gs2 style building and therefore use the old collect.cfg.}69 70 \subsubsection{Site configuration file} 71 72 The file {\em siteConfig.xml} specifies the URI for the site ({\em 73 localSiteName\/}), any services or service clusters provided by the site that are not connected 74 with a particular collection (for example, translation services, or collection building), and a list of91 automatically. It also includes configuration information for any serviceRacks needed by the collection. 92 93 The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will have no effect. 
There is a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system to reflect these changes. These commands are described in Section~\ref{sec:runtime-config}. 94 95 \subsection{Site configuration file}\label{sec:siteconfig} 96 97 The file \gst{siteConfig.xml} specifies the URI for the site (\gst{localSiteName}), the HTTP address for site resources (\gst{httpAddress}), any ServiceClusters that the site provides (for example, collection building), any ServiceRacks that do not belong to a cluster or collection, and a list of 75 98 known external sites to connect to. Collections are not specified in the site 76 99 configuration file; instead, they are determined by the contents of the site's 77 100 collections directory. 78 101 79 Here is a configuration file for a rudimentary site with no site-wide services, 80 which does not connect to any external sites.\footnote{should the code be tolerant of missing elements? or do we require empty elements?} 81 \begin{quote}\begin{footnotesize}\begin{verbatim} 82 <config> 102 The HTTP address is used for retrieving resources from a site outside the XML protocol. Because a site is HTTP accessible, any files (e.g. images) belonging to that site or to its collections can be specified in the HTML of a page by a URL. This avoids having to retrieve these files from a remote site via the XML protocol\footnote{Currently, sites live inside the Tomcat gsdl3 root context, and therefore all their content is accessible over HTTP via the Tomcat address. We need to see if parts can be restricted. Also, if we use a different protocol, then resources from remote sites may need to come through the XML. Also, if we are running locally without using Tomcat, we may want to get them via file:// rather than http://.}. 
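As a rough illustration of the site configuration elements described above, the following Python sketch reads a small \gst{siteConfig.xml} and pulls out the site name, the HTTP address and the list of external sites. It is not part of Greenstone3 (which is written in Java); the function name \gst{read\_site\_config} is hypothetical, and the sample XML simply reuses the element and attribute names from this section.

```python
import xml.etree.ElementTree as ET

# Sample site configuration, using the element names from this manual.
SITE_CONFIG = """
<siteConfig>
  <localSiteName value="org.greenstone.gsdl1"/>
  <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/>
  <serviceClusterList/>
  <serviceRackList/>
  <siteList>
    <site name="org.greenstone.localsite"
          address="http://localhost:8090/soap/servlet/rpcrouter"
          type="soap"/>
  </siteList>
</siteConfig>
"""


def read_site_config(xml_text):
    """Hypothetical helper: return the main site settings as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "localSiteName": root.find("localSiteName").get("value"),
        "httpAddress": root.find("httpAddress").get("value"),
        # Each <site> element becomes a dict of its attributes.
        "sites": [dict(site.attrib) for site in root.findall("siteList/site")],
    }


config = read_site_config(SITE_CONFIG)
for site in config["sites"]:
    print(site["name"], site["type"], site["address"])
```

Collections do not appear in this structure because, as noted above, they are discovered from the site's collections directory rather than listed in the file.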
103 104 The first example in Figure~\ref{fig:siteconfig} shows a site configuration file for a rudimentary site with no site-wide services, 105 which does not connect to any external sites. The second example is for a site with one site-wide service cluster - a collection building cluster. It also connects to the first site using SOAP. 106 These two sites are running on the same machine. For site gsdl1 to talk to site localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is \gst{http://localhost:8090/soap/servlet/rpcrouter}. 107 108 109 \begin{figure} 110 \begin{gsc}\begin{verbatim} 111 <siteConfig> 83 112 <localSiteName value="org.greenstone.localsite"/> 113 <httpAddress value="http://localhost:8090/gsdl3/sites/localsite"/> 84 114 <serviceClusterList/> 85 115 <serviceRackList/> 86 116 <siteList/> 87 </ config>88 \end{verbatim}\end{ footnotesize}\end{quote}89 The following configuration file is for a site with one site-wide service cluster - a collection building cluster. It also connects to the previous site using SOAP. 90 \begin{ quote}\begin{footnotesize}\begin{verbatim}91 < config>117 </siteConfig> 118 \end{verbatim}\end{gsc} 119 120 \begin{gsc}\begin{verbatim} 121 <siteConfig> 92 122 <localSiteName value="org.greenstone.gsdl1"/> 93 <serviceRackList/> 94 <servicesImpl name="TranslationServices"/> 95 </servicesImplList> 123 <httpAddress value="http://localhost:8090/gsdl3/sites/gsdl1"/> 96 124 <serviceClusterList> 97 125 <serviceCluster name="build"> … … 108 136 <siteList> 109 137 <site name="org.greenstone.localsite" 110 address="http://localhost:80 80/soap/servlet/rpcrouter"138 address="http://localhost:8090/soap/servlet/rpcrouter" 111 139 type="soap"/> 112 140 </siteList> 113 </config> 114 \end{verbatim}\end{footnotesize}\end{quote} 115 116 These two sites are running on the same machine. For site1 to talk to localsite, a SOAP server must be run for localsite. 
The address of the SOAP server, in this case, is "http://localhost:8080/soap/servlet/rpcrouter" 117 118 \subsubsection{Building configuration file} 119 120 The file {\em buildConfig.xml} contains all metadata and other information about the collection that can 141 </siteConfig> 142 \end{verbatim}\end{gsc} 143 \caption{Two sample site config files} 144 \label{fig:siteconfig} 145 \end{figure} 146 147 148 149 \subsection{Collection configuration file}\label{sec:collconfig} 150 151 The collection configuration file is where the collection designer (eg a librarian) decides what form the collection should take. This includes the collection metadata such as title and description, and also includes what indexes and browsing structures should be built. The format of \gst{collectionConfig.xml} is still under consideration. However, Figure~\ref{fig:collconfig} 152 here is an example as it is at present. 153 154 \begin{figure} 155 \begin{gsc}\begin{verbatim} 156 <collectionConfig xmlns:gsf="http://www.greenstone.org/ 157 configformat"> 158 <metadataList> 159 <metadata name="colName" lang="en">greenstone mgpp demo 160 </metadata> 161 <metadata name="colDescription" lang="en">This is a 162 demonstration collection for the Greenstone digital 163 library software. It contains a small subset (11 books) 164 of the Humanity Development Library.</metadata> 165 <metadata name="colDescription" lang="fr">C'est une 166 collection pour demonstration du logiciel Greenstone. 
167 Elle contient une petite partie du projet de bibliotheques 168 humanitaires et de developpement (11 livres).</metadata> 169 <metadata name="colIcon">mgppdemo.gif</metadata> 170 </metadataList> 171 <search type='mgpp'> 172 <index name="tt" content="text,metadata" 173 level="Document,Section"> 174 <displayName lang="en">books</displayName> 175 </index> 176 <format> 177 <gsf:template match="documentNode"> 178 <td><gsf:link><gsf:metadata name="Title"/>(<gsf:metadata 179 name="Source"/>)</gsf:link></td> 180 </gsf:template> 181 </format> 182 </search> 183 <browse> 184 <classifier name="CL1" type="Hierarchy" content="Subject" 185 level="Document"> 186 <option name="hfile" value="sub.txt"/> 187 <option name="sort" value="Title"/> 188 </classifier> 189 <classifier name="CL2" type="AZList" content="Title" 190 level="Document"> 191 <displayName lang='en'>all titles</displayName> 192 <format> 193 <gsf:template match="classifierNode"> 194 <td><gsf:link type="classifier"><gsf:metadata name="Title"/> 195 </gsf:link></td> 196 </gsf:template> 197 </format> 198 </classifier> 199 <classifier name="CL3" type="List" content="Keyword" 200 level="Document"> 201 <format> 202 <gsf:template match="documentNode"><td><gsf:link> 203 <gsf:metadata name="Keyword"/></gsf:link></td></gsf:template> 204 </format> 205 </classifier> 206 <classifier type="Phind" content="text" level="Section"/> 207 </browse> 208 </collectionConfig> 209 \end{verbatim}\end{gsc} 210 \caption{Sample collectionConfig.xml file} 211 \label{fig:collconfig} 212 \end{figure} 213 214 The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in UTF-8. 215 The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like. 
216 The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail. 217 218 There is also a need for a description of how documents should be displayed. For example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}. 219 220 \subsection{Building configuration file}\label{sec:buildconfig} 221 222 The file \gst{buildConfig.xml} contains the metadata and other information about the collection that can 121 223 be determined automatically when building the collection, such as the number of 122 224 documents it contains. It also includes a list of serviceRack classes that are … … 124 226 collection. The serviceRack names are Java classes that are loaded 125 227 dynamically at runtime. Any information inside the serviceRack element is 126 specific to that service---there is no set format. Here is an example: 127 128 \begin{quote}\begin{footnotesize}\begin{verbatim} 129 130 <buildConfig> 228 specific to that service---there is no set format. Figure~\ref{fig:buildconfig} shows an example. This config file specifies that the collection should load up three ServiceRacks: GS2MGPPRetrieve, GS2MGPPSearch, and PhindPhraseBrowse. The contents of each \gst{<serviceRack>} element are passed to the appropriate ServiceRack objects for configuration. 
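The idea that each \gst{<serviceRack>} element names a class and carries its own free-form configuration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Greenstone3 itself loads Java classes, the helper \gst{service\_racks} is hypothetical, and the sample file is an abbreviated version of a build configuration (the contents of GS2MGPPSearch are left out).

```python
import xml.etree.ElementTree as ET

# Abbreviated build configuration; rack names are taken from this manual.
BUILD_CONFIG = """
<buildConfig>
  <metadataList>
    <metadata name="numDocs">11</metadata>
  </metadataList>
  <serviceRackList>
    <serviceRack name="GS2MGPPRetrieve">
      <defaultLevel name="Section"/>
    </serviceRack>
    <serviceRack name="GS2MGPPSearch"/>
    <serviceRack name="PhindPhraseBrowse"/>
  </serviceRackList>
</buildConfig>
"""


def service_racks(xml_text):
    """Hypothetical helper: map each rack name to its <serviceRack> element.

    The element itself is kept so that it can be handed, unchanged, to
    whatever object implements the rack; its contents have no set format.
    """
    root = ET.fromstring(xml_text.strip())
    return {rack.get("name"): rack
            for rack in root.findall("serviceRackList/serviceRack")}


racks = service_racks(BUILD_CONFIG)
for name, element in racks.items():
    # A real system would instantiate the class called `name` here and
    # pass `element` to its configuration method.
    print(name, "has", len(list(element)), "configuration child element(s)")
```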
229 230 231 \begin{figure} 232 \begin{gsc}\begin{verbatim} 233 <buildConfig xmlns:gsf="www.greenstone.org/format" > 131 234 <metadataList> 132 235 <metadata name="numDocs">11</metadata> 133 <metadata name="colIcon">mgppdemo.gif</metadata> 134 <metadata name="colName">Greenstone demo collection</metadata> 135 <metadata name="colDescription">This is a demonstration 136 collection for the Greenstone digital library software. It 137 contains a small subset of the Humanitarian and Development 138 Libraries.</metadata> 236 <metadata name="documentMetadata"><element name="Title"/> 237 <element name="Subject"/><element name="Organization"/> 238 <element name="URL"/></metadata> 139 239 </metadataList> 140 240 <serviceRackList> 141 241 <serviceRack name="GS2MGPPRetrieve"> 142 242 <defaultLevel name="Section"/> 143 <!-- something list this should be used to advertise 144 what metadata the collection has available to be retrieved - 145 however, it is not used yet --> 146 <metadataList> 147 <element name="Title"/><element name="Subject"/> 148 <element name="Organization"/><element name="URL"/> 149 </metadataList> 243 <levelList> 244 <level name="Document"/> 245 <level name="Section"/> 246 </levelList> 247 <classifierList> 248 <classifier name="CL1" content="Subject" 249 documentInterleave="true" orientation='vertical'/> 250 <classifier name="CL2" content="Title" 251 documentInterleave="false" orientation='horizontal'/> 252 <classifier name="CL4" content="Organisation" 253 documentInterleave="true" orientation='vertical'/> 254 <classifier name="CL5" content="Keyword" 255 documentInterleave="true" orientation='vertical'/> 256 </classifierList> 150 257 </serviceRack> 151 258 <serviceRack name="GS2MGPPSearch"> … … 161 268 </indexList> 162 269 <fieldList> 163 <field name="TX"/><field name="SU"/><field name="TI"/> 270 <field shortname="TX" name="TextOnly"/> 271 <field shortname="SU" name="Subject"/> 272 <field shortname="TI" name="Title"/> 164 273 </fieldList> 165 274 </serviceRack> 
166 275 <serviceRack name="PhindPhraseBrowse"/> 167 <serviceRack name="GS2Browse">168 <classifierList>169 <classifier name="CL1"><metadataList>170 <metadata name="Title">Subject</metadata>171 </metadataList></classifier>172 <classifier name="CL2" ><metadataList>173 <metadata name="Title">Title</metadata>174 </metadataList></classifier>175 <classifier name="CL4"><metadataList>176 <metadata name="Title">Organization</metadata>177 </metadataList></classifier>178 <classifier name="CL5" ><metadataList>179 <metadata name="Title">Keyword</metadata>180 </metadataList></classifier>181 </classifierList>182 </serviceRack>183 276 </serviceRackList> 184 </buildConfig> 185 \end{verbatim}\end{footnotesize}\end{quote} 186 Note: because {\em collectionConfig.xml} is not used yet, the {\em colIcon}, {\em colDescription} 187 and {\em colName} metadata elements have been specified here. 188 189 \subsubsection{Collection configuration file} 190 191 The format of {\em collectionConfig.xml} has not yet been defined. 192 193 \subsubsection{Starting up} 277 </buildConfig> 278 \end{verbatim}\end{gsc} 279 \caption{Sample buildConfig.xml file} 280 \label{fig:buildconfig} 281 \end{figure} 282 283 284 \subsection{Start up configuration}\label{sec:startup-config} 194 285 195 286 We use the Tomcat web server, which operates either stand-alone in a test mode 196 287 or in conjunction with the Apache web server. The Greenstone LibraryServlet 197 class is loaded by Tomcat and the servlet's {\eminit()} method is called. Each time a198 {\em get\/}/{\em put\/}/{\empost} (etc.) is used, a new thread is started and199 {\em doGet()\/}/{\em doPut()\/}/{\emdoPost()} (etc.) is called.200 201 The {\em init()} method creates a new Receptionist and a new instance of the288 class is loaded by Tomcat and the servlet's \gst{init()} method is called. Each time a 289 \gst{get/put/post} (etc.) is used, a new thread is started and 290 \gst{doGet()/doPut()/doPost()} (etc.) is called. 
291 292 The \gst{init()} method creates a new Receptionist and a new 202 293 MessageRouter. The appropriate system variables are set in each (interface 203 name, site name, etc.) and then {\emconfigure()} is called. A MessageRouter294 name, site name, etc.) and then \gst{configure()} is called. A MessageRouter 204 295 reference is given to the Receptionist. The servlet then communicates only with 205 296 the Receptionist, not with the MessageRouter. 206 297 207 298 The Receptionist loads up all the different Action classes. A 208 static list is used initially, and other Actions may be loaded on the fly as needed. 209 210 The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This 211 lists the ServiceRack classes that need to be loaded, and lists any sites that need 212 to be connected to. It looks inside the {\em collect} directory which contains 213 all the site's collections and loads up a Collection object for each valid 214 collection found. 215 216 The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml} 299 static list is used initially, and other Actions may be loaded on the fly as needed. Actions are added to a map, with shortnames for keys. Eg the QueryAction is added with key 'q'. The Actions are passed the MessageRouter reference too. 300 301 The MessageRouter reads in its site configuration file \gst{siteConfig.xml}. This 302 lists the ServiceRack and ServiceCluster classes that need to be loaded and any sites that need 303 to be connected to. 304 It has a module map that maps names to objects. This is used for routing the messages. It also keeps small chunks of XML---serviceList, collectionList, clusterList and siteList. These are what get returned in response to a describe request (see Section~\ref{sec:describe}.). 305 Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object. 
Each service is added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module. 306 ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as a key. A serviceCluster is also added to the serviceClusterList. 307 For each site specified, the MessageRouter creates a Communicator object of the appropriate type. Then it tries to get the site description. If the server for the remote site is up and running, this should be successful. The site will be added to the map with its site name as a key. The site's collections, services and clusters will also be added into the static lists. 308 309 The MessageRouter also looks inside the site's \gst{collect} directory and loads up a Collection object for each valid collection found. 310 311 The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml} 217 312 files, determines the metadata, and loads ServiceRack classes based on the 218 names specified in {\em buildConfig.xml\/}. The {\footnotesize \verb#<ServiceRack>#} XML element is passed to the object to be used in configuration. 219 220 \section{System messages} 313 names specified in \gst{buildConfig.xml\/}. The \gst{<ServiceRack>} XML element is passed to the object to be used in configuration. The collectionConfig.xml contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file. 314 Collection objects are added to the module map with their name as a key, and a collection element is also added into the collectionList XML. 315 316 \subsection{Run-time (re)configuration}\label{sec:runtime-config} 317 318 The startup configuration reads in the various config files and loads up quite a lot of XML into memory. This avoids having to read in files all the time. 
However, this means that any changes to these files will have no effect in the system. So some run-time reconfiguration options are provided. 319 320 Currently there are commands to reconfigure the entire site---i.e. the MessageRouter repeats the whole of its startup initialisation. 321 322 ***TODO*** 323 whats available, whats not. show URLS, refer to system messages in next section 324 325 \section{System messages}\label{sec:messages} 326 327 for each type of message, show the basic elements, then some example messages. 328 Lists must only have the same elements in them. 221 329 222 330 Once the system is up and running (the configuration 223 process described in Section~\ref{subsec:config} has been carried out), it is passing messages back and forth. All modules communicate via message passing. 224 First, we examine the basic message 225 formats, then how the system creates and responds to the messages. 331 process described in Section~\ref{sec:startup-config} has been carried out), it is passing messages back and forth. All modules communicate via message passing. 332 333 First, we look at how messages originate, and how they flow in the system. Then, we examine the basic message 334 format, and look at the different types of messages. 335 336 \subsection{Message flow} 337 338 \subsection{Basic format} 226 339 227 340 All messages are enclosed in 228 \begin{quote}\begin{ footnotesize}\begin{verbatim}229 <message> 230 \end{verbatim}\end{ footnotesize}\end{quote}231 Messages contain either {\em <request>\/} or {\em <response>\/} elements--- a single message may contain multiple requests. Each {\em <request>\/} (and {\em <response>\/}?) has a language attribute, of the form ``lang='xx'''.341 \begin{quote}\begin{gsc}\begin{verbatim} 342 <message> 343 \end{verbatim}\end{gsc}\end{quote} 344 Messages contain either \gst{<request>} or \gst{<response>} elements--- a single message may contain multiple requests. Each \gst{<request>} (and \gst{<response>}?) 
has a language attribute, of the form \gst{lang='xx'}. 232 345 The language attribute is used by the XSLT to determine the language currently 233 346 being used by the user interface. Virtually all messages contain text strings, … … 239 352 This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system. 240 353 241 \subs ubsection{Servlet to Receptionist messages}\label{subsec:url-type}354 \subsection{cgi-type messages}\label{sec:cgi} 242 355 243 356 Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a 244 Greenstone URL. The two main arguments are {\em a} (action) and {\emsa}245 (subaction).\footnote{The {\em sa} replaces Greenstone's old {\emp} arg for357 Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa} 358 (subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for 246 359 the page action, and is new for other actions. For example, a text query could 247 be encoded as {\ema=q \& sa=text\/}.} All other arguments are treated as360 be encoded as \gst{a=q \& sa=text\/}.} All other arguments are treated as 248 361 parameters. 249 362 250 363 Here is the XML representation of the arguments: 251 364 252 \begin{quote}\begin{ footnotesize}\begin{verbatim}365 \begin{quote}\begin{gsc}\begin{verbatim} 253 366 <request type='cgi' action='a-arg-value' subaction='sa-arg-value' 254 367 lang='en' output='html'> 255 368 <paramList> 256 <param name='xx' value=' 'yyy'/>369 <param name='xx' value='yyy'/> 257 370 <param name=... 
258 371 </paramList> 259 372 </request> 260 \end{verbatim}\end{ footnotesize}\end{quote}373 \end{verbatim}\end{gsc}\end{quote} 261 374 The receptionist routes the message to the appropriate action. The output 262 375 field is used to indicate what type of output to return. The actions do not … … 278 391 \hline 279 392 a & action & a (applet), q (query), b (browse), p (page), pr (process) \\ 393 & & s (system)\\ 280 394 sa & subaction & home, about (page action)\\ 281 395 c & collection or & demo, build \\ … … 285 399 ro & request only & 0 or 1 - if set to one, the request is carried out \\ 286 400 & & but no processing of the results is done \\ 401 & & currently only used in process actions \\ 287 402 o & output type & xml, html, wml \\ 288 l & language & en, fr, zh \\403 l & language & en, fr, zh ...\\ 289 404 d & document id & HASHxxx \\ 290 405 r & resource id & ???\\ 291 id & process handle & an integer identifying a particular process request \\406 pid & process handle & an integer identifying a particular process request \\ 292 407 \hline 293 408 \end{tabular}} 409 \caption{Generic arguments that can appear in a Greenstone URL} 294 410 \label{tab:args} 295 \caption{Generic rguments that can appear in a Greenstone URL}296 411 \end{table} 297 412 298 413 Here is an example message that retrieves the home page in French: 299 \begin{quote}\begin{ footnotesize}\begin{verbatim}414 \begin{quote}\begin{gsc}\begin{verbatim} 300 415 <message> 301 416 <request lang='fr' type='cgi' action='p' subaction='home' 302 417 output='html'/> 303 418 </message> 304 \end{verbatim}\end{ footnotesize}\end{quote}419 \end{verbatim}\end{gsc}\end{quote} 305 420 306 421 This message represents a text query: 307 \begin{quote}\begin{ footnotesize}\begin{verbatim}422 \begin{quote}\begin{gsc}\begin{verbatim} 308 423 <message> 309 424 <request lang='en' type='cgi' action='q' output='html'> … … 319 434 </paramList> 320 435 </message> 321 \end{verbatim}\end{ footnotesize}\end{quote}436 
\end{verbatim}\end{gsc}\end{quote} 322 437 323 438 \subsubsection{Module to module messages} … … 326 441 information from one module to another, for example from an Action to the 327 442 MessageRouter module, and from that module to a service module. Requests have 328 a {\em to} attribute and responses have {\em from\/}. These are addresses used329 by routing modules. For example {\emto='site1/site2/demo/TextQuery'} routes a330 message to a MessageRouter ( {\em site1\/}), from there to another MessageRouter331 ( {\em site2\/}), from there to a collection ({\em demo\/}), and from there to a332 particular service ( {\em TextQuery\/}).443 a \gst{to} attribute and responses have \gst{from}. These are addresses used 444 by routing modules. For example \gst{to='site1/site2/demo/TextQuery'} routes a 445 message to a MessageRouter (\gst{site1}), from there to another MessageRouter 446 (\gst{site2}), from there to a collection (\gst{demo}), and from there to a 447 particular service (\gst{TextQuery}). 333 448 334 449 Each request asks for a description of a single module, or requests a particular service. Unlike the first type of message which requests pre-defined types of pages, these internal requests can ask for any functionality available in the system. 335 450 451 \subsection{'describe'-type messages}\label{sec:describe} 336 452 The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient. 
337 \begin{quote}\begin{ footnotesize}\begin{verbatim}453 \begin{quote}\begin{gsc}\begin{verbatim} 338 454 <message> 339 455 <request lang='en' type='describe' to=''/> 340 456 </message> 341 \end{verbatim}\end{ footnotesize}\end{quote}342 If the {\emto} field is empty, the request is answered by the first module that it is passed to.457 \end{verbatim}\end{gsc}\end{quote} 458 If the \gst{to} field is empty, the request is answered by the first module that it is passed to. 343 459 An example response from a MessageRouter might look like this: 344 \begin{quote}\begin{ footnotesize}\begin{verbatim}460 \begin{quote}\begin{gsc}\begin{verbatim} 345 461 <message> 346 462 <response lang='en' type='describe'> … … 362 478 </response> 363 479 </message> 364 \end{verbatim}\end{ footnotesize}\end{quote}480 \end{verbatim}\end{gsc}\end{quote} 365 481 This MessageRouter has one site-wide service, a cross-collection searching service. It 366 communicates with one site, {\em org.greenstone.gsdl1\/}. It is aware of four367 collections. One of these, {\em myfiles\/}, belongs to it; the other three are482 communicates with one site, \gst{org.greenstone.gsdl1}. It is aware of four 483 collections. One of these, \gst{myfiles}, belongs to it; the other three are 368 484 available through the external site. One of those collections is actually from 369 485 a further external site. … … 371 487 It is possible to ask just for a specific part of the information provided by a 372 488 describe request, rather than the whole message. 
For example, these two 373 messages get the {\em collectionList} and the {\emsiteList} respectively:374 \begin{quote}\begin{ footnotesize}\begin{verbatim}489 messages get the \gst{collectionList} and the \gst{siteList} respectively: 490 \begin{quote}\begin{gsc}\begin{verbatim} 375 491 <message lang='en'> 376 492 <request type='describe' to='' info='collectionList'/> … … 380 496 <request type='describe' to='' info='siteList'/> 381 497 </message> 382 \end{verbatim}\end{ footnotesize}\end{quote}498 \end{verbatim}\end{gsc}\end{quote} 383 499 When a collection is asked to describe itself, what is returned is all of the 384 500 collection specific metadata and a list of services. For example, here is such 385 501 a message, along with a sample response. 386 502 387 \begin{quote}\begin{ footnotesize}\begin{verbatim}503 \begin{quote}\begin{gsc}\begin{verbatim} 388 504 <message lang='en'> 389 505 <request type='describe' to='demo'/> … … 408 524 </response> 409 525 </message> 410 \end{verbatim}\end{ footnotesize}\end{quote}411 A {\emdescribe} request sent to a service returns a list of parameters that526 \end{verbatim}\end{gsc}\end{quote} 527 A \gst{describe} request sent to a service returns a list of parameters that 412 528 the service accepts, and describes the content type for the request and 413 529 response. 414 530 415 531 Parameters have the following format: 416 \begin{quote}\begin{ footnotesize}\begin{verbatim}532 \begin{quote}\begin{gsc}\begin{verbatim} 417 533 <param name='xxx' type='integer|boolean|string' default='yyy'/> 418 534 <param name='xxx' type='enum_single|enum_multi' default='aa'/> … … 423 539 <param .../> 424 540 </param> 425 \end{verbatim}\end{ footnotesize}\end{quote}541 \end{verbatim}\end{gsc}\end{quote} 426 542 If no default is specified, the parameter is assumed to be mandatory. 
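The rule that a parameter without a default is mandatory is easy to check mechanically. The Python sketch below is an illustration only, not Greenstone3 code: the helper \gst{mandatory\_params} is hypothetical, and the \gst{query} parameter in the sample list is invented for the example (it is simply a parameter with no \gst{default} attribute).

```python
import xml.etree.ElementTree as ET

# Sample parameter list. 'Case' follows the format shown in this manual;
# 'query' is a made-up parameter with no default, so it counts as mandatory.
PARAM_LIST = """
<paramList>
  <param name='Case' type='boolean' default='0'/>
  <param name='query' type='string'/>
</paramList>
"""


def mandatory_params(xml_text):
    """Hypothetical helper: names of top-level params lacking a default."""
    root = ET.fromstring(xml_text.strip())
    return [param.get("name")
            for param in root.findall("param")
            if param.get("default") is None]


required = mandatory_params(PARAM_LIST)
print("mandatory parameters:", required)
```

Only the direct children of the list are examined here; parameters nested inside another parameter would need their own pass.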
427 543 Here are some examples of parameters: 428 \begin{quote}\begin{footnotesize}\begin{verbatim} 544 \begin{quote}\begin{gsc}\begin{verbatim} 429 545 <param name='Case' type='boolean' default='0'/> 430 546 … … 446 562 </param> 447 563 448 \end{verbatim}\end{footnotesize}\end{quote} 564 \end{verbatim}\end{gsc}\end{quote} 449 565 Here is a message, along with a sample response. 450 \begin{quote}\begin{footnotesize}\begin{verbatim} 566 \begin{quote}\begin{gsc}\begin{verbatim} 451 567 <message> 452 568 <request lang='en' type='describe' to='demo/TextQuery'/> … … 466 582 </response> 467 583 </message> 468 \end{verbatim}\end{footnotesize}\end{quote} 584 \end{verbatim}\end{gsc}\end{quote} 469 585 470 586 So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services. 471 587 472 ``Configure'' requests are used to tell the MessageRouter to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. 473 474 So far, we have {\em activate} and {\em deactivate} configure requests. 588 \subsection{'system'-type messages} 589 ``System'' requests are used to tell the MessageRouter or a Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. 
If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. 590 591 So far, we have \gst{activate} and \gst{deactivate} configure requests. 475 592 Some examples are as follows. 476 \begin{quote}\begin{footnotesize}\begin{verbatim} 593 \begin{quote}\begin{gsc}\begin{verbatim} 477 594 <message><request type='configure' to=''> 478 595 <configure action='deactivate' type='collection' name='demo'/> … … 487 604 name='TranslationServices'/> 488 605 </request></message> 489 \end{verbatim}\end{footnotesize}\end{quote} 606 \end{verbatim}\end{gsc}\end{quote} 490 607 491 608 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first. 492 609 493 610 The response to a configure request is a status or an error message. No data is sent back, just success or error. 
An example is: 494 \begin{quote}\begin{footnotesize}\begin{verbatim} 611 \begin{quote}\begin{gsc}\begin{verbatim} 495 612 <message><response from='' type='configure'> 496 613 <status>demo collection activated</status> 497 614 </response></message> 498 \end{verbatim}\end{footnotesize}\end{quote} 615 \end{verbatim}\end{gsc}\end{quote} 499 616 \footnote{this format not properly defined yet} 500 617 501 618 Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also. 502 619 620 \subsection{'process'-type messages} 621 622 divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse... 623 show basic structure, then more detailed format for each subtype 624 503 625 The main type of requests in the system are for services. There are different types of services: query, browse, retrieve, process, applet. Query services do some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse is for browsing lists or hierarchies of documents. Process type services are those where the request is for a command to be run. A status code will be returned immediately, and then if the command has not finished, an update of the status can be requested. Applet services are those that run an applet. 504 626 … … 506 628 507 629 The basic structure of a service request is as follows: 508 \begin{quote}\begin{footnotesize}\begin{verbatim} 630 \begin{quote}\begin{gsc}\begin{verbatim} 509 631 <message> 510 632 <request lang='en' type='query' to='demo/TextQuery'> 511 633 <paramList/> 512 <content/> 634 other elements... 
513 635 </request> 514 636 </message> 515 \end{verbatim}\end{footnotesize}\end{quote} 637 \end{verbatim}\end{gsc}\end{quote} 516 638 517 639 The parameters are name-value pairs corresponding to parameters that were specified in the service description sent in response to a describe request. 518 640 519 \begin{quote}\begin{footnotesize}\begin{verbatim} 641 \begin{quote}\begin{gsc}\begin{verbatim} 520 642 <param name='case' value='1'/> 521 643 <param name='maxDocs' value='34'/> 522 644 <param name='index' value='dtx'/> 523 \end{verbatim}\end{footnotesize}\end{quote} 524 525 Some requests have a content---for document retrieval, the content is the list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document. 645 \end{verbatim}\end{gsc}\end{quote} 646 647 Some requests have other content---for document retrieval, this would be a list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document. 526 648 527 649 Responses vary depending on the type of request. 650 651 \subsubsection{'query'-type services} 528 652 Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. 
For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.} 529 653 … … 531 655 532 656 Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order: 533 \begin{quote}\begin{footnotesize}\begin{verbatim} 534 <message> 535 <request lang='en' to="mgppdemo/TextQuery" type="query"> 657 \begin{quote}\begin{gsc}\begin{verbatim} 658 <message> 659 <request lang='en' to="mgppdemo/TextQuery" type="process"> 536 660 <paramList> 537 661 <param name="maxDocs" value="10"/> … … 542 666 <param name="index" value="t0"/> 543 667 <param name="case" value="0"/> 668 <param name="query" value="snail"/> 544 669 </paramList> 545 <content>snail</content> 546 670 </request> 547 671 </message> 548 \end{verbatim}\end{footnotesize}\end{quote} 549 550 \begin{quote}\begin{footnotesize}\begin{verbatim} 672 \end{verbatim}\end{gsc}\end{quote} 673 674 \begin{quote}\begin{gsc}\begin{verbatim} 551 675 <message> 552 676 <response lang='en' from="mgppdemo/TextQuery" type="query"> 553 <content> 554 <documentList> 555 <document name="HASH010f073f22033181e206d3b7"/> 556 <document name="HASH010f073f22033181e206d3b7.2"/> 557 <document name="HASHac0a04dd14571c60d7fbfd"/> 558 </documentList> 559 </content> 677 <documentList> 678 <document name="HASH010f073f22033181e206d3b7"/> 679 <document name="HASH010f073f22033181e206d3b7.2"/> 680 <document name="HASHac0a04dd14571c60d7fbfd"/> 681 </documentList> 560 682 </response> 561 683 </message> 562 \end{verbatim}\end{footnotesize}\end{quote} 563 684 \end{verbatim}\end{gsc}\end{quote} 685 686 \subsubsection{'retrieve'-type services} 564 687 Give me the Title metadata for these documents: 565 \begin{quote}\begin{footnotesize}\begin{verbatim} 688 \begin{quote}\begin{gsc}\begin{verbatim} 566 689 <message> 567 690 
<request lang='en' to="mgppdemo/MetadataRetrieve" 568 691 type="retrieve"> 569 <content> 570 692 <documentList> 571 693 <document name="HASH010f073f22033181e206d3b7"/> … … 579 701 </request> 580 702 </message> 581 \end{verbatim}\end{footnotesize}\end{quote} 582 583 \begin{quote}\begin{footnotesize}\begin{verbatim} 703 \end{verbatim}\end{gsc}\end{quote} 704 705 \begin{quote}\begin{gsc}\begin{verbatim} 584 706 <message> 585 707 <response lang='en' from="mgppdemo/MetadataRetrieve" … … 611 733 </response> 612 734 </message> 613 \end{verbatim}\end{footnotesize}\end{quote} 735 \end{verbatim}\end{gsc}\end{quote} 614 736 615 737 Give me the text for this document: 616 \begin{quote}\begin{footnotesize}\begin{verbatim} 738 \begin{quote}\begin{gsc}\begin{verbatim} 617 739 <message> 618 740 <request lang='en' to="mgppdemo/DocumentRetrieve" … … 625 747 </request> 626 748 </message> 627 \end{verbatim}\end{footnotesize}\end{quote} 628 629 \begin{quote}\begin{footnotesize}\begin{verbatim} 749 \end{verbatim}\end{gsc}\end{quote} 750 751 \begin{quote}\begin{gsc}\begin{verbatim} 630 752 <message> 631 753 <response lang='en' from="mgppdemo/DocumentRetrieve" … … 647 769 </response> 648 770 </message> 649 \end{verbatim}\end{footnotesize}\end{quote} 650 771 \end{verbatim}\end{gsc}\end{quote} 772 773 \subsubsection{'browse'-type services} 774 775 \subsubsection{'process'-type services} 651 776 Build requests are not a request for data---they are a request for some action to be carried out, for example, create or import or build or activate a collection. The response is a status or an error message. The import and build commands may take a long time to complete, so a message is sent back after a successful start of the command. The status may be polled by the requester to see how the process is going. 
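As a rough illustration of how a client might assemble such a request, the sketch below builds a \gst{process} message addressed to \gst{build/NewCollection} with Python's standard XML library. The element and attribute names follow the message format used in this section; the helper \gst{build\_request} and the parameter names \gst{collection} and \gst{creator} are hypothetical, and the real service may expect different parameters.

```python
import xml.etree.ElementTree as ET


def build_request(req_type, to, params, lang="en"):
    """Build a <message> holding one <request> with a <paramList>."""
    message = ET.Element("message")
    request = ET.SubElement(
        message, "request", {"lang": lang, "type": req_type, "to": to}
    )
    param_list = ET.SubElement(request, "paramList")
    for name, value in params.items():
        # Each parameter is a name/value pair, as in the service requests above.
        ET.SubElement(param_list, "param", {"name": name, "value": str(value)})
    return message


# Hypothetical parameter names, for illustration only.
msg = build_request(
    "process",
    "build/NewCollection",
    {"collection": "testcol", "creator": "someone"},
)
print(ET.tostring(msg, encoding="unicode"))
```

Because the response only reports that the command has started, a caller would keep the request address and ask for the status again later.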
652 777 … … 655 780 Some example requests (note that the build services are grouped into a service cluster called 'build', hence the addresses all begin with 'build/'): 656 781 657 \begin{quote}\begin{footnotesize}\begin{verbatim} 782 \begin{quote}\begin{gsc}\begin{verbatim} 658 783 <message> 659 784 <request lang='en' type='process' to='build/NewCollection'> … … 673 798 </request> 674 799 </message> 675 \end{verbatim}\end{footnotesize}\end{quote} 676 677 678 \subsection{Generating the pages} 679 680 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{subsec:url-type}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system. 800 \end{verbatim}\end{gsc}\end{quote} 801 802 \subsubsection{'enrich'-type services} 803 804 \subsection{'status'-type messages} 805 806 807 \subsection{'format'-type messages} 808 809 \subsection{'applet'-type services} 810 811 \section{Page generation}\label{sec:pagegen} 812 813 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system. 681 814 System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module. 
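The routing step described here can be sketched as a lookup from the action argument to a handler. This is an illustration only, not the Receptionist's actual code: the handler functions are stand-ins, and of the action codes in the table only \gst{a} (applet) appears in this manual; \gst{p}, \gst{q} and \gst{d} are assumed for the example.

```python
from urllib.parse import parse_qs


def page_action(args):
    return ("page", args)


def query_action(args):
    return ("query", args)


def applet_action(args):
    return ("applet", args)


def document_action(args):
    return ("document", args)


# Only 'a' -> applet is taken from the manual; the other codes are assumptions.
ACTIONS = {
    "p": page_action,
    "q": query_action,
    "a": applet_action,
    "d": document_action,
}


def dispatch(query_string):
    """Pick a handler from the 'a' argument and pass it the remaining ones."""
    args = {key: values[0] for key, values in parse_qs(query_string).items()}
    action = args.pop("a", "p")  # falling back to the page action is an assumption
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError("unknown action: " + action)
    return handler(args)


print(dispatch("a=a&sa=d&sn=Phind&c=mgppdemo"))
```

Each handler receives only the arguments that remain after the action code has been removed, which mirrors the way Action modules decode the rest of the cgi-arguments themselves.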
682 815 … … 684 817 685 818 The basic page format is: 686 \begin{quote}\begin{footnotesize}\begin{verbatim} 819 \begin{quote}\begin{gsc}\begin{verbatim} 687 820 <page> 688 <config/> 689 <display/> 690 <request/> 691 <response/> 821 <pageExtra> 822 <config/> 823 <display/> 824 </pageExtra> 825 <pageRequest/> 826 <pageResponse/> 692 827 </page> 693 \end{verbatim}\end{footnotesize}\end{quote} 828 \end{verbatim}\end{gsc}\end{quote} 694 829 695 830 There are four main elements in the page: config, display, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well}. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization. 696 831 697 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and display information, and the xslt files. 698 699 \subsubsection{Page action} 832 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. 833 834 835 Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. 
The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are 836 located in interfaces/default/transforms. Collections, sites and other interfaces 837 can override these files by having their own copy of the appropriate 838 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current 839 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.} 840 ***TODO*** describe a bit more?? 841 842 \subsection{Internationalization} 843 844 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 845 846 Language specific text strings are specified in resource bundle property files. These live in resources/java. 847 848 There is a properties file per class, and one per interface. At the moment, we have 849 850 GS2MGPPSearch.properties 851 GS2MGPPRetrieve.properties etc - the service classes 852 853 interface\_default.properties. - for the default interface 854 855 To add other languages, create eg GS2MGPPSearch\_fr.properties. 856 857 The interface ones are treated differently from the other ones. 
The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml \gst{<display>} element - the xslt can get the ones it needs from there. 858 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better? 859 860 All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created. 861 862 \subsection{Page action} 700 863 701 864 Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page. The page is 702 transformed using {\em home.xsl\/}. For the 'about' page, a {\em 703 describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata 704 and a list of services, and the result is transformed using {\em about.xsl\/}. 705 706 \subsubsection{Query action} 707 708 There are three query services which have been implemented: TextQuery, SimpleFieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action. 865 transformed using \gst{home.xsl}. For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata 866 and a list of services, and the result is transformed using \gst{about.xsl}. 867 868 869 \subsection{Query action} 870 871 There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. 
These are all handled in the same way by query action. 709 872 For each page, the service description is requested from the service of the current collection (via a describe request). This is done every time the query page is 710 873 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has all the parameters from the URL put into the parameter list. A list of document identifiers 711 874 is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of 712 documents, with a request for their {\em Title} metadata. The service description and query result are combined into a page of xml, which is 713 transformed using {\em basicquery.xsl\/} to produce the html page. 714 715 \subsubsection{Applet action} 716 717 There are two types of request to the applet action: {\em a=a \& sa=d\/} and 718 {\em a=a \& sa=r\/}. The value {\em sa=d\/} means ``display the applet.'' A 719 {\em describe} request is sent to the service, which returns the {\footnotesize \verb#<applet>#} HTML element. The transformation file {\em applet.xsl} embeds this 875 documents, with a request for their \gst{Title} metadata. The service description and query result are combined into a page of xml, which is 876 transformed using \gst{basicquery.xsl} to produce the html page. 877 878 \subsection{Applet action} 879 880 There are two types of request to the applet action: \gst{a=a \& sa=d\/} and 881 \gst{a=a \& sa=r\/}. The value \gst{sa=d\/} means ``display the applet.'' A 882 \gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. 
The transformation file \gst{applet.xsl} embeds this 720 883 into the page, and the servlet returns the HTML. 721 884 722 The value {\em sa=r} signals a request from the applet. The result is returned 885 The value \gst{sa=r} signals a request from the applet. The result is returned 723 886 directly to the applet code, in XML. The other parameters are sent to the 724 887 service untransformed, and the result is passed directly back to the applet. … … 728 891 Here are two examples of requests generated by the Applet action, along with their corresponding responses. 729 892 730 The first request corresponds to the URL arguments {\em a=a \& 893 The first request corresponds to the URL arguments \gst{a=a \& 731 894 sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind 732 895 applet for the mgppdemo collection''. 733 896 734 \begin{quote}\begin{footnotesize}\begin{verbatim} 897 \begin{quote}\begin{gsc}\begin{verbatim} 735 898 <message> 736 899 <request type='describe' to='mgppdemo/PhindApplet'/> … … 761 924 </response> 762 925 </message> 763 \end{verbatim}\end{footnotesize}\end{quote} 764 765 The second request corresponds to the arguments {\em a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this 926 \end{verbatim}\end{gsc}\end{quote} 927 928 The second request corresponds to the arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this 766 929 indicates a request to the service itself. The extra arguments (not a, sa, sn, c) are simply copied into the 767 930 request as parameters. The response is in a form suitable for the applet, placed inside 768 {\footnotesize \verb#<appletData>#} in a standard Greenstone message. AppletAction returns the 931 \gst{<appletData>} in a standard Greenstone message. AppletAction returns the 769 932 contents of appletData to the browser, i.e. to the applet itself. 
770 933 771 \begin{quote}\begin{footnotesize}\begin{verbatim} 934 \begin{quote}\begin{gsc}\begin{verbatim} 772 935 <message> 773 936 <request type='query' to='mgppdemo/PhindApplet'> … … 812 975 </response> 813 976 </message> 814 \end{verbatim}\end{footnotesize}\end{quote} 815 816 Note that the applet HTML may need to know the name of the {\em library} 977 \end{verbatim}\end{gsc}\end{quote} 978 979 Note that the applet HTML may need to know the name of the \gst{library} 817 980 program. However, that name is chosen by the person who installed the software 818 981 and will not necessarily be ``library''. To get around this, the applet can 819 982 put a parameter called ``library'' into the applet data with a null value: 820 \begin{quote}\begin{footnotesize}\begin{verbatim} 983 \begin{quote}\begin{gsc}\begin{verbatim} 821 984 <PARAM NAME='library' VALUE=''/> 822 \end{verbatim}\end{footnotesize}\end{quote} 985 \end{verbatim}\end{gsc}\end{quote} 823 986 When the Applet action encounters this parameter it inserts the name of the 824 987 current library servlet as its value. 825 988 826 \subsubsection{Document action} 989 \subsection{Document action} 827 990 828 991 DocumentAction sends a query to the DocumentRetrieve service of the collection requesting the text of the specified document. At this stage no additional information is obtained, but in future stuff like Title and 829 992 table of contents would be needed to make the display nicer. 830 993 831 \subsubsection{Formatting the page using XSLT}\label{subsec:xslt} 832 833 Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. 
Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are 834 located in interfaces/default/transforms. Collections, sites and other interfaces 835 can override these files by having their own copy of the appropriate 836 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current 837 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.} 838 839 \subsection{Internationalization} 840 841 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 842 843 Language specific text strings are specified in resource bundle property files. These live in resources/java. 844 845 There is a properties file per class, and one per interface. At the moment, we have 846 847 GS2MGPPSearch.properties 848 GS2MGPPRetrieve.properties etc - the service classes 849 850 interface\_default.properties. - for the default interface 851 852 To add other languages, create eg GS2MGPPSearch\_fr.properties. 853 854 The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml $<$display$>$ element - the xslt can get the ones it needs from there. 855 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better? 
856 857 All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created. 858 859 \subsection{Collection formation} 860 861 Greenstone 2 compatible building has been implemented in gsdl3. so far only mgpp collections will work. 994 995 996 \section{Collection formation} 997 998 999 Greenstone 2 compatible building has been implemented in gsdl3. 862 1000 863 1001 Collection construction can be done through the web, using the build servicecluster in localsite. Just sequence through the steps needed. So far, addDocument does not work, so documents need to be manually added to the import directory. … … 874 1012 Collection building can also be done on the command line: 875 1013 876 ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name> 1014 \gst{ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name>} 877 1015 878 1016 eg 879 1017 880 ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol 1018 \gst{ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol} 881 1019 882 1020 The options are passed to the underlying script; there is no good help message yet. … … 888 1026 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events. 
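The listener arrangement is an instance of the observer pattern. The following Python sketch mirrors the class names mentioned above to show the idea; it is not the Greenstone3 Java code, and the method names \gst{add\_listener} and \gst{build} as well as the stage names are hypothetical.

```python
class ConstructionEvent:
    """Carries the name of a build stage and a short message."""

    def __init__(self, stage, message):
        self.stage = stage
        self.message = message


class CollectionConstructor:
    """Toy constructor that notifies its listeners as stages happen."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        # A listener is any callable that accepts a ConstructionEvent.
        self._listeners.append(listener)

    def _fire(self, stage, message):
        event = ConstructionEvent(stage, message)
        for listener in self._listeners:
            listener(event)

    def build(self, collection):
        # Stage names are invented for the example.
        self._fire("started", "building " + collection)
        # The real work (running the build scripts) would happen here.
        self._fire("finished", "built " + collection)


log = []
constructor = CollectionConstructor()
constructor.add_listener(lambda event: log.append(event.stage))
constructor.add_listener(lambda event: print(event.stage + ": " + event.message))
constructor.build("testcol")
```

Several listeners can be attached at once, so a progress display and a log file can both follow the same build without the constructor knowing about either.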
889 1027 890 \section{Details} 1028 \subsection{Collection design}\label{sec:colldesign} 1029 1030 \section{Installation details} 891 1031 892 1032 This section describes the directory structure of the Greenstone source, and provides a guide to installing Greenstone from CVS. … … 900 1040 901 1041 \begin{table} 1042 \caption{The Greenstone directory structure} 1043 \label{tab:dirs} 902 1044 \center{\footnotesize 903 1045 \begin{tabular}{l p{7cm}} … … 929 1071 gsdl3/src/java/org/greenstone/testing 930 1072 & Junit scaffolding for unit testing.\\ 1073 gsdl3/src/java/org/greenstone/applet 1074 & where the code for applets goes \\ 1075 gsdl3/src/java/org/greenstone/applet/phind 1076 & the phind applet (phrase browsing) \\ 931 1077 gsdl3/src/cpp/ 932 1078 & Place for any cpp source code---none yet \\ … … 940 1086 & any resources that may be needed\\ 941 1087 gsdl3/resources/java 942 & properties files for java resource bundles - used to handle all the language specific text\\ 1088 & properties files for java resource bundles - used to handle all the language specific text. This directory is on the classpath, so any other Java resources can be placed here \\ 1089 gsdl3/resources/soap 1090 & soap service description files \\ 943 1091 gsdl3/bin 944 1092 & executable stuff lives here\\ … … 951 1099 gsdl3/docs 952 1100 & Documentation :-)\\ 1101 \hline 953 1102 gsdl3/web 954 & The place to put any web stuff that the servlet needs. html files go here\\ 1103 & This is where the web site is defined. Any static html files can go here. 
This directory is the Tomcat root directory.\\ 955 1104 gsdl3/web/WEB-INF 956 & The web.xml file lives here (configuration information for tomcat)\\ 1105 & The web.xml file lives here (servlet configuration information for tomcat)\\ 957 1106 gsdl3/web/WEB-INF/classes 958 1107 & Servlet classes go in here\\ 959 \hline 960 gsdl3/sites 1108 gsdl3/web/sites 961 1109 & Contains directories for different sites---a site is a set of collections and services served by a single MessageRouter (MR). The MR may have connections (eg soap) to other sites\\ 962 gsdl3/sites/localsite 963 & One site \\ 964 gsdl3/sites/localsite/collect 1110 gsdl3/web/sites/localsite 1111 & One site - the site configuration file lives here\\ 1112 gsdl3/web/sites/localsite/collect 965 1113 & The collections directory \\ 966 gsdl3/sites/localsite/images 1114 gsdl3/web/sites/localsite/images 967 1115 & Site specific images \\ 968 gsdl3/sites/localsite/transforms 1116 gsdl3/web/sites/localsite/transforms 969 1117 & Site specific transforms \\ 970 gsdl3/interfaces 971 & Contains all interface specific stuff (eg images and XSLT transforms\\ 972 gsdl3/interfaces/default 1118 gsdl3/web/interfaces 1119 & Contains directories for different interfaces - an interface is defined by its images and xslt files \\ 1120 gsdl3/web/interfaces/default 973 1121 & The default interface\\ 974 gsdl3/interfaces/default/images 975 & The images \\ 976 gsdl3/interfaces/default/transforms 977 & The XSLT files \\ 1122 gsdl3/web/interfaces/default/images 1123 & The images for the default interface\\ 1124 gsdl3/web/interfaces/default/transforms 1125 & The XSLT files for the default interface\\ 978 1126 \hline 979 1127 \end{tabular}} 980 \label{tab:dirs} 981 \caption{The Greenstone directory structure} 982 1128 \end{table} 983 1129 984 1130 \subsection{Installation guide} 985 1131 986 \newcommand{\gsdlhome}{\begin{footnotesize}{\em \$GSDL3HOME}\end{footnotesize}} 987 988 Currently, greenstone3 is only available through CVS. 
The installation procedure has been automated. 1132 \newcommand{\gsdlhome}{\$GSDL3HOME} 1133 \newcommand{\gshome}{\$GSDLHOME} 1134 1135 Currently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note that these instructions are for installation on Linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions in \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}. 989 1136 990 1137 \subsubsection{Get the source} 991 1138 992 \noindent If you have a greenstone\_cvs account, you can use the following: 993 994 \begin{footnotesize}\begin{tt} 995 \noindent export CVSROOT=:ext:{\em your-username}@cvs.scms.waikato.ac.nz:\\ 996 \indent /usr/local/global-cvs/gsdl-src\\ 997 export CVS\_RSH=ssh\\ 998 cvs co gsdl3\\ 999 \end{tt}\end{footnotesize} 1000 1001 \noindent Otherwise, you can get it through anonymous access: 1002 1003 \begin{footnotesize}\begin{tt} 1004 \noindent export CVSROOT=:pserver:cvs\[email protected]:2402\\ 1005 \indent /usr/local/global-cvs/gsdl-src\\ 1006 export CVS\_RSH=ssh\\ 1007 cvs co gsdl3\\ 1008 \end{tt}\end{footnotesize} 1009 1010 \noindent If you need it, the password for anonymous CVS access is {\footnotesize \verb#anonymous#}. 1139 If you have a greenstone\_cvs account, you can use the following: 1140 1141 \begin{quote}\begin{gsc}\begin{verbatim} 1142 export CVS_RSH=ssh 1143 cvs -d :ext:@cvs.scms.waikato.ac.nz:/usr/local/global-cvs/ 1144 gsdl-src co gsdl3 1145 \end{verbatim}\end{gsc}\end{quote} 1146 1147 Otherwise, you can get it through anonymous access: 1148 1149 \begin{quote}\begin{gsc}\begin{verbatim} 1150 cvs -d :pserver:cvs\[email protected]:2402/usr/local/ 1151 global-cvs/gsdl-src co gsdl3 1152 \end{verbatim}\end{gsc}\end{quote} 1153 1154 If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some versions of CVS have trouble accessing this repository. We are using version 1.11.1p1. 
1011 1155 1012 1156 \subsubsection{Compile and install greenstone}\label{subsec:compile} 1013 1157 1014 An install.sh script has been constructed (thanks, Stuart) to compile and install greenstone 3. What you neeto do is:1015 1016 \begin{ footnotesize}\begin{tt}1017 cd gsdl3 1018 source setup.bash 1019 install.bash 1020 source setup.bash 1021 \end{ tt}\end{footnotesize}1022 1023 If you want to do greenstone2 compatible building (currently the only type) you need to have greenstone 2 installed, 'source setup.bash' in the top level greenstone 2 directory, then re-'source setup.bash' for greenstone 3. This is to set GSDLHOMEfor tomcat.1024 1025 \noindent Note: 'source setup.bash' needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc.1158 An install.sh script has been constructed to compile and install Greenstone3. What you need to do is: 1159 1160 \begin{quote}\begin{gsc} 1161 cd gsdl3\\ 1162 source setup.bash\\ 1163 install.bash\\ 1164 source setup.bash\\ 1165 \end{gsc}\end{quote} 1166 1167 If you want to do Greenstone2 compatible building (currently the only type) you need to have Greenstone2 installed, \gst{source setup.bash} in the top level Greenstone2 directory, then re-\gst{source setup.bash} for Greenstone3. This is to set \gst{\gshome} for tomcat. 1168 1169 \noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc. 1026 1170 1027 1171 If you want to use SOAP to talk to remote sites, you also need to do the following: 1028 1172 1029 \begin{ footnotesize}\begin{tt}1173 \begin{quote}\begin{gsc} 1030 1174 install-soap.bash 1031 \end{tt}\end{footnotesize} 1032 1033 Thats it. 
1034 1035 You dont want to run install.bash twice - it adds stuff into files 1036 1037 To update your installation, you can run update.bash - this remakes all the java stuff. 1175 \end{gsc}\end{quote} 1176 1177 There is one java command that sometimes doesn't work under bash, so you may need to cut and paste it into the terminal to get it to work. See the output from the bash-script for details. 1178 1179 To shutdown or startup tomcat, the commands are: 1180 \begin{quote}\begin{gsc} 1181 \gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\ 1182 \gsdlhome/comms/tomcat/jakarta/bin/startup.sh\\ 1183 \end{gsc}\end{quote} 1184 1185 You dont want to run install.bash twice - it adds stuff into files. 1186 To update your installation, you can run update.bash - this updates your code form cvs, and remakes all the java stuff. 1038 1187 1039 1188 1040 1189 \subsubsection{The sample sites} 1041 1190 1042 \noindent There are two greenstone ``sites'' that come with the checkout: localsite, and site1. localsite has several collections, only two of which have any actual data. The third is a dummy collection. site1 has one dummy collection. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to. 1043 localsite does not connect to any other sites. site1 specifies a SOAP connection to localsite. 1044 1045 \noindent The collections which do not have data can be looked at but you cant do any queries on them. 1046 1191 \noindent There are two greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to. 1192 localsite does not connect to any other sites. soapsite specifies a SOAP connection to localsite. 1047 1193 1048 1194 \subsubsection{Tomcat} 1049 1195 1050 1196 \noindent Tomcat is a servlet container. 
It is used to serve a greenstone site using a servlet. 1051 \\ 1052 \\ 1053 \noindent The file \begin{footnotesize}{\tt \gsdlhome/web/WEB-INF/web.xml}\end{footnotesize} contains the setup information for tomcat---tells it what servlets to load, what initial paramaters to pass them, and what web names map to the servlets. 1054 There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets, ``library'', which serves localsite, and ``library1'' which serves site1. 1055 \\ 1056 \\ 1057 \noindent One initialisation parameter for the library servlets is {\footnotesize \verb#gsdl3home#}. 1058 \begin{footnotesize}\begin{verbatim} 1059 <init-param> 1060 <param-name>gsdl3home</param-name> 1061 <param-value>/research/kjdon/home/gsdl3</param-value> 1062 </init-param> 1063 \end{verbatim}\end{footnotesize} 1064 1065 The file \gsdlhome/comms/tomcat/jakarta/conf/server.xml is the tomcat configuration file. setup.bash adds a context for gsdl servlets - this tells tomcat where to find the web.xml file, and what url (eg /gsdl3) to give it. 1066 1067 \noindent Note: tomcat runs on port 8080 - you can change that if you wish in this file 1197 1198 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for tomcat---tells it what servlets to load, what initial paramaters to pass them, and what web names map to the servlets. 1199 There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite. 
1200 1201 The initialisation parameters used by the library servlets are as follows: 1202 1203 \begin{tabular}{lll} 1204 \bf name & \bf sample value & \bf description \\ 1205 \hline 1206 gsdl3home & /research/kjdon/gsdl3 & the base directory of the gsdl3 installation \\ 1207 sitename & localsite & the site to use \\ 1208 interfacename & default & the interface to use\\ 1209 libraryname & library & the name of the library program \\ 1210 defaultlang & en & the default language for the interface\\ 1211 receptionist & NZDLReceptionist & (optional) specifies an alternative Receptionist to use\\ 1212 messagerouter & NewMessageRouter & (optional) specifies an alternative MessageRouter to use\\ 1213 \hline 1214 \end{tabular} 1215 1216 It is possible to run several servlets at once, with different combinations of sites and/or interfaces. 1217 1218 The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the tomcat configuration file. The installation process adds a context for greenstone3 servlets (\gst{\gsdlhome/web})---this tells tomcat where to find the web.xml file, and what url (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~. 1219 1220 1221 Tomcat runs by default on port 8080---this can be changed in server.xml. The siteConfig files also need changing if Tomcat's port is changed: \gst{<httpAddress>} for the site, and \gst{<address>} for a remote site both use this. 
1222 1068 1223 1069 1224 \subsubsection{Serving your site using tomcat}\label{subsec:runtomcat} … … 1071 1226 \noindent To run tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). Then, 1072 1227 1073 \begin{ footnotesize}\begin{tt}1074 \noindent cd \gsdlhome/comms/tomcat/jakarta -tomcat-4.0.1/bin\\1228 \begin{gsc}\begin{tt} 1229 \noindent cd \gsdlhome/comms/tomcat/jakarta/bin\\ 1075 1230 ./startup.sh 1076 \end{tt}\end{ footnotesize}1231 \end{tt}\end{gsc} 1077 1232 1078 1233 \noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat) 1079 1234 \\ 1080 1235 \\ 1081 \noindent The tomcat server can be accessed on the web at {\footnotesize \verb#http://localhost:8080#}---this gets you to a welcome page.1082 The greenstone stuff is at {\footnotesize \verb#http://localhost:8080/gsdl3#}---this displays {\footnotesize\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.1236 \noindent The tomcat server can be accessed on the web at \gst{http://localhost:8080}---this gets you to a welcome page. 1237 The greenstone stuff is at \gst{http://localhost:8080/gsdl3}---this displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page. 
1083 1238 1084 1239 \noindent Note: tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\ 1085 1240 \begin{bulletedlist} 1086 \begin{ footnotesize}\begin{tt}1241 \begin{gsc} 1087 1242 \item \gsdlhome/web/WEB-INF/web.xml 1088 1243 \item \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml 1089 \end{ tt}\end{footnotesize}1244 \end{gsc} 1090 1245 \item any classes or jar files used by the servlets 1091 1246 \end{bulletedlist} 1092 1247 \noindent Note: stdin and stdout for the servlets both go to\\ 1093 \begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/logs/catalina.out}\end{footnotesize} 1248 \gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out} 1249 1250 On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}, or by restarting tomcat. 1094 1251 1095 1252 \subsubsection{Using SOAP to talk to a remote site} 1096 1253 1097 \noindent The previous installation stuff is fine if you only want to talk to local sites. However, if you want to connect using SOAP to a remote site, some more stuff needs to be done. s ite1 specifies a SOAP connection to localsite. If you run site1 without connecting to localsite, you can only see the local collections, eg the dummy collection myfiles. However, if you connect to localsite, you can see all of {\em its} collections as well.1254 \noindent The previous installation stuff is fine if you only want to talk to local sites. However, if you want to connect using SOAP to a remote site, some more stuff needs to be done. soapsite specifies a SOAP connection to localsite. If you run soapsite without connecting to localsite, you don't get any collections. 
However, if you connect to localsite, you can see all of {\em its} collections. 1098 1255 \\ 1099 1256 \\ 1100 \noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your service, and then deploy that service. 1101 1102 this is done by install-soap.bash. 1257 \noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service. 1258 This is done by install-soap.bash. 1103 1259 You can also deploy a service through the website. If tomcat is not running, start it up (see \ref{subsec:runtomcat}). 1104 1260 1105 \noindent The SOAP servlet can be accessed at \begin{ footnotesize}{\tt http://localhost:8080/soap}\end{footnotesize}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.1261 \noindent The SOAP servlet can be accessed at \begin{gsc}{\tt http://localhost:8080/soap}\end{gsc}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services. 1106 1262 1107 1263 \noindent To deploy the SOAPServer for localsite: … … 1124 1280 \subsubsection{Debugging SOAP} 1125 1281 1126 \noindent If you need to debug the SOAP stuff for some reason, or just want to look at the SOAP messages that are being passed back and forth, there is a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them to another port. 1127 1128 \noindent To run it: 1129 1130 \noindent {\footnotesize \verb#java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080#} 1131 1132 \noindent tomcat uses port 8080 - you need to modify greenstone to talk to port 8070 instead of 8080. - this is specified in the {\footnotesize \verb#site#} element of the site configuration file. 
1133 \\ 1134 \\ 1135 \noindent eg, in \begin{footnotesize}{\tt \gsdlhome/sites/site1/siteConfig.xml}\end{footnotesize}: 1136 \begin{footnotesize}\begin{verbatim} 1282 If you need to debug the SOAP stuff for some reason, or just want to look at the SOAP messages that are being passed back and forth, use a program called TcpTunnelGui. This intercepts messages coming in to one port, displays them, and passes them to another port. 1283 To run it, type: 1284 1285 \begin{quote}\gst{java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080} 1286 \end{quote} 1287 1288 8070 is the port that TcpTunnelGui listens on, and 8080 is the port that it sends the messages onto---the port that Tomcat is using. You need to modify Greenstone to talk to port 8070 when it wants to talk to Tomcat, so that the messages go through TcpTunnelGui. This is specified in the \gst{<site>} element of the soapsite site configuration file (\gst{\gsdlhome/web/sites/soapsite/siteConfig.xml}). 1289 \begin{quote}\begin{gsc}\begin{verbatim} 1137 1290 <site name="org.greenstone.localsite" 1138 1291 address="http://localhost:8080/soap/servlet/rpcrouter" 1139 1292 type="soap"/> 1140 \end{verbatim}\end{footnotesize} 1141 1142 \noindent You can replace the 8080 with 8070 if you want to run TcpTunnelGui. 1143 1144 \noindent Note that \begin{footnotesize}{\tt http://localhost:8080/soap/servlet/rpcrouter}\end{footnotesize} is the 1293 \end{verbatim}\end{gsc}\end{quote} 1294 1295 Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the 1145 1296 address for talking to the tomcat SOAP servlet services. 1297 1298 \section{Developer's notes} 1299 1300 Here are some random notes for developers who want to modify the source code. 1301 \subsection{Greenstone utility classes} 1302 1303 These are found in \gst{gsdl3/src/java/org/greenstone/gsdl3/util} and provide a variety of useful functions. Table~\ref{tab:utils} gives a brief description of the various classes. 
1304 1305 \begin{table} 1306 \caption{The utility classes in org.greenstone.gsdl3.util} 1307 \label{tab:utils} 1308 \center{\footnotesize 1309 \begin{tabular}{lp{3.75in}} 1310 \hline 1311 \bf Utility class & \bf Description\\ 1312 ConfigVars & holds the servlet startup variables, including library name, site name, interface name, default language\\ 1313 Dictionary & wrapper around a ResourceBundle, providing strings with parameter\\ 1314 GSCGI & class to map between short name cgi args and long name request parameters \\ 1315 GSFile & class to create all greenstone file paths eg used to locate configuration files, xslt files and collection data. \\ 1316 GSHTML & provides convenience methods for dealing with HTML, eg making strings HTML safe\\ 1317 GSPath & used to create, examine and modify message address paths\\ 1318 GSStatus & some static codes for status messages\\ 1319 GSXML & lots of methods for extracting information out of greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by greenstone.\\ 1320 GSXSLT & some manipulation functions for greenstone XSLT\\ 1321 Misc & miscellaneous functions\\ 1322 OID & class to handle greenstone (2) OIDs\\ 1323 XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\ 1324 XMLTransformer & methods to transform XML using XSLT \\ 1325 XSLTUtil & contains static methods to be called from within XSLT \\ 1326 \hline 1327 \end{tabular} 1328 } 1329 \end{table} 1330 1331 \subsection{Creating new services} 1332 1333 a browse type service must also implement servicenameMetadataRetrieve service. 1334 \subsection{Working with XML} 1335 1336 We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic thing in the tree, all others inherit from this. A Document represents a whole document, and is a kind of container for all the nodes. 
Elements and Nodes are not supposed to exist outside of the context of a document, so you have to have a document to create them. The document is not the top level node in the tree, to get this, use Document.getDocumentElement(). If you create nodes etc but dont append them to something already in the document tree, they will be separate - but they still know who their owner document is. 1337 1338 To create new Documents, and convert Strings or Files to Documents, use XMLConverter. 1339 eg: 1340 \begin{quote}\begin{gsc} 1341 XMLConverter converter = new XMLConverter();\\ 1342 Document doc = converter.newDOM();\\ 1343 1344 File stylesheet = new File(``query.xsl'');\\ 1345 Document style = converter.getDOM(stylesheet);\\ 1346 1347 String message = ``<message><request type='cgi'/></message>'';\\ 1348 Document m = converter.getDOM(message);\\ 1349 \end{gsc}\end{quote} 1350 1351 To output a document as a String, use \gst{converter.getString(doc);} 1352 1353 To add nodes and stuff to an empty document - create them, then append to the tree: 1354 \begin{quote}\begin{gsc} 1355 Document doc = converter.newDOM();\\ 1356 Element e = doc.createElement(``message'');\\ 1357 doc.appendChild(e);\\ 1358 \end{gsc}\end{quote} 1359 1360 Note that you can only append one node to a document---this will become the toplevel node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top level node. 1361 1362 Nodes can only be created by a Document. Document has creation methods for all types of Nodes, for example \gst{createElement(element\_name)}, \gst{createAttribute(attr\_name)}, \gst{createTextNode(text\_data)} etc. 1363 1364 DOM006 Hierarchy request error: happens if you have more than one root node in your document 1365 1366 \subsection{Greenstone XML} 1367 1368 Greenstone format namespace: (at the moment) 1369 xmlns:gsf="http://www.greenstone.org/configformat" 1370 1371 1372 no DTDs or Schema defined yet. 
Until there are, try to keep to the following rules: 1373 1374 \begin{bulletedlist} 1375 1376 \item Always return expected elements even if empty, eg \gst{<paramList/>}. 1377 1378 \item If you get the whole document it is called \gst{<document>}. However, if you are returned a list of pointers to parts of documents, they are \gst{<documentNode>}s. 1379 1380 \item Inside a list you can only have elements of the same name as the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it. 1381 1382 \end{bulletedlist} 1383 \subsection{Working with XSLT} 1384 1385 \begin{bulletedlist} 1386 \item {\em adding html to an xml doc:} 1387 1388 eg I have a text node with html inside it, inside a resource element. 1389 To add that to a new XML doc, I use 1390 \gst{<xsl:value-of select='resource'/>} 1391 1392 If the output method is xml or html, this will escape any special characters, 1393 ie $<$ and $>$ etc. 1394 1395 Use 1396 \gst{<xsl:value-of disable-output-escaping="yes" select='resource'/>} 1397 instead. 1398 1399 \item {\em including an xml doc into a stylesheet:} 1400 1401 \gst{<xsl:variable name='import' select='document(``newdoc.xml'')'/>} 1402 1403 then you can use the info: 1404 1405 \gst{<xsl:value-of select='\$import/element'/>} 1406 1407 \item {\em selecting an ancestor:} 1408 1409 The ancestor axis contains the parent of the context node, and its 1410 parent and so on. To pick one node among these, use 1411 ancestor::elem-name. If several ancestors share that name, this selects all of them; 1412 ancestor::elem-name[1] selects the nearest one.
1413 1414 \item {\em basic XSLT elements:} 1415 \begin{quote}\begin{footnotesize}\begin{verbatim} 1416 <xsl:template match='xxx' name='yyy'/> 1417 1418 <xsl:apply-templates select='xxx'/> 1419 <xsl:call-template name='yyy'/> 1420 1421 <xsl:variable name='doc' select='document("layout.xml")'/> 1422 1423 <xsl:value-of select='$doc/chapter1'/> 1424 \end{verbatim}\end{footnotesize}\end{quote} 1425 1426 \item {\em using namespaces:} 1427 If you are using the same namespace in more than one file, eg in the source xml and in the stylesheet, make sure that the URI in the xmlns:xxx declaration is the same in both cases---otherwise the names do not match. This includes http:// on the front. 1428 1429 \item \gst{<xsl:with-param name='xxx' select='true'/>} is not 1430 the same as \gst{<xsl:with-param name='xxx'>true</xsl:with-param>}: the select attribute is evaluated as an XPath expression, so it selects child elements named true rather than passing the string. 1431 Use the second one. 1432 1433 \item To select a node from a list based on an attribute value, for example: 1434 \begin{quote}\begin{footnotesize}\begin{verbatim} 1435 <xsl:variable name='name'>CL1</xsl:variable> 1436 1437 <xsl:value-of select="classifier[@name=$name]/@content"/> 1438 \end{verbatim}\end{footnotesize}\end{quote} 1439 1440 1441 \end{bulletedlist} 1442 \subsubsection{What can I do to speed up XSL transformations?} 1443 1444 This information is taken from the Xalan FAQ page. 1445 1446 \begin{bulletedlist} 1447 1448 \item Use a Templates object (with a different Transformer for each 1449 transformation) to perform multiple transformations with the same set 1450 of stylesheet instructions. 1451 1452 \item Set up your stylesheets to function efficiently. 1453 1454 \item Don't use "//" (descendant axes) patterns near the root of a 1455 large document. 1456 1457 \item Use xsl:key elements and the key() function as an efficient way 1458 to retrieve node sets. 1459 1460 \item Where possible, use pattern matching rather than xsl:if or 1461 xsl:when statements.
1462 1463 \item xsl:for-each is fast because it does not require pattern matching. 1464 1465 \item Keep in mind that xsl:sort prevents incremental processing. 1466 1467 \item When you create variables,\\ 1468 \gst{<xsl:variable name="fooElem" select="foo"/>} is usually faster 1469 than \\ 1470 \gst{<xsl:variable name="fooElem"><xsl:value-of select="foo"/></xsl:variable>}. 1471 1472 \item Be careful using the last() function. 1473 1474 \item The use of index predicates within match patterns can be expensive. 1475 1476 \item Decoding and encoding are expensive. 1477 1478 \item For the ultimate in server-side scalability, perform transform 1479 operations on the client. 1480 1481 \end{bulletedlist} 1482 1483 \subsection{Java gdbm} 1484 1485 To talk to gdbm, a JNI wrapper called java-gdbm is used. It was 1486 obtained from:\\ \gst{http://aurora.rg.iupui.edu/~schadow/dbm-java/pip/gdbm/} 1487 1488 It uses packing objects to convert between arrays of bytes (in the 1489 gdbm file) and Java objects. In my GDBMWrapper class I use 1490 StringPacking, which uses UTF-8 encoding. However, some strings came out garbled, so 1491 I had to change the from\_bytes method in StringPacking.java to use 1492 \gst{new String(raw, "UTF-8")} instead of \gst{new String(raw)}. This seems to 1493 work. 1494 1495 Note---if we use java-gdbm to create the file too, we may need to 1496 alter the to-bytes method as well. 1497 1498 The makefile in j-gdbm does not work as shipped---it tries to get files from its 1499 original CVS tree. I have created a new Makefile in the my-j-gdbm 1500 directory. This probably needs to go into CVS. 1501 1502 1503 1504 \subsection{Resources} 1505 1506 This is a list of some useful resources that we have come across during development of gsdl3.
1507 1508 Contents for 'The Java Native Interface Programmer's Guide and 1509 Specification' on-line\\ 1510 \gst{http://java.sun.com/docs/books/jni/html/jniTOC.html} 1511 1512 Java Native Interface Specification\\ 1513 \gst{http://java.sun.com/j2se/1.4/docs/guide/jni/spec/jniTOC.doc.html} 1514 1515 JNI Documentation Contents\\ 1516 \gst{http://java.sun.com/j2se/1.4/docs/guide/jni/index.html} 1517 1518 another JNI page\\ 1519 \gst{http://mindprod.com/jni.html} 1520 1521 Java 1.4 api index\\ 1522 \gst{http://java.sun.com/j2se/1.4/docs/api/index.html} 1523 1524 Java tutorial index\\ 1525 \gst{http://java.sun.com/docs/books/tutorial/index.html} 1526 1527 Safari books online - has java, XML, XSLT, etc books\\ 1528 \gst{http://proquest.safaribooksonline.com/mainhom.asp?home} 1529 1530 Java 1.4 i18n FAQ\\ 1531 \gst{http://www.sun.com/developers/gadc/faq/java/java1.4.html} 1532 1533 Java and XSLT page\\ 1534 \gst{http://www.javaolympus.com/java/Java\%20and\%20XSLT.html} 1535 1536 Xalan-Java overview\\ 1537 \gst{http://xml.apache.org/xalan-j/overview.html} 1538 1539 Tomcat documentation index\\ 1540 \gst{http://jakarta.apache.org/tomcat/tomcat-4.0-doc/index.html} 1541 1542 Servlet and JSP tutorial\\ 1543 \gst{http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/} 1544 1545 Core Servlets and JavaServer Pages, book by Marty Hall. 
Download the 1546 PDF from here (try before you buy link)\\ 1547 \gst{http://www.coreservlets.com/} 1548 1549 J-gdbm page\\ 1550 \gst{http://aurora.rg.iupui.edu/~schadow/dbm-java/pip/gdbm/} 1551 1552 Stuart's page of links\\ 1553 \gst{http://www.cs.waikato.ac.nz/~nzdl/gsdl3/} 1554 1555 A good basic XSLT tutorial\\ 1556 \gst{http://www.zvon.org/xxl/XSLTutorial/Books/Output/contents.html} 1557 1558 JAXP (Java API for XML Processing) package overview\\ 1559 \gst{http://java.sun.com/xml/jaxp/dist/1.1/docs/api/overview-summary.html} 1560 1561 DeveloperWorks, XML zone\\ 1562 \gst{http://www-106.ibm.com/developerworks/xml/} 1563 1564 xslt.com\\ 1565 \gst{http://www.xslt.com/} 1566 1567 Jeni Tennison's XSLT pages\\ 1568 \gst{http://www.jenitennison.com/xslt/} 1569 1570 Apache's XML tools\\ 1571 \gst{http://xml.apache.org/} 1146 1572 1147 1573
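The notes under ``Working with XML'' (nodes are created by a Document, and a Document takes only one top-level node) and the Xalan tip about reusing a Templates object can be tried out with the following minimal sketch. It is an illustration only: it assumes plain JAXP from the standard Java library rather than the Greenstone XMLConverter and XMLTransformer classes, and the class name XmlNotesSketch, the inline stylesheet and the element named extra are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class for illustration; not part of the Greenstone source tree.
class XmlNotesSketch {

    // Hypothetical stylesheet: outputs the type attribute of <request> as text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/message'>"
        + "<xsl:value-of select='request/@type'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Creates an empty Document with plain JAXP (assumed stand-in for XMLConverter.newDOM()). */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds message/request; every node is created by the Document that owns it. */
    static Document buildMessage() {
        Document doc = newDocument();
        Element message = doc.createElement("message");
        doc.appendChild(message); // becomes the single top-level element
        Element request = doc.createElement("request");
        request.setAttribute("type", "cgi");
        message.appendChild(request);
        return doc;
    }

    /** Returns true if appending a second top-level element fails with a hierarchy error. */
    static boolean secondRootRejected(Document doc) {
        try {
            doc.appendChild(doc.createElement("extra"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** Compiles the stylesheet once; the resulting Templates object can be reused. */
    static Templates compile(String stylesheet) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs one transformation using a fresh Transformer obtained from the shared Templates. */
    static String transform(Templates templates, String xml) {
        try {
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildMessage();
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
        System.out.println("second root rejected: " + secondRootRejected(doc));

        // One compiled stylesheet, one Transformer per transformation.
        Templates templates = compile(STYLESHEET);
        System.out.println(transform(templates, "<message><request type='cgi'/></message>"));
        System.out.println(transform(templates, "<message><request type='query'/></message>"));
    }
}
```

The stylesheet is parsed and compiled only once in compile(); each call to transform() then asks the Templates object for a new Transformer, which is the pattern the Xalan tip describes.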