Changeset 4236
- Timestamp: 2003-05-08T09:29:29+12:00
- File: 1 edited
trunk/gsdl3/docs/manual/manual.tex
r4190 → r4236

Native Interface) will be used to communicate with these.

A description of the general design and architecture of Greenstone3 is given in the document {\em The design of Greenstone3: An agent based dynamic digital library} (design-2002.ps, in the gsdl3/docs/manual directory).

\section{System modules}\label{sec:modules}

A Greenstone3 `library' system consists of many components: MessageRouter, Receptionist, Actions, Collections, ServiceRacks, etc. Figure~\ref{fig:local} shows how they fit together in a stand-alone system.

\begin{figure}[t]
…
Functionally Collection and ServiceCluster are very similar, but conceptually, and to the user, they are quite different.

{\em ServiceRack}: these provide one or more services. Services are grouped into a single class purely for code reuse, or to avoid instantiating the same objects several times; for example, MGPP searching services all need to have the index loaded into memory. Services provide the core functionality of the system, e.g.\ searching, retrieving documents and building collections.

{\em Communicator/Server}: these facilitate communication between remote modules. For example, if you want MR1 to talk to MR2, you need a Communicator-Server pair. The Server sits on top of MR2, and MR1 talks to the Communicator.
Each communication type needs a new pair. So far we have only been using SOAP, so we have a SOAPCommunicator and a SOAPServer.

{\em Receptionist}: this is the point of contact for the `front end'. It is essentially a router to Actions, but it also handles anything that is common to all pages, such as creating some XML data for the pages.

{\em Actions}: these do the job of creating the `pages'. There is a different action for each type of page: for example, PageAction handles semi-static pages, QueryAction handles queries, and DocumentAction displays documents. They know a little about specific service types. Based on the `cgi' arguments passed in to them, they construct requests for the system, and assemble the responses into data for the page. This data is transformed (currently into HTML) using XSLT. The various actions are described in more detail in Section~\ref{sec:pagegen}.
…
instructions on how the collection is to be built. The second is produced by
the build-time process and includes any metadata that can be determined
automatically. It also includes configuration information for any ServiceRacks needed by the collection.

The configuration files are read in when the system is initialised, and their contents are cached in memory. This means that changes made to these files once the system is running will have no effect until the affected modules are reconfigured. There is a series of cgi-type commands that can be sent to the library to induce reconfiguration of different modules, including reloading the whole site. This removes the need to shut down and restart the system to reflect these changes.
These commands are described in Section~\ref{sec:runtime-config}.
…

The \gst{<metadataList>} element specifies some collection metadata, such as name and description. These metadata elements can be specified in different languages. The configuration file should be encoded in UTF-8.
The \gst{<search>} element specifies what type of indexer to use, and what indexes to build. A \gst{<format>} element is used to customize what each document entry in a results list should look like.
The \gst{<browse>} element specifies what browsing structures should be created over the documents. Again, \gst{<format>} elements are used to customize items in the hierarchy, both classifier nodes and document entries. Section~\ref{sec:colldesign} looks at the collection configuration file in more detail.

There is also a need for a description of how documents should be displayed: for example, whether a table of contents is needed, what metadata to display, and whether or not the text should be displayed. This will probably be in an element such as \gst{<documentDisplay>}.
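Because metadata may be given in several languages, whatever reads the configuration has to pick the entry that matches the user's language. The Java sketch below shows one way to do this with the JDK's DOM parser. It is illustrative only: it assumes metadata entries of the form \gst{<metadata name='...' lang='...'>}, as in the collection describe response in Section~\ref{sec:describe}, and the \gst{MetadataLookup} class is hypothetical rather than part of Greenstone3.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MetadataLookup {

    // Hypothetical helper: returns the text of the first <metadata> element whose
    // name and lang attributes both match, or null if there is none.
    public static String lookup(String xml, String name, String lang) {
        Document doc;
        try {
            // The config file is UTF-8, so hand the parser UTF-8 bytes.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse metadata XML", e);
        }
        NodeList items = doc.getElementsByTagName("metadata");
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            if (name.equals(e.getAttribute("name")) && lang.equals(e.getAttribute("lang"))) {
                return e.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample metadataList with the same name in two languages.
        String xml = "<metadataList>"
                + "<metadata name='colName' lang='en'>The demo collection</metadata>"
                + "<metadata name='colName' lang='fr'>La collection d\u00e9mo</metadata>"
                + "</metadataList>";
        System.out.println(lookup(xml, "colName", "en"));
    }
}
```

A caller that finds no entry for the requested language gets \gst{null} and must choose its own fallback. The sketch deliberately leaves that policy open.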

\subsection{Building configuration file}\label{sec:buildconfig}

The file \gst{buildConfig.xml} is produced by the collection building process, and contains metadata and other information about the collection that can
be determined automatically, such as the number of
documents it contains. It also includes a list of ServiceRack classes that are
required at runtime to provide the services that have been built into the
collection. The ServiceRack names are Java classes that are loaded
…

The \gst{init()} method creates a new Receptionist and a new
MessageRouter. By default, the base Receptionist and MessageRouter classes are used, but subclasses can be used instead if they are specified in the servlet init params (see Section~\ref{sec:tomcat}). The appropriate system variables (interface
name, site name, etc.) are set in each, and then \gst{configure()} is called. The MessageRouter
is passed to the Receptionist. The servlet then communicates only with
the Receptionist, not with the MessageRouter.

…
to be connected to.
It has a module map that maps names to objects; this is used for routing the messages. It also keeps small chunks of XML: the serviceList, collectionList, clusterList and siteList. These are returned in response to a describe request (see Section~\ref{sec:describe}).
Each ServiceRack specified in the config file is created, then queried for its list of services. Each service name is added to the map, pointing to the ServiceRack object, and each service is also added to the serviceList. After this stage, ServiceRacks are transparent to the system, and each service is treated as a separate module.
ServiceClusters are created and passed the \gst{<serviceCluster>} element for configuration. They are added to the map as is, with the cluster name as the key. Each serviceCluster is also added to the serviceClusterList.
For each site specified, the MessageRouter creates a Communicator object of the appropriate type. It then tries to get the site description. If the server for the remote site is up and running, this should succeed: the site is added to the map with its site name as the key, and the site's collections, services and clusters are also added to the static XML lists. If the server for the remote site is not running, the site is not included in the siteList or the module map.
To try again to access the site, either Tomcat must be restarted, or a run-time reconfigure command for the site list must be sent (see Section~\ref{sec:runtime-config}).

The MessageRouter also looks inside the site's \gst{collect} directory, and loads a Collection object for each valid collection found.

The Collection object reads its \gst{buildConfig.xml} and \gst{collectionConfig.xml}
files, determines the metadata, and loads ServiceRack classes based on the
names specified in \gst{buildConfig.xml\/}. The \gst{<ServiceRack>} XML element is passed to the object to be used in configuration. The collectionConfig.xml contents are also passed in to the ServiceRacks. Any format or display information that the services need must be extracted from the collection config file.
Collection objects are added to the module map with their name as the key, and a collection element is also added to the collectionList XML.

\subsection{Run-time (re)configuration}\label{sec:runtime-config}

The startup configuration reads in the various config files and loads quite a lot of XML into memory. This avoids having to read files repeatedly, but it means that changes to these files have no effect on a running system unless it is told to reload them. Some run-time reconfiguration options are therefore provided. Currently these can only be accessed by typing cgi arguments into the URL; there is no web form for this yet. SystemAction converts these arguments into system requests, which are described in Section~\ref{sec:system}.

The cgi arguments are entered after the \gst{library?} part of the URL. There are three types of command: configure, activate and deactivate. These are specified by \gst{a=s\&sa=c}, \gst{a=s\&sa=a}, and \gst{a=s\&sa=d}, respectively (\gst{a} is action, \gst{sa} is subaction). By default, the requests are sent to the MessageRouter, but they can be sent to a collection or cluster by adding \gst{c=xxx}, where \gst{xxx} is the name of the collection or cluster.

\begin{tabular}{lp{8cm}}
a=s\&sa=c & reconfigures the whole site: reads in siteConfig.xml and reloads all the collections. To reconfigure only part of the site, add the argument ss (system subset); valid values are collectionList, siteList, serviceList and clusterList. \\
a=s\&sa=c\&c=demo & reconfigures a collection or cluster. ss can also be used here; valid values are metadataList and serviceList. \\
a=s\&sa=a & activates a specific module. Modules are specified using two arguments, st (system module type) and sn (system module name). Valid types are collection, cluster and site. \\
a=s\&sa=d & deactivates a module. st and sn are used here too. Valid types are collection, cluster, site and service. \\
a=s\&sa=d\&c=demo & deactivates a module belonging to a collection or cluster. The only valid type is service. \\
\end{tabular}


\section{System messages}\label{sec:messages}


Once the system is up and running (the configuration
process described in Section~\ref{sec:startup-config} has been carried out), modules pass messages back and forth: all modules communicate via message passing.

There are two different styles of messaging. The first is the internal Greenstone communication: requests and responses follow a basic format, and both are in XML. Each individual communication is contained in a \gst{<message>} element\footnote{All sample requests and responses shown here are assumed to be enclosed in \gst{<message>} elements.}.
A message contains either \gst{<request>} or \gst{<response>} elements; a single message may contain multiple requests or responses. Each \gst{<request>} (and \gst{<response>}?) has a language attribute, of the form \gst{lang='...'}.
The language attribute is used by the XSLT to determine the language currently
being used by the user interface. Virtually all messages contain text strings,
350 351 There are two different styles of messaging, explained in the two subsections 352 below. The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system. The response is a page of data, typically in HTML. The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message. 353 354 This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system. 341 and services use this attribute to return strings in the appropriate language. Element and attribute names are formated in lower case with the first letter of internal words capitalized, like 'matchDocs'. Each request typically specifies one service or one action, and the response contains either the data requested, or a status message. 342 Lists must only have the same elements in them.(put this here??) 343 344 Requests have 345 a \gst{to} attribute and responses have \gst{from}. These are addresses used 346 by routing modules. For example \gst{to='site1/demo/TextQuery'} routes a 347 message to modules named site1, demo then TextQuery. These modules happen to be a MessageRouter for a remote site (site1), a Collection (demo), and a Service (TextQuery). 348 349 There are several types of request: 'describe', 'system', 'process', 'status', 'format'. 
These requests can ask for any functionality available in the system. 350 The second messaging style is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has a request type of 'cgi'. It has the same format as any other request in the system. The response, however, is a page of data, typically in HTML. 351 352 These cgi-type messages come into the Receptionist and are passed to the appropriate action. The actions generate appropriate internal messages which are sent to the MessageRouter. The responses are put together into a single piece of XML and transformed, using XSLT, into a 'page' of HTML. 355 353 356 354 \subsection{cgi-type messages}\label{sec:cgi} 357 355 358 Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a356 These are the special 'external'-style messages. Servlet to Receptionist messages are requests for a 'page' of data---for example, the home page for a site; the query page for a collection; the text of a document. They contain, in XML, a representation of the arguments in a 359 357 Greenstone URL. The two main arguments are \gst{a} (action) and \gst{sa} 360 358 (subaction).\footnote{The \gst{sa} replaces Greenstone's old \gst{p} arg for … … 377 375 field is used to indicate what type of output to return. The actions do not 378 376 return responses in the normal format; instead they return a page of 379 information, expressed by default in HTML. Alternative formats could be XML or WML. 377 information, expressed by default in HTML. Alternative formats could be XML or WML. The basic structure of the XML data (before transformation to HTML or other) is described in Section~\ref{sec:pagegen}. 
What the HTML looks like depends on the XSLT used to transform the data, and will not be shown here. 380 378 381 379 The LibraryServlet class communicates with the Receptionist, which is the entry 382 380 point into the system. Future GUIs could communicate either with the 383 Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client.381 Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. However, the Receptionist will pass other types of request directly to the MessageRouter. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client. 384 382 385 383 The cgi arguments used currently are shown in Table~\ref{tab:args}. 386 Other arguments can be specified by particular actions. . For example, when the query action recieves a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args.384 Other arguments can be specified by particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args. 
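The correspondence between URL arguments and a cgi-type request, seen in the two examples that follow Table~\ref{tab:args}, can be sketched in Java. This is an illustration under stated assumptions, not the Receptionist's or LibraryServlet's actual code. In the sketch, \gst{a}, \gst{sa} and \gst{l} become the \gst{action}, \gst{subaction} and \gst{lang} attributes, and every other argument becomes a \gst{<param>}. It also fixes \gst{output} at \gst{html} and defaults a missing \gst{l} to English. The class name \gst{CgiRequestBuilder} is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CgiRequestBuilder {

    // Splits "a=q&l=en&q=snail" into an ordered name -> value map (URL-decoded).
    static Map<String, String> parseQuery(String query) {
        Map<String, String> args = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            args.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                     URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return args;
    }

    // Escapes a value for use inside a single-quoted XML attribute.
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;").replace("\"", "&quot;");
    }

    // Builds a cgi-type request: a, sa and l become attributes, the rest become params.
    public static String toRequest(String query) {
        Map<String, String> args = parseQuery(query);
        StringBuilder xml = new StringBuilder("<request");
        xml.append(" lang='").append(attr(args.getOrDefault("l", "en"))).append("'");
        xml.append(" type='cgi'");
        xml.append(" action='").append(attr(args.getOrDefault("a", ""))).append("'");
        if (args.containsKey("sa")) {
            xml.append(" subaction='").append(attr(args.get("sa"))).append("'");
        }
        xml.append(" output='html'"); // assumption: HTML output is always requested
        StringBuilder params = new StringBuilder();
        for (Map.Entry<String, String> e : args.entrySet()) {
            String k = e.getKey();
            if (k.equals("a") || k.equals("sa") || k.equals("l")) continue;
            params.append("<param name='").append(attr(k))
                  .append("' value='").append(attr(e.getValue())).append("'/>");
        }
        if (params.length() == 0) {
            return xml.append("/>").toString();
        }
        return xml.append("><paramList>").append(params)
                  .append("</paramList></request>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toRequest("a=p&sa=home&l=fr"));
        System.out.println(toRequest("a=q&l=en&s=TextQuery&c=demo&q=snail"));
    }
}
```

For \gst{a=p\&sa=home\&l=fr}, the sketch produces the same request as the first example below, written on a single line.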

\begin{table}
…
s & service name & TextQuery, ImportCollection \\
rt & request type & d (display), r (request), s (status) \\
ro & response only & 0 or 1; if set to 1, the request is carried out \\
& & but no processing of the results is done \\
& & currently only used in process actions \\
…
\end{table}

Here is an example request that retrieves the home page in French, shown first as URL arguments and then as the corresponding cgi-type request:
\begin{quote}\begin{gsc}\begin{verbatim}
a=p&sa=home&l=fr

<request lang='fr' type='cgi' action='p' subaction='home'
         output='html'/>
\end{verbatim}\end{gsc}\end{quote}

This request represents a text query:
\begin{quote}\begin{gsc}\begin{verbatim}
a=q&l=en&s=TextQuery&c=demo&rt=r&ca=0&st=1&m=10&q=snail

<request lang='en' type='cgi' action='q' output='html'>
  <paramList>
    <param name='s' value='TextQuery'/>
…
    <param name='q' value='snail'/> <!-- query string -->
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}

These cgi requests are passed to the appropriate action, which determines what data is required for the page and what internal requests to send. The page generation process for the different actions is described in Section~\ref{sec:pagegen}.

\subsection{'describe'-type messages}\label{sec:describe}
This is the first of the internal message types.
The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a largely predefined piece of XML, which makes these requests very efficient: the XML is predefined apart from any language-specific text strings, which are put together as each request comes in.
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to=''/>
\end{verbatim}\end{gsc}\end{quote}
If the \gst{to} field is empty, the request is answered by the MessageRouter.
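The address-based routing described earlier in Section~\ref{sec:messages} can be modelled in a few lines of Java. In that scheme, each routing module consumes the first segment of the \gst{to} attribute, looks it up in its module map, and forwards the rest. This is a simplified sketch, not Greenstone3's MessageRouter. The \gst{MessageModule} interface, the \gst{Router} class and the string-based \gst{process} method are hypothetical, services are modelled as routers with an empty map, and real modules exchange XML messages. An empty address is answered by whichever module receives it, which for a request entering the system is the MessageRouter.

```java
import java.util.HashMap;
import java.util.Map;

public class RoutingSketch {

    // Anything that can receive a request addressed to it (hypothetical).
    public interface MessageModule {
        String process(String to, String request);
    }

    // A module with a module map: it resolves the first address segment itself.
    public static class Router implements MessageModule {
        private final String name;
        private final Map<String, MessageModule> moduleMap = new HashMap<>();

        public Router(String name) { this.name = name; }

        public void register(String key, MessageModule module) { moduleMap.put(key, module); }

        @Override
        public String process(String to, String request) {
            if (to.isEmpty()) {
                // No address left: this module answers the request itself.
                return name + " handled " + request;
            }
            int slash = to.indexOf('/');
            String head = (slash < 0) ? to : to.substring(0, slash);
            String rest = (slash < 0) ? "" : to.substring(slash + 1);
            MessageModule next = moduleMap.get(head);
            if (next == null) {
                return name + ": no module named " + head;
            }
            // Forward the remainder of the address to the next module.
            return next.process(rest, request);
        }
    }

    public static void main(String[] args) {
        Router textQuery = new Router("TextQuery"); // a service, modelled as a leaf
        Router demo = new Router("demo");           // a collection
        demo.register("TextQuery", textQuery);
        Router messageRouter = new Router("MessageRouter");
        messageRouter.register("demo", demo);

        System.out.println(messageRouter.process("demo/TextQuery", "describe"));
        System.out.println(messageRouter.process("", "describe"));
    }
}
```

Because each hop strips only its own segment, a remote site simply becomes one more router in the chain. In the same way, \gst{site1/demo/TextQuery} passes through a site router before reaching the collection.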
An example response from a MessageRouter might look like this:
\begin{quote}\begin{gsc}\begin{verbatim}
<response lang='en' type='describe'>
  <serviceList>
    <service name='CrossCollectionSearch' type='query' />
  </serviceList>
  <siteList>
    <site name='org.greenstone.gsdl1'
          address='http://localhost:8080/soap/servlet/rpcrouter'
          type='soap' />
  </siteList>
  <collectionList>
    <collection name='org.greenstone.gsdl1/
                      org.greenstone.gsdl2/fao' />
    <collection name='org.greenstone.gsdl1/demo' />
    <collection name='org.greenstone.gsdl1/fao' />
    <collection name='myfiles' />
  </collectionList>
</response>
\end{verbatim}\end{gsc}\end{quote}
This MessageRouter has one site-wide service, a cross-collection searching service. It
…

It is possible to ask for just a specific part of the information provided by a
describe request, rather than the whole thing. For example, these two
requests get the \gst{collectionList} and the \gst{siteList} respectively:
\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to=''>
  <paramList>
    <param name='subset' value='collectionList'/>
  </paramList>
</request>

<request lang='en' type='describe' to=''>
  <paramList>
    <param name='subset' value='siteList'/>
  </paramList>
</request>
\end{verbatim}\end{gsc}\end{quote}
When a collection or service cluster is asked to describe itself, it returns all of the
collection-specific metadata and a list of services. For example, here is such
a request, along with a sample response.

\begin{quote}\begin{gsc}\begin{verbatim}
<request lang='en' type='describe' to='demo'/>

<response lang='en' type='describe' from='demo' >
  <collection name='demo'>
    <serviceList>
      <service name='TextQuery' type='query' />
      <service name='DocumentContentRetrieve' type='retrieve' />
      <service name='DocumentMetadataRetrieve' type='retrieve' />
    </serviceList>
    <metadataList>
      <metadata name='numDocs'>321</metadata>
      <metadata name='numSections'>5532</metadata>
      <metadata name='colName' lang='en'>The demo collection</metadata>
      <metadata name='colDescription' lang='en'>This is a demo collection.
      </metadata>
    </metadataList>
  </collection>
</response>
\end{verbatim}\end{gsc}\end{quote}

The subset parameter can also be used in a describe request to a collection, to retrieve just the metadataList or the serviceList.

A \gst{describe} request sent to a service returns a list of the parameters that
the service accepts, plus some display information. (In future it may also describe the content types of the request and response.)

Parameters have the following format:
…
</param>
\end{verbatim}\end{gsc}\end{quote}
****describe the various types, what the type means - display purposes- etc.

If no default is specified, the parameter is assumed to be mandatory.
Here are some examples of parameters:
…

\end{verbatim}\end{gsc}\end{quote}
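The rule that a parameter without a default is mandatory can be applied mechanically to the \gst{<paramList>} in a service's describe response. A minimal Java sketch using the JDK's DOM parser follows. The class name is hypothetical. The sketch assumes that only the direct children of \gst{<paramList>} are top-level parameters, so the members of a \gst{multi} parameter are not checked separately. Note that \gst{default=""}, as on the \gst{fqf} parameter in the FieldQuery example later in this section, still counts as a default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MandatoryParams {

    // Names of the top-level params in the first <paramList> that have no default attribute.
    public static List<String> mandatory(String xml) {
        Element paramList;
        try {
            paramList = (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("paramList").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a valid describe response", e);
        }
        List<String> result = new ArrayList<>();
        if (paramList == null) return result;
        NodeList children = paramList.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && child.getNodeName().equals("param")) {
                Element param = (Element) child;
                // default="" still counts as a default; only a missing attribute is mandatory.
                if (!param.hasAttribute("default")) {
                    result.add(param.getAttribute("name"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 'query' is a hypothetical parameter with no default.
        String xml = "<response type='describe'><service name='TextQuery' type='query'><paramList>"
                + "<param name='matchDocs' type='integer' default='50'/>"
                + "<param name='query' type='string'/>"
                + "</paramList></service></response>";
        System.out.println(mandatory(xml)); // prints [query]
    }
}
```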
568 \begin{quote}\begin{gsc}\begin{verbatim} 569 <message> 570 <request lang='en' type='describe' to='demo/TextQuery'/> 571 </message> 572 573 <message> 574 <response lang='en' type='describe' from='demo/TextQuery' > 575 <service name='TextQuery' type='query'> 555 The type attribute is used to determine how to display the parameters on a web page or interface. For example, a string parameter may result in a text entry box, a boolean an on/off button, enum\_single/enum\_multi a drop-down menu, where one or more items, respectively, can be selected. 556 A multi-type parameter indicates that two or more parameters are associated, and should be displayed appropriately. For example, in a field query, the text box and field list should be associated. The occurs attribute specifies how many times the parameter should be displayed on the page. 557 Parameters also come with display information... 558 559 A service description also contains a display element - this contains all the language dependent text strings - put together on the fly. These strings are name of the service, what to use for the submit button, and text strings for all the parameters: name, what each value is called, etc. 560 Here is a request, along with a sample response. 
561 562 \begin{quote}\begin{gsc}\begin{verbatim} 563 <request lang='en' type='describe' to='demo/TextQuery'/> 564 565 <response lang='en' type='describe' from='demo/TextQuery' > 566 <service name='TextQuery' type='query'> 567 <paramList> 568 <param name='matchDocs' type='integer' default='50/> 569 <param name='case' type='boolean' default='1'/> 570 <param name='index' type='enum' default='tt'> 571 <option name='tt'/> 572 <option name='t0'/> 573 </param> 574 </paramList> 575 </response> 576 \end{verbatim}\end{gsc}\end{quote} 577 \begin{figure} 578 \begin{quote}\begin{gsc}\begin{verbatim} 579 <request lang="en" to="mgppdemo/FieldQuery" type="describe" /> 580 581 <response from="mgppdemo/FieldQuery" type="describe"> 582 <service name="FieldQuery" type="query"> 576 583 <paramList> 577 <param name='matchDocs' type='integer' default='50/> 578 <param name='case' type='boolean' default='1'/> 579 <param name='index' type='enum' default='tt'> 580 <option name='tt'/> 581 <option name='t0'/> 584 <param default="Section" name="level" type="enum_single"> 585 <option name="Document" /> 586 <option name="Section" /> 587 </param> 588 <param default="1" name="case" type="boolean" /> 589 <param default="1" name="stem" type="boolean" /> 590 <param default="10" name="maxDocs" type="integer" /> 591 <param name="simpleField" occurs="4" type="multi"> 592 <param name="fqv" type="string" /> 593 <param default="" name="fqf" type="enum_single"> 594 <option name="ZZ" /><option name="TX" /> 595 <option name="SU" /><option name="TI" /> 596 </param> 582 597 </param> 583 598 </paramList> 584 </response> 585 </message> 586 \end{verbatim}\end{gsc}\end{quote} 587 588 So far, we have only looked at ``describe'' requests. These can be asked of any module. Other requests are ``configure'' requests, and requests for services. 
599 <display> 600 <name>Form Query</name> 601 <submit>Search</submit> 602 <param name="level"> 603 <name>Granularity to search at</name> 604 <option name="Document">Document</option> 605 <option name="Section">Section</option> 606 </param> 607 <param name="case"> 608 <name>Turn casefolding </name> 609 <option name="0">off</option> 610 <option name="1">on</option> 611 </param> 612 <param name="stem"> 613 <name>Turn stemming </name> 614 <option name="0">off</option> 615 <option name="1">on</option> 616 </param> 617 <param name="maxDocs"> 618 <name>Maximum documents to return</name> 619 </param> 620 <param name="fqv"> 621 <name>Search for </name> 622 </param> 623 <param name="fqf"> 624 <name>in field</name> 625 <option name="ZZ">All fields</option> 626 <option name="TX">TextOnly</option> 627 <option name="SU">Subject</option> 628 <option name="TI">Title</option> 629 </param> 630 </display> 631 </service> 632 </response> 633 \end{verbatim}\end{gsc}\end{quote} 634 \end{figure} 635 636 \begin{figure}[t] 637 \centering 638 \includegraphics[width=3.5in]{query2.ps} 639 \caption{Sample query form.} 640 \label{fig:query} 641 \end{figure} 642 643 A describe request to an applet-type service returns an \gst{<applet>} element describing the applet, for example:
644 \begin{quote}\begin{gsc}\begin{verbatim} 645 <request type='describe' to='mgppdemo/PhindApplet'/> 646 647 <response type='describe'> 648 <service name='PhindApplet' type='query'> 649 <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar, 650 jaxp.jar, xml-apis.jar' 651 CODE='org.greenstone.applet.phind.Phind.class' 652 CODEBASE='lib/java' 653 HEIGHT='400' WIDTH='500'> 654 <PARAM NAME='library' VALUE=''/> 655 <PARAM NAME='phindcgi' VALUE='?a=a&sa=r&sn=Phind'/> 656 <PARAM NAME='collection' VALUE='mgppdemo' /> 657 <PARAM NAME='classifier' VALUE='1' /> 658 <PARAM NAME='orientation' VALUE='vertical' /> 659 <PARAM NAME='depth' VALUE='2' /> 660 <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' /> 661 <PARAM NAME='backdrop' VALUE='interfaces/default/ 662 images/phindbg1.jpg'/> 663 <PARAM NAME='fontsize' VALUE='10' /> 664 <PARAM NAME='blocksize' VALUE='10' /> 665 The Phind java applet. 666 </applet> 667 </service> 668 </response> 669 \end{verbatim}\end{gsc}\end{quote} 589 670 590 671 \subsection{'system'-type messages}\label{sec:system} 591 ``System'' requests are used to tell the MessageRouter or a Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. 592 593 So far, we have \gst{activate} and \gst{deactivate} configure requests. 594 Some examples are as follows.
595 \begin{quote}\begin{gsc}\begin{verbatim} 596 <message><request type='configure' to=''> 597 <configure action='deactivate' type='collection' name='demo'/> 598 </request></message> 599 600 <message><request type='configure' to=''> 601 <configure action='activate' type='collection' name='demo'/> 602 </request></message> 603 604 <message><request type='configure' to=''> 605 <configure action='activate' type='serviceRack' 606 name='TranslationServices'/> 607 </request></message> 608 \end{verbatim}\end{gsc}\end{quote} 609 610 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first. 611 612 The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is: 613 \begin{quote}\begin{gsc}\begin{verbatim} 614 <message><response from='' type='configure'> 615 <status>demo collection activated</status> 616 </response></message> 617 \end{verbatim}\end{gsc}\end{quote} 618 \footnote{this format not properly defined yet} 619 620 Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also. 
621 622 \subsection{'process'-type messages} 672 ``System'' requests are used to tell a MessageRouter, Collection or ServiceCluster to update its cached information and activate or deactivate other modules. For example, the MessageRouter has a set of Collection modules that it can talk to. It also holds some XML information about those collections---this is returned when a request for a collection list comes in. If a collection is deleted or modified, or a new one created, this information may need to change, and the list of available modules may also change. 673 674 The basic format of a system request is as follows: 675 676 \begin{quote}\begin{gsc}\begin{verbatim} 677 <request type='system' to=''> 678 <system .../> 679 </request> 680 \end{verbatim}\end{gsc}\end{quote} 681 682 Each system request is specified in a system element. The following are examples: 683 \begin{quote}\begin{gsc}\begin{verbatim} 684 <system type='configure' subset=''/> 685 <system type='configure' subset='collectionList'/> 686 <system type='activate' moduleType='collection' moduleName='demo'/> 687 <system type='deactivate' moduleType='site' moduleName='site1'/> 688 \end{verbatim}\end{gsc}\end{quote} 689 690 The first request reconfigures the whole site: the MessageRouter goes through its entire configure process again. The second request reconfigures only the collectionList: the MessageRouter deletes all its collection information, rescans the collect directory, and reloads all the collections. 691 The third request activates the collection demo. This could be a new collection or a reactivation of an existing one; if a Collection module of that name already exists, it is deleted and a new one loaded. The final request deactivates the site site1: this removes the site from the siteList and the module map, and also removes any of that site's collections and services from the static lists.
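The request format above is simple enough to generate programmatically. The following Java sketch builds a system request using the standard DOM and JAXP APIs. The class \gst{SystemRequests} and its \gst{build} method are hypothetical helpers and are not part of Greenstone3; only the element and attribute names are taken from the examples above.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper (not Greenstone3 code): builds
// <request type='system' to=''><system type=... .../></request>.
public class SystemRequests {

    public static String build(String systemType, String... attrPairs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element request = doc.createElement("request");
            request.setAttribute("type", "system");
            request.setAttribute("to", ""); // empty 'to', as in the examples
            Element system = doc.createElement("system");
            system.setAttribute("type", systemType);
            // Remaining attributes are name/value pairs, e.g. moduleType, moduleName.
            for (int i = 0; i + 1 < attrPairs.length; i += 2) {
                system.setAttribute(attrPairs[i], attrPairs[i + 1]);
            }
            request.appendChild(system);
            doc.appendChild(request);
            return serialize(doc);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM without an XML declaration.
    private static String serialize(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        // The third and second example requests from the text.
        System.out.println(build("activate", "moduleType", "collection", "moduleName", "demo"));
        System.out.println(build("configure", "subset", "collectionList"));
    }
}
\end{verbatim}\end{gsc}\end{quote}

Calling \gst{build} with type \gst{activate} and the pairs \gst{moduleType}/\gst{collection} and \gst{moduleName}/\gst{demo} yields the third example, wrapped in a request element with an empty \gst{to} attribute. The serializer may order attributes differently from the examples, which does not matter in XML.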
692 693 694 A response just contains a status message, for example: 695 \begin{quote}\begin{gsc}\begin{verbatim} 696 <response from=""> 697 <status>collectionList reconfigured successfully</status> 698 </response> 699 \end{verbatim}\end{gsc}\end{quote} 700 701 702 System requests are mainly answered by the MessageRouter. However, Collections and ServiceClusters will respond to a subset of these requests. 703 704 \subsection{'process'-type messages} ***** TODO **** 623 705 624 706 divide this up into service types: query, retrieve (metadata, structure, content), process, applet, enrich, browse... … … 804 886 \subsubsection{'enrich]-type services} 805 887 806 \subsection{'status'-type messages} 807 808 809 \subsection{'format'-type messages} 810 811 \subsection{'applet'-type services} 812 813 \section{Page generation}\label{sec:pagegen} 814 815 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system. 816 System requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module. 817 818 Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user. 819 820 The basic page format is: 821 \begin{quote}\begin{gsc}\begin{verbatim} 822 <page> 823 <pageExtra> 824 <config/> 825 <display/> 826 </pageExtra> 827 <pageRequest/> 828 <pageResponse/> 829 </page> 830 \end{verbatim}\end{gsc}\end{quote} 831 832 There are four main elements in the page: config, translate, request, response. 
The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well}. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization. 833 834 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. 835 836 837 Once the xml page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is HTML at this stage, but it will be possible to generate alternative outputs, such as XML, WML etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (ie where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (ie their transforms and images are located there). The default XSLT files are 838 located in interfaces/default/transforms. Collections, sites and other interfaces 839 can override these files by having their own copy of the appropriate 840 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. 
The order in which the XSLT files are looked for is collection, site, current 841 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.} 842 ***TODO*** describe a bit more?? 843 844 \subsection{Internationalization} 845 846 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 847 848 Language specific text strings are specified in resource bundle property files. These live in resources/java. 849 850 There is a properties file per class, and one per interface. At the moment, we have 851 852 GS2MGPPSearch.properties 853 GS2MGPPRetrieve.properties etc - the service classes 854 855 interface\_default.properties. - for the default interface 856 857 To add other languages, create eg GS2MGPPSearch\_fr.properties. 858 859 The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml \gst{<display>} element - the xslt can get the ones it needs from there. 860 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better? 861 862 All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created. 863 864 \subsection{Page action} 865 866 Depending on the subaction argument, different pages can be generated. For the 'home' page, a 'describe' request is sent to the MessageRouter---this returns a list of all the collections, services, serviceClusters and sites known about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page. 
The page is 867 transformed using \gst{home.xsl}. For the 'about' page, a \gst{describe} request is sent to the module that the about page is about: this may be a collection or a service cluster. This returns a list of metadata 868 and a list of services, and the result is transformed using \gst{about.xsl}. 869 870 871 \subsection{Query action} 872 873 There are three query services which have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action. 874 For each page, the service description is requested from the service of the current collection (via a describe request). This is done every time the query page is 875 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has all the parameters from the URL put into the parameter list. A list of document identifiers 876 is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of 877 documents, with a request for their \gst{Title} metadata. The service description and query result are combined into a page of xml, which is 878 transformed using \gst{basicquery.xsl} to produce the html page. 879 880 \subsection{Applet action} 881 882 There are two types of request to the applet action: \gst{a=a \& sa=d\/} and 883 \gst{a=a \& sa=r\/}. The value \gst{sa=d\/} means ``display the applet.'' A 884 \gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. The transformation file \gst{applet.xsl} embeds this 885 into the page, and the servlet returns the HTML. 
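The distinction between the two request types can be sketched in Java. This is a minimal, hypothetical sketch: the class and method names are illustrative only. It assumes that the describe request is addressed to \gst{c/sn}, although the describe example in this document addresses \gst{mgppdemo/PhindApplet} for \gst{sn=Phind}, so the real action may map service names differently. For \gst{sa=r}, the sketch copies every argument except \gst{a}, \gst{sa}, \gst{sn} and \gst{c} without transformation, as described for requests coming from the applet.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the applet action's argument handling; names are
// illustrative and not taken from the Greenstone3 source.
public class AppletActionSketch {

    // Arguments consumed by the action itself; all others are forwarded.
    private static final Set<String> ACTION_ARGS = Set.of("a", "sa", "sn", "c");

    /** sa=d: the applet is to be displayed, so the service is asked to describe itself. */
    public static boolean isDisplay(Map<String, String> args) {
        return "d".equals(args.get("sa"));
    }

    /** Describe request for sa=d, assuming the target is "c/sn" (no XML escaping). */
    public static String describeRequest(Map<String, String> args) {
        return "<request type='describe' to='" + args.get("c") + "/" + args.get("sn") + "'/>";
    }

    /** sa=r: copies every argument except a, sa, sn and c, untransformed. */
    public static Map<String, String> forwardedParams(Map<String, String> args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : args.entrySet()) {
            if (!ACTION_ARGS.contains(e.getKey())) {
                params.put(e.getKey(), e.getValue());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> display = Map.of("a", "a", "sa", "d", "sn", "Phind", "c", "mgppdemo");
        System.out.println(isDisplay(display) + " " + describeRequest(display));
        Map<String, String> fromApplet =
            new LinkedHashMap<>(Map.of("a", "a", "sa", "r", "sn", "Phind", "c", "mgppdemo"));
        fromApplet.put("pc", "1");
        fromApplet.put("pptext", "health");
        System.out.println(forwardedParams(fromApplet));
    }
}
\end{verbatim}\end{gsc}\end{quote}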
886 887 The value \gst{sa=r} signals a request from the applet. The result is returned 888 directly to the applet code, in XML. The other parameters are sent to the 889 service untransformed, and the result is passed directly back to the applet. 890 Applet action can therefore work with any applet whose service understands the 891 messages. 892 893 Here are two examples of requests generated by the Applet action, along with their corresponding responses. 894 895 The first request corresponds to the URL arguments \gst{a=a \& 896 sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind 897 applet for the mgppdemo collection''. 898 899 \begin{quote}\begin{gsc}\begin{verbatim} 900 <message> 901 <request type='describe' to='mgppdemo/PhindApplet'/> 902 </message> 903 904 <message> 905 <response type='describe'> 906 <service name='PhindApplet' type='query'> 907 <applet ARCHIVE='phind.jar, xercesImpl.jar, gsdl3.jar, 908 jaxp.jar, xml-apis.jar' 909 CODE='org.greenstone.applet.phind.Phind.class' 910 CODEBASE='lib/java' 911 HEIGHT='400' WIDTH='500'> 912 <PARAM NAME='library' VALUE=''/> 913 <PARAM NAME='phindcgi' VALUE='?a=a&sa=r&sn=Phind'/> 914 <PARAM NAME='collection' VALUE='mgppdemo' /> 915 <PARAM NAME='classifier' VALUE='1' /> 916 <PARAM NAME='orientation' VALUE='vertical' /> 917 <PARAM NAME='depth' VALUE='2' /> 918 <PARAM NAME='resultorder' VALUE='L,l,E,e,D,d' /> 919 <PARAM NAME='backdrop' VALUE='interfaces/default/ 920 images/phindbg1.jpg'/> 921 <PARAM NAME='fontsize' VALUE='10' /> 922 <PARAM NAME='blocksize' VALUE='10' /> 923 The Phind java applet. 924 </applet> 925 </service> 926 </response> 927 </message> 928 \end{verbatim}\end{gsc}\end{quote} 929 930 The second request corresponds to the arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this 931 indicates a request to the service itself. 
The extra arguments (not a, sa, sn, c) are simply copied into the 932 request as parameters. The response is in a form suitable for the applet, placed inside 933 \gst{<appletData>} in a standard Greenstone message. AppletAction returns the 934 contents of appletData to the browser, i.e. to the applet itself. 888 \subsubsection{'applet'-type services} 935 889 936 890 \begin{quote}\begin{gsc}\begin{verbatim} … … 979 933 \end{verbatim}\end{gsc}\end{quote} 980 934 935 \subsection{'status'-type messages} 936 937 938 \subsection{'format'-type messages} 939 940 \begin{quote}\begin{gsc}\begin{verbatim} 941 <request lang="en" to="mgppdemo/FieldQuery" type="format" /> 942 943 <response from="mgppdemo/FieldQuery" type="format"> 944 <format> 945 <gsf:template match="documentNode"><td><gsf:link><gsf:metadata name="Title" />(<gsf:metadata name="Source" />)</gsf:link></td></gsf:template> 946 </format> 947 </response> 948 \end{verbatim}\end{gsc}\end{quote} 949 950 \section{Page generation}\label{sec:pagegen} 951 952 URL-style requests are received by the Receptionist. Based on the arguments, a page of data must be returned to the servlet. As described in Section~\ref{sec:cgi}, the requests are XML representations of Greenstone URLs. One of the arguments is action (a). This tells the Receptionist which Action module to pass the request to. Action modules decode the rest of the cgi-arguments to determine what requests need to be made to the system. 953 These requests are received by the MessageRouter, which answers them one by one, either itself or by passing them on to the appropriate module. 954 955 Once the data needed from the system has been accumulated, it is put into a 'page' of XML. The page is transformed to its output form, currently HTML, via XSLT transformations, and returned to the user.
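The transformation step can be illustrated with the standard JAXP API. In this sketch the stylesheet is a toy stand-in for an interface's transforms and is not one of the Greenstone3 XSLT files. The \gst{<collection>} element inside \gst{pageResponse} is also made up for illustration; the real page content depends on the action.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: turn an XML page into HTML with an XSLT stylesheet (JAXP only).
public class PageTransformSketch {

    // Toy stylesheet: lists the names of <collection> elements in pageResponse.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/page'><html><body>"
        + "<xsl:for-each select='pageResponse/collection'>"
        + "<p><xsl:value-of select='@name'/></p>"
        + "</xsl:for-each></body></html></xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String pageXml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(pageXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Basic page structure with an illustrative response element.
        String page = "<page><pageExtra><config/><display/></pageExtra>"
            + "<pageRequest/><pageResponse><collection name='demo'/></pageResponse></page>";
        System.out.println(transform(page, SAMPLE_XSLT));
    }
}
\end{verbatim}\end{gsc}\end{quote}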
956 957 The basic page format is: 958 \begin{quote}\begin{gsc}\begin{verbatim} 959 <page> 960 <pageExtra> 961 <config/> 962 <display/> 963 </pageExtra> 964 <pageRequest/> 965 <pageResponse/> 966 </page> 967 \end{verbatim}\end{gsc}\end{quote} 968 969 There are four main elements in the page: config, display, pageRequest and pageResponse. The request is the original request that came into the Receptionist; it is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should instead be kept in some form of saved state: if you leave a page and go back, you want your parameters to be the same as well} The response contains all the data that the action has gathered from the system. The other two elements contain extra information needed by the XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, and the name of the executable that is running (e.g., library); these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface; these are kept separate from the XSLT to allow for internationalization. 970 971 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. 972 973 974 Once the XML page has been put together, the page to return to the user is created by transforming the XML using XSLT. The output is currently HTML, but it will be possible to generate alternative outputs, such as XML, WML, etc. A set of XSLT files defines an 'interface'. Different users can change the look of their web pages by creating new XSLT files for a new 'interface'. Just as we have a sites directory where different sites 'live' (i.e., where their configuration file and collections are located), we have an interfaces directory where the different interfaces 'live' (i.e., where their transforms and images are located).
The default XSLT files are 975 located in interfaces/default/transforms. Collections, sites and other interfaces 976 can override these files by having their own copy of the appropriate 977 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The XSLT files are looked for in the following order: collection, site, current 978 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.} 979 ***TODO*** describe a bit more?? 980 981 \subsection{Internationalization} 982 983 Internationalization is a big part of Greenstone3. Language-specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 984 985 Language-specific text strings are specified in resource bundle properties files, which live in resources/java. 986 987 There is a properties file per class, and one per interface. At the moment, we have: 988 989 GS2MGPPSearch.properties, 990 GS2MGPPRetrieve.properties, etc. for the service classes 991 992 interface\_default.properties for the default interface 993 994 To add other languages, create a language-specific file, e.g., GS2MGPPSearch\_fr.properties for French. 995 996 The interface properties files are treated differently from the class ones. The action doesn't know which text strings are needed by a particular transform, so it retrieves them all from the properties file and puts them into an XML \gst{<display>} element; the XSLT can then pick out the ones it needs. 997 Alternatively, the XSLT could perhaps retrieve the strings from the properties bundle on the fly using Java extension elements; would this be better? 998 999 All other class-specific text strings are retrieved one by one as they are needed and added into the XML; for example, the names for query parameters are retrieved when the service description is created. 1000 1001 \subsection{Page action} 1002 1003 Depending on the subaction argument, different pages can be generated.
For the 'home' page, a 'describe' request is sent to the MessageRouter, which returns a list of all the collections, services, serviceClusters and sites it knows about. For each collection, its metadata is retrieved via a 'describe' request. This metadata is added into the previous result, which is then added into the page. The page is 1004 transformed using \gst{home.xsl}. For the 'about' page, a \gst{describe} request is sent to the module the page describes, which may be a collection or a service cluster. This returns a list of metadata 1005 and a list of services, and the result is transformed using \gst{about.xsl}. 1006 1007 1008 \subsection{Query action} 1009 1010 Three query services have been implemented: TextQuery, FieldQuery, and AdvancedFieldQuery. These are all handled in the same way by the query action. 1011 For each page, the service description is requested from the relevant service of the current collection (via a describe request). This is done every time the query page is 1012 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case folding, stemming, and the maximum number of documents to return. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and the describe request is the only request sent to the service. Otherwise, the submit button has been pressed, and a query request is sent to the query service (e.g., TextQuery) with all the parameters from the URL placed in its parameter list. A list of document identifiers 1013 is returned. A follow-up query is sent to the MetadataRetrieve service of the collection: its content includes the list of 1014 documents, with a request for their \gst{Title} metadata. The service description and query result are combined into a page of XML, which is 1015 transformed using \gst{basicquery.xsl} to produce the HTML page.
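The sequence of requests issued by the query action can be summarised in a short sketch. This hypothetical Java sketch only records, as plain strings, which requests are sent and to which modules. It does not reproduce the real Greenstone3 message format or classes. It follows the description above: a describe request is sent every time, and unless \gst{rt=d}, it is followed by a query request carrying the URL parameters and then a Title metadata request to MetadataRetrieve.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the requests issued by the query action.
// Only the order and targets follow the text; the strings are illustrative.
public class QueryActionSketch {

    public static List<String> plan(String coll, String service, Map<String, String> urlParams) {
        List<String> requests = new ArrayList<>();
        String target = coll + "/" + service;
        // 1. Always fetch the service description (parameters, display strings).
        requests.add("describe -> " + target);
        // 2. rt=d: only the form is displayed, so nothing else is sent.
        if ("d".equals(urlParams.get("rt"))) {
            return requests;
        }
        // 3. Otherwise run the query with the URL parameters (sorted for stable output).
        requests.add("query -> " + target + " " + new TreeMap<>(urlParams));
        // 4. Fetch Title metadata for the returned documents.
        requests.add("metadata(Title) -> " + coll + "/MetadataRetrieve");
        return requests;
    }

    public static void main(String[] args) {
        plan("demo", "TextQuery", Map.of("rt", "d")).forEach(System.out::println);
        plan("demo", "TextQuery", Map.of("case", "1")).forEach(System.out::println);
    }
}
\end{verbatim}\end{gsc}\end{quote}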
1016 1017 \subsection{Applet action} 1018 1019 There are two types of request to the applet action: \gst{a=a \& sa=d\/} and 1020 \gst{a=a \& sa=r\/}. The value \gst{sa=d\/} means ``display the applet.'' A 1021 \gst{describe} request is sent to the service, which returns the \gst{<applet>} HTML element. The transformation file \gst{applet.xsl} embeds this 1022 into the page, and the servlet returns the HTML. 1023 1024 The value \gst{sa=r} signals a request from the applet. The result is returned 1025 directly to the applet code, in XML. The other parameters are sent to the 1026 service untransformed, and the result is passed directly back to the applet. 1027 Applet action can therefore work with any applet whose service understands the 1028 messages. 1029 1030 Here are two examples of requests generated by the Applet action, along with their corresponding responses. 1031 1032 The first request corresponds to the URL arguments \gst{a=a \& 1033 sa=d \& sn=Phind \& c=mgppdemo\/}, which translate to ``display the Phind 1034 applet for the mgppdemo collection''. 1035 1036 1037 The second request corresponds to the arguments \gst{a=a \& sa=r \& sn=Phind \& c=mgppdemo \& pc=1 \& pptext=health \& pfe=0 \& ple=10 \& pfd=0 \& pld=10 \& pfl=0 \& pll=10}---this 1038 indicates a request to the service itself. The extra arguments (not a, sa, sn, c) are simply copied into the 1039 request as parameters. The response is in a form suitable for the applet, placed inside 1040 \gst{<appletData>} in a standard Greenstone message. AppletAction returns the 1041 contents of appletData to the browser, i.e. to the applet itself. 1042 1043 981 1044 Note that the applet HTML may need to know the name of the \gst{library} 982 1045 program. However, that name is chosen by the person who installed the software … … 1023 1086 \section{Collection formation} 1024 1087 1025 So far, only Greenstone2 style building is available. This uses the import.pl and buildcol.pl perl scripts from Greenstone2. 
T Hese scripts and their needed perl modules have not been added to teh Greenstone3 system, so to do building, you need to have Greenstoen2 installed, and GSDLHOME, and GSDLOS set. (can do this by running 'source setup.bash' in the top level directory of gsdl. 1026 1027 There are three ways of getting collections into greenstoen3. 1088 So far, only Greenstone2-style building is available. This uses the import.pl and buildcol.pl Perl scripts from Greenstone2. These scripts and the Perl modules they need have not been added to the Greenstone3 system, so to build collections you need to have Greenstone2 installed, with GSDLHOME and GSDLOS set. (You can do this by running \gst{source setup.bash} in the top-level directory of gsdl.) 1089 1090 There are three ways of getting collections into Greenstone3. 1028 1091 1029 1092 \subsection{Importing gs2 collections} 1030 1093 1031 Collections built in a Greenstone2 system can be used in Greens otne3. Just copy across the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs3.pl}. You need to specify the collect directory and the collection name. Eg. 1094 Collections built in a Greenstone2 system can be used in Greenstone3. Just copy the collection's directory into the appropriate collect directory, and run \gst{convert\_coll\_from\_gs2.pl}. You need to specify the collect directory and the collection name, for example: 1032 1095 1033 1096 \gst{convert\_coll\_from\_gs2.pl -collectdir /research/kjdon/gsdl3/web/sites/localsite/collect demo} … … 1038 1101 \subsection{Building new collections through the web interface} 1039 1102 1040 Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page, you have to go back to t eh build 'about' page, and select the next service manually.
So far, AddDocument does not work, so documents need to be manually added to tehimport directory. And there is no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the config files by hand. Editing collect.cfg will change the way building is done (by Greenstone2), and editing collectionConfig.xml will change the way the collection is used (by Greenstone3).1103 Collection construction can be done through the web, using the build ServiceCluster in localsite. Just sequence through the steps needed. There is no automatic sequence taking you to the next page, you have to go back to the build 'about' page, and select the next service manually. So far, AddDocument does not work, so documents need to be manually added to the import directory. And there is no ConfigureCollection service yet, so if you want anything other than the default configuration, you need to edit the config files by hand. Editing collect.cfg will change the way building is done (by Greenstone2), and editing collectionConfig.xml will change the way the collection is used (by Greenstone3). 1041 1104 1042 1105 You need to carry out the following steps: … … 1077 1140 Building stuff is in src/java/org/greenstone/gsdl3/build. 1078 1141 1079 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events.1142 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses Greenstone 2 Perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events. 
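The notification mechanism is an example of the listener (observer) pattern. The Java sketch below illustrates it. \gst{StageEvent}, \gst{StageListener} and \gst{ConstructorSketch} are hypothetical stand-ins for ConstructionEvent, ConstructionListener and CollectionConstructor. Their members and the stage messages are invented for illustration and are not taken from the Greenstone3 source.

\begin{quote}\begin{gsc}\begin{verbatim}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for CollectionConstructor, ConstructionEvent and
// ConstructionListener; all members here are invented for illustration.
public class ConstructorSketch {

    public static final class StageEvent {
        public final String message;
        public StageEvent(String message) { this.message = message; }
    }

    public interface StageListener {
        void stageReached(StageEvent event);
    }

    // Copy-on-write so listeners can be added while events are being fired.
    private final List<StageListener> listeners = new CopyOnWriteArrayList<>();

    public void addListener(StageListener listener) {
        listeners.add(listener);
    }

    // Notifies every registered listener, in registration order.
    private void fire(String message) {
        StageEvent event = new StageEvent(message);
        for (StageListener l : listeners) {
            l.stageReached(event);
        }
    }

    // Stand-in for a build: a real constructor would do the import and
    // build work between these notifications.
    public void build(String collection) {
        fire("import started: " + collection);
        fire("build started: " + collection);
        fire("build finished: " + collection);
    }

    // Runs a build with one logging listener and returns the messages seen.
    public static List<String> runAndLog(String collection) {
        ConstructorSketch constructor = new ConstructorSketch();
        List<String> log = new ArrayList<>();
        constructor.addListener(e -> log.add(e.message));
        constructor.build(collection);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runAndLog("demo"));
    }
}
\end{verbatim}\end{gsc}\end{quote}

Because listeners are plain objects registered at run time, a web interface and a command-line tool could each attach their own listener to the same constructor without the constructor knowing about either.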
1080 1143 1081 1144 \subsection{Collection design}\label{sec:colldesign} … … 1186 1249 \newcommand{\gshome}{\$GSDLHOME} 1187 1250 1188 Cu urently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note, these instructions are for installation on linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions in \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}. 1251 Currently, Greenstone3 is only available through CVS. The installation procedure has been semi-automated. Note that these instructions are for installation on Linux. If you want to use Greenstone3 on Windows, download it using CVS, then follow the instructions in \gst{http://www.cs.waikato.ac.nz/~mdewsnip/GSDL3Windows.html}. 1189 1252 1190 1253 \subsubsection{Get the source} … … 1207 1270 If you need it, the password for anonymous CVS access is \gst{anonymous}. Note that some versions of CVS have trouble accessing this repository. We are using version 1.11.1p1. 1208 1271 1209 \subsubsection{Compile and install greenstone}\label{subsec:compile} 1272 \subsubsection{Compile and install Greenstone}\label{subsec:compile} 1210 1273 1211 1274 An install.sh script has been constructed to compile and install Greenstone3. What you need to do is: … … 1218 1281 \end{gsc}\end{quote} 1219 1282 1220 If you want to do Greenstone2 compatible building (currently the only type) you need to have Greenstone2 installed, \gst{source setup.bash} in the top level Greenstone2 directory, then re-\gst{source setup.bash} for Greenstone3. This is to set \gst{\gshome} for tomcat. 1221 1222 \noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running tomcat.
setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc. 1283 If you want to do Greenstone2-compatible building (currently the only type), you need to have Greenstone2 installed, run \gst{source setup.bash} in the top-level Greenstone2 directory, and then run \gst{source setup.bash} again for Greenstone3. This sets \gst{\gshome} for Tomcat. 1284 1285 \noindent Note: \gst{source setup.bash} needs to be done once in any xterm window before doing a make or running Tomcat. setup.bash sets the environment variables \gst{CLASSPATH, PATH, JAVA\_HOME} etc. 1223 1286 1224 1287 If you want to use SOAP to talk to remote sites, you also need to do the following: … … 1230 1293 There is one java command that sometimes doesn't work under bash, so you may need to cut and paste it into the terminal to get it to work. See the output from the bash-script for details. 1231 1294 1232 To shutdown or startup tomcat, the commands are: 1295 To shut down or start up Tomcat, the commands are: 1233 1296 \begin{quote}\begin{gsc} 1234 1297 \gsdlhome/comms/tomcat/jakarta/bin/shutdown.sh\\ … … 1236 1299 \end{gsc}\end{quote} 1237 1300 1238 You don t want to run install.bash twice - it adds stuff into files. 1239 To update your installation, you can run update.bash - this updates your code form cvs, and remakes all the java stuff. 1301 Do not run install.bash twice, as it adds content to some files. 1302 To update your installation, you can run update.bash: this updates your code from CVS and rebuilds all the Java code. 1303 1304 1242 1305 \subsubsection{The sample sites} 1243 1306 1244 \noindent There are two greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none.
Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to.1307 \noindent There are two Greenstone {\em sites} that come with the checkout: localsite, and soapsite. localsite has three collections, while soapsite has none. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to. 1245 1308 localsite does not connect to any other sites. soapsite specifies a SOAP connection to localsite. 1246 1309 1247 \subsubsection{Tomcat} 1248 1249 \noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet.1250 1251 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for tomcat---tells it what servlets to load, what initial paramaters to pass them, and what web names map to the servlets.1252 There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting tomcat set up. The other two are greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite.1310 \subsubsection{Tomcat}\label{sec:tomcat} 1311 1312 \noindent Tomcat is a servlet container. It is used to serve a Greenstone site using a servlet. 1313 1314 The file \gst{\gsdlhome/web/WEB-INF/web.xml} contains the setup information for Tomcat---tells it what servlets to load, what initial parameters to pass them, and what web names map to the servlets. 1315 There are three servlets specified in web.xml: one is a test servlet that just prints ``hello greenstone'' to a web page. This is useful if you are having trouble getting Tomcat set up. The other two are Greenstone library servlets, {\em library}, which serves localsite, and {\em library1} which serves soapsite. 
1253 1316 1254 1317 The initialisation parameters used by the library servlets are as follows: … … 1269 1332 It is possible to run several servlets at once, with different combinations of sites and/or interfaces. 1270 1333 1271 The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the tomcat configuration file. The installation process adds a context for greenstone3 servlets (\gst{\gsdlhome/web})---this tells tomcat where to find the web.xml file, and what url (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~.1334 The file \gst{\gsdlhome/comms/tomcat/jakarta/conf/server.xml} is the Tomcat configuration file. The installation process adds a context for Greenstone3 servlets (\gst{\gsdlhome/web})---this tells Tomcat where to find the web.xml file, and what URL (\gst{/gsdl3}) to give it. Anything inside the context directory is accessible via Tomcat\footnote{can we use .htaccess files to restrict access??}. For example, the index.html file that lives in \gst{\gsdlhome/web} can be accessed through the URL \gst{localhost:8080/gsdl3/index.html}. The demo collection's images can be accessed through \gst{localhost:8080/gsdl3/sites/localsite/collect/demo/images/}~. 1272 1335 1273 1336 … … 1275 1338 1276 1339 1277 \subsubsection{Serving your site using tomcat}\label{subsec:runtomcat}1278 1279 \noindent To run tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). 
Then,1340 \subsubsection{Serving your site using Tomcat}\label{subsec:runtomcat} 1341 1342 \noindent To run Tomcat, you need to have sourced {\footnotesize \verb#setup.bash#} in \gsdlhome\ to set up {\footnotesize \$CLASSPATH} (see \ref{subsec:compile}). Then, 1280 1343 1281 1344 \begin{gsc}\begin{tt} … … 1284 1347 \end{tt}\end{gsc} 1285 1348 1286 \noindent ({\footnotesize \verb#./shutdown.sh#} shuts down tomcat)1349 \noindent ({\footnotesize \verb#./shutdown.sh#} shuts down Tomcat) 1287 1350 \\ 1288 1351 \\ 1289 \noindent The tomcat server can be accessed on the web at \gst{http://localhost:8080}---this gets you to a welcome page.1290 The greenstone stuff is at \gst{http://localhost:8080/gsdl3}---this displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page.1291 1292 \noindent Note: tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\1352 \noindent The Tomcat server can be accessed on the web at \gst{http://localhost:8080}---this gets you to a welcome page. 1353 The Greenstone stuff is at \gst{http://localhost:8080/gsdl3}---this displays \gst{\gsdlhome/web/index.html}. You should be able to run the test servlet and both library servlets from this page. 1354 1355 \noindent Note: Tomcat must be shutdown and restarted any time you make changes in the following for those changes to take effect:\\ 1293 1356 \begin{bulletedlist} 1294 1357 \begin{gsc} … … 1301 1364 \gst{\gsdlhome/comms/tomcat/jakarta/logs/catalina.out} 1302 1365 1303 On startup, the servlet loads in its collections and services. If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}, or by restarting tomcat.1366 On startup, the servlet loads in its collections and services. 
If the site or collection configuration files are changed, these changes will not take effect until the site/collection is reloaded. This can be done through the reconfiguration messages (see Section~\ref{sec:runtime-config}, or by restarting Tomcat. 1304 1367 1305 1368 \subsubsection{Using SOAP to talk to a remote site} … … 1308 1371 \\ 1309 1372 \\ 1310 \noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service.1373 \noindent The SOAP server we use is actually run as a servlet in Tomcat. You need to set up SOAP, set up the SOAP server class which will be your SOAP web service, and then deploy that service. 1311 1374 This is done by install-soap.bash. 1312 You can also deploy a service through the website. If tomcat is not running, start it up (see \ref{subsec:runtomcat}).1375 You can also deploy a service through the website. If Tomcat is not running, start it up (see \ref{subsec:runtomcat}). 1313 1376 1314 1377 \noindent The SOAP servlet can be accessed at \begin{gsc}{\tt http://localhost:8080/soap}\end{gsc}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services. … … 1327 1390 \end{tabular} 1328 1391 1329 \noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left hand ``List'' button.1330 1331 \noindent Information about deployed services is maintained between tomcat sessions---you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shutdown and restart tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet.1392 \noindent Now click the ``deploy'' button at the bottom of the page. If the service has been deployed, it should appear when you click on the left hand ``List'' button. 
1393 1394 \noindent Information about deployed services is maintained between Tomcat sessions---you only need to deploy it once. To get the library1 servlet talking to the SOAP server, you need to shutdown and restart Tomcat (see \ref{subsec:runtomcat}). You should see more collections when you run the library1 servlet. 1332 1395 1333 1396 \subsubsection{Debugging SOAP} … … 1347 1410 1348 1411 Note that \gst{http://localhost:8080/soap/servlet/rpcrouter} is the 1349 address for talking to the tomcat SOAP servlet services.1412 address for talking to the Tomcat SOAP servlet services. 1350 1413 1351 1414 \section{Developer's notes} … … 1366 1429 Dictionary & wrapper around a ResourceBundle, providing strings with parameter\\ 1367 1430 GSCGI & class to map between short name cgi args and long name request parameters \\ 1368 GSFile & class to create all greenstone file paths eg used to locate configuration files, xslt files and collection data. \\1431 GSFile & class to create all Greenstone file paths eg used to locate configuration files, xslt files and collection data. \\ 1369 1432 GSHTML & provides convenience methods for dealing with HTML, eg making strings HTML safe\\ 1370 1433 GSPath & used to create, examine and modify message address paths\\ 1371 1434 GSStatus & some static codes for status messages\\ 1372 GSXML & lots of methods for extracting information out of greenstone XML, and creating some common types of elements. Also has static Strings for element and attribute names used by greenstone.\\1373 GSXSLT & some manipulation functions for greenstone XSLT\\1435 GSXML & lots of methods for extracting information out of Greenstone XML, and creating some common types of elements. 
Also has static Strings for element and attribute names used by Greenstone.\\ 1436 GSXSLT & some manipulation functions for Greenstone XSLT\\ 1374 1437 Misc & miscellaneous functions\\ 1375 OID & class to handle greenstone (2) OIDs\\1438 OID & class to handle Greenstone (2) OIDs\\ 1376 1439 XMLConverter & provides methods to create new Documents, parse Strings or Files into Documents, and convert Nodes to Strings\\ 1377 1440 XMLTransformer & methods to transform XML using XSLT \\ … … 1387 1450 \subsection{Working with XML} 1388 1451 1389 We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic thing in the tree, all others inherit from this. A Document represents a whole document, and is a kind of container for all the nodes. Elements and Nodes are not supposed to exist outside of the context of a document, so you have to have a document to create them. The document is not the top level node in the tree, to get this, use Document.getDocumentElement(). If you create nodes etc but don t append them to something already in the document tree, they will be separate - but they still know who their owner document is.1452 We use the DOM model for handling XML. This involves Documents, Nodes, Elements etc. Node is the basic thing in the tree, all others inherit from this. A Document represents a whole document, and is a kind of container for all the nodes. Elements and Nodes are not supposed to exist outside of the context of a document, so you have to have a document to create them. The document is not the top level node in the tree, to get this, use Document.getDocumentElement(). If you create nodes etc but don't append them to something already in the document tree, they will be separate - but they still know who their owner document is. 1390 1453 1391 1454 To create new Documents, and convert Strings or Files to Documents, use XMLConverter. 
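The DOM rules described in "Working with XML" can be checked with a short Java sketch:

- nodes are created by a Document;
- the Document is not itself the top-level node;
- a document accepts only one top-level node;
- an unattached node still knows its owner document.

It uses plain JAXP (`DocumentBuilderFactory`) in place of Greenstone's XMLConverter, whose exact methods are not shown here. The element names `message` and `paramList` are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBasicsSketch {
    // Plain JAXP stands in for XMLConverter here; this is not Greenstone's API.
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // True if the document refuses a second top-level element,
    // which the DOM reports as HIERARCHY_REQUEST_ERR.
    static boolean secondRootRejected(Document doc) {
        try {
            doc.appendChild(doc.createElement("another"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();

        // Nodes are created through their owner Document, not with "new".
        Element message = doc.createElement("message");
        doc.appendChild(message);  // becomes the single top-level element

        // The Document node is not the top-level element; ask for it explicitly.
        System.out.println(doc.getNodeName());                      // #document
        System.out.println(doc.getDocumentElement().getNodeName()); // message
        System.out.println(secondRootRejected(doc));                // true

        // A created-but-unattached node is outside the tree,
        // yet it still knows its owner document.
        Element paramList = doc.createElement("paramList");
        System.out.println(paramList.getParentNode() == null);      // true
        System.out.println(paramList.getOwnerDocument() == doc);    // true

        // Appending it below the top-level element puts it in the tree.
        message.appendChild(paramList);
        System.out.println(paramList.getParentNode() == message);   // true
    }
}
```

Further nodes are appended below the top-level element rather than to the Document, which matches the "one top level node" rule in the manual text.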
… … 1411 1474 \end{gsc}\end{quote} 1412 1475 1413 Note that you can only append one node to a document---this will become the top level node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top level node.1476 Note that you can only append one node to a document---this will become the top level node. After that, you can append nodes to child nodes as you like, but a document is only allowed one top level node. 1414 1477 1415 1478 Nodes can only be created by a Document. Document has creation methods for all types of Nodes, for example \gst{createElement(element\_name)}, \gst{createAttribute(attr\_name)}, \gst{createTextNode(text\_data)} etc. … … 1423 1486 1424 1487 1425 no DTDs or Schema defined yet. Until there are, try and keep to t ehfollowing rules:1488 no DTDs or Schema defined yet. Until there are, try and keep to the following rules: 1426 1489 1427 1490 \begin{bulletedlist} … … 1429 1492 \item always return expected elements even if empty, eg \gst{<paramList/>}. 1430 1493 1431 \item If you get the whole docume tn it is called \gst{<document>}. However if you are returned a list of pointers to parts of the documetns, they are \gst{<documentNode>}s.1432 1433 \item insi ode a list you can only have elements of the same name as the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it.1494 \item If you get the whole document it is called \gst{<document>}. However if you are returned a list of pointers to parts of the documents, they are \gst{<documentNode>}s. 1495 1496 \item inside a list you can only have elements of the same name as the list. For example, a \gst{<paramList>} should only have \gst{<param>} elements inside it. 
1434 1497 1435 1498 \end{bulletedlist} … … 1478 1541 1479 1542 \item {\em using namespaces:} 1480 If you are using the same namespace in more than one file, eg in the source xml and in the stylesheet, make sure that the URI for the xmlns:xxx thingy is the same in both cases---otherwise the names don t match. This includes http:// on the front.1481 1482 \item I don t think \gst{<xsl:with-param name='xxx' select='true'/>} is1543 If you are using the same namespace in more than one file, eg in the source xml and in the stylesheet, make sure that the URI for the xmlns:xxx thingy is the same in both cases---otherwise the names don't match. This includes http:// on the front. 1544 1545 \item I don't think \gst{<xsl:with-param name='xxx' select='true'/>} is 1483 1546 the same as \gst{<xsl:with-param name='xxx'>true</xsl:with-param>}. 1484 1547 Use the second one. … … 1551 1614 The makefile in j-gdbm is crap---it tries to get stuff from its 1552 1615 original CVS tree. I have created a new Makefile---in my-j-gdbm 1553 directory. this stuff needs to go into cvsprobably.1616 directory. this stuff needs to go into CVS probably. 1554 1617 1555 1618 … … 1572 1635 \gst{http://mindprod.com/jni.html} 1573 1636 1574 Java 1.4 apiindex\\1637 Java 1.4 API index\\ 1575 1638 \gst{http://java.sun.com/j2se/1.4/docs/api/index.html} 1576 1639 … … 1578 1641 \gst{http://java.sun.com/docs/books/tutorial/index.html} 1579 1642 1580 Safari books online - has java, XML, XSLT, etc books\\1643 Safari books online - has Java, XML, XSLT, etc books\\ 1581 1644 \gst{http://proquest.safaribooksonline.com/mainhom.asp?home} 1582 1645
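The two XSLT tips in this changeset can be reproduced with a small Java sketch using the standard `javax.xml.transform` API. The tips are that namespace URIs must match exactly, including the http:// prefix, and that `select='true'` in `xsl:with-param` is not the same as element content `true`.

The namespace URI `http://www.example.org/gs3`, the element `gs:page` and the parameter `flag` are hypothetical. The stylesheet counts `gs:page` elements and then prints the string value of the parameter passed each way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltPitfallsSketch {
    // The prefix gs is bound to a hypothetical namespace URI (example.org).
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:gs='http://www.example.org/gs3'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      // Counts only elements whose namespace URI is exactly the same string.
      + "<xsl:value-of select='count(//gs:page)'/>"
      + "<xsl:text>,</xsl:text>"
      // select='true' is an XPath path selecting child elements named "true";
      // here there are none, so the parameter is an empty node-set.
      + "<xsl:call-template name='show'>"
      + "<xsl:with-param name='flag' select='true'/></xsl:call-template>"
      + "<xsl:text>,</xsl:text>"
      // Element content passes the text "true".
      + "<xsl:call-template name='show'>"
      + "<xsl:with-param name='flag'>true</xsl:with-param></xsl:call-template>"
      + "</xsl:template>"
      + "<xsl:template name='show'>"
      + "<xsl:param name='flag'/>"
      + "<xsl:value-of select='string($flag)'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to sourceXml and returns the text output.
    static String run(String sourceXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Same URI as the stylesheet, including the http:// prefix.
        System.out.println(run("<gs:page xmlns:gs='http://www.example.org/gs3'/>"));
        // Same prefix, but the URI lacks http://, so gs:page no longer matches.
        System.out.println(run("<gs:page xmlns:gs='www.example.org/gs3'/>"));
    }
}
```

The first call is expected to print 1,,true and the second 0,,true. The prefix gs is irrelevant to matching; only the URI string counts. In both runs the empty middle field shows that `select='true'` did not pass the value true, which is why the manual recommends the element-content form.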