Changeset 3711
- Timestamp: 2003-01-24T17:20:38+13:00
- Files: 1 edited
trunk/gsdl3/docs/manual/manual.tex
r3557 r3711 22 22 {\end{list}} 23 23 24 \noindent 25 {\em \tiny This is intended to turn into a multipurpose document that 26 \begin{bulletedlist} 27 \item forms the basis of a JCDL paper submission 28 \item fulfills our NERF pledge to produce a ``design document for Greenstone3'' 29 by December 2002 ... 30 \item ... and a ``definition of internal and external interfaces for all major 31 components (including API for external clients)'' by July 2003 32 \item turns into a proper manual for Greenstone3 33 \end{bulletedlist} 34 } 35 24 36 25 \noindent … … 58 47 Native Interface) will be used to communicate with these. 59 48 49 60 50 \section{Architecture} 61 51 62 A typical basic Greenstone3 digital library system is made up of a ``back 63 end,'' which we call a digital library {\em site\/}, coupled to a ``front end'' 64 that provides the user interface. Figure 1 shows a simple stand-alone digital library with a web-based front end which communicates with a single site. In this simple example, the entire system is compiled together as a single executable. The point of contact with the back end is the MessageRouter (MR) module---all communication with the site occurs through this module. 65 66 The digital library back end in Figure 1 contains two collections, {\em demo} 67 and {\em myfiles\/}, and a cluster of collection-formation services. All 68 functions of the digital library are called ``services.'' For example, 69 AddDocument is a service that adds a document to a collection; ImportCollection 70 imports into the Greenstone system all documents associated with a collection, 71 converting them as necessary from their original form; BuildCollection builds 72 all indexes and browsing structures that are associated with a collection; 73 ActivateCollection makes a newly-built collection active, so that it can be 74 seen by digital library users. These particular services are related: they are 75 all concerned with creating a digital library collection. 
Related 76 services may be grouped together into a ``service cluster'': all these services are provided by 77 the CollectionFormation ServiceCluster module in Figure 1. 78 79 A collection, which as far as the digital library user is concerned is a 80 focused group of documents with a uniform means of access, is a type of service 81 cluster that groups a set of services that are related by the set of data they 82 work on. For example, the {\em demo} collection in Figure 1 contains four 83 services. These provide text searching, metadata searching, 84 document retrieval, and browsing services to the user. 85 86 The Web-based front end in Figure 1 centers around the 87 Receptionist, which is the point of contact for the interface generator. A 88 servlet takes HTTP commands (in the form of URLs and arguments) and translates 89 them into XML form for the Receptionist. This is capable of executing various 90 different Actions, each of which involve one or (usually) many calls to the 91 digital library's MessageRouter. 92 93 Figure 1 shows a very simple example of a digital library structure. 94 In practice, there may be many digital library sites, possibly involving 95 distributed computers. Each site will have a structure similar to that of the back end in Figure 96 1. Different sites may know about each other and can gain access to each other's 97 collections by forwarding requests. There may also be different user 98 interfaces to the library. Figure 1 shows a simple web-based interface, but 99 other interfaces may exist, ranging from applets that display documents in 100 different ways to alert services that note when new information becomes 101 available in one of the collections and formulate email to users. Although in 102 the simplest case the front and back ends are compiled together into 103 one executable process, in general different MessageRouters will communicate 104 amongst themselves, and with Receptionists, using a protocol. 
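The routing idea described above can be illustrated with a minimal sketch. It is not Greenstone code: the class and method names, the Python language, and the ``site/module'' addressing convention are all assumptions made for illustration only. A collection is modelled as a named group of services, and a MessageRouter either answers a request from its own modules or forwards it to another site it knows about.

```python
class Collection:
    """A service cluster whose services all work on the same set of data."""

    def __init__(self, name, services):
        self.name = name
        self.services = services  # service name -> callable(query) -> result

    def process(self, service, query):
        return self.services[service](query)


class MessageRouter:
    """Single point of contact for one site (illustrative, not Greenstone code)."""

    def __init__(self, site_name):
        self.site_name = site_name
        self.modules = {}  # local collections and service clusters, by name
        self.sites = {}    # routers of other known sites, by site name

    def register(self, module):
        self.modules[module.name] = module

    def connect(self, router):
        self.sites[router.site_name] = router

    def process(self, target, service, query):
        # Answer locally when the target is one of this site's own modules.
        if target in self.modules:
            return self.modules[target].process(service, query)
        # Otherwise treat the target as "site/module" (an assumed convention)
        # and forward the request to that site's router.
        site, _, module = target.partition("/")
        if site in self.sites:
            return self.sites[site].process(module, service, query)
        raise KeyError(f"unknown target: {target}")


remote = MessageRouter("site2")
remote.register(Collection("demo", {"TextQuery": lambda q: f"demo: {q}"}))

local = MessageRouter("site1")
local.register(Collection("myfiles", {"TextQuery": lambda q: f"myfiles: {q}"}))
local.connect(remote)

print(local.process("myfiles", "TextQuery", "the"))     # served locally
print(local.process("site2/demo", "TextQuery", "the"))  # forwarded to site2
```

In this sketch the front end only ever talks to its local router; whether a collection lives on the same site or a remote one is hidden behind the same call.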
105 106 The following subsections elaborate on this structure. 107 108 \subsection{Modular structure} 109 110 Greenstone3 is made up of independent modules that communicate via a single 111 method call: 112 \begin{quote} 113 XMLout = process(XMLin); 114 \end{quote} 115 Both input and output are expressed in XML. This decision shifts attention 116 from the design of an Applications Programming Interface (API) to the design of XML 117 forms that encode the equivalent information. The advantage is modularization: 118 the XML specifications can be modified locally and communication will proceed 119 effectively according to the new scheme provided only that all affected modules 120 are altered appropriately. Conversely, if an API is changed then all modules 121 usually have to be recompiled to reflect the update. 122 123 Modules are thought of as ``agents'' that have, or have access to, certain 124 functionality. A module may respond to a message by processing it itself, or 125 forwarding it to another module, or a combination of the two.\footnote{Francois 126 used some nice words to tie up modules and agents. Kathy, can you remember 127 what he was saying?} 128 129 If modules are on different computers, the communication will take place using 130 SOAP (Simple Object Access Protocol) (although other protocols are possible). Figure 2 shows a Greenstone system where the local site has no collections or services of its own. Instead, the MessageRouter (1 in the diagram) talks to two other sites using SOAP. The local MR has two Communicator modules 131 that enable it to make SOAP requests; the two remote sites each have a SOAP server which 132 listens for such requests and fulfills them. 133 134 A potential downside of expressing the programming interface structure in XML 135 is execution efficiency. 
The input and output XMLin and XMLout in the above 136 statement can be either a serialized String representation, which is the 137 primary representation method, or a Document Object Model (DOM), which is a 138 tree that represents the parsed XML string. Two versions of the processing 139 operation will be provided, string to string and tree to tree. 140 141 \subsection{Dynamic configurability} 142 143 Digital libraries need to be dynamic. It must be possible to routinely add new 144 collections, or new user interfaces, or completely new kinds of service, to a 145 running digital library without having to bring it down and restart it. 146 147 The digital library back end is built around a central MessageRouter module 148 that provides a way of gaining access to any collection or service. When new 149 collections come up, they register with the MessageRouter in order to make 150 themselves visible throughout the system. When users make requests, they are 151 passed to the MessageRouter, which forwards them to the appropriate module for 152 processing. Requests are synchronous; the requesting process is blocked until 153 the result is received. (An asynchronous-to-synchronous buffering module is 154 envisaged if this should become necessary for certain purposes.) 155 156 The most basic request, which any module will respond to, is 157 ``describe-yourself''. (In fact, the ability to respond to 158 ``describe-yourself'' is really what defines a ``module.'') The MessageRouter 159 responds with an XML document which typically specifies some collections that 160 are available locally, and some other Greenstone sites (their own collections 161 may also be listed). Its response may also describe service clusters or single services provided by the 162 MessageRouter itself, for example, cross-collection searching, or collection formation capability. 163 164 A plain ``describe-yourself'' request will return a complete description. 
A 165 ``describe-yourself'' message sent to a collection returns collection-specific 166 metadata, and a list of services that the collection provides. It is possible 167 to add a qualifier to the request which asks for a particular facet of the 168 complete description instead, thereby achieving communication economy. 169 170 Using these facilities, it is possible for a user interface module to ask a 171 MessageRouter for a list of local collections, remote sites and their 172 collections, and for each collection a list of the services available. The XML 173 documents containing this information could be amalgamated and presented to the 174 user as an XML form that actually implements the services that are represented. 175 176 \subsection{Interacting with the user} 177 178 The MessageRouter, together with the services it provides access to, forms the 179 core of the Greenstone digital library system. Clients could be written that 180 call in a variety of ways upon the services that Greenstone provides. 181 182 A very important form of client is one that implements user interaction with 183 Greenstone3 through a Web browser, which is the standard way of communicating 184 with the digital library system. The user makes a request by clicking a URL or 185 submitting a Web form. This request is intercepted by a servlet which invokes 186 a Greenstone module called a Receptionist. The Receptionist represents the 187 user's normal point of contact with the system: based on the input, it creates XML messages which it passes into the Greenstone system through the 188 MessageRouter. The responses are gathered together and translated into the form of 189 a Web page for presentation to the user. 190 191 The Receptionist receives from the servlet an XML representation of 192 the arguments in the URL (``CGI arguments'', though we do not use the 193 CGI mechanism). 
One of these arguments is the Action, which, along 194 with the Subaction argument determines what information must be 195 requested from the MessageRouter to fulfill the request. Table 1 shows 196 a list of the actions that are understood by Greenstone2; Greenstone 197 3 will have similar functionality. 198 199 The Receptionist includes a Java class for each action. These classes do not 200 know anything about the collections, services, or other sites that are 201 available in the Greenstone system. Instead, they decode the other arguments in 202 the URL to determine what information must be requested, and send it through 203 the MessageRouter. A single action often generates several different requests: 204 for example, to generate the traditional Greenstone home page, the PageAction must query the MessageRouter for a list of its collections. Then, for each collection, collection metadata such as the collection image and collection Title must be retrieved. The XML results returned by these requests are put together 205 into one large XML tree, to which is appended system configuration and 206 translation information. The resulting XML structure is converted, using XSLT 207 files appropriate to that particular action, to an HTML page for presentation 208 to the user. 209 210 Other types of client which do not use HTML may interact with the Receptionist. An output type specifier is included in each request to the Receptionist: using XSLT modes, different output formats may be generated such as XML or WML. 211 212 \subsection{Digital library services} 213 214 A digital library consists of several different ``collections,'' each 215 represented by a collection module. For each collection, a set of ``services'' 216 is provided. 
Examples of services are 217 \begin{bulletedlist} 218 \item full-text query 219 \item fielded query 220 \item music query 221 \item document retrieval 222 \item metadata retrieval 223 \item browsing classifier 224 \item hierarchical phrase browsing. 225 \end{bulletedlist} 226 227 Services are provided by modules called ``service modules'', which each 228 implement a group of related operations. For example, one service is MGPPGDBM, 229 which implements four operations: full-text and fielded queries, and document 230 and metadata retrieval. MGPPGDBM operates on collections that are in the 231 format of standard Greenstone2 collections, and provides these four services 232 for such collections. Another service is GSDL2Classifier, which provides 233 operations that correspond to a browsing classifier. Together these two 234 classes allow a Greenstone2 collection to be used, completely unchanged, within 235 Greenstone3 (provided an appropriate configuration file is created). 236 237 Service modules are self-describing modules: that is, they respond to the 238 ``describe-yourself'' message. As noted above, collections are also 239 self-describing modules: they respond to ``describe-yourself'' by returning 240 collection-specific metadata, and a list of services that the collection 241 provides---which can then be queried individually using ``describe-yourself'' 242 messages. Thus a collection may be viewed as a cluster of services. 243 Greenstone3 uses service clusters to represent other things than collections. 244 For example, all the operations associated with building a particular kind of 245 collection may be grouped together into a service cluster. 246 247 \subsection{Data in the system}\footnote{I haven't discussed this with anyone yet, however I like it :-) actually now Rob likes it too. 
NOTE: if we keep this document-resource idea, need to change all the resource refs in this paper to document!!} 248 249 Data in the system consists of 'documents' and 'resources'. A document is an XML document\footnote{whats a better word for a generic document, not a greenstone document ??} that exists independently in the system. You could delete all other documents and it would still be valid (although links to other documents may become invalid). A resource is something that is associated with a document, and doesn't exist outside of that document's context. 250 251 For example, a book that has been added to a collection will be represented by an XML document. The document contains metadata associated with the book, for example Title, Source Author etc. It has xlinks to associated resources or other documents. Any images in the book would be resources belonging to that document. The original representation of the book, eg the pdf file, would also be a resource of the document. There may be associated documents, such as the same book but translated into a different language. This translation is a document in its own right, but is linked to by the original document. 252 253 Documents are indexed, but resources are not. This means that documents can be discovered through searching and browsing. Resources, on the other hand, can only be found via the containing document. Both can be retrieved. Documents are identified by a system id eg HASHxxx. Resources are identified by a unique identifier. This is likely to be a file path---this could be appended to an HTTP address to enable retrieval of the document via HTTP, or could be used as an identifier to request the resource from the site via XML messages. 254 255 The content of the document need not be stored with the document---it may live in the compressed data files. The documents themselves may be stored compressed or in a database. Currently, in Greenstone2, the equivalent information is stored in a gdbm database. 
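The distinction between documents and resources can be made concrete with a small sketch. This is an illustration under stated assumptions, not the Greenstone data model: the class names, the Python language, and the identifiers (``HASH01'', ``book.pdf'') are hypothetical. Documents carry metadata and are indexed; resources are reachable only through the document that contains them.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    path: str  # unique identifier; assumed here to be a file path


@dataclass
class Document:
    doc_id: str                                    # system id
    metadata: dict
    resources: list = field(default_factory=list)  # owned by this document
    links: list = field(default_factory=list)      # ids of related documents


class Site:
    def __init__(self):
        self.documents = {}

    def add(self, document):
        self.documents[document.doc_id] = document

    def search(self, name, value):
        # Only documents are indexed, so a search can only return documents.
        return [d.doc_id for d in self.documents.values()
                if d.metadata.get(name) == value]

    def resources_of(self, doc_id):
        # Resources are reachable only through the containing document.
        return [r.path for r in self.documents[doc_id].resources]


site = Site()
site.add(Document("HASH01", {"Title": "A Book"},
                  resources=[Resource("book.pdf"), Resource("cover.png")],
                  links=["HASH02"]))          # link to the translated edition
site.add(Document("HASH02", {"Title": "Un Livre"}))

print(site.search("Title", "A Book"))
print(site.resources_of("HASH01"))
```

The translated edition is a document in its own right, so it can be found by searching, whereas the PDF and the cover image can be reached only through the first document.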
256 257 Documents don't just have to be books and text files. A collection could contain images---each image would have a document, and the content of the document would point to the image file. 258 A document could be a sequence of other documents eg a powerpoint show of individual slides. 259 A classifier is a document - a hierarchical ordering by metadata of a set of documents into lists or categories. 260 261 \subsection{Getting off the ground} 262 263 We have described in broad terms the basic components of Greenstone3. It is a 264 highly configurable system that allows new modules to be added while it is 265 running---dynamic configuration. However, in order to get it off the ground, 266 configuration files are used to define an initial configuration. 267 268 A single computer system may have several different Greenstone systems 269 or ``sites'' running simultaneously, each of which typically serve 270 different collections. For example, a single user may have a public 271 Greenstone site which offers collections to external users over the 272 web, as well as a private site that offers personal collections (like 273 email) that cannot be accessed externally. Or in a multiuser research 274 environment, each user may have one or more sites reflecting 275 Greenstone collections, or additional facilities, in different stages 276 of development. 277 278 The computer system will have just one Greenstone directory structure, 279 though this structure may support several different sites. Each site 280 has a home directory in the Greenstone structure, inside which is a 281 ``collect'' directory that contains the collections offered by that site. 282 283 The sites can be ``served'' in different ways. A servlet can be started up, which invokes a 284 Receptionist and a MessageRouter. One of the arguments to 285 the servlet is the site's home directory. This configuration has a client and server compiled together. 
The information in this site can then be accessed via the web. Alternatively, a SOAPServer could be started up, which just invokes a MessageRouter. Other Greenstone systems or clients can communicate with this site via SOAP. Greenstone is not limited to SOAP communication---any protocol which can transmit XML may be used to communicate between sites, or between clients and servers. 286 287 For each site there is a configuration file that specifies the URI for the site 288 (localSiteName), and a list of external sites that the site connects to. It 289 may also specify any services or service clusters provided by the site that are not connected with 290 a collection---for example, a language translation service. Collections are 291 not specified in this configuration file; instead they are determined by the 292 contents of the ``collect'' directory for the site. This allows new 293 collections to be added dynamically by placing them in that directory. 52 This section is covered by the paper: An agent based architecture for dynamic digital library construction and configuration. Either cut and paste it in here, or link to the text?? or have two separate docs. dont want to have to maintain two separate versions of the same thing. 294 53 295 54 \section{Greenstone Implementation} 296 55 \label{sec:impl} 297 298 299 \subsection{classes etc??}300 301 In general, a Greenstone module corresponds to a Java class. The Receptionist, Action, MessageRouter, Collection, ServiceCluster modules are all Java classes. The exception is the service. Many services share operations, for example, access to the MGPP index files. For this reason, several services may be implemented by a single class---we call this a ServicesImpl class. For example, MGPPGDBMServices is subclass of ServicesImpl which provides services that use the MGPP files and GDBM databases of a Greenstone 2 collection: TextQuery, DocumentRetrieve and MetadataRetrieve. 
MGGDBMServices provides the same services, but uses MG and GDBM files from a Greenstone 2 collection. 302 56 303 57 \subsection{Configuring Greenstone} … … 312 66 instructions on how the collection is to be built. The second is produced by 313 67 the build-time process and includes any metadata that can be determined 314 automatically.\footnote{Currently it is produced by hand, because collections must 315 be built with Greenstone2.} 68 automatically.\footnote{Currently only the buildConfig.xml file is used - collections are built using gs2 style building and therefore use the old collect.cfg.} 316 69 317 70 \subsubsection{Site configuration file} … … 319 72 The file {\em siteConfig.xml} specifies the URI for the site ({\em 320 73 localSiteName\/}), any services or service clusters provided by the site that are not connected 321 with a particular collection (for example, translation services), and a list of 74 with a particular collection (for example, translation services, or collection building), and a list of 322 75 known external sites to connect to. Collections are not specified in the site 323 76 configuration file, instead they are determined by the contents of the site's … … 325 78 326 79 Here is a configuration file for a rudimentary site with no site-wide services, 327 which does not connect to any external sites. 80 which does not connect to any external sites.\footnote{should the code be tolerant of missing elements? or do we require empty elements?} 328 81 \begin{quote}\begin{footnotesize}\begin{verbatim} 329 82 <config> 330 83 <localSiteName value="org.greenstone.localsite"/> 331 84 <serviceClusterList/> 332 <servicesImplList/> 85 <serviceRackList/> 333 86 <siteList/> 334 87 </config> 335 88 \end{verbatim}\end{footnotesize}\end{quote} 336 The following configuration file is for a site with one site-wide service, a 337 translation service. It connects to the previous site using SOAP. 
89 The following configuration file is for a site with one site-wide service cluster - a collection building cluster. It also connects to the previous site using SOAP. 338 90 \begin{quote}\begin{footnotesize}\begin{verbatim} 339 91 <config> 340 92 <localSiteName value="org.greenstone.gsdl1"/> 341 <servicesImplList> 93 <serviceRackList/> 342 94 <servicesImpl name="TranslationServices"/> 343 95 </servicesImplList> 344 <serviceClusterList/> 96 <serviceClusterList> 97 <serviceCluster name="build"> 98 <metadataList> 99 <metadata name="Title">Collection builder</metadata> 100 <metadata name="Description">Builds collections in a gsdl2-style manner</metadata> 101 </metadataList> 102 <serviceRackList> 103 <serviceRack name="GS2Construct"/> 104 </serviceRackList> 105 </serviceCluster> 106 </serviceClusterList> 345 107 <siteList> 346 108 <site name="org.greenstone.localsite" … … 351 113 \end{verbatim}\end{footnotesize}\end{quote} 352 114 115 These two sites are running on the same machine. For site1 to talk to localsite, a SOAP server must be run for localsite. The address of the SOAP server, in this case, is "http://localhost:8080/soap/servlet/rpcrouter". 116 353 117 \subsubsection{Building configuration file} 354 118 355 The file {\em buildConfig.xml} contains all metadata about the collection that can 119 The file {\em buildConfig.xml} contains all metadata and other information about the collection that can 356 120 be determined automatically when building the collection, such as the number of 357 documents it contains. It also includes a list of servicesImpl classes that are 121 documents it contains. It also includes a list of serviceRack classes that are 358 122 required at runtime to provide the services that have been built into the 359 collection. The servicesImpl names are Java classes that are loaded 360 dynamically at runtime. Any information inside the servicesImpl element is 123 collection. 
The serviceRack names are Java classes that are loaded 124 dynamically at runtime. Any information inside the serviceRack element is 361 125 specific to that service---there is no set format. Here is an example: 362 126 363 127 \begin{quote}\begin{footnotesize}\begin{verbatim} 364 <buildConfiguration> 128 129 <buildConfig> 365 130 <metadataList> 366 <metadata name="iconCollection">mgppdemo.gif</metadata> 367 <metadata name="colName">mgpp demo</metadata> 368 <metadata name="numDocs">5</metadata> 369 <metadata name="numSections">189</metadata> 131 <metadata name="numDocs">11</metadata> 132 <metadata name="colIcon">mgppdemo.gif</metadata> 133 <metadata name="colName">Greenstone demo collection</metadata> 134 <metadata name="colDescription">This is a demonstration collection for the Greenstone digital library software. It contains a small subset of the Humanitarian and Development Libraries.</metadata> 370 135 </metadataList> 371 <servicesImplList> 372 <servicesImpl name="MGPPGDBMServices"> 136 <serviceRackList> 137 <serviceRack name="GS2MGPPRetrieve"> 138 <defaultLevel name="Section"/> 139 <!-- something like this should be used to advertise what metadata the collection has available to be retrieved - however, it is not used yet --> 140 <metadataList> 141 <element name="Title"/><element name="Subject"/><element name="Organization"/><element name="URL"/> 142 </metadataList> 143 </serviceRack> 144 <serviceRack name="GS2MGPPSearch"> 373 145 <defaultIndex name="tt"/> 374 146 <defaultLevel name="Section"/> … … 380 152 <index name="tt"/> 381 153 <index name="t0"/> 382 </indexList> 383 <metadataList> 384 <element name="Title"/> 385 <element name="Subject"/> 386 <element name="Organization"/> 387 <element name="URL"/> 388 </metadataList> 389 </servicesImpl> 390 <servicesImpl name="PhindServices"/> 391 <servicesImpl name="GSDL2ClassifierServices"> 154 </indexList> 155 <fieldList> 156 <field name="TX"/><field name="SU"/><field name="TI"/> 157 </fieldList> 158 </serviceRack> 159 
<serviceRack name="PhindPhraseBrowse"/> 160 <serviceRack name="GS2Browse"> 392 161 <classifierList> 393 <classifier name="CL1"> 394 <metadataList> 395 <metadata name="Title">Subject</metadata> 396 </metadataList> 397 </classifier> 398 <classifier name="CL2" > 399 <metadataList> 400 <metadata name="Title">Title</metadata> 401 </metadataList> 402 </classifier> 162 <classifier name="CL1"><metadataList><metadata name="Title">Subject</metadata></metadataList></classifier> 163 <classifier name="CL2" ><metadataList><metadata name="Title">Title</metadata></metadataList></classifier> 164 <classifier name="CL4"><metadataList><metadata name="Title">Organization</metadata></metadataList></classifier> 165 <classifier name="CL5" ><metadataList><metadata name="Title">Keyword</metadata></metadataList></classifier> 403 166 </classifierList> 404 </servicesImpl> 405 </servicesImplList> 167 </serviceRack> 168 </serviceRackList> 406 169 </buildConfig> 407 170 \end{verbatim}\end{footnotesize}\end{quote} 408 Note: because {\em collectionConfig.xml} is not used yet, the {\em iconCollection} 171 Note: because {\em collectionConfig.xml} is not used yet, the {\em colIcon}, {\em colDescription} 409 172 and {\em colName} metadata elements have been specified here. 410 173 … … 431 194 432 195 The MessageRouter reads in its site configuration file {\em siteConfig.xml}. This 433 lists the ServicesImpl classes that need to be loaded, and lists any sites that need 196 lists the ServiceRack classes that need to be loaded, and lists any sites that need 434 197 to be connected to. It looks inside the {\em collect} directory which contains 435 198 all the site's collections and loads up a Collection object for each valid … … 437 200 438 201 The Collection object reads its {\em buildConfig.xml} and {\em collectionConfig.xml} 439 files, determines the metadata, and loads ServicesImpl classes based on the 440 names specified in {\em buildConfig.xml\/}. 
The {\footnotesize \verb#<ServicesImpl>#} XML element is passed to the object to be used in configuration.\footnote{Kathy, I don't 441 understand this sentence.} 202 files, determines the metadata, and loads ServiceRack classes based on the 203 names specified in {\em buildConfig.xml\/}. The {\footnotesize \verb#<ServiceRack>#} XML element is passed to the object to be used in configuration. 442 204 443 205 \section{System messages} … … 450 212 All messages are enclosed in 451 213 \begin{quote}\begin{footnotesize}\begin{verbatim} 452 <message lang='xx'> 453 \end{verbatim}\end{footnotesize}\end{quote} 214 <message> 215 \end{verbatim}\end{footnotesize}\end{quote} 216 Messages contain either {\em <request>\/} or {\em <response>\/} elements--- a single message may contain multiple requests. Each {\em <request>\/} (and {\em <response>\/}?) has a language attribute, of the form ``lang='xx'''. 454 217 The language attribute is used by the XSLT to determine the language currently 455 218 being used by the user interface. Virtually all messages contain text strings, 456 219 and services use this attribute to return strings in the appropriate language. 457 Requests are called {\em <request>\/}, responses are called {\em <response>\/}. 458 A single message can hold several requests or responses. 459 460 There are two different types of message, explained in the two subsections 461 below. The first is a simple representation of the arguments in a Greenstone 462 URL. It is a rudimentary message passed into the digital library system from 463 outside. The response is a page of data, typically in HTML. All other messages 464 are internal Greenstone messages, and have the same basic format.\footnote{We 465 format names in lower case with the first letter of internal words capitalized, 466 like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message. 
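As a rough illustration of the message envelope described here, the following sketch builds a message holding two requests, each carrying its own language attribute, and then reads the languages back. It uses Python's standard ElementTree module rather than anything from Greenstone; the helper names are hypothetical and the attribute values are examples only.

```python
import xml.etree.ElementTree as ET


def build_message(requests):
    """Wrap one or more requests (given as attribute dicts) in a <message>."""
    message = ET.Element("message")
    for attributes in requests:
        ET.SubElement(message, "request", attributes)
    return message


def languages(message):
    """Return the lang attribute of every request in the message, in order."""
    return [request.get("lang") for request in message.findall("request")]


msg = build_message([
    {"lang": "fr", "type": "cgi", "action": "p", "subaction": "home",
     "output": "html"},
    {"lang": "en", "type": "cgi", "action": "q", "output": "html"},
])

xml_text = ET.tostring(msg, encoding="unicode")  # serialized string form
print(xml_text)
print(languages(msg))
```

Because the language travels with each request rather than with the enclosing message, a single message can mix requests that expect replies in different languages.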
220 221 There are two different styles of messaging, explained in the two subsections 222 below. The first is the communication between the servlet (or other external agent) and the Greenstone system (via the Receptionist). The request contains a simple representation of the arguments in a Greenstone URL, and has the same format as any request in the system. The response is a page of data, typically in HTML. The second style of messaging is the internal Greenstone communication. Requests and responses follow a basic format, and both are in XML.\footnote{We format names in lower case with the first letter of internal words capitalized, like 'matchDocs'.} They typically request one service or one action, and the response contains either the data requested, or a status message. 467 223 468 224 This section describes the two message formats. The following section looks at how the front-end (Receptionist plus Actions) responds to the URL-type messages, and creates internal xxx-type\footnote{are there good names to distinguish the two types of messages?} messages to pass into the system. … … 480 236 481 237 \begin{quote}\begin{footnotesize}\begin{verbatim} 482 <request type='action' action='a-arg-value' subaction='sa-arg-value' 483 output='html'> 238 <request type='cgi' action='a-arg-value' subaction='sa-arg-value' 239 lang='en' output='html'> 484 240 <paramList> 485 241 <param name='xx' value='yyy'/> … … 497 253 Receptionist or directly with the MessageRouter. If they communicate with the Receptionist they must use the cgi-args type of request, asking for predefined pages of information. If they communicate with the MessageRouter directly, they must use the internal message format described in the next section---this is more powerful, but involves more work by the client. Individual services are requested---the results need to be put together by the client. 498 254 499 The arguments used currently are shown in Table~\ref{tab:args}a. 
500 Other arguments can be specified by the particular service. For example, the 501 TextQuery service that the MGPPGDBMService module provides uses the additional 502 arguments shown in Table~\ref{tab:args}b. 255 The cgi arguments used currently are shown in Table~\ref{tab:args}. 256 Other arguments can be specified by particular actions. For example, when the query action receives a list of parameters from the TextQuery service, it creates short names for them and adds them to the global list of cgi-args. 503 257 504 258 \begin{table} 505 259 \center{\footnotesize 506 260 \begin{tabular}{llll} 507 \cline{2-4} 508 (a) & \bf Action & \bf Argument & \bf Typical value \\ 509 \cline{2-4} 510 & p (page) & sa & home, about \\ 511 & & c (collection) & demo, mgppdemo, ... \\ 512 & q (query) & sa & text, field, music\\ 513 & & c & demo, mgppdemo, ... \\ 514 & & q (query) & the \\ 515 & r (resource) & sa & (not used yet) \\ 516 & & c & demo, mgppdemo, ... \\ 517 & & r (resource) & HASH01af33...\\ 518 & a (applet) & sa & d (display), r (request) \\ 519 & & c & demo, mgppdemo, ... \\ 520 \cline{2-4}\\ 521 \cline{2-4} 522 (b) & \bf Argument & \bf Values \\ 523 \cline{2-4} 524 & s (stem) & 0, 1 \\ 525 & k (casefold) & 0, 1 \\ 526 & mm (matchMode) & all, some \\ 527 & sb (sortBy) & rank, natural \\ 528 & ql (queryLevel) & \multicolumn{2}{l}{Document, Section, Paragraph} \\ 529 & md (matchDocs) & 10, 20, ... 
\\ 530 \cline{2-4} 261 \hline 262 \bf Argument & \bf Meaning &\bf Typical values \\ 263 \hline 264 a & action & a (applet), q (query), b (browse), p (page), pr (process) \\ 265 sa & subaction & home, about (page action)\\ 266 c & collection or service cluster & demo, build \\ 267 s & service name & TextQuery, ImportCollection \\ 268 rt & request type & d (display), r (request), s (status) \\ 269 ro & request only & 0 or 1 - if set to one, the request is carried out but no processing of the results is done \\ 270 o & output type & xml, html, wml \\ 271 l & language & en, fr, zh \\ 272 d & document id & HASHxxx \\ 273 r & resource id & ???\\ 274 id & process handle & an integer identifying a particular process request \\ 275 \hline 531 276 \end{tabular}} 532 277 \label{tab:args} 533 \caption{Arguments that can appear in a Greenstone URL: (a) generic; 534 (b) additional arguments for the TextQuery service} 278 \caption{Generic arguments that can appear in a Greenstone URL} 535 279 \end{table} 536 280 537 281 Here is an example message that retrieves the home page in French: 538 282 \begin{quote}\begin{footnotesize}\begin{verbatim} 539 <message lang='fr'>540 <request type='action' action='p' subaction='home' output='html'/>283 <message> 284 <request lang='fr' type='cgi' action='p' subaction='home' output='html'/> 541 285 </message> 542 286 \end{verbatim}\end{footnotesize}\end{quote} … … 544 288 This message represents a text query: 545 289 \begin{quote}\begin{footnotesize}\begin{verbatim} 546 <message lang='en'>547 <request type='action' page='q/text'output='html'>290 <message> 291 <request lang='en' type='cgi' action='q' output='html'> 548 292 <paramList> 549 <param name='k' value='0'/> 550 <param name='s' value='1'/> 551 <param name='md' value='10'/> 293 <param name='s' value='TextQuery'/> 552 294 <param name='c' value='demo'/> 553 <param name='q' value='the'/> 295 <param name='rt' value='r'/> 296 <!-- the rest are the service specific params --> 297 <param name='ca' 
value='0'/> <!-- casefold --> 298 <param name='st' value='1'/> <!-- stem --> 299 <param name='m' value='10'/> <!-- maxdocs --> 300 <param name='q' value='snail'/> <!-- query string --> 554 301 </paramList> 555 302 </message> 556 303 \end{verbatim}\end{footnotesize}\end{quote} 557 304 305 **** UP TO HERE ************** 558 306 \subsubsection{Module to module messages} 559 307 … … 571 319 The most basic message is ``describe-yourself'', which can be sent to any module in the system. The module responds with a predefined piece of XML, making these requests very efficient. 572 320 \begin{quote}\begin{footnotesize}\begin{verbatim} 573 <message lang='en'>574 <request type='describe' to=''/>321 <message> 322 <request lang='en' type='describe' to=''/> 575 323 </message> 576 324 \end{verbatim}\end{footnotesize}\end{quote} … … 578 326 An example response from a MessageRouter might look like this: 579 327 \begin{quote}\begin{footnotesize}\begin{verbatim} 580 <message lang='en'>581 <response type='describe'>328 <message> 329 <response lang='en' type='describe'> 582 330 <serviceList> 583 331 <service name='CrossCollectionSearch' type='query' /> … … 625 373 </message> 626 374 627 <message lang='en'>628 <response type='describe' from='demo' >375 <message> 376 <response lang='en' type='describe' from='demo' > 629 377 <collection name='demo'> 630 378 <serviceList> … … 649 397 Parameters have the following format: 650 398 \begin{quote}\begin{footnotesize}\begin{verbatim} 651 <param name='xxx' type='integer|boolean|string |input' default='yyy'/>652 <param name='xxx' type='enum ' default='aa'/>399 <param name='xxx' type='integer|boolean|string' default='yyy'/> 400 <param name='xxx' type='enum_single|enum_multi' default='aa'/> 653 401 <option name='aa'/><option name='bb'/>... 
654 402 </param> 403 <param name='xxx' type='multi' occurs='4'> 404 <param .../> 405 <param .../> 406 </param> 655 407 \end{verbatim}\end{footnotesize}\end{quote} 656 408 If no default is specified, the parameter is assumed to be mandatory. 657 Here are three examples of parameters:409 Here are some examples of parameters: 658 410 \begin{quote}\begin{footnotesize}\begin{verbatim} 659 411 <param name='Case' type='boolean' default='0'/> … … 666 418 <option name='stx'/> 667 419 <param> 420 421 <!-- this one is for the text box and field list for the simple field query--> 422 <param name='simple' type='multi' occurs='4'> 423 <param name='fqv' type='string'/> 424 <param name='fqf' type='enum_single'> 425 <option name='TI'/><option name='AU'/><option name='OR'/> 426 </param> 427 </param> 428 668 429 \end{verbatim}\end{footnotesize}\end{quote} 669 430 Here is a message, along with a sample response. 670 431 \begin{quote}\begin{footnotesize}\begin{verbatim} 671 <message lang='en'>672 <request type='describe' to='demo/TextQuery'/>673 </message> 674 675 <message lang='en'>676 <response type='describe' from='demo/TextQuery' >432 <message> 433 <request lang='en' type='describe' to='demo/TextQuery'/> 434 </message> 435 436 <message> 437 <response lang='en' type='describe' from='demo/TextQuery' > 677 438 <service name='TextQuery' type='query'> 678 439 <paramList> … … 704 465 705 466 <message><request type='configure' to=''> 706 <configure action='activate' type='service sImpl'467 <configure action='activate' type='serviceRack' 707 468 name='TranslationServices'/> 708 469 </request></message> 709 470 \end{verbatim}\end{footnotesize}\end{quote} 710 471 711 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. 
The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the service sImpl class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new servicesImplobject. As for collections, if one already exists, it is deactivated first.472 The first request is used to remove a collection from the running system once it has been physically deleted. The Collection module is removed from the module list, and information about the collection is removed from the collection list XML. The second request is used when the demo collection has either been modified, or has been newly created. The MessageRouter first checks whether a Collection module of that name already exists, and if so deactivates it, as described above. Then a new Collection module is created and configured, and information added into the XML tree. The final request (re)activates the services provided by the serviceRack class TranslationServices. The site config file is re-read, and the appropriate element used for configuration of the new serviceRack object. As for collections, if one already exists, it is deactivated first. 712 473 713 474 The response to a configure request is a status or an error message. No data is sent back, just success or error. An example is: … … 721 482 Configure requests are only answered by the MessageRouter at this stage. It is possible that other modules may need to respond to these requests also. 722 483 723 The main type of requests in the system are for services. There are different types of services: query, build\footnote{need new name?}, transform, enrich, extract, accrete. 
The two most common ones are build and query. Build is for collection formation, query is for the typical use of those collections---querying, browsing, retrieving documents. The other types of service generally enhance the functionality of the first two. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form. 724 725 'Query' requests are the most used requests in the system. They are requests for data of some kind, for example, a list of the documents matching a certain criteria, the Title and Author metadata for some specified documents, the text for a specified document, and so on. Each request has a content, and some parameters that specify modifications to the way the query is carried out. So the basic form of a query request is as follows: 726 727 \begin{quote}\begin{footnotesize}\begin{verbatim} 728 <message lang='en'> 729 <request type='query' to='demo/TextQuery'> 484 The main type of requests in the system are for services. There are different types of services: query, browse, retrieve, process, applet. Query services do some kind of search and return a list of documents. Retrieve services can return those documents, metadata about the documents, or other resources. Browse is for browsing lists or hierarchies of documents. process type services are those where the request is for a command to be run. A status code will be returned immediately, and then if the command has not finished, an update of the status can be requested. Applet services are those that run an applet. 485 486 Other possibilities include transform, enrich, extract, accrete. 
These types of service generally enhance the functionality of the first set. They may be used during collection formation: 'accrete' documents by adding them to a collection, 'transform' the documents into a different format, 'extract' information or acronyms from the documents, 'enrich' those documents with the information extracted or by adding new information. They may also be used during querying: 'transform' a query before using it to query a collection, or 'transform' the documents you get back into an appropriate form. 487 488 The basic structure of a service request is as follows: 489 \begin{quote}\begin{footnotesize}\begin{verbatim} 490 <message> 491 <request lang='en' type='query' to='demo/TextQuery'> 730 492 <paramList/> 731 493 <content/> … … 734 496 \end{verbatim}\end{footnotesize}\end{quote} 735 497 736 The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request. The value of the parameter can be an attribute, or the content of the parameter. 737 Attributes can be used for simple strings. 498 The parameters are name value pairs corresponding to parameters that were specified in the service description sent in response to a describe request. 738 499 739 500 \begin{quote}\begin{footnotesize}\begin{verbatim} … … 742 503 <param name='index' value='dtx'/> 743 504 \end{verbatim}\end{footnotesize}\end{quote} 744 or 745 \begin{quote}\begin{footnotesize}\begin{verbatim} 746 <param name='case'>1</param> 747 <param name='maxDocs'>34</param> 748 <param name='index'>dtx</param> 749 \end{verbatim}\end{footnotesize}\end{quote} 750 751 The content of the query is the actual query itself---for a text query, this is the query string. For an image or music query, it would be the image file or music clip. For document retrieval, the identifier of the document is the content. 
752 753 Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query. 505 506 Some requests have a content---for document retrieval, the content is the list of documents to retrieve. For metadata retrieval, the content is the list of documents, and a list of metadata to retrieve for each document. 507 508 Responses vary depending on the type of request. 509 Responses to query requests contain a content, which is the actual result, along with some metadata about the query\footnote{is this called metadata or something else?}. For instance, a text query on 'snail farming', with the parameter 'maxDocs=10' might return the first 10 documents, and one of the query metadata items would be the total number of documents that matched the query.\footnote{no metadata about the query result is returned yet.} 754 510 755 511 The following shows some example query requests and their responses. 
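The request and response shapes described in this section can be sketched from the point of view of an external client. The following Python fragment is only an illustration and is not part of Greenstone3: the two helper functions are hypothetical names, the sample response is a hand-written copy of the format used in this section, and how the message is actually sent to the MessageRouter is left out entirely.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the response format used in this section
# (not output captured from a running system).
SAMPLE_RESPONSE = """
<message>
  <response lang='en' from='mgppdemo/TextQuery' type='query'>
    <content>
      <documentList>
        <document name='HASH010f073f22033181e206d3b7'/>
        <document name='HASH010f073f22033181e206d3b7.2'/>
        <document name='HASHac0a04dd14571c60d7fbfd'/>
      </documentList>
    </content>
  </response>
</message>
"""


def build_query_request(to, params, lang="en"):
    """Build a <message> holding one query request (hypothetical helper).

    `params` maps parameter names to values; the names should be ones
    advertised by the service in its describe response.
    """
    message = ET.Element("message")
    request = ET.SubElement(
        message, "request", {"lang": lang, "type": "query", "to": to}
    )
    param_list = ET.SubElement(request, "paramList")
    for name, value in params.items():
        ET.SubElement(param_list, "param", {"name": name, "value": str(value)})
    return message


def document_ids(response_xml):
    """Return the document names listed in a query response (hypothetical helper)."""
    root = ET.fromstring(response_xml)
    path = "./response/content/documentList/document"
    return [doc.get("name") for doc in root.findall(path)]


if __name__ == "__main__":
    request = build_query_request("mgppdemo/TextQuery", {"maxDocs": 10})
    print(ET.tostring(request, encoding="unicode"))
    print(document_ids(SAMPLE_RESPONSE))
```

The identifiers returned by the second helper are what a follow-up MetadataRetrieve or DocumentRetrieve request would carry in its content.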
… … 757 513 Find at most 10 Sections containing the word snail (stemmed), returning the results in unsorted order: 758 514 \begin{quote}\begin{footnotesize}\begin{verbatim} 759 <message lang='en'>760 <request to="mgppdemo/TextQuery" type="query">515 <message> 516 <request lang='en' to="mgppdemo/TextQuery" type="query"> 761 517 <paramList> 762 518 <param name="maxDocs" value="10"/> … … 774 530 775 531 \begin{quote}\begin{footnotesize}\begin{verbatim} 776 <message lang='en'>777 <response from="mgppdemo/TextQuery" type="query">532 <message> 533 <response lang='en' from="mgppdemo/TextQuery" type="query"> 778 534 <content> 779 < resourceList>780 < resourcename="HASH010f073f22033181e206d3b7"/>781 < resourcename="HASH010f073f22033181e206d3b7.2"/>782 < resourcename="HASHac0a04dd14571c60d7fbfd"/>783 </ resourceList>535 <documentList> 536 <document name="HASH010f073f22033181e206d3b7"/> 537 <document name="HASH010f073f22033181e206d3b7.2"/> 538 <document name="HASHac0a04dd14571c60d7fbfd"/> 539 </documentList> 784 540 </content> 785 541 </response> … … 789 545 Give me the Title metadata for these documents: 790 546 \begin{quote}\begin{footnotesize}\begin{verbatim} 791 <message lang='en'>792 <request to="mgppdemo/MetadataRetrieve" type="query">547 <message> 548 <request lang='en' to="mgppdemo/MetadataRetrieve" type="retrieve"> 793 549 <content> 794 < resourceList>795 < resourcename="HASH010f073f22033181e206d3b7"/>796 < resourcename="HASH010f073f22033181e206d3b7.2"/>797 < resourcename="HASHac0a04dd14571c60d7fbfd"/>798 </ resourceList>550 <documentList> 551 <document name="HASH010f073f22033181e206d3b7"/> 552 <document name="HASH010f073f22033181e206d3b7.2"/> 553 <document name="HASHac0a04dd14571c60d7fbfd"/> 554 </documentList> 799 555 <metadataList> 800 556 <metadata name="Title"/> … … 806 562 807 563 \begin{quote}\begin{footnotesize}\begin{verbatim} 808 <message lang='en'>809 <response from="mgppdemo/MetadataRetrieve" type="query">564 <message> 565 <response lang='en' 
from="mgppdemo/MetadataRetrieve" type="retrieve"> 810 566 <content> 811 < resourceList>812 < resourcename="HASH010f073f22033181e206d3b7">567 <documentList> 568 <document name="HASH010f073f22033181e206d3b7"> 813 569 <metadataList> 814 570 <metadata name="Title">Farming snails 1: … … 816 572 </metadata> 817 573 </metadataList> 818 </ resource>819 < resourcename="HASH010f073f22033181e206d3b7.2">574 </document> 575 <document name="HASH010f073f22033181e206d3b7.2"> 820 576 <metadataList> 821 577 <metadata name="Title">Learning about snails</metadata> 822 578 </metadataList> 823 </ resource>824 < resourcename="HASHac0a04dd14571c60d7fbfd">579 </document> 580 <document name="HASHac0a04dd14571c60d7fbfd"> 825 581 <metadataList> 826 582 <metadata name="Title">Farming snails 2: … … 828 584 </metadata> 829 585 </metadataList> 830 </ resource>831 </ resourceList>586 </document> 587 </documentList> 832 588 </content> 833 589 </response> … … 837 593 Give me the text for this document: 838 594 \begin{quote}\begin{footnotesize}\begin{verbatim} 839 <message lang='en'>840 <request to="mgppdemo/ResourceRetrieve" type="query">595 <message> 596 <request lang='en' to="mgppdemo/DocumentRetrieve" type="retrieve"> 841 597 <content> 842 < resourceList>843 < resourcename="HASH010f073f22033181e206d3b7.2"/>844 </ resourceList>598 <documentList> 599 <document name="HASH010f073f22033181e206d3b7.2"/> 600 </documentList> 845 601 </content> 846 602 </request> … … 849 605 850 606 \begin{quote}\begin{footnotesize}\begin{verbatim} 851 <message lang='en'>852 <response from="mgppdemo/ResourceRetrieve" type="query">607 <message> 608 <response lang='en' from="mgppdemo/DocumentRetrieve" type="retrieve"> 853 609 <content> 854 < resourcename="HASH010f073f22033181e206d3b7.2">610 <document name="HASH010f073f22033181e206d3b7.2"> 855 611 <content> 856 612 </B><P ALIGN="JUSTIFY"></P> … … 863 619 </P>.... 
864 620 </content> 865 </ resource>621 </document> 866 622 </content> 867 623 </response> … … 876 632 877 633 \begin{quote}\begin{footnotesize}\begin{verbatim} 878 <message lang='en'>879 <request type='build' to='build/NewCollection'>634 <message> 635 <request lang='en' type='process' to='build/NewCollection'> 880 636 <paramList> 881 637 <param name='creator' value='[email protected]'/> … … 886 642 </message> 887 643 888 <message lang='en'>889 <request type='build' to='build/ImportCollection'>644 <message> 645 <request lang='en' type='process' to='build/ImportCollection'> 890 646 <paramList> 891 647 <param name='collection' value='demo'/> … … 907 663 <page> 908 664 <config/> 909 < translate/>665 <display/> 910 666 <request/> 911 667 <response/> … … 913 669 \end{verbatim}\end{footnotesize}\end{quote} 914 670 915 There are four main elements in the page: config, translate, request, response. The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Translatecontains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization.916 917 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and translateinformation, and the xslt files.671 There are four main elements in the page: config, translate, request, response. 
The request is the original request that came into the Receptionist---this is included so that any parameters can be preset to their previous values, for example, the query options on the query form.\footnote{this should be saved instead in some sort of state saving - if you leave a page and go back you want your parameters to be the same as well}. The response contains all the data that has been gathered from the system by the action. The other two elements contain extra information needed by XSLT. Config contains run-time variables such as the location of the gsdl home directory, the current site name, the name of the executable that is running (eg library)---these are needed to allow the XSLT to generate correct HTML URLs. Display contains some of the text strings needed in the interface---these are separate from the XSLT to allow for internationalization. 672 673 The following subsections outline, for each action, what data is needed and what requests are generated to send to the system. Following that, Section~\ref{subsec:xslt} describes the config and display information, and the xslt files. 918 674 919 675 \subsubsection{Page action} … … 926 682 \subsubsection{Query action} 927 683 928 Currently, only text query has been implemented. 929 For each page, the service description is requested from the TextQuery service or the current collection (via a describe request). This is done every time the query page is 930 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If there is no query 931 string specified in the URL, only this information is needed---the request was for the blank query page. 932 If there is a query string specified, i.e. the user has entered a query, a query request to the TextQuery service is sent. This has the query string as content, and all the parameters from the URL in the parameter list. 
A list of document identifiers 684 There are three query services which have been implemented: TextQuery, SimpleFieldQuery, and AdvancedFieldQuery. These are all handled in the same way by query action. 685 For each page, the service description is requested from the service of the current collection (via a describe request). This is done every time the query page is 686 displayed.\footnote{This information should be cached.} The description includes a list of the parameters available for the query, such as case/stem, max num docs to return, etc. If the request type (rt) parameter is set to d for display, the action only needs to display the form, and this is the only request to the service. Otherwise, the submit button has been pressed, and a query request to the TextQuery service is sent. This has all the parameters from the URL put into the parameter list. A list of document identifiers 933 687 is returned. A followup query is sent to the MetadataRetrieve service of the collection: the content includes the list of 934 documents, with a request for their {\em Title} metadata. The resultis935 transformed using {\em textquery.xsl\/}.688 documents, with a request for their {\em Title} metadata. The service description and query result are combined into a page of xml, which is 689 transformed using {\em basicquery.xsl\/} to produce the html page. 936 690 937 691 \subsubsection{Applet action} … … 1046 800 current library servlet as its value. 1047 801 1048 \subsubsection{ Resourceaction}1049 1050 ResourceAction sends a query to the ResourceRetrieve service of the collection requesting the text of the specified document. At this stage no additional information is obtained, but in future stuff like Title and802 \subsubsection{Document action} 803 804 DocumentAction sends a query to the DocumentRetrieve service of the collection requesting the text of the specified document. 
At this stage no additional information is obtained, but in future stuff like Title and 1051 805 table of contents would be needed to make the display nicer. 1052 806 … … 1057 811 can override these files by having their own copy of the appropriate 1058 812 files. New interfaces have their own directory inside interfaces/. Sites and collections can have a transform directory containing XSLT files. The order in which the XSLT files are looked for is collection, site, current 1059 interface, default interface. 813 interface, default interface.\footnote{this currently breaks down for remote sites - need to rethink it a bit.} 1060 814 1061 815 \subsection{Internationalization} 1062 816 1063 Internationalization is a bit part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 1064 1065 At the moment:\footnote{this may change soon, so I haven't 'nice'd this text yet} 1066 1067 Language specific text strings are specified as xml files, named by 1068 the language code, eg en.xml, fr.xml. 1069 1070 They are located in interfaces/translate. This assumes one set of 1071 language files per system set up. (or should they be site/interface 1072 specific??) 1073 1074 A Translate class is used to hold the xml for the languages. The 1075 Receptionist has a Translate object. It sets the default language to 1076 be 'en', the current language is whatever a message lang attribute 1077 specifies. 1078 1079 The translation object internally holds DOM trees for the languages it 1080 has loaded. It has a mapping between language name and the tree. When 1081 the default language is set, the appropriate xml file is read in and 1082 parsed into a DOM tree. 1083 1084 A call to getLanguageTree(lang) returns a DOM element of the form: 1085 1086 \begin{quote}\begin{footnotesize}\begin{verbatim} 1087 <translate> 1088 <current><text>.. the actual text elems...</text></current> 1089 <default><text>.. 
the actual text elems...</text></default> 1090 <translate> 1091 \end{verbatim}\end{footnotesize}\end{quote} 1092 If the specified lang has not been loaded yet, it will be read into 1093 memory. Only languages which have been asked for are loaded into 1094 memory. But once loaded, they stay there. Will need to see how much 1095 memory this requires once we use full language files.---may need to 1096 limit the number of cached languages? or maybe only hold two in 1097 memory, and read them in from file again when a new one is asked for. 1098 1099 The xml files start with the {\em <text>\/} element. The elements are 1100 organized hierarchically. An example is the following. 1101 1102 \begin{quote}\begin{footnotesize}\begin{verbatim} 1103 <text> 1104 <common> 1105 <nzdl>New Zealand Digital Library</nzdl> 1106 <aboutpage>about page</aboutpage> 1107 <search>Search</search> 1108 <browse>Browse</browse> 1109 <applet>Applet</applet> 1110 <home>HOME</home> 1111 <on>on</on> 1112 <off>off</off> 1113 </common> 1114 <query> 1115 <queryoptions>Query Options:</queryoptions> 1116 <params><case><name>Case differences:</name> 1117 <on>ignore case differences</on> 1118 <off>upper/lower case must match</off></case> 1119 <stem><name>Word endings:</name> 1120 <on>ignore word endings</on> 1121 <off>whole word must match</off></stem> 1122 <sortBy><name>Sort results by:</name> 1123 <rank>rank</rank><natural>none</natural></sortBy> 1124 <maxDocs><name>Maximum number of documents to return:</name></maxDocs> 1125 <matchMode><name>Match mode:</name> 1126 <all>all</all><some>some</some></matchMode> 1127 <queryLevel><name>Level:</name><Section>Section</Section> 1128 <Document>Document</Document></queryLevel></params> 1129 <beginsearch>Begin Search</beginsearch> 1130 </query> 1131 </text> 1132 \end{verbatim}\end{footnotesize}\end{quote} 1133 Most of the text strings will be specified by the main xml files, but 1134 some will come from the services/collections. 
In this case, the lang 1135 attribute of the message will indicate which language text to return. 1136 1137 Text strings can added to the HTML output in two ways. In the XSLT, we 1138 know which text strings are needed, eg 'home' for the home link. home 1139 is in common/home, so we get the text by calling the text template 1140 with common/home as a param: 1141 1142 \begin{quote}\begin{footnotesize}\begin{verbatim} 1143 <xsl:call-template name='text'> 1144 <xsl:with-param name='key'>common/home</xsl:with-param> 1145 </xsl:call-template> 1146 \end{verbatim}\end{footnotesize}\end{quote} 1147 If we want to specify text strings in the xml result (rather than the 1148 XSLT---would we want to do this?), we can use 1149 {\footnotesize \verb#<text key='common/home'/>#}. 1150 {\footnotesize \verb#<xsl:apply-templates select='text'/>#} must then be used when 1151 processing the parent node. 1152 1153 The template is shown below. Basically, it looks for an appropriate 1154 element in the current language tree, and if its not found, it looks 1155 in the default language tree. 
1156 1157 \begin{quote}\begin{footnotesize}\begin{verbatim} 1158 <xsl:template name='text' match='text'> 1159 <xsl:param name='key'><xsl:value-of select='@key'/></xsl:param> 1160 <!-- try the current language --> 1161 <xsl:variable name='path1'> 1162 ancestor::page/translate/current/text/<xsl:value-of select='$key'/> 1163 </xsl:variable> 1164 <xsl:variable name='string1'><xsl:value-of 1165 select='java:org.apache.xalan.lib.Extensions.evaluate($path1)'/> 1166 </xsl:variable> 1167 <xsl:choose><xsl:when test='boolean(string($string1))'> 1168 <xsl:value-of select='$string1'/></xsl:when> 1169 <xsl:otherwise> 1170 <!-- try the default language --> 1171 <xsl:variable name='path2'> 1172 ancestor::page/translate/default/text/<xsl:value-of select='$key'/> 1173 </xsl:variable> 1174 <xsl:value-of select= 1175 'java:org.apache.xalan.lib.Extensions.evaluate($path2)'/> 1176 </xsl:otherwise> 1177 </xsl:choose> 1178 </xsl:template> 1179 \end{verbatim}\end{footnotesize}\end{quote} 1180 817 Internationalization is a big part of Greenstone3. Language specific text strings are separated out from the rest of the system to allow for easy incorporation of new languages. 818 819 Language specific text strings are specified in resource bundle property files. These live in resources/java. 820 821 There is a properties file per class, and one per interface. At the moment, we have 822 823 GS2MGPPSearch.properties 824 GS2MGPPRetrieve.properties etc - the service classes 825 826 interface_default.properties. - for the default interface 827 828 To add other languages, create eg GS2MGPPSearch_fr.properties. 829 830 The interface ones are treated differently from the other ones. The action doesn't know which text strings are needed by a particular transform, so it gets them all out of the properties file, and puts them into an xml $<$display$>$ element - the xslt can get the ones it needs from there. 
831 xslt could perhaps get the stuff from the properties bundle on the fly using java extension elements - would this be better? 832 833 All other class specific text strings are just retrieved one by one as they are needed and added into the xml - for example, the names for query params are retrieved when the service description is created. 1181 834 1182 835 \subsection{Collection formation} 1183 836 1184 1185 There is no facility to create collections in GSDL3 yet. There are three 1186 working servicesImpl classes: MGPPGDBMServices, GSDL2ClassifierServices and PhindServices---these use 1187 standard collections built with MGPP and gdbm from GSDL2. For 1188 PhindService, you need to add 'classify phind' to the collect.cfg file 1189 during building. For the GSDL2ClassifierServices you need to have any other classifiers specified. 1190 1191 To use a collection in GSDL3, build using mgpp in the old greenstone 1192 (see mgpp\_in\_greenstone.txt in the mgpp/docs directory of either 1193 gsdl). 1194 1195 Then copy the collection over into the appropriate collect directory, 1196 and create index/buildConfig.xml (see \ref{subsec:config}). The basic info 1197 that you need is shown below. Substitute the appropriate values for 1198 your collection. Only put the phind service one in if you have a phind 1199 classifier. 
1200 1201 \begin{quote}\begin{footnotesize}\begin{verbatim} 1202 <buildConfiguration> 1203 <metadataList> 1204 <metadata name="iconCollection">mgppdemo.gif</metadata> 1205 <metadata name="colName">mgpp demo</metadata> 1206 </metadataList> 1207 <servicesImplList> 1208 <servicesImpl name="MGPPGDBMServices"> 1209 <defaultIndex name="tt"/> 1210 <defaultLevel name='Section'/> 1211 </servicesImpl> 1212 <servicesImpl name="PhindServices"/> 1213 <servicesImpl name="GSDL2ClassifierServices"> 1214 <classifierList> 1215 <classifier name="CL1"> 1216 <metadataList> 1217 <metadata name="Title">Subject</metadata> 1218 </metadataList> 1219 </classifier> 1220 <classifier name="CL2" > 1221 <metadataList> 1222 <metadata name="Title">Title</metadata> 1223 </metadataList> 1224 </classifier> 1225 </classifierList> 1226 </servicesImpl> 1227 </servicesImplList> 1228 </buildConfiguration> 1229 \end{verbatim}\end{footnotesize}\end{quote} 837 Greenstone 2 compatible building has been implemented in gsdl3. So far, only mgpp collections will work. 838 839 Collection construction can be done through the web, using the build service cluster in localsite. Just sequence through the steps needed. So far, addDocument does not work, so documents need to be manually added to the import directory. 840 841 You need to carry out the following services: 842 NewCollection 843 - add docs to import directory 844 ImportCollection 845 BuildCollection 846 ActivateCollection 847 848 If you want anything other than the default for the config file, you need to add it by hand - there is currently no ConfigureCollection service which would enable you to do this. 
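The sequence of build services listed above can be written out as the process requests a client would send, one per step. This is a minimal sketch under stated assumptions rather than a description of the actual services: the creator parameter for NewCollection and the collection parameter for ImportCollection appear in the example messages earlier in the manual, but passing a collection parameter to every step is an assumption, and both function names are hypothetical. Documents still have to be copied into the import directory by hand between the first and second request.

```python
import xml.etree.ElementTree as ET

# Order of the build services as listed in the text.
BUILD_STEPS = [
    "NewCollection",
    "ImportCollection",
    "BuildCollection",
    "ActivateCollection",
]


def process_request(service, params, lang="en"):
    """Serialize one process request addressed to the build cluster
    (hypothetical helper)."""
    message = ET.Element("message")
    request = ET.SubElement(
        message,
        "request",
        {"lang": lang, "type": "process", "to": "build/" + service},
    )
    param_list = ET.SubElement(request, "paramList")
    for name, value in params.items():
        ET.SubElement(param_list, "param", {"name": name, "value": value})
    return ET.tostring(message, encoding="unicode")


def build_sequence(collection, creator):
    """Return the four request messages in the order they should be sent.

    Assumption: every step accepts a 'collection' parameter; only
    NewCollection is given the 'creator' parameter.
    """
    requests = []
    for step in BUILD_STEPS:
        params = {"collection": collection}
        if step == "NewCollection":
            params["creator"] = creator
        requests.append(process_request(step, params))
    return requests


if __name__ == "__main__":
    for xml_text in build_sequence("testcol", "someone@example.org"):
        print(xml_text)
```

Each of these requests returns a status code straight away, so a real client would poll for a status update before moving on to the next step.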
849 850 Collection building can also be done on the command line: 851 852 ConstructCollection -site <site-path> -mode new|import|build|activate [options] <coll-name> 853 854 eg 855 856 ConstructCollection -site /research/kjdon/home/gsdl3/sites/localsite -mode new -creator [email protected] testcol 857 858 the options get passed to the underlying script, - there is no good help message yet. 859 860 import and build use gs2 import.pl and buildcol.pl so you can specify any of their options if you like. 861 862 Building stuff is in src/java/org/greenstone/gsdl3/build. 863 864 CollectionConstructor is the base class for building control. GS2PerlConstructor is the implementation that uses greenstone 2 perl scripts. The building process sends events (ConstructionEvent) to any listeners (ConstructionListener) as important stages happen. You can add one or more listeners to the constructor which will get notified of events. 1230 865 1231 866 \section{Details} … … 1257 892 & Utility classes \\ 1258 893 gsdl3/src/java/org/greenstone/gsdl3/collection 1259 & Collection class\\894 & ServiceCluster and Collection classes\\ 1260 895 gsdl3/src/java/org/greenstone/gsdl3/comms 1261 896 & Communicator classes, eg SOAP\\ 897 gsdl3/src/java/org/greenstone/gsdl3/build 898 & stuff for collection building \\ 1262 899 gsdl3/src/java/org/greenstone/gsdl3/action 1263 900 & Action classes used by the Receptionist---do the work of displaying the pages\\ … … 1276 913 gsdl3/lib/java 1277 914 & Java jar files\\ 915 gsdl3/resources 916 & any resources that may be needed\\ 917 gsdl3/resources/java 918 & properties files for java resource bundles - used to handle all the language specific text\\ 919 gsdl3/bin 920 & executable stuff lives here\\ 921 gsdl3/bin/script 922 & some perl building scripts\\ 923 gsdl3/bin/linux 924 & linux executables for eg mgpp\\ 1278 925 gsdl3/comms 1279 926 & Put some stuff here for want of a better place---things to do with servers and communication. 
eg soap stuff, and tomcat servlet container\\ … … 1305 952 gsdl3/interfaces/default/transforms 1306 953 & The XSLT files\\ 1307 gsdl3/interfaces/translate1308 & Language specific stuff---language xml files containing all the text strings go here\\1309 954 \hline 1310 955 \end{tabular}} … … 1317 962 \newcommand{\gsdlhome}{\begin{footnotesize}{\em \$GSDL3HOME}\end{footnotesize}} 1318 963 1319 Cuurently, greenstone3 is only available through CVS. The installation procedure has not been automated. Eventually, all that will be needed (hopefully) will be a {\footnotesize \verb#configure, make, make install#} sequence. But for now, all the steps must be done by hand.964 Cuurently, greenstone3 is only available through CVS. The installation procedure has been automated. 1320 965 1321 966 \subsubsection{Get the source} … … 1340 985 1341 986 \noindent If you need it, the password for anonymous CVS access is {\footnotesize \verb#anonymous#}. 1342 \\ 1343 \\ 1344 \noindent You also need to download the mgpp code - it comes in a separate CVS module. 1345 1346 \noindent I once added a directory for mgpp in gsdl3/packages in cvs---now I can't get 1347 rid of it, so you need to delete it before you start. 987 988 \subsubsection{Compile and install greenstone}\label{subsec:compile} 989 990 An install.sh script has been constructed (thanks, Stuart) to compile and install greenstone 3. What you nee to do is: 1348 991 1349 992 \begin{footnotesize}\begin{tt} 1350 \noindent cd \gsdlhome/packages\\ 1351 rm -r mgpp\\ 1352 cvs co mgpp\\ 1353 \end{tt}\end{footnotesize} 1354 1355 \subsubsection{Compile and install greenstone}\label{subsec:compile} 1356 1357 \noindent From here on, \gsdlhome\ is the absolute path to the top-level directory of the gsdl3 checkout. 1358 For example, /research/kjdon/gsdl3. 
1359 \\ 1360 \\ 1361 \noindent First, set up your classpath:\\ 1362 \begin{footnotesize}\begin{tt} 1363 cd \gsdlhome\\ 993 cd gsdl3 994 source setup.bash 995 install.bash 1364 996 source setup.bash 1365 997 \end{tt}\end{footnotesize} 1366 998 1367 \noindent Note: this step needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc. 1368 \\ 1369 \\ 1370 \noindent Compile mgpp:\\ 999 If you want to do greenstone2 compatible building (currently the only type) you need to have greenstone 2 installed, 'source setup.bash' in the top level greenstone 2 directory, then re-'source setup.bash' for greenstone 3. This is to set GSDLHOME for tomcat. 1000 1001 \noindent Note: 'source setup.bash' needs to be done once in any xterm window before doing a make or running tomcat. setup.bash sets the environment variables {\footnotesize \verb#CLASSPATH#, \verb#PATH#, \verb#JAVA_HOME#} etc. 1002 1003 If you want to use SOAP to talk to remote sites, you also need to do the following: 1004 1371 1005 \begin{footnotesize}\begin{tt} 1372 cd \gsdlhome/packages/mgpp\\ 1373 ./configure --prefix \gsdlhome\\ 1374 make\\ 1375 make install\\ 1006 install-soap.bash 1376 1007 \end{tt}\end{footnotesize} 1377 1008 1378 \noindent Note: you need to use \gsdlhome\ as the prefix for mgpp's configure at this stage---mgpp has been set up properly, but gsdl3 hasn't. 1379 1380 \noindent Next you need to compile greenstone. 1381 1382 \noindent A jar file is used from tomcat during compilation, so this must be unpacked first. 
1383 \begin{footnotesize}\begin{tt} 1384 cd \gsdlhome/comms/tomcat/\\ 1385 tar xzvf jakarta-tomcat-4.0.1.tar.gz \\ 1386 \end{tt}\end{footnotesize} 1387 \\ 1388 \\ 1389 \noindent Do a \verb#make#, then a \verb#make install# in each of the following directories:\\ 1390 \begin{footnotesize}\begin{tt} 1391 \gsdlhome/src/java/org/greenstone/gdbm\\ 1392 \gsdlhome/src/java/org/greenstone/testing\\ 1393 \gsdlhome/src/java/org/greenstone/gsdl3\\ 1394 \gsdlhome/src/java/org/greenstone/applet/phind 1395 \end{tt}\end{footnotesize} 1396 1397 \subsubsection{Set up the sample sites} 1009 Thats it. 1010 1011 You dont want to run install.bash twice - it adds stuff into files 1012 1013 To update your installation, you can run update.bash - this remakes all the java stuff. 1014 1015 1016 \subsubsection{The sample sites} 1398 1017 1399 1018 \noindent There are two greenstone ``sites'' that come with the checkout: localsite, and site1. localsite has several collections, only two of which have any actual data. The third is a dummy collection. site1 has one dummy collection. Each site has a configuration file which specifies the site name, site-wide services if any, and a list of remote sites to connect to. … … 1402 1021 \noindent The collections which do not have data can be looked at but you cant do any queries on them. 1403 1022 1404 \noindent The data comes in tar files, which need to be unpacked: 1405 1406 \begin{footnotesize}\begin{tt} 1407 \noindent cd \gsdlhome/sites/localsite/collect/mgppdemo/index/\\ 1408 tar xzvf mgpp-indexfiles.tar.gz\\ 1409 cd ../../chinesedemo/index\\ 1410 tar xzvf chinese-index-files.tar.gz\\ 1411 \end{tt}\end{footnotesize} 1412 1413 \subsubsection{Set up tomcat} 1023 1024 \subsubsection{Tomcat} 1414 1025 1415 1026 \noindent Tomcat is a servlet container. It is used to serve a greenstone site using a servlet. 
… … 1428 1039 \end{verbatim}\end{footnotesize} 1429 1040 1430 \noindent You need to replace {\footnotesize \verb#/research/kjdon/home/gsdl3#} with the correct path for \gsdlhome, in both library servlet entries. 1431 \\ 1432 \\ 1433 \noindent Next, symbolic links to the sites, interfaces and lib directories need to be set up---this enables tomcat to 'see' files in these directories. 1434 1435 \begin{footnotesize}\begin{tt} 1436 \noindent cd \gsdlhome/web\\ 1437 ln -s ../interfaces\\ 1438 ln -s ../sites\\ 1439 ln -s ../lib 1440 \end{tt}\end{footnotesize} 1441 1442 \noindent The test servlet needs to be compiled: (you need to set up your {\footnotesize CLASSPATH} if you haven't already, see \ref{subsec:compile})\\ 1443 \begin{footnotesize}\begin{tt} 1444 \noindent cd \gsdlhome/web/WEB-INF/classes\\ 1445 javac TestServlet.java 1446 \end{tt}\end{footnotesize} 1447 \\ 1448 \\ 1449 \noindent Next, one of the scripts that runs tomcat needs to be altered to use our {\footnotesize CLASSPATH}. 1450 1451 \begin{footnotesize}\begin{tt} 1452 \noindent cd \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1 1453 \end{tt}\end{footnotesize} 1454 \\ 1455 \\ 1456 \noindent edit {\footnotesize \verb#bin/catalina.sh#}: 1457 1458 \noindent on line 89 add {\footnotesize \$CLASSPATH} to the {\footnotesize CP="...."} line ie. 
{\footnotesize CP="\$CLASSPATH:..."}---this 1459 sets up the classpath properly 1460 \\ 1461 \\ 1462 \noindent Now you need to tell tomcat about the greenstone context: 1463 \\ 1464 \\ 1465 \noindent edit {\footnotesize \verb#conf/server.xml#}: 1466 1467 \noindent you need to add a context for gsdl servlets---there are other context elements in the xml---this one goes at the same level as those ones.\\ 1468 add the following (putting the correct path for \gsdlhome) 1469 1470 \begin{footnotesize}\begin{tt} 1471 \noindent <!-- GSDL3 Service -->\\ 1472 <Context path="/gsdl3" docBase="\gsdlhome/web" debug="1" reloadable="true"/> 1473 \end{tt}\end{footnotesize} 1041 The file \gsdlhome/comms/tomcat/jakarta/conf/server.xml is the tomcat configuration file. setup.bash adds a context for gsdl servlets - this tells tomcat where to find the web.xml file, and what url (eg /gsdl3) to give it. 1474 1042 1475 1043 \noindent Note: tomcat runs on port 8080 - you can change that if you wish in this file … … 1507 1075 \\ 1508 1076 \noindent The SOAP server we use is actually run as a servlet in tomcat. You need to set up SOAP, set up the SOAP server class which will be your service, and then deploy that service. 
1509 \\ 1510 \\ 1511 \noindent Set up SOAP: 1512 \\ 1513 \\ 1514 \begin{footnotesize}\begin{tt} 1515 \noindent cd \gsdlhome/comms/soap\\ 1516 tar xzvf soap-bin-2.2.tar.gz 1517 \end{tt}\end{footnotesize} 1518 \\ 1519 \\ 1520 \noindent The context for the SOAP servlet needs to be added to the tomcat server.xml file in the same way that you added the context for gsdl3: 1521 1522 \noindent edit \begin{footnotesize}{\tt \gsdlhome/comms/tomcat/jakarta-tomcat-4.0.1/conf/server.xml}\end{footnotesize} 1523 1524 \noindent add the following (put the proper path for \gsdlhome) 1525 1526 \begin{footnotesize}\begin{tt} 1527 \noindent <!-- SOAP Service -->\\ 1528 <Context path="/soap" docBase="\gsdlhome/comms/soap/soap-2\_2/webapps/soap"\\ 1529 debug="1" reloadable="true"/> 1530 \end{tt}\end{footnotesize} 1531 \\ 1532 \\ 1533 \noindent Next, the class SOAPServer must be altered---the constructor is not allowed any arguments, so it has a path hard coded in it. This is the address of the site that is to be served. In \begin{footnotesize}{\tt \gsdlhome/src/java/org/greenstone/gsdl3/SOAPServer.java}\end{footnotesize}, you need to change the {\footnotesize \verb#site_home#} variable to \begin{footnotesize}{\tt \gsdlhome/sites/localsite}\end{footnotesize} (using the absolute path). 1534 \\ 1535 \\ 1536 \noindent The SOAPServer service now needs to be deployed. If tomcat is not running, start it up (see \ref{subsec:runtomcat}). 1077 1078 this is done by install-soap.bash. 1079 You can also deploy a service through the website. If tomcat is not running, start it up (see \ref{subsec:runtomcat}). 1537 1080 1538 1081 \noindent The SOAP servlet can be accessed at \begin{footnotesize}{\tt http://localhost:8080/soap}\end{footnotesize}. You should see a welcome page. Click on ``Run the admin client''. This enables you to list, deploy and undeploy SOAP services.