source: documented-examples/trunk/oai-e/resources/ 36558

Last change on this file since 36558 was 36558, checked in by anupama, 5 months ago

These were missed out in previous commit: Fixes to URLs and eacute html entity reference in oai-e, bibtex-e and dls-e. The eacute html entity change was so the English string would work on GTI, I wasn't able to get entities other than less than and greater than to appear on GTI. 2 Updates to the out-of-date Russian DEC coll description translations to make them DEC friendly.

name=OAI demo
index_Description=photo captions
shortDescription=<p>This collection demonstrates Greenstone\'s <i>ImportFrom</i> feature. Using the <a href="http\://">Open Archive Protocol</a> (version 1.1), it retrieves metadata from <a href="http\://"></a>, a collection of photographs taken at the inaugural <a href="http\://">Joint Conference on Digital Libraries</a>. A Greenstone collection is built from the records exported from this OAI data provider. The implementation is flexible enough to cope with the minor syntax differences between OAI 1.1 and OAI 2.0.</p>
description1=<h3>How the collection works</h3><p>The collection configuration file, <tt>collectionConfig.xml</tt>, includes an <i>acquire</i> line that is interpreted by a special program called <i></i>. Like other Greenstone programs, it takes the name of the collection as its argument and prints a summary of its other arguments when invoked with <i>-help</i>. It reads the collection configuration file, finds the acquire line, and processes it. In this case, it is run with the command\: <pre> oai-e </pre> (the collection\'s name is <i>oai-e</i>).</p>
description2=<p>The <i>acquire</i> line in the configuration file specifies the OAI protocol and gives the base URL of an OAI repository. The <i>importfrom</i> program downloads all the metadata in that repository into the collection\'s <i>import</i> directory. The <i>getdoc</i> argument instructs it to also download the collection\'s source documents, whose URLs are given in each document\'s Dublin Core <i>Identifier</i> field (this is a common convention). The metadata files, each of which contains an XML record for one source document, are placed in the <i>import</i> file structure along with the documents themselves, and each document\'s filename is the same as the filename in its URL. The <i>Identifier</i> field is overridden to give the local filename, and its original value is retained in a new field called <i>OrigURL</i>.</p>
description3=<p>This <i>oai-e</i> collection\'s own <tt>etc/oai.txt</tt> is an example of a downloaded metadata file.</p>
description4=<p>Once the OAI information has been imported, the collection is processed in the usual way. Besides the four standard plugins (GreenstoneXMLPlugin, MetadataXMLPlugin, ArchivesInfPlugin and DirectoryPlugin), the configuration file specifies the OAI plugin, which processes OAI metadata, and the image plugin, because in this case the collection\'s source documents are image files. The OAI plugin has been supplied with an <i>input_encoding</i> argument because data in this archive contains extended characters. It also has a <i>default_language</i> argument. Greenstone normally determines the language of documents automatically, but these metadata records are too small for this to be done reliably\: hence English is specified explicitly in the <i>default_language</i> argument. The OAI plugin parses the metadata and passes it to the appropriate source document file, which is then processed by an appropriate plugin -- in this case <i>ImagePlugin</i>. This plugin specifies the resolution for the screen versions of the images.</p>
description5=<p>Metadata extracted from OAI records is mapped to the Dublin Core metadata set by default. As a result, the classifiers and indexes in this collection are built with Dublin Core metadata elements.</p>
description6=<p>The collection configuration file, <tt>collectionConfig.xml</tt>, specifies a single full-text index containing <i>dc.Description</i> metadata and overrides Greenstone\'s custom <i>gsf</i> format templates <tt>DocumentHeading</tt> and <tt>DocumentContent</tt> (XSL). When a document is displayed, the <i>DocumentHeading</i> format statement outputs its <i>dc.Subject</i>. The <i>DocumentContent</i> statement follows this with <i>screenicon</i>, which is produced by <i>ImagePlugin</i> and gives a screen-resolution version of the image; it can be hyperlinked to the <i>dc.OrigURL</i> metadata -- that is, the original version of the image on the remote OAI site. Since this is no longer available on the web, it is now hyperlinked to the full version of the image file. This is followed by the image\'s <i>dc.Description</i>, also with a hyperlink; the image\'s size and type, again generated as metadata by <i>ImagePlugin</i>; and then <i>dc.Subject</i>, <i>dc.Publisher</i>, and <i>dc.Rights</i> metadata. <a href="library/collection/oai-e/document/01dle6">This</a> is the result.</p>
description7=<p>There are two browsing classifiers, one based on <i>dc.Subject</i> metadata and the other on <i>dc.Description</i> metadata (but with a button named "captions"). Recall that the <i>AZCompactList</i> classifier is like <i>AZList</i> but generates a bookshelf for duplicate items. In this collection there are a lot of images but only a few different values for <i>dc.Subject</i> metadata.</p>
description8=<p>It\'s a little surprising that <i>AZCompactList</i> is used (instead of <i>AZList</i>) for the <i>dc.Description</i> index too, because <i>dc.Description</i> metadata is usually unique for each image. However, in this collection the same description has occasionally been given to several images, and some of the divisions in an <i>AZList</i> would contain a large number of images, slowing down transmission of that page. To avoid this, the compact version of the list is used with some arguments (<i>mincompact</i>, <i>maxcompact</i>, <i>mingroup</i>, <i>minnesting</i>) to control the display -- e.g. groups (represented by bookshelves) are not formed unless they have at least 5 (<i>mingroup</i>) items. To find out the meaning of the other arguments for this classifier, execute the command <i> AZCompactList</i>. The programs <i></i> (for classifiers) and <i></i> (for plugins) are useful tools for learning about the capabilities of Greenstone modules. Note incidentally the backslash in the configuration file, used to indicate a continuation of the previous line.</p>
description9=<p>The <i>VList</i> format specification shows the image thumbnail, hyperlinked to the associated document, followed by <i>dc.Description</i> metadata; the result can be seen in the <a href="library/collection/oai-e/browse/CL2">CL2</a> classifier browser. The <i>VLists</i> for the classifiers use <i>numleafdocs</i> to switch between an icon representing several documents (which will appear as a bookshelf) and the thumbnail itself, if there is only one image.</p>
description10=<h3>The Greenstone OAI server</h3><p>Greenstone comes with a built-in OAI data provider. This runs as a CGI program called "oaiserver.cgi", and is installed in the Greenstone <i>cgi-bin</i> directory. It can be accessed via the same URL as the Greenstone library (replacing "library.cgi" with "oaiserver.cgi"). If you are using the Windows local library server, you must install a web server (such as Apache) to run the OAI server.</p>
description11=<p>Configuration of the server is done via the <i>oai.cfg</i> file in the Greenstone <i>etc</i> directory. This file specifies general information about the repository, and lists collections to be made accessible to OAI clients. By default, collections are not accessible. To enable a collection, add its name to the <i>oaicollection</i> list.</p>
description12=<p>Greenstone\'s OAI server currently supports Dublin Core, qualified Dublin Core and rfc1807 metadata sets. The <i>oaimetadata</i> line specifies which sets should be used. For collections that use other metadata sets, metadata mapping rules should be provided to map the existing metadata to the sets in use. See the <i>oai.cfg</i> file for details.</p>