Changeset 36477

2022-08-24T20:04:52+12:00

Reinstated as many internal links in the DEC collection descriptions for oai-e and pagedimg-e as I could; there weren't many. Also made some more minor corrections.

2 edited


  • documented-examples/trunk/oai-e/resources/

    r36404 r36477  
    1717description2=<p>The <i>acquire</i> line in the configuration file specifies the OAI protocol and gives the base URL of an OAI repository. The <i>importfrom</i> program downloads all the metadata in that repository into the collection\'s <i>import</i> directory. The <i>getdoc</i> argument instructs it to also download the collection\'s source documents, whose URLs are given in each document\'s Dublin Core <i>Identifier</i> field (this is a common convention). The metadata files, which each contain an XML record for one source document, are placed in the <i>import</i> file structure along with the documents themselves, and the document filename is the same as the filename in the URL. The <i>Identifier</i> field is overridden to give the local filename, and its original value is retained in a new field called <i>OrigURL</i>.</p>
    19 description3=<p>The <tt>etc/oai.txt</tt> is an example of a downloaded metadata file.</p>
     19description3=<p>This <i>oai-e</i> collection's own <tt>etc/oai.txt</tt> is an example of a downloaded metadata file.</p>
    2121description4=<p>Once the OAI information has been imported, the collection is processed in the usual way. Besides the four standard plugins (GreenstoneXMLPlugin, MetadataXMLPlugin, ArchivesInfPlugin and DirectoryPlugin), the configuration file specifies the OAI plugin, which processes OAI metadata, and the image plugin, because in this case the collection\'s source documents are image files. The OAI plugin has been supplied with an <i>input_encoding</i> argument because data in this archive contains extended characters. It also has a <i>default_language</i> argument. Greenstone normally determines the language of documents automatically, but these metadata records are too small for this to be done reliably: hence English is specified explicitly in the <i>language</i> argument. The OAI plugin parses the metadata and passes it to the appropriate source document file, which is then processed by an appropriate plugin -- in this case <i>ImagePlugin</i>. This plugin specifies the resolution for the screen versions of the images.</p>
    2929description8=<p>It\'s a little surprising that <i>AZCompactList</i> is used (instead of <i>AZList</i>) for the <i>dc.Description</i> index too, because <i>dc.Description</i> metadata is usually unique for each image. However, in this collection the same description has occasionally been given to several images, and some of the divisions in an <i>AZList</i> would contain a large number of images, slowing down transmission of that page. To avoid this, the compact version of the list is used with some arguments (<i>mincompact</i>, <i>maxcompact</i>, <i>mingroup</i>, <i>minnesting</i>) to control the display -- e.g. groups (represented by bookshelves) are not formed unless they have at least 5 (<i>mingroup</i>) items. To find out the meaning of the other arguments for this classifier, execute the command <i>classinfo.pl AZCompactList</i>. The programs <i>classinfo.pl</i> (for classifiers) and <i>pluginfo.pl</i> (for plugins) are useful tools for learning about the capabilities of Greenstone modules. Note incidentally the backslash in the configuration file, used to indicate a continuation of the previous line.</p>
    31 description9=<p>The <i>VList</i> format specification shows the image thumbnail, hyperlinked to the associated document, followed by <i>dc.Description</i> metadata; the result can be seen in the <tt>CL2</tt> classifier browser. The <i>Vlists</i> for the classifiers use <i>numleafdocs</i> to switch between an icon representing several documents (which will appear as a bookshelf) and the thumbnail itself, if there is only one image.</p>
     31description9=<p>The <i>VList</i> format specification shows the image thumbnail, hyperlinked to the associated document, followed by <i>dc.Description</i> metadata; the result can be seen in the <a href="library/collection/oai-e/browse/CL2">CL2</a> classifier browser. The <i>Vlists</i> for the classifiers use <i>numleafdocs</i> to switch between an icon representing several documents (which will appear as a bookshelf) and the thumbnail itself, if there is only one image.</p>
    3333description10=<h3>The Greenstone OAI server</h3><p>Greenstone comes with a built-in OAI data provider. This runs as a CGI program called "oaiserver.cgi", and is installed in the Greenstone <i>cgi-bin</i> directory. It can be accessed via the same URL as the Greenstone library (replacing "library.cgi" with "oaiserver.cgi"). If you are using the Windows local library server, you must install a web server (such as Apache) to run the OAI server.</p>
  • documented-examples/trunk/pagedimg-e/resources/

    r36454 r36477  
    1010description3=<p>The second style is an extended format, and uses XML. It allows a hierarchy of pages, and metadata specification at the page level as well as at the document level. An example is <i>Matariki 1881, No. 2</i> in <tt>import/xml/23/23__2.item</tt>. This newspaper also has an abstract associated with it. The contents have been grouped into two sections: Supplementary Material, which contains the Abstract, and Newspaper Pages, which contains the page images.</p>
    12 description4=<p>Paged documents can be presented with a hierarchical table of contents (e.g. <tt>23__1.2.1</tt>), or with next and previous page arrows, and a goto page box (e.g. <tt>10_1_2</tt>). This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin. The next and previous arrows suit the linear sequence documents, while the table of contents suits the hierarchically organised document. Ordinarily, a Greenstone collection would have one plugin per document type, and all documents of that type get the same processing. In this case, we want to treat the XML-based item files differently from the text-based item files. We can achieve this by adding two PagedImagePlugin plugins to the collection, and configuring them differently.</p>
     12description4=<p>Paged documents can be presented with a hierarchical table of contents (e.g. <a href="library/collection/pagedimg-e/document/23__1?ed=1">23__1.2.1</a>), or with a single-depth structure (e.g. <a href="library/collection/pagedimg-e/document/10_1_2">10_1_2</a>). This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin. Ordinarily, a Greenstone collection would have one plugin per document type, and all documents of that type get the same processing. In this case, we want to treat the XML-based item files differently from the text-based item files. We can achieve this by adding two PagedImagePlugin plugins to the collection, and configuring them differently.</p>
    14 description5=<p><tt>plugin PagedImagePlugin -documenttype hierarchy -process_exp xml.*\.item$; <br/> plugin PagedImagePlugin -documenttype paged</tt></p>
     14description5=<p><tt>plugin PagedImagePlugin -documenttype hierarchy -process_exp xml.*\.item$ ... <br/> plugin PagedImagePlugin -documenttype paged ...</tt></p>
    1616description6=<p>XML based newpapers have been grouped into a folder called <tt>xml</tt>. This enables us to process these files differently, by utilising the <tt>process_exp</tt> option which all plugins support. The first PagedImagePlugin in the list looks for item files underneath the xml folder. These documents will be processed as hierarchical documents. Item files that don\'t match the process expression (i.e. aren\'t underneath the xml folder) will be passed onto the second PagedImagePlugin, and these are treated as paged documents.</p>
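
Reviewer's sketch (not part of the changeset): the routing that description5 and description6 describe can be illustrated roughly as below. This is a minimal Python approximation, not Greenstone's plugin code. It assumes that process_exp is applied as an unanchored regular-expression search against the file path, and that the second PagedImagePlugin claims any remaining file ending in .item. The function name route and the non-XML example path are hypothetical.

```python
import re

# Pattern taken from description5: the first plugin's -process_exp.
# Assumption: it is applied as an unanchored search on the file path.
XML_ITEM_EXP = re.compile(r"xml.*\.item$")


def route(path: str) -> str:
    """Return the -documenttype of the plugin that would claim this path.

    Plugins are tried in the order they are listed, so the XML-specific
    plugin sees each file before the general one does.
    """
    if XML_ITEM_EXP.search(path):
        return "hierarchy"   # first PagedImagePlugin
    if path.endswith(".item"):
        return "paged"       # second PagedImagePlugin (assumed default match)
    return "unclaimed"       # left for other plugins in the list


if __name__ == "__main__":
    # First path is from description3; the second is a hypothetical example.
    for p in ["import/xml/23/23__2.item", "import/text/10_1_2.item"]:
        print(p, "->", route(p))
```

The order of the two plugin lines matters under this reading: if the general plugin came first, it would claim the XML item files before the more specific expression was ever tried.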