source: documented-examples/trunk/pagedimg-e/resources/collectionConfig.properties@ 36249

Last change on this file since 36249 was 36249, checked in by anupama, 23 months ago

Another DEC collection now runs. The collection descriptions are still GS2-specific, but will need to discuss with Kathy later before fixing on final versions. Also need to ask her how the section_text variable can get loaded when referenced from collectionConfig.xml (it's now defined in resources\collectionConfig.properties).

File size: 3.8 KB
name=Paged Image example
section_text=newspaper pages

shortDescription=<p>This collection contains a few newspapers from the <a href='http://www.nzdl.org/cgi-bin/library?a=p&amp;p=about&amp;c=niupepa'> Niupepa</a> collection of Maori newspapers.</p>

description1=<h3>How the collection works</h3> <p>Each newspaper issue consists of a set of images, one per page, and a set of text files for the OCR'd text. An item file links the set of pages into a single newspaper document. PagedImagePlugin is used to process the item files.</p>

description2=<p>There are two styles of item files, and this collection demonstrates both. The first uses a text-based format, and consists of a list of metadata for the document, and a list of pages. Here are some examples: <a href='_httpcollection_/import/09/09\_1\_1.item'>Te Waka o Te Iwi, Vol. 1, No. 1</a>, <a href='_httpcollection_/import/10/10\_1\_3.item'>Te Whetu o Te Tau, Vol. 1, No. 3</a>. This format allows specification of document-level metadata, and a single list of pages.</p>

description3=<p>The second style is an extended format, and uses XML. It allows a hierarchy of pages, and metadata specification at the page level as well as at the document level. An example is <a href='_httpcollection_/import/xml/23/23\_\_2.item'>Matariki 1881, No. 2</a>. This newspaper also has an abstract associated with it. The contents have been grouped into two sections: Supplementary Material, which contains the Abstract, and Newspaper Pages, which contains the page images.</p>

description4=<p>Paged documents can be presented with a hierarchical table of contents (e.g. <a href='?a=d&amp;amp;d=23\_\_1.2.1&amp;sa=text'>this one</a>), or with next and previous page arrows, and a goto page box (e.g. <a href='?a=d&amp;amp;d=10\_1\_2&amp;sa=preview'>this one</a>). This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin. The next and previous arrows suit the linear sequence documents, while the table of contents suits the hierarchically organised document. Ordinarily, a Greenstone collection would have one plugin per document type, and all documents of that type get the same processing. In this case, we want to treat the XML-based item files differently from the text-based item files. We can achieve this by adding two PagedImagePlugin plugins to the collection, and configuring them differently.</p>

description5=<p><tt>plugin PagedImagePlugin -documenttype hierarchy -process_exp xml.*\.item$; <br/> plugin PagedImagePlugin -documenttype paged</tt></p>

description6=<p>XML-based newspapers have been grouped into a folder called <tt>xml</tt>. This enables us to process these files differently, by utilising the <tt>process_exp</tt> option which all plugins support. The first PagedImagePlugin in the list looks for item files underneath the xml folder. These documents will be processed as hierarchical documents. Item files that don't match the process expression (i.e. aren't underneath the xml folder) will be passed on to the second PagedImagePlugin, and these are treated as paged documents.</p>

description7=<p><b>Formatting</b></p> <p>We have modified the document formatting to display fullsized images, preview images or text, with buttons to switch between them. This involves modifications to the DocumentHeading and DocumentText format statements in the <a href='_httpcollection_/etc/collect.cfg' target='collect.cfg'>collection configuration file</a>, and some macro definitions in the <a href='_httpcollection_/macros/extra.dm' target='extra.dm'>extra.dm macro file</a>. The extra.dm macro file provides definitions for the buttons (\_viewfullsize\_, \_viewpreview\_, \_viewtext\_) which are used by the format statement in the collect.cfg file. The format statement switches the document display and sets the buttons to be displayed based on the p argument, which is also set by the format statement.</p>
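
The plugin ordering described in description5 and description6 can be illustrated with a small sketch. This is not Greenstone code: the `PLUGINS` list and the `choose_document_type` function are hypothetical names, and the `\.item$` expression for the second plugin is an assumption, since the file gives no explicit `process_exp` for it. The sketch only shows the "first matching plugin wins" behaviour the descriptions outline.

```python
import re

# Hypothetical stand-in for the two plugin lines in description5:
# (plugin name, -documenttype value, process expression).
PLUGINS = [
    ("PagedImagePlugin", "hierarchy", r"xml.*\.item$"),
    # Assumption: the second plugin accepts any remaining .item file.
    ("PagedImagePlugin", "paged", r"\.item$"),
]


def choose_document_type(path, plugins=PLUGINS):
    """Return the documenttype of the first plugin whose expression matches path."""
    for _name, doctype, process_exp in plugins:
        if re.search(process_exp, path):
            return doctype
    return None  # no plugin in the list claims this file


if __name__ == "__main__":
    print(choose_document_type("import/xml/23/23__2.item"))  # hierarchy
    print(choose_document_type("import/09/09_1_1.item"))     # paged
```

Because the hierarchy plugin is listed first, any item file whose path contains `xml` ahead of the `.item` suffix is claimed before the paged plugin sees it. Swapping the two entries would send every item file to the paged plugin under the assumed expression.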