- Timestamp: 2022-08-26T19:23:36+12:00
- File: 1 edited
documented-examples/trunk/pagedimg-e/resources/collectionConfig.properties
--- collectionConfig.properties (r36477)
+++ collectionConfig.properties (r36518)
@@ -2,11 +2,11 @@
 section_text=newspaper pages
 
-shortDescription=<p>This collection contains a few newspapers from the <a href='http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=niupepa'> Niupepa</a> collection of Maori newspapers.</p>
+shortDescription=<p>This collection contains a few newspapers from the <a href='http\://www.nzdl.org/cgi-bin/library?a=p&p=about&c=niupepa'> Niupepa</a> collection of Maori newspapers.</p>
 
 description1=<h3>How the collection works</h3> <p>Each newspaper issue consists of a set of images, one per page, and a set of text files for the OCR\'d text. An item file links the set of pages into a single newspaper document. PagedImagePlugin is used to process the item files.</p>
 
-description2=<p>There are two styles of item files, and this collection demonstrates both. The first uses a text based format, and consists of a list of metadata for the document, and a list of pages. Some examples are: <i>Te Waka o Te Iwi, Vol. 1, No. 1</i> (in <tt>import/09/09_1_1.item</tt>) and <i>Te Whetu o Te Tau, Vol. 1, No. 3</i> (in <tt>import/10/10_1_3.item</tt>. This format allows specification of document level metadata, and a single list of pages.</p>
+description2=<p>There are two styles of item files, and this collection demonstrates both. The first uses a text based format, and consists of a list of metadata for the document, and a list of pages. Some examples are\: <i>Te Waka o Te Iwi, Vol. 1, No. 1</i> (in <tt>import/09/09_1_1.item</tt>) and <i>Te Whetu o Te Tau, Vol. 1, No. 3</i> (in <tt>import/10/10_1_3.item</tt>. This format allows specification of document level metadata, and a single list of pages.</p>
 
-description3=<p>The second style is an extended format, and uses XML. It allows a hierarchy of pages, and metadata specification at the page level as well as at the document level. An example is <i>Matariki 1881, No. 2</i> in <tt>import/xml/23/23__2.item</tt>. This newspaper also has an abstract associated with it. The contents have been grouped into two sections: Supplementary Material, which contains the Abstract, and Newspaper Pages, which contains the page images.</p>
+description3=<p>The second style is an extended format, and uses XML. It allows a hierarchy of pages, and metadata specification at the page level as well as at the document level. An example is <i>Matariki 1881, No. 2</i> in <tt>import/xml/23/23__2.item</tt>. This newspaper also has an abstract associated with it. The contents have been grouped into two sections\: Supplementary Material, which contains the Abstract, and Newspaper Pages, which contains the page images.</p>
 
 description4=<p>Paged documents can be presented with a hierarchical table of contents (e.g. <a href="library/collection/pagedimg-e/document/23__1?ed=1">23__1.2.1</a>), or with a single-depth structure (e.g. <a href="library/collection/pagedimg-e/document/10_1_2">10_1_2</a>). This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin. Ordinarily, a Greenstone collection would have one plugin per document type, and all documents of that type get the same processing. In this case, we want to treat the XML-based item files differently from the text-based item files. We can achieve this by adding two PagedImagePlugin plugins to the collection, and configuring them differently.</p>
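The only change in this revision is a backslash inserted before three colons inside property values. As a minimal sketch, assuming the file is read with `java.util.Properties` semantics (the changeset itself does not say which reader consumes it), the escape is dropped on load, so the text shown to users is the same before and after. The class and method names below are hypothetical, for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: loads one "key=value" line the way
// java.util.Properties does and returns the resulting value.
public class ColonEscapeDemo {

    static String valueOf(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            // StringReader does not perform real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the r36477 and r36518 forms of a value.
        String before = "example=<a href='http://www.nzdl.org/'>Niupepa</a>";
        String after  = "example=<a href='http\\://www.nzdl.org/'>Niupepa</a>"; // file text: http\://

        String v1 = valueOf(before, "example");
        String v2 = valueOf(after, "example");

        System.out.println(v1);
        System.out.println(v2);
        // Both load as <a href='http://www.nzdl.org/'>Niupepa</a>
        System.out.println(v1.equals(v2)); // prints: true
    }
}
```

In this format an unescaped `:` only acts as a key/value separator before the key ends; once the `=` has been read, later colons are kept literally, and `\:` is unescaped to `:`. Both forms therefore load to the same string. Whether some other tool that reads these files requires the escape is not stated in the changeset.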