creator [email protected]
maintainer [email protected]
public true

indexes section:text
defaultindex section:text

plugin GreenstoneXMLPlugin
# We want the two types of paged documents, paged and hierarchical, to be
# treated differently. So include two PagedImagePlugin plugins and restrict
# the first one with -process_exp.
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged
plugin MetadataXMLPlugin
plugin ArchivesInfPlugin
plugin DirectoryPlugin
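# Illustration only, assuming the import/ layout linked from the collection
# description further down. With the two PagedImagePlugin entries ordered as
# they are in this file, item files would be routed roughly like this:
#   import/xml/23/23__2.item  matches xml.*\.item$   -> first PagedImagePlugin (hierarchy)
#   import/09/09_1_1.item     does not match         -> second PagedImagePlugin (paged)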

classify AZCompactList -metadata Series -sort Date
classify DateList

# Format statements to display Series, Volume, Number and Date information

format DocumentVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[Series],[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]},[highlight]{Or}{[Title],[PageNum]}[/highlight]}</td>"

format CL1VList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[numleafdocs],[Title],{If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]} ([Date])}</td>"

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[parent(Top):Series] {If}{[parent(Top):Volume],Vol. [parent(Top):Volume]} {If}{[parent(Top):Number],No. [parent(Top):Number]} Page [Title]</td>"

format DateList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]}</td>"

format HList "[link][highlight][ex.Title][/highlight][/link]"

# We customise the document display, so use the extended options
format AllowExtendedOptions true

# Add fullsize/preview/text buttons to switch between the different versions
# of each page

format DocumentHeading "<center><table width=_pagewidth_>
<tr valign=top><td>{Or}{[parent(Top):Series],[Series]}</td></tr>
<tr valign=top><td><table><tr><td>
[DocumentButtonDetach][DocumentButtonHighlight]
{If}{_cgiargp_ eq 'fullsize',{If}{[screenicon],_document:viewpreview_}
{If}{[NoText] eq \'1\',,_document:viewtext_},
{If}{_cgiargp_ eq 'preview',{If}{[srcicon],_document:viewfullsize_}
{If}{[NoText] eq \'1\',,_document:viewtext_},
{If}{[srcicon],_document:viewfullsize_}
{If}{[screenicon],_document:viewpreview_}}}
</td></tr></table></td>
<td>[DocTOC]</td></tr></table></center>"

# The document text display changes based on the p argument. This argument is
# not normally used for document display, so we can use it here to switch
# between the fullsize/preview/text versions.
format DocumentText "<center><table width=_pagewidth_><tr><td>
{If}{_cgiargp_ eq \'fullsize\',[srcicon],
{If}{_cgiargp_ eq \'preview\',[screenicon],{If}{[NoText] eq \'1\',,[Text]}}}
</td></tr></table></center>"


# -- English strings --------------------
collectionmeta collectionname [l=en] "Paged Image example"
collectionmeta .section:text [l=en] "newspaper pages"

# -- English text -----------------------

collectionmeta collectionextra [l=en] "This collection contains a few newspapers from the
<a href='http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=niupepa'>
Niupepa</a> collection of Maori newspapers.

<h3>How the collection works</h3>
<p>Each newspaper issue consists of a set of images, one per page, and a set
of text files for the OCR'd text. An item file links the set of pages into a
single newspaper document. PagedImagePlugin is used to process the item files.
<p>There are two styles of item files, and this collection demonstrates both.
The first uses a text-based format, and consists of a list of metadata for the
document, and a list of pages. Here are some examples:
<a href='_httpcollection_/import/09/09\_1\_1.item'>Te Waka o Te Iwi, Vol. 1, No. 1</a>,
<a href='_httpcollection_/import/10/10\_1\_3.item'>Te Whetu o Te Tau, Vol. 1, No. 3</a>.
This format allows specification of document-level metadata, and a single list of pages.
<p>The second style is an extended format, and uses XML. It allows a hierarchy
of pages, and metadata specification at the page level as well as at the
document level. An example is <a href='_httpcollection_/import/xml/23/23\_\_2.item'>Matariki 1881, No. 2</a>.
This newspaper also has an abstract associated with it. The contents have been
grouped into two sections: Supplementary Material, which contains the Abstract,
and Newspaper Pages, which contains the page images.
<p>Paged documents can be presented with a hierarchical table of contents
(e.g. <a href='?a=d&c=pagedimg&d=HASHecd552ed3c2d5f1f6a620f.2.2&p=text'>this one</a>),
or with next and previous page arrows, and a goto page box
(e.g. <a href='?a=d&c=pagedimg&d=HASH01f4f2a92e501cdfa5d243bb.2&p=preview'>this one</a>).
This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin.
The next and previous arrows suit linear-sequence documents, while the table of contents
suits hierarchically organised documents. Ordinarily, a Greenstone collection
would have one plugin per document type, and all documents of that type get
the same processing. In this case, we want to treat the XML-based item files
differently from the text-based item files. We can achieve this by adding two
PagedImagePlugin plugins to the collection, and configuring them differently.
<p><tt>plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$<br/>
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged </tt>

<p>XML-based newspapers have been grouped into a folder called <tt>xml</tt>.
This enables us to process these files differently, by utilising the
<tt>process_exp</tt> option which all plugins support. The first PagedImagePlugin
in the list looks for item files underneath the xml folder. These documents
will be processed as hierarchical documents. Item files that don't match the
process expression (i.e. aren't underneath the xml folder) will be passed on to
the second PagedImagePlugin, and these are treated as paged documents.

<p><b>Formatting</b>
<p>We have modified the document formatting to display full-size images,
preview images or text, with buttons to switch between them. This involves
modifications to the DocumentHeading and DocumentText format statements in the
<a href='_httpcollection_/etc/collect.cfg'>collection configuration file</a>,
and some macro definitions in the <a href='_httpcollection_/macros/extra.dm'>extra.dm macro file</a>.
The extra.dm macro file provides definitions for the buttons (\_viewfullsize\_,
\_viewpreview\_, \_viewtext\_) which are used by the format statement in the
collect.cfg file. The format statement switches the document display and sets
the buttons to be displayed based on the p argument, which is also set by the
format statement.
"
