creator [email protected]
maintainer [email protected]
public true

indexes section:text
defaultindex section:text

plugin GreenstoneXMLPlugin
# We want the two types of paged documents to be treated differently: paged
# and hierarchical. So include two PagedImagePlugin plugins and modify the
# process_exp.
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged
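# Illustration only: a sketch of how the two PagedImagePlugin lines above are
# expected to divide the item files. It assumes process_exp is matched against
# each file's path and that plugins are tried in the order listed.
#   import/xml/23/23__2.item  matches xml.*\.item$, so the first
#                             PagedImagePlugin builds it as a hierarchy
#   import/09/09_1_1.item     is not under the xml folder, so it falls through
#                             to the second PagedImagePlugin and is built as paged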
plugin MetadataXMLPlugin
plugin ArchivesInfPlugin
plugin DirectoryPlugin

classify AZCompactList -metadata Series -sort Date
classify DateList

# Format statements to display Series, Volume, Number and Date information

format DocumentVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[Series],[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]},[highlight]{Or}{[Title],[PageNum]}[/highlight]}</td>"

format CL1VList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[numleafdocs],[Title],{If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]} ([Date])}</td>"

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[parent(Top):Series] {If}{[parent(Top):Volume],Vol. [parent(Top):Volume]} {If}{[parent(Top):Number],No. [parent(Top):Number]} Page [Title]</td>"

format DateList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]}</td>"

format HList "[link][highlight][ex.Title][/highlight][/link]"

# We customise the document display, so use the extended options
format AllowExtendedOptions true

# We want to add in fullsize/preview/text buttons to switch between the
# different versions of each page

format DocumentHeading "<center><table width=_pagewidth_>
<tr valign=top><td>{Or}{[parent(Top):Series],[Series]}</td></tr>
<tr valign=top><td><table><tr><td>
[DocumentButtonDetach][DocumentButtonHighlight]
{If}{_cgiargp_ eq 'fullsize',{If}{[screenicon],_document:viewpreview_}
{If}{[NoText] eq \'1\',,_document:viewtext_},
{If}{_cgiargp_ eq 'preview',{If}{[srcicon],_document:viewfullsize_}
{If}{[NoText] eq \'1\',,_document:viewtext_},
{If}{[srcicon],_document:viewfullsize_}
{If}{[screenicon],_document:viewpreview_}}}
</td></tr></table></td>
<td>[DocTOC]</td></tr></table></center>"

# Document text display changes based on the p argument - this is not used
# normally for document display, so we can use it here to switch between
# fullsize/preview/text versions.
format DocumentText "<center><table width=_pagewidth_><tr><td>
{If}{_cgiargp_ eq \'fullsize\',[srcicon],
{If}{_cgiargp_ eq \'preview\',[screenicon],{If}{[NoText] eq \'1\',,[Text]}}}
</td></tr></table></center>"


# -- English strings --------------------
collectionmeta collectionname [l=en] "Paged Image example"
collectionmeta .section:text [l=en] "newspaper pages"

# -- English text -----------------------

collectionmeta collectionextra [l=en] "This collection contains a few newspapers from the
<a href='http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=niupepa'>
Niupepa</a> collection of Maori newspapers.

<h3>How the collection works</h3>
<p>Each newspaper issue consists of a set of images, one per page, and a set
of text files for the OCR'd text. An item file links the set of pages into a
single newspaper document. PagedImagePlugin is used to process the item files.
<p>There are two styles of item file, and this collection demonstrates both.
The first uses a text-based format, and consists of a list of metadata for the
document, and a list of pages. Here are some examples:
<a href='_httpcollection_/import/09/09\_1\_1.item'>Te Waka o Te Iwi, Vol. 1, No. 1</a>,
<a href='_httpcollection_/import/10/10\_1\_3.item'>Te Whetu o Te Tau, Vol. 1, No. 3</a>.
This format allows specification of document-level metadata, and a single list of pages.
<p>The second style is an extended format, and uses XML. It allows a hierarchy
of pages, and metadata specification at the page level as well as at the
document level. An example is <a href='_httpcollection_/import/xml/23/23\_\_2.item'>Matariki 1881, No. 2</a>.
This newspaper also has an abstract associated with it. The contents have been
grouped into two sections: Supplementary Material, which contains the Abstract,
and Newspaper Pages, which contains the page images.
<p>Paged documents can be presented with a hierarchical table of contents
(e.g. <a href='?a=d&c=pagedimg&d=HASHecd552ed3c2d5f1f6a620f.2.2&p=text'>this one</a>),
or with next and previous page arrows and a 'go to page' box
(e.g. <a href='?a=d&c=pagedimg&d=HASH01f4f2a92e501cdfa5d243bb.2&p=preview'>this one</a>).
This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin.
The next and previous arrows suit documents that are a linear sequence of pages, while the
table of contents suits hierarchically organised documents. Ordinarily, a Greenstone collection
would have one plugin per document type, and all documents of that type get
the same processing. In this case, we want to treat the XML-based item files
differently from the text-based item files. We can achieve this by adding two
PagedImagePlugin plugins to the collection, and configuring them differently.
<p><tt>plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$<br/>
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged </tt>

<p>XML-based newspapers have been grouped into a folder called <tt>xml</tt>.
This enables us to process these files differently, by utilising the
<tt>process_exp</tt> option which all plugins support. The first PagedImagePlugin
in the list looks for item files underneath the xml folder. These documents
will be processed as hierarchical documents. Item files that don't match the
process expression (i.e. aren't underneath the xml folder) will be passed on to
the second PagedImagePlugin, and these are treated as paged documents.

<p><b>Formatting</b>
<p>We have modified the document formatting to display full-size images,
preview images or text, with buttons to switch between them. This involves
modifications to the DocumentHeading and DocumentText format statements in the
<a href='_httpcollection_/etc/collect.cfg'>collection configuration file</a>,
and some macro definitions in the <a href='_httpcollection_/macros/extra.dm'>extra.dm macro file</a>.
The extra.dm macro file provides definitions for the buttons (\_viewfullsize\_,
\_viewpreview\_, \_viewtext\_) which are used by the format statements in the
collect.cfg file. The format statements switch the document display and choose
which buttons to display based on the p argument, which the buttons themselves set.
"
