creator [email protected]
maintainer [email protected]
public true

indexes section:text
defaultindex section:text

plugin GreenstoneXMLPlugin
# We want the two types of paged documents, paged and hierarchical, to be
# treated differently. So include two PagedImagePlugin plugins and modify the
# process_exp of the first.
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged
plugin MetadataXMLPlugin
plugin ArchivesInfPlugin
plugin DirectoryPlugin
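# Illustration of how item files are routed through the two PagedImagePlugin
# entries above (a plugin that does not accept a file passes it on to the next):
#   item file under import/xml/   matches xml.*\.item$   -> first entry, -documenttype hierarchy
#   any other item file           does not match         -> second entry, -documenttype paged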

classify AZCompactList -metadata Series -sort Date
classify DateList

# Format statements to display Series, Volume, Number and Date information

format DocumentVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[Series],[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]},[highlight]{Or}{[Title],[PageNum]}[/highlight]}</td>"

format CL1VList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[numleafdocs],[Title],{If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]} ([Date])}</td>"

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[parent(Top):Series] {If}{[parent(Top):Volume],Vol. [parent(Top):Volume]} {If}{[parent(Top):Number],No. [parent(Top):Number]} Page [Title]</td>"

format DateList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]}</td>"

format HList "[link][highlight][ex.Title][/highlight][/link]"

# We customise the document display, so use the extended options
format AllowExtendedOptions true

# We want to add in fullsize/preview/text buttons to switch between the different versions of each page

format DocumentHeading "<center><table width=_pagewidth_>
<tr valign=top><td>{Or}{[parent(Top):Series],[Series]}</td></tr>
<tr valign=top><td><table><tr><td>
[DocumentButtonDetach][DocumentButtonHighlight]
{If}{_cgiargp_ eq 'fullsize',{If}{[screenicon],_document:viewpreview_}
{If}{[Text] ne \'This document has no text. \',_document:viewtext_},
{If}{_cgiargp_ eq 'preview',{If}{[srcicon],_document:viewfullsize_}
{If}{[Text] ne \'This document has no text. \',_document:viewtext_},
{If}{[srcicon],_document:viewfullsize_}
{If}{[screenicon],_document:viewpreview_}}}
</td></tr></table></td>
<td>[DocTOC]</td></tr></table></center>"

# The document text display changes based on the p argument. This argument is
# not normally used for document display, so we can use it here to switch
# between the fullsize/preview/text versions.
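# Summary of the DocumentText statement below: the p argument in the document
# URL selects which version of the page is shown.
#   p=fullsize   shows the source image ([srcicon])
#   p=preview    shows the screen-size image ([screenicon])
#   otherwise    (for example p=text) shows [Text], unless it is the
#                'This document has no text. ' placeholder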
format DocumentText "<center><table width=_pagewidth_><tr><td>
{If}{_cgiargp_ eq 'fullsize',[srcicon],
{If}{_cgiargp_ eq 'preview',[screenicon],{If}{[Text] ne \'This document has no text. \',[Text]}}}
</td></tr></table></center>"


# -- English strings --------------------
collectionmeta collectionname [l=en] "Paged Image example"
collectionmeta .section:text [l=en] "newspaper pages"

# -- English text -----------------------

collectionmeta collectionextra [l=en] "This collection contains a few newspapers from the
<a href='http://www.nzdl.org/cgi-bin/library?a=p&p=about&c=niupepa'>
Niupepa</a> collection of Maori newspapers.

<h3>How the collection works</h3>
<p>Each newspaper issue consists of a set of images, one per page, and a set
of text files for the OCR'd text. An item file links the set of pages into a
single newspaper document. PagedImagePlugin is used to process the item files.
<p>There are two styles of item files, and this collection demonstrates both.
The first uses a text-based format, and consists of a list of metadata for the
document, and a list of pages. Here are some examples:
<a href='_httpcollection_/import/09/09\_1\_1.item'>Te Waka o Te Iwi, Vol. 1, No. 1</a>,
<a href='_httpcollection_/import/10/10\_1\_3.item'>Te Whetu o Te Tau, Vol. 1, No. 3</a>.
This format allows specification of document-level metadata, and a single list of pages.
<p>The second style is an extended format, and uses XML. It allows a hierarchy
of pages, and metadata specification at the page level as well as at the
document level. An example is <a href='_httpcollection_/import/xml/23/23\_\_2.item'>Matariki 1881, No. 2</a>.
This newspaper also has an abstract associated with it. The contents have been
grouped into two sections: Supplementary Material, which contains the Abstract,
and Newspaper Pages, which contains the page images.
<p>Paged documents can be presented with a hierarchical table of contents
(e.g. <a href='?a=d&c=pagedimg&d=HASHecd552ed3c2d5f1f6a620f.2.2&p=text'>this one</a>),
or with next and previous page arrows, and a goto page box
(e.g. <a href='?a=d&c=pagedimg&d=HASH01f4f2a92e501cdfa5d243bb.2&p=preview'>this one</a>).
This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImagePlugin.
The next and previous arrows suit linear sequence documents, while the table of contents
suits hierarchically organised documents. Ordinarily, a Greenstone collection
would have one plugin per document type, and all documents of that type get
the same processing. In this case, we want to treat the XML-based item files
differently from the text-based item files. We can achieve this by adding two
PagedImagePlugin plugins to the collection, and configuring them differently.
<p><tt>plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$<br/>
plugin PagedImagePlugin -create_screenview true -minimumsize 100 -documenttype paged </tt>

<p>XML-based newspapers have been grouped into a folder called <tt>xml</tt>.
This enables us to process these files differently, by utilising the
<tt>process_exp</tt> option which all plugins support. The first PagedImagePlugin
in the list looks for item files underneath the xml folder. These documents
will be processed as hierarchical documents. Item files that don't match the
process expression (i.e. aren't underneath the xml folder) will be passed on to
the second PagedImagePlugin, and these are treated as paged documents.
<p>Note that GLI will not let you add two of the same plugin (apart from
UnknownPlugin), so the second PagedImagePlugin must be added to the collect.cfg
file manually. The collection must not be open in GLI while you are doing this.
<p><b>Formatting</b>
<p>We have modified the document formatting to display full-size images,
preview images or text, with buttons to switch between them. This involves
modifications to the DocumentHeading and DocumentText format statements in the
<a href='_httpcollection_/etc/collect.cfg'>collection configuration file</a>,
and some macro definitions in the <a href='_httpcollection_/macros/extra.dm'>extra.dm macro file</a>.
The extra.dm macro file provides definitions for the buttons (\_viewfullsize\_,
\_viewpreview\_, \_viewtext\_) which are used by the format statement in the
collect.cfg file. The format statement switches the document display and sets
the buttons to be displayed based on the p argument, which is also set by the
format statement.
"
