source: documented-example-collections/trunk/pagedimg-e/etc/collect.cfg@ 18738

Last change on this file since 18738 was 18738, checked in by oranfry, 15 years ago

the rest of the documented example collections

  • Property svn:executable set to *
File size: 6.9 KB
creator [email protected]
maintainer [email protected]
public true

indexes section:text
defaultindex section:text

plugin GAPlug
# We want the two types of paged documents to be treated differently: paged
# and hierarchical. So include two PagedImgPlug plugins and modify the process_exp.
plugin PagedImgPlug -screenview -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$
plugin PagedImgPlug -screenview -minimumsize 100 -documenttype paged
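# A sketch of how the two PagedImgPlug lines above are expected to divide the
# item files, using item paths cited in the collection description (relative
# to import/). This assumes process_exp is matched against the file path:
#   xml/23/23__2.item  matches xml.*\.item$  -> first PagedImgPlug (hierarchy)
#   09/09_1_1.item     does not match        -> second PagedImgPlug (paged)
#   10/10_1_3.item     does not match        -> second PagedImgPlug (paged)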
plugin ArcPlug
plugin RecPlug -use_metadata_files

classify AZCompactList -metadata Series -sort Date
classify DateList

# Format statements to display Series, Volume, Number and Date information

format DocumentVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[Series],[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]},[highlight]{Or}{[Title],[PageNum]}[/highlight]}</td>"

format CL1VList "<td valign=top>[link][icon][/link]</td>
<td valign=top>{If}{[numleafdocs],[Title],{If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]} ([Date])}</td>"

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[parent(Top):Series] {If}{[parent(Top):Volume],Vol. [parent(Top):Volume]} {If}{[parent(Top):Number],No. [parent(Top):Number]} Page [Title]</td>"

format DateList "<td valign=top>[link][icon][/link]</td>
<td valign=top>[Series] {If}{[Volume],Vol. [Volume]} {If}{[Number],No. [Number]}</td>"

format HList "[link][highlight][ex.Title][/highlight][/link]"

# We customise the document display, so use the extended options
format AllowExtendedOptions true

# We want to add in fullsize/preview/text buttons to switch between the different versions of each page

format DocumentHeading "<center><table width=_pagewidth_>
<tr valign=top><td>{Or}{[parent(Top):Series],[Series]}</td></tr>
<tr valign=top><td><table><tr><td>
[DocumentButtonDetach][DocumentButtonHighlight]
{If}{_cgiargp_ eq 'fullsize',{If}{[screenicon],_document:viewpreview_}
{If}{[Text] ne \'This document has no text. \',_document:viewtext_},
{If}{_cgiargp_ eq 'preview',{If}{[srcicon],_document:viewfullsize_}
{If}{[Text] ne \'This document has no text. \',_document:viewtext_},
{If}{[srcicon],_document:viewfullsize_}
{If}{[screenicon],_document:viewpreview_}}}
</td></tr></table></td>
<td>[DocTOC]</td></tr></table></center>"

# Document text display changes based on the p argument - this is not used
# normally for document display, so we can use it here to switch between
# fullsize/preview/text versions.
format DocumentText "<center><table width=_pagewidth_><tr><td>
{If}{_cgiargp_ eq 'fullsize',[srcicon],
{If}{_cgiargp_ eq 'preview',[screenicon],{If}{[Text] ne \'This document has no text. \',[Text]}}}
</td></tr></table></center>"
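
# A sketch of how the DocumentText statement above resolves for each value of
# the p argument. The p=fullsize form is an assumption by analogy with the
# p=preview and p=text links used in the collection description:
#   p=fullsize     -> [srcicon]     (full-size image)
#   p=preview      -> [screenicon]  (preview image)
#   anything else  -> [Text], unless it is the 'This document has no text. ' placeholder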

collectionmeta iconcollection [l=en] "_httpprefix_/collect/pagedimg-e/images/en/pagedimg-e.gif"


# -- English strings --------------------
collectionmeta collectionname [l=en] "Paged Image example"
collectionmeta .section:text [l=en] "newspaper pages"

# -- English text -----------------------

collectionmeta collectionextra [l=en] "This collection contains a few newspapers from the
<a href='http://www.nzdl.org/cgi-bin/library?a=p&amp;p=about&amp;c=niupepa'>
Niupepa</a> collection of Maori newspapers.

<h3>How the collection works</h3>
<p>Each newspaper issue consists of a set of images, one per page, and a set
of text files for the OCR'd text. An item file links the set of pages into a
single newspaper document. PagedImgPlug is used to process the item files.
<p>There are two styles of item files, and this collection demonstrates both.
The first uses a text-based format, and consists of a list of metadata for the
document, and a list of pages. Here are some examples:
<a href='_httpcollection_/import/09/09\_1\_1.item'>Te Waka o Te Iwi, Vol. 1, No. 1</a>,
<a href='_httpcollection_/import/10/10\_1\_3.item'>Te Whetu o Te Tau, Vol. 1, No. 3</a>.
This format allows specification of document level metadata, and a single list of pages.
<p>The second style is an extended format, and uses XML. It allows a hierarchy
of pages, and metadata specification at the page level as well as at the
document level. An example is <a href='_httpcollection_/import/xml/23/23\_\_2.item'>Matariki 1881, No. 2</a>.
This newspaper also has an abstract associated with it. The contents have been
grouped into two sections: Supplementary Material, which contains the Abstract,
and Newspaper Pages, which contains the page images.
<p>Paged documents can be presented with a hierarchical table of contents
(e.g. <a href='?a=d&amp;c=pagedimg&amp;d=HASHecd552ed3c2d5f1f6a620f.2.2&amp;p=text'>this one</a>),
or with next and previous page arrows, and a goto page box
(e.g. <a href='?a=d&amp;c=pagedimg&amp;d=HASH01f4f2a92e501cdfa5d243bb.2&amp;p=preview'>this one</a>).
This is specified by the <tt>-documenttype (hierarchy|paged)</tt> option to PagedImgPlug.
The next and previous arrows suit documents that are a linear sequence of pages, while the table of contents
suits hierarchically organised documents. Ordinarily, a Greenstone collection
would have one plugin per document type, and all documents of that type get
the same processing. In this case, we want to treat the XML-based item files
differently from the text-based item files. We can achieve this by adding two
PagedImgPlug plugins to the collection, and configuring them differently.
<p><tt>plugin PagedImgPlug -screenview -minimumsize 100 -documenttype hierarchy -process_exp xml.*\.item$<br/>
plugin PagedImgPlug -screenview -minimumsize 100 -documenttype paged </tt>

<p>XML-based newspapers have been grouped into a folder called <tt>xml</tt>.
This enables us to process these files differently, by utilising the
<tt>process_exp</tt> option which all plugins support. The first PagedImgPlug
in the list looks for item files underneath the xml folder. These documents
will be processed as hierarchical documents. Item files that don't match the
process expression (i.e. aren't underneath the xml folder) will be passed on to
the second PagedImgPlug, and these are treated as paged documents.
<p>Note that GLI will not let you add two of the same plugin (apart from
UnknownPlug), so the second PagedImgPlug line must be added to the collect.cfg file manually. The
collection must not be open in GLI while you are doing this.
<p><b>Formatting</b>
<p>We have modified the document formatting to display full-size images,
preview images or text, with buttons to switch between them. This involves
modifications to the DocumentHeading and DocumentText format statements in the
<a href='_httpcollection_/etc/collect.cfg'>collection configuration file</a>,
and some macro definitions in the <a href='_httpcollection_/macros/extra.dm'>extra.dm macro file</a>.
The extra.dm macro file provides definitions for the buttons (\_viewfullsize\_,
\_viewpreview\_, \_viewtext\_) which are used by the format statement in the
collect.cfg file. The format statement switches the document display and sets
the buttons to be displayed based on the p argument, which is also set by the
format statement.
"
