Changeset 29717

Timestamp:
05.02.2015 16:37:37
Author:
ak19
Message:

Updated the model collection after a change to the wvWare config file wvHtml.xml, so that justified text in input Word documents is now output as justified text in the generated HTML instead of as left-aligned HTML. Still need to check whether the nightly tests on this collection succeed before regenerating the other affected model collections.

Location:
other-projects/nightly-tasks/diffcol/trunk/model-collect/Associated-Files
Files:
9 modified

  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Associated-Files/archives/HASH0a87.dir/doc.xml

    r29015 r29717  
    99    <Metadata name="GENERATOR">wvWare/wvWare version 1.2.4</Metadata> 
    1010    <Metadata name="Title">Greenstone: A Comprehensive Open-Source</Metadata> 
    11     <Metadata name="URL">http://research/ak19/gs2-svn-22Aug2013/collect/Associated-Files/tmp/1398925830/greenstone01.html</Metadata> 
    12     <Metadata name="UTF8URL">http://research/ak19/gs2-svn-22Aug2013/collect/Associated-Files/tmp/1398925830/greenstone01.html</Metadata> 
     11    <Metadata name="URL">http://Scratch/ak19/gs2-svn-22Oct2014/collect/Associated-Files/tmp/1423106799/greenstone01.html</Metadata> 
     12    <Metadata name="UTF8URL">http://Scratch/ak19/gs2-svn-22Oct2014/collect/Associated-Files/tmp/1423106799/greenstone01.html</Metadata> 
    1313    <Metadata name="gsdlsourcefilename">import/greenstone01.doc</Metadata> 
    14     <Metadata name="gsdlconvertedfilename">tmp/1398925830/greenstone01.html</Metadata> 
     14    <Metadata name="gsdlconvertedfilename">tmp/1423106799/greenstone01.html</Metadata> 
    1515    <Metadata name="OrigSource">greenstone01.html</Metadata> 
    1616    <Metadata name="Source">greenstone01.doc</Metadata> 
     
    3434    <Metadata name="equivlink"> &lt;a href=&quot;_httpprefix_/collect/[collection]/index/assoc/{Or}{[parent(Top):assocfilepath],[assocfilepath]}/greenstone01.pdf&quot;&gt;{If}{_iconpdf_,_iconpdf_,pdf}&lt;/a&gt;</Metadata> 
    3535    <Metadata name="Identifier">HASH0a87f402e5d107f0d73a2a</Metadata> 
    36     <Metadata name="lastmodified">1398925758</Metadata> 
    37     <Metadata name="lastmodifieddate">20140501</Metadata> 
    38     <Metadata name="oailastmodified">1398925831</Metadata> 
    39     <Metadata name="oailastmodifieddate">20140501</Metadata> 
     36    <Metadata name="lastmodified">1423106757</Metadata> 
     37    <Metadata name="lastmodifieddate">20150205</Metadata> 
     38    <Metadata name="oailastmodified">1423106799</Metadata> 
     39    <Metadata name="oailastmodifieddate">20150205</Metadata> 
    4040    <Metadata name="assocfilepath">HASH0a87.dir</Metadata> 
    4141    <Metadata name="gsdlassocfile">greenstone010.png:image/png:</Metadata> 
     
    136136&lt;/table&gt; 
    137137 
    138 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    139  
    140 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     138&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     139 
     140&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    141141&lt;b&gt;&lt;/b&gt; 
    142142&lt;/p&gt;&lt;/div&gt; 
     
    144144 
    145145 
    146 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    147  
    148 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     146&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     147 
     148&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    149149&lt;b&gt;&lt;/b&gt; 
    150150&lt;/p&gt;&lt;/div&gt; 
     
    164164 
    165165 
    166 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    167  
    168 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     166&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     167 
     168&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    169169This paper describes the Greenstone digital library software, a comprehensive, open-source system for the construction and presentation of information collections. Collections built with Greenstone offer effective full-text searching and metadata-based browsing facilities that are attractive and easy to use. Moreover, they are easily maintainable and can be augmented and rebuilt entirely automatically. The system is extensible: software &amp;ldquo;plugins&amp;rdquo; accommodate different document and metadata types. 
    170170&lt;/p&gt;&lt;/div&gt; 
     
    180180 
    181181 
    182 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    183  
    184 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     182&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     183 
     184&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    185185Notwithstanding intense research activity in the digital library field during the second half of the 1990s, comprehensive software systems for creating digital libraries are not widely available. In fact, the usual solution when creating a digital library is also the most obvious&amp;mdash;just put it on the Web. But consider how much effort is involved in constructing a Web site for a digital library. To be effective it needs to be visually attractive and ergonomically easy to use, incorporate convenient and powerful searching capabilities, and offer rich and natural browsing facilities. Above all it must be easy to maintain and augment, which presents a significant challenge if any manual organization is involved.  
    186186&lt;/p&gt;&lt;/div&gt; 
     
    188188 
    189189 
    190 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    191  
    192 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     190&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     191 
     192&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    193193The alternative is to automate these activities through software tools. But the broad scope of digital library requirements makes this a daunting prospect. Ideally the software should incorporate facilities ranging from multilingual information retrieval to distributed computing protocols, from interoperability to search engine technology, from metadata standards to multiformat document parsing, from multimedia to multiple operating systems, from Web browsers to plug-and-play DVDs.  
    194194&lt;/p&gt;&lt;/div&gt; 
     
    196196 
    197197 
    198 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    199  
    200 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     198&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     199 
     200&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    201201The Greenstone Digital Library Software from the New Zealand Digital Library (NZDL) project tackles this issue by providing a new way of organizing information and making it available over the Internet. A &lt;i&gt;collection&lt;/i&gt; of information comprises several (typically several thousand, or several million) &lt;i&gt;documents&lt;/i&gt;, and a uniform interface is provided to all documents in a collection. A library may include many different collections, each organized differently&amp;mdash;though there is a strong family resemblance in how collections are presented. 
    202202&lt;/p&gt;&lt;/div&gt; 
     
    204204 
    205205 
    206 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    207  
    208 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     206&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     207 
     208&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    209209Making information available using this system is far more than &amp;ldquo;just putting it on the Web.&amp;rdquo; The collection becomes maintainable, searchable, and browsable. Each collection, prior to presentation, undergoes a &amp;ldquo;building&amp;rdquo; process that, once established, is completely automatic. This process creates all the structures that are used at run-time for accessing the collection. Searching is based on various indexes, while browsing is based on various metadata; support structures for both are created during the building operation. When new material appears it can be fully incorporated into the collection by rebuilding. 
    210210&lt;/p&gt;&lt;/div&gt; 
     
    212212 
    213213 
    214 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    215  
    216 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     214&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     215 
     216&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    217217To address the exceptionally broad demands of digital libraries, the system is public and extensible. It is issued under the Gnu public license and, in the spirit of open-source software, users are invited to contribute modifications and enhancements. Only through an international cooperative effort will digital library software become sufficiently comprehensive to meet the world's needs. Currently the Greenstone software is used at sites in Canada, Germany, New Zealand, Romania, UK, and the US, and collections range from newspaper articles to technical documents, from educational journals to oral history, from visual art to folksongs. The software has been used for collections in many different languages, and for CD-ROMs that have been published by the United Nations and other humanitarian agencies in Belgium, France, Japan, and the US for distribution in developing countries (Humanity Libraries, 1998; PAHO, 1999; UNESCO, 1999; UNU, 1998). Further details can be obtained from &lt;i&gt;www.nzdl.org&lt;/i&gt;. 
    218218&lt;/p&gt;&lt;/div&gt; 
     
    236236 
    237237 
    238 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    239  
    240 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     238&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     239 
     240&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    241241This paper sets the scene with a brief discussion of what a digital library is. We then give an overview of the facilities offered by Greenstone and show how end users find information in collections. Next we describe the files and directories involved in a collection, and then discuss the processes of updating existing collections and creating new ones, including extending the software to provide new facilities. We conclude with an overview of related work. 
    242242&lt;/p&gt;&lt;/div&gt; 
     
    252252 
    253253 
    254 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    255  
    256 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     254&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     255 
     256&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    257257Ten definitions of the term &amp;ldquo;digital library&amp;rdquo; have been culled from the literature by Fox (1998), and their spirit is captured in the following brief characterization:  
    258258&lt;/p&gt;&lt;/div&gt; 
     
    268268 
    269269 
    270 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    271  
    272 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     270&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     271 
     272&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    273273(Akscyn and Witten, 1998). Lesk (1998) views digital libraries as &amp;ldquo;organized collections of digital information,&amp;rdquo; and wisely recommends that they articulate the principles governing what is included and how the collection is organized. 
    274274&lt;/p&gt;&lt;/div&gt; 
     
    276276 
    277277 
    278 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    279  
    280 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     278&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     279 
     280&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    281281Digital libraries are generally distinguished from the World-Wide Web, the essential difference being in selection and organization. But they are not generally distinguished from a web &lt;i&gt;site&lt;/i&gt;: indeed, virtually all extant digital libraries manifest themselves as a web site. Hence the obvious question: to make a digital library, why not just put the information on the Web?  
    282282&lt;/p&gt;&lt;/div&gt; 
     
    284284 
    285285 
    286 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    287  
    288 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     286&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     287 
     288&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    289289But we make a distinction between a digital library and a web site that lies at the heart of our software design: one should easily be able to add new material to a library without having to integrate it manually or edit its content in any way. Once added, new material should immediately become a first-class component of the library. And what permits it to be integrated into existing searching and browsing structures without any manual intervention is &lt;i&gt;metadata&lt;/i&gt;. This provides sufficient focus to the concept of &amp;ldquo;digital library&amp;rdquo; to support the development of a construction kit. 
    290290&lt;/p&gt;&lt;/div&gt; 
     
    300300 
    301301 
    302 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    303  
    304 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     302&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     303 
     304&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    305305Information collections built by Greenstone combine extensive full-text search facilities with browsing indexes based on different metadata types. There are several ways for users to find information, although they differ between collections depending on the metadata available and the collection design. Typically you can &lt;i&gt;search for particular words&lt;/i&gt; that appear in the text, or within a section of a document, or within a title or section heading. You can &lt;i&gt;browse documents by title&lt;/i&gt;: just click on the displayed book icon to read it. You can &lt;i&gt;browse documents by subject&lt;/i&gt;. Subjects are represented by bookshelves: just click on a shelf to see the books. Where appropriate, documents come complete with a table of contents (constructed automatically): you can click on a chapter or subsection to open it, expand the full table of contents, or expand the full document.  
    306306&lt;/p&gt;&lt;/div&gt; 
     
    308308 
    309309 
    310 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    311  
    312 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     310&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     311 
     312&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    313313An example of searching is shown in Figure 1 where documents in the Global Help Project's Humanity Development Library (HDL) are being searched for chapters matching the word &lt;i&gt;butterfly&lt;/i&gt;. In Figure 2 the same collection is being browsed by subject: by clicking on the bookshelf icons the user has discovered an item under Section 16, Animal Husbandry. Pursuing an interest in butterfly farming, the user selects a book by clicking on its book icon. In Figure 3 the front cover of the book is displayed as a graphic on the left, and the automatically constructed table of contents appears at the start of the document. The current focus, &lt;i&gt;Introduction and Summary&lt;/i&gt;, is shown in bold in the table of contents with its text starting further down the page. 
    314314&lt;/p&gt;&lt;/div&gt; 
     
    316316 
    317317 
    318 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    319  
    320 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     318&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     319 
     320&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    321321In accordance with Lesk's advice, a statement of purpose and coverage accompanies each collection, along with an explanation of how it is organized (Figure 1 shows the start of this). A distinction is made between &lt;i&gt;searching&lt;/i&gt; and &lt;i&gt;browsing&lt;/i&gt;. Searching is full-text, and&amp;mdash;depending on the collection's design&amp;mdash;the user can choose between indexes built from different parts of the documents, or from different metadata. Some collections have an index of full documents, an index of sections, an index of paragraphs, an index of titles, and an index of section headings, each of which can be searched for particular words or phrases. Browsing involves data structures created from metadata that the user can examine: lists of authors, lists of titles, lists of dates, hierarchical classification structures, and so on. Data structures for both browsing and searching are built according to instructions in a configuration file, which controls both building and serving the collection. Sample configuration files are discussed below. 
    322322&lt;/p&gt;&lt;/div&gt; 
     
    324324 
    325325 
    326 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    327  
    328 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     326&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     327 
     328&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    329329 
    330330&lt;/p&gt;&lt;/div&gt; 
     
    348348 
    349349 
    350 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    351  
    352 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     350&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     351 
     352&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    353353Rich browsing facilities can be provided by manually linking parts of documents together and building explicit indexes and tables of contents. However, manually-created linking becomes difficult to maintain, and often falls into disrepair when a collection expands. The Greenstone software takes a different tack: it facilitates &lt;i&gt;maintainability&lt;/i&gt; by creating all searching and browsing structures automatically from the documents themselves. No links are inserted by hand. This means that when new documents in the same format become available, they can be added automatically. Indeed, for some collections this is done by processes that wake up regularly, scout for new material, and rebuild the indexes&amp;mdash;all without manual intervention. 
    354354&lt;/p&gt;&lt;/div&gt; 
     
    356356 
    357357 
    358 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    359  
    360 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     358&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     359 
     360&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    361361Collections comprise many documents: thousands, tens of thousands, or even millions. Each document may be hierarchically organized into &lt;i&gt;sections&lt;/i&gt; (subsections, sub-subsections, and so on). Each section comprises one or more &lt;i&gt;paragraphs&lt;/i&gt;. Metadata such as author, title, date, keywords, and so on, may be associated with documents, or with individual sections of documents. This is the raw material for indexes. It must either be provided explicitly for each document and section (for example, in an accompanying spreadsheet) or be derivable automatically from the source documents. Metadata is converted to Dublin Core and stored with the document for internal use. 
    362362&lt;/p&gt;&lt;/div&gt; 
     
    364364 
    365365 
    366 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    367  
    368 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     366&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     367 
     368&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    369369In order to accommodate different kinds of source documents, the software is organized so that &amp;ldquo;plugins&amp;rdquo; can be written for new document types. Plugins exist for plain text documents, HTML documents, email documents, and bibliographic formats. Word documents are handled by saving them as HTML; PostScript ones by applying a preprocessor (Nevill-Manning &lt;i&gt;et al&lt;/i&gt;., 1998). Specially written plugins also exist for proprietary formats such as that used by the BBC archives department. A collection may have source documents in different forms: it is just a matter of specifying all the necessary plugins. In order to build browsing indexes from metadata, an analogous scheme of &amp;ldquo;classifiers&amp;rdquo; is used: classifiers create indexes of various kinds based on metadata. Source documents are brought into the Greenstone system through a process called &lt;i&gt;importing&lt;/i&gt;, which uses the plugins and classifiers specified in the collection configuration file.  
    370370&lt;/p&gt;&lt;/div&gt; 
     
    372372 
    373373 
    374 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    375  
    376 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     374&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     375 
     376&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    377377The international Unicode character set is used throughout, so documents&amp;mdash;and interfaces&amp;mdash;can be written in any language. Collections have so far been produced in English, French, Spanish, German, Maori, Chinese, and Arabic. The NZDL Web site provides numerous examples. Collections can contain text, pictures, and even audio and video clips; a text-only version of the interface is also provided to accommodate visually impaired users. Compression technology is used to ensure best use of storage (Witten &lt;i&gt;et al&lt;/i&gt;., 1999). Most non-textual material is either linked to textual documents or accompanied by textual descriptions (such as photo captions) to allow full-text searching and browsing. However, the architecture permits the implementation of plugins and classifiers even for non-textual data. 
    378378&lt;/p&gt;&lt;/div&gt; 
     
    380380 
    381381 
    382 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    383  
    384 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     382&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     383 
     384&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    385385The system includes an &amp;ldquo;administrative&amp;rdquo; function whereby specified users can examine the composition of all collections, protect documents so that they can only be accessed by registered users on presentation of a password, and so on. Logs of user activity are kept that record all queries made to every Greenstone collection (though this facility can be disabled). 
    386386&lt;/p&gt;&lt;/div&gt; 
     
    388388 
    389389 
    390 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    391  
    392 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     390&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     391 
     392&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    393393Although primarily designed for Internet access over the World-Wide Web, collections can be made available, in precisely the same form, on CD-ROM. In either case they are accessed through any Web browser. Greenstone CD-ROMs operate on a standalone PC under Windows 3.X, 95, 98, and NT, and the interaction is identical to accessing the collection on the Web&amp;mdash;except that response is faster and more predictable. The requirement to operate on early Windows systems is one that plagues the software design, but is crucial for many users&amp;mdash;particularly those in underdeveloped countries seeking access to humanitarian aid collections. If the PC is connected to a network (intranet or Internet), a custom-built Web server provided on each CD makes exactly the same information available to others through their standard Web browser. The use of compression ensures that the greatest possible volume of information can be packed on to a CD-ROM. 
    394394&lt;/p&gt;&lt;/div&gt; 
     
    396396 
    397397 
    398 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    399  
    400 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     398&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     399 
     400&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    401401The collection-serving software operates under Unix and Windows NT, and works with standard Web servers. A flexible process structure allows different collections to be served by different computers, yet be presented to the user in the same way, on the same Web page, as part of the same digital library, even as part of the same collection (McNab and Witten, 1998). Existing collections can be updated and new ones brought on-line at any time, without bringing the system down; the process responsible for the user interface will notice (through periodic polling) when new collections appear and add them to the list presented to the user.  
    402402&lt;/p&gt;&lt;/div&gt; 
     
    428428 
    429429 
    430 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    431  
    432 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     430&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     431 
     432&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    433433Greenstone digital library systems generally include several separate collections. A home page allows you to select a collection; in addition, each collection has its own &amp;ldquo;about&amp;rdquo; page that gives you information about how the collection is organized and the principles governing what is included. 
    434434&lt;/p&gt;&lt;/div&gt; 
     
    436436 
    437437 
    438 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    439  
    440 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     438&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     439 
     440&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    441441All icons in the screenshots of Figures 1-4 are clickable. Those icons at the top of the page return to the home page, provide help text, and allow you to set user interface and searching preferences. The navigation bar underneath gives access to the searching and browsing facilities, which differ from one collection to another.  
    442442&lt;/p&gt;&lt;/div&gt; 
     
    444444 
    445445 
    446 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    447  
    448 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     446&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     447 
     448&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    449449Each of the five buttons provides a different way to find information. You can &lt;i&gt;search for particular words&lt;/i&gt; that appear in the text from the &amp;ldquo;search&amp;rdquo; page (or from the &amp;ldquo;about&amp;rdquo; page of Figure 1). This collection contains indexes of chapters, section titles, and entire books. The default search interface is a simple one, suitable for casual users; advanced searching&amp;mdash;which allows full Boolean expressions, phrase searching, case and stemming control&amp;mdash;can be enabled from the &lt;i&gt;Preferences&lt;/i&gt; page. 
    450450&lt;/p&gt;&lt;/div&gt; 
     
    452452 
    453453 
    454 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    455  
    456 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     454&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     455 
     456&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    457457This collection has four browsable metadata indexes. You can &lt;i&gt;access publications by subject&lt;/i&gt; by clicking the &lt;i&gt;subjects&lt;/i&gt; button, which brings up a list of subjects, represented by bookshelves (Figure 2). You can &lt;i&gt;access publications by title&lt;/i&gt; by clicking &lt;i&gt;titles a-z&lt;/i&gt; (Figure 4), which brings up a list of books in alphabetic order. You can &lt;i&gt;access publications by organization&lt;/i&gt; (i.e. Dublin Core &amp;ldquo;publisher&amp;rdquo;), bringing up a list of organizations. You can &lt;i&gt;access publications by &amp;ldquo;how to&amp;rdquo; listing&lt;/i&gt;, yielding a list of hints defined by the collection's editors. We use the Dublin Core as a base and extend it in an &lt;i&gt;ad hoc&lt;/i&gt; manner to accommodate the individual requirements of collection designers. 
    458458&lt;/p&gt;&lt;/div&gt; 
     
    468468 
    469469 
    470 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    471  
    472 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     470&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     471 
     472&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    473473When a new collection is created or material is added to an existing one, the original source documents are first brought into the system through a process known as &amp;ldquo;importing.&amp;rdquo; This involves converting documents into a simple HTML-like format known as GML (for &amp;ldquo;Greenstone Markup Language&amp;rdquo;), which includes any metadata associated with the document. Documents are assumed to be in the Unicode UTF-8 code (of which the ASCII characters form a subset).  
    474474&lt;/p&gt;&lt;/div&gt; 
     
    484484 
    485485 
    486 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    487  
    488 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     486&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     487 
     488&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    489489There is a separate directory for each collection, which contains five subdirectories: the original raw material (&lt;i&gt;import&lt;/i&gt;), the GML files created from this (&lt;i&gt;archives&lt;/i&gt;), the final collection as it is served to users (&lt;i&gt;index&lt;/i&gt;), a directory for use during the building process (&lt;i&gt;building&lt;/i&gt;), and one for any supporting files (&lt;i&gt;etc&lt;/i&gt;)&amp;mdash;including the configuration file that controls the collection creation procedure. Additional files might be required: for example, building a hierarchy of classifications requires a data file of sub-classifications. 
    490490&lt;/p&gt;&lt;/div&gt; 
     
    500500 
    501501 
    502 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    503  
    504 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     502&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     503 
     504&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    505505In order to identify documents internally, a unique object identifier or OID is assigned to each original source document when it is imported (formed by hashing the content, to overcome file duplication effects caused by mirroring) and stored as metadata within that document. It is important that OIDs persist throughout the index-building process&amp;mdash;so that a user's search history is unaffected by rebuilding the collection. OIDs are assigned by hashing the contents of the original source document. 
    506506&lt;/p&gt;&lt;/div&gt; 
     
    508508 
    509509 
    510 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    511  
    512 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     510&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     511 
     512&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    513513Once imported, each document is stored in its own subdirectory of &lt;i&gt;archives&lt;/i&gt;, along with any associated files&amp;mdash;for example, images. To ensure compatibility with Windows 3.0, only eight characters are used in directory and file names, which causes annoying but essentially trivial complications. 
    514514&lt;/p&gt;&lt;/div&gt; 
     
    524524 
    525525 
    526 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    527  
    528 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     526&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     527 
     528&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    529529The GML format imposes a limited amount of structure on documents. Documents are divided into paragraphs. They can be split hierarchically into sections and subsections. OIDs are extended to identify these components by appending numbers, separated by periods, to a document's OID. When a book is read, its section hierarchy is visible as the table of contents (Figure 3). Chapters, sections, subsections, and pages are all implemented simply as &amp;ldquo;sections&amp;rdquo; within the document. In some collections documents do not have a hierarchical subsection structure, but are split into pages to permit browsing within a retrieved document. 
    530530&lt;/p&gt;&lt;/div&gt; 
     
    532532 
    533533 
    534 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    535  
    536 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     534&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     535 
     536&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    537537The document structure is used for searchable indexes. There are three levels of index: &lt;i&gt;documents&lt;/i&gt;, &lt;i&gt;sections&lt;/i&gt;, and &lt;i&gt;paragraphs&lt;/i&gt;, corresponding to the distinctions that GML makes&amp;mdash;the hierarchical structure is flattened for the purposes of creating these indexes. Indexes can be of text, or metadata, or any combination. Thus you can create a searchable index of section titles, and/or authors, and/or document descriptions, as well as the document text. 
    538538&lt;/p&gt;&lt;/div&gt; 
     
    564564 
    565565 
    566 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    567  
    568 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     566&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     567 
     568&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    569569Updating an existing collection with new files in the same format is easy. For example, the raw material for the HDL is supplied in the form of HTML files marked up with &amp;lt;&amp;lt;TOC&amp;gt;&amp;gt; tags to split books into sections and subsections, and &amp;lt;&amp;lt;I&amp;gt;&amp;gt; tags to indicate where an image is to be inserted. For each book in the library there is a directory that contains a single HTML file representing the book, and separate files containing the associated images. An accompanying spreadsheet file contains the classification hierarchy; this is converted to a simple file format (using Excel's &lt;i&gt;Save As&lt;/i&gt; command). 
    570570&lt;/p&gt;&lt;/div&gt; 
     
    572572 
    573573 
    574 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    575  
    576 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     574&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     575 
     576&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    577577Since the collection exists, its directory is already set up with subdirectories &lt;i&gt;import&lt;/i&gt;, &lt;i&gt;archives&lt;/i&gt;, &lt;i&gt;building&lt;/i&gt;, &lt;i&gt;index&lt;/i&gt;, and &lt;i&gt;etc&lt;/i&gt;, and the &lt;i&gt;etc&lt;/i&gt; directory will contain a suitable collection configuration file.  
    578578&lt;/p&gt;&lt;/div&gt; 
     
    588588 
    589589 
    590 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    591  
    592 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     590&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     591 
     592&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    593593To update a collection, the new raw material is placed in the &lt;i&gt;import&lt;/i&gt; directory, in whatever form it is available. Then the &lt;i&gt;import&lt;/i&gt; process is invoked, which converts the files into GML using the specified plugins. Old material for which GML files have previously been created is not re-imported. Then the &lt;i&gt;build&lt;/i&gt; process is invoked to build the requisite indexes for the collection. Finally, the contents of the &lt;i&gt;building&lt;/i&gt; directory are moved into the &lt;i&gt;index&lt;/i&gt; directory, and the new version of the collection automatically becomes live. 
    594594&lt;/p&gt;&lt;/div&gt; 
     
    596596 
    597597 
    598 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    599  
    600 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     598&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     599 
     600&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    601601This procedure may seem cumbersome. But all the steps are necessary for efficient operation with large collections. The &lt;i&gt;import&lt;/i&gt; process could be performed on the fly during the building operation&amp;mdash;but because building indexes is a multipass operation, the often lengthy importing would be repeated several times. The &lt;i&gt;build&lt;/i&gt; process can take considerable time&amp;mdash;a day or two, for very large collections. Consequently, the results are placed in the &lt;i&gt;building&lt;/i&gt; directory so that, if the collection already exists, it will continue to be served to users in its old form throughout the building operation. 
    602602&lt;/p&gt;&lt;/div&gt; 
     
    604604 
    605605 
    606 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    607  
    608 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     606&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     607 
     608&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    609609Active users of the collection will not be disturbed when the new version becomes live&amp;mdash;they will probably not even notice. The persistent OIDs ensure that interactions remain coherent&amp;mdash;users who are examining the results of a query or browse operation will still retrieve the expected documents&amp;mdash;and if a search is actually in progress when the change takes place the program detects the resulting file-structure inconsistency and automatically and transparently re-executes the query, this time on the new version of the collection.  
    610610&lt;/p&gt;&lt;/div&gt; 
     
    620620 
    621621 
    622 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    623  
    624 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     622&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     623 
     624&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    625625The original material in the &lt;i&gt;import&lt;/i&gt; directory may be in any format, and plugins are required to process each format type. The plugins that a collection uses must be specified in the collection configuration file. The &lt;i&gt;import&lt;/i&gt; program reads the list of plugins and passes each document to each plugin in order until it finds one that can process it. When updating an existing collection, all plugins necessary to process new material should already have been specified in the configuration file.  
    626626&lt;/p&gt;&lt;/div&gt; 
     
    628628 
    629629 
    630 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    631  
    632 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     630&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     631 
     632&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    633633The building step creates the indexes for both searching and browsing. The MG software is generally used to do the searching (Witten &lt;i&gt;et al.&lt;/i&gt;, 1999), and the &lt;i&gt;mgbuild&lt;/i&gt; module is automatically invoked to create each of the indexes that is required. For example, the Humanity Development Library has three indexes, one for entire books, one for chapters, and one for section titles. Subdirectories of the &lt;i&gt;index&lt;/i&gt; directory are created for each of these indexes. 
    634634&lt;/p&gt;&lt;/div&gt; 
     
    16381638 
    16391639 
    1640 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1641  
    1642 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1640&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1641 
     1642&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16431643MG also compresses the text of the collection; and the image files are linked into the &lt;i&gt;index&lt;/i&gt; subdirectory. Now none of the material in the &lt;i&gt;import&lt;/i&gt; and &lt;i&gt;archives&lt;/i&gt; directories is needed to run the collection and can be removed from the file system (though they would be needed if the collection were rebuilt). 
    16441644&lt;/p&gt;&lt;/div&gt; 
     
    16461646 
    16471647 
    1648 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1649  
    1650 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1648&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1649 
     1650&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16511651Associated with each collection is a database stored in GDBM (Gnu database manager) format. This contains an entry for each document, giving its OID, its internal MG document number, and metadata such as title. Information for each of the browsing indexes, which appear as buttons on the Greenstone search/browse bar, is also extracted during the building process and stored in the database. A &amp;ldquo;classifier&amp;rdquo; program is required for each browsing index to extract the appropriate information from GML documents. Like plugins, classifiers are written on an &lt;i&gt;ad hoc&lt;/i&gt; basis for the particular information required, and where possible reused from one collection to another. 
    16521652&lt;/p&gt;&lt;/div&gt; 
     
    16541654 
    16551655 
    1656 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1657  
    1658 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1656&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1657 
     1658&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16591659The building program creates the indexes based on whatever appears in the &lt;i&gt;archives&lt;/i&gt; directory. The first plugin specified by all collections is one that processes GML files, and so if &lt;i&gt;archives&lt;/i&gt; contains imported files they will be processed correctly. If it contains material in the original format, that will be converted using the appropriate plugin. Thus the import process is optional. 
    16601660&lt;/p&gt;&lt;/div&gt; 
     
    16621662 
    16631663 
    1664 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1665  
    1666 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1664&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1665 
     1666&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16671667GML is designed to be fast and easy to parse, an important requirement when millions of documents are to be processed. Something as simple as requiring tags to be lower-case, for example, yields a substantial speed-up. In certain circumstances, however, it might be preferable to use a standardized format such as XML. This is straightforward to implement&amp;mdash;just write an XML plugin&amp;mdash;although we have not done so ourselves. Given the transitory nature of the imported data, to date, we have found GML a satisfactory and beneficial format. 
    16681668&lt;/p&gt;&lt;/div&gt; 
     
    16781678 
    16791679 
    1680 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1681  
    1682 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1680&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1681 
     1682&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16831683Building new collections from scratch is only slightly different from updating an existing collection. The key new requirement is creating a collection configuration file, and a software utility is provided to help. Two pieces of information are required for this: the name of the directory that the collection will use (into which the source data and other files will eventually be placed), and a contact e-mail address for use if any problems are encountered by the software once the collection is up and running. The utility creates files and directories within the newly-named directory to support a generic collection of plain text documents. With suitable data placed in the &lt;i&gt;import&lt;/i&gt; directory, building the collection at this point will yield a document-level searchable index of all the text and a browsable list of &amp;ldquo;titles&amp;rdquo; (defined in this case to be the document filenames). 
    16841684&lt;/p&gt;&lt;/div&gt; 
     
    16861686 
    16871687 
    1688 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1689  
    1690 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1688&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1689 
     1690&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    16911691To enhance the functionality and presentation&amp;mdash; something anything but the most trivial collection will require&amp;mdash;the configuration file must be edited. For a collection sourced from documents in an already supported data format, presented in a similar fashion to an existing collection, the amount of editing is minimal. Importing new data formats and browsing metadata in ways not currently supported are more complex activities that require programming skills. 
    16921692&lt;/p&gt;&lt;/div&gt; 
     
    17181718 
    17191719 
    1720 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1721  
    1722 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1720&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1721 
     1722&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17231723Figure 5b shows simple alterations to the generic configuration file in Figure 5a that was generated by the new-collection utility. &lt;i&gt;TEXTPlug&lt;/i&gt; is replaced with &lt;i&gt;EMAILPlug&lt;/i&gt; (line 7) which reads email files and extracts metadata (&lt;i&gt;From&lt;/i&gt;, &lt;i&gt;To&lt;/i&gt;, &lt;i&gt;Date&lt;/i&gt;, &lt;i&gt;Subject&lt;/i&gt;) from them. A classifier for dates is added (line 10) to make the collection browsable chronologically. The default presentation of search results is overridden (line 17) to display both the title of the message (i.e. Dublin Core &lt;i&gt;Title&lt;/i&gt;) and its sender (i.e. Dublin Core &lt;i&gt;Author&lt;/i&gt;). Elements in square brackets, such as &lt;i&gt;[Title]&lt;/i&gt;, are replaced by the metadata associated with a particular document. The built-in term &lt;i&gt;[icon]&lt;/i&gt; produces a suitable image that represents the document (such as a book icon or page icon), and the &lt;i&gt;[link]&amp;hellip;[/link]&lt;/i&gt; construct forms a hyperlink to the complete document. Anything else in the format statement, which in this case is solely table-cell tags in HTML, is passed through to the page being displayed. 
    17241724&lt;/p&gt;&lt;/div&gt; 
     
    17261726 
    17271727 
    1728 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1729  
    1730 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1728&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1729 
     1730&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17311731As this example shows, creating a new collection that stays within the bounds of the library's established capabilities falls within the capability of many computer users&amp;mdash;for instance, computer-trained librarians. Extending Greenstone to handle new document formats and browse metadata in new ways is more challenging. 
    17321732&lt;/p&gt;&lt;/div&gt; 
     
    17421742 
    17431743 
    1744 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1745  
    1746 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1744&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1745 
     1746&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17471747Extensibility  is  obtained through  plugins  and  classifiers. 
    17481748&lt;/p&gt;&lt;/div&gt; 
     
    17501750 
    17511751 
    1752 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1753  
    1754 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1752&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1753 
     1754&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17551755These are modules of code that can be slotted into the system to enhance its capabilities. Plugins parse documents, extracting the text and metadata to be indexed. Classifiers control how metadata is brought together to form browsable data structures. Both are specified in an object-oriented framework using inheritance to minimize the amount of code written. 
    17561756&lt;/p&gt;&lt;/div&gt; 
     
    17581758 
    17591759 
    1760 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1761  
    1762 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1760&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1761 
     1762&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17631763A plugin must specify three things: what file formats it can handle, how they should be parsed, and whether the plugin is recursive. File formats are normally determined using regular expression matching on the filename. For example, the HTML plugin accepts all files that end in &lt;i&gt;.htm&lt;/i&gt;, .&lt;i&gt;html&lt;/i&gt;, &lt;i&gt;.HTM&lt;/i&gt;, or &lt;i&gt;.HTML&lt;/i&gt;. (It is quite possible, however, to write plugins that &amp;ldquo;look inside&amp;rdquo; the file as well.) For other files, the plugin returns &lt;i&gt;undefined&lt;/i&gt; and the file is passed to the next plugin in the collection's configuration file (e.g. Figure 5 line 7). If it can, the plugin parses the file and returns the number of documents processed. This involves extracting text and metadata and adding it to the library's content through calls to &lt;i&gt;add text&lt;/i&gt; and &lt;i&gt;add metadata&lt;/i&gt;.  
    17641764&lt;/p&gt;&lt;/div&gt; 
     
    17661766 
    17671767 
    1768 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1769  
    1770 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1768&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1769 
     1770&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17711771Some plugins (&amp;ldquo;recursive&amp;rdquo; ones) add extra files into the stream of data processed during the building phase by artificially reactivating the list of plugins. This is how directory hierarchies are traversed.  
    17721772&lt;/p&gt;&lt;/div&gt; 
     
    17741774 
    17751775 
    1776 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1777  
    1778 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1776&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1777 
     1778&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17791779Plugins are small modules of code that are easy to write. We monitored the time it took to develop a new one that was different to any we had produced so far. We chose to make as an example a collection of HTML bookmark files, the motivation being to produce a convenient way of searching and browsing one's bookmarked Web pages. Figure 6 shows a user searching for bookmarked pages about &lt;i&gt;music&lt;/i&gt;. The new plugin took under an hour to write, and was 160 lines long (ignoring blank lines and comments)&amp;mdash;about the average length of existing plugins. 
    17801780&lt;/p&gt;&lt;/div&gt; 
     
    17821782 
    17831783 
    1784 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1785  
    1786 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1784&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1785 
     1786&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17871787Classifiers are more general than plugins because they work on GML-format data. For example, any plugin that generates date metadata in accordance with the Dublin core can request the collection to be browsable chronologically by specifying the &lt;i&gt;DateList&lt;/i&gt; classifier in the collection's configuration file (Figure 7). Classifiers are more elaborate than most plugins, but new ones are seldom required. The average length of existing classifiers is 230 lines.  
    17881788&lt;/p&gt;&lt;/div&gt; 
     
    17901790 
    17911791 
    1792 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1793  
    1794 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1792&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1793 
     1794&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    17951795Classifiers must specify three things: an initialization routine, how individual documents are classified, and the final browsable data structure. Initialization takes care of any options specified in the configuration file (such as &lt;i&gt;metadata=Title &lt;/i&gt;on line 9 of Figure 5b). Classifying individual documents is an iterative process: for each one, a call to &lt;i&gt;document-classify&lt;/i&gt; is made. On presentation of the document's OID, the necessary metadata is located and used to control where the document is added to the browsable data structure being constructed.  
    17961796&lt;/p&gt;&lt;/div&gt; 
     
    17981798 
    17991799 
    1800 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1801  
    1802 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1800&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1801 
     1802&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    18031803Once all documents have been added, a request is made for the completed data structure. Some classifiers return the data structure directly; others transform the data structure before it is returned. For example, the &lt;i&gt;AZList&lt;/i&gt; classifier divides the alphabetically sorted list of metadata into separate pages of about the same size and returns the alphabetic ranges for each one (Figure 4). 
    18041804&lt;/p&gt;&lt;/div&gt; 
     
    18301830 
    18311831 
    1832 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1833  
    1834 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1832&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1833 
     1834&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18351835Two projects that provide substantial open source digital library software are Dienst (Lagoze and Fielding, 1998) and Harvest (Bowman &lt;i&gt;et al.&lt;/i&gt;, 1994). The origins of Dienst (&lt;i&gt;www.cs.cornell.edu/cdlrg&lt;/i&gt;) stretch back to 1992. The term has come to represent three entities: a conceptual architecture for distributed digital libraries; an open protocol for service communication; and a software system that implements the protocol. To date, five sample digital libraries have been built using this technology. They manifest themselves in two forms: technical reports and primary source documents. 
    18361836&lt;/p&gt;&lt;/div&gt; 
     
    18381838 
    18391839 
    1840 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1841  
    1842 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1840&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1841 
     1842&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18431843Best known is NCSTRL, the Networked Computer Science Technical Reference Library project (&lt;i&gt;www.ncstrl.org&lt;/i&gt;). This collection facilitates searching by title, author and abstract, and browsing by year and author, across a distributed network of document repositories. Documents can (where supported) be delivered in various formats such as PostScript, a thumbnail overview of the pages, and a GIF image of a particular page. 
    18441844&lt;/p&gt;&lt;/div&gt; 
     
    18461846 
    18471847 
    1848 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1849  
    1850 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1848&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1849 
     1850&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18511851The &lt;i&gt;Making of America&lt;/i&gt; resource is an example of a collection based around primary sources_in this case American social history, 1830−1900. It has a different &amp;ldquo;look and feel&amp;rdquo; to NCSTRL, being strongly oriented toward browsing rather than searching. A user navigates their way through a hierarchical structure of hyperlinks to reach a book of interest. The book itself is a series of scanned images: delivery options include going directly to a page number, next and previous page buttons, and displaying a particular page at different resolutions. A text version of the page is also available upon which a searching option is also provided. 
    18521852&lt;/p&gt;&lt;/div&gt; 
     
    18541854 
    18551855 
    1856 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1857  
    1858 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1856&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1857 
     1858&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18591859Started in 1994, Harvest is also a long-running research project. It provides an efficient means of gathering source data from the Internet and distributing indexing information over the Internet. This is accomplished through five components: &lt;i&gt;gatherer&lt;/i&gt;, &lt;i&gt;broker&lt;/i&gt;, &lt;i&gt;indexer&lt;/i&gt;, &lt;i&gt;replicator&lt;/i&gt; and &lt;i&gt;cache&lt;/i&gt;. The first three are central to creating, updating and searching a collection; the last two help to improve performance over the Internet through transparent mirroring and caching techniques. 
    18601860&lt;/p&gt;&lt;/div&gt; 
     
    18621862 
    18631863 
    1864 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1865  
    1866 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1864&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1865 
     1866&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18671867The system is configurable and customizable. While searching is most commonly implemented using Glimpse (&lt;i&gt;glimpse.cs.arizona.edu&lt;/i&gt;), in principle any search engine that supports incremental updates and Boolean combinations of attribute-based queries can be used. It is possible to control what type of documents are gathered during creation and updating, and how the query interface looks and is laid out. 
    18681868&lt;/p&gt;&lt;/div&gt; 
     
    18701870 
    18711871 
    1872 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1873  
    1874 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1872&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1873 
     1874&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18751875Sample collections cited by the developers include 21,000 computer science technical reports and 7,000 home pages. Other examples include a sizable collection of agriculture-related electronic journals and magazines called &amp;ldquo;tomato-juice&amp;rdquo; (accessed through &lt;i&gt;hegel.lib.ncsu.edu&lt;/i&gt;) and a full-text index of library-related electronic serials (&lt;i&gt;sunsite.berkeley.edu/IndexMorganagus&lt;/i&gt;). Harvest is also often used to index Web sites (for example &lt;i&gt;www.middlebury.edu&lt;/i&gt;). 
    18761876&lt;/p&gt;&lt;/div&gt; 
     
    18781878 
    18791879 
    1880 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1881  
    1882 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1880&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1881 
     1882&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18831883Comparing Greenstone with Dienst and Harvest, there are both similarities and differences. All provide substantial digital library systems, hence common themes recur, but they are driven by projects with different aims. Harvest, for instance, was not conceived as a digital library project at all, but by virtue of its selective document gathering process it can be classed (and is used) as one. While it provides sophisticated search options, it lacks the complementary service of browsing. Furthermore it adds no structure or order to the documents collected, relying on whatever structures are present in the site that they were gathered from. A proven strength of the design is its flexibility through configuration and customization_an element also present in Greenstone. 
    18841884&lt;/p&gt;&lt;/div&gt; 
     
    18861886 
    18871887 
    1888 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1889  
    1890 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1888&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1889 
     1890&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18911891Dienst_best exemplified through the NCSTRL work_supports searching and browsing, like Greenstone. Both use open protocols. Differences include a high reliance in Dienst on user-supplied information when a document is added, and a smaller range of document types supported&amp;mdash;although Dienst does include a document model that should, over time, allow this to expand with relative ease. 
    18921892&lt;/p&gt;&lt;/div&gt; 
     
    18941894 
    18951895 
    1896 &lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1897  
    1898 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1896&lt;p&gt;&lt;div name=&quot;Plain Text&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.24mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1897 
     1898&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    18991899There are also commercial systems that provide similar digital library services to those described. However, since corporate culture instills proprietary attitudes there is little opportunity for advancement through a shared collaborative effort. Consequently they are not reviewed here. 
    19001900&lt;/p&gt;&lt;/div&gt; 
     
    19101910 
    19111911 
    1912 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1913  
    1914 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1912&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1913 
     1914&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    19151915Greenstone is a comprehensive software system for creating digital library collections. It builds data structures for searching and browsing from the material provided, rather than relying on any hand-crafting. The process is controlled by a configuration file, and once a collection exists new material can be added completely automatically. Browsing is based on Dublin Core metadata. 
    19161916&lt;/p&gt;&lt;/div&gt; 
     
    19181918 
    19191919 
    1920 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1921  
    1922 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1920&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1921 
     1922&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    19231923New collections can be developed easily, particularly if they resemble existing ones. Extensibility is achieved through software &amp;ldquo;plugins&amp;rdquo; that can be written to accommodate documents, and metadata, in different formats. Standard plugins exist for many document types; new ones are easily written. Browsing is controlled by &amp;ldquo;classifiers&amp;rdquo; that process metadata into browsing structures (by date, alphabetical, hierarchical, etc). 
    19241924&lt;/p&gt;&lt;/div&gt; 
     
    19261926 
    19271927 
    1928 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1929  
    1930 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1928&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1929 
     1930&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    19311931However, the most powerful support for extensibility is achieved not by technical means but by making the source code freely available under the Gnu public license. Only through an international cooperative effort will digital library software become sufficiently comprehensive to meet the world's needs with the richness and flexibility that users deserve. 
    19321932&lt;/p&gt;&lt;/div&gt; 
     
    19421942 
    19431943 
    1944 &lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;left&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1945  
    1946 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
     1944&lt;p&gt;&lt;div name=&quot;paragraph&quot; align=&quot;justify&quot; style=&quot;margin: 2.08mm 0.00mm 0.00mm 0.00mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1945 
     1946&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 3.819444mm; color: Black; background-color: White; &quot;&gt; 
    19471947We gratefully acknowledge all those who have worked on the Greenstone software, and all members of the New Zealand Digital Library project for their enthusiasm and ideas. 
    19481948&lt;/p&gt;&lt;/div&gt; 
     
    19591959 
    19601960&lt;ol type=&quot;1&quot;&gt; 
    1961 &lt;li value=&quot;1&quot;&gt;&lt;p&gt;&lt;div name=&quot;References&quot; align=&quot;left&quot; style=&quot;margin: 1.04mm 0.00mm 0.00mm 6.25mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1962  
    1963 &lt;p style=&quot;text-indent: -6.25mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1961&lt;li value=&quot;1&quot;&gt;&lt;p&gt;&lt;div name=&quot;References&quot; align=&quot;justify&quot; style=&quot;margin: 1.04mm 0.00mm 0.00mm 6.25mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1962 
     1963&lt;p style=&quot;text-indent: -6.25mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    19641964Akscyn, R.M. and Witten, I.H. (1998) &amp;ldquo;Report on First Summit on International Cooperation on Digital Libraries.&amp;rdquo; ks.com/idla-wp-oct98. 
    19651965&lt;/p&gt;&lt;/div&gt;&lt;/li&gt; 
     
    19671967 
    19681968 
    1969 &lt;li value=&quot;2&quot;&gt;&lt;p&gt;&lt;div name=&quot;References&quot; align=&quot;left&quot; style=&quot;margin: 1.04mm 0.00mm 0.00mm 6.25mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    1970  
    1971 &lt;p style=&quot;text-indent: -6.25mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     1969&lt;li value=&quot;2&quot;&gt;&lt;p&gt;&lt;div name=&quot;References&quot; align=&quot;justify&quot; style=&quot;margin: 1.04mm 0.00mm 0.00mm 6.25mm;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     1970 
     1971&lt;p style=&quot;text-indent: -6.25mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    19721972Bowman, C.M., Danzig, P.B., Manber, U., and Schwartz, M.F. &amp;ldquo;Scalable Internet resource discovery: Research problems and approaches&amp;rdquo; &lt;i&gt;Communications of the ACM,&lt;/i&gt; Vol. 37, No. 8, pp. 98−107, 1994. 
    19731973&lt;/p&gt;&lt;/div&gt;&lt;/li&gt; 
     
    21192119 
    21202120 
    2121 &lt;p&gt;&lt;div name=&quot;Header&quot; align=&quot;left&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    2122  
    2123 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     2121&lt;p&gt;&lt;div name=&quot;Header&quot; align=&quot;justify&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     2122 
     2123&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    21242124&lt;span style=&quot;text-transform:lowercase&quot;&gt;&lt;/span&gt; 
    21252125&lt;/p&gt;&lt;/div&gt; 
     
    21352135 
    21362136 
    2137 &lt;p&gt;&lt;div name=&quot;Footer&quot; align=&quot;left&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
    2138  
    2139 &lt;p style=&quot;text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
     2137&lt;p&gt;&lt;div name=&quot;Footer&quot; align=&quot;justify&quot; style=&quot;  padding: 0.00mm 0.00mm 0.00mm 0.00mm; &quot;&gt;  
     2138 
     2139&lt;p style=&quot;text-indent: 0.00mm; text-align: justify; line-height: 4.166667mm; color: Black; background-color: White; &quot;&gt; 
    21402140&lt;span style=&quot;text-transform:lowercase&quot;&gt;&lt;/span&gt; 
    21412141&lt;/p&gt;&lt;/div&gt; 
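
In the doc.xml hunks shown above, each removed/added pair appears to differ only in alignment: align=&quot;left&quot; becomes align=&quot;justify&quot;, and text-align: left becomes text-align: justify. A minimal sketch of a check for that pattern follows, which may help when reviewing the other regenerated model collections. The function names and regular expressions are illustrative assumptions; they are not part of wvWare, Greenstone or the diffcol tests.

```python
import re

# Matches the align attribute in either escaped (&quot;) or plain (") form.
ALIGN_ATTR = re.compile(r'align=(&quot;|")(left|justify)\1')
# Matches the CSS text-align declaration inside a style attribute.
TEXT_ALIGN = re.compile(r'text-align:\s*(left|justify)')


def normalize_alignment(line: str) -> str:
    """Replace left/justify alignment values with a fixed placeholder."""
    line = ALIGN_ATTR.sub(r'align=\1ALIGN\1', line)
    line = TEXT_ALIGN.sub('text-align: ALIGN', line)
    # Surrounding whitespace is ignored; the diff view pads lines unevenly.
    return line.strip()


def differs_only_in_alignment(old: str, new: str) -> bool:
    """True if the two lines are equal once alignment values are masked."""
    return normalize_alignment(old) == normalize_alignment(new)
```

Feeding each removed line and its added counterpart to differs_only_in_alignment should return True for every pair if the wvHtml.xml change affected nothing but justification; any pair returning False would be worth inspecting by hand.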
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Associated-Files/index/build.cfg

    r29015 r29717  
    1 builddate   1398925833 
     1builddate   1423106800 
    22buildtype   mgpp 
    3 earliestdatestamp   1398925830 
     3earliestdatestamp   1423106799 
    44indexfieldmap   text->TX    dc.Title,ex.dc.Title,Title->TI 
    55indexfields text    dc.Title,ex.dc.Title,Title 
     
    1010levelmap    document->Doc 
    1111maxnumeric  4 
    12 numbytes    112313 
     12numbytes    112757 
    1313numdocs 1 
    1414numsections 1