- 2016-10-17T19:22:16+13:00
- 1 edited
r30875  r30877
5305    5305    <Version initial="2.71" current="2.86"/>
5306    5306    <Content>
5307            <Text id="ic-00">Intro</Text>
        5307    <Text id="ic-00">To allow you to quickly try out and experiment with our tutorial exercises, we tend to keep the number of sample files small. Every time you rebuild these collections, for simplicity, the default settings used in Greenstone mean that the previous version built is removed in its entirety. We refer to this as a <i>full-rebuilding</i>. When building larger collections, this is inefficient.</Text>
        5308    <Text id="ic-00a">Greenstone also has the ability to rebuild collections <i>incrementally</i>: this means the previous version of the collection is retained and only the changes detected need to be incorporated. There are, however, quite a few aspects to incremental building to control. This is the focus of this tutorial exercise.</Text>
        5309    <Text id="ic-00b">To gain the best level of understanding, this tutorial builds collections using the command line.</Text>
5308    5310    <NumberedItem>
5309    5311    <Text id="ic-01">In GLI, create a new collection called <i>Incremental With Manifests</i> and base it on the <i>Demo Collection</i>. The short name of this collection will become <i>incremen</i>, and this will be the name of the collection's folder on the file system.</Text>
…       …
5339    5341    </NumberedItem>
5340    5342    <Heading>
5341            <Text id="ic-06">Incrementally rebuilding your collection after adding some additional new documents to it</Text>
        5343    <Text id="ic-06">Incrementally </Text>
5342    5344    </Heading>
5343    5345    <NumberedItem>
…       …
5378    5380    </NumberedItem>
5379    5381    <Heading>
5380            <Text id="ic-10">Incrementally rebuilding your collection after deleting some documents from it</Text>
        5382    <Text id="ic-10">Incrementally </Text>
5381    5383    </Heading>
5382    5384    <NumberedItem>
…       …
5410    5412    </NumberedItem>
5411    5413    <Heading>
5412            <Text id="ic-14"> Incrementally rebuilding your collection after editing a document's text and modifying document metadata</Text>
        5414    <Text id="ic-14"></Text>
5413    5415    </Heading>
5414    5416    <NumberedItem>
…       …
5447    5449    <Text id="ic-19">Preview the collection once more. Check that the 2 documents contain your edits: try searching for any additional words you added. Also check the dc.Title metadata that you had modified can now be searched and appears as the title for the b20cre document in the Titles browsing classifier.</Text>
5448    5450    </NumberedItem>
5449            <Text id="ic-20"><CONCLUSION></Text>
5450            <Text id="ic-20a">In this tutorial, we looked at cutting down the amount of time spent on rebuilding a collection by manually controlling the rebuild operation so that it processes only what has changed. We do so by means of a manifest that specifies exactly what files need to be rebuilt and how (whether they need to be Indexed, Deleted or Reindexed).</Text>
5451            <Text id="ic-20b"><Also mention how lucene provides incremental-buildcol too, whereas mg and mgpp only provide incremental-import.></Text>
5452            <Text id="ic-20c">Note: There's no search highlighting in collection documents that were modified and then incrementally rebuilt.</Text>
        5451    <Text id="ic-20">In this tutorial, we looked at cutting down the amount of time spent on rebuilding a collection by manually controlling the rebuild operation so that it processes only what has changed. We do so by means of a manifest that specifies exactly what files need to be rebuilt and how (whether they need to be Indexed, Deleted or Reindexed). Greenstone also has an automatic incremental rebuild feature, sparing you the need to specify a manifest file in the import phase. Omitting the manifest argument in the above exercises activates this behaviour; however, this is typically slower, because Greenstone now needs to scan the entire <Format>import</Format> folder and compare this with the information in the <Format>archives</Format> folder to determine what has changed.</Text>
        5452    <Heading><Text id="ic-21">Incrementally indexing automatically</Text></Heading>
        5453    <Text id="ic-22">Just as there is the command full-rebuild.pl, to completely build a collection from scratch, there is also the command incremental-rebuild.pl. The exercise you have just completed could equally have been achieved by running:</Text>
        5454    <Format>perl -S incremental-rebuild.pl -site localsite THECOLL</Format>
        5455    <Text id="ic-23">For every collection, the import phase can be run incrementally (either using a manifest file or automatically); however, the ability for the buildcol phase to be incremental depends on the indexer in use. Lucene and Solr indexers support incremental indexing, but MG and MGPP do not. A warning is issued if you attempt to run the buildcol phase incrementally when the chosen indexer does not support this.</Text>
5453    5456    </Content>
5454    5457    </Tutorial>
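The tutorial text in this changeset describes a manifest that states which files are to be Indexed, Reindexed or Deleted, and contrasts it with the automatic mode, which scans the import folder and compares it with what was built before. The following is a minimal sketch of that comparison idea, not Greenstone's own code. The `Manifest`, `Index`, `Reindex`, `Delete` and `Filename` element names are assumptions based on the wording above and should be checked against the Greenstone documentation; the function names, the file paths and the fingerprint values are hypothetical.

```python
from xml.etree import ElementTree as ET


def classify_changes(previous, current):
    """Compare two snapshots of a folder.

    Each snapshot maps a relative file path to a fingerprint
    (for example a modification time or a content hash).
    """
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    # Assumed mapping: new files are indexed, edited files are
    # reindexed, and files that disappeared are deleted.
    return {"Index": added, "Reindex": changed, "Delete": deleted}


def build_manifest(changes):
    """Serialise the classified changes as a manifest-style XML string.

    The element names used here are an assumption, not a confirmed schema.
    """
    root = ET.Element("Manifest")
    for action in ("Index", "Reindex", "Delete"):
        if not changes[action]:
            continue  # omit empty sections
        group = ET.SubElement(root, action)
        for path in changes[action]:
            ET.SubElement(group, "Filename").text = path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after editing the import folder.
    before = {"b20cre/b20cre.htm": "v1", "old_doc/old_doc.htm": "v1"}
    after = {"b20cre/b20cre.htm": "v2", "new_doc/new_doc.htm": "v1"}
    print(build_manifest(classify_changes(before, after)))
```

Writing the manifest by hand, as the tutorial does, avoids the scan entirely; a helper like this only makes sense when the list of changed files is not already known.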