Changeset 30877


Timestamp:
2016-10-17T19:22:16+13:00
Author:
ak19
Message:

Dr Bainbridge added an intro and conclusion, and the skeleton of another sub-exercise, yet to be filled in.

File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r30875 r30877  
    53055305<Version initial="2.71" current="2.86"/>
    53065306<Content>
    5307 <Text id="ic-00">Intro</Text>
     5307<Text id="ic-00">To allow you to quickly try out and experiment with our tutorial exercises, we tend to keep the number of sample files small. For simplicity, Greenstone's default settings mean that every time you rebuild these collections, the previously built version is removed in its entirety. We refer to this as a <i>full rebuild</i>. When building larger collections, this is inefficient.</Text>
     5308<Text id="ic-00a">Greenstone also has the ability to rebuild collections <i>incrementally</i>: this means the previous version of the collection is retained and only the detected changes need to be incorporated. Incremental building does, however, involve quite a few aspects that need to be controlled. This is the focus of this tutorial exercise.</Text>
     5309<Text id="ic-00b">To gain the best level of understanding, this tutorial builds collections using the command line.</Text>
    53085310<NumberedItem>
    53095311<Text id="ic-01">In GLI, create a new collection called <i>Incremental With Manifests</i> and base it on the <i>Demo Collection</i>. The short name of this collection will become <i>incremen</i>, and this will be the name of the collection's folder on the file system.</Text>
     
    53395341</NumberedItem>
    53405342<Heading>
    5341 <Text id="ic-06">Incrementally rebuilding your collection after adding some additional new documents to it</Text>
     5343<Text id="ic-06">Incrementally adding some additional new documents to a collection</Text>
    53425344</Heading>
    53435345<NumberedItem>
     
    53785380</NumberedItem>
    53795381<Heading>
    5380 <Text id="ic-10">Incrementally rebuilding your collection after deleting some documents from it</Text>
     5382<Text id="ic-10">Incrementally deleting some documents from a collection</Text>
    53815383</Heading>
    53825384<NumberedItem>
     
    54105412</NumberedItem>
    54115413<Heading>
    5412 <Text id="ic-14">Incrementally rebuilding your collection after editing a document's text and modifying document metadata</Text>
     5414<Text id="ic-14">Editing a document's text and metadata, and then incrementally rebuilding the collection</Text>
    54135415</Heading>
    54145416<NumberedItem>
     
    54475449<Text id="ic-19">Preview the collection once more. Check that the 2 documents contain your edits: try searching for any additional words you added. Also check the dc.Title metadata that you had modified can now be searched and appears as the title for the b20cre document in the Titles browsing classifier.</Text>
    54485450</NumberedItem>
    5449 <Text id="ic-20">&lt;CONCLUSION&gt;</Text>
    5450 <Text id="ic-20a">In this tutorial, we looked at cutting down the amount of time spent on rebuilding a collection by manually controlling the rebuild operation so that it processes only what has changed. We do so by means of a manifest that specifies exactly what files need to be rebuilt and how (whether they need to be Indexed, Deleted or Reindexed).</Text>
    5451 <Text id="ic-20b">&lt;Also mention how lucene provides incremental-buildcol too, whereas mg and mgpp only provide incremental-import.&gt;</Text>
    5452 <Text id="ic-20c">Note: There's no search highlighting in collection documents that were modified and then incrementally rebuilt.</Text>
     5451<Text id="ic-20">In this tutorial, we looked at cutting down the amount of time spent on rebuilding a collection by manually controlling the rebuild operation so that it processes only what has changed. We do so by means of a manifest that specifies exactly which files need to be rebuilt and how (whether they need to be Indexed, Deleted or Reindexed). Greenstone also has an automatic incremental rebuild feature, sparing you the need to specify a manifest file in the import phase. Omitting the manifest argument in the above exercises activates this behaviour; however, this is typically slower, because Greenstone now needs to scan the entire <Format>import</Format> folder and compare it with the information in the <Format>archives</Format> folder to determine what has changed.</Text>
     5452<Heading><Text id="ic-21">Incrementally indexing automatically</Text></Heading>
     5453<Text id="ic-22">Just as there is the command full-rebuild.pl, to completely build a collection from scratch, there is also the command incremental-rebuild.pl. The exercise you have just completed could equally have been achieved by running:</Text>
     5454<Format>perl -S incremental-rebuild.pl -site localsite THECOLL</Format>
     5455<Text id="ic-23">For every collection, the import phase can be run incrementally (either using a manifest file or automatically); however, the ability of the buildcol phase to be incremental depends on the indexer in use. The Lucene and Solr indexers support incremental indexing, but MG and MGPP do not. A warning is issued if you attempt to run the buildcol phase incrementally when the chosen indexer does not support this.</Text>
    54535456</Content>
    54545457</Tutorial>
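
The new conclusion text says that automatic incremental rebuilding has to scan the whole import folder and compare it with what was recorded at the previous build. As a rough illustration of that idea only, the Python sketch below compares two snapshots and sorts files into the three manifest categories the tutorial names (Index, Reindex, Delete). This is not Greenstone's implementation: the function `detect_changes`, the `Changes` type, the use of modification times as fingerprints, and every file name except the `b20cre` document folder are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class Changes(NamedTuple):
    """Files grouped by the action an incremental rebuild would need."""
    index: List[str]    # present now, absent at the previous build
    reindex: List[str]  # present both times, but the fingerprint differs
    delete: List[str]   # present at the previous build, absent now


def detect_changes(previous: Dict[str, float],
                   current: Dict[str, float]) -> Changes:
    """Compare two {relative path: fingerprint} snapshots.

    The fingerprint is assumed to be a modification time here; any value
    that changes when the file changes (such as a checksum) would do.
    """
    index = sorted(p for p in current if p not in previous)
    reindex = sorted(p for p in current
                     if p in previous and current[p] != previous[p])
    delete = sorted(p for p in previous if p not in current)
    return Changes(index, reindex, delete)


if __name__ == "__main__":
    # Hypothetical snapshots; only the b20cre folder name comes from the tutorial.
    previous_build = {"b20cre/b20cre.htm": 100.0,
                      "old_doc/old_doc.htm": 100.0}
    import_folder = {"b20cre/b20cre.htm": 250.0,   # edited since last build
                     "new_doc/new_doc.htm": 240.0}  # newly added
    print(detect_changes(previous_build, import_folder))
```

Every file has to be examined in both snapshots, which is why the automatic mode tends to cost more than a manifest that already lists the affected files.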