Changeset 30880

Timestamp:
18.10.2016 17:22:51
Author:
ak19
Message:

Some more additions to the new tutorial on incr building.

Files:
1 modified

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r30877 r30880  
    5349 5349   </NumberedItem>
    5350 5350   <NumberedItem>
    5351      - <Text id="ic-07">We want to build just the newly added documents into the collection if possible, instead of rebuilding everything. Return to the terminal you had left open. This time, instead of running <Format>full-rebuild</Format>, we'll run the <Format>incremental-import</Format> and <Format>incremental-buildcol</Format> scripts to perform the two phases of a Greenstone build operation incrementally. Incremental building allows us to (re)build just what is necessary, rather than everything.</Text>
         5351 + <Text id="ic-07">We want to build just the newly added documents into the collection if possible, instead of rebuilding everything. Return to the terminal you had left open. This time, instead of running <Format>full-rebuild</Format>, we'll run the <Format>incremental-import</Format> and <Format>incremental-buildcol</Format> scripts to perform the two phases of a Greenstone build operation incrementally, these being the <i>import</i> and <i>buildcol</i> phases. Incremental building allows us to (re)build just what is necessary, rather than everything.</Text>
    5352 5352   <Text id="ic-07a">Since we know exactly which files have been added and thus which files need to be built, we can write a manifest file specifying this. The manifest files used by the Greenstone incremental building process are just XML files that can be created and edited in a plain text editor, and which indicate which files need to be (re)processed by a Greenstone incremental build operation.</Text>
    5353 5353   <Text id="ic-07b">We've already prepared the manifest files we'll be using in this tutorial exercise for you. Use a File Browser to copy the <i>manifests</i> subfolder from the sample files folder into your <Format>incremen</Format> collection folder that's located inside your Greenstone 3 installation directory (at <Format>web\sites\localsite\collect\incremen</Format>).</Text>
     
    5450 5450   </NumberedItem>
    5451 5451   <Text id="ic-20">In this tutorial, we looked at cutting down the amount of time spent on rebuilding a collection by manually controlling the rebuild operation so that it processes only what has changed. We do so by means of a manifest that specifies exactly what files need to be rebuilt and how (whether they need to be Indexed, Deleted or Reindexed). Greenstone also has an automatic incremental rebuild feature, sparing you the need to specify a manifest file in the import phase. Omitting the manifest argument in the above exercises activates this behaviour, however, this is typically slower, because Greenstone now needs to scan the entire <Format>import</Format> folder and compare this with the information in the <Format>archives</Format> folder to determine what has changed.</Text>
         5452 + <Text id="ic-21">Now repeat all the above exercises in the same sequence once again, but with a new collection called <i>autoincr</i> also based on the <i>Demo</i> collection. But this time, don't pass in the manifest file as an argument to the <Format>import.pl</Format> script. After each incremental build, preview your autoincr collection to check that the Browsing classifiers contain the expected documents and that searching returns the expected results.</Text>
    5452 5453   <Heading><Text id="ic-21">Incrementally indexing automatically</Text></Heading>
    5453      - <Text id="ic-22">Just as there is the command full-rebuild.pl, to completely build a collection from scratch, there is also the command incremental-rebuild.pl. The exercise you have just completed could equally have been achieved by running:</Text>
    5454      - <Format>perl -S incremental-rebuild.pl -site localsite THECOLL</Format>
         5454 + <Text id="ic-22">Just as there is the command <Format>full-rebuild.pl</Format> to completely build a collection from scratch, there is also the command <Format>incremental-rebuild.pl</Format>. The exercise you have just completed could equally have been achieved by running:</Text>
         5455 + <Format>perl -S incremental-rebuild.pl -site localsite autoincr</Format>
    5455 5456   <Text id="ic-23">For every collection, the import phase can be run incrementally (either using a manifest file or automatically), however, the ability for the buildcol phase to be incremental depends on the indexer in use. Lucene and Solr indexers support incremental indexing, but MG and MGPP do not. A warning is issued if you attempt to run the buildcol phase incrementally when the chosen indexer does not support this.</Text>
    5456 5457   </Content>
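
The tutorial text in this changeset describes manifest files as plain XML that lists which files an incremental build should index, reindex or delete. As a rough illustration only, the sketch below generates such a file in Python. The element names (Manifest, Index, Reindex, Delete, Filename) are assumptions inferred from the three actions the tutorial names, and the file paths are hypothetical; check both against the prepared manifests in the sample files before relying on them.

```python
import xml.etree.ElementTree as ET


def build_manifest(index=(), reindex=(), delete=()):
    """Return manifest XML text grouping file paths by build action.

    The element names used here are assumed, not taken from the
    Greenstone sources; compare with a real manifest before use.
    """
    root = ET.Element("Manifest")
    actions = (("Index", index), ("Reindex", reindex), ("Delete", delete))
    for tag, paths in actions:
        if not paths:
            continue  # skip actions that have no files
        section = ET.SubElement(root, tag)
        for path in paths:
            ET.SubElement(section, "Filename").text = path
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_manifest(index=["import/new-doc.html"],
                         delete=["import/old-doc.html"]))
```

The printed text could be saved into the collection's manifests subfolder and passed as the manifest argument to the import phase, in the way the tutorial exercises describe.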