Changeset 36364


Timestamp:
2022-08-09T13:34:47+12:00
Author:
anupama
Message:

In the DEC GS2 collection manifest-demo-e, when a delete manifest file specifies the OIDs of files to delete, one does not need to manually delete the matching documents from import. I've just tested this with GS3 (Incremental Building tutorial) on Linux, and the same is true: when a manifest specifying the OIDs to delete is provided, the matching documents do not need to be deleted from the import folder for the incremental rebuild to remove the specified documents from archives and index. The OIDtype for the collection can also remain at dirname (instead of being changed to full_filename) and the incremental delete with manifests still works. So this commit updates the incremental building tutorial so that it no longer instructs the user to carry out unnecessary steps.
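
For illustration only, here is a minimal Python sketch of generating a delete manifest like the one this commit puts in the tutorial. It reproduces just the `Manifest`, `Delete` and `OID` elements visible in the diff. The function name `build_delete_manifest` is hypothetical, and any header lines the real manifest file may need are not shown in this changeset and so are not emitted here.

```python
from xml.sax.saxutils import escape


def build_delete_manifest(oids):
    """Return manifest text asking for the given OIDs to be deleted.

    Hypothetical helper: only the <Manifest>/<Delete>/<OID> nesting is taken
    from the changeset; any XML declaration or other header is left out.
    """
    lines = ["<Manifest>", "  <Delete>"]
    # One <OID> element per document identifier, XML-escaped for safety.
    lines += [f"    <OID>{escape(oid)}</OID>" for oid in oids]
    lines += ["  </Delete>", "</Manifest>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # With OIDtype left at dirname, the identifiers are the folder names.
    print(build_delete_manifest(["b18ase", "fb33fe"]))
```

With the OIDtype left at dirname, as the message describes, the identifiers passed in are the short folder-style names (`b18ase`, `fb33fe`) rather than the longer `b18ase-b18ase_htm` form.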

File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r35532 r36364  
    5500 5500   <Text id="ic-03">Do not build the collection in GLI. We'll be building and rebuilding manually, from the command-line terminal. So close GLI once the files and folders have finished copying into your collection. You can choose to run the Greenstone server at any stage, however.</Text>
    5501 5501   </NumberedItem>
         5502 + <!--
    5502 5503   <NumberedItem>
    5503 5504   <Text id="ic-04">In a text editor, open your <Format>incremen</Format> collection's <Format><MajorVersion number="2">collect.cfg</MajorVersion><MajorVersion number="3">collectionConfig.xml</MajorVersion></Format> file located in <Format><MajorVersion number="3">web\sites\localsite\</MajorVersion>collect\incremen\etc</Format>.</Text>
     
    5514 5515   </MajorVersion>
    5515 5516   </NumberedItem>
         5517 + -->
    5516 5518   <NumberedItem>
    5517 5519   <Text id="ic-05">Since this is the first time we're building our collection, we're going to do a complete build. And we'll use the command line to do so. Open a terminal. To open a terminal in Windows, press Ctrl+r and type <Format>cmd</Format> in the <b>Run</b> dialog that displays. To open a terminal on a Mac machine, click on menu <Path>Go &rarr; Utilities &rarr; Terminal</Path>. Use the terminal to <Format>cd</Format> into your Greenstone installation folder. For instance, if you have your Greenstone installed on Windows as "<i>Greenstone</i>" within your account folder at <Format>C:\Users\me</Format>, then type the following in your terminal and hit Enter:</Text>
     
    5593 5595   &lt;Manifest&gt;<br />
    5594 5596     <Tab n="1"/>&lt;Delete&gt;<br />
    5595      -     <Tab n="2"/>&lt;OID&gt;b18ase-b18ase_htm&lt;/OID&gt;<br />
         5597 +     <!--<Tab n="2"/>&lt;OID&gt;b18ase-b18ase_htm&lt;/OID&gt;<br />
    5596 5598       <Tab n="2"/>&lt;OID&gt;fb33fe-fb33fe_htm&lt;/OID&gt;<br />
         5599 +     -->
         5600 +     <Tab n="2"/>&lt;OID&gt;b18ase&lt;/OID&gt;<br />
         5601 +     <Tab n="2"/>&lt;OID&gt;fb33fe&lt;/OID&gt;<br />
    5597 5602     <Tab n="1"/>&lt;/Delete&gt;<br />
    5598 5603   &lt;/Manifest&gt;
     
    5600 5605   <Text id="ic-10b">As per the above manifest file, the operation to be performed by an incremental build is a &lt;Delete&gt; operation on two documents. For the delete operation, the documents are not indicated by the &lt;Filename&gt; XML element, but by the &lt;OID&gt; element which specifies the object identifier. We need to use the OID here because we're telling Greenstone precisely what the identifiers of the documents are that we wish to have removed from our collection. The identifiers of every built document in a Greenstone collection are specified in the Identifier field of the document's <i>doc.xml</i> file located in the collection's <Format>archives</Format> folder. The <i>doc.xml</i> file is the Greenstone-specific XML format in which Greenstone stores documents already imported.</Text>
    5601 5606   <Text id="ic-10c">For instance, to find the identifier of the <i>b18ase.htm</i> document in your built collection, open up <Format><MajorVersion number="3">web\sites\localsite\</MajorVersion>collect\incremen\archives\b18ase-b.dir\doc.xml</Format> in a text editor. Then scroll down, looking for a piece of Greenstone extracted metadata labelled <i>Identifier</i>, which is the OID for this document:</Text>
    5602      - <Format>&lt;Metadata name=&quot;Identifier&quot;&gt;b18ase-b18ase_htm&lt;/Metadata&gt;</Format>
         5607 + <!--<Format>&lt;Metadata name=&quot;Identifier&quot;&gt;b18ase-b18ase_htm&lt;/Metadata&gt;</Format>-->
         5608 + <Format>&lt;Metadata name=&quot;Identifier&quot;&gt;b18ase&lt;/Metadata&gt;</Format>
    5603 5609   <Text id="ic-10d">The above value for the document identifier is what's used in the <i>delete-some-files.xml</i> manifest file to refer to this document. This document is one of two that are to be deleted as per the manifest file. Make sure to close the <i>doc.xml</i> file if you have it open.</Text>
    5604 5610   </NumberedItem>
         5611 + <!--
    5605 5612   <NumberedItem>
    5606 5613   <Text id="ic-11">So then, let's first physically remove these two documents from our collection, so that the contents of the <Format>import</Format> folder match what the manifest specifies: use a file browser to remove the folders <i>b18ase</i> and <i>fb33fe</i> from the collection's <Format>import</Format> folder.</Text>
    5607 5614   </NumberedItem>
         5615 + -->
    5608 5616   <NumberedItem>
    5609 5617   <Text id="ic-12">Finally, let's incrementally rebuild the collection, specifying the manifest file that Greenstone should use this time to carry out the incremental build operation. As before, there are two steps.</Text>
     
    5668 5676   </Comment>
    5669 5677   <NumberedItem>
    5670      -   <Text id="ic-21">Now repeat all the above exercises in the same sequence once again, but with a new collection called <i>autoincr</i> also based on the <i>Demo</i> collection. Remember to once again set <Format>&lt;importOption name="OIDtype" value="full_filename"/&gt;</Format> in the collectionConfig.xml file and to make <Format>document</Format> level for searching the <Format>default</Format>. And build the collection the first time around with <Format>perl -S full-rebuild.pl -site localsite autoincr</Format>, also largely as before. However, this time <i>don't</i> pass in any manifest file as an argument to the subsequent rebuild commands which use the <Format>incremental-import.pl</Format> script. So you'd be running these commands after each change:</Text>
         5678 +   <Text id="ic-21">Now repeat all the above exercises in the same sequence once again, but with a new collection called <i>autoincr</i> also based on the <i>Demo</i> collection. Remember <!--to once again set <Format>&lt;importOption name="OIDtype" value="full_filename"/&gt;</Format> in the collectionConfig.xml file and--> to make <Format>document</Format> level for searching the <Format>default</Format>. And build the collection the first time around with <Format>perl -S full-rebuild.pl -site localsite autoincr</Format>, also largely as before. However, this time <i>don't</i> pass in any manifest file as an argument to the subsequent rebuild commands which use the <Format>incremental-import.pl</Format> script. So you'd be running these commands after each change:</Text>
    5671 5679     <Format>
    5672 5680       perl -S incremental-import.pl -incremental -site localsite autoincr<br />
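
The tutorial text in the diff (ic-10c) has the reader open a document's doc.xml by hand and look for the Identifier metadata. As a rough sketch of the same lookup done programmatically, the Python below scans XML text for a `Metadata` element whose `name` attribute is `Identifier`. Only that one `Metadata` line comes from the changeset. The surrounding `Archive`/`Section`/`Description` nesting in the sample and the function name `find_identifier` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed sample: only the Identifier <Metadata> line is taken from the diff;
# the enclosing elements are a guess at what a doc.xml might look like.
SAMPLE_DOC_XML = """<Archive>
  <Section>
    <Description>
      <Metadata name="Identifier">b18ase</Metadata>
      <Metadata name="Title">Sample document</Metadata>
    </Description>
  </Section>
</Archive>"""


def find_identifier(doc_xml_text):
    """Return the first Identifier metadata value found, or None."""
    root = ET.fromstring(doc_xml_text)
    # iter() walks the whole tree, so the exact nesting depth does not matter.
    for meta in root.iter("Metadata"):
        if meta.get("name") == "Identifier":
            return (meta.text or "").strip()
    return None


if __name__ == "__main__":
    print(find_identifier(SAMPLE_DOC_XML))
```

Because the search walks every `Metadata` element regardless of depth, the sketch does not depend on the assumed nesting being exactly right. It does return only the first match.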