Changeset 37641


Timestamp:
2023-04-09T20:03:02+12:00 (13 months ago)
Author:
anupama
Message:
  1. Minor updates. 2. Follow-up to the previous commit, 37640. In an earlier commit, when creating the manifest-demo-e DEC collection on Windows, we worked out that the OIDtype need not be changed from the default dirname to full_filename for incremental building to work. As a result, the delete_manifest file contains much shorter (simpler) OIDs for the documents to be deleted. This affected not just the manifest-demo-e DEC collection but also the incremental-building tutorial. The tutorial's display of the delete_manifest contents had already been updated, but not the actual delete_manifest XML file itself; that file was finally updated in commit 37640. The tutorial text, however, still referred to OIDs generated with the full_filename OIDtype (for example, in an archives directory location and in the build output). This commit changes those OID references in the incremental-building tutorial to the format of the default OIDtype, dirname.
File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r37599 r37641  
    4862 4862 <MajorVersion number="3">gs3-setup.bat</MajorVersion>
    4863 4863 </Command>
    4864      <Text id="0742">to set up the ability to run Greenstone command-line programs. On Linux/Mac, you would run <Command>source <MajorVersion number="2">setup.bash</MajorVersion><MajorVersion number="3">gs3-setup.sh</MajorVersion></Command>.</Text>
         4864 <Text id="0742">to set up the ability to run Greenstone command-line programs. On Linux/Mac, you would run <Command>source <MajorVersion number="2">./setup.bash</MajorVersion><MajorVersion number="3">./gs3-setup.sh</MajorVersion></Command>.</Text>
    4865 4865 </NumberedItem>
    4866 4866 <NumberedItem>
     
    4936 4936 </NumberedItem>
    4937 4937 <NumberedItem><Text id="ucp-11">Visit the <Link url="http://www.djvu.org/resources/djvu_digital_vs_super_hero_pdf.php">'DjVu-Digital vs. "Super Hero" PDF' page</Link>. The page compares a PDF sample document to its equivalent DjVu version and provides download links for both.</Text>
    4938      <Text id="ucp-11a">Download their <Link url="https://trac.greenstone.org/export/37353/documentation/trunk/tutorial_sample_files/unknownconverter/superhero.djvu">sample DjVu document</Link> (originally <Link url="http://www.djvu.org/docs/superhero.djvu?djvuopts&amp;zoom=page">here</Link>) into your <i>DjVu Collection</i>'s <b>import</b> folder at <MajorVersion number="2"><Path>Greenstone &rarr; collect &rarr; djvucoll &rarr; import</Path> </MajorVersion><MajorVersion number="3"><Path>Greenstone &rarr; web &rarr; sites &rarr; localsite &rarr; collect &rarr; djvucoll &rarr; import</Path></MajorVersion>.</Text>
         4938 <Text id="ucp-11a">Download their <Link url="https://trac.greenstone.org/export/37353/documentation/trunk/tutorial_sample_files/unknownconverter/superhero.djvu">sample DjVu document</Link> (originally <Link url="http://www.djvu.org/docs/superhero.djvu?djvuopts&amp;zoom=page">here</Link>) into your <i>DjVu Collection</i>'s <b>import</b> folder at <MajorVersion number="2"><Path>Greenstone &rarr; collect &rarr; djvucoll &rarr; import</Path> </MajorVersion><MajorVersion number="3"><Path>Greenstone &rarr; web &rarr; sites &rarr; localsite &rarr; collect &rarr; djvucoll &rarr; import</Path></MajorVersion>. If you're offline, you can also get this file from <Path>sample_files &rarr; djvu &rarr; superhero.djvu</Path>.</Text>
    4939 4939 </NumberedItem>
    4940 4940 <NumberedItem><Text id="ucp-12">Back in GLI, in the <b>Collection</b> view of the <AutoText key="glidict::GUI.Gather"/> pane, right click and select <AutoText key="glidict::CollectionPopupMenu.Refresh"/>. You should now see your new document "superhero.djvu" ready to be built.</Text>
     
    4998 4998 <NumberedItem><Text id="ucp-39">Greenstone doesn't have an icon for DjVu documents, since it doesn't know about the format. If you Google for the djvu icon, you'd probably find the <Link url="https://en.wikipedia.org/wiki/DjVu">Wikipedia page for it</Link>.</Text>
    4999 4999 <Text id="ucp-40">Save one of their DjVu icon images. Then open the image in Windows Paint or GIMP or another image editor, and use the application's scaling feature to scale the image's height or the width (whichever is greater) to anywhere between 26 and 32 pixels. Save the scaled image as a GIF file with the name "<Format>idjvu.gif</Format>", storing it in your Greenstone installation's <Format>web/interfaces/default/images</Format> folder. You can also use free online image resizing websites to carry out this step.</Text>
         5000 <Text id="ucp-40a">If you're working offline, you can get a resized and ready copy of the idjvu.gif file from <Path>sample_files &rarr; djvu &rarr; idjvu.gif</Path>. Put it in your Greenstone 3 installation's <Format>web/interfaces/default/images</Format> folder.</Text>
    5000 5001 </NumberedItem>
    5001 5002 <NumberedItem><Text id="ucp-41">Greenstone knows nothing about the <Format>icondjvu</Format> macro we defined as the value for UnknownConverterPlugin's <Format>srcicon</Format> field, so we have to teach Greenstone about this new macro. Use a text editor to open your Greenstone 3's <Format>web/sites/localsite/siteConfig.xml</Format> file.</Text>
     
    5010 5011 </NumberedItem>
    5011 5012 <NumberedItem><Text id="ucp-47">Having designed your collection to handle DjVu documents, you can now add any other documents, including more DjVu documents. Greenstone should now be able to index the text content of DjVu documents in the collection to make them searchable, in all instances where text can be successfully extracted from them by <Format>djvutxt</Format>.</Text>
    5012      <Text id="ucp-47a">Make the search format statement look like below, then try searching:</Text>
         5013 <Text id="ucp-47a">Make the search format statement look like below (you can copy it from <Path>sample_files &rarr; djvu &rarr; formats &rarr; format_tweaks.txt</Path>), then try searching:</Text>
    5013 5014 <Format>
    5014 5015   &lt;gsf:template match="documentNode"&gt;<br/>
     
    5696 5697 </Format>
    5697 5698 <Text id="ic-10b">As per the above manifest file, the operation to be performed by an incremental build is a &lt;Delete&gt; operation on two documents. For the delete operation, the documents are not indicated by the &lt;Filename&gt; XML element, but by the &lt;OID&gt; element which specifies the object identifier. We need to use the OID here because we're telling Greenstone precisely what the identifiers of the documents are that we wish to have removed from our collection. The identifiers of every built document in a Greenstone collection are specified in the Identifier field of the document's <i>doc.xml</i> file located in the collection's <Format>archives</Format> folder. The <i>doc.xml</i> file is the Greenstone-specific XML format in which Greenstone stores documents already imported.</Text>
    5698      <Text id="ic-10c">For instance, to find the identifier of the <i>b18ase.htm</i> document in your built collection, open up <Format><MajorVersion number="3">web\sites\localsite\</MajorVersion>collect\incremen\archives\b18ase-b.dir\doc.xml</Format> in a text editor. Then scroll down, looking for a piece of Greenstone extracted metadata labelled <i>Identifier</i>, which is the OID for this document:</Text>
         5699 <Text id="ic-10c">For instance, to find the identifier of the <i>b18ase.htm</i> document in your built collection, open up <Format><MajorVersion number="3">web\sites\localsite\</MajorVersion>collect\incremen\archives\b18ase.dir\doc.xml</Format> in a text editor. Then scroll down, looking for a piece of Greenstone extracted metadata labelled <i>Identifier</i>, which is the OID for this document:</Text>
    5699 5700 <!--<Format>&lt;Metadata name=&quot;Identifier&quot;&gt;b18ase-b18ase_htm&lt;/Metadata&gt;</Format>-->
    5700 5701 <Format>&lt;Metadata name=&quot;Identifier&quot;&gt;b18ase&lt;/Metadata&gt;</Format>
     
    5714 5715 <Format>perl -S incremental-buildcol.pl -activate <MajorVersion number="3">-site localsite</MajorVersion> incremen</Format>
    5715 5716 <Text id="ic-12d">If you were to scroll through the buildcol output in the terminal this time, you would see the following:</Text>
    5716      <Format>GreenstoneXMLPlugin: processing fb33fe-f.dir\doc.xml<br />
    5717      GreenstoneXMLPlugin: processing b18ase-b.dir\doc.xml
         5717 <Format>GreenstoneXMLPlugin: processing fb33fe.dir\doc.xml<br />
         5718 GreenstoneXMLPlugin: processing b18ase.dir\doc.xml
    5718 5719 </Format>
    5719 5720 <Text id="ic-12e">Only these 2 files were actually processed by <Format>buildcol</Format>, and that's because the manifest specified they were being deleted.</Text>
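
Illustrative aside on the ic-10c hunk above: the tutorial has the reader open doc.xml in a text editor and scroll to the Identifier metadata. The same lookup could be scripted; the following is only a minimal sketch in Python. The function name find_identifier and the trimmed sample document are hypothetical, and the wrapper elements around the Metadata line are assumed for illustration, since the changeset itself shows only the single Metadata element.

```python
import xml.etree.ElementTree as ET


def find_identifier(doc_xml_text):
    """Return the text of the first <Metadata name="Identifier"> element, or None."""
    root = ET.fromstring(doc_xml_text)
    # Search at any depth, so the exact nesting of doc.xml does not matter here.
    for meta in root.iter("Metadata"):
        if meta.get("name") == "Identifier":
            return (meta.text or "").strip()
    return None


# Hypothetical, heavily trimmed stand-in for archives\b18ase.dir\doc.xml.
# Only the Metadata line is taken from the changeset; the wrappers are assumed.
SAMPLE_DOC_XML = """<Archive>
  <Section>
    <Description>
      <Metadata name="Identifier">b18ase</Metadata>
    </Description>
  </Section>
</Archive>"""

print(find_identifier(SAMPLE_DOC_XML))
```

For a real collection, the root element could instead be obtained with ET.parse(path).getroot(), where path points at a doc.xml under the collection's archives folder. With the default dirname OIDtype described in the commit message, the value found this way is the short OID that would go into the delete manifest.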