Timestamp:
2012-07-20T14:53:33+12:00
Author:
ak19
Message:

Corrections to misspelling of AutoText tag.

File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r25966 r25996  
    10931093<Tutorial id="pdfbox-extension">
    10941094<Title>
    1095 <Text id="pdfbox-ext-0">Setting up the PDFBox extension to process newer versions of PDF</Text>
     1095<Text id="pdfbox-ext-0">Processing newer versions of PDF with PDFBox</Text>
    10961096</Title>
    10971097<Prerequisite id="word_pdf_collection"/>
     
    11261126</NumberedItem>
    11271127<NumberedItem>
    1128 <Text id="pdfbox-ext-11">Now that you've installed the PDFBox extension, this will be available as an option in the plugin's configuration dialog. To turn on the PDFBox extension for any collection you open in GLI, you would go to the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText key="glidict::CDM.GUI.Plugins"/> from the left and on the right, double click the <Autotext text="PDFPlugin"/> (alternatively, select this plugin and click the <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> below) to open the dialog to configure this plugin. In the <AutoText key="glidict::CDM.PlugInManager.Configure"/> dialog, scroll down to the section <Autotext text="AutoLoadConverters"/> and select the checkbox next to the <Autotext text="pdfbox_conversion"/> option. Click <AutoText key="glidict::General.OK"/> to close the dialog, switch to the <AutoText key="glidict::GUI.Create"/> panel and rebuild your collection. This time, PDF files will be processed by PDFBox which will extract their text.</Text>
    1129 <Text id="pdfbox-ext-12">Try this feature out on a collection of recent PDF files, by configuring its PDFPlugin with the <Autotext text="pdfbox_conversion"/> option turned on.</Text>
    1130 <Text id="pdfbox-ext-12">You can also experiment by configuring the PDFPlugin used in the <b>Reports</b> collection, although that one contains old PDF versions which the default settings of <Autotext text="PDFPlugin"/> can already process successfully. If you do decide to test out the PDFBox extension with the <b>Reports</b> collection, then rebuild it and preview it. However, once you've inspected the results, you may wish to go back to the <AutoText key="glidict::GUI.Design"/> panel and turn off <Autotext text="pdfbox_conversion"/> and rebuild the collection once more, so that it's back to its original state and ready for future tutorials.</Text>
     1128<Text id="pdfbox-ext-11">Now that you've installed the PDFBox extension, this will be available as an option in the plugin's configuration dialog. To turn on the PDFBox extension for any collection you open in GLI, you would go to the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText key="glidict::CDM.GUI.Plugins"/> from the left and on the right, double click the <AutoText text="PDFPlugin"/> (alternatively, select this plugin and click the <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> below) to open the dialog to configure this plugin. In the <AutoText key="glidict::CDM.PlugInManager.Configure"/> dialog, scroll down to the section <AutoText text="AutoLoadConverters"/> and select the checkbox next to the <AutoText text="pdfbox_conversion"/> option. Click <AutoText key="glidict::General.OK"/> to close the dialog, switch to the <AutoText key="glidict::GUI.Create"/> panel and rebuild your collection. This time, PDF files will be processed by PDFBox which will extract their text.</Text>
     1129<Text id="pdfbox-ext-12">Try this feature out on a collection of recent PDF files, by configuring its PDFPlugin with the <AutoText text="pdfbox_conversion"/> option turned on.</Text>
     1130<Text id="pdfbox-ext-12">You can also experiment by configuring the PDFPlugin used in the <b>Reports</b> collection, although that one contains old PDF versions which the default settings of <AutoText text="PDFPlugin"/> can already process successfully. If you do decide to test out the PDFBox extension with the <b>Reports</b> collection, then rebuild it and preview it. However, once you've inspected the results, you may wish to go back to the <AutoText key="glidict::GUI.Design"/> panel and turn off <AutoText text="pdfbox_conversion"/> and rebuild the collection once more, so that it's back to its original state and ready for future tutorials.</Text>
    11311131</NumberedItem>
    11321132</Content>
     
    14801480</NumberedItem>
    14811481<NumberedItem>
    1482 <Text id="assoc-files-8">In <AutoText key="glidict::CDM.GUI.Plugins"/>, select the <Autotext text="WordPlugin"/> and press the <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> button.
     1482<Text id="assoc-files-8">In <AutoText key="glidict::CDM.GUI.Plugins"/>, select the <AutoText text="WordPlugin"/> and press the <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> button.
    14831483In the resulting popup, scroll down to find the associate_ext option, and set this option to <AutoText text="pdf" type="italics"/>.</Text>
    1484 <Text id="assoc-files-9">Note 1: as this is an option that is categorized under the <Autotext text="BasePlugin"/> heading, it is therefore an option that is available across all the plugins provided by Greenstone. In our example, we happen to be binding a PDF document to a Word document, however it could equally be used to bind MP3 versions of files to PNG artwork of album covers.</Text>
    1485 <Text id="assoc-files-10">Note 2: More than one filename extension can be provided as part of this option, separated by a comma. For example, setting the value of the associate_ext in <Autotext text="TextPlugin"/> to <Autotext text="avi,png" type="italics"/> would allow both an AVI video file (say an oral history interview) and a PNG image (say a picture of the interviewee taken at the time of the recording) to bind to a text version of the document (say representing a transcript of the interview). Both AVI and PNG versions of the file can be present at the same time, or alternatively only one of the two file types need be present, or neither, and Greenstone will process the situation accordingly.</Text>
     1484<Text id="assoc-files-9">Note 1: as this is an option that is categorized under the <AutoText text="BasePlugin"/> heading, it is therefore an option that is available across all the plugins provided by Greenstone. In our example, we happen to be binding a PDF document to a Word document, however it could equally be used to bind MP3 versions of files to PNG artwork of album covers.</Text>
     1485<Text id="assoc-files-10">Note 2: More than one filename extension can be provided as part of this option, separated by a comma. For example, setting the value of the associate_ext in <AutoText text="TextPlugin"/> to <AutoText text="avi,png" type="italics"/> would allow both an AVI video file (say an oral history interview) and a PNG image (say a picture of the interviewee taken at the time of the recording) to bind to a text version of the document (say representing a transcript of the interview). Both AVI and PNG versions of the file can be present at the same time, or alternatively only one of the two file types need be present, or neither, and Greenstone will process the situation accordingly.</Text>
    14861486<Text id="assoc-files-11">Note 3: The option <Format>associate_ext</Format> is in fact a simplified version of a more general option <Format>associate_tail_re</Format>. Using regular expression syntax, the latter provides a more powerful way of manipulating filenames. Rather than focus on just the filename extension, with <Format>associate_tail_re</Format>, one is able to group files together that share a similar filename root, but might start to differ in characters before the filename extension. For instance, the Word version of the document might be <Format>my-article.doc</Format> but the PDF version might be <Format>my-article-ver13.pdf</Format> reflecting the fact that the PDF file is saved in version 1.3 of this format. Using <Format>associate_tail_re</Format> (and a little bit of regular expression know-how!), such differences can be surmounted, and the two files still processed automatically as different versions of the same document.</Text>
    14871487</NumberedItem>
    14881488<NumberedItem>
    1489 <Text id="assoc-files-12">If you're working with structured Word documents that contain formatted headings and you want better structured and formatted HTML versions of the documents to be generated by Greenstone from the Word format, optionally set the <Format>windows_scripting</Format> option for the <Autotext text="WordPlugin"/> if building on Windows, or turn on the <Format>open_office_scripting</Format> option if this extension has been added to your Greenstone installation and either OpenOffice or LibreOffice is available on your system.</Text>
    1490 <Text id="assoc-files-13">Optionally set the <Autotext text="level1_heading" type="italics"/> to <i>heading\s*1</i>, or whatever is appropriate for your documents if they use style information for headings that deviate from the norm for Word. Repeat as is needed for <Autotext text="level2_heading" type="italics"/> and so forth. For more details on how to control sections within a Word document, see the <TutorialRef id="enhanced_word"/> tutorial.</Text>
     1489<Text id="assoc-files-12">If you're working with structured Word documents that contain formatted headings and you want better structured and formatted HTML versions of the documents to be generated by Greenstone from the Word format, optionally set the <Format>windows_scripting</Format> option for the <AutoText text="WordPlugin"/> if building on Windows, or turn on the <Format>open_office_scripting</Format> option if this extension has been added to your Greenstone installation and either OpenOffice or LibreOffice is available on your system.</Text>
     1490<Text id="assoc-files-13">Optionally set the <AutoText text="level1_heading" type="italics"/> to <i>heading\s*1</i>, or whatever is appropriate for your documents if they use style information for headings that deviate from the norm for Word. Repeat as is needed for <AutoText text="level2_heading" type="italics"/> and so forth. For more details on how to control sections within a Word document, see the <TutorialRef id="enhanced_word"/> tutorial.</Text>
    14911491</NumberedItem>
    14921492<NumberedItem>
     
    14991499<Text id="assoc-files-18">to:</Text>
    15001500<Format><td valign="top">[ex.equivDocLink][ex.equivDocIcon][ex./equivDocLink]</td></Format>
    1501 <Text id="assoc-files-19">Two things occur in this edit. The main difference is the switch from using <Autotext text="ex.srclink" type="italics"/> and <Autotext text="ex.srcicon" type="italics"/> that provides the link to the primary source document (which is the Word document), and replace it with a hyperlink around an icon to the document that Greenstone has associated as an equivalent document (which is the PDF version). The icon Greenstone chooses to show is based on the filename extension of the matching file it has found. In this case <img src="../tutorial_files/ipdf.gif"/>.</Text>
    1502 <Text id="assoc-files-20">The second (more minor) change in this edit is to simplify the statement a bit. The original uses an <Format>{Or}</Format> statement to show a thumbnail version of the document if Greenstone has one, in preference over the source icon. Since in this collection we have no thumbnails generated, it has been simplified by eliminating the <Format>{Or}</Format> combination and going straight to the <Autotext text="ex.equivDocIcon" type="italics"/> metadata item.</Text>
     1501<Text id="assoc-files-19">Two things occur in this edit. The main difference is the switch from using <AutoText text="ex.srclink" type="italics"/> and <AutoText text="ex.srcicon" type="italics"/> that provides the link to the primary source document (which is the Word document), and replace it with a hyperlink around an icon to the document that Greenstone has associated as an equivalent document (which is the PDF version). The icon Greenstone chooses to show is based on the filename extension of the matching file it has found. In this case <img src="../tutorial_files/ipdf.gif"/>.</Text>
     1502<Text id="assoc-files-20">The second (more minor) change in this edit is to simplify the statement a bit. The original uses an <Format>{Or}</Format> statement to show a thumbnail version of the document if Greenstone has one, in preference over the source icon. Since in this collection we have no thumbnails generated, it has been simplified by eliminating the <Format>{Or}</Format> combination and going straight to the <AutoText text="ex.equivDocIcon" type="italics"/> metadata item.</Text>
    15031503<Text id="assoc-files-21">Switch to the <AutoText key="glidict::GUI.Format"/> panel and edit the format statement for VList (All).</Text>
    15041504<Text id="assoc-files-22">Change:</Text>
     
    15191519 [/highlight]{If}{[dc.Creator],: [sibling(All'\, '):dc.Creator]}</td><br />
    15201520</Format>
    1521 <Text id="assoc-files-24">Note: When Greenstone encounters a file that matches the provided <Format>associate_ext</Format> value (<Format>pdf</Format> in our case), it sets the metadata value <Autotext text="ex.equivDocIcon"/> for that document to be the macro <i>_iconXXX_</i>, where <i>XXX</i> is whatever the filename extension is (so <Autotext text="_iconpdf_" type="italics"/> in our case). As long as there is an existing macro defined for that combination of the word <i>icon</i> and the filename extension, then a suitable icon will be displayed when the document appears in a VList. For <i>pdf</i> the displayed icon will be <img src="../tutorial_files/ipdf.gif"/>.</Text>
     1521<Text id="assoc-files-24">Note: When Greenstone encounters a file that matches the provided <Format>associate_ext</Format> value (<Format>pdf</Format> in our case), it sets the metadata value <AutoText text="ex.equivDocIcon"/> for that document to be the macro <i>_iconXXX_</i>, where <i>XXX</i> is whatever the filename extension is (so <AutoText text="_iconpdf_" type="italics"/> in our case). As long as there is an existing macro defined for that combination of the word <i>icon</i> and the filename extension, then a suitable icon will be displayed when the document appears in a VList. For <i>pdf</i> the displayed icon will be <img src="../tutorial_files/ipdf.gif"/>.</Text>
    15221522</NumberedItem>
    15231523</Content>
     
    33133313</NumberedItem>
    33143314<NumberedItem>
    3315 <Text id="oaiserver-14">Although the data transmitted over OAI is in the form of XML, Greenstone uses a stylesheet to transform that XML response into a user-friendly, structured web page you see when you perform the <Autotext text="Identify"/> request (thereby visiting the <AutoText text="verb=Identify" type="italics"/> response page). This allows <AutoText text="Identify" type="italics"/> and other verbs in the OAI specification to be shown in the main Greenstone OAI Server pages as link buttons. You can see these in the main Greenstone <AutoText text="oaiserver.cgi" type="italics"/> (or <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/>) page, as a row of links starting with "Identify" at the top and in the lower end of the page.</Text>
     3315<Text id="oaiserver-14">Although the data transmitted over OAI is in the form of XML, Greenstone uses a stylesheet to transform that XML response into a user-friendly, structured web page you see when you perform the <AutoText text="Identify"/> request (thereby visiting the <AutoText text="verb=Identify" type="italics"/> response page). This allows <AutoText text="Identify" type="italics"/> and other verbs in the OAI specification to be shown in the main Greenstone OAI Server pages as link buttons. You can see these in the main Greenstone <AutoText text="oaiserver.cgi" type="italics"/> (or <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/>) page, as a row of links starting with "Identify" at the top and in the lower end of the page.</Text>
    33163316<Text id="oaiserver-15">Clicking on the links will execute that verb as a request and return the response from your Greenstone OAI server as a structured web page. Try clicking on all the links.</Text>
    33173317</NumberedItem>
    33183318<NumberedItem>
    3319 <Text id="oaiserver-16">OAI defines a concept called a <Autotext text="Set"/>. In Greenstone, the OAI Set concept is mapped to the practical Greenstone collection. The link to the <AutoText text="ListSets" type="italics"/> verb will therefore request the Greenstone OAI server to list all the collections that have been enabled for OAI.</Text>
     3319<Text id="oaiserver-16">OAI defines a concept called a <AutoText text="Set"/>. In Greenstone, the OAI Set concept is mapped to the practical Greenstone collection. The link to the <AutoText text="ListSets" type="italics"/> verb will therefore request the Greenstone OAI server to list all the collections that have been enabled for OAI.</Text>
    33203320<Text id="oaiserver-17">Click on the <b>ListSets</b> button link and have a look.</Text>
    33213321<Text id="oaiserver-18">The response page for the <AutoText text="ListSets" type="italics"/> verb will show you that your backdrop collection is one of the collections available over OAI in your Greenstone repository.</Text>
    33223322</NumberedItem>
    33233323<NumberedItem>
    3324 <Text id="oaiserver-19">You will see a couple of buttons next to each collection (or <Autotext text="Set"/>) listed here. The first is <b>Identifiers</b> and the second <b>Records</b>. Click on the <b>Identifiers</b> button for the backdrop Set. This will list all the IDs of the documents contained in your OAI collection. If you look at the IDs, they look similar enough to Greenstone's internal document IDs, but with an additional prefix (<Format>oai:&lt;repositoryID&gt;:setname</Format>, where <AutoText text="repositoryID" type="italics"/> was set by you in the oai.cfg configuration file).</Text>
     3324<Text id="oaiserver-19">You will see a couple of buttons next to each collection (or <AutoText text="Set"/>) listed here. The first is <b>Identifiers</b> and the second <b>Records</b>. Click on the <b>Identifiers</b> button for the backdrop Set. This will list all the IDs of the documents contained in your OAI collection. If you look at the IDs, they look similar enough to Greenstone's internal document IDs, but with an additional prefix (<Format>oai:&lt;repositoryID&gt;:setname</Format>, where <AutoText text="repositoryID" type="italics"/> was set by you in the oai.cfg configuration file).</Text>
    33253325</NumberedItem>
    33263326<NumberedItem>
     
    33283328<Text id="oaiserver-21">As you would have specified some Dublin Core (dc) metadata for some of the images in the backdrop collection, the page that loads will display this information for each document in the collection (Set).</Text>
    33293329<Text id="oaiserver-22">Greenstone's OAI at present supports 3 metadata formats, as is explained in the comments in the oai.cfg file. Of these three, the OAI standard for Dublin Core, <AutoText text="oai_dc" type="italics"/>, is the one pertinent to this tutorial. If your collection specifies metadata for a different metadata set format, you can use the oai.cfg file to tell Greenstone how to map the metadata fields of your chosen metadata set format into the Dublin Core metadata set supported by the Greenstone OAI server (or one of the other metadata sets it supports).</Text>
    3330 <Text id="oaiserver-23">Look in the oai.cfg file again and scroll down to the section on <AutoText text="oaimapping" type="italics"/>, which will explain and provide examples for how to specify such mappings from your metadata format to one that Greenstone's OAI server uses. For instance, the <b>demo</b> collection comes enabled for OAI upon installation, and specifies some mappings from its <Autotext text="DLS" type="italics"/> metadata format to <Autotext text="OAI DC" type="italics"/>. Its <AutoText key="metadata::dls.Title"/> metadata is mapped using the following line in the oai.cfg configuration file:</Text>
     3330<Text id="oaiserver-23">Look in the oai.cfg file again and scroll down to the section on <AutoText text="oaimapping" type="italics"/>, which will explain and provide examples for how to specify such mappings from your metadata format to one that Greenstone's OAI server uses. For instance, the <b>demo</b> collection comes enabled for OAI upon installation, and specifies some mappings from its <AutoText text="DLS" type="italics"/> metadata format to <AutoText text="OAI DC" type="italics"/>. Its <AutoText key="metadata::dls.Title"/> metadata is mapped using the following line in the oai.cfg configuration file:</Text>
    33313331<Format>oaimapping dls.Title oai_dc.title</Format>
    33323332<Text id="oaiserver-24">Because the backdrop collection uses DC metadata already, no mapping is required.</Text>
     
    33383338<Text id="gli-oai-0">Connecting to an OAI server from GLI</Text>
    33393339</Title>
    3340 <Prerequisite id="simple_image_collection"/>
     3340<Prerequisite id="setting_up_GS_OAI_server"/>
    33413341<Version initial="2.85" current="2.85"/>
    33423342<Comment>
     
    33673367</NumberedItem>
    33683368<NumberedItem>
    3369 <Text id="gli-oai-9">After a while, it will have finished downloading. Change to the <AutoText key="glidict::GUI.Gather"/> panel, and on the left-hand side, open up the <AutoText key="glidict::Tree.DownloadedFiles"/>Downloaded Files folder. This is where Greenstone stores files you downloaded using the <AutoText key="glidict::GUI.Download"/> panel. In this case, it will contain a folder wherein the oai metadata files and images that you've just downloaded from your own Greenstone OAI server is stored.</Text>
     3369<Text id="gli-oai-9">After a while, it will have finished downloading. Change to the <AutoText key="glidict::GUI.Gather"/> panel, and on the left-hand side, open up the <AutoText key="glidict::Tree.DownloadedFiles"/> folder. This is where Greenstone stores files you downloaded using the <AutoText key="glidict::GUI.Download"/> panel. In this case, it will contain a folder wherein the oai metadata files and images that you've just downloaded from your own Greenstone OAI server is stored.</Text>
    33703370</NumberedItem>
    33713371<NumberedItem>
     
    33933393<Content>
    33943394<NumberedItem>
    3395 <Text id="gs-oai-3">You will want to be running the included Apache web server. So if you're on Windows and using the Local Library Server, quit it and rename the <Autotext text="server.exe" type="italics"/> application in your Greenstone installation folder to server.not. Then use the <Autotext text="Start" type="italics"/> menu shortcut to the Greenstone Server once more, to now launch the Apache web server.</Text>
    3396 </NumberedItem>
    3397 <NumberedItem>
    3398 <Text id="gs-oai-4">For this exercise, we will visit the <b>Open Archives Validator</b>, for which your OAIserver needs to provide a valid email address. In a text editor, open up your greenstone installation's etc/oai.cfg file and set the value of the <Autotext text="maintainer" type="italics"/> field to your email address.</Text>
    3399 <Text id="gs-oai-5">Note that by default, your Greenstone installation will make the <b>demo</b> collection available over OAI. This collection has been set up with a dummy (and invalid) email address for the <Autotext text="creator" type="italics"/> and <Autotext text="maintainer" type="italics"/> fields in the collection's collect.cfg file. You will need to open up collect/demo/etc/collect.cfg and clear the email values for the <Autotext text="creator" type="italics"/> and <Autotext text="maintainer" type="italics"/> properties (or else set these to a valid email again). Otherwise the OpenArchives validator will resort to using the <b>demo</b> collection's default dummy email to send the initial validation results to. Alternatively, you can simply remove the <b>demo</b> collection from being listed in the oai.cfg file's oaicollection property, which will cease to make the <b>demo</b> collection available over OAI.</Text>
    3400 <Text id="gs-oai-6">Note also that, if you wish to specify contact emails at a collection level, you will need to edit your greenstone installation's <Format>collect/&lt;collection-name&gt;/etc/collect.cfg</Format> file for those collections and set the <Autotext text="creator" type="italics"/> and <Autotext text="maintainer" type="italics"/> fields to the desired email address.</Text>
     3395<Text id="gs-oai-3">You will want to be running the included Apache web server. So if you're on Windows and using the Local Library Server, quit it and rename the <AutoText text="server.exe" type="italics"/> application in your Greenstone installation folder to server.not. Then use the <AutoText text="Start" type="italics"/> menu shortcut to the Greenstone Server once more, to now launch the Apache web server.</Text>
     3396</NumberedItem>
     3397<NumberedItem>
     3398<Text id="gs-oai-4">For this exercise, we will visit the <b>Open Archives Validator</b>, for which your OAIserver needs to provide a valid email address. In a text editor, open up your greenstone installation's etc/oai.cfg file and set the value of the <AutoText text="maintainer" type="italics"/> field to your email address.</Text>
     3399<Text id="gs-oai-5">Note that by default, your Greenstone installation will make the <b>demo</b> collection available over OAI. This collection has been set up with a dummy (and invalid) email address for the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> fields in the collection's collect.cfg file. You will need to open up collect/demo/etc/collect.cfg and clear the email values for the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> properties (or else set these to a valid email again). Otherwise the OpenArchives validator will resort to using the <b>demo</b> collection's default dummy email to send the initial validation results to. Alternatively, you can simply remove the <b>demo</b> collection from being listed in the oai.cfg file's oaicollection property, which will cease to make the <b>demo</b> collection available over OAI.</Text>
     3400<Text id="gs-oai-6">Note also that, if you wish to specify contact emails at a collection level, you will need to edit your greenstone installation's <Format>collect/&lt;collection-name&gt;/etc/collect.cfg</Format> file for those collections and set the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> fields to the desired email address.</Text>
    34013401</NumberedItem>
    34023402<NumberedItem>
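
A misspelling like the one corrected in this changeset (`Autotext` for `AutoText`) can be caught mechanically, because XML element names are case-sensitive. The following is a minimal sketch, not part of the Greenstone tooling: it scans text for tag names that match an expected name case-insensitively but are not spelled exactly that way. The function name `find_miscased_tags` and the sample input are illustrative assumptions.

```python
import re


def find_miscased_tags(xml_text, expected="AutoText"):
    """Return (line_number, tag_name) pairs for tags whose name matches
    `expected` case-insensitively but is not spelled exactly that way."""
    # Capture the element name directly after '<' or '</'.
    tag_name = re.compile(r"</?([A-Za-z_][\w.-]*)")
    hits = []
    for line_number, line in enumerate(xml_text.splitlines(), start=1):
        for match in tag_name.finditer(line):
            name = match.group(1)
            if name != expected and name.lower() == expected.lower():
                hits.append((line_number, name))
    return hits


if __name__ == "__main__":
    # Hypothetical two-line sample: the first line carries the misspelling.
    sample = (
        '<Text id="a">select the <Autotext text="PDFPlugin"/> plugin</Text>\n'
        '<Text id="b">select the <AutoText text="PDFPlugin"/> plugin</Text>\n'
    )
    for line_number, name in find_miscased_tags(sample):
        print(f"line {line_number}: <{name}> should be <AutoText>")
```

Run against a file such as tutorial_en.xml (read into a string first), a check like this would report each line still carrying a miscased tag. It is a plain text scan rather than an XML parse, so it would also flag matches inside comments or CDATA sections.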