Changeset 26000 for documentation/trunk


Timestamp:
2012-07-23T16:08:20+12:00 (12 years ago)
Author:
ak19
Message:
  1. Merged the OAI server setup tutorial and the OAI server validation tutorial on Kathy's advice, but as they retain their individual subheadings, GS users should be able to find them. 2. Tested the tutorial with a GS2 from SVN on the Ubuntu here. 3. Corrected the expired portions of the out of date Downloading over OAI tutorial using the recently added tutorial on using GLI to download over OAI from GS' own OAIServer. The manual download section uses Kathy's more up to date instructions from the wiki.
File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r25997 r26000  
    10991099<Content>
    11001100<Comment>
    1101 <Text id="pdfbox-ext-1">Greenstone comes with the PDFPlugin which can handle older versions of PDF, but can't cope by default with newer PDF files. However, a Greenstone extension making use of <b>PDFBox</b>, an open-source PDF conversion tool, is available if you want Greenstone to extract text from more recent PDF files. This tutorial will cover how to install the PDFBox extension for Greenstone and how to switch on its functionality in the Greenstone Librarian Interface.</Text>
     1101<Text id="pdfbox-ext-1">By default the PDFPlugin can process PDF versions 1.4 and older. The PDFBox extension for Greenstone allows text from more recent PDF files to be extracted. The extension uses <b>PDFBox</b>, an open-source PDF conversion tool. This tutorial will cover how to install the PDFBox extension for Greenstone and how to switch on its functionality in the Greenstone Librarian Interface to process text from newer versions of PDF.</Text>
    11021102</Comment>
    11031103<Heading>Obtaining and installing the PDFBox extension for Greenstone</Heading>
     
    31963196</Content>
    31973197</Tutorial>
    3198 <Tutorial id="OAI_downloading">
    3199 <Title>
    3200 <Text id="0733">Downloading over OAI</Text>
    3201 </Title>
    3202 <Prerequisite id="OAI_collection"/>
    3203 <Version initial="2.60" current="2.85"/>
    3204 <Content>
    3205 <Comment>
    3206 <Text id="0734">The previous exercise did not obtain the data from an external OAI-PMH server. This missing step is accomplished either by running a command-line program or by using the <AutoText key="glidict::GUI.Download"/> panel in the Librarian Interface. This exercise shows you how to do this using both methods.</Text>
    3207 </Comment>
    3208 <Heading>
    3209 <Text id="oai-1">Downloading using the Librarian Interface</Text>
    3210 </Heading>
    3211 <NumberedItem>
    3212 <Text id="oai-2">In the Librarian Interface, switch to the <AutoText key="glidict::GUI.Download"/> panel. Select <AutoText key="glidict::DOWNLOAD.MODE.OAIDownload"/> from the list of download types on the left hand side.</Text>
    3213 </NumberedItem>
    3214 <NumberedItem>
    3215 <Text id="oai-3">In the <AutoText text="url"/> box, type in the following URL:</Text>
    3216 <Link>http://rocky.dlib.vt.edu/~jcdlpix/cgi-bin/OAI/jcdlpix.pl</Link>
    3217 </NumberedItem>
    3218 <NumberedItem>
    3219 <Text id="oai-4">We want to download the documents as well as the metadata, so tick the <AutoText text="Get document"/> checkbox.</Text>
    3220 </NumberedItem>
    3221 <NumberedItem>
    3222 <Text id="oai-5">If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Click the <AutoText key="glidict::Mirroring.Preferences" type="button"/> button. Switch on the <AutoText key="glidict::Preferences.Connection.Use_Proxy"/> checkbox. Enter the proxy server address and port number in the <AutoText key="glidict::Preferences.Connection.Proxy_Host"/> and <AutoText key="glidict::Preferences.Connection.Proxy_Port"/> boxes. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    3223 </NumberedItem>
    3224 <NumberedItem>
    3225 <Text id="oai-7">Now click <AutoText key="glidict::Mirroring.Download" type="button"/>. If you have set proxy information in <AutoText key="glidict::Menu.File_Options"/>, a popup will ask for your user name and password. Once the download has started, a progress bar appears in the lower half of the panel that reports on how the downloading process is doing.</Text>
    3226 </NumberedItem>
    3227 <NumberedItem>
    3228 <Text id="oai-8">Downloaded files are stored in a top-level folder called <AutoText key="glidict::Tree.DownloadedFiles"/> that appears on the left-hand side of the <AutoText key="glidict::GUI.Gather"/> panel. These files can then be added to a collection.</Text>
    3229 </NumberedItem>
    3230 <Heading>
    3231 <Text id="oai-9">Downloading using the command line</Text>
    3232 </Heading>
    3233 <Comment>
    3234 <Text id="oai-10">For command line downloading to work, your computer must have a direct connection to the Internet&mdash;being behind a firewall may interfere with the ability to download the information. You will need to use the Librarian Interface for downloading if you are behind a firewall.</Text>
    3235 </Comment>
    3236 <NumberedItem>
    3237 <Text id="oai-11">Close the Librarian Interface.</Text>
    3238 <Text id="oai-12">We will work with the OAI collection used in exercise <TutorialRef id="OAI_collection"/>. You may have noticed that its internal name is <AutoText text="oaiservi"/>.</Text>
    3239 </NumberedItem>
    3240 <NumberedItem>
    3241 <Text id="0736">In a text editor (e.g. WordPad), open the collection's configuration file, which is in <Path>Greenstone &rarr; collect &rarr; oaiservi &rarr; etc &rarr; collect.cfg</Path>. Add the following line (all on one line):</Text>
    3242 <Command>acquire OAI -src http://rocky.dlib.vt.edu/~jcdlpix/cgi-bin/OAI/jcdlpix.pl -getdoc</Command>
    3243 <Text id="0737">Although the position of this line is not critical, we recommend that you place it near the beginning of the file, after the public and creator lines but before the index line. Save the file and quit the editor.</Text>
    3244 </NumberedItem>
    3245 <NumberedItem>
    3246 <Text id="0738">Delete the contents of the collection's <Path>import</Path> folder. This contains the canned version of the collection files, put there during the previous exercise. Now we want to witness the data arriving anew from the external OAI server.</Text>
    3247 </NumberedItem>
    3248 <NumberedItem>
    3249 <Text id="0739">Open a DOS window to access the command-line prompt. This facility should be located somewhere within your <Menu>Start &rarr; Programs</Menu> menu, but details vary between different Windows systems. If you cannot locate it, select <Menu>Start &rarr; Run</Menu> and enter <i>cmd</i> in the popup window that appears.</Text>
    3250 </NumberedItem>
    3251 <NumberedItem>
    3252 <Text id="0742">In the DOS window, move to the home directory where you installed Greenstone. This is accomplished by something like:</Text>
    3253 <Command>cd C:\Program Files\Greenstone</Command>
    3254 </NumberedItem>
    3255 <NumberedItem>
    3256 <Text id="0743">Type:</Text>
    3257 <Command>setup.bat</Command>
    3258 <Text id="0744">to set up the ability to run Greenstone command-line programs.</Text>
    3259 </NumberedItem>
    3260 <NumberedItem>
    3261 <Text id="0745">Change directory into the folder containing the OAI Services Provider collection you built in the last exercise.</Text>
    3262 <Command>cd collect\oaiservi</Command>
    3263 <Comment>
    3264 <Text id="0746">Even though the collection name used capital letters the directory generated by the Librarian Interface is all lowercase.</Text>
    3265 </Comment>
    3266 </NumberedItem>
    3267 <NumberedItem>
    3268 <Text id="0747">Run:</Text>
    3269  <Command>perl -S importfrom.pl oaiservi</Command>
    3270 <Comment>
    3271 <Text id="0748">Greenstone will immediately set to work and generate a stream of diagnostic output. The importfrom.pl program connects to the OAI data provider specified in collection configuration file (it does this for each "acquire" line in the file) and exports all the records on that site.</Text>
    3272 </Comment>
    3273 </NumberedItem>
    3274 <NumberedItem>
    3275 <Text id="0749">The downloaded files are saved in the collection's import folder. Once the command is finished, everything is in place and the collection is ready to be built. Confirm you have successfully acquired the OAI records by rebuilding the collection.</Text>
    3276 </NumberedItem>
    3277 </Content>
    3278 </Tutorial>
    3279 <Tutorial id="setting_up_GS_OAI_server">
     3198<Tutorial id="GS_OAI_server">
    32803199<Title>
    32813200<Text id="oaiserver-0">Setting up your Greenstone OAI Server</Text>
     
    32853204<Content>
    32863205<Comment>
    3287 <Text id="oaiserver-1">Greenstone 2 collections are not enabled for OAI out of the box. To make a collection available to serve up over OAI, some minor adjustments need to be made first.</Text>
    3288 <Text id="oaiserver-2">This tutorial will look at how to make an existing collection available over OAI and how to get it validated against the Open Archives validator.</Text>
     3206<Text id="oaiserver-1">Greenstone 2 collections are not enabled for OAI out of the box. To make a collection available for serving up over OAI, some minor adjustments need to be made first. </Text>
     3207<Text id="oaiserver-2">This tutorial will look at how to make an existing collection available over OAI and how to test its accessibility by validating it against the Open Archives validator.</Text>
    32893208</Comment>
    32903209<NumberedItem>
    32913210<Text id="oaiserver-2">Use a text editor to open the file etc/oai.cfg located in your Greenstone installation folder. The oai.cfg configuration file contains properties that control the behaviour and features of your Greenstone OAI server.</Text>
    3292 <Text id="oaiserver-4">The basic properties to edit in order to get your collection served by the inbuilt OAI server are the <AutoText text="repositoryNametype" type="italics"/>, <AutoText text="repositoryIDtype" type="italics"/> and <AutoText text="oaicollection" type="italics"/>. Look up these properties in the file.</Text>
    3293 <Text id="oaiserver-5">For <AutoText text="repositoryName" type="italics"/> and <AutoText text="repositoryID" type="italics"/>, type in some values that make sense for your digital library. For example:</Text>
     3211<Text id="oaiserver-4">The basic properties to edit in order to get your collection served by the inbuilt OAI server are the <Format>repositoryName</Format>, <Format>repositoryID</Format> and <Format>oaicollection</Format>. Look up these properties in the file.</Text>
     3212<Text id="oaiserver-5">For <Format>repositoryName</Format> and <Format>repositoryID</Format>, type in some values that make sense for your digital library. For example:</Text>
    32943213<Format>repositoryName "Greenstone"<br />
    32953214repositoryID "greenstone"</Format>
     
    33083227<Text id="oaiserver-10">Press the <AutoText text="Enter Library"/> button and you will end up on your Digital Library home page as usual. Adjust the URL so that instead of the <AutoText text="library.cgi" type="italics"/> suffix, it says <AutoText text="oaiserver.cgi" type="italics"/>.</Text>
    33093228<Text id="oaiserver-11">The page that loads now will contain an error message (<AutoText text="badVerb" type="italics"/>) saying that you've provided an illegal OAI verb. This is because the OAI specification requires you to provide more instruction in the URL as to what you want. The specification defines verbs and possible arguments to them.</Text>
    3310 <Text id="oaiserver-12">A basic verb is <AutoText text="Identify" type="italics"/>, which requests the OAI server to return some information about the OAI repository it's serving. Adjust the URL once more by suffixing <AutoText text="?verb=Identify" type="italics"/> so that your URL now looks like:</Text>
     3229<Text id="oaiserver-12">A basic verb is <AutoText text="Identify" type="italics"/>, which requests the OAI server to return some information about the OAI repository that it's serving. Adjust the URL once more by suffixing <AutoText text="?verb=Identify" type="italics"/>, so that your URL now looks like:</Text>
    33113230<Format>http://&lt;domain&gt;/greenstone/cgi-bin/oaiserver.cgi?verb=Identify</Format>
    33123231<Text id="oaiserver-13">Visiting this page now gives some information about your Greenstone OAI repository.</Text>
    33133232</NumberedItem>
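The verb-based URLs in this step follow a simple pattern: the base URL of oaiserver.cgi followed by a query string carrying `verb` and any arguments. A minimal Python sketch of that pattern is given here. The helper name `build_oai_url` and the host `localhost:8080` are assumptions for illustration and are not part of Greenstone; `metadataPrefix` and `set` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def build_oai_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical base URL; substitute your own host and port.
BASE = "http://localhost:8080/greenstone/cgi-bin/oaiserver.cgi"

identify_url = build_oai_url(BASE, "Identify")
records_url = build_oai_url(
    BASE, "ListRecords", metadataPrefix="oai_dc", set="backdrop"
)
print(identify_url)
print(records_url)
```

Pasting either printed URL into a browser should be equivalent to editing the address by hand as described above.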
    33143233<NumberedItem>
    3315 <Text id="oaiserver-14">Although the data transmitted over OAI is in the form of XML, Greenstone uses a stylesheet to transform that XML response into a user-friendly, structured web page you see when you perform the <AutoText text="Identify"/> request (thereby visiting the <AutoText text="verb=Identify" type="italics"/> response page). This allows <AutoText text="Identify" type="italics"/> and other verbs in the OAI specification to be shown in the main Greenstone OAI Server pages as link buttons. You can see these in the main Greenstone <AutoText text="oaiserver.cgi" type="italics"/> (or <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/>) page, as a row of links starting with "Identify" at the top and in the lower end of the page.</Text>
     3234<Text id="oaiserver-14">Although the data transmitted over OAI is in the form of XML, Greenstone uses a stylesheet to transform that XML response into the user-friendly, structured web page that you see when you perform the <AutoText text="Identify"/> request (as happens when you visit the <AutoText text="verb=Identify" type="italics"/> response page). This allows <AutoText text="Identify" type="italics"/> and other verbs in the OAI specification to be shown in the main Greenstone OAI Server pages as link buttons. You can see these verbs represented in the main Greenstone <AutoText text="oaiserver.cgi" type="italics"/> (or <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/>) page as a row of links, starting with "Identify", at both the top and the bottom of the page.</Text>
    33163235<Text id="oaiserver-15">Clicking on the links will execute that verb as a request and return the response from your Greenstone OAI server as a structured web page. Try clicking on all the links.</Text>
    33173236</NumberedItem>
    33183237<NumberedItem>
    33193238<Text id="oaiserver-16">OAI defines a concept called a <AutoText text="Set"/>. In Greenstone, the OAI Set concept is mapped to the practical Greenstone collection. The link to the <AutoText text="ListSets" type="italics"/> verb will therefore request the Greenstone OAI server to list all the collections that have been enabled for OAI.</Text>
    3320 <Text id="oaiserver-17">Click on the <b>ListSets</b> button link and have a look.</Text>
    3321 <Text id="oaiserver-18">The response page for the <AutoText text="ListSets" type="italics"/> verb will show you that your backdrop collection is one of the collections available over OAI in your Greenstone repository.</Text>
    3322 </NumberedItem>
    3323 <NumberedItem>
    3324 <Text id="oaiserver-19">You will see a couple of buttons next to each collection (or <AutoText text="Set"/>) listed here. The first is <b>Identifiers</b> and the second <b>Records</b>. Click on the <b>Identifiers</b> button for the backdrop Set. This will list all the IDs of the documents contained in your OAI collection. If you look at the IDs, they look similar enough to Greenstone's internal document IDs, but with an additional prefix (<Format>oai:&lt;repositoryID&gt;:setname</Format>, where <AutoText text="repositoryID" type="italics"/> was set by you in the oai.cfg configuration file).</Text>
    3325 </NumberedItem>
    3326 <NumberedItem>
    3327 <Text id="oaiserver-20">Click the browser Back button to get back to the ListSets page and press the <b>Records</b> button located next to the backdrop collection created in <b>A Simple image collection</b> tutorial.</Text>
    3328 <Text id="oaiserver-21">As you would have specified some Dublin Core (dc) metadata for some of the images in the backdrop collection, the page that loads will display this information for each document in the collection (Set).</Text>
    3329 <Text id="oaiserver-22">Greenstone's OAI at present supports 3 metadata formats, as is explained in the comments in the oai.cfg file. Of these three, the OAI standard for Dublin Core, <AutoText text="oai_dc" type="italics"/>, is the one pertinent to this tutorial. If your collection specifies metadata for a different metadata set format, you can use the oai.cfg file to tell Greenstone how to map the metadata fields of your chosen metadata set format into the Dublin Core metadata set supported by the Greenstone OAI server (or one of the other metadata sets it supports).</Text>
    3330 <Text id="oaiserver-23">Look in the oai.cfg file again and scroll down to the section on <AutoText text="oaimapping" type="italics"/>, which will explain and provide examples for how to specify such mappings from your metadata format to one that Greenstone's OAI server uses. For instance, the <b>demo</b> collection comes enabled for OAI upon installation, and specifies some mappings from its <AutoText text="DLS" type="italics"/> metadata format to <AutoText text="OAI DC" type="italics"/>. Its <AutoText key="metadata::dls.Title"/> metadata is mapped using the following line in the oai.cfg configuration file:</Text>
     3239<Text id="oaiserver-17">Click on the <b>ListSets</b> link and have a look.</Text>
     3240<Text id="oaiserver-18">The response page for the <AutoText text="ListSets" type="italics"/> verb will show you that your <b>backdrop</b> collection (created in the <b>Simple image collection</b> tutorial) is one of the collections available over OAI in your Greenstone repository.</Text>
     3241</NumberedItem>
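Behind the styled ListSets page is a plain OAI-PMH XML response. The sketch below shows one way an OAI client might pull the set names out of such a response using Python's standard library. The sample XML is a hand-written illustration, not captured Greenstone output, and the function name `list_sets` is an assumption; the namespace URI is the one defined by the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written illustrative response; a real server returns more elements
# (responseDate, request, and so on).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>backdrop</setSpec><setName>backdrop</setName></set>
    <set><setSpec>demo</setSpec><setName>demo</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_text)
    sets = []
    for set_el in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = set_el.findtext("oai:setSpec", namespaces=OAI_NS)
        name = set_el.findtext("oai:setName", namespaces=OAI_NS)
        sets.append((spec, name))
    return sets


found_sets = list_sets(SAMPLE_RESPONSE)
print(found_sets)
```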
     3242<NumberedItem>
     3243<Text id="oaiserver-19">You will see a couple of buttons next to each collection (or <AutoText text="Set"/>) listed here. The first is <b>Identifiers</b> and the second <b>Records</b>. Click on the <b>Identifiers</b> button for the <b>backdrop</b> Set. This will list all the IDs of the documents contained in your OAI collection. If you look at the IDs, they look similar enough to Greenstone's internal document IDs, but with an additional prefix (<Format>oai:&lt;repositoryID&gt;:&lt;setname&gt;</Format>, where <Format>repositoryID</Format> was set by you in the <AutoText text="oai.cfg" type="italics"/> configuration file, and <Format>setname</Format> is the name of the collection).</Text>
     3244</NumberedItem>
     3245<NumberedItem>
     3246<Text id="oaiserver-20">Click the browser Back button to get back to the ListSets page and press the <b>Records</b> button located next to the backdrop collection.</Text>
     3247<Text id="oaiserver-21">If you specified some Dublin Core (dc) metadata for each image in the <b>backdrop</b> collection, then the page that loads will display this information for each document in the collection (Set).</Text>
     3248<Text id="oaiserver-22">Greenstone's OAI at present supports 3 metadata formats, as is explained in the instructive comments in the oai.cfg file. Of these three, the OAI standard for Dublin Core, <AutoText text="oai_dc" type="italics"/>, is the one pertinent to this tutorial. If your collection specifies metadata for a different metadata set format, you can use the oai.cfg file to tell Greenstone how to map the metadata fields of your chosen metadata set format into the Dublin Core metadata set supported by the Greenstone OAI server (or one of the other metadata sets it supports).</Text>
     3249<Text id="oaiserver-23">Look in the oai.cfg file again and scroll down to the section on <Format>oaimapping</Format>, which will explain and provide examples for how to specify such mappings from your metadata format to one that Greenstone's OAI server uses. For instance, the <b>demo</b> collection comes enabled for OAI upon installation, and specifies some mappings from its <AutoText text="DLS" type="italics"/> metadata format to <AutoText text="OAI DC" type="italics"/>. Its <AutoText key="metadata::dls.Title"/> metadata is mapped to <AutoText text="oai_dc.title"/> using the following line in the oai.cfg configuration file (note the use of case):</Text>
    33313250<Format>oaimapping dls.Title oai_dc.title</Format>
    33323251<Text id="oaiserver-24">Because the backdrop collection uses DC metadata already, no mapping is required.</Text>
    33333252</NumberedItem>
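To check which mappings a configuration file declares, the `oaimapping` lines can be read with a few lines of Python. This is only a sketch: it assumes each mapping is written as three whitespace-separated tokens, as in the example line above, and that `#` starts a comment. The function name `parse_oaimappings` and the sample text are illustrative and do not reproduce a real oai.cfg.

```python
def parse_oaimappings(cfg_text):
    """Return {source_field: target_field} for every oaimapping line."""
    mappings = {}
    for line in cfg_text.splitlines():
        parts = line.split()
        # Assumed layout: oaimapping <source field> <target field>
        if len(parts) == 3 and parts[0] == "oaimapping":
            mappings[parts[1]] = parts[2]
    return mappings


# Illustrative fragment only, not a complete oai.cfg.
SAMPLE_CFG = """\
# example settings
repositoryName "Greenstone"
repositoryID "greenstone"
oaimapping dls.Title oai_dc.title
"""

mappings = parse_oaimappings(SAMPLE_CFG)
print(mappings)
```

Field names are kept exactly as written, since the mapping line is case sensitive.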
     3253<Heading>
     3254<Text id="gs-oai-0">Validating the Greenstone OAI server</Text>
     3255</Heading>
     3256<Comment>
     3257<Text id="gs-oai-1">In this section, you'll be testing that you've set up your Greenstone OAI server correctly so that it's accessible over OAI. For this part of the exercise, you need to be on a networked computer and your host computer needs to be visible to the outside world. (That is, when you provide the full name of your computer, someone else in the world should be able to find that computer by typing its URL into their browser's address field.)</Text>
     3258</Comment>
     3259<Comment>
     3260<Text id="gs-oai-2">We'll be using an external OAI client to access our up-and-running Greenstone OAI server. It's not just any OAI client either, but an OAI Server validator.</Text>
     3261</Comment>
     3262<NumberedItem>
     3263<Text id="gs-oai-3">You will want to be running the included Apache web server. So if you're on Windows and using the Local Library Server, quit it and rename the <AutoText text="server.exe" type="italics"/> application in your Greenstone installation folder to <i>server.not</i>. Then use the <AutoText text="Start" type="italics"/> menu shortcut to the Greenstone Server once more, to now launch the Apache web server.</Text>
     3264</NumberedItem>
     3265<NumberedItem>
     3266<Text id="gs-oai-4">For this exercise, we will be visiting the <b>Open Archives Validator</b>, for which your OAI server needs to provide a valid email address. In a text editor, open up your Greenstone installation's etc/oai.cfg file and set the value of the <Format>maintainer</Format> field to your email address.</Text>
     3267<Text id="gs-oai-5">Note that by default, your Greenstone installation will make the <b>demo</b> collection available over OAI. This collection has been set up with a dummy (and invalid) email address for the <Format>creator</Format> and <Format>maintainer</Format> fields in the collection's collect.cfg file. You will need to open up collect/demo/etc/collect.cfg and clear the email values for the <Format>creator</Format> and <Format>maintainer</Format> properties (or else set these to a valid email address). Otherwise the Open Archives validator will resort to using the <b>demo</b> collection's default dummy email to send the initial validation results to. Alternatively, you can simply remove the <b>demo</b> collection from the oai.cfg file's <Format>oaicollection</Format> property, which will stop the <b>demo</b> collection from being served over OAI.</Text>
     3268<Text id="gs-oai-6">Note also that, if you wish to specify contact emails at a collection level, you will need to edit your Greenstone installation's <Format>collect/&lt;collection-name&gt;/etc/collect.cfg</Format> file for those collections and set the <Format>creator</Format> and <Format>maintainer</Format> fields to the desired email address.</Text>
     3269</NumberedItem>
     3270<NumberedItem>
     3271<Text id="gs-oai-7">If your collection contains document items for which you have not assigned any (Dublin Core, <b>dc</b>) metadata, the OAI validation can fail because it is dependent on having Metadata Formats listed even on a per record (per document) basis. Therefore, if your document has no <b>dc</b> metadata assigned, Greenstone won't know what OAI-supported metadata format is used by that document in order to list it.</Text>
     3272<Text id="gs-oai-8">In practice, this means that you either have to assign one or more <Format>dc.*</Format> metadata to each document in your OAI collection, or you will have to set up an <Format>oaimapping</Format> in the oai.cfg file to map existing metadata of whichever format to <Format>dc.*</Format> metadata.</Text>
     3273<Text id="gs-oai-9">For instance, if you created an image collection without assigning any metadata and are happy to use the Title or Source metadata that Greenstone extracted for each image (<AutoText key="metadata::ex.Title"/>, <AutoText key="metadata::ex.SourceFile"/>) as the image document's "title", you could map either of these metadata to <AutoText key="metadata::dc.Title"/> in the file oai.cfg. To do so, you'd open up oai.cfg in an editor, go down to the section specifying the oaimapping properties and add a new line:</Text>
     3274<Format>oaimapping Title oai_dc.title</Format>
     3275<Text id="gs-oai-10">(Or: <Format>oaimapping SourceFile oai_dc.title</Format>).</Text>
     3276<Text id="gs-oai-11">This step will not be necessary for the <b>backdrop</b> collection <i>if</i> you have assigned some <Format>dc.*</Format> metadata to each image in the collection.</Text>
     3277<Comment>Note: If the <b>demo</b> collection that comes with a Greenstone installation is not built, it will either need to be built before submitting your OAI server for inspection by the Open Archives validator, or you will need to adjust the oai.cfg file once more by removing the mention of <Format>demo</Format> from the <Format>oaicollection</Format> property. This is because the <b>demo</b> collection is listed in the oai.cfg file as being set up for OAI. If this collection is unbuilt, it will not be accessible to the OAI validator, and so your OAI server may fail some tests.</Comment>
     3278</NumberedItem>
     3279<NumberedItem>
     3280<Text id="gs-oai-12">If you are working with legacy collections (built before Greenstone version 2.85) you may have to rebuild them if you plan to make them available over OAI and be compliant with the Open Archives validator. Rebuilding old collections will recalculate the <AutoText text="earliest datestamp"/> value for the repository. This calculation is different from Greenstone 2.85 onwards.</Text>
     3281</NumberedItem>
     3282<NumberedItem>
     3283<Text id="gs-oai-13">Next you will need to set up your Greenstone server to be accessible from outside, so that external OAI clients can access it.</Text>
     3284<Text id="gs-oai-14">Go to the <Path>File &rarr; Settings</Path> menu of your Greenstone server interface dialog and check the <AutoText text="Allow External Connections"/> option and also check the <AutoText text="Get local IP and resolve to a name"/> option (or the <AutoText text="Get local IP"/> option) as its address resolution method.</Text>
     3285</NumberedItem>
     3286<NumberedItem>
     3287<Text id="gs-oai-15">Press the button in the Greenstone Server Interface dialog that says <AutoText text="Enter Library"/> (or it may say <AutoText text="Restart Library"/>). Your Digital Library home page will open up in a browser tab. Adjust this URL to have a suffix of <Format>oaiserver.cgi</Format> in place of the terminating <Format>library.cgi</Format>, then copy the resulting URL and visit <Link>http://www.openarchives.org/Register/ValidateSite</Link>.</Text>
     3288</NumberedItem>
     3289<NumberedItem>
     3290<Text id="gs-oai-16">The Open Archives Validator page will request the URL to your Greenstone OAI server. Paste the URL you have in your copy buffer into the field provided for this, and press the <b>Validate baseURL</b> button to start running the tests. You will be told to check your email to continue the remaining tests and get the validation report.</Text>
     3291<Text id="gs-oai-17">If the validator does not recognise the URL, make sure you have given the full domain of your host machine rather than just the host name. Alternatively, visit the <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/> page again and check that it works. If it doesn't, maybe your machine is not set up to be accessible to outside networks. Check your proxy settings, make sure you've set up port forwarding and that your firewall is not interfering.</Text>
     3292</NumberedItem>
    33343293</Content>
    33353294</Tutorial>
    3336 <Tutorial id="connecting_GLI_to_OAI_server">
     3295<Tutorial id="OAI_downloading">
    33373296<Title>
    3338 <Text id="gli-oai-0">Connecting to an OAI server from GLI</Text>
     3297<Text id="0733">Downloading over OAI</Text>
    33393298</Title>
    3340 <Prerequisite id="setting_up_GS_OAI_server"/>
    3341 <Version initial="2.85" current="2.85"/>
     3299<Prerequisite id="GS_OAI_server"/>
     3300<Version initial="2.60" current="2.85"/>
    33423301<Content>
    33433302<Comment>
    3344 <Text id="gli-oai-1">GLI can serve as an OAI client application: it can connect to a remote OAI server and retrieve metadata, even download documents. In the previous tutorial, we set up the Greenstone's OAI server and set up the backdrop collection to be served over OAI. In this tutorial we will use GLI to connect to that OAI server and download OAI metadata for the <b>A Simple image collection</b> and even download its documents.</Text>
    3345 </Comment>
     3303<Text id="0734">GLI can serve as an OAI client application: it can connect to a remote OAI server and retrieve metadata, even download documents. The tutorial <TutorialRef id="OAI_collection"/> did not obtain the data from an external OAI-PMH server. This missing step is accomplished either by running a command-line program or by using the <AutoText key="glidict::GUI.Download"/> panel in the Librarian Interface. This exercise explains how you would do this using both methods. In the previous exercise, we set up the Greenstone server to serve the <b>Simple image collection (backdrop)</b> over OAI. In this tutorial, we will use GLI to connect to that OAI server and download OAI metadata for the <b>Simple image collection</b> and even download its documents. The principle is the same if you wish to connect to other OAI servers.</Text>
     3304</Comment>
     3305<Heading>
     3306<Text id="gli-oai-1">Downloading using the Librarian Interface</Text>
     3307</Heading>
    33463308<NumberedItem>
    33473309<Text id="gli-oai-2">Launch GLI. This should launch the Greenstone server as well, if this is not already running, so that the OAI server is also up and running.</Text>
     
    33513313</NumberedItem>
    33523314<NumberedItem>
    3353 <Text id="gli-oai-4">On the right, set the Source URL field to contain the URL to your Greenstone OAI server. It would be of the form</Text>
     3315<Text id="gli-oai-4">On the right, set the <AutoText text="Source URL"/> field to contain the URL to your Greenstone OAI server. It would be of the form</Text>
    33543316<Format>http://&lt;hostname:portnumber&gt;/greenstone/cgi-bin/oaiserver.cgi</Format>
    3355 <Text id="gli-oai-4a">Make sure that you can generally access this URL from your browser.</Text>
    3356 </NumberedItem>
    3357 <NumberedItem>
    3358 <Text id="gli-oai-5">If at this stage you press the <AutoText key="glidict::Download.ServerInformation"/> button (in the central row of buttons), a dialog will pop up with basic details about the OAI server. At the end, it will diplay the names of the sets available at the OAI Server. In our example, <AutoText text="backdrop" type="italics"/> would be listed as one of the setNames.</Text>
    3359 </NumberedItem>
    3360 <NumberedItem>
    3361 <Text id="gli-oai-6">Tick the <AutoText key="perlmodules::OAIDownload.metadata_prefix_disp"/> checkbox as well as the <AutoText key="perlmodules::OAIDownload.set_disp"/> checkbox. For the latter, type backdrop for the set name. Then tick <AutoText key="perlmodules::OAIDownload.get_doc_disp"/>, <AutoText key="perlmodules::OAIDownload.get_doc_exts_disp"/> and add jpg to the list of comma separated values for it so that it becomes</Text>
     3317<Text id="gli-oai-5">Make sure that you can generally access this URL from your browser.</Text>
     3318</NumberedItem>
     3319<NumberedItem>
     3320<Text id="gli-oai-6">If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Click the <AutoText key="glidict::Mirroring.Preferences" type="button"/> button. Switch on the <AutoText key="glidict::Preferences.Connection.Use_Proxy"/> checkbox. Enter the proxy server address and port number in the <AutoText key="glidict::Preferences.Connection.Proxy_Host"/> and <AutoText key="glidict::Preferences.Connection.Proxy_Port"/> boxes. Click <AutoText key="glidict::General.OK" type="button"/> to get back to the <AutoText key="glidict::DOWNLOAD.MODE.OAIDownload"/> section of the <AutoText key="glidict::GUI.Download"/> panel. </Text>
     3321</NumberedItem>
     3322<NumberedItem>
     3323<Text id="gli-oai-7">If at this stage you press the <AutoText key="glidict::Download.ServerInformation"/> button (in the central row of buttons), a dialog will pop up with basic details about the OAI server. At the end, it will display the names of the sets available via that OAI Server. In our example, <AutoText text="backdrop" type="italics"/> (the Simple Image collection) would be listed as one of the setNames.</Text>
     3324</NumberedItem>
     3325<NumberedItem>
     3326<Text id="gli-oai-8">Tick the <AutoText key="perlmodules::OAIDownload.metadata_prefix_disp"/> checkbox as well as the <AutoText key="perlmodules::OAIDownload.set_disp"/> checkbox. For the latter, type <AutoText text="backdrop" type="italics"/> for the <AutoText text="set"/> name. Then tick <AutoText key="perlmodules::OAIDownload.get_doc_disp"/>, <AutoText key="perlmodules::OAIDownload.get_doc_exts_disp"/> and add <AutoText text="jpg" type="italics"/> to the list of comma separated values for it so that it becomes</Text>
    33623327<Format>jpg,doc,pdf,ppt</Format>
    3363 <Text id="gli-oai-7">Next, tick <AutoText key="perlmodules::OAIDownload.max_records_disp"/> and set it to 10. There will be 9 images in the collection, so we don't really need to set the Max records value, but this is a helpful feature that you can use when downloading from an OAI server.</Text>
    3364 </NumberedItem>
    3365 <NumberedItem>
    3366 <Text id="gli-oai-8">Finally, press the <AutoText key="glidict::Mirroring.Download"/> button that's located beside the <AutoText key="glidict::Download.ServerInformation"/> button. GLI will start downloading oai metadata. Moreover, because we have ticked the <AutoText key="perlmodules::OAIDownload.get_doc_disp"/> checkbox, it will also be retrieving actual documents, but not more than 10, because of the limit of 10 that we've placed on the number of records to download.</Text>
    3367 </NumberedItem>
    3368 <NumberedItem>
    3369 <Text id="gli-oai-9">After a while, it will have finished downloading. Change to the <AutoText key="glidict::GUI.Gather"/> panel, and on the left-hand side, open up the <AutoText key="glidict::Tree.DownloadedFiles"/> folder. This is where Greenstone stores files you downloaded using the <AutoText key="glidict::GUI.Download"/> panel. In this case, it will contain a folder in which the oai metadata files and images that you've just downloaded from your own Greenstone OAI server are stored.</Text>
    3370 </NumberedItem>
    3371 <NumberedItem>
    3372 <Text id="gli-oai-10">You can now drag and drop these downloaded files into a new Greenstone collection. Because there are <Format>*.oai</Format> files among them, GLI will offer to add the <AutoText text="OAIPlugin"/>. Accept, and go to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel. There, you will find <AutoText text="OAIPlugin"/> at the end of your plugin list. Select it and press the <AutoText key="glidict::CDM.Move.Move_Up"/> button so that it is listed above the <AutoText text="EmbeddedMetadataPlugin"/>. Because <AutoText text="OAIPlugin"/> appears earlier in the plugin pipeline, it processes the metadata in the oai files, rather than letting the more general <AutoText text="EmbeddedMetadataPlugin"/> process their contents.</Text>
    3373 </NumberedItem>
    3374 <NumberedItem>
    3375 <Text id="gli-oai-11">Move onto the <AutoText key="glidict::GUI.Create"/> panel and press the build button. During this stage, the <AutoText text="OAIPlugin"/> will extract the metadata in the oai files and attach them to the associated jpg file. You can see this once the collection has been built, by switching to the <AutoText key="glidict::GUI.Enrich"/> panel and clicking on an oai file, as no metadata is set for such files. If you then click on a jpg file and scroll down, there will be metadata names that start with <Format>ex.dc</Format>. This refers to Greenstone-extracted Dublin Core metadata.  <AutoText key="metadata::ex.dc.Description"/> and  <AutoText key="metadata::ex.dc.Title"/> will be set to the values you had assigned the images in the tutorial <b>A Simple Image Collection</b>. Greenstone will have added additional <Format>ex.dc</Format> metadata in the form of <AutoText key="metadata::ex.dc.Identifier"/>, which is the source URL for this image.</Text>
    3376 </NumberedItem>
    3377 <NumberedItem>
    3378 <Text id="gli-oai-12">If you wish, you can now set up this collection in a manner similar to how the <b>backdrop</b> collection was set up in <b>A Simple Image Collection</b>. Don't forget to copy any specific format statements, then rebuild it and <b>Preview</b> the collection.</Text>
    3379 </NumberedItem>
    3380 </Content>
    3381 </Tutorial>
    3382 <Tutorial id="connecting_to_OAI_server">
    3383 <Title>
    3384 <Text id="gs-oai-0">Connecting to the Greenstone OAI server from the outside world</Text>
    3385 </Title>
    3386 <Prerequisite id="setting_up_GS_OAI_server"/>
    3387 <Version initial="2.85" current="2.85"/>
    3388 <Content>
    3389 <Comment>
    3390 <Text id="gs-oai-1">For this exercise, you need to be on a networked computer and your host computer needs to be visible to the outside world.
    3391 (That is, when you provide the full name of your computer, someone else in the world should be able to find that computer by typing its URL into their browser's address field.)</Text>
    3392 </Comment>
    3393 <Comment>
    3394 <Text id="gs-oai-2">For now though, we proceed to using an external OAI client to access our up-and-running Greenstone OAI server. It's not just any OAI client either, but an OAI Server validator.</Text>
    3395 </Comment>
    3396 <NumberedItem>
    3397 <Text id="gs-oai-3">You will want to be running the included Apache web server. So if you're on Windows and using the Local Library Server, quit it and rename the <AutoText text="server.exe" type="italics"/> application in your Greenstone installation folder to server.not. Then use the <AutoText text="Start" type="italics"/> menu shortcut to the Greenstone Server once more, to now launch the Apache web server.</Text>
    3398 </NumberedItem>
    3399 <NumberedItem>
    3400 <Text id="gs-oai-4">For this exercise, we will visit the <b>Open Archives Validator</b>, for which your OAIserver needs to provide a valid email address. In a text editor, open up your greenstone installation's etc/oai.cfg file and set the value of the <AutoText text="maintainer" type="italics"/> field to your email address.</Text>
    3401 <Text id="gs-oai-5">Note that by default, your Greenstone installation will make the <b>demo</b> collection available over OAI. This collection has been set up with a dummy (and invalid) email address for the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> fields in the collection's collect.cfg file. You will need to open up collect/demo/etc/collect.cfg and clear the email values for the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> properties (or else set these to a valid email again). Otherwise the OpenArchives validator will resort to using the <b>demo</b> collection's default dummy email to send the initial validation results to. Alternatively, you can simply remove the <b>demo</b> collection from being listed in the oai.cfg file's oaicollection property, which will cease to make the <b>demo</b> collection available over OAI.</Text>
    3402 <Text id="gs-oai-6">Note also that, if you wish to specify contact emails at a collection level, you will need to edit your greenstone installation's <Format>collect/&lt;collection-name&gt;/etc/collect.cfg</Format> file for those collections and set the <AutoText text="creator" type="italics"/> and <AutoText text="maintainer" type="italics"/> fields to the desired email address.</Text>
    3403 </NumberedItem>
    3404 <NumberedItem>
    3405 <Text id="gs-oai-7">If your collection contains document items for which you have not assigned any (Dublin Core, <b>dc</b>) metadata, the OAI validation can fail because it is dependent on having Metadata Formats listed even on a per record (per document) basis. And if your document has no <b>dc</b> metadata assigned, Greenstone won't know what OAI-supported metadata format is used by that document in order to list it.</Text>
    3406 <Text id="gs-oai-8">In practice, this means that you either have to assign one or more <Format>dc.*</Format> metadata to each document in your OAI collection, or you will have to set up an oaimapping in the oai.cfg file to map existing metadata of whichever format to <Format>dc.*</Format> metadata.</Text>
    3407 <Text id="gs-oai-9">For instance, if you created an image collection without assigning any metadata and are happy to use the Title or Source metadata that Greenstone extracted for each image (<AutoText key="metadata::ex.Title"/>, <AutoText key="metadata::ex.SourceFile"/>) as the image document's "title", you could map either of these metadata to <AutoText key="metadata::dc.Title"/> in the file oai.cfg. To do so, you'd open up oai.cfg in an editor, go down to the section specifying the oaimapping properties and add a new line:</Text>
    3408 <Format>oaimapping Title oai_dc.title</Format>
    3409 <Text id="gs-oai-9a">(Or: <Format>oaimapping SourceFile oai_dc.title</Format>).</Text>
    3410 <Text id="gs-oai-10">This step is not necessary for the <b>backdrop</b> collection, since each image in the collection was assigned some <Format>dc.*</Format> metadata.</Text>
    3411 </NumberedItem>
    3412 <NumberedItem>
    3413 <Text id="gs-oai-11">If you are working with legacy collections (built before Greenstone version 2.85) you may have to rebuild them if you plan to make them available over OAI and compliant with the Open Archives validator. Rebuilding old collections will recalculate the <AutoText text="earliest datestamp"/> for the repository. This calculation is different from Greenstone 2.85 onwards.</Text>
    3414 </NumberedItem>
    3415 <NumberedItem>
    3416 <Text id="gs-oai-12">Next you will need to set up your Greenstone server to be accessible from outside, so that external OAI clients can access it.</Text>
    3417 <Text id="gs-oai-13">Go to the <Path>File &rarr; Settings</Path> menu of your Greenstone server interface dialog and check the <AutoText text="Allow External Connections"/> option and also check the <AutoText text="Get local IP and resolve to a name"/> option (or the <AutoText text="Get local IP"/> option) as its address resolution method.</Text>
    3418 </NumberedItem>
    3419 <NumberedItem>
    3420 <Text id="gs-oai-14">Press the button in the Greenstone Server Interface dialog that says <AutoText text="Enter Library"/> (or it may say <AutoText text="Restart Library"/>). Your Digital Library home page will open up in a browser tab. Adjust this URL to have a suffix of <Format>oaiserver.cgi</Format> in place of the terminating <Format>library.cgi</Format>, then copy the resulting URL and visit <Link>http://www.openarchives.org/Register/ValidateSite</Link>.</Text>
    3421 </NumberedItem>
    3422 <NumberedItem>
    3423 <Text id="gs-oai-15">The Open Archives Validator page will request the URL to your Greenstone OAI server. Paste the URL you have in your copy buffer into the field provided for this, and press the <b>Validate baseURL</b> button to start running the tests. You will be told to check your email to continue the remaining tests and get the validation report.</Text>
    3424 <Text id="gs-oai-16">If the validator does not recognise the URL, make sure you have given the full domain of your host machine rather than just the host name. Alternatively, visit the <AutoText text="oaiserver.cgi?verb=Identify" type="italics"/> page again and check that it works. If it doesn't, maybe your machine is not set up to be accessible to outside networks. Check your proxy settings, make sure you've set up port forwarding and that your firewall is not interfering.</Text>
     3328<Text id="gli-oai-9">Next, tick <AutoText key="perlmodules::OAIDownload.max_records_disp"/> and set it to 10. There will be 9 images in the collection, so we don't really need to set the Max records value, but this is a helpful feature that you can use when downloading from an OAI server.</Text>
     3329</NumberedItem>
     3330<NumberedItem>
     3331<Text id="gli-oai-10">Finally, click <AutoText key="glidict::Mirroring.Download" type="button"/>, located beside the <AutoText key="glidict::Download.ServerInformation"/> button. If you have set proxy information in <AutoText key="glidict::Menu.File_Options"/>, a popup will ask for your user name and password. Once the download has started, a progress bar appears in the lower half of the panel and reports the progress of the download. GLI will download oai metadata and, because we have ticked the <AutoText key="perlmodules::OAIDownload.get_doc_disp"/> checkbox, it will also retrieve the actual documents, but no more than 10, because of the limit of 10 that we've placed on the number of records to download.</Text>
     3332</NumberedItem>
     3333<NumberedItem>
     3334<Text id="gli-oai-11">After a while, it will have finished downloading. Change to the <AutoText key="glidict::GUI.Gather"/> panel, and on the left-hand side, open up the <AutoText key="glidict::Tree.DownloadedFiles"/> folder. This is where Greenstone stores files you downloaded using the <AutoText key="glidict::GUI.Download"/> panel. In this case, it will contain a folder in which the oai metadata files and images that you've just downloaded from your own Greenstone OAI server are stored. These files can then be added to a collection, as will be covered later in this tutorial.</Text>
     3335</NumberedItem>
     3336<Heading>
     3337<Text id="oai-9">Downloading using the command line</Text>
     3338</Heading>
     3339<Comment>
     3340<Text id="oai-10">For command line downloading to work, your computer must have a direct connection to the Internet&mdash;being behind a firewall may interfere with the ability to download the information. You will need to use the Librarian Interface for downloading if you are behind a firewall.</Text>
     3341</Comment>
     3342<NumberedItem>
     3343<Text id="oai-11">Close the Librarian Interface.</Text>
     3344</NumberedItem>
     3345<NumberedItem>
     3346<Text id="0739">Open a DOS window to access the command-line prompt. This facility should be located somewhere within your <Menu>Start &rarr; Programs</Menu> menu, but details vary between different Windows systems. If you cannot locate it, select <Menu>Start &rarr; Run</Menu> and enter <i>cmd</i> in the popup window that appears.</Text>
     3347</NumberedItem>
     3348<NumberedItem>
     3349<Text id="0740">Before you start, you must set up your Greenstone environment in the terminal. In the DOS window, move to the home directory where you installed Greenstone. This is accomplished by something like:</Text>
     3350<Command>cd C:\Program Files\Greenstone</Command>
     3351</NumberedItem>
     3352<NumberedItem>
     3353<Text id="0741">Type:</Text>
     3354<Command>setup.bat</Command>
     3355<Text id="0742">to set up the ability to run Greenstone command-line programs. On Linux/Mac, you would run <Command>source setup.bash</Command>.</Text>
     3356</NumberedItem>
     3357<Comment>
     3358<Text id="0743">GLI uses a perl script, <Format>downloadfrom.pl</Format>, to do the downloading. This can be run on the command line, outside of GLI.</Text>
     3359</Comment>
     3360<Comment>
     3361<Text id="0744">The <AutoText text="downloadfrom.pl"/> script can download using several different protocols. These are specified using the <AutoText text="-mode"/> option. To see the available options for download mode, run <Format>perl -S downloadfrom.pl -h</Format>. This shows that the current options are: <AutoText text="Web, MediaWiki, OAI, Z3950, SRW"/>. For OAI downloading, use <b>-mode OAI</b></Text>
     3362</Comment>
     3363<Comment>
     3364<Text id="0745">To see the options for downloading using the OAI mode, you can run <Format>perl -S downloadinfo.pl OAIDownload</Format>. The options are the same as you can see in the GLI OAI download panel.</Text>
     3365</Comment>
     3366<NumberedItem>
     3367<Text id="0746">We'll use the <Format>set</Format> and <Format>max_records</Format> OAI Download options to download a maximum of 5 OAI records from the <b>backdrop</b> collection at your Greenstone's OAI server, which was made available over OAI as a <AutoText text="set" type="italics"/> in the previous tutorial:</Text>
     3368<Format>perl -S downloadfrom.pl -mode OAI -url http://&lt;hostname:portnumber&gt;/greenstone/cgi-bin/oaiserver.cgi -set backdrop -max_records 5</Format>
     3369<Text id="0747">The records (and optionally documents) will be downloaded into the folder from which the downloadfrom.pl script is run. To change this, use the <Format>-cache_dir <i>full-path-to-folder</i></Format> option and set its value to the full path of the destination folder you choose.</Text>
     3370</NumberedItem>
     3371<Comment><Text id="0748">You can import the downloaded documents into a new Greenstone collection and build them in the usual manner.</Text></Comment>
     3372<Heading>
     3373<Text id="gli-oai-12">Building the downloaded documents in GLI</Text>
     3374</Heading>
     3375<NumberedItem>
     3376<Text id="gli-oai-13">If you used GLI to download documents over OAI, as seen in the first part of the tutorial, you can find the downloaded items in the <AutoText key="glidict::Tree.DownloadedFiles"/> folder in the filesystem view on the left side of the <AutoText key="glidict::GUI.Gather"/> panel.</Text>
     3377<Text id="gli-oai-14">If you used the command line to download documents, the downloaded files will be stored wherever you ran the <AutoText text="downloadfrom.pl" type="italics"/> script from.</Text>
     3378</NumberedItem>
     3379<NumberedItem>
     3380<Text id="gli-oai-15">Open GLI, locate files you downloaded over OAI and drag and drop these into a new Greenstone collection called <AutoText text="OAI Collection" />. Because there are <Format>*.oai</Format> files among those downloaded, GLI will offer to add the <AutoText text="OAIPlugin"/>.</Text>
     3381</NumberedItem>
     3382<NumberedItem>
     3383<Text id="gli-oai-16">Switch to the <AutoText key="glidict::GUI.Create"/> panel and press the <b>build</b> button. During this stage, the <AutoText text="OAIPlugin"/> will extract the metadata in the <AutoText text="oai" type="italics"/> files and attach it to the associated <AutoText text="jpg" type="italics"/> files of the downloaded <b>backdrop</b> collection. You can see this once the collection has been built by switching to the <AutoText key="glidict::GUI.Enrich"/> panel. Clicking on an oai file shows no metadata, as none is set for such files. However, if you then click on a jpg file and scroll down, there will be metadata names that start with <Format>ex.dc</Format>. This refers to Greenstone-extracted Dublin Core metadata. <AutoText key="metadata::ex.dc.Description"/> and <AutoText key="metadata::ex.dc.Title"/> will be set to the values you had assigned the images in the tutorial <b>A Simple Image Collection</b>. Greenstone will have added additional <Format>ex.dc</Format> metadata in the form of <AutoText key="metadata::ex.dc.Identifier"/>, which is the source URL for this image.</Text>
     3384</NumberedItem>
     3385<NumberedItem>
     3386<Text id="gli-oai-17">If you wish, you can now set up this collection in a manner similar to how the <b>backdrop</b> collection was set up in <TutorialRef id="simple_image_collection"/>. Don't forget to copy in any specific format statements, then <b>rebuild</b> and <b>preview</b> the collection.</Text>
    34253387</NumberedItem>
    34263388</Content>
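The two download paths in the tutorial above can be sketched in a few lines of Python, as an illustration rather than anything shipped with Greenstone. The first helper builds standard OAI-PMH request URLs (`verb=Identify`, `verb=ListSets`, `verb=ListRecords`) against the `oaiserver.cgi` address. The second assembles the `downloadfrom.pl` argument list using only the options the tutorial mentions (`-mode`, `-url`, `-set`, `-max_records`, `-cache_dir`). The function names, the `localhost:8080` host and port, and the assumption that the set is exposed under the name `backdrop` with the `oai_dc` metadata prefix are all hypothetical and should be adapted to your own installation.

```python
import shlex
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL, e.g. <base>?verb=Identify."""
    query = {"verb": verb}
    # Leave out arguments that were not supplied.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


def downloadfrom_args(base_url, set_name=None, max_records=None, cache_dir=None):
    """Assemble the downloadfrom.pl command line as a list of arguments."""
    args = ["perl", "-S", "downloadfrom.pl", "-mode", "OAI", "-url", base_url]
    if set_name is not None:
        args += ["-set", set_name]
    if max_records is not None:
        args += ["-max_records", str(max_records)]
    if cache_dir is not None:
        args += ["-cache_dir", cache_dir]
    return args


# Hypothetical host and port: substitute the values for your own server.
BASE_URL = "http://localhost:8080/greenstone/cgi-bin/oaiserver.cgi"

if __name__ == "__main__":
    # URLs you could paste into a browser to check the server by hand.
    print(oai_request_url(BASE_URL, "Identify"))
    print(oai_request_url(BASE_URL, "ListSets"))
    print(oai_request_url(BASE_URL, "ListRecords",
                          metadataPrefix="oai_dc", set="backdrop"))
    # The command from the tutorial, printed rather than executed.
    print(shlex.join(downloadfrom_args(BASE_URL, set_name="backdrop",
                                       max_records=5)))
```

The command is only printed, not run, so the sketch works without a Greenstone environment. To run it for real, the argument list could be handed to `subprocess.run` from a shell in which `setup.bat` or `setup.bash` has already been sourced.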