Changeset 11848


Timestamp:
2006-05-24T13:27:47+12:00
Author:
kjdon
Message:

lots of changes. Added a few new tutorials, and other bits from the 2005 fiji workshop

File:
1 edited

  • trunk/gsdl-documentation/tutorials/xml-source/tutorial_en.xml

    r11826 r11848  
    77<TutorialList>
    88<Title>
    9 <Text id="0001">Greenstone tutorial exercises (March 2006)</Text>
     9<Text id="0001">Greenstone tutorial exercises (June 2006)</Text>
    1010</Title>
    1111<SupplementaryText>
     
    126126</NumberedItem>
    127127<NumberedItem>
    128 <Text id="0121">Click the <b>page icon</b> for the <b>first matching document</b> in the result set (</Text>
    129 <Text id="0122"><i>Five Year Implementation Review of the Vienna Declaration and Programme of Action</i>) to view the document. Because the search was at the chapter level, you are taken directly to the matching chapter within the document.</Text>
     128<Text id="0121">Click the <b>page icon</b> for the <b>first matching document</b> in the result set (<i>Five Year Implementation Review of the Vienna Declaration and Programme of Action</i>) to view the document. Because the search was at the chapter level, you are taken directly to the matching chapter within the document.</Text>
    130129</NumberedItem>
    131130<NumberedItem>
     
    590589<Text id="0254">You will need some HTML files, such as those in the <Path>hobbits</Path> folder in <Path>sample_files</Path>.</Text>
    591590</Comment>
     591<Heading>
     592<Text id="0254a">Running the Greenstone Librarian Interface</Text>
     593</Heading>
    592594<NumberedItem>
    593595<Text id="0255">Start the Greenstone Librarian Interface:</Text>
     
    599601</Comment>
    600602</NumberedItem>
     603<Heading>
     604<Text id="0256a">Starting a new collection</Text>
     605</Heading>
    601606<NumberedItem>
    602607<Text id="0257">Start a new collection within the Librarian Interface:</Text>
     
    621626<Text id="0264">Next you must gather together the files that will constitute the collection. A suitable set has been prepared ahead of time in <Path>sample_files</Path> in the folder <Path>hobbits</Path>. Using the left-hand side of the Librarian Interface's <AutoText key="glidict::GUI.Gather"/> panel, interactively navigate to the <Path>sample_files</Path> folder.</Text>
    622627</NumberedItem>
     628<Heading>
     629<Text id="0264a">Adding documents to the collection</Text>
     630</Heading>
    623631<NumberedItem>
    624632<Text id="0265">Now drag the <Path>hobbits</Path> folder from the left-hand side and drop it on the right. The progress bar at the bottom shows some activity. Gradually, duplicates of all the files will appear in the collection panel.</Text>
     
    630638<Text id="0267">Since this is our first collection, we won't complicate matters by manually assigning metadata or altering the collection's design. Instead we rely on default behaviour. So pass directly to the <AutoText key="glidict::GUI.Create"/> panel by clicking its tab.</Text>
    631639</NumberedItem>
     640<Heading>
     641<Text id="0267a">Building the collection</Text>
     642</Heading>
    632643<NumberedItem>
    633644<Text id="0268">To start building the collection, click the <AutoText key="glidict::CreatePane.Build_Collection" type="button"/> button.</Text>
     
    639650<Text id="0270">Click the <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/> button to look at the end result. This loads the relevant page into your web browser (starting it up if necessary). Look around the collection and learn about Hobbits!</Text>
    640651</NumberedItem>
     652<Heading>
     653<Text id="0270a">Viewing the extracted metadata</Text>
     654</Heading>
    641655<NumberedItem>
    642656<Text id="0271">Back in the Librarian Interface, click the <AutoText key="glidict::GUI.Enrich"/> tab to view the metadata associated with the documents in the collection.</Text>
     
    655669</Heading>
    656670<NumberedItem>
    657 <Text id="0278">To set up a shortcut to the source files, in the <AutoText key="glidict::GUI.Gather"/> panel navigate to the folder in your local file space that contains the files you want to use&mdash;in our case, the <Path>sample_files</Path> folder. Select this folder and then right-click it. Follow the instructions to set up a shortcut. Close all the folders in the file tree and you will see the shortcut to your source files in the left-hand pane of the <AutoText key="glidict::GUI.Gather"/> panel.</Text>
     671<Text id="0278">To set up a shortcut to the source files, in the <AutoText key="glidict::GUI.Gather"/> panel navigate to the folder in your local file space that contains the files you want to use&mdash;in our case, the <Path>sample_files</Path> folder. Select this folder and then right-click it, and choose <AutoText key="glidict::MappingPrompt.Map"/> from the menu. In the <AutoText key="glidict::MappingPrompt.Name"/> field, enter the name you want the shortcut to have, or accept the default <AutoText text="sample_files" type="italics"/>. Click <AutoText key="glidict::General.OK" type="button"/>. Close all the folders in the file tree in the left-hand pane, and you will see the shortcut to your source files.</Text>
    658672</NumberedItem>
    659673</Content>
    660674</Tutorial>
    661 <Tutorial id="word_PDF_collection">
     675<Tutorial id="large_html_collection">
     676<Title>
     677<Text id="0387">A large collection of HTML files&mdash;Tudor</Text>
     678</Title>
     679<SampleFiles folder="tudor"/>
     680<Version initial="2.60" current="2.70"/>
     681<Content>
     682<NumberedItem>
     683<Text id="0388">Invoke the Greenstone Librarian Interface (from the Windows <i>Start</i> menu) and start a new collection called <b>tudor</b> (use the <AutoText key="glidict::Menu.File"/> menu). Fill out the pop-up dialog with appropriate values and leave <b>Dublin Core</b>, which is selected by default, as the metadata set.</Text>
     684</NumberedItem>
     685<NumberedItem>
     686<Text id="0389">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>tudor</Path> folder in <Path>sample_files</Path>.</Text>
     687</NumberedItem>
     688<NumberedItem>
     689<Text id="0390">Drag <Path>englishhistory.net</Path> from the left-hand side to the right to include it in your <b>tudor</b> collection.</Text>
     690</NumberedItem>
     691<NumberedItem>
     692<Text id="0391">Switch to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>.</Text>
     693</NumberedItem>
     694<NumberedItem>
     695<Text id="0392">When building has finished, <b>preview</b> the collection.</Text>
     696</NumberedItem>
     697<Heading>
     698<Text id="0392a">Extracting more metadata from the HTML</Text>
     699</Heading>
     700<NumberedItem>
     701<Text id="0393">The browsing facilities in this collection (<AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/>) are based entirely on extracted metadata. Return to the <AutoText key="glidict::GUI.Enrich"/> panel in the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text>
     702</NumberedItem>
     703<NumberedItem>
     704<Text id="0393a">Many HTML documents contain metadata in <Format>&lt;meta&gt;</Format> tags in the <Format>&lt;head&gt;</Format> of the page. Open up the <Path>englishhistory.net &rarr; tudor &rarr; monarchs &rarr; boleyn.html</Path> file by navigating to it in the tree on the left-hand side, and double-clicking it. This will open it in a web browser. View the HTML source of the page (<Menu>View &rarr; Source</Menu> in Internet Explorer, <Menu>View &rarr; Page Source</Menu> in Mozilla). You will notice that this page has <AutoText text="page_topic,content" type="italics"/> and <AutoText text="author" type="italics"/> metadata.</Text>
     705 </NumberedItem>
     706<NumberedItem>
     707<Text id="0393b">By default, <AutoText text="HTMLPlug"/> only looks for Title metadata. Configure the plugin so that it looks for the other metadata too. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Switch on the <AutoText text="metadata_fields"/> option, and set the value to <AutoText text="Title,Author,Page_topic,Content" type="quoted"/>. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     708 </NumberedItem>
     709<NumberedItem>
     710<Text id="0393c">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>rebuild</b> the collection. Go back to the <AutoText key="glidict::GUI.Enrich"/> panel and look at the extracted metadata for some of the HTML files in <Path>englishhistory.net &rarr; tudor &rarr; monarchs</Path>. The new metadata should now be visible.</Text>
     711</NumberedItem>
     712<Heading>
     713<Text id="0393d">Blocking the stray images</Text>
     714</Heading>
     715<Comment>
     716<Text id="0394">You've probably noticed that the collection contains a few stray image files, as well as the HTML documents. This is a mistake. The issue is that many of the HTML documents include images, and although Greenstone attempts to determine which images belong to HTML pages and only considers other images for inclusion in the collection, in this case it hasn't been completely successful. (This is because the web site from which these files were downloaded occasionally departs from the usual convention of hierarchical structuring.)</Text>
     717</Comment>
     718<NumberedItem>
     719<Text id="0395">Switch back to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Beside <AutoText text="plugin HTMLPlug"/> you will see <AutoText text="-smart_block"/>. This is the option that attempts to identify images in the HTML pages and block them from inclusion&mdash;in this case, it's not smart enough! <b>Configure</b> <AutoText text="plugin HTMLPlug"/> again, scroll down the page to locate the <AutoText text="smart_block"/> option, and switch it off.</Text>
     720</NumberedItem>
     721<NumberedItem>
     722<Text id="0396"><b>Rebuild</b> and <b>preview</b> the collection. The collection is exactly as before except that these stray images are suppressed. What is happening is that plug-ins operate as a pipeline: files are passed to each one in turn until one is found that can process it. By default (i.e. without <AutoText text="smart_block"/>) the HTML plug-in blocks <i>all</i> images, which is appropriate for this collection.</Text>
     723</NumberedItem>
     724<Heading>
     725<Text id="0397">Looking at different views of the files in the <AutoText key="glidict::GUI.Gather"/> and <AutoText key="glidict::GUI.Enrich"/> panels</Text>
     726</Heading>
     727<NumberedItem>
     728<Text id="0398">Switch to the <AutoText key="glidict::GUI.Gather"/> panel and in the right-hand side open <Path>englishhistory.net &rarr; tudor</Path>.</Text>
     729</NumberedItem>
     730<NumberedItem>
     731<Text id="0400">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu for the right-hand side from <AutoText key="glidict::Filter.All_Files"/> to <AutoText key="glidict::Filter.0"/>. Notice the files displayed above are filtered accordingly, to show only files of this type.</Text>
     732</NumberedItem>
     733<NumberedItem>
     734<Text id="0401">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu to <AutoText key="glidict::Filter.3"/>. Again, the files shown above alter.</Text>
     735</NumberedItem>
     736<NumberedItem>
     737<Text id="0402">Now return the <AutoText key="glidict::Filter.Filter_Tree"/> setting back to <AutoText key="glidict::Filter.All_Files"/>, otherwise you may get confused later. Remember, if the <AutoText key="glidict::GUI.Gather"/> or <AutoText key="glidict::GUI.Enrich"/> panels do not seem to be showing all your files, this could be the problem.</Text>
     738</NumberedItem>
     739</Content>
     740</Tutorial>
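The "Extracting more metadata from the HTML" steps in the tutorial above can be illustrated outside Greenstone. The following is a minimal sketch in Python, not Greenstone's own implementation: it reads `<meta>` tags from a page and keeps only a chosen list of field names, loosely mirroring what the `metadata_fields` option asks HTMLPlug to do. The class name, the function name, the case-insensitive matching and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs for a chosen set of field names."""

    def __init__(self, wanted_fields):
        super().__init__()
        # Field names are compared case-insensitively (an assumption of this sketch).
        self.wanted = {name.lower() for name in wanted_fields}
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in self.wanted and content is not None:
            self.metadata[name] = content


def extract_meta(html_text, wanted_fields):
    """Return a dict of the wanted <meta> fields found in html_text."""
    parser = MetaExtractor(wanted_fields)
    parser.feed(html_text)
    return parser.metadata


# Hypothetical page; the values are placeholders, not the contents of boleyn.html.
SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta name="author" content="Example Author">
<meta name="page_topic" content="Example topic">
<meta name="generator" content="Some editor">
</head><body><p>Text</p></body></html>"""

found = extract_meta(SAMPLE_PAGE, ["Author", "Page_topic", "Content"])
print(found)
```

Fields that are not in the requested list (here `generator`) are ignored, which is the same idea as listing only the fields you want in `metadata_fields`.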
     741<Tutorial id="word_pdf_collection">
    662742<Title>
    663743<Text id="0279">A collection of Word and PDF files</Text>
     
    673753</NumberedItem>
    674754<NumberedItem>
    675 <Text id="0282">Copy the 12 files from <Path>sample_files &rarr; Word_and_PDF &rarr; Documents</Path> into the collection. You can select multiple files by clicking on the first one and shift-clicking on the last one, and drag them all across together. (This is the normal technique of multiple selection.)</Text>
     755<Text id="0282">Copy all the files from <Path>sample_files &rarr; Word_and_PDF &rarr; Documents</Path> into the collection. You can select multiple files by clicking on the first one and shift-clicking on the last one, and drag them all across together. (This is the normal technique of multiple selection.)</Text>
    676756</NumberedItem>
    677757<NumberedItem>
    678758<Text id="0287">Switch to the <AutoText key="glidict::GUI.Create"/> panel, and <b>build</b> and <b>preview</b> the collection.</Text>
    679759</NumberedItem>
     760<Comment>
     761<Text id="0287a">Some of the documents don't look very nice in Greenstone. One of them, <Path>pdf05-notext.pdf</Path>, could not be processed using the default configuration. Another, <Path>pdf06-weirdchars.pdf</Path>, was processed but looks very strange. Exercise <TutorialRef>XXX</TutorialRef> looks at how to configure PDFPlug to handle these files better.</Text>
     762</Comment>
     763<Heading>
     764<Text id="0287b">Viewing the extracted metadata</Text>
     765</Heading>
    680766<NumberedItem>
    681767<Text id="0288">Again, this collection contains no manually assigned metadata. All the information that appears&mdash;title and filename&mdash;is extracted automatically from the documents themselves. Because of this the quality of some of the title metadata is suspect.</Text>
    682768</NumberedItem>
    683769<NumberedItem>
    684 <Text id="0289">Back in the Librarian Interface, click the <AutoText key="glidict::GUI.Enrich"/> tab to view the automatically extracted metadata. You will need to scroll down to see the extracted metadata, which begins with <AutoText text="ex." type="quoted"/>. The PostScript documents (<Path>cluster.ps</Path> and <Path>langmodl.ps</Path> do not have extracted titles: what appears in the <i>titles a-z</i> list is just the first few characters of the document).</Text>
     770<Text id="0289a">Back in the Librarian Interface, click the <AutoText key="glidict::GUI.Enrich"/> tab to view the automatically extracted metadata. You will need to scroll down to see the extracted metadata, which begins with <AutoText text="ex." type="quoted"/>. </Text>
     771</NumberedItem>
     772<NumberedItem>
     773<Text id="0289b">Check whether the Title metadata is correct for each document by opening it. You can open a document from the Librarian Interface by double-clicking on it.</Text>
     774</NumberedItem>
     775<NumberedItem>
     776<Text id="0289c">The extracted Title metadata for some documents is incorrect. For example, the Titles for <Path>pdf01.pdf</Path> and <Path>word03.doc</Path> (the same document in different formats) have missed out the second line. The Title for <Path>pdf03.pdf</Path> has the wrong text altogether. The PostScript documents (<Path>cluster.ps</Path> and <Path>langmodl.ps</Path>) do not have extracted titles: what appears in the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list is just the first few characters of the document.</Text>
    685777</NumberedItem>
    686778<Heading>
     
    688780</Heading>
    689781<NumberedItem>
    690 <Text id="0291">In the <AutoText key="glidict::GUI.Enrich"/> panel, manually add Dublin Core <AutoText key="metadata::dc.Title"/> metadata to one of these documents. Select <Path>word03.doc</Path> and double-click to open it. Copy the title of this document (<AutoText text="Greenstone: A comprehensive open-source digital library software system" type="quoted"/>) and return to the Librarian Interface. Scroll up or down in the metadata table until you can see <AutoText key="metadata::dc.Title"/>. Click in the value box, paste in the metadata and press <b>Enter</b>. </Text>
    691 </NumberedItem>
    692 <NumberedItem>
    693 <Text id="0292">Now add <AutoText key="metadata::dc.Creator"/> information for the same document. You can add more than one value for the same field: when you press <b>Enter</b> in a metadata value field, a new empty field of the same type will be generated.</Text>
     782<Text id="0291a">In the <AutoText key="glidict::GUI.Enrich"/> panel, manually add Dublin Core <AutoText key="metadata::dc.Title"/> metadata to those documents which have incorrect <AutoText key="metadata::ex.Title"/> metadata. Select <Path>word03.doc</Path> and double-click to open it. Copy the title of this document (<AutoText text="Greenstone: A comprehensive open-source digital library software system" type="quoted"/>) and return to the Librarian Interface. Scroll up or down in the metadata table until you can see <AutoText key="metadata::dc.Title"/>. Click in the value box, paste in the metadata and press <b>Enter</b>. </Text>
     783</NumberedItem>
     784<NumberedItem>
     785<Text id="0292">Now add <AutoText key="metadata::dc.Creator"/> information for the same document. You can add more than one value for the same field: when you press <b>Enter</b> in a metadata value field, a new empty field of the same type will be generated. Add each author separately as <AutoText key="metadata::dc.Creator"/> metadata.</Text>
    694786</NumberedItem>
    695787<NumberedItem>
     
    697789</NumberedItem>
    698790<NumberedItem>
    699 <Text id="0293">Next add title and creator metadata for a few of the other documents.</Text>
    700 </NumberedItem>
    701 <Comment>
    702 <Text id="0294">If you build and preview your collection at this point, you will find that nothing has changed. You need to alter the collection design to use the new Dublin Core metadata instead of the original extracted metadata.</Text>
     791<Text id="0293">Next add <AutoText key="metadata::dc.Title"/> and <AutoText key="metadata::dc.Creator"/> metadata for a few of the other documents.</Text>
     792</NumberedItem>
     793<NumberedItem>
     794<Text id="0291b">You will notice that as you add more values, they appear in the <AutoText key="glidict::EnrichPane.ExistingValues" args="..."/> box below the metadata table. If you are adding the same metadata value to more than one document, you can select it from this list. For example, <Path>pdf01.pdf</Path> and <Path>word03.doc</Path> share the same Title; and many documents have common authors.</Text>
     795</NumberedItem>
     796<Comment>
     797<Text id="0294">If you build and preview your collection at this point, you will see that the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> now shows your new Titles. However, the <AutoText key="metadata::dc.Creator"/> metadata is not displayed. You need to alter the collection design to use the new Dublin Core metadata.</Text>
    703798</Comment>
    704799<Heading>
     
    709804</NumberedItem>
    710805<NumberedItem>
    711 <Text id="0297">Click on the <AutoText key="glidict::General.Browse" type="button"/> button associated with <AutoText key="glidict::CDM.General.Icon_Collection"/>, and browse to the image <Path>sample_files &rarr; Word_and_PDF &rarr; wrdpdf.gif</Path> on your computer. When you select this image, Greenstone automatically generates an appropriate URL for the image. <b>Preview</b> the collection.</Text>
     806<Text id="0297">Click on the <AutoText key="glidict::General.Browse" type="button"/> button associated with <AutoText key="glidict::CDM.General.Icon_Collection"/>, and browse to the image <Path>sample_files &rarr; Word_and_PDF &rarr; wrdpdf.gif</Path> on your computer. When you select this image, Greenstone automatically generates an appropriate URL for the image. <b>Preview</b> the collection: you should see the new image at the top left of the page.</Text>
     807<Comment>
     808<Text id="0297a">Information on the <AutoText key="glidict::CDM.GUI.General"/> page does not require a rebuild of the collection to take effect. Just go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>.</Text>
     809</Comment>
    712810</NumberedItem>
    713811<NumberedItem>
     
    720818</Heading>
    721819<NumberedItem>
    722 <Text id="0304">Now look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlug"/>, <AutoText text="TEXTPlug"/>, <AutoText text="HTMLPlug"/>, <AutoText text="EMAILPlug"/>, <AutoText text="ImagePlug"/>, <AutoText text="ISISPlug"/> and <AutoText text="NULPlug"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. <AutoText text="GAPlug"/> is required for any type of source collection and should not be removed. </Text>
     820<Text id="0304">Back in the Librarian Interface, look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlug"/>, <AutoText text="TEXTPlug"/>, <AutoText text="HTMLPlug"/>, <AutoText text="EMAILPlug"/>, <AutoText text="ImagePlug"/>, <AutoText text="ISISPlug"/> and <AutoText text="NULPlug"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. <AutoText text="GAPlug"/> is required for any type of source collection and should not be removed. </Text>
    723821</NumberedItem>
    724822<Comment>
     
    729827</Heading>
    730828<NumberedItem>
    731 <Text id="0310">The next step in the <AutoText key="glidict::GUI.Design"/> panel is <AutoText key="glidict::CDM.GUI.Indexes"/>. These specify what parts of the collection are searchable (e.g. searching by title and author). Delete the <AutoText key="metadata::ex.Title"/> and <AutoText key="metadata::ex.Source"/> indexes, which are not particularly useful, by selecting them one at a time and clicking <AutoText key="glidict::CDM.IndexManager.Remove_Index" type="button"/>. Only the <i>text</i> index remains.</Text>
    732 </NumberedItem>
    733 <NumberedItem>
    734 <Text id="0311">Now add a Title index based on <AutoText key="metadata::dc.Title"/> by providing an <AutoText key="glidict::CDM.IndexManager.Index_Name"/> (e.g. "Document Title") and selecting <AutoText key="metadata::dc.Title"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> box. Then click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>.</Text>
    735 </NumberedItem>
    736 <NumberedItem>
    737 <Text id="0312">You can add indexes based on any metadata. Add an index called "Authors" based on <AutoText key="metadata::dc.Creator"/> metadata.</Text>
     829<Text id="0310a">The next step in the <AutoText key="glidict::GUI.Design"/> panel is <AutoText key="glidict::CDM.GUI.Indexes"/>. These specify what parts of the collection are searchable (e.g. searching by title and author). Delete the <AutoText key="metadata::ex.Source"/> index, which is not particularly useful, by selecting it and clicking <AutoText key="glidict::CDM.IndexManager.Remove_Index" type="button"/>. </Text>
     830</NumberedItem>
     831<NumberedItem>
     832<Text id="0310b">Modify the <AutoText key="metadata::ex.Title"/> index to include <AutoText key="metadata::dc.Title"/> by selecting the index in the <AutoText key="glidict::CDM.IndexManager.Indexes"/> box and then selecting <AutoText key="metadata::dc.Title"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> box. Click <AutoText key="glidict::CDM.IndexManager.MGPP.Replace_Index" type="button"/>. Searching this index will search both dc.Title and ex.Title metadata. If you want to restrict searching to just the manually added dc.Title metadata, deselect <AutoText key="metadata::ex.Title"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> box and click <AutoText key="glidict::CDM.IndexManager.MGPP.Replace_Index" type="button"/>.</Text>
     833</NumberedItem>
     834<NumberedItem>
     835<Text id="0312">You can add indexes based on any metadata. Add a new index based on <AutoText key="metadata::dc.Creator"/>. Change the <AutoText key="glidict::CDM.IndexManager.Index_Name"/> field to "authors", and select <AutoText key="metadata::dc.Creator"/> in the <AutoText key="glidict::CDM.IndexManager.Source"/>. You will need to deselect the <AutoText key="metadata::ex.Title"/> and <AutoText key="metadata::dc.Title"/> metadata items. Click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>.</Text>
    738836</NumberedItem>
    739837<Comment>
     
    744842</Heading>
    745843<NumberedItem>
    746 <Text id="0315">The <AutoText key="glidict::CDM.GUI.Classifiers"/> section adds "classifiers," which provide the collection with browsing functions. Go to this section and observe that Greenstone has provided two classifiers, <i>AZLists</i> based on <AutoText key="metadata::ex.Title"/> and <AutoText key="metadata::ex.Source"/> metadata. Remove both of these by selecting them in turn and clicking <AutoText key="glidict::CDM.ClassifierManager.Remove" type="button"/>.</Text>
     844<Text id="0315a">The <AutoText key="glidict::CDM.GUI.Classifiers"/> section adds "classifiers," which provide the collection with browsing functions. Go to this section and observe that Greenstone has provided two classifiers, <i>AZLists</i> based on <AutoText key="metadata::ex.Title"/> and <AutoText key="metadata::ex.Source"/> metadata. These correspond to the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/> buttons on the collection's access bar. Remove the <AutoText key="metadata::ex.Source"/> classifier by selecting it and clicking <AutoText key="glidict::CDM.ClassifierManager.Remove" type="button"/>.</Text>
     845</NumberedItem>
     846<NumberedItem>
     847<Text id="0315b">Modify the <AutoText key="metadata::ex.Title"/> classifier to use <AutoText key="metadata::dc.Title"/> instead. Select the classifier and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. In the <AutoText text="metadata"/> box, select <AutoText key="metadata::dc.Title"/> instead of <AutoText key="metadata::ex.Title"/>. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    747848</NumberedItem>
    748849<NumberedItem>
     
    753854</NumberedItem>
    754855<NumberedItem>
    755 <Text id="0318">Now add an <AutoText text="AZCompactList"/> classifier. Click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/> and configure it to use <AutoText key="metadata::dc.Creator"/> metadata, with button name "Creator". Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     856<Text id="0318a">Now add an <AutoText text="AZCompactList"/> classifier for <AutoText key="metadata::dc.Creator"/>. Select <AutoText text="AZCompactList"/> from the <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> drop-down list and click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>. A popup window <AutoText key="glidict::CDM.ArgumentConfiguration.Title"/> appears. Select <AutoText key="metadata::dc.Creator"/> from the <AutoText text="metadata"/> drop-down list and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     857<Text id="0318b"><AutoText text="AZCompactList"/> is like <AutoText text="AZList"/>, except that values that appear multiple times in the hierarchy are automatically grouped together and a new node, shown as a bookshelf icon, is formed.</Text>
     858</NumberedItem>
    756859<Comment>
    757860<Text id="0319">The last three sections are <AutoText key="glidict::CDM.GUI.Formats"/>, <AutoText key="glidict::CDM.GUI.Translation"/> and <AutoText key="glidict::CDM.GUI.MetadataSets"/>. In this exercise, we will not make any changes to these.</Text>
    758861</Comment>
    759 </NumberedItem>
    760862<NumberedItem>
    761863<Text id="0320">Switch to the <AutoText key="glidict::GUI.Create"/> panel, and <b>build</b> and <b>preview</b> the collection.</Text>
    762864</NumberedItem>
    763865<NumberedItem>
    764 <Text id="0321">Check that all the facilities work properly. There should be three full-text indexes, called <i>text</i>, <i>Document Title</i>, and <i>Authors</i>. In the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list should appear all the documents to which you have assigned <AutoText key="metadata::dc.Title"/> metadata (and only those documents). In the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list should appear one bookshelf for each author you have assigned as <AutoText key="metadata::dc.Creator"/>, and clicking on that bookshelf should take you to all the documents they authored.</Text>
     866<Text id="0321">Check that all the facilities work properly. There should be three full-text indexes, called <i>text</i>, <i>titles</i>, and <i>authors</i>. In the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list should appear all the documents to which you have assigned <AutoText key="metadata::dc.Title"/> metadata (and only those documents). In the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list should appear one bookshelf for each author you have assigned as <AutoText key="metadata::dc.Creator"/>, and clicking on that bookshelf should take you to all the documents they authored.</Text>
     867</NumberedItem>
     868<Heading>
     869<Text id="0321a">Classifying on multiple metadata</Text>
     870</Heading>
     871<NumberedItem>
     872<Text id="0321b">The new <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list shows only those documents which have been assigned <AutoText key="metadata::dc.Title"/> metadata. For many documents, extracted Titles may be fine, and it is impractical to add the same metadata again as <AutoText key="metadata::dc.Title"/>. Fortunately there is a way we can use both metadata types in one classifier: specify a list of metadata names in the classifier.</Text>
     873</NumberedItem>
     874<NumberedItem>
     875<Text id="0321c">In the <AutoText key="glidict::CDM.GUI.Classifiers"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="metadata::dc.Title"/> <AutoText text="AZList"/> classifier in the <AutoText key="glidict::CDM.ClassifierManager.Assigned"/> box and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. Note that you can achieve the same result by double-clicking on the classifier.</Text>
     876</NumberedItem>
     877<NumberedItem>
     878<Text id="0321d">Type <AutoText text=",ex.Title" type="quoted"/> after the <AutoText key="metadata::dc.Title" type="quoted"/>&mdash;i.e. make it read</Text>
     879<Format>
     880<AutoText key="metadata::dc.Title" type="plain"/><AutoText text=",ex.Title" type="plain"/>
     881</Format>
     882</NumberedItem>
     883<NumberedItem>
     884<Text id="0321e"><b>Build</b> the collection again and <b>preview</b> it. Now all the documents should appear in the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list.</Text>
     885<Text id="0321f">Extracted metadata is unreliable. But it is very cheap! On the other hand, manually assigned metadata is reliable, but expensive. The previous section of this exercise has shown how to aim for the best of both worlds by using extracted metadata but correcting it when it is wrong. While this may not satisfy the professional librarian, it could provide a useful compromise for the music teacher who wants to get their collection together with a minimum of effort.</Text>
    765886</NumberedItem>
    766887<Comment>
     
    769890</Content>
    770891</Tutorial>
    771 <Tutorial id="difficult_PDF_collection">
     892<Tutorial id="formatting_word_pdf">
    772893<Title>
    773 <Text id="0323">Difficult PDF documents</Text>
     894<Text id="fw-1">Formatting the Word and PDF collection</Text>
    774895</Title>
    775 <SampleFiles folder="difficult_documents"/>
    776 <Version initial="2.60" current="2.70"/>
     896<Prerequisite id="word_pdf_collection"/>
    777897<Content>
    778898<NumberedItem>
    779 <Text id="0324">Build a fresh Greenstone collection from the two files in <Path>sample_files &rarr; difficult_documents.</Path> Use the default collection configuration: that is, simply gather the files into a new collection, and build it.</Text>
    780 <Comment>
    781 <Text id="0325">These files are called <Path>No extractable text.pdf</Path> and <Path>Weird characters.pdf</Path>&mdash;their names hint at the problems they will cause!</Text>
    782 </Comment>
    783 </NumberedItem>
    784 <NumberedItem>
    785 <Text id="0326">Now preview the collection. The titles and filenames lists show only one of the documents. When you click the "text" icon to look at the text extracted from that document, it's garbage. During the building process this message appeared: <AutoText key="perlmodules::plugin.one_included" type="quoted"/>; <AutoText key="perlmodules::plugin.one_rejected" type="quoted"/>.</Text>
     899<Text id="fw-2">Open the <b>reports</b> collection in the Librarian Interface and go to the Format Features section of the Design panel.</Text>
     900</NumberedItem>
     901<Heading>
     902<Text id="fw-2a">Tidying up the default format statement</Text>
     903</Heading>
     904<NumberedItem>
     905<Text id="fw-3">Greenstone's default format statement is complex because it is designed to produce something reasonable under almost any conditions, and also because for practical reasons it needs to be backwards compatible with legacy collections.</Text>
     906
     907<Text id="fw-4">The default <AutoText text="VList"/> format statement looks like the following:</Text>
     908<Format>
     909&lt;td valign=top&gt;[link][icon][/link]&lt;/td&gt;<br/>
     910&lt;td valign=top&gt;[ex.srclink]{Or}{[ex.thumbicon],<br/>
     911[ex.srcicon]}[ex./srclink]&lt;/td&gt;<br/>
     912&lt;td valign=top&gt;[highlight]<br/>
     913{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}<br/>
     914[/highlight]{If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;}&lt;/td&gt;
     915</Format>
     916<Text id="fw-5">This format statement is the default used for search results, classifiers, and document tables of contents. First we will tidy this up a bit.</Text>
     917
     918<Text id="fw-6"><Format>{Or}{[ex.thumbicon],[ex.srcicon]}</Format> chooses ex.thumbicon metadata if it's there, otherwise chooses ex.srcicon metadata. If neither is present, nothing is displayed. For this collection there is no ex.thumbicon metadata so the choice is not needed.</Text>
     919
     920<Text id="fw-7">Replace <Format>{Or}{[ex.thumbicon],[ex.srcicon]}</Format> with <Format>[ex.srcicon]</Format>.  </Text>
     921
     922<Text id="fw-8">There is no dls.Title metadata, so remove that element from <Format>{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}</Format>.</Text>
     923
     924<Text id="fw-9">The resulting format statement looks like the following:</Text>
     925<Format>
     926&lt;td valign=top&gt;[link][icon][/link]&lt;/td&gt;<br/>
     927&lt;td valign=top&gt;[ex.srclink][ex.srcicon][ex./srclink]&lt;/td&gt;<br/>
     928&lt;td valign=top&gt;[highlight]<br/>
     929{Or}{[dc.Title],[ex.Title],Untitled}[/highlight] {If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;}&lt;/td&gt;<br/>
     930</Format>
     931<Text id="fw-10">Preview the collection to make sure the display hasn't changed.</Text>
     932
     933</NumberedItem>
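The selection rule that the text gives for {Or} (take the first metadata value that is present, otherwise display nothing or a literal fallback) can be sketched in Python. This is only an illustration of the rule as stated above, not Greenstone's own code: the function name first_present, the sample dictionary and the icon file name are all hypothetical.

```python
def first_present(metadata, names, fallback=""):
    """Return the first non-empty metadata value among names, else fallback."""
    for name in names:
        value = metadata.get(name)
        if value:
            return value
    return fallback


# Hypothetical document metadata: no dc.Title and no ex.thumbicon.
doc = {"ex.Title": "Annual Report", "ex.srcicon": "ipdf.gif"}

# Mirrors {Or}{[dc.Title],[ex.Title],Untitled}
title = first_present(doc, ["dc.Title", "ex.Title"], fallback="Untitled")

# Mirrors {Or}{[ex.thumbicon],[ex.srcicon]}
icon = first_present(doc, ["ex.thumbicon", "ex.srcicon"])
```

With this sample data the title comes from ex.Title and the icon from ex.srcicon, which is why dropping the unused alternatives leaves the display unchanged.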
     934<Heading>
     935<Text id="fw-10a">Linking to Greenstone version or original version</Text>
     936</Heading>
     937<NumberedItem>
     938<Text id="fw-11">For collections with documents that undergo a conversion process during importing (e.g. Word, PDF, PowerPoint documents, but not text, HTML documents), the original file is stored in the collection along with the converted version. The default <AutoText text="VList"/> format statement links to both versions:</Text>
     939
     940<Text id="fw-12"><Format>[link][icon][/link]</Format> links to the Greenstone HTML version, while <Format>[srclink][srcicon][/srclink]</Format> links to the original.</Text>
     941
     942<Text id="fw-13">Choose <AutoText text="SearchVList"/> in <AutoText key="glidict::CDM.GUI.Formats"/> by selecting <AutoText text="Search"/> from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Experiment with removing either of the two links from the format statement. Storing and displaying the original allows users to see the correct format, but requires the user to have the relevant program installed. It also increases the size of the collection. The Greenstone version can be viewed in a browser, but may not look as nice.</Text>
     943
     944</NumberedItem>
     945<Heading>
     946<Text id="fw-13a">Making bookshelves show how many items they contain</Text>
     947</Heading>
     948<NumberedItem>
     949<Text id="fw-14">Next, we'll customize the format for the <AutoText key="coredm::_labelCreator_" type="italics"/> list. Classifier nodes have only a few pieces of metadata to display: <Format>[ex.Title]</Format> and <Format>[numleafdocs]</Format>. Whatever metadata the classifier has been built on, the node label is always stored as <Format>[ex.Title]</Format>. This is why a Creator is printed out for each bookshelf node even though dc.Creator is not specified in the format statement. <Format>[numleafdocs]</Format> is only defined for bookshelf nodes, so this metadata can be used in an <Format>{If}</Format> statement to make bookshelf nodes and document nodes display differently.</Text>
     950
     951</NumberedItem>
     952<NumberedItem>
     953<Text id="fw-15">Make each bookshelf node in the Creator classifier show how many entries it contains. In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the Creator classifier from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list.  Append the following: </Text>
     954<Format>
     955{If}{[numleafdocs],&lt;td&gt;&lt;i&gt;([numleafdocs])&lt;/i&gt;&lt;/td&gt;}
     956</Format>
     957<Text id="fw-16">Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>, switch to the <AutoText key="glidict::GUI.Create"/> panel, and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/> (no need to rebuild). </Text>
     958
     959<Text id="fw-17">This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf.  Since only bookshelf nodes define <Format>[numleafdocs]</Format>, only these nodes will display this. By modifying <AutoText text="CL2VList"/> instead of <AutoText text="VList"/>, the change will only apply to the second classifier (Creators).</Text>
     960
     961</NumberedItem>
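As a rough sketch of the {If} test on numleafdocs, the Python below returns a bracketed count only for nodes that define that value. It assumes, as the text says, that only bookshelf nodes carry numleafdocs; the function name count_suffix and the two sample nodes are hypothetical.

```python
def count_suffix(node):
    """Return the item count in brackets for bookshelves, '' for documents."""
    count = node.get("numleafdocs")
    if count is None:
        # Document nodes do not define numleafdocs, so nothing is shown.
        return ""
    return "({})".format(count)


# Hypothetical classifier nodes.
bookshelf = {"ex.Title": "Smith, J.", "numleafdocs": 3}
document = {"ex.Title": "Annual Report"}

bookshelf_label = bookshelf["ex.Title"] + " " + count_suffix(bookshelf)
document_label = (document["ex.Title"] + " " + count_suffix(document)).strip()
```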
     962<Heading>
     963<Text id="fw-17a">Displaying multi-valued metadata</Text>
     964</Heading>
     965<NumberedItem>
     966<Text id="fw-18">Modify the document nodes in the Creator classifier to display all authors. After <Format>{If}{[ex.Source],&lt;br&gt;</Format> in the format statement, add <Format>[sibling:ex.Creator]</Format>.</Text>
     967<Text id="fw-19"><Format>[ex.Source]</Format> is not defined for bookshelf nodes, so it can also be used to differentiate bookshelves and documents.</Text>
     968
     969<Text id="fw-20">The resulting format statement looks like:</Text>
     970<Format>
     971&lt;td valign=top&gt;[link][icon][/link]&lt;/td&gt;<br/>
     972&lt;td valign=top&gt;[ex.srclink][ex.srcicon][ex./srclink]&lt;/td&gt;<br/>
     973&lt;td valign=top&gt;[highlight]<br/>
     974{Or}{[dc.Title],[ex.Title],Untitled}[/highlight]<br/>
     975{If}{[ex.Source],&lt;br&gt;[sibling:ex.Creator] <br/>
     976&lt;i&gt;([ex.Source])&lt;/i&gt;}&lt;/td&gt;<br/>
     977{If}{[numleafdocs],&lt;td&gt;&lt;i&gt;([numleafdocs])&lt;/i&gt;&lt;/td&gt;}
     978</Format>
     979<Text id="fw-21">This will display the Greenstone link, the link to the original, then the Title. For bookshelf nodes, it will also display how many documents the bookshelf contains. For document nodes, it will display all the Authors (Creators), and the source document. <Format>[sibling:ex.Creator]</Format> displays all the Creator metadata for the document, separated by <AutoText text=", " type="italics"/>. Preview the <AutoText key="coredm::_labelCreator_" type="italics"/> list.</Text> 
     980
     981<Text id="fw-22">Change the separator between the authors. Modify the format statement, and replace <Format>[sibling:ex.Creator]</Format> with <Format>[sibling(All'&lt;br/&gt;'):ex.Creator]</Format>. This will add a new line after each author. Preview the <AutoText key="coredm::_labelCreator_" type="italics"/> list.</Text>
     982</NumberedItem>
     983</Content>
     984</Tutorial>
     985<Tutorial id="enhanced_pdf">
     986<Title>
     987<Text id="ep-1">Enhanced PDF handling</Text>
     988</Title>
     989<Prerequisite id="word_pdf_collection"/>
     990<Version initial="2.70" current="2.70"/>
     991<Content>
     992<Text id="ep-2">Greenstone converts PDF files to HTML using third-party software: <AutoText text="pdftohtml.pl" type="italics"/>. This lets users view these documents even if they don't have the PDF software installed. Unfortunately, sometimes the formatting of the resulting HTML files is not so good.</Text>
     993<Text id="ep-3">This exercise explores some extra options to the PDF plugin which may produce a nicer version for display. Some of these options use the standard pdftohtml program, others use ImageMagick and Ghostscript to convert the file to a series of images. Ghostscript is a program that can convert Postscript and PDF files to other formats. You can download it from <Link>http://www.cs.wisc.edu/~ghost/</Link> (follow the link to the current stable release).</Text>
     994<NumberedItem>
     995<Text id="ep-4">In a browser, preview the reports collection created in exercise <TutorialRef id="word_pdf_collection"/>, and view the documents. Remember that <Path>pdf05-notext</Path> couldn't be processed during building, because there was no extracted text, and therefore doesn't appear in the collection. Note that the other PDF documents appear as one long document, with no sections. </Text>
    786996</NumberedItem>
    787997<Heading>
     
    7921002</Comment>
    7931003<NumberedItem>
     1004<Text id="0334a">In the Librarian Interface, rebuild the collection. During the building process this message appears: <AutoText key="perlmodules::plugin.n_included" type="quoted" args="14"/>; <AutoText key="perlmodules::plugin.one_rejected" type="quoted"/>.</Text>
     1005</NumberedItem>
     1006<NumberedItem>
    7941007<Text id="0335">Use the <AutoText key="glidict::Menu.File_Options"/> item on the <AutoText key="glidict::Menu.File"/> menu to switch to <AutoText key="glidict::Preferences.Mode.Expert"/> mode and then build the collection again. The <AutoText key="glidict::GUI.Create"/> panel looks different in <AutoText key="glidict::Preferences.Mode.Expert"/> mode because it gives more options: locate the <AutoText key="glidict::CreatePane.Build_Collection" type="button"/> button, near the bottom of the window, and click it. Now a message appears saying that the file could not be processed, and why.</Text>
    7951008</NumberedItem>
     
    7981011</NumberedItem>
    7991012<Heading>
    800 <Text id="0336a">Improved PDF Conversion with Ghostscript</Text>
    801 </Heading>
    802 <Comment>
    803 <Text id="0336b">If you have Ghostscript installed, then you can use a new method of handling these difficult PDF documents. Ghostscript is a program that can convert Postscript and PDF files to other formats. You can download it from <Link>http://www.cs.wisc.edu/~ghost/</Link> (follow the link to the current stable release).</Text>
    804 </Comment>
    805 <NumberedItem>
    806 <Text id="0327">Greenstone can convert PDF files into a series of images with a corresponding file that details how they are composed into the complete document (called an <AutoText text="item" type="quoted"/> file). For this part of the exercise, ImageMagick also needs to be installed (see <TutorialRef id="install_greenstone"/>).</Text>
    807 </NumberedItem>
    808 <NumberedItem>
    809 <Text id="0328">In the <AutoText key="glidict::CDM.GUI.Plugins"/> list in the <AutoText key="glidict::GUI.Design"/> panel, double-click <AutoText text="PDFPlug"/> to pop up a window that shows its settings, and set the <AutoText text="convert_to"/> option to <AutoText text="pagedimg_gif"/>.</Text>
    810 </NumberedItem>
    811 <NumberedItem>
    812 <Text id="0329"><b>Build</b> the collection and <b>preview</b> it. Both PDF documents have been processed and divided into pages, but each page displays <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/> because when converting PDF documents to images, no text is extracted.</Text>
    813 </NumberedItem>
    814 <NumberedItem>
    815 <Text id="0330">In order to view the documents properly we need to modify a format statement. In the <AutoText key="glidict::CDM.GUI.Formats"/> section on the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. Replace <Format>[Text]</Format> with <Format>[srcicon]</Format> and click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
    816 </NumberedItem>
    817 <NumberedItem>
    818 <Text id="0332"><b>Preview</b> the collection from the <AutoText key="glidict::GUI.Create"/> panel. (There is no need to build it). Images from the documents are now displayed instead of the extracted text. Both <Path>No extractable text.pdf</Path> and <Path>Weird characters.pdf</Path> display nicely now. </Text>
     1013<Text id="ep-5">Tidying up the HTML format</Text>
     1014</Heading>
     1015<NumberedItem>
     1016<Text id="ep-6">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, configure <AutoText text="PDFPlug"/>. Switch on the <AutoText text="use_sections"/> option. </Text>
     1017
     1018<Text id="ep-7">Build and preview the collection. Note that all the PDF documents are now split into a series of pages, and a "go to page" box is provided. The format is still a bit ugly though.</Text>
     1019
     1020</NumberedItem>
     1021<NumberedItem>
     1022<Text id="ep-8">Back in the <AutoText key="glidict::CDM.GUI.Plugins"/> section, configure <AutoText text="PDFPlug"/> again. Switch on the <AutoText text="complex"/> option. This will make <AutoText text="PDFPlug"/> use Ghostscript to try to generate nicer HTML. Ghostscript needs to be installed for this to work.</Text>
     1023
     1024<Text id="ep-9">Build and preview the collection, and see how the format has changed to more closely resemble the original. In particular, you can see that <Path>pdf01.pdf</Path> has retained its columns in the HTML.</Text>
     1025
     1026<Text id="ep-10">The PDF document with no text (<Path>pdf05-notext.pdf</Path>) now appears in the collection, but has no contents. The PDF with weird characters (<Path>pdf06-weirdchars.pdf</Path>) still does not display properly.</Text>
     1027</NumberedItem>
     1028<Heading>
     1029<Text id="ep-11">Using image format</Text>
     1030</Heading>
     1031<NumberedItem>
     1032<Text id="ep-12">If conversion to HTML doesn't produce the result you like, PDF documents can be converted to a series of images, one per page or slide. This requires ImageMagick and Ghostscript to be installed.</Text>
     1033
     1034</NumberedItem>
     1035<NumberedItem>
     1036<Text id="ep-13">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section, configure <AutoText text="PDFPlug"/>. Set the <AutoText text="convert_to"/> option to one of the image types, e.g. <AutoText text="pagedimg_jpg"/>. Switch off the <AutoText text="use_sections"/> and <AutoText text="complex"/> options, as they are not used with image conversion. </Text>
     1037</NumberedItem>
     1038<NumberedItem>
     1039<Text id="ep-14">Build the collection and preview. All PDF documents have been processed and divided into sections, but each section displays <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. This is because no text is extracted when PDF documents are converted to images.</Text>
     1040</NumberedItem>
     1041<NumberedItem>
     1042<Text id="ep-15">In order to view the documents properly, you will need to modify the format statement. In the <AutoText key="glidict::CDM.GUI.Formats"/> section on the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. Replace </Text>
     1043
     1044<Format>
     1045[Text]
     1046</Format>
     1047<Text id="ep-16">with</Text>
     1048<Format>
     1049{If}{[parent:FileFormat] eq PDF,[srcicon],[Text]}
     1050</Format>
     1051
     1052<Text id="ep-17">Because the other documents in the collection do not use images, we only want to show images for PDF documents. <AutoText text="FileFormat"/> is an extracted metadata item which shows the format of the source document. We use this to test whether the documents are PDF or not.</Text>
     1053
     1054</NumberedItem>
     1055<NumberedItem>
     1056<Text id="ep-18">Preview the collection from the <AutoText key="glidict::GUI.Create"/> panel. (There is no need to build it). Images from the document are now displayed instead of the extracted text. Both <Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars.pdf</Path> display nicely now. Make sure that the Word documents still display properly.</Text>
     1057</NumberedItem>
     1058<Heading>
     1059<Text id="ep-19">Using <AutoText text="process_exp"/> to control document processing (advanced)</Text>
     1060</Heading>
     1061<NumberedItem>
     1062<Text id="ep-20">Processing all of the PDF documents using an image type may not give the best result for your collection. The images will look nice, but as no text is extracted, searching the full text will not be available for these documents. The best solution would be to process most of the PDF files as HTML, and only use the image format where HTML doesn't work.</Text>
     1063</NumberedItem>
     1064<NumberedItem>
     1065<Text id="ep-21">We achieve this by adding two <AutoText text="PDFPlug"/> plugins to the collection, with different options. Currently, the Librarian Interface does not allow you to add the same plugin twice to the collection (with the exception of <AutoText text="UnknownPlug"/>). You will need to edit the collection configuration file by hand. Close the reports collection in the Librarian Interface. Then open <Path>Greenstone &rarr; collect &rarr; reports &rarr; etc &rarr; collect.cfg</Path> using a text editor, e.g. WordPad. In the list of plugins, add another <AutoText text="PDFPlug"/>, i.e.</Text>
     1066<Format>
     1067plugin PDFPlug
     1068</Format>
     1069<Text id="ep-22">Don't worry about the options here - we will add these using the Librarian Interface.</Text>
     1070<Text id="ep-22a">Note that if you ever need to edit a collection's <Path>collect.cfg</Path> file by hand, you must close the collection in the Librarian Interface first, otherwise the next time it saves the file, it will overwrite your changes.</Text>
     1071
     1072</NumberedItem>
     1073<NumberedItem>
     1074<Text id="ep-23">Open up the collection again in the Librarian Interface, and go to the Gather panel. Make a new folder called <AutoText text="notext" type="quoted"/>. Right click in the collection panel and select <AutoText key="glidict::CollectionPopupMenu.New_Folder"/> from the menu. Change the <AutoText key="glidict::NewFolderOrFilePrompt.Folder_Name"/> to <AutoText text="notext" type="quoted"/>, and click <AutoText key="glidict::General.OK" type="button"/>. Move the two PDF files that have problems with HTML (<Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars.pdf</Path>) into this folder by drag and drop. We will set up the plugins so that PDF files in this <Path>notext</Path> folder are processed differently to the other PDF files.</Text>
     1075
     1076</NumberedItem>
     1077<NumberedItem>
     1078<Text id="ep-24">Switch to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel. You will see that there are two PDFPlug plugins in the list. </Text>
     1079
     1080</NumberedItem>
     1081<NumberedItem>
     1082<Text id="ep-25">Switch to <AutoText key="glidict::Preferences.Mode.Systems"/> mode, as you will need to use regular expressions in the options (<Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Options"/> &rarr; <AutoText key="glidict::Preferences.Mode"/></Menu>).</Text>
     1083</NumberedItem>
     1084<NumberedItem>
     1085<Text id="ep-26">Configure the two <AutoText text="PDFPlug"/> plugins so that the options look like the following:</Text>
     1086
     1087<Format>
     1088plugin PDFPlug -convert_to pagedimg_jpg -process_exp "notext.*\.pdf"<br/>
     1089plugin PDFPlug -convert_to html -use_sections
     1090</Format>
     1091
     1092<Text id="ep-27">The pagedimg version must come earlier in the list than the html version. The <AutoText text="process_exp"/> for the first <AutoText text="PDFPlug"/> will process any PDF files in the <Path>notext</Path> directory. The second <AutoText text="PDFPlug"/> will process any PDF files that are not processed by the first one.</Text>
     1093
     1094<Text id="ep-28">Note that all plugins have the <AutoText text="process_exp"/> option, and this can be used to customize which documents are processed by which plugin. This option is only visible in <AutoText key="glidict::Preferences.Mode.Systems"/> and <AutoText key="glidict::Preferences.Mode.Expert"/> modes.</Text>
     1095<Text id="ep-29">Change back to <AutoText key="glidict::Preferences.Mode.Librarian"/> mode.</Text>
     1096</NumberedItem>
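The effect of plugin order and process_exp can be sketched with Python's re module. This is a simplified model, not Greenstone's implementation: it assumes that each file path is offered to the plugins in list order and that the first plugin whose expression matches claims the file. The first pattern is the one given above; the second is an assumed catch-all standing in for the default used by PDFPlug, and the import/ paths are illustrative.

```python
import re

# Plugins in list order. The first pattern is the one from the text; the
# second is an ASSUMED catch-all standing in for PDFPlug's default.
PLUGINS = [
    ("PDFPlug -convert_to pagedimg_jpg", re.compile(r"notext.*\.pdf")),
    ("PDFPlug -convert_to html -use_sections", re.compile(r"\.pdf$", re.IGNORECASE)),
]


def choose_plugin(path):
    """Return the options of the first plugin whose pattern matches path."""
    for options, pattern in PLUGINS:
        if pattern.search(path):
            return options
    return None


in_folder = choose_plugin("import/notext/pdf06-weirdchars.pdf")
elsewhere = choose_plugin("import/pdf01.pdf")
by_name = choose_plugin("import/pdf05-notext.pdf")
```

In this model the file inside the notext folder goes to the image version and pdf01.pdf falls through to the HTML version. Note that the expression is not anchored to the folder: a file named pdf05-notext.pdf matches it wherever it is stored, because its own name contains notext followed by .pdf.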
     1097<NumberedItem>
     1098<Text id="ep-30">Edit the <AutoText text="DocumentText"/> format statement. PDF files processed as HTML will not have images to display, so we need to make sure they get text displayed instead.</Text>
     1099<Text id="ep-1">Replace the <Format>[srcicon]</Format> element with <Format>{Or}{[srcicon],[Text]}</Format>, i.e. change</Text>
     1100<Format>
     1101{If}{[parent:FileFormat] eq PDF,[srcicon],[Text]}
     1102</Format>
     1103<Text id="ep-32">to</Text>
     1104<Format>
     1105{If}{[parent:FileFormat] eq PDF, {Or}{[srcicon],[Text]},[Text]}
     1106</Format>
     1107</NumberedItem>
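The nested statement can be read as the following decision, sketched here in Python. It is only a model of the logic described above: the function name document_text, the sample sections, the page image name and the FileFormat value used for the Word document are hypothetical.

```python
def document_text(section):
    """Choose what to show for a section, following the nested If/Or rule."""
    if section.get("parent:FileFormat") == "PDF":
        # {Or}{[srcicon],[Text]}: the image if there is one, otherwise the text.
        return section.get("srcicon") or section.get("Text", "")
    # Every other format keeps showing its extracted text.
    return section.get("Text", "")


# Hypothetical sections, one for each case in the collection.
pdf_as_image = {"parent:FileFormat": "PDF", "srcicon": "page1.jpg", "Text": ""}
pdf_as_html = {"parent:FileFormat": "PDF", "Text": "Bibliography"}
word_doc = {"parent:FileFormat": "Word", "Text": "Introduction"}
```

A PDF processed as images shows its page image, a PDF processed as HTML falls back to its text, and other documents are unaffected.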
     1108<NumberedItem>
     1109<Text id="ep-33">Build and preview the collection. All PDF documents should look relatively nice. Try searching this collection. You will be able to locate the PDFs that were converted to HTML (try e.g. <AutoText text="bibliography" type="quoted"/>), but not the ones that were converted to images (try searching for <AutoText text="banana" type="quoted"/> or <AutoText text="METS" type="quoted"/>).</Text>
     1110</NumberedItem>
     1111</Content>
     1112</Tutorial>
     1113<Tutorial id="enhanced_word">
     1114<Title>
     1115<Text id="ew-">Enhanced Word document handling</Text>
     1116</Title>
     1117<Content>
     1118<Text id="ew-1">The standard way Greenstone processes Word documents is to convert them to HTML format using a third-party program, wvWare. This sometimes doesn't do a very good job of conversion. If you are using Windows, you can take advantage of Windows native scripting to do a better job of conversion. If the original document was hierarchically structured using Word styles, these can be used to structure the resulting HTML. Word document properties can also be extracted as metadata.</Text>
     1119<NumberedItem>
     1120<Text id="ew-2">In your digital library, preview the reports collection. Look at the Word documents and notice how they have no structure; they have been converted to flat documents.</Text>
     1121</NumberedItem>
     1122<Heading>
     1123<Text id="ew-3">Using Windows native scripting</Text>
     1124</Heading>
     1125<NumberedItem>
     1126<Text id="ew-4">In the Librarian Interface, open up the reports collection. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section on the left-hand side. Double click the <AutoText text="WordPlug"/> plugin and switch on the <AutoText text="windows_scripting"/> option.</Text>
     1127</NumberedItem>
     1128<NumberedItem>
     1129<Text id="ew-5">Build and preview the collection. Have a look at <Path>word03.doc</Path> and <Path>word06.doc</Path>. These now appear with hierarchical structure. But these two are the only ones.</Text>
     1130<Text id="ew-6">The default behaviour for <AutoText text="WordPlug"/> with <AutoText text="windows_scripting"/> is to section the document based on <AutoText text="Heading 1" type="quoted"/>, <AutoText text="Heading 2" type="quoted"/>, <AutoText text="Heading 3" type="quoted"/> styles. If you open up the <Path>word03.doc</Path> or <Path>word06.doc</Path> documents in Word, you will see that the sections use these Heading styles.</Text>
     1131<Text id="ew-7">Note, to view style information in Word, you can select <Menu>Format &rarr; Styles and Formatting</Menu> from the menu, and a side bar will appear on the right hand side. Click on a section heading and the formatting information will be displayed in this side bar.</Text>
     1132</NumberedItem>
     1133<NumberedItem>
     1134<Text id="ew-8">Some of the documents do not use styles (e.g. <Path>word01.doc</Path>) and no structure can be extracted from them. Some documents use user-defined styles. <AutoText text="WordPlug"/> can be configured to use these styles instead of <AutoText text="Heading 1" type="plain"/>, <AutoText text="Heading 2" type="plain"/> etc. Next we will configure WordPlug to use the styles found in <Path>word05.doc</Path>.</Text>
     1135</NumberedItem>
     1136<Heading>
     1137<Text id="ew-9">Defining styles</Text>
     1138</Heading>
     1139<NumberedItem>
     1140<Text id="ew-10">Change the mode in the Librarian Interface to <AutoText key="glidict::Preferences.Mode.Systems"/>  (<Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Options"/> &rarr; <AutoText key="glidict::Preferences.Mode"/></Menu>). This is because you will need to use regular expressions to set up the style options.</Text>
     1141</NumberedItem>
     1142<NumberedItem>
     1143<Text id="ew-11">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText text="WordPlug"/> and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. Four types of header can be set which are:</Text>
     1144<Format>
     1145<BulletList>
     1146<Bullet>title_header (titleHeader1|titleHeader2|...)</Bullet>
     1147<Bullet>level1_header (level1Header1|level1Header2|...)</Bullet>
     1148<Bullet>level2_header (level2Header1|level2Header2|...)</Bullet>
     1149<Bullet>level3_header (level3Header1|level3Header2|...)</Bullet>
     1150</BulletList>
     1151</Format>
     1152<Text id="ew-12">These header options define which styles should be considered as title, level 1, level 2 and level 3 styles. Open up <Path>word05.doc</Path> in Word (by double-clicking on it in the <AutoText key="glidict::GUI.Gather"/> pane), and examine the title and section heading styles. You will see that various user-defined header styles are set, such as:</Text>
     1153<BulletList>
     1154<Bullet>
     1155<Text id="ew-13"><AutoText text="PaperTitle" type="italics"/>: Title of the paper</Text>
     1156</Bullet>
     1157<Bullet>
     1158<Text id="ew-14"><AutoText text="SammaryHeader" type="italics"/> (probably mistyped): Summary section</Text>
     1159</Bullet>
     1160<Bullet>
     1161<Text id="ew-15"><AutoText text="ChapterTitle" type="italics"/>: Level 1 section heading</Text>
     1162</Bullet>
     1163<Bullet>
     1164<Text id="ew-16"><AutoText text="SectionHeading" type="italics"/>: Level 2 section heading</Text>
     1165</Bullet>
     1166<Bullet>
     1167<Text id="ew-17"><AutoText text="ReferenceHeading" type="italics"/>: Reference section</Text>
     1168</Bullet>
     1169</BulletList>
     1170<Text id="ew-18">Set the options in <AutoText text="WordPlug"/> as follows:</Text>
     1171<Format>
     1172title_header: PaperTitle<br/>
     1173level1_header: (SammaryHeader|ChapterTitle|ReferenceHeading|Reference_heading)<br/>
     1174level2_header: SectionHeading
     1175</Format>
     1176</NumberedItem>
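Because the header options are regular expressions, their effect can be tried out in Python before building. The sketch below assumes that each option is matched against the whole style name and that the first matching option decides the level; how WordPlug applies the expressions internally is not stated here, so treat this as an approximation. The names HEADER_OPTIONS and classify_style are hypothetical.

```python
import re

# Option values from the step above, written as regular expressions.
HEADER_OPTIONS = {
    "title": r"PaperTitle",
    "level1": r"(SammaryHeader|ChapterTitle|ReferenceHeading|Reference_heading)",
    "level2": r"SectionHeading",
}


def classify_style(style_name):
    """Return the first header level whose expression matches the whole style name."""
    for level, expression in HEADER_OPTIONS.items():
        if re.fullmatch(expression, style_name):
            return level
    return None


summary_level = classify_style("SammaryHeader")
section_level = classify_style("SectionHeading")
body_level = classify_style("BodyText")
```

The misspelt SammaryHeader style is treated as a level 1 heading because the option lists it exactly as it appears in the document, while a style that is not listed, such as the hypothetical BodyText, produces no section.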
     1177<NumberedItem>
     1178<Text id="ew-19">Build the collection and preview it. Look in particular at <Path>word05.doc</Path>. You will see that this document is now also hierarchically structured.</Text>
     1179</NumberedItem>
     1180<Heading>
     1181<Text id="ew-20">Removing pre-defined table of contents</Text>
     1182</Heading>
     1183<NumberedItem>
     1184<Text id="ew-21">If you look at <Path>word06.doc</Path> you will see that it now has two tables of contents. One is generated by Greenstone based on the document's styles, the other was already defined in the Word document. WordPlug can be configured to remove predefined tables of contents and tables of figures. The tables must be defined with Word styles in order for this to work.</Text>
     1185</NumberedItem>
     1186<NumberedItem>
     1187<Text id="ew-22">To remove the tables of contents and figures from <Path>word06.doc</Path>, switch on the <AutoText text="delete_toc"/> option in <AutoText text="WordPlug"/>. Set the header styles as follows:</Text> 
     1188<Format>
     1189toc_header: (MsoToc1|MsoToc2|MsoToc3)<br/>
     1190tof_header: MsoTof
     1191</Format>
     1192<Text id="ew-23">Once these are set, click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1193</NumberedItem>
     1194<NumberedItem>
     1195<Text id="ew-24">Build and preview the collection. <Path>word06.doc</Path> should now only have one table of contents.</Text>
     1196</NumberedItem>
     1197<Heading>
     1198<Text id="ew-25">Extracting document properties as metadata</Text>
     1199</Heading>
     1200<NumberedItem>
     1201<Text id="ew-26">Word document properties can be extracted as metadata. By default, only the Title will be extracted. Other properties can be extracted using the <AutoText text="extracted_word_metadata_fields"/>  option.</Text>
     1202</NumberedItem>
     1203<NumberedItem>
     1204<Text id="ew-27">In the <AutoText key="glidict::GUI.Enrich"/> panel, look at the metadata that has been extracted for <Path>word05.doc</Path> and <Path>word06.doc</Path>. Now open the documents in Word and look at the properties they have set (<Menu>File &rarr; Properties</Menu>). They have Title, Author, Subject, and Keywords properties. WordPlug can be configured to look for these properties and extract them.</Text>
     1205</NumberedItem>
     1206<NumberedItem>
     1207<Text id="ew-28">In the <AutoText key="glidict::GUI.Design"/> panel, under <AutoText key="glidict::CDM.GUI.Plugins"/>, select WordPlug and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. Switch on the configuration option <AutoText text="extracted_word_metadata_fields"/>. Set the value to:</Text>
     1208<Format>
     1209Title,Author&lt;Creator&gt;,Subject,Keywords&lt;Subject&gt;
     1210</Format>
     1211<Text id="ew-29">This will make <AutoText text="WordPlug"/> try to extract Title, Author, Subject and Keywords metadata. Title and Subject will be saved with the same name, while Author will be saved as Creator metadata, and Keywords as Subject metadata.</Text>
     1212</NumberedItem>
     1213<NumberedItem>
     1214<Text id="ew-30">Build the collection.</Text>
     1215</NumberedItem>
     1216<NumberedItem>
     1217<Text id="ew-31">Look at the metadata for the two documents again in the <AutoText key="glidict::GUI.Enrich"/> panel. You should now see the extra metadata items. This metadata can now be used for display, in browsing classifiers, and so on.</Text>
    8191218</NumberedItem>
    8201219</Content>
     
    8301229<Text id="0338">Start a new collection (<Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_New"/></Menu>) called <b>backdrop</b>. Fill out the fields with appropriate information. For <AutoText key="glidict::NewCollectionPrompt.Base_Collection"/>, select the item <b>Simple image collection (image-e)</b> from the pull-down menu.</Text>
    8311230<Comment>
    832 <Text id="0340">Greenstone does not ask you to choose a metadata set because the new collection inherits whatever is used by the base collection.</Text>
     1231<Text id="0340a">When you base a collection on an existing one, it inherits all the settings of that collection. You won't be asked to choose a metadata set because the new collection inherits those (if any) used by the base collection.</Text>
    8331232</Comment>
    8341233</NumberedItem>
     
    8431242</NumberedItem>
    8441243<NumberedItem>
    845 <Text id="0344">Click <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar to view a list of the photos ordered by filename and presented as a thumbnail accompanied by some basic data about the image. The structure of this collection is the same as <b>Simple image collection (image-e)</b>, but the content is different.</Text>
     1244<Text id="0344">Click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar to view a list of the photos ordered by filename and presented as a thumbnail accompanied by some basic data about the image. The structure of this collection is the same as <b>Simple image collection (image-e)</b>, but the content is different.</Text>
    8461245</NumberedItem>
    8471246<NumberedItem>
    8481247<Text id="0345">Change to the <AutoText key="glidict::GUI.Enrich"/> panel and view the extracted metadata for <Path>Bear.jpg</Path>.</Text>
    8491248</NumberedItem>
     1249<Heading>
     1250<Text id="0347">Adding a metadata set to the collection</Text>
     1251</Heading>
    8501252<Comment>
    8511253<Text id="0346">We now add our own metadata and use it to give users a new way to browse the collection. We use the Dublin Core metadata set.</Text>
    8521254</Comment>
    853 <Heading>
    854 <Text id="0347">Adding a metadata set to the collection</Text>
    855 </Heading>
    8561255<NumberedItem>
    8571256<Text id="0348">The collection (image-e) on which <b>backdrop</b> is based uses only extracted metadata. To add another metadata set, go to the <AutoText key="glidict::GUI.Design"/> panel of the Librarian Interface and click <AutoText key="glidict::CDM.GUI.MetadataSets"/> in the list on the left (the last one). Then click  <AutoText key="glidict::CDM.MetadataSetManager.Add" type="button"/> (lower left button).</Text>
     
    8601259<Text id="0349">In the window that pops up, select <AutoText text="dublin.mds"/> and click <AutoText key="glidict::CDM.MetadataSetManager.Chooser.Add" type="button"/>.</Text>
    8611260</NumberedItem>
    862 <Heading>
    863 <Text id="0350">Adding Title metadata</Text>
    864 </Heading>
    865 <NumberedItem>
    866 <Text id="0351">Now switch to the <AutoText key="glidict::GUI.Enrich"/> panel by clicking this tab. The metadata for each file now shows the Dublin core <AutoText text="dc."/> fields as well as the extracted <AutoText text="ex."/> fields.</Text>
    867 </NumberedItem>
     1261<NumberedItem>
     1262<Text id="0351">Now switch to the <AutoText key="glidict::GUI.Enrich"/> panel by clicking this tab. The metadata for each file now shows the Dublin Core <AutoText text="dc."/> fields as well as the extracted <AutoText text="ex."/> fields.</Text>
     1263</NumberedItem>
     1264<Heading>
     1265<Text id="0350a">Adding Title and Description metadata</Text>
     1266</Heading>
    8681267<NumberedItem>
    8691268<Text id="0352">We work with just the first three files (<Path>Bear.jpg</Path>, <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>) to get a flavour of what is possible. First, set each file's <AutoText key="metadata::dc.Title"/> field to be the same as its filename but without the filename extension.</Text>
     
    8721271<Text id="0353">Click on <Path>Bear.jpg</Path> so its metadata fields are available, then click on its <AutoText key="metadata::dc.Title"/> field on the right-hand side. Type in <b>Bear</b>, and press <b>Enter</b>.</Text>
    8731272</NumberedItem>
    874 <Comment>
    875 <Text id="0354">The <AutoText key="glidict::EnrichPane.ExistingValues" args="..."/> box will become more useful when more entries have been added.</Text>
    876 </Comment>
    877 <NumberedItem>
    878 <Text id="0355">Repeat the process for <Path>Cat.jpg </Path>and <Path>Cheetah.jpg</Path>.</Text>
    879 </NumberedItem>
    880 <Comment>
     1273<NumberedItem>
     1274<Text id="0355">Repeat the process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>.</Text>
     1275</NumberedItem>
     1276<NumberedItem>
     1277<Text id="0355a">Add a description for each image as <AutoText key="metadata::dc.Description"/> metadata.</Text>
     1278<Text id="0372">What description should you enter? To remind yourself of a file's content, the Librarian Interface lets you open files by double-clicking them. It launches the appropriate application based on the filename extension: Word for .doc files, Acrobat for .pdf files, and so on. Double-click <Path>Bear.jpg</Path>: on Windows, the image will normally be displayed by Microsoft's Photo Editor (although this depends on how your computer has been set up).</Text>
     1279</NumberedItem>
     1280<NumberedItem>
     1281<Text id="0373">Back in the Librarian Interface, enter the text <b>Bear in the Rocky Mountains</b> as the <AutoText key="metadata::dc.Description"/> field's value and press <b>Enter</b> to have it added.</Text>
     1282</NumberedItem>
     1283<NumberedItem>
     1284<Text id="0374">Repeat this process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>, adding a suitable description for each.</Text>
     1285</NumberedItem>
     1286<Heading>
     1287<Text id="0357">Change Format Features to display new metadata</Text>
     1288</Heading>
     1289<NumberedItem>
    8811290<Text id="0356">Now we customize the collection's appearance. Building or previewing the collection at this point won't reveal anything new. That's because we haven't changed the design of the collection to take advantage of the new metadata.</Text>
    882 </Comment>
    883 <Heading>
    884 <Text id="0357">Change Format Features to display new metadata</Text>
    885 </Heading>
     1291</NumberedItem>
    8861292<NumberedItem>
    8871293<Text id="0358">Go to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Formats"/> from the left-hand list. Leave the feature selection controls at their default values, so that <AutoText key="glidict::CDM.FormatManager.Feature"/> remains blank and <AutoText text="VList" /> is selected as the <AutoText key="glidict::CDM.FormatManager.Part"/>. In the <AutoText key="glidict::CDM.FormatManager.Editor"/>, edit the text as follows:</Text>
     
    8891295<Text id="0359">Change "_ImageName_:" to "Title:" <br/> Change "[Image]" to "[dc.Title]"</Text>
    8901296</Indent>
    891 <Comment>
    892 <Text id="0360">Metadata names are case-sensitive in Greenstone: it is important that you capitalize "Title" (and don't capitalize "dc").</Text>
    893 </Comment>
    894 </NumberedItem>
    895 <NumberedItem>
    896 <Text id="0361">Next click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>. The first of the above changes alters the fragment of text that appears to the right of the thumbnail image, the second alters the item of metadata that follows it.</Text>
    897 </NumberedItem>
    898 <NumberedItem>
    899 <Text id="0362">Go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>. Now <b>preview</b> the collection. When you click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar the presentation has changed to "Title: Bear" and so on.</Text>
    900 </NumberedItem>
    901 <Comment>
    902 <Text id="0363">After the first three items, the Title becomes blank because we have only assigned dc.Title metadata to these first three. To get a full listing, enter all the metadata.</Text>
    903 </Comment>
    904 <Comment>
    905 <Text id="0364">For some design parameters the collection must be rebuilt before the effect of changes can be seen. However, changes to format statements take place immediately and you can see the result straightaway by clicking <b>reload</b> (or <b>refresh</b>) in the web browser.</Text>
     1297<Text id="0359a">Place your cursor after the text that says</Text>
     1298<Format>
     1299[dc.Title]&lt;br&gt;
     1300</Format>
     1301<Text id="0359b">and add the following text:</Text>
     1302<Format>
     1303Description: [dc.Description]&lt;br&gt;
     1304</Format>
     1305<Comment>
     1306<Text id="0360">Metadata names are case-sensitive in Greenstone: it is important that you capitalize "Title" and "Description" (and don't capitalize "dc").</Text>
     1307</Comment>
     1308</NumberedItem>
     1309<NumberedItem>
     1310<Text id="0361a">Next click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>. The new format statement will be displayed in the <AutoText key="glidict::CDM.FormatManager.Assigned_Formats"/> list. The first substitution alters the fragment of text that appears to the right of the thumbnail image; the second alters the item of metadata that follows it. The addition displays the Description after the Title.</Text>
     1311</NumberedItem>
     1312<NumberedItem>
     1313<Text id="0362a">Go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>. Now <b>preview</b> the collection. When you click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar, the presentation will have changed to "Title: Bear" and so on. Each image's description should appear beside the thumbnail, following the title.</Text>
     1314</NumberedItem>
     1315<Comment>
     1316<Text id="0363">After the first three items, the Title and Description become blank because we have only assigned Dublin Core metadata to these first three. To get a full listing, enter all the metadata.</Text>
     1317</Comment>
     1318<Comment>
     1319<Text id="0364">For some design parameters the collection must be rebuilt before the effect of changes can be seen. However, changes to format statements take place immediately and you can see the result straightaway by clicking <b>reload</b> (or <b>refresh</b>) in the web browser. Above, you were asked to build before previewing because you had added metadata.</Text>
    9061320</Comment>
    9071321<Heading>
     
    9091323</Heading>
    9101324<NumberedItem>
    911 <Text id="0366">Thumbnail images are created by the <AutoText text="ImagePlug"/> plug-in, so we need to access its configuration settings. To do this, switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Plugins"/> from the list on the left. Double-click <AutoText text="plugin ImagePlug"/> to pop up a window that shows its settings. (Alternatively, select <AutoText text="plugin ImagePlug"/> with a single click and then click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> further down the screen). Currently all options are off, so standard defaults are used. Select <AutoText text="thumbnailsize"/>, set it to <AutoText text="50"/>, and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1325<Text id="0366">Let's make the thumbnail image smaller. Thumbnail images are created by the <AutoText text="ImagePlug"/> plug-in, so we need to access its configuration settings. To do this, switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Plugins"/> from the list on the left. Double-click <AutoText text="plugin ImagePlug"/> to pop up a window that shows its settings. (Alternatively, select <AutoText text="plugin ImagePlug"/> with a single click and then click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> further down the screen.) Currently all options are off, so standard defaults are used. Select <AutoText text="thumbnailsize"/>, set it to <AutoText text="50"/>, and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    9121326</NumberedItem>
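<!-- Illustrative sketch only, not part of the tutorial text. Assumption: the Librarian Interface saves plug-in settings in the collection's etc/collect.cfg file, one plug-in per line, in the same "plugin Name -option" form that the Plugins list displays. Under that assumption, after the step above the ImagePlug entry would be expected to read roughly:

    plugin ImagePlug -thumbnailsize 50

With the option switched off again, the entry would go back to just "plugin ImagePlug". -->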
    9131327<NumberedItem>
     
    9171331<Text id="0368">Once you have seen the result of the change, return to the <AutoText key="glidict::GUI.Design"/> panel, select the configuration options for <AutoText text="ImagePlug"/>, and switch the <AutoText text="thumbnailsize"/> option off so that the thumbnail reverts to its normal size when the collection is re-built.</Text>
    9181332</NumberedItem>
    919 <Comment>
    920 <Text id="0369">Now add metadata that describes the photos in the collection. Again, for illustration, we focus on the first three images (<Path>Bear.jpg</Path>, <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>).</Text>
    921 </Comment>
    922 <Heading>
    923 <Text id="0370">Adding Description metadata</Text>
    924 </Heading>
    925 <NumberedItem>
    926 <Text id="0371">Switch to the <AutoText key="glidict::GUI.Enrich"/> panel and select <Path>Bear.jpg</Path>. We'll store our description in the <AutoText key="metadata::dc.Description"/> metadata element, so select it now in the right-hand panel.</Text>
    927 </NumberedItem>
    928 <Comment>
    929 <Text id="0372">What description should you enter? To remind yourself of a file's content, the Librarian Interface lets you open files by double-clicking them. It launches the appropriate application based on the filename extension, Word for .doc files, Acrobat for .pdf files and so on. Double-click Ascent.jpg: on Windows, the image will normally be displayed by Microsoft's Photo Editor (although this depends on how your computer has been set up).</Text>
    930 </Comment>
    931 <NumberedItem>
    932 <Text id="0373">Back in the Librarian Interface enter the text <b>Bear in the Rocky Mountains</b> as the <AutoText key="metadata::dc.Description"/> field's value and click <b>Enter</b> to have it added.</Text>
    933 </NumberedItem>
    934 <NumberedItem>
    935 <Text id="0374">Repeat this process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>, adding a suitable description for each.</Text>
    936 </NumberedItem>
    937 <NumberedItem>
    938 <Text id="0375">Build the collection again, to incorporate the new metadata.</Text>
    939 </NumberedItem>
    940 <NumberedItem>
    941 <Text id="0376">Now update the format statement to use the new <AutoText key="metadata::dc.Description"/> metadata. Switch back to the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, and ensure the <AutoText key="glidict::CDM.FormatManager.Feature"/> box is blank, and <AutoText text="VList" /> is selected in the <AutoText key="glidict::CDM.FormatManager.Part"/> box. In the <AutoText key="glidict::CDM.FormatManager.Editor"/>, place your cursor after the text that says</Text>
    942 <Format>[dc.Title]&lt;br&gt;</Format>
    943 </NumberedItem>
    944 <NumberedItem>
    945 <Text id="0377">and add the following text:</Text>
    946 <Format>Description: [dc.Description]&lt;br&gt;</Format>
    947 </NumberedItem>
    948 <NumberedItem>
    949 <Text id="0378">Then click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
    950 </NumberedItem>
    951 <NumberedItem>
    952 <Text id="0379"><b>Preview</b> the result (you don't need to build the collection as was done in step 22 to incorporate the metadata, because changes to format statements take effect immediately). Each image's description should appear beside the thumbnail, following the title.</Text>
    953 </NumberedItem>
    9541333<Heading>
    9551334<Text id="0380">Adding a browsing classifier based on Description metadata</Text>
    9561335</Heading>
    9571336<NumberedItem>
    958 <Text id="0381">Switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Classifiers"/> from the left-hand list. Set the menu item for <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> to <AutoText text="AZList" />; then click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>.</Text>
     1337<Text id="0381">Now we'll add a new browsing option based on the descriptions. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Classifiers"/> from the left-hand list. Set the menu item for <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> to <AutoText text="AZList" />; then click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>.</Text>
    9591338</NumberedItem>
    9601339<NumberedItem>
     
    9751354</Content>
    9761355</Tutorial>
    977 <Tutorial id="large_html_collection">
    978 <Title>
    979 <Text id="0387">A large collection of HTML files&mdash;Tudor</Text>
    980 </Title>
    981 <SampleFiles folder="tudor"/>
    982 <Version initial="2.60" current="2.70"/>
    983 <Content>
    984 <NumberedItem>
    985 <Text id="0388">Invoke the Greenstone Librarian Interface (from the Windows <i>Start</i> menu) and start a new collection called <b>tudor</b> (use the <AutoText key="glidict::Menu.File"/> menu). Fill out the pop-up dialog with appropriate values and leave <b>Dublin Core</b>, which is selected by default, as the metadata set.</Text>
    986 </NumberedItem>
    987 <NumberedItem>
    988 <Text id="0389">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>tudor</Path> folder in <Path>sample_files</Path>.</Text>
    989 </NumberedItem>
    990 <NumberedItem>
    991 <Text id="0390">Drag <Path>englishhistory.net</Path> from the left-hand side to the right to include it in your <b>tudor</b> collection.</Text>
    992 </NumberedItem>
    993 <NumberedItem>
    994 <Text id="0391">Switch to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>.</Text>
    995 </NumberedItem>
    996 <NumberedItem>
    997 <Text id="0392">When building has finished, <b>preview</b> the collection.</Text>
    998 </NumberedItem>
    999 <NumberedItem>
    1000 <Text id="0393">The browsing facilities in this collection (<i>titles a-z</i> and <i>filenames</i>) are based entirely on extracted metadata. Return to the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text>
    1001 </NumberedItem>
    1002 <Comment>
    1003 <Text id="0394">You've probably noticed that the collection contains a few stray image files, as well as the HTML documents. This is a mistake. The issue is that many of the HTML documents include images, and although Greenstone attempts to determine which images belong to HTML pages and only considers other images for inclusion in the collection, in this case it hasn't been completely successful. (This is because the web site from which these files were downloaded occasionally departs from the usual convention of hierarchical structuring.)</Text>
    1004 </Comment>
    1005 <NumberedItem>
    1006 <Text id="0395">Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Beside <AutoText text="plugin HTMLPlug"/> you will see <AutoText text="-smart_block"/>. This is the option that attempts to identify images in the HTML pages and block them from inclusion&mdash;in this case, it's not smart enough! Select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Scroll down the page to locate the <AutoText text="smart_block"/> option and switch it off. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    1007 </NumberedItem>
    1008 <NumberedItem>
    1009 <Text id="0396">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> and <b>preview</b> the collection. The collection is exactly as before except that these stray images are suppressed. What is happening is that plug-ins operate as a pipeline: files are passed to each one in turn until one is found that can process it. By default (i.e. without <AutoText text="smart_block"/>) the HTML plug-in blocks <i>all</i> images, which is appropriate for this collection.</Text>
    1010 </NumberedItem>
    1011 <Heading>
    1012 <Text id="0397">Looking at different views of the files in the Gather and Enrich panels</Text>
    1013 </Heading>
    1014 <NumberedItem>
    1015 <Text id="0398">Switch to the <AutoText key="glidict::GUI.Gather"/> panel and in the right-hand side open <Path>englishhistory.net &rarr; tudor</Path>.</Text>
    1016 </NumberedItem>
    1017 <NumberedItem>
    1018 <Text id="0400">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu for the right-hand side from <AutoText key="glidict::Filter.All_Files"/> to <AutoText key="glidict::Filter.0"/>. Notice the files displayed above are filtered accordingly, to show only files of this type.</Text>
    1019 </NumberedItem>
    1020 <NumberedItem>
    1021 <Text id="0401">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu to <AutoText key="glidict::Filter.3"/>. Again, the files shown above alter.</Text>
    1022 </NumberedItem>
    1023 <NumberedItem>
    1024 <Text id="0402">Now return the <AutoText key="glidict::Filter.Filter_Tree"/> setting back to <AutoText key="glidict::Filter.All_Files"/>, otherwise you may get confused later. Remember, if the <AutoText key="glidict::GUI.Gather"/> or <AutoText key="glidict::GUI.Enrich"/> panels do not seem to be showing all your files, this could be the problem.</Text>
    1025 </NumberedItem>
    1026 </Content>
    1027 
    1028 </Tutorial>
     1356
    10291357<Tutorial id="export_to_CDROM">
    10301358<Title>
     
    10411369</NumberedItem>
    10421370<NumberedItem>
    1043 <Text id="0406">Choose <Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_CDimage"/></Menu>, and in the popup window select the <b>tudor</b> collection as the collection to export. You can optionally name the CD-ROM; otherwise the default <AutoText text="collections" type="quoted"/> is used. Do so now, entering <AutoText text="Tudor collection" type="quoted"/> in the field for <AutoText key="glidict::WriteCDImagePrompt.CD_Name"/>; then click <AutoText key="glidict::WriteCDImagePrompt.Export" type="button"/>.</Text>
     1371<Text id="0406">Choose <Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_CDimage"/></Menu>. In the resulting popup window, select the collection or collections that you wish to export by ticking their check boxes. You can optionally enter a name for the CD-ROM: this is the name that will appear in the menu when the CD-ROM is run. If a name is not entered, the default <AutoText text="Greenstone Collections"/> will be used. Click <AutoText key="glidict::WriteCDImagePrompt.Export" type="button"/>.</Text>
    10441372<Text id="0408">The necessary files for export are written to:</Text>
    1045 <Path>Greenstone &rarr; tmp &rarr; exported_Tudorcollection</Path>
    1046 <Comment>
    1047 <Text id="0408b">Note, if you didn't specify a name for the CD-ROM, then the folder name will be <Path>exported_collections</Path> instead of <Path>exported_Tudorcollections</Path>.</Text>
    1048 </Comment>
    1049 <Text id="0409">You need to use your own computer's software to write these on to CD-ROM. On <i>Windows XP</i> this ability is built into the operating system: assuming you have a CD-ROM or DVD writer insert a blank disk into the drive and drag the contents of <Path>exported_Tudorcollection</Path> into the folder that represents the disk.</Text>
     1373<Path>Greenstone &rarr; tmp &rarr; exported_xxx</Path>
     1374<Text id="0408a">where xxx will be similar to the name you have entered. If you didn't specify a name for the CD-ROM, then the folder name will be <Path>exported_collections</Path>.</Text>
     1375<Text id="0409">You need to use your own computer's software to write these on to CD-ROM. On <i>Windows XP</i> this ability is built into the operating system: assuming you have a CD-ROM or DVD writer, insert a blank disk into the drive and drag the <i>contents</i> of <Path>exported_xxx</Path> into the folder that represents the disk.</Text>
    10501376<Comment>
    10511377<Text id="0410">The result will be a self-installing Windows Greenstone CD-ROM or DVD, which starts the installation process as soon as it is placed in the drive.</Text>
     
    10651391</Comment>
    10661392<NumberedItem>
    1067 <Text id="0413">Start a new collection called <b>webtudor</b>, and base it on the <b>tudor</b> collection.</Text>
     1393<Text id="0413">Start a new collection called <b>webtudor</b>, and base it on <AutoText key="glidict::NewCollectionPrompt.NewCollection"/>.</Text>
    10681394</NumberedItem>
    10691395<NumberedItem>
    10701396<Text id="0414">In a web browser, visit <Link>http://englishhistory.net</Link>, follow the link to <i>Tudor England</i>, and click &lt;<b>Enter</b>&gt;. You should be at the URL</Text>
    10711397<Link>http://englishhistory.net/tudor/contents.html</Link>
    1072 <Text id="0415">This is where we started the downloading process to obtain the files you have been using for the <b>tudor</b> collection.</Text>
    1073 </NumberedItem>
    1074 <NumberedItem>
    1075 <Text id="0416">You could do the same thing by copying this URL from the web browser, pasting it into the <AutoText key="glidict::GUI.Download"/> panel, and clicking the <AutoText key="glidict::Mirroring.Download" type="button"/> button. However, several megabytes will be downloaded, which might strain your network resources&mdash;or your patience! For a faster exercise we focus on a smaller section of the site. In the <AutoText key="glidict::GUI.Download"/> panel, enter this URL</Text>
     1398<Text id="0415">This is where we started the downloading process to obtain the files you have been using for the <b>tudor</b> collection. You could do the same thing by copying this URL from the web browser, pasting it into the <AutoText key="glidict::GUI.Download"/> panel, and clicking the <AutoText key="glidict::Mirroring.Download" type="button"/> button. However, several megabytes will be downloaded, which might strain your network resources&mdash;or your patience! For a faster exercise we focus on a smaller section of the site.</Text>
     1399</NumberedItem>
     1400<NumberedItem>
     1401<Text id="0415a">In the <AutoText key="glidict::GUI.Download"/> panel, enter this URL</Text>
    10761402<Link>http://englishhistory.net/tudor/citizens/</Link>
    1077 <Text id="0417">into the <AutoText key="glidict::Mirroring.Source_URL"/> box. There are several options that govern how the download process proceeds. To copy the <i>citizens</i> section of the website, select <AutoText key="glidict::Mirroring.Higher_Directories"/>. If you don't do this (or if you miss out the terminating "/"), the downloading process will follow links to other areas of the <i>englishhistory.net</i> website and grab those as well. Set <AutoText key="glidict::Mirroring.Download_Depth"/> to <AutoText key="glidict::Mirroring.Download_Depth.Unlimited"/>&mdash;we want to follow as many links as necessary to download all the pages.</Text>
    1078 </NumberedItem>
    1079 <NumberedItem>
    1080 <Text id="0418">Now click <AutoText key="glidict::Mirroring.Download" type="button"/>. A progress bar appears in the lower half of the panel that reports on how the downloading process is doing.</Text>
     1403<Text id="0417">into the <AutoText key="glidict::Mirroring.Source_URL"/> box. There are several options that govern how the download process proceeds. To copy just the <i>citizens</i> section of the website, select <AutoText key="glidict::Mirroring.Higher_Directories"/>. If you don't do this (or if you miss out the terminating "/"), the downloading process will follow links to other areas of the <i>englishhistory.net</i> website and grab those as well. Set <AutoText key="glidict::Mirroring.Download_Depth"/> to <AutoText key="glidict::Mirroring.Download_Depth.Unlimited"/>&mdash;we want to follow as many links as necessary to download all the pages.</Text>
     1404</NumberedItem>
     1405<NumberedItem>
     1406<Text id="0417a">If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Open the <AutoText key="glidict::Preferences.Connection"/> tab in <Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Options"/></Menu> and switch on the <AutoText key="glidict::Preferences.Connection.Use_Proxy"/> checkbox. Enter the proxy server address and port number in the <AutoText key="glidict::Preferences.Connection.Proxy_Host"/> and <AutoText key="glidict::Preferences.Connection.Proxy_Port"/> boxes. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1407</NumberedItem>
     1408<NumberedItem>
     1409<Text id="0418">Now click <AutoText key="glidict::Mirroring.Download" type="button"/>. If you have set proxy information in <AutoText key="glidict::Menu.File_Options"/>, a popup will ask for your user name and password. Once the download has started, a progress bar appears in the lower half of the panel that reports on how the downloading process is doing.</Text>
    10811410<Comment>
    10821411<Text id="0419">More detailed information can be obtained by clicking <AutoText key="glidict::Mirroring.DownloadJob.Log" type="button"/>. The process can be paused and restarted as needed, or stopped altogether by clicking <AutoText key="glidict::Mirroring.DownloadJob.Close" type="button"/>. Downloading can be a lengthy process involving multiple sites, and so Greenstone allows additional downloads to be queued up. When new URLs are pasted into the <AutoText key="glidict::Mirroring.Source_URL"/> box and <AutoText key="glidict::Mirroring.Download" type="button"/> clicked, a new progress bar is appended to those already present in the lower half of the panel. When the currently active download item completes, the next is started automatically.</Text>
     
    11031432</NumberedItem>
    11041433<NumberedItem>
    1105 <Text id="0425">In the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Plugins"/> section, then select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Locate the <AutoText text="file_is_url"/> option (about halfway down the first block of items) and switch it on. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1434<Text id="0425">In the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Plugins"/> section, then select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Locate the <AutoText text="file_is_url"/> option (about halfway down the first block of items) and switch it on. While you are there, switch off the <AutoText text="smart_block"/> option so that stray images are not processed. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    11061435<Text id="0426">Setting this option in <AutoText text="HTMLPlug"/> means that Greenstone sets an additional piece of metadata for each document, called <AutoText text="URL"/>, which gives its original URL.</Text>
    11071436<Text id="0427">It is important that the files gathered in the collection start with the web domain name (<i>englishhistory.net</i> in this case). The conversion process will not work if you dragged over a subfolder, for example the <Path>tudor</Path> folder, because this will set <AutoText text="URL"/> metadata to something like</Text>
     
    11231452</NumberedItem>
    11241453<NumberedItem>
    1125 <Text id="0433">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> and <b>preview</b> the collection. The collection behaves exactly as before, except that when you click a document icon your web browser retrieves the original document from the web (assuming it is still there by the time you do this exercise!). If you are working offline you will be unable to retrieve the document.</Text>
     1454<Text id="0433">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> and <b>preview</b> the collection. Note that the document icons have changed. The collection behaves exactly as before, except that when you click a document icon your web browser retrieves the original document from the web (assuming it is still there by the time you do this exercise!). If you are working offline you will be unable to retrieve the document.</Text>
    11261455</NumberedItem>
    11271456</Content>
     
    11291458<Tutorial id="enhanced_html_collection">
    11301459<Title>
    1131 <Text id="0434">Enhanced collection of HTML files</Text>
     1460<Text id="0434">Enhanced collection of HTML files&mdash;Tudor</Text>
    11321461</Title>
    11331462<Prerequisite id="large_html_collection"/>
     
    11381467</Comment>
    11391468<Heading>
    1140 <Text id="0437">Adding hierarchically-structured metadata and a Hierarchy classifier</Text>
    1141 </Heading>
    1142 <NumberedItem>
    1143 <Text id="0438">Open up your <b>tudor</b> collection (the original version, not the <b>webtudor</b> version), switch to the <AutoText key="glidict::GUI.Enrich"/> panel and select the <Path>monarchs</Path> folder (a subfolder of <Path>tudor</Path>). Set its <b>dc.Subject and Keywords</b> metadata to <b>Tudor period|Monarchs</b>. (For brevity, we refer to this metadata element in future simply as <b>dc.Subject</b>.) The vertical bar ("|") is a hierarchy marker. Selecting a <i>folder</i> and adding metadata has the effect of setting this metadata value for all files contained in this folder, its subfolders, and so on. A popup alerts you to this fact.</Text>
    1144 </NumberedItem>
    1145 <NumberedItem>
    1146 <Text id="0439">Repeat for the <Path>relative</Path> and <Path>citizens</Path> folders, setting their <AutoText key="metadata::dc.Subject"/> metadata to <b>Tudor period|Relatives</b> and <b>Tudor period|Citizens</b> respectively. Note that the hierarchy appears in the <AutoText key="glidict::EnrichPane.ExistingValues" args="dc.Subject and Keywords"/> area.</Text>
    1147 </NumberedItem>
    1148 <NumberedItem>
    1149 <Text id="0440">Finally, select all remaining files&mdash;the ones that are not in the <Path>monarchs</Path>, <Path>relative</Path>, and <Path>citizens</Path> folders&mdash;by selecting the first and shift-clicking the last. Set their <AutoText key="metadata::dc.Subject"/> metadata to <b>Tudor period|Others</b>: this is done in a single operation (there is a short delay before it completes).</Text>
     1469<Text id="0437">Adding hierarchically-structured metadata and a <AutoText text="Hierarchy"/> classifier</Text>
     1470</Heading>
     1471<NumberedItem>
     1472<Text id="0438">Open up your <b>tudor</b> collection (the original version, not the <b>webtudor</b> version), switch to the <AutoText key="glidict::GUI.Enrich"/> panel and select the <Path>citizens</Path> folder (a subfolder of <Path>englishhistory.net &rarr; tudor</Path>). Set its <AutoText key="metadata::dc.Subject"/> metadata to <b>Tudor period|Citizens</b>. The vertical bar ("|") is a hierarchy marker. Selecting a <i>folder</i> and adding metadata has the effect of setting this metadata value for all files contained in this folder, its subfolders, and so on. A popup alerts you to this fact. Click <AutoText key="glidict::General.OK" type="button"/> to close the popup.</Text>
     1473</NumberedItem>
     1474<NumberedItem>
     1475<Text id="0439">Repeat for the <Path>monarchs</Path> and <Path>relative</Path> folders, setting their <AutoText key="metadata::dc.Subject"/> metadata to <b>Tudor period|Monarchs</b> and <b>Tudor period|Relatives</b> respectively. Note that the hierarchy appears in the <AutoText key="glidict::EnrichPane.ExistingValues" args="dc.Subject and Keywords"/> area.</Text>
     1476<Text id="0439a">If you don't want to see the popup each time you add folder-level metadata, tick the <AutoText key="glidict::WarningDialog.Dont_Show_Again"/> checkbox; it won't be displayed again.</Text>
     1477</NumberedItem>
     1478<NumberedItem>
     1479<Text id="0440">Finally, select all remaining files&mdash;the ones that are not in the <Path>citizens</Path>, <Path>monarchs</Path>, or <Path>relative</Path> folders&mdash;by selecting the first and shift-clicking the last. Set their <AutoText key="metadata::dc.Subject"/> metadata to <b>Tudor period|Others</b>: this is done in a single operation (there is a short delay before it completes).</Text>
     1480<Text id="0440a">When multiple files are selected in the left-hand collection tree, all metadata values for all files are shown on the right-hand side. Items that are common to all files are displayed in black&mdash;e.g. <AutoText key="metadata::dc.Subject"/>&mdash;while others that pertain to only one or some of the files are displayed in grey&mdash;e.g. any extracted metadata.</Text>
     1481<Text id="0440a">Metadata inherited from a parent folder is indicated by a folder icon to the left of the metadata name. Select one of the files in the <Path>relative</Path> folder to see this.</Text>
    11501482</NumberedItem>
    11511483<NumberedItem>
     
    11531485</NumberedItem>
    11541486<NumberedItem>
    1155 <Text id="0442">A window pops up to control the classifier's options. Change the <b>metadata</b> to <AutoText key="metadata::dc.Subject"/> and then click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1487<Text id="0442">A window pops up to control the classifier's options. Change the <AutoText text="metadata"/> to <AutoText key="metadata::dc.Subject"/> and then click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    11561488</NumberedItem>
    11571489<NumberedItem>
     
    11611493<Text id="0444">Now switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Choose the new <AutoText key="coredm::_Global:labelSubject_"/> link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.</Text>
    11621494</NumberedItem>
    1163 <Comment>
    1164 <Text id="0445">Next we partition the full-text index into four separate pieces. To do this we first define four subcollections obtained by "filtering" the documents according to a criterion based on their <AutoText key="metadata::dc.Subject"/> metadata. Then an index is assigned to each subcollection.</Text>
    1165 </Comment>
     1495<Heading>
     1496<Text id="0457">Adding a hierarchical phrase browser (PHIND)</Text>
     1497</Heading>
     1498<Comment>
     1499<Text id="0457a">Next we'll add an interactive hierarchical phrase browsing classifier to this collection.</Text>
     1500</Comment>
     1501<NumberedItem>
     1502<Text id="0458">Switch to the <AutoText key="glidict::GUI.Design"/> panel and choose the <AutoText key="glidict::CDM.GUI.Classifiers"/> item from the left-hand list.</Text>
     1503</NumberedItem>
     1504<NumberedItem>
     1505<Text id="0459">Choose <AutoText text="Phind"/> from the <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> menu. Click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>. A window pops up asking for configuration options: leave the values at their preset defaults (this will base the phrase index on the full text) and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1506</NumberedItem>
     1507<NumberedItem>
     1508<Text id="0460"><b>Build</b> the collection again, <b>preview</b> it, and try out the new <AutoText key="coredm::_Global:labelphrases_"/> option in the navigation bar. An interesting PHIND search term for this collection is <AutoText text="king" type="quoted"/>.</Text>
     1509</NumberedItem>
    11661510<Heading>
    11671511<Text id="0446">Partitioning the full-text index based on metadata values</Text>
    11681512</Heading>
     1513<Comment>
     1514<Text id="0445">Next we partition the full-text index into four separate pieces. To do this we first define four subcollections obtained by "filtering" the documents according to a criterion based on their <AutoText key="metadata::dc.Subject"/> metadata. Then an index is assigned to each subcollection. This will enable users to restrict a search to a subset of the documents.</Text>
     1515</Comment>
     1516
    11691517<NumberedItem>
    11701518<Text id="0447">Switch to the <AutoText key="glidict::GUI.Design"/> panel, and click <AutoText key="glidict::CDM.GUI.Subcollections"/>. This feature is disabled because you are operating in <AutoText key="glidict::Preferences.Mode.Librarian"/> mode (this is indicated in the title bar at the top of the window).</Text>
    11711519</NumberedItem>
    11721520<NumberedItem>
    1173 <Text id="0448">Switch to <AutoText key="glidict::Preferences.Mode.Systems"/> mode by going to <AutoText key="glidict::Menu.File_Options"/> (on the <AutoText key="glidict::Menu.File"/> menu) and clicking <AutoText key="glidict::Preferences.Mode" type="button"/>. Read about the other modes too. Note that the mode appears in the title bar.</Text>
    1174 </NumberedItem>
    1175 <NumberedItem>
    1176 <Text id="0449">Return to the <AutoText key="glidict::CDM.GUI.Subcollections"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Ensure that the <AutoText key="glidict::CDM.SubcollectionManager.Subcollection_Controls"/> tab is selected (the default). Define a subcollection filter with name <b>monarchs</b> that matches against <b>dc.Subject and Keywords,</b> and type <b>Monarchs</b> as the regular expression to match with. Click <AutoText key="glidict::CDM.SubcollectionManager.Add" type="button"/>. This filter includes any file whose <AutoText key="metadata::dc.Subject"/> metadata contains the word <i>Monarchs</i>.</Text>
     1521<Text id="0448">Switch to <AutoText key="glidict::Preferences.Mode.Systems"/> mode by going to <AutoText key="glidict::Menu.File_Options"/> (on the <AutoText key="glidict::Menu.File"/> menu) and clicking <AutoText key="glidict::Preferences.Mode" type="button"/>. Read about the other modes too.</Text>
     1522</NumberedItem>
     1523<NumberedItem>
     1524<Text id="0449">Return to the <AutoText key="glidict::CDM.GUI.Subcollections"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Ensure that the <AutoText key="glidict::CDM.SubcollectionManager.Subcollection_Controls"/> tab is selected (the default). Define a subcollection filter with name <b>monarchs</b> that matches against <AutoText key="metadata::dc.Subject"/>, and type <b>Monarchs</b> as the regular expression to match with. Click <AutoText key="glidict::CDM.SubcollectionManager.Add" type="button"/>. This filter includes any file whose <AutoText key="metadata::dc.Subject"/> metadata contains the word <i>Monarchs</i>.</Text>
    11771525</NumberedItem>
    11781526<NumberedItem>
     
    11801528</NumberedItem>
    11811529<NumberedItem>
    1182 <Text id="0451">Having defined the subcollections, we partition the index into corresponding parts. Click the <AutoText key="glidict::CDM.SubcollectionManager.Subindex_Controls"/> tab. Select the first subcollection and give it the name <b>citizens</b>; click <AutoText key="glidict::CDM.SubcollectionIndexManager.Add_Subindex" type="button"/>. Repeat for the other three subcollections, naming their partitions <b>monarchs</b>, <b>others</b> and <b>relatives</b>. <b>Build</b> and <b>preview</b> the collection.</Text>
     1530<Text id="0451">Having defined the subcollections, we partition the index into corresponding parts. Click the <AutoText key="glidict::CDM.SubcollectionManager.Subindex_Controls"/> tab. Select the first subcollection and give it the name <b>citizens</b>; click <AutoText key="glidict::CDM.SubcollectionIndexManager.Add_Subindex" type="button"/>. Repeat for the other three subcollections, naming their partitions <b>monarchs</b>, <b>others</b> and <b>relatives</b>.</Text>
     1531<Text id="0451a">The order in which they appear in the <AutoText key="glidict::CDM.SubcollectionIndexManager.Subindexes"/> list is the order in which they will appear in the drop-down menu on the search page. You can change the order by using the <AutoText key="glidict::CDM.Move.Move_Up" type="button"/> and <AutoText key="glidict::CDM.Move.Move_Down" type="button"/> buttons.</Text>
     1532</NumberedItem>
     1533<NumberedItem>
     1534<Text id="0451b"><b>Build</b> and <b>preview</b> the collection.</Text>
    11831535</NumberedItem>
    11841536<NumberedItem>
     
    11861538</NumberedItem>
    11871539<NumberedItem>
    1188 <Text id="0453">To allow users to search the collection as a whole as well as each subcollection individually, return to the <AutoText key="glidict::CDM.GUI.Subcollections"/> section of the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.SubcollectionManager.Subindex_Controls"/> tab.<b></b> Type <b>all</b> into the <AutoText key="glidict::CDM.SubcollectionIndexManager.PartitionName"/> and select all four subcollections by checking their boxes.</Text>
     1540<Text id="0453">To allow users to search the collection as a whole as well as each subcollection individually, return to the <AutoText key="glidict::CDM.GUI.Subcollections"/> section of the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.SubcollectionManager.Subindex_Controls"/> tab. Type <b>all</b> into the <AutoText key="glidict::CDM.SubcollectionIndexManager.PartitionName"/> and select all four subcollections by checking their boxes. Click <AutoText key="glidict::CDM.SubcollectionIndexManager.Add_Subindex" type="button"/>.</Text>
    11891541</NumberedItem>
    11901542<NumberedItem>
     
    11981550</NumberedItem>
    11991551<Heading>
    1200 <Text id="0457">Adding a hierarchical phrase index (PHIND)</Text>
    1201 </Heading>
    1202 <NumberedItem>
    1203 <Text id="0458">Switch to the <AutoText key="glidict::GUI.Design"/> panel and choose the <AutoText key="glidict::CDM.GUI.Classifiers"/> item from the left-hand list.</Text>
    1204 </NumberedItem>
    1205 <NumberedItem>
    1206 <Text id="0459">Choose <AutoText text="Phind"/> from the <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> menu. Click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>. A window pops asking for configuration options: leave the values at their preset defaults (this will base the phrase index on the full text) and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    1207 </NumberedItem>
    1208 <NumberedItem>
    1209 <Text id="0460"><b>Build</b> the collection again, <b>preview</b> it, and try out the new <AutoText key="coredm::_Global:labelphrases_"/> option in the navigation bar. An interesting PHIND search term for this collection is <AutoText text="king" type="quoted"/>.</Text>
    1210 </NumberedItem>
     1552<Text id="0462">Controlling the building process</Text>
     1553</Heading>
    12111554<Comment>
    12121555<Text id="0461">Finally we look at how the building process can be controlled. Developing a new collection usually involves numerous cycles of building, previewing, adjusting some enrich and design features, and so on. While prototyping, it is best to temporarily reduce the number of documents in the collection. This can be accomplished through the <AutoText text="maxdocs"/> parameter to the building process.</Text>
    12131556</Comment>
    1214 <Heading>
    1215 <Text id="0462">Controlling the building process</Text>
    1216 </Heading>
    12171557<NumberedItem>
    12181558<Text id="0463">Switch to the <AutoText key="glidict::GUI.Create"/> panel and view the options that are displayed in the top portion of the screen. Select <AutoText text="maxdocs"/> and set its numeric counter to <AutoText text="3"/>. Now <b>build</b>.</Text>
     
    12261566</Content>
    12271567</Tutorial>
    1228 <Tutorial id="format_and_macros">
     1568<Tutorial id="formatting_tudor">
    12291569<Title>
    1230 <Text id="0465">Learning about formats and macros</Text>
     1570<Text id="0465">Formatting the HTML collection&mdash;Tudor</Text>
    12311571</Title>
    12321572<Prerequisite id="large_html_collection"/>
    12331573<Version initial="2.60" current="2.70"/>
    12341574<Content>
    1235 <Comment>
    1236 <Text id="0466">Format statements and macro files allow you to customize the appearance of Greenstone collections. They are very powerful, but complex and hard to learn. This tutorial exercise gives an introduction to the facilities they provide.</Text>
    1237 </Comment>
    1238 <Heading>
    1239 <Text id="0467">Experimenting with format statements</Text>
    1240 </Heading>
    12411575<NumberedItem>
    12421576<Text id="0468">Open up your <b>tudor</b> collection, go to the <AutoText key="glidict::GUI.Design"/> panel (by clicking on its tab) and select <AutoText key="glidict::CDM.GUI.Formats"/> from the left-hand list. Leave the editing controls at their default value, so that <AutoText key="glidict::CDM.FormatManager.Feature"/> remains blank and <AutoText text="VList"/> is selected as the <AutoText key="glidict::CDM.FormatManager.Part"/>. The text in the <AutoText key="glidict::CDM.FormatManager.Editor"/> box reads as follows:</Text>
     
    13271661<Text id="0498">Go to the <AutoText key="glidict::GUI.Create"/> panel, click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>, and examine the subject hierarchy again to see the effect of your changes.</Text>
    13281662</NumberedItem>
    1329 <Heading>
    1330 <Text id="0499">Collection-specific macros</Text>
    1331 </Heading>
    1332 <Comment>
    1333 <Text id="0500">The appearance of all pages produced by Greenstone is governed by macro files, which reside in the folder <Path>Greenstone &rarr; macros</Path>. The garish example collection is a version of the demo collection with bizarre layout and coloring. Now we apply the same bizarre layout and coloring to the tudor collection.</Text>
    1334 </Comment>
    1335 <NumberedItem>
    1336 <Text id="0505">Go to the folder <Path>Greenstone &rarr; collect &rarr; garish &rarr; macros</Path>. Copy the file <Path>extra.dm</Path>. Now go to your collection folder <Path>Greenstone &rarr; collect &rarr; tudor</Path> and create a new folder in there called <Path>macros</Path>. Paste <Path>extra.dm</Path> into that new folder. The overall effect is that you have created a new file <Path>Greenstone &rarr; collect &rarr; tudor &rarr; macros &rarr; extra.dm</Path>.</Text>
    1337 </NumberedItem>
    1338 <NumberedItem>
    1339 <Text id="0505a">This macro file uses a CSS style file and some images which you will also need to copy from the garish collection. Go to the folder <Path>Greenstone &rarr; collect &rarr; garish &rarr; images</Path>. Select the three files <Path>style.css</Path>, <Path>horzline.gif</Path> and <Path>bg_blue.gif</Path>. <b>Copy</b> these files and paste them into the <Path>Greenstone &rarr; collect &rarr; tudor &rarr; images</Path> folder.</Text>
    1340 </NumberedItem>
    1341 <NumberedItem>
    1342 <Text id="0507">Go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>. The content of your collection remains the same, but its appearance has changed completely&mdash;for example, all the pages are pink! To learn about how to control these changes, go to the documented example collection called <i>Garish version of demo collection</i>, and read about it.</Text>
    1343 </NumberedItem>
    1344 <Heading>
    1345 <Text id="0512">General macros</Text>
    1346 </Heading>
    1347 <Comment>
    1348 <Text id="0513">You can also use macros to completely change the appearance of your Greenstone site. Like the above exercise, what follows is just a lead-in to illustrate what is possible and show you where to look to achieve different kinds of effects.</Text>
    1349 </Comment>
    1350 <NumberedItem>
    1351 <Text id="0514">Exit from the Librarian Interface, since it is concerned with individual collections and we are now dealing with the site as a whole.</Text>
    1352 </NumberedItem>
    1353 <NumberedItem>
    1354 <Text id="0515">Go to the folder <Path>Greenstone &rarr; etc</Path> and edit the file called <Path>main.cfg</Path>. This is Greenstone's main configuration file, and contains a list of the macros that will be loaded in on startup. One of them, <Path>home.dm</Path>, dictates how the Greenstone home page will look, which is specified in the file <Path>Greenstone &rarr; macros &rarr; home.dm</Path>. This <Path>macros</Path> folder contains an alternative version, called <Path>yourhome.dm</Path>, which is not currently being used. To use it instead, in <Path>main.cfg</Path> change the string <AutoText text="home.dm" type="quoted"/> to <AutoText text="yourhome.dm" type="quoted"/>.</Text>
    1355 </NumberedItem>
    1356 <NumberedItem>
    1357 <Text id="0516">Now restart Greenstone (just the Greenstone Digital Library will do, rather than the Greenstone Librarian Interface). You will find that the appearance of the home page has changed completely.</Text>
    1358 </NumberedItem>
    1359 <NumberedItem>
    1360 <Text id="0517">Instead of substituting <AutoText text="yourhome.dm" type="quoted"/> for <AutoText text="home.dm" type="quoted"/> in the file <Path>main.cfg</Path>, you could have simply edited <Path>home.dm</Path> and left <Path>main.cfg</Path> as it is. However, we wanted to preserve <Path>home.dm</Path> so that you could revert to your original Greenstone home page! Do this now by editing <Path>main.cfg</Path> and changing the string <AutoText text="yourhome.dm" type="quoted"/> back to <AutoText text="home.dm" type="quoted"/>. You will need to re-start Greenstone for this to take effect.</Text>
    1361 </NumberedItem>
    1362 <Comment>
    1363 <Text id="0518">To learn more about macros, read <i>Customizing the Greenstone User Interface</i>, an illustrated guide to customizing the user interface, by Allison Zhang of the Washington Research Library Consortium, available at <Link>http://www.wrlc.org/dcpc/UserInterface/interface.htm</Link>.</Text>
    1364 </Comment>
     1663</Content>
     1664</Tutorial>
     1665<Tutorial id="section_tagging">
     1666<Title>
     1667<Text id="st-1">Section tagging for HTML documents</Text>
     1668</Title>
     1669<Content>
     1670<NumberedItem>
     1671<Text id="st-1a">In a browser, take a look at the Greenstone demo collection. Browse to one of the documents. This collection is based on HTML files, but they appear structured in the collection. This is because these HTML files were tagged by hand into sections.</Text>
     1672</NumberedItem>
     1673<NumberedItem>
     1674<Text id="st-2">Using a text editor (e.g. WordPad) open up one of the HTML files from the demo collection: <Path>Greenstone &rarr; collect &rarr; demo &rarr; import &rarr; fb33fe &rarr; fb33fe.htm</Path>. You will see some HTML comments which contain section information for Greenstone. They look like:</Text>
     1675<Format>
     1676&lt;!--<br/>
     1677&lt;Section&gt;<br/>
     1678&nbsp;&nbsp;&lt;Description&gt;<br/>
     1679&nbsp;&nbsp;&nbsp;&nbsp;&lt;Metadata name="Title"&gt;Farming snails 1: Learning about snails;<br/>
     1680&nbsp;&nbsp;&nbsp;&nbsp;Building a pen; Food and shelter plants&lt;/Metadata&gt;<br/>
     1681&nbsp;&nbsp;&lt;/Description&gt;<br/>
     1682--&gt;<br/>
     1683<br/>
     1684&lt;!--<br/>
     1685&lt;/Section&gt;<br/>
     1686&lt;Section&gt;<br/>
     1687&nbsp;&nbsp;&lt;Description&gt;<br/>
     1688&nbsp;&nbsp;&nbsp;&nbsp;&lt;Metadata name="Title"&gt;Dew and rain&lt;/Metadata&gt;<br/>
     1689&nbsp;&nbsp;&lt;/Description&gt;<br/>
     1690--&gt;
     1691</Format>
     1692<Text id="st-3">When Greenstone encounters a <Format>&lt;Section&gt;</Format> tag in one of these comments, it will start a new subsection of the document. This will be closed when a <Format>&lt;/Section&gt;</Format> tag is encountered. Metadata can also be added for each section&mdash;in this case, <AutoText text="Title"/> metadata has been added for each section. In the browser, find the <AutoText text="Farming snails 1"/> document in the demo collection (through the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> browser). Look at its table of contents and compare it to the <Format>&lt;Section&gt;</Format> tags in the HTML document.</Text>
     1693</NumberedItem>
     1694<NumberedItem>
     1695<Text id="st-4">Add a new Section into this document. For example, add a new subsection into the <AutoText text="Introduction"/> chapter. In the text editor, add the following just after the Section tag for the <AutoText text="Introduction"/> section:</Text>
     1696<Format>
     1697&lt;!--<br/>
     1698&lt;Section&gt;<br/>
     1699&nbsp;&nbsp;&lt;Description&gt;<br/>
     1700&nbsp;&nbsp;&nbsp;&nbsp;&lt;Metadata name="Title"&gt;Snails are good to eat.&lt;/Metadata&gt;<br/>
     1701&nbsp;&nbsp;&lt;/Description&gt;<br/>
     1702--&gt;
     1703</Format>
     1704<Text id="st-5">Then just before the next section tag (<AutoText text="What do you need to start?"/>), add the following:</Text>
     1705<Format>
     1706&lt;!--<br/>
     1707&lt;/Section&gt;<br/>
     1708--&gt;
     1709</Format>
     1710<Text id="st-6">The effect of these changes is to make a new subsection inside the <AutoText text="Introduction"/> chapter.</Text>
     1711</NumberedItem>
     1712<NumberedItem>
     1713<Text id="st-7">Open the Greenstone demo collection in the Librarian Interface. In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, note that <AutoText text="HTMLPlug"/> has the <AutoText text="description_tags"/> option set. This option is needed when Section tags are used in the source documents.</Text>
     1714<Text id="st-8">The <AutoText text="metadata_fields"/> option is not valid when <AutoText text="description_tags"/> is set&mdash;all metadata is expected to be in the Section tags if they are being used.</Text>
     1715</NumberedItem>
     1716<NumberedItem>
     1717<Text id="st-9"><b>Build</b> and <b>preview</b> the collection. Look at the <AutoText text="Farming snails 1"/> document again and check that your new section has been added.</Text>
     1718</NumberedItem>
    13651719</Content>
    13661720</Tutorial>
     
    13721726<Version initial="2.60" current="2.70"/>
    13731727<Content>
    1374 <NumberedItem>
    1375 <Text id="0521">Start a new collection called <b>Beatles Bibliography</b>. Enter the requested information and make it a <b>New Collection</b>. There is no need to include any metadata sets because the metadata extracted from the MARC records will appear as extracted metadata. Deselect the <b>Dublin Core</b> metadata set, and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     1728<Comment>
     1729<Text id="0520a">This exercise looks at adding fielded searching to a collection. Fielded searching is best used for metadata-rich collections. Here we use bibliographic data in MARC format. We also "explode" the database, enabling editing of the metadata with the Librarian Interface.</Text>
     1730</Comment>
     1731<NumberedItem>
     1732<Text id="0521">Start a new collection called <b>Beatles Bibliography</b>, which will contain MARC records on the Beatles from the US Library of Congress. Enter the requested information and base it on <AutoText key="glidict::NewCollectionPrompt.NewCollection"/>. There is no need to include any metadata sets because the metadata extracted from the MARC records will appear as extracted metadata. Deselect the <b>Dublin Core</b> metadata set, and click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    13761733<Text id="0521a">A <AutoText key="glidict::NoMetadataSetsSelected.Title"/> warning message will pop up, alerting you to the fact that you won't be able to manually assign metadata to the collection. In this collection, all the metadata will come from the MARC file; click <AutoText key="glidict::General.OK" type="button"/> to continue. (If you don't want to see this popup again, tick the <AutoText key="glidict::WarningDialog.Dont_Show_Again"/> checkbox.)</Text>
    13771734</NumberedItem>
    13781735<NumberedItem>
    1379 <Text id="0522">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>marc</Path> folder, drag <Path>locbeatles50.marc</Path> into the right-hand pane and drop it there. A popup window asks whether you want to add <AutoText text="MARCPlug" /> to the collection to process this file. Click <AutoText key="glidict::CDM.PlugInManager.Add" type="button"/>, because this plugin will be needed to process the MARC records.</Text>
    1380 </NumberedItem>
    1381 <NumberedItem>
    1382 <Text id="0523">Remove the plugins <AutoText text="TextPlug" /> to <AutoText text="NULPlug" /> by selecting each one in the <AutoText key="glidict::CDM.PlugInManager.Assigned"/> list and clicking <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/> (<AutoText text="ZIPPlug" />, <AutoText text="GAPlug" /> and <AutoText text="MARCPlug" /> remain). It is not strictly necessary to remove these redundant plugins, but it is good practice to include only plugins that are needed, to avoid accidentally including stray documents.</Text>
     1736<Text id="0522">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>sample_files &rarr; marc</Path> folder, drag <Path>locbeatles50.marc</Path> into the right-hand pane and drop it there. A popup window asks whether you want to add <AutoText text="MARCPlug" /> to the collection to process this file. Click <AutoText key="glidict::CDM.PlugInManager.QuickAdd" type="button"/>, because this plugin will be needed to process the MARC records.</Text>
     1737</NumberedItem>
     1738<NumberedItem>
     1739<Text id="0523">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, remove the plugins <AutoText text="TextPlug" /> to <AutoText text="NULPlug" /> by selecting each one in the <AutoText key="glidict::CDM.PlugInManager.Assigned"/> list and clicking <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/> (<AutoText text="ZIPPlug" />, <AutoText text="GAPlug" /> and <AutoText text="MARCPlug" /> remain). It is not strictly necessary to remove these redundant plugins, but it is good practice to include only plugins that are needed, to avoid unwanted (and unexpected) side effects.</Text>
    13831740</NumberedItem>
    13841741<NumberedItem>
     
    13861743</NumberedItem>
    13871744<NumberedItem>
    1388 <Text id="0525">Switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Browse through the <b>titles a-z</b> and view a record or two. Try searching&mdash;for example, find items that include <AutoText text="George Martin"/>.</Text>
     1745<Text id="0525">Switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Browse through the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> and view a record or two. Try searching&mdash;for example, find items that include <AutoText text="rock music"/>.</Text>
    13891746</NumberedItem>
    13901747<NumberedItem>
     
    13971754<Text id="0528"><b>Build</b> the collection and <b>preview</b> the result.</Text>
    13981755</NumberedItem>
    1399 <NumberedItem>
    1400 <Text id="0529">Make each bookshelf node show how many entries it contains by appending this to the <AutoText key="glidict::CDM.GUI.Formats"/> for the <AutoText text="VList" /> format statement in the <AutoText key="glidict::GUI.Design"/> panel:</Text>
    1401 <Format>{If}{[numleafdocs],&lt;td&gt;&lt;i&gt;([numleafdocs])&lt;/i&gt;&lt;/td&gt;}</Format>
    1402 </NumberedItem>
    1403 <NumberedItem>
    1404 <Text id="0530">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>, switch to the <AutoText key="glidict::GUI.Create"/> panel, and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/> (no need to build the collection again).</Text>
    1405 </NumberedItem>
    14061756<Heading>
    14071757<Text id="0531">Adding fielded searching</Text>
     
    14141764</NumberedItem>
    14151765<NumberedItem>
    1416 <Text id="0533"><b>Build</b> the collection once again, and <b>preview</b> the results. Notice that the collection's home page no longer includes a query box. (This is because the search form is too big to fit here nicely.) To search, you have to click <AutoText key="coredm::_Global:labelSearch_"/> in the navigation bar. Note that the <AutoText key="coredm::_Global:linktextPREFERENCES_"/> page has changed to control the advanced searching options.</Text>
    1417 </NumberedItem>
    1418 <Comment>
    1419 <Text id="0534">To finish off the collection, brand it with an image that will be used to represent the collection on the Greenstone page, and appear at the top of each page of the collection</Text>
    1420 </Comment>
    1421 <Heading>
    1422 <Text id="0535">Branding a collection with an image</Text>
    1423 </Heading>
    1424 <NumberedItem>
    1425 <Text id="0536">From the <AutoText key="glidict::CDM.GUI.General"/> section of the <AutoText key="glidict::GUI.Design"/> panel, click the <AutoText key="glidict::General.Browse" type="button"/> button next to the label <AutoText key="glidict::CDM.General.Icon_Collection"/> and use the resulting popup file browser to access the folder <Path>sample_files &rarr; marc</Path>. Select <Path>beatles_logo.jpg</Path> and click &lt;<b>Open</b>&gt;.</Text>
    1426 <Comment>
    1427 <Text id="0537">Greenstone copies the image into your collection area, so the collection will still work when the CD-ROM is removed from the drive.</Text>
    1428 </Comment>
    1429 </NumberedItem>
    1430 <NumberedItem>
    1431 <Text id="0538">Repeat this process for the <AutoText key="glidict::CDM.General.Icon_Collection_Small"/>, selecting the same image.</Text>
    1432 </NumberedItem>
    1433 <NumberedItem>
    1434 <Text id="0539">Now <b>build</b> the collection and <b>preview</b> it.</Text>
     1766<Text id="0533"><b>Build</b> the collection once again, and <b>preview</b> the results. Notice that the collection's home page no longer includes a query box. (This is because the search form is too big to fit here nicely.) To search, you have to click <AutoText key="coredm::_Global:labelSearch_"/> in the navigation bar. Note that the <AutoText key="coredm::_Global:linktextPREFERENCES_" type="italics"/> page has changed to control the advanced searching options.</Text>
     1767</NumberedItem>
     1768<NumberedItem>
     1769<Text id="0533a">Look at the search form in the collection. There are three fields that can be searched: <i>text</i>, <i>Title</i> and <i>Source</i>. Add some more fields to search on by going back to the Librarian Interface.</Text>
     1770</NumberedItem>
     1771<NumberedItem>
     1772<Text id="0533b">In the <AutoText key="glidict::GUI.Design"/> panel, go to the <AutoText key="glidict::CDM.GUI.Indexes"/> section. Remove the <i>Source</i> index by selecting it in the <AutoText key="glidict::CDM.IndexManager.Indexes"/> list and clicking <AutoText key="glidict::CDM.IndexManager.Remove_Index" type="button"/>.</Text>
     1773</NumberedItem>
     1774<NumberedItem>
     1775<Text id="0533c">Add an index on <b>subjects</b> by selecting <AutoText key="metadata::ex.Subject"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> list (and deselecting anything already selected), and giving it a name in the <AutoText key="glidict::CDM.IndexManager.Index_Name"/> box, e.g. "Subject". Click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>. Add indexes on any other fields that look interesting.</Text>
     1776</NumberedItem>
     1777<NumberedItem>
     1778<Text id="0533d"><b>Rebuild</b> the collection and <b>preview</b> the results. Notice the extra fields in the <AutoText key="coredm::_query:textinfield_"/> drop-down menus in the search form. You can do quite complicated queries by searching for words in different fields at the same time.</Text>
     1779</NumberedItem>
     1780<Heading>
     1781<Text id="0533-1">Exploding the database</Text>
     1782</Heading>
     1783<NumberedItem>
     1784<Text id="0533-3">Go to the <AutoText key="glidict::GUI.Enrich"/> panel and try to see the metadata. It doesn't appear! This is because the metadata is associated with records inside the file, not the file itself.</Text>
     1785<Text id="0533-4">Metadata file types such as MARC, CDS/ISIS and BibTeX can be imported into Greenstone, but their metadata cannot be viewed in the Librarian Interface. To edit any metadata you need to go back to the program that created the file.</Text>
     1786<Text id="0533-5">Greenstone provides a new way of <i>exploding</i> a metadata database so that each record appears as an individual document, with viewable and editable metadata. This process is irreversible: once this step has been done, the database is deleted and can no longer be used in its original program.</Text>
     1787</NumberedItem>
     1788<NumberedItem>
     1789<Text id="0533-6">In the <AutoText key="glidict::GUI.Gather"/> panel, you may notice that the MARC database has a different coloured icon to other files. This green icon indicates that a file is a metadata database that can be exploded. Right-click on the file and choose <AutoText key="glidict::ExplodeMetadataPrompt.Title"/> from the menu. A new window opens, containing options for the exploding process. A description of each option can be obtained by hovering the mouse over the option.</Text>
     1790<Text id="0533-7">Turn on the <AutoText text="metadata_set"/> option by checking its box. This option indicates which metadata set to explode the metadata into. The default set is the "Exploded Metadata Set"&mdash;a metadata set which initially has no elements in it, but will receive a new element for each metadata field retrieved from the database.</Text>
     1791</NumberedItem>
     1792<NumberedItem>
     1793<Text id="0533-8">Click <AutoText key="glidict::ExplodeMetadataPrompt.Explode" type="button"/> to start the exploding process. This may take a short while, depending on the size of the database.</Text>
     1794</NumberedItem>
     1795<NumberedItem>
     1796<Text id="0533-9">Once exploding has finished, the MARC database file will have been deleted, and a folder created in its place. This folder contains an empty file for each record in the original database. The metadata for these records can be viewed and edited by switching to the <AutoText key="glidict::GUI.Enrich"/> panel.</Text>
     1797</NumberedItem>
     1798<NumberedItem>
     1799<Text id="0533-10">Because the MARC file is no longer present, and the collection contains empty (.nul) files, we need to change the list of plugins. In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, remove <AutoText text="MARCPlug"/> and add <AutoText text="NULPlug"/> (use the default configuration).</Text>
     1800</NumberedItem>
     1801<NumberedItem>
     1802<Text id="0533-11"><b>Rebuild</b> and <b>preview</b> the collection. You will notice that...</Text>
     1803<Text id="0533-12">The collection previously used extracted (ex.) metadata, but now it uses exploded (exp.) metadata. There is also no longer any text in the documents. Previously, MARCPlug stored the raw record as the "text" of each record. Now that the metadata is in the Librarian Interface, there is no longer any concept of a raw record, and so there is no text. We need to modify the collection design to take account of these changes.</Text>
     1804</NumberedItem>
     1805<NumberedItem>
     1806<Text id="0533-13">In the <AutoText key="glidict::CDM.GUI.Indexes"/> section, change the Title index to use <AutoText key="metadata::exp.Title"/>. Select the Title index in the <AutoText key="glidict::CDM.IndexManager.Indexes"/> list. Deselect <AutoText key="metadata::ex.Title"/> in the <AutoText key="glidict::CDM.IndexManager.Source"/> list, and select <AutoText key="metadata::exp.Title"/>. Click <AutoText key="glidict::CDM.IndexManager.MGPP.Replace_Index" type="button"/>. Do the same thing for the Subject index.</Text>
     1807</NumberedItem>
     1808<NumberedItem>
     1809<Text id="0533-14">The text index is no longer of any use, so remove it by selecting it in the <AutoText key="glidict::CDM.IndexManager.Indexes"/> list and clicking <AutoText key="glidict::CDM.IndexManager.Remove_Index" type="button"/>. To enable combined searching across all indexes at once, tick the <AutoText key="glidict::CDM.IndexManager.Allfields_Index"/> checkbox, enter an appropriate name in the <AutoText key="glidict::CDM.IndexManager.Index_Name"/> field (e.g. "All Fields"), then click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>. Move this to the top of the list using the <AutoText key="glidict::CDM.Move.Move_Up" type="button"/> and <AutoText key="glidict::CDM.Move.Move_Down" type="button"/> buttons, so that it becomes the default field for searching.</Text>
     1810</NumberedItem>
     1811<NumberedItem>
     1812<Text id="0533-15">In the <AutoText key="glidict::CDM.GUI.Classifiers"/> section, change the Title <AutoText text="AZList"/> to use <AutoText key="metadata::exp.Title"/> metadata. Double click the Title <AutoText text="AZList"/> in the <AutoText key="glidict::CDM.ClassifierManager.Assigned"/> list, and change the <AutoText text="metadata"/> option to use <AutoText key="metadata::exp.Title"/>. Click <AutoText key="glidict::General.OK" type="button"/>. Do the same thing for the Subject <AutoText text="AZCompactList"/>.</Text>
     1813</NumberedItem>
     1814<NumberedItem>
     1815<Text id="0533-16">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, select <AutoText text="VList"/> in the <AutoText key="glidict::CDM.FormatManager.Assigned_Formats"/> list. Change the <AutoText key="glidict::CDM.FormatManager.Editor"/>, replacing</Text>
     1816<Format>
     1817{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}
     1818</Format>
     1819<Text id="0533-17">with</Text>
     1820<Format>
     1821{Or}{[exp.Title],Untitled}
     1822</Format>
     1823<Text id="0533-18">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/></Text>
     1824</NumberedItem>
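The effect of the {Or} format statement above can be illustrated with a minimal Python sketch. This is not Greenstone code: it assumes that {Or} yields the first choice that has a value, that a choice written in square brackets is a metadata lookup, and that anything else is literal text. The function name and the sample record are made up for illustration.

```python
def first_available(metadata, choices):
    """Return the first choice that yields a value (assumed {Or} behaviour).

    A choice written as [name] is looked up in metadata; any other
    choice is treated as literal text and returned as-is.
    """
    for choice in choices:
        if choice.startswith("[") and choice.endswith("]"):
            value = metadata.get(choice[1:-1], "")
            if value:
                return value
        else:
            return choice
    return ""


# Hypothetical record after exploding: only exp. metadata is present.
record = {"exp.Title": "Abbey Road"}

old_statement = ["[dls.Title]", "[dc.Title]", "[ex.Title]", "Untitled"]
new_statement = ["[exp.Title]", "Untitled"]

print(first_available(record, old_statement))
print(first_available(record, new_statement))
```

Under these assumptions the old statement falls through to the literal fallback for an exploded record, which is why the format statement has to be switched to exp.Title.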
     1825<NumberedItem>
     1826<Text id="0533-19">Clear the <AutoText text="DocumentHeading"/> format statement by selecting it in the <AutoText key="glidict::CDM.FormatManager.Assigned_Formats"/> list, deleting the contents in the <AutoText key="glidict::CDM.FormatManager.Editor"/>, and clicking <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
     1827<Text id="0533-20">Next, edit the <AutoText text="DocumentText"/> format statement. Delete the contents and replace it with</Text>
     1828<Format>
     1829&lt;table&gt;<br/>
     1830&lt;tr&gt;&lt;td&gt;Title:&lt;/td&gt;&lt;td&gt;[exp.Title]&lt;/td&gt;&lt;/tr&gt;<br/>
     1831&lt;tr&gt;&lt;td&gt;Subject:&lt;/td&gt;&lt;td&gt;[exp.Subject]&lt;/td&gt;&lt;/tr&gt;<br/>
     1832&lt;tr&gt;&lt;td&gt;Publisher:&lt;/td&gt;&lt;td&gt;[exp.Publisher]&lt;/td&gt;&lt;/tr&gt;<br/>
     1833&lt;/table&gt;
     1834</Format>
     1835<Text id="0533-21">Remember to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
     1836</NumberedItem>
     1837<NumberedItem>
     1838<Text id="0533-22">The <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons are not very useful for this collection, so let's get rid of them. Edit the <AutoText text="DocumentButtons"/> format statement, and make it empty.</Text>
     1839</NumberedItem>
     1840<NumberedItem>
     1841<Text id="0533-23"><b>Rebuild</b> and <b>preview</b> the collection.</Text>
     1842</NumberedItem>
     1843</Content>
     1844</Tutorial>
     1845<Tutorial id="cds_isis">
     1846<Title>
     1847<Text id="is-1">CDS/ISIS collection</Text>
     1848</Title>
     1849<Content>
     1850<Comment>
     1851<Text id="is-2">This exercise is similar to the <TutorialRef id="bibliography_collection"/> exercise, except that a CDS/ISIS database is used instead of a MARC database, and we do not explode the database. </Text>
     1852</Comment>
     1853<NumberedItem>
     1854<Text id="is-3">Start a new collection called <b>ISIS Collection</b>, fill out appropriate fields for it, and choose <b>Dublin Core</b> as the metadata set.</Text>
     1855</NumberedItem>
     1856<NumberedItem>
     1857<Text id="is-4">Drag the files from <Path>sample_files &rarr; isis</Path> into the collection. </Text>
     1858</NumberedItem>
     1859<NumberedItem>
     1860<Text id="is-5"><b>Build</b> and <b>preview</b> the collection. The default indexes, classifiers and formats are not very useful for this data. There is no metadata searching, and the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> classifier is completely empty. The filenames classifier is useless because all records come from the same file.</Text>
     1861</NumberedItem>
     1862<NumberedItem>
     1863<Text id="is-6">In the <AutoText key="glidict::GUI.Design"/> panel select <AutoText key="glidict::CDM.GUI.SearchTypes"/> from the left-hand list and activate the <AutoText key="glidict::CDM.SearchTypeManager.Enable"/> option.</Text>
     1864</NumberedItem>
     1865<NumberedItem>
     1866<Text id="is-7">Add form searching to the collection by selecting <AutoText text="form"/> in the <AutoText key="glidict::CDM.SearchTypeManager.SearchType_Selection"/> menu and clicking <AutoText key="glidict::CDM.SearchTypeManager.Add" type="button"/>. Remove plain searching by selecting <AutoText text="plain"/> in the <AutoText key="glidict::CDM.SearchTypeManager.Assigned"/> list, and clicking <AutoText key="glidict::CDM.SearchTypeManager.Remove" type="button"/>.</Text>
     1867</NumberedItem>
     1868<NumberedItem>
     1869<Text id="is-8">In the <AutoText key="glidict::CDM.GUI.Indexes"/> section, remove the useless Source and Title indexes, and add new indexes for Photographer^all, Country^all and Notes^all metadata.</Text>
     1870<Text id="is-9">CDS/ISIS metadata has subfields, and these are represented using ^.</Text>
     1871</NumberedItem>
     1872<NumberedItem>
     1873<Text id="is-10">In the <AutoText key="glidict::CDM.GUI.Classifiers"/> section, remove the existing (useless) classifiers for <AutoText text="Title"/> and <AutoText text="Source"/>, and add a new <AutoText text="AZList"/> for <AutoText text="Photographer"/>.</Text>
     1874</NumberedItem>
     1875<NumberedItem>
     1876<Text id="is-11">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, change the <AutoText text="VList"/> format statement to display <AutoText text="Photographer"/> and <AutoText text="Notes"/> metadata. Change it to look like:</Text>
     1877<Format>
     1878&lt;td valign=top&gt;[link][icon][/link]&lt;/td&gt;<br/>
     1879&lt;td valign=top&gt;&lt;b&gt;[ex.Photographer^all]&lt;/b&gt;&lt;br/&gt;[ex.Notes^all]&lt;/td&gt;
     1880</Format>
     1881</NumberedItem>
     1882<NumberedItem>
     1883<Text id="is-12"><b>Rebuild</b> and <b>preview</b> the collection. </Text>
     1884</NumberedItem>
     1885<Text id="is-13"><AutoText text="ISISPlug"/> stores a nicely formatted version of the record as the document text, and this is what is displayed when we view a record. Let's tidy it up a little more.</Text>
     1886<NumberedItem>
     1887<Text id="is-14">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, remove the <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons by setting the <AutoText text="DocumentButtons"/> format statement to empty.</Text>
     1888</NumberedItem>
     1889<NumberedItem>
     1890<Text id="is-15">Clear the <AutoText text="DocumentHeading"/> format statement to remove the <AutoText text="Untitled" type="quoted"/> at the top of the document.</Text>
     1891</NumberedItem>
     1892<NumberedItem>
     1893<Text id="is-16">Finally, let's link to the raw record, which is stored as <AutoText text="ISISRawRecord"/> metadata. Edit the <AutoText text="DocumentText"/> format statement to look like:</Text>
     1894<Format>
     1895&lt;p&gt;[Text]&lt;/p&gt;<br/>
     1896{If}{_cgiargshowrecord_, <br/>
     1897&lt;b&gt;CDS Record:&lt;/b&gt;&lt;br/&gt;&lt;tt&gt;[ISISRawRecord]&lt;/tt&gt;&lt;p/&gt;<br/>
     1898&lt;center&gt;&lt;a href=\'_gwcgi_?e=_cgiarge_&amp;a=d&amp;d=_cgiargd_\'&gt;Hide CDS Record&lt;/a&gt;&lt;/center&gt;, <br/>
     1899&lt;center&gt;&lt;a href=\'_gwcgi_?e=_cgiarge_&amp;a=d&amp;d=_cgiargd_&amp;showrecord=1\'&gt;Show CDS Record&lt;/a&gt;&lt;/center&gt;<br/>
     1900}
     1901</Format>
     1902</NumberedItem>
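The show/hide toggle in the format statement above can be sketched in Python. The argument names e, a, d and showrecord come from the format statement; the function name, base URL and argument values are hypothetical, and this only models how the two links differ, not how Greenstone evaluates {If}.

```python
from urllib.parse import urlencode


def record_link(base_url, e_arg, doc_id, record_shown):
    """Return (label, url) for the link displayed under the document text.

    When the raw record is already shown, the link omits showrecord so
    that following it hides the record; otherwise it adds showrecord=1.
    """
    params = {"e": e_arg, "a": "d", "d": doc_id}
    if record_shown:
        label = "Hide CDS Record"
    else:
        label = "Show CDS Record"
        params["showrecord"] = "1"
    return label, base_url + "?" + urlencode(params)


# Hypothetical values standing in for _gwcgi_, _cgiarge_ and _cgiargd_.
base = "/gsdl/cgi-bin/library"
print(record_link(base, "d-000", "HASH0123", record_shown=False))
print(record_link(base, "d-000", "HASH0123", record_shown=True))
```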
     1903<NumberedItem>
     1904<Text id="is-17">Preview the collection.</Text>
     1905</NumberedItem>
     1906</Content>
     1907</Tutorial>
     1908<Tutorial id="using_macro_files">
     1909<Title>
     1910<Text id="mf-1">Customization: macro files and stylesheets</Text>
     1911</Title>
     1912<SampleFiles folder="custom"/>
     1913<Version initial="2.70" current="2.70"/>
     1914<Content>
     1915<Text id="mf-2">The appearance of all pages produced by Greenstone is governed by macro files, which reside in the folder <Path>Greenstone &rarr; macros</Path>, and by images and CSS stylesheets, both of which reside in <Path>Greenstone &rarr; images</Path>. </Text>
     1916<Text id="mf-3">A macro takes the form <Format>_macroname_ {macro value}</Format>. Macro names start and end with underscores (_), and the macro value is enclosed in curly brackets ({}). Macro values can be text or HTML, and can include other macros.</Text>
     1917<Text id="mf-4">Macros are grouped into packages, and different packages control the appearance of different pages. For example, the <AutoText text="home"/>, <AutoText text="help"/>, <AutoText text="preferences"/>, <AutoText text="query"/>, <AutoText text="document"/> packages control the home, help, preferences, query, and document pages, respectively. Some macro files contain macros for just one package, for example, <Path>home.dm</Path>, <Path>query.dm</Path>, <Path>document.dm</Path>, while others contain macros for many packages. <Path>base.dm</Path> contains macros used globally, <Path>style.dm</Path> controls the common style of each page, <Path>english.dm</Path>, <Path>french.dm</Path> and other language files contain the text fragments for the entire interface, in that specific language. </Text>
     1918<Text id="mf-5">The output of the library program is a page of HTML which is viewed in a web browser. CSS (Cascading Style Sheets) are often used alongside HTML pages to control the formatting, such as layout, colour, font etc. The default Greenstone stylesheet is <Path>Greenstone &rarr; images &rarr; style.css</Path>.</Text>
     1919<Text id="mf-6">In this exercise, we customize the macros, images and stylesheets to change the appearance of our library. You will not need the Librarian Interface for this exercise.</Text>
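To make the macro form described above concrete, here is a minimal Python sketch of reading definitions of the form _macroname_ {macro value} and expanding macros that refer to other macros. It is a simplified model, not Greenstone's macro processor: it ignores packages, parameters, escaped braces and nested braces, and the macro names in the sample are invented.

```python
import re

# Simplified patterns: a definition is _name_ followed by {value};
# a reference is any _name_ occurring inside a value.
MACRO_DEF = re.compile(r"_([A-Za-z0-9]+)_\s*\{([^{}]*)\}")
MACRO_USE = re.compile(r"_([A-Za-z0-9]+)_")


def parse_macros(source):
    """Return a dict mapping macro names to their raw values."""
    return {name: value for name, value in MACRO_DEF.findall(source)}


def expand(value, macros, depth=10):
    """Expand macro references in value, up to a fixed nesting depth."""
    for _ in range(depth):
        expanded = MACRO_USE.sub(
            lambda m: macros.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


# Hypothetical macro definitions, for illustration only.
source = """
_greeting_ {Welcome to _libraryname_}
_libraryname_ {My Digital Library}
"""
macros = parse_macros(source)
print(expand("_greeting_", macros))
```

Unknown macro names are left untouched, and the depth limit keeps a macro that refers to itself from looping forever.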
     1920<Heading>
     1921<Text id="mf-7">Changing the background and header images</Text>
     1922</Heading>
     1923<NumberedItem>
     1924<Text id="mf-8">Three new images for this exercise can be found in <Path>sample_files &rarr; custom</Path>. Copy <Path>chalk-blue.gif</Path>, <Path>gsdlhead-blue.gif</Path> and <Path>divb-blue.gif</Path> from the <Path>custom</Path> folder into the <Path>Greenstone &rarr; images</Path> folder.</Text>
     1925</NumberedItem>
     1926<NumberedItem>
     1927<Text id="mf-9">Open the file <Path>Greenstone &rarr; macros &rarr; home.dm</Path> in a text editor, e.g. WordPad. Find each occurrence of <Format>gsdlhead.gif</Format> in this file (there are two) and replace each one with <Format>gsdlhead-blue.gif</Format>. (If you are using WordPad, you can use <Menu>Edit &rarr; Find</Menu> to search for the text.)</Text>
     1928<Text id="mf-10">Save <Path>home.dm</Path> and close the file.</Text>
     1929</NumberedItem>
     1930<NumberedItem>
     1931<Text id="mf-11">Open the file <Path>Greenstone &rarr; macros &rarr; style.dm</Path> with the same program. Locate the following part of the file (this is part of the <Format>_cssheader_</Format> macro):</Text>
     1932<Format>
     1933&lt;style type="text/css"&gt;<br/>
     1934body.bgimage \{ background-image: url("_httpimg_/chalk.gif"); \}<br/>
     1935</Format>
     1936<Text id="mf-12">Use copy and paste on the <Format>body.bgimage</Format> line to make it look like this: </Text>
     1937<Format>
     1938&lt;style type="text/css"&gt;<br/>
     1939#body.bgimage \{ background-image: url("_httpimg_/chalk.gif"); \}<br/>
     1940body.bgimage \{ background-image: url("_httpimg_/chalk-blue.gif"); \}<br/>
     1941</Format>
     1942<Text id="mf-13">A hash (#) at the start of a line signals a comment, and Greenstone will ignore this line. We use this to "comment out" the original line and replace it with a modified line. This way it is easy to revert back to the original if necessary. Here we are changing the background image for the <Format>bgimage</Format> section of the <Format>body</Format> of the page to <Format>chalk-blue.gif</Format>.</Text>
     1943<Text id="mf-14">Save <Path>style.dm</Path> and close the file.</Text>
     1944</NumberedItem>
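The comment-out-and-replace edit described above is done by hand in a text editor. For readers who prefer to script it, the same transformation can be sketched in Python as follows. The function name is hypothetical, and the sketch works on a string rather than on the real style.dm file.

```python
def comment_out_and_replace(text, old, new):
    """Comment out each line containing old and add a modified copy below it.

    Lines that already start with a hash are left alone, so the original
    (commented-out) line is preserved for easy reverting.
    """
    result = []
    for line in text.splitlines():
        if old in line and not line.lstrip().startswith("#"):
            result.append("#" + line)
            result.append(line.replace(old, new))
        else:
            result.append(line)
    return "\n".join(result)


original = 'body.bgimage \\{ background-image: url("_httpimg_/chalk.gif"); \\}'
print(comment_out_and_replace(original, "chalk.gif", "chalk-blue.gif"))
```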
     1945<NumberedItem>
     1946<Text id="mf-15">Preview the home page in a web browser. (On Windows, restart the Greenstone library server.) The page header and background should now use the new graphics.</Text>
     1947<Comment>
     1948<Text id="mf-15a">The final part of this exercise looks at how we determined which images needed replacing, and which macro files should be edited.</Text>
     1949</Comment>
     1950</NumberedItem>
     1951<Heading>
     1952<Text id="mf-16">Changing the colour of the navigation bar, page title and page text</Text>
     1953</Heading>
     1954<Text id="mf-17">Now that the background image is a nice blue colour, let's format the page so that some other parts are blue too. Preview the collection after each change to make sure that it has worked properly. On Windows, macro file changes require a restart of the Greenstone library server. Stylesheet changes may require a forced reload in the web browser.</Text>
     1955<NumberedItem>
     1956<Text id="mf-18">First, we'll change the colour of the navigation bar and green divider bars. These use an image as a background, specified in the same macro as the page background.</Text>
     1957<Text id="mf-19">Open <Path>Greenstone &rarr; macros &rarr; style.dm</Path> in a text editor, and find the <Format>_cssheader_</Format> macro that you modified previously. Change the div.navbar and div.divbar parts to use divb-blue.gif instead of bg_green.png:</Text>
     1958<Format>
     1959#div.navbar \{ background-image: url("_httpimg_/bg_green.png"); \}<br/>
     1960div.navbar \{ background-image: url("_httpimg_/divb-blue.gif"); \}<br/>
     1961#div.divbar \{ background-image: url("_httpimg_/bg_green.png"); \}<br/>
     1962div.divbar \{ background-image: url("_httpimg_/divb-blue.gif"); \}<br/>
     1963</Format>
     1964</NumberedItem>
     1965<NumberedItem>
     1966<Text id="mf-20">The selected item on the navigation bar uses the same background, so change that too:</Text>
     1967<Format>
     1968#a.navlink_sel \{ background-image: url("_httpimg_/bg_green.png"); \}<br/>
     1969a.navlink_sel \{ background-image: url("_httpimg_/divb-blue.gif"); \}
     1970</Format>
     1971</NumberedItem>
     1972<NumberedItem>
     1973<Text id="mf-21">Next, we get rid of the background green image on the page and collection titles. Comment out the <Format>p.bannertitle</Format> and <Format>p.collectiontitle</Format> parts:</Text>
     1974<Format>
     1975#p.bannertitle \{background-image: url("_httpimg_/banner_bg.png"); \}<br/>
     1976#p.collectiontitle \{background-image: url("_httpimg_/banner_bg.png"); \}
     1977</Format>
     1978</NumberedItem>
     1979<Text id="mf-22">The following changes are made in the external stylesheet file. The above style definitions were included in the macro file so that image paths could be dynamically generated. The majority of the style definitions reside in an external style file, <Path>Greenstone &rarr; images &rarr; style.css</Path>.</Text>
     1980<NumberedItem>
     1981<Text id="mf-23">Open <Path>Greenstone &rarr; images &rarr; style.css</Path> in a text editor. Make the following modifications. You might want to preview after each one to see the effect.</Text>
     1982<Text id="mf-24">Change some of the colours:</Text>
     1983<BulletList>
     1984<Bullet>
     1985<Text id="mf-27">Find the <Format>body</Format> style instructions:</Text>
     1986<Format>
     1987body {
     1988  background: #ffffff;<br/>
     1989  color: #000000;<br/>
     1990}
     1991</Format>
     1992<Text id="mf-27a">Change <Format>color</Format> to <Format>teal</Format>.</Text>
     1993</Bullet>
     1994<Bullet>
     1995<Text id="mf-25">For <Format>a.collectiontitle</Format>, change <Format>color</Format> to <Format>blue</Format>.</Text>
     1996</Bullet>
     1997<Bullet>
     1998<Text id="mf-26">For <Format>p.collectiontitle</Format>, add <Format>color: blue;</Format></Text>
     1999</Bullet>
     2000</BulletList>
     2001</NumberedItem>
     2002<NumberedItem>
     2003<Text id="mf-28">For fun, let's switch the positions of the home, help and preferences buttons and the collection name or image.</Text>
     2004<BulletList>
     2005<Bullet>
     2006<Text id="mf-29">For <Format>div.pageinfo</Format>, change both <Format>float</Format> and <Format>text-align</Format> to <Format>left</Format>.</Text>
     2007</Bullet>
     2008<Bullet>
     2009<Text id="mf-30">For <Format>div.collectimage</Format>, change <Format>float</Format> and <Format>text-align</Format> to <Format>right</Format>.</Text>
     2010</Bullet>
     2011</BulletList>
     2012<Text id="mf-31">The look of your library should now be substantially different.</Text>
     2013</NumberedItem>
     2014<Heading>
     2015<Text id="mf-32">Adding a footer</Text>
     2016</Heading>
     2017<NumberedItem>
     2018<Text id="mf-33">Next we add a footer to each page. <Path>Greenstone &rarr; macros &rarr; style.dm</Path> defines a header and footer for each page, and macro files for the different pages define the page content. Open the file <Path>Greenstone &rarr; macros &rarr; style.dm</Path> in a text editor.</Text>
     2019</NumberedItem>
     2020<NumberedItem>
     2021<Text id="mf-34">Locate the <Format>_footer_</Format> macro:</Text>
     2022<Format>
     2023_footer_ {<br/>
     2024&lt;!-- page footer (\_style:footer\_) --&gt;<br/>
     2025_pagefooterextra__endspacer__htmlfooter_<br/>
     2026}
     2027</Format>
     2028<Text id="mf-35">After <Format>_pagefooterextra_</Format> add some text or HTML. For example "&lt;center&gt;&lt;small&gt;Copyright 2006 My Awesome Digital Library&lt;/small&gt;&lt;/center&gt;". The resulting macro will look something like:</Text>
     2029<Format>
     2030_footer_ {<br/>
     2031&lt;!-- page footer (\_style:footer\_) --&gt;<br/>
     2032_pagefooterextra_
     2033&lt;center&gt;&lt;small&gt;Copyright 2006 My Awesome Digital Library&lt;/small&gt;&lt;/center&gt;
     2034_endspacer__htmlfooter_<br/>
     2035}
     2036</Format>
     2037<Comment>
     2038<Text id="mf-36">The <Format>&lt;center&gt;</Format> and <Format>&lt;small&gt;</Format> HTML tags center the text, and make it a smaller size than the rest of the page.</Text>
     2039</Comment>
     2040<Text id="mf-37">Save <Path>style.dm</Path> and close the file.</Text>
     2041</NumberedItem>
     2042<NumberedItem>
     2043<Text id="mf-38">Preview the changes in a web browser. (On Windows, restart the Greenstone library server.) Each page should now have the new text at the bottom.</Text>
     2044</NumberedItem>
     2045<NumberedItem>
     2046<Text id="mf-39">Adding text into the main <Format>_footer_</Format> macro adds it to all pages. To add a footer just to a particular page, use <Format>_pagefooterextra_</Format> in the appropriate macro file. For example, let's add some more text to the footer, this time just on the home page.</Text>
     2047<Text id="mf-40">Open the file <Path>Greenstone &rarr; macros &rarr; home.dm</Path> in a text editor. After the line <Format>package home</Format>, add the following text:</Text>
     2048<Format>
     2049_pagefooterextra_ {Collections generated by Me.}
     2050</Format>
     2051<Text id="mf-41">Save <Path>home.dm</Path> and close the file.</Text>
     2052<Text id="mf-42">Preview the home page in a web browser. (On Windows, restart the Greenstone library server.) The home page should now display the new text, while the other pages won't.</Text>
     2053</NumberedItem>
     2054<Heading>
     2055<Text id="mf-43">Make your own Greenstone home page</Text>
     2056</Heading>
     2057<Text id="mf-44">You can make radical changes to a page by changing the macro file completely. For example, here we use a predefined alternative to the home page.</Text>
     2058<NumberedItem>
     2059<Text id="mf-45">Open the file <Path>Greenstone &rarr; etc &rarr; main.cfg</Path> in a text editor. Locate the <AutoText text="macrofiles" type="italics"/> list:</Text>
     2060<Format>
     2061# The list of display macro files used by this receptionist<br/>
     2062macrofiles  tip.dm style.dm base.dm query.dm help.dm pref.dm about.dm \<br/>
     2063&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;document.dm browse.dm status.dm authen.dm users.dm html.dm \<br/>
     2064&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;extlink.dm gsdl.dm extra.dm home.dm collect.dm docs.dm \<br/>
     2065&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bsummary.dm gti.dm gli.dm nav_css.dm \<br/>
     2066&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;english.dm english2.dm french.dm french2.dm spanish.dm \<br/>
     2067&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;spanish2.dm russian.dm russian2.dm usability.dm<br/>
     2068</Format>
     2069<Text id="mf-46">Change the text <Format>home.dm</Format> to <Format>yourhome.dm</Format>. Save and close the file.</Text>
     2070</NumberedItem>
     2071<NumberedItem>
     2072<Text id="mf-47">Preview the newly structured home page in a web browser. (On Windows, restart the Greenstone library server.) </Text>
     2073</NumberedItem>
     2074<NumberedItem>
     2075<Text id="mf-48">Reverse this last change by changing <Format>yourhome.dm</Format> back to <Format>home.dm</Format> in the file <Path>Greenstone &rarr; etc &rarr; main.cfg</Path>. You may also like to reverse the other changes you have made.</Text>
     2076</NumberedItem>
     2077
     2078<Heading>
     2079<Text id="mf-49">Collection specific customisation</Text>
     2080</Heading>
     2081<Text id="mf-50">Macros can also be used to customize single collections. They should be added to a file called <Path>extra.dm</Path> in the <Path>macros</Path> directory of a collection.</Text>
     2082<Text id="mf-51">We use the Word and PDF collection (from exercise <TutorialRef id="word_pdf_collection"/>) as the example for this exercise, but it can be done with any collection. </Text>
     2083<NumberedItem>
     2084<Text id="mf-52">Create a new macros folder for the collection: <Path>Greenstone &rarr; collect &rarr; reports &rarr; macros</Path>. Copy the file <Path>Greenstone &rarr; macros &rarr; extra.dm</Path> into the new folder.</Text>
     2085</NumberedItem>
     2086<NumberedItem>
     2087<Text id="mf-53">First, we change the title of the <AutoText key="coredm::_about:textabcol_"/> section of the about page. Add the following text to the <Path>extra.dm</Path> file:</Text>
     2088<Format>
     2089package about<br/>
     2090<br/>
     2091_textabout_ {<br/>
     2092&lt;div class="section"&gt;<br/>
     2093&lt;h3&gt;Very Interesting Reports Collection.&lt;/h3&gt;<br/>
     2094_Global:collectionextra_<br/>
     2095&lt;/div&gt;<br/>
     2096}
     2097</Format>
     2098<Text id="mf-54">Save the file.</Text>
     2099<Text id="mf-55">Preview the collection. (On Windows, restart the Greenstone library server.) The about page will have a new title underneath the search form.</Text>
     2100</NumberedItem>
     2101<NumberedItem>
     2102<Text id="mf-56">Next we'll do some style customisations for this collection. Add the following text to <Path>extra.dm</Path>:</Text>
     2103<Format>
     2104package Style<br/>
     2105<br/>
     2106_collectionspecificstyle_ {<br/>
     2107#clear the use of a background image<br/>
     2108body.bgimage \{ background-image: none; \}<br/>
     2109# set the background color to pink<br/>
     2110body \{ background: pink; \}<br/>
     2111#clear the background image for the navigation bar, and set its color to red<br/>
     2112div.navbar \{ background-image: none; background-color: red; \}<br/>
     2113#clear the background image for the divider bars, and set their color to red<br/>
     2114div.divbar \{ background-image: none; background-color: red; \}<br/>
     2115}
     2116</Format>
     2117<Text id="mf-57">Preview the collection. (On Windows, restart the Greenstone library server.) The reports collection will now have a pink background, and the navigation bar and divider bars will be red. These changes will only affect this collection.</Text>
     2118</NumberedItem>
     2119<Text id="mf-58">Any macros from the general macro files can be copied into a collection's <Path>extra.dm</Path> file and modified. Remember to include the package declaration to make sure that the macros get applied to the correct page(s).</Text>
     2120<Text id="mf-59">The style modifications made above were minor. The collection still uses the majority of the standard style file. The style declarations in the <Format>_collectionspecificstyle_</Format> macro get appended to the default ones. To completely change the appearance of a collection, we can use a new style sheet altogether.</Text>
     2121<NumberedItem>
     2122<Text id="mf-59a">Add the following to <Path>extra.dm</Path> after the last modifications:</Text>
     2123<Format>
     2124_cssheader_ {<br/>
     2125&lt;link rel="stylesheet" href="_httpcimages_/style-blue.css" type="text/css"<br/>
     2126&nbsp;&nbsp;title="Blue Style" charset="UTF-8"&gt;<br/>
     2127}
     2128</Format>
     2129<Text id="mf-60">Copy the file <Path>sample_files &rarr; custom &rarr; style-blue.css</Path> into the collection's <Path>images</Path> folder: <Path>Greenstone &rarr; collect &rarr; reports &rarr; images</Path>.</Text>
     2130<Text id="mf-61">Preview the collection; it should look radically different.</Text>
     2131</NumberedItem>
     2132<Heading>
     2133<Text id="mf-63">How to determine which images to replace (advanced)</Text>
     2134</Heading>
     2135<NumberedItem>
     2136<Text id="mf-64">In the first part of this exercise we replaced the default background (<AutoText text="chalk.gif"/>) and header (<AutoText text="gsdlhead.gif"/>) images with new ones. To do this we needed to change the image names in the macro files. How did we know which images we were replacing and which macro files to edit? This exercise shows you how to find out.</Text>
     2137</NumberedItem>
     2138<NumberedItem>
     2139<Text id="mf-65">To find out the names of the images to replace, go to the home page of your digital library in a browser. Right-click on the header image (<AutoText text="Greenstone digital library software" type="quoted"/>) and select "Save picture as". A dialog will pop up and will display the image name: <AutoText text="gsdlhead.gif" type="quoted"/> (or <AutoText text="gsdlhead-blue.gif" type="quoted"/> if you are using the new header). Click Cancel to close the dialog&mdash;you don't need to save the images. Do the same for the background image by right clicking on the left hand green (or blue) swirly bar. This time choose "Save background as" to find the name: <AutoText text="chalk.gif" type="quoted"/> (or <AutoText text="new_background.gif" type="quoted"/>), then click Cancel.</Text>
     2140</NumberedItem>
     2141<NumberedItem>
     2142<Text id="mf-66">These instructions apply to Internet Explorer. Other browsers may have other options in the right-click menu. For example, Mozilla provides "View Image" and "View Background Image" options. Using these options will put the path to the image in the browser address box, and the name can be seen from this.</Text>
     2143</NumberedItem>
     2144<NumberedItem>
     2145<Text id="mf-67">Once you have identified the names of the images to be replaced, you need to find out where they occur in the macro files. To do this, search the macro files for the image names using the <AutoText text="find"/> program, which is run in a command prompt. Open a command prompt using <Menu>Start &rarr; Programs &rarr; Accessories &rarr; Command Prompt</Menu>, or <Menu>Start &rarr; Run</Menu> and enter <Command>cmd</Command> as the name of the program to run.</Text>
     2146<Text id="mf-68">You can type <Command>find/?</Command> to see a description of the program and its arguments.</Text>
     2147
     2148<Text id="mf-69">To search the macro files for <AutoText text="gsdlhead.gif" type="quoted"/> type</Text>
     2149<Command>find "gsdlhead.gif" "C:\Program Files\Greenstone\macros\*.dm"</Command>
     2150<Text id="mf-70"><AutoText text="*.dm"/> means all files ending in <AutoText text=".dm"/>. A list of all macro files will be displayed, along with any matches. You will see that <Path>home.dm</Path> and <Path>exported_home.dm</Path> both contain <AutoText text="gsdlhead.gif"/>. <Path>home.dm</Path> is the one you want to edit&mdash;<Path>exported_home.dm</Path> is used for the home page when you export a collection to CD-ROM.</Text>
     2151
     2152<Text id="mf-71">Do the same thing for <AutoText text="chalk.gif" type="quoted"/>:</Text>
     2153<Command>find "chalk.gif" "C:\Program Files\Greenstone\macros\*.dm"</Command>
     2154
     2155<Text id="mf-72"><Path>base.dm</Path> is the only file that mentions this image.</Text>
     2156
     2157<Text id="mf-73">Close the command prompt.</Text>
    14352158</NumberedItem>
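The search performed above with the Windows <Command>find</Command> program can also be sketched in Python, which may be handy on systems without that program. This is only an illustration: the helper name find_in_macros is made up for this sketch, and the macros path is the default Windows install location assumed in the commands above.

```python
from pathlib import Path


def find_in_macros(macro_dir, needle):
    """Return {file name: [(line number, line text), ...]} for *.dm files containing needle."""
    matches = {}
    for path in sorted(Path(macro_dir).glob("*.dm")):
        # Macro files may hold bytes that are not valid UTF-8; replace them rather than fail.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if needle in line
        ]
        if hits:
            matches[path.name] = hits
    return matches


if __name__ == "__main__":
    # Assumed install location; adjust to wherever Greenstone is installed.
    macro_dir = r"C:\Program Files\Greenstone\macros"
    for name, hits in find_in_macros(macro_dir, "gsdlhead.gif").items():
        for number, line in hits:
            print(f"{name}:{number}: {line}")
```

Unlike <Command>find</Command>, this sketch lists only the files that contain a match, together with the line number of each match.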
    14362159</Content>
     
    14592182</NumberedItem>
    14602183<NumberedItem>
    1461 <Text id="0546">Look at the <i>titles a-z</i> browser. Each title has a bookshelf that may include several related items. For example, <i>Hey Jude</i> has a MIDI file, lyrics, and a discography item.</Text>
     2184<Text id="0546">Look at the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> browser. Each title has a bookshelf that may include several related items. For example, <i>Hey Jude</i> has a MIDI file, lyrics, and a discography item.</Text>
    14622185</NumberedItem>
    14632186<NumberedItem>
     
    14882211<Text id="0554">Copy the files provided in</Text>
    14892212<Path>sample_files &rarr; beatles &rarr; advbeat_small</Path>
    1490 <Text id="0555">into your new collection. Do this by opening up <Path>advbeat_small</Path>, selecting the eight items within it (from <Path>discography</Path> to <Path>beatles_midi.zip</Path>), and dragging them across. Because some of these files are in MP3 and MARC formats you will be asked whether to include <AutoText text="MP3Plug" /> and <AutoText text="MARCPlug" /> in your collection. Click <AutoText key="glidict::CDM.PlugInManager.Add" type="button"/>.</Text>
     2213<Text id="0555">into your new collection. Do this by opening up <Path>advbeat_small</Path>, selecting the eight items within it (from <Path>discography</Path> to <Path>beatles_midi.zip</Path>), and dragging them across. Because some of these files are in MP3 and MARC formats you will be asked whether to include <AutoText text="MP3Plug" /> and <AutoText text="MARCPlug" /> in your collection. Click <AutoText key="glidict::CDM.PlugInManager.QuickAdd" type="button"/>.</Text>
    14912214</NumberedItem>
    14922215<NumberedItem>
     
    15102233</NumberedItem>
    15112234<Comment>
    1512 <Text id="0563">Now there's a twist. The <AutoText key="metadata::dc.Title"/> metadata won't appear in titles a-z because the classifier has been instructed to use <AutoText key="metadata::ex.Title"/>. But changing the classifier to use <AutoText key="metadata::dc.Title"/> would miss out all the extracted titles! Fortunately, there's a way of dealing with this by specifying a list of metadata names in the classifier.</Text>
     2235<Text id="0563">Now there's a twist. The <AutoText key="metadata::dc.Title"/> metadata won't appear in <AutoText key="coredm::_Global:labelTitle_" type="italics"/> because the classifier has been instructed to use <AutoText key="metadata::ex.Title"/>. But changing the classifier to use <AutoText key="metadata::dc.Title"/> would miss out all the extracted titles! Fortunately, there's a way of dealing with this by specifying a list of metadata names in the classifier.</Text>
    15132236</Comment>
    15142237<NumberedItem>
     
    17242447</Comment>
    17252448<Comment>
    1726 <Text id="0640">One powerful use of regular expressions in the exercise was to clean up the <AutoText key="coredm::_Global:labelTitle_"/> browser. Perhaps the best way of doing this would be to have proper title metadata. The metadata extracted from HTML files is messy and inconsistent, and this was reflected in the original titles a-z browser. Defining proper title metadata would be simple but rather laborious. Instead, we have opted to use regular expressions in the <AutoText text="AZCompactList"/> classifier to clean up the title metadata. This is difficult to understand, and a bit fiddly to do, but if you can cope with its idiosyncrasies it provides a quick way to clean up the extracted metadata and avoid having to enter a large amount of metadata.</Text>
     2449<Text id="0640">One powerful use of regular expressions in the exercise was to clean up the <AutoText key="coredm::_Global:labelTitle_"/> browser. Perhaps the best way of doing this would be to have proper title metadata. The metadata extracted from HTML files is messy and inconsistent, and this was reflected in the original <AutoText key="coredm::_Global:labelTitle_" type="italics"/> browser. Defining proper title metadata would be simple but rather laborious. Instead, we have opted to use regular expressions in the <AutoText text="AZCompactList"/> classifier to clean up the title metadata. This is difficult to understand, and a bit fiddly to do, but if you can cope with its idiosyncrasies it provides a quick way to clean up the extracted metadata and avoid having to enter a large amount of metadata.</Text>
    17272450</Comment>
    17282451<Heading>
     
    18642587</NumberedItem>
    18652588<NumberedItem>
    1866 <Text id="0678">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section on the <AutoText key="glidict::GUI.Design"/> panel, add <AutoText text="PagedImgPlug" />. Switch on its <AutoText text="screenview"/> configuration option by checking the box. The source images we use were scanned at high resolution and are large files for a browser to download. The <AutoText text="screenview"/> option generates smaller screen-resolution images of each page when the collection is built.</Text>
    1867 </NumberedItem>
    1868 <NumberedItem>
    1869 <Text id="0679">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>niupepa &rarr; sample_items</Path> folder in <Path>sample_files</Path> and drag it into your collection on the right-hand side.</Text>
    1870 </NumberedItem>
    1871 <NumberedItem>
    1872 <Text id="0680">Some of the files you have just dragged in are text files that contain the text extracted from page images. We want these to be processed by <AutoText text="PagedImgPlug" />, not <AutoText text="TEXTPlug" />. Switch to the <AutoText key="glidict::GUI.Design"/> panel and delete <AutoText text="TEXTPlug" />. While you are at it, you could tidy things up by deleting <AutoText text="HTMLPlug" />, <AutoText text="EMAILPlug" />, <AutoText text="PDFPlug" />, <AutoText text="RTFPlug" />, <AutoText text="WordPlug" />, <AutoText text="PSPlug" />, <AutoText text="ISISPlug" /> and <AutoText text="NULPlug" /> as well, since they will not be used.</Text>
    1873 </NumberedItem>
    1874 <NumberedItem>
    1875 <Text id="0681">Now go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> titles.</Text>
     2589<Text id="0679">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>sample_files &rarr; niupepa &rarr; sample_items</Path> folder and drag the two subfolders into your collection on the right-hand side. A popup window asks whether you want to add <AutoText text="PagedImgPlug"/> to the collection to process this file. Click <AutoText key="glidict::CDM.PlugInManager.QuickAdd" type="button"/>, because this plugin will be needed to process the item files.</Text>
     2590</NumberedItem>
     2591<NumberedItem>
     2592<Text id="0680">Some of the files you have just dragged in are the newspaper images; others are text files that contain the text extracted from these images. We want these to be processed by <AutoText text="PagedImgPlug"/>, not <AutoText text="ImagePlug"/> or <AutoText text="TEXTPlug"/>. Switch to the <AutoText key="glidict::GUI.Design"/> panel and delete <AutoText text="ImagePlug"/> and <AutoText text="TEXTPlug"/>. While you are at it, you could tidy things up by deleting <AutoText text="ZIPPlug"/> and all plugins from <AutoText text="HTMLPlug"/> to <AutoText text="NULPlug"/> as well, since they will not be used. <AutoText text="GAPlug"/> and <AutoText text="PagedImgPlug"/> remain.</Text>
     2593</NumberedItem>
     2594<NumberedItem>
     2595<Text id="0678">Open up the configuration window for <AutoText text="PagedImgPlug"/> by double-clicking on the plugin. Switch on its <AutoText text="screenview"/> configuration option by checking the box. The source images we use were scanned at high resolution and are large files for a browser to download. The <AutoText text="screenview"/> option generates smaller screen-resolution images of each page when the collection is built.</Text>
     2596</NumberedItem>
     2597<NumberedItem>
     2598<Text id="0681">Now go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> newspapers.</Text>
    18762599</NumberedItem>
    18772600<Comment>
     
    18822605</Heading>
    18832606<Comment>
    1884 <Text id="0684">Under titles a-z documents from the same series are repeated without any distinguishing features such as date. It would be better to group them by series title and display dates within each group. This can be accomplished using an <AutoText text="AZCompactList"/> classifier rather than <AutoText text="AZList"/>, and tuning the <AutoText text="VList"/> format statement.</Text>
     2607<Text id="0684">Under <AutoText key="coredm::_Global:labelTitle_"/> documents from the same series are repeated without any distinguishing features such as date. It would be better to group them by series title and display dates within each group. This can be accomplished using an <AutoText text="AZCompactList"/> classifier rather than <AutoText text="AZList"/>, and tuning the <AutoText text="VList"/> format statement.</Text>
    18852608</Comment>
    18862609<NumberedItem>
     
    18912614</NumberedItem>
    18922615<NumberedItem>
    1893 <Text id="0687"><b>Modify</b> the format statement for <AutoText text="VList" />. Find the part of the default statement that says</Text>
     2616<Text id="0687"><b>Modify</b> the format statement for <AutoText text="VList" /> (under <AutoText key="glidict::CDM.GUI.Formats"/>). Find the part of the default statement that says</Text>
    18942617<Format>{If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;}</Format>
    18952618<Text id="0689">and change it to</Text>
     
    19012624<Format>&lt;/td&gt;</Format>
    19022625<Text id="0692">append</Text>
    1903 <Format>{If}{[numleafdocs],&lt;td&gt;([numleafdocs] items)&lt;/td&gt;}</Format>
     2626<Format>{If}{[numleafdocs],&lt;td&gt;([numleafdocs])&lt;/td&gt;}</Format>
    19042627<Text id="0692a">and click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
    19052628<Comment>
    1906 <Text id="0693">As a consequence of using the <AutoText text="AZCompactList"/> classifier, bookshelf icons appear when titles are browsed. This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf. It works by exploiting the fact that only bookshelf icons define <AutoText text="numleafdocs"/> metadata.</Text>
    1907 </Comment>
    1908 </NumberedItem>
    1909 <Heading>
    1910 <Text id="0694">Suppressing dummy text</Text>
     2629<Text id="0693">As a consequence of using the <AutoText text="AZCompactList"/> classifier, bookshelf icons appear when titles are browsed. This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf. It works by exploiting the fact that only bookshelf icons define <Format>[numleafdocs]</Format> metadata.</Text>
     2630</Comment>
     2631</NumberedItem>
     2632<NumberedItem>
     2633<Text id="0690a"><b>Build</b> and <b>preview</b> the collection.</Text>
     2634</NumberedItem>
     2635<Heading>
     2636<Text id="0694">Displaying scanned images and suppressing dummy text</Text>
    19112637</Heading>
    19122638<Comment>
     
    19142640</Comment>
    19152641<NumberedItem>
    1916 <Text id="0696">Staying within the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, under <AutoText key="glidict::CDM.FormatManager.Feature"/> select <AutoText text="DocumentText"/>. The default format string displays the document's plain text, which, if there is none, is set to <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. Change this to:</Text>
    1917 <Format>
    1918 &lt;center&gt;<br/> 
    1919 &nbsp;&nbsp;&lt;table width=_pagewidth_&gt;<br/> 
    1920 &nbsp;&nbsp;&nbsp;&nbsp;&lt;tr&gt;<br/> 
    1921 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td valign=top&gt;[srclink][screenicon][/srclink]&lt;/td&gt;<br/> 
    1922 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;[Text]&lt;/td&gt;<br/> 
    1923 &nbsp;&nbsp;&nbsp;&nbsp;&lt;/tr&gt;<br/> 
    1924 &nbsp;&nbsp;&lt;/table&gt;<br/>
    1925 &lt;/center&gt;
     2642<Text id="0696">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. The default format string displays the document's plain text, which, if there is none, is set to <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. Change this to:</Text>
     2643<Format>
     2644&lt;center&gt;&lt;table width=_pagewidth_&gt;&lt;tr&gt;<br/> 
     2645&nbsp;&nbsp;&lt;td valign=top&gt;[srclink][screenicon][/srclink]&lt;/td&gt;<br/> 
     2646&nbsp;&nbsp;&lt;td&gt;[Text]&lt;/td&gt;<br/> 
     2647&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;
    19262648</Format>
    19272649<Text id="0696a">and click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text>
    1928 <Text id="0697">(available as <Path>niupepa &rarr; doc_tweak.txt</Path>)</Text>
    1929 <Comment>
    1930 <Text id="0698">Including <Format>[screenicon]</Format> has the effect of embedding the screen-sized image generated by switching the screenview option on in <AutoText text="PagedImgPlug"/>. It is hyperlinked to the original image by the construct <Format>[srclink]...[/srclink]</Format>.</Text>
    1931 </Comment>
    1932 </NumberedItem>
    1933 <NumberedItem>
    1934 <Text id="0699">Switch to the <AutoText key="glidict::GUI.Create"/> panel<b>;</b> <b>build</b> and <b>preview</b> the revised collection.</Text>
    1935 </NumberedItem>
    1936 <NumberedItem>
    1937 <Text id="0700">If you like, add a logo and change the background as you have done before. You will find a suitable image in the file <Path>niupepa &rarr; images</Path>, that is activated through <Path>macros &rarr; extra.dm</Path>.</Text>
     2650<Text id="0697">(This format statement can be copied and pasted from the file <Path>sample_files &rarr; niupepa &rarr; doc_tweak.txt</Path>)</Text>
     2651<Comment>
     2652<Text id="0698">Including <Format>[screenicon]</Format> has the effect of embedding the screen-sized image generated by switching the <AutoText text="screenview"/> option on in <AutoText text="PagedImgPlug"/>. It is hyperlinked to the original image by the construct <Format>[srclink]...[/srclink]</Format>.</Text>
     2653</Comment>
     2654<Text id="0698a">This modification will display the screenview image, but does nothing about the dummy text <AutoText key="perlmodules::BasPlug.dummy_text" type="plain"/>, which will still be displayed. To get rid of this, edit the <AutoText text="DocumentText"/> format statement again and replace</Text>
     2655<Format>
     2656&lt;td&gt;[Text]&lt;/td&gt;
     2657</Format>
     2658<Text id="0698b">with</Text>
     2659<Format>
     2660{If}{[Text] ne "<AutoText key="perlmodules::BasPlug.dummy_text" type="plain"/> ",&lt;td&gt;[Text]&lt;/td&gt;}
     2661</Format>
     2662<Text id="0698c">Preview the collection and view one of the <AutoText text="Te Waka o Te Iwi"/> documents. The line <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/> should now be gone. (Note that it is important to get the text exactly right for this to work, including the space after the ".".)</Text>
     2663</NumberedItem>
     2664<NumberedItem>
     2665<Text id="0699"><b>Preview</b> the revised collection.</Text>
     2666</NumberedItem>
     2667<Heading>
     2668<Text id="0690b">Searching at page level</Text>
     2669</Heading>
     2670<NumberedItem>
     2671<Text id="0690c">The newspaper documents are split into sections, one per page. For large documents, it is useful to be able to search on sections rather than documents. This allows users to more easily locate the relevant information in the document.</Text>
     2672</NumberedItem>
     2673<NumberedItem>
     2674<Text id="0690d">Go to the <AutoText key="glidict::CDM.GUI.Indexes"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Remove the <AutoText key="metadata::ex.Source"/> index. Select the <AutoText text="text"/> index in the <AutoText key="glidict::CDM.IndexManager.Indexes"/> box, and change the <AutoText key="glidict::CDM.IndexManager.Index_Name"/> to "whole newspapers". Click <AutoText key="glidict::CDM.IndexManager.MGPP.Replace_Index" type="button"/>. Create a new index: set the <AutoText key="glidict::CDM.IndexManager.Index_Name"/> to "newspaper pages", keep <AutoText text="text"/> selected in <AutoText key="glidict::CDM.IndexManager.Source"/>, and change <AutoText key="glidict::CDM.IndexManager.Level"/> to <AutoText text="section"/>. Click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>. Click <AutoText key="glidict::CDM.IndexManager.Set_Default" type="button"/> on the right hand side to make the "newspaper pages" index the default.</Text>
     2675</NumberedItem>
     2676<NumberedItem>
     2677<Text id="0690e"><b>Build</b> and <b>preview</b> the collection. Compare searching in the "whole newspapers" index with searching in the "newspaper pages" index. A useful search term for this collection is <AutoText text="aroha" type="quoted"/>.</Text>
     2678</NumberedItem>
     2679<NumberedItem>
     2680<Text id="0690f">You will notice that when searching for individual pages, the newspaper image is displayed in the search results. As these images are very large, this is not very useful. To remove this, edit the format statement for <AutoText text="VList"/> (under <AutoText key="glidict::CDM.GUI.Formats"/>), and remove the second line:</Text>
     2681<Format>
     2682&lt;td valign="top"&gt;[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]&lt;/td&gt;
     2683</Format>
     2684<Text id="0690g">Preview the collection&mdash;the search results should be back to normal. </Text>
     2685</NumberedItem>
     2686<NumberedItem>
     2687<Text id="0690h">Now you will notice that page level search results only show the Title of the page (the page number), and not the Title of the newspaper. We'll modify the format statement to show the paper title as well as the page number. In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText text="Search"/> in <AutoText key="glidict::CDM.FormatManager.Feature"/>, and <AutoText text="VList"/> in <AutoText key="glidict::CDM.FormatManager.Part"/>.</Text>
     2688<Text id="0690i">The extracted Title for the current section is specified as <Format>[ex.Title]</Format> while the Title for the parent section is <Format>[parent:ex.Title]</Format>. Since the same <AutoText text="SearchVList"/> format statement is used when searching both whole newspapers and newspaper pages, we need to make sure it works in both cases.</Text>
     2689<Text id="0690j">Set the format statement to the following:</Text>
     2690<Format>
     2691&lt;td valign=top&gt;[link][icon][/link]&lt;/td&gt;<br/>
     2692&lt;td valign=top&gt;<br/>
     2693{If}{[parent:ex.Title],[parent:ex.Title]: }[ex.Title] &lt;br&gt;<br/>
     2694&lt;i&gt;({Or}{[parent:ex.Date],[ex.Date]})&lt;/i&gt;&lt;/td&gt;
     2695</Format>
     2696<Text id="0690k">(The format statement can be copied and pasted from the file <Path>sample_files &rarr; niupepa &rarr; search_tweak.txt</Path>.)</Text>
     2697<Text id="0690l">The first line links to the document. The third line displays the parent Title if there is one, then the Title of the current page or document. The fourth line displays either the parent Date (in the case of pages) or the Date (in the case of documents), in italics (<Format>&lt;i&gt;..&lt;/i&gt;</Format>).</Text>
    19382698</NumberedItem>
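The effect of the {If} and {Or} constructs in the search format statement above can be modelled with a small Python function. This is a hedged illustration of the substitution logic only, not how Greenstone evaluates format statements; the function name render_search_title and the sample values are invented for the sketch.

```python
def render_search_title(title, date=None, parent_title=None, parent_date=None):
    """Model lines three and four of the SearchVList format statement."""
    # {If}{[parent:ex.Title],[parent:ex.Title]: }[ex.Title]
    # The parent Title and a colon are emitted only when a parent Title exists.
    prefix = f"{parent_title}: " if parent_title else ""
    # {Or}{[parent:ex.Date],[ex.Date]}
    # The first value that is defined is used.
    shown_date = parent_date or date or ""
    return f"{prefix}{title} <br>\n<i>({shown_date})</i>"


if __name__ == "__main__":
    # A page-level result: Title is the page number, Date comes from the parent.
    print(render_search_title("1", parent_title="Te Waka o Te Iwi", parent_date="18570101"))
    # A document-level result: there is no parent, so its own Title and Date are used.
    print(render_search_title("Te Waka o Te Iwi", date="18570101"))
```

The date shown here is a placeholder value, and the sketch inserts metadata values verbatim without any date formatting.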
    19392699<Comment>
    19402700<Text id="0701">In the collection you have just built, newspapers are grouped by series title, and dates are supplied alongside each one to distinguish it from others in the same series. Users can browse chronologically by date, and when a newspaper page is viewed a preview image is shown on the left that displays the original high-resolution version when clicked, accompanied on the right by the plain-text version of that newspaper (if available).</Text>
    19412701</Comment>
     2702</Content>
     2703</Tutorial>
     2704<Tutorial id="advanced_scanned_image_collection">
     2705<Title>
     2706<Text id="sc1">Advanced scanned image collection</Text>
     2707</Title>
     2708<SampleFiles folder="niupepa"/>
     2709<Prerequisite id="scanned_image_collection"/>
     2710<Version initial="2.70" current="2.70"/>
     2711<Content>
     2712<Comment>
     2713<Text id="sc2">In this exercise we build upon the collection created in <TutorialRef id="scanned_image_collection"/>. We add a new newspaper by creating an item file for it, add a new newspaper using the extended XML item file format, and modify the formatting.</Text>
     2714</Comment>
     2715<Heading>
     2716<Text id="sc3">Adding another newspaper to the collection</Text>
     2717</Heading>
     2718<Comment>
     2719<Text id="sc4">Another newspaper has been scanned and OCRed, but has no item file. We will add this newspaper into the collection, and create an item file for it.</Text>
     2720</Comment>
     2721<NumberedItem>
     2722<Text id="sc5">In the Librarian Interface, open up the Paged Image collection that was created in exercise <TutorialRef id="scanned_image_collection"/> if it is not already open (<Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Open"/></Menu>).</Text>
     2723</NumberedItem>
     2724<NumberedItem>
     2725<Text id="sc6">In the <AutoText key="glidict::GUI.Gather"/> panel, add the folder <Path>sample_files &rarr; niupepa &rarr; new_papers &rarr; 12</Path> to your collection. </Text>
     2726<Text id="sc7">A series of popups ask you about adding plugins to the collection to process the text and image files. Remember that <AutoText text="ImagePlug"/> and <AutoText text="TEXTPlug"/> were removed from the collection as we wanted these files to be processed by <AutoText text="PagedImgPlug"/>. Click <AutoText key="glidict::CDM.PlugInManager.Ignore" type="button"/> for each popup.</Text>
     2727<Text id="sc8">You may notice that for text files, the Librarian Interface suggests <AutoText text="ProCitePlug"/> as the plugin to add. If you open up the <AutoText key="glidict::CDM.PlugInManager.PlugIn"/> drop down list, you can see that <AutoText text="TEXTPlug"/> is also suggested. Both these plugins process files with extension <AutoText text=".txt" type="italics"/>.</Text>
     2728</NumberedItem>
     2729<NumberedItem>
     2730<Text id="sc9">Inside the <AutoText text="12 "/>folder you can see that there are 4 images and 4 text files.</Text>
     2731</NumberedItem>
     2732<NumberedItem>
     2733<Text id="sc10">Create an item file for the new newspaper. Have a look at an existing item file to see the format. Start up a text editor (e.g. WordPad) to open a new document. Add some metadata. The <AutoText text="Title"/> for this newspaper is <AutoText text="Te Haeata 1859-1862" type="quoted"/>. The <AutoText text="Volume"/> is 3, <AutoText text="Number"/> is 6, and the <AutoText text="Date"/> is <AutoText text="18610902" type="quoted"/>. (Greenstone's date format is <AutoText text="yyyymmdd"/>.) Metadata must be added in the form:</Text>
     2734<Format>
     2735<Text id="sc11">&lt;Metadata name&gt;Metadata value</Text>
     2736</Format>
     2737<Text id="sc12">For this document, the metadata looks like:</Text>
     2738<Format>
     2739&lt;Title&gt;Te Haeata 1859-1862<br/>
     2740&lt;Date&gt;18610902<br/>
     2741&lt;Volume&gt;3<br/>
     2742&lt;Number&gt;6
     2743</Format>
     2744</NumberedItem>
     2745<NumberedItem>
     2746<Text id="sc13">For each page, add a line in the file in the following format:</Text>
     2747<Format>
     2748<Text id="sc14">pagenum:imagefile:textfile::</Text>
     2749</Format>
     2750<Text id="sc15">For example, the first page entry would look like</Text>
     2751<Format>
     27521:images/12_3_6_1.gif:text/12_3_6_1.txt::
     2753</Format>
     2754<Text id="sc16">Note that if there is no text file, you can leave that space blank.</Text>
     2755</NumberedItem>
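The page lines described above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of Greenstone: the helper name page_entries is hypothetical, and the file names for pages 2 to 4 are assumed to follow the same pattern as the first page entry shown above. A text file is paired with an image when the two share a base name; if none is found, the text field is left blank.

```python
from pathlib import PurePosixPath


def page_entries(image_files, text_files):
    """Build item-file page lines of the form pagenum:imagefile:textfile::

    Pages are numbered from 1 in the order the images are given.
    (Hypothetical helper for illustration only.)
    """
    # Index the text files by base name, e.g. "12_3_6_1"
    texts_by_stem = {PurePosixPath(t).stem: t for t in text_files}
    lines = []
    for pagenum, image in enumerate(image_files, start=1):
        # An empty string leaves the text field blank
        text = texts_by_stem.get(PurePosixPath(image).stem, "")
        lines.append(f"{pagenum}:{image}:{text}::")
    return lines


# Assumed file names, following the pattern of the first page entry
images = [f"images/12_3_6_{n}.gif" for n in range(1, 5)]
texts = [f"text/12_3_6_{n}.txt" for n in range(1, 5)]
for line in page_entries(images, texts):
    print(line)
```

The printed lines can be pasted below the metadata lines in the item file.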
     2756<NumberedItem>
     2757<Text id="sc17">Save the file using <b>Filename</b> <AutoText text="12_3_6.item" type="italics"/>, and <b>Save as type</b> <i>All files</i>. (Don't save as type <AutoText text=".txt" type="italics"/> as this will save the file as <AutoText text="12_3_6.item.txt"/>.)  Back in the <AutoText key="glidict::GUI.Gather"/> panel of the Librarian Interface, locate the new file in the <b>Workspace</b> tree, and drag it into the collection, adding it to the <AutoText text="12"/> folder.</Text>
     2758</NumberedItem>
     2759<NumberedItem>
     2760<Text id="sc18"><b>Build</b> the collection and <b>preview</b>. Check that your new document has been added.</Text>
     2761</NumberedItem>
     2762<Heading>
     2763<Text id="sc19">XML based item file</Text>
     2764</Heading>
     2765<Text id="sc20">There are two styles of item files. The first, which was used in the previous section, uses a simple text based format, and consists of a list of metadata for the document, and a list of pages. This format allows specification of document level metadata, and a single list of pages.</Text>
     2766<Text id="sc21">The second style is an extended format, and uses XML. It allows a hierarchy of pages, and metadata specification at the page level as well as at the document level. In this section, we add in two newspapers which use XML-based item files.</Text>
     2767<NumberedItem>
     2768<Text id="sc22">In the <AutoText key="glidict::GUI.Gather"/> panel, add the folder <Path>sample_files &rarr; niupepa &rarr; new_papers &rarr; xml</Path> to your collection. </Text>
     2769</NumberedItem>
     2770<NumberedItem>
     2771<Text id="sc23">Open up the file <Path>xml &rarr; 23 &rarr; 23__1.item</Path> and have a look at the XML. This is <AutoText text="Number"/> <AutoText text="1" type="italics"/> of <AutoText text="Series"/> <AutoText text="Matariki 1881" type="italics"/>. The contents of this document have been grouped into two sections: <AutoText text="Supplementary Material"/>, which contains an <AutoText text="Abstract"/>, and <AutoText text="Newspaper Pages"/>, which contains the page images (and OCR text). </Text>
     2772</NumberedItem>
     2773<NumberedItem>
     2774<Text id="sc24">Build and preview the collection. The XML-style items have been included, but the display is not very nice.</Text>
     2775</NumberedItem>
     2776<Heading>
     2777<Text id="sc24a">Using <AutoText text="process_exp"/> to control document processing</Text>
     2778</Heading>
     2779<NumberedItem>
     2780<Text id="sc25">Paged documents can be presented with a hierarchical table of contents, or with next and previous page arrows and a goto page box (as we have done so far). The display type is specified by the <AutoText text="documenttype (hierarchy|paged)"/> option to <AutoText text="PagedImgPlug"/>. The next and previous arrows suit linear sequence documents, while the table of contents suits hierarchically organised documents. </Text>
     2781<Text id="sc25a">Ordinarily, a Greenstone collection would have one plugin per document type, and all documents of that type get the same processing. In this case, we want to treat the XML-based item files differently from the text-based item files. We can achieve this by adding two PagedImgPlug plugins to the collection, and configuring them differently.</Text>
     2782</NumberedItem>
     2783<NumberedItem>
     2784<Text id="sc26">Close the collection in the Librarian Interface. It will not let you add two of the same plugin (apart from <AutoText text="UnknownPlug"/>), so the second <AutoText text="PagedImgPlug"/> must be added to the collect.cfg file manually.</Text>
     2785</NumberedItem>
     2786<NumberedItem>
     2787<Text id="sc27">Open the file <Path>greenstone &rarr; collect &rarr; pagedimg &rarr; etc &rarr; collect.cfg</Path> in a text editor. Copy the <AutoText text="plugin PagedImgPlug"/> line and paste it above the existing one. Edit the first one so that the two plugins look like:</Text>
     2788<Format>
     2789plugin PagedImgPlug -screenview -minimumsize 100 -documenttype hierarchy -process_exp xml.*.item$<br/>
     2790plugin PagedImgPlug -screenview -minimumsize 100 -documenttype paged
     2791</Format>
     2792<Text id="sc28">The XML based newspapers have been grouped into a folder called <Path>xml</Path>. This enables us to process these files differently, by utilising the <AutoText text="process_exp"/> option which all plugins support. The first <AutoText text="PagedImgPlug"/> in the list looks for item files underneath the <Path>xml</Path> folder. These documents will be processed as hierarchical documents. Item files that don't match the process expression (i.e. aren't underneath the <Path>xml</Path> folder) will be passed on to the second <AutoText text="PagedImgPlug"/>, and these are treated as paged documents.</Text>
     2793<Text id="sc29"><b>Rebuild</b> and <b>preview</b> the collection. Compare the document display for a paged document e.g. <AutoText text="Te Waka o Te Iwi, Vol. 1, No. 1"/> with a hierarchical document, e.g. <AutoText text="Matariki 1881, No. 1"/>.</Text>
     2794</NumberedItem>
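The way the two plugin lines divide up the item files can be illustrated with a small Python sketch. This is a model of the behaviour described above, not Greenstone code. It assumes that the process expression is tested against the file's path, that the second plugin falls back to matching any file ending in .item, and that the example paths are representative; the function name document_type_for is hypothetical.

```python
import re

# The expression given to the first PagedImgPlug in collect.cfg
FIRST_PLUGIN_EXP = re.compile(r"xml.*.item$")


def document_type_for(path):
    """Return the documenttype a file would get, or None if neither plugin takes it."""
    if FIRST_PLUGIN_EXP.search(path):
        return "hierarchy"   # claimed by the first plugin
    if path.endswith(".item"):
        return "paged"       # assumed default behaviour of the second plugin
    return None              # e.g. image and text files, handled via their item file


for example in ("import/xml/23/23__1.item",
                "import/12/12_3_6.item",
                "import/12/images/12_3_6_1.gif"):
    print(example, document_type_for(example))
```

Note that the dot before item in the expression is unescaped, so it matches any character; the expression is looser than it looks, although that makes no difference for the files in this exercise.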
     2795<Heading>
     2796<Text id="sc30">Switching between images and text</Text>
     2797</Heading>
     2798<Text id="sc31">We can modify the document display to switch between the text version and the screenview and full size versions. We do this using a combination of format statements and macro files.</Text>
     2799<NumberedItem>
     2800<Text id="sc32">First, copy the new macro file into the collection. Copy <Path>sample_files &rarr; niupepa &rarr; extra.dm</Path> into the <Path>Greenstone &rarr; collect &rarr; pagedimg &rarr; macros</Path> folder.</Text>
     2801</NumberedItem>
     2802<NumberedItem>
     2803<Text id="sc33a">Back in the Librarian Interface, go to the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel.</Text>
     2804</NumberedItem>
     2805<NumberedItem>
     2806<Text id="sc33b">Select <AutoText text="AllowExtendedOptions"/> in the <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and tick <AutoText key="glidict::CDM.FormatManager.Enabled"/>. This allows us to use some extended formatting options.</Text>
     2807</NumberedItem>
     2808<NumberedItem>
     2809<Text id="sc33c">Select the <AutoText text="DocumentHeading"/> format item and set it to the following:</Text>
     2810<Format>
     2811&lt;center&gt;&lt;table width=_pagewidth_&gt;<br/>
     2812&lt;tr valign=top&gt;&lt;td&gt;{Or}{[parent(Top):Series],[Series]}&lt;/td&gt;&lt;/tr&gt;<br/>
     2813&lt;tr valign=top&gt;&lt;td&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;<br/>
     2814[DocumentButtonDetach][DocumentButtonHighlight]<br/>
     2815{If}{_cgiargp_ eq 'fullsize',<br/>
     2816{If}{[screenicon],_document:viewpreview_}<br/>
     2817{If}{[Text] ne \'This document has no text. \',_document:viewtext_},<br/>
     2818{If}{_cgiargp_ eq 'preview',{If}{[srcicon],_document:viewfullsize_}<br/>
     2819{If}{[Text] ne \'This document has no text. \',_document:viewtext_},<br/>
     2820{If}{[srcicon],_document:viewfullsize_}<br/>
     2821{If}{[screenicon],_document:viewpreview_}<br/>
     2822}}<br/>
     2823&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/td&gt;<br/>
     2824&lt;td&gt;[DocTOC]&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;
     2825</Format>
     2826<Text id="sc33d">This format statement can be copied from <Path>sample_files &rarr; niupepa &rarr; adv_doc_heading.txt</Path>. It is quite complicated. </Text>
     2827<Text id="sc33e"><Format>{Or}{[parent(Top):Series],[Series]}</Format> outputs the Series metadata. This is only stored at the top level of the document, so if we are at a subsection, we need to get it from the top level (<Format>[parent(Top):Series]</Format>).</Text>
     2828<Text id="sc33f"><Format>[DocumentButtonDetach][DocumentButtonHighlight]</Format> outputs the <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons.</Text>
     2829<Text id="sc33g"><Format>_document:viewpreview_, _document:viewfullsize_, _document:viewtext_</Format> are macros defined in <Path>extra.dm</Path> which output buttons for preview, fullsize and text versions, respectively.</Text>
     2830<Text id="sc33h">The set of nested <Format>{If}</Format> statements determines which buttons are output, depending on which option is currently selected and which options are available. For example, if the user is currently viewing the full sized image, then the fullsize image button is suppressed, and the preview and text buttons are displayed only if that information is available for the current page.</Text>
     2831</NumberedItem>
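The button logic of the nested If statements above can be restated as a short Python sketch. It is only a reading aid under stated assumptions: the function name and its boolean parameters are hypothetical stand-ins for the p argument, the presence of screenicon and srcicon, and the test that the page text is not the "no text" placeholder.

```python
def visible_buttons(p, has_screenicon, has_srcicon, has_text):
    """Return the buttons shown for a page, in the order the format statement emits them."""
    buttons = []
    if p == "fullsize":
        # Viewing the full size image: offer preview and text
        if has_screenicon:
            buttons.append("preview")
        if has_text:
            buttons.append("text")
    elif p == "preview":
        # Viewing the preview image: offer full size and text
        if has_srcicon:
            buttons.append("fullsize")
        if has_text:
            buttons.append("text")
    else:
        # Any other value of p means the text view: offer both image sizes
        if has_srcicon:
            buttons.append("fullsize")
        if has_screenicon:
            buttons.append("preview")
    return buttons


print(visible_buttons("fullsize", True, True, True))
print(visible_buttons("preview", True, True, False))
```

In each branch the button for the view currently on screen is left out, which is the behaviour the format statement aims for.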
     2832<NumberedItem>
     2833<Text id="sc34a">Select the <AutoText text="DocumentText"/> format statement and set it to:</Text>
     2834<Format>
     2835&lt;center&gt;&lt;table width=_pagewidth_&gt;&lt;tr&gt;&lt;td&gt;<br/>
     2836{If}{_cgiargp_ eq 'fullsize',[srcicon],<br/>
     2837{If}{_cgiargp_ eq 'preview',[screenicon],{If}{[Text] ne \'This document has no text. \',[Text]}}}<br/>
     2838&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;<br/>
     2839</Format>
     2840<Text id="sc34b">This format statement can be copied from <Path>sample_files &rarr; niupepa &rarr; adv_doc_text.txt</Path>. It changes the display based on the <AutoText text="p" type="quoted"/> argument (<Format>_cgiargp_</Format>). This is not used normally for document display, so we can use it here to switch between full size image (<Format>[srcicon]</Format>), preview size image (<Format>[screenicon]</Format>) and text (<Format>[Text]</Format>) versions of each page.</Text>
     2841</NumberedItem>
     2842<NumberedItem>
     2843<Text id="sc35">Preview the collection. View some of the documents. Once you have reached a newspaper page, you should get fullsize, preview and text options.</Text>
     2844</NumberedItem>
    19422845</Content>
    19432846</Tutorial>
     
    20162919<Format>&lt;h3&gt;[Subject]&lt;/h3&gt;</Format>
    20172920<Comment>
    2018 <Text id="0723">The document heading appears above the detach and no highlighting buttons when you get to a document in the collection. By default <AutoText text="DocumentHeading"/> displays the document's <AutoText key="metadata::ex.Title"/> metadata. In this particular set of OAI exported records, titles are filenames of JPEG images, and the filenames are particularly uninformative (for example, 01dla14). You can see them in the <AutoText key="glidict::GUI.Enrich"/> panel if you select an image in <Path>sample_small &rarr; oai &rarr; JCDLPICS &rarr; srcdocs</Path> and check its <AutoText key="metadata::ex.Filename"/> and <AutoText key="metadata::ex.Title"/> metadata. The above format statement displays <AutoText key="metadata::ex.Subject"/> metadata instead.</Text>
     2921<Text id="0723">The document heading appears above the <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons when you get to a document in the collection. By default <AutoText text="DocumentHeading"/> displays the document's <AutoText key="metadata::ex.Title"/> metadata. In this particular set of OAI exported records, titles are filenames of JPEG images, and the filenames are particularly uninformative (for example, 01dla14). You can see them in the <AutoText key="glidict::GUI.Enrich"/> panel if you select an image in <Path>sample_small &rarr; oai &rarr; JCDLPICS &rarr; srcdocs</Path> and check its <AutoText key="metadata::ex.Filename"/> and <AutoText key="metadata::ex.Title"/> metadata. The above format statement displays <AutoText key="metadata::ex.Subject"/> metadata instead.</Text>
    20192922</Comment>
    20202923</NumberedItem>
     
    21563059</NumberedItem>
    21573060<Comment>
    2158 <Text id="0770">If you browse by titles a-z, you will find 7 documents listed, though only 5 items were exported from DSpace. Two of the original items had alternative forms in their directory folder. DSpace plug-in options control what happens in such situations: the default is to treat them as separate Greenstone documents.</Text>
     3061<Text id="0770">If you browse by <AutoText key="coredm::_Global:labelTitle_" type="italics"/>, you will find 7 documents listed, though only 5 items were exported from DSpace. Two of the original items had alternative forms in their directory folder. DSpace plug-in options control what happens in such situations: the default is to treat them as separate Greenstone documents.</Text>
    21593062</Comment>
    21603063<Comment>
     
    22763179</Content>
    22773180</Tutorial>
     3181<Tutorial id="gems">
     3182<Title>
     3183<Text id="gems-1">Editing metadata sets</Text>
     3184</Title>
     3185<Content>
     3186<Text id="gems-2">GEMS (Greenstone Editor for Metadata Sets) can be used to modify existing metadata sets or create new ones.</Text>
     3187<Heading>
     3188<Text id="gems-3">Running GEMS</Text>
     3189</Heading>
     3190<NumberedItem>
     3191<Text id="gems-4">Start the Greenstone Editor for Metadata Sets (GEMS):</Text>
     3192<Text id="gems-5"><Menu>Start &rarr; All Programs &rarr; Greenstone Digital Library Software &rarr; Greenstone Editor for Metadata Sets</Menu></Text>
     3193</NumberedItem>
     3194<NumberedItem>
     3195<Text id="gems-6">A list of all the available metadata sets is shown on the left hand side. Explore these metadata sets, and see what elements belong to each set. Double click on a folder icon to open the set. A list of elements will be displayed.</Text>
     3196</NumberedItem>
     3197<Heading>
     3198<Text id="gems-7">Creating a new metadata set</Text>
     3199</Heading>
     3200<NumberedItem>
     3201<Text id="gems-8">In this exercise, we will create a new metadata set. In order to save time, we will base it on an existing one: Development Library Subset. From the <AutoText key="glidict::Menu.File"/> menu, select <AutoText key="glidict::Menu.File_New"/> (<AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_New"/>). A popup window appears: <AutoText key="glidict::GEMS.Add_Set"/>. Fill in the fields. Use <AutoText text="My Metadata Set" type="quoted"/> for the <AutoText key="glidict::GEMS.Name"/>, <AutoText text="my" type="quoted"/> for the <AutoText key="glidict::GEMS.Namespace"/>, and select "Development Library Subset Example Metadata" from the <AutoText key="glidict::GEMS.inheritMetadataSet"/> drop down list. Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
     3202</NumberedItem>
     3203<NumberedItem>
     3204<Text id="gems-9">A folder for <AutoText text="My Metadata Set"/> will appear in the metadata set list on the left. Double click the folder icon to see what elements it has. Since it was based on the Development Library Subset metadata set, it contains all the elements from that set.</Text>
     3205</NumberedItem>
     3206<Heading>
     3207<Text id="gems-10">Adding a new element to a metadata set</Text>
     3208</Heading>
     3209<NumberedItem>
     3210<Text id="gems-11">Right click on the <AutoText text="My Metadata Set (my)"/> item in the list of metadata sets, and choose <AutoText key="glidict::GEMS.Add_Element"/> from the menu that appears. In the popup window, type <AutoText text="Category" type="quoted"/> for the <AutoText key="glidict::GEMS.Name"/>, and click <AutoText key="glidict::General.OK" type="button"/>. The new element will appear in the list.</Text>
     3211</NumberedItem>
     3212<NumberedItem>
     3213<Text id="gems-12">Right click on the <AutoText text="my.Category"/> element and select <AutoText key="glidict::GEMS.Add_Attribute"/> from the menu. Select <AutoText text="definition"/> from the <AutoText key="glidict::GEMS.Name"/> drop down list, and enter <AutoText text="The category this resource belongs to" type="quoted"/> in the <AutoText key="glidict::GEMS.Values"/> box. The GLI uses the element definitions when displaying information about a metadata set.</Text>
     3214</NumberedItem>
     3215<NumberedItem>
     3216<Text id="gems-13">Save the new metadata set using <Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Save"/></Menu>, then close GEMS using <Menu><AutoText key="glidict::Menu.File"/> &rarr; <AutoText key="glidict::Menu.File_Exit"/></Menu>.</Text>
     3217</NumberedItem>
     3218</Content>
     3219</Tutorial>
    22783220</TutorialList>