Changeset 11968
- Timestamp: 2006-06-27T16:25:07+12:00
- File: 1 edited
trunk/gsdl-documentation/tutorials/xml-source/tutorial_en.xml
r11897 r11968 44 44 </NumberedItem> 45 45 <NumberedItem> 46 <Text id="0089">The InstallShield Wizard begins to install the UNAIDS pre-packaged collection. Select the English language .</Text>47 </NumberedItem> 48 <NumberedItem> 49 <Text id="0090"> Click the <b><next></b> button.</Text>50 </NumberedItem> 51 <NumberedItem> 52 <Text id="0091">Choose <b>Run from CD-ROM (standard) </b>as the setup type. This is the default and is already selected. Then click <b><next></b>.</Text>53 </NumberedItem> 54 <NumberedItem> 55 <Text id="0092">Click <b>< next> </b>again to install the UNAIDS collection in the default folder, which is <b>C:\Program Files\UNAIDS Library 2.0 [CD-ROM]</b>.</Text>46 <Text id="0089">The InstallShield Wizard begins to install the UNAIDS pre-packaged collection. Select the English language and click <b><OK></b>.</Text> 47 </NumberedItem> 48 <NumberedItem> 49 <Text id="0090">On the welcome screen, click the <b><Next></b> button.</Text> 50 </NumberedItem> 51 <NumberedItem> 52 <Text id="0091">Choose <b>Run from CD-ROM (standard)</b> as the setup type. This is the default and is already selected. 
Then click <b><Next></b>.</Text> 53 </NumberedItem> 54 <NumberedItem> 55 <Text id="0092">Click <b><Next></b> again to install the UNAIDS collection in the default folder, which is <b>C:\Program Files\UNAIDS Library 2.0 [CD-ROM]</b>.</Text> 56 56 <Comment> 57 57 <Text id="0093">Installation Wizard copies the required files from CD-ROM to disk</Text> … … 59 59 </NumberedItem> 60 60 <NumberedItem> 61 <Text id="0094">Click <b><OK </b>>to confirm completion of UNAIDS collection (twice).</Text>61 <Text id="0094">Click <b><OK></b> to confirm completion of UNAIDS collection (twice).</Text> 62 62 <Comment> 63 63 <Text id="0095">InstallShield quits—the UNAIDS Library is installed.</Text> … … 373 373 <Text id="0193">Installing Greenstone</Text> 374 374 </Title> 375 <Version initial="2.60" current="2.70 "/>375 <Version initial="2.60" current="2.70w"/> 376 376 <Content> 377 377 <Heading> … … 484 484 </Title> 485 485 <Prerequisite id="install_greenstone"/> 486 <Version initial="2.60" current="2.70 "/>486 <Version initial="2.60" current="2.70w"/> 487 487 <Content> 488 488 <Comment> … … 589 589 </Title> 590 590 <SampleFiles folder="hobbits"/> 591 <Version initial="2.60" current="2.70 "/>591 <Version initial="2.60" current="2.70w"/> 592 592 <Content> 593 593 <Comment> … … 678 678 </Content> 679 679 </Tutorial> 680 <Tutorial id=" large_html_collection">680 <Tutorial id="simple_image_collection"> 681 681 <Title> 682 <Text id="03 87">A large collection of HTML files—Tudor</Text>682 <Text id="0337">A simple image collection</Text> 683 683 </Title> 684 <SampleFiles folder=" tudor"/>685 <Version initial="2.60" current="2.70 "/>684 <SampleFiles folder="images"/> 685 <Version initial="2.60" current="2.70w"/> 686 686 <Content> 687 687 <NumberedItem> 688 <Text id="0388">Invoke the Greenstone Librarian Interface (from the Windows <i>Start</i> menu) and start a new collection called <b>tudor</b> (use the <AutoText key="glidict::Menu.File"/> menu). 
Fill out the pop-up dialog with appropriate values and leave <b>Dublin Core</b>, which is selected by default, as the metadata set.</Text> 689 </NumberedItem> 690 <NumberedItem> 691 <Text id="0389">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>tudor</Path> folder in <Path>sample_files</Path>.</Text> 692 </NumberedItem> 693 <NumberedItem> 694 <Text id="0390">Drag <Path>englishhistory.net</Path> from the left-hand side to the right to include it in your <b>tudor</b> collection.</Text> 695 </NumberedItem> 696 <NumberedItem> 697 <Text id="0391">Switch to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>.</Text> 698 </NumberedItem> 699 <NumberedItem> 700 <Text id="0392">When building has finished, <b>preview</b> the collection.</Text> 701 </NumberedItem> 702 <Heading> 703 <Text id="0392a">Extracting more metadata from the HTML</Text> 704 </Heading> 705 <NumberedItem> 706 <Text id="0393">The browsing facilities in this collection (<AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/>) are based entirely on extracted metadata. Return to the <AutoText key="glidict::GUI.Enrich"/> panel in the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text> 707 </NumberedItem> 708 <NumberedItem> 709 <Text id="0393a">Many HTML documents contain metadata in <Format><meta></Format> tags in the <Format><head></Format> of the page. Open up the <Path>englishhistory.net → tudor → monarchs → boleyn.html</Path> file by navigating to it in the tree on the left hand side, and double clicking it. This will open it in a web browser. View the HTML source of the page (<Menu>View → Source</Menu> in Internet Explorer, <Menu>View → Page Source</Menu> in Mozilla). 
You will notice that this page has <AutoText text="page_topic,content" type="italics"/> and <AutoText text="author" type="italics"/> metadata.</Text> 710 </NumberedItem> 711 <NumberedItem> 712 <Text id="0393b">By default, <AutoText text="HTMLPlug"/> only looks for Title metadata. Configure the plugin so that it looks for the other metadata too. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Switch on the <AutoText text="metadata_fields"/> option, and set the value to <AutoText text="Title,Author,Page_topic,Content" type="quoted"/>. Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 713 </NumberedItem> 714 <NumberedItem> 715 <Text id="0393c">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>rebuild</b> the collection. Go back to the <AutoText key="glidict::GUI.Enrich"/> panel and look at the extracted metadata for some of the HTML files in <Path>englishhistory.net → tudor → monarchs</Path>. The new metadata should new be visible.</Text> 716 </NumberedItem> 717 <Heading> 718 <Text id="0393d">Blocking the stray images</Text> 719 </Heading> 720 <Comment> 721 <Text id="0394">You've probably noticed that the collection contains a few stray image files, as well as the HTML documents. This is a mistake. The issue is that many of the HTML documents include images, and although Greenstone attempts to determine which images belong to HTML pages and only considers other images for inclusion in the collection, in this case it hasn't been completely successful. 
(This is because the web site from which these files were downloaded occasionally departs from the usual convention of hierarchical structuring.)</Text> 722 </Comment> 723 <NumberedItem> 724 <Text id="0395">Switch back to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Beside <AutoText text="plugin HTMLPlug"/> you will see <AutoText text="-smart_block"/>. This is the option that attempts to identify images in the HTML pages and block them from inclusion—in this case, it's not smart enough! <b>Configure</b> <AutoText text="plugin HTMLPlug"/> again, scroll down the page to locate the <AutoText text="smart_block"/> option, and switch it off.</Text> 725 </NumberedItem> 726 <NumberedItem> 727 <Text id="0396"><b>Rebuild</b> and <b>preview</b> the collection. The collection is exactly as before except that these stray images are suppressed. What is happening is that plug-ins operate as a pipeline: files are passed to each one in turn until one is found that can process it. By default (i.e. without <AutoText text="smart_block"/>) the HTML plug-in blocks <i>all</i> images, which is appropriate for this collection.</Text> 728 </NumberedItem> 729 <Heading> 730 <Text id="0397">Looking at different views of the files in the <AutoText key="glidict::GUI.Gather"/> and <AutoText key="glidict::GUI.Enrich"/> panels</Text> 731 </Heading> 732 <NumberedItem> 733 <Text id="0398">Switch to the <AutoText key="glidict::GUI.Gather"/> panel and in the right-hand side open <Path>englishhistory.net → tudor</Path>.</Text> 734 </NumberedItem> 735 <NumberedItem> 736 <Text id="0400">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu for the right-hand side from <AutoText key="glidict::Filter.All_Files"/> to <AutoText key="glidict::Filter.0"/>. 
Notice the files displayed above are filtered accordingly, to show only files of this type.</Text> 737 </NumberedItem> 738 <NumberedItem> 739 <Text id="0401">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu to <AutoText key="glidict::Filter.3"/>. Again, the files shown above alter.</Text> 740 </NumberedItem> 741 <NumberedItem> 742 <Text id="0402">Now return the <AutoText key="glidict::Filter.Filter_Tree"/> setting back to <AutoText key="glidict::Filter.All_Files"/>, otherwise you may get confused later. Remember, if the <AutoText key="glidict::GUI.Gather"/> or <AutoText key="glidict::GUI.Enrich"/> panels do not seem to be showing all your files, this could be the problem.</Text> 688 <Text id="0338">In the Librarian Interface, start a new collection (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>) called <b>backdrop</b>. Fill out the fields with appropriate information. For <AutoText key="glidict::NewCollectionPrompt.Base_Collection"/>, select the item <b>Simple image collection (image-e)</b> from the pull-down menu.</Text> 689 <Comment> 690 <Text id="0340a">When you base a collection on an existing one, it inherits all the settings of the old one. 
You won't be asked to choose a metadata set because the new collection inherits the ones (if any) used by the seed collection.</Text> 691 </Comment> 692 </NumberedItem> 693 <NumberedItem> 694 <Text id="0341">Copy the images provided in <Path>sample_files → images</Path> into your newly-formed collection.</Text> 695 </NumberedItem> 696 <NumberedItem> 697 <Text id="0342">Change to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> the collection.</Text> 698 </NumberedItem> 699 <NumberedItem> 700 <Text id="0343"><b>Preview</b> the result.</Text> 701 </NumberedItem> 702 <NumberedItem> 703 <Text id="0344">Click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar to view a list of the photos ordered by filename and presented as a thumbnail accompanied by some basic data about the image. The structure of this collection is the same as <b>Simple image collection (image-e)</b>, but the content is different.</Text> 704 </NumberedItem> 705 <NumberedItem> 706 <Text id="0345">Back in the Librarian Interface, change to the <AutoText key="glidict::GUI.Enrich"/> panel and view the extracted metadata for <Path>Bear.jpg</Path>.</Text> 707 </NumberedItem> 708 <Heading> 709 <Text id="0347">Adding a metadata set to the collection</Text> 710 </Heading> 711 <Comment> 712 <Text id="0346">We now add our own metadata and use it to give users a new way to browse the collection. We use the Dublin Core metadata set.</Text> 713 </Comment> 714 <NumberedItem> 715 <Text id="0348">The collection (image-e) on which <b>backdrop</b> is based uses only extracted metadata. To add another metadata set, go to the <AutoText key="glidict::GUI.Design"/> panel of the Librarian Interface and click <AutoText key="glidict::CDM.GUI.MetadataSets"/> in the list on the left (the last one). 
Then click <AutoText key="glidict::CDM.MetadataSetManager.Add" type="button"/> (lower left button).</Text> 716 </NumberedItem> 717 <NumberedItem> 718 <Text id="0349">In the window that pops up, select <AutoText text="dublin.mds"/> and click <AutoText key="glidict::CDM.MetadataSetManager.Chooser.Add" type="button"/>.</Text> 719 </NumberedItem> 720 <NumberedItem> 721 <Text id="0351">Now switch to the <AutoText key="glidict::GUI.Enrich"/> panel by clicking this tab. The metadata for each file now shows the (empty) Dublin Core <AutoText text="dc."/> fields as well as the extracted <AutoText text="ex."/> fields.</Text> 722 </NumberedItem> 723 <Heading> 724 <Text id="0350a">Adding Title and Description metadata</Text> 725 </Heading> 726 <NumberedItem> 727 <Text id="0352">We work with just the first three files (<Path>Bear.jpg</Path>, <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>) to get a flavour of what is possible. First, set each file's <AutoText key="metadata::dc.Title"/> field to be the same as its filename but without the filename extension:</Text> 728 <Text id="0353">Click on <Path>Bear.jpg</Path> so its metadata fields are available, then click on its <AutoText key="metadata::dc.Title"/> field on the right-hand side. Type in <b>Bear</b>.</Text> 729 <Text id="0355">Repeat the process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>.</Text> 730 </NumberedItem> 731 <NumberedItem> 732 <Text id="0355a">Add a description for each image as <AutoText key="metadata::dc.Description"/> metadata.</Text> 733 <Text id="0372">What description should you enter? To remind yourself of a file's content, the Librarian Interface lets you open files by double-clicking them. 
It launches the appropriate application based on the filename extension, Word for .doc files, Acrobat for .pdf files and so on.</Text> 734 <Text id="0372a">Double-click <Path>Bear.jpg</Path>: on Windows, the image will normally be displayed by Microsoft's Photo Editor (although this depends on how your computer has been set up).</Text> 735 <Text id="0373">Back in the <AutoText key="glidict::GUI.Enrich"/> pane, make sure that <Path>Bear.jpg</Path> is selected in the collection tree on the left hand side. Enter the text <b>Bear in the Rocky Mountains</b> as the value for the <AutoText key="metadata::dc.Description"/> field.</Text> 736 <Text id="0374">Repeat this process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>, adding a suitable description for each.</Text> 737 </NumberedItem> 738 <Heading> 739 <Text id="0357">Change Format Features to display new metadata</Text> 740 </Heading> 741 <NumberedItem> 742 <Text id="0356">Now we customize the collection's appearance. Building or previewing the collection at this point won't reveal anything new. That's because we haven't changed the design of the collection to take advantage of the new metadata.</Text> 743 </NumberedItem> 744 <NumberedItem> 745 <Text id="0358">Go to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Formats"/> from the left-hand list. Leave the feature selection controls at their default values, so that <AutoText key="glidict::CDM.FormatManager.Feature"/> remains blank and <AutoText text="VList" /> is selected as the <AutoText key="glidict::CDM.FormatManager.Part"/>. 
In the <AutoText key="glidict::CDM.FormatManager.Editor"/>, edit the text as follows:</Text> 746 <BulletList> 747 <Bullet> 748 <Text id="0359">Change <Format>_ImageName_:</Format> to <Format>Title:</Format></Text> 749 </Bullet> 750 <Bullet> 751 <Text id="0359a">Change <Format>[Image]</Format> to <Format>[dc.Title]</Format></Text> 752 </Bullet> 753 <Bullet> 754 <Text id="0359b">After <Format>[dc.Title]<br></Format> add <Format>Description: [dc.Description]<br></Format></Text> 755 </Bullet> 756 </BulletList> 757 <Comment> 758 <Text id="0360">Metadata names are case-sensitive in Greenstone: it is important that you capitalize "Title" and "Description" (and don't capitalize "dc").</Text> 759 </Comment> 760 </NumberedItem> 761 <NumberedItem> 762 <Text id="0361a">Next click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>. The new format statement will be displayed in the list of assigned format statements. The first substitution alters the fragment of text that appears to the right of the thumbnail image, the second alters the item of metadata that follows it. The addition displays the description after the Title.</Text> 763 </NumberedItem> 764 <NumberedItem> 765 <Text id="0362a">Go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>. Once it has finished building, <b>preview</b> the collection. When you click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar the presentation has changed to "Title: Bear" and so on. Each image's description should appear beside the thumbnail, following the title.</Text> 766 </NumberedItem> 767 <Comment> 768 <Text id="0363">After the first three items, the Title and Description become blank because we have only assigned Dublin Core metadata to these first three. 
To get a full listing, enter all the metadata.</Text> 769 </Comment> 770 <Comment> 771 <Text id="0364">For some design parameters the collection must be rebuilt before the effect of changes can be seen. However, changes to format statements take place immediately and you can see the result straightaway by clicking <b>reload</b> (or <b>refresh</b>) in the web browser. Above, you were asked to build before previewing because you had added metadata.</Text> 772 </Comment> 773 <Heading> 774 <Text id="0365">Changing the size of image thumbnails</Text> 775 </Heading> 776 <NumberedItem> 777 <Text id="0366">Lets change the size of the thumbnail image and make it smaller. Thumbnail images are created by the <AutoText text="ImagePlug"/> plug-in, so we need to access its configuration settings. To do this, switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Plugins"/> from the list on the left. Double-click <AutoText text="plugin ImagePlug"/> to pop up a window that shows its settings. (Alternatively, select <AutoText text="plugin ImagePlug"/> with a single click and then click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> further down the screen). Currently all options are off, so standard defaults are used. 
Select <AutoText text="thumbnailsize"/>, set it to <AutoText text="50"/>, and click <AutoText key="glidict::General.OK" type="button"/>.</Text> 778 </NumberedItem> 779 <NumberedItem> 780 <Text id="0367"><b>Build</b> and <b>preview</b> the collection.</Text> 781 </NumberedItem> 782 <NumberedItem> 783 <Text id="0368">Once you have seen the result of the change, return to the <AutoText key="glidict::GUI.Design"/> panel, select the configuration options for <AutoText text="ImagePlug"/>, and switch the <AutoText text="thumbnailsize"/> option off so that the thumbnail reverts to its normal size when the collection is re-built.</Text> 784 </NumberedItem> 785 <Heading> 786 <Text id="0380">Adding a browsing classifier based on Description metadata</Text> 787 </Heading> 788 <NumberedItem> 789 <Text id="0381">Now we'll add a new browsing option based on the descriptions. In the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText key="glidict::CDM.GUI.Classifiers"/> from the left-hand list. Set the menu item for <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> to <AutoText text="AZList" />; then click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>.</Text> 790 </NumberedItem> 791 <NumberedItem> 792 <Text id="0382">A window pops up to control the classifier's options. Set the <AutoText text="metadata"/> option to <AutoText key="metadata::dc.Description"/> and click <AutoText key="glidict::General.OK" type="button"/>.</Text> 793 </NumberedItem> 794 <NumberedItem> 795 <Text id="0382a"><b>Build</b> the collection, and <b>preview</b> it. Choose the new <b>descriptions</b> link that appears in the navigation bar.</Text> 796 </NumberedItem> 797 <Comment> 798 <Text id="0383">Only three items are shown, because only items with the relevant metadata (dc.Description in this case) appear in the list. 
The original browse list includes all photos in the collection because it is based on <AutoText key="metadata::ex.Image"/>, extracted metadata that reflects an image's filename, which is set for all images in the collection.</Text> 799 </Comment> 800 <Heading> 801 <Text id="0384">Creating a searchable index based on Description metadata</Text> 802 </Heading> 803 <NumberedItem> 804 <Text id="0385">Now we'll add an index so that the collection can be searched by descriptions. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Indexes"/> from the left-hand list. Enter the text "descriptions" as the <AutoText key="glidict::CDM.IndexManager.Index_Name"/>, select <AutoText key="metadata::dc.Description"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> list, and click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>.</Text> 805 </NumberedItem> 806 <NumberedItem> 807 <Text id="0386">Switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, then <b>preview</b> it. There is now a <AutoText key="coredm::_Global:labelSearch_"/> button in the navigation bar. As an example, search for the term "bear" in the <i>descriptions</i> index (which is the only index at this point).</Text> 743 808 </NumberedItem> 744 809 </Content> … … 749 814 </Title> 750 815 <SampleFiles folder="Word_and_PDF"/> 751 <Version initial="2.60" current="2.70 "/>816 <Version initial="2.60" current="2.70w"/> 752 817 <Content> 753 818 <Comment> … … 763 828 <Text id="0287">Switch to the <AutoText key="glidict::GUI.Create"/> panel, and <b>build</b> and <b>preview</b> the collection.</Text> 764 829 </NumberedItem> 765 <Comment>766 <Text id="0287a">Some of the documents don't look very nice in Greenstone. One of them, <Path>pdf05-notext.pdf</Path>, could not be processed using the default configuration. Another, <Path>pdf06-weirdchars.pdf</Path>, was processed but looks very strange. 
Exercise <TutorialRef id="enhanced_pdf"/> looks at how to configure PDFPlug to handle these files better.</Text>767 </Comment>768 830 <Heading> 769 831 <Text id="0287b">Viewing the extracted metadata</Text> … … 785 847 </Heading> 786 848 <NumberedItem> 787 <Text id="0291a">In the <AutoText key="glidict::GUI.Enrich"/> panel, manually add Dublin Core <AutoText key="metadata::dc.Title"/> metadata to those documents which have incorrect <AutoText key="metadata::ex.Title"/> metadata. Select <Path>word03.doc</Path> and double-click to open it. Copy the title of this document (<AutoText text="Greenstone: A comprehensive open-source digital library software system" type="quoted"/>) and return to the Librarian Interface. Scroll up or down in the metadata table until you can see <AutoText key="metadata::dc.Title"/>. Click in the value box , paste in the metadata and press <b>Enter</b>.</Text>849 <Text id="0291a">In the <AutoText key="glidict::GUI.Enrich"/> panel, manually add Dublin Core <AutoText key="metadata::dc.Title"/> metadata to those documents which have incorrect <AutoText key="metadata::ex.Title"/> metadata. Select <Path>word03.doc</Path> and double-click to open it. Copy the title of this document (<AutoText text="Greenstone: A comprehensive open-source digital library software system" type="quoted"/>) and return to the Librarian Interface. Scroll up or down in the metadata table until you can see <AutoText key="metadata::dc.Title"/>. Click in the value box and paste in the metadata.</Text> 788 850 </NumberedItem> 789 851 <NumberedItem> … … 791 853 </NumberedItem> 792 854 <NumberedItem> 793 <Text id="0292a">Close the document when you have finished copying metadata from it. 
External programs opened when viewing documents must be closed before building the collection, otherwise errors can occur.</Text>794 </NumberedItem> 795 <NumberedItem> 796 <Text id="0293">Next add <AutoText key="metadata::dc.Title"/> and <AutoText key="metadata::dc.Creator"/> metadata for a few of the other documents , including <Path>pdf05-notext.pdf</Path>.</Text>855 <Text id="0292a">Close the document (in Microsoft Word) when you have finished copying metadata from it. External programs opened when viewing documents must be closed before building the collection, otherwise errors can occur.</Text> 856 </NumberedItem> 857 <NumberedItem> 858 <Text id="0293">Next add <AutoText key="metadata::dc.Title"/> and <AutoText key="metadata::dc.Creator"/> metadata for a few of the other documents.</Text> 797 859 </NumberedItem> 798 860 <NumberedItem> … … 910 972 <Prerequisite id="word_pdf_collection"/> 911 973 <Content> 974 <Comment> 975 <Text id="fw-1a">In this exercise, we play around with the format statements in the Word and PDF collection.</Text> 976 </Comment> 912 977 <NumberedItem> 913 978 <Text id="fw-2">Open the <b>reports</b> collection in the Librarian Interface and go to the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel.</Text> … … 917 982 </Heading> 918 983 <NumberedItem> 919 <Text id="fw-3 ">Greenstone's default format statement is complex because it is designed to produce something reasonable under almost any conditions, and also because for practical reasons it needs to be backwards compatible with legacy collections.</Text>920 984 <Text id="fw-3a">In this part of the exercise, we make the format statement simpler without changing the resulting display.</Text> 985 <Text id="fw-3">Greenstone's default format statement is complex because it is designed to produce something reasonable under almost any conditions, and also because for practical reasons it needs to be backwards compatible with legacy 
collections. For this collection, we don't need all of the complexity.</Text> 921 986 <Text id="fw-4">The default <AutoText text="VList"/> format statement looks like the following:</Text> 922 987 <Format> … … 927 992 [/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td> 928 993 </Format> 929 <Text id="fw-5">This format statement is the default used for search results, classifiers, and document table of contents. First we will tidy this up a bit. </Text> 930 994 <Text id="fw-5">This format statement is the default used for any vertical list, such as search results, classifiers, and document table of contents.</Text> 931 995 <Text id="fw-6"><Format>{Or}{[ex.thumbicon],[ex.srcicon]}</Format> chooses <i>ex.thumbicon</i> metadata if its there, otherwise chooses <i>ex.srcicon</i> metadata. If neither are present, nothing is displayed. For this collection there is no <i>ex.thumbicon</i> metadata so the choice is not needed.</Text> 932 933 996 <Text id="fw-7">Replace <Format>{Or}{[ex.thumbicon],[ex.srcicon]}</Format> with <Format>[ex.srcicon]</Format>. </Text> 934 935 997 <Text id="fw-8">There is no <i>dls.Title</i> metadata, so remove that element from <Format>{Or}{[dls.Title],[dc.Title],[ex.Title],Untitled}</Format>.</Text> 936 937 998 <Text id="fw-9">The resulting format statement looks like the following:</Text> 938 999 <Format> … … 943 1004 </Format> 944 1005 <Text id="fw-9a">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 945 <Text id="fw-10">Preview the collection to make sure the display hasn't changed.</Text> 946 947 </NumberedItem> 948 <Heading> 949 <Text id="fw-10a">Linking to Greenstone version or original version</Text> 1006 <Text id="fw-10">Preview the collection to make sure the display hasn't changed. You shouldn't notice any difference when looking at search results, classifiers etc. 
</Text> 1007 </NumberedItem> 1008 <Heading> 1009 <Text id="fw-10a">Linking to Greenstone version or original version of documents</Text> 950 1010 </Heading> 951 1011 <NumberedItem> 952 1012 <Text id="fw-11">For collections with documents that undergo a conversion process during importing (e.g. Word, PDF, PowerPoint documents, but not text, HTML documents), the original file is stored in the collection along with the converted version. The default <AutoText text="VList"/> format statement links to both versions:</Text> 953 954 1013 <Text id="fw-12"><Format>[link][icon][/link]</Format> links to the Greenstone HTML version, while <Format>[srclink][srcicon][/srclink]</Format> links to the original.</Text> 955 956 <Text id="fw-13">Choose <AutoText text="SearchVList"/> in <AutoText key="glidict::CDM.GUI.Formats"/> by selecting <AutoText text="Search"/> from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Experiment with removing either of the two links from the format statement. Storing and displaying the original allows users to see the correct format, but requires the user to have the relevant program installed. It also increases the size of the collection. The Greenstone version can be viewed in a browser, but may not look as nice.</Text> 957 958 </NumberedItem> 959 <Heading> 960 <Text id="fw-13a">Making bookshelves show how many items they contain</Text> 961 </Heading> 962 <NumberedItem> 963 <Text id="fw-14">Next, we'll customize the format for the <AutoText key="coredm::_labelCreator_" type="italics"/> list. Classifier nodes have only a few pieces of metadata to display: <Format>[ex.Title]</Format> and <Format>[numleafdocs]</Format>. Whatever metadata the classifier has been built on, the node label is always stored as <Format>[ex.Title]</Format>. 
This is why a Creator is printed out for each bookshelf node even though <i>dc.Creator</i> is not specified in the format statement. <Format>[numleafdocs]</Format> is only defined for bookshelf nodes, so this metadata can be used in an <Format>{If}</Format> statement to make bookshelf nodes and document nodes display differently.</Text> 964 965 </NumberedItem> 966 <NumberedItem> 967 <Text id="fw-15">Make each bookshelf node in the Creator classifier show how many entries it contains. In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the dc.Creator <AutoText text="AZCompactList"/> classifier from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Append the following: </Text> 1014 <Text id="fw-13">Choose <AutoText text="SearchVList"/> in <AutoText key="glidict::CDM.GUI.Formats"/> by selecting <AutoText text="Search"/> from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/> to add the <AutoText text="SearchVList"/> format statement into the list of assigned formats. Experiment with removing either of the two links from the format statement. (Remember to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> after any changes.)</Text> 1015 <Text id="fw-13a">To see the results of your changes, preview the collection and do a search. You are making changes to <AutoText text="SearchVList"/>, which means the changes will only apply to search results.</Text> 1016 <Text id="fw-13b">Storing and displaying the original allows users to see the correct format, but requires the user to have the relevant program installed. It also increases the size of the collection. 
The Greenstone version can be viewed in a browser, but may not look as nice.</Text> 1017 </NumberedItem> 1018 <Heading> 1019 <Text id="fw-14a">Making bookshelves show how many items they contain</Text> 1020 </Heading> 1021 <NumberedItem> 1022 <Text id="fw-14">Next, we'll customize the format for the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list. Classifier bookshelves have only a few pieces of metadata to display: <Format>[ex.Title]</Format> and <Format>[numleafdocs]</Format>. Whatever metadata the classifier has been built on, the bookshelf label is always stored as <Format>[ex.Title]</Format>. This is why a Creator is printed out for each bookshelf even though <Format>[dc.Creator]</Format> is not specified in the format statement. <Format>[numleafdocs]</Format> is only defined for bookshelves, so this metadata can be used in an <Format>{If}</Format> statement to make bookshelves and documents display differently in the list.</Text> 1023 <Text id="fw-15">Make each bookshelf in the Creator classifier show how many entries it contains. In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="CL2 AZCompactList"/> classifier which is based on <AutoText key="metadata::dc.Creator"/> metadata from the <AutoText key="glidict::CDM.FormatManager.Feature"/> drop down list, and <AutoText text="VList"/> from the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Click the <AutoText key="glidict::CDM.FormatManager.Add" type="button"/> button to add this format into the list of assigned formats. 
Note that it gets added as <AutoText text="CL2VList"/> in this list: it's the <AutoText text="VList"/> format for the second (<AutoText text="CL2"/>) classifier.</Text> 1024 <Text id="fw15a">Append the following text and click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>:</Text> 968 1025 <Format> 969 1026 {If}{[numleafdocs],<td><i>([numleafdocs])</i></td>} 970 1027 </Format> 971 <Text id="fw-16">Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>, switch to the <AutoText key="glidict::GUI.Create"/> panel, and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/> (no need to rebuild). Preview the <AutoText key="coredm::_labelCreator_" type="italics"/> list.</Text>972 <Text id="fw-17">This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf. Since only bookshelf nodes define <Format>[numleafdocs]</Format>, only these nodes will display this. By modifying <AutoText text="CL2VList"/> instead of <AutoText text="VList"/>, the change will only apply to the second classifier (Creators).</Text>1028 <Text id="fw-16">Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>, switch to the <AutoText key="glidict::GUI.Create"/> panel, and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/> (no need to rebuild). Click on the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list and notice that the bookshelves now display how many documents they contain.</Text> 1029 <Text id="fw-17">This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf. Since only bookshelves define <Format>[numleafdocs]</Format>, only they will display this. 
By modifying <AutoText text="CL2VList"/> instead of <AutoText text="VList"/>, the change will only apply to the second classifier (Creators).</Text> 973 1030 </NumberedItem> 974 1031 <Heading> … … 976 1033 </Heading> 977 1034 <NumberedItem> 978 <Text id="fw-18">Next we modify the document nodes in the Creator classifier to display all authors. Back in <AutoText key="glidict::CDM.GUI.Formats"/>, select the <AutoText text="CL2VList"/> format in the list of assigned formats. After <Format>{If}{[ex.Source],<br></Format> in the format statement, add <Format>[sibling:dc.Creator]</Format>.</Text>979 <Text id="fw-19"><Format>[ex.Source]</Format> is not defined for bookshel f nodes, so can also be used to differentiate bookshelves and documents.</Text>1035 <Text id="fw-18">Next we modify the document entries in the Creator classifier to display all authors. Back in <AutoText key="glidict::CDM.GUI.Formats"/>, select the <AutoText text="CL2VList"/> format in the list of assigned formats. After <Format>{If}{[ex.Source],<br></Format> in the format statement, add <Format>[sibling:dc.Creator]</Format>. Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 1036 <Text id="fw-19"><Format>[ex.Source]</Format> is not defined for bookshelves, so can also be used to differentiate bookshelves and documents.</Text> 980 1037 <Text id="fw-20">The resulting format statement looks like:</Text> 981 1038 <Format> … … 988 1045 {If}{[numleafdocs],<td><i>([numleafdocs])</i></td>} 989 1046 </Format> 990 <Text id="fw-21">This will display the Greenstone link, the link to the original, then the Title. For bookshelf nodes, it will also display how many documents the bookshelf contains. For document nodes, it will display all the Authors (Creators), and the source document. <Format>[sibling:dc.Creator]</Format> displays all the Creator metadata for the document, separated by a space (<AutoText text=" " type="quoted"/>). 
Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list.</Text> 991 <Text id="fw-22">Change the separator between the authors. Modify the format statement, and replace <Format>[sibling:dc.Creator]</Format> with <Format>[sibling(All'<br/>'):dc.Creator]</Format>. This will add a new line after each author. Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list.</Text> 992 <Text id="fw-23">If you have done exercise <TutorialRef id="enhanced_word"/>, the collection will have both dc.Creator and ex.Creator metadata. To display both, you can use <Format>[sibling:dc.Creator] [sibling:ex.Creator]</Format>, or to display dc.Creator if its present, otherwise display ex.Creator, use <Format>{Or}{[sibling:dc.Creator],[sibling:ex.Creator]}</Format>.</Text> 1047 <Text id="fw-21">This will display the Greenstone link, the link to the original, then the Title. For bookshelves, it will also display how many documents the bookshelf contains. For documents, it will display all the Authors (Creators), and the source document. <Format>[sibling:dc.Creator]</Format> displays all the Creator metadata for the document, separated by a space (<AutoText text=" " type="quoted"/>). Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list and make sure that all authors are displayed for documents.</Text> </NumberedItem> 1048 <NumberedItem> 1049 <Text id="fw-22">You can change the separator between the authors. Modify the format statement, and replace <Format>[sibling:dc.Creator]</Format> with <Format>[sibling(All'<br/>'):dc.Creator]</Format>. This will add a new line after each author (<Format><br/></Format> specifies a line break in HTML). Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>. 
Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list.</Text> 1050 <Text id="fw-23">If you have done exercise <TutorialRef id="enhanced_word"/>, the collection will have both dc.Creator and ex.Creator metadata. To display both, you can use </Text> 1051 <Format> 1052 [sibling:dc.Creator] [sibling:ex.Creator] 1053 </Format> 1054 <Text id="fw-23a">To display dc.Creator if its present, otherwise display ex.Creator, use</Text> 1055 <Format> 1056 {Or}{[sibling:dc.Creator],[sibling:ex.Creator]} 1057 </Format> 993 1058 </NumberedItem> 994 1059 </Content> … … 998 1063 <Text id="ep-1">Enhanced PDF handling</Text> 999 1064 </Title> 1000 < Prerequisite id="word_pdf_collection"/>1001 <Version initial="2.70" current="2.70 "/>1065 <SampleFiles folder="Word_and_PDF"/> 1066 <Version initial="2.70" current="2.70w"/> 1002 1067 <Content> 1003 1068 <Text id="ep-2">Greenstone converts PDF files to HTML using third-party software: <AutoText text="pdftohtml.pl" type="italics"/>. This lets users view these documents even if they don't have the PDF software installed. Unfortunately, sometimes the formatting of the resulting HTML files is not so good.</Text> 1004 1069 <Text id="ep-3">This exercise explores some extra options to the PDF plugin which may produce a nicer version for display. Some of these options use the standard pdftohtml program, others use ImageMagick and Ghostscript to convert the file to a series of images. Ghostscript is a program that can convert Postscript and PDF files to other formats. You can download it from <Link>http://www.cs.wisc.edu/~ghost/</Link> (follow the link to the current stable release).</Text> 1005 1070 <NumberedItem> 1006 <Text id="ep-3a">In the Librarian Interface, open up the <b>reports</b> collection created in the <TutorialRef id="word_pdf_collection"/> exercise. Rebuild the collection and examine the output. You will notice that one of the documents could not be processed. 
The following messages are shown: "The file pdf05-notext.pdf was recognised but could not be processed by any plugin.", and "15 documents were processed and included in the collection. 1 was rejected".</Text> 1007 </NumberedItem> 1008 <NumberedItem> 1009 <Text id="ep-4">Preview the collection and view the documents. <Path>pdf05-notext.pdf</Path> does not appear. Note that the other PDF documents appear as one long document, with no sections. </Text> 1071 <Text id="ep-3a">In the Librarian Interface, start a new collection called "PDF collection" and base it on <AutoText key="glidict::NewCollectionPrompt.NewCollection"/>.</Text> 1072 <Text id="ep-3b">In the <AutoText key="glidict::GUI.Gather"/> panel, drag just the PDF documents from <Path>sample_files → Word_and_PDF → Documents</Path> into the new collection. Also drag in the PDF documents from <Path>sample_files → Word_and_PDF → difficult_pdf</Path>.</Text> 1073 <Text id="ep-3c">Go to the <AutoText key="glidict::GUI.Create"/> panel and build the collection. Examine the output from the build process. You will notice that one of the documents could not be processed. The following messages are shown: "The file pdf05-notext.pdf was recognised but could not be processed by any plugin.", and "15 documents were processed and included in the collection. 1 was rejected".</Text> 1074 </NumberedItem> 1075 <NumberedItem> 1076 <Text id="ep-4">Preview the collection and view the documents. <Path>pdf05-notext.pdf</Path> does not appear as it could not be processed. <Path>pdf06-weirdchars.pdf</Path> was processed but looks very strange. The other PDF documents appear as one long document, with no sections. 
</Text> 1010 1077 </NumberedItem> 1011 1078 <Heading> … … 1016 1083 </Comment> 1017 1084 <NumberedItem> 1018 <Text id="0335">Use the <AutoText key="glidict::Menu.File_Options"/> item on the <AutoText key="glidict::Menu.File"/> menu to switch to <AutoText key="glidict::Preferences.Mode.Expert"/> mode and then build the collection again. The <AutoText key="glidict::GUI.Create"/> panel looks different in <AutoText key="glidict::Preferences.Mode.Expert"/> mode because it gives more options: locate the <AutoText key="glidict::CreatePane.Build_Collection" type="button"/> button, near the bottom of the window, and click it. Now a message appears saying that the file could not be processed, and why. Amongst all the output, we get the following message: "Error: PDF contains no extractable text. Could not convert pdf05notext.pdf to HTML format". pdftohtml.pl toconvert a PDF file to HTML if the PDF file has no extractable text.</Text>1085 <Text id="0335">Use the <AutoText key="glidict::Menu.File_Options"/> item on the <AutoText key="glidict::Menu.File"/> menu to switch to <AutoText key="glidict::Preferences.Mode.Expert"/> mode and then build the collection again. The <AutoText key="glidict::GUI.Create"/> panel looks different in <AutoText key="glidict::Preferences.Mode.Expert"/> mode because it gives more options: locate the <AutoText key="glidict::CreatePane.Build_Collection" type="button"/> button, near the bottom of the window, and click it. Now a message appears saying that the file could not be processed, and why. Amongst all the output, we get the following message: "Error: PDF contains no extractable text. Could not convert pdf05notext.pdf to HTML format". pdftohtml.pl cannot convert a PDF file to HTML if the PDF file has no extractable text.</Text> 1019 1086 </NumberedItem> 1020 1087 <NumberedItem> … … 1039 1106 </NumberedItem> 1040 1107 <NumberedItem> 1041 <Text id="ep-14"> Build the collection and preview. 
All PDF documents have been processed and divided into sections, but each section displays <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. For the conversion to images for PDF documents, no text is extracted. </Text>1108 <Text id="ep-14"><b>Build</b> the collection and <b>preview</b>. All PDF documents have been processed and divided into sections, but each section displays <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. For the conversion to images for PDF documents, no text is extracted. </Text> 1042 1109 </NumberedItem> 1043 1110 <NumberedItem> 1044 1111 <Text id="ep-15">In order to view the documents properly, you will need to modify the format statement. In the <AutoText key="glidict::CDM.GUI.Formats"/> section on the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. Replace </Text> 1045 1046 1112 <Format> 1047 1113 [Text] … … 1049 1115 <Text id="ep-16">with</Text> 1050 1116 <Format> 1117 [srcicon] 1118 </Format> 1119 </NumberedItem> 1120 <NumberedItem> 1121 <Text id="ep-18">Preview the collection. Images from the document are now displayed instead of the extracted text. Both <Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars.pdf</Path> display nicely now.</Text> 1122 <Comment> 1123 <Text id="ep-17">In this collection, we only have PDF documents and they have all been converted to images. If we had other document types in the collection, we should use a different format statement, such as:</Text> 1124 <Format> 1051 1125 {If}{[parent:FileFormat] eq PDF,[srcicon],[Text]} 1052 1126 </Format> 1053 1054 <Text id="ep-17">Because the other documents in the collection do not use images, we only want to show images for PDF documents. <AutoText text="FileFormat"/> is an extracted metadata item which shows the format of the source document. 
We use this to test whether the documents are PDF or not.</Text> 1055 1056 </NumberedItem> 1057 <NumberedItem> 1058 <Text id="ep-18">Preview the collection from the <AutoText key="glidict::GUI.Create"/> panel. (There is no need to build it). Images from the document are now displayed instead of the extracted text. Both <Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars.pdf</Path> display nicely now. Make sure that the word documents still display properly. </Text> 1127 <Text id="ep-17a"><AutoText text="FileFormat"/> is an extracted metadata item which shows the format of the source document. We can use this to test whether the documents are PDF or not: for PDF documents, display [srcicon], for other documents, display [Text].</Text> 1128 </Comment> 1059 1129 </NumberedItem> 1060 1130 <Heading> … … 1066 1136 <NumberedItem> 1067 1137 <Text id="ep-21">We achieve this by adding two <AutoText text="PDFPlug"/> plugins to the collection, with different options. Currently, the Librarian Interface does not allow you to add the same plugin twice to the collection (with the exception of <AutoText text="UnknownPlug"/>). You will need to edit the collection configuration file by hand.</Text> 1068 <Text id="ep-21a">Close the reports collection in the Librarian Interface. Then open <Path>Greenstone → collect → reports→ etc → collect.cfg</Path> using a text editor, e.g. WordPad. In the list of plugins, add another <AutoText text="PDFPlug"/>, i.e.</Text>1138 <Text id="ep-21a">Close the collection in the Librarian Interface. Then open <Path>Greenstone → collect → pdfcolle → etc → collect.cfg</Path> using a text editor, e.g. WordPad. In the list of plugins, add another <AutoText text="PDFPlug"/>, i.e.</Text> 1069 1139 <Format> 1070 1140 plugin PDFPlug … … 1075 1145 <NumberedItem> 1076 1146 <Text id="ep-23">Open up the collection again in the Librarian Interface, and go to the <AutoText key="glidict::GUI.Gather"/> panel. 
Make a new folder called <AutoText text="notext" type="quoted"/>: right click in the collection panel and select <AutoText key="glidict::CollectionPopupMenu.New_Folder"/> from the menu. Change the <AutoText key="glidict::NewFolderOrFilePrompt.Folder_Name"/> to <AutoText text="notext" type="quoted"/>, and click <AutoText key="glidict::General.OK" type="button"/>.</Text> 1077 <Text id="ep-23a">Move the two pdf files that have problems with html (<Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars</Path>.pdf 1147 <Text id="ep-23a">Move the two pdf files that have problems with html (<Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars</Path>.pdf) into this folder by drag and drop. We will set up the plugins so that PDF files in this <Path>notext</Path> folder are processed differently to the other PDF files.</Text> 1078 1148 </NumberedItem> 1079 1149 <NumberedItem> … … 1090 1160 plugin PDFPlug -convert_to html -use_sections 1091 1161 </Format> 1092 1093 1162 <Text id="ep-27">The <AutoText text="paged_img" type="italics"/> version must come earlier in the list than the <AutoText text="html" type="italics"/> version. The <AutoText text="process_exp"/> for the first <AutoText text="PDFPlug"/> will process any PDF files in the <Path>notext</Path> directory. The second <AutoText text="PDFPlug"/> will process any PDF files that are not processed by the first one.</Text> 1094 1095 1163 <Text id="ep-28">Note that all plugins have the <AutoText text="process_exp"/> option, and this can be used to customize which documents are processed by which plugin. This option is only visible in <AutoText key="glidict::Preferences.Mode.Systems"/> and <AutoText key="glidict::Preferences.Mode.Expert"/> modes.</Text> 1096 1164 <Text id="ep-29">Change back to <AutoText key="glidict::Preferences.Mode.Librarian"/> mode.</Text> 1097 1165 </NumberedItem> 1098 1166 <NumberedItem> 1099 <Text id="ep-30">Edit the <AutoText text="DocumentText"/> format statement. 
PDF files processed as HTML will not have images to display, so we need to make sure they get text displayed instead.</Text> 1100 <Text id="ep-1">Change the first <Format>[srcicon]</Format> element in the following part with <Format>{Or}{[srcicon],[Text]}</Format>, i.e. change</Text> 1101 <Format> 1102 {If}{[parent:FileFormat] eq PDF,[srcicon],[Text]} 1103 </Format> 1104 <Text id="ep-32">to</Text> 1105 <Format> 1106 {If}{[parent:FileFormat] eq PDF, {Or}{[srcicon],[Text]},[Text]} 1107 </Format> 1108 </NumberedItem> 1109 <NumberedItem> 1110 <Text id="ep-33">Build and preview the collection. All PDF documents should look relatively nice. Try searching this collection. You will be able to locate the PDFs that were converted to HTML (try e.g. <AutoText text="bibliography" type="quoted"/>), but not the ones that were converted to images (try searching for <AutoText text="banana" type="quoted"/> or <AutoText text="METS" type="quoted"/>).</Text> 1167 <Text id="ep-30">Edit the <AutoText text="DocumentText"/> format statement. PDF files processed as HTML will not have images to display, so we need to make sure they get text displayed instead. Change <Format>[srcicon]</Format> to <Format>{Or}{[srcicon],[Text]}</Format>.</Text> 1168 </NumberedItem> 1169 <NumberedItem> 1170 <Text id="ep-33">Build and preview the collection. All PDF documents should look relatively nice. Try searching this collection. You will be able to search for the PDFs that were converted to HTML (try e.g. 
<AutoText text="bibliography" type="quoted"/>), but not the ones that were converted to images (try searching for <AutoText text="banana" type="quoted"/> or <AutoText text="METS" type="quoted"/>).</Text> 1111 1171 </NumberedItem> 1112 1172 </Content> … … 1114 1174 <Tutorial id="enhanced_word"> 1115 1175 <Title> 1116 <Text id="ew- ">Enhanced Word document handling</Text>1176 <Text id="ew-a">Enhanced Word document handling</Text> 1117 1177 </Title> 1118 1178 <Content> … … 1235 1295 </Content> 1236 1296 </Tutorial> 1237 <Tutorial id="simple_image_collection">1238 <Title>1239 <Text id="0337">A simple image collection</Text>1240 </Title>1241 <SampleFiles folder="images"/>1242 <Version initial="2.60" current="2.70"/>1243 <Content>1244 <NumberedItem>1245 <Text id="0338">In the Librarian Interface, start a new collection (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>) called <b>backdrop</b>. Fill out the fields with appropriate information. For <AutoText key="glidict::NewCollectionPrompt.Base_Collection"/>, select the item <b>Simple image collection (image-e)</b> from the pull-down menu.</Text>1246 <Comment>1247 <Text id="0340a">When you base a collection on an existing one, it inherits all the settings of the old one. 
You won't be asked to choose a metadata set because the new collection inherits the ones (if any) used by the seed collection.</Text>1248 </Comment>1249 </NumberedItem>1250 <NumberedItem>1251 <Text id="0341">Copy the images provided in <Path>sample_files → images</Path> into your newly-formed collection.</Text>1252 </NumberedItem>1253 <NumberedItem>1254 <Text id="0342">Change to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> the collection.</Text>1255 </NumberedItem>1256 <NumberedItem>1257 <Text id="0343"><b>Preview</b> the result.</Text>1258 </NumberedItem>1259 <NumberedItem>1260 <Text id="0344">Click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar to view a list of the photos ordered by filename and presented as a thumbnail accompanied by some basic data about the image. The structure of this collection is the same as <b>Simple image collection (image-e)</b>, but the content is different.</Text>1261 </NumberedItem>1262 <NumberedItem>1263 <Text id="0345">Back in the Librarian Interface, change to the <AutoText key="glidict::GUI.Enrich"/> panel and view the extracted metadata for <Path>Bear.jpg</Path>.</Text>1264 </NumberedItem>1265 <Heading>1266 <Text id="0347">Adding a metadata set to the collection</Text>1267 </Heading>1268 <Comment>1269 <Text id="0346">We now add our own metadata and use it to give users a new way to browse the collection. We use the Dublin Core metadata set.</Text>1270 </Comment>1271 <NumberedItem>1272 <Text id="0348">The collection (image-e) on which <b>backdrop</b> is based uses only extracted metadata. To add another metadata set, go to the <AutoText key="glidict::GUI.Design"/> panel of the Librarian Interface and click <AutoText key="glidict::CDM.GUI.MetadataSets"/> in the list on the left (the last one). 
Then click <AutoText key="glidict::CDM.MetadataSetManager.Add" type="button"/> (lower left button).</Text>1273 </NumberedItem>1274 <NumberedItem>1275 <Text id="0349">In the window that pops up, select <AutoText text="dublin.mds"/> and click <AutoText key="glidict::CDM.MetadataSetManager.Chooser.Add" type="button"/>.</Text>1276 </NumberedItem>1277 <NumberedItem>1278 <Text id="0351">Now switch to the <AutoText key="glidict::GUI.Enrich"/> panel by clicking this tab. The metadata for each file now shows the (empty) Dublin Core <AutoText text="dc."/> fields as well as the extracted <AutoText text="ex."/> fields.</Text>1279 </NumberedItem>1280 <Heading>1281 <Text id="0350a">Adding Title and Description metadata</Text>1282 </Heading>1283 <NumberedItem>1284 <Text id="0352">We work with just the first three files (<Path>Bear.jpg</Path>, <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>) to get a flavour of what is possible. First, set each file's <AutoText key="metadata::dc.Title"/> field to be the same as its filename but without the filename extension:</Text>1285 <Text id="0353">Click on <Path>Bear.jpg</Path> so its metadata fields are available, then click on its <AutoText key="metadata::dc.Title"/> field on the right-hand side. Type in <b>Bear</b>, and click <b>Enter</b>.</Text>1286 <Text id="0355">Repeat the process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>.</Text>1287 </NumberedItem>1288 <NumberedItem>1289 <Text id="0355a">Add a description for each image as <AutoText key="metadata::dc.Description"/> metadata.</Text>1290 <Text id="0372">What description should you enter? To remind yourself of a file's content, the Librarian Interface lets you open files by double-clicking them. 
It launches the appropriate application based on the filename extension, Word for .doc files, Acrobat for .pdf files and so on.</Text>1291 <Text id="0372a">Double-click <Path>Bear.jpg</Path>: on Windows, the image will normally be displayed by Microsoft's Photo Editor (although this depends on how your computer has been set up).</Text>1292 <Text id="0373">Back in the <AutoText key="glidict::GUI.Enrich"/> pane, make sure that <Path>Bear.jpg</Path> is selected in the collection tree on the left hand side. Enter the text <b>Bear in the Rocky Mountains</b> as the value for the <AutoText key="metadata::dc.Description"/> field and press <b>Enter</b> to have it added.</Text>1293 <Text id="0374">Repeat this process for <Path>Cat.jpg</Path> and <Path>Cheetah.jpg</Path>, adding a suitable description for each.</Text>1294 </NumberedItem>1295 <Heading>1296 <Text id="0357">Change Format Features to display new metadata</Text>1297 </Heading>1298 <NumberedItem>1299 <Text id="0356">Now we customize the collection's appearance. Building or previewing the collection at this point won't reveal anything new. That's because we haven't changed the design of the collection to take advantage of the new metadata.</Text>1300 </NumberedItem>1301 <NumberedItem>1302 <Text id="0358">Go to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Formats"/> from the left-hand list. Leave the feature selection controls at their default values, so that <AutoText key="glidict::CDM.FormatManager.Feature"/> remains blank and <AutoText text="VList" /> is selected as the <AutoText key="glidict::CDM.FormatManager.Part"/>. 
In the <AutoText key="glidict::CDM.FormatManager.Editor"/>, edit the text as follows:</Text>1303 <BulletList>1304 <Bullet>1305 <Text id="0359">Change <Format>_ImageName_:</Format> to <Format>Title:</Format></Text>1306 </Bullet>1307 <Bullet>1308 <Text id="0359a">Change <Format>[Image]</Format> to <Format>[dc.Title]</Format></Text>1309 </Bullet>1310 <Bullet>1311 <Text id="0359b">After <Format>[dc.Title]<br></Format> add <Format>Description: [dc.Description]<br></Format></Text>1312 </Bullet>1313 </BulletList>1314 <Comment>1315 <Text id="0360">Metadata names are case-sensitive in Greenstone: it is important that you capitalize "Title" and "Description" (and don't capitalize "dc").</Text>1316 </Comment>1317 </NumberedItem>1318 <NumberedItem>1319 <Text id="0361a">Next click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>. The new format statement will be displayed in the list of assigned format statements. The first substitution alters the fragment of text that appears to the right of the thumbnail image, the second alters the item of metadata that follows it. The addition displays the description after the Title.</Text>1320 </NumberedItem>1321 <NumberedItem>1322 <Text id="0362a">Go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>. Once it has finished building, <b>preview</b> the collection. When you click on <AutoText key="coredm::_Global:labelBrwse_"/> in the navigation bar the presentation has changed to "Title: Bear" and so on. Each image's description should appear beside the thumbnail, following the title.</Text>1323 </NumberedItem>1324 <Comment>1325 <Text id="0363">After the first three items, the Title and Description become blank because we have only assigned Dublin Core metadata to these first three. 
To get a full listing, enter all the metadata.</Text>1326 </Comment>1327 <Comment>1328 <Text id="0364">For some design parameters the collection must be rebuilt before the effect of changes can be seen. However, changes to format statements take place immediately and you can see the result straightaway by clicking <b>reload</b> (or <b>refresh</b>) in the web browser. Above, you were asked to build before previewing because you had added metadata.</Text>1329 </Comment>1330 <Heading>1331 <Text id="0365">Changing the size of image thumbnails</Text>1332 </Heading>1333 <NumberedItem>1334 <Text id="0366">Lets change the size of the thumbnail image and make it smaller. Thumbnail images are created by the <AutoText text="ImagePlug"/> plug-in, so we need to access its configuration settings. To do this, switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Plugins"/> from the list on the left. Double-click <AutoText text="plugin ImagePlug"/> to pop up a window that shows its settings. (Alternatively, select <AutoText text="plugin ImagePlug"/> with a single click and then click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/> further down the screen). Currently all options are off, so standard defaults are used. 
Select <AutoText text="thumbnailsize"/>, set it to <AutoText text="50"/>, and click <AutoText key="glidict::General.OK" type="button"/>.</Text>1335 </NumberedItem>1336 <NumberedItem>1337 <Text id="0367"><b>Build</b> and <b>preview</b> the collection.</Text>1338 </NumberedItem>1339 <NumberedItem>1340 <Text id="0368">Once you have seen the result of the change, return to the <AutoText key="glidict::GUI.Design"/> panel, select the configuration options for <AutoText text="ImagePlug"/>, and switch the <AutoText text="thumbnailsize"/> option off so that the thumbnail reverts to its normal size when the collection is re-built.</Text>1341 </NumberedItem>1342 <Heading>1343 <Text id="0380">Adding a browsing classifier based on Description metadata</Text>1344 </Heading>1345 <NumberedItem>1346 <Text id="0381">Now we'll add a new browsing option based on the descriptions. In the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText key="glidict::CDM.GUI.Classifiers"/> from the left-hand list. Set the menu item for <AutoText key="glidict::CDM.ClassifierManager.Classifier"/> to <AutoText text="AZList" />; then click <AutoText key="glidict::CDM.ClassifierManager.Add" type="button"/>.</Text>1347 </NumberedItem>1348 <NumberedItem>1349 <Text id="0382">A window pops up to control the classifier's options. Set the <AutoText text="metadata"/> option to <AutoText key="metadata::dc.Description"/> and click <AutoText key="glidict::General.OK" type="button"/>.</Text>1350 </NumberedItem>1351 <NumberedItem>1352 <Text id="0382a"><b>Build</b> the collection, and <b>preview</b> it. Choose the new <b>descriptions</b> link that appears in the navigation bar.</Text>1353 </NumberedItem>1354 <Comment>1355 <Text id="0383">Only three items are shown, because only items with the relevant metadata (dc.Description in this case) appear in the list. 
The original browse list includes all photos in the collection because it is based on <AutoText key="metadata::ex.Image"/>, extracted metadata that reflects an image's filename, which is set for all images in the collection.</Text>1356 </Comment>1357 <Heading>1358 <Text id="0384">Creating a searchable index based on Description metadata</Text>1359 </Heading>1360 <NumberedItem>1361 <Text id="0385">Now we'll add an index so that the collection can be searched by descriptions. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select <AutoText key="glidict::CDM.GUI.Indexes"/> from the left-hand list. Enter the text "descriptions" as the <AutoText key="glidict::CDM.IndexManager.Index_Name"/>, select <AutoText key="metadata::dc.Description"/> from the <AutoText key="glidict::CDM.IndexManager.Source"/> list, and click <AutoText key="glidict::CDM.IndexManager.Add_Index" type="button"/>.</Text>1362 </NumberedItem>1363 <NumberedItem>1364 <Text id="0386">Switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, then <b>preview</b> it. There is now a <AutoText key="coredm::_Global:labelSearch_"/> button in the navigation bar. 
As an example, search for the term "bear" in the <i>descriptions</i> index (which is the only index at this point).</Text>1365 </NumberedItem>1366 </Content>1367 </Tutorial>1368 1297 <Tutorial id="export_to_CDROM"> 1369 1298 <Title> 1370 1299 <Text id="0403">Exporting a collection to CD-ROM/DVD</Text> 1371 1300 </Title> 1372 <Prerequisite id="large_html_collection"/> 1373 <Version initial="2.60" current="2.70"/> 1301 <Version initial="2.60" current="2.70w"/> 1374 1302 <Content> 1375 1303 <Comment> … … 1391 1319 </Content> 1392 1320 </Tutorial> 1393 <Tutorial id=" downloading_from_internet">1321 <Tutorial id="large_html_collection"> 1394 1322 <Title> 1395 <Text id="0 411">Downloading files from the web</Text>1323 <Text id="0387">A large collection of HTML files—Tudor</Text> 1396 1324 </Title> 1397 < Prerequisite id="large_html_collection"/>1398 <Version initial="2.60" current="2.70 "/>1325 <SampleFiles folder="tudor"/> 1326 <Version initial="2.60" current="2.70w"/> 1399 1327 <Content> 1400 <Comment> 1401 <Text id="0412">The Greenstone Librarian Interface's Download panel allows you to download individual files, parts of websites, and indeed whole websites, from the web.</Text> 1402 </Comment> 1403 <NumberedItem> 1404 <Text id="0413">Start a new collection called <b>webtudor</b>, and base it on <AutoText key="glidict::NewCollectionPrompt.NewCollection"/></Text> 1405 </NumberedItem> 1406 <NumberedItem> 1407 <Text id="0414">In a web browser, visit <Link>http://englishhistory.net</Link>, follow the link to <i>Tudor England</i>, and click <<b>Enter</b>>. You should be at the URL</Text> 1408 <Link>http://englishhistory.net/tudor/contents.html</Link> 1409 <Text id="0415">This is where we started the downloading process to obtain the files you have been using for the <b>tudor</b> collection. 
You could do the same thing by copying this URL from the web browser, pasting it into the <AutoText key="glidict::GUI.Download"/> panel, and clicking the <AutoText key="glidict::Mirroring.Download" type="button"/> button. However, several megabytes will be downloaded, which might strain your network resources—or your patience! For a faster exercise we focus on a smaller section of the site. </Text> 1410 </NumberedItem> 1411 <NumberedItem> 1412 <Text id="0415a">In the <AutoText key="glidict::GUI.Download"/> panel, enter this URL</Text> 1413 <Link>http://englishhistory.net/tudor/citizens/</Link> 1414 <Text id="0417">into the <AutoText key="glidict::Mirroring.Source_URL"/> box. There are several options that govern how the download process proceeds. To copy just the <i>citizens</i> section of the website, select <AutoText key="glidict::Mirroring.Higher_Directories"/>. If you don't do this (or if you miss out the terminating "/"), the downloading process will follow links to other areas of the <i>englishhistory.net</i> website and grab those as well. Set <AutoText key="glidict::Mirroring.Download_Depth"/> to <AutoText key="glidict::Mirroring.Download_Depth.Unlimited"/>—we want to follow as many links as necessary to download all the pages.</Text> 1415 </NumberedItem> 1416 <NumberedItem> 1417 <Text id="0417a">If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Open the <AutoText key="glidict::Preferences.Connection"/> tab in <Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_Options"/></Menu> and switch on the <AutoText key="glidict::Preferences.Connection.Use_Proxy"/> checkbox. Enter the proxy server address and port number in the <AutoText key="glidict::Preferences.Connection.Proxy_Host"/> and <AutoText key="glidict::Preferences.Connection.Proxy_Port"/> boxes. 
Click <AutoText key="General.OK" type="button"/>.</Text> 1418 </NumberedItem> 1419 <NumberedItem> 1420 <Text id="0418">Now click <AutoText key="glidict::Mirroring.Download" type="button"/>. If you have set proxy information in <AutoText key="glidict::Menu.File_Options"/>, a popup will ask for your user name and password. Once the download has started, a progress bar appears in the lower half of the panel that reports on how the downloading process is doing.</Text> 1421 <Comment> 1422 <Text id="0419">More detailed information can be obtained by clicking <AutoText key="glidict::Mirroring.DownloadJob.Log" type="button"/>. The process can be paused and restarted as needed, or stopped altogether by clicking <AutoText key="glidict::Mirroring.DownloadJob.Close" type="button"/>. Downloading can be a lengthy process involving multiple sites, and so Greenstone allows additional downloads to be queued up. When new URLs are pasted into the <AutoText key="glidict::Mirroring.Source_URL"/> box and <AutoText key="glidict::Mirroring.Download" type="button"/> clicked, a new progress bar is appended to those already present in the lower half of the panel. When the currently active download item completes, the next is started automatically.</Text> 1423 </Comment> 1424 </NumberedItem> 1425 <NumberedItem> 1426 <Text id="0420">Downloaded files are stored in a top-level folder called <AutoText key="glidict::Tree.DownloadedFiles"/> that appears on the left-hand side of the <AutoText key="glidict::GUI.Gather"/> panel. You may not need all the downloaded files, and you choose which you want by dragging selected files from this folder over into the collection area on the right-hand side, just like we have done before when selecting data from the <Path>sample_files</Path> folder. 
In this example we will include everything that has been downloaded.</Text> 1427 <Text id="0421">Select the <Path>englishhistory.net</Path> folder within <AutoText key="glidict::Tree.DownloadedFiles"/> and drag it across into the collection area.</Text> 1428 </NumberedItem> 1429 <NumberedItem> 1430 <Text id="0422">Switch to the <AutoText key="glidict::GUI.Create"/> panel to <b>build</b> and <b>preview</b> the collection. It is smaller than the previous collection because we included only the <i>citizens</i> files. However, these now represent the latest versions of the documents.</Text> 1431 </NumberedItem> 1432 </Content> 1433 </Tutorial> 1434 <Tutorial id="web_linking"> 1435 <Title> 1436 <Text id="0423">Pointing to documents on the web</Text> 1437 </Title> 1438 <Prerequisite id="downloading_from_internet"/> 1439 <Version initial="2.60" current="2.70"/> 1440 <Content> 1441 <NumberedItem> 1442 <Text id="0424">Open up your <b>webtudor</b> collection, and in the <AutoText key="glidict::GUI.Gather"/> panel inspect the files you dragged into it. The first folder is <Path>englishhistory.net</Path>, which opens up to reveal <Path>tudor</Path>, and so on. The files represent a complete sweep of the pages (and supporting images) that constitute the <i>Tudor citizens</i> section of the <i>englishhistory.net</i> web site. They were downloaded from the web in a way that preserved the structure of the original site. This allows any page's original URL to be reconstructed from the folder hierarchy.</Text> 1443 </NumberedItem> 1444 <NumberedItem> 1445 <Text id="0425">In the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Plugins"/> section, then select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Locate the <AutoText text="file_is_url"/> option (about halfway down the first block of items) and switch it on. 
While you are there, switch off the <AutoText text="smart_block"/> option so that stray images are not processed. Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 1446 <Text id="0426">Setting this option to the <AutoText text="HTMLPlug"/> means that Greenstone sets an additional piece of metadata for each document called <AutoText text="URL"/>, which gives its original URL.</Text> 1447 <Text id="0427">It is important that the files gathered in the collection start with the web domain name (<i>englishhistory.net</i> in this case). The conversion process will not work if you dragged over a subfolder, for example the <Path>tudor</Path> folder, because this will set <AutoText text="URL"/> metadata to something like</Text> 1448 <Indent> 1449 http://tudor/citizens/... 1450 </Indent> 1451 <Text id="0428">rather than</Text> 1452 <Indent> 1453 http://englishhistory.net/tudor/citizens/... 1454 </Indent> 1455 <Text id="0429">If you have copied over a subfolder previously, delete it and make a fresh copy. Drag the folder in the right-hand side of the <AutoText key="glidict::GUI.Gather"/> panel on to the trash can in the lower right corner. Then obtain a fresh copy of the files by dragging across the <Path>englishhistory.net</Path> folder from the <AutoText key="glidict::Tree.DownloadedFiles"/> folder on the left-hand side.</Text> 1456 </NumberedItem> 1457 <NumberedItem> 1458 <Text id="0430">To make use of the new URL metadata, the icon link must be changed to serve up the original URL rather than the copy stored in the digital library. 
Go to the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Formats"/> section and edit the <AutoText text="VList" /> format statement by replacing</Text> 1459 <Format>[link][icon][/link]</Format> 1460 <Text id="0431">with</Text> 1461 <Format>[weblink][webicon][/weblink]</Format> 1462 <Text id="0432">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> to commit the change.</Text> 1463 </NumberedItem> 1464 <NumberedItem> 1465 <Text id="0433">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> and <b>preview</b> the collection. Note that the document icons have changed. The collection behaves exactly as before, except that when you click a document icon your web browser retrieves the original document from the web (assuming it is still there by the time you do this exercise!). If you are working offline you will be unable to retrieve the document.</Text> 1328 <NumberedItem> 1329 <Text id="0388">Invoke the Greenstone Librarian Interface (from the Windows <i>Start</i> menu) and start a new collection called <b>tudor</b> (use the <AutoText key="glidict::Menu.File"/> menu). 
Fill out the pop-up dialog with appropriate values and leave <b>Dublin Core</b>, which is selected by default, as the metadata set.</Text> 1330 </NumberedItem> 1331 <NumberedItem> 1332 <Text id="0389">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>tudor</Path> folder in <Path>sample_files</Path>.</Text> 1333 </NumberedItem> 1334 <NumberedItem> 1335 <Text id="0390">Drag <Path>englishhistory.net</Path> from the left-hand side to the right to include it in your <b>tudor</b> collection.</Text> 1336 </NumberedItem> 1337 <NumberedItem> 1338 <Text id="0391">Switch to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Build_Collection" type="button"/>.</Text> 1339 </NumberedItem> 1340 <NumberedItem> 1341 <Text id="0392">When building has finished, <b>preview</b> the collection.</Text> 1342 </NumberedItem> 1343 <Heading> 1344 <Text id="0392a">Extracting more metadata from the HTML</Text> 1345 </Heading> 1346 <NumberedItem> 1347 <Text id="0393">The browsing facilities in this collection (<AutoText key="coredm::_Global:labelTitle_" type="italics"/> and <AutoText key="coredm::_Global:labelSource_" type="italics"/>) are based entirely on extracted metadata. Return to the <AutoText key="glidict::GUI.Enrich"/> panel in the Librarian Interface and examine the metadata that has been extracted for some of the files.</Text> 1348 </NumberedItem> 1349 <NumberedItem> 1350 <Text id="0393a">Many HTML documents contain metadata in <Format><meta></Format> tags in the <Format><head></Format> of the page. Open up the <Path>englishhistory.net → tudor → monarchs → boleyn.html</Path> file by navigating to it in the tree on the left hand side, and double clicking it. This will open it in a web browser. View the HTML source of the page (<Menu>View → Source</Menu> in Internet Explorer, <Menu>View → Page Source</Menu> in Mozilla). 
You will notice that this page has <AutoText text="page_topic,content" type="italics"/> and <AutoText text="author" type="italics"/> metadata.</Text> 1351 </NumberedItem> 1352 <NumberedItem> 1353 <Text id="0393b">By default, <AutoText text="HTMLPlug"/> only looks for Title metadata. Configure the plugin so that it looks for the other metadata too. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Switch on the <AutoText text="metadata_fields"/> option, and set the value to</Text> 1354 <Format> 1355 Title,Author,Page_topic,Content 1356 </Format> 1357 <Text id="0393b-1">Make sure that you have copied this exactly, with no spaces. Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 1358 </NumberedItem> 1359 <NumberedItem> 1360 <Text id="0393c">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>rebuild</b> the collection. Go back to the <AutoText key="glidict::GUI.Enrich"/> panel and look at the extracted metadata for some of the HTML files in <Path>englishhistory.net → tudor → monarchs</Path>. The new metadata should now be visible.</Text> 1361 </NumberedItem> 1362 <Heading> 1363 <Text id="0393d">Blocking the stray images</Text> 1364 </Heading> 1365 <Comment> 1366 <Text id="0394">You've probably noticed that the collection contains a few stray image files, as well as the HTML documents. This is a mistake. The issue is that many of the HTML documents include images, and although Greenstone attempts to determine which images belong to HTML pages and only considers other images for inclusion in the collection, in this case it hasn't been completely successful. 
(This is because the web site from which these files were downloaded occasionally departs from the usual convention of hierarchical structuring.)</Text> 1367 </Comment> 1368 <NumberedItem> 1369 <Text id="0395">Switch back to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel. Beside <AutoText text="plugin HTMLPlug"/> you will see <AutoText text="-smart_block"/>. This is the option that attempts to identify images in the HTML pages and block them from inclusion—in this case, it's not smart enough! <b>Configure</b> <AutoText text="plugin HTMLPlug"/> again, scroll down the page to locate the <AutoText text="smart_block"/> option, and switch it off.</Text> 1370 </NumberedItem> 1371 <NumberedItem> 1372 <Text id="0396"><b>Rebuild</b> and <b>preview</b> the collection. The collection is exactly as before except that these stray images are suppressed. What is happening is that plug-ins operate as a pipeline: files are passed to each one in turn until one is found that can process it. By default (i.e. without <AutoText text="smart_block"/>) the HTML plug-in blocks <i>all</i> images, which is appropriate for this collection.</Text> 1373 </NumberedItem> 1374 <Heading> 1375 <Text id="0397">Looking at different views of the files in the <AutoText key="glidict::GUI.Gather"/> and <AutoText key="glidict::GUI.Enrich"/> panels</Text> 1376 </Heading> 1377 <NumberedItem> 1378 <Text id="0398">Switch to the <AutoText key="glidict::GUI.Gather"/> panel and in the right-hand side open <Path>englishhistory.net → tudor</Path>.</Text> 1379 </NumberedItem> 1380 <NumberedItem> 1381 <Text id="0400">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu for the right-hand side from <AutoText key="glidict::Filter.All_Files"/> to <AutoText key="glidict::Filter.0"/>. 
Notice the files displayed above are filtered accordingly, to show only files of this type.</Text> 1382 </NumberedItem> 1383 <NumberedItem> 1384 <Text id="0401">Change the <AutoText key="glidict::Filter.Filter_Tree"/> menu to <AutoText key="glidict::Filter.3"/>. Again, the files shown above alter.</Text> 1385 </NumberedItem> 1386 <NumberedItem> 1387 <Text id="0402">Now return the <AutoText key="glidict::Filter.Filter_Tree"/> setting back to <AutoText key="glidict::Filter.All_Files"/>, otherwise you may get confused later. Remember, if the <AutoText key="glidict::GUI.Gather"/> or <AutoText key="glidict::GUI.Enrich"/> panels do not seem to be showing all your files, this could be the problem.</Text> 1466 1388 </NumberedItem> 1467 1389 </Content> … … 1472 1394 </Title> 1473 1395 <Prerequisite id="large_html_collection"/> 1474 <Version initial="2.60" current="2.70 "/>1396 <Version initial="2.60" current="2.70w"/> 1475 1397 <Content> 1476 1398 <Comment> … … 1583 1505 </Title> 1584 1506 <Prerequisite id="large_html_collection"/> 1585 <Version initial="2.60" current="2.70 "/>1507 <Version initial="2.60" current="2.70w"/> 1586 1508 <Content> 1587 1509 <NumberedItem> … … 1596 1518 <Text id="0469">This displays something that looks like this: </Text> 1597 1519 <Indent> 1598 <table><tr><td><img width='15' height='20' src=" tutorial_files/itext.gif"/></td><td width='408' valign='top'>A discussion of question five from Tudor Quiz: Henry VIII <br/><i>(quizstuff.html)</i></td></tr></table>1520 <table><tr><td><img width='15' height='20' src="../tutorial_files/itext.gif"/></td><td width='408' valign='top'>A discussion of question five from Tudor Quiz: Henry VIII <br/><i>(quizstuff.html)</i></td></tr></table> 1599 1521 </Indent> 1600 1522 <Text id="0472">for a particular document whose <i>Title</i> metadata is <AutoText text="A discussion of question five from Tudor Quiz: Henry VIII"/> and whose <i>Source</i> metadata is <AutoText text="quizstuff.html"/>.</Text> … … 1615 1537 <Text 
id="0476"><b>Preview</b> the result (you don't need to build the collection, because changes to format statements take effect immediately). Look at some search results and at the <AutoText key="coredm::_Global:labelTitle_"/> list. They are just the same as before! Under most circumstances this far simpler format statement is entirely equivalent to Greenstone's more complex default. </Text> 1616 1538 <Comment> 1617 <Text id="0478">But there's a problem. Beside the bookshelves in the <AutoText key="coredm::_Global:labelSubject_"/> browser, beneath the subject appears a mysterious "()". What is printed on these bookshelf nodes is governed by the same format statement, and though bookshelf nodes of the hierarchy have associated <i>Title</i> metadata—their title is the name of the metadata value associated with that bookshelf—they do not have <AutoText key="metadata::ex.Source"/> metadata, so it comes out blank.</Text>1539 <Text id="0478">But there's a problem. Beside the bookshelves in the <AutoText key="coredm::_Global:labelSubject_"/> browser, beneath the subject appears a mysterious "()". What is printed for these bookshelves is governed by the same format statement, and though bookshelf nodes of the hierarchy have associated <i>Title</i> metadata—their title is the name of the metadata value associated with that bookshelf—they do not have <AutoText key="metadata::ex.Source"/> metadata, so it comes out blank.</Text> 1618 1540 </Comment> 1619 1541 </NumberedItem> … … 1649 1571 <NumberedItem> 1650 1572 <Text id="0490">Now go to the <AutoText key="glidict::GUI.Create"/> panel and click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>. 
Documents in the search results list will be displayed like this:</Text> 1651 <table><tr><td><img width='15' height='20' src=" tutorial_files/itext.gif" /></td><td width='408' valign='top'>A discussion of question five from Tudor Quiz: Henry VIII <br/>1573 <table><tr><td><img width='15' height='20' src="../tutorial_files/itext.gif" /></td><td width='408' valign='top'>A discussion of question five from Tudor Quiz: Henry VIII <br/> 1652 1574 Tudor period|Others</td></tr></table> 1653 1575 <Text id="0493">(The vertical bar appears because this <i>dc.Subject and Keywords</i> metadata is hierarchical metadata. Unfortunately there is no way to get at individual components of the hierarchy. For most metadata, such as title and author, this isn't a problem.)</Text> … … 1671 1593 </NumberedItem> 1672 1594 <NumberedItem> 1673 <Text id="0498">Go to the <AutoText key="glidict::GUI.Create"/> panel, click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>, and examine the subject hierarchy again to see the effect of your changes. </Text>1595 <Text id="0498">Go to the <AutoText key="glidict::GUI.Create"/> panel, click <AutoText key="glidict::CreatePane.Preview_Collection" type="button"/>, and examine the subject hierarchy again to see the effect of your changes. Bookshelves should say <AutoText text="Bookshelf title:"/> and then the title, while documents will display <AutoText text="Title:"/> and the title. 
Note that the number of documents in the bookshelf is not displayed: we are using <Format>[numleafdocs]</Format> to test what kind of item in the list we are at, but we are not displaying it.</Text> 1674 1596 </NumberedItem> 1675 1597 </Content> … … 1731 1653 </Content> 1732 1654 </Tutorial> 1655 <Tutorial id="downloading_from_internet"> 1656 <Title> 1657 <Text id="0411">Downloading files from the web</Text> 1658 </Title> 1659 <Version initial="2.60" current="2.70w"/> 1660 <Content> 1661 <Comment> 1662 <Text id="0412">The Greenstone Librarian Interface's Download panel allows you to download individual files, parts of websites, and indeed whole websites, from the web.</Text> 1663 </Comment> 1664 <NumberedItem> 1665 <Text id="0413">Start a new collection called <b>webtudor</b>, and base it on <AutoText key="glidict::NewCollectionPrompt.NewCollection"/></Text> 1666 </NumberedItem> 1667 <NumberedItem> 1668 <Text id="0414">In a web browser, visit <Link>http://englishhistory.net</Link>, follow the link to <i>Tudor England</i>, and click <<b>Enter</b>>. You should be at the URL</Text> 1669 <Link>http://englishhistory.net/tudor/contents.html</Link> 1670 <Text id="0415">This is where we started the downloading process to obtain the files you have been using for the <b>tudor</b> collection. You could do the same thing by copying this URL from the web browser, pasting it into the <AutoText key="glidict::GUI.Download"/> panel, and clicking the <AutoText key="glidict::Mirroring.Download" type="button"/> button. However, several megabytes will be downloaded, which might strain your network resources—or your patience! For a faster exercise we focus on a smaller section of the site. </Text> 1671 </NumberedItem> 1672 <NumberedItem> 1673 <Text id="0415a">In the <AutoText key="glidict::GUI.Download"/> panel, enter this URL</Text> 1674 <Link>http://englishhistory.net/tudor/citizens/</Link> 1675 <Text id="0417">into the <AutoText key="glidict::Mirroring.Source_URL"/> box. 
There are several options that govern how the download process proceeds. To copy just the <i>citizens</i> section of the website, select <AutoText key="glidict::Mirroring.Higher_Directories"/>. If you don't do this (or if you miss out the terminating "/"), the downloading process will follow links to other areas of the <i>englishhistory.net</i> website and grab those as well. Set <AutoText key="glidict::Mirroring.Download_Depth"/> to <AutoText key="glidict::Mirroring.Download_Depth.Unlimited"/>—we want to follow as many links as necessary to download all the pages.</Text> 1676 </NumberedItem> 1677 <NumberedItem> 1678 <Text id="0417a">If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Open the <AutoText key="glidict::Preferences.Connection"/> tab in <Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_Options"/></Menu> and switch on the <AutoText key="glidict::Preferences.Connection.Use_Proxy"/> checkbox. Enter the proxy server address and port number in the <AutoText key="glidict::Preferences.Connection.Proxy_Host"/> and <AutoText key="glidict::Preferences.Connection.Proxy_Port"/> boxes. Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 1679 </NumberedItem> 1680 <NumberedItem> 1681 <Text id="0418">Now click <AutoText key="glidict::Mirroring.Download" type="button"/>. If you have set proxy information in <AutoText key="glidict::Menu.File_Options"/>, a popup will ask for your user name and password. Once the download has started, a progress bar appears in the lower half of the panel that reports on how the downloading process is doing.</Text> 1682 <Comment> 1683 <Text id="0419">More detailed information can be obtained by clicking <AutoText key="glidict::Mirroring.DownloadJob.Log" type="button"/>. The process can be paused and restarted as needed, or stopped altogether by clicking <AutoText key="glidict::Mirroring.DownloadJob.Close" type="button"/>. 
Downloading can be a lengthy process involving multiple sites, and so Greenstone allows additional downloads to be queued up. When new URLs are pasted into the <AutoText key="glidict::Mirroring.Source_URL"/> box and <AutoText key="glidict::Mirroring.Download" type="button"/> clicked, a new progress bar is appended to those already present in the lower half of the panel. When the currently active download item completes, the next is started automatically.</Text> 1684 </Comment> 1685 </NumberedItem> 1686 <NumberedItem> 1687 <Text id="0420">Downloaded files are stored in a top-level folder called <AutoText key="glidict::Tree.DownloadedFiles"/> that appears on the left-hand side of the <AutoText key="glidict::GUI.Gather"/> panel. You may not need all the downloaded files, and you choose which you want by dragging selected files from this folder over into the collection area on the right-hand side, just like we have done before when selecting data from the <Path>sample_files</Path> folder. In this example we will include everything that has been downloaded.</Text> 1688 <Text id="0421">Select the <Path>englishhistory.net</Path> folder within <AutoText key="glidict::Tree.DownloadedFiles"/> and drag it across into the collection area.</Text> 1689 </NumberedItem> 1690 <NumberedItem> 1691 <Text id="0422">Switch to the <AutoText key="glidict::GUI.Create"/> panel to <b>build</b> and <b>preview</b> the collection. It is smaller than the previous collection because we included only the <i>citizens</i> files. 
However, these now represent the latest versions of the documents.</Text> 1692 </NumberedItem> 1693 </Content> 1694 </Tutorial> 1695 <Tutorial id="web_linking"> 1696 <Title> 1697 <Text id="0423">Pointing to documents on the web</Text> 1698 </Title> 1699 <Prerequisite id="downloading_from_internet"/> 1700 <Version initial="2.60" current="2.70w"/> 1701 <Content> 1702 <NumberedItem> 1703 <Text id="0424">Open up your <b>webtudor</b> collection, and in the <AutoText key="glidict::GUI.Gather"/> panel inspect the files you dragged into it. The first folder is <Path>englishhistory.net</Path>, which opens up to reveal <Path>tudor</Path>, and so on. The files represent a complete sweep of the pages (and supporting images) that constitute the <i>Tudor citizens</i> section of the <i>englishhistory.net</i> web site. They were downloaded from the web in a way that preserved the structure of the original site. This allows any page's original URL to be reconstructed from the folder hierarchy.</Text> 1704 </NumberedItem> 1705 <NumberedItem> 1706 <Text id="0425">In the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Plugins"/> section, then select the <AutoText text="plugin HTMLPlug"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Locate the <AutoText text="file_is_url"/> option (about halfway down the first block of items) and switch it on. While you are there, switch off the <AutoText text="smart_block"/> option so that stray images are not processed. 
Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 1707 <Text id="0426">Setting this option to the <AutoText text="HTMLPlug"/> means that Greenstone sets an additional piece of metadata for each document called <AutoText text="URL"/>, which gives its original URL.</Text> 1708 <Text id="0427">It is important that the files gathered in the collection start with the web domain name (<i>englishhistory.net</i> in this case). The conversion process will not work if you dragged over a subfolder, for example the <Path>tudor</Path> folder, because this will set <AutoText text="URL"/> metadata to something like</Text> 1709 <Indent> 1710 http://tudor/citizens/... 1711 </Indent> 1712 <Text id="0428">rather than</Text> 1713 <Indent> 1714 http://englishhistory.net/tudor/citizens/... 1715 </Indent> 1716 <Text id="0429">If you have copied over a subfolder previously, delete it and make a fresh copy. Drag the folder in the right-hand side of the <AutoText key="glidict::GUI.Gather"/> panel on to the trash can in the lower right corner. Then obtain a fresh copy of the files by dragging across the <Path>englishhistory.net</Path> folder from the <AutoText key="glidict::Tree.DownloadedFiles"/> folder on the left-hand side.</Text> 1717 </NumberedItem> 1718 <NumberedItem> 1719 <Text id="0430">To make use of the new URL metadata, the icon link must be changed to serve up the original URL rather than the copy stored in the digital library. 
Go to the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText key="glidict::CDM.GUI.Formats"/> section and edit the <AutoText text="VList" /> format statement by replacing</Text> 1720 <Format>[link][icon][/link]</Format> 1721 <Text id="0431">with</Text> 1722 <Format>[weblink][webicon][/weblink]</Format> 1723 <Text id="0432">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> to commit the change.</Text> 1724 </NumberedItem> 1725 <NumberedItem> 1726 <Text id="0433">Switch to the <AutoText key="glidict::GUI.Create"/> panel and <b>build</b> and <b>preview</b> the collection. Note that the document icons have changed. The collection behaves exactly as before, except that when you click a document icon your web browser retrieves the original document from the web (assuming it is still there by the time you do this exercise!). If you are working offline you will be unable to retrieve the document.</Text> 1727 </NumberedItem> 1728 </Content> 1729 </Tutorial> 1733 1730 <Tutorial id="bibliography_collection"> 1734 1731 <Title> … … 1736 1733 </Title> 1737 1734 <SampleFiles folder="marc"/> 1738 <Version initial="2.60" current="2.70 "/>1735 <Version initial="2.60" current="2.70w"/> 1739 1736 <Content> 1740 1737 <Comment> … … 1914 1911 <td valign=top><b>[ex.Photographer^all]</b><br/>[ex.Notes^all]</td> 1915 1912 </Format> 1913 <Text id="is-11a">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 1916 1914 </NumberedItem> 1917 1915 <NumberedItem> … … 1920 1918 <Text id="is-13"><AutoText text="ISISPlug"/> stores a nicely formatted version of the record as the document text, and this is what is displayed when we view a record. 
Let's tidy it up a little more.</Text>1921 1919 <NumberedItem> 1922 <Text id="is-14">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, remove the <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons by setting the <AutoText text="DocumentButtons"/> format statement to empty .</Text>1923 </NumberedItem> 1924 <NumberedItem> 1925 <Text id="is-15"> Clear the <AutoText text="DocumentHeading"/> format statement to remove the <AutoText text="Untitled" type="quoted"/> at the top of the document.</Text>1920 <Text id="is-14">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, remove the <AutoText key="coredm::_document:textDETACH_" type="italics"/> and <AutoText key="coredm::_document:textNOHIGHLIGHT_" type="italics"/> buttons by setting the <AutoText text="DocumentButtons"/> format statement to empty, and clicking <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 1921 </NumberedItem> 1922 <NumberedItem> 1923 <Text id="is-15">Remove the <AutoText text="Untitled" type="quoted"/> at the top of the document by setting the <AutoText text="DocumentHeading"/> format statement to empty and clicking <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 1926 1924 </NumberedItem> 1927 1925 <NumberedItem> … … 1935 1933 } 1936 1934 </Format> 1935 <Text id="is-16a">Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 1936 1937 1937 </NumberedItem> 1938 1938 <NumberedItem> … … 1946 1946 </Title> 1947 1947 <SampleFiles folder="custom"/> 1948 <Version initial="2.70" current="2.70 "/>1948 <Version initial="2.70" current="2.70w"/> 1949 1949 <Content> 1950 1950 <Text id="mf-2">The appearance of all pages produced by Greenstone is governed by macro files, which reside in the folder <Path>Greenstone → macros</Path>, images, and CSS stylesheets, both of which reside in <Path>Greenstone → images</Path>. 
</Text> … … 2201 2201 </Title> 2202 2202 <SampleFiles folder="beatles"/> 2203 <Version initial="2.60" current="2.70 "/>2203 <Version initial="2.60" current="2.70w"/> 2204 2204 <Content> 2205 2205 <NumberedItem> … … 2237 2237 <Prerequisite id="multimedia_collection_explore"/> 2238 2238 <SampleFiles folder="beatles"/> 2239 <Version initial="2.60" current="2.70 "/>2239 <Version initial="2.60" current="2.70w"/> 2240 2240 <Content> 2241 2241 <Comment> … … 2243 2243 </Comment> 2244 2244 <NumberedItem> 2245 <Text id="0552">Start a new collection (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>) called <b>small _beatles</b>, basing it on the default "New Collection." (Basing it on the existing Advanced Beatles collection would make your life far easier, but we want you to learn how to build it from scratch!) Fill out the fields with appropriate information. Use the Dublin Core metadata set (set by default).</Text>2245 <Text id="0552">Start a new collection (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>) called <b>small beatles</b>, basing it on the default "New Collection." (Basing it on the existing Advanced Beatles collection would make your life far easier, but we want you to learn how to build it from scratch!) Fill out the fields with appropriate information. Use the Dublin Core metadata set (set by default).</Text> 2246 2246 </NumberedItem> 2247 2247 <NumberedItem> … … 2310 2310 <Text id="0575"><b>Build</b> the collection again and <b>preview</b> it.</Text> 2311 2311 </NumberedItem> 2312 <Comment> 2313 <Text id="0575a">Note how we assigned dc.Format metadata to all documents in the collection with a minimum of labour. We did this by capitalizing on the folder structure of the original information. 
Even though we complained earlier about how messy this folder structure is, you can still take advantage of it when assigning metadata.</Text> 2314 </Comment> 2312 2315 <Heading> 2313 2316 <Text id="0579">Suppressing dummy text</Text> 2314 2317 </Heading> 2315 2318 <NumberedItem> 2316 <Text id="0580">Alongside the Audio files there is an MP3 icon, which plays the audio when you click it, and also a text document that contains some dummy text. This isn't supposed to be seen, but to suppress it you have to fiddle with a format statement.</Text>2319 <Text id="0580">Alongside the Audio files there is an MP3 icon, which plays the audio when you click it, and also a text document that contains some dummy text. Image files also have dummy documents. These dummy documents aren't supposed to be seen, but to suppress them you have to fiddle with a format statement. </Text> 2317 2320 <BulletList> 2318 2321 <Bullet> … … 2320 2323 </Bullet> 2321 2324 <Bullet> 2322 <Text id="0582">Ensure that <AutoText text="VList" /> is selected, and make the changes that are highlighted below. You need to insert three lines into the first line, and delete the second line.<br/> <br/> Change:</Text> 2325 <Text id="0582">Ensure that <AutoText text="VList" /> is selected, and make the changes that are highlighted below. You need to insert five lines into the first line, and delete the second line. 
(Note, the changes are available in a text file, see below.)</Text> 2326 <Text id="0582a">Change:</Text> 2323 2327 <Format> 2324 2328 <td valign=top><highlight>[link][icon][/link]</highlight></td><br/> … … 2332 2336 <td valign=top><br/> 2333 2337 <highlight>{If}{[dc.Format] eq 'Audio', </highlight><br/> 2334 <highlight> [srclink][srcicon][/srclink], </highlight><br/> 2335 <highlight> [link][icon][/link]}</highlight></td> <br/> 2336 <td valign=top>[highlight] {Or}{[dls.Title],[dc.Title],[Title],Untitled} [/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td> 2338 <highlight>[srclink][srcicon][/srclink], </highlight><br/> 2339 <highlight>{If}{[dc.Format] eq 'Images',</highlight><br/> 2340 <highlight>[srclink][thumbicon][/srclink],</highlight><br/> 2341 <highlight>[link][icon][/link]}}</highlight></td> <br/> 2342 <td valign=top>[highlight]<br/> 2343 {Or}{[dls.Title],[dc.Title],[Title],Untitled}<br/> 2344 [/highlight]{If}{[ex.Source],<br><i>([ex.Source])</i>}</td> 2337 2345 </Format> 2338 2346 </Bullet> … … 2343 2351 <Text id="0585">To make this easier for you we have prepared a plain text file that contains the new text. In WordPad open the following file:</Text> 2344 2352 <Path>sample_files → beatles → format_tweaks → audio_tweak.txt</Path> 2345 <Text id="0586">(Be sure to use WordPad rather than Notepad, because Notepad does not display the line breaks correctly.) Place it in the copy buffer by highlighting the text in WordPad and selecting <Menu>Edit → Copy</Menu>. Now move back to the Librarian Interface, highlight all the text that makes up the current VListformat statement, and use <Menu><AutoText key="glidict::Menu.Edit"/> → <AutoText key="glidict::Menu.Edit_Paste"/></Menu> to transform the old statement to the new one. Remember to press <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> when finished.</Text>2353 <Text id="0586">(Be sure to use WordPad rather than Notepad, because Notepad does not display the line breaks correctly.) 
Place it in the copy buffer by highlighting the text in WordPad and selecting <Menu>Edit → Copy</Menu>. Now move back to the Librarian Interface, highlight all the text that makes up the current <AutoText text="VList"/> format statement, and use <Menu><AutoText key="glidict::Menu.Edit"/> → <AutoText key="glidict::Menu.Edit_Paste"/></Menu> to transform the old statement to the new one. Remember to press <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> when finished.</Text> 2346 2354 <Text id="0589"><b>Preview</b> the result. You may need to click the browser's <<b>Reload</b>> button to force it to re-load the page.</Text> 2347 2355 </NumberedItem> … … 2354 2362 <td valign=top><br/> 2355 2363 {If}{[dc.Format] eq 'Audio',<br/> 2356 [srclink][srcicon][/srclink],<br/> 2357 [link][icon][/link]}</td> <br/> 2358 <td valign=top>[highlight] {Or}{[dls.Title],[dc.Title],[Title],Untitled} [/highlight]<highlight>{If}{[ex.Source],<br><i>([ex.Source])</i>}</highlight></td></Format> 2364 [srclink][srcicon][/srclink],<br/> 2365 {If}{[dc.Format] eq 'Images',<br/> 2366 [srclink][thumbicon][/srclink],<br/> 2367 [link][icon][/link]}}</td> <br/> 2368 <td valign=top>[highlight]<br/> 2369 {Or}{[dls.Title],[dc.Title],[Title],Untitled}<br/> 2370 [/highlight]<highlight>{If}{[ex.Source],<br><i>([ex.Source])</i>}</highlight></td></Format> 2359 2371 </Bullet> 2360 2372 </BulletList> 2361 <Text id="0595">Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> after all this work! <b>Preview</b> the result (you don't need to build the collection.)</Text>2373 <Text id="0595">Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/> after all this work! 
<b>Preview</b> the result (you don't need to rebuild the collection.)</Text> 2362 2374 </NumberedItem> 2363 2375 <Heading> … … 2389 2401 </Heading> 2390 2402 <NumberedItem> 2391 <Text id="0606">Make the bookshelves show how many documents they contain by inserting a line in the <AutoText text="VList"/> format statement in the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel :</Text>2403 <Text id="0606">Make the bookshelves show how many documents they contain by inserting a line in the <AutoText text="VList"/> format statement in the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel. The added line is shown highlighted below. The complete format statement can be copied from <Path>sample_files → beatles → format_tweaks → show_num_docs.txt</Path>.</Text> 2392 2404 <Format> 2393 2405 <td valign=top><br/> 2394 2406 {If}{[dc.Format] eq 'Audio',<br/> 2395 [srclink][srcicon][/srclink],<br/> 2396 [link][icon][/link]}</td><br/> 2407 [srclink][srcicon][/srclink],<br/> 2408 {If}{[dc.Format] eq 'Images',<br/> 2409 [srclink][thumbicon][/srclink],<br/> 2410 [link][icon][/link]}}</td><br/> 2397 2411 <highlight><td>{If}{[numleafdocs],([numleafdocs])}</td></highlight><br/> 2398 <td valign=top>[highlight] {Or}{[dls.Title],[dc.Title],[Title],Untitled} [/highlight]</td></Format> 2399 <Text id="0607">You will find this text in <Path>format_tweaks → show_num_docs.txt</Path>, which can be copied and pasted in as before. Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 2400 <Text id="0609"><b>Preview</b> the result (you don't need to build the collection.)</Text> 2401 </NumberedItem> 2402 <NumberedItem> 2403 <Text id="0610">Now turn to the images. Dummy documents are displayed here too. 
To suppress these dummy documents, change the <AutoText text="VList" /> format statement in the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel again by adding the two highlighted lines, and the close curly bracket:</Text> 2404 <Format><td valign=top><br/> 2405 {If}{[dc.Format] eq 'Audio',<br/> 2406 [srclink][srcicon][/srclink],<br/> 2407 <highlight>{If}{[dc.Format] eq 'Images',</highlight><br/> 2408 <highlight>[srclink][thumbicon][/srclink],</highlight><br/> 2409 [link][icon][/link]}<highlight>}</highlight></td><br/> 2410 <td>{If}{[numleafdocs],([numleafdocs])}</td><br/> 2411 <td valign=top>[highlight] {Or}{[dls.Title],[dc.Title],[Title],Untitled} [/highlight]</td></Format> 2412 </NumberedItem> 2412 <td valign=top>[highlight]<br/> 2413 {Or}{[dls.Title],[dc.Title],[Title],Untitled}<br/> 2414 [/highlight]</td></Format> 2415 <Text id="0607">Don't forget to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 2416 <Text id="0609"><b>Preview</b> the result (you don't need to build the collection.) Bookshelves in the titles and browse classifiers should show how many documents they contain.</Text> 2417 </NumberedItem> 2418 <Heading> 2419 <Text id="0612a">Adding a Phind phrase browser</Text> 2420 </Heading> 2413 2421 <NumberedItem> 2414 2422 <Text id="0612">In the <AutoText key="glidict::CDM.GUI.Classifiers"/> section on the <AutoText key="glidict::GUI.Design"/> panel, add a <AutoText text="Phind"/> classifier. 
Leave the settings at their defaults: this generates a phrase browsing classifier that sources its phrases from <i>Title</i> and <i>text</i>.</Text> 2415 </NumberedItem> 2416 <NumberedItem> 2417 <Text id="0613">To complete the collection, use the browse button of <AutoText key="glidict::CDM.General.Icon_Collection"/> in the <AutoText key="glidict::CDM.GUI.General"/> section of the <AutoText key="glidict::GUI.Design"/> panel to select the following image:</Text> 2418 <Path>advbeat_large → images → beatlesmm.png</Path> 2419 <Text id="0616"><b>Build</b> the collection again and <b>preview</b> it.</Text> 2420 </NumberedItem> 2421 <Comment> 2422 <Text id="0617">Note how we assigned dc.Format metadata to all documents in the collection with a minimum of labour. We did this by capitalizing on the folder structure of the original information. Even though we complained earlier about how messy this folder structure is, you can still take advantage of it when assigning metadata.</Text> 2423 </Comment> 2423 <Text id="0612b"><b>Build</b> the collection again and <b>preview</b> it. Select the new "phrases" option from the navigation bar. Enter a single word in the text box, such as <AutoText text="band" type="quotes"/>. The phrase browser will present you with phrases found in the collection containing the search term. This can provide a useful way of browsing a very large collection. Note that even though it is called a phrase browser, only single terms can be used as the starting point for browsing.</Text> 2424 </NumberedItem> 2425 <Heading> 2426 <Text id="0612a">Branding the collection with an image</Text> 2427 </Heading> 2428 <NumberedItem> 2429 <Text id="0613">To complete the collection, lets give it a new image for the top left corner of the page. Go to the <AutoText key="glidict::CDM.GUI.General"/> section of the <AutoText key="glidict::GUI.Design"/> panel. 
Use the browse button of <AutoText key="glidict::CDM.General.Icon_Collection"/> to select the following image:</Text> 2430 <Path>sample_files → beatles → advbeat_large → images → beatlesmm.png</Path> 2431 <Text id="0613a">Preview the collection, and make sure the new image appears.</Text> 2432 </NumberedItem> 2424 2433 <Heading> 2425 2434 <Text id="0623">Using <AutoText text="UnknownPlug"/></Text> … … 2497 2506 </NumberedItem> 2498 2507 <NumberedItem> 2499 <Text id="0646">Copy the <Path>images</Path> and <Path>macros</Path> folders located there into your collection's top-level folder. (It's OK to overwrite the existing <Path>images</Path> folder: the image in it is included in the folder being copied.) The <Path>images</Path> folder includes some useful icons, and the <Path>macros</Path> folder defines some macro names that use these images. To see the macro definitions, take a look by using a text editor to open the file <Path>extra.dm</Path> in the <Path>macros</Path> folder.</Text> 2508 <Text id="0645a">Open up another file browser, and locate the small beatles collection in your Greenstone installation:</Text> 2509 <Path>greenstone → collect → smallbea</Path> 2510 <Text id="0645b"><AutoText text="smallbea"/> is the folder name generated by Greenstone for this collection. You can determine what the folder name is for a collection by looking at the title bar of the Librarian Interface: the folder name is displayed in brackets after the collection name.</Text> 2511 </NumberedItem> 2512 <NumberedItem> 2513 <Text id="0646">Using the file browser, copy the <Path>images</Path> and <Path>macros</Path> folders from the <Path>advbeat_large</Path> folder into the <Path>smallbea</Path> folder. (It's OK to overwrite the existing <Path>images</Path> folder: the image in it is included in the folder being copied.) The <Path>images</Path> folder includes some useful icons, and the <Path>macros</Path> folder defines some macro names that use these images. 
To see the macro definitions, take a look by using a text editor to open the file <Path>extra.dm</Path> in the <Path>macros</Path> folder.</Text> 2500 2514 </NumberedItem> 2501 2515 <Heading> … … 2533 2547 </Heading> 2534 2548 <NumberedItem> 2535 <Text id="0653">Open your collection's <Path>macros</Path> folder and locate the <Path>extra.dm</Path> file within it. <b> Right-click</b> on it. If prompted, select <b>WordPad</b> as the application to open it with.</Text>2536 </NumberedItem> 2537 <NumberedItem> 2538 <Text id="0654">The file content is fairly brief, specifying only what needs to be overridden from the default behaviour for this collection. In WordPad, near the top of the file you should see:</Text>2549 <Text id="0653">Open your collection's <Path>macros</Path> folder and locate the <Path>extra.dm</Path> file within it. <b>Open</b> it in a text editor, e.g. WordPad.</Text> 2550 </NumberedItem> 2551 <NumberedItem> 2552 <Text id="0654">The file content is fairly brief, specifying only what needs to be overridden from the default behaviour for this collection. Near the top of the file you should see:</Text> 2539 2553 <Format> 2540 2554 _collectionspecificstyle_ {<br/> … … 2545 2559 } 2546 2560 </Format> 2547 <Text id="0655">Use copy and paste on these lines to make this part of the file look like:</Text> 2548 <Format> 2549 # Original statements<br/> 2550 #_collectionspecificstyle_ {<br/> 2551 #<style><br/> 2552 #body.bgimage \{ background-image: url("_httpcimages_/beat_margin.gif"); \}<br/> 2553 #\#page \{ margin-left: 120px; \} <br/> 2554 #</style><br/> 2555 #}<br/> 2556 <br/> 2557 _collectionspecificstyle_ {<br/> 2558 <style><br/> 2559 body.bgimage \{ background-image: url("_httpcimages_/tile.jpg"); \}<br/> 2560 </style><br/> 2561 } 2562 </Format> 2563 <Text id="0656">A hash (#) at the start of line signals a comment, and Greenstone ignores the following text. We use this to comment out the original statements and replace them with modified lines. 
It is useful to retain the original version in case we need to restore the original lines at a later date. These lines relate to the background image used. The new image <Path>tile.jpg</Path> was also in the <Path>images</Path> folder that was copied across previously.</Text> 2564 </NumberedItem> 2565 <NumberedItem> 2566 <Text id="0657">Within <b>WordPad</b>, save <i>extra.dm</i>.</Text> 2561 <Text id="0655">Replace the text <AutoText text="beat_margin.gif" type="quotes"/> with <AutoText text="tile.jpg" type="quotes"/>. Save the file. </Text> 2562 <Text id="0656">This line relates to the background image used. The new image <Path>tile.jpg</Path> was in the <Path>images</Path> folder that was copied across previously.</Text> 2567 2563 </NumberedItem> 2568 2564 <NumberedItem> … … 2570 2566 <Text id="0659">Other features can be altered by editing the macro files—for example, the headers and footers used on each page, and the highlighting style used for search terms (specify a different colour, use bold etc.).</Text> 2571 2567 </NumberedItem> 2572 <NumberedItem>2573 <Text id="0660">If you want to you can reverse the most recent change you made by commenting out the new lines added (add #) and uncommenting the original lines (delete # character). Remember to save the file. 
To undo all the customized changes made, delete the content of the <Path>macros</Path> and <Path>images</Path> folders.</Text>2574 </NumberedItem>2575 2568 <Heading> 2576 2569 <Text id="0661">Building a full-size version of the collection</Text> … … 2583 2576 </Bullet> 2584 2577 <Bullet> 2585 <Text id="0664">Start a new collection called <i> advbeat</i> (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>).</Text>2586 </Bullet> 2587 <Bullet> 2588 <Text id="0665">Base this new collection on <i>small _beatles</i>.</Text>2578 <Text id="0664">Start a new collection called <i>large beatles</i> (<Menu><AutoText key="glidict::Menu.File"/> → <AutoText key="glidict::Menu.File_New"/></Menu>).</Text> 2579 </Bullet> 2580 <Bullet> 2581 <Text id="0665">Base this new collection on <i>small beatles</i>.</Text> 2589 2582 </Bullet> 2590 2583 <Bullet> … … 2592 2585 </Bullet> 2593 2586 <Bullet> 2594 <Text id="0670"><b>Build</b> the collection and previewthe result. (If you want the collection to have an icon, you will have to add it from the <AutoText key="glidict::GUI.Design"/> panel.)</Text>2587 <Text id="0670"><b>Build</b> the collection and <b>preview</b> the result. (If you want the collection to have an icon, you will have to add it from the <AutoText key="glidict::GUI.Design"/> panel.)</Text> 2595 2588 </Bullet> 2596 2589 </BulletList> … … 2612 2605 </Title> 2613 2606 <SampleFiles folder="niupepa"/> 2614 <Version initial="2.60" current="2.70 "/>2607 <Version initial="2.60" current="2.70w"/> 2615 2608 <Content> 2616 2609 <Comment> … … 2633 2626 </NumberedItem> 2634 2627 <NumberedItem> 2635 <Text id="0681">Now go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). 
Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> newspapers. </Text>2628 <Text id="0681">Now go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> newspapers. Note that only the <AutoText text="Te Whetu o Te Tau" type="italics"/> newspapers have text; <AutoText text="Te Waka o Te Iwi" type="italics"/> papers don't.</Text> 2636 2629 </NumberedItem> 2637 2630 <Comment> … … 2651 2644 </NumberedItem> 2652 2645 <NumberedItem> 2653 <Text id="0687">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, select the <AutoText key="metadata::ex.Title"/> classifier in the <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and <AutoText text="VList"/> in the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Delete the contents of the <AutoText key="glidict::CDM.FormatManager.Editor"/> box, and add the following :</Text>2646 <Text id="0687">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, select the <AutoText key="metadata::ex.Title"/> classifier in the <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and <AutoText text="VList"/> in the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Delete the contents of the <AutoText key="glidict::CDM.FormatManager.Editor"/> box, and add the following text. 
(This format statement can be copied and pasted from the file <Path>sample_files → niupepa → formats → titles_tweak.txt</Path>.)</Text> 2654 2647 <Format> 2655 2648 <td valign="top">[link][icon][/link]</td><br/> … … 2662 2655 </Format> 2663 2656 <Text id="0687a">Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>.</Text> 2664 <Text id="0687b">(This format statement can be copied and pasted from the file <Path>sample_files → niupepa → formats → titles_tweak.txt</Path>)</Text>2665 2657 </NumberedItem> 2666 2658 <NumberedItem> … … 2675 2667 </Comment> 2676 2668 <NumberedItem> 2677 <Text id="0696">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. The default format string displays the document's plain text, which, if there is none, is set to <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. Change this to :</Text>2678 <Format> 2679 < center><table><tr><br/>2680 & nbsp; <td valign=top>[srclink][screenicon][/srclink]</td><br/>2681 & nbsp; <td valign=top>[Text]</td><br/>2682 </tr></table> </center>2669 <Text id="0696">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select the <AutoText text="DocumentText"/> format statement. The default format string displays the document's plain text, which, if there is none, is set to <AutoText key="perlmodules::BasPlug.dummy_text" type="quoted"/>. Change this to the following text. 
(This format statement can be copied and pasted from the file <Path>sample_files → niupepa → formats → doc_tweak.txt</Path>)</Text> 2670 <Format> 2671 <table><tr><br/> 2672 <td valign=top>[srclink][screenicon][/srclink]</td><br/> 2673 <td valign=top>[Text]</td><br/> 2674 </tr></table> 2683 2675 </Format> 2684 2676 <Text id="0696a">and click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 2685 <Text id="0697">(This format statement can be copied and pasted from the file <Path>sample_files → niupepa → formats → doc_tweak.txt</Path>)</Text>2686 2677 <Comment> 2687 2678 <Text id="0698">Including <Format>[screenicon]</Format> has the effect of embedding the screen-sized image generated by switching the <AutoText text="screenview"/> option on in <AutoText text="PagedImgPlug"/>. It is hyperlinked to the original image by the construct <Format>[srclink]...[/srclink]</Format>.</Text> … … 2728 2719 <Text id="0690h-1">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Design"/> panel, select <AutoText text="Search"/> in <AutoText key="glidict::CDM.FormatManager.Feature"/>, and <AutoText text="VList"/> in <AutoText key="glidict::CDM.FormatManager.Part"/>. The previous changes modified <AutoText text="VList"/>, so they will apply to all <AutoText text="VList"/>s that don't have specific format statements. These next changes are made to <AutoText text="SearchVList"/> so will only apply to search results.</Text> 2729 2720 <Text id="0690i">The extracted Title for the current section is specified as <Format>[ex.Title]</Format> while the Title for the parent section is <Format>[parent:ex.Title]</Format>. 
Since the same <AutoText text="SearchVList"/> format statement is used when searching both whole newspapers and newspaper pages, we need to make sure it works in both cases.</Text> 2730 <Text id="0690j">Set the format statement to the following :</Text>2721 <Text id="0690j">Set the format statement to the following text (it can be copied and pasted from the file <Path>sample_files → niupepa → formats → search_tweak.txt</Path>.)</Text> 2731 2722 <Format> 2732 2723 <td valign="top">[link][icon][/link]</td><br/> … … 2740 2731 </td> 2741 2732 </Format> 2742 <Text id="1690j-1">and click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>.</Text> 2743 <Text id="0690k">(The format statement can be copied and pasted from the file <Path>sample_files → niupepa → formats → search_tweak.txt</Path>.)</Text> 2733 <Text id="1690j-1">Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>.</Text> 2744 2734 <Text id="0690l"><b>Preview</b> the search results. Items display newspaper title, Volume, Number and Date if available, and pages also display the page number.</Text> 2745 2735 </NumberedItem> … … 2755 2745 <SampleFiles folder="niupepa"/> 2756 2746 <Prerequisite id="scanned_image_collection"/> 2757 <Version initial="2.70" current="2.70 "/>2747 <Version initial="2.70" current="2.70w"/> 2758 2748 <Content> 2759 2749 <Comment> … … 2843 2833 <Text id="sc31">We can modify the document display to switch between the text version and the screenview and full size versions. We do this using a combination of format statements and macro files.</Text> 2844 2834 <NumberedItem> 2845 <Text id="sc32">First, copy the new macro file into the collection. Create a new folder <Path>Greenstone → collect → pagedimg → macros</Path>. Copy <Path>sample_files → niupepa → macros → extra.dm</Path> into this folder.</Text> 2835 <Text id="sc32">First of all we will add a macro file to the collection. 
In a file browser outside of Greenstone, locate the Paged Image collection in your Greenstone installation: <Path>Greenstone → collect → pagedima</Path>. Create a new folder called <Path>macros</Path> in the <Path>pagedima</Path> folder.</Text> 2836 <Text id="sc32a">Also in a file browser, locate the file <Path>sample_files → niupepa → macros → extra.dm</Path>. Copy this file and paste it into the new <Path>macros</Path> folder you just created.</Text> 2846 2837 </NumberedItem> 2847 2838 <NumberedItem> … … 2852 2843 </NumberedItem> 2853 2844 <NumberedItem> 2854 <Text id="sc33c">Select the <AutoText text="DocumentHeading"/> format item and set it to the following :</Text>2845 <Text id="sc33c">Select the <AutoText text="DocumentHeading"/> format item and set it to the following text (which can copied from <Path>sample_files → niupepa → formats → adv_doc_heading.txt</Path>).</Text> 2855 2846 <Format> 2856 2847 <div class="heading_title">{Or}{[parent(Top):ex.Title],[ex.Title]}</div><br/> … … 2863 2854 </Format> 2864 2855 <Text id="sc33c-1">Click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 2865 <Text id="sc33d">This format statement can be copied from <Path>sample_files → niupepa → formats → adv_doc_heading.txt</Path>.</Text>2866 2856 <Text id="sc33e"><Format>{Or}{[parent(Top):ex.Title],[ex.Title]}</Format> outputs the newspaper Title metadata. This is only stored at the top level of the document, so if we are at a subsection, we need to get it from the top (<Format>[parent(Top):ex.Title]</Format>). Note that we can't just use <Format>[parent:ex.Title]</Format> as this retrieves the Title from the immediate parent node, which may not be the top node of the document.</Text> 2867 2857 <Text id="sc33g"><Format>_document:viewpreview_, _document:viewfullsize_, _document:viewtext_</Format> are macros defined in <Path>extra.dm</Path> which output buttons for preview, fullsize and text versions, respectively. 
We choose which buttons to display based on what metadata and text the document has.</Text> … … 2870 2860 </NumberedItem> 2871 2861 <NumberedItem> 2872 <Text id="sc34a">Select the <AutoText text="DocumentText"/> format statement and set it to :</Text>2862 <Text id="sc34a">Select the <AutoText text="DocumentText"/> format statement and set it to the following text (which can be copied from <Path>sample_files → niupepa → formats → adv_doc_text.txt</Path>):</Text> 2873 2863 <Format> 2874 2864 {If}{_cgiargp_ eq 'fullsize',[srcicon],<br/> … … 2877 2867 </Format> 2878 2868 <Text id="sc34a-1">Remember to click <AutoText key="glidict::CDM.FormatManager.Replace" type="button"/>.</Text> 2879 <Text id="sc34b">This format statement c an be copied from <Path>sample_files → niupepa → formats → adv_doc_text.txt</Path>. It changes the display based on the <AutoText text="p" type="quoted"/> argument (<Format>_cgiargp_</Format>). This is not used normally for document display, so we can use it here to switch between full size image (<Format>[srcicon]</Format>), preview size image (<Format>[screenicon]</Format>) and text (<Format>[Text]</Format>) versions of each page.</Text>2869 <Text id="sc34b">This format statement changes the display based on the <AutoText text="p" type="quoted"/> argument (<Format>_cgiargp_</Format>). 
This is not used normally for document display, so we can use it here to switch between full size image (<Format>[srcicon]</Format>), preview size image (<Format>[screenicon]</Format>) and text (<Format>[Text]</Format>) versions of each page.</Text> 2880 2870 </NumberedItem> 2881 2871 <NumberedItem> … … 2889 2879 </Title> 2890 2880 <SampleFiles folder="oai"/> 2891 <Version initial="2.60" current="2.70 "/>2881 <Version initial="2.60" current="2.70w"/> 2892 2882 <Content> 2893 2883 <Comment> … … 2992 2982 </Title> 2993 2983 <Prerequisite id="OAI_collection"/> 2994 <Version initial="2.60" current="2.70 "/>2984 <Version initial="2.60" current="2.70w"/> 2995 2985 <Content> 2996 2986 <Comment> … … 3043 3033 <Text id="0750">Use METS as Greenstone's Internal Representation</Text> 3044 3034 </Title> 3045 <Prerequisite id="large_html_collection"/> 3046 <Version initial="2.60" current="2.70"/> 3035 <Version initial="2.60" current="2.70w"/> 3047 3036 <Content> 3048 3037 <NumberedItem> 3049 <Text id="0751">In the Greenstone Librarian Interface, open the <b>Tudor</b> collection.</Text>3038 <Text id="0751">In the Greenstone Librarian Interface, open up one of your existing collections, for example the <b>hobbits</b> collection.</Text> 3050 3039 </NumberedItem> 3051 3040 <Comment> … … 3065 3054 </NumberedItem> 3066 3055 <NumberedItem> 3067 <Text id="0759">In your Windows file browser, locate the <Path>archives</Path> folder for the Tudor collection. For each document in the collection, Greenstone has generated two files: <Path>docmets.xml</Path>, the core METS description, and <Path>doctxt.xml</Path>, a supporting file. (Note: unless you are connected to the Internet you will be unable to view <Path>doctxt.xml</Path> in your web browser, because it refers to a remote resource.) Depending on the source documents there may be additional files, such as the images used within a web page. One of METS' many features is the ability to reference information in external XML files. 
Greenstone uses this to tie the content of the document, which is stored in the external XML file <Path>doctxt.xml</Path>, to its hierarchical structure, which is described in the core METS file <Path>docmets.xml</Path>.</Text>3056 <Text id="0759">In your Windows file browser, locate the <Path>archives</Path> folder for the collection you are working with. For each document in the collection, Greenstone has generated two files: <Path>docmets.xml</Path>, the core METS description, and <Path>doctxt.xml</Path>, a supporting file. (Note: unless you are connected to the Internet you will be unable to view <Path>doctxt.xml</Path> in your web browser, because it refers to a remote resource.) Depending on the source documents there may be additional files, such as the images used within a web page. One of METS' many features is the ability to reference information in external XML files. Greenstone uses this to tie the content of the document, which is stored in the external XML file <Path>doctxt.xml</Path>, to its hierarchical structure, which is described in the core METS file <Path>docmets.xml</Path>.</Text> 3068 3057 </NumberedItem> 3069 3058 </Content> … … 3074 3063 </Title> 3075 3064 <SampleFiles folder="dspace"/> 3076 <Version initial="2.60" current="2.70 "/>3065 <Version initial="2.60" current="2.70w"/> 3077 3066 <Content> 3078 3067 <NumberedItem> … … 3147 3136 {If}{[numleafdocs],([numleafdocs]) [ex.Title],[dc.Title]} 3148 3137 </Format> 3149 <Text id="0784">and click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>. This will display the number of documents for each bookshelf in the authorsclassifier.</Text>3138 <Text id="0784">and click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/>. 
This will display the number of documents for each bookshelf in the <AutoText key="coredm::_Global:labelContributor_" type="italics"/> classifier.</Text> 3150 3139 </NumberedItem> 3151 3140 <NumberedItem> … … 3153 3142 </NumberedItem> 3154 3143 <Comment> 3155 <Text id="0787">There are still only 5 documents, but against some of the entries —for example, <AutoText text="Interview with Bob Dylan" type="quoted"/>—appears the line <AutoText text="Also available as:" type="quoted"/> followed by icons that link to the alternative representations.</Text>3144 <Text id="0787">There are still only 5 documents, but against some of the entries appears the line <AutoText text="Also available as:" type="quoted"/> followed by icons that link to the alternative representations.</Text> 3156 3145 </Comment> 3157 3146 </Content> … … 3162 3151 </Title> 3163 3152 <Prerequisite id="dspace_to_greenstone"/> 3164 <Version initial="2.60" current="2.70 "/>3153 <Version initial="2.60" current="2.70w"/> 3165 3154 <Content> 3166 3155 <Comment>
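Reviewer note on the VList changes in this changeset: the new format statement nests a second {If} inside the else branch of the first, so Audio documents get [srclink][srcicon][/srclink], Images documents get [srclink][thumbicon][/srclink], and everything else falls back to [link][icon][/link]. The Python sketch below only models that branching so it is easier to check. It is not Greenstone code. The function names are hypothetical, and it assumes that {If}{[numleafdocs],...} is true whenever the metadata value is non-empty.

```python
# Illustrative model only: Greenstone evaluates format statements itself.
# {If}{condition, then, else} is modelled here as an ordinary Python conditional.


def vlist_icon_cell(dc_format: str) -> str:
    """Return the markup the nested {If} selects for a given dc.Format value."""
    if dc_format == "Audio":
        # Outer {If}: audio items link straight to the source file.
        return "[srclink][srcicon][/srclink]"
    if dc_format == "Images":
        # Inner {If}: image items show a thumbnail linked to the source image.
        return "[srclink][thumbicon][/srclink]"
    # Else branch of the inner {If}: the default document link and icon.
    return "[link][icon][/link]"


def bookshelf_count_cell(numleafdocs: str) -> str:
    """Model {If}{[numleafdocs],([numleafdocs])}.

    Assumption: the test succeeds when the metadata value is non-empty,
    which is the case for bookshelves but not for leaf documents.
    """
    return f"({numleafdocs})" if numleafdocs else ""


if __name__ == "__main__":
    for fmt in ("Audio", "Images", "Doc"):
        print(fmt, "->", vlist_icon_cell(fmt))
    print("bookshelf with 7 documents ->", bookshelf_count_cell("7"))
```

Because the inner {If} sits in the else position of the outer one, the statement needs two closing curly brackets after [link][icon][/link], which matches the "}}" in the added lines.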