Timestamp:
13.12.2010 13:59:36
Author:
kjdon
Message:

tidied up the scanned_image_collection tutorial

Files:
1 modified

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r23281 r23456  
    24352435<Text id="0679">In the <AutoText key="glidict::GUI.Gather"/> panel, open the <Path>sample_files &rarr; niupepa &rarr; sample_items</Path> folder and drag the two subfolders into your collection on the right-hand side. A popup window asks whether you want to add <AutoText text="PagedImagePlugin"/> to the collection: click <AutoText key="glidict::CDM.PlugInManager.QuickAdd" type="button"/>, because this plugin will be needed to process the item files.</Text> 
    24362436</NumberedItem> 
    2437 <NumberedItem> 
    2438 <Text id="0680">Some of the files you have just dragged in are the newspaper images; others are text files that contain the text extracted from these images. We want these to be processed by <AutoText text="PagedImagePlugin"/>, not <AutoText text="ImagePlugin"/> or <AutoText text="TextPlugin"/>. Switch to the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel and <i>delete <AutoText text="ImagePlugin"/> and <AutoText text="TextPlugin"/></i>.</Text> 
    2439 </NumberedItem> 
    2440 <NumberedItem> 
    2441 <Text id="0678">Open up the configuration window for <AutoText text="PagedImagePlugin"/> by double-clicking on the plugin. Switch on its <AutoText text="create_screenview"/> configuration option by checking the box. The source images we use were scanned at high resolution and are large files for a browser to download. The <AutoText text="create_screenview"/> option generates smaller screen-resolution images of each page when the collection is built. Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 
    2442 </NumberedItem> 
    2443 <NumberedItem> 
    2444 <Text id="0681">Now go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> newspapers. Note that only the <AutoText text="Te Whetu o Te Tau" type="italics"/> newspapers have text; <AutoText text="Te Waka o Te Iwi" type="italics"/> papers don't.</Text> 
     2437<Comment> 
     2438<Text id="0678"><AutoText text="PagedImagePlugin"/> will process the item files, creating a document for each one with a separate section for each page listed. Thumbnail and screen-resolution sized images of each page image will be generated.</Text> 
     2439</Comment> 
     2440<NumberedItem> 
     2441<Text id="0681">Go to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection and <b>preview</b> the result. Search for <AutoText text="waka" type="quoted"/> and view one of the titles listed (all three appear as <AutoText text="Te Whetu o Te Tau" type="italics"/>). Browse by <AutoText key="coredm::_Global:labelTitle_"/> and view one of the <AutoText text="Te Waka o Te Iwi" type="italics"/> newspapers. Note that only the <AutoText text="Te Whetu o Te Tau" type="italics"/> newspapers have text; <AutoText text="Te Waka o Te Iwi" type="italics"/> papers don't.</Text> 
    24452442</NumberedItem> 
    24462443<Comment> 
     
    24512448</Heading> 
    24522449<Comment> 
    2453 <Text id="0684">Under <AutoText key="coredm::_Global:labelTitle_"/> documents from the same series are repeated without any distinguishing features such as date, volume or number. It would be better to group them by series title and display other information within each group. This can be accomplished using an <AutoText text="AZCompactList"/> classifier rather than <AutoText text="List"/>, and tuning the classifier's format statement.</Text> 
    2454 </Comment> 
    2455 <NumberedItem> 
    2456 <Text id="0685">In the <AutoText key="glidict::GUI.Design"/> panel, under the <AutoText key="glidict::CDM.GUI.Classifiers"/> section, delete the <AutoText text="List" /> classifier for <AutoText key="metadata::ex.Source"/>. Select the classifier for <AutoText text="dc.Title;ex.Title"/> and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. Set <AutoText text="bookshelf_type"/> to <AutoText text="always"/>. This will create a bookshelf as in the <AutoText text="AZCompactList"/> classifier for duplicate items.</Text> 
    2457 </NumberedItem> 
    2458 <NumberedItem> 
    2459 <Text id="0686">Now add a <AutoText text="DateList" /> classifier, setting its <AutoText text="metadata"/> option to <AutoText key="metadata::ex.Date"/>.</Text> 
    2460 </NumberedItem> 
    2461 <NumberedItem> 
    2462 <Text id="0686a"><b>Build</b> the collection, and <b>preview</b> the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list and the <AutoText key="coredm::_Global:labelDate_" type="italics"/> list.</Text> 
     2450<Text id="0684">Under <AutoText key="coredm::_Global:labelTitle_"/>, documents from the same series are repeated without any distinguishing features such as date, volume or number. It would be better to group them by series title and display other information within each group. This can be accomplished using the <AutoText text="-bookshelf_type"/> option to the <AutoText text="List"/> classifier, and tuning the classifier's format statement.</Text> 
     2451</Comment> 
     2452<NumberedItem> 
     2453<Text id="0685">In the <AutoText key="glidict::GUI.Design"/> panel, under the <AutoText key="glidict::CDM.GUI.Classifiers"/> section, delete the <AutoText text="List" /> classifier for <AutoText key="metadata::ex.Source"/>. This classifier is not much use.</Text> 
     2454</NumberedItem> 
     2455<NumberedItem> 
     2456<Text id="0685a">Select the classifier for <AutoText text="dc.Title;ex.Title"/> and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. Set <AutoText text="bookshelf_type"/> to <AutoText text="always"/>. This will create a bookshelf for each Title in the collection. Note, setting this option to <AutoText text="duplicate_only"/> will only create a bookshelf when more than one document shares a Title. </Text> 
     2457</NumberedItem> 
     2458<NumberedItem> 
     2459<Text id="0686a"><b>Build</b> the collection, and <b>preview</b> the <AutoText key="coredm::_Global:labelTitle_" type="italics"/> list.</Text> 
    24632460</NumberedItem> 
    24642461<NumberedItem> 
     
    24762473<Text id="0687c">As a consequence of using the <AutoText text="bookshelf_type"/> option of the <AutoText text="List"/> classifier, bookshelf icons appear when titles are browsed. This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf. It works by exploiting the fact that only bookshelf icons define <Format>[numleafdocs]</Format> metadata. For document nodes, Title is not displayed. Instead, Volume, Number and Date information are displayed.</Text> 
    24772474</NumberedItem> 
    2478 <NumberedItem> 
    2479 <Text id="0691">The <AutoText key="coredm::_Global:labelDate_" type="italics"/> list groups documents by date. Greenstone's internal date format is YYYYMMDD, for example 18580601, and this is crucial for the <AutoText text="DateList" /> classifier (CL2) to correctly parse date metadata and generate an ordered date list. However, the date has been made to look nice by adding a <AutoText text="[format:]"/> macro to date metadata in the format statement.</Text> 
    2480 </NumberedItem> 
    2481 <NumberedItem> 
    2482 <Text id="0691a">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Format"/> panel, select the <AutoText text="DateList" /> classifier <i>and set <AutoText key="glidict::CDM.FormatManager.Part"/> to <AutoText text="DateList"/></i>. Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/> to add this format statement to your collection. Replace the last line</Text> 
     2475<Heading> 
     2476<Text id="0690b">Browsing documents by Date.</Text> 
     2477</Heading> 
     2478<NumberedItem> 
     2479<Text id="0686">Back in the <AutoText key="glidict::GUI.Design"/> panel, under the <AutoText key="glidict::CDM.GUI.Classifiers"/> section, add a <AutoText text="DateList" /> classifier, leaving its <AutoText text="metadata"/> option set to <AutoText key="metadata::ex.Date"/>.</Text> 
     2480</NumberedItem> 
     2481<NumberedItem> 
     2482<Text id="0686b"><b>Build</b> the collection, and <b>preview</b> the <AutoText key="coredm::_Global:labelDate_" type="italics"/> list.</Text> 
     2483</NumberedItem> 
     2484<NumberedItem> 
     2485<Text id="0691">The <AutoText key="coredm::_Global:labelDate_" type="italics"/> list groups documents by date. Greenstone's internal date format is YYYYMMDD, for example 18580601, and this is crucial for the <AutoText text="DateList" /> classifier to correctly parse date metadata and generate an ordered date list. However, the date has been made to look nice by adding a <AutoText text="[format:]"/> macro to Date metadata in the format statement.</Text> 
     2486</NumberedItem> 
     2487<NumberedItem> 
     2488<Text id="0691a">In the <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Format"/> panel, select <AutoText key="glidict::CDM.FormatManager.AllFeatures"/>  in the <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and <AutoText text="DateList" /> in the <AutoText key="glidict::CDM.FormatManager.Part"/> list. Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/> to add this format statement to your collection. Replace the last line</Text> 
    24832489<Format> 
    24842490&lt;td&gt;{Or}{[format:dc.Date],[format:exp.Date],[format:ex.Date]}&lt;/td&gt; 
     
    25162522<Text id="0698b">with</Text> 
    25172523<Format> 
    2518 {If}{[NoText] ne '1',&lt;td valign=top&gt;[Text]&lt;/td&gt;} 
     2524{If}{[NoText],,&lt;td valign=top&gt;[Text]&lt;/td&gt;} 
    25192525</Format> 
    25202526</NumberedItem> 
     
    25322538</NumberedItem> 
    25332539<NumberedItem> 
     2540<Text id="0690d-1">Set the display text used for the level drop-down menu by going to the <AutoText key="glidict::CDM.GUI.SearchMetadata"/> section on the <AutoText key="glidict::GUI.Format"/> panel. Set the document level text to "newspaper", and the section level text to "page".</Text> 
     2541</NumberedItem> 
     2542<NumberedItem> 
    25342543<Text id="0690e"><b>Build</b> and <b>preview</b> the collection.</Text> 
    2535 </NumberedItem> 
    2536 <NumberedItem> 
    2537 <Text id="0690d-1">Set the display text used for the level drop-down menu by going to the <AutoText key="glidict::CDM.GUI.SearchMetadata"/> section on the <AutoText key="glidict::GUI.Format"/> panel. Set the document level text to "newspaper", and the section level text to "page".</Text> 
    2538 <Text id="0690d-2">Refresh in your web browser. Compare searching at "newspaper" level with searching at "page" level. A useful search term for this collection is <AutoText text="aroha" type="quoted"/>.</Text> 
    2539 <Text id="0690f">You will notice that when searching for individual pages, a thumbnail of the newspaper image is displayed in the search results.</Text> 
    2540 </NumberedItem> 
    2541 <NumberedItem> 
    2542 <Text id="0690f-2">Let's remove the filename from the display. Go to <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Format"/> panel in the Librarian Interface, choose <AutoText key="glidict::CDM.FormatManager.AllFeatures"/>  in <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and select the <AutoText text="VList"/> format statement from the list of assigned format statements. Remove the following from the last line of the format string:</Text> 
     2544<Text id="0690d-2">Compare searching at "newspaper" level with searching at "page" level. A useful search term for this collection is <AutoText text="aroha" type="quoted"/>.</Text> 
     2545</NumberedItem> 
     2546<Heading> 
     2547<Text id="0690-tidy">Tidying up search results</Text> 
     2548</Heading> 
     2549<Comment> 
     2550<Text id="0690f">You will notice that when searching for individual pages, a thumbnail of the newspaper image is displayed in the search results. For text pages like this, these are not very useful. Let's tell <AutoText text="PagedImagePlugin"/> not to generate thumbnails.</Text> 
     2551</Comment> 
     2552<NumberedItem> 
     2553<Text id="0690f-1">In the <AutoText key="glidict::GUI.Design"/> panel, under the <AutoText key="glidict::CDM.GUI.Plugins"/> section, select <AutoText text="PagedImagePlugin"/> from the <AutoText key="glidict::CDM.PlugInManager.Assigned"/> list and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. Switch on the <AutoText text="create_thumbnail"/> option and set its value to <AutoText text="false"/>.</Text> 
     2554</NumberedItem> 
     2555<NumberedItem> 
     2556<Text id="0690e"><b>Rebuild</b> and <b>preview</b> the collection, doing a search at page level.</Text> 
     2557</NumberedItem> 
     2558<Comment> 
     2559<Text id="0690e-1">Search results at newspaper level display the original filename. Let's remove that also.</Text> 
     2560</Comment> 
     2561<NumberedItem> 
     2562<Text id="0690f-2">Go to <AutoText key="glidict::CDM.GUI.Formats"/> section of the <AutoText key="glidict::GUI.Format"/> panel in the Librarian Interface, choose <AutoText key="glidict::CDM.FormatManager.AllFeatures"/>  in <AutoText key="glidict::CDM.FormatManager.Feature"/> list, and select the <AutoText text="VList"/> format statement from the list of assigned format statements. Remove the following from the last line of the format string:</Text> 
    25432563<Format> 
    25442564{If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;} 
     
    25462566<Text id="0690g"><b>Preview</b> the collection. </Text> 
    25472567</NumberedItem> 
    2548 <NumberedItem> 
    2549 <Text id="0690h">Now you will notice that page level search results only show the Title of the page (the page number), and not the Title of the newspaper. We'll modify the format statement to show the newspaper title as well as the page number. Also, lets add in Volume and Number information too. </Text> 
     2568<Comment> 
     2569<Text id="0690h">You might notice that newspaper level search results only display the newspaper Title, and not any volume information, while page level search results only show the Title of the page (the page number), and not the Title of the newspaper. We'll modify the format statement to show Volume and Number information, and for page results, the newspaper title as well as the page number.</Text> 
     2570</Comment> 
     2571<NumberedItem> 
    25502572<Text id="0690h-1">In the <AutoText key="glidict::CDM.GUI.Formats"/> section, select <AutoText text="Search"/> in <AutoText key="glidict::CDM.FormatManager.Feature"/>, and <AutoText text="VList"/> in <AutoText key="glidict::CDM.FormatManager.Part"/>. Click <AutoText key="glidict::CDM.FormatManager.Add" type="button"/> to add this format to the collection. The previous changes modified <AutoText text="VList"/>, so they will apply to all <AutoText text="VList"/>s that don't have specific format statements. These next changes are made to <AutoText text="SearchVList"/> so will only apply to search results. </Text> 
    25512573<Text id="0690i">The extracted Title for the current section is specified as <Format>[ex.Title]</Format> while the Title for the parent section is <Format>[parent:ex.Title]</Format>. Since the same <AutoText text="SearchVList"/> format statement is used when searching both whole newspapers and newspaper pages, we need to make sure it works in both cases.</Text> 
     
    25562578{If}{[parent:ex.Title],[parent:ex.Title] Volume [parent:ex.Volume] Number [parent:ex.Number]: Page [ex.Title],<br/> 
    25572579[ex.Title] Volume [ex.Volume] Number [ex.Number]}<br/> 
    2558 &lt;br/&gt;&lt;i&gt;({Or}{[parent:ex.Date],[ex.Date],undated})&lt;/i&gt;&lt;/td&gt;<br/> 
     2580&lt;br/&gt;&lt;i&gt;({Or}{[format:parent:ex.Date],[format:ex.Date],undated})&lt;/i&gt;&lt;/td&gt;<br/> 
    25592581&lt;/td&gt; 
    25602582</Format> 
     
    32043226<NumberedItem> 
    32053227<Text id="mgpp-3">Appending <b>#u</b> to a query term will explicitly set the current search to <AutoText key="coredm::_preferences:textnostem_"/>. </Text> 
    3206 <Text id="mgpp-4">Note that using hotkeys will only affect that query term. That is, hotkeys are used per term. For example, if a query expresssion contains more than one term, some terms can have hotkeys and others not, and the hotkeys can be different for different terms. This provides a fine-grained control of the query, whereas changing settings in the <AutoText key="coredm::_Global:linktextPREFERENCES_"/> page will affect the query as a whole.</Text> 
     3228<Text id="mgpp-4">Note that using hotkeys will only affect that query term. That is, hotkeys are used per term. For example, if a query expression contains more than one term, some terms can have hotkeys and others not, and the hotkeys can be different for different terms. This provides a fine-grained control of the query, whereas changing settings in the <AutoText key="coredm::_Global:linktextPREFERENCES_"/> page will affect the query as a whole.</Text> 
    32073229</NumberedItem> 
    32083230<NumberedItem>