Changeset 25799 for documentation

Timestamp:
19.06.2012 22:03:07
Author:
ak19
Message:

Tutorial updates for the final part of the "Formatting the Word and PDF Collection" tutorial and most of the "Enhanced PDF handling" tutorial. This covers only the parts of the tutorials that apply to Linux machines; the Windows-specific parts still need to be tried out.

Files:
1 modified

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r25794 r25799  
    986986<Text id="fw-17a">Displaying multi-valued metadata</Text> 
    987987</Heading> 
     988<MajorVersion number="2"> 
    988989<NumberedItem> 
    989990<Text id="fw-18">Next we modify the document entries in the Creator classifier to display all authors. Back in <AutoText key="glidict::CDM.GUI.Formats"/>, select the <AutoText text="CL2VList"/> format in the list of assigned formats. After <Format>{If}{[ex.Source],&lt;br&gt;</Format> in the format statement, add <Format>[sibling:dc.Creator]</Format>.</Text> 
     
    10121013</Format> 
    10131014</NumberedItem> 
     1015</MajorVersion> 
     1016<MajorVersion number="3"> 
     1017<NumberedItem> 
     1018<Text id="fw-18-3">Next we modify the document entries in the Creator classifier to display all authors. Back in <AutoText key="glidict::CDM.GUI.Formats"/>, select the <AutoText key="coredm::_Global:labelBrowse_"/> format in the list of assigned formats. Edit the format statement for <b>documentNode</b> after the part where it displays the Title metadata, so that it now additionally contains the new line highlighted below. This will display the dc.Creator metadata.</Text> 
     1019<Format> 
     1020    &lt;td valign=&quot;top&quot;&gt;<br /> 
     1021      <Tab n="1"/>&lt;gsf:link type=&quot;document&quot;&gt;<br /> 
     1022        <Tab n="2"/>&lt;gsf:choose-metadata&gt;<br /> 
     1023          <Tab n="3"/>&lt;gsf:metadata name=&quot;dc.Title&quot;/&gt;<br /> 
     1024          <Tab n="3"/>&lt;gsf:metadata name=&quot;ex.dc.Title&quot;/&gt;<br /> 
     1025          <Tab n="3"/>&lt;gsf:metadata name=&quot;Title&quot;/&gt;<br /> 
     1026          <Tab n="3"/>&lt;gsf:default&gt;Untitled&lt;/gsf:default&gt;<br /> 
     1027        <Tab n="2"/>&lt;/gsf:choose-metadata&gt;<br /> 
     1028        <Tab n="2"/>&lt;gsf:switch&gt;<br /> 
     1029          <Tab n="3"/>&lt;gsf:metadata name=&quot;Source&quot;/&gt;<br /> 
     1030          <Tab n="3"/>&lt;gsf:when test=&quot;exists&quot;&gt;<br /> 
     1031            <Tab n="4"/>&lt;br/&gt;<br /> 
     1032            <Tab n="4"/>&lt;i&gt;(&lt;gsf:metadata name=&quot;Source&quot;/&gt;)&lt;/i&gt;<br /> 
     1033          <Tab n="3"/>&lt;/gsf:when&gt;<br /> 
     1034        <Tab n="2"/>&lt;/gsf:switch&gt;<br /> 
     1035      <Tab n="1"/>&lt;/gsf:link&gt;<br /> 
     1036      <Tab n="1"/>&lt;br/&gt;<br /> 
     1037      <Tab n="1"/><highlight>&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot; /&gt;</highlight><br /> 
     1038    &lt;/td&gt;<br /> 
     1039</Format> 
     1040<Text id="fw-21">The format statement as it is above will now display the Greenstone link, the link to the original, then the Title as before. Since it's defined for <b>documentNode</b>s, it will display all the Authors (Creators), and the source document for documents. The additional line <Format>&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot;/&gt;</Format> displays all the Creator metadata for the document, separated by a comma (<AutoText text=", " type="quoted"/>), while <Format>&lt;gsf:metadata name=&quot;dc.Creator&quot; /&gt;</Format> displays only the first author. Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list and make sure that all authors are displayed for documents.</Text>   
     1041</NumberedItem> 
     1042<NumberedItem> 
     1043<Text id="fw-22">You can change the separator between the authors. Modify the format statement, and replace <Format>&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot;/&gt;</Format> with <Format>&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot; separator=&quot;&amp;lt;br/&amp;gt;&quot; /&gt;</Format>. This will add a new line after each author (<Format>&amp;lt;br/&amp;gt;</Format> is the escaped version of <Format>&lt;br/&gt;</Format> and specifies a line break in HTML and XML). Preview the <AutoText key="coredm::_Global:labelCreator_" type="italics"/> list.</Text> 
     1044<Text id="fw-23">If you have done exercise <TutorialRef id="enhanced_word"/>, the collection will have both dc.Creator and ex.Creator metadata. To display both, you can use </Text> 
     1045<Format> 
     1046&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot;/&gt; <br /> 
     1047&lt;gsf:metadata name=&quot;ex.Creator&quot; multiple=&quot;true&quot;/&gt;<br /> 
     1048</Format> 
     1049<Text id="fw-23a">To display dc.Creator if it is present, otherwise display ex.Creator, use</Text> 
     1050<Format> 
     1051&lt;gsf:choose-metadata&gt;<br /> 
     1052  <Tab n="1"/>&lt;gsf:metadata name=&quot;dc.Creator&quot; multiple=&quot;true&quot;/&gt;<br /> 
     1053  <Tab n="1"/>&lt;gsf:metadata name=&quot;ex.Creator&quot; multiple=&quot;true&quot;/&gt;<br />   
     1054&lt;/gsf:choose-metadata&gt;<br /> 
     1055</Format> 
     1056</NumberedItem> 
     1057</MajorVersion> 
    10141058<Heading> 
    10151059<Text id="0321d-1">Advanced multi-valued metadata</Text> 
     
    10731117<Text id="ep-6">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, configure <AutoText text="PDFPlugin"/>. Switch on the <AutoText text="use_sections"/> option. </Text> 
    10741118<Text id="ep-6a">In the <AutoText key="glidict::CDM.GUI.Indexes"/> section, check the <AutoText key="glidict::CDM.LevelManager.Section"/> checkbox to build the indexes on section level as well as document level.</Text> 
    1075 <Text id="ep-7"><b>Build</b> and <b>preview</b> the collection. View the text versions of some of the PDF documents. Note that these are now split into a series of pages, and a "go to page" box is provided. The format is still a bit ugly though, and pdf05-notext.pdf is still not processed.</Text> 
     1119<Text id="ep-7"><b>Build</b> and <b>preview</b> the collection. View the text versions of some of the PDF documents. <MajorVersion number="2">Note that these are now split into a series of pages, and a "go to page" box is provided.  
     1120</MajorVersion> 
     1121<MajorVersion number="3">Note that these are now split into a series of pages, and two means of jumping between pages are provided: on the left, individual pages are listed vertically by page number, and clicking the "plus" box next to a page expands its contents; on the right, a box with a horizontal scroller can be used to scroll to the page you wish to view. 
     1122</MajorVersion> 
     1123The format is still a bit ugly though, and pdf05-notext.pdf is still not processed.</Text> 
    10761124</NumberedItem> 
    10771125<Heading> 
     
    10841132<Text id="ep-13">In the <AutoText key="glidict::CDM.GUI.Plugins"/> section, configure <AutoText text="PDFPlugin"/>. Set the <AutoText text="convert_to"/> option to one of the image types, e.g. <AutoText text="pagedimg_jpg"/>. Switch off the <AutoText text="use_sections"/> option, as it is not used with image conversion. </Text> 
    10851133</NumberedItem> 
    1086 <NumberedItem> 
    1087 <Text id="ep-14"><b>Build</b> the collection and <b>preview</b>. All PDF documents (including pdf05-notext.pdf) have been processed and divided into sections, but each section displays <AutoText key="perlmodules::BasePlugin.dummy_text" type="quoted"/>. For the conversion to images for PDF documents, no text is extracted. </Text> 
     1134<MajorVersion number="3"> 
     1135<NumberedItem> 
     1136<Text id="ep-14-3"><b>Build</b> the collection and <b>preview</b>.  
     1137All PDF documents (including pdf05-notext.pdf) have been processed and divided into sections. 
     1138Images from the document are now displayed instead of the extracted text. Both <Path>pdf05-notext.pdf</Path> and <Path>pdf06-weirdchars.pdf</Path> display nicely now.</Text> 
     1139</NumberedItem> 
     1140</MajorVersion> 
     1141<MajorVersion number="2"> 
     1142<NumberedItem> 
     1143<Text id="ep-14">All PDF documents (including pdf05-notext.pdf) have been processed and divided into sections, but each section displays <AutoText key="perlmodules::BasePlugin.dummy_text" type="quoted"/>. For the conversion to images for PDF documents, no text is extracted.</Text> 
    10881144</NumberedItem> 
    10891145<NumberedItem> 
     
    11071163</Comment> 
    11081164</NumberedItem> 
     1165</MajorVersion> 
    11091166<Heading> 
    11101167<Text id="ep-19">Using <AutoText text="process_exp"/> to control document processing (advanced)</Text> 
     
    11351192<Text id="ep-28">Note that all plugins have the <AutoText text="process_exp"/> option, and this can be used to customize which documents are processed by which plugin.</Text> 
    11361193</NumberedItem> 
     1194<MajorVersion number="2"> 
    11371195<NumberedItem> 
    11381196<Text id="ep-30">Edit the <AutoText text="DocumentText"/> format statement. PDF files processed as HTML will not have images to display, so we need to make sure they get text displayed instead. Change <Format>[srcicon]</Format> to <Format>{If}{[NoText] eq "1",[srcicon],[Text]}</Format>.</Text> 
    11391197</NumberedItem> 
     1198</MajorVersion> 
    11401199<NumberedItem> 
    11411200<Text id="ep-33">Build and preview the collection. All PDF documents should look relatively nice. Try searching this collection. You will be able to search for the PDFs that were converted to HTML (try e.g. <AutoText text="bibliography" type="quoted"/>), but not the ones that were converted to images (try searching for <AutoText text="FAO" type="quoted"/> or <AutoText text="METS" type="quoted"/>).</Text>
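
Note on the new fw-21 to fw-23a text: the sketch below is an illustrative model only, not Greenstone code. It assumes the behaviour the tutorial text describes: `multiple="true"` joins all values with a separator (", " by default), omitting it shows only the first value, and `gsf:choose-metadata` uses the first listed field that has a value. The names `render_metadata` and `choose_metadata` and the sample author values are hypothetical. It also shows why the separator appears as `&amp;lt;br/&amp;gt;` in tutorial_en.xml: it is escaped twice, once for the tutorial source and once for the format statement.

```python
from html import unescape


def render_metadata(doc, name, multiple=False, separator=", "):
    """Hypothetical model of <gsf:metadata name="..." multiple="..." separator="..."/>."""
    values = doc.get(name, [])
    if not values:
        return ""
    if multiple:
        # multiple="true": every value, joined by the separator
        return separator.join(values)
    # without multiple: only the first value
    return values[0]


def choose_metadata(doc, names, multiple=False, separator=", "):
    """Hypothetical model of <gsf:choose-metadata>: the first field with a value wins."""
    for name in names:
        rendered = render_metadata(doc, name, multiple, separator)
        if rendered:
            return rendered
    return ""


# The separator as it appears in tutorial_en.xml (escaped twice).
separator_in_tutorial_source = "&amp;lt;br/&amp;gt;"
# One level of unescaping gives what the reader types into the format statement.
separator_in_format_statement = unescape(separator_in_tutorial_source)
# A second level gives the markup that ends up in the page.
separator_in_html = unescape(separator_in_format_statement)

# Made-up sample document with both assigned and extracted creators.
doc = {
    "dc.Creator": ["Author One", "Author Two"],
    "ex.Creator": ["Extracted Author"],
}

first_only = render_metadata(doc, "dc.Creator")
all_comma = render_metadata(doc, "dc.Creator", multiple=True)
all_br = render_metadata(doc, "dc.Creator", multiple=True, separator=separator_in_html)
preferred = choose_metadata(doc, ["dc.Creator", "ex.Creator"], multiple=True)
fallback = choose_metadata(
    {"ex.Creator": ["Extracted Author"]}, ["dc.Creator", "ex.Creator"], multiple=True
)
```

In this model, a document with only ex.Creator falls through to the extracted value, which is the fallback that fw-23a describes.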