Changeset 37338


Timestamp:
2023-02-22T10:48:54+13:00
Author:
anupama
Message:

Further updates to the GS3 tutorials

File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r36969 r37338  
    933933</Heading>
    934934<NumberedItem>
    935 <Text id="0304">In the Librarian Interface, look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlugin"/>, <AutoText text="TextPlugin"/>, <AutoText text="HTMLPlugin"/>, <AutoText text="EmailPlugin"/>, <AutoText text="PowerPointPlugin"/>, <AutoText text="ExcelPlugin"/>, <AutoText text="ImagePlugin"/>, <AutoText text="ISISPlug"/> and <AutoText text="NULPlugin"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. <AutoText text="GreenstoneXMLPlugin"/> is required for any type of source collection and should not be removed.</Text>
     935<Text id="0304">In the Librarian Interface, look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlugin"/>, <AutoText text="TextPlugin"/>, <AutoText text="HTMLPlugin"/>, <AutoText text="EmailPlugin"/>, <AutoText text="PowerPointPlugin"/>, <AutoText text="ExcelPlugin"/>, <AutoText text="ImagePlugin"/>, <AutoText text="ISISPlug"/>, <AutoText text="NULPlugin"/> and <AutoText text="OAIPlugin"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. <AutoText text="GreenstoneXMLPlugin"/> is required for any type of source collection and should not be removed.</Text>
    936936</NumberedItem>
    937937<Heading>
     
    10901090<NumberedItem>
    10911091<Text id="fw-11-3">For collections with documents that undergo a conversion process during importing (e.g. Word, PDF and PowerPoint documents, but not text or HTML documents), the original file is stored in the collection along with the converted version. The default <AutoText text="Browse" /> format statement links to both versions, but the format statement for <AutoText text="Search"/> links only to the converted version of the original file. That is, this format statement:</Text>
    1092 <Format>&lt;gsf:link type=&quot;document&quot;&gt;<br/>
    1093         <Tab n="1"/>&lt;gsf:icon type=&quot;document&quot;/&gt;<br/>
    1094       &lt;/gsf:link&gt;</Format>
     1092<Format>&lt;td&gt;<br/>
     1093  <Tab n="1"/>&lt;gsf:link type=&quot;document&quot;&gt;<br/>
     1094    <Tab n="2"/>&lt;gsf:icon type=&quot;document&quot;/&gt;<br/>
     1095  <Tab n="1"/>&lt;/gsf:link&gt;<br/>
     1096&lt;/td&gt;<br/></Format>
    10951097<Text id="fw-12-3">links to the Greenstone HTML version, while</Text>
    1096 <Format>&lt;gsf:link type=&quot;source&quot;&gt;<br/>   
    1097           <Tab n="1"/>&lt;gsf:metadata name=&quot;srcicon&quot;/&gt;<br/>
    1098       &lt;/gsf:link&gt;<br/>
    1099 </Format>
     1098<Format>&lt;td&gt;<br/>
     1099  <Tab n="1"/>&lt;gsf:link type=&quot;source&quot;&gt;<br/>   
     1100          <Tab n="2"/>&lt;gsf:metadata name=&quot;srcicon&quot;/&gt;<br/>
     1101      <Tab n="1"/>&lt;/gsf:link&gt;<br/>
     1102&lt;/td&gt;<br/></Format>
    11001103<Text id="fw-12a-3">links to the original.</Text>
    11011104<Text id="fw-13-3">Choose <AutoText text="Search"/> in <AutoText key="glidict::CDM.GUI.Formats"/>. Experiment with removing and restoring either of the two links from the format statement, previewing the effect of each change.</Text>
     
    16821685<MajorVersion number="3">
    16831686<NumberedItem>
    1684 <Text id="fw-24a-3">Next we'll customize the <AutoText text="search"/> format statement to highlight the query terms in a PDF file when it is opened from the search result list. This requires Acrobat Reader 7.0 version or higher, and currently only works on a Microsoft Windows platform.</Text>
     1687<Text id="fw-24a-3">Next we'll customize the <AutoText text="search"/> format statement to highlight the query terms in a PDF file when it is opened from the search result list. This requires Acrobat Reader version 7.0 or higher, and currently only works on Microsoft Windows and Linux.</Text>
    16851688</NumberedItem>
    16861689<NumberedItem>
     
    17011704  &lt;&#47;td&gt;<br/>
    17021705  <br />
     1706  <highlight>
    17031707  &lt;td valign=&quot;top&quot;&gt;<br/>
    1704   <highlight>
    17051708  &lt;gsf:switch&gt;<br/>
    17061709    <Tab n="1"/>&lt;gsf:metadata name=&quot;FileFormat&quot;/&gt;<br/>
     
    17211724        <Tab n="2"/>&lt;/gsf:link&gt;<br/>
    17221725    <Tab n="1"/>&lt;/gsf:otherwise&gt;<br/>
    1723   &lt;/gsf:switch&gt;</highlight><br/> 
    1724   &lt;&#47;td&gt;<br/>
     1726  &lt;/gsf:switch&gt;<br/> 
     1727  &lt;&#47;td&gt;</highlight><br/>
    17251728  <br />
    17261729&lt;td valign=&quot;top&quot;&gt;<br/>
     
    20292032<Text id="assoc-files-24">Note: When Greenstone encounters a file that matches the provided <Format>associate_ext</Format> value (<Format>pdf</Format> in our case), it sets the metadata value <AutoText text="ex.equivDocIcon"/> for that document to be the macro <i>_iconXXX_</i>, where <i>XXX</i> is whatever the filename extension is (so <AutoText text="_iconpdf_" type="italics"/> in our case). As long as there is an existing macro defined for that combination of the word <i>icon</i> and the filename extension, then a suitable icon will be displayed when the document appears in a VList. For <i>pdf</i> the displayed icon will be <img src="../tutorial_files/ipdf.gif"/>.</Text>
    20302033</NumberedItem>
     2034<MajorVersion number="3">
     2035<NumberedItem>
     2036<Text id="assoc-files-25a">Go to Format Features &rarr; search and you will see:</Text>
     2037<Format>
     2038  &lt;gsf:template match="documentNode"&gt;<br/>
     2039    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     2040      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     2041        <Tab n="3"/>&lt;gsf:icon type="document"/&gt;<br/>
     2042      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     2043    <Tab n="1"/>&lt;/td&gt;<br/>
     2044    <Tab n="1"/>&lt;td&gt;<br/>
     2045      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     2046        <Tab n="3"/>&lt;xsl:call-template name="choose-title"/&gt;<br/>
     2047      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     2048    <Tab n="1"/>&lt;/td&gt;<br/>
     2049  &lt;/gsf:template&gt;<br/>
     2050</Format>
     2051<Text id="assoc-files-25b">With the above format statement, each search result displays only a link to the Greenstone-generated HTML version of the original source document, followed by the title of the document.</Text>
     2052<Text id="assoc-files-25c">Change the above to:</Text>
     2053<Format>
     2054  &lt;gsf:template match="documentNode"&gt;<br/>
     2055    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     2056      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     2057        <Tab n="3"/>&lt;gsf:icon type="document"/&gt;<br/>
     2058      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     2059    <Tab n="1"/>&lt;/td&gt;<br/>
     2060    <br/>
     2061    <highlight>
     2062    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     2063       <Tab n="2"/>&lt;gsf:link type="source"&gt;<br/>
     2064         <Tab n="3"/>&lt;gsf:choose-metadata&gt;<br/>
     2065           <Tab n="4"/>&lt;gsf:metadata name="thumbicon"/&gt;<br/>
     2066           <Tab n="4"/>&lt;gsf:metadata name="srcicon"/&gt;<br/>
     2067         <Tab n="3"/>&lt;/gsf:choose-metadata&gt;<br/>
     2068       <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     2069    <Tab n="1"/>&lt;/td&gt;<br/>
     2070    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     2071       <Tab n="2"/>&lt;gsf:metadata name="equivDocLink"/&gt;<br/>
     2072       <Tab n="2"/>&lt;gsf:metadata name="equivDocIcon"/&gt;<br/>
     2073       <Tab n="2"/>&lt;gsf:metadata name="/equivDocLink"/&gt;<br/>
     2074    <Tab n="1"/>&lt;/td&gt;<br/>
     2075    </highlight>
     2076    <br/>
     2077    <Tab n="1"/>&lt;td&gt;<br/>
     2078      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     2079        <Tab n="3"/>&lt;xsl:call-template name="choose-title"/&gt;<br/>
     2080      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     2081    <Tab n="1"/>&lt;/td&gt;<br/>
     2082  &lt;/gsf:template&gt;<br/>
     2083</Format>
     2084<Text id="assoc-files-25d">Now, after the link to Greenstone's HTML version, each search result also shows a link to the source document (the doc file) and a link to its equivalent document (the equivalent PDF file in our example).</Text>
     2085</NumberedItem>
     2086</MajorVersion>
    20312087</Content>
    20322088</Tutorial>
     
    20952151<Text id="0393b">By default, <AutoText text="HTMLPlugin"/> only looks for Title metadata. Configure the plugin so that it looks for the other metadata too. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Select the <AutoText text="plugin HTMLPlugin"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Switch on the <AutoText text="metadata_fields"/> option, and set the value to</Text>
    20962152<Format>
    2097 Title,Author,Page_topic,Content
    2098 </Format>
     2153Title,Author,Page_topic,Content</Format>
    20992154<Text id="0393b-1">Click <AutoText key="glidict::General.OK" type="button"/>.</Text>
    21002155</NumberedItem>
     
    21562211<Text id="0444">Now switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. Choose the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelSubject_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Subject.buttonname" /></MajorVersion> link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.</Text>
    21572212</NumberedItem>
     2213<!--
    21582214<Heading>
    21592215<Text id="0457">Adding a hierarchical phrase browser (PHIND)</Text>
     
    21742230</Comment>
    21752231</NumberedItem>
     2232-->
    21762233<Heading>
    21772234<Text id="0446">Partitioning the full-text index based on metadata values</Text>
     
    40284085<Text id="0690a"><MajorVersion number="2">Refresh in the web browser to view</MajorVersion><MajorVersion number="3"><b>Preview</b></MajorVersion> the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelTitle_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Title.buttonname" /></MajorVersion> list.</Text>
    40294086<Text id="0687c">As a consequence of using the <AutoText text="bookshelf_type"/> option of the <AutoText text="List"/> classifier, bookshelf icons appear when titles are browsed. This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf<MajorVersion number="3"> for classifier nodes</MajorVersion>. <MajorVersion number="2">It works by exploiting the fact that only bookshelf icons define <Format>[numleafdocs]</Format> metadata.</MajorVersion> For document nodes, Title is not displayed. Instead, Volume, Number and Date information are displayed.</Text>
    4030 <Text id="0687d"><MajorVersion number="2">You may notice that the <AutoText text="Titles"/> browser shows the volume numbers in inverse order. To correct this, in</MajorVersion><MajorVersion number="3">In</MajorVersion> the <AutoText key="glidict::GUI.Design"/> Pane, under <AutoText key="glidict::CDM.GUI.Classifiers"/>, configure the <AutoText text="titles" /> <AutoText text="List" /> classifier. Tick <AutoText text="sort_leaf_nodes_using"/> and set the metadata to <Format>ex.Volume|ex.Number</Format>. Rebuilding now will ensure the <i>ex.Volume</i> Number of each newspaper are listed in numeric order. This has the effect of also sorting the <i>ex.Number</i> value for each <i>ex.Volume</i>.</Text>
     4087<Text id="0687d"><MajorVersion number="2">You may notice that the <AutoText text="Titles"/> browser shows the volume numbers in inverse order. To correct this, in</MajorVersion><MajorVersion number="3">In</MajorVersion> the <AutoText key="glidict::GUI.Design"/> panel, under <AutoText key="glidict::CDM.GUI.Classifiers"/>, configure the <AutoText text="titles"/>&nbsp;<AutoText text="List" /> classifier. Tick <AutoText text="sort_leaf_nodes_using"/> and set the metadata to <Format>ex.Volume|ex.Number</Format>. Rebuilding now will ensure that the newspapers are listed in numeric order of <i>ex.Volume</i> and, within each <i>ex.Volume</i>, in numeric order of <i>ex.Number</i>.</Text>
    40314088</NumberedItem>
    40324089<Heading>
     
    49094966<Text id="ucp-22">Open a DOS prompt on Windows or a terminal on Mac/Linux and experiment to see what it takes to convert your Greenstone installation's <Format>web/sites/localsite/collect/DjVuColl/import/superhero.djvu</Format> file to text with <Format>djvutxt</Format>.</Text>
    49104967<Text id="ucp-22a">You may have to invoke <Format>djvutxt</Format> using its full filepath, in which case on Windows the command would look like:</Text>
    4911 <Format>C:\PATH\TO\YOUR\djvutxt C:\PATH\TO\YOUR\GS\web\sites\localsite\collect\DjVuColl\superhero.djvu C:\PATH\TO\YOUR\GS\superhero.txt</Format>
     4968<Format>C:\PATH\TO\YOUR\djvutxt C:\PATH\TO\YOUR\GS\web\sites\localsite\collect\DjVuColl\import\superhero.djvu C:\PATH\TO\YOUR\GS\superhero.txt</Format>
    49124969<Text id="ucp-22b">while on Unix systems the command would look like:</Text>
    4913 <Format>/PATH/TO/YOUR/djvutxt /PATH/TO/YOUR/GS/web/sites/localsite/collect/DjVuColl/superhero.djvu /PATH/TO/YOUR/GS/superhero.txt</Format>
     4970<Format>/PATH/TO/YOUR/djvutxt /PATH/TO/YOUR/GS/web/sites/localsite/collect/DjVuColl/import/superhero.djvu /PATH/TO/YOUR/GS/superhero.txt</Format>
     4971<Text id="ucp-22c">If you compiled djvulibre from source, the djvutxt executable will be at <Format>/PATH/TO/YOUR/djvulibre/bin/djvutxt</Format>.</Text>
    49144972<Text id="ucp-23">Once you have the command working, inspect the output file. You should see mostly legible text in it. Only when you've been able to successfully complete this step should you proceed to the next steps.</Text>
    49154973</NumberedItem>
     
    49384996</NumberedItem>
    49394997<NumberedItem><Text id="ucp-39">Greenstone doesn't have an icon for DjVu documents, since it doesn't know about the format. If you Google for the djvu icon, you'd probably find the <Link url="https://en.wikipedia.org/wiki/DjVu">Wikipedia page for it</Link>.</Text>
    4940 <Text id="ucp-40">Save one of their DjVu icon images. Then open the image in Windows Paint or GIMP or another image editor, and use the application's scaling feature to scale the image's height or the width (whichever is greater) to anywhere between 26 and 32 pixels. Save the scaled image as a GIF file with the name "<Format>idjvu.gif</Format>", storing it in your Greenstone installation's <Format>web/interfaces/default/images</Format> folder.</Text>
     4998<Text id="ucp-40">Save one of their DjVu icon images. Then open the image in Windows Paint, GIMP or another image editor, and use the application's scaling feature to scale the image's height or width (whichever is greater) to between 26 and 32 pixels. Save the scaled image as a GIF file with the name "<Format>idjvu.gif</Format>", storing it in your Greenstone installation's <Format>web/interfaces/default/images</Format> folder. You can also use a free online image-resizing website to carry out this step.</Text>
    49414999</NumberedItem>
    49425000<NumberedItem><Text id="ucp-41">Greenstone knows nothing about the <Format>icondjvu</Format> macro we defined as the value for UnknownConverterPlugin's <Format>srcicon</Format> field, so we have to teach Greenstone about this new macro. Use a text editor to open your Greenstone 3's <Format>web/sites/localsite/siteConfig.xml</Format> file.</Text>
     
    49485006<Text id="ucp-45">The above has now associated the icon image we want appearing for the djvu document with the macro we defined for the srcicon field in UnknownConverterPlugin's configuration.</Text>
    49495007</NumberedItem>
    4950 <NumberedItem><Text id="ucp-45">Restart GLI, which will restart the Greenstone server, reloading the <Format>siteConfig.xml</Format> you have just edited. Rebuild the DjVu Collection again and preview it. This time, when you browse and search the collection, you should see the djvu icon appearing in place of the unknown icon for your DjVu document.</Text>
    4951 </NumberedItem>
    4952 <NumberedItem><Text id="ucp-45">Having designed your collection to handle DjVu documents, you can now add any other documents, including more DjVu documents. Greenstone should now be able to index the text content of DjVu documents in the collection to make them searchable, in all instances where text can be successfully extracted from them by <Format>djvutxt</Format>.</Text>
     5008<NumberedItem><Text id="ucp-46">Restart GLI, which will restart the Greenstone server, reloading the <Format>siteConfig.xml</Format> you have just edited. Rebuild the DjVu Collection again and preview it. This time, when you browse the collection, you should see the djvu icon appearing in place of the unknown icon for your DjVu document.</Text>
     5009</NumberedItem>
     5010<NumberedItem><Text id="ucp-47">Having designed your collection to handle DjVu documents, you can now add any other documents, including more DjVu documents. Greenstone should now be able to index the text content of DjVu documents in the collection to make them searchable, in all instances where text can be successfully extracted from them by <Format>djvutxt</Format>.</Text>
     5011<Text id="ucp-47a">Change the search format statement to the following, then try searching:</Text>
     5012<Format>
     5013  &lt;gsf:template match="documentNode"&gt;<br/>
     5014    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     5015      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     5016        <Tab n="3"/>&lt;gsf:icon type="document"/&gt;<br/>
     5017      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     5018    <Tab n="1"/>&lt;/td&gt;<br/>
     5019    <Tab n="1"/>&lt;td valign="top"&gt;<br/>
     5020      <Tab n="2"/>&lt;gsf:link type="source"&gt;<br/>
     5021        <Tab n="3"/>&lt;gsf:choose-metadata&gt;<br/>
     5022          <Tab n="4"/>&lt;gsf:metadata name="thumbicon"/&gt;<br/>
     5023          <Tab n="4"/>&lt;gsf:metadata name="srcicon"/&gt;<br/>
     5024        <Tab n="3"/>&lt;/gsf:choose-metadata&gt;<br/>
     5025      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     5026    <Tab n="1"/>&lt;/td&gt;<br/>
     5027    <Tab n="1"/>&lt;td&gt;<br/>
     5028      <Tab n="2"/>&lt;gsf:link type="document"&gt;<br/>
     5029        <Tab n="3"/>&lt;xsl:call-template name="choose-title"/&gt;<br/>
     5030      <Tab n="2"/>&lt;/gsf:link&gt;<br/>
     5031      <Tab n="2"/>&lt;gsf:switch&gt;<br/>
     5032        <Tab n="3"/>&lt;gsf:metadata name="equivDocLink"/&gt;<br/>
     5033        <Tab n="3"/>&lt;gsf:when test="exists"&gt;<br/>
     5034        <Tab n="4"/>Also available as: &lt;gsf:metadata name="equivDocLink"/&gt;&lt;gsf:metadata name="equivDocIcon"/&gt;&lt;gsf:metadata name="/equivDocLink"/&gt;<br/>
     5035    <Tab n="3"/>&lt;/gsf:when&gt;<br/>
     5036      <Tab n="2"/>&lt;/gsf:switch&gt;<br/>
     5037    <Tab n="1"/>&lt;/td&gt;<br/>
     5038  &lt;/gsf:template&gt;<br/>
     5039</Format>
    49535040</NumberedItem>
    49545041</Content>
     
    58155902</Comment>
    58165903<NumberedItem>
    5817 <Text id="themes-21">Return to the <AutoText text="TutorialTheme"/> folder (in <Path>Greenstone3 &rarr; web &rarr; interfaces &rarr; default &rarr;  style &rarr; themes</Path>). Open <AutoText text="index.html" type="italics"/> in a web browser. Scroll down so that the Datepicker calendar is completely visible on your screen, and take a screen shot. (On Windows, this is done by pressing the print screen - <AutoText text="PrtScn"/> - button.) </Text>
     5904<Text id="themes-21">Return to the <AutoText text="TutorialTheme"/> folder (in <Path>Greenstone3 &rarr; web &rarr; interfaces &rarr; default &rarr; style &rarr; themes</Path>). Open <AutoText text="index.html" type="italics"/> in a web browser. Scroll down so that the Datepicker calendar is completely visible on your screen. Take a screenshot, either with your browser's screenshot feature (first selecting the outline of the Datepicker calendar) or with your computer's own screenshot facility. (On Windows, you can do this by pressing the print screen - <AutoText text="PrtScn"/> - button.)</Text>
    58185905</NumberedItem>
    58195906<NumberedItem>