Changeset 37338
- Timestamp: 2023-02-22T10:48:54+13:00
- File: 1 edited
documentation/trunk/tutorials/xml-source/tutorial_en.xml
r36969 r37338 933 933 </Heading> 934 934 <NumberedItem> 935 <Text id="0304">In the Librarian Interface, look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlugin"/>, <AutoText text="TextPlugin"/>, <AutoText text="HTMLPlugin"/>, <AutoText text="EmailPlugin"/>, <AutoText text="PowerPointPlugin"/>, <AutoText text="ExcelPlugin"/>, <AutoText text="ImagePlugin"/>, <AutoText text="ISISPlug"/> and <AutoText text="NULPlugin"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. <AutoText text="GreenstoneXMLPlugin"/> is required for any type of source collection and should not be removed.</Text>935 <Text id="0304">In the Librarian Interface, look at the <AutoText key="glidict::CDM.GUI.Plugins"/> section of the <AutoText key="glidict::GUI.Design"/> panel, by clicking on this in the list to the left. Here you can add, configure or remove plugins to be used in the collection. There is no need to remove any plugins, but it will speed up processing a little. In this case we have only Word, PDF, RTF, and PostScript documents, and can remove the <AutoText text="ZIPPlugin"/>, <AutoText text="TextPlugin"/>, <AutoText text="HTMLPlugin"/>, <AutoText text="EmailPlugin"/>, <AutoText text="PowerPointPlugin"/>, <AutoText text="ExcelPlugin"/>, <AutoText text="ImagePlugin"/>, <AutoText text="ISISPlug"/>, <AutoText text="NULPlugin"/> and <AutoText text="OAIPlugin"/> plugins. To delete a plugin, select it and click <AutoText key="glidict::CDM.PlugInManager.Remove" type="button"/>. 
<AutoText text="GreenstoneXMLPlugin"/> is required for any type of source collection and should not be removed.</Text> 936 936 </NumberedItem> 937 937 <Heading> … … 1090 1090 <NumberedItem> 1091 1091 <Text id="fw-11-3">For collections with documents that undergo a conversion process during importing (e.g. Word, PDF, PowerPoint documents, but not text, HTML documents), the original file is stored in the collection along with the converted version. The default <AutoText text="Browse" /> format statement links to both versions, but the format statement for <AutoText text="Search"/> links only to the converted version of the original file. That is, this format statement:</Text> 1092 <Format><gsf:link type="document"><br/> 1093 <Tab n="1"/><gsf:icon type="document"/><br/> 1094 </gsf:link></Format> 1092 <Format><td><br/> 1093 <Tab n="1"/><gsf:link type="document"><br/> 1094 <Tab n="2"/><gsf:icon type="document"/><br/> 1095 <Tab n="1"/></gsf:link><br/> 1096 </td><br/></Format> 1095 1097 <Text id="fw-12-3">links to the Greenstone HTML version, while</Text> 1096 <Format><gsf:link type="source"><br/> 1097 <Tab n="1"/><gsf:metadata name="srcicon"/><br/> 1098 </gsf:link><br/> 1099 </Format> 1098 <Format><td><br/> 1099 <Tab n="1"/><gsf:link type="source"><br/> 1100 <Tab n="2"/><gsf:metadata name="srcicon"/><br/> 1101 <Tab n="1"/></gsf:link><br/> 1102 </td><br/></Format> 1100 1103 <Text id="fw-12a-3">links to the original.</Text> 1101 1104 <Text id="fw-13-3">Choose <AutoText text="Search"/> in <AutoText key="glidict::CDM.GUI.Formats"/>. Experiment with removing and restoring either of the two links from the format statement, previewing the effect of each change.</Text> … … 1682 1685 <MajorVersion number="3"> 1683 1686 <NumberedItem> 1684 <Text id="fw-24a-3">Next we'll customize the <AutoText text="search"/> format statement to highlight the query terms in a PDF file when it is opened from the search result list. 
This requires Acrobat Reader 7.0 version or higher, and currently only works on a Microsoft Windows platform .</Text>1687 <Text id="fw-24a-3">Next we'll customize the <AutoText text="search"/> format statement to highlight the query terms in a PDF file when it is opened from the search result list. This requires Acrobat Reader 7.0 version or higher, and currently only works on a Microsoft Windows platform and Linux systems.</Text> 1685 1688 </NumberedItem> 1686 1689 <NumberedItem> … … 1701 1704 </td><br/> 1702 1705 <br /> 1706 <highlight> 1703 1707 <td valign="top"><br/> 1704 <highlight>1705 1708 <gsf:switch><br/> 1706 1709 <Tab n="1"/><gsf:metadata name="FileFormat"/><br/> … … 1721 1724 <Tab n="2"/></gsf:link><br/> 1722 1725 <Tab n="1"/></gsf:otherwise><br/> 1723 </gsf:switch>< /highlight><br/>1724 </td>< br/>1726 </gsf:switch><br/> 1727 </td></highlight><br/> 1725 1728 <br /> 1726 1729 <td valign="top"><br/> … … 2029 2032 <Text id="assoc-files-24">Note: When Greenstone encounters a file that matches the provided <Format>associate_ext</Format> value (<Format>pdf</Format> in our case), it sets the metadata value <AutoText text="ex.equivDocIcon"/> for that document to be the macro <i>_iconXXX_</i>, where <i>XXX</i> is whatever the filename extension is (so <AutoText text="_iconpdf_" type="italics"/> in our case). As long as there is an existing macro defined for that combination of the word <i>icon</i> and the filename extension, then a suitable icon will be displayed when the document appears in a VList. 
For <i>pdf</i> the displayed icon will be <img src="../tutorial_files/ipdf.gif"/>.</Text> 2030 2033 </NumberedItem> 2034 <MajorVersion number="3"> 2035 <NumberedItem> 2036 <Text id="assoc-files-25a">Go to Format Features → search and you will see:</Text> 2037 <Format> 2038 <gsf:template match="documentNode"><br/> 2039 <Tab n="1"/><td valign="top"><br/> 2040 <Tab n="2"/><gsf:link type="document"><br/> 2041 <Tab n="3"/><Tab n="3"/><gsf:icon type="document"/><br/> 2042 <Tab n="2"/></gsf:link><br/> 2043 <Tab n="1"/></td><br/> 2044 <Tab n="1"/><td><br/> 2045 <Tab n="2"/><gsf:link type="document"><br/> 2046 <Tab n="3"/><xsl:call-template name="choose-title"/><br/> 2047 <Tab n="2"/></gsf:link><br/> 2048 <Tab n="1"/></td><br/> 2049 </gsf:template><br/> 2050 </Format> 2051 <Text id="assoc-files-25b">The above will only display search results where there is a link to the Greenstone generated HTML version of the original source document, followed by the title of the document.</Text> 2052 <Text id="assoc-files-25c">Change the above to:</Text> 2053 <Format> 2054 <gsf:template match="documentNode"><br/> 2055 <Tab n="1"/><td valign="top"><br/> 2056 <Tab n="2"/><gsf:link type="document"><br/> 2057 <Tab n="3"/><Tab n="3"/><gsf:icon type="document"/><br/> 2058 <Tab n="2"/></gsf:link><br/> 2059 <Tab n="1"/></td><br/> 2060 <br/> 2061 <highlight> 2062 <Tab n="1"/><td valign="top"><br/> 2063 <Tab n="2"/><gsf:link type="source"><br/> 2064 <Tab n="3"/><gsf:choose-metadata><br/> 2065 <Tab n="4"/><gsf:metadata name="thumbicon"/><br/> 2066 <Tab n="4"/><gsf:metadata name="srcicon"/><br/> 2067 <Tab n="3"/></gsf:choose-metadata><br/> 2068 <Tab n="2"/></gsf:link><br/> 2069 <Tab n="1"/></td><br/> 2070 <Tab n="1"/><td valign="top"><br/> 2071 <Tab n="2"/><gsf:metadata name="equivDocLink"/><br/> 2072 <Tab n="2"/><gsf:metadata name="equivDocIcon"/><br/> 2073 <Tab n="2"/><gsf:metadata name="/equivDocLink"/><br/> 2074 <Tab n="1"/></td><br/> 2075 </highlight> 2076 <br/> 2077 <Tab n="1"/><td><br/> 2078 
<Tab n="2"/><gsf:link type="document"><br/> 2079 <Tab n="3"/><xsl:call-template name="choose-title"/><br/> 2080 <Tab n="2"/></gsf:link><br/> 2081 <Tab n="1"/></td><br/> 2082 </gsf:template><br/> 2083 </Format> 2084 <Text id="assoc-files-25d">Now, following the link to Greenstone's HTML document, there is a link to the source document (the doc file) and a link to its equivalent doc (the equivalent PDF file in our example).</Text> 2085 </NumberedItem> 2086 </MajorVersion> 2031 2087 </Content> 2032 2088 </Tutorial> … … 2095 2151 <Text id="0393b">By default, <AutoText text="HTMLPlugin"/> only looks for Title metadata. Configure the plugin so that it looks for the other metadata too. Switch to the <AutoText key="glidict::GUI.Design"/> panel and select the <AutoText key="glidict::CDM.GUI.Plugins"/> section. Select the <AutoText text="plugin HTMLPlugin"/> line and click <AutoText key="glidict::CDM.PlugInManager.Configure" type="button"/>. A popup window appears. Switch on the <AutoText text="metadata_fields"/> option, and set the value to</Text> 2096 2152 <Format> 2097 Title,Author,Page_topic,Content 2098 </Format> 2153 Title,Author,Page_topic,Content</Format> 2099 2154 <Text id="0393b-1">Click <AutoText key="glidict::General.OK" type="button"/>.</Text> 2100 2155 </NumberedItem> … … 2156 2211 <Text id="0444">Now switch to the <AutoText key="glidict::GUI.Create"/> panel, <b>build</b> the collection, and <b>preview</b> it. 
Choose the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelSubject_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Subject.buttonname" /></MajorVersion> link that appears in the navigation bar, and click the bookshelves to navigate around the four-entry hierarchy that you have created.</Text> 2157 2212 </NumberedItem> 2213 <!-- 2158 2214 <Heading> 2159 2215 <Text id="0457">Adding a hierarchical phrase browser (PHIND)</Text> … … 2174 2230 </Comment> 2175 2231 </NumberedItem> 2232 --> 2176 2233 <Heading> 2177 2234 <Text id="0446">Partitioning the full-text index based on metadata values</Text> … … 4028 4085 <Text id="0690a"><MajorVersion number="2">Refresh in the web browser to view</MajorVersion><MajorVersion number="3"><b>Preview</b></MajorVersion> the new <MajorVersion number="2"><AutoText key="coredm::_Global:labelTitle_"/></MajorVersion><MajorVersion number="3"><AutoText key="gs3::metadata_names::Title.buttonname" /></MajorVersion> list.</Text> 4029 4086 <Text id="0687c">As a consequence of using the <AutoText text="bookshelf_type"/> option of the <AutoText text="List"/> classifier, bookshelf icons appear when titles are browsed. This revised format statement has the effect of specifying in brackets how many items are contained within a bookshelf<MajorVersion number="3"> for classifier nodes</MajorVersion>. <MajorVersion number="2">It works by exploiting the fact that only bookshelf icons define <Format>[numleafdocs]</Format> metadata.</MajorVersion> For document nodes, Title is not displayed. Instead, Volume, Number and Date information are displayed.</Text> 4030 <Text id="0687d"><MajorVersion number="2">You may notice that the <AutoText text="Titles"/> browser shows the volume numbers in inverse order. 
To correct this, in</MajorVersion><MajorVersion number="3">In</MajorVersion> the <AutoText key="glidict::GUI.Design"/> Pane, under <AutoText key="glidict::CDM.GUI.Classifiers"/>, configure the <AutoText text="titles" /><AutoText text="List" /> classifier. Tick <AutoText text="sort_leaf_nodes_using"/> and set the metadata to <Format>ex.Volume|ex.Number</Format>. Rebuilding now will ensure the <i>ex.Volume</i> Number of each newspaper are listed in numeric order. This has the effect of also sorting the <i>ex.Number</i> value for each <i>ex.Volume</i>.</Text>4087 <Text id="0687d"><MajorVersion number="2">You may notice that the <AutoText text="Titles"/> browser shows the volume numbers in inverse order. To correct this, in</MajorVersion><MajorVersion number="3">In</MajorVersion> the <AutoText key="glidict::GUI.Design"/> Pane, under <AutoText key="glidict::CDM.GUI.Classifiers"/>, configure the <AutoText text="titles"/> <AutoText text="List" /> classifier. Tick <AutoText text="sort_leaf_nodes_using"/> and set the metadata to <Format>ex.Volume|ex.Number</Format>. Rebuilding now will ensure the <i>ex.Volume</i> Number of each newspaper are listed in numeric order. 
This has the effect of also sorting the <i>ex.Number</i> value for each <i>ex.Volume</i>.</Text> 4031 4088 </NumberedItem> 4032 4089 <Heading> … … 4909 4966 <Text id="ucp-22">Open a DOS prompt on Windows or a terminal on Mac/Linux and experiment to see what it takes to convert your Greenstone installation's <Format>web/sites/localsite/collect/DjVuColl/superhero.djvu</Format> file.</Text> 4910 4967 <Text id="ucp-22a">You may have to invoke <Format>djvutxt</Format> using its full filepath, in which case on Windows the command would look like:</Text> 4911 <Format>C:\PATH\TO\YOUR\djvutxt C:\PATH\TO\YOUR\GS\web\sites\localsite\collect\DjVuColl\ superhero.djvu C:\PATH\TO\YOUR\GS\superhero.txt</Format>4968 <Format>C:\PATH\TO\YOUR\djvutxt C:\PATH\TO\YOUR\GS\web\sites\localsite\collect\DjVuColl\import\superhero.djvu C:\PATH\TO\YOUR\GS\superhero.txt</Format> 4912 4969 <Text id="ucp-22b">while on Unix systems the command would look like:</Text> 4913 <Format>/PATH/TO/YOUR/djvutxt /PATH/TO/YOUR/GS/web/sites/localsite/collect/DjVuColl/superhero.djvu /PATH/TO/YOUR/GS/superhero.txt</Format> 4970 <Format>/PATH/TO/YOUR/djvutxt /PATH/TO/YOUR/GS/web/sites/localsite/collect/DjVuColl/import/superhero.djvu /PATH/TO/YOUR/GS/superhero.txt</Format> 4971 <Text id="ucp-22c">If you compiled up djvulibre from source, djvutxt will be in <Format>/PATH/TO/YOUR/djvulibre/bin/djvutxt</Format>.</Text> 4914 4972 <Text id="ucp-23">Once you have the command working, inspect the output file. You should see mostly legible text in it. Only when you've been able to successfully complete this step should you proceed to the next steps.</Text> 4915 4973 </NumberedItem> … … 4938 4996 </NumberedItem> 4939 4997 <NumberedItem><Text id="ucp-39">Greenstone doesn't have an icon for DjVu documents, since it doesn't know about the format. 
If you Google for the djvu icon, you'd probably find the <Link url="https://en.wikipedia.org/wiki/DjVu">Wikipedia page for it</Link>.</Text> 4940 <Text id="ucp-40">Save one of their DjVu icon images. Then open the image in Windows Paint or GIMP or another image editor, and use the application's scaling feature to scale the image's height or the width (whichever is greater) to anywhere between 26 and 32 pixels. Save the scaled image as a GIF file with the name "<Format>idjvu.gif</Format>", storing it in your Greenstone installation's <Format>web/interfaces/default/images</Format> folder. </Text>4998 <Text id="ucp-40">Save one of their DjVu icon images. Then open the image in Windows Paint or GIMP or another image editor, and use the application's scaling feature to scale the image's height or the width (whichever is greater) to anywhere between 26 and 32 pixels. Save the scaled image as a GIF file with the name "<Format>idjvu.gif</Format>", storing it in your Greenstone installation's <Format>web/interfaces/default/images</Format> folder. You can also use free online image resizing websites to carry out this step.</Text> 4941 4999 </NumberedItem> 4942 5000 <NumberedItem><Text id="ucp-41">Greenstone knows nothing about the <Format>icondjvu</Format> macro we defined as the value for UnknownConverterPlugin's <Format>srcicon</Format> field, so we have to teach Greenstone about this new macro. Use a text editor to open your Greenstone 3's <Format>web/sites/localsite/siteConfig.xml</Format> file.</Text> … … 4948 5006 <Text id="ucp-45">The above has now associated the icon image we want appearing for the djvu document with the macro we defined for the srcicon field in UnknownConverterPlugin's configuration.</Text> 4949 5007 </NumberedItem> 4950 <NumberedItem><Text id="ucp-45">Restart GLI, which will restart the Greenstone server, reloading the <Format>siteConfig.xml</Format> you have just edited. Rebuild the DjVu Collection again and preview it. 
This time, when you browse and search the collection, you should see the djvu icon appearing in place of the unknown icon for your DjVu document.</Text> 4951 </NumberedItem> 4952 <NumberedItem><Text id="ucp-45">Having designed your collection to handle DjVu documents, you can now add any other documents, including more DjVu documents. Greenstone should now be able to index the text content of DjVu documents in the collection to make them searchable, in all instances where text can be successfully extracted from them by <Format>djvutxt</Format>.</Text> 5008 <NumberedItem><Text id="ucp-46">Restart GLI, which will restart the Greenstone server, reloading the <Format>siteConfig.xml</Format> you have just edited. Rebuild the DjVu Collection again and preview it. This time, when you browse the collection, you should see the djvu icon appearing in place of the unknown icon for your DjVu document.</Text> 5009 </NumberedItem> 5010 <NumberedItem><Text id="ucp-47">Having designed your collection to handle DjVu documents, you can now add any other documents, including more DjVu documents. 
Greenstone should now be able to index the text content of DjVu documents in the collection to make them searchable, in all instances where text can be successfully extracted from them by <Format>djvutxt</Format>.</Text> 5011 <Text id="ucp-47a">Make the search format statement look like below, then try searching:</Text> 5012 <Format> 5013 <gsf:template match="documentNode"><br/> 5014 <Tab n="1"/><td valign="top"><br/> 5015 <Tab n="2"/><gsf:link type="document"><br/> 5016 <Tab n="3"/><gsf:icon type="document"/><br/> 5017 <Tab n="2"/></gsf:link><br/> 5018 <Tab n="1"/></td><br/> 5019 <Tab n="1"/><td valign="top"><br/> 5020 <Tab n="2"/><gsf:link type="source"><br/> 5021 <Tab n="3"/><gsf:choose-metadata><br/> 5022 <Tab n="4"/><gsf:metadata name="thumbicon"/><br/> 5023 <Tab n="4"/><gsf:metadata name="srcicon"/><br/> 5024 <Tab n="3"/></gsf:choose-metadata><br/> 5025 <Tab n="2"/></gsf:link><br/> 5026 <Tab n="1"/></td><br/> 5027 <Tab n="1"/><td><br/> 5028 <Tab n="2"/><gsf:link type="document"><br/> 5029 <Tab n="3"/><xsl:call-template name="choose-title"/><br/> 5030 <Tab n="2"/></gsf:link><br/> 5031 <Tab n="2"/><gsf:switch><br/> 5032 <Tab n="3"/><gsf:metadata name="equivDocLink"/><br/> 5033 <Tab n="3"/><gsf:when test="exists"><br/> 5034 <Tab n="4"/>Also available as: <gsf:metadata name="equivDocLink"/><gsf:metadata name="equivDocIcon"/><gsf:metadata name="/equivDocLink"/><br/> 5035 <Tab n="3"/></gsf:when><br/> 5036 <Tab n="2"/></gsf:switch><br/> 5037 <Tab n="1"/></td><br/> 5038 </gsf:template><br/> 5039 </Format> 4953 5040 </NumberedItem> 4954 5041 </Content> … … 5815 5902 </Comment> 5816 5903 <NumberedItem> 5817 <Text id="themes-21">Return to the <AutoText text="TutorialTheme"/> folder (in <Path>Greenstone3 → web → interfaces → default → style → themes</Path>). Open <AutoText text="index.html" type="italics"/> in a web browser. Scroll down so that the Datepicker calendar is completely visible on your screen, and take a screen shot. 
(On Windows, this is done by pressing the print screen - <AutoText text="PrtScn"/> - button.)</Text>5904 <Text id="themes-21">Return to the <AutoText text="TutorialTheme"/> folder (in <Path>Greenstone3 → web → interfaces → default → style → themes</Path>). Open <AutoText text="index.html" type="italics"/> in a web browser. Scroll down so that the Datepicker calendar is completely visible on your screen. Take a screenshot: either by using your browser's screenshot feature, first selecting the outline of the Datepicker image, or else use your PC's ability to take the screen shot. (On Windows, you can do this by pressing the print screen - <AutoText text="PrtScn"/> - button.)</Text> 5818 5905 </NumberedItem> 5819 5906 <NumberedItem>