Index: documentation/trunk/tutorials/xml-source/tutorial_en.xml
===================================================================
--- documentation/trunk/tutorials/xml-source/tutorial_en.xml (revision 32907)
+++ documentation/trunk/tutorials/xml-source/tutorial_en.xml (revision 32978)
@@ -1511,9 +1511,7 @@
- Prior to Greenstone 3.09, Greenstone shipped with a plugin called . It was the plugin Greenstone used to convert PDF files to HTML using the third-party software . PDFPlugin allowed users to view PDF documents even if they didn't have the PDF software installed. Unfortunately, sometimes the formatting of the resulting HTML files was not so good. Earlier versions of this tutorial would provide some instruction on extra options to the PDFPlugin for producing a nicer version for display.
+ Prior to Greenstone 3.09, Greenstone shipped with a plugin called . It was the plugin Greenstone used to convert PDF files to HTML using the third-party software . PDFPlugin allowed users to view PDF documents even if they didn't have the PDF software installed. Unfortunately, sometimes the formatting of the resulting HTML files was not so good. Earlier versions of this tutorial would provide some instruction on extra options to the PDFPlugin for producing a nicer version for display. However, the older pdftohtml process could not cope with much newer versions of PDF unless PDFPlugin's pdfbox_conversion option was switched on.
- Furthermore, the older pdftohtml process could not cope with much newer versions of PDF unless PDFPlugin's pdfbox_conversion option was switched on.
-
- Starting with Greenstone 3.09, some older pdf processing functionality has been restructured into , while shifting the pdfbox_conversion option into . PDFv2Plugin further makes use of third-party software , which better copes with newer PDFs (without requiring the pdfbox_conversion option to be activated). PDFv2Plugin comes with several new preconfigured settings to produce output files in html, text, image or image and text formats, that can better reflect the appearance of an input PDF document's pages. Behind the scenes, PDFv2Plugin is configured to use the third-party xpdf-tools or pdfbox software for each output setting.
+ Starting with Greenstone 3.09, some older pdf processing functionality has been restructured into , while shifting the pdfbox_conversion option into . PDFv2Plugin further makes use of third-party software , which better copes with newer PDFs, so the pdfbox_conversion option no longer needs to be activated when dealing with them. PDFv2Plugin comes with several new preconfigured settings to produce output files in html, text, image or image and text formats that can better reflect the appearance of an input PDF document's pages. Behind the scenes, PDFv2Plugin is configured to use the third-party xpdf-tools or pdfbox software for each output setting. From Greenstone 3.09 onwards, PDFv2Plugin is added to a new collection's Document Plugins pipeline by default, in place of the now defunct PDFPlugin. In any instance where you particularly prefer the original PDFPlugin's HTML output for a PDF, you can now use PDFv1Plugin instead, as it still retains this functionality.
@@ -1528,5 +1526,5 @@
-Preview the collection and view the documents. Inspect pdf01 and pdf03 first. There's a table of contents is provided to the right. Clicking on a page in the table of contents will scroll to that page. Another way of navigating can be found to the left, where individual pages are listed vertically by page number and clicking the "plus" box next to a page will expand its contents. The pdfs have been sectionalised into groups of 10 pages, each group further containing a section for each individual page. If your pdf contained 10 or fewer pages, there won't two levels of sectionalising, just one.
+Preview the collection and view the documents. Inspect pdf01 and pdf03 first. A table of contents is provided to the right. Clicking on a page in the table of contents will scroll to that page. Another way of navigating can be found to the left, where individual pages are listed vertically by page number and clicking the "plus" box next to a page will expand its contents. The pdfs have been sectionalised into groups of 10 pages, each group further containing a section for each individual page. If your pdf contained 10 or fewer pages, there won't be two levels of sectionalising, just one. If you visit a given page and try to select and copy the text, you can. These are not entirely images of the pdf's pages (like screenshots of a pdf page), but are HTML pages that combine the images of the background of each pdf page with the actual text of that page superimposed. The latter is what makes the text selectable. If you return to GLI's Design pane and double click on PDFv2Plugin in Document Plugins, then you will see that the convert_to option is set to paged_pretty_html. This is the default PDF convert_to type and produces the kind of sectionalised HTML pages consisting of background images and superimposed text that you see with pdf01 and pdf03.
@@ -1608,5 +1606,5 @@
-Switch to the section of the panel. Add a second instance of by selecting from the drop-down list, and clicking . This plugin will come after the first PDFv2Plugin instance, so we configure it to process PDF documents as sectionalised HTML. Leave the option on , and switch on the option. Click .
+Switch to the section of the panel. Add a second instance of by selecting from the drop-down list, and clicking . This plugin will come after the first PDFv2Plugin instance, so we configure it to process PDF documents as sectionalised HTML by leaving the option on the default, . Click .
@@ -2140,5 +2138,5 @@
-Next we'll add an interactive hierarchical phrase browsing classifier to this collection. Java applet support is being or has been phased out in various browsers and browser versions. As a result the following will not work on Microsoft Edge browsers, among others.
+Next we'll add an interactive hierarchical phrase browsing classifier to this collection. Java applet support is being or has been phased out in various browsers and browser versions. As a result the following will not work on Microsoft Edge and some other browsers.
@@ -2186,9 +2184,9 @@
-Search for the term Mary again, as that is likely to be common in all five index partitions, and check that the numbers of words (not documents) add up.
-
-The text in the drop down box on the search page is based on the filters each partition was built on. To change the text that is displayed, go to the section of the panel. The single filter partitions have sensible default text, but the combined partition does not. Set the for the combined partition to "all". Preview the collection.
+
+Search for the term Mary again, as that is likely to be common in all five index partitions, and check that the numbers of words (not documents) in the search results for the 4 individual indexes add up to the number of words for the all index.
+Controlling the building process
@@ -2198,5 +2196,5 @@
-Switch to the panel. Expand the top panel to be able to see the options for collection building. Scroll to view them all, then select on the left and view the options that are then displayed to the right. Select and set its numeric counter to . (When in GLI's Mode, the option for the import process are located under the of the panel.) Now build.
+Switch to the panel. Expand the top panel to be able to see the options for collection building. Scroll to view them all. Select and set its numeric counter to . (When in GLI's Mode, the options for the import process are located under the of the panel.) Now build.
@@ -2620,5 +2618,5 @@
If your computer is behind a firewall or proxy server, you will need to edit the proxy settings in the Librarian Interface. Click the button. Switch on the checkbox. Enter the proxy server address and port number in the and boxes.
- URLs that start with https, or URLs that resolve to https, will additionally need the and corresponding filled in too, before web pages can be downloaded from there.
+ URLs that start with https, or URLs that resolve to https, will additionally need the and corresponding filled in too, before web pages can be downloaded from there. Websites at https URLs often have a security certificate, but not always. For instance, https://englishhistory.net does not have one. To instruct GLI to nevertheless download pages from https URLs that don't have a security certificate, you'll also need to switch on the checkbox. Once you've finished configuring the proxy settings, click to close the dialog.
@@ -3625,5 +3623,5 @@
- Java applet support is being or has been phased out in various browsers and browser versions. As a result the following step will not work on Microsoft Edge browsers, among others. If you're using such a browser, you may skip this step.
+ Java applet support is being or has been phased out in various browsers and browser versions. As a result the following step will not work on Microsoft Edge and some other browsers. If you're using such a browser, you may skip this step.
@@ -3635,7 +3633,10 @@
-To complete the collection, let's give it a new image for the top left corner of the pagelink from the main page. Go to the section of the panel. Use the browse button of to select the following image:
+To complete the collection, let's give it a new image for the top left corner of the pagelink from the main page. Go to the section of the panel. Use the browse button of the to select the following image:
+sample_files → beatles → advbeat_large → images → tile.jpg
+You can also set an image for the link to the collection's home page here. For this, use the browse button of to select the following image: sample_files → beatles → advbeat_large → images → beatlesmm.png
-Preview the collection, and make sure the new image appears.
+Preview the collection, and make sure the new image appears on the collection's about page.
+Also go to the digital library home page by clicking on the My Greenstone Library link at the top left. On the home page, look through the links to all the collections in your digital library to find the one to the Small Beatles collection. This link should now be denoted by an image bearing the text "BeatlesMultimedia".
@@ -4148,5 +4149,5 @@
In the section of the panel, select in to adjust how search results are displayed., and in . Click to add this format to the collection. The previous changes modified , so they will apply to all s that don't have specific format statements. These next changes are made to so will only apply to search results.
-The extracted Title for the current section is specified as [ex.Title]<gsf:metadata name="Title"/> while the Title for the parent section is [parent:ex.Title]<gsf:metadata name="Title" select="parent"/>. Since the same format statement is used when searching both whole newspapers and newspaper pages, we need to make sure it works in both cases.
+The extracted Title for the current section is specified as [ex.Title]<gsf:metadata name="Title"/> while the Title for the parent section is [parent:ex.Title]<gsf:metadata name="Title" select="parent"/> (if using metadata assigned at the document or root level, this would be <gsf:metadata name="Title" select="root"/>). Since the same format statement is used when searching both whole newspapers and newspaper pages, we need to make sure it works in both cases. Set the format statement to the following text (it can be copied and pasted from the file sample_files → niupepa → formats → search_tweak.txt):
@@ -4178,6 +4179,7 @@
<i> <gsf:choose-metadata>
+ <gsf:metadata name="Date" format="formatDate" /> <gsf:metadata name="Date" select="parent" format="formatDate" />
- <gsf:metadata name="Date" format="formatDate" />
+ <gsf:metadata name="Date" select="root" format="formatDate" /> <gsf:default>undated</gsf:default> </gsf:choose-metadata>
@@ -4518,7 +4520,7 @@
<td>Caption:</td> <td><i><gsf:metadata name="ex.dc.Description"/></i><br/>
- <a><xsl:attribute name="href"><gsf:metadata name="ex.dc.OrigURL"/></xsl:attribute>
+ <gsf:link type="source"> original <gsf:metadata name="ImageWidth"/>x<gsf:metadata name="ImageHeight"/> <gsf:metadata name="ImageType"/> available
- </a>
+ </gsf:link> </td> </tr>