Timestamp:
2023-05-26T18:43:41+12:00
Author:
anupama
Message:
  1. The Images GPS collection works again, with Google Maps displaying, albeit darkened and marked as being in developer mode. Dr Bainbridge has devised instructions, now tested, for getting Google Maps to display with an API key for localhost (this requires a Google account, an API key set up as described, and a credit card linked to the Google account).
  2. Additional configuration for the Paged Images collection, for ease of viewing all the documents when browsing the titles classifier.
  3. Spelling corrections.
File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

--- documentation/trunk/tutorials/xml-source/tutorial_en.xml (r37748)
+++ documentation/trunk/tutorials/xml-source/tutorial_en.xml (r37751)
@@ -785,4 +785,9 @@
 <Comment>
 <Text id="images-gps-0">In this tutorial, we'll be looking at building a collection that takes advantage of the GPS metadata embedded in image files. Using this data, we can plot the images on a map based on where they were taken.</Text>
+</Comment>
+<Comment>
+  <Text id="images-gps-0a">In doing this tutorial, if the maps are not available for viewing at all, you will need to have a Google Maps API key. This is done through https://console.cloud.google.com/apis
+Only for the duration of this tutorial, set up a Google API key and restrict it to <Format>localhost:8383/greenstone3/*</Format> (You'll want to disable it again afterward for security purposes.) Having created an API key, follow the remaining instructions given in your Greenstone3 installation's <Format>resources/web/servlets.xml.in</Format> file for the param-name "googlemaps_api_key".<br/>
+  Note: Even though Google provides a free tier for usage, at the time of writing they still require you to register a credit card with your Google developer account.</Text>
 </Comment>
 <NumberedItem>
     
@@ -1869,5 +1874,5 @@
 <Text id="ew-34">Now turn off <AutoText text="windows_scripting"/> in the <AutoText key="glidict::GUI.Design"/> panel, and <b>rebuild</b> the collection again. All the documents should still be processed, because Greenstone's document plugin pipeline is now set up with an <AutoText text="UnknownConverterPlugin"/> configured to use <i>Apache Tika</i> to extract text from Word documents by default (including docx files). <b>Preview</b> the collection and revisit the document view of the docx file. This time, the html produced should look very different: much more basic. This is because <i>Tika</i> supports extracting text from different document formats, including word documents, but is not optimised for html presentation. However, this does mean full text searching will be available for docx files too when Greenstone is installed out-of-the-box.</Text>
 <Text id="ew-35">So at a pinch, you can always use Greenstone's now default document plugins setup, to process a collection that includes docx files, to at least support full text searching of the contents of docx files, even if the document view (the HTML view) of docx files processed with Tika may not look as formatted as the original source document. Presentation may be of secondary importance, since by default Greenstone will anyway provide a link to the original source document in its original format (in this case, a link to the docx file).</Text>
-<Comment><Text id="ew-36">Above, we shifted the <AutoText text="UnknownConverterPlugin"/> that uses Apache Tika to below the <AutoText text="WordPlugin"/> in the document plugin pipeline, because we want to force <AutoText text="WordPlugin"/> to attempt to process all word documents first, when it recognises them. Apache Tika can always process Word documents, but we favour <AutoText text="WordPlugin"/> to try processing them first, including the newer docx files, which it can do when on Windows machines with Word installed and <AutoText text="windows_scripting"/> turned on. Turning off <AutoText text="windows_scripting"/> instructs the <AutoText text="WordPlugin"/> not to make use of Word to convert doc(x) files to html, and so <AutoText text="WordPlugin"/> is not able to process docx files. As a result, the document plugins in the pipeline pass the unprocessed docx file further down the pipeline to the <AutoText text="UnknownConverterPlugin"/> that is able to process the docx file as it's pre-configure to make use of Apache Tika to extract text from Word documents.</Text></Comment>
+<Comment><Text id="ew-36">Above, we shifted the <AutoText text="UnknownConverterPlugin"/> that uses Apache Tika to below the <AutoText text="WordPlugin"/> in the document plugin pipeline, because we want to force <AutoText text="WordPlugin"/> to attempt to process all word documents first, when it recognises them. Apache Tika can always process Word documents, but we favour <AutoText text="WordPlugin"/> to try processing them first, including the newer docx files, which it can do when on Windows machines with Word installed and <AutoText text="windows_scripting"/> turned on. Turning off <AutoText text="windows_scripting"/> instructs the <AutoText text="WordPlugin"/> not to make use of Word to convert doc(x) files to html, and so <AutoText text="WordPlugin"/> is not able to process docx files. As a result, the document plugins in the pipeline pass the unprocessed docx file further down the pipeline to the <AutoText text="UnknownConverterPlugin"/> that is able to process the docx file as it's pre-configured to make use of Apache Tika to extract text from Word documents.</Text></Comment>
 </NumberedItem>
 </MajorVersion>
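The fall-through behaviour described in ew-36 can be sketched as a toy model. This is not Greenstone code: the `Plugin` class, the factory functions, and the result strings are all hypothetical names chosen for illustration, and the sketch models only docx handling, under the assumption stated in the text that WordPlugin needs `windows_scripting` to process docx files.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Plugin:
    """Hypothetical stand-in for a document plugin in the pipeline."""
    name: str
    # Returns a description of the output, or None to decline the file.
    process: Callable[[str], Optional[str]]


def make_word_plugin(windows_scripting: bool) -> Plugin:
    def process(filename: str) -> Optional[str]:
        if not filename.lower().endswith(".docx"):
            return None
        # Assumption from the tutorial text: docx needs Word via windows_scripting.
        return "formatted HTML (via Word)" if windows_scripting else None
    return Plugin("WordPlugin", process)


def make_tika_plugin() -> Plugin:
    def process(filename: str) -> Optional[str]:
        if filename.lower().endswith((".doc", ".docx")):
            return "basic HTML (text extracted by Tika)"
        return None
    return Plugin("UnknownConverterPlugin", process)


def run_pipeline(plugins: List[Plugin], filename: str) -> Optional[Tuple[str, str]]:
    """Offer the file to each plugin in order; the first one to accept it wins."""
    for plugin in plugins:
        result = plugin.process(filename)
        if result is not None:
            return plugin.name, result
    return None  # no plugin in the pipeline could process the file


# WordPlugin sits above the Tika-backed plugin, as in the tutorial.
scripting_on = [make_word_plugin(True), make_tika_plugin()]
scripting_off = [make_word_plugin(False), make_tika_plugin()]
print(run_pipeline(scripting_on, "report.docx"))
print(run_pipeline(scripting_off, "report.docx"))
```

With scripting turned off, the first plugin declines the file and the second one picks it up, which is why the order of plugins in the pipeline matters.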
     
@@ -4029,4 +4034,7 @@
 <NumberedItem>
 <Text id="0685a">Select the classifier for <AutoText text="dc.Title;ex.Title"/> and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. Set <AutoText text="bookshelf_type"/> to <AutoText text="always"/>. This will create a bookshelf for each Title in the collection. Note, setting this option to <AutoText text="duplicate_only"/> will only create a bookshelf when more than one document shares a Title. </Text>
+</NumberedItem>
+<NumberedItem>
+<Text id="0685b">Setting this List classifier's <AutoText text="partition_type_within_level"/> to <AutoText text="none"/> will further allow you to browse the few documents in this collection all in one page, instead of having the additional level of browsing by starting letter, presented horizontally at the top. Click <AutoText key="glidict::General.OK"/> to finish configuring the classifier.</Text>
 </NumberedItem>
 <NumberedItem>
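The difference between the two `bookshelf_type` values mentioned in 0685a can be illustrated with a small sketch. This is not how Greenstone's List classifier is implemented: `build_bookshelves`, the tuple layout, and the sample documents are hypothetical, and treating any other option value as "no bookshelves" is an assumption made only to keep the example complete.

```python
from collections import defaultdict
from typing import List, Tuple


def build_bookshelves(docs: List[Tuple[str, str]], bookshelf_type: str = "always") -> list:
    """Group (doc_id, title) pairs into browse entries.

    Returns ("bookshelf", title, [doc_ids]) or ("document", title, doc_id) tuples,
    ordered by title.
    """
    by_title = defaultdict(list)
    for doc_id, title in docs:
        by_title[title].append(doc_id)

    entries = []
    for title in sorted(by_title):
        ids = by_title[title]
        make_shelf = bookshelf_type == "always" or (
            bookshelf_type == "duplicate_only" and len(ids) > 1
        )
        if make_shelf:
            entries.append(("bookshelf", title, ids))
        else:
            # Assumption: any other case lists the documents directly.
            entries.extend(("document", title, doc_id) for doc_id in ids)
    return entries


# Hypothetical sample data: two documents share a Title, one does not.
docs = [("d1", "Atlas"), ("d2", "Atlas"), ("d3", "Bulletin")]
print(build_bookshelves(docs, "always"))
print(build_bookshelves(docs, "duplicate_only"))
```

With `always`, even the single "Bulletin" document gets its own bookshelf; with `duplicate_only`, only the shared "Atlas" title does.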