Changeset 37751
- Timestamp: 2023-05-26T18:43:41+12:00
- File: 1 edited
--- documentation/trunk/tutorials/xml-source/tutorial_en.xml (r37748)
+++ documentation/trunk/tutorials/xml-source/tutorial_en.xml (r37751)
@@ -785,4 +785,9 @@
 <Comment>
 <Text id="images-gps-0">In this tutorial, we'll be looking at building a collection that takes advantage of the GPS metadata embedded in image files. Using this data, we can plot the images on a map based on where they were taken.</Text>
+</Comment>
+<Comment>
+<Text id="images-gps-0a">In doing this tutorial, if the maps are not available for viewing at all, you will need to have a Google Maps API key. This is done through https://console.cloud.google.com/apis
+Only for the duration of this tutorial, set up a Google API key and restrict it to <Format>localhost:8383/greenstone3/*</Format> (You'll want to disable it again afterward for security purposes.) Having created an API key, follow the remaining instructions given in your Greenstone3 installation's <Format>resources/web/servlets.xml.in</Format> file for the param-name "googlemaps_api_key".<br/>
+Note: Even though Google provides a free tier for usage, at the time of writing they still require you to register a credit card with your Google developer account.</Text>
 </Comment>
 <NumberedItem>
@@ -1869,5 +1874,5 @@
 <Text id="ew-34">Now turn off <AutoText text="windows_scripting"/> in the <AutoText key="glidict::GUI.Design"/> panel, and <b>rebuild</b> the collection again. All the documents should still be processed, because Greenstone's document plugin pipeline is now set up with an <AutoText text="UnknownConverterPlugin"/> configured to use <i>Apache Tika</i> to extract text from Word documents by default (including docx files). <b>Preview</b> the collection and revisit the document view of the docx file. This time, the html produced should look very different: much more basic. This is because <i>Tika</i> supports extracting text from different document formats, including word documents, but is not optimised for html presentation. However, this does mean full text searching will be available for docx files too when Greenstone is installed out-of-the-box.</Text>
 <Text id="ew-35">So at a pinch, you can always use Greenstone's now default document plugins setup, to process a collection that includes docx files, to at least support full text searching of the contents of docx files, even if the document view (the HTML view) of docx files processed with Tika may not look as formatted as the original source document. Presentation may be of secondary importance, since by default Greenstone will anyway provide a link to the original source document in its original format (in this case, a link to the docx file).</Text>
-<Comment><Text id="ew-36">Above, we shifted the <AutoText text="UnknownConverterPlugin"/> that uses Apache Tika to below the <AutoText text="WordPlugin"/> in the document plugin pipeline, because we want to force <AutoText text="WordPlugin"/> to attempt to process all word documents first, when it recognises them. Apache Tika can always process Word documents, but we favour <AutoText text="WordPlugin"/> to try processing them first, including the newer docx files, which it can do when on Windows machines with Word installed and <AutoText text="windows_scripting"/> turned on. Turning off <AutoText text="windows_scripting"/> instructs the <AutoText text="WordPlugin"/> not to make use of Word to convert doc(x) files to html, and so <AutoText text="WordPlugin"/> is not able to process docx files. As a result, the document plugins in the pipeline pass the unprocessed docx file further down the pipeline to the <AutoText text="UnknownConverterPlugin"/> that is able to process the docx file as it's pre-configure to make use of Apache Tika to extract text from Word documents.</Text></Comment>
+<Comment><Text id="ew-36">Above, we shifted the <AutoText text="UnknownConverterPlugin"/> that uses Apache Tika to below the <AutoText text="WordPlugin"/> in the document plugin pipeline, because we want to force <AutoText text="WordPlugin"/> to attempt to process all word documents first, when it recognises them. Apache Tika can always process Word documents, but we favour <AutoText text="WordPlugin"/> to try processing them first, including the newer docx files, which it can do when on Windows machines with Word installed and <AutoText text="windows_scripting"/> turned on. Turning off <AutoText text="windows_scripting"/> instructs the <AutoText text="WordPlugin"/> not to make use of Word to convert doc(x) files to html, and so <AutoText text="WordPlugin"/> is not able to process docx files. As a result, the document plugins in the pipeline pass the unprocessed docx file further down the pipeline to the <AutoText text="UnknownConverterPlugin"/> that is able to process the docx file as it's pre-configured to make use of Apache Tika to extract text from Word documents.</Text></Comment>
 </NumberedItem>
 </MajorVersion>
@@ -4029,4 +4034,7 @@
 <NumberedItem>
 <Text id="0685a">Select the classifier for <AutoText text="dc.Title;ex.Title"/> and click <AutoText key="glidict::CDM.ClassifierManager.Configure" type="button"/>. Set <AutoText text="bookshelf_type"/> to <AutoText text="always"/>. This will create a bookshelf for each Title in the collection. Note, setting this option to <AutoText text="duplicate_only"/> will only create a bookshelf when more than one document shares a Title.</Text>
+</NumberedItem>
+<NumberedItem>
+<Text id="0685b">Setting this List classifier's <AutoText text="partition_type_within_level"/> to <AutoText text="none"/> will further allow you to browse the few documents in this collection all in one page, instead of having the additional level of browsing by starting letter, presented horizontally at the top. Click <AutoText key="glidict::General.OK"/> to finish configuring the classifier.</Text>
 </NumberedItem>
 <NumberedItem>