Changeset 32035 for documentation


Timestamp:
2017-10-09T18:11:53+13:00
Author:
ak19
Message:

Some more minor changes to the new tutorial. The Windows version has not yet been tried or written up.

File:
1 edited

  • documentation/trunk/tutorials/xml-source/tutorial_en.xml

    r32032 r32035  
    47904790<Comment><Text id="ucp-05">An example would be djvu files, for which Greenstone provides no custom plugin. However, there's a free command-line tool available for Unix systems that can convert from djvu to one of the text-based formats that Greenstone can process, text or HTML. So in this case, you could try using the UnknownConverterPlugin with the command-line tool on djvu files that you've gathered. The result should be that the djvu files in your Greenstone collection are now searchable.</Text></Comment>
    47914791<Heading><Text id="ucp-06">Working with DjVu documents in Greenstone</Text></Heading>
    4792 <Text id="ucp-07">DjVu documents (pronounced like the French phrase <i>déjà vu</i>) are becoming a popular document format. <Link url="http://djvu.sourceforge.net/doc/index.html">DjVuLibre</Link>, which provides open source tools for processing DjVu documents, describes DjVu as</Text>
     4792<Text id="ucp-07">DjVu (pronounced like the French phrase <i>déjà vu</i>) is a <Link url="https://www.djvuzone.org/">document format</Link> suited for archiving digital documents. <Link url="http://djvu.sourceforge.net/doc/index.html">DjVuLibre</Link>, which provides open source tools for processing DjVu documents, describes DjVu as</Text>
    47934793<Comment><Text id="ucp-07a">"a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consume less client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of academic, commercial, governmental, and non-commercial web sites around the world."</Text></Comment>
    4794 <Text id="ucp-08">In this part of the tutorial we'll see how to get Greenstone to not just include a collection's DjVu documents, but make them searchable too.</Text>
     4794<Text id="ucp-08">In this part of the tutorial we'll see how to get Greenstone to not just include a collection's DjVu documents, but make them searchable too. There are several tools out there to convert a DjVu document into text or HTML. For instance, Linux users can install the <i>ocrodjvu</i> package and use its <i>djvu2hocr</i> tool to extract the text content in HTML format. Janusz S. Bien, a Greenstone user on the mailing list, has recommended it as being of possible use to Greenstone users, as it's a front-end to OCR programs. In this tutorial, however, we'll look at using <i>djvutxt</i> which is part of the DjVuLibre suite of tools.</Text>
    47954795<Heading><Text id="ucp-09">Extracting the text from DjVu documents with DjVuLibre's djvutxt</Text></Heading>
    47964796<NumberedItem><Text id="ucp-10">Start up GLI and create a new collection called <i>DjVu Collection</i>.</Text>
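
The tutorial text in this changeset describes converting DjVu documents to text with DjVuLibre's djvutxt. As a rough illustration only, and not part of the changeset or of Greenstone itself, the conversion step could be sketched in Python as below. It assumes djvutxt is installed and on the PATH and that it is invoked in its basic form, `djvutxt input.djvu output.txt`. The helper names `build_djvutxt_command` and `extract_text`, and the default of writing the text file next to the source file, are hypothetical choices made for this sketch.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_djvutxt_command(djvu_path, txt_path, executable="djvutxt"):
    """Return the argument list for: djvutxt <input.djvu> <output.txt>."""
    return [executable, str(djvu_path), str(txt_path)]


def extract_text(djvu_path, txt_path=None):
    """Run djvutxt on one DjVu file and return the path of the text file."""
    djvu_path = Path(djvu_path)
    # Hypothetical default: place the text file beside the DjVu file.
    txt_path = Path(txt_path) if txt_path else djvu_path.with_suffix(".txt")
    if shutil.which("djvutxt") is None:
        raise FileNotFoundError("djvutxt not found on PATH; install DjVuLibre first")
    # check=True raises CalledProcessError if djvutxt exits with an error.
    subprocess.run(build_djvutxt_command(djvu_path, txt_path), check=True)
    return txt_path


if __name__ == "__main__":
    # Convert every file named on the command line.
    for name in sys.argv[1:]:
        print(extract_text(name))
```

Trying the conversion by hand like this, before configuring the UnknownConverterPlugin, is one way to confirm that a given DjVu file actually contains a text layer for Greenstone to index.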