Changeset 32035
- Timestamp:
- 2017-10-09T18:11:53+13:00 (6 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
documentation/trunk/tutorials/xml-source/tutorial_en.xml
r32032 r32035 4790 4790 <Comment><Text id="ucp-05">An example would be djvu files, for which Greenstone provides no custom plugin. However, there's a free commandline tool available for unix systems that can convert from djvu to one of the text based format that Greenstone can process, text or html. So in this case, you could try using the UnknownConverterPlugin with the commandline tool on djvu files that you've gathered. The result should be that the djvu files in your Greenstone collection are now searchable.</Text></Comment> 4791 4791 <Heading><Text id="ucp-06">Working with DjVu documents in Greenstone</Text></Heading> 4792 <Text id="ucp-07">DjVu documents (pronounced like the French phrase <i>déjà vu</i>) are becoming a popular document format. <Link url="http://djvu.sourceforge.net/doc/index.html">DjVuLibre</Link>, which provides open source tools for processing DjVu documents, describes DjVu as</Text>4792 <Text id="ucp-07">DjVu (pronounced like the French phrase <i>déjà vu</i>) is a <Link url="https://www.djvuzone.org/">document format</Link> suited for archiving digital documents. <Link url="http://djvu.sourceforge.net/doc/index.html">DjVuLibre</Link>, which provides open source tools for processing DjVu documents, describes DjVu as</Text> 4793 4793 <Comment><Text id="ucp-07a">"a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consume less client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of academic, commercial, governmental, and non-commercial web sites around the world."</Text></Comment> 4794 <Text id="ucp-08">In this part of the tutorial we'll see how to get Greenstone to not just include a collection's DjVu documents, but make them searchable too. </Text>4794 <Text id="ucp-08">In this part of the tutorial we'll see how to get Greenstone to not just include a collection's DjVu documents, but make them searchable too. There are several tools out there to convert a DjVu document into text or HTML. For instance, Linux users can install the <i>ocrodjvu</i> package and use its <i>djvu2hocr</i> tool to extract the text content in HTML format. Janusz S. Bien, a Greenstone user on the mailing list, has recommended it as being of possible use to Greenstone users, as it's a front-end to OCR programs. In this tutorial, however, we'll look at using <i>djvutxt</i> which is part of the DjVuLibre suite of tools.</Text> 4795 4795 <Heading><Text id="ucp-09">Extracting the text from DjVu documents with DjVuLibre's djvutxt</Text></Heading> 4796 4796 <NumberedItem><Text id="ucp-10">Start up GLI and create a new collection called <i>DjVu Collection</i>.</Text>
Note:
See TracChangeset
for help on using the changeset viewer.