Changeset 34195
- Timestamp: 2020-06-16T18:05:13+12:00
- Location: gs2-extensions/gstika/trunk/java
- Files: 1 added, 1 moved
gs2-extensions/gstika/trunk/java/ocr-pdfs-config.xml
--- r34193
+++ r34195
@@ -41,5 +41,5 @@
 
 To get Tika to work with Tesseract to OCR pages of a scanned PDF:
-1. always pass in this file as __config=/path/to/tika-config.xml to tika-app-*.jar cmd,
+1. always pass in this file as --config=/path/to/tika-config.xml to tika-app-*.jar cmd,
 2. AND do one of the following:
 a. Set the above outputType param to "txt" so Tesseract produces the OCR in .txt format, and things should work,
…
@@ -52,5 +52,5 @@
 
 More information about tesseract config options by running:
-tesseract __print-parameters
+tesseract --print-parameters
 -->
 <param name="language" type="string">eng</param>
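The corrected `--config` flag from step 1 can be sketched as a small dry-run helper that only prints the command line, so the spelling can be checked without Tika installed. The function name `build_tika_cmd`, the jar name, the input file, and the `--text` output flag are illustrative assumptions and do not come from this changeset; only `--config=/path/to/tika-config.xml` is taken from the comment above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the changeset): prints the tika-app
# command line instead of running it.
build_tika_cmd() {
    tika_jar=$1   # placeholder for the jar matching tika-app-*.jar
    config=$2     # the Tika config file passed via --config=
    input=$3      # placeholder for the scanned PDF to OCR
    # --text (plain-text output) is an assumed choice for this sketch
    printf 'java -jar %s --config=%s --text %s\n' "$tika_jar" "$config" "$input"
}

# Dry run with placeholder paths
build_tika_cmd tika-app.jar /path/to/tika-config.xml scanned.pdf
```

Once the printed line looks right, it can be run directly with the real jar and file paths substituted.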