Changeset 34187

Timestamp:
16.06.2020 15:22:04
Author:
ak19
Message:

Committing the tika-config.xml that sets up Tika's PDFParser and TesseractOCRParser to OCR PDFs. Without this, PDFs weren't getting OCR-ed even though Tika detected Tesseract. This problem wasn't documented anywhere, and only by chance did I find what was needed: a correctly configured tika-config.xml is compulsory to get PDFs OCR-ed by Tika+Tesseract, and the Tesseract installation I created had been missing TESSDATA_PREFIX/configs/hocr.

Location:
gs2-extensions/gstika/trunk
Files:
1 added
3 modified
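
The second cause named in the commit message, a Tesseract installation missing TESSDATA_PREFIX/configs/hocr, can be checked before running Tika. The following is only a minimal sketch and is not part of this changeset. The function name find_hocr_config is hypothetical, and the layout it tests (a configs/hocr file directly under TESSDATA_PREFIX) is taken literally from the message above; other Tesseract versions or installs may place it differently.

```python
import os
from pathlib import Path


def find_hocr_config(tessdata_prefix=None):
    """Return the Path to configs/hocr under the given prefix, or None.

    find_hocr_config is a hypothetical helper for illustration only.
    If no prefix is passed, the TESSDATA_PREFIX environment variable is used.
    The expected layout (<prefix>/configs/hocr) follows the commit message
    and is an assumption about the installation.
    """
    prefix = tessdata_prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        # Nothing to check: no prefix given and the variable is unset.
        return None
    candidate = Path(prefix) / "configs" / "hocr"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_hocr_config()
    if found is None:
        print("configs/hocr not found under TESSDATA_PREFIX")
    else:
        print(f"hocr config found at {found}")
```

Run as a script, it reports whether the file is present for the current environment. A missing file would match the symptom described in the message, but it does not by itself confirm that the tika-config.xml is set up correctly.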