source: gs2-extensions/gstika/trunk/gstika.tar.gz@ 34187

Last change on this file since 34187 was 34187, checked in by ak19, 4 years ago

Committing the tika-config.xml that sets up Tika's PDFParser and TesseractOCRParser to OCR PDFs. Without this, PDFs weren't getting OCR-ed even though Tika detected Tesseract. This problem wasn't documented anywhere either, and only by chance did I find what was needed: a correctly configured tika-config.xml is compulsory to get PDFs OCR-ed by Tika+Tesseract, and the Tesseract installation I created had been missing TESSDATA_PREFIX/configs/hocr.
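
The missing-file half of the problem above can be checked before running Tika. The following is a minimal sketch, not part of the committed archive: the helper name `find_hocr_config` is hypothetical, and it assumes the layout named in the commit message, i.e. that the hocr config lives at TESSDATA_PREFIX/configs/hocr. Depending on the Tesseract version, TESSDATA_PREFIX may need to point at a different directory level, so adjust the path if your installation differs.

```python
import os
from pathlib import Path


def find_hocr_config(tessdata_prefix=None):
    """Return the path to configs/hocr under TESSDATA_PREFIX, or None if it is missing.

    Hypothetical helper: assumes the layout TESSDATA_PREFIX/configs/hocr
    described in the commit message above.
    """
    # Fall back to the environment variable when no prefix is passed explicitly.
    prefix = tessdata_prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    candidate = Path(prefix) / "configs" / "hocr"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_hocr_config()
    if found is None:
        print("configs/hocr not found under TESSDATA_PREFIX")
    else:
        print(f"found hocr config: {found}")
```

This only verifies that the file exists. It does not validate tika-config.xml or confirm that Tika will actually OCR a given PDF.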

  • Property svn:mime-type set to application/octet-stream
File size: 63.3 MB
