source: gs2-extensions/gstika/trunk/gstika.zip@ 34196

Last change on this file was revision 34196, checked in by ak19, 4 years ago.

Updating the gstika tarballs too with the latest changes to the Tika config files. The config files have been renamed so that one is configured for OCR-ing PDFs and the other turns OCR off when Tesseract is installed. Without it, Tika autodetects whether OCR-ing applies whenever Tesseract is installed; maybe there is some minor saving in overhead with a no-ocr-config.xml. With no config flag passed to Tika, it by default performs OCR only where it applies and only if Tesseract is installed. By default Tika extracts only text, not images, and image extraction must be turned on explicitly with -z/--extract, so there is no such overhead by default, except perhaps for PDFs in which each page is an image. However, the GS-specific custom flags introduced in gstika (html-with-imgs and xhtml-with-imgs) extract text and images simultaneously, so they may need no-ocr-config.xml to shave off this overhead when automatic OCR-ing of documents is not needed.
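
As a rough illustration of the OCR and no-OCR modes described in this commit, the Python sketch below builds a Tika config that disables OCR and assembles a tika-app command line. It does not reproduce gstika's actual config files, whose contents are not shown here. It assumes the approach from Apache Tika's OCR documentation of excluding org.apache.tika.parser.ocr.TesseractOCRParser from DefaultParser. The jar path tika-app.jar and the helper names no_ocr_config_xml and tika_command are hypothetical, and the GS-specific html-with-imgs and xhtml-with-imgs flags are not modeled.

```python
import xml.etree.ElementTree as ET

# Assumption: OCR is disabled by excluding Tika's Tesseract parser from the
# default parser chain, as described in Apache Tika's OCR documentation.
DEFAULT_PARSER = "org.apache.tika.parser.DefaultParser"
TESSERACT_PARSER = "org.apache.tika.parser.ocr.TesseractOCRParser"


def no_ocr_config_xml() -> str:
    """Hypothetical helper: build a tika-config.xml that keeps all default
    parsers except the Tesseract OCR parser."""
    properties = ET.Element("properties")
    parsers = ET.SubElement(properties, "parsers")
    default = ET.SubElement(parsers, "parser", {"class": DEFAULT_PARSER})
    ET.SubElement(default, "parser-exclude", {"class": TESSERACT_PARSER})
    return ET.tostring(properties, encoding="unicode")


def tika_command(input_path, config_path=None, extract=False, jar="tika-app.jar"):
    """Hypothetical helper: assemble a tika-app command line.

    'tika-app.jar' is a placeholder path. With no config, Tika falls back to
    its default behaviour (OCR only where it applies and Tesseract is present).
    """
    cmd = ["java", "-jar", jar]
    if config_path is not None:
        # Pass the config file first so it applies to the whole run.
        cmd.append(f"--config={config_path}")
    if extract:
        # -z/--extract: also write embedded resources (e.g. images) to disk.
        cmd.append("--extract")
    else:
        # Plain text output, Tika's text-only default use case.
        cmd.append("--text")
    cmd.append(input_path)
    return cmd


if __name__ == "__main__":
    print(no_ocr_config_xml())
    print(" ".join(tika_command("document.pdf", "no-ocr-config.xml")))
```

To use it, one would write the returned XML to no-ocr-config.xml and pass the command list to subprocess.run; leaving config_path unset keeps Tika's automatic OCR detection.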

  • Property svn:mime-type set to application/octet-stream
File size: 63.3 MB

