Changeset 34178 for gs2-extensions

Timestamp:
2020-06-15T22:44:34+12:00
Author:
ak19
Message:

CASCADE-MAKE for Tesseract, the OCR tool. I'm thinking of expanding the UnknownPlugin tutorial to cover using the plugin with Tika to process docx files, and with Tika and Tesseract together to OCR image-only PDFs. I have tested the compiled Tesseract on a sample tif image and it works, but I have yet to test the Tika with Tesseract combination. The libz, libpng, (lib)jpeg, (lib)tif and jpeg2000 packages are taken from ImageMagick. Leptonica needs them (not sure about jpeg2000) as well as libgif, which is not included yet. Libtool and Leptonica are the dependencies of Tesseract itself. I'm including just the English language data in the tessdata folder; data for other languages is available from https://github.com/tesseract-ocr/tessdata . I've added a file called LinksAndNotesOnCompilingManually.txt documenting my reading on TikaOCR, how to compile Tesseract, and my pre-cascade-make attempts to compile Tesseract on Ubuntu. In the end I followed the existing use of Cascade-Make in the GS2-extensions gnome-lib and imagemagick to get Tesseract compiled. I don't know how to add support for cross-compilation.
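
The dependency chain described in the message implies a build order. The following is only a rough sketch of that ordering, not the actual Cascade-Make configuration: the package labels are illustrative rather than the real directory names under gs2-extensions/tesseract, jpeg2000 and libgif are left out because the message is unsure about the first and does not yet include the second, and it assumes Python 3.9 or later for the standard graphlib module.

```python
from graphlib import TopologicalSorter

# Map each package to the packages it needs built first.
# Labels are hypothetical, taken loosely from the commit message,
# and are not the real directory names in the extension.
DEPENDS_ON = {
    "tesseract": {"libtool", "leptonica"},
    "leptonica": {"libz", "libpng", "jpeg", "tiff"},
}


def build_order(depends_on):
    """Return a list in which every package follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(" -> ".join(order))
```

The exact order among the image libraries may vary between runs of the sketch, but Leptonica always comes after them and Tesseract always comes last.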

Location:
gs2-extensions/tesseract
Files:
28 added
