Changeset 34175 for gs2-extensions


Timestamp: 2020-06-15T03:28:28+12:00
Author: ak19
Message: Minor changes to folder names
File: 1 edited

Index: gs2-extensions/gstika/trunk/GS_TIKA_README.txt
--- gs2-extensions/gstika/trunk/GS_TIKA_README.txt	(revision 34174)
+++ gs2-extensions/gstika/trunk/GS_TIKA_README.txt	(revision 34175)
@@ -28,26 +28,26 @@
 1. HTML:
 
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --html /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.htm
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --html /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.htm
 
 2. XHTML - looks the same as HTML:
 
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --xml /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --xml /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
 
 3. PLAIN TEXT CONTENT - NO META:
 
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
 
   a. PLAIN TEXT WITH META:
 
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html
 
   b. JUST META:
 
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --metadata /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html)
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --metadata /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html)
 
 4. IMAGES CAN'T DO HTML + IMAGES IN ONE STEP by throwing in any of the above flags in addition):
 
 Extracts all attachments (images etc) into specified dir (-z or --extract and then specify a dir for it)
-GS3/gs2build/ext/tika>java -jar tika-app-*.jar --extract --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx
+GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --extract --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx
 
 
@@ -55,5 +55,5 @@
 C. COMPARE OUTPUT - IMG EXTRACTION vs TEXT:
 --------------------------------------------------------------
-* GS3/gs2build/ext/tika>java -jar tika-app-*.jar -z --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx
+* GS3/gs2build/ext/gstika>java -jar tika-app-*.jar -z --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx
 
 INFO  As a convenience, TikaCLI has turned on extraction of
@@ -72,5 +72,5 @@
 
 
-* GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx
+* GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx
 
 Jun 14, 2020 1:29:42 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
@@ -186,7 +186,7 @@
 2. It stands alone and can be compiled and run against the tika-app-*.jar file on the classpath:
 To compile
-   GS3/gs2build/ext/tika>javac -cp `pwd`/tika-app-*.jar org/greenstone/tika/GSTikaCLI.java
+   GS3/gs2build/ext/gstika>javac -cp `pwd`/lib/tika-app-*.jar org/greenstone/tika/GSTikaCLI.java
 To run:
-   GS3/gs2build/ext/tika>java -cp "`pwd`/tika-app-*.jar:." org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html
+   GS3/gs2build/ext/gstika>java -cp "`pwd`/lib/tika-app-*.jar:." org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html
 
 (Can pass existing flags, e.g. --html for html without images extracted)
@@ -194,8 +194,8 @@
 To compile code that lives in a directory called "src" and compile it into a directory called "build":
 
-   GS3/gs2build/ext/tika>javac -cp `pwd`/tika-app-*.jar -d `pwd`/build src/org/greenstone/tika/GSTikaCLI.java
+   GS3/gs2build/ext/gstika>javac -cp `pwd`/lib/tika-app-*.jar -d `pwd`/build src/org/greenstone/tika/GSTikaCLI.java
 
 To run the compiled class that's now in folder "build":
-   GS3/gs2build/ext/tika>javac -cp "`pwd`/tika-app-*.jar:`pwd`/build" --html-with-images <inputfilepath> > output.html
+   GS3/gs2build/ext/gstika>javac -cp "`pwd`/lib/tika-app-*.jar:`pwd`/build" --html-with-images <inputfilepath> > output.html
 
 
@@ -215,8 +215,10 @@
 
 
-cd gs2build/ext/tika
+cd gs2build/ext/gstika
 ./makeGSTikaCLI.sh
 ./GSTikaCLI.sh --html-with-images <inputfile> > <outputfile>
 e.g. ./GSTikaCLI.sh --html-with-imgs --pretty-print --encoding=UTF-8 tmp/<file>.docx > tmp/<file>.html
+
+
 --------------------------------------------------------------
 F. COMPILING TIKA FROM SOURCE
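
The run commands in this changeset pass a classpath of the form "`pwd`/lib/tika-app-*.jar:." inside double quotes. A shell does not expand * inside double quotes, and the JVM's own classpath wildcard only covers a bare * standing for every jar in a directory, so a pattern like tika-app-*.jar may not resolve as intended. The sketch below is one possible workaround, not part of the changeset: tika_classpath is a hypothetical helper name, and it assumes the jar lives directly under the lib directory introduced by this revision.

```shell
# Hypothetical helper (not part of gs2-extensions): find the first
# tika-app-*.jar in a lib directory and print "<jar>:<class dir>".
tika_classpath() {
  lib_dir="$1"     # directory expected to hold tika-app-*.jar
  class_dir="$2"   # directory holding the compiled GSTikaCLI classes
  jar=""
  for candidate in "$lib_dir"/tika-app-*.jar; do
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$candidate" ]; then
      jar="$candidate"
      break
    fi
  done
  if [ -z "$jar" ]; then
    echo "no tika-app-*.jar found in $lib_dir" >&2
    return 1
  fi
  printf '%s:%s\n' "$jar" "$class_dir"
}
```

Assuming the layout described above, it could be used from gs2build/ext/gstika as:
java -cp "$(tika_classpath "$(pwd)/lib" "$(pwd)/build")" org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html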