Changeset 34175

Timestamp:
15.06.2020 03:28:28
Author:
ak19
Message:

Minor changes to folder names

Files:
1 modified

  • gs2-extensions/gstika/trunk/GS_TIKA_README.txt

    r34174 r34175  
    28281. HTML:     
    2929 
    30 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --html /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.htm 
     30GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --html /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.htm 
    3131 
    32322. XHTML - looks the same as HTML: 
    3333 
    34 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --xml /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
     34GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --xml /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
    3535 
    36363. PLAIN TEXT CONTENT - NO META: 
    3737 
    38 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
     38GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
    3939 
    4040  a. PLAIN TEXT WITH META: 
    4141 
    42 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
     42GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
    4343 
    4444  b. JUST META: 
    4545 
    46 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --metadata /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
     46GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --metadata /PATH/TO/testword.docx > /PATH/TO/GS3/gs2build/ext/tmp/testword.html 
    4747     
    48484. IMAGES (CAN'T DO HTML + IMAGES IN ONE STEP by throwing in any of the above flags in addition): 
    4949 
    5050Extracts all attachments (images etc.) into the specified dir (-z or --extract, followed by --extract-dir=<dir>) 
    51 GS3/gs2build/ext/tika>java -jar tika-app-*.jar --extract --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx      
     51GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --extract --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx        
    5252 
    5353 
     
    5555C. COMPARE OUTPUT - IMG EXTRACTION vs TEXT: 
    5656-------------------------------------------------------------- 
    57 * GS3/gs2build/ext/tika>java -jar tika-app-*.jar -z --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx 
     57* GS3/gs2build/ext/gstika>java -jar tika-app-*.jar -z --extract-dir=/PATH/TO/GS3/gs2build/ext/tmp /PATH/TO/testword.docx 
    5858 
    5959INFO  As a convenience, TikaCLI has turned on extraction of 
     
    7272 
    7373 
    74 * GS3/gs2build/ext/tika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx 
     74* GS3/gs2build/ext/gstika>java -jar tika-app-*.jar --text-main /PATH/TO/testword.docx 
    7575 
    7676Jun 14, 2020 1:29:42 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem 
     
    1861862. It stands alone and can be compiled and run against the tika-app-*.jar file on the classpath: 
    187187To compile: 
    188    GS3/gs2build/ext/tika>javac -cp `pwd`/tika-app-*.jar org/greenstone/tika/GSTikaCLI.java 
     188   GS3/gs2build/ext/gstika>javac -cp `pwd`/lib/tika-app-*.jar org/greenstone/tika/GSTikaCLI.java 
    189189To run: 
    190    GS3/gs2build/ext/tika>java -cp "`pwd`/tika-app-*.jar:." org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html 
     190   GS3/gs2build/ext/gstika>java -cp "`pwd`/lib/tika-app-*.jar:." org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html 
    191191 
    192192(Can pass existing flags, e.g. --html for html without images extracted) 
     
    194194To compile code that lives in a directory called "src" into a directory called "build": 
    195195 
    196    GS3/gs2build/ext/tika>javac -cp `pwd`/tika-app-*.jar -d `pwd`/build src/org/greenstone/tika/GSTikaCLI.java 
     196   GS3/gs2build/ext/gstika>javac -cp `pwd`/lib/tika-app-*.jar -d `pwd`/build src/org/greenstone/tika/GSTikaCLI.java 
    197197 
    198198To run the compiled class that's now in folder "build": 
    199    GS3/gs2build/ext/tika>java -cp "`pwd`/tika-app-*.jar:`pwd`/build" org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html 
     199   GS3/gs2build/ext/gstika>java -cp "`pwd`/lib/tika-app-*.jar:`pwd`/build" org.greenstone.tika.GSTikaCLI --html-with-images <inputfilepath> > output.html 
    200200 
    201201 
     
    215215 
    216216 
    217 cd gs2build/ext/tika 
     217cd gs2build/ext/gstika 
    218218./makeGSTikaCLI.sh 
    219219./GSTikaCLI.sh --html-with-images <inputfile> > <outputfile> 
    220220e.g. ./GSTikaCLI.sh --html-with-images --pretty-print --encoding=UTF-8 tmp/<file>.docx > tmp/<file>.html 
     221 
     222 
    221223-------------------------------------------------------------- 
    222224F. COMPILING TIKA FROM SOURCE