Timestamp:
2020-06-14T19:11:13+12:00
Author:
ak19
Message:

Some minor improvements to the UnknownConverterPlugin settings for Tika's conversion of docx files to HTML. Also documents the reasoning.

File:
1 edited

  • main/trunk/greenstone2/collect/modelcol/etc/collectionConfig.xml

    r34169  r34172
    87      87              <!-- Configuring an UnknownConverterPlugin for docx processing with Tika -->
    88      88              <plugin name="UnknownConverterPlugin">
    89              -         <option name="-exec_cmd" value="java -jar $GSDLHOME/ext/tika/tika-app-1.24.1.jar --html %%INPUT_FILE &gt; %%OUTPUT"/>
            89      +         <option name="-exec_cmd" value="java -jar $GSDLHOME/ext/tika/tika-app-1.24.1.jar --html --pretty-print --encoding=UTF-8 %%INPUT_FILE &gt; %%OUTPUT"/>
    90      90                <option name="-convert_to" value="html"/>
    91      91                <option name="-mime_type" value="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
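
To make the revised -exec_cmd value easier to read, the following is a minimal sketch of how the template might expand into a shell command once &gt; is unescaped to > and the two placeholders are filled in. The helper name expand_exec_cmd, the sample file paths, and the quoting of paths are assumptions made for illustration; this is not Greenstone's actual substitution code, and the plugin may handle placeholders and quoting differently.

```python
import shlex

# Command template from r34172, as it reads after XML unescaping (&gt; becomes >).
EXEC_CMD = (
    "java -jar $GSDLHOME/ext/tika/tika-app-1.24.1.jar "
    "--html --pretty-print --encoding=UTF-8 %%INPUT_FILE > %%OUTPUT"
)


def expand_exec_cmd(template, input_file, output_file):
    """Hypothetical helper: fill in both placeholders, shell-quoting each path."""
    return (
        template
        .replace("%%INPUT_FILE", shlex.quote(input_file))
        .replace("%%OUTPUT", shlex.quote(output_file))
    )


# Sample paths are invented for this example.
print(expand_exec_cmd(EXEC_CMD, "my report.docx", "/tmp/report.html"))
```

The two flags added in this changeset, --pretty-print and --encoding=UTF-8, appear before the input file, so Tika writes indented HTML in UTF-8 to standard output, which the shell redirect then sends to the output file.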