collectionConfig.xml

Timestamp:

2020-06-14T03:40:21+12:00

Author:

ak19

Message:

All GS3 needs to convert docx files to basic html (no images) out of the box.
1. Adding in the Tika jar with its Apache 2.0 licence, a handcrafted notice derived from the licence, and a Readme with links and examples of its use.
2. Updating the model collectionConfig.xml with a pre-configured UnknownConverterPlugin that uses the Tika jar to convert docx to basic html, so all future GS3 collections will have this set up in the document pipeline and be ready for docx files.
When the chance arises, need to set up a model coll for GS2 that uses the UnknownConverterPlugin in this way too.

File:

1 edited

main/trunk/greenstone2/collect/modelcol/etc/collectionConfig.xml (modified) (1 diff)

main/trunk/greenstone2/collect/modelcol/etc/collectionConfig.xml

-              r33740
+              r34169
             <plugin name="EmailPlugin"/>
             <plugin name="PDFv2Plugin"/>
+            <!-- Configuring an UnknownConverterPlugin for docx processing with Tika -->
+            <plugin name="UnknownConverterPlugin">
+              <option name="-exec_cmd" value="java -jar $GSDLHOME/ext/tika/tika-app-1.24.1.jar --html %%INPUT_FILE &gt; %%OUTPUT"/>
+              <option name="-convert_to" value="html"/>
+              <option name="-mime_type" value="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
+              <option name="-srcicon" value="icondocx"/>
+              <option name="-process_extension" value="docx"/>
+            </plugin>
             <plugin name="RTFPlugin"/>
             <plugin name="WordPlugin"/>
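
To make the added configuration easier to read, here is a minimal Python sketch of how the -exec_cmd template might turn into a concrete command line. It is illustrative only: Greenstone's plugins are written in Perl, and the helper names (plugin_options, expand_command), the example paths, and the assumption that the placeholders are substituted textually are all hypothetical rather than taken from the Greenstone source. The sketch does show one thing reliably: the &gt; in the XML attribute is just the escaped form of the shell redirect >.

```python
import shlex
import xml.etree.ElementTree as ET

# The plugin element added in r34169, copied from the diff above.
PLUGIN_XML = """
<plugin name="UnknownConverterPlugin">
  <option name="-exec_cmd" value="java -jar $GSDLHOME/ext/tika/tika-app-1.24.1.jar --html %%INPUT_FILE &gt; %%OUTPUT"/>
  <option name="-convert_to" value="html"/>
  <option name="-mime_type" value="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
  <option name="-srcicon" value="icondocx"/>
  <option name="-process_extension" value="docx"/>
</plugin>
"""


def plugin_options(xml_text):
    """Return the plugin's options as a dict (hypothetical helper).

    The XML parser unescapes entities, so &gt; becomes > here.
    """
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.get("value") for opt in root.findall("option")}


def expand_command(template, gsdlhome, input_file, output_file):
    """Fill in the placeholders of an -exec_cmd template (hypothetical helper).

    Assumption: plain text substitution, with file names shell-quoted.
    The real plugin may expand these differently.
    """
    cmd = template.replace("$GSDLHOME", gsdlhome)
    cmd = cmd.replace("%%INPUT_FILE", shlex.quote(input_file))
    cmd = cmd.replace("%%OUTPUT", shlex.quote(output_file))
    return cmd


if __name__ == "__main__":
    opts = plugin_options(PLUGIN_XML)
    # Example paths are made up for illustration.
    print(expand_command(opts["-exec_cmd"], "/opt/gs3/gs2build",
                         "/tmp/report.docx", "/tmp/report.html"))
```

Under these assumptions the expanded command runs the Tika app with --html on the docx file and redirects the resulting HTML into the output file, which matches the -convert_to value of html.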

Changeset 34169 for main/trunk/greenstone2/collect/modelcol/etc/collectionConfig.xml
