Timestamp:
2017-10-20T21:33:01+13:00 (7 years ago)
Author:
ak19
Message:

Fixes to get icecite to convert PDFs to txt on Windows. See added sections in the GS-Icecite-README file committed

File:
1 edited

  • gs3-extensions/gs-icecite/GS-Icecite-README

    r32029 r32051
    @@ -20,5 +20,5 @@
     1. In order to compile up Icecite, you will have to set up the environment for JDK8:
     
    -    export JAVA_HOME=/opt/java8/
    +    export JAVA_HOME=/opt/java8
         export PATH=$JAVA_HOME/bin:$PATH
     
    @@ -138,2 +138,38 @@
     Since we provide the absolute path to the jar nested within pdf-cli, we no longer need to cd into pdf-cli first to run the jar executable.
     
    +
    +4. In order to get IceCite built on Linux to work on Windows, to convert PDF to txt, make the following 2 changes to both the following java files both found in icecite/commons/src/main/java/de/freiburg/iif/path/
    +
    +- PathUtils.java
    +- LineReader.java
    +
    +Changes to make:
    +a. Add the import statement
    +   import java.net.URISyntaxException;
    +
    +b. Replace
    +    Path jarFile = Paths.get(codeSource.getLocation().getPath());
    +with
    +    // GREENSTONE MOD:
    +    // The following line causes problem on Windows with parsing
    +    // the cmdline args when running pdf-cli jar:
    +    //Path jarFile = Paths.get(codeSource.getLocation().getPath());
    +    // See https://stackoverflow.com/questions/43972777/exception-in-thread-main-java-nio-file-invalidpathexception-illegal-char
    +    // for the error message and solution
    +    Path jarFile = null;
    +    try {
    +    String jarPath = Paths.get(codeSource.getLocation().toURI()).toString();
    +    jarFile = Paths.get(jarPath);
    +    } catch(URISyntaxException e) {
    +    System.err.println("**** URISyntaxException. Couldn't convert CodeSource URL to URI: " + codeSource.getLocation());
    +    // fallback to old way that works on linux, since declaring this method as
    +    // "throws URISyntaxException" will require dealing with that bubbled up
    +    // exception in all calling methods. As this appears to be a common utility
    +    // method, that could make for a lot of calling code that needs editing
    +    jarFile = Paths.get(codeSource.getLocation().getPath());
    +    }
    +
    +c. When running on either Linux or Windows, provide the full filepaths to both input and output files. Using ~/ in filepaths on Linux, to denote home folders, is alright.
    +A windows command looks as follows, note double quotes in place of single ones around the classpath value, and the Windows PATH separator in classpath. But the backslashes in classpath also work if they're forward slashes:
    +
    +    java -classpath "C:\Path\to\GS3\ext\icecite\gs-installed-jars\*;C:\Path\to\GS3\icecite\pdf-cli\target\pdf-cli-0.0.1-SNAPSHOT-jar-with-dependencies.jar" cli.PdfParserCommandLine --format txt --feature words C:\Path\to\24.pdf C:\Path\to\24converted.txt
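
For illustration only, here is a minimal self-contained sketch of the URL-to-Path conversion that change (b) describes. The class name JarLocationSketch, the helper names and the sample file: URL are hypothetical and are not part of Icecite or Greenstone. The sketch compares URL.getPath(), which returns the raw and still percent-encoded path string, with Paths.get(URI), which decodes the URI into a platform path. On Windows the raw path of a jar location typically has the form /C:/..., which is the kind of string the Stack Overflow question cited in the comment reports as an InvalidPathException.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not part of Icecite.
class JarLocationSketch {

    // Convert a file: URL to a Path by going through a URI, falling back
    // to the raw URL path if the URL is not a valid URI (the same shape
    // as the fallback in change (b)).
    static Path toPath(URL location) {
        try {
            return Paths.get(location.toURI());
        } catch (URISyntaxException e) {
            System.err.println("Could not convert URL to URI: " + location);
            return Paths.get(location.getPath());
        }
    }

    // Made-up location standing in for codeSource.getLocation();
    // the space in the directory name is percent-encoded as %20.
    static URL sampleLocation() {
        try {
            return URI.create("file:/tmp/my%20dir/app.jar").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URL location = sampleLocation();
        // Raw path string from the URL, still percent-encoded.
        System.out.println("getPath(): " + location.getPath());
        // Decoded, platform-specific path obtained via the URI.
        System.out.println("via URI:   " + toPath(location));
    }
}
```

Going through the URI keeps the change local to one method: callers still receive a Path, so no calling code needs a new throws clause, which is the trade-off the comment in change (b) describes.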