Changeset 31762

Timestamp:
29.06.2017 17:29:43
Author:
ak19
Message:

Changed the placeholder names to what Dr Bainbridge suggested, which have % signs prefixed to them.

Location:
main/trunk/greenstone2/perllib
Files:
2 modified

  • main/trunk/greenstone2/perllib/plugins/UnknownConverterPlugin.pm

    r31761 r31762 (lines 280-292)
      #$cmd ="/Scratch/ak19/gs3-svn-15Nov2016/packages/jre/bin/java -cp \"/Scratch/ak19/gs3-svn-15Nov2016/gs2build/ext/pdf-box/lib/java/pdfbox-app.jar\" -Dline.separator=\"<br />\" org.apache.pdfbox.ExtractText -html \"/Scratch/ak19/tutorial_sample_files/pdfbox/A9-access-best-practices.pdf\" \"/Scratch/ak19/gs3-svn-15Nov2016/pdf-tmp/1.html\"";

    - #$cmd ="/Scratch/ak19/gs3-svn-15Nov2016/packages/jre/bin/java -cp \"/Scratch/ak19/gs3-svn-15Nov2016/gs2build/ext/pdf-box/lib/java/pdfbox-app.jar\" -Dline.separator=\"<br />\" org.apache.pdfbox.ExtractText -html INPUT_FILE OUTPUT";
    + #$cmd ="/Scratch/ak19/gs3-svn-15Nov2016/packages/jre/bin/java -cp \"/Scratch/ak19/gs3-svn-15Nov2016/gs2build/ext/pdf-box/lib/java/pdfbox-app.jar\" -Dline.separator=\"<br />\" org.apache.pdfbox.ExtractText -html %INPUT_FILE %OUTPUT";

      # replace occurrences of placeholders in cmd string
      #$cmd =~ s@\"@\\"@g;
    - $cmd =~ s@INPUT_FILE@\"$input_filename\"@g;
    + $cmd =~ s@%INPUT_FILE@\"$input_filename\"@g;
      if(defined $output_dirname) {
    -     $cmd =~ s@OUTPUT@\"$output_dirname\"@g;
    +     $cmd =~ s@%OUTPUT@\"$output_dirname\"@g;
      } else {
    -     $cmd =~ s@OUTPUT@\"$output_filename\"@g;
    +     $cmd =~ s@%OUTPUT@\"$output_filename\"@g;
      }

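The substitution logic in this hunk can be tried in isolation. The following is a minimal sketch, not code from UnknownConverterPlugin.pm: the helper name `fill_placeholders`, the converter name `myconverter`, and all paths are made up for illustration. It applies the same three substitutions as the changeset to a command template.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the plugin): applies the same
# substitutions as the changeset to a command template string.
sub fill_placeholders {
    my ($cmd, $input_filename, $output_filename, $output_dirname) = @_;

    # Each substituted path is wrapped in double quotes, as in the plugin code.
    $cmd =~ s@%INPUT_FILE@\"$input_filename\"@g;
    if (defined $output_dirname) {
        # A directory is substituted when one is defined.
        $cmd =~ s@%OUTPUT@\"$output_dirname\"@g;
    } else {
        # Otherwise a single output file is substituted.
        $cmd =~ s@%OUTPUT@\"$output_filename\"@g;
    }
    return $cmd;
}

# Made-up converter name and paths, for illustration only.
my $template = 'myconverter --in %INPUT_FILE --out %OUTPUT';
print fill_placeholders($template, '/tmp/in/doc.xyz', '/tmp/out/doc.html', undef), "\n";
```

One likely side effect of the prefix, though the commit message does not state it as the motivation: a bare word such as `OUTPUT` elsewhere in the command is no longer rewritten, because the patterns now match only the `%`-prefixed forms.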
  • main/trunk/greenstone2/perllib/strings.properties

    r31757 r31762 (lines 1271-1275)
      TextPlugin.title_sub:Substitution expression to modify string stored as Title. Used by, for example, PostScriptPlugin to remove "Page 1" etc from text used as the title.

    - UnknownConverterPlugin.desc:If you have a custom conversion tool installed that you're able to run from the command line to convert from an unsupported document format to text, HTML or a series of images in jpg, png or gif form, then provide that command to this Plugin. It will then run the command for you, capturing the output for indexing by Greenstone, making any documents that aren't converted to images searchable. Set the process_extension to the suffix of files to be converted. Set convert_to to be the output format that the conversion command will generate, which will determine the output file's suffix. Use INPUT_FILE and OUTPUT as place holders in the command, which Greenstone will replace. It will pass in the full path to each file that matches the process_extension suffix in turn as INPUT_FILE. OUTPUT will be replaced with a path in the temporary folder of the output file with suffix determined by the value of convert_to. If convert_to is a pagedimg type, Greenstone sets OUTPUT to be a directory to contain the expected files and will create an item file collating the parts of the document.
    + UnknownConverterPlugin.desc:If you have a custom conversion tool installed that you're able to run from the command line to convert from an unsupported document format to text, HTML or a series of images in jpg, png or gif form, then provide that command to this Plugin. It will then run the command for you, capturing the output for indexing by Greenstone, making any documents that aren't converted to images searchable. Set the process_extension to the suffix of files to be converted. Set convert_to to be the output format that the conversion command will generate, which will determine the output file's suffix. Use %INPUT_FILE and %OUTPUT as place holders in the command, which Greenstone will replace. It will pass in the full path to each file that matches the process_extension suffix in turn as %INPUT_FILE. $OUTPUT will be replaced with a path in the temporary folder of the output file with suffix determined by the value of convert_to. If convert_to is a pagedimg type, Greenstone sets %OUTPUT to be a directory to contain the expected files and will create an item file collating the parts of the document.

      UnknownConverterPlugin.exec_cmd:Command line command string to execute that will do the conversion. Quoted elements need to have the quotes escaped with a backslash to preserve them.