Timestamp:
2003-07-03T15:59:04+12:00
Author:
mdewsnip
Message:

Further work on standardising option descriptions. In preparation for translating the option descriptions into other languages, all the option description strings have been moved into a "resource bundle" file, perllib/strings.rb, modelled on a Java resource bundle. This also reduces the number of duplicate descriptions. The option descriptions in the plugins, classifiers, mkcol.pl, import.pl and buildcol.pl have been replaced with keys into this resource bundle. When the strings in this file are translated into a new language, the new resource bundle should be named strings_<language-code>.rb, where <language-code> combines a language and a country (e.g. 'fr_FR' for French as spoken in France).
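As a rough illustration of how a key such as "{BasPlug.process_exp}" could be turned back into text, the Python sketch below (not Greenstone's Perl code) parses a bundle and resolves a description. The one-entry-per-line `key:text` file format, and the choice to show the raw key when the bundle lacks it, are assumptions made for this example only.

```python
import re

# ASSUMED bundle format (illustrative only): one "key:text" entry per line;
# blank lines and lines starting with "#" are ignored.
def parse_bundle(text):
    bundle = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            bundle[key.strip()] = value.strip()
    return bundle

# A description of the form "{Some.key}" is treated as a key into the bundle.
KEY_PATTERN = re.compile(r"^\{(.+)\}$")

def resolve_description(desc, bundle):
    match = KEY_PATTERN.match(desc)
    if match is None:
        return desc  # plain text, not a key
    # Assumption: show the raw key if the bundle has no entry for it.
    return bundle.get(match.group(1), desc)

sample = """
# English descriptions
BasPlug.process_exp:A perl regular expression to match against filenames. Matching filenames will be processed by this plugin.
BasPlug.input_encoding.utf8:either utf8 or unicode -- automatically detected.
"""

bundle = parse_bundle(sample)
print(resolve_description("{BasPlug.process_exp}", bundle))
print(resolve_description("{BasPlug.no_such_key}", bundle))
```

Keeping only the key in the option table means each description is written once, in the bundle, which is where the reduction in duplicate descriptions comes from.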

To support these changes, the PrintUsage module (perllib/printusage.pm) has new code for reading resource bundles and displaying the correct strings. In addition, pluginfo.pl, classinfo.pl, mkcol.pl, import.pl and buildcol.pl have a new option, -language, which specifies the language code in which option descriptions are displayed.
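A single-dash option such as -language can be sketched with Python's argparse, which accepts single-dash long option names. The program name, the positional plugin argument and the empty-string default (standing for "use the default resource bundle") are assumptions for illustration, not the actual pluginfo.pl interface.

```python
import argparse

def parse_info_args(argv):
    # Hypothetical command line in the style of pluginfo.pl:
    # an optional -language code followed by a plugin name.
    parser = argparse.ArgumentParser(prog="pluginfo-sketch")
    parser.add_argument("-language", default="",
                        help="language code for option descriptions, e.g. fr_FR")
    parser.add_argument("plugin")
    return parser.parse_args(argv)

args = parse_info_args(["-language", "fr_FR", "BasPlug"])
print(args.language, args.plugin)  # fr_FR BasPlug

default_args = parse_info_args(["BasPlug"])
print(repr(default_args.language))  # '' means: use the default bundle
```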

If no resource bundle exists for the specified language code, the generic resource bundle (strings.rb) is used instead; it currently contains the English descriptions. Users who always run Greenstone in another language may find it easier to rename the standard file to strings_en_US.rb and rename the resource bundle for their language to strings.rb. The default resource bundle would then suit them, and they would not need to specify their language with the -language option every time.
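The lookup order described above can be sketched in Python as follows. It assumes all bundles sit in one directory (perllib/ in this changeset) and that an empty language code asks for the default bundle, which is how the print_txt_usage("") call in the diff below is commented; the function name bundle_path is hypothetical and not part of printusage.pm.

```python
import os
import tempfile

def bundle_path(directory, language_code):
    """Pick the resource bundle file for language_code (hypothetical helper).

    strings_<language-code>.rb is preferred; if it does not exist, or no
    code was given, the generic strings.rb is used.
    """
    if language_code:
        candidate = os.path.join(directory, "strings_" + language_code + ".rb")
        if os.path.exists(candidate):
            return candidate
    return os.path.join(directory, "strings.rb")

# Demonstration in a throwaway directory holding an English default
# bundle and a French one (both empty files here).
with tempfile.TemporaryDirectory() as perllib:
    for name in ("strings.rb", "strings_fr_FR.rb"):
        open(os.path.join(perllib, name), "w").close()
    chosen = {code: os.path.basename(bundle_path(perllib, code))
              for code in ("fr_FR", "de_DE", "")}

print(chosen)
```

With this order, a user who renames strings_fr_FR.rb to strings.rb gets French descriptions without passing -language at all, which is the renaming suggested above.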

Currently, the encoding names (in encodings.pm) are not part of this scheme. These are displayed as part of BasPlug's input_encoding option. It is debatable whether these names would be worth translating into other languages.

Currently, a parse error in a plugin or classifier causes it to display its usage information using the default resource bundle. BasPlug will probably soon gain an option for specifying the language of the usage information in this case. (This does not apply when pluginfo.pl or classinfo.pl is used to display usage information, since these have a -language option.)

File:
1 edited

  • trunk/gsdl/perllib/plugins/BasPlug.pm

    --- trunk/gsdl/perllib/plugins/BasPlug.pm (r4845)
    +++ trunk/gsdl/perllib/plugins/BasPlug.pm (r4873)
    @@ -45,27 +45,27 @@
     use printusage;
     
    -my $unicode_list = 
    +my $unicode_list =
         [ { 'name' => "auto",
    -    'desc' => "Use text categorization algorithm to automatically identify the encoding of each source document. This will be slower than explicitly setting the encoding but will work where more than one encoding is used within the same collection." } ,
    +    'desc' => "{BasPlug.input_encoding.auto}" },
           { 'name' => "ascii",
    -    'desc' => "Plain 7 bit ascii. This may be a bit faster than using iso_8859_1. Beware of using this on a collection of documents that may contain characters outside the plain 7 bit ascii set though (e.g. German or French documents containing accents), use iso_8859_1 instead." },
    +    'desc' => "{BasPlug.input_encoding.ascii}" },
           { 'name' => "utf8",
    -    'desc' => "either utf8 or unicode -- automatically detected." },
    +    'desc' => "{BasPlug.input_encoding.utf8}" },
           { 'name' => "unicode",
    -    'desc' => "just unicode" } ];
    -
    -my $arguments = 
    +    'desc' => "{BasPlug.input_encoding.unicode}" } ];
    +
    +my $arguments =
         [ { 'name' => "process_exp",
    -    'desc' => "A perl regular expression to match against filenames. Matching filenames will be processed by this plugin. For example, using '(?i).html?\$' matches all documents ending in .htm or .html (case-insensitive).",
    +    'desc' => "{BasPlug.process_exp}",
         'type' => "string",
         'deft' => "",
         'reqd' => "no" },
           { 'name' => "block_exp",
    -    'desc' => "Files matching this regular expression will be blocked from being passed to any later plugins in the list. This has no real effect other than to prevent lots of warning messages about input files you don't care about. Each plugin might have a default block_exp. e.g. by default HTMLPlug blocks any files with .gif, .jpg, .jpeg, .png or .css file extensions.",
    -    'type' => 'string',
    +    'desc' => "{BasPlug.block_exp}",
    +    'type' => "string",
         'deft' => "",
         'reqd' => "no" },
           { 'name' => "input_encoding",
    -    'desc' => "The encoding of the source documents. Documents will be converted from these encodings and stored internally as utf8.",
    +    'desc' => "{BasPlug.input_encoding}",
         'type' => "enum",
         'list' => $unicode_list,
    @@ -73,53 +73,53 @@
         'deft' => "auto" } ,
           { 'name' => "default_encoding",
    -    'desc' => "Use this encoding if -input_encoding is set to 'auto' and the text categorization algorithm fails to extract the encoding or extracts an encoding unsupported by Greenstone.",
    +    'desc' => "{BasPlug.default_encoding}",
         'type' => "enum",
         'reqd' => "no",
             'deft' => "utf8" },
           { 'name' => "extract_language",
    -    'desc' => "Identify the language of each document and set 'Language' metadata. Note that this will be done automatically if -input_encoding is 'auto'.",
    +    'desc' => "{BasPlug.extract_language}",
         'type' => "flag",
         'reqd' => "no" },
           { 'name' => "default_language",
    -    'desc' => "If Greenstone fails to work out what language a document is the 'Language' metadata element will be set to this value. The default is 'en' (ISO 639 language symbols are used: en = English). Note that if -input_encoding is not set to 'auto' and -extract_language is not set, all documents will have their 'Language' metadata set to this value.",
    +    'desc' => "{BasPlug.default_language}",
         'type' => "language",
         'deft' => "en",
         'reqd' => "no" },
           { 'name' => "extract_acronyms",
    -    'desc' => "Extract acronyms from within text and set as metadata.",
    +    'desc' => "{BasPlug.extract_acronyms}",
         'type' => "flag",
         'reqd' => "no" },
           { 'name' => "markup_acronyms",
    -    'desc' => "Add acronym metadata into document text.",
    +    'desc' => "{BasPlug.markup_acronyms}",
         'type' => "flag",
         'reqd' => "no" },
           { 'name' => "first",
    -    'desc' => "Comma separated list of first sizes to extract from the text into a metadata field. The field is called 'FirstNNN'.",
    +    'desc' => "{BasPlug.first}",
         'type' => "string",
         'reqd' => "no" },
           { 'name' => "extract_email",
    -    'desc' => "Extract email addresses as metadata.",
    +    'desc' => "{BasPlug.extract_email}",
         'type' => "flag",
         'reqd' => "no" },
           { 'name' => "extract_historical_years",
    -    'desc' => "Extract time-period information from historical documents.  This is stored as metadata with the document. There is a search interface for this metadata, which you can include in your collection by adding the statement, \"format QueryInterface DateSearch\" to your collection configuration file.",
    +    'desc' => "{BasPlug.extract_historical_years}",
         'type' => "flag",
         'reqd' => "no" },
           { 'name' => "maximum_year",
    -    'desc' => "The maximum historical date to be used as metadata (in a Common Era date, such as 1950).",
    +    'desc' => "{BasPlug.maximum_year}",
         'type' => "int",
         'deft' => (localtime)[5]+1900,
         'reqd' => "no"},
           { 'name' => "maximum_century",
    -    'desc' => "The maximum named century to be extracted as historical metadata (e.g. 14 will extract all references up to the 14th century).",
    +    'desc' => "{BasPlug.maximum_century}",
         'type' => "int",
         'deft' => "-1",
         'reqd' => "no" },
           { 'name' => "no_bibliography",
    -    'desc' => "Do not try to block bibliographic dates when extracting historical dates.",
    +    'desc' => "{BasPlug.no_bibliography}",
         'type' => "flag",
         'reqd' => "no"},
           { 'name' => "cover_image",
    -    'desc' => "Will look for a prefix.jpg file (where prefix is the same prefix as the file being processed) and associate it as a cover image.",
    +    'desc' => "{BasPlug.cover_image}",
         'type' => "flag",
         'reqd' => "no" } ];
    @@ -131,10 +131,22 @@
     
     
    +sub get_arguments
    +{
    +    local $self = shift(@_);
    +    local $optionlistref = $self->{'option_list'};
    +    local @optionlist = @$optionlistref;
    +    local $pluginoptions = pop(@$optionlistref);
    +    local $pluginarguments = $pluginoptions->{'args'};
    +    return $pluginarguments;
    +}
    +
    +
     sub print_xml_usage
     {
         local $self = shift(@_);
    -
    -    print STDERR "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n";
    -    $self->print_xml();
    +    local $language = shift(@_);
    +
    +    &PrintUsage::print_xml_header();
    +    $self->print_xml($language);
     }
     
    @@ -143,4 +155,5 @@
     {
         local $self = shift(@_);
    +    local $language = shift(@_);
     
         local $optionlistref = $self->{'option_list'};
    @@ -155,9 +168,9 @@
         print STDERR "  <Arguments>\n";
         if (defined($pluginoptions->{'args'})) {
    -        &PrintUsage::print_options_xml($pluginoptions->{'args'});
    +        &PrintUsage::print_options_xml($language, $pluginoptions->{'args'});
         }
     
         # Recurse up the plugin hierarchy
    -    $self->print_xml();
    +    $self->print_xml($language);
     
         print STDERR "  </Arguments>\n";
    @@ -169,8 +182,9 @@
     {
         local $self = shift(@_);
    +    local $language = shift(@_);
     
         # Print the usage message for a plugin (recursively)
         local $descoffset = $self->determine_description_offset(0);
    -    $self->print_plugin_usage($descoffset, 1);
    +    $self->print_plugin_usage($language, $descoffset, 1);
     }
     
    @@ -205,4 +219,5 @@
     {
         local $self = shift(@_);
    +    local $language = shift(@_);
         local $descoffset = shift(@_);
         local $isleafclass = shift(@_);
    @@ -234,9 +249,9 @@
     
             # Display the plugin options
    -        &PrintUsage::print_options_txt($pluginargs, $optiondescoffset);
    +        &PrintUsage::print_options_txt($language, $pluginargs, $optiondescoffset);
         }
     
         # Recurse up the plugin hierarchy
    -    $self->print_plugin_usage($descoffset, 0);
    +    $self->print_plugin_usage($language, $descoffset, 0);
         $self->{'option_list'} = \@optionlist;
     }
    @@ -380,6 +395,5 @@
             print STDERR "\nThe $plugin_name plugin uses an incorrect general option (general options are those\n";
             print STDERR "available to all plugins). Check your collect.cfg configuration file.\n";
    -        # &print_general_usage($plugin_name);
    -        $self->print_txt_usage();
    +        $self->print_txt_usage("");  # Use default resource bundle
             die "\n";
         }