Changeset 6584 for trunk/gsdl/bin/script


Timestamp:
2004-01-22T14:17:30+13:00
Author:
kjdon
Message:

Reworked when segmentation of Chinese text is applied. How the
segmentation is done and which character ranges are used are unchanged,
but whether it is done is now controlled by collect.cfg. There is a new
option, separate_cjk, with values true or false (default false).
Segmentation is only done if it is set to true. import.pl passes this as
a global option to all plugins, so the user only needs to add it once to
the config file rather than as an option to every plugin.
The queryaction also uses this option to decide whether or not to segment
the query.
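
As a rough illustration of the behaviour described in the message, the sketch below reads a collect.cfg-style text and derives the global plugin options from it. The helper names `read_simple_cfg` and `global_plugin_opts` are hypothetical, and the one-pair-per-line parser is a simplified stand-in, not Greenstone's actual config reader. Only the `separate_cjk` check mirrors the code in this changeset.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for reading collect.cfg:
# one "key value" pair per line, '#' starts a comment line.
sub read_simple_cfg {
    my ($text) = @_;
    my %cfg;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;        # skip comments and blank lines
        my ($key, $value) = split ' ', $line, 2;
        $value = '' unless defined $value;
        $value =~ s/\s+$//;                  # drop trailing whitespace
        $cfg{$key} = $value;
    }
    return \%cfg;
}

# Turn the collection-level setting into options meant for every plugin.
# The option is off unless the value is exactly "true" (case-insensitive).
sub global_plugin_opts {
    my ($cfg) = @_;
    my @opts;
    if (defined $cfg->{'separate_cjk'} && $cfg->{'separate_cjk'} =~ /^true$/i) {
        push @opts, "-separate_cjk";
    }
    return @opts;
}

my $cfg  = read_simple_cfg("separate_cjk true\n");
my @opts = global_plugin_opts($cfg);
print "global options: @opts\n";
```

With the option missing or set to anything other than true, the list stays empty, which matches the stated default of false.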

File:
1 edited

  • trunk/gsdl/bin/script/import.pl

    --- trunk/gsdl/bin/script/import.pl (r6407)
    +++ trunk/gsdl/bin/script/import.pl (r6584)
    @@ -348,4 +348,6 @@
         # options must be known before we read the collect.cfg))
         my $plugins = [];
    +    my @global_opts = ();
    +
         $configfilename = &util::filename_cat ($ENV{'GSDLCOLLECTDIR'}, "etc", "collect.cfg");
         if (-e $configfilename) {
    @@ -414,4 +416,10 @@
             $gli = 1;
         }
    +
    +    # global plugin stuff
    +    if (defined $collectcfg->{'separate_cjk'}&& $collectcfg->{'separate_cjk'} =~ /^true$/i) {
    +        push @global_opts, "-separate_cjk";
    +    }
    +
     
         } else {
    @@ -433,5 +441,5 @@
     
         # load all the plugins
    -    $pluginfo = &plugin::load_plugins ($plugins, $verbosity, $out, $faillog);
    +    $pluginfo = &plugin::load_plugins ($plugins, $verbosity, $out, $faillog, \@global_opts);
         if (scalar(@$pluginfo) == 0) {
         print $out &lookup_string("{import.no_plugins_loaded}") . "\n";
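
The change hands `\@global_opts` to `plugin::load_plugins`, but this changeset does not show what the loader does with it. One plausible way to apply such a list is sketched below, assuming each plugin entry is an array reference of the form [name, option, ...]. The helper `merge_global_opts`, the plugin names, and `-example_option` are hypothetical and for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: append the global options to each plugin's own options.
# Assumes every plugin entry is an array ref: [name, option, ...].
sub merge_global_opts {
    my ($plugins, $global_opts) = @_;
    my @merged;
    for my $entry (@$plugins) {
        my ($name, @own_opts) = @$entry;
        # Build a new entry so the caller's plugin list is left untouched.
        push @merged, [$name, @own_opts, @$global_opts];
    }
    return \@merged;
}

# Illustrative plugin list; names and options are placeholders.
my $plugins     = [ ["PluginA", "-example_option"], ["PluginB"] ];
my @global_opts = ("-separate_cjk");

for my $entry (@{ merge_global_opts($plugins, \@global_opts) }) {
    print join(" ", @$entry), "\n";
}
```

Keeping the global list separate from the per-plugin options means a collection-wide setting is written once in the config file and reaches every plugin without editing each plugin line.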