Timestamp:
2004-01-22T14:17:30+13:00
Author:
kjdon
Message:

Fiddled around with segmenting for Chinese text. Haven't changed how the
segmentation is done, or what character ranges are used.
But when it's done is now controlled by collect.cfg. There is a new
option, separate_cjk, with values true or false (default false). Segmentation
is only done if this is set to true. The import.pl script passes it as a
global option to all plugins, so the user just needs to add it
once to the config file, not as an option to every plugin.
The queryaction uses this option too, to determine whether or not to segment
the query.
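
To illustrate the gating described above, here is a minimal sketch, not the Greenstone implementation. It assumes the option appears in collect.cfg as a line like "separate_cjk true", and it assumes segmentation means putting a space around each CJK character. The names is_cjk, segment_cjk and prepare_query are hypothetical, and the single range U+4E00 to U+9FFF is an assumption, since the commit does not say which character ranges are used.

```cpp
#include <string>

// Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) counts
// as CJK here. The ranges actually used by Greenstone are not given.
static bool is_cjk(char32_t c) {
    return c >= 0x4E00 && c <= 0x9FFF;
}

// Hypothetical helper: separate every CJK character from its neighbours
// with a single space, so each character becomes its own query term.
std::u32string segment_cjk(const std::u32string &in) {
    std::u32string out;
    bool prev_cjk = false;
    for (char32_t c : in) {
        const bool cur_cjk = is_cjk(c);
        // Add a separator at any boundary touching a CJK character, unless
        // a space is already there or the output is still empty.
        if ((cur_cjk || prev_cjk) && !out.empty() &&
            out.back() != U' ' && c != U' ') {
            out.push_back(U' ');
        }
        out.push_back(c);
        prev_cjk = cur_cjk;
    }
    return out;
}

// Hypothetical stand-in for the query formatting step: the query is only
// segmented when the collection-level flag is true (default false).
std::u32string prepare_query(const std::u32string &query, bool separate_cjk) {
    return separate_cjk ? segment_cjk(query) : query;
}
```

With the flag left at false the query passes through unchanged, which matches the default behaviour stated in the message.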

File:
1 edited

  • trunk/gsdl/src/recpt/documentaction.cpp

    --- r5917
    +++ r6584
    @@ -1008,9 +1008,15 @@
     
       if (!args["q"].empty() && args.getintarg("hl")) {
    +
    +    ColInfoResponse_t *cinfo = recpt->get_collectinfo_ptr (collectproto, collection, logout);
    +    bool segment = false;
    +    if (cinfo != NULL) {
    +      segment = cinfo->isSegmented;
    +    }
         FilterRequest_t request;
         comerror_t err;
         request.filterResultOptions = FRmatchTerms;
         text_t formattedstring = args["q"];
    -    format_querystring (formattedstring, args.getintarg("b"));
    +    format_querystring (formattedstring, args.getintarg("b"), segment);
         set_queryfilter_options (request, formattedstring, args);
         collectproto->filter (args["c"], request, queryresponse, err, logout);