Timestamp: 2004-01-22T14:17:30+13:00
Author: kjdon
Message:

Reworked when segmentation of Chinese text is done. How the segmentation
is done, and which character ranges are used, have not changed.
Whether it is done at all is now controlled by collect.cfg. There is a new
option, separate_cjk, which takes the values true or false (default false).
Segmentation is only done if this is set to true. The import.pl script
passes it as a global option to all plugins, so the user only needs to add
it once to the config file, not as an option to each plugin.
The queryaction also uses this option to determine whether or not to
segment the query.

File: 1 edited

  • trunk/gsdl/src/recpt/querytools.cpp

    --- trunk/gsdl/src/recpt/querytools.cpp (r4757)
    +++ trunk/gsdl/src/recpt/querytools.cpp (r6584)
    @@ -154,7 +154,9 @@
     }
     
    -void format_querystring (text_t &querystring, int querymode) {
    +void format_querystring (text_t &querystring, int querymode, bool segment) {
       text_t formattedstring;
     
    +  if (querymode == 1 && !segment) return;
    +
       text_t::const_iterator here = querystring.begin();
       text_t::const_iterator end = querystring.end();
    @@ -171,5 +173,5 @@
                      *here == '!' || *here == '&')) {
           formattedstring.push_back(' ');
    -    } else {
    +    } else if (segment) {
           if ((*here >= 0x4e00 && *here <= 0x9fa5) ||
               (*here >= 0xf900 && *here <= 0xfa2d)) {
    @@ -184,4 +186,7 @@
             space = false;
           }
    +
    +    } else {
    +      formattedstring.push_back (*here);
         }
         here ++;