Timestamp:
2008-08-25T09:58:13+12:00
Author:
kjdon
Message:

CJK character segmentation. text_t chars are not big enough to hold values > 0xffff, so the ranges above that limit have been commented out in the C++ and Perl code until we implement a better solution. These high ranges are only for extension sets anyway, so most common words will still be segmented.
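
A minimal sketch of the problem described in the message, assuming a text_t character is a 16-bit unsigned value (the message only says it cannot hold values above 0xffff). The alias `char16` and the functions `store` and `in_extension_b` are hypothetical names used for illustration and are not part of the Greenstone source.

```cpp
#include <cstdint>

// Assumption: a text_t character is modelled as a 16-bit unsigned value.
using char16 = std::uint16_t;  // hypothetical alias, not a Greenstone type

// Storing a code point in a 16-bit unit keeps only its low 16 bits.
constexpr char16 store(std::uint32_t code_point) {
    return static_cast<char16>(code_point);
}

// Same shape as one of the commented-out tests. The 16-bit operand is
// promoted to int for the comparison, so it can never reach 0x20000.
constexpr bool in_extension_b(char16 c) {
    return c >= 0x20000 && c <= 0x2a6d6;
}

// 0x20000 loses its high bits when stored, so the range test cannot match.
static_assert(store(0x20000) == 0x0000, "high bits are dropped");
static_assert(store(0x2a6d6) == 0xa6d6, "only the low 16 bits survive");
static_assert(!in_extension_b(store(0x20000)), "test is always false");
```

Under that assumption the two high-range comparisons were dead code, which is consistent with commenting them out rather than keeping them.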

File:
1 edited

  • gsdl/trunk/runtime-src/src/recpt/querytools.cpp

    r16645 r16980
    @@ -305,7 +305,9 @@
           formattedstring.push_back(' ');
         } else if (segment) {
    -      if ((*here >= 0x2e80 && *here <= 0xfa6a) ||
    -          (*here >= 0x20000 && *here <= 0x2a6d6) ||
    -          (*here >= 0x2f800 && *here <= 0x2fa1d)) {
    +      if ((*here >= 0x2e80 && *here <= 0xd7a3) ||
    +          ( *here >= 0xf900 && *here <= 0xfa6a)) {
    +        /* text_t not big enough to handle these. */
    +        /*    (*here >= 0x20000 && *here <= 0x2a6d6) ||
    +              (*here >= 0x2f800 && *here <= 0x2fa1d)) { */
     
           // CJK character
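
The range test that remains after this change can be sketched as a standalone predicate. This is an illustration only: the changeset performs the test inline on `*here`, the function name `is_segmentable_cjk` is hypothetical, and the 16-bit parameter type is an assumption based on the commit message.

```cpp
#include <cstdint>

// Hypothetical helper; the changeset performs this test inline on *here.
// Assumption: the character is a 16-bit unsigned value.
constexpr bool is_segmentable_cjk(std::uint16_t c) {
    return (c >= 0x2e80 && c <= 0xd7a3) ||  // first range kept by r16980
           (c >= 0xf900 && c <= 0xfa6a);    // second range kept by r16980
}

static_assert(is_segmentable_cjk(0x4e2d), "U+4E2D is inside the first range");
static_assert(!is_segmentable_cjk(0x0041), "Latin 'A' is outside both ranges");
static_assert(!is_segmentable_cjk(0xd800), "0xd800 falls in the new gap");
```

Compared with the earlier single range 0x2e80 to 0xfa6a, the new test skips 0xd7a4 to 0xf8ff. That gap covers the UTF-16 surrogate block (0xd800 to 0xdfff) and the private use block (0xe000 to 0xf8ff); the commit message does not say whether that is the reason for the split.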