Opened 16 years ago

Closed 16 years ago

#342 closed defect (fixed)

CJK character segmentation

Reported by: kjdon
Owned by: kjdon
Priority: high
Milestone: Next Release (2 or 3)
Component: Collection Building
Severity: enhancement
Keywords: build-overhaul
Cc:

Description

My plugin changes have broken CJK character segmentation.

TODO:

  • Make it work for Japanese and Korean (I think I have done this but have not committed it yet).
  • The option is no longer available to all plugins. We used to have a global collect.cfg option, which was added to every plugin and also used by the runtime system. How should this be done now?
  • Add the option to all plugins that handle text, not just the ReadTextFile ones.

Change History (3)

comment:1 by kjdon, 16 years ago

Status: new → assigned

comment:2 by kjdon, 16 years ago

The option is now part of AutoExtractMetadata (which needs to be renamed).

It works for Chinese, Japanese and Korean.

Only the config file and GLI issues are left to do.

comment:3 by kjdon, 16 years ago

Resolution: fixed
Status: assigned → closed

separate_cjk is now an indexoption (along with stem, case, accentfold)

The text is segmented before going to the indexer.
