Opened 16 years ago
Closed 16 years ago
#342 closed defect (fixed)
CJK character segmentation
Reported by: | kjdon | Owned by: | kjdon |
---|---|---|---|
Priority: | high | Milestone: | Next Release (2 or 3) |
Component: | Collection Building | Severity: | enhancement |
Keywords: | build-overhaul | Cc: | |
Description
My plugin changes mean that CJK segmentation doesn't work anymore.
TODO:
- Make it work for Japanese and Korean (I think I have done this but not committed it yet).
- The option is no longer available to all plugins. We used to use a global collect.cfg option that was added to all plugins and also used by the runtime system. How should we do this now?
- Add the option to all plugins that have text, not just ReadTextFile ones.
Change History (3)
comment:1 by , 16 years ago
Status: | new → assigned |
---|
comment:2 by , 16 years ago
The option is now part of AutoExtractMetadata (which needs to be renamed).
Works for Chinese, Japanese and Korean.
Only the config file and GLI issue are left to do.
comment:3 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
separate_cjk is now an indexoption (along with stem, case and accentfold).
The text is segmented before it goes to the indexer.