Opened 14 years ago
#681 new defect
mgpp word separator
Reported by: | kjdon | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Collection building wishlist |
Component: | Collection Building | Severity: | major |
Keywords: | | Cc: | |
Description
As reported on the mailing list, 12-4-2010:
Where can I define my own "word separator character", or make the word separator functions in Greenstone skip certain characters?
It seems that my collection in Greenstone is treating some special Unicode control characters as spaces. For example, according to the Unicode standard, Mongolian text uses four special control characters to change letter shapes (glyphs): 1. Free Variation Selector One (FVS1) (U+180B), 2. Free Variation Selector Two (FVS2) (U+180C), 3. Free Variation Selector Three (FVS3) (U+180D), and 4. Mongolian Vowel Separator (MVS) (U+180E). These control characters must be treated as part of the word whether they occur at the beginning, in the middle, or at the end of it. For example, abc'MVS'defg is a single word, not the two words 'abc' and 'defg'. I have been unable to retrieve such words in Greenstone: it retrieves Mongolian words containing control characters as two or more separate words (several control characters can be used in a single word).
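The requested behaviour can be illustrated with a minimal sketch. This is not Greenstone or mgpp code; the names `MONGOLIAN_WORD_JOINERS`, `is_word_char` and `split_words` are hypothetical, and the rule "letters, marks and numbers are word characters" is an assumption made only for the illustration. It shows a word splitter that keeps U+180B to U+180E inside a word instead of breaking on them.

```python
import unicodedata

# The four Mongolian characters named in the report (U+180B to U+180E).
# Hypothetical name: this set is part of the illustration only.
MONGOLIAN_WORD_JOINERS = frozenset("\u180b\u180c\u180d\u180e")


def is_word_char(ch):
    """Return True if ch should be kept inside a word."""
    # Always keep the Mongolian variation selectors and vowel separator,
    # whatever general category the local Unicode database gives them.
    if ch in MONGOLIAN_WORD_JOINERS:
        return True
    # Assumption: letters (L*), marks (M*) and numbers (N*) form words.
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def split_words(text):
    """Split text into words, breaking only on non-word characters."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

With this rule, `split_words("abc\u180edefg")` yields one word, while `split_words("abc defg")` still yields two. An indexer would need to apply the same rule at both build time and query time so that indexed terms and search terms match.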