6.3.1: Search Index Options

Some additional options control how the indexes are built. Not every option is available for every index; options that do not apply to a particular index are greyed out.

Stemming and case-folding may be enabled or disabled for MG and MGPP indexes. If they are enabled, stemmed and case-folded indexes are created, and users can choose whether to search with case-folding and stemming on or off. If they are disabled, searching is case-sensitive and unstemmed, and these options are not displayed on the collection's preferences page.
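To make the effect of these two options concrete, the sketch below normalizes a query term and an indexed word in the same way and compares them. It is a minimal illustration, not how MG or MGPP work internally: `toy_stem`, `SUFFIXES`, `normalize` and `matches` are hypothetical names, and `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer.

```python
# Minimal sketch of how case-folding and stemming change which words match.
# ASSUMPTION: toy_stem is a hypothetical suffix stripper for illustration only;
# it is not the stemmer that MG or MGPP actually uses.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list, checked in order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(term: str, casefold: bool, stem: bool) -> str:
    """Apply the selected normalizations; queries and indexed words get the same treatment."""
    if casefold:
        term = term.casefold()
    if stem:
        term = toy_stem(term)
    return term


def matches(query: str, indexed_word: str, casefold: bool, stem: bool) -> bool:
    """A query term matches an indexed word if both normalize to the same string."""
    return normalize(query, casefold, stem) == normalize(indexed_word, casefold, stem)


if __name__ == "__main__":
    print(matches("Library", "library", casefold=True, stem=False))    # True
    print(matches("Library", "library", casefold=False, stem=False))   # False
    print(matches("indexing", "indexed", casefold=False, stem=True))   # True
```

With both flags off, only exact, case-sensitive matches succeed, which corresponds to searching a collection where stemming and case-folding have been disabled.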

Accent-folding is available for MGPP indexes. It works in a similar way to case-folding, but instead of upper- and lower-case letters matching each other, letters with diacritics match their unaccented counterparts (for example, "é" matches "e"). Lucene indexes are accent-folded automatically; no option to switch this on or off is displayed to the user on the collection's preferences page.
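One common way to implement accent-folding is Unicode canonical decomposition (NFD) followed by removal of the combining marks. The sketch below uses Python's standard `unicodedata` module to do this; it illustrates the general technique only and is not necessarily how MGPP or Lucene fold accents. The function names `accent_fold` and `accent_matches` are hypothetical.

```python
import unicodedata


def accent_fold(text: str) -> str:
    """Remove diacritics: decompose to NFD, then drop nonspacing combining marks (category Mn).

    Case is preserved, since accent-folding is independent of case-folding.
    Letters such as 'ø' that have no canonical decomposition are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def accent_matches(query: str, indexed_word: str) -> bool:
    """With accent-folding on, a query matches if both sides fold to the same string."""
    return accent_fold(query) == accent_fold(indexed_word)


if __name__ == "__main__":
    print(accent_fold("café"))                  # cafe
    print(accent_matches("Zürich", "Zurich"))   # True
```

Because folding is applied to both the indexed text and the query, a search for "cafe" finds "café" and vice versa.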

Chinese, Japanese and Korean text is often written without spaces between words. Because indexing relies on word breaks to identify terms, such text produces an index that cannot be searched effectively. Setting the "CJK Text Segmentation" option inserts a space between each pair of Chinese, Japanese or Korean characters, both in the text and in search terms, so that searching is carried out at the character level.