Timestamp:
2015-06-30T19:17:36+12:00
Author:
ak19
Message:

Final commit (I think) to get the update to solr getTerms() to work on a gs3 checkout. The solr-jdbm-demo collection needed to be rebuilt with the changes to the index. This time added in the other .xml files from the lucene/solr upgrade to the collection, and updated schema.xml and solrconfig.xml. The solrconfig.xml update is especially necessary, as it uses the new Greenstone custom SearchHandler to get getTerms() to work.
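
As a reading aid for the schema.xml diff that follows: many of its changes simply drop an explicit omitNorms="true" from primitive field types. This appears to follow from the schema version moving from 1.4 to 1.5, where, according to the schema's own header comment, omitNorms defaults to true for primitive types. A minimal before/after sketch, using the "string" field type from this changeset (the two declarations are alternatives and would not appear together in one schema):

```xml
<!-- Before (schema version="1.4"): norms are omitted only when asked for explicitly -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- After (schema version="1.5"): omitNorms already defaults to true for
     primitive types (int, float, boolean, string...), so the attribute is dropped -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
```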

File:
1 edited

  • gs3-extensions/solr/trunk/src/collect/solr-jdbm-demo/etc/conf/schema.xml

    r28092 r30001  
    4646-->
    4747
    48 <schema name="example" version="1.4">
     48<schema name="example" version="1.5">
    4949  <!-- attribute "name" is the name of this schema and is only used for display purposes.
    50        Applications should change this to reflect the nature of the search collection.
    51        version="1.4" is Solr's version number for the schema syntax and semantics.  It should
    52        not normally be changed by applications.
    53        1.0: multiValued attribute did not exist, all fields are multiValued by nature
     50       version="x.y" is Solr's version number for the schema syntax and
     51       semantics.  It should not normally be changed by applications.
     52
     53       1.0: multiValued attribute did not exist, all fields are multiValued
     54            by nature
    5455       1.1: multiValued attribute introduced, false by default
    55        1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
     56       1.2: omitTermFreqAndPositions attribute introduced, true by default
     57            except for text fields.
    5658       1.3: removed optional field compress feature
    57        1.4: default auto-phrase (QueryParser feature) to off
     59       1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
     60            behavior when a single string produces multiple tokens.  Defaults
     61            to off for version >= 1.4
     62       1.5: omitNorms defaults to true for primitive field types
     63            (int, float, boolean, string...)
    5864     -->
    5965
     66<fields>
     67   <!-- Valid attributes for fields:
     68     name: mandatory - the name for the field
     69     type: mandatory - the name of a field type from the
     70       <types> fieldType section
     71     indexed: true if this field should be indexed (searchable or sortable)
     72     stored: true if this field should be retrievable
     73     docValues: true if this field should have doc values. Doc values are
     74       useful for faceting, grouping, sorting and function queries. Although not
     75       required, doc values will make the index faster to load, more
     76       NRT-friendly and more memory-efficient. They however come with some
     77       limitations: they are currently only supported by StrField, UUIDField
     78       and all Trie*Fields, and depending on the field type, they might
     79       require the field to be single-valued, be required or have a default
     80       value (check the documentation of the field type you're interested in
     81       for more information)
     82     multiValued: true if this field may contain multiple values per document
     83     omitNorms: (expert) set to true to omit the norms associated with
     84       this field (this disables length normalization and index-time
     85       boosting for the field, and saves some memory).  Only full-text
     86       fields or fields that need an index-time boost need norms.
     87       Norms are omitted for primitive (non-analyzed) types by default.
     88     termVectors: [false] set to true to store the term vector for a
     89       given field.
     90       When using MoreLikeThis, fields used for similarity should be
     91       stored for best performance.
     92     termPositions: Store position information with the term vector. 
     93       This will increase storage costs.
     94     termOffsets: Store offset information with the term vector. This
     95       will increase storage costs.
     96     required: The field is required.  It will throw an error if the
     97       value does not exist
     98     default: a value that should be used if no value is specified
     99       when adding a document.
     100   -->
     101
     102   <!-- field names should consist of alphanumeric or underscore characters only and
     103      not start with a digit.  This is not currently strictly enforced,
     104      but other field names will not have first class support from all components
     105      and back compatibility is not guaranteed.  Names with both leading and
     106      trailing underscores (e.g. _version_) are reserved.
     107   -->
     108
     109   <!-- If you remove this field, you must _also_ disable the update log in solrconfig.xml
     110      or Solr won't start. _version_ and update log are required for SolrCloud
     111   -->
     112
     113   <field name="docOID" type="string" indexed="true" stored="true" required="true" />
     114
     115    <field name="ZZ" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
     116    <field name="TX" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
     117    <field name="TI" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
     118    <field name="SU" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
     119    <field name="ORG" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
     120
     121
     122   <field name="_version_" type="long" indexed="true" stored="true"/>
     123   
     124   <!-- points to the root document of a block of nested documents. Required for nested
     125      document support, may be removed otherwise
     126   -->
     127   <field name="_root_" type="string" indexed="true" stored="false"/>
     128
     129   <!-- Only remove the "id" field if you have a very good reason to. While not strictly
     130     required, it is highly recommended. A <uniqueKey> is present in almost all Solr
     131     installations. See the <uniqueKey> declaration below where <uniqueKey> is set to "id".
     132   -->   
     133<!--
     134   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
     135-->
     136
     137<!--       
     138   <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
     139   <field name="name" type="text_general" indexed="true" stored="true"/>
     140   <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
     141   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
     142   <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
     143   <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
     144
     145   <field name="weight" type="float" indexed="true" stored="true"/>
     146   <field name="price"  type="float" indexed="true" stored="true"/>
     147   <field name="popularity" type="int" indexed="true" stored="true" />
     148   <field name="inStock" type="boolean" indexed="true" stored="true" />
     149-->
     150   <field name="store" type="location" indexed="true" stored="true"/>
     151
     152   <!-- Common metadata fields, named specifically to match up with
     153     SolrCell metadata when parsing rich documents such as Word, PDF.
     154     Some fields are multiValued only because Tika currently may return
     155     multiple values for them. Some metadata is parsed from the documents,
     156     but there are some which come from the client context:
     157       "content_type": From the HTTP headers of incoming stream
     158       "resourcename": From SolrCell request param resource.name
     159   -->
     160
     161<!--
     162   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
     163   <field name="subject" type="text_general" indexed="true" stored="true"/>
     164   <field name="description" type="text_general" indexed="true" stored="true"/>
     165   <field name="comments" type="text_general" indexed="true" stored="true"/>
     166   <field name="author" type="text_general" indexed="true" stored="true"/>
     167   <field name="keywords" type="text_general" indexed="true" stored="true"/>
     168   <field name="category" type="text_general" indexed="true" stored="true"/>
     169   <field name="resourcename" type="text_general" indexed="true" stored="true"/>
     170   <field name="url" type="text_general" indexed="true" stored="true"/>
     171   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
     172   <field name="last_modified" type="date" indexed="true" stored="true"/>
     173   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
     174-->
     175
     176   <!-- Main body of document extracted by SolrCell.
     177        NOTE: This field is not indexed by default, since it is also copied to "text"
     178        using copyField below. This is to save space. Use this field for returning and
     179        highlighting document content. Use the "text" field to search the content. -->
     180   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
     181   
     182
     183   <!-- catchall field, containing all other searchable text fields (implemented
     184        via copyField further on in this schema  -->
     185   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
     186
     187   <!-- catchall text field that indexes tokens both normally and in reverse for efficient
     188        leading wildcard queries. -->
     189   <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
     190
     191   <!-- non-tokenized version of manufacturer to make it easier to sort or group
     192        results by manufacturer.  copied from "manu" via copyField -->
     193   <field name="manu_exact" type="string" indexed="true" stored="false"/>
     194
     195   <field name="payloads" type="payloads" indexed="true" stored="true"/>
     196
     197
     198   <!--
     199     Some fields such as popularity and manu_exact could be modified to
     200     leverage doc values:
     201     <field name="popularity" type="int" indexed="true" stored="true" docValues="true" />
     202     <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
     203     <field name="cat" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
     204
     205
     206     Although it would make indexing slightly slower and the index bigger, it
     207     would also make the index faster to load, more memory-efficient and more
     208     NRT-friendly.
     209     -->
     210
     211   <!-- Dynamic field definitions allow using convention over configuration
     212       for fields via the specification of patterns to match field names.
     213       EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
     214       RESTRICTION: the glob-like pattern in the name attribute must have
     215       a "*" only at the start or the end.  -->
     216   
     217   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
     218   <dynamicField name="*_is" type="int"    indexed="true"  stored="true"  multiValued="true"/>
     219   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />
     220   <dynamicField name="*_ss" type="string"  indexed="true"  stored="true" multiValued="true"/>
     221   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
     222   <dynamicField name="*_ls" type="long"   indexed="true"  stored="true"  multiValued="true"/>
     223   <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
     224   <dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
     225   <dynamicField name="*_en"  type="text_en"    indexed="true"  stored="true" multiValued="true"/>
     226   <dynamicField name="*_b"  type="boolean" indexed="true" stored="true"/>
     227   <dynamicField name="*_bs" type="boolean" indexed="true" stored="true"  multiValued="true"/>
     228   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
     229   <dynamicField name="*_fs" type="float"  indexed="true"  stored="true"  multiValued="true"/>
     230   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
     231   <dynamicField name="*_ds" type="double" indexed="true"  stored="true"  multiValued="true"/>
     232
     233   <!-- Type used to index the lat and lon components for the "location" FieldType -->
     234   <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false" />
     235
     236   <dynamicField name="*_dt"  type="date"    indexed="true"  stored="true"/>
     237   <dynamicField name="*_dts" type="date"    indexed="true"  stored="true" multiValued="true"/>
     238   <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>
     239
     240   <!-- some trie-coded dynamic fields for faster range queries -->
     241   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
     242   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
     243   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
     244   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
     245   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>
     246
     247   <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>
     248   <dynamicField name="*_c"   type="currency" indexed="true"  stored="true"/>
     249
     250   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
     251   <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
     252
     253   <dynamicField name="random_*" type="random" />
     254
     255   <!-- dynamic field for sort/facet fields, which are strings by default. ie not tokenised -->
     256   <dynamicField name="by*" type="string" indexed="true" stored="false" multiValued="false" />
     257
     258   <!-- uncomment the following to ignore any fields that don't already match an existing
     259        field name or dynamic field, rather than reporting them as an error.
     260        alternately, change the type="ignored" to some other type e.g. "text" if you want
     261        unknown fields indexed and/or stored by default -->
     262   <!--dynamicField name="*" type="ignored" multiValued="true" /-->
     263   
     264 </fields>
     265
     266
     267 <!-- Field to use to determine and enforce document uniqueness.
     268      Unless this field is marked with required="false", it will be a required field
     269   -->
     270 <uniqueKey>docOID</uniqueKey>
     271
     272 <!-- DEPRECATED: The defaultSearchField is consulted by various query parsers when
     273  parsing a query string that isn't explicit about the field.  Machine (non-user)
     274  generated queries are best made explicit, or they can use the "df" request parameter
     275  which takes precedence over this.
     276  Note: Un-commenting defaultSearchField will be insufficient if your request handler
     277  in solrconfig.xml defines "df", which takes precedence. That would need to be removed.
     278 <defaultSearchField>text</defaultSearchField> -->
     279
     280 <!-- DEPRECATED: The defaultOperator (AND|OR) is consulted by various query parsers
     281  when parsing a query string to determine if a clause of the query should be marked as
     282  required or optional, assuming the clause isn't already marked by some operator.
     283  The default is OR, which is generally assumed so it is not a good idea to change it
     284  globally here.  The "q.op" request parameter takes precedence over this.
     285 <solrQueryParser defaultOperator="OR"/> -->
     286
     287  <!-- copyField commands copy one field to another at the time a document
     288        is added to the index.  It's used either to index the same field differently,
     289        or to add multiple fields to the same field for easier/faster searching.  -->
     290<!--
     291   <copyField source="cat" dest="text"/>
     292   <copyField source="name" dest="text"/>
     293   <copyField source="manu" dest="text"/>
     294   <copyField source="features" dest="text"/>
     295   <copyField source="includes" dest="text"/>
     296   <copyField source="manu" dest="manu_exact"/>
     297-->
     298
     299   <!-- Copy the price into a currency enabled field (default USD) -->
     300<!--
     301   <copyField source="price" dest="price_c"/>
     302-->
     303
     304   <!-- Text fields from SolrCell to search by default in our catch-all field -->
     305<!--
     306   <copyField source="title" dest="text"/>
     307   <copyField source="author" dest="text"/>
     308   <copyField source="description" dest="text"/>
     309   <copyField source="keywords" dest="text"/>
     310   <copyField source="content" dest="text"/>
     311   <copyField source="content_type" dest="text"/>
     312   <copyField source="resourcename" dest="text"/>
     313   <copyField source="url" dest="text"/>
     314-->
     315
     316   <!-- Create a string version of author for faceting -->
     317<!--
     318   <copyField source="author" dest="author_s"/>
     319-->
     320   
     321   <!-- Above, multiple source fields are copied to the [text] field.
     322      Another way to map multiple source fields to the same
     323      destination field is to use the dynamic field syntax.
     324      copyField also supports a maxChars to copy setting.  -->
     325       
     326   <!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->
     327
     328   <!-- copy name to alphaNameSort, a field designed for sorting by name -->
     329   <!-- <copyField source="name" dest="alphaNameSort"/> -->
     330 
    60331  <types>
    61332    <!-- field type definitions. The "name" attribute is
     
    63334       attribute and any other attributes determine the real
    64335       behavior of the fieldType.
    65          Class names starting with "solr" refer to java classes in the
    66        org.apache.solr.analysis package.
     336         Class names starting with "solr" refer to java classes in a
     337       standard package such as org.apache.solr.analysis
    67338    -->
    68339
    69     <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    70     <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
     340    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
     341       It supports doc values but in that case the field needs to be
     342       single-valued and either required or have a default value.
     343      -->
     344    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    71345
    72346    <!-- boolean type: "true" or "false" -->
    73     <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    74     <!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
    75     <fieldtype name="binary" class="solr.BinaryField"/>
    76 
    77     <!-- The optional sortMissingLast and sortMissingFirst attributes are
    78          currently supported on types that are sorted internally as strings.
    79            This includes "string","boolean","sint","slong","sfloat","sdouble","pdate"
     347    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
     348
     349    <!-- sortMissingLast and sortMissingFirst attributes are optional attributes are
     350         currently supported on types that are sorted internally as strings
     351         and on numeric types.
     352         This includes "string","boolean", and, as of 3.5 (and 4.x),
     353         int, float, long, date, double, including the "Trie" variants.
    80354       - If sortMissingLast="true", then a sort on this field will cause documents
    81355         without the field to come after documents with the field,
     
    91365    <!--
    92366      Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
     367
     368      These fields support doc values, but they require the field to be
     369      single-valued and either be required or have a default value.
    93370    -->
    94     <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    95     <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    96     <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    97     <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
     371    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
     372    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
     373    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
     374    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    98375
    99376    <!--
     
    107384     A precisionStep of 0 disables indexing at different precision levels.
    108385    -->
    109     <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    110     <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    111     <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    112     <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
     386    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
     387    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
     388    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
     389    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
    113390
    114391    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
     
    134411         Note: For faster range queries, consider the tdate type
    135412      -->
    136     <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
     413    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    137414
    138415    <!-- A Trie based date field for faster date range queries and date faceting. -->
    139     <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
    140 
     416    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
     417
     418
     419    <!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
     420    <fieldtype name="binary" class="solr.BinaryField"/>
    141421
    142422    <!--
    143423      Note:
    144       These should only be used for compatibility with existing indexes (created with older Solr versions)
    145       or if "sortMissingFirst" or "sortMissingLast" functionality is needed. Use Trie based fields instead.
    146 
     424      These should only be used for compatibility with existing indexes (created with lucene or older Solr versions).
     425      Use Trie based fields instead. As of Solr 3.5 and 4.x, Trie based fields support sortMissingFirst/Last
     426     
    147427      Plain numeric field types that store and index the text
    148       value verbatim (and hence don't support range queries, since the
     428      value verbatim (and hence don't correctly support range queries, since the
    149429      lexicographic ordering isn't equal to the numeric ordering)
    150430    -->
    151     <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
    152     <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
    153     <fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
    154     <fieldType name="pdouble" class="solr.DoubleField" omitNorms="true"/>
    155     <fieldType name="pdate" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    156 
    157 
    158     <!--
    159       Note:
    160       These should only be used for compatibility with existing indexes (created with older Solr versions)
    161       or if "sortMissingFirst" or "sortMissingLast" functionality is needed. Use Trie based fields instead.
    162 
    163       Numeric field types that manipulate the value into
    164       a string value that isn't human-readable in its internal form,
    165       but with a lexicographic ordering the same as the numeric ordering,
    166       so that range queries work correctly.
    167     -->
    168     <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    169     <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    170     <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    171     <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    172 
     431    <fieldType name="pint" class="solr.IntField"/>
     432    <fieldType name="plong" class="solr.LongField"/>
     433    <fieldType name="pfloat" class="solr.FloatField"/>
     434    <fieldType name="pdouble" class="solr.DoubleField"/>
     435    <fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
    173436
    174437    <!-- The "RandomSortField" is not used to store or search any
    175438         data.  You can declare fields of this type it in your schema
    176439         to generate pseudo-random orderings of your docs for sorting
    177          purposes.  The ordering is generated based on the field name
    178          and the version of the index, As long as the index version
     440         or function purposes.  The ordering is generated based on the field
     441         name and the version of the index. As long as the index version
    179442         remains unchanged, and the same field name is reused,
    180443         the ordering of the docs will be consistent. 
    181444         If you want different psuedo-random orderings of documents,
    182445         for the same version of the index, use a dynamicField and
    183          change the name
     446         change the field name in the request.
    184447     -->
    185448    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
     
    198461
    199462    <!-- One can also specify an existing Analyzer class that has a
    200          default constructor via the class attribute on the analyzer element
     463         default constructor via the class attribute on the analyzer element.
     464         Example:
    201465    <fieldType name="text_greek" class="solr.TextField">
    202466      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
     
    219483      <analyzer type="index">
    220484        <tokenizer class="solr.StandardTokenizerFactory"/>
    221         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     485        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    222486        <!-- in this example, we will only use synonyms at query time
    223487        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     
    227491      <analyzer type="query">
    228492        <tokenizer class="solr.StandardTokenizerFactory"/>
    229         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     493        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    230494        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    231495        <filter class="solr.LowerCaseFilterFactory"/>
     
    235499    <!-- A text field with defaults appropriate for English: it
    236500         tokenizes with StandardTokenizer, removes English stop words
    237          (stopwords_en.txt), down cases, protects words from protwords.txt, and
     501         (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and
    238502         finally applies Porter's stemming.  The query time analyzer
    239503         also applies synonyms from synonyms.txt. -->
     
    245509        -->
    246510        <!-- Case insensitive stop word removal.
    247           add enablePositionIncrements=true in both the index and query
    248           analyzers to leave a 'gap' for more accurate phrase queries.
    249511        -->
    250512        <filter class="solr.StopFilterFactory"
    251513                ignoreCase="true"
    252                 words="stopwords_en.txt"
    253                 enablePositionIncrements="true"
     514                words="lang/stopwords_en.txt"
    254515                />
    255516        <filter class="solr.LowerCaseFilterFactory"/>
     
    259520        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    260521    -->
    261         <filter class="solr.PorterStemFilterFactory"/>
     522        <!--<filter class="solr.PorterStemFilterFactory"/>-->
     523    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    262524      </analyzer>
    263525      <analyzer type="query">
     
    266528        <filter class="solr.StopFilterFactory"
    267529                ignoreCase="true"
    268                 words="stopwords_en.txt"
    269                 enablePositionIncrements="true"
     530                words="lang/stopwords_en.txt"
    270531                />
    271532        <filter class="solr.LowerCaseFilterFactory"/>
     
    275536        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    276537    -->
    277         <filter class="solr.PorterStemFilterFactory"/>
     538        <!--<filter class="solr.PorterStemFilterFactory"/>-->
     539    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    278540      </analyzer>
    279541    </fieldType>
     
    286548     non-alphanumeric chars.  This means certain compound word
    287549     cases will work, for example query "wi fi" will match
    288      document "WiFi" or "wi-fi".  However, other cases will still
    289      not match, for example if the query is "wifi" and the
    290      document is "wi fi" or if the query is "wi-fi" and the
    291      document is "wifi".
     550     document "WiFi" or "wi-fi".
    292551        -->
    293552    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
     
    298557        -->
    299558        <!-- Case insensitive stop word removal.
    300           add enablePositionIncrements=true in both the index and query
    301           analyzers to leave a 'gap' for more accurate phrase queries.
    302559        -->
    303560        <filter class="solr.StopFilterFactory"
    304561                ignoreCase="true"
    305                 words="stopwords_en.txt"
    306                 enablePositionIncrements="true"
     562                words="lang/stopwords_en.txt"
    307563                />
    308564        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    309565        <filter class="solr.LowerCaseFilterFactory"/>
    310566        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    311         <filter class="solr.PorterStemFilterFactory"/>
     567        <!--<filter class="solr.PorterStemFilterFactory"/>-->
     568        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    312569      </analyzer>
    313570      <analyzer type="query">
     
    316573        <filter class="solr.StopFilterFactory"
    317574                ignoreCase="true"
    318                 words="stopwords_en.txt"
    319                 enablePositionIncrements="true"
     575                words="lang/stopwords_en.txt"
    320576                />
    321577        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    322578        <filter class="solr.LowerCaseFilterFactory"/>
    323579        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    324         <filter class="solr.PorterStemFilterFactory"/>
     580        <!--<filter class="solr.PorterStemFilterFactory"/>-->
     581        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    325582      </analyzer>
    326583    </fieldType>
     
    332589        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    333590        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    334         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
     591        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    335592        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    336593        <filter class="solr.LowerCaseFilterFactory"/>
     
    348605      <analyzer type="index">
    349606        <tokenizer class="solr.StandardTokenizerFactory"/>
    350         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     607        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    351608        <filter class="solr.LowerCaseFilterFactory"/>
    352609        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
     
    356613        <tokenizer class="solr.StandardTokenizerFactory"/>
    357614        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    358         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     615        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    359616        <filter class="solr.LowerCaseFilterFactory"/>
    360617      </analyzer>
     
    396653             information on pattern and replacement string syntax.
    397654             
    398              http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
     655             http://java.sun.com/j2se/1.6.0/docs/api/java/util/regex/package-summary.html
    399656          -->
    400657        <filter class="solr.PatternReplaceFilterFactory"
     
    437694    </fieldType>
    438695
    439     <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
    440       <analyzer>
    441         <tokenizer class="solr.PathHierarchyTokenizerFactory"/>
     696    <!--
     697      Example of using PathHierarchyTokenizerFactory at index time, so
     698      queries for paths match documents at that path, or in descendent paths
     699    -->
     700    <fieldType name="descendent_path" class="solr.TextField">
     701      <analyzer type="index">
     702    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
     703      </analyzer>
     704      <analyzer type="query">
     705    <tokenizer class="solr.KeywordTokenizerFactory" />
     706      </analyzer>
     707    </fieldType>
     708    <!--
     709      Example of using PathHierarchyTokenizerFactory at query time, so
     710      queries for paths match documents at that path, or in ancestor paths
     711    -->
     712    <fieldType name="ancestor_path" class="solr.TextField">
     713      <analyzer type="index">
     714    <tokenizer class="solr.KeywordTokenizerFactory" />
     715      </analyzer>
     716      <analyzer type="query">
     717    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
    442718      </analyzer>
    443719    </fieldType>
     
    463739    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
    464740
    465    <!--
    466     A Geohash is a compact representation of a latitude longitude pair in a single field.
    467     See http://wiki.apache.org/solr/SpatialSearch
     741    <!-- An alternative geospatial field type new to Solr 4.  It supports multiValued and polygon shapes.
     742      For more information about this and other Spatial fields new to Solr 4, see:
     743      http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
     744    -->
     745    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
     746        geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
     747
     748   <!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
     749        Parameters:
     750          defaultCurrency: Specifies the default currency if none specified. Defaults to "USD"
     751          precisionStep:   Specifies the precisionStep for the TrieLong field used for the amount
     752          providerClass:   Lets you plug in other exchange provider backend:
     753                           solr.FileExchangeRateProvider is the default and takes one parameter:
     754                             currencyConfig: name of an xml file holding exchange rates
     755                           solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
     756                             ratesFileLocation: URL or path to rates JSON file (default latest.json on the web)
     757                             refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
    468758   -->
    469     <fieldtype name="geohash" class="solr.GeoHashField"/>
     759    <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
     760             
     761
     762
     763   <!-- some examples for different languages (generally ordered by ISO code) -->
     764
     765    <!-- Arabic -->
     766    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
     767      <analyzer>
     768        <tokenizer class="solr.StandardTokenizerFactory"/>
     769        <!-- for any non-arabic -->
     770        <filter class="solr.LowerCaseFilterFactory"/>
     771        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
     772        <!-- normalizes ﻯ to ﻱ, etc -->
     773        <filter class="solr.ArabicNormalizationFilterFactory"/>
     774        <filter class="solr.ArabicStemFilterFactory"/>
     775      </analyzer>
     776    </fieldType>
     777
     778    <!-- Bulgarian -->
     779    <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">
     780      <analyzer>
     781        <tokenizer class="solr.StandardTokenizerFactory"/>
     782        <filter class="solr.LowerCaseFilterFactory"/>
     783        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_bg.txt" />
     784        <filter class="solr.BulgarianStemFilterFactory"/>       
     785      </analyzer>
     786    </fieldType>
     787   
     788    <!-- Catalan -->
     789    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
     790      <analyzer>
     791        <tokenizer class="solr.StandardTokenizerFactory"/>
     792        <!-- removes l', etc -->
     793        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ca.txt"/>
     794        <filter class="solr.LowerCaseFilterFactory"/>
     795        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" />
     796        <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>       
     797      </analyzer>
     798    </fieldType>
     799   
     800    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
     801    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
     802      <analyzer>
     803        <tokenizer class="solr.StandardTokenizerFactory"/>
     804        <!-- normalize width before bigram, as e.g. half-width dakuten combine  -->
     805        <filter class="solr.CJKWidthFilterFactory"/>
     806        <!-- for any non-CJK -->
     807        <filter class="solr.LowerCaseFilterFactory"/>
     808        <filter class="solr.CJKBigramFilterFactory"/>
     809      </analyzer>
     810    </fieldType>
     811
     812    <!-- Kurdish -->
     813    <fieldType name="text_ckb" class="solr.TextField" positionIncrementGap="100">
     814      <analyzer>
     815        <tokenizer class="solr.StandardTokenizerFactory"/>
     816        <filter class="solr.SoraniNormalizationFilterFactory"/>
     817        <!-- for any latin text -->
     818        <filter class="solr.LowerCaseFilterFactory"/>
     819        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ckb.txt"/>
     820        <filter class="solr.SoraniStemFilterFactory"/>
     821      </analyzer>
     822    </fieldType>
     823
     824    <!-- Czech -->
     825    <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
     826      <analyzer>
     827        <tokenizer class="solr.StandardTokenizerFactory"/>
     828        <filter class="solr.LowerCaseFilterFactory"/>
     829        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt" />
     830        <filter class="solr.CzechStemFilterFactory"/>       
     831      </analyzer>
     832    </fieldType>
     833   
     834    <!-- Danish -->
     835    <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">
     836      <analyzer>
     837        <tokenizer class="solr.StandardTokenizerFactory"/>
     838        <filter class="solr.LowerCaseFilterFactory"/>
     839        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_da.txt" format="snowball" />
     840        <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>       
     841      </analyzer>
     842    </fieldType>
     843   
     844    <!-- German -->
     845    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
     846      <analyzer>
     847        <tokenizer class="solr.StandardTokenizerFactory"/>
     848        <filter class="solr.LowerCaseFilterFactory"/>
     849        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
     850        <filter class="solr.GermanNormalizationFilterFactory"/>
     851        <filter class="solr.GermanLightStemFilterFactory"/>
     852        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
     853        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
     854      </analyzer>
     855    </fieldType>
     856   
     857    <!-- Greek -->
     858    <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
     859      <analyzer>
     860        <tokenizer class="solr.StandardTokenizerFactory"/>
     861        <!-- greek specific lowercase for sigma -->
     862        <filter class="solr.GreekLowerCaseFilterFactory"/>
     863        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" />
     864        <filter class="solr.GreekStemFilterFactory"/>
     865      </analyzer>
     866    </fieldType>
     867   
     868    <!-- Spanish -->
     869    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
     870      <analyzer>
     871        <tokenizer class="solr.StandardTokenizerFactory"/>
     872        <filter class="solr.LowerCaseFilterFactory"/>
     873        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
     874        <filter class="solr.SpanishLightStemFilterFactory"/>
     875        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
     876      </analyzer>
     877    </fieldType>
     878   
     879    <!-- Basque -->
     880    <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">
     881      <analyzer>
     882        <tokenizer class="solr.StandardTokenizerFactory"/>
     883        <filter class="solr.LowerCaseFilterFactory"/>
     884        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_eu.txt" />
     885        <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>
     886      </analyzer>
     887    </fieldType>
     888   
     889    <!-- Persian -->
     890    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
     891      <analyzer>
     892        <!-- for ZWNJ -->
     893        <charFilter class="solr.PersianCharFilterFactory"/>
     894        <tokenizer class="solr.StandardTokenizerFactory"/>
     895        <filter class="solr.LowerCaseFilterFactory"/>
     896        <filter class="solr.ArabicNormalizationFilterFactory"/>
     897        <filter class="solr.PersianNormalizationFilterFactory"/>
     898        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
     899      </analyzer>
     900    </fieldType>
     901   
     902    <!-- Finnish -->
     903    <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
     904      <analyzer>
     905        <tokenizer class="solr.StandardTokenizerFactory"/>
     906        <filter class="solr.LowerCaseFilterFactory"/>
     907        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fi.txt" format="snowball" />
     908        <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
     909        <!-- less aggressive: <filter class="solr.FinnishLightStemFilterFactory"/> -->
     910      </analyzer>
     911    </fieldType>
     912   
     913    <!-- French -->
     914    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
     915      <analyzer>
     916        <tokenizer class="solr.StandardTokenizerFactory"/>
     917        <!-- removes l', etc -->
     918        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
     919        <filter class="solr.LowerCaseFilterFactory"/>
     920        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" />
     921        <filter class="solr.FrenchLightStemFilterFactory"/>
     922        <!-- less aggressive: <filter class="solr.FrenchMinimalStemFilterFactory"/> -->
     923        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
     924      </analyzer>
     925    </fieldType>
     926   
     927    <!-- Irish -->
     928    <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
     929      <analyzer>
     930        <tokenizer class="solr.StandardTokenizerFactory"/>
     931        <!-- removes d', etc -->
     932        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ga.txt"/>
     933        <!-- removes n-, etc. position increments is intentionally false! -->
     934        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/hyphenations_ga.txt"/>
     935        <filter class="solr.IrishLowerCaseFilterFactory"/>
     936        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ga.txt"/>
     937        <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
     938      </analyzer>
     939    </fieldType>
     940   
     941    <!-- Galician -->
     942    <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">
     943      <analyzer>
     944        <tokenizer class="solr.StandardTokenizerFactory"/>
     945        <filter class="solr.LowerCaseFilterFactory"/>
     946        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_gl.txt" />
     947        <filter class="solr.GalicianStemFilterFactory"/>
     948        <!-- less aggressive: <filter class="solr.GalicianMinimalStemFilterFactory"/> -->
     949      </analyzer>
     950    </fieldType>
     951   
     952    <!-- Hindi -->
     953    <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">
     954      <analyzer>
     955        <tokenizer class="solr.StandardTokenizerFactory"/>
     956        <filter class="solr.LowerCaseFilterFactory"/>
     957        <!-- normalizes unicode representation -->
     958        <filter class="solr.IndicNormalizationFilterFactory"/>
     959        <!-- normalizes variation in spelling -->
     960        <filter class="solr.HindiNormalizationFilterFactory"/>
     961        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hi.txt" />
     962        <filter class="solr.HindiStemFilterFactory"/>
     963      </analyzer>
     964    </fieldType>
     965   
     966    <!-- Hungarian -->
     967    <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">
     968      <analyzer>
     969        <tokenizer class="solr.StandardTokenizerFactory"/>
     970        <filter class="solr.LowerCaseFilterFactory"/>
     971        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hu.txt" format="snowball" />
     972        <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
     973        <!-- less aggressive: <filter class="solr.HungarianLightStemFilterFactory"/> -->   
     974      </analyzer>
     975    </fieldType>
     976   
     977    <!-- Armenian -->
     978    <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
     979      <analyzer>
     980        <tokenizer class="solr.StandardTokenizerFactory"/>
     981        <filter class="solr.LowerCaseFilterFactory"/>
     982        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />
     983        <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
     984      </analyzer>
     985    </fieldType>
     986   
     987    <!-- Indonesian -->
     988    <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
     989      <analyzer>
     990        <tokenizer class="solr.StandardTokenizerFactory"/>
     991        <filter class="solr.LowerCaseFilterFactory"/>
     992        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt" />
     993        <!-- for a less aggressive approach (only inflectional suffixes), set stemDerivational to false -->
     994        <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>
     995      </analyzer>
     996    </fieldType>
     997   
     998    <!-- Italian -->
     999    <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
     1000      <analyzer>
     1001        <tokenizer class="solr.StandardTokenizerFactory"/>
     1002        <!-- removes l', etc -->
     1003        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
     1004        <filter class="solr.LowerCaseFilterFactory"/>
     1005        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" />
     1006        <filter class="solr.ItalianLightStemFilterFactory"/>
     1007        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Italian"/> -->
     1008      </analyzer>
     1009    </fieldType>
     1010   
     1011    <!-- Japanese using morphological analysis (see text_cjk for a configuration using bigramming)
     1012
     1013         NOTE: If you want to optimize search for precision, use default operator AND in your query
     1014         parser config with <solrQueryParser defaultOperator="AND"/> further down in this file.  Use
     1015         OR if you would like to optimize for recall (default).
     1016    -->
     1017    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
     1018      <analyzer>
     1019      <!-- Kuromoji Japanese morphological analyzer/tokenizer (JapaneseTokenizer)
     1020
     1021           Kuromoji has a search mode (default) that does segmentation useful for search.  A heuristic
     1022           is used to segment compounds into its parts and the compound itself is kept as synonym.
     1023
     1024           Valid values for attribute mode are:
     1025              normal: regular segmentation
     1026              search: segmentation useful for search with synonyms compounds (default)
     1027            extended: same as search mode, but unigrams unknown words (experimental)
     1028
     1029           For some applications it might be good to use search mode for indexing and normal mode for
     1030           queries to reduce recall and prevent parts of compounds from being matched and highlighted.
     1031           Use <analyzer type="index"> and <analyzer type="query"> for this and mode normal in query.
     1032
     1033           Kuromoji also has a convenient user dictionary feature that allows overriding the statistical
     1034           model with your own entries for segmentation, part-of-speech tags and readings without a need
     1035           to specify weights.  Notice that user dictionaries have not been subject to extensive testing.
     1036
     1037           User dictionary attributes are:
     1038                     userDictionary: user dictionary filename
     1039             userDictionaryEncoding: user dictionary encoding (default is UTF-8)
     1040
     1041           See lang/userdict_ja.txt for a sample user dictionary file.
     1042
     1043           Punctuation characters are discarded by default.  Use discardPunctuation="false" to keep them.
     1044
     1045           See http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese language support.
     1046        -->
     1047        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
     1048        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
     1049        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
     1050        <filter class="solr.JapaneseBaseFormFilterFactory"/>
     1051        <!-- Removes tokens with certain part-of-speech tags -->
     1052        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
     1053        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
     1054        <filter class="solr.CJKWidthFilterFactory"/>
     1055        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
     1056        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
     1057        <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
     1058        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
     1059        <!-- Lower-cases romaji characters -->
     1060        <filter class="solr.LowerCaseFilterFactory"/>
     1061      </analyzer>
     1062    </fieldType>
     1063   
     1064    <!-- Latvian -->
     1065    <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">
     1066      <analyzer>
     1067        <tokenizer class="solr.StandardTokenizerFactory"/>
     1068        <filter class="solr.LowerCaseFilterFactory"/>
     1069        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_lv.txt" />
     1070        <filter class="solr.LatvianStemFilterFactory"/>
     1071      </analyzer>
     1072    </fieldType>
     1073   
     1074    <!-- Dutch -->
     1075    <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
     1076      <analyzer>
     1077        <tokenizer class="solr.StandardTokenizerFactory"/>
     1078        <filter class="solr.LowerCaseFilterFactory"/>
     1079        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" />
     1080        <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
     1081        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
     1082      </analyzer>
     1083    </fieldType>
     1084   
     1085    <!-- Norwegian -->
     1086    <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
     1087      <analyzer>
     1088        <tokenizer class="solr.StandardTokenizerFactory"/>
     1089        <filter class="solr.LowerCaseFilterFactory"/>
     1090        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_no.txt" format="snowball" />
     1091        <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
     1092        <!-- less aggressive: <filter class="solr.NorwegianLightStemFilterFactory" variant="nb"/> -->
     1093        <!-- singular/plural: <filter class="solr.NorwegianMinimalStemFilterFactory" variant="nb"/> -->
     1094        <!-- The "light" and "minimal" stemmers support variants: nb=Bokmål, nn=Nynorsk, no=Both -->
     1095      </analyzer>
     1096    </fieldType>
     1097   
     1098    <!-- Portuguese -->
     1099    <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
     1100      <analyzer>
     1101        <tokenizer class="solr.StandardTokenizerFactory"/>
     1102        <filter class="solr.LowerCaseFilterFactory"/>
     1103        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
     1104        <filter class="solr.PortugueseLightStemFilterFactory"/>
     1105        <!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->
     1106        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->
     1107        <!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->
     1108      </analyzer>
     1109    </fieldType>
     1110   
     1111    <!-- Romanian -->
     1112    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
     1113      <analyzer>
     1114        <tokenizer class="solr.StandardTokenizerFactory"/>
     1115        <filter class="solr.LowerCaseFilterFactory"/>
     1116        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" />
     1117        <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
     1118      </analyzer>
     1119    </fieldType>
     1120   
     1121    <!-- Russian -->
     1122    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
     1123      <analyzer>
     1124        <tokenizer class="solr.StandardTokenizerFactory"/>
     1125        <filter class="solr.LowerCaseFilterFactory"/>
     1126        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
     1127        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
     1128        <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
     1129      </analyzer>
     1130    </fieldType>
     1131    <!-- Russian with morphology-->
     1132    <fieldType name="text_ru_morph" class="solr.TextField" positionIncrementGap="100">
     1133          <analyzer>
     1134          <tokenizer class="solr.StandardTokenizerFactory"/>
     1135          <filter class="solr.LowerCaseFilterFactory"/>
     1136          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
     1137          <filter class="org.apache.lucene.morphology.russian.RussianFilterFactory"/>
     1138          </analyzer>
     1139    </fieldType>
     1140 
     1141    <!-- Swedish -->
     1142    <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
     1143      <analyzer>
     1144        <tokenizer class="solr.StandardTokenizerFactory"/>
     1145        <filter class="solr.LowerCaseFilterFactory"/>
     1146        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_sv.txt" format="snowball" />
     1147        <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
     1148        <!-- less aggressive: <filter class="solr.SwedishLightStemFilterFactory"/> -->
     1149      </analyzer>
     1150    </fieldType>
     1151   
     1152    <!-- Thai -->
     1153    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
     1154      <analyzer>
     1155        <tokenizer class="solr.StandardTokenizerFactory"/>
     1156        <filter class="solr.LowerCaseFilterFactory"/>
     1157        <filter class="solr.ThaiWordFilterFactory"/>
     1158        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt" />
     1159      </analyzer>
     1160    </fieldType>
     1161   
     1162    <!-- Turkish -->
     1163    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
     1164      <analyzer>
     1165        <tokenizer class="solr.StandardTokenizerFactory"/>
     1166        <filter class="solr.TurkishLowerCaseFilterFactory"/>
     1167        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt" />
     1168        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
     1169      </analyzer>
     1170    </fieldType>
     1171
    4701172 </types>
    471 
    472 
    473  <fields>
    474    <!-- Valid attributes for fields:
    475      name: mandatory - the name for the field
    476      type: mandatory - the name of a previously defined type from the
    477        <types> section
    478      indexed: true if this field should be indexed (searchable or sortable)
    479      stored: true if this field should be retrievable
    480      multiValued: true if this field may contain multiple values per document
    481      omitNorms: (expert) set to true to omit the norms associated with
    482        this field (this disables length normalization and index-time
    483        boosting for the field, and saves some memory).  Only full-text
    484        fields or fields that need an index-time boost need norms.
    485      termVectors: [false] set to true to store the term vector for a
    486        given field.
    487        When using MoreLikeThis, fields used for similarity should be
    488        stored for best performance.
    489      termPositions: Store position information with the term vector. 
    490        This will increase storage costs.
    491      termOffsets: Store offset information with the term vector. This
    492        will increase storage costs.
    493      default: a value that should be used if no value is specified
    494        when adding a document.
    495    -->
    496 
    497    <field name="docOID" type="string" indexed="true" stored="true" required="true" />
    498 
    499     <field name="ZZ" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
    500     <field name="TX" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
    501     <field name="TI" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
    502     <field name="SU" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
    503     <field name="ORG" type="text_en_splitting" indexed="true" stored="false" multiValued="true" />
    504 
    505 <!--
    506    <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
    507    <field name="name" type="text_general" indexed="true" stored="true"/>
    508    <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
    509    <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
    510    <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
    511    <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
    512    <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
    513 
    514    <field name="weight" type="float" indexed="true" stored="true"/>
    515    <field name="price"  type="float" indexed="true" stored="true"/>
    516    <field name="popularity" type="int" indexed="true" stored="true" />
    517    <field name="inStock" type="boolean" indexed="true" stored="true" />
    518 -->
    519 
    520    <!--
    521    The following store examples are used to demonstrate the various ways one might _CHOOSE_ to
    522     implement spatial.  It is highly unlikely that you would ever have ALL of these fields defined.
     1173 
     1174  <!-- Similarity is the scoring routine for each document vs. a query.
     1175       A custom Similarity or SimilarityFactory may be specified here, but
     1176       the default is fine for most applications. 
     1177       For more info: http://wiki.apache.org/solr/SchemaXml#Similarity
    523 1178    -->
    524    <field name="store" type="location" indexed="true" stored="true"/>
    525 
    526    <!-- Common metadata fields, named specifically to match up with
    527      SolrCell metadata when parsing rich documents such as Word, PDF.
    528      Some fields are multiValued only because Tika currently may return
    529      multiple values for them.
    530    -->
    531 <!--
    532    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
    533    <field name="subject" type="text_general" indexed="true" stored="true"/>
    534    <field name="description" type="text_general" indexed="true" stored="true"/>
    535    <field name="comments" type="text_general" indexed="true" stored="true"/>
    536    <field name="author" type="text_general" indexed="true" stored="true"/>
    537    <field name="keywords" type="text_general" indexed="true" stored="true"/>
    538    <field name="category" type="text_general" indexed="true" stored="true"/>
    539    <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
    540    <field name="last_modified" type="date" indexed="true" stored="true"/>
    541    <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
    542 -->
    543 
    544 
    545    <!-- catchall field, containing all other searchable text fields (implemented
    546         via copyField further on in this schema  -->
    547    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    548 
    549    <!-- catchall text field that indexes tokens both normally and in reverse for efficient
    550         leading wildcard queries. -->
    551    <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>
    552 
    553    <!-- non-tokenized version of manufacturer to make it easier to sort or group
    554         results by manufacturer.  copied from "manu" via copyField -->
    555    <field name="manu_exact" type="string" indexed="true" stored="false"/>
    556 
    557    <field name="payloads" type="payloads" indexed="true" stored="true"/>
    558 
    559    <!-- Uncommenting the following will create a "timestamp" field using
    560         a default value of "NOW" to indicate when each document was indexed.
    561      -->
    562    <!--
    563    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
    564      -->
    565    
    566 
    567    <!-- Dynamic field definitions.  If a field name is not found, dynamicFields
    568         will be used if the name matches any of the patterns.
    569         RESTRICTION: the glob-like pattern in the name attribute must have
    570         a "*" only at the start or the end.
    571         EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
    572         Longer patterns will be matched first.  if equal size patterns
    573         both match, the first appearing in the schema will be used.  -->
    574    <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
    575    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
    576    <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
    577    <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
    578    <dynamicField name="*_txt" type="text_general"    indexed="true"  stored="true" multiValued="true"/>
    579    <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
    580    <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
    581    <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
    582 
    583    <!-- Type used to index the lat and lon components for the "location" FieldType -->
    584    <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>
    585 
    586    <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
    587    <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>
    588 
    589    <!-- some trie-coded dynamic fields for faster range queries -->
    590    <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
    591    <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
    592    <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
    593    <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
    594    <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>
    595 
    596    <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>
    597 
    598    <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
    599    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
    600 
    601    <dynamicField name="random_*" type="random" />
    602 <!-- dynamic field for sort/facet fields, which are strings by default. ie not tokenised. Can't be multivalued - ie can only have one value per document -->
    603     <dynamicField name="by*" type="string" indexed="true" stored="false" multiValued="false" />
    604    <!-- uncomment the following to ignore any fields that don't already match an existing
    605         field name or dynamic field, rather than reporting them as an error.
    606         alternately, change the type="ignored" to some other type e.g. "text" if you want
    607         unknown fields indexed and/or stored by default -->
    608    <!--dynamicField name="*" type="ignored" multiValued="true" /-->
    609    
    610  </fields>
    611 
    612  <!-- Field to use to determine and enforce document uniqueness.
    613       Unless this field is marked with required="false", it will be a required field
    614    -->
    615  <uniqueKey>docOID</uniqueKey>
    616 
    617  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    618  <defaultSearchField>text</defaultSearchField>
    619 
    620  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    621  <solrQueryParser defaultOperator="OR"/>
    622 
    623   <!-- copyField commands copy one field to another at the time a document
    624         is added to the index.  It's used either to index the same field differently,
    625         or to add multiple fields to the same field for easier/faster searching.  -->
    626 
    627 <!--
    628    <copyField source="cat" dest="text"/>
    629    <copyField source="name" dest="text"/>
    630    <copyField source="manu" dest="text"/>
    631    <copyField source="features" dest="text"/>
    632    <copyField source="includes" dest="text"/>
    633    <copyField source="manu" dest="manu_exact"/>
    634 -->
    635    
    636    <!-- Above, multiple source fields are copied to the [text] field.
    637       Another way to map multiple source fields to the same
    638       destination field is to use the dynamic field syntax.
    639       copyField also supports a maxChars to copy setting.  -->
    640        
    641    <!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->
    642 
    643    <!-- copy name to alphaNameSort, a field designed for sorting by name -->
    644    <!-- <copyField source="name" dest="alphaNameSort"/> -->
    645  
    646 
    647  <!-- Similarity is the scoring routine for each document vs. a query.
    648       A custom similarity may be specified here, but the default is fine
    649       for most applications.  -->
    650  <!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->
    651  <!-- ... OR ...
    652       Specify a SimilarityFactory class name implementation
    653       allowing parameters to be used.
    654  -->
    655  <!--
    656  <similarity class="com.example.solr.CustomSimilarityFactory">
    657    <str name="paramkey">param value</str>
    658  </similarity>
    659  -->
    660 
     1179  <!--
     1180     <similarity class="com.example.solr.CustomSimilarityFactory">
     1181       <str name="paramkey">param value</str>
     1182     </similarity>
     1183    -->
    661 1184
    662 1185</schema>