root/other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/ef-solr.properties @ 31252

Revision 31252, 0.8 KB (checked in by davidb, 4 years ago)

Support for icu-tokenize property added, plus relevant refactoring.

#wcsa-ef-ingest.process-ef-json-mode = per-volume
wcsa-ef-ingest.process-ef-json-mode = per-page

#wcsa-ef-ingest.solr-cloud-nodes = 10.11.0.53:8983,10.11.0.54:8983,10.11.0.55:8983
wcsa-ef-ingest.solr-cloud-nodes = gc0:8983,gc1:8983,gc2:8983,gc3:8983,gc4:8983,gc5:8983,gc6:8983,gc7:8983,gc8:8983,gc9:8983
wcsa-ef-ingest.strict-file-io = false
wcsa-ef-ingest.icu-tokenize = false

# For guidance on the number of partitions to use, see the "Parallelized collections" section of:
#   https://spark.apache.org/docs/2.0.1/programming-guide.html
# which suggests 2-4 partitions per CPU core in the cluster
#
# For a more detailed discussion see:
#   http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

# wcsa-ef-ingest.num-partitions = 12
wcsa-ef-ingest.num-partitions = 120

spark.executor.cores=11
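
The settings above are plain `key = value` pairs, so they can be sanity-checked before a long ingest run. What follows is a minimal standalone sketch in Python, not the ingest tool's own loader. The helper names (`parse_properties`, `strict_bool`, `solr_nodes`, `suggested_partitions`) and the executor count of 10 are hypothetical, and the parser handles only the subset of the properties syntax used in this file (`#` comments and `=` separators). It shows three things: a strict boolean check that reports a misspelling such as `flase` instead of silently treating it as false, splitting the Solr node list into host/port pairs, and the 2-4 partitions-per-core arithmetic from the Spark guide.

```python
def parse_properties(text):
    """Parse simple `key = value` lines; skip blank lines and `#` comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def strict_bool(props, key):
    """Accept only 'true' or 'false' so that typos are reported, not ignored."""
    value = props[key].lower()
    if value not in ("true", "false"):
        raise ValueError(f"{key}: expected true or false, got {props[key]!r}")
    return value == "true"


def solr_nodes(props, key="wcsa-ef-ingest.solr-cloud-nodes"):
    """Split the comma-separated node list into (host, port) pairs."""
    nodes = []
    for entry in props[key].split(","):
        host, _, port = entry.strip().rpartition(":")
        nodes.append((host, int(port)))
    return nodes


def suggested_partitions(cores_per_executor, num_executors):
    """Return the (low, high) range given 2-4 partitions per CPU core."""
    total_cores = cores_per_executor * num_executors
    return 2 * total_cores, 4 * total_cores


# Shortened sample with a deliberate typo on the icu-tokenize line.
SAMPLE = """\
# sample input
wcsa-ef-ingest.solr-cloud-nodes = gc0:8983,gc1:8983
wcsa-ef-ingest.strict-file-io = false
wcsa-ef-ingest.icu-tokenize = flase
wcsa-ef-ingest.num-partitions = 120
spark.executor.cores=11
"""

props = parse_properties(SAMPLE)
print(solr_nodes(props))
print(strict_bool(props, "wcsa-ef-ingest.strict-file-io"))
try:
    strict_bool(props, "wcsa-ef-ingest.icu-tokenize")
except ValueError as err:
    print("rejected:", err)

# Assumption: 10 executors; the real executor count is not stated in this file.
cores = int(props["spark.executor.cores"])
print(suggested_partitions(cores, num_executors=10))
```

The strict check is a design choice: lenient boolean parsers commonly map any unrecognised string to false, which hides typos in flags like `icu-tokenize`.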