Timestamp: 2016-12-20T14:15:05+13:00
Author: davidb
Message:

Support for icu-tokenize property added, plus relevant refactoring.

Files: 1 edited

  • other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/ProcessForWhitelist.java

--- ProcessForWhitelist.java (r31251)
+++ ProcessForWhitelist.java (r31252)
@@ -72,8 +72,10 @@
 
         boolean strict_file_io = Boolean.getBoolean("wcsa-ef-ingest.strict-file-io");
-        
+        boolean icu_tokenize = Boolean.getBoolean("wcsa-ef-ingest.icu-tokenize");
+
         PerVolumeWordStreamFlatmap paged_solr_wordfreq_flatmap
             = new PerVolumeWordStreamFlatmap(_input_dir,_verbosity,
                                      per_vol_progress_accum,per_vol,
+                                     icu_tokenize,
                                      strict_file_io);
         JavaRDD<String> words = json_list_data.flatMap(paged_solr_wordfreq_flatmap);
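The diff reads the new flag with `Boolean.getBoolean`, which looks up a JVM system property of that name and returns true only if the property is set to "true" (case is ignored). Any other value, or no value at all, yields false, so the flag is switched on by launching the JVM with `-Dwcsa-ef-ingest.icu-tokenize=true`. This changeset does not show what `PerVolumeWordStreamFlatmap` does with `icu_tokenize`, so the tokenizer half of the sketch below is hypothetical. `IcuTokenizeFlagSketch` and its methods are illustrative names, and the word-boundary splitting uses the JDK's `java.text.BreakIterator` as a stand-in for the ICU4J library; it is not the project's actual code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IcuTokenizeFlagSketch {

    // Same lookup as the changeset: true only when the system property exists
    // and equals "true" (case-insensitive), e.g. -Dwcsa-ef-ingest.icu-tokenize=true
    static boolean icuTokenizeEnabled() {
        return Boolean.getBoolean("wcsa-ef-ingest.icu-tokenize");
    }

    // HYPOTHETICAL: plain whitespace splitting, standing in for whatever the
    // flatmap does when the flag is off (not taken from PerVolumeWordStreamFlatmap).
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // HYPOTHETICAL: word-boundary tokenization using the JDK BreakIterator,
    // used here in place of ICU4J's BreakIterator.
    static List<String> wordBoundaryTokens(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // BreakIterator also returns spaces and punctuation as segments;
            // keep only segments that contain a letter or digit.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // HYPOTHETICAL dispatch: a boolean passed in through a constructor, as
    // icu_tokenize is in the diff, selects which tokenizer runs.
    static List<String> tokenize(String text, boolean icuTokenize) {
        return icuTokenize ? wordBoundaryTokens(text, Locale.ROOT) : whitespaceTokens(text);
    }

    public static void main(String[] args) {
        boolean icu = icuTokenizeEnabled();
        System.out.println("icu_tokenize = " + icu);
        System.out.println(tokenize("Hello, world!", icu));
    }
}
```

Run without the property, the sketch falls back to whitespace splitting and keeps punctuation attached (`Hello,` and `world!`); with `-Dwcsa-ef-ingest.icu-tokenize=true` it emits `Hello` and `world`. Passing the value through the constructor, as the diff does, presumably means the property is evaluated once where this code runs and the resulting boolean then travels with the flatmap object to wherever Spark executes it, instead of each worker consulting its own system properties.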