Changeset 30984 for other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java
- Timestamp: 2016-10-29T15:45:38+13:00
- File: 1 edited
Index: other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java
--- other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (r30979)
+++ other-projects/hathitrust/solr-extracted-features/trunk/src/main/java/org/hathitrust/PrepareForIngest.java (r30984)
@@ -5,4 +5,5 @@
 
 import org.apache.spark.api.java.*;
+import org.apache.spark.util.DoubleAccumulator;
 import org.apache.spark.SparkConf;
 
@@ -46,5 +47,16 @@
         JavaRDD<String> json_list_data = jsc.textFile(_json_list_filename,NUM_PARTITIONS).cache();
 
-        PagedJSON paged_json = new PagedJSON(_input_dir, _solr_url,_output_dir,_verbosity);
+        long num_volumes = json_list_data.count();
+        double per_vol = 100.0/(double)num_volumes;
+
+        DoubleAccumulator progress_accum = jsc.sc().doubleAccumulator("ProgressPercent");
+
+        //sc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));
+        // ...
+        // 10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
+
+        //accum.value();
+
+        PagedJSON paged_json = new PagedJSON(_input_dir,_solr_url,_output_dir,_verbosity, progress_accum,per_vol);
         JavaRDD<String> json_ids = json_list_data.flatMap(paged_json).cache();
 
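
The change passes a per-volume percentage (100 divided by the number of volumes) and a shared accumulator into PagedJSON. This changeset does not show how PagedJSON uses them, so the following is only a sketch of the presumed arithmetic: each processed volume adds its share, and the total approaches 100 once every volume is done. It assumes one add per volume, and it uses java.util.concurrent.atomic.DoubleAdder as a stand-in for Spark's DoubleAccumulator so that it runs without a Spark cluster. The class name ProgressSketch and the volume ids are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.DoubleAdder;

public class ProgressSketch {

    // Adds 100/numVolumes to a shared counter once per volume and
    // returns the accumulated total after all volumes are processed.
    static double run(List<String> volumeIds) {
        // Stand-in for Spark's DoubleAccumulator (assumption: not the real API).
        DoubleAdder progress = new DoubleAdder();
        double perVol = 100.0 / (double) volumeIds.size();

        volumeIds.parallelStream().forEach(id -> {
            // Hypothetical per-volume work would happen here.
            progress.add(perVol);
        });

        return progress.sum();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("vol-a", "vol-b", "vol-c", "vol-d");
        System.out.println("progress percent: " + run(ids));
    }
}
```

With a real Spark accumulator, tasks on the executors can only add to the value; the total is read back on the driver.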