Changeset 31256 for other-projects

Timestamp:

2016-12-20T16:44:40+13:00

Author:

davidb

Message:

Check for the output directory earlier, before any large-scale processing starts, since saving the final output will fail if the directory already exists

File:

-              r31255
+              r31256
         JavaSparkContext jsc = new JavaSparkContext(conf);
+        String filename_root = _json_list_filename.replaceAll(".*/","").replaceAll("\\..*$","");
+        String output_directory = "whitelist-" + filename_root + "-out";
+        if (ClusterFileIO.exists(output_directory))
+        {
+            System.err.println("Error: " + output_directory + " already exists.  Spark unable to write output data");
+            jsc.close();
+            System.exit(1);
+        }
         int num_partitions = Integer.getInteger("wcsa-ef-ingest.num-partitions", DEFAULT_NUM_PARTITIONS);
         JavaRDD<String> json_list_data = jsc.textFile(_json_list_filename,num_partitions).cache();
 …
         count_sorted.setName("descending-word-frequency");
-        String filename_root = _json_list_filename.replaceAll(".*/","").replaceAll("\\..*$","");
-        String output_directory = "whitelist-" + filename_root + "-out";
         //sorted_swaped_back_pair.saveAsTextFile(output_directory);
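
The pattern in this changeset can be tried in isolation with the sketch below. It is not the project's code: the class name OutputDirCheck, its helper methods, and the sample input path are hypothetical, and java.nio.file.Files.exists on the local filesystem stands in for the project's ClusterFileIO.exists, whose behaviour is not shown here. Only the two replaceAll expressions and the "whitelist-...-out" naming are taken from the diff.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class OutputDirCheck {
    // Same two regexes as the diff: drop everything up to the last '/',
    // then drop everything from the first '.' to the end of the name.
    static String filenameRoot(String jsonListFilename) {
        return jsonListFilename.replaceAll(".*/", "").replaceAll("\\..*$", "");
    }

    // Output directory name as built in the diff.
    static String outputDirectoryFor(String jsonListFilename) {
        return "whitelist-" + filenameRoot(jsonListFilename) + "-out";
    }

    // Assumption: a local-filesystem stand-in for ClusterFileIO.exists.
    static boolean outputAlreadyExists(String outputDirectory) {
        return Files.exists(Paths.get(outputDirectory));
    }

    public static void main(String[] args) {
        // Hypothetical input path, used only when no argument is given.
        String input = args.length > 0 ? args[0] : "data/pd-file-listing.json.bz2";
        String outputDirectory = outputDirectoryFor(input);

        // Fail fast, before any expensive work has been started.
        if (outputAlreadyExists(outputDirectory)) {
            System.err.println("Error: " + outputDirectory + " already exists.");
            System.exit(1);
        }
        System.out.println("Would write results to " + outputDirectory);
    }
}
```

Because the second expression cuts at the first dot, a multi-part extension such as ".json.bz2" is removed in one step, so the hypothetical input above maps to "whitelist-pd-file-listing-out". Running the check before the job begins matters because Spark's saveAsTextFile refuses to write into an existing directory, so without it the failure would only surface after all the processing had already been done.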
