Timestamp:
2019-09-13T17:44:41+12:00
Author:
ak19
Message:

Improved the code to use a static block that loads the needed properties from config.properties and initialises some static final ints from them. The code now uses the logger for debugging, and new properties were added to config.properties. Returned the code to using a counter, recordCount, re-zeroed for each WETProcessor, since the count is used for unique filenames and the filename prefixes are unique for each warc.wet file. These prefixes, combined with the per-file record count, give each WET record written out to a file a unique filename, so a running total of WET records across all processed warc.wet files is no longer needed to ensure uniqueness. All appears to still work as in the previous commit in creating the discard and keep subfolders.
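The pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual committed code: the property name `WETprocessor.min-content-length`, its default value, and the filename scheme are all assumptions made for the example.

```java
import java.io.InputStream;
import java.util.Properties;
import java.util.logging.Logger;

public class WETProcessor {
    private static final Logger logger = Logger.getLogger(WETProcessor.class.getName());

    // Initialised once, from config.properties, in the static block below.
    public static final int MIN_CONTENT_LENGTH;

    static {
        int minLen = 100; // fallback default; the real default is an assumption
        try (InputStream in = WETProcessor.class.getClassLoader()
                .getResourceAsStream("config.properties")) {
            if (in != null) {
                Properties props = new Properties();
                props.load(in);
                minLen = Integer.parseInt(props.getProperty(
                        "WETprocessor.min-content-length", String.valueOf(minLen)));
            } else {
                logger.warning("config.properties not on classpath, using defaults");
            }
        } catch (Exception e) {
            logger.severe("Error reading config.properties: " + e.getMessage());
        }
        MIN_CONTENT_LENGTH = minLen;
    }

    // Per-instance counter, re-zeroed for each warc.wet file's processor.
    // Combined with the per-file prefix, it yields a unique filename per record.
    private int recordCount = 0;
    private final String filePrefix;

    public WETProcessor(String filePrefix) {
        this.filePrefix = filePrefix;
    }

    public String nextRecordFilename() {
        recordCount++;
        return filePrefix + "-" + recordCount + ".txt";
    }

    public static void main(String[] args) {
        WETProcessor p = new WETProcessor("WET-00000");
        System.out.println(p.nextRecordFilename()); // WET-00000-1.txt
        System.out.println(p.nextRecordFilename()); // WET-00000-2.txt
    }
}
```

Because each warc.wet file gets its own WETProcessor (and hence its own zeroed recordCount) and its own unique prefix, no cross-file running total is needed.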

File:
1 edited

  • gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt

    r33457 → r33467

    @@ -221,5 +221,4 @@
     
     
    -
     vagrant@node1:~$ locate guava.jar
     /usr/share/java/guava.jar
    @@ -243,5 +242,5 @@
     vagrant@node1:~/ia-hadoop-tools$ hdfs dfs -put /usr/share/java/guava.jar /usr/local/hadoop/share/hadoop/common/.
     put: `/usr/local/hadoop/share/hadoop/common/.': No such file or directory
    -# hadoop classpath locations are not hdfs filesystem
    +# hadoop classpath locations are not on the hdfs filesystem, but on the regular fs
     
     vagrant@node1:~/ia-hadoop-tools$ sudo cp /usr/share/java/guava.jar /usr/local/hadoop/share/hadoop/common/.