Changeset 33496
Timestamp: 2019-09-22T19:23:28+12:00
Files: 1 edited
gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt
r33467  r33496
   291     291
   292     292
   293     293    ANOTHER WAY (DR BAINBRIDGE'S WAY) TO CREATE SINGLE .CSV FILE FROM /part* FILES AND VIEW ITS CONTENTS:
   294     294    vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* > file.csv.gz
   295     295    vagrant@node1:~/cc-index-table$ less file.csv.gz
     …       …
   302     302    vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-unzipped-csv/cc-mri.csv | wc -l
   303     303    5767
           304  +
           305  +
           306  + For a month later, the August 2019 crawl:
           307  + vagrant@node1:~$ hdfs dfs -cat hdfs:///user/vagrant/CC-MAIN-2019-35/cc-mri-unzipped-csv/cc-mri.csv | wc -l
           308  + 9318
   304     309
   305     310    -----------------------------------------
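Redirecting `hdfs dfs -cat .../part*` into a single `.gz` file works because several gzip streams written back to back form one valid multi-member gzip file. The sketch below demonstrates this locally. It assumes the Spark `part*` outputs are gzip-compressed CSV, which the `.gz` name and the separate `cc-mri-unzipped-csv` directory suggest but do not confirm. Plain `cat` and a temporary directory stand in for `hdfs dfs -cat` and HDFS. The helper name `concat_and_count` and the sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: local files stand in for the HDFS part* files, and the
# part files are ASSUMED to be gzip-compressed CSV.
set -euo pipefail

# Hypothetical helper: concatenate every part-* file in a directory into one
# .gz file, check that gzip accepts it, and print the decompressed line count.
concat_and_count() {
  local src_dir=$1 out=$2
  # Local stand-in for: hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* > file.csv.gz
  cat "$src_dir"/part-* > "$out"
  # gzip -t tests every member of a multi-member file and fails if any is corrupt.
  gzip -t "$out"
  # gzip -dc decompresses all members in order; tr strips the padding BSD/macOS wc adds.
  gzip -dc "$out" | wc -l | tr -d ' '
}

# Demo with two tiny, made-up part files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'url,lang\nhttp://a.example/,mri\n' | gzip > "$workdir/part-00000.csv.gz"
printf 'http://b.example/,mri\nhttp://c.example/,mri\n' | gzip > "$workdir/part-00001.csv.gz"

concat_and_count "$workdir" "$workdir/file.csv.gz"   # prints 4
```

Under the same assumption, the rows in a combined `file.csv.gz` could be counted with `gzip -dc file.csv.gz | wc -l` instead of first writing an unzipped copy back to HDFS.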