Changeset 33496


Timestamp:
2019-09-22T19:23:28+12:00 (5 years ago)
Author:
ak19
Message:

Minor changes to reading list file

File:
1 edited

  • gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt

    r33467  r33496
    291     291
    292     292
    293
            293     ANOTHER WAY (DR BAINBRIDGE'S WAY) TO CREATE SINGLE .CSV FILE FROM /part* FILES AND VIEW ITS CONTENTS:
    294     294     vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* > file.csv.gz
    295     295     vagrant@node1:~/cc-index-table$ less file.csv.gz

    302     302     vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-unzipped-csv/cc-mri.csv | wc -l
    303     303     5767
            304
            305
            306     For a month later, the August 2019 crawl:
            307     vagrant@node1:~$ hdfs dfs -cat hdfs:///user/vagrant/CC-MAIN-2019-35/cc-mri-unzipped-csv/cc-mri.csv | wc -l
            308     9318
    304     309
    305     310     -----------------------------------------
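
The concatenate-then-view approach in the added lines can be tried locally without a cluster. The sketch below is only an illustration, assuming the /part* files are gzip-compressed (as the file.csv.gz name suggests). The part-*.csv.gz file names and their contents are hypothetical stand-ins for the HDFS output, and plain `cat` stands in for `hdfs dfs -cat`.

```shell
# Local stand-ins for the HDFS part files (hypothetical names and contents).
workdir=$(mktemp -d)
printf 'a,1\nb,2\n' | gzip > "$workdir/part-00000.csv.gz"
printf 'c,3\nd,4\ne,5\n' | gzip > "$workdir/part-00001.csv.gz"

# Concatenated gzip members form a valid multi-member gzip stream,
# so the combined file can be read without unzipping each part first.
cat "$workdir"/part-*.csv.gz > "$workdir/file.csv.gz"

# Decompress and count rows, analogous to the `wc -l` checks in the notes.
total=$(gzip -dc "$workdir/file.csv.gz" | wc -l | tr -d ' ')
echo "$total"
```

If the part files were uncompressed, the same `cat` would still produce a single CSV, but the result should then not carry a .gz extension.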