Changeset 33496

Timestamp: 22.09.2019 19:23:28
Author: ak19
Message: Minor changes to reading list file
Files: 1 modified

  • gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt

    r33467  r33496
       291     291
       292     292
  -    293
  +            293  ANOTHER WAY (DR BAINBRIDGE'S WAY) TO CREATE SINGLE .CSV FILE FROM /part* FILES AND VIEW ITS CONTENTS:
       294     294  vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* > file.csv.gz
       295     295  vagrant@node1:~/cc-index-table$ less file.csv.gz
         …       …
       302     302  vagrant@node1:~/cc-index-table$ hdfs dfs -cat hdfs:///user/vagrant/cc-mri-unzipped-csv/cc-mri.csv | wc -l
       303     303  5767
  +            304
  +            305
  +            306  For a month later, the August 2019 crawl:
  +            307  vagrant@node1:~$ hdfs dfs -cat hdfs:///user/vagrant/CC-MAIN-2019-35/cc-mri-unzipped-csv/cc-mri.csv | wc -l
  +            308  9318
       304     309
       305     310  -----------------------------------------
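The "hdfs dfs -cat .../cc-mri-csv/part* > file.csv.gz" step produces a usable file only if each part file is itself a complete gzip stream. gzip treats concatenated members as one valid compressed file. The following is a minimal local sketch of that step, run without HDFS.

It assumes Spark wrote gzip-compressed CSV parts. The .csv.gz name and the separate cc-mri-unzipped-csv directory suggest this, but the notes do not state it. The part file names and rows are hypothetical stand-ins.

```shell
# Local sketch (no HDFS). ASSUMPTION: the Spark output parts are gzip-compressed CSV.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the hdfs:///user/vagrant/cc-mri-csv/part* files.
printf 'url1,mi\nurl2,mi\n' | gzip > part-00000.csv.gz
printf 'url3,mi\n' | gzip > part-00001.csv.gz

# Plain byte concatenation, which is what "hdfs dfs -cat part* > file.csv.gz" does.
# Each part is a complete gzip member, so the result is a valid multi-member gzip file.
cat part-* > file.csv.gz

# gzip -dc decompresses every member in turn, so rows from both parts are counted
# (prints 3 with GNU wc).
gzip -dc file.csv.gz | wc -l
```

To page through the decompressed rows, use "gzip -dc file.csv.gz | less" or "zless file.csv.gz". Plain "less file.csv.gz" shows the text only when a LESSOPEN input preprocessor (for example Debian/Ubuntu's lesspipe) is configured.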