
Vagrant-Spark-Hadoop.txt

Timestamp:

2019-08-30T18:27:21+12:00

Author:

ak19

Message:

Minor clarification and inclusion of helpful command

File:

-              r33446
+              r33448
 Hints to solve it were at https://stackoverflow.com/questions/45972929/scala-dataframereader-keep-column-headers
 The actual solution is to edit the CCIndexWarcExport.java as follows:
-. set option(header) to false since the csv file contains no header row, only data rows.
+. set option(header) to false since the csv file contains no header row, only data rows. You can confirm the csv has no header row by doing
+   hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* | head -5
 . The 4 column names are inferred as _c0 to _c3, not as url/warc_filename etc.
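For illustration, here is a minimal plain-Java sketch of what header=false means for these part files. It assumes simple comma-separated lines with no quoted fields. It does not use Spark. It only mimics Spark's naming of headerless CSV columns as _c0, _c1, ... and contrasts that with header=true, where the first data row would be consumed as column names and dropped from the data. The class and method names (HeaderlessCsvSketch, readHeaderless, headerNamesIfHeaderTrue) and the sample rows are hypothetical. In CCIndexWarcExport.java itself, the change amounts to setting option("header", ...) to false on the Spark DataFrameReader. The surrounding code there is not shown in this changeset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderlessCsvSketch {

    // Mimics a CSV read with header=false: every line is a data row and
    // columns are named positionally _c0, _c1, ... (as Spark does).
    // Hypothetical helper; naive split, so quoted commas are not handled.
    static List<Map<String, String>> readHeaderless(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < fields.length; i++) {
                row.put("_c" + i, fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    // Shows what header=true would do to the same file: the first data row
    // is taken as the column names (and is lost as data).
    static List<String> headerNamesIfHeaderTrue(List<String> lines) {
        return List.of(lines.get(0).split(",", -1));
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the 4-column cc-mri-csv part files.
        List<String> lines = List.of(
                "http://example.com/a,file-1.warc.gz,100,2000",
                "http://example.com/b,file-2.warc.gz,300,1500");

        List<Map<String, String>> rows = readHeaderless(lines);
        System.out.println(rows.size());            // both lines kept as data
        System.out.println(rows.get(0).keySet());   // positional column names
        System.out.println(headerNamesIfHeaderTrue(lines)); // a data row misused as a header
    }
}
```

Running main prints 2, then [_c0, _c1, _c2, _c3], then the first sample row's values as a list. With header=true, those values would become the column names and only one data row would remain.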
