Timestamp: 2019-08-30T18:27:21+12:00
File: 1 edited
gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt
r33446 r33448
@@ -116,5 +116,7 @@
 Hints to solve it were at https://stackoverflow.com/questions/45972929/scala-dataframereader-keep-column-headers
 The actual solution is to edit the CCIndexWarcExport.java as follows:
-1. set option(header) to false since the csv file contains no header row, only data rows.
+1. set option(header) to false since the csv file contains no header row, only data rows. You can confirm the csv has no header row by doing
+hdfs dfs -cat hdfs:///user/vagrant/cc-mri-csv/part* | head -5
+
 2. The 4 column names are inferred as _c0 to _c3, not as url/warc_filename etc.
 
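The two points in this change can be illustrated with a small plain-Java sketch. It is not the code in CCIndexWarcExport.java and does not call Spark. It only mirrors, under stated assumptions, what the manual `head -5` check looks for and which positional names a headerless read produces. The class name, both method names, and the sample row are hypothetical. The sketch also assumes that `url` is the first column.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderlessCsvSketch {

    // Positional column names used when a CSV is read without a header row:
    // _c0, _c1, ... (for the 4-column file discussed here, _c0 to _c3).
    public static List<String> defaultColumnNames(int columnCount) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            names.add("_c" + i);
        }
        return names;
    }

    // Programmatic version of eyeballing the first line of the file:
    // returns true only if the first field is the literal column name
    // (assumed here to be "url"), which would indicate a header row.
    public static boolean looksLikeHeader(String firstLine, String firstColumnName) {
        String firstField = firstLine.split(",", -1)[0].trim();
        return firstField.equalsIgnoreCase(firstColumnName);
    }

    public static void main(String[] args) {
        // Hypothetical data row; the real rows come from the part* files on HDFS.
        String sampleRow = "http://example.org/,crawl-data/example.warc.gz,100,200";

        System.out.println("Header row present? " + looksLikeHeader(sampleRow, "url"));
        System.out.println("Inferred names: " + defaultColumnNames(4));
    }
}
```

If the first line is a data row, as in the sample, reading the file with the header option set to false is the right choice, and the columns then have to be addressed as _c0 to _c3 or renamed explicitly.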