Timestamp:
2019-08-29T19:12:39+12:00
Author:
ak19
Message:
  1. Committing working version of export_maori_subset.sh, which takes the CSV file produced by running export_maori_index.csv.sh as input and gets the WARC files at the specified offsets.
  2. Notes on the changes necessary to the Java code (cc-index-table/src/main/java/org/commoncrawl/spark/examples/CCIndexWarcExport.java) to get export_maori_subset.sh to run without exceptions so far.
  3. The otherwise untested export_maori_subset_from_scratch.sh script, which would perform the SQL query and feed the results directly into getting the WARC records instead of producing an intermediate CSV file.
File:
1 added
