Changeset 33445

Timestamp:
29.08.2019 17:01:12
Author:
ak19
Message:

The first working Hadoop Spark script for processing Common Crawl data. It successfully retrieved all the Common Crawl WARC index data for the specified period where content_languages contained mri (as any of the document's three primary languages) and wrote the results to a CSV file.
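
For illustration only, the filter described in the message can be sketched in plain Python. This is not the Spark script added in this changeset. It assumes content_languages is a comma-separated string of language codes (for example "eng,mri"), and the function names, the sample rows, and the "url" column are hypothetical.

```python
import csv
import io


def mentions_language(content_languages, code="mri"):
    """Return True if `code` is one of the comma-separated language codes."""
    if not content_languages:
        # Missing or empty field: no detected languages to match against.
        return False
    return code in (part.strip() for part in content_languages.split(","))


def filter_rows_to_csv(rows, out, code="mri"):
    """Write rows whose content_languages includes `code`; return the count kept."""
    # Assumed column names; any other keys in a row are ignored.
    fieldnames = ["url", "content_languages"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in rows:
        if mentions_language(row.get("content_languages"), code):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical index rows, not real Common Crawl data.
    sample = [
        {"url": "http://example.org/a", "content_languages": "eng,mri"},
        {"url": "http://example.org/b", "content_languages": "eng"},
    ]
    buffer = io.StringIO()
    count = filter_rows_to_csv(sample, buffer)
    print(count)
    print(buffer.getvalue())
```

Splitting on commas and comparing whole codes avoids the false matches a plain substring test could produce if another code happened to contain the same letters.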

Location:
gs3-extensions/maori-lang-detection/bin/hadoop-spark-scripts
Files:
2 added