Changeset 33445 for gs3-extensions

Timestamp:
2019-08-29T17:01:12+12:00
Author:
ak19
Message:

The first working Hadoop Spark script for processing Common Crawl data. It successfully retrieved all the Common Crawl WARC index records for the specified period whose content_languages field contained mri (as any of the document's 3 primary languages) and wrote them out to a CSV file.
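
For illustration only, the selection rule described in the message can be sketched in plain Python rather than Spark. This is not the committed script. It assumes that content_languages holds a comma-separated list of ISO 639-3 codes, and the helper names (has_language, filter_rows, to_csv) and the sample rows are hypothetical.

```python
import csv
import io


def has_language(content_languages, code="mri"):
    """Return True if `code` is one of the comma-separated language codes.

    Assumption: the field looks like "eng,mri" and may be missing (None).
    """
    if not content_languages:
        return False
    return code in (c.strip() for c in content_languages.split(","))


def filter_rows(rows, code="mri"):
    """Keep only the index rows whose content_languages includes `code`."""
    return [r for r in rows if has_language(r.get("content_languages"), code)]


def to_csv(rows, fieldnames):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical index rows, for demonstration only.
SAMPLE_ROWS = [
    {"url": "http://example.org/a", "content_languages": "mri"},
    {"url": "http://example.org/b", "content_languages": "eng,mri"},
    {"url": "http://example.org/c", "content_languages": "eng"},
    {"url": "http://example.org/d", "content_languages": None},
]

if __name__ == "__main__":
    matches = filter_rows(SAMPLE_ROWS)
    print(to_csv(matches, ["url", "content_languages"]), end="")
```

Splitting on commas and comparing whole codes, rather than doing a substring search, avoids false matches on any longer code that happens to contain "mri".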

Location:
gs3-extensions/maori-lang-detection/bin/hadoop-spark-scripts
Files:
2 added
