Vagrant-Spark-Hadoop.txt

Timestamp:

2019-10-03T22:38:00+13:00

Author:

ak19

Message:

Mainly changes to crawling-Nutch.txt, plus some minor changes to other txt files. crawling-Nutch.txt now documents my attempts to run nutch v2 on the davidb homepage site, crawl it entirely, and dump the text output into the local or hadoop filesystem. I ran the download with 2 different numbers of nutch cycles (generate-fetch-parse-updatedb): 10 cycles and 15 cycles. The second time I paid attention to the output: it stopped after 6 cycles, saying there was nothing new to fetch. So it seems to have a built-in termination test, which allows site mirroring. Running readdb with the -stats flag let me check that it downloaded 44 URLs both times.
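
The termination behaviour described in the message can be illustrated with a toy model. This is only a sketch of the general idea, not Nutch's actual code: the function crawl, the in-memory site dictionary and its URLs are all hypothetical stand-ins, and no real pages are fetched. It shows why asking for 10 or 15 cycles can give the same result once the set of reachable URLs is exhausted.

```python
def crawl(seed, links, max_cycles):
    """Toy model of repeated generate-fetch-parse-updatedb cycles.

    `links` maps each URL to the URLs its page links to (a hypothetical,
    in-memory stand-in for a real site). Returns (fetched_urls, cycles_run).
    """
    known = {seed}      # every URL discovered so far (plays the role of the crawl db)
    fetched = set()     # URLs already downloaded
    cycles_run = 0
    for _ in range(max_cycles):
        batch = known - fetched              # "generate": URLs not yet fetched
        if not batch:                        # nothing new to fetch, so stop early
            break
        cycles_run += 1
        for url in batch:                    # "fetch" and "parse" each page
            fetched.add(url)
            known.update(links.get(url, []))  # "updatedb": record its outlinks
    return fetched, cycles_run


# Hypothetical four-page site used only for illustration.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": []}

for limit in (10, 15):
    pages, cycles = crawl("/", site, limit)
    print(limit, len(pages), cycles)
```

In this model both limits fetch the same set of pages, because the loop ends as soon as a cycle has no unfetched URLs left. A limit smaller than the site's link depth would instead cut the crawl short.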

File:

1 edited

gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt (modified) (1 diff)


gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt

-              r33499
+              r33545
 https://www.guru99.com/create-your-first-hadoop-program.html
+Some Hadoop commands
+* https://community.cloudera.com/t5/Support-Questions/Closed-How-to-store-output-of-shell-script-in-HDFS/td-p/229933
+* https://stackoverflow.com/questions/26513861/checking-if-directory-in-hdfs-already-exists-or-not
 --------------
 To run firefox/anything graphical inside the VM run by vagrant, have to ssh -Y onto both analytics and then to the vagrant VM from analytics:

Changeset 33545 for gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt
