Changeset 33428
- Timestamp:
- 2019-08-19T20:31:23+12:00 (5 years ago)
- File:
- 1 edited
gs3-extensions/maori-lang-detection/MoreReading/CommonCrawl.txt
r33425 r33428

To run firefox/anything graphical inside the VM run by vagrant, have to ssh -Y onto both analytics and then to the vagrant VM from analytics:
1. ssh analytics -Y
2. [anupama@analytics vagrant-hadoop-hive-spark]$ vagrant ssh -- -Y
or
vagrant ssh -- -Y node1
(the -- flag tells the vagrant command that the subsequent -Y flag should be passed to the ssh cmd that vagrant runs)

Only once ssh-ed with vagrant into the VM whose hostname is "node1", do you have access to node1's assigned IP: 10.211.55.101
- Connecting machines, like analytics, must access node1 or use port forwarding to view the VM's servers on localhost. For example, on analytics, can view Yarn pages at http://localhost:8088/
- If firefox is launched inside the VM (so inside node1), then can access pages off their respective ports at any of localhost|10.211.55.101|node1.


WET example from https://github.com/commoncrawl/cc-warc-examples

vagrant@node1:~/cc-warc-examples$ hdfs dfs -mkdir /user/vagrant/data
vagrant@node1:~/cc-warc-examples$ hdfs dfs -put data/CC-MAIN-20190715175205-20190715200159-00000.warc.wet.gz hdfs:///user/vagrant/data/.
vagrant@node1:~/cc-warc-examples$ hdfs dfs -ls data
Found 1 items
-rw-r--r-- 1 vagrant supergroup 154823265 2019-08-19 08:23 data/CC-MAIN-20190715175205-20190715200159-00000.warc.wet.gz
vagrant@node1:~/cc-warc-examples$ hadoop jar target/cc-warc-examples-0.3-SNAPSHOT-jar-with-dependencies.jar org.commoncrawl.examples.mapreduce.WETWordCount

<ONCE FINISHED:>

vagrant@node1:~/cc-warc-examples$ hdfs dfs -cat /tmp/cc/part*


INFO ON HADOOP/HDFS:
https://www.bluedata.com/blog/2016/08/hadoop-hdfs-upgrades-painful/

---------------
More examples to try:
https://github.com/commoncrawl/cc-warc-examples


A bit outdated?
https://www.journaldev.com/20342/apache-spark-example-word-count-program-java
https://www.journaldev.com/20261/apache-spark

--------

sudo apt-get install maven
(or sudo apt update