Timestamp: 2016-10-25T10:02:58+13:00 (7 years ago)
Author: davidb
Message: Renaming to better represent what the cluster is designed for
Location: other-projects/hathitrust/vagrant-spark-hdfs-cluster
Files: 1 edited, 1 moved

  • other-projects/hathitrust/vagrant-spark-hdfs-cluster/trunk/README.txt (r30905 → r30913)
@@ -1,17 +1,41 @@
 
-Vagrant provisioning files to spin up a modest (4 node) Hadoop
-cluster for experiments processing HTRC Extracted Feature JSON files
-suitable for ingesting into Solr.
+Vagrant provisioning files to spin up a modest Spark cluster
+(master + 3 slaves + backup) for experiments processing HTRC
+Extracted Feature JSON files suitable for ingesting into Solr.
+
+To aid parallelism, the code is designed to read JSON files from HDFS,
+so provisioning the cluster includes Hadoop core in addition to Spark.
 
 
-Top-level code uses Apache Spark, processing HDFS-stored JSON files,
-hence the need for an underlying Hadoop cluster.
+Provisioning uses Puppet scripting, based on the following online
+resources, but updated to use newer versions of Ubuntu, Java,
+and Hadoop.  Spark is then added on top of that.
 
-Provisioning based on the following online resources, but updated to
-use newer versions of Ubuntu, Java, and Hadoop.
 
   http://cscarioni.blogspot.co.nz/2012/09/setting-up-hadoop-virtual-cluster-with.html
 
   https://github.com/calo81/vagrant-hadoop-cluster
+
+To get everything set up, type:
+
+  vagrant up
+
+Then log in to the master node, and switch to the 'ubuntu' user:
+
+  vagrant ssh master
+  sudo su - ubuntu
+
+If this is the first time, you need to format an HDFS area to use:
+  hdfs namenode -format
+
+Otherwise, start up the HDFS and Spark daemon processes:
+
+  start-dfs.sh
+  spark-start-all.sh
+
+You can visit the Spark cluster monitoring page at:
+
+  http://10.10.0.52:8080/
 
 
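The new README's core claim — Spark jobs reading HTRC Extracted Feature JSON out of HDFS — can be illustrated with a short Scala sketch. This is not code from the repository: the HDFS path is hypothetical, the master address reuses the 10.10.0.52 host named above with Spark's default standalone port 7077, and each EF file is assumed to hold one JSON object describing one volume.

  // Sketch only: count the EF JSON files visible to the cluster.
  // Assumptions (not from the changeset): Spark's Scala API, a
  // hypothetical HDFS path, and the default standalone master port.
  import org.apache.spark.{SparkConf, SparkContext}

  object EfJsonSmokeTest {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("HTRC EF smoke test")
        .setMaster("spark://10.10.0.52:7077") // assumed: master node, default port
      val sc = new SparkContext(conf)

      // Each EF file is one JSON object for one volume, so
      // wholeTextFiles yields (hdfsPath, jsonString) pairs.
      val efFiles = sc.wholeTextFiles("hdfs:///user/ubuntu/ef-json/")
      println(s"EF volumes found: ${efFiles.count()}")

      sc.stop()
    }
  }

Compiled into a jar and run with spark-submit, a job like this exercises both HDFS and the Spark daemons started above.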
     
@@ -60,2 +84,18 @@
 http://kvz.io/blog/2013/01/16/vagrant-tip-keep-virtualbox-guest-additions-in-sync/
 
+----
+Secondary NameNode
+----
+
+http://stackoverflow.com/questions/23581425/hadoop-how-to-start-secondary-namenode-on-other-node
+
+<property>
+  <name>dfs.namenode.secondary.http-address</name>
+  <value>ec2-54-187-222-213.us-west-2.compute.amazonaws.com:50090</value>
+</property>
+
+----
+Spark Cluster
+----
+
+http://spark.apache.org/docs/latest/spark-standalone.html
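The standalone-mode docs linked above cover attaching applications and spark-shell to the master. As a quick check that work really fans out across the three slaves, something like the following could be pasted into a spark-shell session on the master — a sketch, assuming the same master URL as earlier; the hostnames printed depend on how the Vagrant nodes are named.

  // Run inside: spark-shell --master spark://10.10.0.52:7077  (assumed URL)
  // Map each element to the hostname of the executor that processed it,
  // then tally: every slave should appear in the result.
  val hosts = sc.parallelize(1 to 1000, 12)
    .map(_ => java.net.InetAddress.getLocalHost.getHostName)
    .countByValue()
  hosts.foreach { case (host, n) => println(s"$host ran $n tasks") }

While it runs, the job also shows up on the cluster monitoring page at http://10.10.0.52:8080/.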