Timestamp: 2019-12-17T19:53:17+13:00
Files: 1 edited
Legend:
  (space) Unmodified
  +       Added
  -       Removed

other-projects/maori-lang-detection/hdfs-cc-work/GS_README.TXT (r33618 -> r33809)

@@ -17,4 +17,8 @@
 I. Setting up Nutch v2 on its own Vagrant VM machine
 J. Automated crawling with Nutch v2.3.1 and post-processing
+K. Sending the crawled data into mongodb with NutchTextDumpProcessor.java
+---
+
+APPENDIX: Reading data from hbase tables and backing up hbase
 
 ----------------------------------------
@@ -665,6 +669,31 @@
 
 
+------------------------------------------------------------------------
+K. Sending the crawled data into mongodb with NutchTextDumpProcessor.java
+------------------------------------------------------------------------
+1. The crawled folder should contain all the batch crawls done with nutch (section J above).
+
+2. Set up the mongodb connection properties in conf/config.properties
+   By default, the mongodb database name is configured to be ateacrawldata.
+
+3. Create a mongodb database with the configured name. Unless the default db name has been changed, this means creating a database named "ateacrawldata".
+
+4. Set up the environment and compile NutchTextDumpProcessor:
+	cd maori-lang-detection/apache-opennlp-1.9.1
+	export OPENNLP_HOME=`pwd`
+	cd ../src
+
+	javac -cp ".:../conf:../lib/*:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" org/greenstone/atea/NutchTextDumpToMongoDB.java
+
+5. Pass the crawled folder to NutchTextDumpProcessor:
+	java -cp ".:../conf:../lib/*:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" org.greenstone.atea.NutchTextDumpToMongoDB /PATH/TO/crawled
+
+6. Ingesting the data of the approximately 1450 crawled sites into mongodb may take about 1.5 hours.
+
+7. Launch the Robo 3T MongoDB client (version 1.3 is the one we tested) and use it to connect to MongoDB's "ateacrawldata" database.
+   Now you can run queries.
+
 --------------------------------------------------------
-K.Reading data from hbase tables and backing up hbase
+APPENDIX: Reading data from hbase tables and backing up hbase
 --------------------------------------------------------
 
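
Section K says that the mongodb connection properties live in conf/config.properties and that the database name defaults to ateacrawldata, but it does not list the property keys. The following is a minimal sketch of how such a lookup with a fallback could work, using only java.util.Properties. The class name MongoConfigSketch and the key "mongodb.dbname" are assumptions made for illustration; they are not taken from NutchTextDumpToMongoDB.java or from the real config file, so check the actual keys before relying on this.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class MongoConfigSketch {
    // ASSUMPTION: hypothetical key name; the real key in conf/config.properties may differ.
    static final String DB_NAME_KEY = "mongodb.dbname";
    // Default database name stated in section K of the README.
    static final String DEFAULT_DB_NAME = "ateacrawldata";

    /** Loads properties from any Reader (in practice, a FileReader on conf/config.properties). */
    static Properties load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    /** Returns the configured database name, or the default when the key is missing or blank. */
    static String dbName(Properties props) {
        String name = props.getProperty(DB_NAME_KEY, DEFAULT_DB_NAME).trim();
        return name.isEmpty() ? DEFAULT_DB_NAME : name;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stand-in for the config file, so the sketch runs without any files.
        Properties props = load(new StringReader("mongodb.dbname=mycrawl\n"));
        System.out.println(dbName(props));
    }
}
```

Whatever name this lookup yields is the database that step 3 expects to exist in mongodb, and the one to connect to from Robo 3T in the final step.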