- Timestamp: 2019-09-30T22:51:36+13:00
- File: 1 edited
gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt
r33529 r33537
@@ -19,8 +19,18 @@
 fetch -all seems to be a nutch v2 thing?]
 
+Google (30 Sep): site mirroring with nutch
+https://grokbase.com/t/nutch/user/125sfbg0pt/using-nutch-for-web-site-mirroring
+https://lucene.472066.n3.nabble.com/Using-nutch-just-for-the-crawler-fetcher-td611918.html
+http://www.cs.ucy.ac.cy/courses/EPL660/lectures/lab6.pdf
+slide p.5 onwards
+
+crawler softw options: https://repositorio.iscte-iul.pt/bitstream/10071/2871/1/Building%20a%20Scalable%20Index%20and%20Web%20Search%20Engine%20for%20Music%20on.pdf
+See also p.20. HTTrack
+
+
 Google: nutch performance tuning
 * https://stackoverflow.com/questions/24383212/apache-nutch-performance-tuning-for-whole-web-crawling
 * https://stackoverflow.com/questions/4871972/how-to-speed-up-crawling-in-nutch
 * https://cwiki.apache.org/confluence/display/nutch/OptimizingCrawls
 
 NUTCH INSTALLATION:
@@ -36,2 +46,38 @@
 * Deleting a core: https://factorpad.com/tech/solr/reference/solr-delete.html
 
+----------------------------------
+Apache Nutch 2 with newer HBase
+
+hbase-common-1.4.8.jar
+
+1. hbase jar files need to go into runtime/local/lib
+
+But not slf4j-log4j12-1.7.10.jar (there's already a slf4j-log4j12-1.7.5.jar) - so remove that from runtime/local/lib after copying it over.
+
+2. https://stackoverflow.com/questions/46340416/how-to-compile-nutch-2-3-1-with-hbase-1-2-6
+https://stackoverflow.com/questions/39834423/apache-nutch-fetcherjob-throws-nosuchelementexception-deep-in-gora/39837926#39837926
+
+Unfortunately, the page https://paste.apache.org/jjqz referred to above that contains patches for using Gora 0.7 is no longer available.
+
+http://mail-archives.apache.org/mod_mbox/nutch-user/201602.mbox/%[email protected]%3E
+
+https://www.mail-archive.com/[email protected]/msg14245.html
+
+------------------------------------------------------------------------------
+Other way: Nutch on its own vagrant with specified hbase or nutch with mongodb
+------------------------------------------------------------------------------
+* https://lobster1234.github.io/2017/08/14/search-with-nutch-mongodb-solr/
+* https://waue0920.wordpress.com/2016/08/25/nutch-2-3-1-hbase-0-98-hadoop-2-5-solr-4-10-3/
+
+
+-----
+HBASE commands
+/usr/local/hbase/bin/hbase shell
+https://learnhbase.net/2013/03/02/hbase-shell-commands/
+
+
+list
+
+davidbHomePage_webpage is a table
+
+
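Step 1 of the "Apache Nutch 2 with newer HBase" notes in this changeset (copy the hbase jars into runtime/local/lib, then remove the newer slf4j binding) could be sketched roughly as follows. This is only an illustration: the function name copy_hbase_jars is hypothetical, and copying every *.jar from the HBase lib directory is an assumption, since the notes do not say exactly which jars were copied.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch or HBase.
# $1: HBase lib directory (assumed to hold hbase-common-1.4.8.jar etc.)
# $2: Nutch runtime/local/lib directory
copy_hbase_jars() {
    hbase_lib=$1
    nutch_lib=$2

    # Assumption: all jars in the HBase lib directory are copied across.
    cp "$hbase_lib"/*.jar "$nutch_lib"/ || return 1

    # Per the notes: runtime/local/lib already has slf4j-log4j12-1.7.5.jar,
    # so remove the 1.7.10 copy that came over with the HBase jars.
    rm -f "$nutch_lib/slf4j-log4j12-1.7.10.jar"
}
```

A call might look like `copy_hbase_jars /usr/local/hbase/lib runtime/local/lib`, where /usr/local/hbase/lib is assumed from the /usr/local/hbase/bin/hbase path in the notes and the second path is relative to the Nutch checkout.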