source: gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt@ 33529

Last change on this file since 33529 was 33529, checked in by ak19, 5 years ago

Forgot to add most basic nutch links

https://codereview.stackexchange.com/questions/198343/crawl-and-gather-all-the-urls-recursively-in-a-domain
http://lucene.472066.n3.nabble.com/Using-nutch-just-for-the-crawler-fetcher-td611918.html

https://www.quora.com/What-are-some-Web-crawler-tips-to-avoid-crawler-traps
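The first link is about collecting every URL in one domain, and the Quora thread is about avoiding crawler traps. Below is a minimal Python sketch of both ideas. It assumes that pages are fetched elsewhere. `crawl_domain` and `fetch_links` are hypothetical names, and the depth, page and URL-length caps are illustrative defaults. They are not values taken from Nutch or from the linked answers. The sketch walks the site breadth-first and stays on the starting host. It drops `#fragments` so the same page is not queued twice, and it stops at the caps. These caps are cheap defences against endless link chains.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_domain(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: returns raw hrefs on a page
    max_depth: int = 5,
    max_pages: int = 1000,
    max_url_length: int = 256,
) -> list:
    """Breadth-first collection of URLs on the start URL's host (sketch only)."""
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:  # page cap
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth cap: do not follow links any deeper
        for href in fetch_links(url):
            # Resolve relative links and drop the #fragment part.
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # stay on the starting host
            if len(absolute) > max_url_length:
                continue  # very long URLs are a common symptom of traps
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return order


# Usage with a fake in-memory site instead of real HTTP fetching:
site = {
    "http://example.org/": ["a.html", "b.html#top", "http://other.org/x"],
    "http://example.org/a.html": ["/", "c.html"],
    "http://example.org/c.html": ["a.html"],
}
print(crawl_domain("http://example.org/", lambda u: site.get(u, [])))
# ['http://example.org/', 'http://example.org/a.html',
#  'http://example.org/b.html', 'http://example.org/c.html']
```

The link extractor is passed in as a function so the sketch can run offline. A real crawler would plug an HTTP client and an HTML parser in at that point. It would also respect robots.txt and wait between requests to the same server. This sketch does neither.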

-----------
NUTCH
-----------
https://stackoverflow.com/questions/35449673/nutch-and-solr-indexing-blacklist-domain
  https://nutch.apache.org/apidocs/apidocs-1.6/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.html
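According to its API docs, Nutch's DomainBlacklistURLFilter removes URLs whose domain suffix, domain name or host is listed in a blacklist file. The Python sketch below only illustrates that idea and is not Nutch's code. `load_blacklist` and `is_blacklisted` are hypothetical helpers. The matching rule is an assumption made for this example: a host is rejected if the host itself, or any parent domain of it, is listed.

```python
from urllib.parse import urlparse


def load_blacklist(lines):
    """Parse one domain per line, skipping blank lines and '#' comments."""
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def is_blacklisted(url, blacklist):
    """True if the URL's host, or any parent domain of it, is in the blacklist."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # For www.spam.example, check www.spam.example, then spam.example, then example.
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


blacklist = load_blacklist(["# spam sites", "spam.example", "", "ads.example.net"])
print(is_blacklisted("http://www.spam.example/page", blacklist))  # True
print(is_blacklisted("http://notspam.example/", blacklist))       # False
```

The rule compares whole dot-separated labels, not plain string suffixes. That is why `notspam.example` is not caught by the `spam.example` entry.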

https://lucene.472066.n3.nabble.com/blacklist-for-crawling-td618343.html
https://lucene.472066.n3.nabble.com/Content-of-size-X-was-truncated-to-Y-td4003517.html


Google: nutch mirror web site
https://stackoverflow.com/questions/33354460/nutch-clone-website
[https://stackoverflow.com/questions/35714897/nutch-not-crawling-entire-website
fetch -all seems to be a Nutch v2 thing?]

Google: nutch performance tuning
* https://stackoverflow.com/questions/24383212/apache-nutch-performance-tuning-for-whole-web-crawling
* https://stackoverflow.com/questions/4871972/how-to-speed-up-crawling-in-nutch


NUTCH INSTALLATION:
* Nutch v1: https://cwiki.apache.org/confluence/display/nutch/NutchTutorial#NutchTutorial-SetupSolrforsearch

Nutch v2 installation and setup:
* https://cwiki.apache.org/confluence/display/NUTCH/Nutch2Tutorial
* https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783286850/1/ch01lvl1sec09/installing-and-configuring-apache-nutch


SOLR:
* Query syntax: http://www.solrtutorial.com/solr-query-syntax.html
* Deleting a core: https://factorpad.com/tech/solr/reference/solr-delete.html
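Both Solr tasks above come down to HTTP requests. The Python sketch below only builds the request URLs and does not send them. It assumes a local Solr on the default port 8983 and a hypothetical core named `nutch`. The field names `content` and `host` in the example query are also assumptions. `select_url` shows how a fielded phrase query and a filter query (`fq`) are URL-encoded for the `/select` handler. `unload_core_url` builds a CoreAdmin `UNLOAD` request with `deleteIndex=true`, which unregisters the core and deletes its index. Check these parameters against the Solr version in use. From the command line, `bin/solr delete -c <core>` does the deletion directly.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def select_url(core, query, filter_queries=(), rows=10):
    """Build a /select URL; q and fq values are URL-encoded by urlencode."""
    params = [("q", query), ("rows", rows)]
    params += [("fq", fq) for fq in filter_queries]
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def unload_core_url(core, delete_index=True):
    """Build a CoreAdmin UNLOAD URL, optionally also deleting the core's index."""
    params = {"action": "UNLOAD", "core": core}
    if delete_index:
        params["deleteIndex"] = "true"
    return f"{SOLR_BASE}/admin/cores?{urlencode(params)}"


print(select_url("nutch", 'content:"maori language"', ["host:example.org"]))
# http://localhost:8983/solr/nutch/select?q=content%3A%22maori+language%22&rows=10&fq=host%3Aexample.org
print(unload_core_url("nutch"))
# http://localhost:8983/solr/admin/cores?action=UNLOAD&core=nutch&deleteIndex=true
```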