Changeset 33528

Timestamp:
26.09.2019 21:47:13 (3 weeks ago)
Author:
ak19
Message:

Adding in Nutch links

Files:
1 moved

  • gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt

    r33409 r33528  
    https://www.quora.com/What-are-some-Web-crawler-tips-to-avoid-crawler-traps

    -----------
    NUTCH
    -----------
    https://stackoverflow.com/questions/35449673/nutch-and-solr-indexing-blacklist-domain
        https://nutch.apache.org/apidocs/apidocs-1.6/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.html

    https://lucene.472066.n3.nabble.com/blacklist-for-crawling-td618343.html
    https://lucene.472066.n3.nabble.com/Content-of-size-X-was-truncated-to-Y-td4003517.html


    Google: nutch mirror web site
    https://stackoverflow.com/questions/33354460/nutch-clone-website
    [https://stackoverflow.com/questions/35714897/nutch-not-crawling-entire-website
    fetch -all seems to be a nutch v2 thing?]

    Google: nutch performance tuning
    * https://stackoverflow.com/questions/24383212/apache-nutch-performance-tuning-for-whole-web-crawling
    * https://stackoverflow.com/questions/4871972/how-to-speed-up-crawling-in-nutch

    Nutch v2 installation and set up:
    https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781783286850/1/ch01lvl1sec09/installing-and-configuring-apache-nutch


    SOLR:
    * Query syntax: http://www.solrtutorial.com/solr-query-syntax.html
    * Deleting a core: https://factorpad.com/tech/solr/reference/solr-delete.html