Timestamp:
2019-10-03T22:38:00+13:00
Author:
ak19
Message:

Mainly changes to crawling-Nutch.txt, plus some minor changes to other txt files. crawling-Nutch.txt now documents my attempts to run nutch v2 successfully on the davidb homepage site, crawl it entirely, and dump the text output into the local or hadoop filesystem. I ran the crawl with 2 different numbers of nutch cycles (generate-fetch-parse-updatedb): 10 cycles and 15 cycles. The second time I paid attention to the output: it stopped after 6 cycles, saying there was nothing new to fetch. So nutch seems to have a built-in termination test, which allows site mirroring. Running readdb with the -stats flag confirmed that it downloaded 44 URLs both times.
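
The termination behaviour described above can be illustrated with a small sketch. This is not nutch code and makes no claim about how nutch implements its check. The function `crawl_until_done`, the `fetch_links` callback, and the toy link graph are all hypothetical names invented for illustration. The sketch only shows why asking for 10 or 15 cycles can give the same result once every reachable URL has been fetched.

```python
from typing import Callable, Iterable, Set, Tuple


def crawl_until_done(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_cycles: int,
) -> Tuple[Set[str], int]:
    """Run up to max_cycles crawl rounds, stopping early when nothing is left to fetch.

    fetch_links is a hypothetical callback returning the outlinks of a URL.
    Returns the set of fetched URLs and the number of rounds that did work.
    """
    known: Set[str] = set(seeds)   # every URL discovered so far
    fetched: Set[str] = set()      # URLs already downloaded
    cycles_run = 0
    for _ in range(max_cycles):
        batch = known - fetched    # "generate": known but not yet fetched
        if not batch:              # termination test: nothing new to fetch
            break
        for url in sorted(batch):  # "fetch" and "parse" each URL in the batch
            fetched.add(url)
            known.update(fetch_links(url))  # "updatedb": record new outlinks
        cycles_run += 1
    return fetched, cycles_run


# Toy site (hypothetical): a links to b and c, b links to d.
toy_links = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
result_10 = crawl_until_done(["a"], lambda u: toy_links.get(u, []), 10)
result_15 = crawl_until_done(["a"], lambda u: toy_links.get(u, []), 15)
```

In this toy setup both runs fetch the same set of URLs and stop well before the requested cycle count, which mirrors the observation that the 10-cycle and 15-cycle crawls both ended with 44 URLs.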

File:
1 edited

  • gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT

    r33543  r33545
    433     433     4. Since trying to go install the crawl url didn't work
    434     434     https://stackoverflow.com/questions/14416275/error-cant-load-package-package-my-prog-found-packages-my-prog-and-main
            435     [https://stackoverflow.com/questions/26694271/go-install-doesnt-create-any-bin-file]
    435     436
    436     437     vagrant@node2:~/go/src$