source: gs3-extensions/maori-lang-detection/MoreReading/other.txt@ 33376

Last change on this file since 33376 was 33376, checked in by ak19, 5 years ago

Links and extracts I've read so far on the Web Curator Tool (WCT), Heritrix, CommonCrawl and the related WebDataCommons.

https://codereview.stackexchange.com/questions/198343/crawl-and-gather-all-the-urls-recursively-in-a-domain
http://lucene.472066.n3.nabble.com/Using-nutch-just-for-the-crawler-fetcher-td611918.html

https://www.quora.com/What-are-some-Web-crawler-tips-to-avoid-crawler-traps