Timestamp:
2019-10-14T21:04:58+13:00
Author:
ak19
Message:

1. CCWETProcessor: the domain URL now goes in as a seedURL after the individual seedURLs, after Dr Bainbridge explained why the original ordering didn't make sense.
2. conf: we inspected the first site to be crawled. It was a non-top site, but we still wanted to control its crawling in the same way we control topsites.
3. Documented use of the nutch command for testing which URLs pass and fail the existing regex-urlfilter checks.
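
For readers unfamiliar with regex-urlfilter checks, the following is a minimal sketch of the rule format as it is commonly described: each rule line starts with + (accept) or - (reject) followed by a regular expression, and the first rule whose pattern is found in the URL decides the outcome. This is not the nutch command itself and not code from this changeset; the class name UrlFilterSketch, the method passes, and the example.org rules are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of first-match-wins URL filtering in the style of
// regex-urlfilter rules. Not Nutch's implementation.
public class UrlFilterSketch {

    // One parsed rule: accept is true for a '+' line, false for a '-' line.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses rule lines, skipping blank lines and '#' comments.
    public UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    // Returns true if the first rule found in the URL is an accept rule.
    // A URL that matches no rule at all is treated as rejected.
    public boolean passes(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example rules; the real filter file will differ.
        UrlFilterSketch filter = new UrlFilterSketch(Arrays.asList(
            "# skip non-http schemes",
            "-^(file|ftp|mailto):",
            "+^https?://([a-z0-9-]+\\.)*example\\.org/",
            "-."
        ));
        for (String url : Arrays.asList(
                "https://www.example.org/page",
                "ftp://example.org/file",
                "https://other.example.com/")) {
            System.out.println((filter.passes(url) ? "pass " : "fail ") + url);
        }
    }
}
```

Because the first matching rule wins, the order of the rule lines matters: a broad reject rule placed above a site-specific accept rule would hide that site from the crawl.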

File:
1 edited
