Changeset 33904

Timestamp:
05.02.2020 18:48:33
Author:
ak19
Message:

Shouldn't greylist anglican.org, as this prevented crawling of the justus.anglican.org seedURLs. There's no need, however, to add an exception to sites-too-big-to-exhaustively-crawl.txt to control how much we crawl, as we only crawl to depth 10 anyway and the seedURLs already list the most promising pages (as well as 2 URLs on anglican.org which weren't promising). Added the to_crawl data and the finished crawled data for this site. siteID is 01463.

Location:
other-projects/maori-lang-detection
Files:
4 modified