Changeset 33904
- Timestamp: 2020-02-05T18:48:33+13:00
- Author: ak19
- Message:
Shouldn't greylist anglican.org, as this prevented crawling of the justus.anglican.org seedURLs. However, there's no need to add an exception to sites-too-big-to-exhaustively-crawl.txt to control how much we crawl, since we only crawl to depth 10 anyway and the seedURLs already list the most promising pages (as well as 2 URLs on anglican.org that weren't promising). Added the to_crawl and finished crawl data for this site. Its siteID is 01463.
- Location: other-projects/maori-lang-detection
Files:
Changeset view not shown, since the total size (213.5 MB) exceeds the 9.5 MB limit.