Changeset 33666 for other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt
- Timestamp: 2019-11-13T23:08:37+13:00
- Files: 1 edited
other-projects/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt
--- sites-too-big-to-exhaustively-crawl.txt (r33604)
+++ sites-too-big-to-exhaustively-crawl.txt (r33666)
@@ -30,6 +30,7 @@
 # However, if the seedurl's domain is an exact match on topsite-base-url, the seedurl will go
 # into the file unprocessed-topsite-matches.txt and the site/page won't be crawled.
-# - FOLLOW-LINKS-WITHIN-TOPSITE: if pages linked from the seedURL page can be followed and
-# downloaded, as long as it's within the same subdomain matching the topsite-base-url.
+# - FOLLOW-LINKS-WITHIN-TOPSITE: download seedURL pages and pages linked from each seedURL
+# page should be followed and downloaded too, as long as they're within the same subdomain
+# matching the topsite-base-url.
 # This is different from SUBDOMAIN-COPY, as that can download all of a specific subdomain but
 # restricts against downloading the entire domain (e.g. all pinky.blogspot.com and not anything
@@ -61,4 +62,8 @@
 # special case
 mi.centr-zashity.ru,SINGLEPAGE
+
+# we want the http://loquevendra318.com/fox/maori.html seed URL but also
+# pages within the following subsection
+loquevendra318.com,loquevendra318.com/fox/maori/
 
 martinvrijland.nl,martinvrijland.nl/mi/
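The changeset documents what each value in the config means but not the code that applies it. The Python below is a minimal sketch of how the entries could be read and checked, not the project's own implementation. It assumes each non-comment line has the form topsite-base-url,value split on the first comma; that any value other than SINGLEPAGE, SUBDOMAIN-COPY or FOLLOW-LINKS-WITHIN-TOPSITE is a URL prefix (as in loquevendra318.com/fox/maori/); and that FOLLOW-LINKS-WITHIN-TOPSITE means a link must stay in the seed URL's own subdomain. The helper names parse_topsites, matches_topsite and may_follow are hypothetical.

```python
from urllib.parse import urlparse

# Keyword values named in the config comments. Treating every other
# non-empty value as a URL prefix is an assumption of this sketch.
KEYWORDS = {"SINGLEPAGE", "SUBDOMAIN-COPY", "FOLLOW-LINKS-WITHIN-TOPSITE"}


def parse_topsites(text):
    """Map each topsite-base-url to its (possibly empty) value."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        base, _, value = line.partition(",")  # split on the first comma only
        entries[base.strip()] = value.strip()
    return entries


def matches_topsite(host, base):
    """True if host is base itself or one of its subdomains."""
    return host == base or host.endswith("." + base)


def may_follow(seed_url, link_url, base, value):
    """Hypothetical check: may a page linked from seed_url be downloaded?"""
    seed_host = urlparse(seed_url).hostname or ""
    link = urlparse(link_url)
    link_host = link.hostname or ""
    if value == "FOLLOW-LINKS-WITHIN-TOPSITE":
        # Only follow links that stay in the seed's own subdomain,
        # and only if that subdomain matches topsite-base-url.
        return matches_topsite(seed_host, base) and link_host == seed_host
    if value and value not in KEYWORDS:
        # The value is a URL prefix such as "loquevendra318.com/fox/maori/":
        # compare it against the link with its scheme removed.
        return (link_host + link.path).startswith(value)
    # Other values (SINGLEPAGE, SUBDOMAIN-COPY, empty) are not modelled
    # here, so this sketch does not follow links for them.
    return False
```

With the new loquevendra318.com entry, a link such as http://loquevendra318.com/fox/maori/page2.html passes the prefix check, while the seed http://loquevendra318.com/fox/maori.html itself does not (maori.html is not under maori/). That matches why the added comment says the seed URL is wanted in addition to the subsection.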