Changeset 33569 for gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt
Timestamp: 2019-10-16T20:00:09+13:00
File: 1 edited
gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt
--- r33568
+++ r33569
@@ -54,7 +54,15 @@
 # NOT TOP SITES, BUT SITES WE INSPECTED AND WANT TO CONTROL SIMILARLY TO TOP SITES
 00.gs,SINGLEPAGE
-
-# May be a large site
+# May be a large site with only seedURLs of real relevance
 topographic-map.com,SINGLEPAGE
+ami-media.net,SINGLEPAGE
+# 2 pages of declarations of human rights in Maori, rest in other languages
+anitra.net,SINGLEPAGE
+# special case
+mi.centr-zashity.ru,SINGLEPAGE
+
+# TOP SITE BUT NOT TOP 500
+www.tumblr.com,SINGLEPAGE
+
 
 # TOP SITES
@@ -74,4 +82,5 @@
 # The page's containing folder is whitelisted in case the photos are there.
 korora.econ.yale.edu,SINGLEPAGE
+
 
 000webhost.com
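
The entries touched by this changeset follow a simple line format: `#` comments, blank separators, and `domain,SINGLEPAGE` pairs, with at least one bare domain (`000webhost.com`) that carries no directive. As a rough illustration of how such a file could be read, here is a minimal Python sketch. The function name `parse_site_rules` is hypothetical, this is not the project's actual parser, and the changeset does not say what a bare domain means, so the sketch simply records it with no directive.

```python
from typing import Dict, Optional


def parse_site_rules(text: str) -> Dict[str, Optional[str]]:
    """Map each domain to its directive, or None when the line has no directive.

    Assumption: lines starting with '#' are comments and blank lines are
    separators, as they appear in the diff above.
    """
    rules: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank separators
        # Split on the first comma only; a bare domain has no separator.
        domain, sep, directive = line.partition(",")
        rules[domain.strip()] = directive.strip() if sep else None
    return rules


# A few lines copied from r33569, used here only as sample input.
sample = """\
# May be a large site with only seedURLs of real relevance
topographic-map.com,SINGLEPAGE
ami-media.net,SINGLEPAGE

# TOP SITE BUT NOT TOP 500
www.tumblr.com,SINGLEPAGE

000webhost.com
"""

rules = parse_site_rules(sample)
```

With this sample, `rules["www.tumblr.com"]` is `"SINGLEPAGE"` and `rules["000webhost.com"]` is `None`; how a crawler should treat either value is left to the real configuration loader.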