Last change on this file since 33550 was 33550, checked in by ak19, 5 years ago:
First stage of introducing sites-too-big-to-exhaustively-crawl.tx: split url-greylist-filter.txt into true greylisted sites (product sites so far) and the existing top sites urls that simply represent sites too big to crawl in entirety.
File size: 575 bytes
# URL 'greylist': set matching URLs aside to review later and decide whether
# they should be included after all or whether it was fine to have skipped them
# FORMAT:
# precede a URL with ^ to greylist URLs that start with the given prefix
# follow a URL with $ to greylist URLs that end with the given suffix
# ^url$ will greylist URLs that match the given URL exactly
# Without ^ or $, URLs that contain the given string anywhere will get greylisted
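The FORMAT rules above amount to three kinds of string match plus a "contains" fallback. The crawler code that reads this file is not shown here, so the following is only a minimal Python sketch under stated assumptions: blank lines and lines starting with # are ignored, ^ and $ act as plain anchors (not full regular expressions), and matching is case-sensitive. The names `matches`, `load_rules`, `is_greylisted` and `SAMPLE` are hypothetical.

```python
# Hypothetical sketch of the greylist FORMAT rules; assumes plain string
# anchoring with ^ and $ rather than real regular-expression matching.

def matches(rule: str, url: str) -> bool:
    """Return True if `url` is greylisted by a single rule line."""
    anchored_start = rule.startswith("^")
    anchored_end = rule.endswith("$")
    core = rule[1:] if anchored_start else rule
    core = core[:-1] if anchored_end else core
    if anchored_start and anchored_end:
        return url == core            # ^url$ : exact match
    if anchored_start:
        return url.startswith(core)   # ^url  : prefix match
    if anchored_end:
        return url.endswith(core)     # url$  : suffix match
    return core in url                # url   : substring match


def load_rules(text: str) -> list[str]:
    """Collect rule lines, skipping blank lines and # comments (assumed)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append(line)
    return rules


def is_greylisted(url: str, rules: list[str]) -> bool:
    """A URL is greylisted if any single rule matches it."""
    return any(matches(rule, url) for rule in rules)


# The product-site patterns from this file, used as sample input.
SAMPLE = """# Product sites
/product/
/products/
/product-page/
/product-category/
"""

if __name__ == "__main__":
    rules = load_rules(SAMPLE)
    print(is_greylisted("https://shop.example.com/fr/product/123", rules))  # True
    print(is_greylisted("https://example.org/about/", rules))  # False
```

Because none of the product patterns carry ^ or $, they all fall through to the substring case, so any URL whose path contains, say, `/product/` is set aside for review.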


# Product sites: unwanted auto-translation pages of online product stores
/product/
/products/
/product-page/
/product-category/