source: gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt@ 33550

Last change on this file since 33550 was 33550, checked in by ak19, 5 years ago

First stage of introducing sites-too-big-to-exhaustively-crawl.tx: split url-greylist-filter.txt into true greylisted sites (product sites so far) and the existing top sites urls that simply represent sites too big to crawl in entirety.

File size: 575 bytes
# URL 'greylist': set matching urls to one side, to eyeball later and decide
# whether they should be included after all or whether it was okay to have skipped them
# FORMAT:
# precede URL with ^ to greylist urls that start with the given prefix
# follow URL with $ to greylist urls that end with the given suffix
# ^url$ will greylist only urls that match the given url exactly
# Without either the ^ or $ symbol, urls containing the given url will get greylisted


# Product sites: unwanted auto-translation pages of online product stores
/product/
/products/
/product-page/
/product-category/
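
The FORMAT rules in the file header can be illustrated with a minimal Python sketch. This is not the project's own loader: the function names `parse_filter` and `is_greylisted`, the example hostnames, and the choice to compare raw strings with no URL normalisation are all assumptions made for illustration.

```python
def parse_filter(lines):
    """Turn filter-file lines into (mode, text) rules.

    Hypothetical helper: blank lines and '#' comments are skipped, and the
    ^ / $ markers select prefix, suffix, exact or substring matching.
    """
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        starts = line.startswith("^")
        ends = line.endswith("$")
        text = line[1:] if starts else line
        text = text[:-1] if ends else text
        if not text:
            continue  # a bare ^ or $ carries no pattern, so ignore it
        if starts and ends:
            mode = "exact"
        elif starts:
            mode = "prefix"
        elif ends:
            mode = "suffix"
        else:
            mode = "contains"
        rules.append((mode, text))
    return rules


def is_greylisted(url, rules):
    """Return True if the url matches any rule (plain string comparison)."""
    for mode, text in rules:
        if mode == "exact" and url == text:
            return True
        if mode == "prefix" and url.startswith(text):
            return True
        if mode == "suffix" and url.endswith(text):
            return True
        if mode == "contains" and text in url:
            return True
    return False


if __name__ == "__main__":
    # The four product-site entries from the file above.
    product_rules = parse_filter([
        "# Product sites",
        "/product/",
        "/products/",
        "/product-page/",
        "/product-category/",
    ])
    # shop.example is a made-up host used only for this demonstration.
    print(is_greylisted("https://shop.example/products/hat", product_rules))
```

Because none of the current entries uses ^ or $, each one acts as a substring match, so any url with one of these path segments anywhere in it would be set aside under this reading of the format.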