# URL 'greylist': set matching URLs aside, to eyeball later and decide whether
# they should be included after all or whether it was okay to have skipped them
# FORMAT:
# Precede the URL with ^ to greylist URLs that match the given prefix
# Follow the URL with $ to greylist URLs that match the given suffix
# ^url$ will greylist URLs that match the given URL completely
# Without either the ^ or the $ symbol, URLs containing the given URL will get greylisted
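#
# For illustration only, a sketch of the four forms using hypothetical entries on
# the reserved example.com domain. Whether the scheme (https://) takes part in
# matching is an assumption here, not something this file states:
#   ^https://shop.example.com      would greylist URLs starting with https://shop.example.com
#   /catalogue.html$               would greylist URLs ending in /catalogue.html
#   ^https://example.com/a.html$   would greylist only that exact URL
#   example.com                    would greylist URLs containing example.com anywhere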


# Product sites: unwanted auto-translation pages of online product stores and other websites
/product/
/products/
/product-page/
/product-category/
ledlamp.china-led-lighting.com
ledpar64.china-led-lighting.com
ledwallwasher.china-led-lighting.com
abacre.com
cn-huafu.net
apteka.social


# not product stores but autotranslated?
192-168-1-1l.com
19216811login.club
1videosmusica.com
256file.com
# already covered by the greylisting of all .ru domains
#7773033.ru
#abali.ru
#allbeautyone.ru
aqualuz.org

# if page doesn't load and can't be tested
1videosmusica.com
www.kiterewa.pl



# URLS MANUALLY INSPECTED AND ADDED TO GREYLIST

# license plate site? - already covered by the greylisting of all .ru domains
#eba.com.ru

# As per archive.org, there's just a photo on the defunct page at this site,
# and the picture's label and filename are probably Japanese
agri.mine.utsunomiya-u.ac.jp

# seems to be an Indonesian or Malaysian Bible rather than one in Maori or any other Polynesian language
alkitab.life:2022

# appears defunct
alixira.com

# The single seedURL was not a page in Maori, but in global languages,
# and the rest of the domain appears to be in English.
#anglican.org
# But we want the seedURLs from justus.anglican.org,
# so grab anglican.org anyway (hence the entry above is commented out)


### TLDs that we greylist - any exceptions will be in the whitelist
# The .ru and .pl domains in our list were not relevant
.ru/
.pl/
.tk/