Last change on this file since 33532 was 33532, checked in by ak19, 5 years ago
Found the other top 500 sites link again at last which Dr Bainbridge had discovered the other day. Still need to go through the links in there
File size: 1.4 KB
# URL 'greylist': save matching urls to one side, to eyeball later and decide if
# they should be included after all or whether it was okay to have skipped them
# FORMAT:
# precede URL with ^ to greylist urls that match the given prefix
# follow URL with $ to greylist urls that match the given suffix
# ^url$ will greylist urls that match the given url completely
# Without either ^ or $ symbol, urls containing the given url will get greylisted

# Product sites: unwanted auto-translation pages of online product stores
/product/
/products/
/product-page/
/product-category/

# Add Alexa top sites to greylist

youtube.com
tmall.com
baidu.com
qq.com
sohu.com
facebook.com
taobao.com
#login.tmall.com
wikipedia.org
yahoo.com
360.cn
jd.com
amazon.com
Sina.com.cn
weibo.com
#pages.tmall.com
live.com
vk.com
netflix.com
alipay.com
office.com
okezone.com
csdn.net
instagram.com
xinhuanet.com
babytree.com
twitter.com
ebay.com
stackoverflow.com
naver.com
aliexpress.com
twitch.tv
tribunnews.com
apple.com
soso.com
tianya.cn
microsoftonline.com
yandex.ru

# Remaining top sites from https://en.wikipedia.org/wiki/List_of_most_popular_websites

ok.ru
paypal.com
t.co
pinterest.com
sogou.com
espn.com
walmart.com
bitly.com
ampproject.org
sm.cn

# UNSURE - what if these contain translated pages?
google.com
bing.com
amazon.co
msn.com
microsoft.com
accuweather.com

#nasa.gov
# w3schools.com
# quora.com
#reddit.com
#blogspot.com
#yahoo.co.

## TODO: Get more from https://moz.com/top500
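
The FORMAT comment at the top of the file describes four matching modes (prefix, suffix, exact, contains). The sketch below is one possible reading of those rules in Python, not the code that actually consumes this file. The names `parse_greylist` and `is_greylisted` are hypothetical, and it assumes that lines starting with `#` are comments, that blank lines are ignored, and that matching is a plain case-sensitive string comparison rather than a regular expression.

```python
from typing import Iterable, List, Tuple

# A rule is (kind, text); kind is "exact", "prefix", "suffix" or "contains".
Rule = Tuple[str, str]


def parse_greylist(lines: Iterable[str]) -> List[Rule]:
    """Turn greylist lines into rules (hypothetical helper).

    Assumption: blank lines and lines starting with '#' are skipped.
    """
    rules: List[Rule] = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        starts = entry.startswith("^")  # leading ^ anchors at the start
        ends = entry.endswith("$")      # trailing $ anchors at the end
        if starts:
            entry = entry[1:]
        if ends:
            entry = entry[:-1]
        if not entry:
            continue  # a bare "^", "$" or "^$" carries no pattern
        if starts and ends:
            kind = "exact"
        elif starts:
            kind = "prefix"
        elif ends:
            kind = "suffix"
        else:
            kind = "contains"
        rules.append((kind, entry))
    return rules


def is_greylisted(url: str, rules: List[Rule]) -> bool:
    """Return True if any rule matches the URL (assumed case-sensitive)."""
    for kind, text in rules:
        if kind == "exact" and url == text:
            return True
        if kind == "prefix" and url.startswith(text):
            return True
        if kind == "suffix" and url.endswith(text):
            return True
        if kind == "contains" and text in url:
            return True
    return False


if __name__ == "__main__":
    rules = parse_greylist(["# comment", "", "/product/", "t.co"])
    for url in ("http://shop.example.net/product/42", "http://reddit.com/"):
        print(url, is_greylisted(url, rules))
```

Under this substring reading, a short unanchored entry such as `t.co` would also match any URL that merely contains that text, for example `http://reddit.com/`, even though `#reddit.com` is commented out. If that is not intended, an anchored form may be worth considering for the shortest entries.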