Timestamp:
2019-10-14T23:36:54+13:00
Author:
ak19
Message:
  1. Greylisted and blacklisted more sites, discovered while attempting to crawl them; I have since learnt to investigate sites first. Should all .ru and .pl domains be on the greylist?
  2. Adjusted the instruction comments in CCWETProcessor for compiling and running.
File:
1 edited

  • gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt

    --- r33554
    +++ r33568
    @@ -10,3 +10,3 @@
    -# Product sites: unwanted auto-translation pages of online product stores
    +# Product sites: unwanted auto-translation pages of online product stores and other websites
     /product/
     /products/
    @@ -16,2 +16,28 @@
     ledpar64.china-led-lighting.com
     ledwallwasher.china-led-lighting.com
    +abacre.com
    +cn-huafu.net
    +
    +# not product stores but autotranslated?
    +192-168-1-1l.com
    +19216811login.club
    +19216811login.club
    +1videosmusica.com
    +256file.com
    +7773033.ru
    +abali.ru
    +allbeautyone.ru
    +
    +# if page doesn't load and can't be tested
    +1videosmusica.com
    +www.kiterewa.pl
    +
    +# license plate site?
    +eba.com.ru
    +
    +# As per archive.org, there's just a photo on the defunct page at this site
    +# And the picture label and filename is probably Japanese
    +agri.mine.utsunomiya-u.ac.jp
    +
    +# seems to be Indonesian or Malaysian Bible rather than in Maori or any Polynesian language
    +alkitab.life:2022
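
The filter file mixes `#` comment lines, blank lines, and entries that are either path fragments (such as `/product/`) or host names (such as `abacre.com`). The changeset does not show how CCWETProcessor interprets these entries, so the following is only a minimal sketch of one plausible reading: it assumes each entry is matched as a plain substring of the URL. The function names `load_filter` and `is_greylisted` are hypothetical and are not taken from the project.

```python
def load_filter(lines):
    """Return the entries of a filter file, skipping blanks and # comments."""
    entries = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        entries.append(entry)
    return entries


def is_greylisted(url, entries):
    """Assumed rule: a URL is greylisted if any entry occurs inside it."""
    return any(entry in url for entry in entries)


# A few lines copied from the greylist shown in the diff.
sample = """\
# Product sites: unwanted auto-translation pages of online product stores and other websites
/product/
/products/

abacre.com
www.kiterewa.pl
alkitab.life:2022
""".splitlines()

entries = load_filter(sample)
print(is_greylisted("https://shop.example.com/product/123", entries))
print(is_greylisted("https://example.org/about", entries))
```

Under this substring assumption, a duplicated entry is harmless, and a blanket entry such as `.ru` would also match any URL that merely contains `.ru` in its path, which is one reason to be cautious about greylisting whole top-level domains this way.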