Timestamp:
2019-10-24T23:22:30+13:00
Author:
ak19
Message:
  1. Improved the output written to possible-product-sites.txt: each entry now includes the overseas country code prefix, to help decide whether the site is worth keeping.
  2. Updated the whitelist and top-sites filters to grab the /mi/ subsections of sites that don't appear to be auto-translated. This prepares for blocking out product sites from here on.
File:
1 edited

  • gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt

    --- r33569
    +++ r33604
    @@ -2,8 +2,8 @@
     # whitelist overrides blacklist and greylist.
     # FORMAT:
    -# precede URL by ^ to greylist urls that match the given prefix
    -# succeed URL by $ to greylist urls that match the given suffix
    -# ^url$ will greylist urls that match the given url completely
    -# Without either ^ or $ symbol, urls containing the given url will get greylisted
    +# precede URL by ^ to whitelist urls that match the given prefix
    +# succeed URL by $ to whitelist urls that match the given suffix
    +# ^url$ will whitelist urls that match the given url completely
    +# Without either ^ or $ symbol, urls containing the given url will get whitelisted
     
     # Special exception for this url on yale.edu, since we needed to blacklist
    @@ -15,2 +15,17 @@
     http://www.krassotkin.ru/sites/prayer.su/maori/
     https://mi.centr-zashity.ru/
    +
    +
    +
    +# WHITELIST WEBSITES THAT HAVE NON-AUTOMATED /mi/ SUBSECTIONS
    +# WE CONTROL WHAT PART OF THEM WILL BE DOWNLOADED (THE /mi SUBSECTION)
    +# IN sites-too-big-to-exhaustively-crawl.txt
    +#https://www.martinvrijland.nl/mi/te-mana-hinengaro/Ko-te-nuinga-ake-o-nga-tangata-kei-te-timata-ki-te-kite-kei-te-noho-tatou-i-roto-i-te-whakaata-ko-te-aha-tenei/
    +#https://www.csunplugged.org/mi/principles/
    +#http://www.gpedia.com/mi/gpedia/Reo_M%C4%81ori
    +
    +https://www.martinvrijland.nl
    +https://www.csunplugged.org
    +http://www.gpedia.com
    +
    +
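The header comments in url-whitelist-filter.txt describe four match modes: ^ for a prefix, $ for a suffix, ^url$ for an exact match, and a bare URL for a substring match. They also state that the whitelist overrides the blacklist and greylist. The Python sketch below shows one way to read those rules. It rests on several assumptions that do not come from the maori-lang-detection code:

- blank lines and lines starting with # are skipped;
- matching is plain string comparison, not regex;
- the blacklist is checked before the greylist.

The helper names load_rules, rule_matches and classify are hypothetical.

```python
from typing import List, Sequence


def load_rules(text: str) -> List[str]:
    """Return the active rules from a filter file's contents.

    Assumption (hypothetical helper): blank lines and lines starting with '#'
    (comments and commented-out URLs) are skipped; whitespace is stripped.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def rule_matches(rule: str, url: str) -> bool:
    """Apply one rule using the ^ / $ conventions from the file header.

    Assumption: matching is plain string comparison, not a regex.
    """
    starts = rule.startswith("^")
    ends = rule.endswith("$")
    start = 1 if starts else 0
    stop = len(rule) - 1 if ends else len(rule)
    core = rule[start:stop]          # the URL text without its ^ / $ markers
    if starts and ends:
        return url == core           # ^url$ : exact match
    if starts:
        return url.startswith(core)  # ^url  : prefix match
    if ends:
        return url.endswith(core)    # url$  : suffix match
    return core in url               # url   : substring match


def classify(url: str, whitelist: Sequence[str],
             blacklist: Sequence[str], greylist: Sequence[str]) -> str:
    """Whitelist wins over blacklist and greylist, as the file states.

    Assumption: the blacklist is consulted before the greylist.
    """
    for label, rules in (("whitelisted", whitelist),
                         ("blacklisted", blacklist),
                         ("greylisted", greylist)):
        if any(rule_matches(r, url) for r in rules):
            return label
    return "unlisted"
```

Under this reading, a plain entry such as https://www.csunplugged.org added in r33604 is a substring rule. It therefore whitelists every URL on that host, including the /mi/ pages. As the file's own comment says, restricting the crawl to the /mi subsection is left to sites-too-big-to-exhaustively-crawl.txt.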