Timestamp:
2019-10-10T23:44:31+13:00
Author:
ak19
Message:
  1. Renamed the special string COPY to SUBDOMAIN-COPY after Dr Bainbridge explained why the new name more accurately describes the behaviour.
  2. Added comments explaining how sites-too-big-to-exhaustively-crawl.txt should be formatted, what values are expected and how they work.
  3. Added special blacklisting and whitelisting of URLs on yale.edu, coupled with corresponding special treatment in the topsites file.
File:
1 edited

  • gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt

    r33556 -> r33559

      # Without either ^ or $ symbol, urls containing the given url will get blacklisted

    +
    + # manually adjusting for irrelevant topsite hits
    + # Rapa-Nui is related to Easter Island
    + ^http://codex.cs.yale.edu/avi/silberschatz/gallery/trips-photos/South-America/Rapa-Nui/
    +
    + # We will blacklist this yale.edu domain except for the subportion that gets whitelisted
    + # then in the sites-too-big-to-exhaustively-crawl.txt, we have a mapping for an allowed url
    + # pattern in case elements on the page are stored elsewhere
    + ^http://korora.econ.yale.edu/

      # wikipedia pages in
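The comment carried over at the top of the diff defines how entries in url-blacklist-filter.txt match: a pattern starting with ^ is treated as a URL prefix, one ending with $ as a URL suffix, and a bare string blacklists any URL that merely contains it. The Java sketch below only illustrates those stated semantics; the class name UrlBlacklistFilter, its methods, and the exact-match handling of patterns carrying both anchors are assumptions made for illustration, not code from this repository.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical reader for url-blacklist-filter.txt; names are illustrative only.
    public class UrlBlacklistFilter {
        private final List<String> patterns = new ArrayList<>();

        public UrlBlacklistFilter(String filterFile) throws IOException {
            for (String line : Files.readAllLines(Paths.get(filterFile))) {
                String trimmed = line.trim();
                // Skip blank lines and comment lines such as those added in this changeset.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                patterns.add(trimmed);
            }
        }

        public boolean isBlacklisted(String url) {
            for (String p : patterns) {
                boolean anchoredStart = p.startsWith("^");
                boolean anchoredEnd = p.endsWith("$");
                String core = p;
                if (anchoredStart) {
                    core = core.substring(1);
                }
                if (anchoredEnd) {
                    core = core.substring(0, core.length() - 1);
                }
                if (anchoredStart && anchoredEnd) {
                    // Assumption: both anchors mean the whole URL must equal the pattern.
                    if (url.equals(core)) return true;
                } else if (anchoredStart) {
                    if (url.startsWith(core)) return true;   // prefix match
                } else if (anchoredEnd) {
                    if (url.endsWith(core)) return true;     // suffix match
                } else {
                    if (url.contains(core)) return true;     // substring match, per the file's comment
                }
            }
            return false;
        }
    }

Under this reading, the two entries added here act as prefix rules, so a URL such as http://korora.econ.yale.edu/some/page would be blacklisted unless a separate whitelist or the mapping in sites-too-big-to-exhaustively-crawl.txt re-admits part of that domain.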