Last change on this file since 33561 was 33559, checked in by ak19, 5 years ago

1. Special string COPY changed to SUBDOMAIN-COPY after Dr Bainbridge explained why it was more accurate to the behaviour.
2. Comments to explain how the sites-too-big-to-exhaustively-crawl.txt should be formatted, what values are expected and how they work.
3. Special blacklisting and whitelisting of urls on yale.edu, coupled with special treatment in topsites file too.
File size: 565 bytes
# URL 'whitelist': urls of these forms go into the keep pile.
# whitelist overrides blacklist and greylist.
# FORMAT:
# precede URL by ^ to whitelist urls that match the given prefix
# follow URL by $ to whitelist urls that match the given suffix
# ^url$ will whitelist urls that match the given url completely
# Without either ^ or $ symbol, urls containing the given url will get whitelisted

# Special exception for this url on yale.edu, since we needed to blacklist
# some particular other urls on yale.edu
http://korora.econ.yale.edu/phillips/archive/hauraki.htm
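The matching rules described in the FORMAT comments can be sketched in a few lines of Python. This is only an illustration of the four rule forms (prefix, suffix, exact, contains), not the project's actual implementation: the names `parse_rules`, `matches_any` and `SAMPLE_RULES` are hypothetical, and the `example.org` entries are made up for the demonstration. Only the last rule comes from this file.

```python
from typing import List, Tuple

# (kind, text); kind is "exact", "prefix", "suffix" or "contains"
Rule = Tuple[str, str]

# Hypothetical list file. Only the last entry appears in the real file.
SAMPLE_RULES = """\
# comment lines and blank lines are ignored

^http://example.org/docs/
.pdf$
^http://example.org/index.html$
http://korora.econ.yale.edu/phillips/archive/hauraki.htm
"""


def parse_rules(text: str) -> List[Rule]:
    """Turn the lines of a list file into (kind, text) rules."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        anchored_start = line.startswith("^")
        anchored_end = line.endswith("$")
        body = line[1:] if anchored_start else line
        if anchored_end:
            body = body[:-1]
        if not body:
            continue  # a bare "^", "$" or "^$" carries no url text
        if anchored_start and anchored_end:
            kind = "exact"
        elif anchored_start:
            kind = "prefix"
        elif anchored_end:
            kind = "suffix"
        else:
            kind = "contains"
        rules.append((kind, body))
    return rules


def matches_any(url: str, rules: List[Rule]) -> bool:
    """Return True if the url satisfies at least one rule."""
    for kind, text in rules:
        if kind == "exact" and url == text:
            return True
        if kind == "prefix" and url.startswith(text):
            return True
        if kind == "suffix" and url.endswith(text):
            return True
        if kind == "contains" and text in url:
            return True
    return False


rules = parse_rules(SAMPLE_RULES)
for candidate in (
    "http://example.org/docs/a.html",
    "http://example.org/index.html?x=1",
    "http://korora.econ.yale.edu/phillips/archive/hauraki.htm",
):
    print(candidate, matches_any(candidate, rules))
```

Because the yale.edu entry carries neither `^` nor `$`, it is treated as a "contains" rule in this sketch, so any url that includes that string would land in the keep pile.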