source: gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java
Rev     Age      Author  Change  Log Message
@33615  5 years  ak19    edit    1. Worked out how to configure log4j to log both to console and …
@33604  5 years  ak19    edit    1. Better output into possible-product-sites.txt including the …
@33603  5 years  ak19    edit    Incorporating Dr Nichols suggestion to help weed out product sites: if …
@33582  5 years  ak19    edit    NutchTextDumpProcessor prints each crawled site's stats: number of …
@33575  5 years  ak19    edit    Correcting usage string for CCWETProcessor before committing new java …
@33573  5 years  ak19    edit    Forgot to document that spaces were also allowed as separator in the …
@33569  5 years  ak19    edit    1. batchcrawl.sh now does what it should have from the start, which is …
@33568  5 years  ak19    edit    1. More sites greylisted and blacklisted, discovered as I attempted to …
@33565  5 years  ak19    edit    CCWETProcessor: domain url now goes in as a seedURL after the …
@33562  5 years  ak19    edit    1. The sites-too-big-to-exhaustively-crawl.txt is now a csv file of a …
@33561  5 years  ak19    edit    1. sites-too-big-to-exhaustively-crawl.txt is now a comma separated …
@33560  5 years  ak19    edit    1. Incorporated Dr Bainbridge's suggested improvements: only when …
@33557  5 years  ak19    edit    Implemented the topSitesMap of topsite domain to url pattern in the …
@33552  5 years  ak19    edit    1. Code now processes ccrawldata folder, containing each individual …
@33519  5 years  ak19    edit    Code still writes out the global seedURLs.txt and regex-urlfilter.txt …
@33518  5 years  ak19    edit    Intermediate commit: got the seed urls file temporarily written out as …
@33517  5 years  ak19    edit    1. Blacklists were introduced so that too many instances of camelcased …
@33515  5 years  ak19    edit    Removed an unused function
@33503  5 years  ak19    edit    More efficient blacklisting/greylisting/whitelisting now by reading in …
@33501  5 years  ak19    add     Refactored code into 2 classes: The existing WETProcessor, which …