source: gs3-extensions/maori-lang-detection/src/org
Change  Rev     Age      Author  Log Message
edit    @33557  5 years  ak19    Implemented the topSitesMap of topsite domain to url pattern in the …
edit    @33552  5 years  ak19    1. Code now processes ccrawldata folder, containing each individual …
edit    @33519  5 years  ak19    Code still writes out the global seedURLs.txt and regex-urlfilter.txt …
edit    @33518  5 years  ak19    Intermediate commit: got the seed urls file temporarily written out as …
edit    @33517  5 years  ak19    1. Blacklists were introduced so that too many instances of camelcased …
edit    @33515  5 years  ak19    Removed an unused function
edit    @33503  5 years  ak19    More efficient blacklisting/greylisting/whitelisting now by reading in …
edit    @33501  5 years  ak19    Refactored code into 2 classes: The existing WETProcessor, which …
edit    @33497  5 years  ak19    First version of discard url filter file. Inefficient implementation. …
edit    @33488  5 years  ak19    new function createSeedURLsFiles() in WETProcessor that replaces the …
edit    @33480  5 years  ak19    Much harder to remove pages where words are fused together as some are …
edit    @33471  5 years  ak19    Very minor changes.
edit    @33469  5 years  ak19    Don't want URLs with the word product(s) in them (but production …
edit    @33468  5 years  ak19    More meaningful to (also) write out the keep vs discard URLs into keep …
edit    @33467  5 years  ak19    Improved the code to use a static block to load the needed properties …
edit    @33466  5 years  ak19    1. WETProcessor.main() now processes a folder of *.warc.wet(.gz) …
edit    @33465  5 years  ak19    Committing first version of the WETProcessor.java which takes a …
edit    @33411  5 years  ak19    Newer version now doesn't mirror sites with wget but gets WET files …
edit    @33410  5 years  ak19    Committing some variable name changes before I replace this file with …
edit    @33405  5 years  ak19    Even though we're probably not going to use this code after all, will …
edit    @33402  5 years  ak19    Beginnings of the Java class to wget sites and process its pages to …
add     @33398  5 years  ak19    Committing the actual package structure and the updated README after …