source: other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java

Diff Rev Age Author Log Message
(edit) @34005   4 years ak19 InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead …
(edit) @33988   4 years ak19 1. Print out which web pages of which web site's dump.txt were empty. …
(edit) @33983   4 years ak19 More sensible name for method which had too long kept its old name …
(edit) @33909   4 years ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33906   4 years ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
(edit) @33811   4 years ak19 Returning to using a single variable, urlContainsLangCodeInPath, to …
(edit) @33810   4 years ak19 Bugfix: mi in url path should be checked for for each page of site, …
(edit) @33808   4 years ak19 Storing not just whether /mi(/) suffix is in path, but also whether …
(edit) @33801   4 years ak19 1. NutchTextDumpToMongoDB Added an extra field to each document in …
(edit) @33800   4 years ak19 Removed an adult site from crawled contents and added its url to …
(edit) @33698   4 years ak19 Links to more reading
(edit) @33674   4 years ak19 Changes to support the top 5 predicted langcodes and their confidence …
(edit) @33657   4 years ak19 Some fixes after brief testing against 1/3 of the crawl. Restarted …
(edit) @33656   4 years ak19 Final minor changes before I start processing the crawls of node2.
(edit) @33655   4 years ak19 Minor change to print statement
(edit) @33652   4 years ak19 Introducing morphia subpackage
(copy) @33635   4 years ak19 Maori-language-detection doesn't use Greenstone 3 at present, it's not …
copied from gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java
(add) @33634   4 years ak19 Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which …