source: other-projects/maori-lang-detection@35978
Name | Size | Rev | Age | Author | Last Change |
---|---|---|---|---|---|
bin | | 33581 | 5 years | | Minor fix. Noticed when looking for work I did on MRI sentence detection |
ccrawl-data | | 33572 | 5 years | | Only meant to store the wet.gz versions of these files, not also the … |
conf | | 33938 | 4 years | | 1. Don't regenerate random sample of web page urls and full web page … |
hdfs-cc-work | | 33913 | 4 years | | 1. Adjusted table mongodb query statements to be more exact, but same … |
journal-paper | | 33903 | 4 years | | My notes when preparing for today's meetings. Some of this may be … |
lib | | 33940 | 4 years | | 1. In order to make it easier to do the manual work of inspecting 260 … |
logs | | 33401 | 5 years | | MaoriTextDetector.class file now generated inside its package folder … |
models-trainingdata-and-sampletxts | | 33588 | 5 years | | Committing the MRI sentence model that I'm actually using, the one in … |
mongodb-data | | 34127 | 4 years | | Spelling correction in filename: screeMshot to screeNshot |
mongodb-data-auto | | 34119 | 4 years | | Committing the auto-generated analysis results folder, … |
MoreReading | | 33919 | 4 years | | SummaryTool now uses the CountryCodeCountsMapData.java class to … |
src | | 34005 | 4 years | | InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead … |
apache-opennlp-1.9.1-bin.tar.gz | 10.6 MB | 33335 | 5 years | | First java file for Māori language detection using openNLP with the … |
crawledNode2.tar | 606.8 MB | 33800 | 5 years | | Removed an adult site from crawled contents and added its url to … |
crawledNode3.tar | 370.6 MB | 33609 | 5 years | | The tar files containing the crawled sites data shouldn't be called … |
crawledNode4.tar | 374.6 MB | 33609 | 5 years | | The tar files containing the crawled sites data shouldn't be called … |
crawledNode5.tar | 544.3 MB | 33617 | 5 years | | Node5 is now full and here is the finished crawl (up to and including … |
crawledNode6.tar | 126.0 MB | 33904 | 4 years | | Shouldn't greylist anglican.org, as this prevented crawling of … |
feasibility.txt | 761 bytes | 33394 | 5 years | | 1. Started a file on feasibility with the data now available and some … |
mri-opennlp-corpus.tar.gz | 8.3 MB | 33355 | 5 years | | Changes for adding in the new gen_SentenceDetection_model.sh script, … |
README.txt | 14.1 KB | 35529 | 3 years | | Updating URL in README.txt |
to_crawl.tar.gz | 1.4 MB | 33904 | 4 years | | Shouldn't greylist anglican.org, as this prevented crawling of … |