source: other-projects/maori-lang-detection@ 33919

| Name | Size | Rev | Age | Author | Last Change |
|---|---|---|---|---|---|
| bin | | 33581 | 5 years | ak19 | Minor fix. Noticed when looking for work I did on MRI sentence detection |
| ccrawl-data | | 33572 | 5 years | ak19 | Only meant to store the wet.gz versions of these files, not also the … |
| conf | | 33904 | 4 years | ak19 | Shouldn't greylist anglican.org, as this prevented crawling of … |
| hdfs-cc-work | | 33913 | 4 years | ak19 | 1. Adjusted table mongodb query statements to be more exact, but same … |
| journal-paper | | 33903 | 4 years | ak19 | My notes when preparing for today's meetings. Some of this may be … |
| lib | | 33919 | 4 years | ak19 | SummaryTool now uses the CountryCodeCountsMapData.java class to … |
| logs | | 33401 | 5 years | ak19 | MaoriTextDetector.class file now generated inside its package folder … |
| models-trainingdata-and-sampletxts | | 33588 | 4 years | ak19 | Committing the MRI sentence model that I'm actually using, the one in … |
| mongodb-data | | 33918 | 4 years | ak19 | Country codes added to each domain's URL of the manual site/domain … |
| MoreReading | | 33919 | 4 years | ak19 | SummaryTool now uses the CountryCodeCountsMapData.java class to … |
| src | | 33919 | 4 years | ak19 | SummaryTool now uses the CountryCodeCountsMapData.java class to … |
| apache-opennlp-1.9.1-bin.tar.gz | 10.6 MB | 33335 | 5 years | ak19 | First java file for Māori language detection using openNLP with the … |
| crawledNode2.tar | 606.8 MB | 33800 | 4 years | ak19 | Removed an adult site from crawled contents and added its url to … |
| crawledNode3.tar | 370.6 MB | 33609 | 4 years | ak19 | The tar files containing the crawled sites data shouldn't be called … |
| crawledNode4.tar | 374.6 MB | 33609 | 4 years | ak19 | The tar files containing the crawled sites data shouldn't be called … |
| crawledNode5.tar | 544.3 MB | 33617 | 4 years | ak19 | Node5 is now full and here is the finished crawl (up to and including … |
| crawledNode6.tar | 126.0 MB | 33904 | 4 years | ak19 | Shouldn't greylist anglican.org, as this prevented crawling of … |
| feasibility.txt | 761 bytes | 33394 | 5 years | ak19 | 1. Started a file on feasibility with the data now available and some … |
| mri-opennlp-corpus.tar.gz | 8.3 MB | 33355 | 5 years | ak19 | Changes for adding in the new gen_SentenceDetection_model.sh script, … |
| README.txt | 14.0 KB | 33398 | 5 years | ak19 | Committing the actual package structure and the updated README after … |
| to_crawl.tar.gz | 1.4 MB | 33904 | 4 years | ak19 | Shouldn't greylist anglican.org, as this prevented crawling of … |