source: gs3-extensions/maori-lang-detection@33588
Name | Size | Rev | Age | Author | Last Change |
---|---|---|---|---|---|
../ | | | | | |
src | | 33587 | 5 years | | 1. Better stats reporting on crawled sites: not just if a page was in … |
MoreReading | | 33565 | 5 years | | CCWETProcessor: domain url now goes in as a seedURL after the … |
models-trainingdata-and-sampletxts | | 33588 | 5 years | | Committing the MRI sentence model that I'm actually using, the one in … |
logs | | 33401 | 5 years | | MaoriTextDetector.class file now generated inside its package folder … |
lib | | 33562 | 5 years | | 1. The sites-too-big-to-exhaustively-crawl.txt is now a csv file of a … |
hdfs-cc-work | | 33574 | 5 years | | If nutch stores a crawled site in more than 1 file, then cat all of … |
conf | | 33569 | 5 years | | 1. batchcrawl.sh now does what it should have from the start, which is … |
ccrawl-data | | 33572 | 5 years | | Only meant to store the wet.gz versions of these files, not also the … |
bin | | 33581 | 5 years | | Minor fix. Noticed when looking for work I did on MRI sentence detection |
README.txt | 14.0 KB | 33398 | 5 years | | Committing the actual package structure and the updated README after … |
mri-opennlp-corpus.tar.gz | 8.3 MB | 33355 | 5 years | | Changes for adding in the new gen_SentenceDetection_model.sh script, … |
feasibility.txt | 761 bytes | 33394 | 5 years | | 1. Started a file on feasibility with the data now available and some … |
apache-opennlp-1.9.1-bin.tar.gz | 10.6 MB | 33335 | 5 years | | First java file for Māori language detection using openNLP with the … |