#
# ChangeLog for /
#
# Generated by Trac 1.4.2
# 2024-06-23T15:47:31+12:00

Sun, 10 Nov 2019 21:04:37 GMT kjdon [33636]
  * main/trunk/greenstone3/web/interfaces/default/transform/pages/about.xsl (modified)
  include means the stylesheet gets added inline, import means it gets ...
Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
  * other-projects/maori-lang-detection (moved)
  Maori-language-detection doesn't use Greenstone 3 at present, it's ...
Fri, 08 Nov 2019 10:59:07 GMT ak19 [33634]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/SentenceInfo.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java (added)
  Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which ...
Fri, 08 Nov 2019 06:43:39 GMT ak19 [33633]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (moved)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  1. TextLanguageDetector now has methods for collecting all sentences ...
Thu, 07 Nov 2019 01:53:54 GMT kjdon [33632]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/core/TransformingReceptionist.java (modified)
  overhaul of TransformingReceptionist. changed the order of inlining ...
Thu, 07 Nov 2019 01:52:21 GMT kjdon [33631]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/XMLTransformer.java (modified)
  added a bit more error reporting
Thu, 07 Nov 2019 01:44:16 GMT kjdon [33630]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSXSLT.java (modified)
  minor comment changes
Thu, 07 Nov 2019 01:20:36 GMT kjdon [33629]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSXML.java (modified)
  added methods using Parameter2 - for params with text node values
Thu, 07 Nov 2019 00:52:27 GMT kjdon [33628]
  * main/trunk/greenstone3/web/interfaces/default/transform/pages/query.xsl (modified)
  not sure why documentNode was a gsf:template here. Can't be like that ...
Wed, 06 Nov 2019 20:28:41 GMT kjdon [33627]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSFile.java (modified)
  removed unnecessary comments
Tue, 05 Nov 2019 08:59:46 GMT ak19 [33626]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  TODOs
Tue, 05 Nov 2019 08:58:44 GMT ak19 [33625]
  * gs3-extensions/maori-lang-detection/conf/keep-since-not-product-sites.txt (added)
  * gs3-extensions/maori-lang-detection/conf/possible-product-sites.txt (added)
  A file listing domains with seedurls containing /mi(/) that are ...
Tue, 05 Nov 2019 08:48:50 GMT ak19 [33624]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  Some cleanup surrounding the now renamed function createSeedURLsFile, ...
Tue, 05 Nov 2019 08:04:09 GMT ak19 [33623]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/config.properties (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
  1. Incorporated Dr Nichols' earlier suggestion of storing page ...
Tue, 05 Nov 2019 02:42:46 GMT ak19 [33622]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (moved)
  File rename
Mon, 04 Nov 2019 07:35:59 GMT ak19 [33621]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  Committing jotted down mongodb related instructions from what Dr ...
Mon, 04 Nov 2019 01:24:25 GMT ak19 [33620]
  * gs3-extensions/maori-lang-detection/crawledNode6.tar (added)
  Final crawl, done on vagrant VM node6. Crawl site IDs 01407-01462.
Sun, 03 Nov 2019 22:36:56 GMT kjdon [33619]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/core/URLFilter.java (modified)
  need to handle the case where a collection file (eg image) gets ...
Fri, 01 Nov 2019 07:14:18 GMT ak19 [33618]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
  Adding in the download URL
Fri, 01 Nov 2019 04:13:18 GMT ak19 [33617]
  * gs3-extensions/maori-lang-detection/crawledNode5.tar (modified)
  Node5 is now full and here is the finished crawl (up to and including ...
Thu, 31 Oct 2019 07:05:07 GMT ak19 [33616]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBConnection.java (added)
  Beginnings of Java class that is to interact with MongoDB. I don't ...
Thu, 31 Oct 2019 07:03:55 GMT ak19 [33615]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/config.properties (modified)
  * gs3-extensions/maori-lang-detection/conf/log4j.properties (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)
  1. Worked out how to configure log4j to log both to console and ...
Wed, 30 Oct 2019 22:22:21 GMT kjdon [33614]
  * main/trunk/greenstone3/web/interfaces/default/transform/config_format.xsl (modified)
  added a new line
Wed, 30 Oct 2019 22:18:44 GMT kjdon [33613]
  * main/trunk/greenstone2/collect/modelcol/etc/collectionConfig.xml (modified)
  added allowdocumentediting and allowmapgpsediting options, plus also ...
Wed, 30 Oct 2019 22:00:37 GMT kjdon [33612]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/LibraryServlet.java (modified)
  work to do with params. add in default values to params if they are ...
Wed, 30 Oct 2019 21:55:04 GMT kjdon [33611]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSParams.java (modified)
  added global setting to params - these are for params that are valid ...
Wed, 30 Oct 2019 21:54:05 GMT kjdon [33610]
  * main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSXML.java (modified)
  USER_SESSION_CACHE_ATT moved to GSParams, as it is stored in session ...
Wed, 30 Oct 2019 10:03:19 GMT ak19 [33609]
  * gs3-extensions/maori-lang-detection/crawledNode2.tar (moved)
  * gs3-extensions/maori-lang-detection/crawledNode3.tar (moved)
  * gs3-extensions/maori-lang-detection/crawledNode4.tar (moved)
  * gs3-extensions/maori-lang-detection/crawledNode5.tar (added)
  The tar files containing the crawled sites data shouldn't be called ...
Wed, 30 Oct 2019 10:02:26 GMT ak19 [33608]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/exportHBase.sh (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  1. New script to export from HBase so that we could in theory ...
Tue, 29 Oct 2019 05:33:49 GMT ak19 [33607]
  * gs3-extensions/maori-lang-detection/crawledNode4.tar.gz (modified)
  Updated with the remaining successfully crawled sites on node4 before ...
Tue, 29 Oct 2019 02:18:51 GMT ak19 [33606]
  * gs3-extensions/maori-lang-detection/crawledNode2.tar.gz (moved)
  * gs3-extensions/maori-lang-detection/crawledNode3.tar.gz (added)
  1. Committing crawl data from node3 (2nd VM for nutch crawling). 2. ...
Tue, 29 Oct 2019 01:54:24 GMT ak19 [33605]
  * gs3-extensions/maori-lang-detection/crawledNode4.tar.gz (added)
  Node 4 VM still works, but committing first set of crawled sites on there
Thu, 24 Oct 2019 10:22:30 GMT ak19 [33604]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
  1. Better output into possible-product-sites.txt including the ...
Thu, 24 Oct 2019 09:04:37 GMT ak19 [33603]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/GeoLiteCity.dat (added)
  * gs3-extensions/maori-lang-detection/lib/geoip-api-1.2.10.jar (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)
  Incorporating Dr Nichols' suggestion to help weed out product sites: ...
Wed, 23 Oct 2019 10:49:34 GMT ak19 [33602]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  1. The final csv file, mri-sentences.csv, is now written out. 2. Only ...
Wed, 23 Oct 2019 10:22:14 GMT ak19 [33601]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  Creates the 2nd csv file, with info about webpages. At present stores ...
Wed, 23 Oct 2019 10:05:38 GMT ak19 [33600]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  Work in progress of writing out CSV files. In future, may write the ...
Tue, 22 Oct 2019 07:49:48 GMT ak19 [33599]
  * gs3-extensions/maori-lang-detection/crawled-1-of-3.tar.gz (added)
  First one-third sites crawled. Committing to SVN despite the tarred ...
Tue, 22 Oct 2019 07:19:54 GMT ak19 [33598]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/vagrant-for-nutch2.tar.gz (modified)
  More instructions on setting up Nutch now that I've remembered to ...
Tue, 22 Oct 2019 07:05:50 GMT ak19 [33597]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)
  Committing active version of template file which has a newline at end ...
Tue, 22 Oct 2019 05:44:05 GMT ak19 [33596]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/nutch-site.xml (added)
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (added)
  Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template ...
Tue, 22 Oct 2019 01:05:46 GMT kjdon [33595]
  * main/trunk/greenstone3/web/interfaces/default/transform/gslib.xsl (modified)
  new displayBaskets template - to avoid replicating code in query and ...
Tue, 22 Oct 2019 01:00:34 GMT kjdon [33594]
  * main/trunk/greenstone3/web/interfaces/default/transform/pages/classifier.xsl (modified)
  call gslib:displayBasket instead of replicating the code here
Tue, 22 Oct 2019 00:59:53 GMT kjdon [33593]
  * main/trunk/greenstone3/web/interfaces/default/transform/pages/query.xsl (modified)
  the test for facets should be facetList/facet/count, as the facets ...
Tue, 22 Oct 2019 00:51:02 GMT kjdon [33592]
  * main/trunk/greenstone3/web/interfaces/default/transform/pages/query.xsl (modified)
  reindented the file
Mon, 21 Oct 2019 22:51:11 GMT kjdon [33591]
  * main/trunk/greenstone3/web/WEB-INF/classes/interface_default.properties (modified)
  added in some strings for 'this collection contains x documents and ...
Mon, 21 Oct 2019 22:12:22 GMT kjdon [33590]
  * main/trunk/greenstone3/web/sites/localsite/collect/lucene-jdbm-demo/etc/collectionConfig.xml (modified)
  added 'this collection contains X documents and was last built Y days ...
Mon, 21 Oct 2019 08:45:10 GMT cpb16 [33589]
  * other-projects/is-sheet-music-encore/trunk/COMPX520-MAP-DOWNLOADER-PNG.sh (added)
  * other-projects/is-sheet-music-encore/trunk/COMPX520-MAP-RUN-PNG-hi-res.sh (added)
  * other-projects/is-sheet-music-encore/trunk/EndToEndSystem.sh (modified)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/AppendixFormattedListGenerator.class (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/AppendixFormattedListGenerator.java (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/BatchFORMATTED.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/BookIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/FORMATTEDBookIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/FORMATTEDMusicIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/FORMATTEDlSerialIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/Makefile (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/MapIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/MusicIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/FormattedListForAppendix/SerialIDList.txt (added)
  * other-projects/is-sheet-music-encore/trunk/Makefile (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/#test.txt# (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/.idea/workspace.xml (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/000 Inverse Binarized Original.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/001 De-noise.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/002 heal objects in mask.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/003 Isolate large.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/100 Large Items Removed.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/202 heal objects in mask.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/203 Open.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/204 Dilate.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/205 Close Again (Final).jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/4000 Rect found.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/houghtest-bin.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/houghtest-lines.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/image-identification-development/src/Main.java (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/image-identification-development/src/MainMorph.java (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/original.jpg (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/out/production/image-identification-dev-02/Main.class (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/out/production/image-identification-dev-02/MainMorph.class (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-dev-02/test.zip (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-development/Makefile~ (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-development/backup (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-development/backup/MainBackup.java (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-development/backup/MainHoughLine.java (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-development/backup/MainWithOldComments.java (added)
  * other-projects/is-sheet-music-encore/trunk/image-identification-terminal/Makefile (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-terminal/javaAccuracyCalculator.class (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-terminal/javaAccuracyCalculator.java (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-terminal/javaClassifierComparison.java (modified)
  * other-projects/is-sheet-music-encore/trunk/image-identification-terminal/runClassifer.sh (modified)
  final01. Need Map results still
Fri, 18 Oct 2019 10:20:09 GMT ak19 [33588]
  * gs3-extensions/maori-lang-detection/models-trainingdata-and-sampletxts/mri-sent_trained.bin (modified)
  Committing the MRI sentence model that I'm actually using, the one in ...
Fri, 18 Oct 2019 10:16:25 GMT ak19 [33587]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
  1. Better stats reporting on crawled sites: not just if a page was in ...
Fri, 18 Oct 2019 09:20:06 GMT ak19 [33586]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (added)
  Refactored MaoriTextDetector.java class into more general ...
Fri, 18 Oct 2019 08:41:32 GMT ak19 [33585]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  Much simpler way of using sentence and language detection model to ...
Fri, 18 Oct 2019 08:20:39 GMT ak19 [33584]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  Committing experimental version 2 using the sentence detector model, ...
Fri, 18 Oct 2019 08:20:18 GMT ak19 [33583]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  Committing experimental version 1 using the sentence detector model, ...
Thu, 17 Oct 2019 10:12:38 GMT ak19 [33582]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  NutchTextDumpProcessor prints each crawled site's stats: number of ...
Thu, 17 Oct 2019 08:53:20 GMT ak19 [33581]
  * gs3-extensions/maori-lang-detection/bin/script/gen_SentenceDetection_model.sh (modified)
  Minor fix. Noticed when looking for work I did on MRI sentence detection
Thu, 17 Oct 2019 08:44:46 GMT ak19 [33580]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  Finally fixed the thus-far identified bugs when parsing dump.txt.
Thu, 17 Oct 2019 08:05:21 GMT ak19 [33579]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  Debugging. Solved one problem.
Thu, 17 Oct 2019 06:31:53 GMT ak19 [33578]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
  Corrections for compiling the 2 new classes.
Thu, 17 Oct 2019 06:12:15 GMT ak19 [33577]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
  Forgot to adjust usage statement to say that silent mode was already ...
Wed, 16 Oct 2019 10:37:41 GMT ak19 [33576]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (added)
  Introducing 2 new Java files still being written and untested. ...
Wed, 16 Oct 2019 10:36:20 GMT ak19 [33575]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  Correcting usage string for CCWETProcessor before committing new java ...
Wed, 16 Oct 2019 10:35:45 GMT ak19 [33574]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  If nutch stores a crawled site in more than 1 file, then cat all of ...
Wed, 16 Oct 2019 08:39:56 GMT ak19 [33573]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)
  Forgot to document that spaces were also allowed as separator in the ...
Wed, 16 Oct 2019 08:18:38 GMT ak19 [33572]
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102618-000000.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102621-000001.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000002.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000003.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103611-000004.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103613-000005.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000006.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000007.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104558-000009.warc.wet (deleted)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104559-000008.warc.wet (deleted)
  Only meant to store the wet.gz versions of these files, not also the ...
Wed, 16 Oct 2019 08:11:26 GMT ak19 [33571]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  Adding Dr Bainbridge's suggestion of appending the crawlId of each ...
Wed, 16 Oct 2019 07:04:44 GMT ak19 [33570]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  Need to check if UNFINISHED file actually exists before moving it ...
Wed, 16 Oct 2019 07:00:09 GMT ak19 [33569]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  1. batchcrawl.sh now does what it should have from the start, which ...
Mon, 14 Oct 2019 10:36:54 GMT ak19 [33568]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  1. More sites greylisted and blacklisted, discovered as I attempted ...
Mon, 14 Oct 2019 09:40:22 GMT ak19 [33567]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  batchcrawl.sh now supports -all flag (and prints usage on 0 args). ...
Mon, 14 Oct 2019 09:07:45 GMT ak19 [33566]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  batchcrawl.sh script now supports taking a comma or space separated ...
Mon, 14 Oct 2019 08:04:58 GMT ak19 [33565]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  CCWETProcessor: domain url now goes in as a seedURL after the ...
Mon, 14 Oct 2019 08:01:17 GMT ak19 [33564]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
  batchcrawl.sh now does the crawl and logs output of the crawl, dumps ...
Fri, 11 Oct 2019 10:29:40 GMT ak19 [33563]
  * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (added)
  Committing inactive testing batch scripts (only creates the regex- ...
Fri, 11 Oct 2019 08:52:40 GMT ak19 [33562]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/lib/LICENSE.txt (added)
  * gs3-extensions/maori-lang-detection/lib/NOTICE.txt (added)
  * gs3-extensions/maori-lang-detection/lib/commons-csv-1.7.jar (added)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  1. The sites-too-big-to-exhaustively-crawl.txt is now a csv file of a ...
Fri, 11 Oct 2019 07:49:05 GMT ak19 [33561]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  1. sites-too-big-to-exhaustively-crawl.txt is now a comma separated ...
Thu, 10 Oct 2019 10:49:58 GMT ak19 [33560]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  1. Incorporated Dr Bainbridge's suggested improvements: only when ...
Thu, 10 Oct 2019 10:44:31 GMT ak19 [33559]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (modified)
  1. Special string COPY changed to SUBDOMAIN-COPY after Dr Bainbridge ...
Thu, 10 Oct 2019 10:41:36 GMT ak19 [33558]
  * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
  Committing cumulative changes since last commit.
Wed, 09 Oct 2019 10:10:06 GMT ak19 [33557]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  Implemented the topSitesMap of topsite domain to url pattern in the ...
Wed, 09 Oct 2019 05:58:30 GMT ak19 [33556]
  * gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  Blacklisted wikipedia pages that are actually in other languages ...
Wed, 09 Oct 2019 05:43:47 GMT ak19 [33555]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  Modified top sites list as Dr Bainbridge described: suffixes for the ...
Wed, 09 Oct 2019 05:11:19 GMT ak19 [33554]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-blacklist-filter.txt (modified)
  * gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)
  Added more to blacklist and greylist. And removed remaining ...
Fri, 04 Oct 2019 09:19:20 GMT ak19 [33553]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  Comments
Fri, 04 Oct 2019 09:00:46 GMT ak19 [33552]
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
  * gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)
  1. Code now processes ccrawldata folder, containing each individual ...
Fri, 04 Oct 2019 06:35:06 GMT ak19 [33551]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
  Added in top 500 urls from moz.com/top500 and removed duplicates, and ...
Fri, 04 Oct 2019 06:06:51 GMT ak19 [33550]
  * gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (added)
  * gs3-extensions/maori-lang-detection/conf/url-greylist-filter.txt (modified)
  First stage of introducing sites-too-big-to-exhaustively-crawl.txt: ...
Fri, 04 Oct 2019 05:29:50 GMT ak19 [33549]
  * gs3-extensions/maori-lang-detection/ccrawl-data (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135334-000000.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135335-000001.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135533-000002.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135534-000003.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135731-000004.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135732-000005.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135930-000006.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926135930-000007.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926140130-000009.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-39-wet-files/MAORI-CC-MAIN-2018-39-20190926140132-000008.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927111950-000000.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927111952-000001.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112247-000002.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112247-000003.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112539-000005.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112540-000004.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112830-000007.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927112832-000006.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927113121-000009.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-43-wet-files/MAORI-CC-MAIN-2018-43-20190927113122-000008.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930134759-000001.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930134801-000000.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930135217-000002.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930135218-000003.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930135634-000004.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930135637-000005.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930140053-000006.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930140056-000007.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930140510-000008.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-47-wet-files/MAORI-CC-MAIN-2018-47-20190930140512-000009.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112358-000000.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112358-000001.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112629-000002.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112631-000003.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112900-000005.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002112901-000004.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002113130-000007.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002113131-000006.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002113401-000008.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2018-51-wet-files/MAORI-CC-MAIN-2018-51-20191002113401-000009.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085129-000000.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085129-000001.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085435-000002.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085437-000003.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085739-000005.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924085740-000004.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924090041-000006.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924090044-000007.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924090347-000008.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-04-wet-files/MAORI-CC-MAIN-2019-04-20190924090348-000009.warc.wet.gz (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files (added)
  * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924031741-000001.warc.wet.gz (added)
  * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924031742-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032031-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032034-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032319-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032319-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032606-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032607-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032851-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-09-wet-files/MAORI-CC-MAIN-2019-09-20190924032854-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923212744-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923212748-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923213222-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923213227-000003.warc.wet.gz (added) * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923213659-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923213702-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923214137-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923214138-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923214614-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-13-wet-files/MAORI-CC-MAIN-2019-13-20190923214616-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923161945-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923161945-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162223-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162223-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162500-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162502-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162737-000007.warc.wet.gz (added) * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923162739-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923163013-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-18-wet-files/MAORI-CC-MAIN-2019-18-20190923163015-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923094332-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923094332-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923094842-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923094845-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923095357-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923095358-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923095911-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923095912-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923100426-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-22-wet-files/MAORI-CC-MAIN-2019-22-20190923100427-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files (added) * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923035248-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923035249-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923035802-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923035802-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923040326-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923040331-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923040848-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923040849-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923041403-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-26-wet-files/MAORI-CC-MAIN-2019-26-20190923041404-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100139-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100141-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100451-000002.warc.wet.gz (added) * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100453-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100805-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902100809-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902101119-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902101119-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902101429-000008.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-30-wet-files/MAORI-CC-2019-30-20190902101429-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102618-000000.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102618-000000.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102621-000001.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921102621-000001.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000002.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000002.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000003.warc.wet (added) * 
gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103116-000003.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103611-000004.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103611-000004.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103613-000005.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921103613-000005.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000006.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000006.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000007.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104105-000007.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104558-000009.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104558-000009.warc.wet.gz (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104559-000008.warc.wet (added) * gs3-extensions/maori-lang-detection/ccrawl-data/CC-MAIN-2019-35-wet-files/MAORI-CC-MAIN-2019-35-20190921104559-000008.warc.wet.gz (added) All the downloaded commoncrawl MRI warc.wet.gz data from Sep 2018 ... 
Fri, 04 Oct 2019 01:36:53 GMT davidb [33548] * gs3-extensions/web-audio/trunk/INSTALL.sh (modified) Include new wavesurfer sub-project to install Fri, 04 Oct 2019 01:30:47 GMT davidb [33547] * main/trunk/model-sites-dev/mars/collect/amc-essentia (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/ACTIVATE.sh (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/BUILDCOL.sh (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/IMPORT.sh (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/collectionConfig.xml (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/collectionConfig.xml.checkpt01 (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/fail.log (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/oai-inf.jdb (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/oai-inf.jdb.bak (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/etc/oai-inf.lg (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/amc-footer.html (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/amc-header.html (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/bg-search-submit-sml.gif (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/common110825.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/countries.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/ga.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/hoverIntent-r7.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/jquery-1.js (added) * 
main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/main160201.css (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/par_10_135w.jpg (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/simple-lightbox.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/simplelightbox.css (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/superfish-1.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/iframe/support/swfobject.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/import (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/aurora-js-0.4.4.tar.gz (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/aurora.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/aurora.js.map (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/mp3-js-0.1.0.tar.gz (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/mp3.js (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/js/audiocogs/mp3.js.map (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/perllib (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/perllib/plugins (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/perllib/plugins/AMCMetadataJSONPlugin.pm (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/transform (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/transform/layouts (added) * main/trunk/model-sites-dev/mars/collect/amc-essentia/transform/layouts/main.xsl (added) Initial cut at wavesurfer JS audio player version of AMC music ... 
Fri, 04 Oct 2019 01:19:51 GMT davidb [33546] * gs3-extensions/web-audio/trunk/wavesurfer (added) * gs3-extensions/web-audio/trunk/wavesurfer/INSTALL.sh (added) * gs3-extensions/web-audio/trunk/wavesurfer/css (added) * gs3-extensions/web-audio/trunk/wavesurfer/css/ribbon.css (added) * gs3-extensions/web-audio/trunk/wavesurfer/css/style.css (added) * gs3-extensions/web-audio/trunk/wavesurfer/devel (added) * gs3-extensions/web-audio/trunk/wavesurfer/devel/node-v10.16.3-darwin-x64.tar.gz (added) * gs3-extensions/web-audio/trunk/wavesurfer/src (added) * gs3-extensions/web-audio/trunk/wavesurfer/src/wavesurfer.js-2.2.1.tar.gz (added) * gs3-extensions/web-audio/trunk/wavesurfer/wavesurfer-player.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.cursor.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.cursor.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.cursor.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.cursor.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.elan.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.elan.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.elan.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.elan.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.mediasession.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.mediasession.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.mediasession.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.mediasession.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.microphone.js (added) * 
gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.microphone.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.microphone.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.microphone.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.minimap.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.minimap.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.minimap.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.minimap.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.regions.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.regions.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.regions.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.regions.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.spectrogram.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.spectrogram.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.spectrogram.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.spectrogram.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.timeline.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.timeline.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.timeline.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/plugin/wavesurfer.timeline.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer-html-init.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer-html-init.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer-html-init.min.js (added) * 
gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer-html-init.min.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer.js.map (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer.min.js (added) * gs3-extensions/web-audio/trunk/wavesurfer/ws/wavesurfer.min.js.map (added) Initial cut at wavesurfer-based JS audio player extension for Greenstone Thu, 03 Oct 2019 09:38:00 GMT ak19 [33545] * gs3-extensions/maori-lang-detection/MoreReading/Vagrant-Spark-Hadoop.txt (modified) * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified) * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified) Mainly changes to crawling-Nutch.txt and some minor changes to other ... Thu, 03 Oct 2019 05:56:15 GMT ak19 [33544] * main/trunk/greenstone3/web/interfaces/default/js/facet-scripts.js (modified) * main/trunk/greenstone3/web/interfaces/default/js/utility_scripts.js (modified) * main/trunk/greenstone3/web/interfaces/default/transform/javascript-global-setup.xsl (modified) 1. Dr Bainbridge had the correct fix for solr dealing with phrase ... Wed, 02 Oct 2019 04:01:47 GMT ak19 [33543] * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified) * gs3-extensions/maori-lang-detection/hdfs-cc-work/vagrant-for-nutch2.tar.gz (modified) Filled in some missing instructions Wed, 02 Oct 2019 02:25:10 GMT kjdon [33542] * main/trunk/greenstone3/web/sites/localsite/collect/lucene-jdbm-demo/etc/collectionConfig.xml (modified) The use_hlist_for option is no longer valid Tue, 01 Oct 2019 09:27:03 GMT ak19 [33541] * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified) * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified) * gs3-extensions/maori-lang-detection/hdfs-cc-work/patches/GZRangeClient.java (added) * gs3-extensions/maori-lang-detection/hdfs-cc-work/patches/WATExtractorOutput.java (added) 1. 
hdfs-cc-work/GS_README.txt now contains the complete instructions ... Tue, 01 Oct 2019 08:40:33 GMT ak19 [33540] * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified) Since I wasn't getting any further using Nutch 2 to grab an entire site, I ... Tue, 01 Oct 2019 08:36:38 GMT ak19 [33539] * gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (moved) File rename Tue, 01 Oct 2019 08:36:06 GMT ak19 [33538] * gs3-extensions/maori-lang-detection/hdfs-cc-work/Readme.txt (modified) * gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/setup.sh (modified) Some additions to the setup.sh script to query commoncrawl for MRI ... Mon, 30 Sep 2019 09:51:36 GMT ak19 [33537] * gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified) More links related to Nutch and general site mirroring