#
# ChangeLog for gs3-extensions
#
# Generated by Trac 1.4.2
# 2024-04-20T12:55:08+12:00

Tue, 17 Nov 2020 06:16:22 GMT ak19 [34542]
	* gs3-extensions/solr/trunk/src/collect/solr-jdbm-demo/resources/collectionConfig_de.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/collect/lucene-jdbm-demo/resources/collectionConfig_de.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/resources/siteConfig_de.properties (added)

	German language gs3colcfg module of GS interface. Many thanks to Nora ...

Tue, 17 Nov 2020 06:10:52 GMT ak19 [34541]
	* gs3-extensions/solr/trunk/src/collect/solr-jdbm-demo/resources/collectionConfig_hr.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/collect/lucene-jdbm-demo/resources/collectionConfig_hr.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/resources/siteConfig_hr.properties (added)

	Croatian language gs3colcfg module of GS interface. Many thanks to ...

Sun, 04 Oct 2020 03:44:06 GMT davidb [34427]
	* gs3-extensions/mars-src/trunk/bin/script/json_to_csv.py (added)

	Brought across from Essentia source, and prepped for use from the ...

Sun, 04 Oct 2020 03:42:57 GMT davidb [34426]
	* gs3-extensions/mars-src/trunk/jars (modified)

	Ignore downloaded zip and unzipped dir

Sun, 04 Oct 2020 03:41:42 GMT davidb [34425]
	* gs3-extensions/mars-src/trunk/GET-WEKA.sh (modified)

	More robust version; takes into account dir change from 'jar' to 'jars'

Sun, 04 Oct 2020 03:39:24 GMT davidb [34424]
	* gs3-extensions/mars-src/trunk/jars (moved)

	Revised name for directory

Sun, 04 Oct 2020 03:38:17 GMT davidb [34423]
	* gs3-extensions/mars-src/trunk/jar (added)
	* gs3-extensions/mars-src/trunk/jar/README.txt (added)
	* gs3-extensions/mars-src/trunk/jar/weka.jar (added)

	First cut at getting set up with Weka within Mars extension

Sat, 03 Oct 2020 23:57:44 GMT davidb [34420]
	* gs3-extensions/mars-src/trunk/GET-WEKA.sh (added)

	Grab Weka via wget

Sat, 03 Oct 2020 23:50:56 GMT davidb [34419]
	* gs3-extensions/mars-src/trunk/bin/script/essentia-extractor-all-mp3-with-profile.sh (added)
	* gs3-extensions/mars-src/trunk/bin/script/essentia-extractor-file-with-profile.sh (added)

	Version of script where an Essentia profile is also specified

Thu, 17 Sep 2020 05:59:53 GMT davidb [34411]
	* gs3-extensions/mars-src/trunk/perllib/plugins/EssentiaPlugin.pm (modified)
	* gs3-extensions/mars-src/trunk/perllib/plugins/pEssentiaExtractor.pm (modified)

	Inclusion of HPCP calc

Thu, 17 Sep 2020 05:59:15 GMT davidb [34410]
	* gs3-extensions/mars-src/trunk/bin/script/essentia-mfcc.py (added)

	Rough cut at something following in a similar suit to essestia- ...

Thu, 17 Sep 2020 05:58:05 GMT davidb [34409]
	* gs3-extensions/mars-src/trunk/bin/script/essentia-hpcp.py (modified)

	Code tidyup

Thu, 17 Sep 2020 05:45:26 GMT davidb [34408]
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE/WAVESURFER.sh (modified)

	Fine tuning of build script for WaveSurfer

Thu, 17 Sep 2020 05:44:37 GMT davidb [34407]
	* gs3-extensions/mars-src/trunk/src/colormap-viridis (added)
	* gs3-extensions/mars-src/trunk/src/colormap-viridis/gen-viridis-colormap.js (added)
	* gs3-extensions/mars-src/trunk/src/colormap-viridis/package.json (added)
	* gs3-extensions/mars-src/trunk/src/colormap-viridis/viridis-colormap.json (added)

	NodeJS project to generate Viridis colormap as JSON file

Thu, 17 Sep 2020 05:41:53 GMT davidb [34406]
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/wavesurfer-player.js (modified)

	Introduction of Viridis colormap

Thu, 17 Sep 2020 05:40:43 GMT davidb [34405]
	* gs3-extensions/mars-src/trunk/src/wavesurfer-plugins (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-plugins/hpcp.js (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-plugins/spectrogram.js (added)

	Location for some bespoke plugin work to fit in with wavesurfer

Tue, 15 Sep 2020 03:36:00 GMT davidb [34392]
	* gs3-extensions/mars-src/trunk/devel.bash (modified)

	Changed to default to python v3

Tue, 15 Sep 2020 03:35:37 GMT davidb [34391]
	* gs3-extensions/mars-src/trunk/CREATE-VENV-PYTHON2.sh (modified)
	* gs3-extensions/mars-src/trunk/CREATE-VENV-PYTHON3.sh (modified)

	More careful control over the creation of python venvs

Tue, 15 Sep 2020 03:35:34 GMT davidb [34390]
	* gs3-extensions/mars-src/trunk/devel/virtualenv-1.10.tar.gz (moved)

	More logical folder for this to be in

Tue, 15 Sep 2020 03:17:30 GMT davidb [34389]
	* gs3-extensions/mars-src/trunk/bin/script/essentia-hpcp.py (added)

	First cut at script to produce a borderless HPCP image of an audio file

Tue, 15 Sep 2020 03:14:05 GMT davidb [34388]
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/ESSENTIA.sh (modified)

	Work with virtual-env if present; assume python to use is on path

Tue, 15 Sep 2020 03:13:22 GMT davidb [34387]
	* gs3-extensions/mars-src/trunk/devel.bash (modified)
	* gs3-extensions/mars-src/trunk/devel/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE.sh (modified)

	Some refinement of the development setup scripts

Tue, 15 Sep 2020 00:55:41 GMT davidb [34386]
	* gs3-extensions/mars-src/trunk/devel/CASCADE-MAKE (moved)

	Fixed typo in directory name

Tue, 15 Sep 2020 00:54:59 GMT davidb [34385]
	* gs3-extensions/mars-src/trunk/devel/CASCADE-MAKE.sh (added)

	Better location for these development/compile tools

Tue, 15 Sep 2020 00:53:50 GMT davidb [34384]
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (modified)

	Better location for these development/compile tools

Tue, 15 Sep 2020 00:53:34 GMT davidb [34383]
	* gs3-extensions/mars-src/trunk/CASCADE-MAKE.sh (modified)

	Better location for these development/compile tools

Tue, 15 Sep 2020 00:53:19 GMT davidb [34382]
	* gs3-extensions/mars-src/trunk/devel/CASCAKE-MAKE/CMAKE.sh (moved)
	* gs3-extensions/mars-src/trunk/devel/CASCAKE-MAKE/NODEJS.sh (moved)
	* gs3-extensions/mars-src/trunk/devel/cmake-3.16.5.tar.gz (moved)
	* gs3-extensions/mars-src/trunk/devel/node-v12.18.3.tar.gz (moved)

	Better location for these development/compile tools

Tue, 15 Sep 2020 00:48:57 GMT davidb [34381]
	* gs3-extensions/mars-src/trunk/devel/CASCAKE-MAKE (added)

	Area for development compilation tools such as cmake and nodejs

Tue, 15 Sep 2020 00:48:13 GMT davidb [34380]
	* gs3-extensions/mars-src/trunk/devel (added)

	Area for development compilation tools such as cmake and nodejs

Mon, 14 Sep 2020 06:37:54 GMT davidb [34379]
	* gs3-extensions/mars-src/trunk/SETUP.sh (modified)

	Some further refinement of what to print out, after some initial testing

Mon, 14 Sep 2020 03:52:34 GMT davidb [34378]
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE/WAVESURFER.sh (modified)

	No longer need the JSON file copied into the web/ext/audio area

Mon, 14 Sep 2020 03:51:30 GMT davidb [34377]
	* gs3-extensions/mars-src/trunk/SETUP.sh (modified)

	Better placement and documentation of what to do with this file

Mon, 14 Sep 2020 03:50:05 GMT davidb [34375]
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/wavesurfer-player.js (modified)

	Introduction of spectrogram visualization

Mon, 14 Sep 2020 03:49:26 GMT davidb [34374]
	* gs3-extensions/mars-src/trunk/packages/node-v12.18.3.tar.gz (added)

	Used to build the wavesurfer-js code from source

Mon, 14 Sep 2020 03:25:58 GMT davidb [34373]
	* gs3-extensions/mars-src/trunk/src/colormap/hot-colormap.json (added)

	The result of running gen-heatmap.js

Mon, 14 Sep 2020 03:24:11 GMT davidb [34372]
	* gs3-extensions/mars-src/trunk/src/colormap (added)
	* gs3-extensions/mars-src/trunk/src/colormap/gen-heatmap.js (added)
	* gs3-extensions/mars-src/trunk/src/colormap/package.json (added)

	NodeJS code to generate a JSON heatmap to be used with WaveSurferJS

Mon, 14 Sep 2020 02:49:54 GMT davidb [34371]
	* gs3-extensions/mars-src/trunk/SETUP.sh (added)

	Top-level scripting and checks so CLI is ready to operate with the ...

Mon, 14 Sep 2020 02:48:17 GMT davidb [34370]
	* gs3-extensions/mars-src/trunk/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/src (added)
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE (added)
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE.sh (added)
	* gs3-extensions/mars-src/trunk/src/CASCADE-MAKE/WAVESURFER.sh (added)
	* gs3-extensions/mars-src/trunk/src/GET-WAVESURFER.sh (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/css (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/css/ribbon.css (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/css/style.css (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer-player/wavesurfer-player.js (added)
	* gs3-extensions/mars-src/trunk/src/wavesurfer.js-4.0.1.tar.gz (added)

	WaveSurfer-JS source files and top-up player

Mon, 14 Sep 2020 01:47:05 GMT davidb [34369]
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/NODEJS.sh (added)

	Adding in NodeJS to compilation sequence, so wavesurfer-js can be ...

Mon, 14 Sep 2020 01:46:22 GMT davidb [34368]
	* gs3-extensions/mars-src/trunk/CASCADE-COMPILE-MANUAL.sh (deleted)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-COMPILE-MANUAL.sh (deleted)

	No longer needed

Sun, 13 Sep 2020 23:29:43 GMT davidb [34367]
	* gs3-extensions/mars-src/trunk/perllib/plugins/AMCMetadataJSONPlugin.pm (modified)

	Now supports https URLs as well

Fri, 11 Sep 2020 06:06:53 GMT davidb [34362]
	* gs3-extensions/mars-src/trunk/README.txt (added)

	First rough cut at some notes

Fri, 11 Sep 2020 06:03:36 GMT davidb [34361]
	* gs3-extensions/mars-src/trunk/bin/script/pessentia.sh (added)

	Collating of python essentia custom scripts and essentia perl plugin ...

Fri, 11 Sep 2020 06:02:49 GMT davidb [34360]
	* gs3-extensions/mars-src/trunk/lib (added)
	* gs3-extensions/mars-src/trunk/lib/python (added)
	* gs3-extensions/mars-src/trunk/lib/python/pessentia.py (added)
	* gs3-extensions/mars-src/trunk/perllib/plugins/EssentiaPlugin.pm (added)
	* gs3-extensions/mars-src/trunk/perllib/plugins/pEssentiaExtractor.pm (added)

	Collating of python essentia custom scripts and essentia perl plugin code

Fri, 11 Sep 2020 06:00:06 GMT davidb [34359]
	* gs3-extensions/mars-src/trunk/__setup.bat (moved)

	Needs to be updated to be brought back into line with setup.bash

Fri, 11 Sep 2020 05:59:23 GMT davidb [34358]
	* gs3-extensions/mars-src/trunk/setup.bash (modified)

	Changed to be a Greenstone3 extension

Fri, 11 Sep 2020 04:33:47 GMT davidb [34356]
	* gs3-extensions/mars-src/trunk/perllib (added)
	* gs3-extensions/mars-src/trunk/perllib/plugins (added)
	* gs3-extensions/mars-src/trunk/perllib/plugins/AMCMetadataJSONPlugin.pm (added)

	Some initial work computing essentia audio features when the ...

Fri, 11 Sep 2020 04:30:00 GMT davidb [34355]
	* gs3-extensions/mars-src/trunk/bin (added)
	* gs3-extensions/mars-src/trunk/bin/script (added)
	* gs3-extensions/mars-src/trunk/bin/script/essentia-extractor-all-mp3.sh (added)
	* gs3-extensions/mars-src/trunk/bin/script/essentia-extractor-file.sh (added)

	Scripts for processing audio files and extracting audio features for ML

Fri, 11 Sep 2020 03:45:52 GMT davidb [34354]
	* gs3-extensions/mars-src/trunk/GET-ESSENTIA.sh (added)

	Script to checkout/clone essentia from its git-hub repository

Fri, 11 Sep 2020 03:45:17 GMT davidb [34353]
	* gs3-extensions/mars-src/trunk/packages/virtualenv-1.10.tar.gz (added)

	Useful in combo with a python2 to create a virtualenv python2 under ...

Fri, 11 Sep 2020 03:39:33 GMT davidb [34349]
	* gs3-extensions/mars-src/trunk/CREATE-VENV-PYTHON2.sh (added)
	* gs3-extensions/mars-src/trunk/CREATE-VENV-PYTHON3.sh (added)

	Used to stand up a version of python where extra pip packages have ...

Fri, 11 Sep 2020 03:26:53 GMT davidb [34348]
	* gs3-extensions/mars-src/trunk/packages/essentia-full-git.tar.gz (added)

	Adding in Essentia source code to go along with compile scripts

Fri, 11 Sep 2020 03:25:33 GMT davidb [34347]
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/ESSENTIA.sh (added)

	Adding in Essentia compile scripts

Fri, 11 Sep 2020 03:25:00 GMT davidb [34346]
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/EIGEN3.sh (modified)

	Further dir that needs to be installed as a header file area

Fri, 11 Sep 2020 03:24:19 GMT davidb [34345]
	* gs3-extensions/mars-src/trunk/devel.bash (modified)

	Already done in setup.bash

Fri, 11 Sep 2020 02:17:06 GMT davidb [34344]
	* gs3-extensions/mars-src/trunk/devel.bash (modified)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (modified)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/EIGEN3.sh (added)
	* gs3-extensions/mars-src/trunk/packages/eigen-3.3.7.tar.gz (added)

	Extended to now setup/install Eigen3

Fri, 11 Sep 2020 01:52:38 GMT davidb [34343]
	* gs3-extensions/mars-src/trunk/CASCADE-MAKE.sh (modified)

	Tweak to sourcing file

Fri, 11 Sep 2020 01:52:19 GMT davidb [34342]
	* gs3-extensions/mars-src/trunk/setup.bash (modified)

	Added block to set GSDLOS

Thu, 10 Sep 2020 23:39:17 GMT davidb [34341]
	* gs3-extensions/mars-src/trunk/CASCADE-MAKE.sh (added)
	* gs3-extensions/mars-src/trunk/devel.bash (added)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE (added)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE.sh (added)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-MAKE/CMAKE.sh (added)

	Shift to using cascade-make

Thu, 10 Sep 2020 23:06:39 GMT davidb [34340]
	* gs3-extensions/mars-src/trunk (modified)

	Added in cascade-make as an external property

Thu, 10 Sep 2020 22:49:28 GMT davidb [34339]
	* gs3-extensions/mars-src/trunk/CASCADE-COMPILE-MANUAL.sh (added)
	* gs3-extensions/mars-src/trunk/packages (added)
	* gs3-extensions/mars-src/trunk/packages/CASCADE-COMPILE-MANUAL.sh (added)
	* gs3-extensions/mars-src/trunk/packages/cmake-3.16.5.tar.gz (added)
	* gs3-extensions/mars-src/trunk/setup.bash (added)
	* gs3-extensions/mars-src/trunk/setup.bat (added)

	Some initial files to compile up essentia, used in the Mars extension ...

Sat, 13 Jun 2020 08:52:59 GMT ak19 [34166]
	* gs3-extensions/solr/trunk/src/collect/solr-jdbm-demo/resources/collectionConfig_it.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/collect/lucene-jdbm-demo/resources/collectionConfig_it.properties (added)
	* main/trunk/greenstone3/web/sites/localsite/resources/siteConfig_it.properties (added)

	Adding Italian language translations of the gs3colcfg module. Many ...

Fri, 06 Mar 2020 02:55:44 GMT davidb [33997]
	* gs3-extensions/mars-src (added)
	* gs3-extensions/mars-src/trunk (added)

	Top-level folder for MARS related Greenstone3 code

Mon, 02 Dec 2019 00:54:11 GMT kjdon [33736]
	* gs3-extensions/solr/trunk/src/perllib/solrbuilder.pm (modified)

	fixed a spelling mistake

Sun, 10 Nov 2019 20:38:55 GMT ak19 [33635]
	* other-projects/maori-lang-detection (moved)

	Maori-language-detection doesn't use Greenstone 3 at present, it's ...

Fri, 08 Nov 2019 10:59:07 GMT ak19 [33634]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/SentenceInfo.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java (added)

	Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which ...

Fri, 08 Nov 2019 06:43:39 GMT ak19 [33633]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToCSV.java (moved)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)

	1. TextLanguageDetector now has methods for collecting all sentences ...

Tue, 05 Nov 2019 08:59:46 GMT ak19 [33626]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)

	TODOs

Tue, 05 Nov 2019 08:58:44 GMT ak19 [33625]
	* gs3-extensions/maori-lang-detection/conf/keep-since-not-product-sites.txt (added)
	* gs3-extensions/maori-lang-detection/conf/possible-product-sites.txt (added)

	A file listing domains with seedurls containing /mi(/) that are ...

Tue, 05 Nov 2019 08:48:50 GMT ak19 [33624]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)

	Some cleanup surrounding the now renamed function createSeedURLsFile, ...

Tue, 05 Nov 2019 08:04:09 GMT ak19 [33623]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)

	1. Incorporated Dr Nichols' earlier suggestion of storing page ...

Tue, 05 Nov 2019 02:42:46 GMT ak19 [33622]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBAccess.java (moved)

	File rename

Mon, 04 Nov 2019 07:35:59 GMT ak19 [33621]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)

	Committing jotted down mongodb related instructions from what Dr ...

Mon, 04 Nov 2019 01:24:25 GMT ak19 [33620]
	* gs3-extensions/maori-lang-detection/crawledNode6.tar (added)

	Final crawl, done on vagrant VM node6. Crawl site IDs 01407-01462.

Fri, 01 Nov 2019 07:14:18 GMT ak19 [33618]
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)

	Adding in the download URL

Fri, 01 Nov 2019 04:13:18 GMT ak19 [33617]
	* gs3-extensions/maori-lang-detection/crawledNode5.tar (modified)

	Node5 is now full and here is the finished crawl (up to and including ...

Thu, 31 Oct 2019 07:05:07 GMT ak19 [33616]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MongoDBConnection.java (added)

	Beginnings of Java class that is to interact with MongoDB. I don't ...

Thu, 31 Oct 2019 07:03:55 GMT ak19 [33615]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/config.properties (modified)
	* gs3-extensions/maori-lang-detection/conf/log4j.properties (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WETProcessor.java (modified)

	1. Worked out how to configure log4j to log both to console and ...

Wed, 30 Oct 2019 10:03:19 GMT ak19 [33609]
	* gs3-extensions/maori-lang-detection/crawledNode2.tar (moved)
	* gs3-extensions/maori-lang-detection/crawledNode3.tar (moved)
	* gs3-extensions/maori-lang-detection/crawledNode4.tar (moved)
	* gs3-extensions/maori-lang-detection/crawledNode5.tar (added)

	The tar files containing the crawled sites data shouldn't be called ...

Wed, 30 Oct 2019 10:02:26 GMT ak19 [33608]
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/batchcrawl.sh (modified)
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/scripts/exportHBase.sh (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)

	1. New script to export from HBase so that we could in theory ...

Tue, 29 Oct 2019 05:33:49 GMT ak19 [33607]
	* gs3-extensions/maori-lang-detection/crawledNode4.tar.gz (modified)

	Updated with the remaining successfully crawled sites on node4 before ...

Tue, 29 Oct 2019 02:18:51 GMT ak19 [33606]
	* gs3-extensions/maori-lang-detection/crawledNode2.tar.gz (moved)
	* gs3-extensions/maori-lang-detection/crawledNode3.tar.gz (added)

	1. Committing crawl data from node3 (2nd VM for nutch crawling). 2. ...

Tue, 29 Oct 2019 01:54:24 GMT ak19 [33605]
	* gs3-extensions/maori-lang-detection/crawledNode4.tar.gz (added)

	Node 4 VM still works, but committing first set of crawled sites on there

Thu, 24 Oct 2019 10:22:30 GMT ak19 [33604]
	* gs3-extensions/maori-lang-detection/conf/sites-too-big-to-exhaustively-crawl.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/url-whitelist-filter.txt (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)

	1. Better output into possible-product-sites.txt including the ...

Thu, 24 Oct 2019 09:04:37 GMT ak19 [33603]
	* gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt (modified)
	* gs3-extensions/maori-lang-detection/conf/GeoLiteCity.dat (added)
	* gs3-extensions/maori-lang-detection/lib/geoip-api-1.2.10.jar (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/Utility.java (modified)

	Incorporating Dr Nichols' suggestion to help weed out product sites: ...

Wed, 23 Oct 2019 10:49:34 GMT ak19 [33602]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)

	1. The final csv file, mri-sentences.csv, is now written out. 2. Only ...

Wed, 23 Oct 2019 10:22:14 GMT ak19 [33601]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)

	Creates the 2nd csv file, with info about webpages. At present stores ...

Wed, 23 Oct 2019 10:05:38 GMT ak19 [33600]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)

	Work in progress of writing out CSV files. In future, may write the ...

Tue, 22 Oct 2019 07:49:48 GMT ak19 [33599]
	* gs3-extensions/maori-lang-detection/crawled-1-of-3.tar.gz (added)

	First one-third sites crawled. Committing to SVN despite the tarred ...

Tue, 22 Oct 2019 07:19:54 GMT ak19 [33598]
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/GS_README.TXT (modified)
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/vagrant-for-nutch2.tar.gz (modified)

	More instructions on setting up Nutch now that I've remembered to ...

Tue, 22 Oct 2019 07:05:50 GMT ak19 [33597]
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (modified)

	Committing active version of template file which has a newline at end ...

Tue, 22 Oct 2019 05:44:05 GMT ak19 [33596]
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/nutch-site.xml (added)
	* gs3-extensions/maori-lang-detection/hdfs-cc-work/conf/regex-urlfilter.GS_TEMPLATE (added)

	Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template ...

Fri, 18 Oct 2019 10:20:09 GMT ak19 [33588]
	* gs3-extensions/maori-lang-detection/models-trainingdata-and-sampletxts/mri-sent_trained.bin (modified)

	Committing the MRI sentence model that I'm actually using, the one in ...

Fri, 18 Oct 2019 10:16:25 GMT ak19 [33587]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (modified)

	1. Better stats reporting on crawled sites: not just if a page was in ...

Fri, 18 Oct 2019 09:20:06 GMT ak19 [33586]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java (added)

	Refactored MaoriTextDetector.java class into more general ...

Fri, 18 Oct 2019 08:41:32 GMT ak19 [33585]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)

	Much simpler way of using sentence and language detection model to ...

Fri, 18 Oct 2019 08:20:39 GMT ak19 [33584]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)

	Committing experimental version 2 using the sentence detector model, ...

Fri, 18 Oct 2019 08:20:18 GMT ak19 [33583]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java (modified)

	Committing experimental version 1 using the sentence detector model, ...

Thu, 17 Oct 2019 10:12:38 GMT ak19 [33582]
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/CCWETProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java (added)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpProcessor.java (modified)
	* gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextDumpPage.java (modified)

	NutchTextDumpProcessor prints each crawled site's stats: number of ...

Thu, 17 Oct 2019 08:53:20 GMT ak19 [33581]
	* gs3-extensions/maori-lang-detection/bin/script/gen_SentenceDetection_model.sh (modified)

	Minor fix. Noticed when looking for work I did on MRI sentence detection