root/gs3-extensions

Rev Chgset Date Author Log Message
@34392 [34392] 8 days davidb Changed to default to python v3
@34391 [34391] 8 days davidb More careful control over the creation of python venvs
@34390 [34390] 8 days davidb More logical folder for this to be in
@34389 [34389] 8 days davidb First cut at script to produce a borderless HPCP image of an audio file
@34388 [34388] 8 days davidb Work with virtual-env if present; assume python to use is on path
@34387 [34387] 8 days davidb Some refinement of the development setup scripts
@34386 [34386] 8 days davidb Fixed typo in directory name
@34385 [34385] 8 days davidb Better location for these development/compile tools
@34384 [34384] 8 days davidb Better location for these development/compile tools
@34383 [34383] 8 days davidb Better location for these development/compile tools
@34382 [34382] 8 days davidb Better location for these development/compile tools
@34381 [34381] 8 days davidb Area for development compilation tools such as cmake and nodejs
@34380 [34380] 8 days davidb Area for development compilation tools such as cmake and nodejs
@34379 [34379] 9 days davidb Some further refinement of what to print out, after some initial testing
@34378 [34378] 9 days davidb No longer need the JSON file copied into the web/ext/audio area
@34377 [34377] 9 days davidb Better placement and documentation of what to do with this file
@34375 [34375] 9 days davidb Introduction of spectrogram visualization
@34374 [34374] 9 days davidb Used to build the wavesurfer-js code from source
@34373 [34373] 9 days davidb The result of running gen-heatmap.js
@34372 [34372] 9 days davidb NodeJS code to generate a JSON heatmap to be used with WaveSurferJS
@34371 [34371] 9 days davidb Top-level scripting and checks so CLI is ready to operate with the MARS …
@34370 [34370] 9 days davidb WaveSurfer-JS source files and top-up player
@34369 [34369] 9 days davidb Adding in NodeJS to compilation sequence, so wavesurfer-js can be built …
@34368 [34368] 9 days davidb No longer needed
@34367 [34367] 9 days davidb Now supports https URLs as well
@34362 [34362] 12 days davidb First rough cut at some notes
@34361 [34361] 12 days davidb Collating of python essentia custom scripts and essentia perl plugin code …
@34360 [34360] 12 days davidb Collating of python essentia custom scripts and essentia perl plugin code
@34359 [34359] 12 days davidb Needs to be updated to be brought back into line with setup.bash
@34358 [34358] 12 days davidb Changed to be a Greenstone3 extension
@34356 [34356] 12 days davidb Some initial work computing essentia audio features when the collection is …
@34355 [34355] 12 days davidb Scripts for processing audio files and extracting audio features for ML
@34354 [34354] 12 days davidb Script to checkout/clone essentia from its GitHub repository
@34353 [34353] 12 days davidb Useful in combo with a python2 to create a virtualenv python2 under user …
@34349 [34349] 12 days davidb Used to stand up a version of python where extra pip packages have been …
@34348 [34348] 12 days davidb Adding in Essentia source code to go along with compile scripts
@34347 [34347] 12 days davidb Adding in Essentia compile scripts
@34346 [34346] 12 days davidb Further dir that needs to be installed as a header file area
@34345 [34345] 12 days davidb Already done in setup.bash
@34344 [34344] 12 days davidb Extended to now setup/install Eigen3
@34343 [34343] 12 days davidb Tweak to sourcing file
@34342 [34342] 12 days davidb Added block to set GSDLOS
@34341 [34341] 12 days davidb Shift to using cascade-make
@34340 [34340] 12 days davidb Added in cascade-make as an external property
@34339 [34339] 12 days davidb Some initial files to compile up essentia, used in the Mars extension to …
@34166 [34166] 3 months ak19 Adding Italian language translations of the gs3colcfg module. Many thanks …
@33997 [33997] 7 months davidb Top-level folder for MARS related Greenstone3 code
@33736 [33736] 10 months kjdon fixed a spelling mistake
@33635 [33635] 11 months ak19 Maori-language-detection doesn't use Greenstone 3 at present, it's not a …
@33634 [33634] 11 months ak19 Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which uses …
@33633 [33633] 11 months ak19 1. TextLanguageDetector now has methods for collecting all sentences and …
@33626 [33626] 11 months ak19 TODOs
@33625 [33625] 11 months ak19 A file listing domains with seedurls containing /mi(/) that are located …
@33624 [33624] 11 months ak19 Some cleanup surrounding the now renamed function createSeedURLsFile, now …
@33623 [33623] 11 months ak19 1. Incorporated Dr Nichols' earlier suggestion of storing page modified …
@33622 [33622] 11 months ak19 File rename
@33621 [33621] 11 months ak19 Committing jotted down mongodb related instructions from what Dr Bainbridge …
@33620 [33620] 11 months ak19 Final crawl, done on vagrant VM node6. Crawl site IDs 01407-01462.
@33618 [33618] 11 months ak19 Adding in the download URL
@33617 [33617] 11 months ak19 Node5 is now full and here is the finished crawl (up to and including site …
@33616 [33616] 11 months ak19 Beginnings of Java class that is to interact with MongoDB. I don't yet …
@33615 [33615] 11 months ak19 1. Worked out how to configure log4j to log both to console and logfile, …
@33609 [33609] 11 months ak19 The tar files containing the crawled sites data shouldn't be called tar.gz …
@33608 [33608] 11 months ak19 1. New script to export from HBase so that we could in theory reimport …
@33607 [33607] 11 months ak19 Updated with the remaining successfully crawled sites on node4 before …
@33606 [33606] 11 months ak19 1. Committing crawl data from node3 (2nd VM for nutch crawling). 2. …
@33605 [33605] 11 months ak19 Node 4 VM still works, but committing first set of crawled sites on there
@33604 [33604] 11 months ak19 1. Better output into possible-product-sites.txt including the overseas …
@33603 [33603] 11 months ak19 Incorporating Dr Nichols' suggestion to help weed out product sites: if tld …
@33602 [33602] 11 months ak19 1. The final csv file, mri-sentences.csv, is now written out. 2. Only …
@33601 [33601] 11 months ak19 Creates the 2nd csv file, with info about webpages. At present stores …
@33600 [33600] 11 months ak19 Work in progress of writing out CSV files. In future, may write the same …
@33599 [33599] 11 months ak19 First one-third sites crawled. Committing to SVN despite the tarred …
@33598 [33598] 11 months ak19 More instructions on setting up Nutch now that I've remembered to commit …
@33597 [33597] 11 months ak19 Committing active version of template file which has a newline at end of …
@33596 [33596] 11 months ak19 Adding in the nutch-site.xml and regex-urlfilter.GS_TEMPLATE template file …
@33588 [33588] 11 months ak19 Committing the MRI sentence model that I'm actually using, the one in my …
@33587 [33587] 11 months ak19 1. Better stats reporting on crawled sites: not just if a page was in MRI …
@33586 [33586] 11 months ak19 Refactored MaoriTextDetector.java class into more general …
@33585 [33585] 11 months ak19 Much simpler way of using sentence and language detection model to work on …
@33584 [33584] 11 months ak19 Committing experimental version 2 using the sentence detector model, …
@33583 [33583] 11 months ak19 Committing experimental version 1 using the sentence detector model, …
@33582 [33582] 11 months ak19 NutchTextDumpProcessor prints each crawled site's stats: number of …
@33581 [33581] 11 months ak19 Minor fix. Noticed when looking for work I did on MRI sentence detection
@33580 [33580] 11 months ak19 Finally fixed the thus-far identified bugs when parsing dump.txt.
@33579 [33579] 11 months ak19 Debugging. Solved one problem.
@33578 [33578] 11 months ak19 Corrections for compiling the 2 new classes.
@33577 [33577] 11 months ak19 Forgot to adjust usage statement to say that silent mode was already …
@33576 [33576] 11 months ak19 Introducing 2 new Java files still being written and untested. …
@33575 [33575] 11 months ak19 Correcting usage string for CCWETProcessor before committing new java …
@33574 [33574] 11 months ak19 If nutch stores a crawled site in more than 1 file, then cat all of them …
@33573 [33573] 11 months ak19 Forgot to document that spaces were also allowed as separator in the input …
@33572 [33572] 11 months ak19 Only meant to store the wet.gz versions of these files, not also the …
@33571 [33571] 11 months ak19 Adding Dr Bainbridge's suggestion of appending the crawlId of each site to …
@33570 [33570] 11 months ak19 Need to check if UNFINISHED file actually exists before moving it across …
@33569 [33569] 11 months ak19 1. batchcrawl.sh now does what it should have from the start, which is to …
@33568 [33568] 11 months ak19 1. More sites greylisted and blacklisted, discovered as I attempted to …
@33567 [33567] 11 months ak19 batchcrawl.sh now supports -all flag (and prints usage on 0 args). The …
@33566 [33566] 11 months ak19 batchcrawl.sh script now supports taking a comma or space separated list …
@33565 [33565] 11 months ak19 CCWETProcessor: domain url now goes in as a seedURL after the individual …