root/gs3-extensions

Rev Chgset Date Author Log Message
(edit) @33457 [33457] 13 days ak19 Got stage 1, the WARC to WET conversion, working, after necessary …
(edit) @33456 [33456] 13 days ak19 Link to discussion on how to convert WARC to WET
(edit) @33448 [33448] 3 weeks ak19 Minor clarification and inclusion of helpful command
(edit) @33446 [33446] 3 weeks ak19 1. Committing working version of export_maori_subset.sh which takes the …
(edit) @33445 [33445] 3 weeks ak19 The first working hadoop spark script for processing common crawl data. …
(edit) @33443 [33443] 3 weeks ak19 More notes
(edit) @33442 [33442] 3 weeks ak19 Updated gutil.jar file (with SafeProcess debugging)
(edit) @33441 [33441] 3 weeks ak19 Adding further notes to do with running the CC-index examples on spark.
(edit) @33440 [33440] 3 weeks ak19 Split file to move vagrant-spark-hadoop notes into own file.
(edit) @33428 [33428] 4 weeks ak19 Working commoncrawl cc-warc-examples' WET wordcount example using Hadoop. …
(edit) @33425 [33425] 5 weeks ak19 A few more links now that I got past getting the vagrant VM with spark and …
(edit) @33423 [33423] 5 weeks ak19 Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
(edit) @33422 [33422] 5 weeks ak19 Some more links.
(edit) @33419 [33419] 5 weeks ak19 Last evening, I had found some links about how language-detection is done …
(edit) @33414 [33414] 5 weeks ak19 Adding important links
(edit) @33413 [33413] 5 weeks ak19 Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
(edit) @33412 [33412] 5 weeks ak19 config command for wgetting a single file
(edit) @33411 [33411] 5 weeks ak19 Newer version now doesn't mirror sites with wget but gets WET files and …
(edit) @33410 [33410] 5 weeks ak19 Committing some variable name changes before I replace this file with the …
(edit) @33409 [33409] 5 weeks ak19 Forgot to commit 2 files with links and shuffling some links around into …
(edit) @33408 [33408] 5 weeks ak19 Some rough notes. Will move into appropriate file later.
(edit) @33407 [33407] 5 weeks ak19 gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting for …
(edit) @33405 [33405] 5 weeks ak19 Even though we're probably not going to use this code after all, will …
(edit) @33404 [33404] 5 weeks ak19 1. Links to other Java ways of extracting text from web content. 2. …
(edit) @33402 [33402] 5 weeks ak19 Beginnings of the Java class to wget sites and process its pages to detect …
(edit) @33401 [33401] 5 weeks ak19 MaoriTextDetector.class file now generated inside its package folder (for …
(edit) @33400 [33400] 5 weeks ak19 1. Setting up log4j.properties based on the macronizer's basic one that I …
(edit) @33399 [33399] 5 weeks ak19 Putting properties files into the conf folder and keeping the lib folder …
(edit) @33398 [33398] 5 weeks ak19 Committing the actual package structure and the updated README after …
(edit) @33397 [33397] 5 weeks ak19 1. Changing package structure and instructions on compiling/running as …
(edit) @33396 [33396] 5 weeks ak19 Georgian language gs3colcfg module of GS interface. Many thanks to Vano …
(edit) @33394 [33394] 6 weeks ak19 1. Started a file on feasibility with the data now available and some …
(edit) @33393 [33393] 6 weeks ak19 Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls file …
(edit) @33392 [33392] 6 weeks ak19 Kathy found a problem whereby she wanted to run consecutive buildcols …
(edit) @33391 [33391] 6 weeks ak19 Some rough bash scripting lines that work but aren't complete.
(edit) @33390 [33390] 6 weeks ak19 Minor message telling the user to wait for a task that takes some time.
(edit) @33388 [33388] 6 weeks kjdon tidied up some debug statements
(edit) @33379 [33379] 7 weeks ak19 New script to automate getting a file listing of the common crawl URL data …
(edit) @33378 [33378] 7 weeks ak19 New bin/script folder and relocating gen_SentenceDetection_model.sh to …
(edit) @33377 [33377] 7 weeks ak19 Changes to get gen_SentenceDetection_model.sh to run still from the …
(edit) @33376 [33376] 7 weeks ak19 Links and extracts I've read so far on the Web Curator Tool (WCT), …
(edit) @33372 [33372] 7 weeks kjdon when writing out facets in buildConfig, need to get them from …
(edit) @33371 [33371] 7 weeks kjdon separate sort and facet fields as the former needs to be single valued and …
(edit) @33370 [33370] 7 weeks kjdon use the new get_or_create_shortname instead of create_shortname
(edit) @33368 [33368] 7 weeks kjdon sort fields cannot be multivalued. Facet fields need to be. SO have …
(edit) @33359 [33359] 8 weeks davidb solr needs to add shortnames to the fieldnamemap otherwise it won't know …
(edit) @33358 [33358] 8 weeks ak19 More minor changes to README
(edit) @33357 [33357] 8 weeks ak19 Minor changes
(edit) @33356 [33356] 8 weeks ak19 Updating script. Correction to a filepath different in the svn folder …
(edit) @33355 [33355] 8 weeks ak19 Changes for adding in the new gen_SentenceDetection_model.sh script, which …
(edit) @33350 [33350] 8 weeks ak19 Better comments. Tested macronised vs unmacronised Māori language test …
(edit) @33339 [33339] 2 months ak19 Updated README.
(edit) @33338 [33338] 2 months ak19 1. After renaming the java class, changed all occurrences of the old name …
(edit) @33337 [33337] 2 months ak19 Renaming the class to MaoriTextDetector, since it doesn't detect audio …
(edit) @33336 [33336] 2 months ak19 Major rewrite to make this class more useful to callers. …
(edit) @33335 [33335] 2 months ak19 First java file for Māori language detection using openNLP with the …
(edit) @33330 [33330] 2 months ak19 Also rebuilt the solr demo collection with the changes to (solrbuilder …
(edit) @33327 [33327] 2 months ak19 In order to get map coordinate metadata stored correctly in solr, changes …
(edit) @33315 [33315] 2 months ak19 1. Bugfix to issue discovered on Windows: when the GS3 server isn't …
(edit) @33307 [33307] 2 months kjdon updating solr.war to include my latest changes. TODO: does this war file …
(edit) @33306 [33306] 2 months kjdon we need to use (the new) level_ids list to determine which cores we are …
(edit) @33065 [33065] 4 months ak19 3 new Georgian language files added, 2 of which automatically generated …
(edit) @32891 [32891] 6 months davidb Additional error checking
(edit) @32890 [32890] 6 months davidb No longer use the OAIConfig file
(edit) @32889 [32889] 6 months davidb Some adjustments after testing
(edit) @32888 [32888] 6 months davidb Also want to check and untar Cantaloupe in this PREPARE file
(edit) @32886 [32886] 6 months davidb Copy refactoring
(edit) @32885 [32885] 6 months davidb Now in main Greenstone resources/iiif area
(edit) @32884 [32884] 6 months davidb Edit to make more generic
(edit) @32883 [32883] 6 months davidb Code tidy up
(edit) @32878 [32878] 6 months davidb Changed to specify 'sites' as the path_prefix area within Greenstone, but …
(edit) @32877 [32877] 6 months davidb GSImageResource added in to the mix
(edit) @32876 [32876] 6 months davidb Tweaks to text
(edit) @32875 [32875] 6 months davidb info.json now works
(edit) @32874 [32874] 6 months davidb Next round of changes, migrating from OAI imprint to what is needed for …
(edit) @32867 [32867] 7 months davidb Some install notes
(edit) @32866 [32866] 7 months davidb Tweak to script names specified
(edit) @32865 [32865] 7 months davidb More detailed script name
(edit) @32864 [32864] 7 months davidb More detailed script name
(edit) @32863 [32863] 7 months davidb More detailed script name
(edit) @32862 [32862] 7 months davidb More detailed script name
(edit) @32861 [32861] 7 months davidb Improved echo prints
(edit) @32860 [32860] 7 months davidb Mostly code tidy-up. In IIIFServerBridge.java, edit to remove hard-wired …
(edit) @32859 [32859] 7 months davidb Mostly code tidy-up. In IIIFServerBridge.java, edit to remove hard-wired …
(edit) @32843 [32843] 7 months davidb Shift from OAI as a template, to separate IIIF based classes
(edit) @32842 [32842] 7 months davidb Shift from OAI as a template, to separate IIIF based classes
(edit) @32776 [32776] 7 months ak19 Final version before merging with GLI
(edit) @32740 [32740] 8 months ak19 Minor change before generating release testing binary
(edit) @32736 [32736] 8 months ak19 Added feature of deselecting shapes when the drawing tool is changed.
(edit) @32732 [32732] 8 months ak19 Work to eliminate an undo history bug, and fixed the selection issue.
(edit) @32731 [32731] 8 months ak19 Introduction of drop-down menu for choosing map theme.
(edit) @32730 [32730] 8 months ak19 Gives the map editor the ability to be displayed in different colour …
(edit) @32724 [32724] 8 months ak19 Potential fix for a history bug found after resizing rectangles and …
(edit) @32723 [32723] 8 months ak19 Complete redo and undo and fixed marker drawing bug
(edit) @32722 [32722] 8 months ak19 Redo and undo working
(edit) @32721 [32721] 8 months ak19 Fixed selection bug and added colour, thickness and opacity change to …
(edit) @32720 [32720] 9 months davidb Changes after testing. Now gets through the OAI request without …
(edit) @32715 [32715] 9 months ak19 Turning off EDT (event dispatch thread) violation check doesn't help if …
(edit) @32714 [32714] 9 months ak19 1. window (the main app windows of running GLI, or later GEMS) is now …
(edit) @32713 [32713] 9 months ak19 No longer passing around the FrameFixture window object between the test …