source: gs3-extensions

Rev Age Author Log Message
@33480   5 years ak19 Much harder to remove pages where words are fused together as some are …
@33471   5 years ak19 Very minor changes.
@33470   5 years ak19 A new script to reduce keepURLs.txt to unique URLs, 1 from each unique …
@33469   5 years ak19 Don't want URLs with the word product(s) in them (but production …
@33468   5 years ak19 More meaningful to (also) write out the keep vs discard URLs into keep …
@33467   5 years ak19 Improved the code to use a static block to load the needed properties …
@33466   5 years ak19 1. WETProcessor.main() now processes a folder of *.warc.wet(.gz) …
@33465   5 years ak19 Committing first version of the WETProcessor.java which takes a …
@33457   5 years ak19 Got stage 1, the WARC to WET conversion, working, after necessary …
@33456   5 years ak19 Link to discussion on how to convert WARC to WET
@33448   5 years ak19 Minor clarification and inclusion of helpful command
@33446   5 years ak19 1. Committing working version of export_maori_subset.sh which takes …
@33445   5 years ak19 The first working hadoop spark script for processing common crawl …
@33443   5 years ak19 More notes
@33442   5 years ak19 Updated gutil.jar file (with SafeProcess debugging)
@33441   5 years ak19 Adding further notes to do with running the CC-index examples on spark.
@33440   5 years ak19 Split file to move vagrant-spark-hadoop notes into own file.
@33428   5 years ak19 Working commoncrawl cc-warc-examples' WET wordcount example using …
@33425   5 years ak19 A few more links now that I got past getting the vagrant VM with spark …
@33423   5 years ak19 Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
@33422   5 years ak19 Some more links.
@33419   5 years ak19 Last evening, I had found some links about how language-detection is …
@33414   5 years ak19 Adding important links
@33413   5 years ak19 Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
@33412   5 years ak19 config command for wgetting a single file
@33411   5 years ak19 Newer version now doesn't mirror sites with wget but gets WET files …
@33410   5 years ak19 Committing some variable name changes before I replace this file with …
@33409   5 years ak19 Forgot to commit 2 files with links and shuffling some links around …
@33408   5 years ak19 Some rough notes. Will move into appropriate file later.
@33407   5 years ak19 gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting …
@33405   5 years ak19 Even though we're probably not going to use this code after all, will …
@33404   5 years ak19 1. Links to other Java ways of extracting text from web content. 2. …
@33402   5 years ak19 Beginnings of the Java class to wget sites and process its pages to …
@33401   5 years ak19 MaoriTextDetector.class file now generated inside its package folder …
@33400   5 years ak19 1. Setting up log4j.properties based on the macronizer's basic one …
@33399   5 years ak19 Putting properties files into the conf folder and keeping the lib …
@33398   5 years ak19 Committing the actual package structure and the updated README after …
@33397   5 years ak19 1. Changing package structure and instructions on compiling/running as …
@33396   5 years ak19 Georgian language gs3colcfg module of GS interface. Many thanks to …
@33394   5 years ak19 1. Started a file on feasibility with the data now available and some …
@33393   5 years ak19 Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls …
@33392   5 years ak19 Kathy found a problem whereby she wanted to run consecutive buildcols …
@33391   5 years ak19 Some rough bash scripting lines that work but aren't complete.
@33390   5 years ak19 Minor message telling the user to wait for a task that takes some time.
@33388   5 years kjdon tidied up some debug statements
@33379   5 years ak19 New script to automate getting a file listing of the common crawl URL …
@33378   5 years ak19 New bin/script folder and relocating gen_SentenceDetection_model.sh to …
@33377   5 years ak19 Changes to get gen_SentenceDetection_model.sh to run still from the …
@33376   5 years ak19 Links and extracts I've read so far on the Web Curator Tool (WCT), …
@33372   5 years kjdon when writing out facets in buildConfig, need to get them from …
@33371   5 years kjdon separate sort and facet fields as the former needs to be single valued …
@33370   5 years kjdon use the new get_or_create_shortname instead of create_shortname
@33368   5 years kjdon sort fields cannot be multivalued. Facet fields need to be. So have …
@33359   5 years davidb solr needs to add shortnames to the fieldnamemap otherwise it won't …
@33358   5 years ak19 More minor changes to README
@33357   5 years ak19 Minor changes
@33356   5 years ak19 Updating script. Correction to a filepath different in the svn folder …
@33355   5 years ak19 Changes for adding in the new gen_SentenceDetection_model.sh script, …
@33350   5 years ak19 Better comments. Tested macronised vs unmacronised Māori language test …
@33339   5 years ak19 Updated README.
@33338   5 years ak19 1. After renaming the java class, changed all occurrences of the old …
@33337   5 years ak19 Renaming the class to MaoriTextDetector, since it doesn't detect audio …
@33336   5 years ak19 Major rewrite to make this class more useful to callers. …
@33335   5 years ak19 First java file for Māori language detection using openNLP with the …
@33330   5 years ak19 Also rebuilt the solr demo collection with the changes to (solrbuilder …
@33327   5 years ak19 In order to get map coordinate metadata stored correctly in solr, …
@33315   5 years ak19 1. Bugfix to issue discovered on windows: when the GS3 server isn't …
@33307   5 years kjdon updating solr.war to include my latest changes. TODO: does this war …
@33306   5 years kjdon we need to use (the new) level_ids list to determine which cores we …
@33065   5 years ak19 3 new Georgian language files added, 2 of which automatically …
@32891   5 years davidb Additional error checking
@32890   5 years davidb No longer use the OAIConfig file
@32889   5 years davidb Some adjustments after testing
@32888   5 years davidb Also want to check and untar Cantaloupe in this PREPARE file
@32886   5 years davidb Copy refactoring
@32885   5 years davidb Now in main Greenstone resources/iiif area
@32884   5 years davidb Edit to make more generic
@32883   5 years davidb Code tidy up
@32878   5 years davidb Changed to specify 'sites' as the path_prefix area within Greenstone, …
@32877   5 years davidb GSImageResource added in to the mix
@32876   5 years davidb Tweaks to text
@32875   5 years davidb info.json now works
@32874   5 years davidb Next round of changes, migrating from OAI imprint to what is needed …
@32867   5 years davidb Some install notes
@32866   5 years davidb Tweak to script names specified
@32865   5 years davidb More detailed script name
@32864   5 years davidb More detailed script name
@32863   5 years davidb More detailed script name
@32862   5 years davidb More detailed script name
@32861   5 years davidb Improved echo prints
@32860   5 years davidb Mostly code tidy-up. In IIIFServerBridge.java, edit to remove …
@32859   5 years davidb Mostly code tidy-up. In IIIFServerBridge.java, edit to remove …
@32843   5 years davidb Shift from OAI as a template, to separate IIIF based classes
@32842   5 years davidb Shift from OAI as a template, to separate IIIF based classes
@32776   5 years ak19 Final version before merging with GLI
@32740   5 years ak19 Minor change before generating release testing binary
@32736   5 years ak19 Added feature of deselecting shapes when the drawing tool is changed.
@32732   5 years ak19 Work to eliminate an undo history bug, and fixed the selection issue.
@32731   5 years ak19 Introduction of drop-down menu for choosing map theme.
@32730   5 years ak19 Gives the map editor the ability to be displayed in different colour …