Rev     Age      Author  Log Message
@33440  5 years  ak19    Split file to move vagrant-spark-hadoop notes into own file.
@33439  5 years  cpb16   Have created properties file and accessibility from …
@33438  5 years  ak19    Forgot to commit a change made for Georgian.
@33437  5 years  cpb16   made progress with morphology. Need to have a better area dimension …
@33436  5 years  ak19    3 important changes for 2 separate bugfixes where one bugfix is …
@33435  5 years  ak19    Georgian language translations for the language's new glihelp module …
@33434  5 years  ak19    Correcting syntax errors in this bash script.
@33433  5 years  ak19    New Georgian language translation for perlmodules module of the GS …
@33432  5 years  ak19    New Georgian language translation for glidict module of the GS …
@33431  5 years  ak19    Corrections of automated processing, noticed when processing Georgian …
@33430  5 years  ak19    Undo call to to_utf8() on the query_string argument (arg[q]) to …
@33429  5 years  kjdon   fixed a bug in get_or_create_shortname where it wasn't storing the new …
@33428  5 years  ak19    Working commoncrawl cc-warc-examples' WET wordcount example using …
@33427  5 years  davidb  Some initial files on how to get going
@33426  5 years  davidb  Folder to details on how to standup the HTRC DevEnv locally
@33425  5 years  ak19    A few more links now that I got past getting the vagrant VM with spark …
@33424  5 years  ak19    Georgian (code ka) language translations for the gs3interface module …
@33423  5 years  ak19    Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
@33422  5 years  ak19    Some more links.
@33421  5 years  ak19    Forgot to fix up svn externals property for the Georgian …
@33420  5 years  ak19    Update to svnproperty externals for the Georgian (code: ka) …
@33419  5 years  ak19    Last evening, I had found some links about how language-detection is …
@33418  5 years  cpb16   made progress with morphology, based one image, need to refine …
@33417  5 years  ak19    Georgian language translations for the coredm for GS2, gsinstaller …
@33416  5 years  ak19    DEC collections weren't getting built on 32 bit linux VM after trying …
@33415  5 years  cpb16   updated, after unable to commit due to setup.bash being out of date. …
@33414  5 years  ak19    Adding important links
@33413  5 years  ak19    Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
@33412  5 years  ak19    config command for wgetting a single file
@33411  5 years  ak19    Newer version now doesn't mirror sites with wget but gets WET files …
@33410  5 years  ak19    Committing some variable name changes before I replace this file with …
@33409  5 years  ak19    Forgot to commit 2 files with links and shuffling some links around …
@33408  5 years  ak19    Some rough notes. Will move into appropriate file later.
@33407  5 years  ak19    gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting …
@33406  5 years  kjdon   if there is a semicolon after the file name, it ends up in the URL …
@33405  5 years  ak19    Even though we're probably not going to use this code after all, will …
@33404  5 years  ak19    1. Links to other Java ways of extracting text from web content. 2. …
@33403  5 years  ak19    Mistake to do with launchdir in SafeProcess: if the environment for …
@33402  5 years  ak19    Beginnings of the Java class to wget sites and process its pages to …
@33401  5 years  ak19    MaoriTextDetector.class file now generated inside its package folder …
@33400  5 years  ak19    1. Setting up log4j.properties based on the macronizer's basic one …
@33399  5 years  ak19    Putting properties files into the conf folder and keeping the lib …
@33398  5 years  ak19    Committing the actual package structure and the updated README after …
@33397  5 years  ak19    1. Changing package structure and instructions on compiling/running as …
@33396  5 years  ak19    Georgian language gs3colcfg module of GS interface. Many thanks to …
@33395  5 years  ak19    Georgian language translation work for the gs3interface module of the …
@33394  5 years  ak19    1. Started a file on feasibility with the data now available and some …
@33393  5 years  ak19    Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls …
@33392  5 years  ak19    Kathy found a problem whereby she wanted to run consecutive buildcols …
@33391  5 years  ak19    Some rough bash scripting lines that work but aren't complete.
@33390  5 years  ak19    Minor message telling the user to wait for a task that takes some time.
@33389  5 years  kjdon   store csv field array associated with filename, because you might have …
@33388  5 years  kjdon   tidied up some debug statements
@33387  5 years  kjdon   removed all my debug statements
@33386  5 years  kjdon   modified the test for whether this is the selected node or not. cant …
@33385  5 years  kjdon   need to import response node as it is not part of same document
@33384  5 years  cpb16   backup before intellij working
@33383  5 years  kjdon   some more work on the help page
@33382  5 years  kjdon   don't add collection/collname to pref and help link if collname is empty
@33381  5 years  kjdon   use nice /page/gsdl url for about greenstone page
@33380  5 years  kjdon   some more mods and strings for collection help page
@33379  5 years  ak19    New script to automate getting a file listing of the common crawl URL …
@33378  5 years  ak19    New bin/script folder and relocating gen_SentenceDetection_model.sh to …
@33377  5 years  ak19    Changes to get gen_SentenceDetection_model.sh to run still from the …
@33376  5 years  ak19    Links and extracts I've read so far on the Web Curator Tool (WCT), …
@33375  5 years  cpb16   Full backup after running first successful highres classifier run
@33374  5 years  davidb  added in opt-doc-args-link variable otherwise the transform fails with …
@33373  5 years  kjdon   need to check for null result from getTextString - otherwise get a …
@33372  5 years  kjdon   when writing out facets in buildConfig, need to get them from …
@33371  5 years  kjdon   separate sort and facet fields as the former needs to be single valued …
@33370  5 years  kjdon   use the new get_or_create_shortname instead of create_shortname
@33369  5 years  kjdon   instead of create_shortname, now have get_or_create_shortname. this …
@33368  5 years  kjdon   sort fields cannot be multivalued. Facet fields need to be. SO have …
@33367  5 years  cpb16   Pre-hires classification w/o MU
@33366  5 years  davidb  Formatting refactoring to reduce code duplication
@33365  5 years  davidb  Exported version of spreadsheet for public download
@33364  5 years  davidb  Requested word changes to About page
@33363  5 years  davidb  Customization of help text
@33362  5 years  davidb  Changes to the wording and formating of Terms and Conditions
@33361  5 years  davidb  Change of headings that are exported
@33360  5 years  davidb  Code tidy-up and change of input/output filenanme
@33359  5 years  davidb  solr needs to add shortnames to the fieldnamemap otherwise it won't …
@33358  5 years  ak19    More minor changes to README
@33357  5 years  ak19    Minor changes
@33356  5 years  ak19    Updating script. Correction to a filepath different in the svn folder …
@33355  5 years  ak19    Changes for adding in the new gen_SentenceDetection_model.sh script, …
@33354  5 years  davidb  Template file for producing OpenOffice spreadsheet format
@33353  5 years  davidb  Initial set of files to page scrape and turn in the OpenOffice
@33352  5 years  davidb  Top-level folder for code to page-scrape BookStumper site
@33351  5 years  davidb  Top-level folder for code to page-scrape BookStumper site
@33350  5 years  ak19    Better comments. Tested macronised vs unmacronised Māori language test …
@33349  5 years  ak19    Minor changes to the README for map demo solr-haminfo collection …
@33348  5 years  ak19    2 major changes. 1. Forgot to commit Dr Bainbridge's bugfix for why …
@33347  5 years  kjdon   made it optional whether the user gets shown the terms and conditions …
@33346  5 years  kjdon   check for empty child_id, and null DBInfo before using them
@33345  5 years  kjdon   got rid of hard coded empty basket text
@33344  5 years  kjdon   added favourites empty text
@33343  5 years  kjdon   add in favourites langfrags (not just berry ones). Change the title …
@33342  5 years  kjdon   change the empty basket message depending on whether it is a berry …
@33341  5 years  kjdon   tidied up relational metadata retrieval. implemented descendants and …