Diff Rev Age Author Log Message
(edit) @33460   5 years kjdon fixed up some typos. removed use_hlist_for option. This is very hard …
(edit) @33459   5 years kjdon small changes to some strings
(edit) @33458   5 years cpb16 Running new morphology version after quick meeting with david last …
(edit) @33457   5 years ak19 Got stage 1, the WARC to WET conversion, working, after necessary …
(edit) @33456   5 years ak19 Link to discussion on how to convert WARC to WET
(edit) @33455   5 years cpb16 Started implementing David's suggested morphology sequence, codeversion9
(edit) @33454   5 years kjdon updated metadata_selection_mode to be …
(edit) @33453   5 years kjdon the new and modified strings for revamped List classifier
(edit) @33452   5 years kjdon revamp of list classifier. More precise handling of numeric metadata …
(edit) @33451   5 years kjdon added a comment
(edit) @33450   5 years kjdon removed some unnecessary comments
(edit) @33449   5 years cpb16 terminal version executes correctly. (Didn't include init threshold in …
(edit) @33448   5 years ak19 Minor clarification and inclusion of helpful command
(edit) @33447   5 years cpb16 starting to implement terminal version of new morphology. need to fix. …
(edit) @33446   5 years ak19 1. Committing working version of export_maori_subset.sh which takes …
(edit) @33445   5 years ak19 The first working hadoop spark script for processing common crawl …
(edit) @33444   5 years cpb16 Have created a preprocess to remove large objects. …
(edit) @33443   5 years ak19 More notes
(edit) @33442   5 years ak19 Updated gutil.jar file (with SafeProcess debugging)
(edit) @33441   5 years ak19 Adding further notes to do with running the CC-index examples on spark.
(edit) @33440   5 years ak19 Split file to move vagrant-spark-hadoop notes into own file.
(edit) @33439   5 years cpb16 Have created properties file and accessibility from …
(edit) @33438   5 years ak19 Forgot to commit a change made for Georgian.
(edit) @33437   5 years cpb16 made progress with morphology. Need to have a better area dimension …
(edit) @33436   5 years ak19 3 important changes for 2 separate bugfixes where one bugfix is …
(edit) @33435   5 years ak19 Georgian language translations for the language's new glihelp module …
(edit) @33434   5 years ak19 Correcting syntax errors in this bash script.
(edit) @33433   5 years ak19 New Georgian language translation for perlmodules module of the GS …
(edit) @33432   5 years ak19 New Georgian language translation for glidict module of the GS …
(edit) @33431   5 years ak19 Corrections of automated processing, noticed when processing Georgian …
(edit) @33430   5 years ak19 Undo call to to_utf8() on the query_string argument (arg[q]) to …
(edit) @33429   5 years kjdon fixed a bug in get_or_create_shortname where it wasn't storing the new …
(edit) @33428   5 years ak19 Working commoncrawl cc-warc-examples' WET wordcount example using …
(edit) @33427   5 years davidb Some initial files on how to get going
(edit) @33426   5 years davidb Folder for details on how to stand up the HTRC DevEnv locally
(edit) @33425   5 years ak19 A few more links now that I got past getting the vagrant VM with spark …
(edit) @33424   5 years ak19 Georgian (code ka) language translations for the gs3interface module …
(edit) @33423   5 years ak19 Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
(edit) @33422   5 years ak19 Some more links.
(edit) @33421   5 years ak19 Forgot to fix up svn externals property for the Georgian …
(edit) @33420   5 years ak19 Update to svnproperty externals for the Georgian (code: ka) …
(edit) @33419   5 years ak19 Last evening, I had found some links about how language-detection is …
(edit) @33418   5 years cpb16 made progress with morphology, based one image, need to refine …
(edit) @33417   5 years ak19 Georgian language translations for the coredm for GS2, gsinstaller …
(edit) @33416   5 years ak19 DEC collections weren't getting built on 32 bit linux VM after trying …
(edit) @33415   5 years cpb16 updated, after unable to commit due to setup.bash being out of date. …
(edit) @33414   5 years ak19 Adding important links
(edit) @33413   5 years ak19 Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
(edit) @33412   5 years ak19 config command for wgetting a single file
(edit) @33411   5 years ak19 Newer version now doesn't mirror sites with wget but gets WET files …
(edit) @33410   5 years ak19 Committing some variable name changes before I replace this file with …
(edit) @33409   5 years ak19 Forgot to commit 2 files with links and shuffling some links around …
(edit) @33408   5 years ak19 Some rough notes. Will move into appropriate file later.
(edit) @33407   5 years ak19 gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting …
(edit) @33406   5 years kjdon if there is a semicolon after the file name, it ends up in the URL …
(edit) @33405   5 years ak19 Even though we're probably not going to use this code after all, will …
(edit) @33404   5 years ak19 1. Links to other Java ways of extracting text from web content. 2. …
(edit) @33403   5 years ak19 Mistake to do with launchdir in SafeProcess: if the environment for …
(edit) @33402   5 years ak19 Beginnings of the Java class to wget sites and process its pages to …
(edit) @33401   5 years ak19 MaoriTextDetector.class file now generated inside its package folder …
(edit) @33400   5 years ak19 1. Setting up log4j.properties based on the macronizer's basic one …
(edit) @33399   5 years ak19 Putting properties files into the conf folder and keeping the lib …
(edit) @33398   5 years ak19 Committing the actual package structure and the updated README after …
(edit) @33397   5 years ak19 1. Changing package structure and instructions on compiling/running as …
(edit) @33396   5 years ak19 Georgian language gs3colcfg module of GS interface. Many thanks to …
(edit) @33395   5 years ak19 Georgian language translation work for the gs3interface module of the …
(edit) @33394   5 years ak19 1. Started a file on feasibility with the data now available and some …
(edit) @33393   5 years ak19 Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls …
(edit) @33392   5 years ak19 Kathy found a problem whereby she wanted to run consecutive buildcols …
(edit) @33391   5 years ak19 Some rough bash scripting lines that work but aren't complete.
(edit) @33390   5 years ak19 Minor message telling the user to wait for a task that takes some time.
(edit) @33389   5 years kjdon store csv field array associated with filename, because you might have …
(edit) @33388   5 years kjdon tidied up some debug statements
(edit) @33387   5 years kjdon removed all my debug statements
(edit) @33386   5 years kjdon modified the test for whether this is the selected node or not. cant …
(edit) @33385   5 years kjdon need to import response node as it is not part of same document
(edit) @33384   5 years cpb16 backup before intellij working
(edit) @33383   5 years kjdon some more work on the help page
(edit) @33382   5 years kjdon don't add collection/collname to pref and help link if collname is empty
(edit) @33381   5 years kjdon use nice /page/gsdl url for about greenstone page
(edit) @33380   5 years kjdon some more mods and strings for collection help page
(edit) @33379   5 years ak19 New script to automate getting a file listing of the common crawl URL …
(edit) @33378   5 years ak19 New bin/script folder and relocating gen_SentenceDetection_model.sh to …
(edit) @33377   5 years ak19 Changes to get gen_SentenceDetection_model.sh to run still from the …
(edit) @33376   5 years ak19 Links and extracts I've read so far on the Web Curator Tool (WCT), …
(edit) @33375   5 years cpb16 Full backup after running first successful highres classifier run
(edit) @33374   5 years davidb added in opt-doc-args-link variable otherwise the transform fails with …
(edit) @33373   5 years kjdon need to check for null result from getTextString - otherwise get a …
(edit) @33372   5 years kjdon when writing out facets in buildConfig, need to get them from …
(edit) @33371   5 years kjdon separate sort and facet fields as the former needs to be single valued …
(edit) @33370   5 years kjdon use the new get_or_create_shortname instead of create_shortname
(edit) @33369   5 years kjdon instead of create_shortname, now have get_or_create_shortname. this …
(edit) @33368   5 years kjdon sort fields cannot be multivalued. Facet fields need to be. SO have …
(edit) @33367   5 years cpb16 Pre-hires classification w/o MU
(edit) @33366   5 years davidb Formatting refactoring to reduce code duplication
(edit) @33365   5 years davidb Exported version of spreadsheet for public download
(edit) @33364   5 years davidb Requested word changes to About page
(edit) @33363   5 years davidb Customization of help text
(edit) @33362   5 years davidb Changes to the wording and formatting of Terms and Conditions
(edit) @33361   5 years davidb Change of headings that are exported