20:31 Changeset [33428] by ak19
Working commoncrawl cc-warc-examples' WET wordcount example using …
14:25 Changeset [33427] by davidb
Some initial files on how to get going
14:23 Changeset [33426] by davidb
Folder for details on how to stand up the HTRC DevEnv locally


22:15 Changeset [33425] by ak19
A few more links now that I got past getting the vagrant VM with spark …
18:19 Changeset [33424] by ak19
Georgian (code ka) language translations for the gs3interface module …


20:07 Changeset [33423] by ak19
Adding in the link to the vagrant VM with Hadoop, Spark for cluster …
17:52 Changeset [33422] by ak19
Some more links.
16:39 Changeset [33421] by ak19
Forgot to fix up svn externals property for the Georgian …
16:38 Changeset [33420] by ak19
Update to svnproperty externals for the Georgian (code: ka) …
16:20 Changeset [33419] by ak19
Last evening, I had found some links about how language-detection is …
13:53 Changeset [33418] by cpb16
made progress with morphology, based on one image, need to refine …


19:55 Changeset [33417] by ak19
Georgian language translations for the coredm for GS2, gsinstaller …
17:48 Changeset [33416] by ak19
DEC collections weren't getting built on 32 bit linux VM after trying …
11:42 Changeset [33415] by cpb16
updated, after unable to commit due to setup.bash being out of date. …


21:57 Changeset [33414] by ak19
Adding important links
21:57 Changeset [33413] by ak19
Splitting the get_commoncrawl_nz_urls.sh script back into 2 scripts, …
21:54 Changeset [33412] by ak19
config command for wgetting a single file
21:50 Changeset [33411] by ak19
Newer version now doesn't mirror sites with wget but gets WET files …
21:48 Changeset [33410] by ak19
Committing some variable name changes before I replace this file with …
15:59 Changeset [33409] by ak19
Forgot to commit 2 files with links and shuffling some links around …
15:09 Changeset [33408] by ak19
Some rough notes. Will move into appropriate file later.
14:40 Changeset [33407] by ak19
gutil.jar was rebuilt yesterday in GS3 after a bugfix. Recommitting …
12:17 Changeset [33406] by kjdon
if there is a semicolon after the file name, it ends up in the URL …


20:37 Changeset [33405] by ak19
Even though we're probably not going to use this code after all, will …
20:35 Changeset [33404] by ak19
1. Links to other Java ways of extracting text from web content. 2. …
15:07 Changeset [33403] by ak19
Mistake to do with launchdir in SafeProcess: if the environment for …


22:03 Changeset [33402] by ak19
Beginnings of the Java class to wget sites and process its pages to …
21:16 Changeset [33401] by ak19
MaoriTextDetector.class file now generated inside its package folder …
21:15 Changeset [33400] by ak19
1. Setting up log4j.properties based on the macronizer's basic one …
20:48 Changeset [33399] by ak19
Putting properties files into the conf folder and keeping the lib …
19:35 Changeset [33398] by ak19
Committing the actual package structure and the updated README after …
19:30 Changeset [33397] by ak19
1. Changing package structure and instructions on compiling/running as …
18:20 Changeset [33396] by ak19
Georgian language gs3colcfg module of GS interface. Many thanks to …
18:03 Changeset [33395] by ak19
Georgian language translation work for the gs3interface module of the …


20:37 Changeset [33394] by ak19
1. Started a file on feasibility with the data now available and some …
18:57 Changeset [33393] by ak19
Modified the get_commoncrawl_nz_urls.sh to also create a reduced urls …


15:15 Changeset [33392] by ak19
Kathy found a problem whereby she wanted to run consecutive buildcols …


19:11 Changeset [33391] by ak19
Some rough bash scripting lines that work but aren't complete.
17:31 Changeset [33390] by ak19
Minor message telling the user to wait for a task that takes some time.


13:19 Changeset [33389] by kjdon
store csv field array associated with filename, because you might have …
11:46 Changeset [33388] by kjdon
tidied up some debug statements
11:33 Changeset [33387] by kjdon
removed all my debug statements
11:06 Changeset [33386] by kjdon
modified the test for whether this is the selected node or not. can't …


12:53 Changeset [33385] by kjdon
need to import response node as it is not part of same document
12:39 Changeset [33384] by cpb16
backup before intellij working
12:20 Changeset [33383] by kjdon
some more work on the help page
12:14 Changeset [33382] by kjdon
don't add collection/collname to pref and help link if collname is empty
12:13 Changeset [33381] by kjdon
use nice /page/gsdl url for about greenstone page
12:12 Changeset [33380] by kjdon
some more mods and strings for collection help page


21:09 Changeset [33379] by ak19
New script to automate getting a file listing of the common crawl URL …
19:05 Changeset [33378] by ak19
New bin/script folder and relocating gen_SentenceDetection_model.sh to …
19:04 Changeset [33377] by ak19
Changes to get gen_SentenceDetection_model.sh to run still from the …
18:39 Changeset [33376] by ak19
Links and extracts I've read so far on the Web Curator Tool (WCT), …


13:58 Changeset [33375] by cpb16
Full backup after running first successful highres classifier run
11:11 Changeset [33374] by davidb
added in opt-doc-args-link variable otherwise the transform fails with …
10:29 Changeset [33373] by kjdon
need to check for null result from getTextString - otherwise get a …


15:25 Changeset [33372] by kjdon
when writing out facets in buildConfig, need to get them from …
12:08 Changeset [33371] by kjdon
separate sort and facet fields as the former needs to be single valued …
11:59 Changeset [33370] by kjdon
use the new get_or_create_shortname instead of create_shortname
11:55 Changeset [33369] by kjdon
instead of create_shortname, now have get_or_create_shortname. this …
11:12 Changeset [33368] by kjdon
sort fields cannot be multivalued. Facet fields need to be. So have …


12:53 Changeset [33367] by cpb16
Pre-hires classification w/o MU


16:01 Changeset [33366] by davidb
Formatting refactoring to reduce code duplication
16:00 Changeset [33365] by davidb
Exported version of spreadsheet for public download
15:59 Changeset [33364] by davidb
Requested word changes to About page
15:59 Changeset [33363] by davidb
Customization of help text
15:58 Changeset [33362] by davidb
Changes to the wording and formatting of Terms and Conditions
15:57 Changeset [33361] by davidb
Change of headings that are exported
15:57 Changeset [33360] by davidb
Code tidy-up and change of input/output filename
10:43 Changeset [33359] by davidb
solr needs to add shortnames to the fieldnamemap otherwise it won't …


21:03 Changeset [33358] by ak19
More minor changes to README
21:00 Changeset [33357] by ak19
Minor changes
20:57 Changeset [33356] by ak19
Updating script. Correction to a filepath different in the svn folder …
20:54 Changeset [33355] by ak19
Changes for adding in the new gen_SentenceDetection_model.sh script, …
16:31 Changeset [33354] by davidb
Template file for producing OpenOffice spreadsheet format
16:04 Changeset [33353] by davidb
Initial set of files to page scrape and turn in the OpenOffice
16:01 Changeset [33352] by davidb
Top-level folder for code to page-scrape BookStumper site
16:00 Changeset [33351] by davidb
Top-level folder for code to page-scrape BookStumper site


17:29 Changeset [33350] by ak19
Better comments. Tested macronised vs unmacronised Māori language test …
16:45 Changeset [33349] by ak19
Minor changes to the README for map demo solr-haminfo collection …
16:33 Changeset [33348] by ak19
2 major changes. 1. Forgot to commit Dr Bainbridge's bugfix for why …
13:45 Changeset [33347] by kjdon
made it optional whether the user gets shown the terms and conditions …
13:15 Changeset [33346] by kjdon
check for empty child_id, and null DBInfo before using them
12:59 Changeset [33345] by kjdon
got rid of hard coded empty basket text
12:48 Changeset [33344] by kjdon
added favourites empty text
12:38 Changeset [33343] by kjdon
add in favourites langfrags (not just berry ones). Change the title …
12:37 Changeset [33342] by kjdon
change the empty basket message depending on whether it is a berry …
11:07 Changeset [33341] by kjdon
tidied up relational metadata retrieval. implemented descendants and …


16:46 Changeset [33340] by cpb16
transferred backup of low res images. Classifiers work as expected. …


23:43 Changeset [33339] by ak19
Updated README.
23:24 Changeset [33338] by ak19
1. After renaming the java class, changed all occurrences of the old …
23:21 Changeset [33337] by ak19
Renaming the class to MaoriTextDetector, since it doesn't detect audio …
22:58 Changeset [33336] by ak19
Major rewrite to make this class more useful to callers. …