Timeline



21.02.2020:

21:00 Changeset [33966] by ak19
Added the origSequence and basicDomain columns to the random 260 web page …
20:59 Changeset [33965] by ak19
1. Adding a basicDomain column (stripped of http/https and www prefix) for …
19:57 Changeset [33964] by ak19
2 records were missing a value for the qualityLevel column.

20.02.2020:

22:12 Changeset [33963] by ak19
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI and …
22:07 Changeset [33962] by ak19
2 fields changed, as one was missed out and the other incorrectly entered. …
20:24 Changeset [33961] by ak19
New category, LINK_TEXT, introduced for the random web page URL samples.
20:22 Changeset [33960] by ak19
Reviewed all the random sample web page URLs marked SINGLE_MRI_SENTENCE …
20:06 Changeset [33959] by ak19
URIEncoding the mapData makes it unparseable by geojson.io
19:32 Changeset [33958] by ak19
There were other xsl files using the original depositorTitleAndLink …
19:24 Changeset [33957] by ak19
1. depositor related interface display modified to work with recent …
18:28 Changeset [33956] by ak19
Related to commit 33953: made lots of accidental commits in rev 33953, and …
18:26 Changeset [33955] by ak19
Undoing accidental commit of unintended files.
18:21 Changeset [33954] by ak19
Accidentally committed with other files. Undoing.
18:19 Changeset [33953] by ak19
Depositor link not used

18.02.2020:

23:35 Changeset [33952] by ak19
Minor changes for processing
23:33 Changeset [33951] by ak19
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
23:28 Changeset [33950] by ak19
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
23:22 Changeset [33949] by ak19
Reviewed the qualityLevel column where NAV was assigned.
22:56 Changeset [33948] by ak19
Reviewed the random sampled web page URLs marked as SIGNIFICANTLY_MAORI …
22:07 Changeset [33947] by ak19
Some more questionmarked field values assigned.
21:58 Changeset [33946] by ak19
1. New function to handle user input assigning the newly introduced 4th …
21:48 Changeset [33945] by ak19
Added a 4th column for all 260 sample web page URLs and have used the …
16:44 Changeset [33944] by ak19
Added the isReallyInMRI column after manually inspecting the remaining 70 …
15:56 Changeset [33943] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:55 Changeset [33942] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:18 Changeset [33941] by ak19
1. Uppercase 3rd field (Y/N/? field) read back in from file before being …

17.02.2020:

22:16 Changeset [33940] by ak19
1. In order to make it easier to do the manual work of inspecting 260 web …
16:22 Changeset [33939] by ak19
1. Old random samples file doesn't apply as we're not sampling by country …
16:10 Changeset [33938] by ak19
1. Don't regenerate random sample of web page urls and full web page url …
16:06 Changeset [33937] by ak19
New counts of manual sites after reingesting into MongoDB. Forgot to …
16:05 Changeset [33936] by ak19
Renaming old file to place with new counts after reingesting into MongoDB.

16.02.2020:

18:16 Changeset [33935] by davidb
Additional check added into get-isis target
17:34 Changeset [33934] by davidb
Removal of static code block calling ancient/deprecated static …
14:19 Changeset [33933] by davidb
Changed 8-spaces to tab chars in Makefile.in. Original problem caused by …

15.02.2020:

19:14 Changeset [33932] by davidb
Commented out Java version warning message, as it presents as something …
19:10 Changeset [33931] by davidb
Two changes to setup file. The first was to move the test for ant to be …
19:00 Changeset [33930] by davidb
Code used to assume that major number was a single digit, as in 1.6 or …
18:57 Changeset [33929] by davidb
Newer JDKs don't have javah => make file change that takes account of this
18:55 Changeset [33928] by davidb
Streamlining of how test for JDK/javac is done
14:57 Changeset [33927] by davidb
Reworking of javah test

14.02.2020:

23:03 Changeset [33926] by ak19
Investigated some other options for screen capturing and Google chrome …
20:41 Changeset [33925] by ak19
1. Bugfix: oversight, should return uri encoded URL for mapData, forgot to …
19:22 Changeset [33924] by ak19
Adding in Dr Bainbridge's command to check the JSON generated is valid. …
18:45 Changeset [33923] by davidb
Removed non-UTF8 valid char from comment; regenerated tar file
18:13 Changeset [33922] by davidb
Notes about using this site
18:11 Changeset [33921] by davidb
Newer Java versions don't have 'javah' any more. The functionality has been …
16:55 Changeset [33920] by davidb
Found to be needed when compiling up on a Google Compute Engine (GCE) …

13.02.2020:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to generate …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …

12.02.2020:

21:27 Changeset [33913] by ak19
1. Adjusted table mongodb query statements to be more exact, but same …
19:53 Changeset [33912] by ak19
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
19:12 Changeset [33911] by ak19
Correct commit message for previous and current commit: 1. After …
19:05 Changeset [33910] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
19:02 Changeset [33909] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …

10.02.2020:

09:41 Changeset [33908] by kjdon
meta values are already escaped. Don't want to escape them again otherwise …

05.02.2020:

23:38 Changeset [33907] by ak19
See previous commit message. This will be the file with the results for …
23:36 Changeset [33906] by ak19
Code is intermediate state. 1. Introduced basicDomain field to MongoDB and …
18:49 Changeset [33905] by ak19
More notes
18:48 Changeset [33904] by ak19
Shouldn't greylist anglican.org, as this prevented crawling of …

04.02.2020:

15:50 Changeset [33903] by ak19
My notes when preparing for today's meetings. Some of this may be useful …
13:05 Changeset [33902] by kjdon
pass in new casefold and accentfold options to format_metadata_for_sorting
13:04 Changeset [33901] by kjdon
new casefold_metadata_for_formatting and …
13:03 Changeset [33900] by kjdon
BaseClassifier casefold/accentfold options
13:03 Changeset [33899] by kjdon
pass in new casefold and accentfold options (BaseClassifier) to …
12:59 Changeset [33898] by kjdon
format_metadata_for_sorting now takes two additional args - casefold and …
10:06 Changeset [33897] by kjdon
elsewhere in the code - GSXML.xmlSafe, we are escaping ' => ' we need …

03.02.2020:

23:29 Changeset [33896] by ak19
Clarification in comments
23:20 Changeset [33895] by ak19
Minor rename
23:20 Changeset [33894] by ak19
1. Adding map, counts.json and geo-json files for 5b count of sites by …
22:41 Changeset [33893] by ak19
1. Left out region code column. 2. Two more sheets of work in progress to …
22:28 Changeset [33892] by ak19
Sheets renamed and spreadsheet renamed
22:27 Changeset [33891] by ak19
Site level detected vs manual inspected data: working shown in file …
20:31 Changeset [33890] by ak19
Finished going through NZ sites listing of numPagesContainingMRI > 0 and …
15:48 Changeset [33889] by ak19
1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of the …
13:08 Changeset [33888] by kjdon
added propertyFile attribute to gsf:interfaceText so that you can request …

31.01.2020:

23:49 Changeset [33887] by ak19
1. Added support for writing out tables in csv format too. 2. Second table …
23:17 Changeset [33886] by ak19
Minor. File rename
22:54 Changeset [33885] by ak19
Attempting to write the tables. csv not yet supported. Table 1 done.
22:21 Changeset [33884] by ak19
0. Previous commit had lots of modifications, and only 2 files matched the …
21:50 Changeset [33883] by ak19
Clarifications

30.01.2020:

22:54 Changeset [33882] by ak19
Code now writes both a listing of all non-autotranslated websites and a …
22:08 Changeset [33881] by ak19
Uses lambda expression to process each doc in a mongodb aggregate result. …
21:17 Changeset [33880] by ak19
Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
20:21 Changeset [33879] by ak19
Have the 2 mongodb aggregate() calls working that
20:18 Changeset [33878] by ak19
Better comment
20:07 Changeset [33877] by ak19
Reordering to have proper descending order of counts

29.01.2020:

21:48 Changeset [33876] by ak19
Some missteps, but have got complex collection.aggregate() working at …
19:18 Changeset [33875] by ak19
Renaming 2 more files correctly
19:15 Changeset [33874] by ak19
Renaming 2 files correctly

24.01.2020:

21:49 Changeset [33873] by ak19
Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
21:44 Changeset [33872] by ak19
1. Added the file containing the 255 random NZ page URLs to sample. 2. …
20:59 Changeset [33871] by ak19
Removed mostly duplicated older version of method but left the different …
20:48 Changeset [33870] by ak19
Got the mongodb query working in Java in 2 different ways: the fully Java …

23.01.2020:

22:59 Changeset [33869] by ak19
First cut at the RandomURLsForDomainGenerator.java class and the mongodb …
21:16 Changeset [33868] by ak19
With the updated code for generating the maps from 6a and 6b manual site …
21:12 Changeset [33867] by ak19
Moved the code handling of special case large rectangles and those that …
18:56 Changeset [33866] by ak19
Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
18:49 Changeset [33865] by ak19
1. The gs3 context name changed from macronizer to macron-restoration. 2. …
14:09 Changeset [33864] by davidb
Changes to make the Whakatohea banner narrower
11:32 Changeset [33863] by davidb
Script to get sample content for the DL collection
11:17 Changeset [33862] by davidb
Change to specifying the About page text done through about.xml so it can …
11:16 Changeset [33861] by davidb
About page text done through about.xml so it can include xslt tags
10:22 Changeset [33860] by davidb
Addition of 3 further CPAN packages, found to be needed on CentOS build
09:56 Changeset [33859] by davidb
Additional CPAN Perl packages found to be needed when compiling up …

22.01.2020:

19:31 Changeset [33858] by ak19
Fixes to the code committed yesterday: correct calculation of the …
16:49 Changeset [33857] by davidb
Next iteration of the about text
16:33 Changeset [33856] by ak19
Forgot to commit. Last week, Dr Bainbridge had properly cropped the SVG …
15:03 Changeset [33855] by davidb
Code added to detect if the CGI parameter already specifies a collection, …