Timeline



2020-02-21:

21:00 Changeset [33966] by ak19
Added the origSequence and basicDomain columns to the random 260 web …
20:59 Changeset [33965] by ak19
1. Adding a basicDomain column (stripped of http/https and www prefix) …
19:57 Changeset [33964] by ak19
2 records were missing a value for the qualityLevel column.

2020-02-20:

22:12 Changeset [33963] by ak19
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
22:07 Changeset [33962] by ak19
2 fields changed, as one was missed out and the other incorrectly …
20:24 Changeset [33961] by ak19
New category, LINK_TEXT, introduced for the random web page URL samples.
20:22 Changeset [33960] by ak19
Reviewed all the random sample web page URLs marked …
20:06 Changeset [33959] by ak19
URIEncoding the mapData makes it unparseable by geojson.io
19:32 Changeset [33958] by ak19
There were other xsl files using the original depositorTitleAndLink …
19:24 Changeset [33957] by ak19
1. depositor related interface display modified to work with recent …
18:28 Changeset [33956] by ak19
Related to commit 33953: made lots of accidental commits in rev 33953, …
18:26 Changeset [33955] by ak19
Undoing accidental commit of unintended files.
18:21 Changeset [33954] by ak19
Accidentally committed with other files. Undoing.
18:19 Changeset [33953] by ak19
Depositor link not used

2020-02-18:

23:35 Changeset [33952] by ak19
Minor changes for processing
23:33 Changeset [33951] by ak19
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
23:28 Changeset [33950] by ak19
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
23:22 Changeset [33949] by ak19
Reviewed the qualityLevel column where NAV was assigned.
22:56 Changeset [33948] by ak19
Reviewed the random sampled web page URLs marked as …
22:07 Changeset [33947] by ak19
Some more question-marked field values assigned.
21:58 Changeset [33946] by ak19
1. New function to handle user input assigning the newly introduced …
21:48 Changeset [33945] by ak19
Added a 4th column for all 260 sample web page URLs and have used the …
16:44 Changeset [33944] by ak19
Added the isReallyInMRI column after manually inspecting the remaining …
15:56 Changeset [33943] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:55 Changeset [33942] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:18 Changeset [33941] by ak19
1. Uppercase 3rd field (Y/N/? field) read back in from file before …

2020-02-17:

22:16 Changeset [33940] by ak19
1. In order to make it easier to do the manual work of inspecting 260 …
16:22 Changeset [33939] by ak19
1. Old random samples file doesn't apply as we're not sampling by …
16:10 Changeset [33938] by ak19
1. Don't regenerate random sample of web page urls and full web page …
16:06 Changeset [33937] by ak19
New counts of manual sites after reingesting into MongoDB. Forgot to …
16:05 Changeset [33936] by ak19
Renaming old file to place with new counts after reingesting into MongoDB.

2020-02-16:

18:16 Changeset [33935] by davidb
Additional check added into get-isis target
17:34 Changeset [33934] by davidb
Removal of static code block calling ancient/deprecated static …
14:19 Changeset [33933] by davidb
Changed 8-spaces to tab chars in Makefile.in. Original problem caused …

2020-02-15:

19:14 Changeset [33932] by davidb
Commented out Java version warning message, as it presents as …
19:10 Changeset [33931] by davidb
Two changes to setup file. The first was to move the test for ant to …
19:00 Changeset [33930] by davidb
Code used to assume that major number was a single digit, as in 1.6 or …
18:57 Changeset [33929] by davidb
Newer JDKs don't have javah => make file change that takes account of this
18:55 Changeset [33928] by davidb
Streamlining of how test for JDK/javac is done
14:57 Changeset [33927] by davidb
Reworking of javah test

2020-02-14:

23:03 Changeset [33926] by ak19
Investigated some other options for screen capturing and Google chrome …
20:41 Changeset [33925] by ak19
1. Bugfix: oversight, should return uri encoded URL for mapData, …
19:22 Changeset [33924] by ak19
Adding in Dr Bainbridge's command to check the JSON generated is …
18:45 Changeset [33923] by davidb
Removed non-UTF8 valid char from comment; regenerated tar file
18:13 Changeset [33922] by davidb
Notes about using this site
18:11 Changeset [33921] by davidb
Newer Java versions don't have 'javah' any more. The functionality has been …
16:55 Changeset [33920] by davidb
Found to be needed when compiling up on a Google Compute Engine (GCE) …

2020-02-13:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …

2020-02-12:

21:27 Changeset [33913] by ak19
1. Adjusted table mongodb query statements to be more exact, but same …
19:53 Changeset [33912] by ak19
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
19:12 Changeset [33911] by ak19
Correct commit message for previous and current commit: 1. After …
19:05 Changeset [33910] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
19:02 Changeset [33909] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …

2020-02-10:

09:41 Changeset [33908] by kjdon
meta values are already escaped. Don't want to escape them again …

2020-02-05:

23:38 Changeset [33907] by ak19
See previous commit message. This will be the file with the results …
23:36 Changeset [33906] by ak19
Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
18:49 Changeset [33905] by ak19
More notes
18:48 Changeset [33904] by ak19
Shouldn't greylist anglican.org, as this prevented crawling of …

2020-02-04:

15:50 Changeset [33903] by ak19
My notes when preparing for today's meetings. Some of this may be …
13:05 Changeset [33902] by kjdon
pass in new casefold and accentfold options to format_metadata_for_sorting
13:04 Changeset [33901] by kjdon
new casefold_metadata_for_formatting and …
13:03 Changeset [33900] by kjdon
BaseClassifier casefold/accentfold options
13:03 Changeset [33899] by kjdon
pass in new casefold and accentfold options (BaseClassifier) to …
12:59 Changeset [33898] by kjdon
format_metadata_for_sorting now takes two additional args - casefold …
10:06 Changeset [33897] by kjdon
elsewhere in the code - GSXML.xmlSafe, we are escaping ' => ' we …

2020-02-03:

23:29 Changeset [33896] by ak19
Clarification in comments
23:20 Changeset [33895] by ak19
Minor rename
23:20 Changeset [33894] by ak19
1. Adding map, counts.json and geo-json files for 5b count of sites by …
22:41 Changeset [33893] by ak19
1. Left out region code column. 2. Two more sheets of work in progress …
22:28 Changeset [33892] by ak19
Sheets renamed and spreadsheet renamed
22:27 Changeset [33891] by ak19
Site level detected vs manual inspected data: working shown in file …
20:31 Changeset [33890] by ak19
Finished going through NZ sites listing of numPagesContainingMRI > 0 …
15:48 Changeset [33889] by ak19
1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
13:08 Changeset [33888] by kjdon
added propertyFile attribute to gsf:interfaceText so that you can …

2020-01-31:

23:49 Changeset [33887] by ak19
1. Added support for writing out tables in csv format too. 2. Second …
23:17 Changeset [33886] by ak19
Minor. File rename
22:54 Changeset [33885] by ak19
Attempting to write the tables. csv not yet supported. Table 1 done.
22:21 Changeset [33884] by ak19
0. Previous commit had lots of modifications, and only 2 files matched …
21:50 Changeset [33883] by ak19
Clarifications

2020-01-30:

22:54 Changeset [33882] by ak19
Code now writes both a listing of all non-autotranslated websites and …
22:08 Changeset [33881] by ak19
Uses lambda expression to process each doc in a mongodb aggregate …
21:17 Changeset [33880] by ak19
Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
20:21 Changeset [33879] by ak19
Have the 2 mongodb aggregate() calls working that
20:18 Changeset [33878] by ak19
Better comment
20:07 Changeset [33877] by ak19
Reordering to have proper descending order of counts

2020-01-29:

21:48 Changeset [33876] by ak19
Some missteps, but have got complex collection.aggregate() working at last.
19:18 Changeset [33875] by ak19
Renaming 2 more files correctly
19:15 Changeset [33874] by ak19
Renaming 2 files correctly

2020-01-24:

21:49 Changeset [33873] by ak19
Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
21:44 Changeset [33872] by ak19
1. Added the file containing the 255 random NZ page URLs to sample. 2. …
20:59 Changeset [33871] by ak19
Removed mostly duplicated older version of method but left the …
20:48 Changeset [33870] by ak19
Got the mongodb query working in Java in 2 different ways: the fully …