Timeline



2020-02-13:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …

2020-02-12:

21:27 Changeset [33913] by ak19
1. Adjusted table mongodb query statements to be more exact, but same …
19:53 Changeset [33912] by ak19
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
19:12 Changeset [33911] by ak19
Correct commit message for previous and current commit: 1. After …
19:05 Changeset [33910] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
19:02 Changeset [33909] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …

2020-02-10:

09:41 Changeset [33908] by kjdon
meta values are already escaped. Don't want to escape them again …

2020-02-05:

23:38 Changeset [33907] by ak19
See previous commit message. This will be the file with the results …
23:36 Changeset [33906] by ak19
Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
18:49 Changeset [33905] by ak19
More notes
18:48 Changeset [33904] by ak19
Shouldn't greylist anglican.org, as this prevented crawling of …

2020-02-04:

15:50 Changeset [33903] by ak19
My notes when preparing for today's meetings. Some of this may be …
13:05 Changeset [33902] by kjdon
pass in new casefold and accentfold options to format_metadata_for_sorting
13:04 Changeset [33901] by kjdon
new casefold_metadata_for_formatting and …
13:03 Changeset [33900] by kjdon
BaseClassifier casefold/accentfold options
13:03 Changeset [33899] by kjdon
pass in new casefold and accentfold options (BaseClassifier) to …
12:59 Changeset [33898] by kjdon
format_metadata_for_sorting now takes two additional args - casefold …
10:06 Changeset [33897] by kjdon
elsewhere in the code - GSXML.xmlSafe, we are escaping ' => ' we …

2020-02-03:

23:29 Changeset [33896] by ak19
Clarification in comments
23:20 Changeset [33895] by ak19
Minor rename
23:20 Changeset [33894] by ak19
1. Adding map, counts.json and geo-json files for 5b count of sites by …
22:41 Changeset [33893] by ak19
1. Left out region code column. 2. Two more sheets of work in progress …
22:28 Changeset [33892] by ak19
Sheets renamed and spreadsheet renamed
22:27 Changeset [33891] by ak19
Site level detected vs manual inspected data: working shown in file …
20:31 Changeset [33890] by ak19
Finished going through NZ sites listing of numPagesContainingMRI > 0 …
15:48 Changeset [33889] by ak19
1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
13:08 Changeset [33888] by kjdon
added propertyFile attribute to gsf:interfaceText so that you can …

2020-01-31:

23:49 Changeset [33887] by ak19
1. Added support for writing out tables in csv format too. 2. Second …
23:17 Changeset [33886] by ak19
Minor. File rename
22:54 Changeset [33885] by ak19
Attempting to write the tables. csv not yet supported. Table 1 done.
22:21 Changeset [33884] by ak19
0. Previous commit had lots of modifications, and only 2 files matched …
21:50 Changeset [33883] by ak19
Clarifications

2020-01-30:

22:54 Changeset [33882] by ak19
Code now writes both a listing of all non-autotranslated websites and …
22:08 Changeset [33881] by ak19
Uses lambda expression to process each doc in a mongodb aggregate …
21:17 Changeset [33880] by ak19
Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
20:21 Changeset [33879] by ak19
Have the 2 mongodb aggregate() calls working that
20:18 Changeset [33878] by ak19
Better comment
20:07 Changeset [33877] by ak19
Reordering to have proper descending order of counts

2020-01-29:

21:48 Changeset [33876] by ak19
Some missteps, but have got complex collection.aggregate() working at last.
19:18 Changeset [33875] by ak19
Renaming 2 more files correctly
19:15 Changeset [33874] by ak19
Renaming 2 files correctly

2020-01-24:

21:49 Changeset [33873] by ak19
Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
21:44 Changeset [33872] by ak19
1. Added the file containing the 255 random NZ page URLs to sample. 2. …
20:59 Changeset [33871] by ak19
Removed mostly duplicated older version of method but left the …
20:48 Changeset [33870] by ak19
Got the mongodb query working in Java in 2 different ways: the fully …

2020-01-23:

22:59 Changeset [33869] by ak19
First cut at the RandomURLsForDomainGenerator.java class and the …
21:16 Changeset [33868] by ak19
With the updated code for generating the maps from 6a and 6b manual …
21:12 Changeset [33867] by ak19
Moved the code handling of special case large rectangles and those …
18:56 Changeset [33866] by ak19
Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
18:49 Changeset [33865] by ak19
1. The gs3 context name changed from macronizer to macron-restoration. …
14:09 Changeset [33864] by davidb
Changes to make the Whakatohea banner narrower
11:32 Changeset [33863] by davidb
Script to get sample content for the DL collection
11:17 Changeset [33862] by davidb
Change to specifying the About page text done through about.xml so it …
11:16 Changeset [33861] by davidb
About page text done through about.xml so it can include xslt tags
10:22 Changeset [33860] by davidb
Addition of 3 further CPAN packages, found to be needed on CentOS build
09:56 Changeset [33859] by davidb
Additional CPAN Perl packages found to be needed when compiling up …

2020-01-22:

19:31 Changeset [33858] by ak19
Fixes to the code committed yesterday: correct calculation of the …
16:49 Changeset [33857] by davidb
Next iteration of the about text
16:33 Changeset [33856] by ak19
Forgot to commit. Last week, Dr Bainbridge had properly cropped the …
15:03 Changeset [33855] by davidb
Code added to detect if the CGI parameter already specifies a …

2020-01-21:

22:01 Changeset [33854] by ak19
Manually gone over around 150 webpages of sample size of 255 webpages …
21:58 Changeset [33853] by ak19
Handling map coordinates that are horizontally excessive (beyond …
13:37 Changeset [33852] by davidb
Unused. XSL filename extension potentially causing a problem with how …

2020-01-17:

22:38 Changeset [33851] by ak19
Deleting faulty maps. NZ numPages inMRI and containingMRI count is …
22:38 Changeset [33850] by ak19
Renames before deleting faulty maps. NZ numPages inMRI and …
22:22 Changeset [33849] by ak19
One less Australian site as it was an infographic containing Maori …
22:21 Changeset [33848] by ak19
Tables of mongodb counts (1-5 table) and manual counts (6table). …
19:32 Changeset [33847] by ak19
indigenousblogs.com did have one page actually in Maori (an XML feed). …
16:49 Changeset [33846] by ak19
Cropped out the json portion
16:34 Changeset [33845] by ak19
Cropped out the json portion
16:33 Changeset [33844] by ak19
Regenerated
16:24 Changeset [33843] by ak19
Counting the 3 non-NZ sites that had mi in the URL path that manual …

2020-01-16:

22:30 Changeset [33842] by ak19
Jotted down some further paragraphs and notes of interest. Tentatively …
21:23 Changeset [33841] by ak19
Latest version of the flowchart of the process of getting Common Crawl …
21:22 Changeset [33840] by ak19
Older flowchart of the process of getting Common Crawl data into …
21:18 Changeset [33839] by ak19
Moving writeup text file into new folder so I can add the SVG …
17:56 Changeset [33838] by ak19
Updated after checking non-NZ and non-nz TLD sites with mi in URL path
12:15 Changeset [33837] by davidb
Local notes for the site

2020-01-15:

10:14 Changeset [33836] by davidb
Macron added
10:12 Changeset [33835] by davidb
Supporting iframe files now located within interface area
10:08 Changeset [33834] by davidb
Metadata shell ready for download of demonstration source content files
10:06 Changeset [33833] by davidb
Initial collection design
10:03 Changeset [33832] by davidb
Initial set of files for Whakatohea collections
10:00 Changeset [33831] by davidb
Top-level folder for Whakatohea Maori Trust Board collections
09:58 Changeset [33830] by davidb
Initial set of files for WMTB themed DL
09:56 Changeset [33829] by davidb
Top-level folder for Whakatohea Maori Trust Board themes DL

2020-01-14:

22:09 Changeset [33828] by ak19
Additions and modifications to the write-up.
14:34 Changeset [33827] by davidb
Updated text about groupConfig.xml file
13:57 Changeset [33826] by davidb
Fix to help compiling on CentOS