Diff Rev Age Author Log Message
(edit) @33920   4 years davidb Found to be needed when compiling up on a Google Compute Engine (GCE) …
(edit) @33919   4 years ak19 SummaryTool now uses the CountryCodeCountsMapData.java class to …
(edit) @33918   4 years ak19 Country codes added to each domain's URL of the manual site/domain …
(edit) @33917   4 years ak19 Added some better reporting when confirming sample size was correct
(edit) @33916   4 years ak19 Updated the rest of the file after reingest
(edit) @33915   4 years ak19 Forgot to add a (manual) counts file created last week, and am now …
(edit) @33914   4 years ak19 Shortlisted just the domain sites by country into ManualShortlist2.txt …
(edit) @33913   4 years ak19 1. Adjusted table mongodb query statements to be more exact, but same …
(edit) @33912   4 years ak19 Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
(edit) @33911   4 years ak19 Correct commit message for previous and current commit: 1. After …
(edit) @33910   4 years ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33909   4 years ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33908   4 years kjdon meta values are already escaped. Don't want to escape them again …
(edit) @33907   4 years ak19 See previous commit message. This will be the file with the results …
(edit) @33906   4 years ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
(edit) @33905   4 years ak19 More notes
(edit) @33904   4 years ak19 Shouldn't greylist anglican.org, as this prevented crawling of …
(edit) @33903   4 years ak19 My notes when preparing for today's meetings. Some of this may be …
(edit) @33902   4 years kjdon pass in new casefold and accentfold options to format_metadata_for_sorting
(edit) @33901   4 years kjdon new casefold_metadata_for_formatting and …
(edit) @33900   4 years kjdon BaseClassifier casefold/accentfold options
(edit) @33899   4 years kjdon pass in new casefold and accentfold options (BaseClassifier) to …
(edit) @33898   4 years kjdon format_metadata_for_sorting now takes two additional args - casefold …
(edit) @33897   4 years kjdon elsewhere in the code - GSXML.xmlSafe, we are escaping ' => ' we …
(edit) @33896   4 years ak19 Clarification in comments
(edit) @33895   4 years ak19 Minor rename
(edit) @33894   4 years ak19 1. Adding map, counts.json and geo-json files for 5b count of sites by …
(edit) @33893   4 years ak19 1. Left out region code column. 2. Two more sheets of work in progress …
(edit) @33892   4 years ak19 Sheets renamed and spreadsheet renamed
(edit) @33891   4 years ak19 Site level detected vs manual inspected data: working shown in file …
(edit) @33890   4 years ak19 Finished going through NZ sites listing of numPagesContainingMRI > 0 …
(edit) @33889   4 years ak19 1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
(edit) @33888   4 years kjdon added propertyFile attribute to gsf:interfaceText so that you can …
(edit) @33887   4 years ak19 1. Added support for writing out tables in csv format too. 2. Second …
(edit) @33886   4 years ak19 Minor. File rename
(edit) @33885   4 years ak19 Attempting to write the tables. csv not yet supported. Table 1 done.
(edit) @33884   4 years ak19 0. Previous commit had lots of modifications, and only 2 files matched …
(edit) @33883   4 years ak19 Clarifications
(edit) @33882   4 years ak19 Code now writes both a listing of all non-autotranslated websites and …
(edit) @33881   4 years ak19 Uses lambda expression to process each doc in a mongodb aggregate …
(edit) @33880   4 years ak19 Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
(edit) @33879   4 years ak19 Have the 2 mongodb aggregate() calls working that
(edit) @33878   4 years ak19 Better comment
(edit) @33877   4 years ak19 Reordering to have proper descending order of counts
(edit) @33876   4 years ak19 Some missteps, but have got complex collection.aggregate() working at last.
(edit) @33875   4 years ak19 Renaming 2 more files correctly
(edit) @33874   4 years ak19 Renaming 2 files correctly
(edit) @33873   4 years ak19 Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
(edit) @33872   4 years ak19 1. Added the file containing the 255 random NZ page URLs to sample. 2. …
(edit) @33871   4 years ak19 Removed mostly duplicated older version of method but left the …
(edit) @33870   4 years ak19 Got the mongodb query working in Java in 2 different ways: the fully …
(edit) @33869   4 years ak19 First cut at the RandomURLsForDomainGenerator.java class and the …
(edit) @33868   4 years ak19 With the updated code for generating the maps from 6a and 6b manual …
(edit) @33867   4 years ak19 Moved the code handling of special case large rectangles and those …
(edit) @33866   4 years ak19 Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
(edit) @33865   4 years ak19 1. The gs3 context name changed from macronizer to macron-restoration. …
(edit) @33864   4 years davidb Changes to make the Whakatohea banner narrower
(edit) @33863   4 years davidb Script to get sample content for the DL collection
(edit) @33862   4 years davidb Change to specifying the About page text done through about.xml so it …
(edit) @33861   4 years davidb About page text done through about.xml so it can include xslt tags
(edit) @33860   4 years davidb Addition of 3 further CPAN packages, found to be needed on CentOS build
(edit) @33859   4 years davidb Additional CPAN Perl packages found to be needed when compiling up …
(edit) @33858   4 years ak19 Fixes to the code committed yesterday: correct calculation of the …
(edit) @33857   4 years davidb Next iteration of the about text
(edit) @33856   4 years ak19 Forgot to commit. Last week, Dr Bainbridge had properly cropped the …
(edit) @33855   4 years davidb Code added to detect if the CGI parameter already specifies a …
(edit) @33854   4 years ak19 Manually gone over around 150 webpages of sample size of 255 webpages …
(edit) @33853   4 years ak19 Handling map coordinates that are horizontally excessive (beyond …
(edit) @33852   4 years davidb Unused. XSL filename extension potentially causing a problem with how …
(edit) @33851   4 years ak19 Deleting faulty maps. NZ numPages inMRI and containingMRI count is …
(edit) @33850   4 years ak19 Renames before deleting faulty maps. NZ numPages inMRI and …
(edit) @33849   4 years ak19 One less Australian site as it was an infographic containing Maori …
(edit) @33848   4 years ak19 Tables of mongodb counts (1-5 table) and manual counts (6table). …
(edit) @33847   4 years ak19 indigenousblogs.com did have one page actually in Maori (an XML feed). …
(edit) @33846   4 years ak19 Cropped out the json portion
(edit) @33845   4 years ak19 Cropped out the json portion
(edit) @33844   4 years ak19 Regenerated
(edit) @33843   4 years ak19 Counting the 3 non-NZ sites that had mi in the URL path that manual …
(edit) @33842   4 years ak19 Jotted down some further paragraphs and notes of interest. Tentatively …
(edit) @33841   4 years ak19 Latest version of the flowchart of the process of getting Common Crawl …
(edit) @33840   4 years ak19 Older flowchart of the process of getting Common Crawl data into …
(edit) @33839   4 years ak19 Moving writeup text file into new folder so I can add the SVG …
(edit) @33838   4 years ak19 Updated after checking non-NZ and non-nz TLD sites with mi in URL path
(edit) @33837   4 years davidb Local notes for the site
(edit) @33836   4 years davidb Macron added
(edit) @33835   4 years davidb Supporting iframe files now located within interface area
(edit) @33834   4 years davidb Metadata shell ready for download of demonstration source content files
(edit) @33833   4 years davidb Initial collection design
(edit) @33832   4 years davidb Initial set of files for Whakatohea collections
(edit) @33831   4 years davidb Top-level folder for Whakatohea Maori Trust Board collections
(edit) @33830   4 years davidb Initial set of files for WMTB themed DL
(edit) @33829   4 years davidb Top-level folder for Whakatohea Maori Trust Board themes DL
(edit) @33828   4 years ak19 Additions and modifications to the write-up.
(edit) @33827   4 years davidb Updated text about groupConfig.xml file
(edit) @33826   4 years davidb Fix to help compiling on CentOS
(edit) @33825   4 years ak19 Beginnings of first draft of write up.
(edit) @33824   4 years ak19 More instructions and explaining the contents of the mongodb-data folder.
(edit) @33823   4 years ak19 Recommitting mongo-data folder with renamed files with numbering.
(edit) @33822   4 years ak19 Removing as I'm renaming all the files with prefixes. There are too …
(edit) @33821   4 years ak19 Manually created a shortlist of MRI sites from longer …