source: other-projects

Rev Age Author Log Message
@34524   4 weeks ak19 Correct Mac OS name in log file being uploaded
@34523   4 weeks ak19 Minor. After testing on new release-kit mac.
@34520   5 weeks Jeremy Symon need to use ed25519 key on www-internal
@34519   5 weeks Jeremy Symon adding in code to upload to www-internal. Needs a new ed25519 identity …
@34518   5 weeks Jeremy Symon use a different identity file for www-internal - needs to be ed25519, …
@34515   6 weeks ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Forgot to svn up …
@34514   6 weeks ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Forgot to svn up …
@34513   6 weeks ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after …
@34512   6 weeks ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after …
@34418   2 months ak19 Attempted to upload diffcol report to wwwinternal instead of wwwdev. …
@34417   2 months ak19 Updates to diffcol to handle change introduced in commit 34394, which …
@34416   2 months ak19 Committing rebuilt model collections after new doc.xml meta …
@34231   5 months ak19 Rebuilding diffcol model collection Multimedia after recent update to …
@34127   6 months ak19 Spelling correction in filename: screeMshot to screeNshot
@34120   7 months ak19 CSV version of .ods file, so openoffice isn't required
@34119   7 months ak19 Committing the auto-generated analysis results folder, …
@34097   9 months ak19 Open office version of similarly named spreadsheet, just with columns …
@34089   9 months ak19 So far accumulated URLs to docs on Google scholar about or somewhat …
@34011   9 months ak19 Piechart data for sites prepared for crawling and the piecharts for these
@34007   9 months ak19 Prepared more data for the piecharts. This time for empty web pages vs …
@34006   9 months ak19 Committing more data I've collected for generating pie charts and the …
@34005   9 months ak19 InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead …
@34004   9 months ak19 Renaming csv file to have csv extension
@34003   9 months ak19 Redid the file with info on empty URL web pages as a csv file with …
@34001   9 months ak19 Tentative total urls from common crawl 12 month crawl data.
@34000   9 months ak19 Some debugging and other minor changes
@33999   9 months ak19 Common crawl 12 month urls and CC provided stats
@33988   9 months ak19 1. Print out which web pages of which web site's dump.txt were empty. …
@33987   9 months ak19 Output of re-running NutchTextDumpToMongoDB to print out which web …
@33986   9 months ak19 Dr Bainbridge investigated the original data set more
@33985   9 months ak19 Data to back the piechart I need to make that will illustrate how we …
@33984   9 months ak19 Simple class to summarise some basic counts of the input common crawl data
@33983   9 months ak19 More sensible name for method which had too long kept its old name …
@33982   9 months ak19 SummaryTool.java now processed the handcrafted UNIQUE domains counts …
@33981   9 months ak19 As Dr Bainbridge suggested, code now opens a new firefox tab with a …
@33980   9 months ak19 Additional comments
@33979   9 months ak19 Clearly stating that counts are of unique domains
@33978   9 months ak19 Opens all geoJSON maps in new tabs instead of waiting for user to have …
@33977   9 months ak19 Added something on precision vs recall being applicable to our …
@33976   9 months ak19 Adding in what I could remember of Dr Bainbridge's statement about the …
@33966   10 months ak19 Added the origSequence and basicDomain columns to the random 260 web …
@33965   10 months ak19 1. Adding a basicDomain column (stripped of http/https and www prefix) …
@33964   10 months ak19 2 records were missing a value for the qualityLevel column.
@33963   10 months ak19 Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
@33962   10 months ak19 2 fields changed, as one was missed out and the other incorrectly …
@33961   10 months ak19 New category, LINK_TEXT, introduced for the random web page URL samples.
@33960   10 months ak19 Reviewed all the random sample web page URLs marked …
@33959   10 months ak19 URIEncoding the mapData makes it unparseable by geojson.io
@33952   10 months ak19 Minor changes for processing
@33951   10 months ak19 Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
@33950   10 months ak19 Reviewed the qualityLevel column where MIXED_TEXT was assigned.
@33949   10 months ak19 Reviewed the qualityLevel column where NAV was assigned.
@33948   10 months ak19 Reviewed the random sampled web page URLs marked as …
@33947   10 months ak19 Some more questionmarked field values assigned.
@33946   10 months ak19 1. New function to handle user input assigning the newly introduced …
@33945   10 months ak19 Added a 4th column for all 260 sample web page URLs and have used the …
@33944   10 months ak19 Added the isReallyInMRI column after manually inspecting the remaining …
@33941   10 months ak19 1. Uppercase 3rd field (Y/N/? field) read back in from file before …
@33940   10 months ak19 1. In order to make it easier to do the manual work of inspecting 260 …
@33939   10 months ak19 1. Old random samples file doesn't apply as we're not sampling by …
@33938   10 months ak19 1. Don't regenerate random sample of web page urls and full web page …
@33937   10 months ak19 New counts of manual sites after reingesting into MongoDB. Forgot to …
@33936   10 months ak19 Renaming old file to place with new counts after reingesting into MongoDB.
@33926   10 months ak19 Investigated some other options for screen capturing and Google chrome …
@33925   10 months ak19 1. Bugfix: oversight, should return uri encoded URL for mapData, …
@33924   10 months ak19 Adding in Dr Bainbridge's command to check the JSON generated is …
@33919   10 months ak19 SummaryTool now uses the CountryCodeCountsMapData.java class to …
@33918   10 months ak19 Country codes added to each domain's URL of the manual site/domain …
@33917   10 months ak19 Added some better reporting when confirming sample size was correct
@33916   10 months ak19 Updated the rest of the file after reingest
@33915   10 months ak19 Forgot to add a (manual) counts file created last week, and am now …
@33914   10 months ak19 Shortlisted just the domain sites by country into ManualShortlist2.txt …
@33913   10 months ak19 1. Adjusted table mongodb query statements to be more exact, but same …
@33912   10 months ak19 Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
@33911   10 months ak19 Correct commit message for previous and current commit: 1. After …
@33910   10 months ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
@33909   10 months ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
@33907   10 months ak19 See previous commit message. This will be the file with the results …
@33906   10 months ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
@33905   10 months ak19 More notes
@33904   10 months ak19 Shouldn't greylist anglican.org, as this prevented crawling of …
@33903   10 months ak19 My notes when preparing for today's meetings. Some of this may be …
@33896   10 months ak19 Clarification in comments
@33895   10 months ak19 Minor rename
@33894   10 months ak19 1. Adding map, counts.json and geo-json files for 5b count of sites by …
@33893   10 months ak19 1. Left out region code column. 2. Two more sheets of work in progress …
@33892   10 months ak19 Sheets renamed and spreadsheet renamed
@33891   10 months ak19 Site level detected vs manual inspected data: working shown in file …
@33890   10 months ak19 Finished going through NZ sites listing of numPagesContainingMRI > 0 …
@33889   10 months ak19 1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
@33887   10 months ak19 1. Added support for writing out tables in csv format too. 2. Second …
@33886   10 months ak19 Minor. File rename
@33885   10 months ak19 Attempting to write the tables. csv not yet supported. Table 1 done.
@33884   10 months ak19 0. Previous commit had lots of modifications, and only 2 files matched …
@33883   10 months ak19 Clarifications
@33882   10 months ak19 Code now writes both a listing of all non-autotranslated websites and …
@33881   10 months ak19 Uses lambda expression to process each doc in a mongodb aggregate …
@33880   10 months ak19 Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
@33879   10 months ak19 Have the 2 mongodb aggregate() calls working that
@33878   10 months ak19 Better comment