root/other-projects

Rev Chgset Date Author Log Message
(edit) @34231 [34231] 2 weeks ak19 Rebuilding diffcol model collection Multimedia after recent update to …
(edit) @34127 [34127] 7 weeks ak19 Spelling correction in filename: screeMshot to screeNshot
(edit) @34120 [34120] 8 weeks ak19 CSV version of .ods file, so openoffice isn't required
(edit) @34119 [34119] 8 weeks ak19 Committing the auto-generated analysis results folder, mongodb-data-auto. …
(edit) @34097 [34097] 4 months ak19 Open office version of similarly named spreadsheet, just with columns …
(edit) @34089 [34089] 4 months ak19 So far accumulated URLs to docs on Google scholar about or somewhat …
(edit) @34011 [34011] 4 months ak19 Piechart data for sites prepared for crawling and the piecharts for these
(edit) @34007 [34007] 4 months ak19 Prepared more data for the piecharts. This time for empty web pages vs …
(edit) @34006 [34006] 4 months ak19 Committing more data I've collected for generating pie charts and the …
(edit) @34005 [34005] 4 months ak19 InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead of …
(edit) @34004 [34004] 4 months ak19 Renaming csv file to have csv extension
(edit) @34003 [34003] 4 months ak19 Redid the file with info on empty URL web pages as a csv file with more …
(edit) @34001 [34001] 4 months ak19 Tentative total urls from common crawl 12 month crawl data.
(edit) @34000 [34000] 4 months ak19 Some debugging and other minor changes
(edit) @33999 [33999] 4 months ak19 Common crawl 12 month urls and CC provided stats
(edit) @33988 [33988] 5 months ak19 1. Print out which web pages of which web site's dump.txt were empty. Then …
(edit) @33987 [33987] 5 months ak19 Output of re-running NutchTextDumpToMongoDB to print out which web pages …
(edit) @33986 [33986] 5 months ak19 Dr Bainbridge investigated the original data set more
(edit) @33985 [33985] 5 months ak19 Data to back the piechart I need to make that will illustrate how we …
(edit) @33984 [33984] 5 months ak19 Simple class to summarise some basic counts of the input common crawl data
(edit) @33983 [33983] 5 months ak19 More sensible name for method which had too long kept its old name from …
(edit) @33982 [33982] 5 months ak19 SummaryTool.java now processed the handcrafted UNIQUE domains counts file …
(edit) @33981 [33981] 5 months ak19 As Dr Bainbridge suggested, code now opens a new firefox tab with a …
(edit) @33980 [33980] 5 months ak19 Additional comments
(edit) @33979 [33979] 5 months ak19 Clearly stating that counts are of unique domains
(edit) @33978 [33978] 5 months ak19 Opens all geoJSON maps in new tabs instead of waiting for user to have …
(edit) @33977 [33977] 5 months ak19 Added something on precision vs recall being applicable to our sampling …
(edit) @33976 [33976] 5 months ak19 Adding in what I could remember of Dr Bainbridge's statement about the …
(edit) @33966 [33966] 5 months ak19 Added the origSequence and basicDomain columns to the random 260 web page …
(edit) @33965 [33965] 5 months ak19 1. Adding a basicDomain column (stripped of http/https and www prefix) for …
(edit) @33964 [33964] 5 months ak19 2 records were missing a value for the qualityLevel column.
(edit) @33963 [33963] 5 months ak19 Added a new helper method to MongoDBQueryer.java to add numPagesInMRI and …
(edit) @33962 [33962] 5 months ak19 2 fields changed, as one was missed out and the other incorrectly entered. …
(edit) @33961 [33961] 5 months ak19 New category, LINK_TEXT, introduced for the random web page URL samples.
(edit) @33960 [33960] 5 months ak19 Reviewed all the random sample web page URLs marked SINGLE_MRI_SENTENCE …
(edit) @33959 [33959] 5 months ak19 URIEncoding the mapData makes it unparseable by geojson.io
(edit) @33952 [33952] 5 months ak19 Minor changes for processing
(edit) @33951 [33951] 5 months ak19 Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
(edit) @33950 [33950] 5 months ak19 Reviewed the qualityLevel column where MIXED_TEXT was assigned.
(edit) @33949 [33949] 5 months ak19 Reviewed the qualityLevel column where NAV was assigned.
(edit) @33948 [33948] 5 months ak19 Reviewed the random sampled web page URLs marked as SIGNIFICANTLY_MAORI …
(edit) @33947 [33947] 5 months ak19 Some more questionmarked field values assigned.
(edit) @33946 [33946] 5 months ak19 1. New function to handle user input assigning the newly introduced 4th …
(edit) @33945 [33945] 5 months ak19 Added a 4th column for all 260 sample web page URLs and have used the …
(edit) @33944 [33944] 5 months ak19 Added the isReallyInMRI column after manually inspecting the remaining 70 …
(edit) @33941 [33941] 5 months ak19 1. Uppercase 3rd field (Y/N/? field) read back in from file before being …
(edit) @33940 [33940] 5 months ak19 1. In order to make it easier to do the manual work of inspecting 260 web …
(edit) @33939 [33939] 5 months ak19 1. Old random samples file doesn't apply as we're not sampling by country …
(edit) @33938 [33938] 5 months ak19 1. Don't regenerate random sample of web page urls and full web page url …
(edit) @33937 [33937] 5 months ak19 New counts of manual sites after reingesting into MongoDB. Forgot to …
(edit) @33936 [33936] 5 months ak19 Renaming old file to place with new counts after reingesting into MongoDB.
(edit) @33926 [33926] 5 months ak19 Investigated some other options for screen capturing and Google chrome …
(edit) @33925 [33925] 5 months ak19 1. Bugfix: oversight, should return uri encoded URL for mapData, forgot to …
(edit) @33924 [33924] 5 months ak19 Adding in Dr Bainbridge's command to check the JSON generated is valid. …
(edit) @33919 [33919] 5 months ak19 SummaryTool now uses the CountryCodeCountsMapData.java class to generate …
(edit) @33918 [33918] 5 months ak19 Country codes added to each domain's URL of the manual site/domain …
(edit) @33917 [33917] 5 months ak19 Added some better reporting when confirming sample size was correct
(edit) @33916 [33916] 5 months ak19 Updated the rest of the file after reingest
(edit) @33915 [33915] 5 months ak19 Forgot to add a (manual) counts file created last week, and am now …
(edit) @33914 [33914] 5 months ak19 Shortlisted just the domain sites by country into ManualShortlist2.txt …
(edit) @33913 [33913] 5 months ak19 1. Adjusted table mongodb query statements to be more exact, but same …
(edit) @33912 [33912] 5 months ak19 Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
(edit) @33911 [33911] 5 months ak19 Correct commit message for previous and current commit: 1. After …
(edit) @33910 [33910] 5 months ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33909 [33909] 5 months ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33907 [33907] 5 months ak19 See previous commit message. This will be the file with the results for …
(edit) @33906 [33906] 5 months ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB and …
(edit) @33905 [33905] 5 months ak19 More notes
(edit) @33904 [33904] 5 months ak19 Shouldn't greylist anglican.org, as this prevented crawling of …
(edit) @33903 [33903] 5 months ak19 My notes when preparing for today's meetings. Some of this may be useful …
(edit) @33896 [33896] 5 months ak19 Clarification in comments
(edit) @33895 [33895] 5 months ak19 Minor rename
(edit) @33894 [33894] 5 months ak19 1. Adding map, counts.json and geo-json files for 5b count of sites by …
(edit) @33893 [33893] 5 months ak19 1. Left out region code column. 2. Two more sheets of work in progress to …
(edit) @33892 [33892] 5 months ak19 Sheets renamed and spreadsheet renamed
(edit) @33891 [33891] 5 months ak19 Site level detected vs manual inspected data: working shown in file …
(edit) @33890 [33890] 5 months ak19 Finished going through NZ sites listing of numPagesContainingMRI > 0 and …
(edit) @33889 [33889] 5 months ak19 1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of the …
(edit) @33887 [33887] 6 months ak19 1. Added support for writing out tables in csv format too. 2. Second table …
(edit) @33886 [33886] 6 months ak19 Minor. File rename
(edit) @33885 [33885] 6 months ak19 Attempting to write the tables. csv not yet supported. Table 1 done.
(edit) @33884 [33884] 6 months ak19 0. Previous commit had lots of modifications, and only 2 files matched the …
(edit) @33883 [33883] 6 months ak19 Clarifications
(edit) @33882 [33882] 6 months ak19 Code now writes both a listing of all non-autotranslated websites and a …
(edit) @33881 [33881] 6 months ak19 Uses lambda expression to process each doc in a mongodb aggregate result. …
(edit) @33880 [33880] 6 months ak19 Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
(edit) @33879 [33879] 6 months ak19 Have the 2 mongodb aggregate() calls working that
(edit) @33878 [33878] 6 months ak19 Better comment
(edit) @33877 [33877] 6 months ak19 Reordering to have proper descending order of counts
(edit) @33876 [33876] 6 months ak19 Some missteps, but have got complex collection.aggregate() working at …
(edit) @33875 [33875] 6 months ak19 Renaming 2 more files correctly
(edit) @33874 [33874] 6 months ak19 Renaming 2 files correctly
(edit) @33873 [33873] 6 months ak19 Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
(edit) @33872 [33872] 6 months ak19 1. Added the file containing the 255 random NZ page URLs to sample. 2. …
(edit) @33871 [33871] 6 months ak19 Removed mostly duplicated older version of method but left the different …
(edit) @33870 [33870] 6 months ak19 Got the mongodb query working in Java in 2 different ways: the fully Java …
(edit) @33869 [33869] 6 months ak19 First cut at the RandomURLsForDomainGenerator.java class and the mongodb …
(edit) @33868 [33868] 6 months ak19 With the updated code for generating the maps from 6a and 6b manual site …
(edit) @33867 [33867] 6 months ak19 Moved the code handling of special case large rectangles and those that …
(edit) @33866 [33866] 6 months ak19 Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …