root/other-projects

Rev Chgset Date Author Log Message
(edit) @33926 [33926] 6 days ak19 Investigated some other options for screen capturing and Google chrome …
(edit) @33925 [33925] 6 days ak19 1. Bugfix: oversight, should return uri encoded URL for mapData, forgot to …
(edit) @33924 [33924] 6 days ak19 Adding in Dr Bainbridge's command to check the JSON generated is valid. …
(edit) @33919 [33919] 7 days ak19 SummaryTool now uses the CountryCodeCountsMapData.java class to generate …
(edit) @33918 [33918] 7 days ak19 Country codes added to each domain's URL of the manual site/domain …
(edit) @33917 [33917] 7 days ak19 Added some better reporting when confirming sample size was correct
(edit) @33916 [33916] 7 days ak19 Updated the rest of the file after reingest
(edit) @33915 [33915] 7 days ak19 Forgot to add a (manual) counts file created last week, and am now …
(edit) @33914 [33914] 7 days ak19 Shortlisted just the domain sites by country into ManualShortlist2.txt …
(edit) @33913 [33913] 8 days ak19 1. Adjusted table mongodb query statements to be more exact, but same …
(edit) @33912 [33912] 8 days ak19 Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
(edit) @33911 [33911] 8 days ak19 Correct commit message for previous and current commit: 1. After …
(edit) @33910 [33910] 8 days ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33909 [33909] 8 days ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33907 [33907] 2 weeks ak19 See previous commit message. This will be the file with the results for …
(edit) @33906 [33906] 2 weeks ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB and …
(edit) @33905 [33905] 2 weeks ak19 More notes
(edit) @33904 [33904] 2 weeks ak19 Shouldn't greylist anglican.org, as this prevented crawling of …
(edit) @33903 [33903] 2 weeks ak19 My notes when preparing for today's meetings. Some of this may be useful …
(edit) @33896 [33896] 2 weeks ak19 Clarification in comments
(edit) @33895 [33895] 2 weeks ak19 Minor rename
(edit) @33894 [33894] 2 weeks ak19 1. Adding map, counts.json and geo-json files for 5b count of sites by …
(edit) @33893 [33893] 2 weeks ak19 1. Left out region code column. 2. Two more sheets of work in progress to …
(edit) @33892 [33892] 2 weeks ak19 Sheets renamed and spreadsheet renamed
(edit) @33891 [33891] 2 weeks ak19 Site level detected vs manual inspected data: working shown in file …
(edit) @33890 [33890] 2 weeks ak19 Finished going through NZ sites listing of numPagesContainingMRI > 0 and …
(edit) @33889 [33889] 2 weeks ak19 1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of the …
(edit) @33887 [33887] 3 weeks ak19 1. Added support for writing out tables in csv format too. 2. Second table …
(edit) @33886 [33886] 3 weeks ak19 Minor. File rename
(edit) @33885 [33885] 3 weeks ak19 Attempting to write the tables. csv not yet supported. Table 1 done.
(edit) @33884 [33884] 3 weeks ak19 0. Previous commit had lots of modifications, and only 2 files matched the …
(edit) @33883 [33883] 3 weeks ak19 Clarifications
(edit) @33882 [33882] 3 weeks ak19 Code now writes both a listing of all non-autotranslated websites and a …
(edit) @33881 [33881] 3 weeks ak19 Uses lambda expression to process each doc in a mongodb aggregate result. …
(edit) @33880 [33880] 3 weeks ak19 Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
(edit) @33879 [33879] 3 weeks ak19 Have the 2 mongodb aggregate() calls working that
(edit) @33878 [33878] 3 weeks ak19 Better comment
(edit) @33877 [33877] 3 weeks ak19 Reordering to have proper descending order of counts
(edit) @33876 [33876] 3 weeks ak19 Some missteps, but have got complex collection.aggregate() working at …
(edit) @33875 [33875] 3 weeks ak19 Renaming 2 more files correctly
(edit) @33874 [33874] 3 weeks ak19 Renaming 2 files correctly
(edit) @33873 [33873] 4 weeks ak19 Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
(edit) @33872 [33872] 4 weeks ak19 1. Added the file containing the 255 random NZ page URLs to sample. 2. …
(edit) @33871 [33871] 4 weeks ak19 Removed mostly duplicated older version of method but left the different …
(edit) @33870 [33870] 4 weeks ak19 Got the mongodb query working in Java in 2 different ways: the fully Java …
(edit) @33869 [33869] 4 weeks ak19 First cut at the RandomURLsForDomainGenerator.java class and the mongodb …
(edit) @33868 [33868] 4 weeks ak19 With the updated code for generating the maps from 6a and 6b manual site …
(edit) @33867 [33867] 4 weeks ak19 Moved the code handling of special case large rectangles and those that …
(edit) @33866 [33866] 4 weeks ak19 Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
(edit) @33865 [33865] 4 weeks ak19 1. The gs3 context name changed from macronizer to macron-restoration. 2. …
(edit) @33858 [33858] 4 weeks ak19 Fixes to the code committed yesterday: correct calculation of the …
(edit) @33856 [33856] 4 weeks ak19 Forgot to commit. Last week, Dr Bainbridge had properly cropped the SVG …
(edit) @33854 [33854] 4 weeks ak19 Manually gone over around 150 webpages of sample size of 255 webpages from …
(edit) @33853 [33853] 4 weeks ak19 Handling map coordinates that are horizontally excessive (beyond allowed …
(edit) @33851 [33851] 5 weeks ak19 Deleting faulty maps. NZ numPages inMRI and containingMRI count is much …
(edit) @33850 [33850] 5 weeks ak19 Renames before deleting faulty maps. NZ numPages inMRI and containingMRI …
(edit) @33849 [33849] 5 weeks ak19 One less Australian site as it was an infographic containing Maori words …
(edit) @33848 [33848] 5 weeks ak19 Tables of mongodb counts (1-5 table) and manual counts (6table). GeoJSON …
(edit) @33847 [33847] 5 weeks ak19 indigenousblogs.com did have one page actually in Maori (an XML feed). So …
(edit) @33846 [33846] 5 weeks ak19 Cropped out the json portion
(edit) @33845 [33845] 5 weeks ak19 Cropped out the json portion
(edit) @33844 [33844] 5 weeks ak19 Regenerated
(edit) @33843 [33843] 5 weeks ak19 Counting the 3 non-NZ sites that had mi in the URL path that manual …
(edit) @33842 [33842] 5 weeks ak19 Jotted down some further paragraphs and notes of interest. Tentatively …
(edit) @33841 [33841] 5 weeks ak19 Latest version of the flowchart of the process of getting Common Crawl …
(edit) @33840 [33840] 5 weeks ak19 Older flowchart of the process of getting Common Crawl data into MongoDB …
(edit) @33839 [33839] 5 weeks ak19 Moving writeup text file into new folder so I can add the SVG flowchart …
(edit) @33838 [33838] 5 weeks ak19 Updated after checking non-NZ and non-nz TLD sites with mi in URL path
(edit) @33828 [33828] 5 weeks ak19 Additions and modifications to the write-up.
(edit) @33825 [33825] 5 weeks ak19 Beginnings of first draft of write up.
(edit) @33824 [33824] 5 weeks ak19 More instructions and explaining the contents of the mongodb-data folder.
(edit) @33823 [33823] 5 weeks ak19 Recommitting mongo-data folder with renamed files with numbering.
(edit) @33822 [33822] 5 weeks ak19 Removing as I'm renaming all the files with prefixes. There are too many …
(edit) @33821 [33821] 5 weeks ak19 Manually created a shortlist of MRI sites from longer …
(edit) @33820 [33820] 5 weeks ak19 Forgot to commit before holidays.
(edit) @33816 [33816] 2 months ak19 Finished manually going through the sites that I couldn't easily filter …
(edit) @33815 [33815] 2 months ak19 Removed old results from before bugfix and improvement to …
(edit) @33814 [33814] 2 months ak19 Put the important mongodb queries and results into …
(edit) @33813 [33813] 2 months ak19 With the bugfix from yesterday and the inclusion of http(s)://mi.* type …
(edit) @33812 [33812] 2 months ak19 Better handling of multi-line comment symbols, so I can now include proper …
(edit) @33811 [33811] 2 months ak19 Returning to using a single variable, urlContainsLangCodeInPath, to record …
(edit) @33810 [33810] 2 months ak19 Bugfix: mi in url path should be checked for for each page of site, not …
(edit) @33809 [33809] 2 months ak19 Some more GS_README.txt instructions. Not put the mongodb queries in here …
(edit) @33808 [33808] 2 months ak19 Storing not just whether /mi(/) suffix is in path, but also whether …
(edit) @33807 [33807] 2 months ak19 Trying to manually go through a shortlisted set of domains to see if …
(edit) @33806 [33806] 2 months ak19 More mongodb querying revealed that excluding tentative product sites (if …
(edit) @33805 [33805] 2 months ak19 1. Moving the static countrycodes.json file to conf folder and updated …
(edit) @33804 [33804] 2 months ak19 1. Updated results from mongodb querying after yesterday's modifications …
(edit) @33803 [33803] 2 months ak19 geojson mapdata and map for mongodb results on sitesWithPagesContainingMRI …
(edit) @33802 [33802] 2 months ak19 With an extra adult site removed and with setting countrycodes that …
(edit) @33801 [33801] 2 months ak19 1. NutchTextDumpToMongoDB Added an extra field to each document in …
(edit) @33800 [33800] 2 months ak19 Removed an adult site from crawled contents and added its url to blacklist …
(edit) @33799 [33799] 2 months ak19 1. Adding breadcrumb for next step at end of running …
(edit) @33798 [33798] 2 months ak19 Adding the geojson related files related to querying mongodb for sites …
(edit) @33797 [33797] 2 months ak19 Updated json and images files, and new files for when /mi(/) is in the URL …
(edit) @33796 [33796] 2 months ak19 Instead of a hack for US' count being too great that its histogram goes …
(edit) @33794 [33794] 2 months ak19 Wrote the geojson map data created from the site counts per country/region …
(edit) @33790 [33790] 2 months ak19 Got the MultiPoint geojson mapdata of the country code counts working: the …
(edit) @33789 [33789] 2 months ak19 Redid the mongodb query to get the countrycode counts for all the …
(edit) @33788 [33788] 2 months ak19 Adding all the jar files needed to work in Java with geojson Simple …