source: other-projects

Diff Rev Age Author Log Message
(edit) @33881   4 years ak19 Uses lambda expression to process each doc in a mongodb aggregate …
(edit) @33880   4 years ak19 Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
(edit) @33879   4 years ak19 Have the 2 mongodb aggregate() calls working that
(edit) @33878   4 years ak19 Better comment
(edit) @33877   4 years ak19 Reordering to have proper descending order of counts
(edit) @33876   4 years ak19 Some missteps, but have got complex collection.aggregate() working at last.
(edit) @33875   4 years ak19 Renaming 2 more files correctly
(edit) @33874   4 years ak19 Renaming 2 files correctly
(edit) @33873   4 years ak19 Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
(edit) @33872   4 years ak19 1. Added the file containing the 255 random NZ page URLs to sample. 2. …
(edit) @33871   4 years ak19 Removed mostly duplicated older version of method but left the …
(edit) @33870   4 years ak19 Got the mongodb query working in Java in 2 different ways: the fully …
(edit) @33869   4 years ak19 First cut at the RandomURLsForDomainGenerator.java class and the …
(edit) @33868   4 years ak19 With the updated code for generating the maps from 6a and 6b manual …
(edit) @33867   4 years ak19 Moved the code handling of special case large rectangles and those …
(edit) @33866   4 years ak19 Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
(edit) @33865   4 years ak19 1. The gs3 context name changed from macronizer to macron-restoration. …
(edit) @33858   4 years ak19 Fixes to the code committed yesterday: correct calculation of the …
(edit) @33856   4 years ak19 Forgot to commit. Last week, Dr Bainbridge had properly cropped the …
(edit) @33854   4 years ak19 Manually gone over around 150 webpages of sample size of 255 webpages …
(edit) @33853   4 years ak19 Handling map coordinates that are horizontally excessive (beyond …
(edit) @33851   4 years ak19 Deleting faulty maps. NZ numPages inMRI and containingMRI count is …
(edit) @33850   4 years ak19 Renames before deleting faulty maps. NZ numPages inMRI and …
(edit) @33849   4 years ak19 One less Australian site as it was an infographic containing Maori …
(edit) @33848   4 years ak19 Tables of mongodb counts (1-5 table) and manual counts (6table). …
(edit) @33847   4 years ak19 indigenousblogs.com did have one page actually in Maori (an XML feed). …
(edit) @33846   4 years ak19 Cropped out the json portion
(edit) @33845   4 years ak19 Cropped out the json portion
(edit) @33844   4 years ak19 Regenerated
(edit) @33843   4 years ak19 Counting the 3 non-NZ sites that had mi in the URL path that manual …
(edit) @33842   4 years ak19 Jotted down some further paragraphs and notes of interest. Tentatively …
(edit) @33841   4 years ak19 Latest version of the flowchart of the process of getting Common Crawl …
(edit) @33840   4 years ak19 Older flowchart of the process of getting Common Crawl data into …
(edit) @33839   4 years ak19 Moving writeup text file into new folder so I can add the SVG …
(edit) @33838   4 years ak19 Updated after checking non-NZ and non-nz TLD sites with mi in URL path
(edit) @33828   4 years ak19 Additions and modifications to the write-up.
(edit) @33825   4 years ak19 Beginnings of first draft of write up.
(edit) @33824   4 years ak19 More instructions and explaining the contents of the mongodb-data folder.
(edit) @33823   4 years ak19 Recommitting mongo-data folder with renamed files with numbering.
(edit) @33822   4 years ak19 Removing as I'm renaming all the files with prefixes. There are too …
(edit) @33821   4 years ak19 Manually created a shortlist of MRI sites from longer …
(edit) @33820   4 years ak19 Forgot to commit before holidays.
(edit) @33816   4 years ak19 Finished manually going through the sites that I couldn't easily …
(edit) @33815   4 years ak19 Removed old results from before bugfix and improvement to …
(edit) @33814   4 years ak19 Put the important mongodb queries and results into …
(edit) @33813   4 years ak19 With the bugfix from yesterday and the inclusion of http(s):mi.* …
(edit) @33812   4 years ak19 Better handling of multi-line comment symbols, so I can now include …
(edit) @33811   4 years ak19 Returning to using a single variable, urlContainsLangCodeInPath, to …
(edit) @33810   4 years ak19 Bugfix: mi in url path should be checked for for each page of site, …
(edit) @33809   4 years ak19 Some more GS_README.txt instructions. Not put the mongodb queries in …
(edit) @33808   4 years ak19 Storing not just whether /mi(/) suffix is in path, but also whether …
(edit) @33807   4 years ak19 Trying to manually go through a shortlisted set of domains to see if …
(edit) @33806   4 years ak19 More mongodb querying revealed that excluding tentative product sites …
(edit) @33805   4 years ak19 1. Moving the static countrycodes.json file to conf folder and updated …
(edit) @33804   4 years ak19 1. Updated results from mongodb querying after yesterday's …
(edit) @33803   4 years ak19 geojson mapdata and map for mongodb results on …
(edit) @33802   4 years ak19 With an extra adult site removed and with setting countrycodes that …
(edit) @33801   4 years ak19 1. NutchTextDumpToMongoDB Added an extra field to each document in …
(edit) @33800   4 years ak19 Removed an adult site from crawled contents and added its url to …
(edit) @33799   4 years ak19 1. Adding breadcrumb for next step at end of running …
(edit) @33798   4 years ak19 Adding the geojson related files related to querying mongodb for sites …
(edit) @33797   4 years ak19 Updated json and images files, and new files for when /mi(/) is in the …
(edit) @33796   4 years ak19 Instead of a hack for US' count being too great that its histogram …
(edit) @33794   4 years ak19 Wrote the geojson map data created from the site counts per …
(edit) @33790   4 years ak19 Got the MultiPoint geojson mapdata of the country code counts working: …
(edit) @33789   4 years ak19 Redid the mongodb query to get the countrycode counts for all the …
(edit) @33788   4 years ak19 Adding all the jar files needed to work in Java with geojson Simple …
(edit) @33787   4 years ak19 Documented another mongodb query that I'm using, the one to produce …
(edit) @33778   4 years ak19 Made a beginning on getting the geojson map data automated. Couldn't …
(edit) @33776   4 years ak19 Field Separator (IFS) conflicting with backticks and other ways of …
(edit) @33760   4 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
(edit) @33759   4 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
(edit) @33723   4 years ak19 On linux 64 bit, the additional wrap command did not work because the …
(edit) @33722   4 years ak19 Adding in additional instructions in mongodb.txt, before I forgot how …
(edit) @33710   4 years ak19 Working queries and map coords for geojson.tools (ironically, Lat and …
(edit) @33698   4 years ak19 Links to more reading
(edit) @33675   4 years ak19 Committing the newer query results (but from before today's …
(edit) @33674   4 years ak19 Changes to support the top 5 predicted langcodes and their confidence …
(edit) @33666   4 years ak19 Having finished sending all the crawl data to mongodb 1. Recrawled the …
(edit) @33657   4 years ak19 Some fixes after brief testing against 1/3 of the crawl. Restarted …
(edit) @33656   4 years ak19 Final minor changes before I start processing the crawls of node2.
(edit) @33655   4 years ak19 Minor change to print statement
(edit) @33654   4 years ak19 Removing jar file that wasn't used after all.
(edit) @33653   4 years ak19 1. As suggested by Dr Bainbridge, made the code changes to use Morphia …
(edit) @33652   4 years ak19 Introducing morphia subpackage
(edit) @33651   4 years ak19 1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor
(edit) @33646   4 years ak19 Saving the mongodb queries and learning links that Dr Bainbridge found …
(edit) @33645   4 years ak19 Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences …
(edit) @33644   4 years ak19 Just committing the growing mongodb.txt file with links and …
(edit) @33643   4 years ak19 Brought the template log4j.properties.in back up to speed. I forgot it …
(edit) @33642   4 years ak19 Forgot to commit the java driver for mongodb when I committed the Java …
(edit) @33635   4 years ak19 Maori-language-detection doesn't use Greenstone 3 at present, it's not …
(edit) @33589   4 years cpb16 final01. Need Map results still
(edit) @33521   5 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
(edit) @33520   5 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
(edit) @33512   5 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
(edit) @33511   5 years ak19 AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
(edit) @33458   5 years cpb16 Running new morphology version after quick meeting with David last …
(edit) @33455   5 years cpb16 Started implementing David's suggested morphology sequence, codeversion9
(edit) @33449   5 years cpb16 terminal version executes correctly. (Didn't include init threshold in …