|
|
@33891
|
4 years |
ak19 |
Site level detected vs manual inspected data: working shown in file …
|
|
|
@33890
|
4 years |
ak19 |
Finished going through NZ sites listing of numPagesContainingMRI > 0 …
|
|
|
@33889
|
4 years |
ak19 |
1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
|
|
|
@33887
|
4 years |
ak19 |
1. Added support for writing out tables in csv format too. 2. Second …
|
|
|
@33886
|
4 years |
ak19 |
Minor. File rename
|
|
|
@33885
|
4 years |
ak19 |
Attempting to write the tables. csv not yet supported. Table 1 done.
|
|
|
@33884
|
4 years |
ak19 |
0. Previous commit had lots of modifications, and only 2 files matched …
|
|
|
@33883
|
4 years |
ak19 |
Clarifications
|
|
|
@33882
|
4 years |
ak19 |
Code now writes both a listing of all non-autotranslated websites and …
|
|
|
@33881
|
4 years |
ak19 |
Uses lambda expression to process each doc in a mongodb aggregate …
|
|
|
@33880
|
4 years |
ak19 |
Write out the 5counts_tentativeNonAutotranslatedSites.json file with …
|
|
|
@33879
|
4 years |
ak19 |
Have the 2 mongodb aggregate() calls working that
|
|
|
@33878
|
4 years |
ak19 |
Better comment
|
|
|
@33877
|
4 years |
ak19 |
Reordering to have proper descending order of counts
|
|
|
@33876
|
4 years |
ak19 |
Some missteps, but have got complex collection.aggregate() working at last.
|
|
|
@33875
|
4 years |
ak19 |
Renaming 2 more files correctly
|
|
|
@33874
|
4 years |
ak19 |
Renaming 2 files correctly
|
|
|
@33873
|
4 years |
ak19 |
Beginnings of WebPageURLsListing program whose purpose Dr Bainbridge …
|
|
|
@33872
|
4 years |
ak19 |
1. Added the file containing the 255 random NZ page URLs to sample. 2. …
|
|
|
@33871
|
4 years |
ak19 |
Removed mostly duplicated older version of method but left the …
|
|
|
@33870
|
4 years |
ak19 |
Got the mongodb query working in Java in 2 different ways: the fully …
|
|
|
@33869
|
4 years |
ak19 |
First cut at the RandomURLsForDomainGenerator.java class and the …
|
|
|
@33868
|
4 years |
ak19 |
With the updated code for generating the maps from 6a and 6b manual …
|
|
|
@33867
|
4 years |
ak19 |
Moved the code handling of special case large rectangles and those …
|
|
|
@33866
|
4 years |
ak19 |
Dr Bainbridge's fix to Android mobile macronizer user (on Chrome …
|
|
|
@33865
|
4 years |
ak19 |
1. The gs3 context name changed from macronizer to macron-restoration. …
|
|
|
@33858
|
4 years |
ak19 |
Fixes to the code committed yesterday: correct calculation of the …
|
|
|
@33856
|
4 years |
ak19 |
Forgot to commit. Last week, Dr Bainbridge had properly cropped the …
|
|
|
@33854
|
4 years |
ak19 |
Manually gone over around 150 webpages of sample size of 255 webpages …
|
|
|
@33853
|
4 years |
ak19 |
Handling map coordinates that are horizontally excessive (beyond …
|
|
|
@33851
|
4 years |
ak19 |
Deleting faulty maps. NZ numPages inMRI and containingMRI count is …
|
|
|
@33850
|
4 years |
ak19 |
Renames before deleting faulty maps. NZ numPages inMRI and …
|
|
|
@33849
|
4 years |
ak19 |
One less Australian site as it was an infographic containing Maori …
|
|
|
@33848
|
4 years |
ak19 |
Tables of mongodb counts (1-5 table) and manual counts (6table). …
|
|
|
@33847
|
4 years |
ak19 |
indigenousblogs.com did have one page actually in Maori (an XML feed). …
|
|
|
@33846
|
4 years |
ak19 |
Cropped out the json portion
|
|
|
@33845
|
4 years |
ak19 |
Cropped out the json portion
|
|
|
@33844
|
4 years |
ak19 |
Regenerated
|
|
|
@33843
|
4 years |
ak19 |
Counting the 3 non-NZ sites that had mi in the URL path that manual …
|
|
|
@33842
|
4 years |
ak19 |
Jotted down some further paragraphs and notes of interest. Tentatively …
|
|
|
@33841
|
4 years |
ak19 |
Latest version of the flowchart of the process of getting Common Crawl …
|
|
|
@33840
|
4 years |
ak19 |
Older flowchart of the process of getting Common Crawl data into …
|
|
|
@33839
|
4 years |
ak19 |
Moving writeup text file into new folder so I can add the SVG …
|
|
|
@33838
|
4 years |
ak19 |
Updated after checking non-NZ and non-nz TLD sites with mi in URL path
|
|
|
@33828
|
4 years |
ak19 |
Additions and modifications to the write-up.
|
|
|
@33825
|
4 years |
ak19 |
Beginnings of first draft of write up.
|
|
|
@33824
|
4 years |
ak19 |
More instructions and explaining the contents of the mongodb-data folder.
|
|
|
@33823
|
4 years |
ak19 |
Recommitting mongo-data folder with renamed files with numbering.
|
|
|
@33822
|
4 years |
ak19 |
Removing as I'm renaming all the files with prefixes. There are too …
|
|
|
@33821
|
4 years |
ak19 |
Manually created a shortlist of MRI sites from longer …
|
|
|
@33820
|
4 years |
ak19 |
Forgot to commit before holidays.
|
|
|
@33816
|
4 years |
ak19 |
Finished manually going through the sites that I couldn't easily …
|
|
|
@33815
|
4 years |
ak19 |
Removed old results from before bugfix and improvement to …
|
|
|
@33814
|
4 years |
ak19 |
Put the important mongodb queries and results into …
|
|
|
@33813
|
4 years |
ak19 |
With the bugfix from yesterday and the inclusion of http(s):mi.* …
|
|
|
@33812
|
4 years |
ak19 |
Better handling of multi-line comment symbols, so I can now include …
|
|
|
@33811
|
4 years |
ak19 |
Returning to using a single variable, urlContainsLangCodeInPath, to …
|
|
|
@33810
|
4 years |
ak19 |
Bugfix: mi in url path should be checked for on each page of site, …
|
|
|
@33809
|
4 years |
ak19 |
Some more GS_README.txt instructions. Not put the mongodb queries in …
|
|
|
@33808
|
4 years |
ak19 |
Storing not just whether /mi(/) suffix is in path, but also whether …
|
|
|
@33807
|
4 years |
ak19 |
Trying to manually go through a shortlisted set of domains to see if …
|
|
|
@33806
|
4 years |
ak19 |
More mongodb querying revealed that excluding tentative product sites …
|
|
|
@33805
|
4 years |
ak19 |
1. Moving the static countrycodes.json file to conf folder and updated …
|
|
|
@33804
|
4 years |
ak19 |
1. Updated results from mongodb querying after yesterday's …
|
|
|
@33803
|
4 years |
ak19 |
geojson mapdata and map for mongodb results on …
|
|
|
@33802
|
4 years |
ak19 |
With an extra adult site removed and with setting countrycodes that …
|
|
|
@33801
|
4 years |
ak19 |
1. NutchTextDumpToMongoDB Added an extra field to each document in …
|
|
|
@33800
|
4 years |
ak19 |
Removed an adult site from crawled contents and added its url to …
|
|
|
@33799
|
4 years |
ak19 |
1. Adding breadcrumb for next step at end of running …
|
|
|
@33798
|
4 years |
ak19 |
Adding the geojson related files related to querying mongodb for sites …
|
|
|
@33797
|
4 years |
ak19 |
Updated json and images files, and new files for when /mi(/) is in the …
|
|
|
@33796
|
4 years |
ak19 |
Instead of a hack for US' count being so great that its histogram …
|
|
|
@33794
|
4 years |
ak19 |
Wrote the geojson map data created from the site counts per …
|
|
|
@33790
|
4 years |
ak19 |
Got the MultiPoint geojson mapdata of the country code counts working: …
|
|
|
@33789
|
4 years |
ak19 |
Redid the mongodb query to get the countrycode counts for all the …
|
|
|
@33788
|
4 years |
ak19 |
Adding all the jar files needed to work in Java with geojson Simple …
|
|
|
@33787
|
4 years |
ak19 |
Documented another mongodb query that I'm using, the one to produce …
|
|
|
@33778
|
4 years |
ak19 |
Made a beginning on getting the geojson map data automated. Couldn't …
|
|
|
@33776
|
4 years |
ak19 |
Field Separator (IFS) conflicting with backticks and other ways of …
|
|
|
@33760
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
|
|
|
@33759
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
|
|
|
@33723
|
4 years |
ak19 |
On linux 64 bit, the additional wrap command did not work because the …
|
|
|
@33722
|
4 years |
ak19 |
Adding in additional instructions in mongodb.txt, before I forgot how …
|
|
|
@33710
|
5 years |
ak19 |
Working queries and map coords for geojson.tools (ironically, Lat and …
|
|
|
@33698
|
5 years |
ak19 |
Links to more reading
|
|
|
@33675
|
5 years |
ak19 |
Committing the newer query results (but from before today's …
|
|
|
@33674
|
5 years |
ak19 |
Changes to support the top 5 predicted langcodes and their confidence …
|
|
|
@33666
|
5 years |
ak19 |
Having finished sending all the crawl data to mongodb 1. Recrawled the …
|
|
|
@33657
|
5 years |
ak19 |
Some fixes after brief testing against 1/3 of the crawl. Restarted …
|
|
|
@33656
|
5 years |
ak19 |
Final minor changes before I start processing the crawls of node2.
|
|
|
@33655
|
5 years |
ak19 |
Minor change to print statement
|
|
|
@33654
|
5 years |
ak19 |
Removing jar file that wasn't used after all.
|
|
|
@33653
|
5 years |
ak19 |
1. As suggested by Dr Bainbridge, made the code changes to use Morphia …
|
|
|
@33652
|
5 years |
ak19 |
Introducing morphia subpackage
|
|
|
@33651
|
5 years |
ak19 |
1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor
|
|
|
@33646
|
5 years |
ak19 |
Saving the mongodb queries and learning links that Dr Bainbridge found …
|
|
|
@33645
|
5 years |
ak19 |
Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences …
|
|
|
@33644
|
5 years |
ak19 |
Just committing the growing mongodb.txt file with links and …
|
|
|
@33643
|
5 years |
ak19 |
Brought the template log4j.properties.in back up to speed. I forgot it …
|
|
|
@33642
|
5 years |
ak19 |
Forgot to commit the java driver for mongodb when I committed the Java …
|
|
|