|
|
@33865
|
4 years |
ak19 |
1. The gs3 context name changed from macronizer to macron-restoration. …
|
|
|
@33858
|
4 years |
ak19 |
Fixes to the code committed yesterday: correct calculation of the …
|
|
|
@33856
|
4 years |
ak19 |
Forgot to commit. Last week, Dr Bainbridge had properly cropped the …
|
|
|
@33854
|
4 years |
ak19 |
Manually gone over around 150 webpages of sample size of 255 webpages …
|
|
|
@33853
|
4 years |
ak19 |
Handling map coordinates that are horizontally excessive (beyond …
|
|
|
@33851
|
4 years |
ak19 |
Deleting faulty maps. NZ numPages inMRI and containingMRI count is …
|
|
|
@33850
|
4 years |
ak19 |
Renames before deleting faulty maps. NZ numPages inMRI and …
|
|
|
@33849
|
4 years |
ak19 |
One less Australian site as it was an infographic containing Maori …
|
|
|
@33848
|
4 years |
ak19 |
Tables of mongodb counts (1-5 table) and manual counts (6table). …
|
|
|
@33847
|
4 years |
ak19 |
indigenousblogs.com did have one page actually in Maori (an XML feed). …
|
|
|
@33846
|
4 years |
ak19 |
Cropped out the json portion
|
|
|
@33845
|
4 years |
ak19 |
Cropped out the json portion
|
|
|
@33844
|
4 years |
ak19 |
Regenerated
|
|
|
@33843
|
4 years |
ak19 |
Counting the 3 non-NZ sites that had mi in the URL path that manual …
|
|
|
@33842
|
4 years |
ak19 |
Jotted down some further paragraphs and notes of interest. Tentatively …
|
|
|
@33841
|
4 years |
ak19 |
Latest version of the flowchart of the process of getting Common Crawl …
|
|
|
@33840
|
4 years |
ak19 |
Older flowchart of the process of getting Common Crawl data into …
|
|
|
@33839
|
4 years |
ak19 |
Moving writeup text file into new folder so I can add the SVG …
|
|
|
@33838
|
4 years |
ak19 |
Updated after checking non-NZ and non-nz TLD sites with mi in URL path
|
|
|
@33828
|
4 years |
ak19 |
Additions and modifications to the write-up.
|
|
|
@33825
|
4 years |
ak19 |
Beginnings of first draft of write up.
|
|
|
@33824
|
4 years |
ak19 |
More instructions and explaining the contents of the mongodb-data folder.
|
|
|
@33823
|
4 years |
ak19 |
Recommitting mongo-data folder with renamed files with numbering.
|
|
|
@33822
|
4 years |
ak19 |
Removing as I'm renaming all the files with prefixes. There are too …
|
|
|
@33821
|
4 years |
ak19 |
Manually created a shortlist of MRI sites from longer …
|
|
|
@33820
|
4 years |
ak19 |
Forgot to commit before holidays.
|
|
|
@33816
|
4 years |
ak19 |
Finished manually going through the sites that I couldn't easily …
|
|
|
@33815
|
4 years |
ak19 |
Removed old results from before bugfix and improvement to …
|
|
|
@33814
|
4 years |
ak19 |
Put the important mongodb queries and results into …
|
|
|
@33813
|
4 years |
ak19 |
With the bugfix from yesterday and the inclusion of http(s):mi.* …
|
|
|
@33812
|
4 years |
ak19 |
Better handling of multi-line comment symbols, so I can now include …
|
|
|
@33811
|
4 years |
ak19 |
Returning to using a single variable, urlContainsLangCodeInPath, to …
|
|
|
@33810
|
4 years |
ak19 |
Bugfix: mi in url path should be checked for for each page of site, …
|
|
|
@33809
|
4 years |
ak19 |
Some more GS_README.txt instructions. Not put the mongodb queries in …
|
|
|
@33808
|
4 years |
ak19 |
Storing not just whether /mi(/) suffix is in path, but also whether …
|
|
|
@33807
|
4 years |
ak19 |
Trying to manually go through a shortlisted set of domains to see if …
|
|
|
@33806
|
4 years |
ak19 |
More mongodb querying revealed that excluding tentative product sites …
|
|
|
@33805
|
4 years |
ak19 |
1. Moving the static countrycodes.json file to conf folder and updated …
|
|
|
@33804
|
4 years |
ak19 |
1. Updated results from mongodb querying after yesterday's …
|
|
|
@33803
|
4 years |
ak19 |
geojson mapdata and map for mongodb results on …
|
|
|
@33802
|
4 years |
ak19 |
With an extra adult site removed and with setting countrycodes that …
|
|
|
@33801
|
4 years |
ak19 |
1. NutchTextDumpToMongoDB Added an extra field to each document in …
|
|
|
@33800
|
4 years |
ak19 |
Removed an adult site from crawled contents and added its url to …
|
|
|
@33799
|
4 years |
ak19 |
1. Adding breadcrumb for next step at end of running …
|
|
|
@33798
|
4 years |
ak19 |
Adding the geojson related files related to querying mongodb for sites …
|
|
|
@33797
|
4 years |
ak19 |
Updated json and images files, and new files for when /mi(/) is in the …
|
|
|
@33796
|
4 years |
ak19 |
Instead of a hack for US' count being too great that its histogram …
|
|
|
@33794
|
4 years |
ak19 |
Wrote the geojson map data created from the site counts per …
|
|
|
@33790
|
4 years |
ak19 |
Got the MultiPoint geojson mapdata of the country code counts working: …
|
|
|
@33789
|
4 years |
ak19 |
Redid the mongodb query to get the countrycode counts for all the …
|
|
|
@33788
|
4 years |
ak19 |
Adding all the jar files needed to work in Java with geojson Simple …
|
|
|
@33787
|
4 years |
ak19 |
Documented another mongodb query that I'm using, the one to produce …
|
|
|
@33778
|
4 years |
ak19 |
Made a beginning on getting the geojson map data automated. Couldn't …
|
|
|
@33776
|
4 years |
ak19 |
Field Separator (IFS) conflicting with backticks and other ways of …
|
|
|
@33760
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
|
|
|
@33759
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after GLI …
|
|
|
@33723
|
4 years |
ak19 |
On linux 64 bit, the additional wrap command did not work because the …
|
|
|
@33722
|
4 years |
ak19 |
Adding in additional instructions in mongodb.txt, before I forget how …
|
|
|
@33710
|
5 years |
ak19 |
Working queries and map coords for geojson.tools (ironically, Lat and …
|
|
|
@33698
|
5 years |
ak19 |
Links to more reading
|
|
|
@33675
|
5 years |
ak19 |
Committing the newer query results (but from before today's …
|
|
|
@33674
|
5 years |
ak19 |
Changes to support the top 5 predicted langcodes and their confidence …
|
|
|
@33666
|
5 years |
ak19 |
Having finished sending all the crawl data to mongodb 1. Recrawled the …
|
|
|
@33657
|
5 years |
ak19 |
Some fixes after brief testing against 1/3 of the crawl. Restarted …
|
|
|
@33656
|
5 years |
ak19 |
Final minor changes before I start processing the crawls of node2.
|
|
|
@33655
|
5 years |
ak19 |
Minor change to print statement
|
|
|
@33654
|
5 years |
ak19 |
Removing jar file that wasn't used after all.
|
|
|
@33653
|
5 years |
ak19 |
1. As suggested by Dr Bainbridge, made the code changes to use Morphia …
|
|
|
@33652
|
5 years |
ak19 |
Introducing morphia subpackage
|
|
|
@33651
|
5 years |
ak19 |
1. Bugfix: overlappingSentences works. 2. storing numSentencesInMaor
|
|
|
@33646
|
5 years |
ak19 |
Saving the mongodb queries and learning links that Dr Bainbridge found …
|
|
|
@33645
|
5 years |
ak19 |
Fix to 2 bugs when sending data to MongoDB: 1. overlappingSentences …
|
|
|
@33644
|
5 years |
ak19 |
Just committing the growing mongodb.txt file with links and …
|
|
|
@33643
|
5 years |
ak19 |
Brought the template log4j.properties.in back up to speed. I forgot it …
|
|
|
@33642
|
5 years |
ak19 |
Forgot to commit the java driver for mongodb when I committed the Java …
|
|
|
@33635
|
5 years |
ak19 |
Maori-language-detection doesn't use Greenstone 3 at present, it's not …
|
|
|
@33589
|
5 years |
cpb16 |
final01. Need Map results still
|
|
|
@33521
|
5 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
|
|
|
@33520
|
5 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
|
|
|
@33512
|
5 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
|
|
|
@33511
|
5 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
|
|
|
@33458
|
5 years |
cpb16 |
Running new morphology version after quick meeting with David last …
|
|
|
@33455
|
5 years |
cpb16 |
Started implementing David's suggested morphology sequence, codeversion9
|
|
|
@33449
|
5 years |
cpb16 |
Terminal version executes correctly. (Didn't include init threshold in …
|
|
|
@33447
|
5 years |
cpb16 |
starting to implement terminal version of new morphology. need to fix. …
|
|
|
@33444
|
5 years |
cpb16 |
Have created a preprocess to remove large objects. …
|
|
|
@33439
|
5 years |
cpb16 |
Have created properties file and accessibility from …
|
|
|
@33437
|
5 years |
cpb16 |
made progress with morphology. Need to have a better area dimension …
|
|
|
@33427
|
5 years |
davidb |
Some initial files on how to get going
|
|
|
@33426
|
5 years |
davidb |
Folder for details on how to stand up the HTRC DevEnv locally
|
|
|
@33418
|
5 years |
cpb16 |
made progress with morphology, based on one image, need to refine …
|
|
|
@33415
|
5 years |
cpb16 |
updated, after unable to commit due to setup.bash being out of date. …
|
|
|
@33384
|
5 years |
cpb16 |
backup before intellij working
|
|
|
@33375
|
5 years |
cpb16 |
Full backup after running first successful highres classifier run
|
|
|
@33367
|
5 years |
cpb16 |
Pre-hires classification w/o MU
|
|
|
@33354
|
5 years |
davidb |
Template file for producing OpenOffice spreadsheet format
|
|
|
@33353
|
5 years |
davidb |
Initial set of files to page scrape and turn in the OpenOffice …
|
|
|
@33352
|
5 years |
davidb |
Top-level folder for code to page-scrape BookStumper site
|
|
|
@33351
|
5 years |
davidb |
Top-level folder for code to page-scrape BookStumper site
|
|
|
@33340
|
5 years |
cpb16 |
transferred backup of low res images. Classifiers work as expected. …
|
|
|