|
|
@34637
|
3 years |
davidb |
Change of external reference to be SETUP.bash, not SETUP.sh
|
|
|
@34636
|
3 years |
davidb |
Initial set of svn:externals to get 'ml-processing' off the ground
|
|
|
@34635
|
3 years |
davidb |
Directory that holds together a skeleton set of the Greenstone3 …
|
|
|
@34617
|
3 years |
anupama |
Before we forget, putting Kathy's new script for uploading to the …
|
|
|
@34524
|
4 years |
ak19 |
Correct Mac OS name in log file being uploaded
|
|
|
@34523
|
4 years |
ak19 |
Minor. After testing on new release-kit mac.
|
|
|
@34520
|
4 years |
Jeremy Symon |
need to use ed25519 key on www-internal
|
|
|
@34519
|
4 years |
Jeremy Symon |
adding in code to upload to www-internal. Needs a new ed25519 identity …
|
|
|
@34518
|
4 years |
Jeremy Symon |
use a different identity file for www-internal - needs to be ed25519, …
|
|
|
@34515
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Forgot to svn up …
|
|
|
@34514
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Forgot to svn up …
|
|
|
@34513
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after …
|
|
|
@34512
|
4 years |
ak19 |
AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding after …
|
|
|
@34418
|
4 years |
ak19 |
Attempted to upload diffcol report to wwwinternal instead of wwwdev. …
|
|
|
@34417
|
4 years |
ak19 |
Updates to diffcol to handle change introduced in commit 34394, which …
|
|
|
@34416
|
4 years |
ak19 |
Committing rebuilt model collections after new doc.xml meta …
|
|
|
@34231
|
4 years |
ak19 |
Rebuilding diffcol model collection Multimedia after recent update to …
|
|
|
@34127
|
4 years |
ak19 |
Spelling correction in filename: screeMshot to screeNshot
|
|
|
@34120
|
4 years |
ak19 |
CSV version of .ods file, so openoffice isn't required
|
|
|
@34119
|
4 years |
ak19 |
Committing the auto-generated analysis results folder, …
|
|
|
@34097
|
4 years |
ak19 |
Open office version of similarly named spreadsheet, just with columns …
|
|
|
@34089
|
4 years |
ak19 |
So far accumulated URLs to docs on Google scholar about or somewhat …
|
|
|
@34011
|
4 years |
ak19 |
Piechart data for sites prepared for crawling and the piecharts for these
|
|
|
@34007
|
4 years |
ak19 |
Prepared more data for the piecharts. This time for empty web pages vs …
|
|
|
@34006
|
4 years |
ak19 |
Committing more data I've collected for generating pie charts and the …
|
|
|
@34005
|
4 years |
ak19 |
InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead …
|
|
|
@34004
|
4 years |
ak19 |
Renaming csv file to have csv extension
|
|
|
@34003
|
4 years |
ak19 |
Redid the file with info on empty URL web pages as a csv file with …
|
|
|
@34001
|
4 years |
ak19 |
Tentative total urls from common crawl 12 month crawl data.
|
|
|
@34000
|
4 years |
ak19 |
Some debugging and other minor changes
|
|
|
@33999
|
4 years |
ak19 |
Common crawl 12 month urls and CC provided stats
|
|
|
@33988
|
4 years |
ak19 |
1. Print out which web pages of which web site's dump.txt were empty. …
|
|
|
@33987
|
4 years |
ak19 |
Output of re-running NutchTextDumpToMongoDB to print out which web …
|
|
|
@33986
|
4 years |
ak19 |
Dr Bainbridge investigated the original data set more
|
|
|
@33985
|
4 years |
ak19 |
Data to back the piechart I need to make that will illustrate how we …
|
|
|
@33984
|
4 years |
ak19 |
Simple class to summarise some basic counts of the input common crawl data
|
|
|
@33983
|
4 years |
ak19 |
More sensible name for method which had too long kept its old name …
|
|
|
@33982
|
4 years |
ak19 |
SummaryTool.java now processed the handcrafted UNIQUE domains counts …
|
|
|
@33981
|
4 years |
ak19 |
As Dr Bainbridge suggested, code now opens a new firefox tab with a …
|
|
|
@33980
|
4 years |
ak19 |
Additional comments
|
|
|
@33979
|
4 years |
ak19 |
Clearly stating that counts are of unique domains
|
|
|
@33978
|
4 years |
ak19 |
Opens all geoJSON maps in new tabs instead of waiting for user to have …
|
|
|
@33977
|
4 years |
ak19 |
Added something on precision vs recall being applicable to our …
|
|
|
@33976
|
4 years |
ak19 |
Adding in what I could remember of Dr Bainbridge's statement about the …
|
|
|
@33966
|
4 years |
ak19 |
Added the origSequence and basicDomain columns to the random 260 web …
|
|
|
@33965
|
4 years |
ak19 |
1. Adding a basicDomain column (stripped of http/https and www prefix) …
|
|
|
@33964
|
4 years |
ak19 |
2 records were missing a value for the qualityLevel column.
|
|
|
@33963
|
4 years |
ak19 |
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
|
|
|
@33962
|
4 years |
ak19 |
2 fields changed, as one was missed out and the other incorrectly …
|
|
|
@33961
|
4 years |
ak19 |
New category, LINK_TEXT, introduced for the random web page URL samples.
|
|
|
@33960
|
4 years |
ak19 |
Reviewed all the random sample web page URLs marked …
|
|
|
@33959
|
4 years |
ak19 |
URIEncoding the mapData makes it unparseable by geojson.io
|
|
|
@33952
|
4 years |
ak19 |
Minor changes for processing
|
|
|
@33951
|
4 years |
ak19 |
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
|
|
|
@33950
|
4 years |
ak19 |
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
|
|
|
@33949
|
4 years |
ak19 |
Reviewed the qualityLevel column where NAV was assigned.
|
|
|
@33948
|
4 years |
ak19 |
Reviewed the random sampled web page URLs marked as …
|
|
|
@33947
|
4 years |
ak19 |
Some more questionmarked field values assigned.
|
|
|
@33946
|
4 years |
ak19 |
1. New function to handle user input assigning the newly introduced …
|
|
|
@33945
|
4 years |
ak19 |
Added a 4th column for all 260 sample web page URLs and have used the …
|
|
|
@33944
|
4 years |
ak19 |
Added the isReallyInMRI column after manually inspecting the remaining …
|
|
|
@33941
|
4 years |
ak19 |
1. Uppercase 3rd field (Y/N/? field) read back in from file before …
|
|
|
@33940
|
4 years |
ak19 |
1. In order to make it easier to do the manual work of inspecting 260 …
|
|
|
@33939
|
4 years |
ak19 |
1. Old random samples file doesn't apply as we're not sampling by …
|
|
|
@33938
|
4 years |
ak19 |
1. Don't regenerate random sample of web page urls and full web page …
|
|
|
@33937
|
4 years |
ak19 |
New counts of manual sites after reingesting into MongoDB. Forgot to …
|
|
|
@33936
|
4 years |
ak19 |
Renaming old file to replace with new counts after reingesting into MongoDB.
|
|
|
@33926
|
4 years |
ak19 |
Investigated some other options for screen capturing and Google chrome …
|
|
|
@33925
|
4 years |
ak19 |
1. Bugfix: oversight, should return uri encoded URL for mapData, …
|
|
|
@33924
|
4 years |
ak19 |
Adding in Dr Bainbridge's command to check the JSON generated is …
|
|
|
@33919
|
4 years |
ak19 |
SummaryTool now uses the CountryCodeCountsMapData.java class to …
|
|
|
@33918
|
4 years |
ak19 |
Country codes added to each domain's URL of the manual site/domain …
|
|
|
@33917
|
4 years |
ak19 |
Added some better reporting when confirming sample size was correct
|
|
|
@33916
|
4 years |
ak19 |
Updated the rest of the file after reingest
|
|
|
@33915
|
4 years |
ak19 |
Forgot to add a (manual) counts file created last week, and am now …
|
|
|
@33914
|
4 years |
ak19 |
Shortlisted just the domain sites by country into ManualShortlist2.txt …
|
|
|
@33913
|
4 years |
ak19 |
1. Adjusted table mongodb query statements to be more exact, but same …
|
|
|
@33912
|
4 years |
ak19 |
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
|
|
|
@33911
|
4 years |
ak19 |
Correct commit message for previous and current commit: 1. After …
|
|
|
@33910
|
4 years |
ak19 |
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
|
|
|
@33909
|
4 years |
ak19 |
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
|
|
|
@33907
|
4 years |
ak19 |
See previous commit message. This will be the file with the results …
|
|
|
@33906
|
4 years |
ak19 |
Code is in intermediate state. 1. Introduced basicDomain field to MongoDB …
|
|
|
@33905
|
4 years |
ak19 |
More notes
|
|
|
@33904
|
4 years |
ak19 |
Shouldn't greylist anglican.org, as this prevented crawling of …
|
|
|
@33903
|
4 years |
ak19 |
My notes when preparing for today's meetings. Some of this may be …
|
|
|
@33896
|
4 years |
ak19 |
Clarification in comments
|
|
|
@33895
|
4 years |
ak19 |
Minor rename
|
|
|
@33894
|
4 years |
ak19 |
1. Adding map, counts.json and geo-json files for 5b count of sites by …
|
|
|
@33893
|
4 years |
ak19 |
1. Left out region code column. 2. Two more sheets of work in progress …
|
|
|
@33892
|
4 years |
ak19 |
Sheets renamed and spreadsheet renamed
|
|
|
@33891
|
4 years |
ak19 |
Site level detected vs manual inspected data: working shown in file …
|
|
|
@33890
|
4 years |
ak19 |
Finished going through NZ sites listing of numPagesContainingMRI > 0 …
|
|
|
@33889
|
4 years |
ak19 |
1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
|
|
|
@33887
|
4 years |
ak19 |
1. Added support for writing out tables in csv format too. 2. Second …
|
|
|
@33886
|
4 years |
ak19 |
Minor. File rename
|
|
|
@33885
|
4 years |
ak19 |
Attempting to write the tables. csv not yet supported. Table 1 done.
|
|
|
@33884
|
4 years |
ak19 |
0. Previous commit had lots of modifications, and only 2 files matched …
|
|
|
@33883
|
4 years |
ak19 |
Clarifications
|
|
|
@33882
|
4 years |
ak19 |
Code now writes both a listing of all non-autotranslated websites and …
|
|
|