Diff Rev Age Author Log Message
(edit) @33986   4 years ak19 Dr Bainbridge investigated the original data set more
(edit) @33985   4 years ak19 Data to back the piechart I need to make that will illustrate how we …
(edit) @33984   4 years ak19 Simple class to summarise some basic counts of the input common crawl data
(edit) @33983   4 years ak19 More sensible name for method which had too long kept its old name …
(edit) @33982   4 years ak19 SummaryTool.java now processed the handcrafted UNIQUE domains counts …
(edit) @33981   4 years ak19 As Dr Bainbridge suggested, code now opens a new firefox tab with a …
(edit) @33980   4 years ak19 Additional comments
(edit) @33979   4 years ak19 Clearly stating that counts are of unique domains
(edit) @33978   4 years ak19 Opens all geoJSON maps in new tabs instead of waiting for user to have …
(edit) @33977   4 years ak19 Added something on precision vs recall being applicable to our …
(edit) @33976   4 years ak19 Adding in what I could remember of Dr Bainbridge's statement about the …
(edit) @33975   4 years kjdon some mods to do with allowing multiple oaiservers. need …
(edit) @33974   4 years kjdon added in new oai.servlets field - if you want to run two oaiservlets, …
(edit) @33973   4 years kjdon tidied up the file a bit. added new servlet_url param to oaiserver - …
(edit) @33972   4 years kjdon fixed a typo in a comment
(edit) @33971   4 years kjdon get servlet_url param and pass to getOAIConfigXML, as now the files …
(edit) @33970   4 years kjdon changed OAIConfig naming to OAIConfig-oaiserver.xml - so multiple …
(edit) @33969   4 years kjdon we no longer use OAIConfig.xml as the filename, now we use eg …
(edit) @33968   4 years kjdon pass in oai_config from server, rather than reading it in itself
(edit) @33967   4 years kjdon you might want to change the oaiserver url, eg if you have 2 oai …
(edit) @33966   4 years ak19 Added the origSequence and basicDomain columns to the random 260 web …
(edit) @33965   4 years ak19 1. Adding a basicDomain column (stripped of http/https and www prefix) …
(edit) @33964   4 years ak19 2 records were missing a value for the qualityLevel column.
(edit) @33963   4 years ak19 Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
(edit) @33962   4 years ak19 2 fields changed, as one was missed out and the other incorrectly …
(edit) @33961   4 years ak19 New category, LINK_TEXT, introduced for the random web page URL samples.
(edit) @33960   4 years ak19 Reviewed all the random sample web page URLs marked …
(edit) @33959   4 years ak19 URIEncoding the mapData makes it unparseable by geojson.io
(edit) @33958   4 years ak19 There were other xsl files using the original depositorTitleAndLink …
(edit) @33957   4 years ak19 1. depositor related interface display modified to work with recent …
(edit) @33956   4 years ak19 Related to commit 33953: made lots of accidental commits in rev 33953, …
(edit) @33955   4 years ak19 Undoing accidental commit of unintended files.
(edit) @33954   4 years ak19 Accidentally committed with other files. Undoing.
(edit) @33953   4 years ak19 Depositor link not used
(edit) @33952   4 years ak19 Minor changes for processing
(edit) @33951   4 years ak19 Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
(edit) @33950   4 years ak19 Reviewed the qualityLevel column where MIXED_TEXT was assigned.
(edit) @33949   4 years ak19 Reviewed the qualityLevel column where NAV was assigned.
(edit) @33948   4 years ak19 Reviewed the random sampled web page URLs marked as …
(edit) @33947   4 years ak19 Some more questionmarked field values assigned.
(edit) @33946   4 years ak19 1. New function to handle user input assigning the newly introduced …
(edit) @33945   4 years ak19 Added a 4th column for all 260 sample web page URLs and have used the …
(edit) @33944   4 years ak19 Added the isReallyInMRI column after manually inspecting the remaining …
(edit) @33943   4 years davidb Further tweaking of javah check after it failed to work on Bedrock LSB
(edit) @33942   4 years davidb Further tweaking of javah check after it failed to work on Bedrock LSB
(edit) @33941   4 years ak19 1. Uppercase 3rd field (Y/N/? field) read back in from file before …
(edit) @33940   4 years ak19 1. In order to make it easier to do the manual work of inspecting 260 …
(edit) @33939   4 years ak19 1. Old random samples file doesn't apply as we're not sampling by …
(edit) @33938   4 years ak19 1. Don't regenerate random sample of web page urls and full web page …
(edit) @33937   4 years ak19 New counts of manual sites after reingesting into MongoDB. Forgot to …
(edit) @33936   4 years ak19 Renaming old file to place with new counts after reingesting into MongoDB.
(edit) @33935   4 years davidb Additional check added into get-isis target
(edit) @33934   4 years davidb Removal of static code block calling ancient/deprecated static …
(edit) @33933   4 years davidb Changed 8-spaces to tab chars in Makefile.in. Original problem caused …
(edit) @33932   4 years davidb Commented out Java version warning message, as it presents as …
(edit) @33931   4 years davidb Two changes to setup file. The first was to move the test for ant to …
(edit) @33930   4 years davidb Code used to assume that major number was a single digit, as in 1.6 or …
(edit) @33929   4 years davidb Newer JDKs don't have javah => make file change that takes account of this
(edit) @33928   4 years davidb Streamlining of how test for JDK/javac is done
(edit) @33927   4 years davidb Reworking of javah test
(edit) @33926   4 years ak19 Investigated some other options for screen capturing and Google chrome …
(edit) @33925   4 years ak19 1. Bugfix: oversight, should return uri encoded URL for mapData, …
(edit) @33924   4 years ak19 Adding in Dr Bainbridge's command to check the JSON generated is …
(edit) @33923   4 years davidb Removed non-UTF8 valid char from comment; regenerated tar file
(edit) @33922   4 years davidb Notes about using this site
(edit) @33921   4 years davidb Newer Java's don't have 'javah' any more. The functionality has been …
(edit) @33920   4 years davidb Found to be needed when compiling up on a Google Compute Engine (GCE) …
(edit) @33919   4 years ak19 SummaryTool now uses the CountryCodeCountsMapData.java class to …
(edit) @33918   4 years ak19 Country codes added to each domain's URL of the manual site/domain …
(edit) @33917   4 years ak19 Added some better reporting when confirming sample size was correct
(edit) @33916   4 years ak19 Updated the rest of the file after reingest
(edit) @33915   4 years ak19 Forgot to add a (manual) counts file created last week, and am now …
(edit) @33914   4 years ak19 Shortlisted just the domain sites by country into ManualShortlist2.txt …
(edit) @33913   4 years ak19 1. Adjusted table mongodb query statements to be more exact, but same …
(edit) @33912   4 years ak19 Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
(edit) @33911   4 years ak19 Correct commit message for previous and current commit: 1. After …
(edit) @33910   4 years ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33909   4 years ak19 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
(edit) @33908   4 years kjdon meta values are already escaped. Don't want to escape them again …
(edit) @33907   4 years ak19 See previous commit message. This will be the file with the results …
(edit) @33906   4 years ak19 Code is intermediate state. 1. Introduced basicDomain field to MongoDB …
(edit) @33905   4 years ak19 More notes
(edit) @33904   4 years ak19 Shouldn't greylist anglican.org, as this prevented crawling of …
(edit) @33903   4 years ak19 My notes when preparing for today's meetings. Some of this may be …
(edit) @33902   4 years kjdon pass in new casefold and accentfold options to format_metadata_for_sorting
(edit) @33901   4 years kjdon new casefold_metadata_for_formatting and …
(edit) @33900   4 years kjdon BaseClassifier casefold/accentfold options
(edit) @33899   4 years kjdon pass in new casefold and accentfold options (BaseClassifier) to …
(edit) @33898   4 years kjdon format_metadata_for_sorting now takes two additional args - casefold …
(edit) @33897   4 years kjdon elsewhere in the code - GSXML.xmlSafe, we are escaping ' => ' we …
(edit) @33896   4 years ak19 Clarification in comments
(edit) @33895   4 years ak19 Minor rename
(edit) @33894   4 years ak19 1. Adding map, counts.json and geo-json files for 5b count of sites by …
(edit) @33893   4 years ak19 1. Left out region code column. 2. Two more sheets of work in progress …
(edit) @33892   4 years ak19 Sheets renamed and spreadsheet renamed
(edit) @33891   4 years ak19 Site level detected vs manual inspected data: working shown in file …
(edit) @33890   4 years ak19 Finished going through NZ sites listing of numPagesContainingMRI > 0 …
(edit) @33889   4 years ak19 1. Additional column: totalPagesAcrossMatchingSites. 2. Screengrab of …
(edit) @33888   4 years kjdon added propertyFile attribute to gsf:interfaceText so that you can …
(edit) @33887   4 years ak19 1. Added support for writing out tables in csv format too. 2. Second …