Timeline



03/10/20:

21:03 Changeset [34015] by davidb
Further elimination of PJ related HTML/templates
21:02 Changeset [34014] by davidb
Added in video player template; removed PJ templates
21:01 Changeset [34013] by davidb
Added hr line to break up sections
20:53 Changeset [34012] by davidb
Images for Atea alt interface
20:45 Changeset [34011] by ak19
Piechart data for sites prepared for crawling and the piecharts for these
20:45 Changeset [34010] by davidb
icon image for MP4 video
20:25 Changeset [34009] by davidb
PJ based alternative interface for Atea
20:17 Changeset [34008] by davidb
Alternative interface look-and-feel for the Atea project
19:56 Changeset [34007] by ak19
Prepared more data for the piecharts. This time for empty web pages vs …
18:51 Changeset [34006] by ak19
Committing more data I've collected for generating pie charts and the …
17:33 Changeset [34005] by ak19
InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead …
17:27 Changeset [34004] by ak19
Renaming csv file to have csv extension
17:26 Changeset [34003] by ak19
Redid the file with info on empty URL web pages as a csv file with …
12:09 Changeset [34002] by davidb
Comment-based changes resulting from: (i) merging in differences from …

03/09/20:

18:56 Changeset [34001] by ak19
Tentative total urls from common crawl 12 month crawl data.
18:55 Changeset [34000] by ak19
Some debugging and other minor changes
17:34 Changeset [33999] by ak19
Common crawl 12 month urls and CC provided stats

03/06/20:

17:49 Changeset [33998] by davidb
Removed import statement that is no longer used, and was stopping …
15:55 Changeset [33997] by davidb
Top-level folder for MARS related Greenstone3 code
15:18 Changeset [33996] by ak19
Accidentally committed the wrong thing in previous commit. Attempting …
15:14 Changeset [33995] by ak19
There was no Expat.so for perl 5.18 so am recompiling and committing that

03/03/20:

14:42 Changeset [33994] by davidb
The introduction of UTF8Control class means we can now work directly …

03/02/20:

14:10 Changeset [33993] by kjdon
when downloading a pdf, browsers seem to make more than one request - …

03/01/20:

16:41 Changeset [33992] by davidb
Notes at start of file updated
16:35 Changeset [33991] by davidb
A version of the tomcat/conf/server.xml file that is better aligned …
16:29 Changeset [33990] by davidb
Some white-space changes for consistency with newer …
15:16 Changeset [33989] by davidb
In a default setup, AJP is not used => so not needed. Commented out to …

02/28/20:

22:09 Changeset [33988] by ak19
1. Print out which web pages of which web site's dump.txt were empty. …
22:08 Changeset [33987] by ak19
Output of re-running NutchTextDumpToMongoDB to print out which web …
22:07 Changeset [33986] by ak19
Dr Bainbridge investigated the original data set more

02/27/20:

21:49 Changeset [33985] by ak19
Data to back the piechart I need to make that will illustrate how we …
21:44 Changeset [33984] by ak19
Simple class to summarise some basic counts of the input common crawl data
20:26 Changeset [33983] by ak19
More sensible name for method which had too long kept its old name …

02/26/20:

21:59 Changeset [33982] by ak19
SummaryTool.java now processed the handcrafted UNIQUE domains counts …
21:19 Changeset [33981] by ak19
As Dr Bainbridge suggested, code now opens a new firefox tab with a …
21:11 Changeset [33980] by ak19
Additional comments
21:00 Changeset [33979] by ak19
Clearly stating that counts are of unique domains
19:57 Changeset [33978] by ak19
Opens all geoJSON maps in new tabs instead of waiting for user to have …
18:37 Changeset [33977] by ak19
Added something on precision vs recall being applicable to our …
18:28 Changeset [33976] by ak19
Adding in what I could remember of Dr Bainbridge's statement about the …

02/25/20:

14:46 Changeset [33975] by kjdon
some mods to do with allowing multiple oaiservers. need …
14:14 Changeset [33974] by kjdon
added in new oai.servlets field - if you want to run two oaiservlets, …
14:01 Changeset [33973] by kjdon
tidied up the file a bit. added new servlet_url param to oaiserver - …
13:47 Changeset [33972] by kjdon
fixed a typo in a comment
13:47 Changeset [33971] by kjdon
get servlet_url param and pass to getOAIConfigXML, as now the files …
13:46 Changeset [33970] by kjdon
changed OAIConfig naming to OAIConfig-oaiserver.xml - so multiple …
13:39 Changeset [33969] by kjdon
we no longer use OAIConfig.xml as the filename, now we use eg …
13:37 Changeset [33968] by kjdon
pass in oai_config from server, rather than reading it in itself
13:36 Changeset [33967] by kjdon
you might want to change the oaiserver url, eg if you have 2 oai …

02/21/20:

21:00 Changeset [33966] by ak19
Added the origSequence and basicDomain columns to the random 260 web …
20:59 Changeset [33965] by ak19
1. Adding a basicDomain column (stripped of http/https and www prefix) …
19:57 Changeset [33964] by ak19
2 records were missing a value for the qualityLevel column.

02/20/20:

22:12 Changeset [33963] by ak19
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
22:07 Changeset [33962] by ak19
2 fields changed, as one was missed out and the other incorrectly …
20:24 Changeset [33961] by ak19
New category, LINK_TEXT, introduced for the random web page URL samples.
20:22 Changeset [33960] by ak19
Reviewed all the random sample web page URLs marked …
20:06 Changeset [33959] by ak19
URIEncoding the mapData makes it unparseable by geojson.io
19:32 Changeset [33958] by ak19
There were other xsl files using the original depositorTitleAndLink …
19:24 Changeset [33957] by ak19
1. depositor related interface display modified to work with recent …
18:28 Changeset [33956] by ak19
Related to commit 33953: made lots of accidental commits in rev 33953, …
18:26 Changeset [33955] by ak19
Undoing accidental commit of unintended files.
18:21 Changeset [33954] by ak19
Accidentally committed with other files. Undoing.
18:19 Changeset [33953] by ak19
Depositor link not used

02/18/20:

23:35 Changeset [33952] by ak19
Minor changes for processing
23:33 Changeset [33951] by ak19
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
23:28 Changeset [33950] by ak19
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
23:22 Changeset [33949] by ak19
Reviewed the qualityLevel column where NAV was assigned.
22:56 Changeset [33948] by ak19
Reviewed the random sampled web page URLs marked as …
22:07 Changeset [33947] by ak19
Some more questionmarked field values assigned.
21:58 Changeset [33946] by ak19
1. New function to handle user input assigning the newly introduced …
21:48 Changeset [33945] by ak19
Added a 4th column for all 260 sample web page URLs and have used the …
16:44 Changeset [33944] by ak19
Added the isReallyInMRI column after manually inspecting the remaining …
15:56 Changeset [33943] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:55 Changeset [33942] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:18 Changeset [33941] by ak19
1. Uppercase 3rd field (Y/N/? field) read back in from file before …

02/17/20:

22:16 Changeset [33940] by ak19
1. In order to make it easier to do the manual work of inspecting 260 …
16:22 Changeset [33939] by ak19
1. Old random samples file doesn't apply as we're not sampling by …
16:10 Changeset [33938] by ak19
1. Don't regenerate random sample of web page urls and full web page …
16:06 Changeset [33937] by ak19
New counts of manual sites after reingesting into MongoDB. Forgot to …
16:05 Changeset [33936] by ak19
Renaming old file to place with new counts after reingesting into MongoDB.

02/16/20:

18:16 Changeset [33935] by davidb
Additional check added into get-isis target
17:34 Changeset [33934] by davidb
Removal of static code block calling ancient/deprecated static …
14:19 Changeset [33933] by davidb
Changed 8-spaces to tab chars in Makefile.in. Original problem caused …

02/15/20:

19:14 Changeset [33932] by davidb
Commented out Java version warning message, as it presents as …
19:10 Changeset [33931] by davidb
Two changes to setup file. The first was to move the test for ant to …
19:00 Changeset [33930] by davidb
Code used to assume that major number was a single digit, as in 1.6 or …
18:57 Changeset [33929] by davidb
Newer JDKs don't have javah => make file change that takes account of this
18:55 Changeset [33928] by davidb
Streamlining of how test for JDK/javac is done
14:57 Changeset [33927] by davidb
Reworking of javah test

02/14/20:

23:03 Changeset [33926] by ak19
Investigated some other options for screen capturing and Google chrome …
20:41 Changeset [33925] by ak19
1. Bugfix: oversight, should return uri encoded URL for mapData, …
19:22 Changeset [33924] by ak19
Adding in Dr Bainbridge's command to check the JSON generated is …
18:45 Changeset [33923] by davidb
Removed non-UTF8 valid char from comment; regenerated tar file
18:13 Changeset [33922] by davidb
Notes about using this site
18:11 Changeset [33921] by davidb
Newer Java versions don't have 'javah' any more. The functionality has been …
16:55 Changeset [33920] by davidb
Found to be needed when compiling up on a Google Compute Engine (GCE) …

02/13/20:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …

02/12/20:

21:27 Changeset [33913] by ak19
1. Adjusted table mongodb query statements to be more exact, but same …
19:53 Changeset [33912] by ak19
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
19:12 Changeset [33911] by ak19
Correct commit message for previous and current commit: 1. After …
19:05 Changeset [33910] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
19:02 Changeset [33909] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …

02/10/20:

09:41 Changeset [33908] by kjdon
meta values are already escaped. Don't want to escape them again …