Timeline


06.03.2020:

17:49 Changeset [33998] by davidb
Removed import statement that is no longer used, and was stopping …
15:55 Changeset [33997] by davidb
Top-level folder for MARS related Greenstone3 code
15:18 Changeset [33996] by ak19
Accidentally committed the wrong thing in previous commit. Attempting the …
15:14 Changeset [33995] by ak19
There was no Expat.so for perl 5.18 so am recompiling and committing that

03.03.2020:

14:42 Changeset [33994] by davidb
The introduction of UTF8Control class means we can now work directly with …

02.03.2020:

14:10 Changeset [33993] by kjdon
when downloading a pdf, browsers seem to make more than one request - …

01.03.2020:

16:41 Changeset [33992] by davidb
Notes at start of file updated
16:35 Changeset [33991] by davidb
A version of the tomcat/conf/server.xml file that is better aligned with …
16:29 Changeset [33990] by davidb
Some white-space changes for consistency with newer tomcat/conf/server.xml …
15:16 Changeset [33989] by davidb
In a default setup, AJP is not used => so not needed. Commented out to …

28.02.2020:

22:09 Changeset [33988] by ak19
1. Print out which web pages of which web site's dump.txt were empty. Then …
22:08 Changeset [33987] by ak19
Output of re-running NutchTextDumpToMongoDB to print out which web pages …
22:07 Changeset [33986] by ak19
Dr Bainbridge investigated the original data set more

27.02.2020:

21:49 Changeset [33985] by ak19
Data to back the piechart I need to make that will illustrate how we …
21:44 Changeset [33984] by ak19
Simple class to summarise some basic counts of the input common crawl data
20:26 Changeset [33983] by ak19
More sensible name for method which had too long kept its old name from …

26.02.2020:

21:59 Changeset [33982] by ak19
SummaryTool.java now processed the handcrafted UNIQUE domains counts file …
21:19 Changeset [33981] by ak19
As Dr Bainbridge suggested, code now opens a new firefox tab with a …
21:11 Changeset [33980] by ak19
Additional comments
21:00 Changeset [33979] by ak19
Clearly stating that counts are of unique domains
19:57 Changeset [33978] by ak19
Opens all geoJSON maps in new tabs instead of waiting for user to have …
18:37 Changeset [33977] by ak19
Added something on precision vs recall being applicable to our sampling …
18:28 Changeset [33976] by ak19
Adding in what I could remember of Dr Bainbridge's statement about the …

25.02.2020:

14:46 Changeset [33975] by kjdon
some mods to do with allowing multiple oaiservers. need …
14:14 Changeset [33974] by kjdon
added in new oai.servlets field - if you want to run two oaiservlets, add …
14:01 Changeset [33973] by kjdon
tidied up the file a bit. added new servlet_url param to oaiserver - used …
13:47 Changeset [33972] by kjdon
fixed a typo in a comment
13:47 Changeset [33971] by kjdon
get servlet_url param and pass to getOAIConfigXML, as now the files are …
13:46 Changeset [33970] by kjdon
changed OAIConfig naming to OAIConfig-oaiserver.xml - so multiple versions …
13:39 Changeset [33969] by kjdon
we no longer use OAIConfig.xml as the filename, now we use eg …
13:37 Changeset [33968] by kjdon
pass in oai_config from server, rather than reading it in itself
13:36 Changeset [33967] by kjdon
you might want to change the oaiserver url, eg if you have 2 oai servers, …

21.02.2020:

21:00 Changeset [33966] by ak19
Added the origSequence and basicDomain columns to the random 260 web page …
20:59 Changeset [33965] by ak19
1. Adding a basicDomain column (stripped of http/https and www prefix) for …
19:57 Changeset [33964] by ak19
2 records were missing a value for the qualityLevel column.

20.02.2020:

22:12 Changeset [33963] by ak19
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI and …
22:07 Changeset [33962] by ak19
2 fields changed, as one was missed out and the other incorrectly entered. …
20:24 Changeset [33961] by ak19
New category, LINK_TEXT, introduced for the random web page URL samples.
20:22 Changeset [33960] by ak19
Reviewed all the random sample web page URLs marked SINGLE_MRI_SENTENCE …
20:06 Changeset [33959] by ak19
URIEncoding the mapData makes it unparseable by geojson.io
19:32 Changeset [33958] by ak19
There were other xsl files using the original depositorTitleAndLink …
19:24 Changeset [33957] by ak19
1. depositor related interface display modified to work with recent …
18:28 Changeset [33956] by ak19
Related to commit 33953: made lots of accidental commits in rev 33953, and …
18:26 Changeset [33955] by ak19
Undoing accidental commit of unintended files.
18:21 Changeset [33954] by ak19
Accidentally committed with other files. Undoing.
18:19 Changeset [33953] by ak19
Depositor link not used

18.02.2020:

23:35 Changeset [33952] by ak19
Minor changes for processing
23:33 Changeset [33951] by ak19
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
23:28 Changeset [33950] by ak19
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
23:22 Changeset [33949] by ak19
Reviewed the qualityLevel column where NAV was assigned.
22:56 Changeset [33948] by ak19
Reviewed the random sampled web page URLs marked as SIGNIFICANTLY_MAORI …
22:07 Changeset [33947] by ak19
Some more questionmarked field values assigned.
21:58 Changeset [33946] by ak19
1. New function to handle user input assigning the newly introduced 4th …
21:48 Changeset [33945] by ak19
Added a 4th column for all 260 sample web page URLs and have used the …
16:44 Changeset [33944] by ak19
Added the isReallyInMRI column after manually inspecting the remaining 70 …
15:56 Changeset [33943] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:55 Changeset [33942] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:18 Changeset [33941] by ak19
1. Uppercase 3rd field (Y/N/? field) read back in from file before being …

17.02.2020:

22:16 Changeset [33940] by ak19
1. In order to make it easier to do the manual work of inspecting 260 web …
16:22 Changeset [33939] by ak19
1. Old random samples file doesn't apply as we're not sampling by country …
16:10 Changeset [33938] by ak19
1. Don't regenerate random sample of web page urls and full web page url …
16:06 Changeset [33937] by ak19
New counts of manual sites after reingesting into MongoDB. Forgot to …
16:05 Changeset [33936] by ak19
Renaming old file to place with new counts after reingesting into MongoDB.

16.02.2020:

18:16 Changeset [33935] by davidb
Additional check added into get-isis target
17:34 Changeset [33934] by davidb
Removal of static code block calling ancient/deprecated static …
14:19 Changeset [33933] by davidb
Changed 8-spaces to tab chars in Makefile.in. Original problem caused by …

15.02.2020:

19:14 Changeset [33932] by davidb
Commented out Java version warning message, as it presents as something …
19:10 Changeset [33931] by davidb
Two changes to setup file. The first was to move the test for ant to be …
19:00 Changeset [33930] by davidb
Code used to assume that major number was a single digit, as in 1.6 or …
18:57 Changeset [33929] by davidb
Newer JDKs don't have javah => make file change that takes account of this
18:55 Changeset [33928] by davidb
Streamlining of how test for JDK/javac is done
14:57 Changeset [33927] by davidb
Reworking of javah test

14.02.2020:

23:03 Changeset [33926] by ak19
Investigated some other options for screen capturing and Google chrome …
20:41 Changeset [33925] by ak19
1. Bugfix: oversight, should return uri encoded URL for mapData, forgot to …
19:22 Changeset [33924] by ak19
Adding in Dr Bainbridge's command to check the JSON generated is valid. …
18:45 Changeset [33923] by davidb
Removed non-UTF8 valid char from comment; regenerated tar file
18:13 Changeset [33922] by davidb
Notes about using this site
18:11 Changeset [33921] by davidb
Newer Javas don't have 'javah' any more. The functionality has been …
16:55 Changeset [33920] by davidb
Found to be needed when compiling up on a Google Compute Engine (GCE) …

13.02.2020:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to generate …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …

12.02.2020:

21:27 Changeset [33913] by ak19
1. Adjusted table mongodb query statements to be more exact, but same …
19:53 Changeset [33912] by ak19
Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
19:12 Changeset [33911] by ak19
Correct commit message for previous and current commit: 1. After …
19:05 Changeset [33910] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
19:02 Changeset [33909] by ak19
1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …

10.02.2020:

09:41 Changeset [33908] by kjdon
meta values are already escaped. Don't want to escape them again otherwise …

05.02.2020:

23:38 Changeset [33907] by ak19
See previous commit message. This will be the file with the results for …
23:36 Changeset [33906] by ak19
Code is intermediate state. 1. Introduced basicDomain field to MongoDB and …
18:49 Changeset [33905] by ak19
More notes
18:48 Changeset [33904] by ak19
Shouldn't greylist anglican.org, as this prevented crawling of …