Timeline



14.03.2020:

23:30 Changeset [34073] by davidb
cwd and site need to be set earlier on
23:21 Changeset [34072] by davidb
Updated to add download link
23:20 Changeset [34071] by davidb
Script changed to work out the site-name from the pwd (in addition to col …
18:24 Changeset [34070] by davidb
Don't want this file under svn control for a collection that starts with …
18:18 Changeset [34069] by davidb
Update on files to ignore
18:14 Changeset [34068] by davidb
Standardizing on new name for this type of script
18:13 Changeset [34067] by davidb
Fixed typo, added pw hint
18:12 Changeset [34066] by davidb
Update on files to ignore
18:10 Changeset [34065] by davidb
Do not want these under svn control for a collection with an empty import …
18:09 Changeset [34064] by davidb
Edited to include pw hint
18:06 Changeset [34063] by davidb
Script rename
18:05 Changeset [34062] by davidb
Fixed typo in script
17:53 Changeset [34061] by davidb
Promoting the Alternative interface to now be the main Atea interface
17:51 Changeset [34060] by davidb
Deprecating this version of the Atea interface
17:48 Changeset [34059] by davidb
Moving the original localhost _LL.properties files out of the way for now; …
17:47 Changeset [34058] by davidb
To be used to hold the original localhost _LL.properties files
17:41 Changeset [34057] by davidb
Updated reference to glTF.png icon
17:40 Changeset [34056] by davidb
Moved from interface location, as referenced in the siteConfig.xml file
17:39 Changeset [34055] by davidb
files to ignore
17:38 Changeset [34054] by davidb
Directories to ignore
17:36 Changeset [34053] by davidb
Don't want these under SVN control in a collection that starts with no …
17:34 Changeset [34052] by davidb
Not working with GLIL, so having such a file will only increasingly get …
17:26 Changeset [34051] by davidb
Update on files to ignore
17:23 Changeset [34050] by davidb
Once the zip files are added in, do not want svn to report the files to …
17:19 Changeset [34049] by davidb
files to ignore
17:18 Changeset [34048] by davidb
directories to ignore
17:15 Changeset [34047] by davidb
Further directory to ignore
17:10 Changeset [34046] by davidb
Some directories to ignore
17:09 Changeset [34045] by davidb
Some files to ignore
17:08 Changeset [34044] by davidb
Some directories to ignore
17:07 Changeset [34043] by davidb
Some files to ignore
17:06 Changeset [34042] by davidb
Initial set of files
17:06 Changeset [34041] by davidb
Initial set of files
17:00 Changeset [34040] by davidb
Top-level folder for MP3s collection sourced from hemi-dl
17:00 Changeset [34039] by davidb
Top-level folder for PDF collection sourced from hemi-dl
16:54 Changeset [34038] by davidb
Run this script to populate the import folder
16:52 Changeset [34037] by davidb
No longer needed as will be formed by untarring import.tar.gz
16:52 Changeset [34036] by davidb
import folder with the result of running the NZ Digital API searching for …
16:49 Changeset [34035] by davidb
Directories to ignore
16:32 Changeset [34034] by davidb
Some files to ignore
16:32 Changeset [34033] by davidb
Solr schema.xml changes that have flowed through from changes in the …
16:25 Changeset [34032] by davidb
Files to ignore
16:24 Changeset [34031] by davidb
Dedicated scripts for this collection to build and activate it
16:23 Changeset [34030] by davidb
Ignore archives and index dirs
16:15 Changeset [34029] by davidb
Moved to atea site

13.03.2020:

23:19 Changeset [34028] by davidb
Tweaks to overall interface look-and-feel
23:19 Changeset [34027] by davidb
Tweaks to overall interface look-and-feel
23:18 Changeset [34026] by davidb
Used to provide the gray jquery-ui theme to Atea
23:16 Changeset [34025] by davidb
Icon to glTF 3D model/zip files
23:15 Changeset [34024] by davidb
Couple of overlooked files for the initial set of files for Global …
23:14 Changeset [34023] by davidb
Initial set of files for Global Digital Heritage glTF demonstration set of …
23:09 Changeset [34022] by davidb
Collection for demonstration VR model artefacts

12.03.2020:

17:22 Changeset [34021] by davidb
Tidy up on help/usage message
17:20 Changeset [34020] by davidb
Changed to using newer version (8.5.51) of Tomcat
15:04 Changeset [34019] by kjdon
replaced a couple of text strings
13:42 Changeset [34018] by kjdon
check for error element in response - add that in if present, instead of …
13:41 Changeset [34017] by kjdon
add error element, don't just print a message to log, if we have specified …
13:32 Changeset [34016] by kjdon
added cpan folder to @INC, as something is expecting to find JSON.pm - …

10.03.2020:

21:03 Changeset [34015] by davidb
Further elimination of PJ related HTML/templates
21:02 Changeset [34014] by davidb
Added in video player template; remove PJ templates
21:01 Changeset [34013] by davidb
Added hr line to break up sections
20:53 Changeset [34012] by davidb
Images for Atea alt interface
20:45 Changeset [34011] by ak19
Piechart data for sites prepared for crawling and the piecharts for these
20:45 Changeset [34010] by davidb
icon image for MP4 video
20:25 Changeset [34009] by davidb
PJ based alternative interface for Atea
20:17 Changeset [34008] by davidb
Alternative interface look-and-feel for the Atea project
19:56 Changeset [34007] by ak19
Prepared more data for the piecharts. This time for empty web pages vs …
18:51 Changeset [34006] by ak19
Committing more data I've collected for generating pie charts and the …
17:33 Changeset [34005] by ak19
InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead of …
17:27 Changeset [34004] by ak19
Renaming csv file to have csv extension
17:26 Changeset [34003] by ak19
Redid the file with info on empty URL web pages as a csv file with more …
12:09 Changeset [34002] by davidb
Comment-based changes resulting from: (i) merging in differences from the …

09.03.2020:

18:56 Changeset [34001] by ak19
Tentative total urls from common crawl 12 month crawl data.
18:55 Changeset [34000] by ak19
Some debugging and other minor changes
17:34 Changeset [33999] by ak19
Common crawl 12 month urls and CC provided stats

06.03.2020:

17:49 Changeset [33998] by davidb
Removed import statement that is no longer used, and was stopping …
15:55 Changeset [33997] by davidb
Top-level folder for MARS related Greenstone3 code
15:18 Changeset [33996] by ak19
Accidentally committed the wrong thing in previous commit. Attempting the …
15:14 Changeset [33995] by ak19
There was no Expat.so for perl 5.18 so am recompiling and committing that

03.03.2020:

14:42 Changeset [33994] by davidb
The introduction of UTF8Control class means we can now work directly with …

02.03.2020:

14:10 Changeset [33993] by kjdon
when downloading a pdf, browsers seem to make more than one request - …

01.03.2020:

16:41 Changeset [33992] by davidb
Notes at start of file updated
16:35 Changeset [33991] by davidb
A version of the tomcat/conf/server.xml file that is better aligned with …
16:29 Changeset [33990] by davidb
Some white-space changes for consistency with newer tomcat/conf/server.xml …
15:16 Changeset [33989] by davidb
In a default setup, AJP is not used => so not needed. Commented out to …

28.02.2020:

22:09 Changeset [33988] by ak19
1. Print out which web pages of which web site's dump.txt were empty. Then …
22:08 Changeset [33987] by ak19
Output of re-running NutchTextDumpToMongoDB to print out which web pages …
22:07 Changeset [33986] by ak19
Dr Bainbridge investigated the original data set more

27.02.2020:

21:49 Changeset [33985] by ak19
Data to back the piechart I need to make that will illustrate how we …
21:44 Changeset [33984] by ak19
Simple class to summarise some basic counts of the input common crawl data
20:26 Changeset [33983] by ak19
More sensible name for method which had too long kept its old name from …

26.02.2020:

21:59 Changeset [33982] by ak19
SummaryTool.java now processed the handcrafted UNIQUE domains counts file …
21:19 Changeset [33981] by ak19
As Dr Bainbridge suggested, code now opens a new firefox tab with a …
21:11 Changeset [33980] by ak19
Additional comments
21:00 Changeset [33979] by ak19
Clearly stating that counts are of unique domains
19:57 Changeset [33978] by ak19
Opens all geoJSON maps in new tabs instead of waiting for user to have …
18:37 Changeset [33977] by ak19
Added something on precision vs recall being applicable to our sampling …
18:28 Changeset [33976] by ak19
Adding in what I could remember of Dr Bainbridge's statement about the …

25.02.2020:

14:46 Changeset [33975] by kjdon
some mods to do with allowing multiple oaiservers. need …
14:14 Changeset [33974] by kjdon
added in new oai.servlets field - if you want to run two oaiservlets, add …
14:01 Changeset [33973] by kjdon
tidied up the file a bit. added new servlet_url param to oaiserver - used …
13:47 Changeset [33972] by kjdon
fixed a typo in a comment
13:47 Changeset [33971] by kjdon
get servlet_url param and pass to getOAIConfigXML, as now the files are …
13:46 Changeset [33970] by kjdon
changed OAIConfig naming to OAIConfig-oaiserver.xml - so multiple versions …
13:39 Changeset [33969] by kjdon
we no longer use OAIConfig.xml as the filename, now we use eg …
13:37 Changeset [33968] by kjdon
pass in oai_config from server, rather than reading it in itself
13:36 Changeset [33967] by kjdon
you might want to change the oaiserver url, eg if you have 2 oai servers, …

21.02.2020:

21:00 Changeset [33966] by ak19
Added the origSequence and basicDomain columns to the random 260 web page …
20:59 Changeset [33965] by ak19
1. Adding a basicDomain column (stripped of http/https and www prefix) for …
19:57 Changeset [33964] by ak19
2 records were missing a value for the qualityLevel column.

20.02.2020:

22:12 Changeset [33963] by ak19
Added a new helper method to MongoDBQueryer.java to add numPagesInMRI and …
22:07 Changeset [33962] by ak19
2 fields changed, as one was missed out and the other incorrectly entered. …
20:24 Changeset [33961] by ak19
New category, LINK_TEXT, introduced for the random web page URL samples.
20:22 Changeset [33960] by ak19
Reviewed all the random sample web page URLs marked SINGLE_MRI_SENTENCE …
20:06 Changeset [33959] by ak19
URIEncoding the mapData makes it unparseable by geojson.io
19:32 Changeset [33958] by ak19
There were other xsl files using the original depositorTitleAndLink …
19:24 Changeset [33957] by ak19
1. depositor related interface display modified to work with recent …
18:28 Changeset [33956] by ak19
Related to commit 33953: made lots of accidental commits in rev 33953, and …
18:26 Changeset [33955] by ak19
Undoing accidental commit of unintended files.
18:21 Changeset [33954] by ak19
Accidentally committed with other files. Undoing.
18:19 Changeset [33953] by ak19
Depositor link not used

18.02.2020:

23:35 Changeset [33952] by ak19
Minor changes for processing
23:33 Changeset [33951] by ak19
Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
23:28 Changeset [33950] by ak19
Reviewed the qualityLevel column where MIXED_TEXT was assigned.
23:22 Changeset [33949] by ak19
Reviewed the qualityLevel column where NAV was assigned.
22:56 Changeset [33948] by ak19
Reviewed the random sampled web page URLs marked as SIGNIFICANTLY_MAORI …
22:07 Changeset [33947] by ak19
Some more questionmarked field values assigned.
21:58 Changeset [33946] by ak19
1. New function to handle user input assigning the newly introduced 4th …
21:48 Changeset [33945] by ak19
Added a 4th column for all 260 sample web page URLs and have used the …
16:44 Changeset [33944] by ak19
Added the isReallyInMRI column after manually inspecting the remaining 70 …
15:56 Changeset [33943] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:55 Changeset [33942] by davidb
Further tweaking of javah check after it failed to work on Bedrock LSB
15:18 Changeset [33941] by ak19
1. Uppercase 3rd field (Y/N/? field) read back in from file before being …

17.02.2020:

22:16 Changeset [33940] by ak19
1. In order to make it easier to do the manual work of inspecting 260 web …
16:22 Changeset [33939] by ak19
1. Old random samples file doesn't apply as we're not sampling by country …
16:10 Changeset [33938] by ak19
1. Don't regenerate random sample of web page urls and full web page url …
16:06 Changeset [33937] by ak19
New counts of manual sites after reingesting into MongoDB. Forgot to …
16:05 Changeset [33936] by ak19
Renaming old file to place with new counts after reingesting into MongoDB.

16.02.2020:

18:16 Changeset [33935] by davidb
Additional check added into get-isis target
17:34 Changeset [33934] by davidb
Removal of static code block calling ancient/deprecated static …
14:19 Changeset [33933] by davidb
Changed 8-spaces to tab chars in Makefile.in. Original problem caused by …

15.02.2020:

19:14 Changeset [33932] by davidb
Commented out Java version warning message, as it presents as something …
19:10 Changeset [33931] by davidb
Two changes to setup file. The first was to move the test for ant to be …
19:00 Changeset [33930] by davidb
Code used to assume that major number was a single digit, as in 1.6 or …
18:57 Changeset [33929] by davidb
Newer JDKs don't have javah => make file change that takes account of this
18:55 Changeset [33928] by davidb
Streamlining of how test for JDK/javac is done
14:57 Changeset [33927] by davidb
Reworking of javah test

14.02.2020:

23:03 Changeset [33926] by ak19
Investigated some other options for screen capturing and Google chrome …
20:41 Changeset [33925] by ak19
1. Bugfix: oversight, should return uri encoded URL for mapData, forgot to …
19:22 Changeset [33924] by ak19
Adding in Dr Bainbridge's command to check the JSON generated is valid. …
18:45 Changeset [33923] by davidb
Removed non-UTF8 valid char from comment; regenerated tar file
18:13 Changeset [33922] by davidb
Notes about using this site
18:11 Changeset [33921] by davidb
Newer Java versions don't have 'javah' any more. The functionality has been …
16:55 Changeset [33920] by davidb
Found to be needed when compiling up on a Google Compute Engine (GCE) …

13.02.2020:

22:40 Changeset [33919] by ak19
SummaryTool now uses the CountryCodeCountsMapData.java class to generate …
19:34 Changeset [33918] by ak19
Country codes added to each domain's URL of the manual site/domain …
18:18 Changeset [33917] by ak19
Added some better reporting when confirming sample size was correct
17:42 Changeset [33916] by ak19
Updated the rest of the file after reingest
17:12 Changeset [33915] by ak19
Forgot to add a (manual) counts file created last week, and am now …
17:09 Changeset [33914] by ak19
Shortlisted just the domain sites by country into ManualShortlist2.txt …