Timeline
2020-03-10:
- 21:03 Changeset [34015] by
- Further elimination of PJ related HTML/templates
- 21:02 Changeset [34014] by
- Added in video player template; remove PJ templates
- 21:01 Changeset [34013] by
- Added hr line to break up sections
- 20:53 Changeset [34012] by
- Images for Atea alt interface
- 20:45 Changeset [34011] by
- Piechart data for sites prepared for crawling and the piecharts for these
- 20:45 Changeset [34010] by
- icon image for MP4 video
- 20:25 Changeset [34009] by
- PJ based alternative interface for Atea
- 20:17 Changeset [34008] by
- Alternative interface look-and-feel for the Atea project
- 19:56 Changeset [34007] by
- Prepared more data for the piecharts. This time for empty web pages vs …
- 18:51 Changeset [34006] by
- Committing more data I've collected for generating pie charts and the …
- 17:33 Changeset [34005] by
- InfoOnEmptyPagesNotInMongoDB.txt is now written out to a file, instead …
- 17:27 Changeset [34004] by
- Renaming csv file to have csv extension
- 17:26 Changeset [34003] by
- Redid the file with info on empty URL web pages as a csv file with …
- 12:09 Changeset [34002] by
- Comment-based changes resulting from: (i) merging in differences from …
2020-03-09:
- 18:56 Changeset [34001] by
- Tentative total urls from common crawl 12 month crawl data.
- 18:55 Changeset [34000] by
- Some debugging and other minor changes
- 17:34 Changeset [33999] by
- Common crawl 12 month urls and CC provided stats
2020-03-06:
- 17:49 Changeset [33998] by
- Removed import statement that is no longer used, and was stopping …
- 15:55 Changeset [33997] by
- Top-level folder for MARS related Greenstone3 code
- 15:18 Changeset [33996] by
- Accidentally committed the wrong thing in previous commit. Attempting …
- 15:14 Changeset [33995] by
- There was no Expat.so for perl 5.18 so am recompiling and committing that
2020-03-03:
- 14:42 Changeset [33994] by
- The introduction of UTF8Control class means we can now work directly …
2020-03-02:
- 14:10 Changeset [33993] by
- when downloading a pdf, browsers seem to make more than one request - …
2020-03-01:
- 16:41 Changeset [33992] by
- Notes at start of file updated
- 16:35 Changeset [33991] by
- A version of the tomcat/conf/server.xml file that is better aligned …
- 16:29 Changeset [33990] by
- Some white-space changes for consistency with newer …
- 15:16 Changeset [33989] by
- In a default setup, AJP is not used => so not needed. Commented out to …
2020-02-28:
- 22:09 Changeset [33988] by
- 1. Print out which web pages of which web site's dump.txt were empty. …
- 22:08 Changeset [33987] by
- Output of re-running NutchTextDumpToMongoDB to print out which web …
- 22:07 Changeset [33986] by
- Dr Bainbridge investigated the original data set more
2020-02-27:
- 21:49 Changeset [33985] by
- Data to back the piechart I need to make that will illustrate how we …
- 21:44 Changeset [33984] by
- Simple class to summarise some basic counts of the input common crawl data
- 20:26 Changeset [33983] by
- More sensible name for method which had too long kept its old name …
2020-02-26:
- 21:59 Changeset [33982] by
- SummaryTool.java now processes the handcrafted UNIQUE domains counts …
- 21:19 Changeset [33981] by
- As Dr Bainbridge suggested, code now opens a new firefox tab with a …
- 21:11 Changeset [33980] by
- Additional comments
- 21:00 Changeset [33979] by
- Clearly stating that counts are of unique domains
- 19:57 Changeset [33978] by
- Opens all geoJSON maps in new tabs instead of waiting for user to have …
- 18:37 Changeset [33977] by
- Added something on precision vs recall being applicable to our …
- 18:28 Changeset [33976] by
- Adding in what I could remember of Dr Bainbridge's statement about the …
2020-02-25:
- 14:46 Changeset [33975] by
- some mods to do with allowing multiple oaiservers. need …
- 14:14 Changeset [33974] by
- added in new oai.servlets field - if you want to run two oaiservlets, …
- 14:01 Changeset [33973] by
- tidied up the file a bit. added new servlet_url param to oaiserver - …
- 13:47 Changeset [33972] by
- fixed a typo in a comment
- 13:47 Changeset [33971] by
- get servlet_url param and pass to getOAIConfigXML, as now the files …
- 13:46 Changeset [33970] by
- changed OAIConfig naming to OAIConfig-oaiserver.xml - so multiple …
- 13:39 Changeset [33969] by
- we no longer use OAIConfig.xml as the filename, now we use eg …
- 13:37 Changeset [33968] by
- pass in oai_config from server, rather than reading it in itself
- 13:36 Changeset [33967] by
- you might want to change the oaiserver url, eg if you have 2 oai …
2020-02-21:
- 21:00 Changeset [33966] by
- Added the origSequence and basicDomain columns to the random 260 web …
- 20:59 Changeset [33965] by
- 1. Adding a basicDomain column (stripped of http/https and www prefix) …
- 19:57 Changeset [33964] by
- 2 records were missing a value for the qualityLevel column.
2020-02-20:
- 22:12 Changeset [33963] by
- Added a new helper method to MongoDBQueryer.java to add numPagesInMRI …
- 22:07 Changeset [33962] by
- 2 fields changed, as one was missed out and the other incorrectly …
- 20:24 Changeset [33961] by
- New category, LINK_TEXT, introduced for the random web page URL samples.
- 20:22 Changeset [33960] by
- Reviewed all the random sample web page URLs marked …
- 20:06 Changeset [33959] by
- URIEncoding the mapData makes it unparseable by geojson.io
- 19:32 Changeset [33958] by
- There were other xsl files using the original depositorTitleAndLink …
- 19:24 Changeset [33957] by
- 1. depositor related interface display modified to work with recent …
- 18:28 Changeset [33956] by
- Related to commit 33953: made lots of accidental commits in rev 33953, …
- 18:26 Changeset [33955] by
- Undoing accidental commit of unintended files.
- 18:21 Changeset [33954] by
- Accidentally committed with other files. Undoing.
- 18:19 Changeset [33953] by
- Depositor link not used
2020-02-18:
- 23:35 Changeset [33952] by
- Minor changes for processing
- 23:33 Changeset [33951] by
- Reviewed the qualityLevel column where LITTLE_TEXT was assigned.
- 23:28 Changeset [33950] by
- Reviewed the qualityLevel column where MIXED_TEXT was assigned.
- 23:22 Changeset [33949] by
- Reviewed the qualityLevel column where NAV was assigned.
- 22:56 Changeset [33948] by
- Reviewed the random sampled web page URLs marked as …
- 22:07 Changeset [33947] by
- Some more questionmarked field values assigned.
- 21:58 Changeset [33946] by
- 1. New function to handle user input assigning the newly introduced …
- 21:48 Changeset [33945] by
- Added a 4th column for all 260 sample web page URLs and have used the …
- 16:44 Changeset [33944] by
- Added the isReallyInMRI column after manually inspecting the remaining …
- 15:56 Changeset [33943] by
- Further tweaking of javah check after it failed to work on Bedrock LSB
- 15:55 Changeset [33942] by
- Further tweaking of javah check after it failed to work on Bedrock LSB
- 15:18 Changeset [33941] by
- 1. Uppercase 3rd field (Y/N/? field) read back in from file before …
2020-02-17:
- 22:16 Changeset [33940] by
- 1. In order to make it easier to do the manual work of inspecting 260 …
- 16:22 Changeset [33939] by
- 1. Old random samples file doesn't apply as we're not sampling by …
- 16:10 Changeset [33938] by
- 1. Don't regenerate random sample of web page urls and full web page …
- 16:06 Changeset [33937] by
- New counts of manual sites after reingesting into MongoDB. Forgot to …
- 16:05 Changeset [33936] by
- Renaming old file to place with new counts after reingesting into MongoDB.
2020-02-16:
- 18:16 Changeset [33935] by
- Additional check added into get-isis target
- 17:34 Changeset [33934] by
- Removal of static code block calling ancient/deprecated static …
- 14:19 Changeset [33933] by
- Changed 8-spaces to tab chars in Makefile.in. Original problem caused …
2020-02-15:
- 19:14 Changeset [33932] by
- Commented out Java version warning message, as it presents as …
- 19:10 Changeset [33931] by
- Two changes to setup file. The first was to move the test for ant to …
- 19:00 Changeset [33930] by
- Code used to assume that major number was a single digit, as in 1.6 or …
- 18:57 Changeset [33929] by
- Newer JDKs don't have javah => make file change that takes account of this
- 18:55 Changeset [33928] by
- Streamlining of how test for JDK/javac is done
- 14:57 Changeset [33927] by
- Reworking of javah test
2020-02-14:
- 23:03 Changeset [33926] by
- Investigated some other options for screen capturing and Google chrome …
- 20:41 Changeset [33925] by
- 1. Bugfix: oversight, should return uri encoded URL for mapData, …
- 19:22 Changeset [33924] by
- Adding in Dr Bainbridge's command to check the JSON generated is …
- 18:45 Changeset [33923] by
- Removed non-UTF8 valid char from comment; regenerated tar file
- 18:13 Changeset [33922] by
- Notes about using this site
- 18:11 Changeset [33921] by
- Newer Java versions don't have 'javah' any more. The functionality has been …
- 16:55 Changeset [33920] by
- Found to be needed when compiling up on a Google Compute Engine (GCE) …
2020-02-13:
- 22:40 Changeset [33919] by
- SummaryTool now uses the CountryCodeCountsMapData.java class to …
- 19:34 Changeset [33918] by
- Country codes added to each domain's URL of the manual site/domain …
- 18:18 Changeset [33917] by
- Added some better reporting when confirming sample size was correct
- 17:42 Changeset [33916] by
- Updated the rest of the file after reingest
- 17:12 Changeset [33915] by
- Forgot to add a (manual) counts file created last week, and am now …
- 17:09 Changeset [33914] by
- Shortlisted just the domain sites by country into ManualShortlist2.txt …
2020-02-12:
- 21:27 Changeset [33913] by
- 1. Adjusted table mongodb query statements to be more exact, but same …
- 19:53 Changeset [33912] by
- Forgot to svn add the new MongoDBQueryer.java class with commit 33909. …
- 19:12 Changeset [33911] by
- Correct commit message for previous and current commit: 1. After …
- 19:05 Changeset [33910] by
- 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
- 19:02 Changeset [33909] by
- 1. Implementing tables 3 to 5. 2. Rolled back the introduction of the …
2020-02-10:
- 09:41 Changeset [33908] by
- meta values are already escaped. Don't want to escape them again …