Timeline
2019-09-27:
- 17:05 Changeset [33534] by
- Correction: toplevel script has to be placed inside cc-index-table not …
- 11:02 Changeset [33533] by
- some collections might not have Title or root_Title metadata, so check …
2019-09-26:
- 23:06 Changeset [33532] by
- Found the other top 500 sites link again at last which Dr Bainbridge …
- 23:03 Changeset [33531] by
- Added whitelist for mi.wikipedia.org, and updates to blacklist and …
- 22:41 Changeset [33530] by
- Completed sentence that was left hanging.
- 22:22 Changeset [33529] by
- Forgot to add most basic nutch links
- 21:47 Changeset [33528] by
- Adding in Nutch links
- 20:39 Changeset [33527] by
- Name change for folder
- 20:38 Changeset [33526] by
- Moved hadoop related scripts from bin/script into hdfs-instructions
- 20:35 Changeset [33525] by
- Rename before latest version
- 20:34 Changeset [33524] by
- 1. Further adjustments to documenting what we did to get things to run …
- 19:00 Changeset [33523] by
- Instructional comment
- 19:00 Changeset [33522] by
- Some comments and an improvement
- 17:49 Changeset [33521] by
- AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
- 17:49 Changeset [33520] by
- AUTOCOMMIT by gen-model-colls.sh script. Message: Redoing the CDS-ISIS …
2019-09-24:
- 21:40 Changeset [33519] by
- Code still writes out the global seedURLs.txt and regex-urlfilter.txt …
- 21:13 Changeset [33518] by
- Intermediate commit: got the seed urls file temporarily written out as …
- 20:30 Changeset [33517] by
- 1. Blacklists were introduced so that too many instances of camelcased …
- 20:14 Changeset [33516] by
- Before I accidentally lose it, committing the script Dr Bainbridge …
- 19:50 Changeset [33515] by
- Removed an unused function
- 19:44 Changeset [33514] by
- Committing README on starting off with the vagrant VM for hadoop-spark …
- 19:15 Changeset [33513] by
- Higher level script that runs against each named crawl since Sep 2018 …
- 15:17 Changeset [33512] by
- AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
- 15:16 Changeset [33511] by
- AUTOCOMMIT by gen-model-colls.sh script. Message: Rebuilding all the …
- 14:13 Changeset [33510] by
- isEditingTurnedOn renamed to isEditingAllowed, and added …
- 14:12 Changeset [33509] by
- only display Map GPS editing stuff if it's allowed in config file
- 14:07 Changeset [33508] by
- pass a param into readyPageForEditing - indicates whether to add the …
- 13:24 Changeset [33507] by
- moved canDoEditing variable code to top, so can be used everywhere in …
- 13:04 Changeset [33506] by
- need to check whether document editing is turned on, not just if the …
- 12:55 Changeset [33505] by
- allowUserComments option changed to start with lower case a, to match …
- 12:53 Changeset [33504] by
- allowDocumentEditing option changed to start with lower case a, to …
- 10:23 Ticket #955 (Use of GreenStone/Koha with Multimedia Production Management - e.g. Lumiera) created by
- This was a feature request added to sourceforge greenstone 3 project …
2019-09-23:
- 23:16 Changeset [33503] by
- More efficient blacklisting/greylisting/whitelisting now by reading in …
- 23:11 Changeset [33502] by
- Current url pattern blacklist and greylist filter files. Used by …
- 21:28 Changeset [33501] by
- Refactored code into 2 classes: The existing WETProcessor, which …
- 19:05 Changeset [33500] by
- ThemeRoller download functionality currently offline. So uploading the …
- 17:59 Changeset [33499] by
- Explicitly adding in IAM policy configuration details instead of just …
- 16:43 Changeset [33498] by
- Corrections to script. Modified the tests checking for file/dir …
2019-09-22:
- 21:17 Changeset [33497] by
- First version of discard url filter file. Inefficient implementation. …
- 19:23 Changeset [33496] by
- Minor changes to reading list file
- 19:19 Changeset [33495] by
- Pruned out unused commands, added comments, marked unused variables to …
2019-09-21:
- 22:49 Changeset [33494] by
- All in one script that takes as parameter a common crawl identifier of …
2019-09-19:
- 14:24 Changeset [33493] by
- if we are on a cross collection search page, the collection for each …
- 13:43 Changeset [33492] by
- not all ccs pages have a hierarchy element, so just test on s1.collection
- 13:23 Changeset [33491] by
- need to add optional args for doc links into the CCS format links. …
- 12:34 Changeset [33490] by
- changed default partition sizes back to 20, to match what was there …
2019-09-18:
- 20:20 Changeset [33489] by
- Handy file to not have to keep manually repeating commands when …
2019-09-17:
- 14:48 Changeset [33488] by
- new function createSeedURLsFiles() in WETProcessor that replaces the …
- 14:24 Changeset [33487] by
- added code to display any error messages
- 14:23 Changeset [33486] by
- reindented the page, added some extra links, and organised the items …
- 14:22 Changeset [33485] by
- removed an erroneous space
- 14:21 Changeset [33484] by
- some changes and additions to the debuginfo page texts
- 14:20 Changeset [33483] by
- added an explicit space after Error:
- 10:55 Changeset [33482] by
- changed standardize_capitalization to …
- 10:41 Changeset [33481] by
- a few more refinements to List strings
2019-09-16:
- 19:45 Changeset [33480] by
- Much harder to remove pages where words are fused together as some are …
- 14:55 Changeset [33479] by
- changed numeric option order to match letter options
- 14:54 Changeset [33478] by
- some refining of list option descriptions
- 12:30 Changeset [33477] by
- need to call setup_custom_sort to allow for collection's customsorttools.pm
- 12:30 Changeset [33476] by
- enabled having customsorttools in collection's perllib folder. you can …
- 11:19 Changeset [33475] by
- added numeric partition defaults to match partition type
- 11:04 Changeset [33474] by
- it turns out that childtype is not set in all cases, so put in the …
- 10:21 Changeset [33473] by
- still didn't get it quite right…
- 09:54 Changeset [33472] by
- forgot the -> to access member of a hash ref
2019-09-13:
- 22:57 Changeset [33471] by
- Very minor changes.
- 22:53 Changeset [33470] by
- A new script to reduce keepURLs.txt to unique URLs, 1 from each unique …
- 21:46 Changeset [33469] by
- Don't want URLs with the word product(s) in them (but production …
- 19:24 Changeset [33468] by
- More meaningful to (also) write out the keep vs discard URLs into keep …
- 17:44 Changeset [33467] by
- Improved the code to use a static block to load the needed properties …
2019-09-12:
- 21:37 Changeset [33466] by
- 1. WETProcessor.main() now processes a folder of *.warc.wet(.gz) …
- 20:00 Changeset [33465] by
- Committing first version of the WETProcessor.java which takes a …
- 14:21 Changeset [33464] by
- I committed the last changes by mistake, using the previous revision …
- 14:17 Changeset [33463] by
- fixed up some typos. removed use_hlist_for option. This is very hard …
2019-09-11:
- 20:10 Changeset [33462] by
- Tested new tomcat.allowLinking property on Windows too now and it …
- 19:45 Changeset [33461] by
- Implementing Diego Spano's suggested changes for tomcat's allowLinking …
2019-09-09:
- 13:04 Changeset [33460] by
- fixed up some typos. removed use_hlist_for option. This is very hard …
- 12:06 Changeset [33459] by
- small changes to some strings
2019-09-07:
- 14:30 Changeset [33458] by
- Running new morphology version after quick meeting with david last …
2019-09-05:
- 19:01 Changeset [33457] by
- Got stage 1, the WARC to WET conversion, working, after necessary …
- 17:26 Changeset [33456] by
- Link to discussion on how to convert WARC to WET
2019-09-04:
- 14:45 Changeset [33455] by
- Started implementing David's suggested morphology sequence, codeversion9
2019-09-03:
- 14:41 Changeset [33454] by
- updated metadata_selection_mode to be …
- 13:16 Changeset [33453] by
- the new and modified strings for revamped List classifier
- 13:15 Changeset [33452] by
- revamp of list classifier. More precise handling of numeric metadata …
- 12:55 Changeset [33451] by
- added a comment
- 12:54 Changeset [33450] by
- removed some unnecessary comments
2019-09-02:
- 17:08 Changeset [33449] by
- terminal version executes correctly. (Didn't include init threshold in …
2019-08-30:
- 18:27 Changeset [33448] by
- Minor clarification and inclusion of helpful command
- 18:03 Changeset [33447] by
- starting to implement terminal version of new morphology. need to fix. …
2019-08-29:
- 19:12 Changeset [33446] by
- 1. Committing working version of export_maori_subset.sh which takes …
- 17:01 Changeset [33445] by
- The first working hadoop spark script for processing common crawl …
- 16:57 Changeset [33444] by
- Have created a preprocess to remove large objects. …
2019-08-28:
- 20:22 Changeset [33443] by
- More notes
- 19:30 Changeset [33442] by
- Updated gutil.jar file (with SafeProcess debugging)
- 19:30 Changeset [33441] by
- Adding further notes to do with running the CC-index examples on spark.
- 19:17 Changeset [33440] by
- Split file to move vagrant-spark-hadoop notes into own file.
- 17:03 Changeset [33439] by
- Have created properties file and accessibility from …