source: other-projects/hathitrust/wcsa/extracted-features-solr

Rev Age Author Log Message
@31675   7 years davidb More careful set of metadata fields indexed
@31645   7 years davidb Some initial work on drawing in workset info from sparql-endpoint. …
@31626   7 years davidb Links to blog entries added
@31625   7 years davidb Tidy up
@31624   7 years davidb Combined volume md and full-text page searching
@31623   7 years davidb Removed commented out static HTML POS section
@31622   7 years davidb Adding in CORS support to Solr
@31621   7 years davidb Step towards making HTML/JS work on a different server, with AJAX …
@31619   7 years davidb Further minor tidy up
@31618   7 years davidb Code tidy up
@31614   7 years davidb Separate off stream query page
@31613   7 years davidb Multiple word support in POS search box. Tidy up of anchor for search …
@31601   7 years davidb To get the look and feel of the HTRC portal web site, supporting files …
@31598   7 years davidb Easier to remember what to do
@31597   7 years davidb Additional _s and _ss fields to help with faceting. Temporarily …
@31571   7 years davidb Simple search-all-langs feature added
@31570   7 years davidb Solr-stream based search
@31524   7 years davidb Main changes: Fix for page/seqnum; group by id; show-hide other …
@31510   7 years davidb Turns out some languages fields can be empty. Need to test for this
@31509   7 years davidb LangPos determination changed to lock into first match, rather than …
@31506   7 years davidb Forgot to add initialization line. Doh!
@31505   7 years davidb Added in storing of top-level document metadata as separate solr-doc
@31504   7 years davidb Adjusted call to work with added parameter
@31503   7 years davidb Monitor for missing POS keys, and print out details first time each …
@31502   7 years davidb Comment out section, useful for controlling a smaller run
@31501   7 years davidb No longer used
@31500   7 years davidb Synchronize on reading in of white-list and universal-lang-pos
@31499   7 years davidb Better exception handling
@31498   7 years davidb Tidy up on print statements
@31466   7 years davidb Fix to work out solr_host rather than assume it is gc0
@31465   7 years davidb Adjustment to run solr with more memory
@31464   7 years davidb More general version of script that lets you specify the collection …
@31455   7 years davidb deprecated
@31454   7 years davidb Deprecated
@31453   7 years davidb Added size() method
@31452   7 years davidb Additional Spark progs to run
@31451   7 years davidb shift to using solr-base-url and a specified solr-collection
@31450   7 years davidb Some debugging output to help see what is happening with …
@31385   7 years davidb Next and previous pages
@31384   7 years davidb After next phase of development
@31383   7 years davidb Files for initial functioning search page
@31378   7 years davidb Fixed loop limit test
@31377   7 years davidb Switch to using URI not string
@31376   7 years davidb Universal language mappings for opennlp POS model tags
@31375   7 years davidb Initial cut at including POS information to solr index
@31374   7 years davidb simplified command line usage
@31373   7 years davidb Changes made to operate on solr1 and solr2 boxes
@31372   7 years davidb Reworked to use sequenceFiles
@31371   7 years davidb Trying to get saveAsSequenceFile working
@31370   7 years davidb Fixed incorrect version number. Using htrcstring so field values not …
@31369   7 years davidb Trial new save
@31368   7 years davidb downsample-100 added
@31367   7 years davidb Changes to work with solr1 and solr2
@31366   7 years davidb Updated to latest released version of Solr
@31365   7 years davidb Quick code added to downsample
@31364   7 years davidb removed sample() line
@31363   7 years davidb Control num of partitions on sort
@31362   7 years davidb use Spark sample() to make for smaller test with Sequence files
@31361   7 years davidb Change from String to Text
@31360   7 years davidb Seems to be Text class not a String class coming out of the sequenceFiles
@31359   7 years davidb Changed over to use sequenceFiles as input
@31320   7 years davidb build Document rather than parse JSON string
@31319   7 years davidb Changed to replace existing MongoDB entry. Fixed up print statement
@31318   7 years davidb change to using contains()
@31317   7 years davidb added debug statement
@31316   7 years davidb fixed typo
@31315   7 years davidb Further tweak
@31314   7 years davidb Another go at avoiding concurrency update exception
@31313   7 years davidb Alternative to avoid concurrency update exception
@31312   7 years davidb MongoDB can't have 'period' and 'dollar' in key, as reserved characters
@31311   7 years davidb Processing print statement added
@31310   7 years davidb Initial cut at files for working with MongoDB
@31309   7 years davidb Sparked MongoDB connector added
@31308   7 years davidb Minor tidy-up
@31307   7 years davidb convenience scripts
@31306   7 years davidb Final part of the mongodb shard puzzle -- router servers
@31305   7 years davidb Next good commit point. Initial testing of shard replset scripts
@31304   7 years davidb Changes made when (it turned out) the real source of the error was an …
@31303   7 years davidb Adding in support to start and stop router server
@31302   7 years davidb Initial commit of scripts, after some testing, and subsequent changes …
@31301   7 years davidb Fix for gsliscluster1
@31300   7 years davidb Need to use NETWORK not PACKAGE
@31299   7 years davidb Additionally setup MongoDB
@31298   7 years davidb Initial cut at setup file for MongoDB
@31297   7 years davidb
@31294   7 years davidb Version for language counting the catalog assignment language …
@31278   7 years davidb To avoid null pointer on ids.iterator()
@31277   7 years davidb Tweak to minimum value
@31276   7 years davidb Min num partition guard put in
@31275   7 years davidb Changes to allow gc slave nodes to work with local disk versions of …
@31274   7 years davidb Need to use JSONArray not JSONObject for a multifield item
@31273   7 years davidb Code moved to store fields for multilingual use using dynamic Solr …
@31272   7 years davidb Use disk and memory to store main language RDD
@31271   7 years davidb Updating of POS code to new files-per-partition parameter, plus some …
@31270   7 years davidb Changed over to repartition approach
@31269   7 years davidb Some variable name changes, and printing tidy up
@31268   7 years davidb Adjustments to memory allocation in response to test runs on 10% of dataset
@31267   7 years davidb Values trialed on gsliscluster1. Rekindling idea of per-vol processing
@31266   7 years davidb Rekindling of per-volume approach. Also some tweaking to verbosity …
@31264   7 years davidb Switching to 'long' in counts to allow higher number representation