source: other-projects/hathitrust

Rev Age Author Log Message
@33427   3 years davidb Some initial files on how to get going
@33426   3 years davidb Folder for details on how to stand up the HTRC DevEnv locally
@32175   4 years davidb No longer needed
@32174   4 years davidb Useful utility
@32173   4 years davidb Shift to using Solr7 setup
@32172   4 years davidb Tweaked to use Solr7 config files with Jetty installation
@32171   4 years davidb Increased number of max bool clauses
@32170   4 years davidb Shifting to newer version of Solr
@32120   5 years davidb Not entirely sure about this tweak to which SOLR env var is being …
@32119   5 years davidb Another useful script to run with Solr
@32118   5 years davidb Config files needed to top up Jetty with solr servlet
@32117   5 years davidb Some changes that are needed to run Jetty with security on admin, and …
@32116   5 years davidb Useful script that makes it easier to recall how to talk to the …
@32109   5 years davidb Changes made after testing through YARN
@32108   5 years davidb Useful breadcrumb for compiling
@32107   5 years davidb Rekindling the ability to run a JSON-filelist Spark run via YARN
@32106   5 years davidb Rekindle ability to process a json-filelist.txt using Spark
@32104   5 years davidb Serial version
@32103   5 years davidb Tidy up of output
@32102   5 years davidb Version to project local JSON list serially
@32101   5 years davidb Tweaks to allow serial ingest to run
@31786   5 years davidb extra param in call; change to case-folding _htrctokentext
@31785   5 years davidb Change to allow solr command to optionally issue 'restart' instead of …
@31784   5 years davidb Output to highlight skipping per-page indexing
@31783   5 years davidb Solr Doc Add changed to include volume-level metadata within every …
@31782   5 years davidb more careful separation into field types htrcstring and htrcstrings
@31779   5 years davidb Change in how POS words are checked against the Whitelist. Previously …
@31772   5 years davidb Accidentally committed
@31693   5 years davidb Changes to how workset information is pulled from sparql-endpoint for each …
@31677   5 years davidb Suppress processing governmentDocument for now in JSON metadata record, …
@31676   5 years davidb To make it easier to remember how to kill off a YARN task at the …
@31675   5 years davidb More careful set of metadata fields indexed
@31645   5 years davidb Some initial work on drawing in workset info from sparql-endpoint. …
@31626   5 years davidb Links to blog entries added
@31625   5 years davidb Tidy up
@31624   5 years davidb Combined volume md and full-text page searching
@31623   5 years davidb Removed commented out static HTML POS section
@31622   5 years davidb Adding in CORS support to Solr
@31621   5 years davidb Step towards making HTML/JS work on a different server, with AJAX …
@31619   5 years davidb Further minor tidy up
@31618   5 years davidb Code tidy up
@31614   5 years davidb Separate off stream query page
@31613   5 years davidb Multiple word support in POS search box. Tidy up of anchor for search …
@31601   5 years davidb To get the look and feel of the HTRC portal web site, supporting files …
@31598   5 years davidb Easier to remember what to do
@31597   5 years davidb Additional _s and _ss fields to help with faceting. Temporarily …
@31571   5 years davidb Simple search-all-langs feature added
@31570   5 years davidb Solr-stream based search
@31524   5 years davidb Main changes: Fix for page/seqnum; group by id; show-hide other …
@31510   5 years davidb Turns out some language fields can be empty. Need to test for this
@31509   5 years davidb LangPos determination changed to lock into first match, rather than …
@31506   5 years davidb Forgot to add initialization line. Doh!
@31505   5 years davidb Added in storing of top-level document metadata as separate solr-doc
@31504   5 years davidb Adjusted call to work with added parameter
@31503   5 years davidb Monitor for missing POS keys, and print out details first time each …
@31502   5 years davidb Comment out section, useful for controlling a smaller run
@31501   5 years davidb No longer used
@31500   5 years davidb Synchronize on reading in of white-list and universal-lang-pos
@31499   5 years davidb Better exception handling
@31498   5 years davidb Tidy up on print statements
@31466   5 years davidb Fix to work out solr_host rather than assume it is gc0
@31465   5 years davidb Adjustment to run solr with more memory
@31464   5 years davidb More general version of script that lets you specify the collection …
@31455   5 years davidb Deprecated
@31454   5 years davidb Deprecated
@31453   5 years davidb Added size() method
@31452   5 years davidb Additional Spark progs to run
@31451   5 years davidb shift to using solr-base-url and a specified solr-collection
@31450   5 years davidb Some debugging output to help see what is happening with …
@31393   6 years davidb Fixed typo
@31392   6 years davidb Support for Catalog page added
@31385   6 years davidb Next and previous pages
@31384   6 years davidb After next phase of development
@31383   6 years davidb Files for initial functioning search page
@31378   6 years davidb Fixed loop limit test
@31377   6 years davidb Switch to using URI not string
@31376   6 years davidb Universal language mappings for opennlp POS model tags
@31375   6 years davidb Initial cut at including POS information to solr index
@31374   6 years davidb simplified command line usage
@31373   6 years davidb Changes made to operate on solr1 and solr2 boxes
@31372   6 years davidb Reworked to use sequenceFiles
@31371   6 years davidb Trying to get saveAsSequenceFile working
@31370   6 years davidb Fixed incorrect version number. Using htrcstring so field values not …
@31369   6 years davidb Trial new save
@31368   6 years davidb downsample-100 added
@31367   6 years davidb Changes to work with solr1 and solr2
@31366   6 years davidb Updated to latest released version of Solr
@31365   6 years davidb Quick code added to downsample
@31364   6 years davidb removed sample() line
@31363   6 years davidb Control num of partitions on sort
@31362   6 years davidb use Spark sample() to make for smaller test with Sequence files
@31361   6 years davidb Change from String to Text
@31360   6 years davidb Seems to be Text class not a String class coming out of the sequenceFiles
@31359   6 years davidb Changed over to use sequenceFiles as input
@31358   6 years davidb Make workset download save as file
@31357   6 years davidb Ensure all output sent to browser
@31356   6 years davidb Tidy up on appending missing volumes
@31355   6 years davidb Changed to using containsKey rather than get to avoid null pointer …
@31354   6 years davidb import tidy-up
@31353   6 years davidb Added debug print statement