source: other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures@ 32101

Name Size Rev Age Author Last Change
ClusterFileIO.java 6.5 KB 32101   4 years davidb Tweaks to allow serial ingest to run
JSONClusterFileIO.java 782 bytes 31272   5 years davidb Use disk and memory to store main language RDD
PerPageJSONFlatmap.java 5.0 KB 31786   4 years davidb extra param in call; change to case-folding _htrctokentext
PerPageJSONMap.java 2.6 KB 31504   5 years davidb Adjusted call to work with added parameter
PerVolumeCatalogLangSequenceFileMap.java 1.6 KB 31360   5 years davidb Seems to be Text class not a String class coming out of the sequenceFiles
PerVolumeCatalogLangStreamFlatmap.java 2.4 KB 31294   5 years davidb Version for language counting the catalog assignment language …
PerVolumeJSON.java 9.4 KB 31784   4 years davidb Output to highlight skipping per-page indexing
PerVolumeLangStreamFlatmap.java 3.2 KB 31269   5 years davidb Some variable name changes, and printing tidy up
PerVolumeMongoDBDocumentsMap.java 7.5 KB 31320   5 years davidb build Document rather than parse JSON string
PerVolumePOSStreamFlatmap.java 3.2 KB 31271   5 years davidb Updating of POS code to new files-per-partition parameter, plus some …
PerVolumeWordStreamFlatmap.java 3.3 KB 31273   5 years davidb Code moved to store fields for multilingual use using dynamic Solr …
POSString.java 335 bytes 31375   5 years davidb Initial cut at including POS information to solr index
ProcessForCatalogLangCount.java 12.2 KB 31371   5 years davidb Trying to get saveAsSequenceFile working
ProcessForLangCount.java 6.7 KB 31272   5 years davidb Use disk and memory to store main language RDD
ProcessForMongoDBIngest.java 6.1 KB 31319   5 years davidb Changed to replace existing MongoDB entry. Fixed up print statement
ProcessForPOSCount.java 7.0 KB 31271   5 years davidb Updating of POS code to new files-per-partition parameter, plus some …
ProcessForSerialSolrIngest.java 8.7 KB 32101   4 years davidb Tweaks to allow serial ingest to run
ProcessForSolrIngest.java 13.2 KB 31597   5 years davidb Additional _s and _ss fields to help with faceting. Temporarily …
ProcessForWhitelist.java 7.9 KB 31308   5 years davidb Minor tidy-up
SolrDocJSON.java 25.5 KB 31786   4 years davidb extra param in call; change to case-folding _htrctokentext
TestWhitelistBloomFilter.java 3.7 KB 31200   5 years davidb Better output statement
TestWhitelistDictionaryMain.java 1.0 KB 31199   5 years davidb Renaming of classname to reflect filename rename
TestWhitelistHashmap.java 1.3 KB 31199   5 years davidb Renaming of classname to reflect filename rename
UniversalPOSLangMap.java 5.1 KB 32101   4 years davidb Tweaks to allow serial ingest to run
WhitelistBloomFilter.java 4.2 KB 31227   5 years davidb Code tidy up