Changeset 31244 for other-projects
- Timestamp: 2016-12-18T16:57:05+13:00
- File: 1 edited
Legend:
- " " Unmodified
- "+" Added
- "-" Removed

other-projects/hathitrust/wcsa/extracted-features-solr/trunk/solr-ingest/src/main/java/org/hathitrust/extractedfeatures/SolrDocJSON.java

r31243 r31244
    16     16    import org.apache.commons.compress.compressors.CompressorException;
    17     17    import org.json.JSONObject;
           18  + import org.apache.lucene.analysis.TokenStream;
    18     19    import org.apache.lucene.analysis.Tokenizer;
    19     20    import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
     …      …
    40     41        tokenizer.setReader(reader);
    41     42
    42         -     //TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    43         -     //OffsetAttribute offsetAttribute = tokenizer.addAttribute(OffsetAttribute.class);
    44     43        CharTermAttribute charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
    45     44
     …      …
    48     47
    49     48        while (tokenizer.incrementToken()) {
    50         -         //int startOffset = offsetAttribute.startOffset();
    51         -         //int endOffset = offsetAttribute.endOffset();
    52     49            String term = charTermAttribute.toString();
    53     50            tokens.add(term);
    54     51        }
    55     52
           53  +     tokenizer.end();
    56     54        tokenizer.close();
    57     55    }
     …      …
   107    105    }
   108    106    */
   109         -
   110         -
   111    107
   112    108        return sb.toString();