- Timestamp: 2019-07-23T17:29:18+12:00
- File: 1 edited
gs3-extensions/maori-lang-detection/src/MaoriTextDetector.java
```diff
--- MaoriTextDetector.java (r33338)
+++ MaoriTextDetector.java (r33350)
@@ -1,3 +1,8 @@
 /**
+ * Class that uses OpenNLP with the Language Detection Model to determine, with a default
+ * or configurable level of confidence, whether text (from a file or stdin) is in Māori or not.
+ * Internal functions can be used for detecting any of the 103 languages currently supported by
+ * the OpenNLP Language Detection Model.
+ *
  * http://opennlp.apache.org/news/model-langdetect-183.html
  * language detector model: http://opennlp.apache.org/models.html
@@ -8,4 +13,7 @@
  *
  * This code was based on the information and sample code at the above links and the links dispersed throughout this file.
+ * See also the accompanying README file.
+ *
+ * July 2019
  */
 
@@ -16,7 +24,11 @@
 /**
  * EXPORT OPENNLP_HOME environment variable to be your apache OpenNLP installation.
- * Then, to compile this program:
+ * Create a folder called "models" within the $OPENNLP_HOME folder, and put the file "langdetect-183.bin" in there
+ * (which is the language detection model zipped up and renamed to .bin extension).
+ *
+ * Then, to compile this program, do the following from the "src" folder (the folder containing this java file):
  * maori-lang-detection/src$ javac -cp ".:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" MaoriTextDetector.java
- * To run this program, one of:
+ *
+ * To run this program, issue one of the following commands from the "src" folder (the folder containing this java file):
  *
  * maori-lang-detection/src$ java -cp ".:$OPENNLP_HOME/lib/*" MaoriTextDetector --help
@@ -25,5 +37,5 @@
  *
  * maori-lang-detection/src$ java -cp ".:$OPENNLP_HOME/lib/*" MaoriTextDetector -
- * which expects text to stream in from standard input.
+ * Press enter. This variant of the program expects text to stream in from standard input.
  * If entering text manually, then remember to press Ctrl-D to indicate the usual end of StdIn.
  *
@@ -39,4 +51,5 @@
     greater or equal to which determines that the best predicted language is acceptable to user of MaoriTextDetector. */
     public final double MINIMUM_CONFIDENCE;
+
     /** silentMode set to false means MaoriTextDetector won't print helpful messages while running. Set to true to run silently. */
     public final boolean silentMode;
@@ -44,10 +57,12 @@
     /** Language Detection Model file for OpenNLP is expected to be at $OPENNLP_HOME/models/langdetect-183.bin */
     private final String LANG_DETECT_MODEL_RELATIVE_PATH = "models" + File.separator + "langdetect-183.bin";
+
+    /**
+     * The LanguageDetectorModel object that will do the actual language detection/prediction for us.
+     * Created once in the constructor, can be used as often as needed thereafter.
+     */
     private LanguageDetector myCategorizer = null;
 
-    /**
-     * String taken from our university website
-     * https://www.waikato.ac.nz/maori/
-     */
+    /** String taken from our university website, https://www.waikato.ac.nz/maori/ */
     public static final String TEST_MRI_INPUT_TEXT = "Ko tēnei te Whare Wānanga o Waikato e whakatau nei i ngā iwi o te ao, ki roto i te riu o te awa e rere nei, ki runga i te whenua e hora nei, ki raro i te taumaru o ngā maunga whakaruru e tau awhi nei.";
 
@@ -224,5 +239,5 @@
     /**
      * Prints to STDOUT the predicted languages of the input text in order of descending confidence.
-     * Unused.
+     * UNUSED.
      */
     public void predictedLanguages(String text) {
@@ -367,5 +382,7 @@
             System.exit(returnVal);
         }
 
+
+        // 2. Finally, we can now do the actual language detection
         try {
             MaoriTextDetector maoriTextDetector = null;
```
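The decision rule that `MINIMUM_CONFIDENCE` describes is to accept the best predicted language only if its confidence is greater than or equal to the threshold. Together with the descending-confidence ordering that `predictedLanguages` prints, it can be sketched in plain Java. This is a minimal sketch, not the code from this changeset. `ConfidenceRuleSketch` and its nested `Prediction` type are hypothetical stand-ins for OpenNLP's `opennlp.tools.langdetect.Language` (which exposes `getLang()` and `getConfidence()`), so the sketch compiles without the OpenNLP jar. It assumes the model reports Māori under the ISO 639-3 code `mri`, as the name `TEST_MRI_INPUT_TEXT` suggests. The confidences in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, OpenNLP-free sketch of a "best language must meet a minimum confidence" rule.
 * Prediction stands in for opennlp.tools.langdetect.Language (getLang()/getConfidence()).
 */
class ConfidenceRuleSketch {

    /** Hypothetical stand-in for one (language code, confidence) prediction. */
    static final class Prediction {
        final String lang;        // e.g. an ISO 639-3 code such as "mri" or "eng"
        final double confidence;  // assumed to lie between 0.0 and 1.0

        Prediction(String lang, double confidence) {
            this.lang = lang;
            this.confidence = confidence;
        }
    }

    /** Returns a new list holding the predictions in order of descending confidence. */
    static List<Prediction> byDescendingConfidence(List<Prediction> predictions) {
        List<Prediction> sorted = new ArrayList<>(predictions);
        sorted.sort(Comparator.comparingDouble((Prediction p) -> p.confidence).reversed());
        return sorted;
    }

    /**
     * True only if the single best prediction is targetLang AND its confidence is
     * greater than or equal to minimumConfidence.
     */
    static boolean isAcceptedAs(String targetLang, List<Prediction> predictions, double minimumConfidence) {
        if (predictions.isEmpty()) {
            return false; // nothing was predicted, so nothing can be accepted
        }
        Prediction best = byDescendingConfidence(predictions).get(0);
        return best.lang.equals(targetLang) && best.confidence >= minimumConfidence;
    }

    public static void main(String[] args) {
        // Made-up confidences, for illustration only.
        List<Prediction> predictions = Arrays.asList(
                new Prediction("eng", 0.10),
                new Prediction("mri", 0.85),
                new Prediction("smo", 0.05));

        for (Prediction p : byDescendingConfidence(predictions)) {
            System.out.println(p.lang + "\t" + p.confidence);
        }
        System.out.println("mri at threshold 0.50: " + isAcceptedAs("mri", predictions, 0.50));
        System.out.println("mri at threshold 0.90: " + isAcceptedAs("mri", predictions, 0.90));
    }
}
```

The sketch sorts explicitly instead of relying on the order of its input. A caller that already has OpenNLP's `Language[]` from `LanguageDetector.predictLanguages(...)` would only need to map each element to a `Prediction`.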
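The `-` variant of the program reads its text from standard input until end of stream, which is what pressing Ctrl-D signals on a Unix terminal. The following is a minimal sketch of that reading step. It assumes the input is UTF-8 so that macronised vowels are decoded correctly. `StdinTextSketch` and `readAll` are hypothetical names, not part of MaoriTextDetector.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from MaoriTextDetector): gathers all text from a Reader until end of stream. */
class StdinTextSketch {

    /**
     * Reads every line until EOF (Ctrl-D on a Unix terminal) and joins the lines with '\n'.
     * The caller owns the Reader; this method does not close it.
     */
    static String readAll(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder text = new StringBuilder();
        boolean firstLine = true;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!firstLine) {
                    text.append('\n'); // separator between lines, so blank lines are preserved
                }
                text.append(line);
                firstLine = false;
            }
        } catch (IOException e) {
            // Wrap so callers are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Decode stdin as UTF-8 (an assumption) so macronised vowels are read correctly.
        String text = readAll(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        System.out.println("Read " + text.length() + " characters from standard input.");
    }
}
```

For example, `javac StdinTextSketch.java && echo "Kia ora" | java StdinTextSketch` prints `Read 7 characters from standard input.` because the trailing newline ends the last line and is not kept.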