Changeset 33350 for gs3-extensions
- Timestamp: 2019-07-23T17:29:18+12:00
- Location: gs3-extensions/maori-lang-detection
- Files: 3 edited
gs3-extensions/maori-lang-detection/README.txt
r33339 r33350
@@ -38,7 +38,24 @@
 
 
-
-
-For reading materials, see the OLD README section below.
+For links to background reading materials, see the OLD README section further below.
+
+
+NOTE: The OpenNLP Language Detection Model can detect non-macronised Māori text too,
+but as anticipated, the same text produces a lower confidence level for the language prediction. Compare:
+
+$maori-lang-detection/src>java -cp ".:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" MaoriTextDetector -
+Waiting to read text from STDIN... (press Ctrl-D when done entering text)>
+Ko tenei te Whare Wananga o Waikato e whakatau nei i nga iwi o te ao, ki roto i te riu o te awa e rere nei, ki runga i te whenua e hora nei, ki raro i te taumaru o nga maunga whakaruru e tau awhi nei.
+Best language: mri
+Best language confidence: 0.5959533972070814
+Exitting program with returnVal 0...
+
+$maori-lang-detection/src>java -cp ".:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" MaoriTextDetector -
+Waiting to read text from STDIN... (press Ctrl-D when done entering text)>
+Ko tēnei te Whare Wānanga o Waikato e whakatau nei i ngā iwi o te ao, ki roto i te riu o te awa e rere nei, ki runga i te whenua e hora nei, ki raro i te taumaru o ngā maunga whakaruru e tau awhi nei.
+Best language: mri
+Best language confidence: 0.6825737450092515
+Exitting program with returnVal 0...
+
 
 -------------------------
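The README comparison above runs the same sentence with and without macrons. To repeat that comparison on other text, the non-macronised variant can be generated rather than typed by hand. The following is a minimal sketch using only the Java standard library; the class `MacronStripper` and its method are hypothetical names and are not part of this changeset. Note that it removes all combining marks, not only macrons, which is an assumption that suits Māori text but may be too broad for other languages.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration only; not part of MaoriTextDetector. */
final class MacronStripper {
    private MacronStripper() {}

    /** Returns the text with all combining marks (including macrons) removed. */
    static String stripMacrons(String text) {
        // NFD decomposition splits a macronised vowel into the base letter
        // followed by U+0304 COMBINING MACRON.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches any Unicode combining mark, so this drops the macrons.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its encoding:
        // \u0113 is e-macron, \u0101 is a-macron.
        String macronised = "Ko t\u0113nei te Whare W\u0101nanga o Waikato";
        System.out.println(stripMacrons(macronised));
    }
}
```

The stripped output could then be piped into `MaoriTextDetector -` to compare confidence levels against the macronised original.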
gs3-extensions/maori-lang-detection/src/MaoriTextDetector.java
r33338 r33350
@@ -1,3 +1,8 @@
 /**
+ * Class that uses OpenNLP with the Language Detection Model to determine, with a default
+ * or configurable level of confidence, whether text (from a file or stdin) is in Māori or not.
+ * Internal functions can be used for detecting any of the 103 languages currently supported by
+ * the OpenNLP Language Detection Model.
+ *
  * http://opennlp.apache.org/news/model-langdetect-183.html
  * language detector model: http://opennlp.apache.org/models.html
@@ -8,4 +13,7 @@
  *
  * This code was based on the information and sample code at the above links and the links dispersed throughout this file.
+ * See also the accompanying README file.
+ *
+ * July 2019
  */
 
@@ -16,7 +24,11 @@
 /**
  * EXPORT OPENNLP_HOME environment variable to be your apache OpenNLP installation.
- * Then, to compile this program:
+ * Create a folder called "models" within the $OPENNLP_HOME folder, and put the file "langdetect-183.bin" in there
+ * (which is the language detection model zipped up and renamed to .bin extension).
+ *
+ * Then, to compile this program, do the following from the "src" folder (the folder containing this java file):
  * maori-lang-detection/src$ javac -cp ".:$OPENNLP_HOME/lib/opennlp-tools-1.9.1.jar" MaoriTextDetector.java
- * To run this program, one of:
+ *
+ * To run this program, issue one of the following commands from the "src" folder (the folder containing this java file):
  *
  * maori-lang-detection/src$ java -cp ".:$OPENNLP_HOME/lib/*" MaoriTextDetector --help
@@ -25,5 +37,5 @@
  *
  * maori-lang-detection/src$ java -cp ".:$OPENNLP_HOME/lib/*" MaoriTextDetector -
- * which expects text to stream in from standard input.
+ * Press enter. This variant of the program expects text to stream in from standard input.
  * If entering text manually, then remember to press Ctrl-D to indicate the usual end of StdIn.
  *
@@ -39,4 +51,5 @@
        greater or equal to which determines that the best predicted language is acceptable to user of MaoriTextDetector. */
     public final double MINIMUM_CONFIDENCE;
+
     /** silentMode set to false means MaoriTextDetector won't print helpful messages while running. Set to true to run silently. */
     public final boolean silentMode;
@@ -44,10 +57,12 @@
     /** Language Detection Model file for OpenNLP is expected to be at $OPENNLP_HOME/models/langdetect-183.bin */
     private final String LANG_DETECT_MODEL_RELATIVE_PATH = "models" + File.separator + "langdetect-183.bin";
+
+    /**
+     * The LanguageDetectorModel object that will do the actual language detection/prediction for us.
+     * Created once in the constructor, can be used as often as needed thereafter.
+     */
     private LanguageDetector myCategorizer = null;
 
-    /**
-     * String taken from our university website
-     * https://www.waikato.ac.nz/maori/
-     */
+    /** String taken from our university website, https://www.waikato.ac.nz/maori/ */
     public static final String TEST_MRI_INPUT_TEXT = "Ko tēnei te Whare Wānanga o Waikato e whakatau nei i ngā iwi o te ao, ki roto i te riu o te awa e rere nei, ki runga i te whenua e hora nei, ki raro i te taumaru o ngā maunga whakaruru e tau awhi nei.";
 
@@ -224,5 +239,5 @@
     /**
      * Prints to STDOUT the predicted languages of the input text in order of descending confidence.
-     * Unused.
+     * UNUSED.
      */
     public void predictedLanguages(String text) {
@@ -367,5 +382,7 @@
             System.exit(returnVal);
         }
 
+
+        // 2. Finally, we can now do the actual language detection
         try {
             MaoriTextDetector maoriTextDetector = null;
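The `MINIMUM_CONFIDENCE` comment in the diff above describes a threshold that the confidence of the best predicted language must meet or exceed. A minimal sketch of that decision, independent of OpenNLP, might look like the following. The class `ConfidenceCheck`, its method, and the threshold value 0.50 are all assumptions made for illustration; the actual default value and the detector's real decision code are not shown in this changeset. The two confidence values come from the README sample runs.

```java
/** Hypothetical illustration of the threshold rule; not the changeset's code. */
final class ConfidenceCheck {
    /** ISO 639-3 code reported for Māori in the README sample output. */
    static final String MAORI_CODE = "mri";

    /** Assumed threshold for this sketch; the real default is not shown in the diff. */
    static final double ASSUMED_MINIMUM_CONFIDENCE = 0.50;

    private ConfidenceCheck() {}

    /** True when the best language is Māori and its confidence meets or exceeds the minimum. */
    static boolean isAcceptedAsMaori(String bestLanguage, double confidence, double minimumConfidence) {
        return MAORI_CODE.equals(bestLanguage) && confidence >= minimumConfidence;
    }

    public static void main(String[] args) {
        // Confidence values taken from the README sample runs.
        double withoutMacrons = 0.5959533972070814;
        double withMacrons = 0.6825737450092515;

        System.out.println(isAcceptedAsMaori("mri", withoutMacrons, ASSUMED_MINIMUM_CONFIDENCE));
        System.out.println(isAcceptedAsMaori("mri", withMacrons, ASSUMED_MINIMUM_CONFIDENCE));
        // A stricter, also hypothetical, threshold rejects the non-macronised run.
        System.out.println(isAcceptedAsMaori("mri", withoutMacrons, 0.65));
    }
}
```

Under these assumed thresholds, both sample runs pass at 0.50, while only the macronised run passes at 0.65, which shows why the configurable minimum matters for non-macronised input.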