Changeset 33583

Timestamp:
18.10.2019 21:20:18
Author:
ak19
Message:

Committing experimental version 1 using the sentence detector model, experimenting with how best to detect whether individual sentences are in Maori or not.
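
The baseline-comparison idea this changeset experiments with can be sketched in isolation as follows. This is only a minimal illustration under stated assumptions, not the committed code: the class name BaselineConfidenceSketch, both method names and the toy scorer are hypothetical stand-ins. In the changeset itself the confidence comes from OpenNLP's predictLanguage() and getConfidence() calls. The sketch also keeps a sentence when confidence is maintained (>=), whereas the committed experiment compares with a strict >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

class BaselineConfidenceSketch {

    /**
     * Returns the sentences that do not lower the score when appended to the
     * baseline text. The scorer is a hypothetical stand-in for the confidence
     * a language detector reports for the target language.
     */
    static List<String> keepSentencesThatHoldConfidence(
            String baseline, String[] sentences, ToDoubleFunction<String> scorer) {
        double baselineScore = scorer.applyAsDouble(baseline);
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            // Score the baseline with one candidate sentence appended.
            double score = scorer.applyAsDouble(baseline + " " + sentence);
            if (score >= baselineScore) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    /**
     * Toy scorer (an assumption for illustration only, not a real language
     * model): the fraction of words that end in a vowel.
     */
    static double toyVowelEndingScore(String text) {
        int words = 0;
        int vowelEnding = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = token.replaceAll("[^a-z]", ""); // drop punctuation
            if (word.isEmpty()) {
                continue;
            }
            words++;
            if ("aeiou".indexOf(word.charAt(word.length() - 1)) >= 0) {
                vowelEnding++;
            }
        }
        return words == 0 ? 0.0 : (double) vowelEnding / words;
    }

    public static void main(String[] args) {
        String baseline = "Hohoro tonu te whakaae me te haere katoa o nga hoia";
        String[] sentences = {"Marire tonu to ratou kainga.", "Pinky was here today."};
        System.out.println(keepSentencesThatHoldConfidence(
                baseline, sentences, BaselineConfidenceSketch::toyVowelEndingScore));
    }
}
```

With the toy scorer, the first candidate sentence is kept and "Pinky was here today." is dropped, because appending it lowers the score below the baseline. A real detector's confidence will not behave this cleanly, which is what the experiment in this changeset is probing.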

Files:
1 modified

  • gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MaoriTextDetector.java

    r33577 r33583  
    2323import java.io.*; 
    2424import opennlp.tools.langdetect.*; 
     25import opennlp.tools.sentdetect.*; 
    2526import opennlp.tools.util.*; 
     27 
     28import java.util.ArrayList; 
    2629 
    2730/** 
     
    5861    public final boolean silentMode; 
    5962 
     63    private final String OPENNLP_MODELS_RELATIVE_PATH = "models" + File.separator;  
     64     
    6065    /** Language Detection Model file for OpenNLP is expected to be at $OPENNLP_HOME/models/langdetect-183.bin */ 
    61     private final String LANG_DETECT_MODEL_RELATIVE_PATH = "models" + File.separator + "langdetect-183.bin"; 
    62  
     66    private final String LANG_DETECT_MODEL_RELATIVE_PATH = OPENNLP_MODELS_RELATIVE_PATH + "langdetect-183.bin"; 
     67 
     68    /** Two Māori language sentences taken from http://anglicanhistory.org/england/swilberforce/agathos1882.html 
     69     * which have a reasonable/high confidence in detection. 
     70     * We'll use this String of 2 high confidence MRI sentences to detect whether the addition 
     71     * of a subsequent sentence of unknown language brings down the cumulative confidence level 
     72     * drastically (below DEFAULT_MINIMUM_CONFIDENCE), implying that the added sentence is therefore not likely 
     73     * to be in MRI. 
     74     */ 
     75    private final String TWO_HIGH_CONFIDENCE_MRI_SENTENCES = "Hohoro tonu te whakaae me te haere katoa o nga hoia, kit e wahi e whakamatea nei e te Tarakona. E hou ana to ratou taenga atu, mataara tonu ratou, a kihai I mahue o ratou kahu arai; ka moe etahi ka ara etahi kit e whanga; ano to hunga e tu ana, rawe rawa I te kanapa o a ratou kahu arai, me o ratou ringaringa; hari tonu te tangata kainga no te mea kei waenga pu I a ratou nga hoia o te Kingi e whanga ana."; // 0.9497468988295584 
     76     
     77    //"Nan, kia mataara koutou. Ki te mahara koutou ki aku kupu, a ka karanga mai ki taku ingoa, I nga wa, e tata mai ai te mate; a kit e mau tonu ano hoki koutou ki te kahu aria katoa, me nga ringaringa, kua oti nei te taka e ahau mo koutou, e kore koutou e mate I te Tarakona."; // 0.7220962333610585 
     78     
     79    //"E rapu ana ia i te hoari a tona Piriniha, a te kitea ki tana taha e mau ana, ka ngore noa nga turi, e haerea atu ana e te Tarakona, e karanga ana ia ki tona Kingi, otira e mea ake ana a roto i a ia, kua pahure ke te ra e karanga atu ai ia; kua whakarere hoki ia i ana kahu arai me ona ringaringa, a kahore he mea hei whakakora i a ia, tahuri noa atu ia ki te oma, hoake rawa kua kapi mai a mua ona, i nga tao o te Tarakona, a na te mea kua mahue i a ia nga takai paraihi, ngore noa ona waewae, kainga ana ia e te Tarakona. I peneitia ano hoki etahi, a ngaro noa iho ratou i te tirohanga a o ratou [6/7] hoa; ka puta te mahara ki o ratou hoa, ka pouri o ratou ngakau, otira kihai roa, kua hakari ano ratou, kua inu, kua hari, kua whakarere i o ratou kahu arai, kua wareware ano hoki ki nga kupu a to ratou Piriniha, te mahara kua patata te mate."; //0.991402350887951. 
     80 
     81    /** http://www.greenstone.org/ */ 
     82    private final String TWO_HIGH_CONFIDENCE_EN_SENTENCES="We hope that this software will encourage the effective deployment of digital libraries to share information and place it in the public domain. Further information can be found in the book How to build a digital library, authored by three of the group's members."; 
     83 
     84     
     85    /**  
     86     * Large chunk of text in te reo Māori from 
     87     * http://anglicanhistory.org/england/swilberforce/agathos1882.html 
     88     * for testing the language detector 
     89     */ 
     90    private final static String MRI_SENTENCE_TEST="Meake ratou haere, ka mea atu ia ki a ratou, \"Nana, e matau ana koutou ahakoa whakaputaina mai te riri me te kaha katoa o te Tarakona ki ahau I mua ra, kihai ia I kaha, a mate ana I ahau. Me aru katoa aku hoa pono, I te ritenga kua waiho iho e ahau ki a ratou kia maia ratou, me ahau kua maia; ko reira ratou noho ai I raro-raro iho I toku torona. Pinky was here today! Na koneil I tonoa atu ai koutou e ahau, kit e whawhai ki tenei Tarakona, a k otaku kaha e haere tahi atu me koutou ki te taua. Nan, kia mataara koutou. Ki te mahara koutou ki aku kupu, a ka karanga mai ki taku ingoa, I nga wa, e tata mai ai te mate; a kit e mau tonu ano hoki koutou ki te kahu aria katoa, me nga ringaringa, kua oti nei te taka e ahau mo koutou, e kore koutou e mate I te Tarakona. Otira kit e rokohina whakaarokoretia mai koutou e ia, a kahore o koutou kahu aria e mau ana, ka mate koutou I a ia.\" [4] Hohoro tonu te whakaae me te haere katoa o nga hoia, kit e wahi e whakamatea nei e te Tarakona. E hou ana to ratou taenga atu, mataara tonu ratou, a kihai I mahue o ratou kahu arai; ka moe etahi ka ara etahi kit e whanga; ano to hunga e tu ana, rawe rawa I te kanapa o a ratou kahu arai, me o ratou ringaringa; hari tonu te tangata kainga no te mea kei waenga pu I a ratou nga hoia o te Kingi e whanga ana. Ma te ata e titiro ko nga kai mataara o te po ka kaere kit e moe, a ko te hunga kua moe, oti rawa te whakakahu ki o ratou, kahu arai, me to whakam[a]tautau hoki i te koinga o nga hoari, ka karanga ratou ki te ingoa o to ratou Piriniha, a ka haere ki te whanga i te Tarakona kino. Rawe rawa ratou i konei, otira kihai ratou i mau tonu ki tenei ritenga; tiaki noa hoki ratou, a te puta te Tarakona. Marire tonu to ratou kainga. 
Ngaki ana te tangata whenua i a ratou mara, a ka haere ano hoki ka tata ke hauhakenga, e marena ana ratou, e tuku hakari ana, e hokohoko ana; a ka whakaaro nga hoia he teka noa pea nga rongo o te Tarakona, ka wareware haere ki te kupu o to ratou Piriniha mo te mataara, me te tupato. Na te kaha o te ra ka taimaha o ratou ringaringa; mea noa tetahi \"Ha, he aha te tikanga i maua tonutia ai tenei potae taimaha? Wera noa iho taku matenga i te whitinga iho o te ra ki tenei potae, a te kitea te Tarakona e meingatia nei, ka mahue [4/5] rawa i ahahu te potae nei ki te teneti, hei te kiteatanga at u o te Tarakona e haere mai ana, ka tiki ai. Pera noa hoki tetahik ki te arai o tona uma, me tetahi hoki ki tona arai. A na te wera o te whenua ka wera ake nga takai paraihi o o ratou waewae; mamae noa ratou, a mahue iho era, a ka marara ratou, puta noa ki tenei hakari ki tera marengatanga ranei. Kihai i matauria he hoia ratou no te Kingi, ma te rapu tonu ano ia ki tana tohu e mau ana, ka matauria ai, mahue rawa hoki te ahua i tonoa mai ai ratou e to ratou Piriniha ke te taua. Kotahi ia o ratou kihai rite ki ana hoa, ko Akatohe te ingoa, pouri raw tona ngakau ki a ratou mahi. He tini ana whakamaharatanga at ki a ratou i nga kupu a to ratou Piriniha, mea atu ana ia ki a ratou, \"Ahakoa te kitea, e koro ma, te hoa riri, tenei ano ia te patata ana; a kahore he pohehetanga o to tatou Piriniha, kua whawhai hoki ia ki te Tarakona, a kua matau ia ki tana ahua whakamataku.\" Kataina ana, tawaia ana tenei tangata maia, meinga ana ia he wawau, no te rite ana mahi ki a ratou. Otiia kiahi ia i whakarongo; a ahakoa puta o ratou kupo kino, ahakoa kah te ra o te awatea hei whakahemo i a ia, ahahkoa negenge ia i tona haerenga i te weranga o te onepu, ahakoa kuiki ia i nga huarahi o te po, kihai i mahue i a Akatohe nga kahu arai a tona Piriniha, i hoatu ai kia mau tonu i a ia; kihai hoki i mahue i a ia nga takai paraihi o ana waewae mamae, me tana mahi mataara i te po. 
[5/6] Roa rawa iho to ratou noho penei, a te kitea mai te hoa riri, ka kake haee o ratou kupu kino ki a ia. A mea kau ano ratou, \"he ora, kahore, kua patata te mate.\" Katahi hoki ka kitea nga tohu whakamatau, me he ai tangta hei titiro. Tera taua hoia i tenei wa e hoki mai ana i te hakari, kua hari, kua waiata, kua kanikani ratou, a kua mahue i taua hoia ana kahu arai me ana ringaringa; a tenei ia te hoki marire ana ki tana teneti i te ahi-ahi o te rangi raumati. E whakaaro haere ana ia ki ana hoa i taua hakari, ki te rawe ano hoki ona, e mhh an ki a Akatohe mona e wehi nei, e haereere tonu nei i te roro o tona teneti e pehia ana e te taimaha o ana kahu arai. E whakaaroa ana ano enei mea, ka rongo ia ki te ngaehe e puta mai ana i te motu ngahere, ki matau ona, a me te uira ano te puta whakarere mai o te Tarakona ki mua ona. E rapu ana ia i te hoari a tona Piriniha, a te kitea ki tana taha e mau ana, ka ngore noa nga turi, e haerea atu ana e te Tarakona, e karanga ana ia ki tona Kingi, otira e mea ake ana a roto i a ia, kua pahure ke te ra e karanga atu ai ia; kua whakarere hoki ia i ana kahu arai me ona ringaringa, a kahore he mea hei whakakora i a ia, tahuri noa atu ia ki te oma, hoake rawa kua kapi mai a mua ona, i nga tao o te Tarakona, a na te mea kua mahue i a ia nga takai paraihi, ngore noa ona waewae, kainga ana ia e te Tarakona. I peneitia ano hoki etahi, a ngaro noa iho ratou i te tirohanga a o ratou [6/7] hoa; ka puta te mahara ki o ratou hoa, ka pouri o ratou ngakau, otira kihai roa, kua hakari ano ratou, kua inu, kua hari, kua whakarere i o ratou kahu arai, kua wareware ano hoki ki nga kupu a to ratou Piriniha, te mahara kua patata te mate. Tera te Tarakona kua manamanangia i te matenga o etahi o nga hoia i a ia, ka whakaaroaro kia huakina putia nga toenga iho, kia kotahi ai matenga o ana hoa riri."; 
     91     
    6392    /** 
    6493     * The LanguageDetectorModel object that will do the actual language detection/prediction for us. 
     
    6695    */ 
    6796    private LanguageDetector myCategorizer = null; 
    68      
     97 
     98    /**  
     99     * The Sentence Detection object that does the sentence splitting for the language 
     100     * the sentence model was trained for. 
     101     */ 
     102    private SentenceDetectorME sentenceDetector = null; 
     103     
    69104    /** String taken from our university website, https://www.waikato.ac.nz/maori/ */ 
    70105    public static final String TEST_MRI_INPUT_TEXT = "Ko tēnei te Whare Wānanga o Waikato e whakatau nei i ngā iwi o te ao, ki roto i te riu o te awa e rere nei, ki runga i te whenua e hora nei, ki raro i te taumaru o ngā maunga whakaruru e tau awhi nei."; 
     
    77112    this(silentMode, DEFAULT_MINIMUM_CONFIDENCE); 
    78113    } 
    79      
     114 
     115    /** Constructor that uses the sentence Model we trained for Māori */ 
    80116    public MaoriTextDetector(boolean silentMode, double min_confidence) throws Exception { 
     117    this(silentMode, min_confidence, "mri-sent_trained.bin"); 
     118    } 
     119 
     120    /** More general constructor that can use sentence detector models for other languages */ 
     121    public MaoriTextDetector(boolean silentMode, double min_confidence, 
     122                 String sentenceModelFileName) throws Exception 
     123    {     
    81124    this.silentMode = silentMode; 
    82125    this.MINIMUM_CONFIDENCE = min_confidence; 
     
    91134    if(!langDetectModelBinFile.exists()) { 
    92135        throw new Exception("\n\t*** " + langDetectModelBinFile.getPath() + " doesn't exist." 
    93                 + "\n\t*** Ensure the $OPENNLP_HOME folder contains a 'models' folder with the model file 'langdetect-183.bin' in it."); 
     136                + "\n\t*** Ensure the $OPENNLP_HOME folder contains a 'models' folder" 
     137                + "\n\t*** with the model file 'langdetect-183.bin' in it."); 
    94138    } 
    95139 
     
    109153     
    110154    // instantiating function should handle critical exceptions. Constructors shouldn't. 
    111     }     
    112  
     155 
     156 
     157 
     158    // 3. Set up our sentence model and SentenceDetector object 
     159    String sentenceModelPath = System.getenv("OPENNLP_HOME") + File.separator 
     160        + OPENNLP_MODELS_RELATIVE_PATH + sentenceModelFileName; // "mri-sent_trained.bin" default 
     161    File sentenceModelBinFile = new File(sentenceModelPath); 
     162    if(!sentenceModelBinFile.exists()) {         
     163        throw new Exception("\n\t*** " + sentenceModelBinFile.getPath() + " doesn't exist." 
     164                + "\n\t*** Ensure the $OPENNLP_HOME folder contains a 'models' folder" 
     165                + "\n\t*** with the model file "+sentenceModelFileName+" in it."); 
     166    } 
     167    try (InputStream modelIn = new FileInputStream(sentenceModelPath)) { 
     168        // https://www.tutorialspoint.com/opennlp/opennlp_sentence_detection.htm 
     169        SentenceModel sentenceModel = new SentenceModel(modelIn);        
     170        this.sentenceDetector = new SentenceDetectorME(sentenceModel); 
     171         
     172    } // instantiating function should handle this critical exception 
     173    } 
     174 
     175    public ArrayList<String> getAllSentencesInMaori(String text) throws Exception { 
     176    // big assumption here: that we can split incoming text into sentences 
     177    // for any language (using the Māori language trained sentence model), 
     178    // despite not knowing what language those sentences are in 
     179    // Hinges on MRI sentence detection being similar to at least the ENG equivalent 
     180 
     181 
     182    // we'll be storing just those sentences in text that are in Māori.  
     183    ArrayList<String> mriSentences = new ArrayList<String>(); 
     184    // OpenNLP language detection works best with a minimum of 2 sentences 
     185    // See https://opennlp.apache.org/news/model-langdetect-183.html 
     186    // "It is important to note that this model is trained for and works well with 
     187    // longer texts that have at least 2 sentences or more from the same language." 
     188    // So we'll be attempting to detect the language working on 2 sentences at a time 
     189     
     190    String[] sentences = sentenceDetector.sentDetect(text); 
     191    double prev_confidence = 0.0; 
     192     
     193    for(int i = 1; i < sentences.length; i++) { 
     194      String two_sentences = sentences[i-1]+" "+sentences[i]+" This is another sentence."; 
     195      //for(int i = 0; i < sentences.length; i++) { 
     196      //String two_sentences = sentences[i]; 
     197 
     198     
     199        System.err.println(two_sentences);       
     200 
     201        //isTextInMaori(two_sentences) 
     202        Language bestLanguage = myCategorizer.predictLanguage(two_sentences); 
     203        if(bestLanguage.getLang().equals(MAORI_3LETTER_CODE)) { 
     204        double confidence = bestLanguage.getConfidence(); 
     205        /* 
     206        if(prev_confidence >= this.MINIMUM_CONFIDENCE) { 
     207            if(confidence < prev_confidence) { 
     208            // then the current sentence dragged down confidence 
     209            // and we're only confident about previous sentence 
     210            mriSentences.add(sentences[i-1]); 
     211            } else { 
     212             
     213            mriSentences.add(sentences[i]); 
     214            } 
     215        } 
     216        prev_confidence = confidence; 
     217        */ 
     218        System.err.println("Confidence for sentences up to " + i + ": " + confidence); 
     219        System.err.println(""); 
     220        } 
     221 
     222        /* 
     223        two_sentences = sentences[i] + " Pinky was here today."; 
     224        bestLanguage = myCategorizer.predictLanguage(two_sentences); 
     225        double confidence = bestLanguage.getConfidence(); 
     226        System.err.println("Confidence for added Pinky: " + confidence); 
     227        System.err.println(""); 
     228        */ 
     229    } 
     230    return mriSentences; 
     231    } 
     232 
     233    public ArrayList<String> getAllSentencesInLanguage(String langCode, String text) throws Exception { 
     234    // big assumption here: that we can split incoming text into sentences 
     235    // for any language (using the Māori language trained sentence model), 
     236    // despite not knowing what language those sentences are in 
     237    // Hinges on MRI sentence detection being similar to at least the ENG equivalent 
     238 
     239 
     240    // we'll be storing just those sentences in text that are in the language given by langCode. 
     241    ArrayList<String> mriSentences = new ArrayList<String>(); 
     242    // OpenNLP language detection works best with a minimum of 2 sentences 
     243    // See https://opennlp.apache.org/news/model-langdetect-183.html 
     244    // "It is important to note that this model is trained for and works well with 
     245    // longer texts that have at least 2 sentences or more from the same language." 
     246     
     247     
     248    // we're pretty confident that the following static string is in Māori 
     249    // but want to store its confidence level as baseline confidence value 
     250    // to compare other sentences against 
     251 
     252    String baseline = TWO_HIGH_CONFIDENCE_MRI_SENTENCES; 
     253     
     254    Language bestLanguage = myCategorizer.predictLanguage(baseline); 
     255    if(!bestLanguage.getLang().equals(langCode)) { 
     256        System.err.println("@@@@ Something's gone wrong, obvious "+MAORI_3LETTER_CODE+" language string not properly detected as "+MAORI_3LETTER_CODE+" any more."); 
     257    } 
     258    double baselineConfidence = bestLanguage.getConfidence(); 
     259    System.err.println("Baseline confidence: " + baselineConfidence); 
     260    System.err.println("----------------------------------------"); 
     261     
     262    String[] sentences = sentenceDetector.sentDetect(text); 
     263     
     264    for(int i = 0; i < sentences.length; i++) { 
     265        String unknownLangSentenceAppendedToBaseline = baseline+" "+sentences[i]; 
     266 
     267        System.err.println("Added sentence: " + sentences[i]); 
     268         
     269        bestLanguage = myCategorizer.predictLanguage(unknownLangSentenceAppendedToBaseline); 
     270        double confidence = bestLanguage.getConfidence(); 
     271        //System.err.println("Confidence is now " + confidence); 
     272 
     273        //if(!bestLanguage.getLang().equals(langCode) || confidence < this.MINIMUM_CONFIDENCE) { 
     274         
     275        // confidence should increase with added sentence in same language (or should 
     276        // stay about the same?) not decrease with added sentence in same language 
     277        if(bestLanguage.getLang().equals(langCode) && confidence > baselineConfidence) { 
     278 
     279        System.err.println("Added sentence (maintained or) increased confidence to: " + confidence); 
     280         
     281         
     282        } 
     283        else { 
     284        System.err.println("ADDED sentence not in " + langCode + " as it DECREASED confidence to: " + confidence); 
     285        } 
     286        System.err.println(""); 
     287         
     288    } 
     289    return mriSentences; 
     290    } 
     291 
     292     
    113293    /** 
    114294     * @return true if the input text is Maori (mri) with MINIMUM_CONFIDENCE levels of confidence (if set, 
     
    388568    }    
    389569 
    390  
     570     
    391571    // 2. Finally, we can now do the actual language detection 
    392572    try { 
     
    397577        maoriTextDetector = new MaoriTextDetector(runSilent, minConfidence); 
    398578        } 
     579 
     580        // TODO 
     581        maoriTextDetector.getAllSentencesInMaori(MRI_SENTENCE_TEST); 
     582        //maoriTextDetector.getAllSentencesInLanguage(MAORI_3LETTER_CODE, MRI_SENTENCE_TEST); 
     583        maoriTextDetector.getAllSentencesInLanguage(MAORI_3LETTER_CODE,  
     584                            "Primary sources ~ Published Maramataka Mo Te Tau 1885, Nepia: Te Haaringi, Kai-ta Pukapuka, kei Hehitingi Tiriti, 1884. Maramataka Mo Te Tau 1886, Nepia: Na te Haaringi i ta ki tona Whare Perehi Pukapuka, 1885. Maramataka Mo Te Tau 1887, Nepia: Na te Haaringi i ta ki tona Whare Perehi Pukapuka, 1886. Maramataka Mo Te Tau 1888, Nepia: Na te Haaringi i ta ki tona Whare Perehi Pukapuka, 1887. Maramataka Mo Te Tau 1889, Nepia: Na te Haaringi i ta ki tona Whare Perehi Pukapuka, 1888. Maramataka Mo Te Tau 1890, Nepia: Na te Haaringi i ta ki tona Whare Perehi Pukapuka, 1889. Maramataka Mo Te Tau 1891, Kihipane: Na te Muri i ta ki tona Whare Perehi Pukapuka, 1890. Maramataka Mo Te Tau 1892, Nepia: Na te Haaringi, i ta ki tona Whare Perehi Pukapuka, 1891. Maramataka Mo Te Tau 1893, Kihipane: Na te Muri i ta ki tona Whare Perehi Pukapuka, 1892. Maramataka Mo Te Tau 1894, Kihipane: Na te Muri i ta ki tona Whare Perehi Pukapuka, 1893. Maramataka Me Te Tau 1895, Kihipane: Na te Muri i Ta ki tona whare perehi pukapuka, 1894. Maramataka Mo Te Tau 1896, Kihipane: Na te Muri i ta ki tona Whare Perehi Pukapuka, 1895. Maramataka Mo Te Tau 1897, Kihipane: Na te Muri i ta ki tona Whare Perehi Pukapuka 1896. Maramataka Mo Te Tau 1898, Turanga: Na te Wiremu Hapata i ta ki Te Rau Kahikatea, 1897. Ko Te Paipera Tapu Ara, Ko Te Kawenata Tawhito Me Te Kawenata Hou, He Mea Whakamaori Mai No Nga Reo I Oroko-Tuhituhia Ai, Ranana: He mea ta ki te perehi a W.M.Watts ma te Komiti Ta Paipera mo Ingarangi mo Te Ao Katoa, 1868. Ko Te Pukapuka O Nga Inoinga, Me Era Atu Tikanga, I Whakaritea E Te Hahi O Ingarani, Mo Te Minitatanga O Nga Hakarameta, O Era Atu Ritenga a Te Hahi: Me Nga Waiata Ano Hoki a Rawiri, Me Te Tikanga Mo Te Whiriwhiringa, Mo Te Whakaturanga, Me Te Whakatapunga O Nga Pihopa, O Nga Piriti, Me Nga Rikona, Me Nga Himene, Ranana: I taia tenei ki te perehi o te Komiti mo te whakapuaki i to mohiotanga ki a te Karaiti, 1858. 
Ko Te Pukapuka O Nga Inoinga, Me Era Atu Tikanga, I Whakaritea E Te Hahi O Ingarani, Mo Te Minitatanga O Nga Hakarameta, O Era Atu Ritenga a Te Hahi: Me Nga Waiata Ano Hoki a Rawiri, Me Te Tikanga Mo Te Whiriwhiringa, Mo Te Whakaturanga, Me Te Whakatapunga O Nga Pihopa, O Nga Piriti, Me Nga Rikona. 1883. The Book of Common Prayer, and Administration of the Sacraments, and Other Rites and Ceremonies of the Church, According to the Use of the United Church of England and Ireland: Together with the Proper Lessons for Sundays and Other Holy-Days, and a New Version of the Psalms of David, Oxford: Printed at 134 the University Press, 1852. The Book of Common Prayer and Administration of the Sacraments, and Other Rites and Ceremonies of the Church, According to the Church of England: Together with the Psalter or Psalms of David, Printed as They Are to Be Sung or Said in Churches: And the Form and Manner of Making, Ordaining, and Consecrating of Bishops, Priests, and Deacons, London: G.E. Eyre and W. Spottiswoode, after 1871 but before 1877. Brown, A.N., The Journals of A.N. Brown C.M.S. Missionary Tauranga Covering the Years 1840 to 1842, Tauranga: The Elms Trust, 1990 (Commemorative Edition). ______________, Select Sermons of A.N. Brown, Tauranga: The Elms Trust, 1997. Fitzgerald, Caroline (ed.), Te Wiremu Henry Williams: Early Years in the North, Wellington: Huia Publishers, 2011. The Hawke's Bay Almanac, Napier: James Wood, Hawke's Bay Herald, 1862, 1863, 1867."); 
     585 
    399586         
    400587        //boolean textIsInMaori = maoriTextDetector.isTextInMaori(TEST_MRI_INPUT_TEXT); // test hardcoded string