Changeset 33377

Show
Ignore:
Timestamp:
31.07.2019 19:04:00 (3 weeks ago)
Author:
ak19
Message:

Changes to get gen_SentenceDetection_model.sh to run still from the toplevel directory of this extenstion but located in bin/script.

Location:
gs3-extensions/maori-lang-detection
Files:
2 modified

Legend:

Unmodified
Added
Removed
  • gs3-extensions/maori-lang-detection/README.txt

    r33358 r33377  
    262262https://stackoverflow.com/questions/36516363/sentence-detection-with-opennlp 
    263263 
     264 
  • gs3-extensions/maori-lang-detection/gen_SentenceDetection_model.sh

    r33357 r33377  
    1212# because the 2011 one appears to have fewer accidentally incorporated English sentences 
    1313 
     14 
     15# Need to run this script from the top level folder of this extension 
    1416 
    1517if [ ! -z $1 ]; then 
     
    6870#tail -100 $infile 
    6971 
     72# Ensure OPENNLP_HOME is set 
     73if [ "x$OPENNLP_HOME" = "x" ]; then 
     74    echo "OPENNLP_HOME not set, attempting to set it to the local apache-opennlp (v1.9.1). ENSURE THIS EXISTS OR SET OPENNLP_HOME YOURSELF!" 
     75    #if [ -d apache-opennlp-* ]; then 
     76    cd apache-opennlp-* 
     77    if [ "x$?" = "x0" ]; then 
     78    export OPENNLP_HOME=`pwd` 
     79    cd .. 
     80    else 
     81    echo "No OPENNLP_HOME set and could not find a subfolder 'apache-opennlp-...' to set it to." 
     82    echo "Set OPENNLP_HOME yourself before running this script. Exitting..." 
     83    exit 
     84    fi 
     85fi 
    7086 
    7187mkdir -p $OPENNLP_HOME/training_data 
     
    100116# Note that I tried manually inserting \t, after copying the original line with tabspacing had no effect. Still no difference. 
    101117# Note 2: echo doesn't appear to preserve copied tab spaces. 
    102  
     118# Answer: echo doesn't treat \n as newline and \t as tab and so on, unless the -e flag is passed in: 
     119# echo -e "100000\tYWCA Boarding house : ĀwhinaServices and support Kei te pÅ«manawa o Tāmaki Makaurau a YMCA." | awk -F "\t" '{ print $2 }' 
    103120 
    104121# 2. Create mri sentences model from training sentences file 
    105122#$OPENNLP_HOME/bin/opennlp SentenceDetectorTrainer -model mri-sent_trained.bin -lang en -data mri-sent.train -encoding UTF-8 
    106123 
    107 if [ "x$OPENNLP_HOME" = "x" ]; then 
    108     echo "OPENNLP_HOME not set, attempting to set it to apache-opennlp-1.9.1 (ENSURE THIS EXISTS OR SET OPENNLP_HOME YOURSELF!)" 
    109     if [ -d apache-opennlp-* ]; then 
    110     cd apache-opennlp-* 
    111     export OPENNLP_HOME=`pwd` 
    112     cd .. 
    113     else 
    114     echo "No OPENNLP_HOME set and could not find a subfolder 'apache-opennlp-...' to set it to." 
    115     echo "Set OPENNLP_HOME yourself before running this script. Exitting..." 
    116     fi 
    117 fi 
    118124 
    119125mkdir -p $OPENNLP_HOME/models 
     
    132138echo "****************************" 
    133139echo "" 
     140