Timestamp:
2019-11-08T23:59:07+13:00
Author:
ak19
Message:

Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java. It uses MongoDBAccess, which now has insertWebpageInfo() and insertWebsiteInfo(). However, local testing has been unsuccessful. Authentication should be working, since I'm following the online examples for using the Credential object, and the code supposedly connects to the database, but database.listCollections() fails with an Unauthorized error, so nothing after that can be expected to work. I could do preliminary testing against a small sample subset of crawled sites on vagrant, where no authentication is set up, but if someone else wants to run this one day against a MongoDB instance that does have authentication set up (the way TSG set up the MongoDB they gave me access to), it still wouldn't work.
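One possible cause of an Unauthorized error on listCollections() (not confirmed in this commit) is that the credential names the wrong authentication database: a MongoDB user is authenticated against the database it was created in, often admin, which may differ from the database being queried. The sketch below is a stdlib-only helper that builds a connection string with percent-encoded credentials and an explicit authSource, as an alternative to constructing the Credential object by hand. The class name, method names, user, host and database values are all hypothetical placeholders, not TSG's settings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a MongoDB connection string with an explicit authSource. */
public class MongoUriSketch {

    // Percent-encode a userinfo component so characters such as '@' or ':'
    // in a password don't break the URI. URLEncoder produces form encoding,
    // where a space becomes "+"; replace that with "%20". A literal '+' has
    // already been encoded as "%2B", so it is unaffected by the replace.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /**
     * @param database   the database the application reads and writes
     * @param authSource the database the user was created in (assumption: often "admin")
     */
    static String buildUri(String user, String password, String host, int port,
                           String database, String authSource) {
        return "mongodb://" + encode(user) + ":" + encode(password)
                + "@" + host + ":" + port + "/" + database
                + "?authSource=" + encode(authSource);
    }

    public static void main(String[] args) {
        // Placeholder values only.
        String uri = buildUri("someuser", "p@ss word", "localhost", 27017, "webpages", "admin");
        System.out.println(uri);
    }
}
```

With the 3.x Java driver, such a string could be passed as `new MongoClient(new MongoClientURI(uri))`. If the error persists with the correct authSource, the user's roles on the target database (for example read or readWrite) are the next thing worth checking.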

File:
1 edited

  • gs3-extensions/maori-lang-detection/src/org/greenstone/atea/TextLanguageDetector.java

--- TextLanguageDetector.java	(revision 33633)
+++ TextLanguageDetector.java	(revision 33634)
@@ -142,18 +142,4 @@
     }
 
-    /** inner class */
-    public class SentenceInfo {
-        public final double confidenceLevel;
-        /** 3 letter lang code */
-        public final String langCode;
-        public final String sentence;
-
-        public SentenceInfo(double confidence, String langCode, String sentence) {
-            confidenceLevel = confidence;
-            this.langCode = langCode;
-            this.sentence = sentence;
-        }
-    }
-
     /** TODO: Is it sensible to use the Maori Language Sentence Model to split the text
      * into sentences? What if the text in any other language or a mix of languages?
@@ -183,5 +169,5 @@
         double confidence = bestLanguage.getConfidence();
 
-        sentencesList.add(new SentenceInfo(confidence, bestLanguage, sentence));
+        sentencesList.add(new SentenceInfo(confidence, bestLanguage.getLang(), sentence));
     }
 
@@ -207,5 +193,5 @@
             separator = " ";
         }
-        sentence = sentence + separator + sentence[i];
+        sentence = sentence + separator + sentences[i];
 
         //System.err.println(sentence);
@@ -214,5 +200,5 @@
         double confidence = bestLanguage.getConfidence();
 
-        sentencesList.add(new SentenceInfo(confidence, bestLanguage, sentence));
+        sentencesList.add(new SentenceInfo(confidence, bestLanguage.getLang(), sentence));
     }
 
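This changeset removes SentenceInfo as an inner class of TextLanguageDetector (where it went is not shown here) and fixes two compile errors: the call sites passed the object bestLanguage where the constructor expects a String language code, and the joining loop indexed the String sentence instead of the array sentences. The sketch below reproduces both fixes in isolation. It assumes bestLanguage behaves like OpenNLP's Language class (getLang() and getConfidence()); the stand-in Language class, the join() helper, the sample sentences and the confidence value are hypothetical, and since the loop's surrounding logic is elided in the diff, join() assumes it concatenates a contiguous range of sentences with single spaces, using a StringBuilder in place of repeated + concatenation.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceInfoSketch {

    /** Hypothetical stand-in for a detected language: a code plus a confidence. */
    static final class Language {
        private final String lang;
        private final double confidence;

        Language(String lang, double confidence) {
            this.lang = lang;
            this.confidence = confidence;
        }

        String getLang() { return lang; }
        double getConfidence() { return confidence; }
    }

    /** Mirrors the removed inner class; declared static so no enclosing instance is needed. */
    static final class SentenceInfo {
        final double confidenceLevel;
        /** 3 letter lang code */
        final String langCode;
        final String sentence;

        SentenceInfo(double confidence, String langCode, String sentence) {
            this.confidenceLevel = confidence;
            this.langCode = langCode;
            this.sentence = sentence;
        }
    }

    /** Joins sentences[from..to) with single spaces, reading sentences[i] (the array). */
    static String join(String[] sentences, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) {
                sb.append(' '); // separator only between sentences
            }
            sb.append(sentences[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] sentences = {"Kia ora.", "Tena koe."};
        Language bestLanguage = new Language("mri", 0.93); // made-up detection result
        List<SentenceInfo> sentencesList = new ArrayList<>();
        String sentence = join(sentences, 0, sentences.length);
        // Pass the String code from getLang(), not the Language object itself.
        sentencesList.add(new SentenceInfo(bestLanguage.getConfidence(), bestLanguage.getLang(), sentence));
        SentenceInfo info = sentencesList.get(0);
        System.out.println(info.langCode + " " + info.confidenceLevel + " " + info.sentence);
    }
}
```

Storing only the language code keeps SentenceInfo independent of the detector library's types, which is presumably why the constructor takes a String in the first place.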