Timestamp:
2019-12-12T18:04:10+13:00
Author:
ak19
Message:

Removed an adult site from the crawled contents and added its URL to the blacklist conf file (in case anyone ever crawls our MRI set of Common Crawl sites again).

File:
1 edited

  • other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java

    r33698 r33800
    @@ -291,5 +291,5 @@
         File geoLiteCityDatFile = new File(this.getClass().getClassLoader().getResource("GeoLiteCity.dat").getFile());
         try {
    -        if(this.domainOfSite.equals("UNKNOWN")) {
    +        if(this.domainOfSite.equals("UNKNOWN")) { // for sites that had 0 webpages downloaded, we have no domain
             this.geoLocationCountryCode = "UNKNOWN";
             } else {
    @@ -298,4 +298,4 @@
         } catch(Exception e) {
             logger.error("*** For SiteID " + siteID + ", got exception: "  + e.getMessage(), e);
    -        this.geoLocationCountryCode = null;
    +        this.geoLocationCountryCode = "UNKNOWN"; // couldn't get the country code, so should also be UNKNOWN not null
         }
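
The effect of the second change can be illustrated in isolation. The following is a minimal sketch, not the code in NutchTextDumpToMongoDB.java: the class `CountryCodeFallback`, the method `resolveCountryCode`, and the `Callable` standing in for the GeoLite lookup are all hypothetical names introduced for illustration. It assumes only what the diff shows, namely that a site with an "UNKNOWN" domain skips the lookup and that a failed lookup now also yields "UNKNOWN" rather than null.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class CountryCodeFallback {
    static final String UNKNOWN = "UNKNOWN";

    /**
     * Hypothetical helper (not part of the changeset): returns a country code
     * that is never null. The lookup argument stands in for the real
     * GeoLite-based lookup, whose details are not shown in the diff.
     */
    public static String resolveCountryCode(String domainOfSite, Callable<String> lookup) {
        // Sites with 0 downloaded webpages have no domain, so skip the lookup.
        if (UNKNOWN.equals(domainOfSite)) {
            return UNKNOWN;
        }
        try {
            return lookup.call();
        } catch (Exception e) {
            // Mirrors the changed catch block: report the failure and fall back
            // to "UNKNOWN" instead of null.
            System.err.println("*** got exception: " + e.getMessage());
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // No domain: lookup is never called.
        System.out.println(resolveCountryCode("UNKNOWN", () -> "NZ"));      // prints UNKNOWN
        // Successful lookup.
        System.out.println(resolveCountryCode("example.org", () -> "NZ"));  // prints NZ
        // Failing lookup falls back to the sentinel.
        System.out.println(resolveCountryCode("example.org", () -> {
            throw new IOException("lookup failed");
        }));                                                                // prints UNKNOWN
    }
}
```

Using one sentinel string on both paths means downstream consumers of the field can compare against "UNKNOWN" without a separate null check.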