Changeset 33656

Timestamp:
12.11.2019 21:11:05
Author:
ak19
Message:

Final minor changes before I start processing the crawls of node2.

Files:
1 modified

  • other-projects/maori-lang-detection/src/org/greenstone/atea/NutchTextDumpToMongoDB.java

    r33655  r33656
    374     374
    375     375     else {
    376             File UNFINISHED_FILE = new File(siteDir, "UNFINISHED");          
            376     File UNFINISHED_FILE = new File(siteDir, "UNFINISHED");
    377     377
    378     378     String siteID = siteDir.getName();
            379     if(siteID.contains("_")) {
            380         logger.warn("*** Skipping site " + siteID + " as its dir name indicates it wasn't crawled properly.");
            381         continue;
            382     }
            383
    379     384     long lastModified = siteDir.lastModified();
    380             logger.debug("Processing siteID: " + siteID);
            385     logger.debug("@@@ Processing siteID: " + siteID);
    381     386     NutchTextDumpToMongoDB nutchTxtDump = new NutchTextDumpToMongoDB(
    382     387          mongodb, mriTxtDetector,