Timestamp:
2019-09-12T21:37:39+12:00 (5 years ago)
Author:
ak19
Message:
  1. WETProcessor.main() now processes a folder of *.warc.wet(.gz) files. Each WET record from each file is written out to its own file and placed in either the keep folder or the discard folder, depending on the amount of content (number of lines and/or content-length).
  2. Moved unzipFile() from NZTLDProcessor.java into the new Utility.java class as a static method.
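The keep/discard split in point 1 might look roughly like the following. This is a minimal sketch that assumes each WET record's text and its Content-Length value have already been parsed out of the file. The class name `WetRecordFilter`, the thresholds `MIN_LINES` and `MIN_CONTENT_LENGTH`, the `record-N.txt` file naming, and reading "and/or" as a logical OR are all hypothetical and are not taken from WETProcessor. It requires Java 11+ for `String.lines()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the keep/discard split; not the actual WETProcessor code.
class WetRecordFilter {
    // Assumed thresholds: the commit message does not state the real cut-off values.
    static final int MIN_LINES = 20;
    static final int MIN_CONTENT_LENGTH = 100;

    /** Keep a record if it has enough lines OR a large enough Content-Length (assumed reading of "and/or"). */
    static boolean shouldKeep(String recordText, int contentLength) {
        long lineCount = recordText.lines().count(); // String.lines() requires Java 11+
        return lineCount >= MIN_LINES || contentLength >= MIN_CONTENT_LENGTH;
    }

    /**
     * Writes one WET record's text to its own file in either keepDir or discardDir
     * and returns the path written. The "record-N.txt" naming is hypothetical.
     */
    static Path writeRecord(String recordText, int contentLength, int recordNum,
                            Path keepDir, Path discardDir) throws IOException {
        Path targetDir = shouldKeep(recordText, contentLength) ? keepDir : discardDir;
        Files.createDirectories(targetDir);
        Path outFile = targetDir.resolve("record-" + recordNum + ".txt");
        Files.write(outFile, recordText.getBytes(StandardCharsets.UTF_8));
        return outFile;
    }
}
```

Using OR means a record that consists of one very long line is still kept. Switching the condition to AND would make the filter stricter.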
File:
1 edited

  • gs3-extensions/maori-lang-detection/src/org/greenstone/atea/NZTLDProcessor.java

    r33411  r33466
    169     169          // don't have the WET file yet. Get it from the zip file, which we know we should have by now
    170     170
  - 171                  boolean success = unzipFile(inZipFile, WETfile);
  +         171          boolean success = Utility.unzipFile(inZipFile, WETfile);
  +         172          log("Unzipped " + inZipFile + " to " + WETfile);
  +         173
    172     174          // whether we succeeded or not, get rid of the zipped file:
    173     175          if(!inZipFile.delete()) {
    ...
    185     187      }
    186     188
  +         189      /*
    187     190      // Run gunzip
    188     191      // To avoid making this linux specific, use Java to unzip, instead of running gunzip as process
    ...
    216     219      return true;
    217     220      }
  +         221      */
    218     222
    219     223      // wget will be launched from the specified directory, SITES_DIR
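The comment in the now commented-out block ("use Java to unzip, instead of running gunzip as process") hints at what `Utility.unzipFile()` does, but this changeset does not show its body. The following is a minimal sketch under two assumptions. First, it assumes the method gunzips a single `.gz` file with the JDK's `java.util.zip.GZIPInputStream`. Second, it assumes the method returns `true` on success, which matches how the call site uses its `boolean` result. The error handling, the log message, and the buffer size are also assumptions.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical reconstruction; the real Utility.unzipFile() body is not part of this changeset.
class Utility {
    /**
     * Gunzips inZipFile into outFile using only the JDK, so no external gunzip process is run.
     * Returns true on success and false if any I/O error occurs (assumed contract).
     */
    static boolean unzipFile(File inZipFile, File outFile) {
        try (InputStream fileIn = new FileInputStream(inZipFile);
             InputStream gzipIn = new GZIPInputStream(fileIn);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[8192];
            int n;
            // Copy decompressed bytes until the end of the gzip stream.
            while ((n = gzipIn.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return true;
        } catch (IOException e) {
            System.err.println("Unable to unzip " + inZipFile + ": " + e.getMessage());
            return false;
        }
    }
}
```

Opening the `FileInputStream` as its own try-with-resources resource means it is still closed if the `GZIPInputStream` constructor throws on a corrupt or non-gzip header.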