source: gs3-extensions/maori-lang-detection/src/org/greenstone/atea/MRIWebPageStats.java@ 33582

Last change on this file since 33582 was 33582, checked in by ak19, 5 years ago

NutchTextDumpProcessor now prints stats for each crawled site: the number of webpages per site and how many of those OpenNLP detected as being in Maori (mri). This required making a reusable method in CCWETProcessor public and static.

File size: 554 bytes
package org.greenstone.atea;


//import org.apache.log4j.Logger;


public class MRIWebPageStats {
    //private static Logger logger = Logger.getLogger(org.greenstone.atea.MRIWebPageStats.class.getName());

    public final String siteID; // crawled site's folder name e.g. 00510
    public final String URL; // URL of webpage
    public final int pageID; // index into NutchTextDumpProcessor::pages ArrayList

    public MRIWebPageStats(String siteID, String url, int pageID) {
        this.siteID = siteID;
        this.URL = url;
        this.pageID = pageID;
    }
}
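The commit message mentions per-site counts of pages detected as mri. As a rough illustration of how records shaped like MRIWebPageStats could be tallied by site, here is a minimal sketch. It is not the code in NutchTextDumpProcessor: the class name SiteStatsSketch, the nested PageStats stand-in, the method countPagesPerSite, and the example.org URLs are all hypothetical, and it assumes one record exists per page detected as mri.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of the Greenstone source tree.
public class SiteStatsSketch {

    // Stand-in with the same three fields as MRIWebPageStats,
    // so this example compiles on its own.
    public static final class PageStats {
        public final String siteID; // crawled site's folder name
        public final String URL;    // URL of webpage
        public final int pageID;    // index into a list of pages

        public PageStats(String siteID, String url, int pageID) {
            this.siteID = siteID;
            this.URL = url;
            this.pageID = pageID;
        }
    }

    // Counts how many records belong to each siteID,
    // keeping sites in the order they were first seen.
    public static Map<String, Integer> countPagesPerSite(List<PageStats> pages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (PageStats p : pages) {
            counts.merge(p.siteID, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, assumed to be pages detected as mri.
        List<PageStats> mriPages = new ArrayList<>();
        mriPages.add(new PageStats("00510", "http://example.org/a", 0));
        mriPages.add(new PageStats("00510", "http://example.org/b", 3));
        mriPages.add(new PageStats("00511", "http://example.org/c", 1));

        System.out.println(countPagesPerSite(mriPages)); // prints {00510=2, 00511=1}
    }
}
```

Because the fields are public and final, such records can be read directly (p.siteID) without getters and cannot be modified after construction, which makes them safe to share between collections.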