Last change on this file since 33582 was 33582, checked in by ak19, 5 years ago
NutchTextDumpProcessor prints each crawled site's stats: number of webpages per crawled site and how many of those were detected by OpenNLP as being in Maori (mri). Needed to make a reusable method in CCWETProcessor as public and static.
File size: 554 bytes
package org.greenstone.atea;

//import org.apache.log4j.Logger;

public class MRIWebPageStats {
    //private static Logger logger = Logger.getLogger(org.greenstone.atea.MRIWebPageStats.class.getName());

    public final String siteID; // crawled site's folder name e.g. 00510
    public final String URL; // URL of webpage
    public final int pageID; // index into NutchTextDumpProcessor::pages ArrayList

    public MRIWebPageStats(String siteID, String url, int pageID) {
        this.siteID = siteID;
        this.URL = url;
        this.pageID = pageID;
    }
}
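The commit message says these records feed per-site statistics (number of webpages per crawled site). As a rough illustration only, the sketch below shows one way such immutable records could be tallied by `siteID`. It is not the code in NutchTextDumpProcessor: the class name `SiteStatsSketch`, the helper `countPagesPerSite`, and the example URLs are all hypothetical, and a minimal copy of `MRIWebPageStats` is nested inside so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of the org.greenstone.atea package.
class SiteStatsSketch {

    // Minimal copy of the class from the listing, nested so this sketch is self-contained.
    static final class MRIWebPageStats {
        public final String siteID; // crawled site's folder name e.g. 00510
        public final String URL;    // URL of webpage
        public final int pageID;    // index into a list of pages

        MRIWebPageStats(String siteID, String url, int pageID) {
            this.siteID = siteID;
            this.URL = url;
            this.pageID = pageID;
        }
    }

    // Hypothetical helper: counts how many records belong to each site,
    // keeping sites in the order they were first seen.
    static Map<String, Integer> countPagesPerSite(List<MRIWebPageStats> pages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (MRIWebPageStats page : pages) {
            counts.merge(page.siteID, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder data; the URLs are invented for the example.
        List<MRIWebPageStats> pages = new ArrayList<>();
        pages.add(new MRIWebPageStats("00510", "https://example.org/a", 0));
        pages.add(new MRIWebPageStats("00510", "https://example.org/b", 1));
        pages.add(new MRIWebPageStats("00511", "https://example.net/c", 2));

        // Print one line per site with its page count.
        countPagesPerSite(pages).forEach(
            (site, n) -> System.out.println(site + ": " + n + " page(s)"));
    }
}
```

Because every field is `public final`, a record cannot change after construction, so the tally can read `siteID` directly without defensive copies or getters.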