Last change to this file: revision 33582, checked in by ak19, 5 years ago.
Commit message: NutchTextDumpProcessor prints each crawled site's stats: the number of webpages per crawled site and how many of those were detected by OpenNLP as being in Maori (mri). A reusable method in CCWETProcessor needed to be made public and static.
File size: 554 bytes
    package org.greenstone.atea;

    //import org.apache.log4j.Logger;

    /**
     * Identifies a single webpage within a crawled site: the site's folder name,
     * the page's URL, and the page's index in NutchTextDumpProcessor's pages list.
     */
    public class MRIWebPageStats {
        //private static Logger logger = Logger.getLogger(org.greenstone.atea.MRIWebPageStats.class.getName());

        public final String siteID; // crawled site's folder name, e.g. 00510
        public final String URL;    // URL of the webpage
        public final int pageID;    // index into the NutchTextDumpProcessor.pages ArrayList

        public MRIWebPageStats(String siteID, String url, int pageID) {
            this.siteID = siteID;
            this.URL = url;
            this.pageID = pageID;
        }
    }
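The commit message says NutchTextDumpProcessor reports, per crawled site, how many pages OpenNLP detected as Maori. The following is a minimal sketch, not the actual NutchTextDumpProcessor code, of how a list of MRIWebPageStats-style records could be tallied by siteID. `PageRecord` is a hypothetical stand-in that mirrors the three fields of MRIWebPageStats so the example compiles on its own; the class name `MriSiteTally`, the method `countPerSite` and the example URLs are likewise illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts how many page records belong to each crawled site.
public class MriSiteTally {

    // Hypothetical stand-in for org.greenstone.atea.MRIWebPageStats (same three final fields).
    static final class PageRecord {
        final String siteID; // crawled site's folder name, e.g. 00510
        final String URL;    // URL of the webpage
        final int pageID;    // index into a per-site list of pages

        PageRecord(String siteID, String url, int pageID) {
            this.siteID = siteID;
            this.URL = url;
            this.pageID = pageID;
        }
    }

    // Returns siteID -> number of records; TreeMap keeps site folders in sorted order.
    static Map<String, Integer> countPerSite(List<PageRecord> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (PageRecord r : records) {
            counts.merge(r.siteID, 1, Integer::sum); // start at 1, then add 1 per extra record
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PageRecord> mriPages = new ArrayList<>();
        mriPages.add(new PageRecord("00510", "http://example.org/a", 0));
        mriPages.add(new PageRecord("00510", "http://example.org/b", 3));
        mriPages.add(new PageRecord("00511", "http://example.net/", 1));
        System.out.println(countPerSite(mriPages)); // prints {00510=2, 00511=1}
    }
}
```

Using `Map.merge` with `Integer::sum` avoids a separate contains-check for each record; the `TreeMap` is used only so that the printed stats come out ordered by site folder name.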