source: gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebsiteInfo.java@ 33634

Last change on this file since 33634 was 33634, checked in by ak19, 4 years ago

Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which uses MongoDBAccess, which now has insertWebpageInfo() and insertWebsiteInfo(). However, local testing has been unsuccessful, even though authentication should be working, since I'm following the online examples that use the Credential object. It supposedly connects to the database, but database.listCollections() fails with an Unauthorized error, so nothing subsequent can be expected to work. I could do my preliminary testing against a small sample subset of crawled sites on vagrant, where no authentication is set up, but what if someone else one day wants to run this against a mongodb where authentication is set up (the way TSG set it up for the mongodb they gave me access to)? Then it still wouldn't work.

File size: 1.3 KB
package org.greenstone.atea;

/**
 * Immutable record of summary information about one crawled website.
 * All fields are public and final, and are set once in the constructor.
 */
public class WebsiteInfo {

    // Identification of the site
    public final int id;
    public final String siteFolderName;
    public final String domain;

    // Page counts ("MRI" presumably refers to the ISO 639 language code for Maori, "mri")
    public final int totalPages;
    public final int countOfWebPagesWithBodyText;
    public final int numPagesInMRI;

    // Crawl status
    public final long siteCrawledTimestamp;
    public final boolean siteCrawlUnfinished;
    public final boolean redoCrawl;

    // Location and URL hints
    public final String geoLocationCountryCode;
    public final boolean urlContainsLangCodeInpath;

    public WebsiteInfo(int siteCount, String siteFolderName, String domainOfSite,
                       int totalPages, int countOfWebPagesWithBodyText, int numPagesInMRI,
                       long siteCrawledTimestamp, boolean siteCrawlUnfinished, boolean redoCrawl,
                       String geoLocationCountryCode, boolean urlContainsLangCodeInpath)
    {
        // siteCount doubles as the record's id
        this.id = siteCount;
        this.siteFolderName = siteFolderName;
        this.domain = domainOfSite;

        this.totalPages = totalPages;
        this.countOfWebPagesWithBodyText = countOfWebPagesWithBodyText;
        this.numPagesInMRI = numPagesInMRI;

        this.siteCrawledTimestamp = siteCrawledTimestamp;
        this.siteCrawlUnfinished = siteCrawlUnfinished;
        this.redoCrawl = redoCrawl;

        this.geoLocationCountryCode = geoLocationCountryCode;
        this.urlContainsLangCodeInpath = urlContainsLangCodeInpath;
    }
}
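
The class above has eleven constructor arguments in a fixed order, so a short usage sketch may help. The following is only an illustration under stated assumptions: the sample values, the `sample()` and `toFieldMap()` helpers, and the `WebsiteInfoExample` wrapper are all hypothetical and not part of the repository. A compact copy of the class is nested inside the wrapper purely so that the snippet compiles on its own. How MongoDBAccess.insertWebsiteInfo() actually consumes the object is not shown in this file, so the map below is just one plausible way to lay out the fields by name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WebsiteInfoExample {

    // Compact copy of WebsiteInfo, repeated here only to keep the sketch self-contained.
    static final class WebsiteInfo {
        public final int id;
        public final String siteFolderName, domain;
        public final int totalPages, countOfWebPagesWithBodyText, numPagesInMRI;
        public final long siteCrawledTimestamp;
        public final boolean siteCrawlUnfinished, redoCrawl;
        public final String geoLocationCountryCode;
        public final boolean urlContainsLangCodeInpath;

        WebsiteInfo(int siteCount, String siteFolderName, String domainOfSite,
                    int totalPages, int countOfWebPagesWithBodyText, int numPagesInMRI,
                    long siteCrawledTimestamp, boolean siteCrawlUnfinished, boolean redoCrawl,
                    String geoLocationCountryCode, boolean urlContainsLangCodeInpath) {
            this.id = siteCount;
            this.siteFolderName = siteFolderName;
            this.domain = domainOfSite;
            this.totalPages = totalPages;
            this.countOfWebPagesWithBodyText = countOfWebPagesWithBodyText;
            this.numPagesInMRI = numPagesInMRI;
            this.siteCrawledTimestamp = siteCrawledTimestamp;
            this.siteCrawlUnfinished = siteCrawlUnfinished;
            this.redoCrawl = redoCrawl;
            this.geoLocationCountryCode = geoLocationCountryCode;
            this.urlContainsLangCodeInpath = urlContainsLangCodeInpath;
        }
    }

    // Hypothetical sample record; every value here is a made-up placeholder.
    static WebsiteInfo sample() {
        return new WebsiteInfo(1, "00001", "example.org",
                               120, 95, 40,
                               1570000000000L, false, false,
                               "NZ", true);
    }

    // Hypothetical helper: copies the fields into an insertion-ordered map keyed by field name.
    static Map<String, Object> toFieldMap(WebsiteInfo info) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", info.id);
        fields.put("siteFolderName", info.siteFolderName);
        fields.put("domain", info.domain);
        fields.put("totalPages", info.totalPages);
        fields.put("countOfWebPagesWithBodyText", info.countOfWebPagesWithBodyText);
        fields.put("numPagesInMRI", info.numPagesInMRI);
        fields.put("siteCrawledTimestamp", info.siteCrawledTimestamp);
        fields.put("siteCrawlUnfinished", info.siteCrawlUnfinished);
        fields.put("redoCrawl", info.redoCrawl);
        fields.put("geoLocationCountryCode", info.geoLocationCountryCode);
        fields.put("urlContainsLangCodeInpath", info.urlContainsLangCodeInpath);
        return fields;
    }

    public static void main(String[] args) {
        // Print each field name and value of the sample record, one per line.
        for (Map.Entry<String, Object> e : toFieldMap(sample()).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Because three `int` and three `boolean` parameters sit next to each other in the constructor, swapping two of them compiles without complaint; dumping the fields by name, as in this sketch, is one cheap way to spot such a mix-up during testing.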