source: gs3-extensions/maori-lang-detection/src/org/greenstone/atea/WebpageInfo.java@ 33634

Last change on this file since 33634 was 33634, checked in by ak19, 4 years ago

Rewrote NutchTextDumpProcessor as NutchTextDumpToMongoDB.java, which uses MongoDBAccess, which now has insertWebpageInfo() and insertWebsiteInfo(). However, local testing has been unsuccessful, even though authentication should be working, since I'm following the online examples that use the Credential object. It supposedly connects to the database, but database.listCollections() fails with an Unauthorized error, so nothing after that can be expected to work. I could do my preliminary testing against a small sample subset of crawled sites on vagrant, where no authentication is set up, but what if someone else one day wants to run this against a mongodb where authentication is set up (the way TSG set it up for the mongodb they gave me access to)? Then it still wouldn't work.

File size: 1.2 KB
package org.greenstone.atea;

import java.util.ArrayList;

public class WebpageInfo {

    /** db table ids */
    public final long webpageID;
    public final int websiteID;

    public final int totalSentences;

    public final String text;
    public final String URL;
    public final boolean isMRI;

    public final String charEncoding;
    public final String modifiedTime;
    public final String fetchTime;
    public final ArrayList<SentenceInfo> singleSentences;
    public final ArrayList<SentenceInfo> overlappingSentences;

    public WebpageInfo(long webpageID, int websiteID,
                       String pageText, String pageURL, boolean isMRI, int totalSentences,
                       String charEncoding, String modifiedTime, String fetchTime,
                       ArrayList<SentenceInfo> singleSentences,
                       ArrayList<SentenceInfo> overlappingSentences)
    {
        this.webpageID = webpageID;
        this.websiteID = websiteID;

        this.totalSentences = totalSentences;

        this.text = pageText;
        this.URL = pageURL;
        this.isMRI = isMRI;

        this.charEncoding = charEncoding;
        this.modifiedTime = modifiedTime;
        this.fetchTime = fetchTime;

        this.singleSentences = singleSentences;
        this.overlappingSentences = overlappingSentences;
    }
}
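
The constructor takes eleven positional arguments, and their order (text, URL, isMRI, totalSentences) differs from the order in which the fields are declared, so a usage sketch may help. The following is only an illustration, not code from the repository: WebpageInfoDemo is a hypothetical driver, the SentenceInfo class here is a stand-in (the real one is not shown in this file), WebpageInfo is repeated in condensed form purely so the sketch compiles on its own, and all ids, URLs and timestamp strings are made-up sample values.

```java
import java.util.ArrayList;

// Hypothetical driver class, not part of the repository.
class WebpageInfoDemo {

    /** Builds a sample record; every value here is invented for illustration. */
    static WebpageInfo sample() {
        ArrayList<SentenceInfo> single = new ArrayList<>();
        single.add(new SentenceInfo("Kia ora koutou."));
        ArrayList<SentenceInfo> overlapping = new ArrayList<>();

        // Argument order follows the constructor signature:
        // ids, text, URL, isMRI, totalSentences, encoding, times, sentence lists.
        return new WebpageInfo(1L, 1,
            "Kia ora koutou.", "https://example.org/page", true, 1,
            "UTF-8", "2019-11-01", "2019-11-02",
            single, overlapping);
    }

    public static void main(String[] args) {
        WebpageInfo page = sample();
        System.out.println(page.URL + " isMRI=" + page.isMRI
            + " sentences=" + page.totalSentences);
    }
}

// Stand-in for the real SentenceInfo, whose members are not shown here (assumption).
class SentenceInfo {
    final String sentence;

    SentenceInfo(String sentence) {
        this.sentence = sentence;
    }
}

// Condensed copy of the class listed above, same fields and constructor order.
class WebpageInfo {
    public final long webpageID;
    public final int websiteID;
    public final int totalSentences;
    public final String text;
    public final String URL;
    public final boolean isMRI;
    public final String charEncoding;
    public final String modifiedTime;
    public final String fetchTime;
    public final ArrayList<SentenceInfo> singleSentences;
    public final ArrayList<SentenceInfo> overlappingSentences;

    WebpageInfo(long webpageID, int websiteID,
                String pageText, String pageURL, boolean isMRI, int totalSentences,
                String charEncoding, String modifiedTime, String fetchTime,
                ArrayList<SentenceInfo> singleSentences,
                ArrayList<SentenceInfo> overlappingSentences) {
        this.webpageID = webpageID;
        this.websiteID = websiteID;
        this.totalSentences = totalSentences;
        this.text = pageText;
        this.URL = pageURL;
        this.isMRI = isMRI;
        this.charEncoding = charEncoding;
        this.modifiedTime = modifiedTime;
        this.fetchTime = fetchTime;
        this.singleSentences = singleSentences;
        this.overlappingSentences = overlappingSentences;
    }
}
```

Note that final only fixes the field references: the two ArrayList fields are stored as passed in, without copying, so a caller that keeps its own reference to a list can still change its contents after construction.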