source: other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java@ 33653

Last change on this file since 33653 was 33653, checked in by ak19, 4 years ago
  1. As suggested by Dr Bainbridge, changed the code to use Morphia as the ODM for MongoDB (an Object Document Mapper is to MongoDB what an ORM is to an RDBMS).
  2. Added the jar files needed to get this to work.
  3. Further changes to store site folder names of the form ##### as the primary key of the Websites collection. However, a future commit may store a reference to a WebsiteInfo object (representing a JSON document in the Websites MongoDB collection) inside a WebpageInfo object.
  4. The MongoDB collections are now called Websites and Webpages, not websites and webpages.
  5. The geolocation of a site is now stored as a field in the Websites MongoDB collection, and containsMRI is now stored as a field in the Webpages collection.
  6. Tried out some MongoDB query commands based on what Dr Bainbridge did yesterday.
File size: 1.4 KB
package org.greenstone.atea.morphia;

import dev.morphia.annotations.*;

/**
 * Immutable record of one crawled website, mapped by Morphia to a document
 * in the "Websites" MongoDB collection.
 */
@Entity("Websites")
public class WebsiteInfo {
    //public final int id;

    /** Site folder name (of the form #####), used as the primary key. */
    @Id
    public final String siteFolderName;
    public final String domain;

    // Page counts for the site
    public final int totalPages;
    public final int countOfWebPagesWithBodyText;
    public final int numPagesInMRI;

    // Crawl status
    public final long siteCrawledTimestamp;
    public final boolean siteCrawlUnfinished;
    public final boolean redoCrawl;

    // Geolocation of the site and whether its URL carries a language code
    public final String geoLocationCountryCode;
    public final boolean urlContainsLangCodeInpath;

    public WebsiteInfo(/*int siteCount,*/ String siteFolderName, String domainOfSite,
                       int totalPages, int countOfWebPagesWithBodyText, int numPagesInMRI,
                       long siteCrawledTimestamp, boolean siteCrawlUnfinished, boolean redoCrawl,
                       String geoLocationCountryCode, boolean urlContainsLangCodeInpath)
    {
        //this.id = siteCount;
        this.siteFolderName = siteFolderName;
        this.domain = domainOfSite;

        this.totalPages = totalPages;
        this.countOfWebPagesWithBodyText = countOfWebPagesWithBodyText;
        this.numPagesInMRI = numPagesInMRI;

        this.siteCrawledTimestamp = siteCrawledTimestamp;
        this.siteCrawlUnfinished = siteCrawlUnfinished;
        this.redoCrawl = redoCrawl;

        this.geoLocationCountryCode = geoLocationCountryCode;
        this.urlContainsLangCodeInpath = urlContainsLangCodeInpath;
    }
}
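
To illustrate what the @Id mapping amounts to, here is a minimal sketch that builds by hand the kind of document an ODM might produce for a WebsiteInfo. It does not use Morphia and is not how Morphia is implemented. The class WebsiteDocumentSketch, the helper toDocument, the sample values ("00001", "example.org", and so on) and the reduced set of fields are all assumptions made for illustration. The only facts relied on are that MongoDB stores a document's primary key under "_id" and that the commit uses siteFolderName as that key; a real mapper may add bookkeeping fields of its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not part of the repository and not Morphia code.
class WebsiteDocumentSketch {

    /**
     * Builds a map shaped like a Websites document. The siteFolderName is
     * stored under "_id", MongoDB's primary-key field, which is where an
     * @Id-annotated field ends up. Only a subset of the fields is shown.
     */
    static Map<String, Object> toDocument(String siteFolderName, String domain,
                                          int totalPages, int numPagesInMRI,
                                          String geoLocationCountryCode) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", siteFolderName);   // primary key of the collection
        doc.put("domain", domain);
        doc.put("totalPages", totalPages);
        doc.put("numPagesInMRI", numPagesInMRI);
        doc.put("geoLocationCountryCode", geoLocationCountryCode);
        return doc;
    }

    public static void main(String[] args) {
        // Made-up sample values; the folder name follows the ##### form.
        Map<String, Object> doc = toDocument("00001", "example.org", 12, 3, "NZ");
        System.out.println(doc);
    }
}
```

Because the folder name is the key, saving a second record with the same siteFolderName would address the same document rather than create a new one, which is presumably why the commit message singles out that choice.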