Timestamp:
2019-12-13T18:40:46+13:00
Author:
ak19
Message:
  1. NutchTextDumpToMongoDB Added an extra field to each document in Websites mongodb collection: numPagesContainingMRI. 2. Bugfix to yesterday's commit: performing a substring() was off by one.
File:
1 edited

  • other-projects/maori-lang-detection/src/org/greenstone/atea/morphia/WebsiteInfo.java

--- WebsiteInfo.java (r33698)
+++ WebsiteInfo.java (r33801)
@@ -12,5 +12,7 @@
     public final int totalPages;
     public final int countOfWebPagesWithBodyText;
+
     public final int numPagesInMRI;
+    public final int numPagesContainingMRI;
 
     public final long siteCrawledTimestamp;
@@ -22,5 +24,6 @@
 
     public WebsiteInfo(/*int siteCount,*/ String siteFolderName, String domainOfSite,
-               int totalPages, int countOfWebPagesWithBodyText, int numPagesInMRI,
+               int totalPages, int countOfWebPagesWithBodyText,
+               int numPagesInMRI, int numPagesContainingMRI,
                long siteCrawledTimestamp, boolean siteCrawlUnfinished, boolean redoCrawl,
                String geoLocationCountryCode, boolean urlContainsLangCodeInPath)
@@ -32,5 +35,7 @@
     this.totalPages = totalPages;
     this.countOfWebPagesWithBodyText = countOfWebPagesWithBodyText;
+
     this.numPagesInMRI = numPagesInMRI;
+    this.numPagesContainingMRI = numPagesContainingMRI;
 
     this.siteCrawledTimestamp = siteCrawledTimestamp;
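
To see the effect of the change outside the diff, here is a minimal sketch of the class shape after r33801. It is not the project's actual WebsiteInfo: the class name WebsiteInfoSketch is hypothetical, the String and boolean fields are left out for brevity, and the values in main are made up. The field meanings in the comments are inferred from the names only.

```java
// Hypothetical, trimmed-down stand-in for WebsiteInfo (not the real class).
// Only the numeric fields touched by or adjacent to the change are kept.
final class WebsiteInfoSketch {
    public final int totalPages;
    public final int countOfWebPagesWithBodyText;

    // Assumption from the names: pages whose overall language is MRI,
    // versus pages that merely contain some MRI text.
    public final int numPagesInMRI;
    public final int numPagesContainingMRI; // field added in r33801

    public final long siteCrawledTimestamp;

    WebsiteInfoSketch(int totalPages, int countOfWebPagesWithBodyText,
                      int numPagesInMRI, int numPagesContainingMRI,
                      long siteCrawledTimestamp) {
        this.totalPages = totalPages;
        this.countOfWebPagesWithBodyText = countOfWebPagesWithBodyText;

        this.numPagesInMRI = numPagesInMRI;
        this.numPagesContainingMRI = numPagesContainingMRI;

        this.siteCrawledTimestamp = siteCrawledTimestamp;
    }

    public static void main(String[] args) {
        // Made-up values, purely to show the new argument position.
        WebsiteInfoSketch info = new WebsiteInfoSketch(10, 8, 2, 5, 0L);
        System.out.println("numPagesInMRI=" + info.numPagesInMRI
                + " numPagesContainingMRI=" + info.numPagesContainingMRI);
    }
}
```

Because the new parameter is an int placed directly after another int, a caller that passes the two counts in the wrong order still compiles, so every call site of the constructor needs to be checked by hand.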