Timestamp:
2019-11-05T21:04:09+13:00 (4 years ago)
Author:
ak19
Message:
  1. Incorporated Dr Nichols' earlier suggestion of storing page modified time and char-encoding metadata, if present, in the crawl dump output. Have done so, but neither the modifiedTime nor the fetchTime metadata in the dump file appears to be a webpage's actual modified time: both are from 2019 and set around the period we've been crawling.
  2. Moved the getDomainFromURL() function from CCWETProcessor.java to Utility.java since it is being reused.
  3. The MongoDBAccess class successfully connects (at least, no exceptions) and uses the newly added properties in config.properties to make the connection.
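Neither Utility.getDomainFromURL() nor the MongoDBAccess connection code appears in the diff below (only the crawling-Nutch.txt notes were edited in this changeset), so the two sketches that follow are illustrative guesses rather than the committed code.

For point 2, a getDomainFromURL() helper could be as simple as extracting the host via java.net.URI; the class and method names here just mirror the commit message, and the actual implementation in Utility.java may differ:

    // Hypothetical sketch only; not the code committed in Utility.java.
    import java.net.URI;
    import java.net.URISyntaxException;

    public class UtilitySketch {
        // Returns the domain (host) portion of a URL string, or null if the
        // URL cannot be parsed.
        public static String getDomainFromURL(String url) {
            try {
                return new URI(url).getHost();
            } catch (URISyntaxException e) {
                return null;
            }
        }

        public static void main(String[] args) {
            // Prints "www.example.com"
            System.out.println(getDomainFromURL("https://www.example.com/page"));
        }
    }

For point 3, a minimal sketch of a MongoDB connection driven by settings in config.properties, using the MongoDB Java sync driver (com.mongodb.client), might look like the following; the property keys mongodb.host, mongodb.port and mongodb.db are assumptions, not the keys actually added in the commit:

    // Minimal, hypothetical sketch of a config.properties-driven connection;
    // the real MongoDBAccess class is not shown in this changeset.
    import java.io.InputStream;
    import java.util.Properties;

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoDatabase;

    public class MongoDBAccessSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Assumes config.properties is on the classpath.
            try (InputStream in = MongoDBAccessSketch.class
                    .getClassLoader().getResourceAsStream("config.properties")) {
                if (in != null) {
                    props.load(in);
                }
            }

            String host = props.getProperty("mongodb.host", "localhost");
            String port = props.getProperty("mongodb.port", "27017");
            String dbName = props.getProperty("mongodb.db", "test");

            // MongoClients.create() does not itself verify the connection;
            // errors usually surface on the first real operation.
            try (MongoClient client =
                    MongoClients.create("mongodb://" + host + ":" + port)) {
                MongoDatabase db = client.getDatabase(dbName);
                System.out.println("Using database: " + db.getName());
            }
        }
    }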
File:
1 edited

  • gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt

    r33621 r33623 (lines 494-502 added):

    INSTALLATION MONGO-DB AND CLIENT
    FROM: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
        wget -qO - https://www.mongodb.org/static/pgp/server-4.2.asc | sudo apt-key add -
        echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
        sudo apt-get update
        sudo apt-get install -y mongodb-org

    UNINSTALLING
        https://www.anintegratedworld.com/uninstall-mongodb-in-ubuntu-via-command-line-in-3-easy-steps/
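As a follow-up to the install steps above (not part of the original notes), on Ubuntu with systemd the newly installed server is normally started and checked with:

    sudo systemctl start mongod
    sudo systemctl status mongod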