Timestamp:
2019-10-24T22:04:37+13:00
Author:
ak19
Message:

Incorporating Dr Nichols's suggestion to help weed out product sites: if the TLD of a seed URL containing /mi/ is outside NZ, add it to possible-product-sites.txt. This should hopefully be a smaller set than all URLs containing /mi and, because these sites are located outside NZ, they are more likely to be product sites than not.

File:
1 edited

  • gs3-extensions/maori-lang-detection/MoreReading/crawling-Nutch.txt

    r33565 r33603  
-------------------
Dr Nichols's suggestion: we can store a listing of potential product sites to inspect by checking the URL for /mi in combination with whether the domain's IP geolocates to OUTSIDE New Zealand (tld nz).
* https://stackoverflow.com/questions/1415851/best-way-to-get-geo-location-in-java
  - https://mvnrepository.com/artifact/com.maxmind.geoip/geoip-api/1.2.10
  - older .dat.gz file is archived at https://web.archive.org/web/20180917084618/http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
  - and newer geo country data at https://dev.maxmind.com/geoip/geoip2/geolite2/
* https://dev.maxmind.com/geoip/geoip2/geolite2/
* older GeoIp API (has LookupService): https://github.com/maxmind/geoip-api-java
* Newer GeoIp2 API: https://dev.maxmind.com/geoip/geoip2/downloadable/#MaxMind_APIs
    and https://maxmind.github.io/GeoIP2-java/doc/v2.12.0/
* https://maxmind.github.io/GeoIP2-java/
* https://github.com/AtlasOfLivingAustralia/ala-hub/issues/11


---
https://check-host.net/ip-info
https://ipinfo.info/html/ip_checker.php
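
Sketch of the check described above (not the committed code): flag a URL as a possible product site when its path contains /mi and its host appears to be outside New Zealand. The class name ProductSiteFilter, its method names, and the countryLookup parameter are all hypothetical. The lookup is passed in as a function so that a GeoIP2 database reader, or a stub, can be plugged in. Falling back to the TLD when geolocation gives no answer is an assumption made for this sketch, as is reading "/mi" as a whole path segment.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
final class ProductSiteFilter {

    // True if the URL path has "mi" as a path segment (".../mi/..." or ".../mi").
    // Assumption: stricter than a plain "/mi" substring match, which would
    // also catch paths such as "/mike".
    static boolean hasMiPath(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return false;
        }
        String p = path.toLowerCase(Locale.ROOT);
        return p.contains("/mi/") || p.endsWith("/mi");
    }

    // True if the host name ends in the nz top-level domain.
    static boolean hasNzTld(String host) {
        return host.toLowerCase(Locale.ROOT).endsWith(".nz");
    }

    // countryLookup maps a host name to an ISO country code such as "NZ",
    // or Optional.empty() if the host cannot be geolocated. With the GeoIP2
    // Java API linked above, this could plausibly wrap a country lookup on a
    // DatabaseReader; that wiring is not shown here.
    static boolean isPossibleProductSite(String url,
            Function<String, Optional<String>> countryLookup) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URL, so nothing to flag
        }
        String host = uri.getHost();
        if (host == null || !hasMiPath(uri)) {
            return false;
        }
        Optional<String> country = countryLookup.apply(host);
        if (country.isPresent()) {
            // Geolocation decides when it is available.
            return !"NZ".equalsIgnoreCase(country.get());
        }
        // Assumed fallback: use the TLD when the IP cannot be geolocated.
        return !hasNzTld(host);
    }
}
```

URLs for which isPossibleProductSite returns true would then be appended to possible-product-sites.txt for manual inspection.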