Changeset 34460

Timestamp:
17.10.2020 17:17:49 (9 days ago)
Author:
davidb
Message:

Tidy up of code and notes

Location:
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare
Files:
5 modified

  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/01-PREP.sh

    r34459 r34460  
    @@ -30,9 +30,9 @@
     
     
    -if [ ! -d import ] ; then
    -    echo "Making directory: 'import'"
    -    mkdir import
    +if [ ! -d downloads ] ; then
    +    echo "Making directory: 'downloads'"
    +    mkdir downloads
     else
    -    echo "Already created directory: 'import'"
    +    echo "Already created directory: 'downloads'"
     fi
     
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/02-RUN.sh

    r34459 r34460  
    @@ -17,5 +17,5 @@
     echo "****"
     echo "* Downloaded JSON records saved to: "
    -echo "*    import"
    +echo "*    downloads"
     echo "*"
     echo "* To remove a previous download set from collection's import folder:"
    @@ -23,5 +23,5 @@
     echo "*"
     echo "* To move in a freshly downloaded set into the collection's import folder"
    -echo "*   echo import/http* | xargs mv -t ../import/"
    +echo "*   echo downloads/http* | xargs mv -t ../import/"
     echo "****"
     echo
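The move step that 02-RUN.sh prints (`echo downloads/http* | xargs mv -t ../import/`) could also be done from Python. The following is only a sketch under assumptions: the function name `move_downloads` and its parameters are hypothetical and not part of this changeset, and the defaults simply mirror the paths in the echoed message.

```python
import glob
import os
import shutil


def move_downloads(src_dir="downloads",
                   dest_dir=os.path.join("..", "import"),
                   pattern="http*"):
    """Move every entry in src_dir matching pattern into dest_dir.

    Hypothetical helper: mirrors 'echo downloads/http* | xargs mv -t ../import/'.
    Unlike mv, shutil.move raises shutil.Error if an entry with the same
    name already exists in dest_dir, so clear old downloads first.
    """
    # Create the destination if it is missing (exist_ok needs Python 3).
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        # shutil.move returns the final path of the moved entry.
        moved.append(shutil.move(path, dest_dir))
    return moved
```

Calling `move_downloads()` from the `prepare` directory would move the freshly downloaded set; it does not remove a previous set from the collection's import folder first.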
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/README.txt

    r34447 r34460  
    @@ -1,56 +1,79 @@
     
    -# The following was done:
    -#   12 Oct 2020
    +# To set up Digital NZ's Python library code for accessing their API,
    +# along with supporting files, run:
     
    -git clone https://github.com/fogonwater/pydnz.git
    +  ./01-PREP.sh
     
    -# If, in the future, working with the latest git clone leads
    -# to problems, then a fall-back position is to work with the
    -# locally provided tar-ball of the code, that was taken
    -# 12 Oct 2020
    -
    -# Alternative version:
    -#
    -  tar xvzf pydnz.tar.gz
    -
    -# Working towards script
    +# Note:
    +#   Edit the start of this file to control whether or not the script
    +#   clones the live github repository, or untars the snapshot of the
    +#   code taken, representing a 'checkpoint' version that is known
    +#   to work with the Greenstone/Atea code written that uses the
    +#   Python library
     
     
     
    -if [ ! -d import ] ; then
    -    echo "Making directory: 'import'"
    -    mkdir import
    -fi
    +# To access the Digital NZ API and download JSON records which match
    +#  'language=mi' run:
     
    +  ./02-RUN.sh
    +
    +# Notes:
    +#   This runs the Python3 version of the script.
    +#
    +#   The API returns duplicated dc_identifiers, so the number of records
    +#   ingested into Greenstone is ultimately smaller than the number of
    +#   records reported as matching in the Digital NZ API
    +#
    +#   Script utilizes the /v3/ version of the API (latest version at
    +#   the time the script was being developed)
    +
    +
    +# General Notes about the developed bespoke Python scripts that
    +# call the API
    +
    +
    +# The developed script was initially written for python2
    +#
    +# In a later period of development it was ported to python3
    +#   Main difference (other than print/print()) was how to
    +#   handle UTF-8 data input/output as UTF8
    +#
    +# The python3 script follows the 'futures' pattern, looking
    +# to be backwards compatible with python2, but this hasn't
    +# been tested.
    +#
    +# The python2 version of the script hasn't been updated to
    +# match improvements made to the python3 scripts, such as
    +# reading in a key from a file, and outputting to 'downloads'
    +# rather than the more confusingly named 'import', but
    +# it is otherwise basically sound.
    +
    +# Overall ...
    +#
     # The needs of the DNZ python scripts seem pretty light
    -#   The system python3 most recently trialed had all the
    -#   necessary packages.
    +#   The system installed python3 most recently trialed had all the
    +#   necessary packages.  Other than setting PYTHONPATH so it could
    +#   see the code in the pydnz folder, things ran smoothly
     #
     #   If need be, virtualenv setup such as
    +#
     #      virtualenv --python=python3 venv-python3-dnz
    +#
     #   Combined with (from within 'pydnz'):
    +#
     #      pip install -r requirements.txt
    +#
     #   should help
     #
    -#   Note: running 'python setup.py install' in 'pydnz' resulted
    -#         in SSL errors trying to download from the python/pip
    -#         repository that seem to be related to a change a while
    -#         back in python that broke a lot of package installers, and
    -#         proved to be excessively fussy to resolve, hence the
    -#         decision to go down the PYTHONPATH path route
    +#   Note:
    +#     Running 'python setup.py install' in 'pydnz' resulted
    +#     in SSL errors trying to download from the python/pip
    +#     repository that seem to be related to a change a while
    +#     back in python that broke a lot of package installers, and
    +#     proved to be excessively fussy to resolve (something for the
    +#     developers of pydnz to do???), hence the decision to go
    +#     down the PYTHONPATH path route for now
    +#
     
     
    -export PYTHONPATH=`pwd`/pydnz
    -
    -./dnz-search-language-mi-python3.py
    -
    -
    -# tidy up script
    -# rewrite script to key is readin from .in file
    -
    -# Note: the API returns duplicated dc_identifiers
    -
    -# Note: At time of writing (12 Oct 2020)
    -#       working with the /v3/ version of the API
    -
    -
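The README notes that the API returns duplicated dc_identifiers, so fewer records are ingested than the API reports as matching. One way such duplicates could be collapsed before ingest is sketched below. This is an illustration only, not the method used by the scripts in this changeset: the function name `dedupe_records` is hypothetical, and it assumes each record is a dict whose `"dc_identifier"` entry is a list of strings (the python2 script reads `rec["dc_identifier"]` into a variable named `dc_identifiers`, which suggests a list).

```python
def dedupe_records(records):
    """Return records with repeated dc_identifier values dropped.

    Hypothetical helper. The first record seen for each identifier
    list is kept; records with no identifier are always kept.
    """
    seen = set()
    unique = []
    for rec in records:
        # Assumption: "dc_identifier" is a list of strings, possibly
        # missing or None; a bare string is treated as a one-item list.
        ids = rec.get("dc_identifier") or []
        if isinstance(ids, str):
            ids = [ids]
        key = tuple(ids)
        if key and key in seen:
            continue  # same identifier list already kept
        if key:
            seen.add(key)
        unique.append(rec)
    return unique
```

Comparing `len(records)` with `len(dedupe_records(records))` would give a rough idea of how far the ingested count should fall below the count the API reports.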
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python2.py

    r34447 r34460  
    @@ -33,5 +33,5 @@
             landing_url = unicode(landing_url).encode('utf8')
             json_landing_filename = re.sub('[:/.]', '-', landing_url) + ".json"
    -        full_json_landing_filename = "import/" + json_landing_filename
    +        full_json_landing_filename = "downloads/" + json_landing_filename
     
             dc_identifiers = rec["dc_identifier"]
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python3.py

    r34459 r34460  
    @@ -7,9 +7,14 @@
     from builtins import str
     
    -import dnz.api
    -import pprint
    -import json
    +import os
     import sys
     import re
    +import json
    +#import pprint
    +
    +import dnz.api
    +
    +
    +output_dir = "downloads"
     
     # DNZ key
    @@ -49,5 +54,5 @@
             json_landing_filename  = re.sub('[.]', '~DOT~', json_landing_filename2) + ".json"
     
    -        full_json_landing_filename = "import/" + json_landing_filename
    +        full_json_landing_filename = os.path.join(output_dir,json_landing_filename)
             print("json_landing_filename:\t" + json_landing_filename)
     
    @@ -76,12 +81,7 @@
             print("")
     
    -#    with open(json_filename, 'w') as outfile:
    -#        json.dump(data, outfile)
    -
     #pprint.pprint(result.records)
     
     
    -#with open('data.json', 'w') as outfile:
    -#    json.dump(data, outfile)
     
     
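The python3 change replaces the hard-coded `"import/"` prefix with `os.path.join(output_dir, ...)`. A minimal sketch of that pattern follows. The function name `landing_url_to_path` is hypothetical, and the first substitution (turning `:` and `/` into `-`) is an assumption borrowed from the python2 script, because the lines that build `json_landing_filename2` are not shown in this diff. Only the `~DOT~` substitution and the `os.path.join` call are taken from the python3 hunk.

```python
import os
import re


def landing_url_to_path(landing_url, output_dir="downloads"):
    """Map a landing URL to a JSON file path inside output_dir.

    Hypothetical helper illustrating the os.path.join pattern.
    """
    # Assumption: ':' and '/' become '-', as the python2 script does.
    stem = re.sub('[:/]', '-', landing_url)
    # From the python3 hunk: dots are spelled out as '~DOT~'.
    stem = re.sub('[.]', '~DOT~', stem)
    # Create the output directory if needed (exist_ok needs Python 3),
    # so the script does not rely on 01-PREP.sh having been run.
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, stem + ".json")
```

Keeping the directory name in one variable, as the changeset does with `output_dir`, means a later rename touches a single line rather than every path that is built.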