Changeset 34460 for main


Timestamp:
2020-10-17T17:17:49+13:00
Author:
davidb
Message:

Tidy up of code and notes

Location:
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare
Files:
5 edited

  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/01-PREP.sh

    r34459 r34460  
    3030
    3131
    32 if [ ! -d import ] ; then
    33     echo "Making directory: 'import'"
    34     mkdir import
     32if [ ! -d downloads ] ; then
     33    echo "Making directory: 'downloads'"
     34    mkdir downloads
    3535else
    36     echo "Already created directory: 'import'"
     36    echo "Already created directory: 'downloads'"
    3737fi
    3838
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/02-RUN.sh

    r34459 r34460  
    1717echo "****"
    1818echo "* Downloaded JSON records saved to: "
    19 echo "*    import"
     19echo "*    downloads"
    2020echo "*"
    2121echo "* To remove a previous download set from collection's import folder:"
     
    2323echo "*"
    2424echo "* To move in a freshly downloaded set into the collection's import folder"
    25 echo "*   echo import/http* | xargs mv -t ../import/"
     25echo "*   echo downloads/http* | xargs mv -t ../import/"
    2626echo "****"
    2727echo
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/README.txt

    r34447 r34460  
    11
    2 # The following was done:
    3 #   12 Oct 2020
     2# To setup Digital NZ's Python library code for accessing their API,
     3# along with supporting files, run:
    44
    5 git clone https://github.com/fogonwater/pydnz.git
     5  ./01-PREP.sh
    66
    7 # If, in the future, working with the latest git clone leads
    8 # to problems, then a fall-back position is to work with the
    9 # locally provided tar-ball of the code, that was taken
    10 # 12 Oct 2020
    11 
    12 # Alternative version:
    13 #
    14   tar xvzf pydnz.tar.gz
    15 
    16 # Working towards script
     7# Note:
     8#   Edit the start of this file to control whether or not the script
     9#   clones the live github repository, or untars the snapshot of the
     10#   code taken, representing a 'checkpoint' version that is known
     11#   to work with the Greenstone/Atea code written that uses the
     12#   Python library
    1713
    1814
    1915
    20 if [ ! -d import ] ; then
    21     echo "Making directory: 'import'"
    22     mkdir import
    23 fi
     16# To access the Digital NZ API and download JSON records which match
     17#  'language=mi' run:
    2418
     19  ./02-RUN.sh
     20
     21# Notes:
     22#   This runs the Python3 version of the script.
     23#
     24#   The API returns duplicated dc_identifiers, so the number of records
     25#   ingested into Greenstone is ultimately smaller than the number of
     26#   records reported as matching in the Digital NZ API
     27#
     28#   Script utilizes the /v3/ version of the API (latest version at
     29#   the time the script was being developed)
     30
     31
     32# General Notes about the developed bespoke Python scripts that
     33# call the API
     34
     35
     36# The developed script was initially written for python2
     37#
     38# In a later period of development it was ported to python3
     39#   Main difference (other than print/print()) was how to
     40#   handle UTF-8 data input/output as UTF8
     41#
     42# The python3 script follows the 'futures' pattern, looking
     43# to be backwards compatible with python2, but this hasn't
     44# been tested.
     45#
     46# The python2 version of the script hasn't been updated to
     47# match improvements made to the python3 scripts, such as
     48# reading in a key from a file, and outputting to 'downloads'
     49# rather than the more confusingly named 'import', but
     50# it is otherwise basically sound.
     51
     52# Overall ...
     53#
    2554# The needs of the DNZ python scripts seems pretty light
    26 #   The system python3 most recently trialed had all the
    27 #   necessary packages.
     55#   The system installed python3 most recently trialed had all the
     56#   necessary packages.  Other than setting PYTHONPATH so it could
     57#   see the code in the pydnz folder, things ran smoothly
    2858#
    2959#   If need be, virtualenv setup such as
     60#
    3061#      virtualenv --python=python3 venv-python3-dnz
     62#
    3163#   Combined with (from within 'pydnz'):
     64#
    3265#      pip install -r requirements.txt
     66#
    3367#   should help
    3468#
    35 #   Note: running 'python setup.py install' in 'pydnz' resulted
    36 #         in SSL errors trying to download from the python/pip
    37 #         repository that seem to be related to a change a while
    38 #         back in python that broke a lot of package installers, and
    39 #         proved to be excessively fussy to resolve, hence the
    40 #         decision to go down the PYTHONPATH path route
     69#   Note:
     70#     Running 'python setup.py install' in 'pydnz' resulted
     71#     in SSL errors trying to download from the python/pip
     72#     repository that seem to be related to a change a while
     73#     back in python that broke a lot of package installers, and
     74#     proved to be excessively fussy to resolve (something the
     75#     developers of pydnz to do???), hence the decision to go
     76#     down the PYTHONPATH path route for now
     77#
    4178
    4279
    43 export PYTHONPATH=`pwd`/pydnz
    44 
    45 ./dnz-search-language-mi-python3.py
    46 
    47 
    48 # tidy up script
    49 # rewrite script to key is readin from .in file
    50 
    51 # Note: the API returns duplicated dc_identifiers
    52 
    53 # Note: At time of writing (12 Oct 2020)
    54 #       working with the /v3/ version of the API
    55 
    56 
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python2.py

    r34447 r34460  
    3333        landing_url = unicode(landing_url).encode('utf8')
    3434        json_landing_filename = re.sub('[:/.]', '-', landing_url) + ".json"
    35         full_json_landing_filename = "import/" + json_landing_filename
     35        full_json_landing_filename = "downloads/" + json_landing_filename
    3636
    3737        dc_identifiers = rec["dc_identifier"]
  • main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python3.py

    r34459 r34460  
    77from builtins import str
    88
    9 import dnz.api
    10 import pprint
    11 import json
     9import os
    1210import sys
    1311import re
     12import json
     13#import pprint
     14
     15import dnz.api
     16
     17
     18output_dir = "downloads"
    1419
    1520# DNZ key
     
    4954        json_landing_filename  = re.sub('[.]', '~DOT~', json_landing_filename2) + ".json"
    5055               
    51         full_json_landing_filename = "import/" + json_landing_filename
     56        full_json_landing_filename = os.path.join(output_dir,json_landing_filename)
    5257        print("json_landing_filename:\t" + json_landing_filename)
    5358       
     
    7681        print("")
    7782
    78 #    with open(json_filename, 'w') as outfile:
    79 #        json.dump(data, outfile)
    80 
    8183#pprint.pprint(result.records)
    8284
    8385
    84 #with open('data.json', 'w') as outfile:
    85 #    json.dump(data, outfile)
    8686
    8787
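
The core of this changeset is that each downloaded record is now written under 'downloads' rather than 'import', with the python3 script building the path via os.path.join. A minimal sketch of that idea follows. It is not the script itself: the helper names landing_url_to_json_path and ensure_output_dir are hypothetical, and the filename substitution shown is the one visible in the python2 script ('[:/.]' replaced by '-'), since the python3 script's full substitution is not shown in this diff.

```python
import os
import re

# Same directory name the changeset introduces in the python3 script.
output_dir = "downloads"


def landing_url_to_json_path(landing_url, output_dir=output_dir):
    """Hypothetical helper: map a record's landing URL to a JSON file path.

    Uses the python2 script's substitution (':', '/' and '.' become '-');
    the python3 script applies a different, only partly shown, substitution.
    """
    json_landing_filename = re.sub('[:/.]', '-', landing_url) + ".json"
    return os.path.join(output_dir, json_landing_filename)


def ensure_output_dir(path=output_dir):
    """Hypothetical helper: create the output directory if it is missing,
    mirroring what 01-PREP.sh does with 'mkdir downloads'."""
    os.makedirs(path, exist_ok=True)
```

Because the resulting filenames begin with 'http', they match the 'downloads/http*' pattern that 02-RUN.sh suggests for moving a fresh download set into the collection's import folder.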