Timestamp: 2020-10-17T17:17:49+13:00
Location: main/trunk/model-sites-dev/atea/collect/digital-nz/prepare
Files: 5 edited
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/01-PREP.sh
--- r34459
+++ r34460
@@ -30,9 +30,9 @@
 
 
-if [ ! -d import ] ; then
-    echo "Making directory: 'import'"
-    mkdir import
+if [ ! -d downloads ] ; then
+    echo "Making directory: 'downloads'"
+    mkdir downloads
 else
-    echo "Already created directory: 'import'"
+    echo "Already created directory: 'downloads'"
 fi
 
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/02-RUN.sh
--- r34459
+++ r34460
@@ -17,5 +17,5 @@
 echo "****"
 echo "* Downloaded JSON records saved to: "
-echo "* import"
+echo "* downloads"
 echo "*"
 echo "* To remove a previous download set from collection's import folder:"
@@ -23,5 +23,5 @@
 echo "*"
 echo "* To move in a freshly downloaded set into the collection's import folder"
-echo "* echo import/http* | xargs mv -t ../import/"
+echo "* echo downloads/http* | xargs mv -t ../import/"
 echo "****"
 echo
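The move step that 02-RUN.sh prints (`echo downloads/http* | xargs mv -t ../import/`) could also be done from Python. The following is only a sketch of that idea: the function name `move_downloads` and its default arguments are hypothetical and are not part of this changeset.

```python
import glob
import os
import shutil


def move_downloads(src_dir="downloads", dest_dir=os.path.join("..", "import")):
    """Move every entry matching src_dir/http* into dest_dir.

    Hypothetical helper: mirrors the shell one-liner printed by 02-RUN.sh.
    """
    # Refuse to run if the destination is missing; otherwise shutil.move
    # would rename the first file to dest_dir instead of moving it inside.
    if not os.path.isdir(dest_dir):
        raise FileNotFoundError("destination folder not found: " + dest_dir)

    moved = []
    for path in sorted(glob.glob(os.path.join(src_dir, "http*"))):
        moved.append(shutil.move(path, dest_dir))
    return moved
```

Files in `downloads` whose names do not start with `http` are left where they are, as with the shell version.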
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/README.txt
--- r34447
+++ r34460
@@ -1,56 +1,79 @@
 
-# The following was done:
-# 12 Oct 2020
+# To setup Digital NZ's Python library code for accessing their API,
+# along with supporting files, run:
 
-git clone https://github.com/fogonwater/pydnz.git
+./01-PREP.sh
 
-# If, in the future, working with the latest git clone leads
-# to problems, then a fall-back position is to work with the
-# locally provided tar-ball of the code, that was taken
-# 12 Oct 2020
-
-# Alternative version:
-#
-tar xvzf pydnz.tar.gz
-
-# Working towards script
+# Note:
+# Edit the start of this file to control whether or not the script
+# clones the live github repository, or untars the snapshot of the
+# code taken, representing a 'checkpoint' version that is known
+# to work with the Greenstone/Atea code written that uses the
+# Python library
 
 
 
-if [ ! -d import ] ; then
-    echo "Making directory: 'import'"
-    mkdir import
-fi
+# To access the Digtial NZ API and download JSON records which match
+# 'language=mi' run:
 
+./02-RUN.sh
+
+# Notes:
+# This runs the Python3 version of the script.
+#
+# The API returns duplicated dc_identifiers, so the number of records
+# ingested into Greenstone is ultimately smaller than the number of
+# records reported as matching in the Digital NZ API
+#
+# Script utilizes the /v3/ version of the API (latest version at
+# the time the script was being developed)
+
+
+# General Notes about the developed bespoke Python scripts that
+# call the API
+
+
+# The developed script was initially written for python2
+#
+# In a later period of development it was ported to python3
+# Main different (other than print/print()) was how to
+# handle UTF-8 data input/output as UTF8
+#
+# The python3 script follows the 'futures' pattern, looking
+# to be backwards compatible with python2, but this hasn't
+# been tested.
+#
+# The python2 version of the script hasn't been updated to
+# match improvements made to the python3 scripts, such as
+# reading in a key from a file, and outputing to 'downloads'
+# rather than the more confusingly named 'import', but
+# it is otherwise basically sound.
+
+# Overall ...
+#
 # The needs of the DNZ python scripts seems pretty light
-# The system python3 most recently trialed had all the
-# necessary packages.
+# The system installed python3 most recently trialed had all the
+# necessary packages. Other than setting PYTHONPATH so it could
+# see the code in the pydnz folder, things ran smoothly
 #
 # If need be, virtualenv setup such as
+#
 # virtualenv --python=python3 venv-python3-dnz
+#
 # Combined with (from within 'pydnz):
+#
 # pip install -r requirements.txt
+#
 # should help
 #
-# Note: running 'python setup.py install' in 'pydnz' resulted
-# in SSL errors trying to download from the python/pip
-# repository that seem to be related to a change a while
-# back in python that broke a lot of package installers, and
-# proved to be excessively fussy to resolve, hence the
-# decision to go down the PYTHONPATH path route
+# Note:
+# Running 'python setup.py install' in 'pydnz' resulted
+# in SSL errors trying to download from the python/pip
+# repository that seem to be related to a change a while
+# back in python that broke a lot of package installers, and
+# proved to be excessively fussy to resolve (something the
+# developers of pydnz to do???), hence the decision to go
+# down the PYTHONPATH path route for now
+#
 
 
-export PYTHONPATH=`pwd`/pydnz
-
-./dnz-search-language-mi-python3.py
-
-
-# tidy up script
-# rewrite script to key is readin from .in file
-
-# Note: the API returns duplicated dc_identifiers
-
-# Note: At time of writing (12 Oct 2020)
-# working with the /v3/ version of the API
-
-
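The README note about duplicated dc_identifiers explains why fewer records reach Greenstone than the API reports. One way to see the effect before ingesting is to collapse records that share an identifier. This is a sketch only: `dedupe_records` is a hypothetical helper, and it assumes each record is a dict whose "dc_identifier" entry is a list of strings, which the changeset does not confirm.

```python
def dedupe_records(records):
    """Keep the first record seen for each distinct dc_identifier list.

    Assumption: each record is a dict and rec["dc_identifier"] is a list
    of strings (possibly missing or empty).
    """
    seen = set()
    unique = []
    for rec in records:
        identifiers = rec.get("dc_identifier") or []
        if not identifiers:
            # Nothing to compare on, so keep the record as-is.
            unique.append(rec)
            continue
        key = tuple(identifiers)
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique


if __name__ == "__main__":
    sample = [
        {"id": 1, "dc_identifier": ["a"]},
        {"id": 2, "dc_identifier": ["a"]},
        {"id": 3, "dc_identifier": ["b"]},
    ]
    print(len(sample), "records,", len(dedupe_records(sample)), "unique")
```

Comparing the two counts gives a rough idea of how many matching records are duplicates under this assumption.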
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python2.py
--- r34447
+++ r34460
@@ -33,5 +33,5 @@
 landing_url = unicode(landing_url).encode('utf8')
 json_landing_filename = re.sub('[:/.]', '-', landing_url) + ".json"
-full_json_landing_filename = "import/" + json_landing_filename
+full_json_landing_filename = "downloads/" + json_landing_filename
 
 dc_identifiers = rec["dc_identifier"]
main/trunk/model-sites-dev/atea/collect/digital-nz/prepare/dnz-search-language-mi--python3.py
--- r34459
+++ r34460
@@ -7,9 +7,14 @@
 from builtins import str
 
-import dnz.api
-import pprint
-import json
+import os
 import sys
 import re
+import json
+#import pprint
+
+import dnz.api
+
+
+output_dir = "downloads"
 
 # DNZ key
@@ -49,5 +54,5 @@
 json_landing_filename = re.sub('[.]', '~DOT~', json_landing_filename2) + ".json"
 
-full_json_landing_filename = "import/" + json_landing_filename
+full_json_landing_filename = os.path.join(output_dir,json_landing_filename)
 print("json_landing_filename:\t" + json_landing_filename)
 
@@ -76,12 +81,7 @@
 print("")
 
-# with open(json_filename, 'w') as outfile:
-#     json.dump(data, outfile)
-
 #pprint.pprint(result.records)
 
 
-#with open('data.json', 'w') as outfile:
-#     json.dump(data, outfile)
 
 
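The change above replaces string concatenation with `os.path.join(output_dir, ...)`. A small sketch of the same pattern follows, with the directory created on demand instead of relying on 01-PREP.sh having run first. The helper name `output_path_for` is hypothetical, and the filename rule borrows the simpler python2-style substitution (`[:/.]` replaced by `-`), not the python3 script's `~DOT~` scheme. It needs Python 3 for `exist_ok`.

```python
import os
import re


def output_path_for(landing_url, output_dir="downloads"):
    """Return the JSON file path for a landing URL inside output_dir.

    Hypothetical helper, not part of the changeset.
    """
    # Create the output folder if it is not there yet (Python 3 only).
    os.makedirs(output_dir, exist_ok=True)
    # Replace characters that are awkward in filenames, as the python2
    # script does, then add the .json extension.
    filename = re.sub(r"[:/.]", "-", landing_url) + ".json"
    return os.path.join(output_dir, filename)
```

Keeping the folder name in a single `output_dir` argument means a later rename touches one place only, which is the same motivation as the `output_dir = "downloads"` variable introduced in r34460.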