Changeset 33498


Timestamp: 2019-09-23T16:43:22+12:00
Author: ak19
Message:

Corrections to script. Modified the tests that check for file/dir existence on hdfs. Checking the return values of the hdfs commands directly worked on the command line but not in the script, so the tests now use another approach suggested on stackoverflow, which does appear to work as intended.
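The approach the message describes, branching on the exit status of `hdfs dfs -test` inside the `if` itself, can be sketched roughly as follows. This is a minimal sketch, not code from the script: the helper names `hdfs_dir_exists`, `hdfs_file_exists` and `report_dir` are hypothetical. It relies only on `hdfs dfs -test -d` and `hdfs dfs -test -f` exiting with status 0 when the path is a directory or a file respectively.

```shell
#!/bin/bash
# Hypothetical helpers (not part of get_maori_WET_records_for_crawl.sh).

# Succeeds (exit status 0) if $1 is an existing directory on HDFS.
hdfs_dir_exists() {
    hdfs dfs -test -d "$1"
}

# Succeeds (exit status 0) if $1 is an existing file on HDFS.
hdfs_file_exists() {
    hdfs dfs -test -f "$1"
}

# Prints whether the HDFS directory $1 exists, branching directly on the
# helper's exit status instead of inspecting $? afterwards.
report_dir() {
    if hdfs_dir_exists "$1"; then
        echo "Directory $1 already exists."
    else
        echo "Directory $1 does not exist yet."
    fi
}
```

Testing the command directly in the `if` avoids depending on `$?`, which only holds the status of the most recently executed command.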

File: 1 edited

  • gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh

--- gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh (r33495)
+++ gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh (r33498)
@@ -74,6 +74,7 @@
 
 # https://stackoverflow.com/questions/26513861/checking-if-directory-in-hdfs-already-exists-or-not
-hdfs dfs -test -d $OUTPUT_PARENTDIR
-if [ $? == 0 ]; then
+#hdfs dfs -test -d $OUTPUT_PARENTDIR
+#if [ $? == 0 ]; then
+if $(hdfs dfs -test -d "$OUTPUT_PARENTDIR"); then
     echo "Directory $OUTPUT_PARENTDIR already exists."
 else
@@ -131,6 +132,7 @@
 # The above generates ZIPPED part*.csv files in $OUTPUTDIR (folder cc-mri-csv).
 # First create a folder and unzip into it:
-hdfs dfs -test -d $OUTPUT_PARENTDIR/cc-mri-unzipped-csv
-if [ $? == 0 ]; then
+#hdfs dfs -test -d $OUTPUT_PARENTDIR/cc-mri-unzipped-csv
+#if [ $? == 0 ]; then
+if $(hdfs dfs -test -d "$OUTPUT_PARENTDIR/cc-mri-unzipped-csv"); then
     echo "Directory cc-mri-unzipped-csv already exists for crawl ${CRAWL_ID}."
     echo "Assuming cc-mri.csv also exists inside $OUTPUT_PARENTDIR"
@@ -172,6 +174,7 @@
 # PHASE 3: convert warc files to wet files and copy the wet files into the mounted shared area
 
-hdfs dfs -test -f $OUTPUTDIR/_SUCCESS
-if [ $? == 0 ]; then
+#hdfs dfs -test -f $OUTPUTDIR/_SUCCESS
+#if [ $? == 0 ]; then
+if $(hdfs dfs -test -f "$OUTPUTDIR/_SUCCESS"); then
     # ia-hadoop-tools converts warc files into wet (and wat) files but expects a particular folder structure
     # Create the expected folder structure: a "wet" and a "wat" folder should exist
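A note on the `if $(...)` form adopted in r33498: when a command substitution expands to nothing, the shell runs no command and the `if` sees the exit status of the substitution, so the test behaves as intended as long as the wrapped command prints nothing to standard output. Running the command directly in the `if` should give the same result without that caveat. The sketch below compares the two forms; it uses the local `test -d` as a stand-in for `hdfs dfs -test -d` (an assumption made so it runs without a Hadoop installation), and the function names are hypothetical.

```shell
#!/bin/bash
# Stand-in sketch: local `test -d` replaces `hdfs dfs -test -d` here.

# Form used in r33498: the command substitution produces no output, so the
# `if` branches on the exit status of the substituted command.
check_substitution() {
    if $(test -d "$1"); then
        echo "yes"
    else
        echo "no"
    fi
}

# Direct form: the `if` branches on the command's own exit status.
check_direct() {
    if test -d "$1"; then
        echo "yes"
    else
        echo "no"
    fi
}
```

For example, `check_substitution /` and `check_direct /` both print `yes`, while a path that does not exist prints `no` from both.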