Changeset 33498

Timestamp:
23.09.2019 16:43:22
Author:
ak19
Message:

Corrections to the script. Modified the tests that check for file/dir existence on HDFS. Checking the return value of the hdfs commands directly worked on the command line but not in the script, so the tests now use another approach suggested on Stack Overflow, which does appear to work as intended.

Files:
1 modified

  • gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh

    --- gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh (r33495)
    +++ gs3-extensions/maori-lang-detection/bin/script/get_maori_WET_records_for_crawl.sh (r33498)
    @@ -74,6 +74,7 @@
     
     # https://stackoverflow.com/questions/26513861/checking-if-directory-in-hdfs-already-exists-or-not
    -hdfs dfs -test -d $OUTPUT_PARENTDIR
    -if [ $? == 0 ]; then
    +#hdfs dfs -test -d $OUTPUT_PARENTDIR
    +#if [ $? == 0 ]; then
    +if $(hdfs dfs -test -d "$OUTPUT_PARENTDIR"); then
         echo "Directory $OUTPUT_PARENTDIR already exists."
     else
    @@ -131,6 +132,7 @@
     # The above generates ZIPPED part*.csv files in $OUTPUTDIR (folder cc-mri-csv).
     # First create a folder and unzip into it:
    -hdfs dfs -test -d $OUTPUT_PARENTDIR/cc-mri-unzipped-csv
    -if [ $? == 0 ]; then
    +#hdfs dfs -test -d $OUTPUT_PARENTDIR/cc-mri-unzipped-csv
    +#if [ $? == 0 ]; then
    +if $(hdfs dfs -test -d "$OUTPUT_PARENTDIR/cc-mri-unzipped-csv"); then
         echo "Directory cc-mri-unzipped-csv already exists for crawl ${CRAWL_ID}."
         echo "Assuming cc-mri.csv also exists inside $OUTPUT_PARENTDIR"
    @@ -172,6 +174,7 @@
     # PHASE 3: convert warc files to wet files and copy the wet files into the mounted shared area
     
    -hdfs dfs -test -f $OUTPUTDIR/_SUCCESS
    -if [ $? == 0 ]; then
    +#hdfs dfs -test -f $OUTPUTDIR/_SUCCESS
    +#if [ $? == 0 ]; then
    +if $(hdfs dfs -test -f "$OUTPUTDIR/_SUCCESS"); then
         # ia-hadoop-tools converts warc files into wet (and wat) files but expects a particular folder structure
         # Create the expected folder structure: a "wet" and a "wat" folder should exist
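
Reviewer note on the existence tests changed in this changeset. The form `if $(hdfs dfs -test ...); then` works because the substitution expands to nothing, so `if` sees the exit status of the substituted command. It relies on the command printing nothing to stdout, since any output would be run as a command. The changeset does not say why the `$?` version failed in the script. One possibility (an assumption, not confirmed by the source) is that `==` inside `[ ]` is a bash extension that some `sh` implementations reject. Below is a minimal sketch of a plainer equivalent, assuming bash. The `hdfs` function is a hypothetical local stand-in so the example runs without Hadoop, and `hdfs_dir_exists` / `hdfs_file_exists` are illustrative helper names that are not part of the script. The directory and file names are example values only.

```shell
#!/bin/bash
# Hypothetical stand-in for the real `hdfs` CLI so this sketch runs without Hadoop.
# It maps `hdfs dfs -test <flag> <path>` onto the local `test` builtin.
hdfs() {
    if [ "$1" = "dfs" ] && [ "$2" = "-test" ]; then
        test "$3" "$4"
    else
        echo "stub hdfs: unsupported arguments: $*" >&2
        return 2
    fi
}

# Illustrative helpers (not in the original script): the exit status of the
# last command in a function becomes the function's exit status.
hdfs_dir_exists()  { hdfs dfs -test -d "$1"; }
hdfs_file_exists() { hdfs dfs -test -f "$1"; }

# Example values only; the real script sets these variables elsewhere.
OUTPUT_PARENTDIR=$(mktemp -d)
OUTPUTDIR="$OUTPUT_PARENTDIR/cc-mri-csv"
mkdir -p "$OUTPUTDIR"
touch "$OUTPUTDIR/_SUCCESS"

# `if` tests the command's exit status directly: no $? and no $( ) needed.
if hdfs_dir_exists "$OUTPUT_PARENTDIR"; then
    echo "Directory already exists."
else
    echo "Directory does not exist."
fi

if hdfs_file_exists "$OUTPUTDIR/_SUCCESS"; then
    echo "_SUCCESS marker found."
fi

# If $? is preferred, capture it immediately and compare with the POSIX -eq operator.
hdfs dfs -test -d "$OUTPUT_PARENTDIR/cc-mri-unzipped-csv"
status=$?
if [ "$status" -eq 0 ]; then
    echo "cc-mri-unzipped-csv already exists."
else
    echo "cc-mri-unzipped-csv not found."
fi
```

Quoting the path arguments, as r33498 already does, keeps the tests correct when a directory name contains spaces.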