source: other-projects/is-sheet-music-encore/trunk/gen-corpus-ids/HATHI-EXTRACT-PD-NON-GOOGLE.sh@ 32963

Last change on this file since 32963 was 32963, checked in by davidb, 5 years ago

Added text and some refinement of scripts to make things easier to run

  • Property svn:executable set to *
File size: 517 bytes
#!/bin/bash

# Input and output file names; override with positional arguments 1 and 2.
input=${1:-'hathi_brief_20190301.txt'}
output=${2:-'hathi_pd_MU_Not-Google_20190301.txt'}

echo ""
echo "===="
echo " Script to filter down the extracted Music Format data entries that are"
echo " publicly available: public domain and NOT scanned by Google"
echo "===="

echo ""
echo "Reading in : $input"
echo "Writing out : $output"
echo ""

echo "Processing ..."

# Keep tab-separated rows whose field 2 is "pd", field 3 is "MU",
# and field 4 is anything other than "google".
awk -F '\t' '$2 == "pd" && $3 == "MU" && $4 != "google"' \
  < "$input" \
  > "$output"

echo "... Done"
echo ""
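
The filter condition can be tried out on a few made-up rows before running it over the full extract. The sketch below is only illustrative: the sample IDs, the `ia` and `BK` values, the column meanings (id, rights, format, digitizer) and the helper name `filter_pd_music_not_google` are assumptions for the demo, not taken from the real data file.

```shell
#!/bin/bash

# Hypothetical sample rows, one per four arguments, tab-separated.
# Column meanings are assumed from the filter, not from a documented schema.
sample=$(printf '%s\t%s\t%s\t%s\n' \
  'vol.001' 'pd' 'MU' 'ia' \
  'vol.002' 'pd' 'MU' 'google' \
  'vol.003' 'ic' 'MU' 'ia' \
  'vol.004' 'pd' 'BK' 'ia')

# Same awk condition as the script, wrapped in a (hypothetical) helper
# that reads rows on stdin and writes the matching rows to stdout.
filter_pd_music_not_google() {
  awk -F '\t' '$2 == "pd" && $3 == "MU" && $4 != "google"'
}

result=$(printf '%s\n' "$sample" | filter_pd_music_not_google)
printf '%s\n' "$result"
```

Of the four sample rows, only `vol.001` satisfies all three tests; the others are dropped for being scanned by Google, not being `pd`, or not being `MU` respectively.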