Last change on this file since 33340 was 33138, checked in by davidb, 5 years ago
Scripts that focus on language (for non-music related work)
Property svn:executable set to *
File size: 791 bytes
#!/bin/bash

# Source the helper script; $latest_date is used below and is expected to be set there
. ./latest-dump.sh

#input=${1:-'hathi_full_20190301.txt.gz'}
#output=${2:-'hathi_brief_20190301.txt'}

# Optional arguments: $1 = gzipped input dump, $2 = output file
input=${1:-"hathi_full_$latest_date.txt.gz"}
output=${2:-"hathi_brief_lang_$latest_date.txt"}

echo ""
echo "===="
echo " Script to extract Format (and related fields, such as copyright)"
echo " from HathiTrust tab-delimited metadata dump"
echo "===="

echo ""
echo "Reading in : $input"
echo "Writing out : $output"
echo ""

echo "Processing ..."
# Decompress the dump and keep only columns 1, 3, 19 and 24, tab-separated
zcat "$input" \
| awk -F '\t' '{print $1 "\t" $3 "\t" $19 "\t" $24} ' \
> "$output"

echo "... Done"
echo ""

echo "===="
echo " Next, extract entries that are Music Format, Public Domain and"
echo " NOT scanned by Google (so-called 'open-open' files):"
echo " ./HATHI-EXTRACT-PD-NON-GOOGLE.sh"
echo "===="
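
The column selection in the pipeline above can be tried out without a real dump. The following is a minimal sketch, assuming a single made-up record whose 26 tab-separated fields are simply named f1 to f26 (placeholder values, not real HathiTrust data). The helper name extract_columns is hypothetical, and the sketch sets OFS rather than concatenating "\t" by hand, which should give the same output as the script's awk program.

```shell
#!/bin/bash
# Build one sample record with 26 tab-separated fields named f1..f26
# (hypothetical placeholder values, not real HathiTrust data).
sample_row=$(seq 1 26 | sed 's/^/f/' | paste -s -d '\t' -)

# Same column selection as the script (fields 1, 3, 19 and 24), written with
# OFS so that awk inserts the tab between the printed fields.
extract_columns() {
    awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $3, $19, $24 }'
}

brief_row=$(printf '%s\n' "$sample_row" | extract_columns)
printf '%s\n' "$brief_row"
```

Running this prints the four selected fields (f1, f3, f19, f24) separated by tabs, which makes it easy to check the column numbers against the dump's field list before processing the full file.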
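
The script's input and output names rely on the ${N:-default} parameter expansion, which falls back to a default when the positional argument is unset or empty. A small sketch of that behaviour follows. It assumes latest_date holds a date string such as 20190301 (the value from the commented-out lines), and the function name resolve_input and the file name my_dump.txt.gz are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the value that latest-dump.sh is expected to set.
latest_date="20190301"

# Resolve the input file name: use the first argument if it is given and
# non-empty, otherwise fall back to the dated default name.
resolve_input() {
    printf '%s\n' "${1:-hathi_full_$latest_date.txt.gz}"
}

default_name=$(resolve_input)                   # no argument: default is used
custom_name=$(resolve_input "my_dump.txt.gz")   # argument overrides the default
printf '%s\n%s\n' "$default_name" "$custom_name"
```

Because the default is built from $latest_date, the helper script has to be sourced before the defaults are expanded, as the script does.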