Last change on this file: revision 32963, checked in by davidb, 5 years ago.
Commit message: Added text and some refinement of scripts to make things easier to run
Identifying HathiTrust volume IDs suitable for inclusion in a corpus
of images for identifying sheet music.

Run the scripts in the following order:

1. To get a fresh HathiTrust tab-delimited metadata dump
   (at the time of writing: March 2019):

     ./HATHI-GET-TAB-DELIM-DUMP.sh

2. To winnow the file down to a more manageable size
   (just the columns we're interested in):

     ./HATHI-EXTRACT-FORMAT.sh

3. To generate a list of Music Notation entries that are in the public
   domain and not scanned by Google (so-called 'open-open'):

     ./HATHI-EXTRACT-PD-NON-GOOGLE.sh
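As a rough illustration of what steps 2 and 3 amount to, the Python sketch below keeps a handful of columns from a tab-delimited dump and then filters the rows. It is not the code in the scripts above. The column positions in COLUMNS and the codes "MN", "pd" and "google" are assumptions made for illustration only, so check them against the real dump before relying on them.

```python
# Hypothetical 0-based column positions in the tab-delimited dump.
# These are assumptions for illustration, not the confirmed file layout.
COLUMNS = {"htid": 0, "rights": 2, "bib_fmt": 4, "digitization_agent": 5}


def extract_columns(lines, columns):
    """Step 2 (sketch): keep only the named columns of each row."""
    needed = max(columns.values())
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short or malformed rows instead of failing mid-file.
        if len(fields) > needed:
            yield {name: fields[i] for name, i in columns.items()}


def is_open_open(row, fmt="MN", rights="pd", excluded_agent="google"):
    """Step 3 (sketch): music notation, public domain, not scanned by Google.

    The default codes are assumed values, not confirmed ones.
    """
    return (
        row["bib_fmt"] == fmt
        and row["rights"] == rights
        and row["digitization_agent"] != excluded_agent
    )


def open_open_ids(lines, columns=COLUMNS):
    """Return the volume IDs of the rows that pass the step 3 filter."""
    return [row["htid"] for row in extract_columns(lines, columns)
            if is_open_open(row)]
```

Because the functions accept any iterable of lines, an open file handle can be passed straight to open_open_ids, so the dump is read one line at a time and never held in memory as a whole.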