# Local Config Settings

Open the local configuration script in a text editor, for example:

    emacs _local_prepare_config.sh

Then review and edit it appropriately.
If you plan to use the Spotify-based 'Track a' for MIR-based features,
then create appropriate Spotify API credentials through:

    https://developer.spotify.com/dashboard/applications
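As a sketch of how those credentials end up being used, Spotify's Client Credentials flow sends a Basic Authorization header built from the client id and secret. The environment-variable names below are assumptions for illustration, not names the scripts necessarily use:

```python
import base64
import os

def spotify_auth_header(client_id, client_secret):
    """Build the Authorization header for Spotify's Client Credentials
    flow: base64 of "client_id:client_secret"."""
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Credentials would typically be supplied by the local config, e.g. via
# environment variables (variable names here are hypothetical):
client_id = os.environ.get("SPOTIFY_CLIENT_ID", "")
client_secret = os.environ.get("SPOTIFY_CLIENT_SECRET", "")
```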
Otherwise you can use 'Track b', which -- proxied through the
eurovisionworld.com fan web site -- page-scrapes YouTube content of
the songs, and then runs it through ffmpeg and essentia to generate
audio features. No key setup is needed for Track b.
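The ffmpeg-then-essentia step can be sketched as two commands, transcoding the downloaded audio to WAV and then extracting features to JSON. The extractor binary name below is an assumption (essentia ships a streaming music extractor under a similar name); treat the whole sketch as illustrative rather than the scripts' actual invocation:

```python
import subprocess  # used when actually running the pipeline

def build_track_b_commands(video_path, wav_path, json_path):
    """Sketch the two Track 'b' steps: transcode the scraped audio with
    ffmpeg, then run essentia's music extractor over the WAV.
    The extractor binary name is an assumption."""
    transcode = ["ffmpeg", "-y", "-i", video_path, wav_path]
    extract = ["essentia_streaming_extractor_music", wav_path, json_path]
    return [transcode, extract]

# for cmd in build_track_b_commands("song.mp4", "song.wav", "song.json"):
#     subprocess.run(cmd, check=True)
```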
# Setup a Virtual Python3 Environment

Next create a virtual Python environment:

    ./CREATE-VENV-PYTHON3.sh my-python3-dev
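If you want to confirm the virtual environment is active before installing packages, a quick check (independent of how the CREATE script set it up) is whether `sys.prefix` has been redirected away from the system installation:

```python
import sys

def in_virtualenv():
    """True when running inside a venv: sys.prefix points at the venv
    while sys.base_prefix still points at the base installation."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print(in_virtualenv())
```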
And then install the required Python packages:

* For processing the Excel voting data spreadsheet:

      pip3 install openpyxl

* For parsing Wikipedia pages for missing category entries:

      pip3 install wikipedia beautifulsoup4

* For Spotify/MusicBrainz musically computed audio content (Track 'a'):

      pip3 install -r spotify-musicbrainz/requirements.txt

* For Essentia audio features (Track 'b'):

      git clone https://github.com/davidbwaikato/eurovision-dataset essentia-audio-features
      pip3 install -r essentia-audio-features/requirements.txt
# One-time Errata Triplestore Setup

The (personal) convention of using ALL-CAPS is to signify that these
scripts can be run without any arguments and they will do something
meaningful, with the caveat that the current working directory must be
the directory where the scripts are located.
Below you will learn about the sequence of 'prepare' scripts that
reach out to external resources -- initially Linked Data through
DBpedia, but then topped up with some page scraping -- but first a
local (Greenstone3) triplestore graph needs to be set up with some
errata values that help the SPARQL queries run against DBpedia yield
more accurate results.
Check that your Greenstone3 Triplestore server is running:

    sudo systemctl status greenstone3-triplestore

Then run:

    ./UPLOAD-TTL-EUROVISION-ERRATA-GRAPH.sh

This currently talks to the v1 Apache Jena Triplestore that the
Greenstone3 extension runs on port 3030.
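For a sense of what an upload like this amounts to, a Turtle payload is typically POSTed to a Fuseki-style graph store endpoint. Only the port (3030) comes from the text above; the dataset path `/eurovision/data` is an assumption for illustration:

```python
from urllib.request import Request

def ttl_upload_request(ttl_text,
                       endpoint="http://localhost:3030/eurovision/data"):
    """Build an HTTP POST of Turtle data to a Fuseki-style graph store
    endpoint. The dataset path in the default URL is an assumption;
    only port 3030 comes from the setup above."""
    return Request(endpoint,
                   data=ttl_text.encode("utf-8"),
                   headers={"Content-Type": "text/turtle"},
                   method="POST")

# urllib.request.urlopen(ttl_upload_request(open("errata.ttl").read()))
```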
# Running the 'prepare' ALL-CAPS scripts

The scripts to run are prefixed with sequential numbers: 01-...,
02-..., and so on.
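The numeric prefixes mean the run order falls out of a plain lexicographic sort. A minimal sketch of picking out the numbered scripts and ordering them (the filtering rule here is an assumption based on the `NN-` naming):

```python
def ordered_prepare_scripts(names):
    """Return just the numbered prepare scripts (01-..., 02-..., ...)
    sorted so they run in sequence."""
    numbered = [n for n in names
                if len(n) > 2 and n[:2].isdigit() and n[2] == "-"]
    return sorted(numbered)

# ordered_prepare_scripts(["02-foo.sh", "README", "01-bar.sh"])
# → ["01-bar.sh", "02-foo.sh"]
```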
To aid development and testing, there is a 'small' set of files that
can be prepared, based on only the Eurovision entries for 2015.

You can run all the steps to generate the small version with:

    ./PREPARE-ALL-SMALL.sh
Or else generate the data for the full collection with:

    ./PREPARE-ALL.sh

Look inside the scripts, and copy and paste just the bits you want in
order to run smaller segments of the prepare process.

With a complete run-through done, move up one directory to the
main collection directory and build the collection.