source: main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/prepare/README.txt@ 35940

Last change on this file since 35940 was 35940, checked in by davidb, 2 years ago

Details updated

# Local Config Settings

Create _local_prepare_config.sh by copying its '.in' file:

    cp _local_prepare_config.sh.in _local_prepare_config.sh

Then open it in a text editor, for example:

    emacs _local_prepare_config.sh

and review and edit the settings as appropriate.
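
If the copy step is ever repeated, a plain 'cp' would overwrite any
edits already made to the local config. The following is a minimal
sketch of a safer copy; the function name init_local_config is made up
for illustration and is not part of the repository:

```shell
# Hypothetical helper: create the local config from its template,
# but only when no local config exists yet.
init_local_config() {
    template="_local_prepare_config.sh.in"
    config="_local_prepare_config.sh"
    if [ -e "$config" ]; then
        echo "$config already exists, leaving it untouched"
    elif [ -e "$template" ]; then
        cp "$template" "$config"
        echo "Created $config from $template"
    else
        echo "Cannot find $template (wrong directory?)" >&2
        return 1
    fi
}

# Usage, from the directory holding the '.in' file:
#   init_local_config
```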

If you plan to use the Spotify-based 'Track a' for MIR-based features,
then create Spotify API credentials through:

    https://developer.spotify.com/dashboard/applications

and then update SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET with
your specific credentials.
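
Before starting a 'Track a' run it can help to confirm that both
values are actually set. This is only a sketch: it assumes the two
variables are visible in the current shell (for instance after
sourcing the local config file, which is itself an assumption about
how that file is used), and check_spotify_credentials is a made-up
helper name:

```shell
# Hypothetical helper: report any Spotify credential variable that is
# empty or unset. Returns non-zero if at least one is missing.
check_spotify_credentials() {
    missing=0
    if [ -z "${SPOTIPY_CLIENT_ID:-}" ]; then
        echo "SPOTIPY_CLIENT_ID is not set" >&2
        missing=1
    fi
    if [ -z "${SPOTIPY_CLIENT_SECRET:-}" ]; then
        echo "SPOTIPY_CLIENT_SECRET is not set" >&2
        missing=1
    fi
    return $missing
}

# Usage (assumes the config file can be sourced):
#   . ./_local_prepare_config.sh && check_spotify_credentials
```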

Otherwise you can use 'Track b' which -- proxied through the
eurovisionworld.com fan web site -- page scrapes YouTube content of
the songs, and then uses ffmpeg and essentia to generate audio
features. No key setup is needed for Track b.
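
Since Track b relies on ffmpeg being installed, a quick check that it
is on the PATH can save a failed run later. A minimal sketch, with
require_tool being a made-up helper name:

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found on PATH: $1" >&2
        return 1
    fi
}

# Usage:
#   require_tool ffmpeg
```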


# Set Up a Virtual Python3 Environment

Next create a virtual Python environment:

    ./CREATE-VENV-PYTHON3.sh my-python3-dev

And then install the required Python packages:

  * For processing the Excel voting data spreadsheet:
      pip3 install openpyxl

  * For parsing Wikipedia pages for missing category entries:
      pip3 install wikipedia beautifulsoup4

  * For Spotify/MusicBrainz musically computed audio content (Track 'a'):
      pip3 install -r spotify-musicbrainz/requirements.txt

  * For Essentia audio features (Track 'b'):

      git clone https://github.com/davidbwaikato/eurovision-dataset essentia-audio-features

      pip3 install -r essentia-audio-features/requirements.txt
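
Once the packages are installed, one way to confirm they are usable is
to try importing them. The sketch below assumes that python3 resolves
to the interpreter of the virtual environment; check_python_modules is
a made-up helper name. Note that the beautifulsoup4 package is
imported under the module name bs4:

```shell
# Hypothetical helper: try to import each named Python module and
# report the result. Returns non-zero if any import fails.
check_python_modules() {
    status=0
    for module in "$@"; do
        if python3 -c "import $module" 2>/dev/null; then
            echo "ok:      $module"
        else
            echo "missing: $module"
            status=1
        fi
    done
    return $status
}

# Usage:
#   check_python_modules openpyxl wikipedia bs4
```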

# One-time Errata Triplestore Setup

The (personal) convention of naming scripts in ALL-CAPS signifies that
they can be run without any arguments and will do something
meaningful, with the caveat that the current working directory must be
the directory where the scripts are located.
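
To honour the working-directory caveat when calling a script from
somewhere else, the call can be wrapped in a subshell that first
changes directory. A minimal sketch; run_in_own_dir is a made-up
helper name:

```shell
# Hypothetical helper: run a script with its own directory as the
# current working directory. The subshell means the caller's working
# directory is left unchanged afterwards.
run_in_own_dir() {
    script_dir=$(dirname "$1")
    script_name=$(basename "$1")
    shift
    ( cd "$script_dir" && "./$script_name" "$@" )
}

# Usage (the path shown is a placeholder):
#   run_in_own_dir path/to/prepare/PREPARE-ALL-SMALL.sh
```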

Below you will learn about the sequence of 'prepare' scripts that
reach out to external resources -- initially Linked Data through
DBpedia, but then topped up with some page scraping -- but first a
local (Greenstone3) triplestore graph needs to be set up with some
errata values that help the SPARQL query run against DBpedia to yield
more accurate results.

Check that your Greenstone3 Triplestore server is running:

    sudo systemctl status greenstone3-triplestore

Then run:

    ./UPLOAD-TTL-EUROVISION-ERRATA-GRAPH.sh

This currently talks to the Apache Jena v1 Triplestore version that the
Greenstone3 extension operates on port 3030.
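
As a complement to the systemctl check, it is possible to test whether
anything answers HTTP requests on that port. This sketch assumes the
triplestore is reachable on localhost (only the port number comes from
the text above) and that curl is installed; triplestore_is_up is a
made-up helper name:

```shell
# Hypothetical helper: succeed if an HTTP server responds at the given
# base URL (default assumes localhost and port 3030). Any HTTP reply
# counts; only a failed or timed-out connection is treated as 'down'.
triplestore_is_up() {
    base_url=${1:-http://localhost:3030}
    curl --silent --max-time 5 --output /dev/null "$base_url/"
}

# Usage:
#   triplestore_is_up && ./UPLOAD-TTL-EUROVISION-ERRATA-GRAPH.sh
```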


# Running the 'prepare' ALL-CAPS scripts

The scripts to run are prefixed with sequential numbers: 01-...,
02-... and so on.
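
To see the numbered steps in the order they are meant to run, the
files can be listed by their prefix. This sketch assumes only that the
names start with two digits and a hyphen; list_numbered_scripts is a
made-up helper name:

```shell
# Hypothetical helper: print the files in a directory (default: the
# current one) whose names start with two digits and a hyphen. The
# shell expands the pattern in sorted order, so 01-... precedes 02-...
list_numbered_scripts() {
    dir=${1:-.}
    for f in "$dir"/[0-9][0-9]-*; do
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Usage, from the directory holding the scripts:
#   list_numbered_scripts
```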

To aid development and testing, there is a 'small' set of files that
can be prepared, based on only the Eurovision entries for 2015.

You can run all the steps to generate the small version with:

    ./PREPARE-ALL-SMALL.sh

Or else generate the data for the full collection with:

    ./PREPARE-ALL.sh

To run smaller segments of the prepare process, look inside the
scripts and copy and paste just the bits you want.

With a complete run through done, move up one directory to the
main collection directory and build the collection.