# Local Config Settings

Open the local config file in a text editor, for example:

  emacs _local_prepare_config.sh

then review and edit it appropriately.

If you plan to use the Spotify-based 'Track a' for MIR-based features,
then create appropriate Spotify API credentials through:

  https://developer.spotify.com/dashboard/applications

Otherwise you can use 'Track b', which -- proxied through the
eurovisionworld.com fan web site -- page scrapes the YouTube content
of the songs, and then goes through a process using ffmpeg and
essentia to generate audio features. No key setup is needed for
Track b.
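
As a rough illustration only -- the actual variable names are whatever
_local_prepare_config.sh already defines, so treat these as
hypothetical -- the Track 'a' route needs the Spotify credentials from
the dashboard above made available to the prepare scripts, along these
lines:

  # Hypothetical example -- check _local_prepare_config.sh for the
  # variable names it actually expects
  export SPOTIFY_CLIENT_ID="your-client-id"
  export SPOTIFY_CLIENT_SECRET="your-client-secret"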


# Set up a Virtual Python3 Environment

Next, create a virtual Python environment:

  ./CREATE-VENV-PYTHON3.sh my-python3-dev
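
If you want to see what this amounts to, or need to reactivate the
environment in a later shell session, it is essentially the standard
venv workflow (assuming the script is a thin wrapper around Python's
built-in venv module):

  # Roughly equivalent manual steps -- an assumption, not the
  # script's guaranteed contents
  python3 -m venv my-python3-dev
  source my-python3-dev/bin/activate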

Then install the required Python packages:

 * For processing the Excel Voting data spreadsheet
     pip3 install openpyxl

 * For parsing Wikipedia pages for missing category entries
     pip3 install wikipedia beautifulsoup4

 * For Spotify/MusicBrainz musically-computed audio content (Track 'a')
     pip3 install -r spotify-musicbrainz/requirements.txt

 * For Essentia Audio Features (Track 'b')
     git clone https://github.com/davidbwaikato/eurovision-dataset essentia-audio-features
     pip3 install -r essentia-audio-features/requirements.txt
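
For orientation, the Track 'b' route described earlier conceptually
boils down to pulling the audio out of a downloaded video with ffmpeg
and then running Essentia's music extractor over it. A minimal sketch
with made-up filenames (the real pipeline is orchestrated by the
essentia-audio-features scripts, and this assumes Essentia's extractor
is on your PATH):

  # Illustrative only -- filenames are hypothetical
  ffmpeg -i song-video.mp4 -vn -ac 2 -ar 44100 song-audio.wav
  essentia_streaming_extractor_music song-audio.wav song-features.json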

# One-time Errata Triplestore Setup

The (personal) convention of using ALL-CAPS is to signify that these
scripts can be run without any arguments and will do something
meaningful, with the caveat that the current working directory must be
the directory where the scripts are located.

Below you will learn about the sequence of 'prepare' scripts that
reach out to external resources -- initially Linked Data through
DBpedia, but then topped up with some page scraping -- but first a
local (Greenstone3) triplestore graph needs to be set up with some
errata values that help the SPARQL queries run against DBpedia yield
more accurate results.

Check that your Greenstone3 Triplestore server is running:

  sudo systemctl status greenstone3-triplestore

Then run:

  ./UPLOAD-TTL-EUROVISION-ERRATA-GRAPH.sh

This currently talks to the Apache Jena v1 Triplestore that the
Greenstone3 extension operates on port 3030.
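
If you need to debug that step, the upload can also be approximated by
hand through Fuseki's graph store protocol. The dataset name, graph
URI, and TTL filename below are placeholders, so check the UPLOAD
script for the values it actually uses:

  # Hand-rolled sketch -- '/eurovision' and the graph URI are assumptions
  curl -X POST --data-binary @eurovision-errata.ttl \
       -H "Content-Type: text/turtle" \
       "http://localhost:3030/eurovision/data?graph=http://example.org/eurovision-errata"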


# Running the 'prepare' ALL-CAPS scripts

The scripts to run are prefixed with sequential numbers: 01-...,
02-... and so on.

To aid development and testing, there is a 'small' set of files that
can be prepared, based on only the Eurovision entries for 2015.

You can run all the steps to generate the small version with:

  ./PREPARE-ALL-SMALL.sh

Or else generate the data for the full collection with:

  ./PREPARE-ALL.sh

Look inside the scripts, and copy and paste just the bits you want, to
run smaller segments of the prepare process.
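
Equivalently, if you prefer to drive the numbered steps yourself,
something along these lines works -- a sketch that assumes each
numbered script ends in .sh and runs cleanly with no arguments:

  # Run each numbered prepare step in order; stop at the first failure
  for script in [0-9][0-9]-*.sh ; do
      ./"$script" || break
  done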

With a complete run-through done, move up one directory to the main
collection directory and build the collection.