source: main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl@ 35093

Last change on this file since 35093 was 35093, checked in by davidb, 3 years ago

General text update; auto-focus param added into ssv_execute()

  • Property svn:executable set to *
File size: 32.6 KB
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE videocollection [
3 <!ENTITY ndash "&#8211;">
4 <!ENTITY mdash "&#8212;">
5]>
6<xsl:stylesheet version="1.0"
7 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
8 xmlns:java="http://xml.apache.org/xslt/java"
9 xmlns:util="xalan://org.greenstone.gsdl3.util.XSLTUtil"
10 xmlns:gslib="http://www.greenstone.org/skinning"
11 xmlns:gsf="http://www.greenstone.org/greenstone3/schema/ConfigFormat"
12 extension-element-prefixes="java util"
13 exclude-result-prefixes="java util">
14
15
16 <xsl:template name="coll-description">
17 <gsf:style src="sites/{$site_name}/collect/{$collName}/css/eurovision.css"/>
18 <gsf:script src="sites/{$site_name}/collect/{$collName}/js/jquery.show-more.js"/>
19
20
21 <div id="about-desc">
22 <h2>Introduction</h2>
23 <!--
24 <p style="padding-bottom: 10px;">
25 The <a href="https://eurovision.tv">Eurovision Song
26 Contest</a> is a live-broadcast televised event that
27 was first held in 1956 featuring artists singing original songs from
28 7 countries. Since then it has grown into an event involving
29 over 40 countries, and streamed all around the world. ...
30
31 </p>
32 -->
33
34 <p style="padding-bottom: 10px;">
35 <i style="padding-right:6px;">A help to shore up a post war Europe in 1956 it
36 all began, where there were only seven countries and one
37 camera man!</i>
38 </p>
39 <p>
40 The <a href="https://eurovision.tv">Eurovision Song
41 Contest</a> is a long-running, live-broadcast televised multi-national
42 competition with a collaborative mission, not dissimilar
43 in spirit to the Olympics.
44 The contest has grown significantly from
45 that modest start with 7 countries (and one cameraman),
46 with over 40 countries competing these days—Australia
47 even takes part now, through a specially
48 arranged invitation. It's an annual celebration of
49 European culture and the highlight of many people's
50 year.
51 </p>
52
53 <div id="about-show-more">
54 <p>
55 At Eurovision there is no division because, wherever
56 you come from, Eurovision is home. The Eurovision Song
57 Contest is widely known as a safe space for LGBTQIA+
58 people and a platform for free expression. For example,
59 trans woman
60 <a href="https://en.wikipedia.org/wiki/Dana_International">Dana International</a>
61 won as far back as 1998.
62 There have been songs in many different languages over the
63 years, although most are in English these days. This
64 doesn't matter, however, because music is a language we all
65 know how to speak.
66 </p>
67 <p>
68 In its latest incarnation, after
69 all the performances are over, artists wait
70 nervously as, via live television link-ups, the show's hosts visit each
71 of the 40+ countries in turn, collecting the points cast
72 by the country-appointed juries. This includes
73 the all-important top score that can be cast, 12 points
74 (douze points!), a double increment up from the
75 10 points awarded to the song a country ranks second,
76 followed by 8, 7, 6 and so on down to 1 point.
77 With over 20 countries competing in a final, this means
78 that not every performer gets points from a given country.
79 Next comes &quot;the popular vote&quot;,
80 where fans, still grouped by country, have
81 the votes they cast by phone, SMS or the Eurovision app
82 tallied and mapped into the same format of 12 points for 1st
83 place, and so on.
84 This all culminates in a new winner being crowned, with
85 the competition typically being hosted the following year
86 in that country.
87 </p>
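<!--
Editorial sketch (not rendered on the page): the points scheme described
above can be illustrated in a few lines of Python. This is a minimal sketch
under stated assumptions, not code used by the site; the function name and
the placeholder song identifiers are hypothetical.

```python
# Points a country hands out, from its 1st-ranked song down to its 10th.
POINTS_BY_RANK = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def award_points(ranked_songs):
    """Map one country's ranking (best first) to the points each song receives.

    Songs ranked below 10th place receive nothing: zip() stops once the
    ten point values have been used up.
    """
    return dict(zip(ranked_songs, POINTS_BY_RANK))

# Hypothetical ranking of 12 songs by one jury (identifiers are placeholders).
ranking = ["song%02d" % i for i in range(1, 13)]
points = award_points(ranking)
```

With more than ten songs in the ranking, the songs in 11th place and below
are absent from the result, which matches the observation that not every
performer gets points from a given country.
-->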
88 </div>
89 <gsf:script>
90 $('#about-show-more').showMore({
91 minheight: 0,
92 buttontxtmore:"show more ...",
93 buttontxtless:"... show less"
94 });
95 </gsf:script>
96
97
98 <h2>Features of this Website</h2>
99
100 <p>
101 This (unofficial) website has been developed by a small
102 team of dedicated Digital Library researchers who also
103 happen to be <i>huge</i> fans of Eurovision. We wish to
104 share our love for the competition, and at the same time
105 demonstrate what is possible when—harnessing some of that
106 passion!—the techniques of
107 <a href="https://en.wikipedia.org/wiki/Linked_data">Linked
108 Open Data</a> are applied
109 to the Open Source
110 <a href="https://www.greenstone.org">Greenstone3</a>
111 Digital Library platform. For the technically interested,
112 see the section
113 <a href="{$library_name}/collection/{$collName}/page/about#it-all-started-with">
114 <i style="padding-right: 6px;">It All Started with a Little <strike>Sparkle</strike>SPARQL</i></a>
115 below for details about how the digital library was formed.
116 </p>
117
118 <!--
119 <p>
120 For those who want to jump right in and access information about, as well as see and hear some of the past performances,
121 we suggest you
122 start by exploring the assembled information through
123 the browsing tabs, such as
124 <a href="{$library_name}/collection/{$collName}/browse/CL3">browse by countries</a>
125 if you want (for instance) to reminisce about songs your country have entered in the past, or
126 <a href="{$library_name}/collection/{$collName}/browse/CL4">browse by years</a> if
127 you are curious about who were the countries competing in that inaugural year of 1956.
128 Alternatively, use the quick-search box to query the DL collection for a term that you sparks
129 interest, such as
130 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=love&amp;s1.index=ZZ">love</a>
131 and
132 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=amore&amp;s1.index=ZZ">amore</a>,
133 or maybe something more frivolous such as
134 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=la&amp;s1.index=ZZ">la</a>.
135
136 </p>
137-->
138
139 <p>
140 For those who want to jump right in and access information about, as well as see and hear some of the past performances,
141 we suggest you
142 start by exploring the assembled information through
143 the browsing tabs. For example:
144 <ul>
145 <li><a href="{$library_name}/collection/{$collName}/browse/CL3">Browse by countries</a>
146 if you want (for instance) to reminisce about songs your country has entered in the past; or</li>
147 <li><a href="{$library_name}/collection/{$collName}/browse/CL4">Browse by years</a> if
148 you are curious about which countries competed in that inaugural year of 1956.</li>
149 </ul>
150 </p>
151 <p>
152 Alternatively, use the quick-search box to query the DL collection for a term that sparks
153 your interest. For example:
154 <ul>
155 <li>
156 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=love&amp;s1.index=ZZ">love</a>
157 and
158 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=amore&amp;s1.index=ZZ">amore</a>,
159 or maybe something more frivolous such as
160 <a href="{$library_name}/collection/{$collName}/search/TextQuery?qs=1&amp;rt=rd&amp;s1.level=Doc&amp;startPage=1&amp;s1.query=la&amp;s1.index=ZZ">la</a>.
161 </li>
162 </ul>
163 </p>
164
165
166 <h2>Data Analysis and Visualization</h2>
167
168 <gsf:script src="ext/jena/sgvizler2/sgvizler2.js"/>
169
170 <gsf:script>
171 $(document).ready(
172 function() {
173
174 // Example triple
175 // "s": { "type": "uri" , "value": "http://127.0.0.1:8383/greenstone3/library/collection/eurovision/document/HASH0191e9cc7bfdf14743472257s10" } ,
176 // "p": { "type": "uri" , "value": "gsdlextracted:Country" } ,
177 // "o": { "type": "literal" , "value": "United Kingdom" }
178
179 sgvizler2.containerDraw('sgvizler2-country-count');
180 }
181 );
182 </gsf:script>
183
184 <xsl:variable name="graphURI">https://so-we-must-think.space<xsl:value-of select="$siteURL"/><xsl:value-of select="$library_name"/>/collection/<xsl:value-of select="$collName"/></xsl:variable>
185 <div id="sgvizler2-country-count"
186 data-sgvizler-endpoint="//sowemustthink.space/greenstone3-lod3/greenstone/query"
187 data-sgvizler-chart="google.visualization.BarChart"
188 data-sgvizler-chart-options="title=Number of Songs from each Country|legend.position=none|height=900|chartArea.height=840|fontSize=11"
189 data-sgvizler-log="2"
190 style="width:900px; height:300px; margin-left: auto; margin-right: auto; overflow-y: scroll; overflow-x: hidden;">
191 <xsl:attribute name="data-sgvizler-query">
192 PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
193 PREFIX gsdlextracted: &lt;http://greenstone.org/gsdlextracted#&gt;
194
195 SELECT ?country (COUNT(?country) AS ?freqCount)
196 WHERE {
197 GRAPH &lt;<xsl:value-of select="$graphURI"/>&gt; {
198 {
199 SELECT DISTINCT ?country ?year WHERE {
200 ?s gsdlextracted:Country ?country.
201 ?s gsdlextracted:Year ?year.
202 } ORDER BY ?country ?year
203 }
204 }
205 }
206 GROUP BY ?country ORDER BY ASC(?country)
207 </xsl:attribute>
208 <xsl:text> Loading ...</xsl:text>
209 </div>
210
211
212 <p style="padding-top: 10px;">
213 All the metadata in the digital library is simultaneously
214 published as linked data, meaning it is possible to
215 extract and analyze the data contained here in a variety
216 of ways. To aid in such analysis we have
217 added in a data visualization layer to the digital
218 library. This is how the bar-graph above has been
219 created, which shows how many times each country has
220 competed, alphabetically sorted.
221 </p>
222 <p>
223 Through our:
224 <ul>
225 <li>
226 <a href="{$library_name}/collection/{$collName}/page/sgvizler">Visualizer page</a>
227 </li>
228 </ul>
229 </p>
230 <p>
231 we provide samples you can try out to give you an idea of
232 the sorts of visualization that can be produced. More
233 importantly, these samples are editable so you are free to
234 change them however you wish. On the visualization page
235 you'll find a sample that shows you how often different
236 countries have won Eurovision, but perhaps you'd like to
237 find out who has lost the most often? We also provide a
238 sample dataflow visualization of jury voting patterns over
239 the last decade, which makes for interesting viewing!
240 Adjust the values used to discover how this compares
241 with other time periods.
242 </p>
243
244 <div id="viz-show-more" style="margin-bottom: 10px;">
245
246 <p>
247 In addition to the visualizer, through the:
248 <ul>
249 <li>
250 <a href="{$library_name}/collection/{$collName}/page/sparql">Data Analysis page</a>
251 </li>
252 </ul>
253 you will find a set of samples you can test-drive to give you an idea of the
254 sorts of raw data analysis that can be done. The syntax used is called
255 <a href="https://en.wikipedia.org/wiki/SPARQL" target="_blank">SPARQL</a> (pronounced &quot;sparkle&quot;). If you are unfamiliar
256 with this syntax, there are a variety of tutorials available online where you can learn about the query language, such as
257 the one done by <a href="https://jena.apache.org/tutorials/sparql.html" target="_blank">Apache Jena</a>, an Open Source
258 initiative that provides a variety of Semantic Web and Linked Data tools.
259 As before, these samples are editable so you are free to
260 change them however you wish to adjust the analysis undertaken, or, once you've mastered the
261 query syntax, develop completely original forms of
262 analysis.
263 </p>
264
265
266 <p>
267 We suggest starting with viewing <a href="{$library_name}/collection/{$collName}/page/sgvizler">sample visualizations</a> to see what's possible,
268 and making minor edits to them to adjust what is visualized.
269 Then, if you want to start visualizing the data in a more substantially different way
270 or else export the data for more detailed analysis under your own control,
271 switch to the <a href="{$library_name}/collection/{$collName}/page/sparql">SPARQL-based data analysis</a> page to ensure the underlying
272 data retrieved is as you intended. Then take the newly developed SPARQL query back to the visualizer page, and through the
273 additional text-input fields provided there, develop the visualization.
274
275 </p>
276
277 </div>
278
279 <gsf:script>
280 $('#viz-show-more').showMore({
281 minheight: 0,
282 buttontxtmore:"show more ...",
283 buttontxtless:"... show less"
284 });
285 </gsf:script>
286
287<!--
288 <p>
289 If you'd like to dig into the data behind this Digital Library collection, this can be done directly
290 using the <a href="{$library_name}/collection/{$collName}/page/sparql">SPARQL Query interface</a>.
291 This is a good place to go to see what sort of data is being stored, and we provide some sample
292 queries to get you going. But if you like to see the data presented more visually, we suggest
293 you try out the <a href="{$library_name}/collection/{$collName}/page/sgvizler">SGVizler page</a>,
294 which takes things to the next level, using pie-charts, histograms and other forms of
295 visualization to present the data.
296 </p>
297
298-->
299
300 <h2 id="it-all-started-with">It All Started with a Little <strike>Sparkle</strike>SPARQL</h2>
301
302
303 <p>
304 In terms of how this collection was developed using the
305 Greenstone3 Digital Library (DL) architecture, we are
306 being a touch irreverent to say <i>it all started with a
307 little SPARQL</i>.
308 It is certainly true to say that, operationally, the DL
309 was created using a SPARQL query that draws down JSON
310 records from
311 <a href="https://dbpedia.org" target="_blank">DBpedia</a>
312 about all the different entrants in Eurovision. This
313 is then ingested into Greenstone using its document- and
314 metadata-processing pipeline: expand through the <i>show
315 more ...</i> button below to see the actual query.
316 But in truth, our starting point of the SPARQL query is
317 only possible due to the Herculean efforts of the
318 contributors to the Wikipedia pages about
319 the Eurovision Song Contest, and following on from
320 that the endeavors of the DBpedia project to
321 transform a substantial portion of that information
322 into machine-readable linked data.
323 </p>
324
325 <p>
326 Continuing the technical development of the DL,
327 we then added voting metadata to the DBpedia-extracted content (again
328 using the Greenstone document- and metadata-processing
329 pipeline), this time in the form of a CSV-based spreadsheet derived from the
330 <a href="https://www.kaggle.com/datagraver/eurovision-song-contest-scores-19752019" target="_blank">Kaggle Eurovision Voting dataset 1975-2019</a>.
331 </p>
332
333
334 <div id="dl-tech-show-more">
335 <p>
336 Here's the SPARQL query that retrieves, for every year
337 Eurovision has been held, the countries that took part.
338 At under 20 lines of code, we think it's pretty awesome!
339 The information retrieved includes the country, year,
340 title of the song, and name of the entrant (the
341 act/artist), amongst other things. All useful core
342 information to seed the digital library collection. As
343 the 2020 Eurovision event did not run due to the
344 Covid-19 pandemic, and (at the time of writing) the 2021
345 event is yet to occur, we have opted to filter the matches
346 returned to be prior to 2020.
347 </p>
348<!--
349# bind( REPLACE(str(?country_in_year), ".*(\\d{4})", "$1") AS ?year).
350
351PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
352xsd:
353skos:
354prov:
355
356dbc:
357dbp:
358
359dct:
360-->
361 <pre style="background-color: #fff; color: #000; padding: 12px; margin-right: 6px;">
362SELECT ?countries_in_esc_by_year ?country_in_year (?year AS ?Year) (?country AS ?Country) ?entrant (?entrant_label AS ?Creator) ?song (?song_label AS ?Title) (?was_derived_from AS ?WikipediaURL)
363WHERE {
364 ?countries_in_esc_by_year skos:broader dbc:Countries_in_the_Eurovision_Song_Contest_by_year.
365
366 ?country_in_year dct:subject ?countries_in_esc_by_year.
367 ?country_in_year dbp:year ?year.
368 FILTER ( xsd:integer(?year) &lt; 2020).
369
370 ?country_in_year dbp:country ?country.
371
372 ?country_in_year dbp:entrant ?entrant.
373 ?entrant rdfs:label ?entrant_label
374 FILTER (lang(?entrant_label) = 'en').
375
376 ?country_in_year dbp:song ?song.
377 ?song rdfs:label ?song_label
378 FILTER (lang(?song_label) = 'en').
379
380 OPTIONAL {
381 ?song prov:wasDerivedFrom ?was_derived_from
382 }
383}
384ORDER BY DESC(?countries_in_esc_by_year)
385 </pre>
386
387 <p>
388 You can try this query out yourself if you like. Select the entirety of the SPARQL query
389 in the above text box, and press <i>Control-C</i> to copy it to your clipboard.
390 Next visit the DBpedia SPARQL endpoint given below, and in the main text box of the page
391 that appears, press <i>Control-V</i>
392 to paste in your SPARQL query. Finally, click on the <i>Execute Query</i> button
393 to initiate the search.
394 <ul>
395 <li>
396 <a href="https://dbpedia.org/sparql/" target="_blank">DBpedia's SPARQL endpoint</a>
397 </li>
398 </ul>
399 </p>
400 <p>
401 Through the SPARQL Endpoint you can change the output format that is used to, for example, JSON or Turtle.
402 For convenience, if you are just interested in seeing what the outcome of running the query is, displayed as a web page:
403 <ul>
404 <li>
405 <a href="https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&amp;query=SELECT+%3Fcountries_in_esc_by_year+%3Fcountry_in_year+%3Fyear+as+%3FYear+%3Fcountry+as+%3FCountry+%3Fentrant+%3Fentrant_label+as+%3FCreator+%3Fsong+%3Fsong_label+as+%3FTitle+%3Fwas_derived_from+as+%3FWikipediaURL%0D%0AWHERE+%7B%0D%0A++++%3Fcountries_in_esc_by_year+skos%3Abroader+dbc%3ACountries_in_the_Eurovision_Song_Contest_by_year.%0D%0A%0D%0A++++%3Fcountry_in_year+dct%3Asubject+%3Fcountries_in_esc_by_year.%0D%0A++++bind%28+REPLACE%28str%28%3Fcountry_in_year%29%2C+%22.*%28%5C%5Cd%7B4%7D%29%22%2C+%22%241%22%29+as+%3Fyear%29.%0D%0A++++FILTER+%28+xsd%3Ainteger%28%3Fyear%29+%3C+2020%29.%0D%0A%0D%0A++++%3Fcountry_in_year+dbp%3Acountry+%3Fcountry.%0D%0A%0D%0A++++%3Fcountry_in_year+dbp%3Aentrant+%3Fentrant.%0D%0A++++%3Fentrant+rdfs%3Alabel+%3Fentrant_label%0D%0A++++++FILTER+%28lang%28%3Fentrant_label%29+%3D+%27en%27%29.%0D%0A%0D%0A++++%3Fcountry_in_year+dbp%3Asong+%3Fsong.%0D%0A++++%3Fsong+rdfs%3Alabel+%3Fsong_label%0D%0A++++++FILTER+%28lang%28%3Fsong_label%29+%3D+%27en%27%29.%0D%0A%0D%0A++++OPTIONAL+%7B%0D%0A++++++%3Fsong+prov%3AwasDerivedFrom+%3Fwas_derived_from%0D%0A++++%7D%0D%0A%7D%0D%0AORDER+BY+DESC%28%3Fcountries_in_esc_by_year%29&amp;format=text%2Fhtml&amp;timeout=30000&amp;signal_void=on&amp;signal_unconnected=on" target="_blank">Click here to run the query directly</a>
406 </li>
407 </ul>
408 </p>
409
410 <h2>Triplestore Errata</h2>
411
412 <p>
413 The above SPARQL query is a good starting point to
414 extract all the Eurovision entries over the years;
415 however, a more careful study of the returned results
416 revealed a few complications that needed to be
417 addressed. One issue stems from the fact that in its
418 inaugural year, countries were allowed to send two
419 entries each. For 1956, for every URI representing a
420 country in that year there are two titles and two
421 entrants represented. As initially expressed, the
422 SPARQL query does not cater for this circumstance and
423 results in 2 x 2 = 4 combinations of artist and title
424 per country, only two of which are genuine pairings.
425 </p>
426 <p>
427 The way to address this is to include an additional
428 constraint that ensures that the URI representing
429 <i>?song</i> includes the relationship <i>dbp:artist</i>
430 for <i>?entrant</i>, effectively locking in to the
431 artist that performed that particular song. Studying
432 the result of this change, however, showed up a more
433 wide-reaching problem, which was that not all the
434 <i>?country_in_year</i> URI entries expressed relationships
435 to songs and artists that were themselves URIs: sometimes
436 they were represented as a string literal, meaning the
437 added constraint would fail, and reject entirely the
438 details about a country's entry in that
439 year. Compounding this, we also saw that some of the
440 processing work by DBpedia to turn the manually curated
441 information in Wikipedia into machine-readable form
442 erroneously handled the formation of some of the song
443 titles and artists.
444 </p>
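<!--
Editorial sketch (not rendered on the page): the 2 x 2 effect and the
artist constraint described above can be mimicked in Python. This is only
an illustration under stated assumptions; the song and artist names and
the performed_by mapping are hypothetical placeholders, not DBpedia data.

```python
from itertools import product

# Hypothetical values attached to one 1956 country-in-year URI.
songs = ["Song A", "Song B"]
entrants = ["Artist A", "Artist B"]

# What the unconstrained query effectively returns: every song paired
# with every entrant, i.e. a cross product of 2 x 2 = 4 rows.
all_rows = list(product(songs, entrants))

# Assumed ground truth: which artist performed which song.
performed_by = {"Song A": "Artist A", "Song B": "Artist B"}

# Keeping only rows where the song's artist matches the entrant leaves
# the two genuine pairings, which is the role the extra constraint plays.
valid_rows = [(s, e) for s, e in all_rows if performed_by[s] == e]
```

The sketch also hints at why literals are a problem: if a song were stored
as a bare string with no artist relationship to look up, the filter would
have nothing to match against and the row would be dropped.
-->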
445 <p>
446 The fact that the erroneous entries were strings (even
447 integer numbers at times!) and not URIs gave us a way
448 to see how widespread the problem was. Using adapted
449 versions of the main SPARQL query we had formulated,
450 we were able to produce lists of the affected entries.
451 The lists are available here through the following
452 links:
453 <ul>
454 <li>
455 <a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-songs.html">Problem Songs (titles are literals not URIs/IRIs)</a>
456 </li>
457 <li>
458 <a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-entrants.html">Problem Entrants (artists are literals not URIs/IRIs)</a>
459 </li>
460 </ul>
461 </p>
462
463 <p>
464 The generation of these lists also provided the key to
465 the approach we used to compensate for the complications
466 these issues introduced. Skipping ahead slightly to the
467 formation of the Digital Library collection with
468 Greenstone3, we make use of this software architecture's
469 Triplestore Extension, which means that in addition to
470 the main DL and Open Archives Initiative (OAI) server
471 endpoints, there is also a triplestore backend. While
472 the triplestore extension was designed to provide SPARQL
473 access to the metadata and document content of the DL
474 collections, its existence means we can include in it a
475 graph that represents the necessary errata information
476 we need to &quot;course correct&quot; the SPARQL query
477 so that it performs as intended.
478 </p>
479
480 <p>
481 This does admittedly complicate the expression of the
482 query, but the additions are manageable. The expanded
483 query makes use of SPARQL's federated search feature:
484 the query starts as before with the retrieval of triples
485 from the DBpedia endpoint; based on resolved values of
486 entities such as <i>?country_in_year</i> and <i>?song</i>,
487 it then optionally retrieves matching items from the DL
488 SPARQL endpoint. The final step is to use a conditional
489 clause (if-statement) to test to see if the DBpedia
490 version of the song is a literal, and if it is and if
491 there is a bound value for the DL retrieved one, then it
492 selects that one in preference.
493 </p>
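<!--
Editorial sketch (not rendered on the page): the final selection step
described above can be expressed in Python as follows. The actual logic
lives in the SPARQL query itself; the helper names here are hypothetical,
and the URI test is a crude stand-in for a proper IRI check.

```python
def is_uri(value):
    """Crude stand-in for an IRI test: treat http(s) strings as URIs."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))

def choose_song(dbpedia_value, errata_value=None):
    """Prefer the errata value only when DBpedia supplied a literal
    and an errata value is actually bound (not None)."""
    if not is_uri(dbpedia_value) and errata_value is not None:
        return errata_value
    return dbpedia_value
```

A DBpedia value that is already a URI is kept as-is, and a literal with no
errata entry also falls through unchanged, so nothing is lost when the
errata graph has no correction to offer.
-->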
494
495 <p>
496 The DBpedia SPARQL endpoint doesn't allow for federated
497 queries, and so we initiate the SPARQL queries through
498 the DL's SPARQL endpoint, using SERVICE blocks to specify
499 the parts of the query that are run on the DBpedia endpoint.
500 <ul>
501 <li>
502 <a href="{$library_name}/collection/{$collName}/page/sparql">DL's (local) SPARQL endpoint</a>
503 </li>
504 </ul>
505 </p>
506
507 <h3>Adding in Voting Metadata</h3>
508
509 <p>
510 To fulfill our vision of developing this DL collection
511 as a rich resource through which people can explore the
512 phenomenon, we went looking for voting data that was
513 available in a machine-readable format.
514 We found that data compiled through a manual curation process
515 about how countries have voted going back to 1975 is available through the
516 <a href="https://www.kaggle.com/datagraver/eurovision-song-contest-scores-19752019">Kaggle website as an Excel spreadsheet</a>.
517 </p>
518 <p>
519 To incorporate this as metadata into the DL, we wrote
520 some Python code to transform the data into the internal
521 serialized metadata format used by Greenstone. Prior to
522 this project, the only serialized form for this was XML,
523 which is processed by the MetadataXML plugin. As it was
524 more convenient to generate JSON from our Python code,
525 we took the step of adding in a new plugin to
526 Greenstone3: MetadataJSON.
527 </p>
528
529 <h3>Page Scraping</h3>
530
531 <p>
532 Despite our best intentions to work solely with
533 machine-readable data (primarily, as you have seen, in the
534 form of Linked Open Data, but also utilizing a
535 spreadsheet of voting data) to form the Eurovision DL,
536 in looking to expand the metadata in the DL to cover
537 details concerning the draw position of acts, and their
538 overall placing, we have resorted to page-scraping
539 content from Wikipedia itself. This was because such
540 information was not part of the entity extraction
541 process that occurs when Wikipedia is mapped to DBpedia.
542 </p>
543
544 <p>
545 A review of Wikipedia article pages about the event in
546 any given year showed these pages to be especially well
547 curated, and included a table in each that listed the
548 information we sought. While there was some variation
549 in how this table was expressed in HTML, with a
550 considerable portion of the heavy lifting being done by
551 the Python library BeautifulSoup4, it was not too
552 complex a task to develop a program that extracted this
553 information and turned it into the newly developed
554 Greenstone JSON metadata format.
555 </p>
556
557 <h3>Patching in Missing Data</h3>
558
559
560 <p>
561 Another difficulty we have encountered is that
562 not every country that had an entry in Eurovision
563 in a given year has its own standalone article page.
564 This leads to missing entries in the category
565 page for the contest in a given year, which is
566 problematic to us, because it is this category
567 information that we draw upon in our SPARQL query
568 to populate the DL with all the acts.
569 </p>
570 <p>
571 The information about all the countries competing
572 in a given year does, however, appear in the
573 article page for the contest in that year. In fact
574 it's in the same table we targeted to extract
575 draw position and placement. We therefore
576 wrote a further page-scraping program to compare
577 the countries in that table with the countries
578 listed on the category page for the contest in
579 that year. For any entries we find in the
580 table, but not in the Category page, we
581 produce a metadata record for the DL
582 with basic information about the entry:
583 country, year, song title, artist,
584 draw-position, placement, and (where available)
585 their total score.
586 </p>
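<!--
Editorial sketch (not rendered on the page): the comparison described
above amounts to a set difference. The function name and the country
values below are hypothetical placeholders; the real program works from
scraped Wikipedia pages.

```python
def missing_from_category(table_countries, category_countries):
    """Return countries listed in the contest table but absent from the
    category page, sorted for stable output."""
    return sorted(set(table_countries) - set(category_countries))

# Placeholder data for one contest year (not real scraped values).
table = ["Country A", "Country B", "Country C"]
category = ["Country A", "Country C"]
missing = missing_from_category(table, category)
```

Each name left in the result would then become one patched-in metadata
record for the DL.
-->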
587 <p>
588 Comparable with the problem titles and artist/entrants,
589 we have formulated a SPARQL query that enumerates
590 these missing category entrants:
591 <!--
592 We took the opportunity to add in further fields: Performing Position, Placement, Voting Total, thumbnail flag image.
593
594
595 An unintended side-affect of this is that we have also been able to expand
596 -->
597
598
599 <ul>
600 <li>
601 <a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-category-in-year.html">Problem Category pages (some countries not listed in a given year despite competing)</a>
602 </li>
603 </ul>
604 </p>
605
606
607 </div>
608 <gsf:script>
609 $('#dl-tech-show-more').showMore({
610 minheight: 0,
611 buttontxtmore:"show more ...",
612 buttontxtless:"... show less"
613 });
614 </gsf:script>
615
616
617 <div>
618 <h2>The Gory Details</h2>
619<!--
620 <p>
621 The resulting SPARQL query result set (JSON format
622 selected for output) is then ingested into a Greenstone
623 DL collection, and used in a variety of ways. For now
624 an (admittedly cryptic) list of technical steps that
625 were developed and/or deployed to provide the
626 functionality encountered in interacting with this site.
627
628 <ul>
629 <li>New SPARQL plugin for <i>download_from.pl</i> developed, used in GLI to enter the above query</li>
630 <li>New SPARQL <i>Document Processing</i> plugin developed</li>
631 <li>Greenstone3 Apache Jena Triple Store Extension activated</li>
632 <li>SGVizler used to display Google Visualizations such as the pie-chart above.</li>
633 <li>Metadata in document view enhanced through Greenstone Format Statements micro-data</li>
634 <li>Custom <i>interface</i> developed</li>
635 </ul>
636 </p>
637-->
638 <p>
639 Viewing the
640 <a download="collectionConfig.xml"
641 href="sites/{$site_name}/collect/{$collName}/etc/collectionConfig.xml">collection
642 configuration file</a> provides a good insight into how
643 all of these technical aspects are brought together.
644 </p>
645
646 <p>
647 Full disclosure as to how the collection all ticks is
648 provided through our Subversion repository. Topping up
649 our
650 <a href="https://trac.greenstone.org/browser/main/trunk/greenstone3">Greenstone3
651 code base</a> we have:
652
653 <ul>
654 <li>The site: <a href="https://trac.greenstone.org/browser/main/trunk/model-sites-dev/eurovision-lod">eurovision-lod</a></li>
655 <li>The interface: <a href="https://trac.greenstone.org/browser/main/trunk/model-interfaces-dev/eurovision-lod">eurovision-lod</a></li>
656 <li>The triplestore extension: <a href="https://trac.greenstone.org/browser/gs2-extensions/apache-jena/trunk/src">apache-jena</a></li>
657 </ul>
658
659 </p>
660
661 </div>
662
663<!--
664 <div id="technicaldev-turnstyle" style="margin-top: 12px;">
665 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
666 DL Technical Development
667 </div>
668
669 <div style="display: none; padding-left: 6px; padding-top: 6px; margin-left: 2px; margin-right: 2px; border-left: white solid 1px; border-right: white solid 1px; border-bottom: white solid 1px;">
670 <p>
671 In terms of how this collection was developed using the
672 Greenstone DL architecture, the starting point is the
673 formulation of a SPARQL query to retrieve from DBpedia
674 entries about all the entrants in the contest over the
675 years:
676
677 </p>
678
679 </div>
680 </div>
681
682 <script>
683 <xsl:text disable-output-escaping="yes">
684 $(function(){
685 transformToTurnstyleBlock("technicaldev");
686 });
687 </xsl:text>
688 </script>
689-->
690
691<!--
692 <div id="LOD-turnstyle" style="margin-top: 12px;">
693 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
694 Linked Open Data
695 </div>
696
697 <div style="display: none; padding-left: 6px; padding-top: 6px; margin-left: 2px; margin-right: 2px; border-left: white solid 1px; border-right: white solid 1px; border-bottom: white solid 1px;">
698
699
700 <h2>Eurovision LOD SPARQL Endpoints</h2>
701 <p>
702 The source data can be access vis the DBpedia SPARQL endpoint. The ingested,
703 data (with correction) is available through the collection's local
704 SPARQL endpoint:
705 <ul>
706 <li>
707 <a href="https://dbpedia.org/sparql/">DBpedia's SPARQL endpoint</a>
708 </li>
709 <li>
710 <a href="{$library_name}/collection/{$collName}/page/sparql">DL's (local) SPARQL endpoint</a>
711 </li>
712 </ul>
713 </p>
714
715 <h2>Eurovision LOD Errata</h2>
716 </div>
717 </div>
718
719 <script>
720 <xsl:text disable-output-escaping="yes">
721 $(function(){
722 transformToTurnstyleBlock("LOD");
723 });
724 </xsl:text>
725 </script>
726-->
727
728<!--
729 <div id="voting-turnstyle" style="margin-top: 12px;">
730 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
731 Voting Data
732 </div>
733
734 <div style="display: none; padding-left: 6px; padding-top: 6px; margin-left: 2px; margin-right: 2px; border-left: white solid 1px; border-right: white solid 1px; border-bottom: white solid 1px;">
735 <p>
736 The Voting data used in this collection is sourced from the Kaggle, which in turn
737 is derived from work available through Data Graver:
738 <ul>
739 <li><a href="https://www.kaggle.com/datagraver/eurovision-song-contest-scores-19752019">Kaggle Eurovision Voting dataset 1975-2019</a></li>
740 <li><a href="https://data.world/datagraver/eurovision-song-contest-scores-1975-2019">Data Graver</a></li>
741 <li><a href="https://docs.google.com/spreadsheets/d/1veXpiF54hQGP4OVuf1xjowumIe8HUOhI/edit#gid=528591420">Google Spreadsheet (internal use only)</a></li>
742
743 </ul>
744 </p>
745 </div>
746 </div>
747
748 <script>
749 <xsl:text disable-output-escaping="yes">
750 $(function(){
751 transformToTurnstyleBlock("voting");
752 });
753 </xsl:text>
754 </script>
755
756-->
757
758 </div>
759
760 </xsl:template>
761
762
763</xsl:stylesheet>
764