Changeset 35044 for main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl
- Timestamp: 2021-04-08T17:24:10+12:00 (3 years ago)
- File: 1 edited
main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl
--- main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl (r35043)
+++ main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl (r35044)
@@ -379,5 +379,6 @@
 
 ?country_in_year dct:subject ?countries_in_esc_by_year.
-bind( REPLACE(str(?country_in_year), ".*(\\d{4})", "$1") as ?year).
+# bind( REPLACE(str(?country_in_year), ".*(\\d{4})", "$1") as ?year).
+?country_in_year dbp:year ?year.
 FILTER ( xsd:integer(?year) < 2020).
 
@@ -424,9 +425,138 @@
 <h3>Triplestore Errata</h3>
 
+<p>
+The above SPARQL query is a good starting point to
+extract all the Eurovision entries over the years;
+however, a more careful study of the returned results
+revealed a few complications that needed to be
+addressed. One issue stems from the fact that in the
+contest's inaugural year, countries were allowed to send two
+entries each. For 1956, for every URI representing a
+country in that year there are two titles and two
+entrants represented. As initially expressed, the
+SPARQL query does not cater for this circumstance and
+results in 2 x 2 = 4 combinations of artist and title
+per country.
+</p>
+<p>
+The way to address this is to include an additional
+constraint that ensures that the URI representing
+<i>?song</i> includes the relationship <i>dbp:artist</i>
+for <i>?entrant</i>, effectively locking in to the
+artist that performed that particular song. Studying
+the result of this change, however, showed up a more
+wide-reaching problem, which was that not all the
+<i>?country_year</i> URI entries expressed relationships
+to songs and artists that were themselves URIs; sometimes
+they were represented as a string literal, meaning the
+added constraint would fail, and reject entirely the
+details about a country's entry in that
+year. Compounding this, we also saw that some of the
+processing work by DBpedia to turn the manually curated
+information in Wikipedia into machine-readable form
+erroneously handled the formation of some of the song
+titles and artists.
+</p>
+<p>
+The fact that the erroneous entries were strings (even
+integer numbers at times!) and not URIs gave us a way in
+to see how widespread the problem was. Using adapted
+versions of the main SPARQL query we had formulated,
+we were able to produce lists of the affected entries.
+The lists are available through the following
+links:
+<ul>
+<li>
+<a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-songs.html">Problem Songs (titles are literals not URIs/IRIs)</a>
+</li>
+<li>
+<a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-entrants.html">Problem Entrants (artists are literals not URIs/IRIs)</a>
+</li>
+</ul>
+</p>
+
+<p>
+The generation of these lists also provided the key to
+the approach we used to compensate for the complications
+these issues introduced. Skipping ahead slightly to the
+formation of the Digital Library collection with
+Greenstone3, we make use of this software architecture's
+Triplestore Extension, which means that in addition to
+the main DL and Open Archives Initiative (OAI) server
+endpoints, there is also a triplestore backend. While
+the triplestore extension was designed to provide SPARQL
+access to the metadata and document content of the DL
+collections, its existence means we can include in it a
+graph that represents the necessary errata information
+we need to "course correct" the SPARQL query
+to perform as intended.
+</p>
+
+<p>
+This does admittedly complicate the expression of the
+query, but the additions are manageable. The expanded
+query makes use of SPARQL's federated query feature:
+the query starts as before with the retrieval of triples
+from the DBpedia endpoint; based on resolved values of
+entities such as <i>?country_year</i> and <i>?song</i>,
+it then optionally retrieves matching items from the DL
+SPARQL endpoint. The final step is to use a conditional
+clause (if-statement) to test whether the DBpedia
+version of the song is a literal, and if it is and if
+there is a bound value for the DL retrieved one, then it
+selects that one in preference.
+</p>
+
+<p>
+The DBpedia SPARQL endpoint doesn't allow for federated
+queries, and so we initiate the SPARQL queries through
+the DL's SPARQL endpoint, using SERVICE blocks to specify
+the parts of the query that are run on the DBpedia endpoint.
+<ul>
+<li>
+<a href="{$library_name}/collection/{$collName}/page/sparql">DL's (local) SPARQL endpoint</a>
+</li>
+</ul>
+</p>
+
 <h3>Adding in Voting Metadata</h3>
 
-<h3>Page Scraping</h3>
-
-
+<p>
+To fulfill our vision of developing this DL collection as a rich resource
+through which to explore the phenomenon
+
+</p>
+
+<h3>Patching in Missing Data: Page Scraping</h3>
+
+
+<p>
+Despite our best intentions to work solely with ....
+.. missing categories ...
+
+totting up how many entries per year ...
+thousands of entries
+
+We took the opportunity to add in further fields: Performing Position, Placement, Voting Total, thumbnail flag image.
+
+<ul>
+<li>
+<a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-category-in-year.html">Problem Category pages (some countries not listed in a given year despite competing)</a>
+</li>
+</ul>
+</p>
+
+
+</div>
+<gsf:script>
+$('#dl-tech-show-more').showMore({
+minheight: 0,
+buttontxtmore:"show more ...",
+buttontxtless:"... show less"
+});
+</gsf:script>
+
+
+<div>
 <h3>The Gory Details</h3>
 <!--
@@ -472,15 +602,7 @@
 </p>
 
-
-</div>
-<gsf:script>
-$('#dl-tech-show-more').showMore({
-minheight: 0,
-buttontxtmore:"show more ...",
-buttontxtless:"... show less"
-});
-</gsf:script>
-
-
+</div>
+
+<!--
 <div id="technicaldev-turnstyle" style="margin-top: 12px;">
 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
@@ -498,9 +620,4 @@
 </p>
 
-
-<p>
-Bullet points above to be expanded upon!
-</p>
-
 </div>
 </div>
@@ -513,5 +630,7 @@
 </xsl:text>
 </script>
-
+-->
+
+<!--
 <div id="LOD-turnstyle" style="margin-top: 12px;">
 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
@@ -538,18 +657,4 @@
 
 <h2>Eurovision LOD Errata</h2>
-<p>
-Songs titles and Entrants (artists) that do not resolve to URIs:
-<ul>
-<li>
-<a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-songs.html">Problem Songs (titles are string literals not IRIs)</a>
-</li>
-<li>
-<a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-entrants.html">Problem Entrants (artists are string literals not IRIs)</a>
-</li>
-<li>
-<a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-category-in-year.html">Problem Category pages (some countries not listed in a given year despite competing)</a>
-</li>
-</ul>
-</p>
 </div>
 </div>
@@ -562,5 +667,7 @@
 </xsl:text>
 </script>
-
+-->
+
+<!--
 <div id="voting-turnstyle" style="margin-top: 12px;">
 <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
@@ -590,7 +697,8 @@
 </script>
 
+-->
 
 </div>
 
 </xsl:template>
 
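The "Triplestore Errata" text added in this changeset describes two behaviours of the query: the 2 x 2 = 4 duplication for 1956, and the later preference for an errata value when DBpedia supplies a literal. The following is a minimal Python sketch of that logic only, not the SPARQL used by the collection. All names and values in it (`songs`, `entrants`, `song_artist`, `preferred_song`, the `example.org` URI) are hypothetical placeholders rather than real DBpedia data.

```python
from itertools import product

# Hypothetical stand-ins for one 1956 country-in-year resource:
# two songs and two entrants are attached to the same URI.
songs = ["song_A", "song_B"]
entrants = ["artist_A", "artist_B"]

# Hypothetical dbp:artist links from each song to its performer.
song_artist = {"song_A": "artist_A", "song_B": "artist_B"}


def unconstrained_pairs(songs, entrants):
    """Pair every song with every entrant, as the initial query effectively does."""
    return list(product(songs, entrants))


def constrained_pairs(songs, entrants, song_artist):
    """Keep only the pairs where the song is linked to that entrant."""
    return [(s, e) for s, e in product(songs, entrants) if song_artist.get(s) == e]


def preferred_song(dbpedia_value, is_literal, errata_value=None):
    """Mimic the conditional step: if the DBpedia value is a literal and an
    errata value is bound, select the errata value; otherwise keep DBpedia's."""
    if is_literal and errata_value is not None:
        return errata_value
    return dbpedia_value


print(len(unconstrained_pairs(songs, entrants)))
print(constrained_pairs(songs, entrants, song_artist))
```

The sketch also shows why the added constraint alone was not enough: when a song is only a string literal it has no artist link to match, so `constrained_pairs` would drop it, which is the case the errata graph is meant to cover.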