Changeset 35044


Timestamp:
04/08/21 17:24:10
Author:
davidb
Message:

Expanded/Updated text

File:
1 edited

  • main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl

    r35043 r35044  
    379379
    380380    ?country_in_year dct:subject ?countries_in_esc_by_year.
    381     bind( REPLACE(str(?country_in_year), ".*(\\d{4})", "$1") as ?year).
     381#    bind( REPLACE(str(?country_in_year), ".*(\\d{4})", "$1") as ?year).
     382    ?country_in_year dbp:year ?year.
    382383    FILTER ( xsd:integer(?year) < 2020).
    383384
     
    424425        <h3>Triplestore Errata</h3>
    425426
     427        <p>
     428          The above SPARQL query is a good starting point to
     429          extract all the Eurovision entries over the years,
     430          however a more careful study of the returned results
     431          revealed a few complications that needed to be
     432          addressed. One issue stems from the fact that in its
     433          inaugural year, countries were allowed to send two
     434          entries each.  For 1956, for every URI representing a
     435          country in that year there are two titles and two
     436          entrants represented.  As initially expressed, the
     437          SPARQL query does not cater for this circumstance and
     438          results in 2 x 2 = 4 combinations of artist and title
     439          per song.
     440        </p>
     441        <p>
     442          The way to address this is to include an additional
     443          constraint that ensures that the URI representing
     444          <i>?song</i> includes the relationship <i>dbp:artist</i>
     445          for <i>?entrant</i>, effectively locking in to the
     446          artist that performed that particular song.  Studying
     447          the result of this change, however, showed up a more
     448          wide-reaching problem which was that not all the
     449          <i>?country_year</i> URI entries expressed relationships
     450          to songs and artists that were themselves URIs: sometimes
     451          they were represented as a string literal, meaning the
     452          added constraint would fail, and reject entirely the
     453          details about a country's entry in that
     454          year. Compounding this, we also saw that some of the
     455          processing work by DBpedia to turn the manually curated
     456          information in Wikipedia into machine-readable form
     457          erroneously handled the formation of some of the song
     458          titles and artists.
     459        </p>
     460        <p>
     461          The fact that the erroneous entries were strings (even
     462          integer numbers at times!) and not URIs gave us a way in
     463          to see how widespread the problem was.  Using adapted
     464          versions of the main SPARQL query we had formulated,
     465          we were able to produce lists of the affected entries.
     466          The lists are available through the following
     467          links:
     468          <ul>
     469        <li>
     470          <a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-songs.html">Problem Songs (titles are literals not URIs/IRIs)</a>
     471        </li>
     472        <li>
     473          <a target="_blank" href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-entrants.html">Problem Entrants (artists are literals not URIs/IRIs)</a>
     474        </li>
     475          </ul>
     476        </p>
     477
     478        <p>
     479          The generation of these lists also provided the key to
     480          the approach we used to compensate for the complications
     481          these issues introduced.  Skipping ahead slightly to the
     482          formation of the Digital Library collection with
     483          Greenstone3, we make use of this software architecture's
     484          Triplestore Extension, which means that in addition to
     485          the main DL and Open Archives Initiative (OAI) server
     486          endpoints, there is also a triplestore backend.  While
     487          the triplestore extension was designed to provide SPARQL
     488          access to the metadata and document content of the DL
     489          collections, its existence means we can include in it a
     490          graph that represents the necessary errata information
     491          we need to &quot;course correct&quot; the SPARQL query
     492          so that it performs as intended.
     493        </p>
     494       
     495        <p>
     496          This does admittedly complicate the expression of the
     497          query, but the additions are manageable.  The expanded
     498          query makes use of SPARQL's federated search feature:
     499          the query starts as before with the retrieval of triples
     500          from the DBpedia endpoint; based on resolved values of
     501          entities such as <i>?country_year</i> and <i>?song</i>,
     502          it then optionally retrieves matching items from the DL
     503          SPARQL endpoint.  The final step is to use a conditional
     504          clause (if-statement) to test whether the DBpedia
     505          version of the song is a literal; if it is, and
     506          there is a bound value for the DL-retrieved one, then it
     507          selects that one in preference.
     508        </p>
     509
     510        <p>
     511          The DBpedia SPARQL endpoint doesn't allow for federated
     512          queries, and so we initiate the SPARQL queries through
     513          the DL's SPARQL endpoint, using SERVICE blocks to specify
     514          the parts of the query that are run on the DBpedia endpoint.
     515          <ul>
     516        <li>
     517          <a href="{$library_name}/collection/{$collName}/page/sparql">DL's (local) SPARQL endpoint</a>
     518        </li>
     519          </ul>
     520        </p>
     521       
    426522        <h3>Adding in Voting Metadata</h3>
    427523
    428         <h3>Page Scraping</h3>
    429 
    430 
     524        <p>
     525          To fulfill our vision of developing this DL collection as a rich resource
     526          through which to explore the phenomenon
     527         
     528        </p>
     529       
     530        <h3>Patching in Missing Data: Page Scraping</h3>
     531
     532       
     533        <p>
     534          Despite our best intentions to work solely with ....
     535          .. missing categories ...
     536
     537          totting up how many entries per year ...
     538          thousands of entries
     539         
     540          We took the opportunity to add in further fields: Performing Position, Placement, Voting Total, thumbnail flag image.
     541         
     542          <ul>
     543        <li>
     544          <a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-category-in-year.html">Problem Category pages (some countries not listed in a given year despite competing)</a>
     545        </li>
     546          </ul>
     547        </p>
     548       
     549         
     550      </div>
     551      <gsf:script>
     552        $('#dl-tech-show-more').showMore({
     553            minheight: 0,
     554        buttontxtmore:"show more ...",
     555        buttontxtless:"... show less"
     556          });
     557      </gsf:script>
     558
     559
     560      <div>
    431561        <h3>The Gory Details</h3>
    432562<!--
     
    472602          </p>
    473603
    474          
    475       </div>
    476       <gsf:script>
    477         $('#dl-tech-show-more').showMore({
    478             minheight: 0,
    479         buttontxtmore:"show more ...",
    480         buttontxtless:"... show less"
    481           });
    482       </gsf:script>
    483 
    484      
     604      </div>
     605     
     606<!--     
    485607      <div id="technicaldev-turnstyle" style="margin-top: 12px;">
    486608        <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
     
    498620          </p>
    499621         
    500          
    501           <p>
    502         Bullet points above to be expanded upon!
    503           </p>
    504          
    505622        </div>
    506623      </div>
     
    513630        </xsl:text>
    514631      </script>
    515      
     632-->
     633
     634<!--
    516635      <div id="LOD-turnstyle" style="margin-top: 12px;">
    517636        <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
     
    538657         
    539658          <h2>Eurovision LOD Errata</h2>
    540           <p>
    541         Songs titles and Entrants (artists) that do not resolve to URIs:
    542         <ul>
    543           <li>
    544             <a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-songs.html">Problem Songs (titles are string literals not IRIs)</a>
    545           </li>
    546           <li>
    547             <a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-entrants.html">Problem Entrants (artists are string literals not IRIs)</a>
    548           </li>
    549           <li>
    550             <a href="sites/{$site_name}/collect/{$collName}/prepare/problem-lod-lists/dbpedia-problem-category-in-year.html">Problem Category pages (some countries not listed in a given year despite competing)</a>
    551           </li>
    552         </ul>
    553           </p>
    554659        </div>
    555660      </div>
     
    562667        </xsl:text>
    563668      </script>
    564 
     669-->
     670
     671<!--
    565672      <div id="voting-turnstyle" style="margin-top: 12px;">
    566673        <div class="turnstyle-header" style="background-image: none; background-color: hsl(195, 47%, 35%);">
     
    590697      </script>
    591698
     699-->
    592700     
    593701    </div>
    594    
     702
    595703  </xsl:template>
    596704   
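
The reasoning in the added "Triplestore Errata" text can be illustrated outside SPARQL. The following is a minimal Python sketch, not the query used in about.xsl: it models the 1956 case as plain lists to show why an unconstrained join gives 2 x 2 = 4 rows, how a song-to-artist constraint cuts that to 2, and how the final conditional prefers the DL errata value over a DBpedia literal. All identifiers and sample values (song:A, artist:X, dl:song1 and the function names) are hypothetical placeholders, not data from DBpedia or the collection.

```python
from itertools import product

# Hypothetical data for one country-in-year URI in 1956:
# two song titles and two entrants hang off the same URI.
titles = ["song:A", "song:B"]
entrants = ["artist:X", "artist:Y"]

# Hypothetical song -> artist links, standing in for dbp:artist triples.
song_artist = {"song:A": "artist:X", "song:B": "artist:Y"}


def unconstrained_rows(titles, entrants):
    """Pair every title with every entrant, as the initial query does."""
    return list(product(titles, entrants))


def constrained_rows(titles, entrants, song_artist):
    """Keep only pairs where the song is linked to that entrant."""
    return [
        (title, entrant)
        for title, entrant in product(titles, entrants)
        if song_artist.get(title) == entrant
    ]


def preferred_value(dbpedia_value, dbpedia_is_literal, dl_value=None):
    """Mirror the conditional clause: take the DL errata value only when
    the DBpedia value is a literal and a DL value is bound."""
    if dbpedia_is_literal and dl_value is not None:
        return dl_value
    return dbpedia_value


if __name__ == "__main__":
    print(len(unconstrained_rows(titles, entrants)))
    print(constrained_rows(titles, entrants, song_artist))
    print(preferred_value("A literal title", True, "dl:song1"))
```

Note that when a title is a string literal it has no entry in song_artist, so constrained_rows drops it entirely. This is the same failure the text describes for the added dbp:artist constraint, and it is why the errata lookup is needed as a fallback.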