Timestamp:
2021-04-22T14:50:00+12:00 (3 years ago)
Author:
davidb
Message:

General text update; auto-focus param added into ssv_execute()

File:
1 edited

  • main/trunk/model-sites-dev/eurovision-lod/collect/eurovision/transform/pages/about.xsl

    r35066 r35093  
    1818    <gsf:script src="sites/{$site_name}/collect/{$collName}/js/jquery.show-more.js"/>
    1919
     20   
    2021      <div id="about-desc">
    2122      <h2>Introduction</h2>
     
    2324      <p style="padding-bottom: 10px;">
    2425        The <a href="https://eurovision.tv">Eurovision Song
    25         Content</a> is a live-broadcast televised event that
     26        Contest</a> is a live-broadcast televised event that
    2627        was first held in 1956 featuring artists singing original songs from
    2728        7 countries.  Since then it has grown into an event involving
     
    4344        The contest has grown significantly from
    4445        that modest start with 7 countries (and one cameraman),
    45         with over 40 countries competing these days—even
    46         Australia takes part now, through a specially
     46        with over 40 countries competing these days—Australia
     47        even takes part now, through a specially
    4748        arranged invitation. It's an annual celebration of
    4849        European culture and the highlight of many people's
     
    507508
    508509        <p>
    509           Access to and the analysis of how countries have voted over the years
     510          To fulfill our vision of developing this DL collection
     511          as a rich resource through which people can explore the
     512          phenomenon we went looking for voting data that was
     513          available in a machine-readable format.
     514          We found data compiled through a manual curation process
     515          about how countries have voted going back to 1975 is available through the
     516          <a href="https://www.kaggle.com/datagraver/eurovision-song-contest-scores-19752019">Kaggle website as an Excel spreadsheet</a>.
     517        </p>
     518        <p>
     519          To incorporate this as metadata into the DL, we wrote
     520          some Python code to transform the data into the internal
     521          serialized metadata format used by Greenstone.  Prior to
     522          this project, the only serialized form for this was XML,
     523          which is processed by the MetadataXML plugin.  As it was
     524          more convenient to generate JSON from our Python code,
     525          we took the step of adding in a new plugin to
     526          Greenstone3: MetadataJSON.
     527        </p>
     528
     529        <h3>Page Scraping</h3>
     530
     531        <p>
     532              Despite our best intentions work soley with
     533              machine-readable data—primarily as you have seen in the
     534              form of Linked Open Data, but also utilizing a
     535              spreadsheet of voting data—to form the Eurovision DL,
     536          in looking to expand the metadata in the DL to cover
     537          details concerning the draw position of acts, and their
     538          overall placing, we have resorted to page-scraping
     539          content from Wikipedia itself.  This was because such
     540          information was not part of the entity extraction
     541          process that occurs when Wikipedia is mapped to DBpedia.
     542        </p>
     543
     544        <p>
     545          A review of Wikipedia article pages about the event in
     546          any given year showed these pages to be especially well
     547          curated, and included a table in each that listed the
     548          information we sought.  While there was some variation
     549          in how this table was expressed in HTML, with a
     550          considerably portion of the heavy lifting being done by
     551          the Python library BeautifulSoup4, it was not too
     552          complex a task to develop a program that extracted this
     553          information and turned it into the newly developed
     554          Greenstone JSON metadata format.
     555        </p>
    510556         
    511           To fulfill our vision of developing this DL collection as a rich resource to
    512           through which people can explore the phenomenon.
    513          
    514         </p>
     557        <h3>Patching in Missing Data</h3>
     558
    515559       
    516         <h3>Patching in Missing Data: Page Scraping</h3>
    517 
    518        
    519         <p>
    520           Despite our best intentions to work solely with ....
    521           .. missing categories ...
    522 
    523           totting up how many entrie per year ...
    524           thousands of entries
    525          
     560        <p>
     561          Another difficulty we have encountered is that
     562          not every country who had an entry in Eurovision
     563          in a given year has its own standalone article page.
     564          This leads to missing entries in the category
     565          page for the contest in a given year, which is
     566          problematic to us, because it is this category
     567          information that we draw upon in our SPARQL query
     568          to populate the DL with all the acts.
     569        </p>
     570        <p>
     571          The information about all the countries competing
     572          in a given year does, however, appear in the
     573          article page for the contest in that year.  In fact
     574          it's in the same table we targetted to extract out
     575          draw position and placement.  We therefore
     576          wrote a further page-scraping program to compare
     577          the countries in that table with the countries
     578          listed on the category page for the contest in
     579          that year.  For any entries we find in the
     580          table, but not in the Category page, we
     581          produce a metadata record for the DL
     582          with basic information about the entry:
     583          country, year, song title, artist,
     584          draw-position, placement, and (where available)
     585          their total score.
     586        </p>
     587        <p>
     588          Comparable with the problem titles and artist/entrants,
     589          we have formulated a SPARQL query that enumerates
     590          these missing category entrants:
     591          <!--
    526592          We took the opportunity to add in further fields: Performing Position, Placement, Voting Total, thumbnail flag image.
     593
     594
     595          An unintended side-affect of this is that we have also been able to expand
     596          -->
     597
    527598         
    528599          <ul>